For developers working with TypeScript who need a search solution that balances performance with relevance, bm25-lite offers a compelling option. Authored by @arpad1337, this library provides a lightweight implementation of the Okapi BM25 ranking function, specifically designed for TypeScript environments.
Core Architecture
The library is built around a central class, SearchResultEvaluator, which manages the indexing, filtering, and scoring of documents. Rather than relying on plain string matching, the evaluator ranks each result using term-frequency and inverse-document-frequency statistics computed over the indexed fields.
1. Data Processing and Indexing
When you initialize the evaluator, it processes your dataset by normalizing the text. It takes a list of field selectors (keys in your data objects) and extracts keywords by:
- Converting text to lower case.
- Removing special characters using regular expressions (e.g., !@#$%^&*...).
- Splitting the text into individual tokens.
This preprocessing ensures that search queries are robust and not tripped up by punctuation or capitalization.
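The three steps above can be sketched roughly as follows. This is a minimal illustration, not the library's actual code: the helper names extractKeywords and keywordsForItem are hypothetical, and the regular expression (which keeps only ASCII letters, digits, and whitespace) is an assumption about what counts as a "special character".

```typescript
// Hypothetical helper, not bm25-lite's own implementation.
// Lowercases the text, replaces every character that is not an ASCII letter,
// digit, or whitespace with a space, then splits on runs of whitespace.
function extractKeywords(text: string): string[] {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ")
    .split(/\s+/)
    .filter((token) => token.length > 0);
}

// Collects keywords from the selected fields of one record.
// Non-string fields are skipped.
function keywordsForItem<T extends Record<string, unknown>>(
  item: T,
  selectors: (keyof T)[],
): string[] {
  const keywords: string[] = [];
  for (const selector of selectors) {
    const value = item[selector];
    if (typeof value === "string") {
      for (const keyword of extractKeywords(value)) keywords.push(keyword);
    }
  }
  return keywords;
}

const sampleItem = { title: "Hello, World!", body: "BM25 & TF-IDF: a primer", views: 10 };
const sampleKeywords = keywordsForItem(sampleItem, ["title", "body"]);
// sampleKeywords: ["hello", "world", "bm25", "tf", "idf", "a", "primer"]
```

Note that replacing punctuation with a space (rather than deleting it) splits hyphenated words such as "TF-IDF" into two tokens; whether the library does the same is not stated here.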
2. Intelligent Query Handling
The library employs the stopword package to filter out common words from search queries, focusing the search on meaningful terms. When a predicate (search query) is set via setPredicate, the engine cleans the input and identifies "stemmed" words to use for matching.
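To make the query-cleaning idea concrete, here is a small self-contained sketch. It deliberately does not call the stopword package: DEMO_STOPWORDS is a tiny stand-in list and cleanPredicate is a hypothetical name, so treat this as an approximation of the behavior described above rather than what setPredicate does internally. Stemming is left out.

```typescript
// Tiny stand-in list for illustration only; a real stopword list is far larger.
const DEMO_STOPWORDS: ReadonlyArray<string> = [
  "a", "an", "and", "for", "in", "is", "of", "the", "to",
];

// Hypothetical helper: normalizes a query string and drops common words,
// leaving only the terms worth matching against the index.
function cleanPredicate(predicate: string): string[] {
  const tokens = predicate
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ")
    .split(/\s+/);
  const kept: string[] = [];
  for (const token of tokens) {
    if (token.length > 0 && DEMO_STOPWORDS.indexOf(token) === -1) {
      kept.push(token);
    }
  }
  return kept;
}

const cleanedQuery = cleanPredicate("The best way to search for a BM25 library!");
// cleanedQuery: ["best", "way", "search", "bm25", "library"]
```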
3. Scoring Algorithm (TF-IDF)
The heart of bm25-lite is its scoring system. It calculates a relevance score for each document based on:
- Inverse Document Frequency (IDF): It calculates how rare a term is across the entire dataset, giving higher weight to unique terms.
- Term Frequency (TF): It assesses how frequently a term appears within a specific document or tag set.
The evaluateResults method computes these values to generate an idftf score and a maxScore for every matching item.
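For readers who want to see how IDF and TF combine, the sketch below implements the textbook Okapi BM25 formula over pre-tokenized documents. It is not taken from bm25-lite: the function names are hypothetical, the IDF uses the common non-negative variant ln(1 + (N - n + 0.5) / (n + 0.5)), and the k1 = 1.2 and b = 0.75 parameters are widely used defaults that the library may or may not share.

```typescript
// Commonly used BM25 defaults; assumed here, not confirmed from bm25-lite.
const BM25_K1 = 1.2;
const BM25_B = 0.75;

// TF: how many times the term occurs in one tokenized document.
function countOccurrences(tokens: string[], term: string): number {
  let count = 0;
  for (const token of tokens) {
    if (token === term) count++;
  }
  return count;
}

// IDF: terms found in fewer documents receive a larger weight.
function inverseDocumentFrequency(docs: string[][], term: string): number {
  let containing = 0;
  for (const tokens of docs) {
    if (tokens.indexOf(term) !== -1) containing++;
  }
  return Math.log(1 + (docs.length - containing + 0.5) / (containing + 0.5));
}

// Sums the per-term BM25 contribution for one document.
function bm25Score(docs: string[][], docIndex: number, queryTerms: string[]): number {
  let totalLength = 0;
  for (const tokens of docs) totalLength += tokens.length;
  const averageLength = totalLength / docs.length;

  const target = docs[docIndex];
  // Longer-than-average documents are penalized, shorter ones boosted.
  const lengthNorm = 1 - BM25_B + BM25_B * (target.length / averageLength);

  let score = 0;
  for (const term of queryTerms) {
    const tf = countOccurrences(target, term);
    const idf = inverseDocumentFrequency(docs, term);
    score += (idf * tf * (BM25_K1 + 1)) / (tf + BM25_K1 * lengthNorm);
  }
  return score;
}

const corpus = [
  ["fast", "search", "engine"],
  ["search", "search", "ranking"],
  ["cooking", "recipes", "pasta"],
];
const searchScores = corpus.map((_, index) => bm25Score(corpus, index, ["search"]));
```

In this toy corpus the second document mentions "search" twice and therefore outranks the first, while the third scores zero. The rarer term "ranking" also carries a higher IDF than "search", which appears in two of the three documents.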
Integration Guide
The library is designed for ease of use in TypeScript projects.
Initialization
To start, you instantiate the SearchResultEvaluator with your data, a set of defined terms (enums), and the specific fields you want to index.
const searchResultEvaluator = new SearchResultEvaluator<
  Terms, // fixed list of terms (enum)
  IItemWithIDFTF
>(items, Terms as Enum<Terms>, ['selector1', 'selector2', 'selector3']);
Performing a Search
Once initialized, you can define your search criteria using a predicate string and specific term filters.
searchResultEvaluator.setPredicate(predicate);
searchResultEvaluator.setTerms(terms);
const results = searchResultEvaluator.evaluateResults();
Sorting Results
The library provides a static helper method, IDFTFSorter, to sort the results based on their calculated relevance scores. This ensures that the most relevant documents appear at the top of your list.
// Sorts by relevance in descending order
SearchResultEvaluator.IDFTFSorter(selector, SortOrder.DESC);
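A helper like this most plausibly produces a comparator function that can be handed to Array.prototype.sort, although the article does not show its return type. The sketch below illustrates that general pattern with entirely hypothetical names (SketchSortOrder, ScoredItem, makeScoreComparator); it is not the library's IDFTFSorter and assumes each result carries a numeric idftf field.

```typescript
// Hypothetical types for illustration; not part of bm25-lite.
type SketchSortOrder = "asc" | "desc";

interface ScoredItem {
  idftf: number;
}

// Builds a comparator that orders items by their relevance score.
function makeScoreComparator<T extends ScoredItem>(order: SketchSortOrder) {
  const sign = order === "desc" ? -1 : 1;
  return (left: T, right: T): number => sign * (left.idftf - right.idftf);
}

const rankedResults = [
  { id: "a", idftf: 0.4 },
  { id: "b", idftf: 1.7 },
  { id: "c", idftf: 0.9 },
].sort(makeScoreComparator("desc"));
// rankedResults ids, most relevant first: "b", "c", "a"
```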
Licensing
bm25-lite is open-source software. It is released under the MIT License, allowing users to copy, modify, merge, publish, and distribute the software without restriction, provided the original copyright notice is included. The copyright is held by Arpad K., starting in 2025.