Before a table can be searched, it must be indexed. ParadeDB uses a custom index type called the BM25 index. The following code block creates a BM25 index over several columns in the mock_items table.
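A minimal sketch of such a statement, assuming the index is named search_idx and that mock_items has the columns listed here (the column list is illustrative; adjust it to your table and ParadeDB version):

```sql
-- Create a BM25 index over several columns of mock_items.
-- The column names below are assumed for illustration.
CREATE INDEX search_idx ON mock_items
USING bm25 (id, description, category, rating, in_stock, created_at, metadata, weight_range)
WITH (key_field='id');  -- key_field names the row's unique identifier
```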
See the getting started guide
for more detail on how to set up your ORM to run index creation commands.
You’ll need to drop the existing search_idx before you can create a new one.
Choose a tokenizer suited to each column's contents before running CREATE INDEX.
For instance, if a column contains multiple languages, the ICU tokenizer may be more appropriate.
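As a hedged sketch of replacing the index with one that tokenizes description with ICU: the pdb.icu cast shown here is an assumption about the tokenizer syntax in your ParadeDB version, and the column names are illustrative.

```sql
-- Remove the existing index first; DROP INDEX is standard Postgres.
DROP INDEX search_idx;

-- Recreate it, tokenizing description with the ICU tokenizer.
-- The ::pdb.icu cast is assumed syntax; check it against your version's docs.
CREATE INDEX search_idx ON mock_items
USING bm25 (id, (description::pdb.icu), category)
WITH (key_field='id');
```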
Track Create Index Progress
To monitor the progress of a long-running CREATE INDEX, open a separate Postgres connection and query pg_stat_progress_create_index:
Comparing blocks_done to blocks_total provides a good approximation of the progress so far. When blocks_done equals
blocks_total, all rows have been indexed and the index is being flushed to disk.
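One way to run this check is sketched below. The view and its columns are standard Postgres; the pct_done expression is an added convenience rather than part of the view.

```sql
-- Run from a separate connection while CREATE INDEX is in progress.
SELECT
    pid,
    phase,
    blocks_done,
    blocks_total,
    -- NULLIF avoids division by zero before blocks_total is known
    round(100.0 * blocks_done / NULLIF(blocks_total, 0), 1) AS pct_done
FROM pg_stat_progress_create_index;
```

The query returns no rows when no index build is running.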
Choosing a Key Field
In the CREATE INDEX statement above, note the mandatory key_field option.
Every BM25 index needs a key_field, which is the name of a column that will function as a row’s unique identifier within the index.
The key_field must:
- Have a UNIQUE constraint. Usually this means the table’s PRIMARY KEY.
- Be the first column in the column list.
- Be untokenized, if it is a text field.
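The requirements above can be illustrated with a small sketch. The articles table and articles_search_idx index are hypothetical names invented for this example.

```sql
-- Hypothetical table: the PRIMARY KEY gives id the required uniqueness.
CREATE TABLE articles (
    id    SERIAL PRIMARY KEY,
    title TEXT,
    body  TEXT
);

-- id is listed first and named as key_field; as an integer column,
-- it is not tokenized.
CREATE INDEX articles_search_idx ON articles
USING bm25 (id, title, body)
WITH (key_field='id');
```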
Token Filters
After tokens are created, token filters can be configured to apply further processing like lowercasing, stemming, or unaccenting. For example, the following code block adds English stemming to description:
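A sketch of what such a statement might look like, assuming the pdb.simple tokenizer accepts a stemmer=english option in your ParadeDB version (the cast syntax and the column names are assumptions here, not confirmed by this page):

```sql
-- Tokenize description with the simple tokenizer, then apply English stemming.
-- The 'stemmer=english' option string is assumed syntax.
CREATE INDEX search_idx ON mock_items
USING bm25 (id, (description::pdb.simple('stemmer=english')), category)
WITH (key_field='id');
```

With stemming applied, a query term such as "running" would be expected to match documents containing "run" or "runs", since all reduce to the same stem.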