In user-facing search, autocomplete refers to the process of suggesting relevant results as the user is typing. Several of ParadeDB’s search APIs can be mixed and matched to build a full-fledged autocomplete experience.

This guide uses the mock_items table.

Fuzzy Term

Suppose we want to find all documents containing shoes, but the user typed in shoez. The fuzzy_term query can find search results that approximately match the query term while allowing for minor typos in the input.

SELECT description, rating, category FROM mock_items
WHERE id @@@ paradedb.fuzzy_term(
    field => 'description',
    value => 'shoez'
) ORDER BY rating DESC;

Fuzzy Phrase

Suppose the user provides a misspelled phrase like ruining shoez when searching for running shoes. Because fuzzy_term treats value as a single token, passing the entire phrase to fuzzy_term will not yield any matches. Instead, fuzzy_phrase should be used. fuzzy_phrase finds documents where any of the query’s tokens are a fuzzy_term match.

SELECT description, rating, category FROM mock_items
WHERE id @@@ paradedb.fuzzy_phrase(
    field => 'description',
    value => 'ruining shoez'
) ORDER BY rating DESC;

For a stricter result set, fuzzy_phrase can be configured to match documents if all query tokens match.

SELECT description, rating, category FROM mock_items
WHERE id @@@ paradedb.fuzzy_phrase(
    field => 'description',
    value => 'ruining shoez',
    match_all_terms => true
);

Multiple Fuzzy Fields

Suppose we want to compare a query against both description and category. The boolean query can be used to query across multiple fields.

SELECT description, rating, category FROM mock_items
WHERE id @@@ paradedb.boolean(
    should => ARRAY[
        paradedb.fuzzy_phrase(field => 'description', value => 'ruining shoez'),
        paradedb.fuzzy_phrase(field => 'category', value => 'ruining shoez')
    ]
) ORDER BY rating DESC;

Ngram Term

Suppose we want to suggest results when the user has only typed part of a word, like sho. In this scenario, the ngrams tokenizer can be used to convert documents into ngram tokens.

For the purpose of this example, let’s assume that we have an index called ngrams_idx:

CALL paradedb.create_bm25(
    index_name => 'ngrams_idx',
    schema_name => 'public',
    table_name => 'mock_items',
    key_field => 'id',
    text_fields => paradedb.field(
        'description',
        tokenizer => paradedb.tokenizer('ngram', min_gram => 3, max_gram => 3, prefix_only => false)
    )
);

With description tokenized into n-grams, we can search for partial words.

SELECT description, rating, category FROM mock_items
WHERE description @@@ 'sho'
ORDER BY rating DESC;

Ngram Term Set

When querying against an ngrams field, all ngrams of the query must match in order for the document to be considered a match. This means that a query like hsoes does not match shoes, because the hso token does not match any of the tokens of shoes.

To match documents where any ngram token of the query matches, the fuzzy_phrase query can again be used. Since we are looking for exact ngram matches, the distance parameter can be lowered to 0.

SELECT description, rating, category FROM mock_items
WHERE id @@@ paradedb.fuzzy_phrase(
    field => 'description',
    value => 'hsoes',
    distance => 0
) ORDER BY rating DESC;

Further Customization

This guide has demonstrated how the query builder functions like paradedb.boolean can be used to compose new, powerful queries. We encourage you to use these functions to experiment with queries that satisfy your use case.