Autocomplete - ParadeDB

In user-facing search, autocomplete refers to the process of suggesting relevant results as the user is typing. Several of ParadeDB’s search APIs can be mixed and matched to build a full-fledged autocomplete experience.

This guide uses the mock_items table.

Fuzzy Term

Suppose we want to find all documents containing shoes, but the user typed in shoez. The fuzzy_term query can find search results that approximately match the query term while allowing for minor typos in the input.

SELECT description, rating, category FROM mock_items
WHERE id @@@ paradedb.fuzzy_term(
    field => 'description',
    value => 'shoez'
) ORDER BY rating DESC;

Expected Response

     description     | rating | category
---------------------+--------+----------
 Sleek running shoes |      5 | Footwear
 Generic shoes       |      4 | Footwear
 White jogging shoes |      3 | Footwear
(3 rows)
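
In an autocomplete setting the input is often incomplete as well as misspelled. The sketch below assumes that fuzzy_term in your ParadeDB version accepts the optional distance and prefix arguments; neither appears in the query above, so check the fuzzy_term reference before relying on them. The value shoz is a hypothetical piece of partially typed input.

```sql
-- Assumption: distance and prefix are optional fuzzy_term arguments in your
-- ParadeDB version; 'shoz' stands in for partially typed, misspelled input.
SELECT description, rating, category FROM mock_items
WHERE id @@@ paradedb.fuzzy_term(
    field => 'description',
    value => 'shoz',
    distance => 1,   -- allow at most one edit
    prefix => true   -- treat the value as a prefix rather than a whole term
) ORDER BY rating DESC;
```

Keeping distance low matters here because short, partially typed inputs are within a few edits of many unrelated terms.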

Fuzzy Matching

Suppose the user provides a misspelled phrase like ruining shoez when searching for running shoes. Because fuzzy_term treats value as a single token, passing the entire phrase to fuzzy_term will not yield any matches. Instead, match should be used: it tokenizes value and, by default, returns documents in which at least one of the query's tokens is a fuzzy match.

SELECT description, rating, category FROM mock_items
WHERE id @@@ paradedb.match(
    field => 'description',
    value => 'ruining shoez',
    distance => 2
) ORDER BY rating DESC;

Expected Response

     description     | rating | category
---------------------+--------+----------
 Sleek running shoes |      5 | Footwear
 Generic shoes       |      4 | Footwear
 White jogging shoes |      3 | Footwear
(3 rows)
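
The distance argument bounds the edit (Levenshtein) distance between a query token and an indexed term. To build intuition for why distance => 2 is enough here, the edit distances can be checked directly in PostgreSQL. This is a minimal sketch that assumes the fuzzystrmatch contrib extension is available in your installation; it is independent of ParadeDB and is used only for illustration.

```sql
-- fuzzystrmatch ships with PostgreSQL as a contrib module and is separate from ParadeDB.
CREATE EXTENSION IF NOT EXISTS fuzzystrmatch;

SELECT
    levenshtein('ruining', 'running') AS ruining_vs_running,  -- 1: one substitution (i -> n)
    levenshtein('shoez', 'shoes')     AS shoez_vs_shoes;      -- 1: one substitution (z -> s)
```

Both misspelled tokens are one edit away from the intended term, so they fall within the configured distance of 2.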

For a stricter result set, set conjunction_mode to true so that a document matches only if every query token has a fuzzy match.

SELECT description, rating, category FROM mock_items
WHERE id @@@ paradedb.match(
    field => 'description',
    value => 'ruining shoez',
    distance => 2,
    conjunction_mode => true
);

Expected Response

Multiple Fuzzy Fields

Suppose we want to compare a query against both description and category. The boolean query can be used to query across multiple fields.

SELECT description, rating, category FROM mock_items
WHERE id @@@ paradedb.boolean(
    should => ARRAY[
        paradedb.match(field => 'description', value => 'ruining shoez', distance => 2),
        paradedb.match(field => 'category', value => 'ruining shoez', distance => 2)
    ]
) ORDER BY rating DESC;

Expected Response

     description     | rating | category
---------------------+--------+----------
 Sleek running shoes |      5 | Footwear
 Generic shoes       |      4 | Footwear
 White jogging shoes |      3 | Footwear
(3 rows)
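
The should clause above matches a document when either field matches. As a variation, the sketch below assumes that paradedb.boolean in your version also accepts a must clause, in which every listed query has to match. The category value footwaer is a hypothetical misspelled input, chosen only to show that each field can receive its own query text.

```sql
-- Assumption: paradedb.boolean accepts a must clause; every query in must
-- has to match, whereas should (used above) needs only one match.
SELECT description, rating, category FROM mock_items
WHERE id @@@ paradedb.boolean(
    must => ARRAY[
        paradedb.match(field => 'description', value => 'ruining shoez', distance => 2),
        paradedb.match(field => 'category', value => 'footwaer', distance => 2)
    ]
) ORDER BY rating DESC;
```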

Ngram Term

Suppose we want to suggest results when the user has only typed part of a word, like sho. In this scenario, the ngram tokenizer can be used to split each document into n-gram tokens; with min_gram and max_gram both set to 3, every token is a three-character substring. For the purpose of this example, let's assume that we have an index called ngrams_idx:

CREATE INDEX ngrams_idx ON public.mock_items
USING bm25 (id, description)
WITH (
    key_field='id',
    text_fields='{"description": {"tokenizer": {"type": "ngram", "min_gram": 3, "max_gram": 3, "prefix_only": false}}}'
);

With description tokenized into n-grams, we can search for partial words.

SELECT description, rating, category FROM mock_items
WHERE description @@@ 'sho'
ORDER BY rating DESC;

Expected Response

     description     | rating | category
---------------------+--------+----------
 Sleek running shoes |      5 | Footwear
 Generic shoes       |      4 | Footwear
 White jogging shoes |      3 | Footwear
(3 rows)

Ngram Term Set

When querying against an n-gram field, every n-gram of the query must match for the document to be considered a match. This means that a query like hsoes does not match shoes, because the hso token does not match any of the tokens of shoes. To match documents where any n-gram token of the query matches, the match query can again be used. Since we are looking for exact n-gram matches, the distance parameter can be lowered to 0.

SELECT description, rating, category FROM mock_items
WHERE id @@@ paradedb.match(
    field => 'description',
    value => 'hsoes',
    distance => 0
) ORDER BY rating DESC;

Expected Response

     description     | rating | category
---------------------+--------+----------
 Sleek running shoes |      5 | Footwear
 Generic shoes       |      4 | Footwear
 White jogging shoes |      3 | Footwear
(3 rows)
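
To see why hsoes still finds shoes when any single n-gram is allowed to match, it can help to list the three-character n-grams of both words. The following is a plain PostgreSQL sketch that needs no ParadeDB functions; it only mimics the min_gram = 3, max_gram = 3 setting of the index for illustration and is not how the tokenizer is implemented.

```sql
-- List every 3-character substring (trigram) of each word.
WITH words(word) AS (
    VALUES ('shoes'), ('hsoes')
)
SELECT word, substr(word, i, 3) AS trigram
FROM words
CROSS JOIN LATERAL generate_series(1, length(word) - 2) AS i
ORDER BY word, i;
-- hsoes yields hso, soe, oes
-- shoes yields sho, hoe, oes
```

The only trigram the two words share is oes, which is enough for a match when any token may match, but not when all of them must.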

Further Customization

This guide has demonstrated how the query builder functions like paradedb.boolean can be used to compose new, powerful queries. We encourage you to use these functions to experiment with queries that satisfy your use case.
