Advanced Search Queries
Overview
The search
function that’s generated for your index can accept query objects instead of a query string. Nearly all the query types implemented in Tantivy, our underlying search engine, are exposed to Postgres.
Query objects are composable, allowing for arbitrarily fine-grained queries. A query string itself is parsed into a query object, and can be used anywhere in a query tree with paradedb.parse
.
paradedb.parse
is implicitly performed if a query string is passed directly to the search
function. These two queries are identical:
SELECT * FROM search_idx.search('description:shoes');
SELECT * FROM search_idx.search(query => paradedb.parse('description:shoes'));
Query Types
The paradedb
schema exposes the following functions to construct query objects.
All
This query type indiscriminately matches every document in the index, assigning a uniform score of 1.0 to each.
SELECT * FROM search_idx.search(
query => paradedb.all()
);
Boolean
Boolean queries filter documents based on the logical relationships defined by their subqueries, considering:
- Documents that satisfy all
must
conditions. - Documents that satisfy none of the
must_not
conditions. - Documents that satisfy at least one condition from either
must
orshould
.
SELECT * FROM search_idx.search(
query => paradedb.boolean(
should => ARRAY[
paradedb.parse('description:shoes'),
paradedb.phrase_prefix(field => 'description', phrases => ARRAY['book']),
paradedb.term(field => 'description', value => 'speaker'),
paradedb.fuzzy_term(field => 'description', value => 'wolo')
]
)
);
A query object or an ARRAY
of query objects as conditions which must be
matched.
A query object or an ARRAY
of query objects as conditions which must not be
matched.
A query object or an ARRAY
of query objects as conditions of which at least
one must be matched.
Boost
A boost query wraps around another query to amplify its scoring impact, without altering the set of matched documents.
SELECT * FROM search_idx.search(
query => paradedb.boost(query => paradedb.all(), boost => 1.5)
);
The factor by which to multiply the score of each result.
Const Score
Applies a constant score across all documents matched by the underlying query. It can avoid unnecessary score computation on the wrapped query.
SELECT * FROM search_idx.search(
query => paradedb.const_score(query => paradedb.all(), score => 3.9)
);
Disjunction Max
Returns documents that match one or more of the specified subqueries. If a document matches multiple criteria, it receives the highest score from those matches, with a slight increase for any additional matches.
SELECT * FROM search_idx.search(
query => paradedb.disjunction_max(
disjuncts => ARRAY[paradedb.parse('description:shoes')]
)
);
A tie breaking increment for matching subqueries
Empty
Serves as a placeholder, matching no documents. It’s useful for testing scenarios or specific edge cases.
SELECT * FROM search_idx.search(
query => paradedb.empty()
);
Fuzzy Term
Fuzzy search allows users to obtain search results that approximately match the query term, accommodating minor typos in the input. This enhances the search experience by providing relevant results even when the query is not spelled correctly.
SELECT * FROM search_idx.search(
query => paradedb.fuzzy_term(field => 'description', value => 'wolo')
);
Specifies the field within the document to search for the term.
Defines the term or phrase you are searching for within the specified field, using fuzzy logic based on Levenshtein distance to find similar terms.
The maximum Levenshtein distance (i.e. single character edits) allowed to
consider a term in the index as a match for the query term. Maximum value is
2
.
When set to true
, transpositions (swapping two adjacent characters) as a
single edit in the Levenshtein distance calculation, while false
considers
it two separate edits (a deletion and an insertion).
When set to true
, the initial substring (prefix) of the query term is
exempted from the fuzzy edit distance calculation, while false includes the
entire string in the calculation.
Phrase
Searches for documents containing an exact sequence of words, with slop
allowing for some flexibility in term proximity. This query type also requires position indexing.
SELECT * FROM search_idx.search(
query => paradedb.phrase(
field => 'description',
phrases => ARRAY['robot', 'building', 'kit']
)
);
Specifies the field within the document to search for the term.
An ARRAY
of words that form the search phrase. These words must appear in
the specified order within the document for a match to occur, although some
flexibility is allowed based on the slop
parameter.
A slop of 0 requires the terms to appear exactly as they are in the phrase and adjacent to each other. Higher slop values allow for more distance and flexibility between the terms, accommodating variations in the phrase’s structure within the document.
Phrase Prefix
Identifies documents containing a given sequence of words followed by a term prefix, requiring the indexing of positions for the search field.
SELECT * FROM search_idx.search(
query => paradedb.phrase_prefix(
field => 'description',
phrases => ARRAY['har']
)
);
Specifies the field within the document to search for the term.
An ARRAY
of words that the search is looking to match, followed by a term
prefix rather than a complete term.
Limits the number of term variations that the prefix can expand to during the search. This helps in controlling the breadth of the search by setting a cap on how many different terms the prefix can match.
Range
Finds documents containing a term that falls within a specified range of values.
SELECT * FROM search_idx.search(
query => paradedb.range(
field => 'rating',
range => '[1,3)'::int4range
),
stable_sort => true
);
Specifies the field within the document to search for the term.
A postgres range specifying the range of values to match the field against.
Range types include int4range
, int8range
, daterange
, tsrange
, and
tstzrange
.
Regex
Finds documents containing terms that match a specific regex pattern, enabling pattern-based searching.
SELECT * FROM search_idx.search(
query => paradedb.regex(
field => 'description',
pattern => '(hardcover|plush|leather|running|wireless)'
)
);
Specifies the field within the document to search for the term.
Term
Matches documents containing a specified term, with scoring based on term frequency, inverse document frequency, and field normalization. The term value is treated as a token and is not tokenized further. It is matched directly against terms in the index.
SELECT * FROM search_idx.search(
query => paradedb.term(field => 'description', value => 'shoes')
);
SELECT * FROM search_idx.search(
query => paradedb.term(
field => 'created_at',
value => TIMESTAMP '2023-05-01 09:12:34'
)
);
Specifies the field within the document to search for the term. If omitted, all indexed fields will be searched.
TermSet
Matches documents containing any term from a specified set, offering flexibility in matching criteria.
SELECT * FROM search_idx.search(
query => paradedb.term_set(
terms => ARRAY[
paradedb.term(field => 'description', value => 'shoes'),
paradedb.term(field => 'description', value => 'novel')
]
)
);
An ARRAY
of paradedb.term
query objects.
Was this page helpful?