Overview

The search function that’s generated for your index can accept query objects instead of a query string. Nearly all the query types implemented in Tantivy, our underlying search engine, are exposed to Postgres.

Query objects are composable, allowing for arbitrarily fine-grained queries. A query string itself is parsed into a query object, and can be used anywhere in a query tree with paradedb.parse.

paradedb.parse is implicitly performed if a query string is passed directly to the search function. These two queries are identical:

SELECT * FROM search_idx.search('description:shoes');

SELECT * FROM search_idx.search(query => paradedb.parse('description:shoes'));

Query Types

The paradedb schema exposes the following functions to construct query objects.

All

This query type indiscriminately matches every document in the index, assigning a uniform score of 1.0 to each.

SELECT * FROM search_idx.search(
    query => paradedb.all()
);

Boolean

Boolean queries filter documents based on the logical relationships defined by their subqueries, considering:

  • Documents that satisfy all must conditions.
  • Documents that satisfy none of the must_not conditions.
  • Documents that satisfy at least one condition from either must or should.
SELECT * FROM search_idx.search(
    query => paradedb.boolean(
	    should => ARRAY[
		    paradedb.parse('description:shoes'),
		    paradedb.phrase_prefix(field => 'description', phrases => ARRAY['book']),
		    paradedb.term(field => 'description', value => 'speaker'),
		    paradedb.fuzzy_term(field => 'description', value => 'wolo')
	    ]
    )
);
must

A query object or an ARRAY of query objects as conditions which must be matched.

must_not

A query object or an ARRAY of query objects as conditions which must not be matched.

should

A query object or an ARRAY of query objects as conditions of which at least one must be matched.

Boost

A boost query wraps around another query to amplify its scoring impact, without altering the set of matched documents.

SELECT * FROM search_idx.search(
	query => paradedb.boost(query => paradedb.all(), boost => 1.5)
);
boost

The factor by which to multiply the score of each result.

query
The query to perform.

Const Score

Applies a constant score across all documents matched by the underlying query. It can avoid unnecessary score computation on the wrapped query.

SELECT * FROM search_idx.search(
    query => paradedb.const_score(query => paradedb.all(), score => 3.9)
);
score
The constant score to use for each result.
query
The query to perform.

Disjunction Max

Returns documents that match one or more of the specified subqueries. If a document matches multiple criteria, it receives the highest score from those matches, with a slight increase for any additional matches.

SELECT * FROM search_idx.search(
	query => paradedb.disjunction_max(
		disjuncts => ARRAY[paradedb.parse('description:shoes')]
	)
);
disjuncts
Query objects to match against.
tie_breaker

A tie breaking increment for matching subqueries

Empty

Serves as a placeholder, matching no documents. It’s useful for testing scenarios or specific edge cases.

SELECT * FROM search_idx.search(
	query => paradedb.empty()
);

Fuzzy Term

Fuzzy search allows users to obtain search results that approximately match the query term, accommodating minor typos in the input. This enhances the search experience by providing relevant results even when the query is not spelled correctly.

Highlighting is not currently supported for fuzzy search queries.
SELECT * FROM search_idx.search(
	query => paradedb.fuzzy_term(field => 'description', value => 'wolo')
);
field

Specifies the field within the document to search for the term.

value

Defines the term or phrase you are searching for within the specified field, using fuzzy logic based on Levenshtein distance to find similar terms.

distance
default: 2

The maximum Levenshtein distance (i.e. single character edits) allowed to consider a term in the index as a match for the query term. Maximum value is 2.

transpose_cost_one
default: true

When set to true, transpositions (swapping two adjacent characters) as a single edit in the Levenshtein distance calculation, while false considers it two separate edits (a deletion and an insertion).

prefix
default: false

When set to true, the initial substring (prefix) of the query term is exempted from the fuzzy edit distance calculation, while false includes the entire string in the calculation.

Phrase

Searches for documents containing an exact sequence of words, with slop allowing for some flexibility in term proximity. This query type also requires position indexing.

SELECT * FROM search_idx.search(
	query => paradedb.phrase(
		field => 'description',
		phrases => ARRAY['robot', 'building', 'kit']
	)
);
field

Specifies the field within the document to search for the term.

phrases

An ARRAY of words that form the search phrase. These words must appear in the specified order within the document for a match to occur, although some flexibility is allowed based on the slop parameter.

slop

A slop of 0 requires the terms to appear exactly as they are in the phrase and adjacent to each other. Higher slop values allow for more distance and flexibility between the terms, accommodating variations in the phrase’s structure within the document.

Phrase Prefix

Identifies documents containing a given sequence of words followed by a term prefix, requiring the indexing of positions for the search field.

SELECT * FROM search_idx.search(
	query => paradedb.phrase_prefix(
		field => 'description',
		phrases => ARRAY['har']
	)
);
field

Specifies the field within the document to search for the term.

phrases

An ARRAY of words that the search is looking to match, followed by a term prefix rather than a complete term.

max_expansion

Limits the number of term variations that the prefix can expand to during the search. This helps in controlling the breadth of the search by setting a cap on how many different terms the prefix can match.

Range

Finds documents containing a term that falls within a specified range of values.

SELECT * FROM search_idx.search(
  query => paradedb.range(
    field => 'rating',
    range => '[1,3)'::int4range
  ),
  stable_sort => true
);
field

Specifies the field within the document to search for the term.

range

A postgres range specifying the range of values to match the field against. Range types include int4range, int8range, daterange, tsrange, and tstzrange.

Regex

Finds documents containing terms that match a specific regex pattern, enabling pattern-based searching.

SELECT * FROM search_idx.search(
	query => paradedb.regex(
		field => 'description',
		pattern => '(hardcover|plush|leather|running|wireless)'
	)
);
field

Specifies the field within the document to search for the term.

pattern
A regex pattern string.

Term

Matches documents containing a specified term, with scoring based on term frequency, inverse document frequency, and field normalization. The term value is treated as a token and is not tokenized further. It is matched directly against terms in the index.

SELECT * FROM search_idx.search(
	query => paradedb.term(field => 'description', value => 'shoes')
);

SELECT * FROM search_idx.search(
  query => paradedb.term(
    field => 'created_at',
    value => TIMESTAMP '2023-05-01 09:12:34'
  )
);
field

Specifies the field within the document to search for the term. If omitted, all indexed fields will be searched.

value
Value to search for in the document field.

TermSet

Matches documents containing any term from a specified set, offering flexibility in matching criteria.

SELECT * FROM search_idx.search(
	query => paradedb.term_set(
	    terms => ARRAY[
	        paradedb.term(field => 'description', value => 'shoes'),
	        paradedb.term(field => 'description', value => 'novel')
	    ]
	)
);
terms

An ARRAY of paradedb.term query objects.