Basic Full Text Search
Basic Usage
Prerequisite Before performing full text search over a table, it must first be indexed.
The search
function returns rows of a table that match a search query. By default, rows are sorted by relevance.
SELECT * FROM <index_name>.search('<query>');
The name of the index.
The ParadeQL query string. See the documentation below for how to construct this string.
ParadeQL
The query string accepts ParadeQL, a mini query language which can be used to construct more expressive queries.
Specifying Fields
Each query must specify which field to search over. In the following example, we are querying for “keyboard” against the “description” field.
'description:keyboard'
Phrase Search
Phrases containing spaces should be wrapped in double quotes.
'description:"plastic keyboard"'
Efficient Filtering
Filters can be applied over numeric fields, which improves query times compared to standard SQL WHERE
clauses.
-- Equality
'description:keyboard AND rating:4'
-- Simple range
'description:keyboard AND rating:>4'
-- Inclusive range
'description:keyboard AND rating:[2 TO 5]'
-- Exclusive range
'description:keyboard AND rating:{2 TO 5}'
-- Boolean
'description:keyboard AND is_available:false'
Filters only work over Postgres columns that have been indexed as
numeric_fields
or
boolean_fields
.
JSON Fields
Use .
to search over text values nested inside JSON. For instance, the following query would search over a field with values like {"metadata": {"color": "white"}}
.
'metadata.color:white'
When dealing with JSON arrays, the array elements are “flattened” so that each element can be searched individually. This means that if a JSON array is encountered, each element in the array is treated as a separate value and indexed accordingly. For example, given the following JSON structure:
{
"metadata": {
"colors": ["red", "green", "blue"]
}
}
The JSON array in the colors field is flattened to emit separate terms for each color. This allows for individual search queries like:
'metadata.colors:red'
'metadata.colors:green'
'metadata.colors:blue'
Each of these queries would correctly match the document containing the JSON array.
Searching for integers in a nested JSON structure is not supported. For example:
SELECT * FROM <index_name>.search('metadata.attributes:4', stable_sort => true)
Datetime Fields
Search terms will use the UTC time zone if not specified and need to be in RFC3339 format for the search
function.
-- To demonstrate time zones, these query strings are equivalent
'created_at:"2023-05-01T09:12:34Z"'
'created_at:"2023-05-01T04:12:34-05:00"'
Boosting
The ^
character is used to boost the scores of individual fields. Boosting is helpful for tuning
the relevance scores of results.
'description:keyboard^2 OR category:electronics^3'
Boolean Operators
AND
, OR
, and NOT
can be used to combine and filter multiple terms.
'description:keyboard OR category:toy'
Use parentheses to group terms and control the order of operations.
'(description:keyboard OR category:toy) AND description:metal'
Slop Operator
The ~
slop operator is used to match phrases separated by words in between. For instance, let’s say
there exists a row with description
set to “ergonomic metal keyboard.” Because words “ergonomic” and “keyboard”
are separated by one word, the following query would find this row.
'"ergonomic keyboard"~1'
The slop operator is distinct from fuzzy search, which is used for typo tolerance.
Set Operator
The set operator is a more CPU-efficient way of combining multiple OR
s.
'description:IN [keyboard, toy]'
Special Characters
The special characters +
, ^
, ```, :
, {
, }
, "
, [
, ]
, (
, )
, ~
, !
, \\
, \*
, and SPACE
must be escaped by a\
inside the query term.
Limit and Offset
Specifying a limit and offset is a more efficient way of iterating through search results compared to
SQL’s LIMIT
and OFFSET
options. For optimal query performance, we recommend using the limit_rows
and
offset_rows
options instead of LIMIT
and OFFSET
.
SELECT *
FROM <index_name>.search(
'<query>',
limit_rows => <limit_rows>,
offset_rows => <offset_rows>
)
Stable Ordering
Search results are always ordered based on their BM25 score, but we can use the stable_sort
parameter for control over the order of equally-scored results.
If false
, equally-scored results will be ordered based on their insertion order into the index.
This is the default, and allows for the fastest possible query times.
If true
, equally-scored results will be ordered based on their key_field
, but query times will be slower.
This is useful for testing or anytime where results need to be deterministic.
SELECT *
FROM <index_name>.search(
'<query>',
stable_sort => true
)
The name of the index.
The query string.
The number of rows to skip before starting to return rows.
A boolean specifying whether ParadeDB should stabilize the order of equally-scored results, at the cost of performance.
Was this page helpful?