Highlighting refers to the practice of visually emphasizing the portions of a document that match a user’s search query. By default, highlighted terms are wrapped in <b></b> tags. This can be changed with the start_tag and end_tag arguments.
Highlighting is expensive and can slow down queries. We recommend adding a LIMIT to any query that calls paradedb.snippet to restrict the number of snippets that need to be generated.
Highlighting is not supported for queries that use fuzziness, like paradedb.fuzzy_term.

Basic Usage

paradedb.snippet(<column>) can be added to any query where an @@@ operator is present. The following query generates highlighted snippets against the description field.
SELECT id, paradedb.snippet(description)
FROM mock_items
WHERE description @@@ 'shoes'
LIMIT 5;
The start_tag and end_tag arguments replace the default <b></b> tags that wrap each highlighted term:
SELECT id, paradedb.snippet(description, start_tag => '<i>', end_tag => '</i>')
FROM mock_items
WHERE description @@@ 'shoes'
LIMIT 5;

Fragment Size

For every highlighted term, a fragment of size max_num_chars is created containing the term and its surrounding text. A fragment can contain multiple highlighted terms if they are within max_num_chars distance of one another. By default, max_num_chars is set to 150.
SELECT id, paradedb.snippet(description, max_num_chars => 100)
FROM mock_items
WHERE description @@@ 'shoes'
LIMIT 5;
If multiple fragments are found, paradedb.snippet uses a two-tiered scoring system to determine which fragment to display:
  1. Each highlighted term receives a score based on its inverse document frequency. This means that fragments containing rarer terms will score higher.
  2. If there is a tie, the fragment that appears earlier in the source text will be displayed.
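The two-tiered selection described above can be sketched in a few lines of Python. This is an illustrative model only, not ParadeDB's implementation: the Fragment class, the pick_fragment function, and the IDF values are hypothetical, and summing the per-term scores into a fragment score is an assumption, since the exact formula is not stated here.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    start: int    # position of the fragment in the source text
    terms: list   # highlighted terms contained in the fragment


def pick_fragment(fragments, idf):
    """Return the highest-scoring fragment; ties go to the earliest one."""
    def score(fragment):
        # Assumption: a fragment's score is the sum of its terms' IDF scores.
        return sum(idf[term] for term in fragment.terms)

    # Sort key: higher score first, then smaller start position.
    return min(fragments, key=lambda f: (-score(f), f.start))


# Hypothetical IDF values: rarer terms get larger scores.
idf = {"shoes": 1.0, "running": 2.0, "sleek": 2.0}
fragments = [
    Fragment(start=0, terms=["shoes"]),
    Fragment(start=40, terms=["running"]),
    Fragment(start=90, terms=["sleek"]),
]
best = pick_fragment(fragments, idf)
print(best.start)
```

Here the fragments at positions 40 and 90 tie on score, so the one that appears earlier in the text is chosen.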

Byte Offsets

paradedb.snippet_positions(<column>) returns the byte offsets in the original text where the highlighted regions appear. It returns an array of tuples, where the first element of each tuple is the byte index of the first byte of the highlighted region, and the second element is the byte index just after the last byte of the region (the end is exclusive).
SELECT id, paradedb.snippet(description), paradedb.snippet_positions(description)
FROM mock_items
WHERE description @@@ 'shoes'
LIMIT 5;
 id |          snippet           | snippet_positions
----+----------------------------+-------------------
  3 | Sleek running <b>shoes</b> | {"{14,19}"}
  4 | White jogging <b>shoes</b> | {"{14,19}"}
  5 | Generic <b>shoes</b>       | {"{8,13}"}
(3 rows)
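Because the offsets count bytes rather than characters, client code needs to slice the encoded text, not the string itself. The following Python sketch shows one way to apply custom tags from such offsets. The function name highlight_by_byte_offsets is hypothetical, the text is assumed to be UTF-8 encoded, and the offsets are assumed to have already been parsed into (start, end) integer pairs by the database driver. The offsets in the second call were computed by hand for illustration and are not ParadeDB output.

```python
def highlight_by_byte_offsets(text, positions, start_tag="<b>", end_tag="</b>"):
    """Wrap each [start, end) byte range of `text` in the given tags."""
    raw = text.encode("utf-8")  # assumption: offsets index into UTF-8 bytes
    pieces = []
    cursor = 0
    for start, end in sorted(positions):
        # Text between the previous highlight and this one, unchanged.
        pieces.append(raw[cursor:start].decode("utf-8"))
        # The highlighted region itself, wrapped in tags.
        pieces.append(start_tag + raw[start:end].decode("utf-8") + end_tag)
        cursor = end
    # Whatever remains after the last highlight.
    pieces.append(raw[cursor:].decode("utf-8"))
    return "".join(pieces)


# Offsets taken from the first row of the output above.
print(highlight_by_byte_offsets("Sleek running shoes", [(14, 19)]))

# "é" occupies two bytes in UTF-8, so "shoes" starts at byte 6, not character 5.
print(highlight_by_byte_offsets("Café shoes", [(6, 11)]))
```

Slicing the Python string directly with these offsets would give the wrong region as soon as the text contains a multi-byte character, which is why the sketch encodes first and decodes each slice.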

Snippet Limit and Offset

Both paradedb.snippet and paradedb.snippet_positions accept limit and offset arguments. The limit caps the number of highlighted terms, while the offset skips that many highlighted terms from the start of the document. This can be useful for paginating through documents that contain large numbers of highlighted terms.
SELECT id, paradedb.snippet(description, "limit" => 1, "offset" => 1)
FROM mock_items
WHERE description @@@ 'shoes' AND description @@@ 'sleek' AND description @@@ 'running';
Expected Response
 id |          snippet
----+----------------------------
  3 | Sleek <b>running</b> shoes
(1 row)
The limit and offset arguments must be wrapped in double quotes because they are reserved keywords in Postgres.
In the output above, notice that sleek is not highlighted because an offset of 1 skips the first highlighted term. Similarly, shoes is not highlighted because of the limit 1.
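The effect of limit and offset on this example can be modeled as a slice over the ordered list of highlighted regions. The Python sketch below only illustrates the behavior described above and is not how ParadeDB implements it. The function name select_highlights is hypothetical, and the byte ranges for "Sleek", "running", and "shoes" were worked out by hand from the ASCII text.

```python
def select_highlights(positions, limit=None, offset=0):
    """Skip the first `offset` highlighted regions, then keep at most `limit`."""
    ordered = sorted(positions)  # regions in document order
    end = None if limit is None else offset + limit
    return ordered[offset:end]


text = "Sleek running shoes"
# Hand-computed [start, end) byte ranges for the three matching terms.
all_positions = [(0, 5), (6, 13), (14, 19)]

selected = select_highlights(all_positions, limit=1, offset=1)
raw = text.encode("utf-8")
print([raw[start:end].decode("utf-8") for start, end in selected])
```

With an offset of 1 and a limit of 1, only the second region survives, which corresponds to the single highlighted term in the response above.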