Overview
Similarity search is a technique that matches documents against a query based on semantic meaning.
ParadeDB’s similarity search is powered by pgvector, a Postgres extension that enables HNSW search over dense vectors. Please refer to the pgvector documentation for complete details.
Vector Type
The vector data type is used to store vectors in ParadeDB. A vector has a maximum dimension (i.e. length) of 2,000 entries.
Sparse Vector Type
Sparse vectors are a special type of vector whose entries are mostly zero. They are generated by specialized embedding models like SPLADE.
Because the size of sparse vectors frequently exceeds the 2,000-entry limit of the vector type, sparse vectors can be stored in ParadeDB with the sparsevec type. This type compresses sparse vectors by storing nonzero values and their positions rather than the entire vector, and can support up to 1,000 nonzero entries.
The format is {index1:value1,index2:value2}/dimensions, where dimensions is the total length of the vector. Indices start at 1, like SQL arrays.
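To make the text format concrete, here is a minimal Python sketch that converts between a dense list and a sparsevec literal. The helper names to_sparsevec_literal and from_sparsevec_literal are hypothetical and are not part of ParadeDB or pgvector. The sketch only illustrates the format described above and does not enforce any size limits.

```python
def to_sparsevec_literal(dense):
    """Format a dense list of numbers as a sparsevec text literal (hypothetical helper)."""
    # Keep only the nonzero entries; positions are 1-based, as in the format above.
    pairs = [f"{i}:{v}" for i, v in enumerate(dense, start=1) if v != 0]
    return "{" + ",".join(pairs) + "}/" + str(len(dense))


def from_sparsevec_literal(text):
    """Expand a sparsevec text literal back into a dense list of floats (hypothetical helper)."""
    body, dims = text.rsplit("/", 1)
    dense = [0.0] * int(dims)
    inner = body.strip()[1:-1]  # drop the surrounding braces
    if inner:
        for pair in inner.split(","):
            index, value = pair.split(":")
            dense[int(index) - 1] = float(value)  # convert 1-based to 0-based
    return dense


literal = to_sparsevec_literal([1, 0, 2, 0, 3])
print(literal)  # {1:1,3:2,5:3}/5
```

A literal produced this way can be passed to Postgres as a quoted string and cast to the sparsevec type.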