Similarity search is a technique that matches documents against a query based on semantic meaning.

ParadeDB’s similarity search is powered by pgvector, a Postgres extension that enables HNSW search over dense vectors. Refer to the pgvector documentation for full details.
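As a rough sketch of what an HNSW-backed similarity query can look like, the example below creates a table, indexes it, and orders rows by cosine distance. The table name `items` and column name `embedding` are hypothetical, and the three-dimensional vectors are only for illustration; real embeddings are much longer.

```sql
-- Enable pgvector if it is not already available in this database
CREATE EXTENSION IF NOT EXISTS vector;

-- Hypothetical table: `items` and `embedding` are illustrative names
CREATE TABLE items (
    id bigserial PRIMARY KEY,
    embedding vector(3)
);

INSERT INTO items (embedding)
VALUES ('[1,2,3]'), ('[4,5,6]'), ('[1,1,1]');

-- Build an HNSW index that uses cosine distance
CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops);

-- Return the rows closest to the query vector by cosine distance (<=>)
SELECT id, embedding
FROM items
ORDER BY embedding <=> '[1,2,3]'
LIMIT 5;
```

The operator class chosen for the index should match the distance operator used in the query; here both use cosine distance.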

Vector Type

The vector data type stores dense vectors in ParadeDB. An indexed vector column can have at most 2,000 dimensions (i.e. entries).

SELECT '[1,2,3]'::vector;
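To make the type a little more concrete, here is a minimal sketch that inspects a vector's dimension and compares two vectors directly, without any table. It assumes the pgvector extension is already enabled in the current database.

```sql
-- Number of entries in the vector: 3
SELECT vector_dims('[1,2,3]'::vector);

-- Euclidean (L2) distance between two vectors: sqrt(27), approximately 5.196
SELECT '[1,2,3]'::vector <-> '[4,5,6]'::vector AS l2_distance;
```

When the type is used in a column definition, the dimension is usually fixed up front, for example `vector(3)`, so that every stored vector has the same length.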

Sparse Vector Type

Sparse vectors are a special type of vector whose entries are mostly zero. They are generated by specialized embedding models like SPLADE.

Because sparse vectors frequently exceed the 2,000-entry limit of the vector type, ParadeDB provides the sparsevec type for storing them. This type compresses a sparse vector by storing only its non-zero values and their positions rather than the entire vector, and supports up to 1,000 non-zero entries when indexed.

-- Note we only need to specify the positions of the non-zero values
SELECT '{1:3,3:1,5:2}/5'::sparsevec;

The format is {index1:value1,index2:value2}/dimensions, where indices start at 1, like SQL arrays.
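The sketch below shows one way the sparsevec format might be used in a table. The table name `sparse_items` and column name `embedding` are hypothetical, and the cast from sparsevec to vector assumes a pgvector version that provides conversions between the two types.

```sql
-- Hypothetical table: `sparse_items` and `embedding` are illustrative names
CREATE TABLE sparse_items (
    id bigserial PRIMARY KEY,
    embedding sparsevec(5)
);

INSERT INTO sparse_items (embedding)
VALUES ('{1:3,3:1,5:2}/5'), ('{2:4}/5');

-- Expand each sparse vector to its dense form.
-- '{1:3,3:1,5:2}/5' has 3 at position 1, 1 at position 3, and 2 at
-- position 5, so its dense equivalent is [3,0,1,0,2].
SELECT id, embedding::vector AS dense
FROM sparse_items;

-- Find the stored row closest to a sparse query vector by L2 distance
SELECT id
FROM sparse_items
ORDER BY embedding <-> '{1:3,3:1,5:2}/5'
LIMIT 1;
```

Positions that are not listed are treated as zero, which is why only the non-zero entries need to appear in the literal.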