What is a Sparse HNSW Index?

Sparse vectors are generated by models such as SPLADE. They can detect the presence of exact keywords while also capturing semantic similarity between terms. Compared with dense vectors, sparse vectors have far more dimensions, and most of their entries are zero. For instance, OpenAI’s text-embedding-ada-002 model outputs dense vectors with 1536 dimensions, whereas SPLADE outputs sparse vectors with over 30,000 dimensions.
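Because most entries are zero, a sparse vector is usually stored as only its non-zero entries: a mapping from dimension index to value. The sketch below shows this idea in plain Python. The dimension count of 30,522 is an assumption (the size of a BERT-style vocabulary), and the query weights are made up. Nothing here reflects how the index stores vectors internally.

```python
from __future__ import annotations

# Assumed dimension count for a SPLADE-style vocabulary (illustrative only).
DIMENSIONS = 30_522


def to_dense(sparse: dict[int, float], dims: int) -> list[float]:
    """Expand a {index: value} sparse vector into a full dense list."""
    dense = [0.0] * dims
    for index, value in sparse.items():
        if not 0 <= index < dims:
            raise IndexError(f"index {index} out of range for {dims} dimensions")
        dense[index] = value
    return dense


def density(sparse: dict[int, float], dims: int) -> float:
    """Fraction of the dims entries that are non-zero."""
    return sum(1 for v in sparse.values() if v != 0.0) / dims


# Hypothetical term weights for a short query: only 3 non-zero entries.
query = {1012: 0.8, 2054: 1.3, 7592: 0.4}

dense = to_dense(query, DIMENSIONS)
print(len(dense))  # 30522 values when stored densely
print(len(query))  # 3 values when stored sparsely
print(f"{density(query, DIMENSIONS):.5f}")
```

Storing only the non-zero entries keeps the cost proportional to the number of active terms rather than to the vocabulary size.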

A sparse HNSW (Hierarchical Navigable Small World) index is a variant of the HNSW index designed for sparse vectors.

Creating a Sparse HNSW Index

The following command creates a sparse HNSW index over a column:

CREATE INDEX ON <schema_name>.<table_name>
USING shnsw (<column_name> <distance_metric>);
schema_name
optional

The name of the schema, or namespace, of the table. The schema name only needs to be provided if the table is not in the public schema.

table_name
required

The name of the table being indexed.

column_name
required

The name of the column being indexed.

distance_metric
required

The distance metric used for measuring similarity between two vectors. Use svector_l2_ops for L2 distance, svector_ip_ops for inner product, and svector_cosine_ops for cosine distance.
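To make the three metrics concrete, here is a minimal Python sketch of L2 distance, inner product, and cosine distance computed directly on {index: value} sparse vectors. It assumes cosine distance means 1 minus cosine similarity. The documentation does not say how the extension turns inner product into a distance (some systems negate it), so this sketch returns the raw inner product. The function names are illustrative and are not part of the extension's API.

```python
from __future__ import annotations

import math


def inner_product(a: dict[int, float], b: dict[int, float]) -> float:
    # Only indices present in both vectors contribute, so loop over the smaller one.
    if len(a) > len(b):
        a, b = b, a
    return sum(value * b[index] for index, value in a.items() if index in b)


def l2_distance(a: dict[int, float], b: dict[int, float]) -> float:
    # Every index present in either vector matters; a missing index is an implicit zero.
    indices = a.keys() | b.keys()
    return math.sqrt(sum((a.get(i, 0.0) - b.get(i, 0.0)) ** 2 for i in indices))


def cosine_distance(a: dict[int, float], b: dict[int, float]) -> float:
    # Assumed definition: 1 - cosine similarity.
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - inner_product(a, b) / (norm_a * norm_b)


a = {0: 3.0, 5: 4.0}
b = {5: 4.0, 9: 3.0}
print(inner_product(a, b))               # 16.0
print(round(l2_distance(a, b), 4))       # 4.2426
print(round(cosine_distance(a, b), 4))   # 0.36
```

The inner product only needs the indices the two vectors share. This is why sparse representations are cheap to compare even when the nominal dimension count is large.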

Index Options

The following example demonstrates how to pass options when creating a sparse HNSW index:

CREATE INDEX ON mock_items
USING shnsw (sparse_embedding svector_l2_ops)
WITH (m = 16, ef_construction = 64);
m
default: 16

The maximum number of connections per layer. A higher value increases recall but also increases index size and construction time.

ef_construction
default: 64

The size of the dynamic candidate list used while building the graph. A higher value creates a higher-quality graph, which increases recall at the cost of longer construction time.
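If you generate these statements from application code, a small helper can assemble the command from the template and options above. The sketch below is hypothetical: `create_shnsw_index_sql` is not part of any library. It interpolates identifiers without quoting, so it must only receive trusted names. Real code should quote identifiers with its driver's facility, such as psycopg's `sql.Identifier`.

```python
from __future__ import annotations

# Operator classes listed in the documentation above.
OPERATOR_CLASSES = {
    "l2": "svector_l2_ops",
    "ip": "svector_ip_ops",
    "cosine": "svector_cosine_ops",
}


def create_shnsw_index_sql(
    table: str,
    column: str,
    metric: str,
    schema: str | None = None,
    m: int = 16,
    ef_construction: int = 64,
) -> str:
    """Build a CREATE INDEX statement for a sparse HNSW index (hypothetical helper)."""
    if metric not in OPERATOR_CLASSES:
        raise ValueError(f"unknown metric {metric!r}; expected one of {sorted(OPERATOR_CLASSES)}")
    for name, value in (("m", m), ("ef_construction", ef_construction)):
        if not isinstance(value, int) or value <= 0:
            raise ValueError(f"{name} must be a positive integer")
    # The schema prefix is only added when one is given.
    target = f"{schema}.{table}" if schema else table
    return (
        f"CREATE INDEX ON {target}\n"
        f"USING shnsw ({column} {OPERATOR_CLASSES[metric]})\n"
        f"WITH (m = {m}, ef_construction = {ef_construction});"
    )


# Reproduces the example statement from this section.
print(create_shnsw_index_sql("mock_items", "sparse_embedding", "l2"))
```

Calling it with the defaults prints the same three-line statement shown in the Index Options example. Raising `m` or `ef_construction` trades longer build time (and, for `m`, a larger index) for higher recall.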

Deleting a Sparse HNSW Index

The following command deletes a sparse HNSW index:

DROP INDEX <index_name>;
index_name
required

The name of the index you wish to delete.

Recreating a Sparse HNSW Index

A sparse HNSW index only needs to be recreated if the name of the indexed column changes. To recreate the index, drop it and then create it again using the SQL commands above.
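The drop-and-create sequence can be emitted as one script. The sketch below is a hypothetical helper, not part of the extension. It wraps both statements in a transaction, relying on standard PostgreSQL transactional DDL, so the old index is not lost if the rebuild fails. It also names the index explicitly, which standard `CREATE INDEX name ON ...` syntax allows even though the template above omits a name. Note that the table stays locked until the transaction commits. The index name and the renamed column in the usage line are made up.

```python
from __future__ import annotations


def recreate_shnsw_index_sql(
    index_name: str,
    table: str,
    column: str,
    operator_class: str = "svector_l2_ops",
    m: int = 16,
    ef_construction: int = 64,
) -> str:
    """Return a script that drops a sparse HNSW index and creates it again.

    Identifiers are interpolated as-is, so they must come from trusted input.
    """
    return "\n".join([
        "BEGIN;",
        # Remove the old index by name.
        f"DROP INDEX {index_name};",
        # Rebuild it under the same name so later DROP INDEX calls still work.
        f"CREATE INDEX {index_name} ON {table}",
        f"USING shnsw ({column} {operator_class})",
        f"WITH (m = {m}, ef_construction = {ef_construction});",
        "COMMIT;",
    ])


# Hypothetical scenario: the column sparse_embedding was renamed to splade_embedding.
print(recreate_shnsw_index_sql("mock_items_sparse_idx", "mock_items", "splade_embedding"))
```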