Skip to main content
ParadeDB provides amcheck-style index verification functions to detect corruption and validate the structural integrity of BM25 indexes. These functions are useful for:
  • Proactive corruption detection before issues become critical
  • Validating index health after hardware failures or unexpected shutdowns
  • Verifying backup integrity
  • Debugging index-related issues

Basic Verification

The pdb.verify_index function performs structural integrity checks on a BM25 index:
SELECT * FROM pdb.verify_index('search_idx');
This returns a table with three columns:
ColumnTypeDescription
check_nametextName of the verification check
passedbooleanWhether the check passed
detailstextAdditional information about the check result

Example Output

               check_name               | passed |                    details
----------------------------------------+--------+-----------------------------------------------
 search_idx: schema_valid               | t      | Index schema loaded successfully
 search_idx: index_readable             | t      | Index reader opened successfully
 search_idx: checksums_valid            | t      | All segment checksums validated successfully
 search_idx: segment_metadata_valid     | t      | 3 segments validated successfully

Heap Reference Validation

To verify that all indexed entries still exist in the heap table, use the heapallindexed option:
SELECT * FROM pdb.verify_index('search_idx', heapallindexed := true);
This adds an additional check that validates every indexed ctid (tuple identifier) references a valid row in the table. This is particularly useful for detecting index entries that reference deleted or non-existent rows.
The heapallindexed option can be slow on large indexes as it must verify every document. Consider using sample_rate for quick spot checks on large indexes.

Options

Sampling for Large Indexes

For large indexes, you can check a random sample of documents instead of all documents:
-- Check 10% of documents
SELECT * FROM pdb.verify_index('search_idx',
    heapallindexed := true,
    sample_rate := 0.1
);

Progress Reporting

For long-running verifications, enable progress reporting to see status updates:
SELECT * FROM pdb.verify_index('search_idx',
    heapallindexed := true,
    report_progress := true
);
Progress messages are emitted via PostgreSQL’s WARNING channel.

Verbose Mode

For detailed logging including segment-by-segment progress and resume hints, enable verbose mode:
SELECT * FROM pdb.verify_index('search_idx',
    heapallindexed := true,
    report_progress := true,
    verbose := true
);

Stop on First Error

To stop verification immediately when the first error is found (similar to pg_amcheck --on-error-stop):
SELECT * FROM pdb.verify_index('search_idx', on_error_stop := true);

Parallel Verification

A single verify_index call processes segments sequentially within one PostgreSQL backend. For very large indexes, you can distribute verification across multiple database connections by specifying which segments each connection should check using the segment_ids parameter. This allows you to utilize multiple CPU cores by running verification in parallel processes.

Listing Segments

First, list all segments in the index:
SELECT * FROM pdb.index_segments('search_idx');
 partition_name | segment_idx | segment_id | num_docs | num_deleted | max_doc
----------------+-------------+------------+----------+-------------+---------
 search_idx     |           0 | b7e661af   |    10000 |           0 |   10000
 search_idx     |           1 | b4fc1b40   |    10000 |           0 |   10000
 search_idx     |           2 | 9894b412   |    10000 |           0 |   10000
 search_idx     |           3 | 4d0168d6   |     5000 |           0 |    5000

Verifying Specific Segments

Then verify specific segments using the segment_ids parameter:
-- Worker 1: Verify even segments
SELECT * FROM pdb.verify_index('search_idx',
    heapallindexed := true,
    segment_ids := ARRAY[0, 2]
);

-- Worker 2: Verify odd segments
SELECT * FROM pdb.verify_index('search_idx',
    heapallindexed := true,
    segment_ids := ARRAY[1, 3]
);

Automation Example

Distribute verification across N workers:
-- Get segments for worker 0 (of 4 workers)
SELECT array_agg(segment_idx) AS segments
FROM pdb.index_segments('search_idx')
WHERE segment_idx % 4 = 0;

-- Run verification with those segments
SELECT * FROM pdb.verify_index('search_idx',
    heapallindexed := true,
    segment_ids := (
        SELECT array_agg(segment_idx)
        FROM pdb.index_segments('search_idx')
        WHERE segment_idx % 4 = 0
    )
);

Verifying All BM25 Indexes

To verify all BM25 indexes in the database at once:
SELECT * FROM pdb.verify_all_indexes();

Filtering by Pattern

Filter indexes by schema or name pattern (using SQL LIKE syntax):
-- Verify indexes in the 'public' schema only
SELECT * FROM pdb.verify_all_indexes(schema_pattern := 'public');

-- Verify indexes matching a name pattern
SELECT * FROM pdb.verify_all_indexes(index_pattern := 'search_%');

-- Combine filters
SELECT * FROM pdb.verify_all_indexes(
    schema_pattern := 'app_%',
    index_pattern := '%_idx',
    heapallindexed := true
);

Listing All BM25 Indexes

To see all BM25 indexes in the database with summary statistics:
SELECT * FROM pdb.indexes();
 schemaname |  tablename  |   indexname   | indexrelid | num_segments | total_docs
------------+-------------+---------------+------------+--------------+------------
 public     | products    | products_idx  |      16421 |            3 |      50000
 public     | documents   | documents_idx |      16435 |            5 |     125000
 app        | articles    | articles_idx  |      16448 |            2 |      10000

Function Reference

pdb.verify_index

Verifies a single BM25 index.
ParameterTypeDefaultDescription
indexregclass(required)The index to verify
heapallindexedbooleanfalseCheck that all indexed ctids exist in the heap
sample_ratefloatNULLFraction of documents to check (0.0-1.0). NULL = check all
report_progressbooleanfalseEmit progress messages
verbosebooleanfalseEmit detailed segment-level progress and resume hints
on_error_stopbooleanfalseStop on first error found
segment_idsint[]NULLSpecific segment indices to check. NULL = all segments

pdb.verify_all_indexes

Verifies all BM25 indexes in the database.
ParameterTypeDefaultDescription
schema_patterntextNULLFilter by schema name (SQL LIKE pattern). NULL = all
index_patterntextNULLFilter by index name (SQL LIKE pattern). NULL = all
heapallindexedbooleanfalseCheck that all indexed ctids exist in the heap
sample_ratefloatNULLFraction of documents to check (0.0-1.0). NULL = check all
report_progressbooleanfalseEmit progress messages
on_error_stopbooleanfalseStop on first error found

pdb.index_segments

Lists all segments in a BM25 index.
ParameterTypeDefaultDescription
indexregclass(required)The index to inspect
Returns:
ColumnTypeDescription
partition_nametextName of the index partition
segment_idxintSegment index (use with segment_ids parameter)
segment_idtextTantivy segment UUID
num_docsbigintNumber of live documents
num_deletedbigintNumber of deleted documents
max_docbigintMaximum document ID

pdb.indexes

Lists all BM25 indexes in the database. Returns:
ColumnTypeDescription
schemanametextSchema containing the index
tablenametextTable the index is on
indexnametextName of the index
indexrelidoidOID of the index
num_segmentsintNumber of Tantivy segments
total_docsbigintTotal documents across all segments