ParadeDB provides amcheck-style index verification functions to detect corruption and validate the structural integrity of BM25 indexes.
These functions are useful for:
- Proactive corruption detection before issues become critical
- Validating index health after hardware failures or unexpected shutdowns
- Verifying backup integrity
- Debugging index-related issues
Basic Verification
The pdb.verify_index function performs structural integrity checks on a BM25 index:
SELECT * FROM pdb.verify_index('search_idx');
This returns a table with three columns:
| Column | Type | Description |
|---|
check_name | text | Name of the verification check |
passed | boolean | Whether the check passed |
details | text | Additional information about the check result |
Example Output
check_name | passed | details
----------------------------------------+--------+-----------------------------------------------
search_idx: schema_valid | t | Index schema loaded successfully
search_idx: index_readable | t | Index reader opened successfully
search_idx: checksums_valid | t | All segment checksums validated successfully
search_idx: segment_metadata_valid | t | 3 segments validated successfully
Heap Reference Validation
To verify that all indexed entries still exist in the heap table, use the heapallindexed option:
SELECT * FROM pdb.verify_index('search_idx', heapallindexed := true);
This adds an additional check that validates every indexed ctid (tuple identifier) references a valid row in the table.
This is particularly useful for detecting index entries that reference deleted or non-existent rows.
The heapallindexed option can be slow on large indexes as it must verify
every document. Consider using sample_rate for quick spot checks on large
indexes.
Options
Sampling for Large Indexes
For large indexes, you can check a random sample of documents instead of all documents:
-- Check 10% of documents
SELECT * FROM pdb.verify_index('search_idx',
heapallindexed := true,
sample_rate := 0.1
);
Progress Reporting
For long-running verifications, enable progress reporting to see status updates:
SELECT * FROM pdb.verify_index('search_idx',
heapallindexed := true,
report_progress := true
);
Progress messages are emitted via PostgreSQL’s WARNING channel.
Verbose Mode
For detailed logging including segment-by-segment progress and resume hints, enable verbose mode:
SELECT * FROM pdb.verify_index('search_idx',
heapallindexed := true,
report_progress := true,
verbose := true
);
Stop on First Error
To stop verification immediately when the first error is found (similar to pg_amcheck --on-error-stop):
SELECT * FROM pdb.verify_index('search_idx', on_error_stop := true);
Parallel Verification
A single verify_index call processes segments sequentially within one PostgreSQL backend.
For very large indexes, you can distribute verification across multiple database connections
by specifying which segments each connection should check using the segment_ids parameter.
This allows you to utilize multiple CPU cores by running verification in parallel processes.
Listing Segments
First, list all segments in the index:
SELECT * FROM pdb.index_segments('search_idx');
partition_name | segment_idx | segment_id | num_docs | num_deleted | max_doc
----------------+-------------+------------+----------+-------------+---------
search_idx | 0 | b7e661af | 10000 | 0 | 10000
search_idx | 1 | b4fc1b40 | 10000 | 0 | 10000
search_idx | 2 | 9894b412 | 10000 | 0 | 10000
search_idx | 3 | 4d0168d6 | 5000 | 0 | 5000
Verifying Specific Segments
Then verify specific segments using the segment_ids parameter:
-- Worker 1: Verify even segments
SELECT * FROM pdb.verify_index('search_idx',
heapallindexed := true,
segment_ids := ARRAY[0, 2]
);
-- Worker 2: Verify odd segments
SELECT * FROM pdb.verify_index('search_idx',
heapallindexed := true,
segment_ids := ARRAY[1, 3]
);
Automation Example
Distribute verification across N workers:
-- Get segments for worker 0 (of 4 workers)
SELECT array_agg(segment_idx) AS segments
FROM pdb.index_segments('search_idx')
WHERE segment_idx % 4 = 0;
-- Run verification with those segments
SELECT * FROM pdb.verify_index('search_idx',
heapallindexed := true,
segment_ids := (
SELECT array_agg(segment_idx)
FROM pdb.index_segments('search_idx')
WHERE segment_idx % 4 = 0
)
);
Verifying All BM25 Indexes
To verify all BM25 indexes in the database at once:
SELECT * FROM pdb.verify_all_indexes();
Filtering by Pattern
Filter indexes by schema or name pattern (using SQL LIKE syntax):
-- Verify indexes in the 'public' schema only
SELECT * FROM pdb.verify_all_indexes(schema_pattern := 'public');
-- Verify indexes matching a name pattern
SELECT * FROM pdb.verify_all_indexes(index_pattern := 'search_%');
-- Combine filters
SELECT * FROM pdb.verify_all_indexes(
schema_pattern := 'app_%',
index_pattern := '%_idx',
heapallindexed := true
);
Listing All BM25 Indexes
To see all BM25 indexes in the database with summary statistics:
SELECT * FROM pdb.indexes();
schemaname | tablename | indexname | indexrelid | num_segments | total_docs
------------+-------------+---------------+------------+--------------+------------
public | products | products_idx | 16421 | 3 | 50000
public | documents | documents_idx | 16435 | 5 | 125000
app | articles | articles_idx | 16448 | 2 | 10000
Function Reference
pdb.verify_index
Verifies a single BM25 index.
| Parameter | Type | Default | Description |
|---|
index | regclass | (required) | The index to verify |
heapallindexed | boolean | false | Check that all indexed ctids exist in the heap |
sample_rate | float | NULL | Fraction of documents to check (0.0-1.0). NULL = check all |
report_progress | boolean | false | Emit progress messages |
verbose | boolean | false | Emit detailed segment-level progress and resume hints |
on_error_stop | boolean | false | Stop on first error found |
segment_ids | int[] | NULL | Specific segment indices to check. NULL = all segments |
pdb.verify_all_indexes
Verifies all BM25 indexes in the database.
| Parameter | Type | Default | Description |
|---|
schema_pattern | text | NULL | Filter by schema name (SQL LIKE pattern). NULL = all |
index_pattern | text | NULL | Filter by index name (SQL LIKE pattern). NULL = all |
heapallindexed | boolean | false | Check that all indexed ctids exist in the heap |
sample_rate | float | NULL | Fraction of documents to check (0.0-1.0). NULL = check all |
report_progress | boolean | false | Emit progress messages |
on_error_stop | boolean | false | Stop on first error found |
pdb.index_segments
Lists all segments in a BM25 index.
| Parameter | Type | Default | Description |
|---|
index | regclass | (required) | The index to inspect |
Returns:
| Column | Type | Description |
|---|
partition_name | text | Name of the index partition |
segment_idx | int | Segment index (use with segment_ids parameter) |
segment_id | text | Tantivy segment UUID |
num_docs | bigint | Number of live documents |
num_deleted | bigint | Number of deleted documents |
max_doc | bigint | Maximum document ID |
pdb.indexes
Lists all BM25 indexes in the database.
Returns:
| Column | Type | Description |
|---|
schemaname | text | Schema containing the index |
tablename | text | Table the index is on |
indexname | text | Name of the index |
indexrelid | oid | OID of the index |
num_segments | int | Number of Tantivy segments |
total_docs | bigint | Total documents across all segments |