Vector Index Trade-offs
Vector indices trade off among recall (accuracy), latency, and memory usage. Understanding these trade-offs enables optimal configuration for your use case.
HNSW Tuning
Build Parameters
// HNSW index configuration
{
  "index_type": "hnsw",
  "build_params": {
    "M": 16,                 // Connections per node (memory vs recall)
    "efConstruction": 200    // Build quality (time vs recall)
  },
  "search_params": {
    "ef": 100                // Search quality (latency vs recall)
  }
}
// Higher M = better recall, more memory
// M=8: ~60 bytes/vector overhead
// M=16: ~120 bytes/vector overhead
// M=32: ~240 bytes/vector overhead
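The per-vector overheads above imply a simple capacity estimate. A sketch, assuming float32 vectors and roughly 7.5 bytes per link (interpolated from the figures above; exact overhead varies by implementation):

```typescript
// Rough HNSW memory estimate (assumption: ~7.5 bytes/link, per the table above)
function estimateHnswMemoryMB(numVectors: number, dims: number, M: number): number {
  const vectorBytes = dims * 4;   // raw float32 storage per vector
  const graphBytes = M * 7.5;     // graph link overhead per vector
  return (numVectors * (vectorBytes + graphBytes)) / (1024 * 1024);
}
```

For 1M vectors at 1536 dimensions with M=16, this works out to roughly 6 GB, dominated by the raw vectors rather than the graph.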
Search Tuning
// Dynamic ef based on requirements
interface SearchRequirements {
  priority: 'recall' | 'latency' | 'balanced';
}

function getSearchEf(requirements: SearchRequirements): number {
  if (requirements.priority === 'recall') {
    return 500;  // Higher recall, 10-20ms latency
  } else if (requirements.priority === 'latency') {
    return 50;   // Lower recall, 1-2ms latency
  }
  return 100;    // Balanced default
}
IVF Tuning
Cluster Configuration
// IVF index parameters
{
  "index_type": "ivf",
  "build_params": {
    "nlist": 1024,            // Number of clusters
    "training_size": 100000   // Vectors sampled to train the clusters
  },
  "search_params": {
    "nprobe": 32              // Clusters to search per query
  }
}
// nlist rule of thumb: sqrt(N) to 4*sqrt(N)
// nprobe trade-off: higher = better recall, higher latency
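The sqrt(N) rule of thumb can be wired directly into index creation. A sketch (rounding to a power of two is a common convention, not an IVF requirement):

```typescript
// Pick nlist from dataset size using the sqrt(N)..4*sqrt(N) rule above.
function chooseNlist(numVectors: number): number {
  const lower = Math.sqrt(numVectors);
  const upper = 4 * lower;
  // Largest power of two that stays within the upper bound
  let nlist = 1;
  while (nlist * 2 <= upper) nlist *= 2;
  return Math.max(nlist, Math.ceil(lower));
}
```

For 1M vectors this yields 2048 clusters, comfortably inside the 1000-4000 range the rule suggests.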
Quantization
Product Quantization
// PQ configuration
{
  "quantization": {
    "type": "pq",
    "m": 32,      // Subquantizers (dims must be divisible by m)
    "nbits": 8    // Bits per subquantizer code
  }
}
// Memory per vector: 1536 dims * 4 bytes = 6,144 bytes (~6KB)
// With PQ (m=32, nbits=8): 32 * 8 / 8 = 32 bytes per vector
// ~99.5% memory reduction, typically at ~5-10% recall loss
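The arithmetic above generalizes to any PQ configuration. A small helper for comparing settings (a sketch, assuming float32 source vectors):

```typescript
// PQ memory math: full-precision float32 storage vs m*nbits/8 code bytes
function pqCompression(dims: number, m: number, nbits: number) {
  const fullBytes = dims * 4;          // float32 per dimension
  const codeBytes = (m * nbits) / 8;   // compact PQ code per vector
  return { fullBytes, codeBytes, reduction: 1 - codeBytes / fullBytes };
}
```

Raising m or nbits trades some of the memory savings back for recall, so it is worth printing this ratio alongside measured recall when sweeping configurations.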
Hybrid Index Strategy
Two-Stage Retrieval
// Fast coarse index + precise reranking
async function hybridSearch(query: number[], k: number): Promise<Result[]> {
  // Stage 1: Fast approximate search, over-fetching candidates
  const candidates = await coarseIndex.search(query, k * 10);
  // Stage 2: Precise reranking with exact similarity
  const scores = candidates.map(c => ({
    id: c.id,
    score: dotProduct(query, c.vector)
  }));
  return scores.sort((a, b) => b.score - a.score).slice(0, k);
}
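The `dotProduct` helper used in the reranking stage is assumed; a minimal version (assumes equal-length vectors, and that vectors are normalized if cosine similarity is intended):

```typescript
// Exact inner-product similarity for the reranking stage
function dotProduct(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum;
}
```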
Benchmarking Recall
Recall Measurement
async function measureRecall(index: VectorIndex, testSet: TestCase[]): Promise<number> {
  let totalRecall = 0;
  for (const test of testSet) {
    const results = await index.search(test.query, 10);
    const resultIds = new Set(results.map(r => r.id));
    const truePositives = test.groundTruth.filter(id => resultIds.has(id)).length;
    totalRecall += truePositives / test.groundTruth.length;
  }
  return totalRecall / testSet.length; // Mean recall@10 across the test set
}
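The `groundTruth` lists have to come from somewhere: they are typically computed once per test query by exact brute-force search over the corpus. A minimal sketch, assuming dot-product similarity and in-memory float arrays:

```typescript
// Exact top-k by brute force, used to build ground truth for recall benchmarks
function bruteForceTopK(
  query: number[],
  corpus: { id: string; vector: number[] }[],
  k: number
): string[] {
  const dot = (a: number[], b: number[]) =>
    a.reduce((sum, x, i) => sum + x * b[i], 0);
  return corpus
    .map(c => ({ id: c.id, score: dot(query, c.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map(r => r.id);
}
```

This is O(N) per query, which is fine for a one-time benchmark pass but is exactly the cost the approximate index exists to avoid at serving time.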
Conclusion
Vector index tuning requires understanding recall/latency/memory trade-offs. Start with default parameters, benchmark with representative queries, and tune based on your specific requirements.