PERFORMANCE OPTIMIZATION 12 MIN READ 2026.03.03

> Vector Index Performance Tuning

Comprehensive guide to tuning vector index parameters for optimal retrieval latency and recall.


Vector Index Trade-offs

Vector indices trade off among recall (accuracy), latency, and memory usage. Understanding these trade-offs enables optimal configuration for your use case.

HNSW Tuning

Build Parameters

// HNSW index configuration
{
  "index_type": "hnsw",
  "build_params": {
    "M": 16,              // Connections per node (memory vs recall)
    "efConstruction": 200  // Build quality (time vs recall)
  },
  "search_params": {
    "ef": 100             // Search quality (latency vs recall)
  }
}

// Higher M = better recall, more memory
// M=8: ~60 bytes/vector overhead
// M=16: ~120 bytes/vector overhead
// M=32: ~240 bytes/vector overhead
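The per-vector figures above can be folded into a quick capacity estimate. The helper below is a hypothetical sketch, not a library API; it assumes float32 vectors and the ~7.5 bytes of link overhead per connection implied by the numbers above (M=16 → ~120 bytes/vector).

```typescript
// Rough HNSW memory estimate (hypothetical helper, not a library API).
// Assumes float32 vectors plus ~7.5 bytes of link overhead per connection,
// matching the figures above (M=16 -> ~120 bytes/vector).
function estimateHnswMemoryMB(
  numVectors: number,
  dims: number,
  M: number
): number {
  const vectorBytes = dims * 4;   // float32 storage
  const linkBytes = 7.5 * M;      // graph link overhead per vector
  return (numVectors * (vectorBytes + linkBytes)) / (1024 * 1024);
}

// 1M 1536-dim vectors at M=16: roughly 6 GB total, of which ~114 MB is links
```

At this scale the graph overhead is small next to the raw vectors, which is why quantization (covered below) usually matters more than shrinking M.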

Search Tuning

// Dynamic ef based on requirements
interface SearchRequirements {
  priority: 'recall' | 'latency' | 'balanced';
}

function getSearchEf(requirements: SearchRequirements): number {
  if (requirements.priority === 'recall') {
    return 500;  // Higher recall, ~10-20ms latency
  }
  if (requirements.priority === 'latency') {
    return 50;   // Lower recall, ~1-2ms latency
  }
  return 100;    // Balanced default
}

IVF Tuning

Cluster Configuration

// IVF index parameters
{
  "index_type": "ivf",
  "build_params": {
    "nlist": 1024,        // Number of clusters
    "training_size": 100000
  },
  "search_params": {
    "nprobe": 32          // Clusters to search
  }
}

// nlist rule of thumb: sqrt(N) to 4*sqrt(N)
// nprobe trade-off: higher = better recall, higher latency
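The sqrt(N) rule of thumb can be turned into a starting-point helper. This is a hypothetical function (not part of any IVF library); it picks the power of two nearest to 2·sqrt(N), which always lands inside the sqrt(N) to 4·sqrt(N) range.

```typescript
// Hypothetical helper applying the sqrt(N)..4*sqrt(N) rule of thumb above.
// Picks the power of two nearest to 2*sqrt(N), which is guaranteed to fall
// inside that range.
function suggestNlist(numVectors: number): number {
  const target = 2 * Math.sqrt(numVectors);
  return 2 ** Math.round(Math.log2(target));
}

// suggestNlist(1_000_000) → 2048
```

Treat the result as a starting point: re-benchmark nlist together with nprobe, since smaller clusters need a proportionally larger nprobe to hold recall constant.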

Quantization

Product Quantization

// PQ configuration
{
  "quantization": {
    "type": "pq",
    "m": 32,              // Subquantizers
    "nbits": 8            // Bits per subquantizer
  }
}

// Memory per vector: 1536 dims * 4 bytes = 6KB (float32)
// With PQ (m=32, nbits=8): 32 bytes per vector
// ~99.5% memory reduction, typically ~5-10% recall loss
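The arithmetic above generalizes: a PQ code occupies m × nbits / 8 bytes per vector (the shared codebooks add a small fixed cost per index, ignored here). A sketch of that calculation, using hypothetical names:

```typescript
// Sketch of the PQ memory arithmetic (hypothetical helper).
// Compressed size per vector = m * nbits / 8 bytes; shared codebooks are a
// small per-index constant and are ignored here.
function pqCompression(dims: number, m: number, nbits: number) {
  const rawBytes = dims * 4;          // float32 baseline
  const codeBytes = (m * nbits) / 8;  // PQ code per vector
  return {
    rawBytes,
    codeBytes,
    reduction: 1 - codeBytes / rawBytes,  // fraction of memory saved
  };
}

// pqCompression(1536, 32, 8) → 6144 raw bytes vs 32 code bytes (~99.5% saved)
```

Raising m or nbits trades some of that saving back for recall, so it is worth sweeping both against a recall benchmark rather than fixing them up front.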

Hybrid Index Strategy

Two-Stage Retrieval

// Fast coarse index + precise reranking
// (coarseIndex is any ANN index whose results include the stored vector)
interface Result { id: string; score: number; }

async function hybridSearch(query: number[], k: number): Promise<Result[]> {
  // Stage 1: fast approximate search, over-fetching 10x candidates
  const candidates = await coarseIndex.search(query, k * 10);
  
  // Stage 2: Precise reranking
  const scores = candidates.map(c => ({
    id: c.id,
    score: dotProduct(query, c.vector)
  }));
  
  return scores.sort((a, b) => b.score - a.score).slice(0, k);
}
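The reranking stage above assumes a dotProduct helper; a minimal version is below. Note that dot product ranks the same as cosine similarity only when vectors are unit-normalized, which is an assumption of this sketch.

```typescript
// Minimal dot-product scorer for the reranking stage.
// Equivalent to cosine similarity only if both vectors are unit-normalized.
function dotProduct(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum;
}

// dotProduct([1, 2, 3], [4, 5, 6]) → 32
```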

Benchmarking Recall

Recall Measurement

interface VectorIndex {
  search(query: number[], k: number): Promise<{ id: string }[]>;
}

interface TestCase {
  query: number[];
  groundTruth: string[];  // IDs of the true nearest neighbors
}

async function measureRecall(index: VectorIndex, testSet: TestCase[]): Promise<number> {
  let totalRecall = 0;
  
  for (const test of testSet) {
    const results = await index.search(test.query, 10);
    const resultIds = new Set(results.map(r => r.id));
    
    const truePositives = test.groundTruth.filter(id => resultIds.has(id)).length;
    totalRecall += truePositives / test.groundTruth.length;
  }
  
  return totalRecall / testSet.length;  // Recall@10
}
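The groundTruth IDs fed into measureRecall are typically produced by exact brute-force search over the corpus, run once offline. A minimal sketch (hypothetical exactKnn helper, dot-product scoring, so it assumes normalized vectors):

```typescript
// Brute-force exact kNN for building recall ground truth (hypothetical
// helper). Scores by dot product, so vectors are assumed unit-normalized.
function exactKnn(
  query: number[],
  corpus: { id: string; vector: number[] }[],
  k: number
): string[] {
  return corpus
    .map(doc => ({
      id: doc.id,
      score: doc.vector.reduce((sum, v, i) => sum + v * query[i], 0),
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map(d => d.id);
}
```

Exact search is O(N) per query, which is fine for a few hundred benchmark queries but is exactly the cost the approximate index exists to avoid in production.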

Conclusion

Vector index tuning requires understanding recall/latency/memory trade-offs. Start with default parameters, benchmark with representative queries, and tune based on your specific requirements.

//TAGS

VECTORS INDEXING TUNING PERFORMANCE