Beginner
What problem does a vector database solve?
A vector database stores vectors and answers similarity questions such as "Which stored items are closest to this query vector?" The challenge is scale: comparing a query against every stored vector becomes too slow once you have hundreds of thousands or millions of records. Vector databases solve that by building indexes that avoid brute-force scans while still returning highly relevant neighbors.
- Each record usually contains an ID, a dense vector, optional metadata, and a pointer to source data.
- The query is embedded with the same model and preprocessing used for the stored records; mixing embedding models makes the distances meaningless.
- The database returns the nearest neighbors according to a similarity function such as cosine, dot product, or Euclidean distance.
- Metadata filters narrow the candidate set, for example by tenant, language, product line, or time range.
Vector index vs. vector database
| Term | Main job | Typical gap |
|---|---|---|
| Vector index | Accelerate nearest-neighbor search over vectors | Usually does not handle filtering, durability, replication, or access control alone |
| Vector database | Operate vector search as a full data system | More moving parts, configuration, and operational trade-offs |
FAISS is a classic example of a vector index library. Systems such as Pinecone, Qdrant, Weaviate, Milvus, and pgvector-backed deployments add database capabilities around indexing so applications can manage vector data over time instead of treating search as a one-off algorithm.
Application record
-> id
-> vector
-> metadata
-> source pointer
Stored in vector database
-> vector index for similarity search
-> metadata store for filters
-> storage/replication for persistence
Simple record structure
records = [
    {
        "id": "doc-001",
        "vector": [0.12, 0.55, -0.20],
        "metadata": {"tenant": "acme", "topic": "warranty", "year": 2025},
        "payload": {"title": "Warranty policy"}
    },
    {
        "id": "doc-002",
        "vector": [0.10, 0.51, -0.18],
        "metadata": {"tenant": "acme", "topic": "returns", "year": 2025},
        "payload": {"title": "Returns policy"}
    }
]
query_vector = [0.11, 0.52, -0.19]
# The database uses an index to find likely nearest neighbors quickly.
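To make the scaling problem concrete, here is what the search looks like without an index: a brute-force scan that scores every record against the query. This sketch assumes cosine similarity and uses trimmed copies of the records above; a real database replaces the full scan with an index.

```python
import math

# Trimmed copies of the records above (IDs and vectors only).
records = [
    {"id": "doc-001", "vector": [0.12, 0.55, -0.20]},
    {"id": "doc-002", "vector": [0.10, 0.51, -0.18]},
]
query_vector = [0.11, 0.52, -0.19]

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Brute-force scan: score every record, then sort. Cost grows linearly
# with the number of records, which is exactly what indexes avoid.
ranked = sorted(records, key=lambda r: cosine(query_vector, r["vector"]),
                reverse=True)
```

Two records are instant; two hundred million are not, which is why the rest of this guide is about indexing.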
Intermediate
Similarity metrics
Nearest-neighbor search depends on how closeness is defined. The metric must match the assumptions of the embedding model and the index configuration.
| Metric | Idea | When it is common |
|---|---|---|
| Cosine similarity | Compare angle between vectors | Text search with normalized embeddings |
| Dot product | Reward both alignment and magnitude | Models trained with inner-product objectives |
| Euclidean distance (L2) | Measure straight-line distance | Vision or geometric feature spaces |
Important: A mismatch between model training objective and database metric can quietly hurt retrieval quality even when the index itself is fast.
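A small sketch shows how the metrics can disagree once magnitudes differ. The vectors here are made up for illustration: one candidate is perfectly aligned with the query but short, the other is slightly off-axis but long.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

query = [1.0, 0.0]
short_aligned = [0.5, 0.0]   # same direction as the query, small magnitude
long_offaxis = [2.0, 1.0]    # slightly rotated, large magnitude

# Cosine only compares angles, so it prefers the aligned vector...
cos_prefers_aligned = cosine(query, short_aligned) > cosine(query, long_offaxis)

# ...while dot product rewards magnitude, so it prefers the longer one.
dot_prefers_long = dot(query, long_offaxis) > dot(query, short_aligned)
```

Both comparisons come out true, so the two metrics rank these candidates in opposite orders, which is the quiet quality problem the warning above describes.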
Why approximate nearest neighbor search is necessary
Exact search checks every vector, which is feasible for small datasets but expensive for large collections. Approximate nearest neighbor (ANN) methods search only promising regions of the space, trading a small amount of recall for major latency and throughput gains.
| Index family | How it works | Main trade-off |
|---|---|---|
| HNSW | Navigable graph of neighbors across multiple layers | Excellent recall/latency, but memory-heavy |
| IVF | Cluster vectors, then search only selected clusters | Fast and scalable, but needs good partitioning |
| IVF-PQ / PQ | Compress vectors into short codes | Lower memory usage, some precision loss |
| LSH | Hash similar vectors into the same buckets | Very fast for some workloads, less common in modern text search |
The key operating knobs differ by index. For HNSW, search breadth affects recall and latency. For IVF, the number of coarse clusters and probes matters. For PQ, codebook size and compression ratio matter. Good teams benchmark these choices rather than assuming one default fits all datasets.
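The IVF idea and its `nprobe`-style knob can be sketched in plain Python. This toy version picks random data points as coarse centroids instead of running k-means, so it illustrates the mechanism, not a production partitioning.

```python
import math
import random

random.seed(0)

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy dataset: 200 two-dimensional vectors.
data = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(200)]

# "Training": pick k coarse centroids (a real system runs k-means here).
k = 8
centroids = random.sample(data, k)

# Inverted lists: assign every vector to its nearest centroid.
lists = {i: [] for i in range(k)}
for idx, vec in enumerate(data):
    nearest = min(range(k), key=lambda i: l2(vec, centroids[i]))
    lists[nearest].append(idx)

def ivf_search(query, nprobe=2, top_k=3):
    # Probe only the nprobe closest clusters instead of scanning all data.
    probed = sorted(range(k), key=lambda i: l2(query, centroids[i]))[:nprobe]
    candidates = [idx for i in probed for idx in lists[i]]
    return sorted(candidates, key=lambda idx: l2(query, data[idx]))[:top_k]

result = ivf_search([0.1, 0.2])
```

Raising `nprobe` scans more clusters, improving recall at the cost of latency; that is the benchmark-don't-assume trade-off described above.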
Filtering and data layout
Vector search almost never runs on vectors alone. Real applications need filters such as tenant separation, language, content type, freshness windows, or authorization tags. That means the database must coordinate vector search with structured filtering.
| Concern | Why it matters | Typical lever |
|---|---|---|
| Recall | Missing close neighbors makes the search system unreliable | Index tuning, candidate expansion, exact re-check |
| Latency | Search must fit interactive SLAs | ANN parameters, caching, shard layout |
| Filtering | Users often need tenant, security, or time scoping | Metadata schema, pre-filter vs post-filter strategy |
| Updates | Knowledge changes over time | Upsert, background compaction, freshness layer |
upsert_request = {
    "points": [
        {
            "id": "faq-17",
            "vector": [0.32, -0.14, 0.88],
            "metadata": {"tenant": "acme", "region": "us", "status": "published"}
        }
    ]
}
search_request = {
    "vector": [0.30, -0.10, 0.90],
    "top_k": 5,
    "filter": {"tenant": "acme", "status": "published"},
    "include_metadata": True
}
Write path
application -> validate schema -> store vector + metadata -> update index
Read path
query vector -> apply filters -> ANN candidate search -> score/rerank -> return top-k
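The read path can be sketched with a pre-filter strategy, where the metadata filter shrinks the candidate set before any scoring happens. The records and filter values here are illustrative, and the "ANN candidate search" step is simplified to an exact scan over the filtered candidates.

```python
import math

points = [
    {"id": "faq-17", "vector": [0.32, -0.14, 0.88],
     "metadata": {"tenant": "acme", "status": "published"}},
    {"id": "faq-18", "vector": [0.31, -0.12, 0.90],
     "metadata": {"tenant": "acme", "status": "draft"}},
    {"id": "faq-19", "vector": [0.30, -0.11, 0.89],
     "metadata": {"tenant": "globex", "status": "published"}},
]

def matches(metadata, flt):
    # Exact-match filter over metadata fields.
    return all(metadata.get(key) == value for key, value in flt.items())

def search(query, flt, top_k=5):
    # Pre-filter: restrict candidates before scoring, so close-but-filtered
    # neighbors can never crowd out the results a post-filter would drop.
    candidates = [p for p in points if matches(p["metadata"], flt)]
    def dist(p):
        return math.sqrt(sum((x - y) ** 2
                             for x, y in zip(query, p["vector"])))
    return [p["id"] for p in sorted(candidates, key=dist)[:top_k]]

result = search([0.30, -0.10, 0.90], {"tenant": "acme", "status": "published"})
```

Note that the draft and other-tenant records are geometrically closer competitors, but the filter removes them before scoring; a post-filter strategy would instead score everything and risk returning fewer than `top_k` results.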
Advanced
Production architecture concerns
A serious vector database is not just an index in memory. It must keep data durable, searchable after restarts, and responsive under changing traffic. That introduces classic distributed systems concerns on top of ANN search.
- Sharding: Partition data across nodes so the collection can grow beyond a single machine.
- Replication: Keep copies of data for fault tolerance and higher read throughput.
- Consistency: Decide how quickly writes must become visible across replicas.
- Freshness: Handle newly inserted vectors quickly even if the main index needs slower rebuild or compaction work.
- Multi-tenancy: Prevent one tenant's scale or hot traffic from degrading another tenant's queries.
- Access control: Restrict which tenants, users, or services can read specific vector collections.
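The sharding and replication points above amount to a scatter-gather pattern: writes are routed to one shard by a stable hash of the ID, and reads fan out to every shard before merging partial top-k results. The hash routing and merge logic below are an illustrative sketch, not any particular product's implementation.

```python
import hashlib

NUM_SHARDS = 4
shards = [dict() for _ in range(NUM_SHARDS)]  # each shard: {id: vector}

def shard_for(point_id):
    # Stable hash routing: the same id always lands on the same shard.
    return hashlib.sha256(point_id.encode()).digest()[0] % NUM_SHARDS

def upsert(point_id, vector):
    shards[shard_for(point_id)][point_id] = vector

def search(query, top_k=2):
    # Scatter-gather: ask every shard for its local top-k, then merge.
    def score(vec):
        return sum((x - y) ** 2 for x, y in zip(query, vec))
    partials = []
    for shard in shards:
        partials.extend(sorted(shard.items(),
                               key=lambda kv: score(kv[1]))[:top_k])
    merged = sorted(partials, key=lambda kv: score(kv[1]))[:top_k]
    return [pid for pid, _ in merged]

upsert("a", [0.0, 0.0])
upsert("b", [1.0, 1.0])
upsert("c", [0.1, 0.0])
result = search([0.0, 0.1])
```

The fan-out is why tail latency matters so much at scale: a query is only as fast as its slowest shard.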
What to measure
Benchmarking vector databases requires both search quality and systems metrics. Fast answers are not useful if the nearest neighbors are wrong, and accurate answers are not useful if p95 latency breaks your SLA.
| Metric | What it tells you | Common failure signal |
|---|---|---|
| Recall@k | Whether ANN search is finding the right neighbors | Relevant items disappear when index parameters are tightened |
| p95 / p99 latency | Tail responsiveness under realistic traffic | Queries occasionally spike far above average |
| Write freshness | How long new vectors take to become searchable | Recent updates cannot be found for seconds or minutes |
| Filter selectivity | How restrictive metadata filters are | Query cost jumps when filters are broad or highly skewed |
| Memory per million vectors | Infrastructure efficiency of the chosen index | HNSW or uncompressed storage becomes too expensive |
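Recall@k itself is simple to compute once exact-search ground truth exists for a sample of queries. The IDs below are hypothetical, standing in for one query's exact top-5 and an ANN result that missed one true neighbor.

```python
def recall_at_k(exact_ids, approx_ids, k):
    # Fraction of the true top-k neighbors that the ANN result recovered.
    return len(set(exact_ids[:k]) & set(approx_ids[:k])) / k

# Hypothetical single query: exact search found a..e; the ANN index
# returned x in place of d.
exact = ["a", "b", "c", "d", "e"]
approx = ["a", "b", "c", "e", "x"]

recall = recall_at_k(exact, approx, 5)  # 4 of 5 true neighbors -> 0.8
```

In practice this is averaged over a sampled query set, which is exactly the ground-truth benchmark the failure-mode list below warns about skipping.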
Common failure modes
- Using the wrong metric: Cosine, dot product, and L2 are not interchangeable unless the embedding setup makes them equivalent.
- Ignoring filter design: Poor metadata schema can make filtered search much slower than unfiltered search.
- Assuming low average latency is enough: Tail latency usually matters more than the mean for user-facing systems.
- Over-compressing too early: Aggressive quantization or PQ can save memory while silently hurting recall.
- No ground-truth benchmark: Without exact-search comparisons on a sample set, ANN tuning becomes guesswork.
Selection checklist
If your workload needs:
highest recall with enough RAM -> HNSW is often a strong default
lower memory footprint at larger scale -> consider IVF/PQ variants
strict tenant isolation -> prioritize namespaces, ACLs, and filter performance
frequent writes -> verify upsert cost and freshness guarantees
low-ops deployment -> managed/serverless offerings may matter more than raw ANN speed
Exam framing: Vector databases are best understood as ANN search systems plus database operations. The important trade-off is not just speed versus quality, but speed versus quality versus memory versus operational complexity.
To-do list
Learn
- Understand the difference between a vector index and a full vector database.
- Learn when cosine similarity, dot product, and L2 distance are appropriate.
- Study HNSW, IVF, and PQ at the intuition level and know their main trade-offs.
- Learn why filtering, durability, replication, and freshness matter in production.
Practice
- Load a small collection into a local vector database and test multiple similarity metrics.
- Benchmark exact search against ANN search on a sampled evaluation set.
- Measure the effect of metadata filters on latency and returned candidates.
- Compare memory usage for an HNSW-style setup versus a compressed index setup.
Build
- Build a similarity search service with CRUD support for vector records.
- Add namespaces or tenant IDs and verify isolation in queries.
- Create a benchmark script that tracks recall@k and p95 latency together.
- Design a schema for metadata filters that would hold up under production growth.