Subject 06

Vector databases

Vector databases are specialized data systems for storing high-dimensional vectors, indexing them for fast similarity search, and operating that search reliably at production scale. They combine nearest-neighbor algorithms with database features like filtering, updates, replication, and multi-tenant isolation.

Beginner

What problem does a vector database solve?

A vector database stores vectors and answers similarity questions such as "Which stored items are closest to this query vector?" The challenge is scale: comparing a query against every stored vector becomes too slow once you have hundreds of thousands or millions of records. Vector databases solve that by building indexes that avoid brute-force scans while still returning highly relevant neighbors.

Vector index vs. vector database

Term | Main job | Typical gap
Vector index | Accelerate nearest-neighbor search over vectors | Usually does not handle filtering, durability, replication, or access control alone
Vector database | Operate vector search as a full data system | More moving parts, configuration, and operational trade-offs

FAISS is a classic example of a vector index library. Systems such as Pinecone, Qdrant, Weaviate, Milvus, and pgvector-backed deployments add database capabilities around indexing so applications can manage vector data over time instead of treating search as a one-off algorithm.

Application record
    -> id
    -> vector
    -> metadata
    -> source pointer

Stored in vector database
    -> vector index for similarity search
    -> metadata store for filters
    -> storage/replication for persistence

Simple record structure

records = [
    {
        "id": "doc-001",
        "vector": [0.12, 0.55, -0.20],
        "metadata": {"tenant": "acme", "topic": "warranty", "year": 2025},
        "payload": {"title": "Warranty policy"}
    },
    {
        "id": "doc-002",
        "vector": [0.10, 0.51, -0.18],
        "metadata": {"tenant": "acme", "topic": "returns", "year": 2025},
        "payload": {"title": "Returns policy"}
    }
]

query_vector = [0.11, 0.52, -0.19]

# The database uses an index to find likely nearest neighbors quickly.
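At small scale, the baseline is exact brute-force search: score the query against every record and sort. A minimal sketch in plain Python (no vector database assumed), reusing the record shape above:

```python
import math

records = [
    {"id": "doc-001", "vector": [0.12, 0.55, -0.20]},
    {"id": "doc-002", "vector": [0.10, 0.51, -0.18]},
]
query_vector = [0.11, 0.52, -0.19]

def cosine(a, b):
    # Angle-based similarity: dot product divided by the
    # product of the two vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def brute_force_search(query, records, top_k=5):
    # O(n) per query: every stored vector is scored.
    scored = [(cosine(query, r["vector"]), r["id"]) for r in records]
    scored.sort(reverse=True)
    return scored[:top_k]

print(brute_force_search(query_vector, records))
```

This linear scan is exactly the cost an index is built to avoid once the collection grows.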

Intermediate

Similarity metrics

Nearest-neighbor search depends on how closeness is defined. The metric must match the assumptions of the embedding model and the index configuration.

Metric | Idea | When it is common
Cosine similarity | Compare angle between vectors | Text search with normalized embeddings
Dot product | Reward both alignment and magnitude | Models trained with inner-product objectives
Euclidean distance (L2) | Measure straight-line distance | Vision or geometric feature spaces

Important: A mismatch between model training objective and database metric can quietly hurt retrieval quality even when the index itself is fast.
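A small pure-Python sketch makes the mismatch concrete: cosine and dot product can rank the same two vectors differently once magnitudes differ (the vectors here are made up for illustration):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

query = [1.0, 0.0]
a = [0.9, 0.1]   # well aligned with the query, small magnitude
b = [5.0, 3.0]   # less aligned, large magnitude

# Cosine prefers a (better angle); dot product prefers b (magnitude wins).
print(cosine(query, a), cosine(query, b))
print(dot(query, a), dot(query, b))
print(l2(query, a), l2(query, b))
```

For unit-normalized embeddings, cosine and dot product agree and L2 distance produces the same ranking as cosine, which is one reason many text pipelines normalize vectors at ingest.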

Why approximate nearest neighbor search is necessary

Exact search checks every vector, which is feasible for small datasets but expensive for large collections. Approximate nearest neighbor (ANN) methods search only promising regions of the space, trading a small amount of recall for major latency and throughput gains.

Index family | How it works | Main trade-off
HNSW | Navigable graph of neighbors across multiple layers | Excellent recall/latency, but memory-heavy
IVF | Cluster vectors, then search only selected clusters | Fast and scalable, but needs good partitioning
IVF-PQ / PQ | Compress vectors into short codes | Lower memory usage, some precision loss
LSH | Hash similar vectors into the same buckets | Very fast for some workloads, less common in modern text search

The key operating knobs differ by index. For HNSW, search breadth affects recall and latency. For IVF, the number of coarse clusters and probes matters. For PQ, codebook size and compression ratio matter. Good teams benchmark these choices rather than assuming one default fits all datasets.
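To make the IVF intuition concrete, here is a toy sketch with hand-picked centroids (a real system would learn them with k-means); `nprobe` is the coarse-cluster knob mentioned above:

```python
import math

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy IVF: vectors are assigned to their nearest coarse centroid at
# build time; a query scans only the nprobe closest clusters.
centroids = [[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]]  # illustrative, not learned

def build_ivf(vectors):
    lists = {i: [] for i in range(len(centroids))}
    for vid, vec in vectors.items():
        nearest = min(range(len(centroids)), key=lambda i: l2(vec, centroids[i]))
        lists[nearest].append((vid, vec))
    return lists

def ivf_search(lists, query, top_k=2, nprobe=1):
    # Raising nprobe scans more clusters: better recall, higher latency.
    probe = sorted(range(len(centroids)), key=lambda i: l2(query, centroids[i]))[:nprobe]
    candidates = [item for i in probe for item in lists[i]]
    candidates.sort(key=lambda item: l2(query, item[1]))
    return [vid for vid, _ in candidates[:top_k]]

vectors = {"a": [0.1, 0.2], "b": [9.8, 9.9], "c": [0.3, 9.7], "d": [0.2, 0.1]}
lists = build_ivf(vectors)
print(ivf_search(lists, [0.0, 0.1], top_k=2, nprobe=1))
```

With `nprobe=1` the query never touches the other clusters, which is where IVF's recall risk comes from: a true neighbor assigned to an unprobed cluster is simply never seen.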

Filtering and data layout

Vector search almost never runs on vectors alone. Real applications need filters such as tenant separation, language, content type, freshness windows, or authorization tags. That means the database must coordinate vector search with structured filtering.

Concern | Why it matters | Typical lever
Recall | Missing close neighbors makes the search system unreliable | Index tuning, candidate expansion, exact re-check
Latency | Search must fit interactive SLAs | ANN parameters, caching, shard layout
Filtering | Users often need tenant, security, or time scoping | Metadata schema, pre-filter vs post-filter strategy
Updates | Knowledge changes over time | Upsert, background compaction, freshness layer

Example upsert and filtered search requests:

upsert_request = {
    "points": [
        {
            "id": "faq-17",
            "vector": [0.32, -0.14, 0.88],
            "metadata": {"tenant": "acme", "region": "us", "status": "published"}
        }
    ]
}

search_request = {
    "vector": [0.30, -0.10, 0.90],
    "top_k": 5,
    "filter": {"tenant": "acme", "status": "published"},
    "include_metadata": True
}

Write path
    application -> validate schema -> store vector + metadata -> update index

Read path
    query vector -> apply filters -> ANN candidate search -> score/rerank -> return top-k
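The pre-filter vs post-filter choice in the read path has a concrete failure mode: if filtering happens after ANN search, a restrictive filter can leave fewer than top_k survivors. A toy sketch (the candidate list stands in for ANN output, nearest first):

```python
candidates = [
    {"id": "v1", "tenant": "acme"},
    {"id": "v2", "tenant": "globex"},
    {"id": "v3", "tenant": "globex"},
    {"id": "v4", "tenant": "acme"},
    {"id": "v5", "tenant": "globex"},
]

def post_filter(candidates, tenant, top_k):
    # Filter applied after ANN search: if most neighbors belong to
    # other tenants, fewer than top_k results survive.
    kept = [c for c in candidates if c["tenant"] == tenant]
    return kept[:top_k]

print(post_filter(candidates, "acme", 3))  # asked for 3, only 2 survive
```

Real systems counter this by pre-filtering (restricting the candidate set before search), over-fetching candidates, or maintaining per-tenant namespaces or indexes.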

Advanced

Production architecture concerns

A serious vector database is not just an index in memory. It must keep data durable, searchable after restarts, and responsive under changing traffic. That introduces classic distributed systems concerns on top of ANN search.

What to measure

Benchmarking vector databases requires both search quality and systems metrics. Fast answers are not useful if the nearest neighbors are wrong, and accurate answers are not useful if p95 latency breaks your SLA.

Metric | What it tells you | Common failure signal
Recall@k | Whether ANN search is finding the right neighbors | Relevant items disappear when index parameters are tightened
p95 / p99 latency | Tail responsiveness under realistic traffic | Queries occasionally spike far above average
Write freshness | How long new vectors take to become searchable | Recent updates cannot be found for seconds or minutes
Filter selectivity | How restrictive metadata filters are | Query cost jumps when filters are broad or highly skewed
Memory per million vectors | Infrastructure efficiency of the chosen index | HNSW or uncompressed storage becomes too expensive
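Recall@k is straightforward to compute once you have brute-force ground truth for a sampled query set; a minimal sketch with made-up result lists:

```python
def recall_at_k(ann_results, exact_results, k):
    # Fraction of the true top-k neighbors that ANN search returned.
    ann = set(ann_results[:k])
    exact = set(exact_results[:k])
    return len(ann & exact) / k

exact = ["d1", "d2", "d3", "d4", "d5"]   # ground truth from brute force
ann   = ["d1", "d3", "d2", "d9", "d6"]   # what the index returned

print(recall_at_k(ann, exact, 5))  # 3 of the 5 true neighbors found -> 0.6
```

In practice this is averaged over hundreds of sampled queries and tracked alongside p95 latency, since tuning that raises one often degrades the other.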

Common failure modes

  • Recall collapses after index parameters are tightened for speed.
  • Tail latency (p95/p99) spikes far above the average under realistic load.
  • Newly written vectors stay unsearchable for seconds or minutes.
  • Broad or highly skewed metadata filters make query cost jump.
  • An uncompressed or HNSW index outgrows the memory budget as data grows.

Selection checklist

If your workload needs:
    highest recall with enough RAM -> HNSW is often a strong default
    lower memory footprint at larger scale -> consider IVF/PQ variants
    strict tenant isolation -> prioritize namespaces, ACLs, and filter performance
    frequent writes -> verify upsert cost and freshness guarantees
    low-ops deployment -> managed/serverless offerings may matter more than raw ANN speed
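Back-of-envelope memory arithmetic often decides between these options. The per-vector figures below are illustrative assumptions (float32 storage, roughly 32 graph links per vector, 64-byte PQ codes), not measurements of any particular database:

```python
# Rough memory estimate per million vectors; the per-vector figures
# are illustrative assumptions, not vendor measurements.
n = 1_000_000
dims = 768

flat_bytes = n * dims * 4        # float32, uncompressed
hnsw_link_bytes = n * 32 * 8     # assume ~32 links/vector, 8 bytes each
pq_bytes = n * 64                # assume 64-byte PQ codes per vector

print(f"flat float32: {flat_bytes / 1e9:.1f} GB")
print(f"HNSW (vectors + links): {(flat_bytes + hnsw_link_bytes) / 1e9:.1f} GB")
print(f"PQ codes only: {pq_bytes / 1e9:.3f} GB")
```

Even this crude estimate shows the order-of-magnitude gap (roughly 3 GB uncompressed versus tens of MB for PQ codes) that makes compression attractive at larger scale.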

Exam framing: Vector databases are best understood as ANN search systems plus database operations. The important trade-off is not just speed versus quality, but speed versus quality versus memory versus operational complexity.

To-do list

Learn

  • Understand the difference between a vector index and a full vector database.
  • Learn when cosine similarity, dot product, and L2 distance are appropriate.
  • Study HNSW, IVF, and PQ at the intuition level and know their main trade-offs.
  • Learn why filtering, durability, replication, and freshness matter in production.

Practice

  • Load a small collection into a local vector database and test multiple similarity metrics.
  • Benchmark exact search against ANN search on a sampled evaluation set.
  • Measure the effect of metadata filters on latency and returned candidates.
  • Compare memory usage for an HNSW-style setup versus a compressed index setup.

Build

  • Build a similarity search service with CRUD support for vector records.
  • Add namespaces or tenant IDs and verify isolation in queries.
  • Create a benchmark script that tracks recall@k and p95 latency together.
  • Design a schema for metadata filters that would hold up under production growth.