pgvector vs the Vector DB Gold Rush

Three vector databases raised this week. I lost count of the launch posts. Each comes with a benchmark where it wins, a logo wall, and a tutorial that has you running new infrastructure before you have written a single eval. The pitch is always the same: embeddings are the future, and you need a real database built for them.

You probably don’t. Not yet.

Most teams reaching for a vector database this year are adding semantic search or a RAG feature to a product that already runs on Postgres. They have a few hundred thousand chunks, maybe a couple million. One team, a few on-call engineers, a finite appetite for new systems to babysit. For that team, the honest answer is to put the vectors in the Postgres you already operate, with the pgvector extension, and not think about a dedicated store until the numbers force you to.

I have been on both sides of this line. On a marketplace I led, the search workloads eventually justified specialized infrastructure, and we built it. But the first version of nearly every embedding feature I have shipped started in Postgres, and a surprising number never had a good reason to leave.

What pgvector actually is

It is a Postgres extension that adds a vector column type and a couple of index types for nearest-neighbor search. That’s it. Your embeddings live in a normal table, next to the columns you already have: the document id, the tenant id, the created_at, the soft-delete flag. You query them with SQL.

That last part is the whole argument, so let me make it concrete.

create extension if not exists vector;

create table doc_chunks (
    id          bigserial primary key,
    document_id bigint      not null references documents(id),
    tenant_id   bigint      not null,
    chunk_text  text        not null,
    embedding   vector(1536),          -- ada-002 dims; change if you swap models
    created_at  timestamptz not null default now()
);

-- IVFFlat, not exact search. ANN is a recall/speed trade, not a bug.
-- Build this AFTER you have a meaningful amount of data, not on an empty table.
create index on doc_chunks
    using ivfflat (embedding vector_cosine_ops)
    with (lists = 100);

And the query that everyone writes a dedicated database to do:

-- `<=>` is cosine distance. Smaller is closer.
-- Note the tenant filter sits right next to the vector search,
-- in the same WHERE, hitting the same table. That's the point.
select id, document_id, chunk_text,
       embedding <=> $1 as distance
from doc_chunks
where tenant_id = $2
order by embedding <=> $1
limit 8;

There is no second system in that query. No copy of the data kept in sync. No “now go look up the row in Postgres” round trip after the vector search hands you back a list of ids. The tenant filter, the join back to documents, the created_at you sort by when two chunks tie, all of it is right there in the database that already holds your source of truth.

Here is the part nobody tells you in the launch posts. The hard problem in production RAG is almost never raw nearest-neighbor speed. It is filtered search: “the closest chunks that this user is allowed to see, from documents not archived, in this workspace.” A dedicated vector store treats your metadata as a second-class citizen bolted onto the side of the index. Postgres treats it as the main event, because that is what Postgres is.

The tradeoff, stated honestly

A dedicated vector database will give you better recall at a given latency once your index gets large, and it will hold that under heavy concurrent query load better than Postgres will. That is real. The HNSW and IVF implementations in the purpose-built stores are tuned harder than what ships in an extension, and they were designed for this one job.

The cost is a system. A new datastore to provision, secure, back up, monitor, version, and keep in sync with your primary database. Every embedding update now has two write paths and a chance to drift. Every “why is this document showing up for the wrong user” bug now spans two systems instead of one. That sync layer is unglamorous, it is where the incidents come from, and nobody benchmarks it.

So the question is not “which is faster.” At small scale they are both fast enough that your network and your LLM call dominate the latency budget anyway. The question is whether you are at the scale where the recall-under-load gap is large enough to be worth operating a second system to close.

For most teams, in 2023, the answer is no. A few million vectors with an IVFFlat index, behind a sane filter, returns in single-digit to low-double-digit milliseconds on hardware you are already paying for. Your embedding API call takes longer than the search does. Optimizing the search at that point is optimizing the wrong thing.

Where pgvector starts to hurt

I am not selling pgvector as the answer forever. It has edges, and you should know where they are before you commit, because nothing is worse than discovering them in an incident.

Index builds are not free and not online in the way you want. IVFFlat needs to see data to pick its cluster centroids, so you build the index after loading, and rebuilding on a large table locks you up for a while. The lists parameter is a real tuning knob: too few and you scan too much, too many and recall drops because each list holds too little. There is no autopilot. (I have watched a team set lists = 1000 on a table of 50,000 rows and wonder why recall fell off a cliff. Each list had fifty vectors in it.)

Recall is a dial you set, not a property you get. With IVFFlat you tune probes at query time, trading latency for how many lists you actually search. Set it once during a calm afternoon and it will quietly degrade as your data grows and your distribution shifts. You have to watch it.

And Postgres is one box, mostly. You can read-replica the search load, but the vectors and the index live with everything else competing for the same buffer cache and the same CPU. A heavy embedding workload sharing a database with your transactional traffic is a noisy-neighbor problem waiting to happen. At some volume, isolation alone justifies moving the vectors out, separate from any recall argument.

When the dedicated DB genuinely earns it

There is a real line, and crossing it is not a failure. Reach for a purpose-built vector database when:

You are past tens of millions of vectors and growing, and the IVFFlat recall-versus-latency curve has stopped being good enough no matter how you tune it.
Filtered ANN is your hot path and you need the filter applied during the search, not as a pre-scan or a post-filter that throws away most of your candidates. The good dedicated stores do this well, and at high cardinality it matters.
You need an index type or a quantization scheme that the extension does not have, because memory at your vector count has become the binding constraint.
The search workload is large and bursty enough that isolating it from your primary database is worth a system on its own.

None of those are about hype. They are about a workload that has outgrown the convenient answer. When you hit them, the dedicated database is the right call, and the engineers who built those products built them for exactly this. Be fair to them: they are solving a genuinely hard problem at the top end. Most teams are just not at the top end yet, and adopting the top-end solution on day one buys you the operational cost without the workload that justifies it.

How to actually decide

Start in Postgres. Add the extension, write the schema above, ship the feature. Keep a few hundred real queries with answers you trust, and measure two things on every change: recall against those queries, and p95 latency under something like your real concurrency. Not the vendor’s benchmark. Yours, on your data, with your filters.

Then let the numbers move you. When recall at acceptable latency stops being achievable in Postgres after honest tuning, or when the search load starts hurting your transactional traffic, that is your signal. Migrate then. You will know what your workload looks like, what filters you actually run, and what recall you actually need, which means you will pick the right dedicated store instead of the loudest one.

The gold rush is selling you the destination before you have measured the trip. Run the trip first. Most of you will find the boring extension on the database you already have does the job, and one fewer system to operate is worth more than the benchmark you were never going to hit anyway.