nirmata.dev — Build. Think. Go Deep.

A vector database is not only a place where embeddings are stored. In a RAG system, it also becomes part of the product's latency, reliability, privacy, and cloud bill. That is why the deployment model matters.

The question is simple: who is responsible for the machine? If you answer that honestly, serverless, managed, and self-hosted vector databases become much easier to reason about.

First, What Does a Vector Database Need?

At minimum, a vector database has to store vectors, store metadata, build an index for similarity search, and answer nearest-neighbor queries quickly. In real production systems, it also needs backups, monitoring, scaling, access control, upgrades, and incident handling.
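
To make those four minimum duties concrete, here is a toy sketch in plain Python. The names `TinyVectorStore`, `upsert`, and `query` are made up for illustration and are not any product's API. It also skips the index entirely and compares the query against every stored vector, which is exactly the work a real index such as HNSW exists to avoid.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """Toy in-memory store: vectors, metadata, and exhaustive search (no index)."""

    def __init__(self):
        self.vectors = {}
        self.metadata = {}

    def upsert(self, item_id, vector, meta=None):
        # Store the vector and its metadata under the same id.
        self.vectors[item_id] = list(vector)
        self.metadata[item_id] = dict(meta or {})

    def query(self, vector, top_k=3):
        # Score every stored vector, then keep the top_k most similar.
        scored = sorted(
            ((cosine(vector, v), item_id) for item_id, v in self.vectors.items()),
            reverse=True,
        )
        return [(item_id, score, self.metadata[item_id]) for score, item_id in scored[:top_k]]


store = TinyVectorStore()
store.upsert("doc-1", [1.0, 0.0], {"source": "faq"})
store.upsert("doc-2", [0.0, 1.0], {"source": "blog"})
nearest = store.query([0.9, 0.1], top_k=1)
print(nearest[0][0])  # prints doc-1
```

Everything a production system adds on top of this (persistence, an approximate index, replication, access control) is where the operational surface comes from.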

The database feature is only half the story. The operational surface is the other half.

Self-Hosted Vector Database

Self-hosted means you run the vector database yourself. For example, you may pull a Qdrant, Milvus, or Weaviate Docker image and deploy it on your own VM, Kubernetes cluster, ECS service, or bare-metal server.

What does the Docker pull give you?

It gives you the packaged database program and the dependencies needed to run it. It does not give you the hardware, production configuration, scaling strategy, backup policy, or monitoring setup. Those decisions are still yours.

With self-hosting, you are responsible for things like:

choosing CPU, RAM, disk, and network capacity
deciding whether the HNSW graph should live in memory
putting embeddings on fast SSD storage when needed
taking snapshots and saving them somewhere safe, like S3
handling OOM crashes, restarts, upgrades, and failover
building autoscaling and routing logic if traffic grows
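
The first two items on that list usually begin with back-of-the-envelope arithmetic. The sketch below uses a common rule of thumb for HNSW memory: float32 vectors plus roughly 2 × M four-byte links per vector in the base layer. Treat it as an assumption, not a guarantee. It ignores upper graph layers, metadata, quantization, and allocator overhead, and real engines will differ. The function names and the example workload are hypothetical.

```python
def estimate_hnsw_bytes(num_vectors, dim, m=16, bytes_per_float=4, bytes_per_link=4):
    """Rough RAM estimate: raw float32 vectors plus about 2*M base-layer links each."""
    vector_bytes = num_vectors * dim * bytes_per_float
    graph_bytes = num_vectors * 2 * m * bytes_per_link
    return vector_bytes + graph_bytes


def to_gib(num_bytes):
    """Convert bytes to gibibytes."""
    return num_bytes / (1024 ** 3)


# Hypothetical workload: 100 million 768-dimensional embeddings, M = 16.
total = estimate_hnsw_bytes(100_000_000, 768, m=16)
print(f"{to_gib(total):.1f} GiB")  # prints 298.0 GiB
```

A number like this is what turns "should the graph live in memory?" into a concrete choice between one very large machine, several shards, or keeping vectors on SSD.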

Self-hosting is powerful when you need data privacy, deep control, or serious cost reduction at scale. If you have hundreds of millions of vectors, managed cloud pricing can become painful. Running your own cluster on carefully chosen machines can be dramatically cheaper, but only if you have the engineering team to operate it.

Managed Vector Database

Managed means the vendor runs a pre-configured vector database cluster for you. You still choose resources such as storage, RAM, CPU, pods, replicas, or instance size, but the provider handles many internal operations like provisioning, patching, and base-level reliability.

Scaling is usually not magic here. You often still turn knobs in a UI or API: increase replicas, change pod size, add storage, tune an index, or move to a larger tier. The provider gives you the controls, but you still need to understand when and why to use them.

Managed databases are a good middle path. You get much less DevOps work than self-hosting, but more control and predictability than serverless. If traffic increases, you can add resources. If cost matters, you can queue requests and accept slightly higher latency instead of scaling immediately.
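
The "queue instead of scale" option can be sketched on the client side with a concurrency cap. This is a minimal illustration using Python's `asyncio.Semaphore`. `run_queued` and `fake_search` are hypothetical names, and `fake_search` stands in for whatever client call your provider offers.

```python
import asyncio


async def run_queued(queries, search, max_concurrent=4):
    """Run all queries, but let at most `max_concurrent` hit the database at once."""
    gate = asyncio.Semaphore(max_concurrent)

    async def guarded(query):
        async with gate:  # extra callers wait here instead of forcing a scale-up
            return await search(query)

    return await asyncio.gather(*(guarded(q) for q in queries))


# Stand-in for a real client call; records the peak number of in-flight searches.
stats = {"in_flight": 0, "peak": 0}


async def fake_search(query):
    stats["in_flight"] += 1
    stats["peak"] = max(stats["peak"], stats["in_flight"])
    await asyncio.sleep(0.01)  # pretend network and search time
    stats["in_flight"] -= 1
    return f"results for {query}"


results = asyncio.run(run_queued([f"q{i}" for i in range(10)], fake_search, max_concurrent=3))
print(len(results), stats["peak"])  # prints 10 3
```

All ten queries complete, but never more than three run at once. The waiting callers pay in latency, which is the trade described above.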

Serverless Vector Database

Serverless means the provider hides almost all infrastructure decisions. You create a collection or index, upload vectors, send queries, and pay for usage. You do not think much about machines, replicas, restarts, or capacity planning.

This is excellent when you are starting out, traffic is spiky, or you want low operational effort. A small team can move fast because nobody has to babysit the database cluster.

The tradeoff is cost and control. At high scale, serverless can become expensive. You may also hit concurrency limits, throttling, noisy neighbor effects, or cold-start-like latency depending on the provider and workload.
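
Throttling in particular is something client code has to expect. A common way to handle it is retrying with exponential backoff and jitter, sketched below. `ThrottledError` is a hypothetical exception standing in for whatever rate-limit error your provider's client raises, and the delay values are placeholders.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical error standing in for a provider's rate-limit response."""


def call_with_backoff(fn, max_attempts=5, base_delay=0.2, max_delay=5.0, sleep=time.sleep):
    """Call fn(); on ThrottledError wait with exponential backoff and full jitter, then retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the throttle to the caller
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))


# Usage with a fake query that is throttled twice before succeeding.
calls = {"count": 0}


def flaky_query():
    calls["count"] += 1
    if calls["count"] <= 2:
        raise ThrottledError("slow down")
    return ["doc-1", "doc-7"]


delays = []
result = call_with_backoff(flaky_query, sleep=delays.append)  # record delays instead of sleeping
print(result, len(delays))  # prints ['doc-1', 'doc-7'] 2
```

Passing `sleep` as a parameter keeps the helper testable without real waiting. In production you would leave it at the default `time.sleep`.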

The Practical Difference

Model	You manage	Best for
Serverless	data, queries, basic settings	early products, spiky traffic, fastest iteration
Managed	sizing, scaling knobs, cost-latency tradeoffs	growing products that need control without full ops
Self-hosted	infrastructure, scaling, backups, upgrades, incidents	privacy, massive scale, custom resource allocation

How I Would Choose

If you are building the first version of a RAG product, start with serverless or managed. The goal is to learn whether retrieval quality, chunking, and user experience are working. Optimizing infrastructure too early can slow you down.

Move toward managed when traffic becomes predictable and you care about cost, latency, and capacity planning. Managed gives you room to tune the system without becoming a database operations team.

Move toward self-hosted when the cost curve or privacy requirements justify it. For example, if you have many customer datasets and want to isolate them logically while still sharing one efficient resource pool, self-hosting can give you much better control. But it only makes sense when the savings clearly beat the cost of hiring and maintaining the operations expertise.
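
The "savings must beat the cost of operations" test is simple arithmetic, and it is worth writing down before migrating. The sketch below is a hypothetical helper. Every figure in the usage lines is a placeholder rather than real pricing, and `migration_cost` is an assumed one-time cost that the text above does not discuss.

```python
def monthly_self_host_savings(managed_cost, infra_cost, ops_cost):
    """Managed bill minus the full self-hosted bill (machines plus people)."""
    return managed_cost - (infra_cost + ops_cost)


def months_to_break_even(migration_cost, managed_cost, infra_cost, ops_cost):
    """Months until a one-time migration cost is repaid, or None if it never is."""
    savings = monthly_self_host_savings(managed_cost, infra_cost, ops_cost)
    if savings <= 0:
        return None
    return migration_cost / savings


# Placeholder figures only; substitute your own quotes and salaries.
print(months_to_break_even(60_000, managed_cost=30_000, infra_cost=8_000, ops_cost=12_000))  # prints 6.0
print(months_to_break_even(60_000, managed_cost=15_000, infra_cost=8_000, ops_cost=12_000))  # prints None
```

The second call shows the case the paragraph warns about: once operations cost is counted, self-hosting never pays back.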

The real decision is not "which vector database is best?" It is "which operational compromise is best for this stage of the product?"