Running Milvus in Production: Why Most Teams Eventually Stop Self-Hosting It
Milvus has the most GitHub stars of any open-source vector database — over 30,000 — and for good reason. It handles billions of vectors, supports GPU-accelerated search, and scales horizontally in ways that Qdrant, Chroma, and Weaviate simply don't match at the top end.
But Milvus is also the most operationally demanding vector database you can choose to run yourself. Before your first query executes, you're already managing separate infrastructure dependencies: etcd and MinIO in standalone mode, plus Pulsar or Kafka in distributed mode. Each of them can fail independently and take your Milvus instance down with it.
This guide is an honest look at what running Milvus in production actually involves: the architecture, the real infrastructure requirements, the most common failure points, and how to decide whether self-hosting is worth it for your team right now.
Why Milvus exists and what it's genuinely good at
Milvus was built from day one as a distributed system designed for scale. It separates storage (MinIO/S3), metadata (etcd), message queuing (Pulsar/Kafka), and compute (query and index nodes) into layers that scale independently. That architecture is overkill for 1 million vectors. At hundreds of millions of vectors it pays for itself, and at 1 billion it is exactly right.
The feature set reflects this:
Multiple index types — IVF_FLAT, HNSW, DiskANN, GPU-accelerated IVF_PQ — more index options than any other open-source vector DB
GPU-accelerated search — native NVIDIA GPU support for sub-millisecond latency at scale
Billion-scale proven — deployed in production at Salesforce, PayPal, and companies running recommendation systems at consumer scale
Streaming inserts — WAL-based architecture handles concurrent reads and writes cleanly, important for real-time indexing pipelines
Partition keys — logical data isolation within a single collection, useful for multi-tenant RAG applications
If you're building at scale and your dataset will exceed 50–100 million vectors, Milvus is probably the right choice. The question is whether you should self-host it.
The architecture you're signing up for
Most vector database tutorials show you this:
bash
wget https://github.com/milvus-io/milvus/releases/download/v3.0-beta/milvus-standalone-docker-compose.yml -O docker-compose.yml
docker compose up -d
Run that and you get three containers, not one:
milvus-etcd Up 2379/tcp, 2380/tcp
milvus-minio Up 9000/tcp
milvus-standalone Up 0.0.0.0:19530->19530/tcp
Milvus uses etcd as its metadata and coordination service, and MinIO as its S3-compatible object storage layer to persist embeddings, index files generated from the vectors, and segmentation data.
That's just standalone mode — the simplest Milvus deployment. In distributed mode (the one you actually want for production at scale), the dependency list expands:
| Component | Role | Can it fail independently? |
|---|---|---|
| etcd | Metadata store, coordination, service discovery | Yes — cluster elections, disk I/O |
| MinIO / S3 | Segment storage, index files, embeddings | Yes — storage quota, auth, network |
| Pulsar / Kafka | Internal message queue, WAL | Yes — topic backlogs, OOM |
| QueryNode | Serves search queries from memory | Yes — OOM, segment loading |
| DataNode | Handles data ingestion | Yes — write throughput bottleneck |
| IndexNode | Builds vector indexes | Yes — CPU/GPU saturation |
| Proxy | Routes requests across components | Yes — single point if not replicated |
Each row in that table is something you maintain, monitor, and recover when it goes wrong.
The self-hosting reality: what nobody tells you upfront
etcd is the most fragile piece
Disk performance is critical to etcd. It is highly recommended that you use local NVMe SSDs. Slower disk response may cause frequent cluster elections that will eventually degrade the etcd service. Ideally, your disk dedicated to etcd should reach over 500 IOPS and below 10ms for the 99th percentile fsync latency.
In practice: if you put Milvus standalone on a general-purpose VPS with a shared HDD or even a slow SSD, etcd will periodically lose elections. Milvus goes read-only. Queries still work. Writes silently fail. You find out when your vector count stops growing and you spend an afternoon reading etcd logs.
Test your disk before deploying:
bash
mkdir test-data
fio --rw=write --ioengine=sync --fdatasync=1 \
  --directory=test-data --size=2200m \
  --bs=2300 --name=etcd-disk-test
If your p99 fsync latency is above 10ms, use a different disk or switch to managed.
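The 10ms threshold can be checked automatically instead of read off the terminal. The sketch below is a hedged example, not part of Milvus or etcd tooling: it assumes you reran the fio test with `--output-format=json --output=fio.json`, and that the report stores fsync percentiles in nanoseconds under `jobs[0].sync.lat_ns.percentile`, which is the layout recent fio 3.x releases produce. The function names are hypothetical, and you should confirm the JSON layout against your fio version.

```python
import json

FSYNC_P99_LIMIT_MS = 10.0  # threshold quoted in the etcd guidance above


def fsync_p99_ms(fio_report: dict) -> float:
    """Return the p99 fsync latency of the first fio job, in milliseconds.

    Assumes the fio 3.x JSON layout (percentiles keyed by strings such as
    "99.000000", values in nanoseconds).
    """
    percentiles = fio_report["jobs"][0]["sync"]["lat_ns"]["percentile"]
    return percentiles["99.000000"] / 1e6


def disk_ok_for_etcd(fio_report: dict) -> bool:
    """True when the measured p99 fsync latency is under the 10ms limit."""
    return fsync_p99_ms(fio_report) < FSYNC_P99_LIMIT_MS


def load_report(path: str) -> dict:
    """Load a report written by fio --output-format=json --output=<path>."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

A deployment script could call `disk_ok_for_etcd(load_report("fio.json"))` and refuse to continue when it returns False.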
MinIO costs add up quietly
At 100M vectors with 1536 dimensions, your MinIO storage grows to 1–2TB. S3 costs are minimal at around $23 per TB per month, but the GET and PUT operations during compaction and segment loading add $10–50 per month on top. That's still manageable — but it's invisible until your cloud bill arrives.
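To make that bill visible before it arrives, the arithmetic can be written down. This is a minimal sketch using only the prices quoted above ($23 per TB per month for storage, $10 to $50 per month for requests); the function name and constants are illustrative, and real S3 pricing varies by region and storage class.

```python
S3_USD_PER_TB_MONTH = 23.0        # storage price quoted in the text
REQUEST_FEES_USD = (10.0, 50.0)   # GET/PUT overhead range quoted in the text


def monthly_s3_cost(stored_tb: float) -> tuple[float, float]:
    """Return the (low, high) monthly cost in USD for `stored_tb` terabytes."""
    storage = stored_tb * S3_USD_PER_TB_MONTH
    return storage + REQUEST_FEES_USD[0], storage + REQUEST_FEES_USD[1]


if __name__ == "__main__":
    # The 1-2 TB range corresponds to 100M vectors at 1536 dimensions.
    for tb in (1.0, 2.0):
        low, high = monthly_s3_cost(tb)
        print(f"{tb:.0f} TB stored: ${low:.0f} to ${high:.0f} per month")
```

Under these assumptions, request fees can be as large as the storage charge itself at the 1 to 2 TB range, which is why they are worth tracking separately.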
Version upgrades are a planned event
Milvus has a rapid release cycle. Upgrading a distributed system with etcd schema changes, storage format changes, and API changes requires planning and testing.
On a managed Kubernetes deployment, this means coordinating Helm chart updates, draining nodes, migrating etcd schema, and verifying that your MinIO data format is compatible with the new version. On a standalone Docker setup, it's slightly simpler but still involves downtime if you're not running a standby.
The Kubernetes operator helps, but is its own learning curve
The Milvus Operator is the right way to run Milvus on Kubernetes in 2026. It turns Milvus into a first-class Kubernetes resource and handles things like scaling query nodes, rolling upgrades, and recovering from partial failures automatically.
yaml
apiVersion: milvus.io/v1beta1
kind: Milvus
metadata:
  name: my-milvus
spec:
  mode: cluster
  dependencies:
    etcd:
      inCluster:
        values:
          replicaCount: 3
          persistence:
            size: 10Gi
    storage:
      type: s3
      s3:
        bucket: milvus-data
        endpoint: s3.amazonaws.com
        useSSL: true
  components:
    queryNode:
      replicas: 3
    dataNode:
      replicas: 2
    indexNode:
      replicas: 2
Instead of managing dozens of Deployments and StatefulSets yourself, you define a single custom resource. The operator understands how the pieces relate and enforces correct ordering and health checks.
That's genuinely better than raw Helm. But you still need a Kubernetes cluster, a configured StorageClass, working S3-compatible storage, and someone who knows what to do when a QueryNode goes OOM at 2 AM.
Hardware sizing for Milvus
Milvus Standalone operates as a single-node deployment suitable for datasets up to 100 million vectors, while a distributed cluster deployment uses etcd for metadata, MinIO or S3 for object storage, and Pulsar or Kafka for log streaming.
Memory requirements by vector count (1536 dimensions, unquantized):
| Vectors | RAM (QueryNode) | etcd disk | MinIO storage |
|---|---|---|---|
| 1M | 8–12 GB | 10 GB NVMe | ~12 GB |
| 10M | 64–80 GB | 20 GB NVMe | ~120 GB |
| 100M | 640 GB+ | 50 GB NVMe | ~1.2 TB |
| 1B | Distributed cluster | 100 GB NVMe ×3 | ~12 TB |
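The RAM and storage columns follow from the raw size of the vectors. As a rough sketch, assuming unquantized float32 vectors (4 bytes per dimension) and decimal gigabytes, the lower bound for each row can be computed as follows. The function name is illustrative. Index structures, replicas, and working memory come on top of this, which is why the table's figures sit above the raw numbers.

```python
BYTES_PER_FLOAT32 = 4


def raw_vector_gb(num_vectors: int, dim: int = 1536) -> float:
    """Size of the raw float32 vectors in decimal GB (no index, no overhead)."""
    return num_vectors * dim * BYTES_PER_FLOAT32 / 1e9


if __name__ == "__main__":
    for n in (1_000_000, 10_000_000, 100_000_000, 1_000_000_000):
        print(f"{n:>13,} vectors: {raw_vector_gb(n):,.1f} GB raw")
```

For 100M vectors this gives about 614 GB of raw data, which lines up with the 640 GB+ RAM figure and sits at roughly half of the ~1.2 TB MinIO estimate.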
Shard configuration rule of thumb from official Milvus docs: use 1 shard per 100 million vectors for read-heavy workloads, 4 shards per 100 million vectors for write-heavy workloads.
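The rule of thumb translates directly into a small helper. This is a sketch under two assumptions that the rule itself does not state: partial blocks of 100 million vectors are rounded up, and a collection always gets at least one block. The function name is hypothetical.

```python
import math

VECTORS_PER_BLOCK = 100_000_000  # the "per 100 million vectors" unit from the rule


def suggested_shards(num_vectors: int, write_heavy: bool) -> int:
    """Apply the rule of thumb: 1 shard per block if read-heavy, 4 if write-heavy."""
    # Assumption: round partial blocks up and never go below one block.
    blocks = max(1, math.ceil(num_vectors / VECTORS_PER_BLOCK))
    return blocks * (4 if write_heavy else 1)


if __name__ == "__main__":
    print(suggested_shards(250_000_000, write_heavy=False))
    print(suggested_shards(250_000_000, write_heavy=True))
```

Treat the result as a starting point for load testing rather than a fixed setting.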
Milvus configuration for production
Milvus configuration breaks into three layers. Dependency configurations cover external services (etcd, MinIO, and the message queue) and are critical for cluster setup and data persistence. Internal component configurations govern Milvus's own architecture and are the key to performance tuning. Functional configurations cover security, logging, and resource limits, which matter most in production deployments.
Key production settings in milvus.yaml:
yaml
etcd:
  endpoints:
    - etcd-0:2379
    - etcd-1:2379
    - etcd-2:2379
  rootPath: milvus-prod # Unique per cluster; prevents data collisions
minio:
  address: s3.amazonaws.com
  port: 443
  accessKeyID: ${AWS_ACCESS_KEY}
  secretAccessKey: ${AWS_SECRET_KEY}
  useSSL: true
  bucketName: your-milvus-bucket
  rootPath: milvus-prod
  cloudProvider: aws
# Use Pulsar for distributed mode
pulsar:
  address: pulsar-broker
  port: 6650
# Security
common:
  security:
    authorizationEnabled: true
proxy:
  http:
    enabled: true
    debugMode: false
# Query performance
queryNode:
  segcore:
    chunkRows: 128 # Reduce for lower memory, increase for throughput
  cache:
    memoryLimit: 0.8 # QueryNode uses 80% of available RAM for segment cache
Critical: always set a unique etcd.rootPath and minio.rootPath per Milvus cluster. Without it, two Milvus instances sharing the same etcd or MinIO will corrupt each other's data silently.
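Because a rootPath collision is silent, it is worth checking for one before a new cluster ever starts. The sketch below is a hypothetical pre-deploy check, not a Milvus feature: it assumes each cluster's milvus.yaml has already been parsed into a dict with the same keys as the config above, and it treats two clusters as colliding when they point at the same etcd endpoints (or the same MinIO address and bucket) with the same rootPath.

```python
from collections import defaultdict


def storage_keys(cfg: dict) -> list[tuple]:
    """Describe where one cluster writes: (backend, location, rootPath)."""
    etcd, minio = cfg["etcd"], cfg["minio"]
    return [
        ("etcd", tuple(sorted(etcd["endpoints"])), etcd["rootPath"]),
        ("minio", (minio["address"], minio["bucketName"]), minio["rootPath"]),
    ]


def find_collisions(clusters: dict[str, dict]) -> dict[tuple, list[str]]:
    """Return every storage key that more than one cluster would write to."""
    owners = defaultdict(list)
    for name, cfg in clusters.items():
        for key in storage_keys(cfg):
            owners[key].append(name)
    return {key: names for key, names in owners.items() if len(names) > 1}


# Illustrative configs: staging reuses prod's etcd rootPath by mistake.
ETCD = ["etcd-0:2379", "etcd-1:2379", "etcd-2:2379"]
CLUSTERS = {
    "prod": {
        "etcd": {"endpoints": ETCD, "rootPath": "milvus-prod"},
        "minio": {"address": "s3.amazonaws.com",
                  "bucketName": "your-milvus-bucket",
                  "rootPath": "milvus-prod"},
    },
    "staging": {
        "etcd": {"endpoints": ETCD, "rootPath": "milvus-prod"},
        "minio": {"address": "s3.amazonaws.com",
                  "bucketName": "your-milvus-bucket",
                  "rootPath": "milvus-staging"},
    },
}

if __name__ == "__main__":
    for key, names in find_collisions(CLUSTERS).items():
        print(f"collision on {key[0]} rootPath {key[2]!r}: {', '.join(names)}")
```

The comparison is deliberately strict (identical endpoint lists). A fuller check would also flag clusters that share only some etcd endpoints.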
When self-hosting Milvus actually makes sense
Be honest with yourself before committing to the ops overhead. Self-hosting Milvus is the right call when:
Your dataset exceeds 50–100 million vectors — below this, Qdrant or Weaviate are simpler and cheaper
You have dedicated Kubernetes expertise on the team — someone who owns the cluster day-to-day
You need GPU-accelerated search — Milvus's native GPU index support is unique among OSS vector DBs
You have compliance requirements that mandate on-premise or specific jurisdiction data residency
Your managed Milvus bill exceeds ~$500/month and your team has the bandwidth to absorb the ops work
The simple rule: if your monthly managed Milvus bill exceeds $500 and your team has Kubernetes expertise, evaluate self-hosting. Below $500 per month, the managed convenience is almost always worth it.
When you should strongly consider managed Milvus instead
Self-hosting is an optimization, not a default. Consider managed when:
You're at the MVP or early traction stage — every hour on infra is an hour not shipping product
Your team doesn't have Kubernetes experience — Milvus distributed on K8s is not beginner infrastructure
Your dataset is under 10–20 million vectors — the architectural complexity isn't justified yet
You want predictable costs — etcd disk failures, MinIO egress, and Pulsar backlogs all create unpredictable cost spikes on self-hosted
You need automated backups and point-in-time restore without building them yourself
The production checklist for self-hosted Milvus
If you're going ahead with self-hosting, verify every item before you call it production-ready:
NVMe SSDs for etcd, tested with fio (p99 fsync latency under 10ms)
etcd running as a 3-node cluster, not embedded single-node
MinIO / S3 backend configured with unique rootPath per cluster
Pulsar or Kafka configured as external message queue (not embedded RocksDB for production)
Authentication enabled (authorizationEnabled: true)
TLS on the Milvus proxy endpoint, with port 19530 not exposed directly
QueryNode memory limit set (cache.memoryLimit)
Shard count configured for your vector count and workload type
Milvus Operator deployed if running on Kubernetes
Automated backup schedule for MinIO buckets
Prometheus metrics scraped from Milvus metrics endpoint
Uptime monitor on Milvus proxy healthcheck endpoint with alerting
(For the uptime monitoring piece — if you don't have that in place yet, here's a 2-minute setup guide for HTTP and TCP monitoring with email and webhook alerts.)
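If you would rather start with a probe of your own, the healthcheck item on the checklist needs very little code. The sketch below uses only the Python standard library. It assumes Milvus answers on the `/healthz` path of its metrics port (9091), which is what the standalone Docker Compose file uses for its container health check. The host name in the usage line is a placeholder, and you should confirm the path and port for your version.

```python
import http.client
import urllib.error
import urllib.request


def is_healthy(url: str, timeout: float = 3.0) -> bool:
    """Return True if `url` answers with HTTP 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx status.
        return False


if __name__ == "__main__":
    # Placeholder address: adjust the host and port to your deployment.
    print(is_healthy("http://localhost:9091/healthz"))
```

Run it from cron or a sidecar and alert on consecutive failures rather than on a single miss, since one slow response during compaction or segment loading is not an outage.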
Managed Milvus: what to look for
If you're evaluating managed Milvus options, the things that matter in practice:
Does it abstract etcd and MinIO entirely? — or do you still configure them manually
Automated backups — are they daily and self-serve restorable, or do you raise a support ticket
Scaling without migration — can you scale vertically or add nodes without moving data
Free TLS and endpoint — production Milvus should always be HTTPS, not an exposed port
Monitoring included — you shouldn't need a separate observability stack just to know if Milvus is up
Other vector DBs on the same platform — if you're prototyping on Chroma and moving to Milvus for production, it's cleaner to do both on one platform
The managed path on Antryk
Antryk deploys fully managed Milvus — along with Qdrant, Weaviate, and Chroma — in a single click. No etcd configuration, no MinIO buckets to provision, no Pulsar setup, no Kubernetes cluster required.
What's handled for you:
One-click deployment — pick Milvus, choose your plan, get a live HTTPS endpoint with API credentials in under two minutes
etcd, MinIO, and message queue — fully managed, no configuration needed, no individual component failures to chase
Automated daily backups — snapshots stored off-instance with one-click restore from the dashboard
Free TLS on every endpoint — no reverse proxy setup, no Certbot, your Milvus instance is HTTPS out of the box
Built-in monitoring — health, latency, and uptime tracked automatically without a separate Prometheus stack
Vertical scaling from the dashboard — upgrade your plan as your vector count grows without touching a config file
All four vector databases, one platform — if you started prototyping on Chroma or need Qdrant alongside Milvus for a different use case, it's all on the same dashboard, same billing
For teams that haven't crossed the self-hosting break-even point yet — or simply want to ship their AI features without owning a distributed infrastructure stack — Antryk's managed vector database plans cover Milvus at a cost that makes sense well before the $500/month threshold where self-hosting becomes worth the tradeoff.
Milvus vs alternatives: picking the right tool
Milvus isn't the right answer for every team. Here's a quick honest comparison:
| | Milvus | Qdrant | Weaviate | Chroma |
|---|---|---|---|---|
| Best at | Billion-scale, GPU search | Fast filtered search, Rust performance | Hybrid search, modules | Prototyping, local dev |
| Self-hosting complexity | High (etcd + MinIO + Pulsar) | Low (single binary) | Medium (modules config) | Very low |
| RAG use cases | Large-scale pipelines | Production RAG | Knowledge graph + RAG | Early-stage RAG |
| Under 10M vectors | Overkill | Ideal | Good | Fine |
| Over 100M vectors | Purpose-built | Works | Works | Not recommended |
If your team is still in the 1–20 million vector range and building a RAG application, Qdrant is worth evaluating first — it's architecturally simpler to operate and performs excellently at that scale. Come back to Milvus when you hit the limits.
Bottom line
Milvus is an exceptional piece of software. It scales to places other vector databases can't reach, and its index variety and GPU support are genuinely unmatched in open source.
But it comes with a real ops bill — etcd, MinIO, Pulsar, QueryNodes, DataNodes, IndexNodes — all before your application writes its first vector. That infrastructure makes sense when your scale demands it and your team can own it. Before that point, you're paying an ops tax you haven't earned yet.
Start with the right tool for your current scale. Add Milvus's complexity when the alternative starts hurting your performance requirements — not before.
If you want Milvus in production today without the infrastructure overhead, managed Milvus on Antryk handles all of it, with deployment in under two minutes.
Priyanka K
Cloud Infrastructure Engineer
Priyanka has a background in backend engineering and cloud infrastructure. She's spent the last five years helping early-stage startups make smarter infrastructure decisions — without overcomplicating things. When she's not writing, she's probably arguing about database indexing strategies or breaking something in a staging environment. She believes good infrastructure should be invisible, and your weekend should stay yours.
