Vector Databases · 12 min read

Running Milvus in Production: Why Most Teams Eventually Stop Self-Hosting It

Milvus is the most powerful open-source vector database and one of the most complex to run. Before you go down the self-hosting rabbit hole, read this.

Priyanka K
June 28, 2026

Milvus has the most GitHub stars of any open-source vector database — over 30,000 — and for good reason. It handles billions of vectors, supports GPU-accelerated search, and scales horizontally in ways that Qdrant, Chroma, and Weaviate simply don't match at the top end.

But Milvus is also the most operationally demanding vector database you can choose to run yourself. In a distributed deployment, before your first query executes, you're managing three separate infrastructure dependencies (etcd, MinIO, and Pulsar), each of which can fail independently and take your Milvus instance down with it.

This guide is an honest look at what running Milvus in production actually involves: the architecture, the real infrastructure requirements, the most common failure points, and how to decide whether self-hosting is worth it for your team right now.

Why Milvus exists and what it's genuinely good at

Milvus was built from day one as a distributed system designed for scale. It separates storage (MinIO/S3), metadata (etcd), message queuing (Pulsar/Kafka), and compute (query and index nodes) into layers that scale independently. That architecture is overkill for 1 million vectors. It is exactly right for 1 billion, and at hundreds of millions of vectors it already pays for itself.

The feature set reflects this:

  • Multiple index types — IVF_FLAT, HNSW, DiskANN, GPU-accelerated IVF_PQ — more index options than any other open-source vector DB

  • GPU-accelerated search — native NVIDIA GPU support for sub-millisecond latency at scale

  • Billion-scale proven — deployed in production at Salesforce, PayPal, and companies running recommendation systems at consumer scale

  • Streaming inserts — WAL-based architecture handles concurrent reads and writes cleanly, important for real-time indexing pipelines

  • Partition keys — logical data isolation within a single collection, useful for multi-tenant RAG applications

If you're building at scale and your dataset will exceed 50–100 million vectors, Milvus is probably the right choice. The question is whether you should self-host it.

The architecture you're signing up for

Most vector database tutorials show you this:

bash
# Replace the release tag in the URL with the Milvus version you intend to run
wget https://github.com/milvus-io/milvus/releases/download/v3.0-beta/milvus-standalone-docker-compose.yml -O docker-compose.yml
docker compose up -d

Run that and you get three containers, not one:

plaintext
milvus-etcd       Up    2379/tcp, 2380/tcp
milvus-minio      Up    9000/tcp
milvus-standalone Up    0.0.0.0:19530->19530/tcp

Milvus uses etcd as its metadata and coordination service, and MinIO as its S3-compatible object storage layer, where it persists embeddings, the index files built from those vectors, and segment data.
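The container listing above is easy to check from a script. The following is a minimal sketch, assuming the status text has the simplified `name status ports` shape shown in the listing (real `docker compose ps` output has more columns and may need a different parser). The function name `containers_not_up` is hypothetical.

```python
# Sketch: verify that all three standalone containers report "Up".
# Assumes one container per line, with the name in column 1 and the
# status in column 2, as in the simplified listing in this article.

EXPECTED_CONTAINERS = ("milvus-etcd", "milvus-minio", "milvus-standalone")


def containers_not_up(ps_output, expected=EXPECTED_CONTAINERS):
    """Return the expected containers that are missing or not 'Up'."""
    status = {}
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            status[fields[0]] = fields[1]
    return [name for name in expected if status.get(name) != "Up"]


if __name__ == "__main__":
    listing = """\
milvus-etcd       Up    2379/tcp, 2380/tcp
milvus-minio      Up    9000/tcp
milvus-standalone Up    0.0.0.0:19530->19530/tcp
"""
    print(containers_not_up(listing))
```

An empty list means all three containers are running. Any name in the result is a dependency to look at before blaming Milvus itself.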

That's just standalone mode — the simplest Milvus deployment. In distributed mode (the one you actually want for production at scale), the dependency list expands:

| Component | Role | Can it fail independently? |
| --- | --- | --- |
| etcd | Metadata store, coordination, service discovery | Yes: cluster elections, disk I/O |
| MinIO / S3 | Segment storage, index files, embeddings | Yes: storage quota, auth, network |
| Pulsar / Kafka | Internal message queue, WAL | Yes: topic backlogs, OOM |
| QueryNode | Serves search queries from memory | Yes: OOM, segment loading |
| DataNode | Handles data ingestion | Yes: write throughput bottleneck |
| IndexNode | Builds vector indexes | Yes: CPU/GPU saturation |
| Proxy | Routes requests across components | Yes: single point of failure if not replicated |

Each row in that table is something you maintain, monitor, and recover when it goes wrong.

The self-hosting reality: what nobody tells you upfront

etcd is the most fragile piece

Disk performance is critical to etcd. It is highly recommended that you use local NVMe SSDs. Slower disk response may cause frequent cluster elections that will eventually degrade the etcd service. Ideally, your disk dedicated to etcd should reach over 500 IOPS and below 10ms for the 99th percentile fsync latency.

In practice: if you put Milvus standalone on a general-purpose VPS with a shared HDD or even a slow SSD, etcd will periodically miss heartbeats and trigger leader elections. Milvus can then go read-only: queries still work, but writes fail silently. You find out when your vector count stops growing and you spend an afternoon reading etcd logs.

Test your disk before deploying:

bash
mkdir test-data
fio --rw=write --ioengine=sync --fdatasync=1 \
    --directory=test-data --size=2200m \
    --bs=2300 --name=etcd-disk-test

If your p99 fsync latency is above 10ms, use a different disk or switch to managed.
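The 10ms threshold can be checked automatically rather than read off the console. The following is a sketch, assuming you rerun the fio command with `--output-format=json` and that the report exposes fsync latency percentiles (in nanoseconds) under `jobs[0].sync.lat_ns.percentile`, as recent fio 3.x releases do. Confirm that layout against your own fio version; the helper names are hypothetical.

```python
# Sketch: extract the p99 fsync latency from a fio JSON report and
# compare it against the 10 ms limit quoted in this article.
# The JSON path used below is an assumption about fio 3.x output.
import json

P99_FSYNC_LIMIT_MS = 10.0


def p99_fsync_ms(fio_json):
    """Return the 99th percentile fsync latency in milliseconds."""
    report = json.loads(fio_json)
    percentiles = report["jobs"][0]["sync"]["lat_ns"]["percentile"]
    return percentiles["99.000000"] / 1_000_000  # ns -> ms


def disk_ok_for_etcd(fio_json):
    """True when the measured p99 fsync latency is within the limit."""
    return p99_fsync_ms(fio_json) <= P99_FSYNC_LIMIT_MS


if __name__ == "__main__":
    # Hand-written sample report, not real measurement data.
    sample = json.dumps(
        {"jobs": [{"sync": {"lat_ns": {"percentile": {"99.000000": 4_500_000}}}}]}
    )
    print(p99_fsync_ms(sample), disk_ok_for_etcd(sample))
```

Wiring this into a provisioning script lets you reject a slow disk before etcd ever lands on it.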

MinIO costs add up quietly

At 100M vectors with 1536 dimensions, your object storage grows to 1–2 TB. On S3, storage itself is cheap at around $23 per TB per month, but the GET and PUT requests issued during compaction and segment loading add $10–50 per month on top. That's still manageable, but it's invisible until your cloud bill arrives.
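To see where those numbers come from, here is a rough back-of-the-envelope sketch. It assumes unquantized float32 vectors (4 bytes per dimension) and decimal terabytes, and uses only the $23 per TB and $10–50 request figures quoted above. The function names are hypothetical.

```python
# Sketch: raw vector footprint and monthly object-storage cost.
# Assumes float32 embeddings and 1 TB = 1e12 bytes.

BYTES_PER_FLOAT32 = 4
S3_USD_PER_TB_MONTH = 23.0  # storage price quoted in this article


def raw_vector_tb(num_vectors, dim):
    """Size of the raw embeddings alone, in TB."""
    return num_vectors * dim * BYTES_PER_FLOAT32 / 1e12


def monthly_storage_usd(stored_tb, request_usd=0.0):
    """Storage charge plus an estimate for GET/PUT request charges."""
    return stored_tb * S3_USD_PER_TB_MONTH + request_usd


if __name__ == "__main__":
    print(raw_vector_tb(100_000_000, 1536))   # raw floats only
    print(monthly_storage_usd(1.0, 10.0))     # low end of the range
    print(monthly_storage_usd(2.0, 50.0))     # high end of the range
```

The raw embeddings for 100M vectors come to roughly 0.61 TB. The gap up to the 1–2 TB figure is presumably index files and other segment data stored alongside them. At 1–2 TB stored, the monthly bill lands between about $33 and $96.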

Version upgrades are a planned event

Milvus has a rapid release cycle. Upgrading a distributed system with etcd schema changes, storage format changes, and API changes requires planning and testing.

On a managed Kubernetes deployment, this means coordinating Helm chart updates, draining nodes, migrating etcd schema, and verifying that your MinIO data format is compatible with the new version. On a standalone Docker setup, it's slightly simpler but still involves downtime if you're not running a standby.

The Kubernetes operator helps, but is its own learning curve

The Milvus Operator is the right way to run Milvus on Kubernetes in 2026. It turns Milvus into a first-class Kubernetes resource and handles things like scaling query nodes, rolling upgrades, and recovering from partial failures automatically.

yaml
apiVersion: milvus.io/v1beta1
kind: Milvus
metadata:
  name: my-milvus
spec:
  mode: cluster
  dependencies:
    etcd:
      inCluster:
        values:
          replicaCount: 3
          persistence:
            size: 10Gi
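    storage:
      external: true                   # use existing S3 instead of in-cluster MinIO
      type: S3
      endpoint: s3.amazonaws.com:443
      secretRef: my-milvus-s3-secret   # placeholder name: a Secret holding the S3 credentials
  config:
    minio:
      bucketName: milvus-data
      useSSL: true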
  components:
    queryNode:
      replicas: 3
    dataNode:
      replicas: 2
    indexNode:
      replicas: 2

Instead of managing dozens of Deployments and StatefulSets yourself, you define a single custom resource. The operator understands how the pieces relate and enforces correct ordering and health checks.

That's genuinely better than raw Helm. But you still need a Kubernetes cluster, a configured StorageClass, working S3-compatible storage, and someone who knows what to do when a QueryNode goes OOM at 2 AM.

Hardware sizing for Milvus

Milvus Standalone operates as a single-node deployment suitable for datasets up to 100 million vectors, while a distributed cluster deployment uses etcd for metadata, MinIO or S3 for object storage, and Pulsar or Kafka for log streaming.

Memory requirements by vector count (1536 dimensions, unquantized):

| Vectors | RAM (QueryNode) | etcd disk | MinIO storage |
| --- | --- | --- | --- |
| 1M | 8–12 GB | 10 GB NVMe | ~12 GB |
| 10M | 64–80 GB | 20 GB NVMe | ~120 GB |
| 100M | 640 GB+ | 50 GB NVMe | ~1.2 TB |
| 1B | Distributed cluster | 100 GB NVMe ×3 | ~12 TB |

Shard configuration rule of thumb from official Milvus docs: use 1 shard per 100 million vectors for read-heavy workloads, 4 shards per 100 million vectors for write-heavy workloads.
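The shard rule of thumb above can be written down as a small helper. This is only a sketch of that rule. Rounding partial blocks of 100 million up to a whole block is an assumption made here, not something the rule specifies, and `recommended_shards` is a hypothetical name.

```python
# Sketch: shard count from the rule of thumb quoted above.
# 1 shard per 100M vectors (read-heavy), 4 per 100M (write-heavy).
# Assumption: any partial block of 100M counts as a full block.
import math

VECTORS_PER_BLOCK = 100_000_000


def recommended_shards(num_vectors, write_heavy):
    """Return a starting shard count for a collection."""
    blocks = max(1, math.ceil(num_vectors / VECTORS_PER_BLOCK))
    return blocks * (4 if write_heavy else 1)


if __name__ == "__main__":
    print(recommended_shards(250_000_000, write_heavy=False))
    print(recommended_shards(250_000_000, write_heavy=True))
```

Treat the result as a starting point to validate under your own ingest and query load, not a guarantee.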

Milvus configuration for production

Milvus configuration breaks into three layers:

  • Dependency component configurations for external services like etcd, MinIO, and the message queue: critical for cluster setup and data persistence

  • Internal component configurations for Milvus's internal architecture: key for performance tuning

  • Functional configurations covering security, logging, and resource limits: important for production deployments

Key production settings in milvus.yaml:

yaml
etcd:
  endpoints:
    - etcd-0:2379
    - etcd-1:2379
    - etcd-2:2379
  rootPath: milvus-prod   # Unique per cluster — prevents data collisions

minio:
  address: s3.amazonaws.com
  port: 443
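  accessKeyID: ${AWS_ACCESS_KEY}       # placeholder: substitute the real value at deploy time
  secretAccessKey: ${AWS_SECRET_KEY}   # placeholder: keep secrets out of version control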
  useSSL: true
  bucketName: your-milvus-bucket
  rootPath: milvus-prod
  cloudProvider: aws

# Use Pulsar for distributed mode
pulsar:
  address: pulsar-broker
  port: 6650

# Security
common:
  security:
    authorizationEnabled: true

proxy:
  http:
    enabled: true
    debugMode: false

# Query performance
queryNode:
  segcore:
    chunkRows: 128       # Reduce for lower memory, increase for throughput
  cache:
    memoryLimit: 0.8     # Intended as 80% of RAM for segment cache; verify the unit for your release, since some 2.x versions expect bytes

Critical: always set a unique etcd.rootPath and minio.rootPath per Milvus cluster. Without them, two Milvus instances sharing the same etcd or MinIO will silently corrupt each other's data.
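A collision like that is easy to catch before deployment. The following is a minimal sketch, assuming each cluster's milvus.yaml has already been parsed into a dictionary with the same `etcd` and `minio` keys used in the config above. The function name and the sample cluster names are hypothetical.

```python
# Sketch: detect Milvus clusters that share an etcd or object-storage
# root path. Two clusters collide when they point at the same backend
# AND use the same rootPath on it.
from collections import defaultdict


def find_root_path_collisions(clusters):
    """clusters maps a cluster name to its parsed milvus.yaml dict.

    Returns a sorted list of (backend, [cluster names]) for every
    backend location claimed by more than one cluster.
    """
    owners = defaultdict(list)
    for name, cfg in clusters.items():
        etcd, minio = cfg["etcd"], cfg["minio"]
        etcd_key = ("etcd", tuple(sorted(etcd["endpoints"])), etcd["rootPath"])
        minio_key = ("minio", minio["address"], minio["bucketName"], minio["rootPath"])
        owners[etcd_key].append(name)
        owners[minio_key].append(name)
    return sorted((key[0], names) for key, names in owners.items() if len(names) > 1)


if __name__ == "__main__":
    prod = {
        "etcd": {"endpoints": ["etcd-0:2379", "etcd-1:2379"], "rootPath": "milvus-prod"},
        "minio": {"address": "s3.amazonaws.com", "bucketName": "your-milvus-bucket",
                  "rootPath": "milvus-prod"},
    }
    # Staging was copied from prod and only the MinIO rootPath was changed.
    staging = {
        "etcd": {"endpoints": ["etcd-0:2379", "etcd-1:2379"], "rootPath": "milvus-prod"},
        "minio": {"address": "s3.amazonaws.com", "bucketName": "your-milvus-bucket",
                  "rootPath": "milvus-staging"},
    }
    print(find_root_path_collisions({"prod": prod, "staging": staging}))
```

Running a check like this in CI against every environment's config catches the copy-paste case where a new cluster inherits another cluster's rootPath.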

When self-hosting Milvus actually makes sense

Be honest with yourself before committing to the ops overhead. Self-hosting Milvus is the right call when:

  • Your dataset exceeds 50–100 million vectors — below this, Qdrant or Weaviate are simpler and cheaper

  • You have dedicated Kubernetes expertise on the team — someone who owns the cluster day-to-day

  • You need GPU-accelerated search — Milvus's native GPU index support is unique among OSS vector DBs

  • You have compliance requirements that mandate on-premise or specific jurisdiction data residency

  • Your managed Milvus bill exceeds ~$500/month and your team has the bandwidth to absorb the ops work

The simple rule: if your monthly managed Milvus bill exceeds $500 and your team has Kubernetes expertise, evaluate self-hosting. Below $500 per month, the managed convenience is almost always worth it.
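Written as code, the simple rule is a two-condition check. This sketch encodes only the $500 threshold and the Kubernetes-expertise condition stated above. The function name and return strings are illustrative.

```python
# Sketch: the "simple rule" for when to evaluate self-hosting.
# Both conditions must hold; the threshold comes from this article.

SELF_HOSTING_THRESHOLD_USD = 500.0


def hosting_recommendation(monthly_managed_usd, has_k8s_expertise):
    """Return a recommendation string based on bill size and team skills."""
    if monthly_managed_usd > SELF_HOSTING_THRESHOLD_USD and has_k8s_expertise:
        return "evaluate self-hosting"
    return "stay managed"


if __name__ == "__main__":
    print(hosting_recommendation(800.0, has_k8s_expertise=True))
    print(hosting_recommendation(800.0, has_k8s_expertise=False))
```

Note that a large bill alone is not enough: without Kubernetes expertise on the team, the rule still points to managed.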

When you should strongly consider managed Milvus instead

Self-hosting is an optimization, not a default. Consider managed when:

  • You're at the MVP or early traction stage — every hour on infra is an hour not shipping product

  • Your team doesn't have Kubernetes experience — Milvus distributed on K8s is not beginner infrastructure

  • Your dataset is under 10–20 million vectors — the architectural complexity isn't justified yet

  • You want predictable costs — etcd disk failures, MinIO egress, and Pulsar backlogs all create unpredictable cost spikes on self-hosted

  • You need automated backups and point-in-time restore without building them yourself

The production checklist for self-hosted Milvus

If you're going ahead with self-hosting, verify every item before you call it production-ready:

  • NVMe SSDs for etcd — tested with fio, p99 fsync latency under 10ms

  • etcd running as a 3-node cluster, not embedded single-node

  • MinIO / S3 backend configured with unique rootPath per cluster

  • Pulsar or Kafka configured as external message queue (not embedded RocksDB for production)

  • Authentication enabled — authorizationEnabled: true

  • TLS on the Milvus proxy endpoint — port 19530 not exposed directly

  • QueryNode memory limit set (cache.memoryLimit)

  • Shard count configured for your vector count and workload type

  • Milvus Operator deployed if running on Kubernetes

  • Automated backup schedule for MinIO buckets

  • Prometheus metrics scraped from Milvus metrics endpoint

  • Uptime monitor on Milvus proxy healthcheck endpoint with alerting

(For the uptime monitoring piece — if you don't have that in place yet, here's a 2-minute setup guide for HTTP and TCP monitoring with email and webhook alerts.)
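For the healthcheck item on the checklist, a probe can be as small as the sketch below. It assumes Milvus serves a `/healthz` endpoint on port 9091, which is what the standalone Docker Compose healthcheck uses by default. Confirm the port and path for your own deployment. The helper names are hypothetical, and alerting is left out.

```python
# Sketch: minimal HTTP health probe for a Milvus instance.
# Assumption: the health endpoint is http://<host>:9091/healthz.
import urllib.error
import urllib.request


def health_url(host, port=9091):
    """Build the assumed Milvus health endpoint URL."""
    return f"http://{host}:{port}/healthz"


def is_healthy(url, timeout=2.0):
    """Return True only when the endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or a non-2xx status.
        return False


if __name__ == "__main__":
    url = health_url("localhost")
    print(url, is_healthy(url))
```

Run it from a scheduler outside the Milvus host, so that a dead machine still produces a failed check instead of silence.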

Managed Milvus: what to look for

If you're evaluating managed Milvus options, the things that matter in practice:

  • Does it abstract etcd and MinIO entirely? — or do you still configure them manually

  • Automated backups — are they daily and self-serve restorable, or do you raise a support ticket

  • Scaling without migration — can you scale vertically or add nodes without moving data

  • Free TLS and endpoint — production Milvus should always be HTTPS, not an exposed port

  • Monitoring included — you shouldn't need a separate observability stack just to know if Milvus is up

  • Other vector DBs on the same platform — if you're prototyping on Chroma and moving to Milvus for production, it's cleaner to do both on one platform

The managed path on Antryk

Antryk deploys fully managed Milvus — along with Qdrant, Weaviate, and Chroma — in a single click. No etcd configuration, no MinIO buckets to provision, no Pulsar setup, no Kubernetes cluster required.

What's handled for you:

  • One-click deployment — pick Milvus, choose your plan, get a live HTTPS endpoint with API credentials in under two minutes

  • etcd, MinIO, and message queue — fully managed, no configuration needed, no individual component failures to chase

  • Automated daily backups — snapshots stored off-instance with one-click restore from the dashboard

  • Free TLS on every endpoint — no reverse proxy setup, no Certbot, your Milvus instance is HTTPS out of the box

  • Built-in monitoring — health, latency, and uptime tracked automatically without a separate Prometheus stack

  • Vertical scaling from the dashboard — upgrade your plan as your vector count grows without touching a config file

  • All four vector databases, one platform — if you started prototyping on Chroma or need Qdrant alongside Milvus for a different use case, it's all on the same dashboard, same billing

For teams that haven't crossed the self-hosting break-even point yet — or simply want to ship their AI features without owning a distributed infrastructure stack — Antryk's managed vector database plans cover Milvus at a cost that makes sense well before the $500/month threshold where self-hosting becomes worth the tradeoff.

Milvus vs alternatives: picking the right tool

Milvus isn't the right answer for every team. Here's a quick honest comparison:

| | Milvus | Qdrant | Weaviate | Chroma |
| --- | --- | --- | --- | --- |
| Best at | Billion-scale, GPU search | Fast filtered search, Rust performance | Hybrid search, modules | Prototyping, local dev |
| Self-hosting complexity | High (etcd + MinIO + Pulsar) | Low (single binary) | Medium (modules config) | Very low |
| RAG use cases | Large-scale pipelines | Production RAG | Knowledge graph + RAG | Early-stage RAG |
| Under 10M vectors | Overkill | Ideal | Good | Fine |
| Over 100M vectors | Purpose-built | Works | Works | Not recommended |

If your team is still in the 1–20 million vector range and building a RAG application, Qdrant is worth evaluating first — it's architecturally simpler to operate and performs excellently at that scale. Come back to Milvus when you hit the limits.

Bottom line

Milvus is an exceptional piece of software. It scales to places other vector databases can't reach, and its index variety and GPU support are genuinely unmatched in open source.

But it comes with a real ops bill — etcd, MinIO, Pulsar, QueryNodes, DataNodes, IndexNodes — all before your application writes its first vector. That infrastructure makes sense when your scale demands it and your team can own it. Before that point, you're paying an ops tax you haven't earned yet.

Start with the right tool for your current scale. Add Milvus's complexity when the alternative starts hurting your performance requirements — not before.

If you want Milvus in production today without the infrastructure overhead, managed Milvus on Antryk handles all of it, with deployment in under two minutes.


Priyanka K

Cloud Infrastructure Engineer

Priyanka has a background in backend engineering and cloud infrastructure. She's spent the last five years helping early-stage startups make smarter infrastructure decisions — without overcomplicating things. When she's not writing, she's probably arguing about database indexing strategies or breaking something in a staging environment. She believes good infrastructure should be invisible, and your weekend should stay yours.
