Skip to main content

Command Palette

Search for a command to run...

ElasticSearch / OpenSearch Shard Imbalance: The Hidden Limitation

Published
5 min read
S

Software Architect building large-scale systems. I write about system design, and share my journey and experiments in ML and LLMs.

To truly understand Elasticsearch or OpenSearch, you must understand shards. They dictate your cluster’s performance, reliability, scalability, and even cost. Without grasping how shards are stored and balanced, you can’t master the system.

This post builds up the fundamentals and then reveals a silent limitation that can quietly damage performance — even when shard counts look perfectly balanced.

1. How Shards Are Stored Inside an Elasticsearch/OpenSearch Cluster

Before discussing shard sizing or performance issues, it’s important to understand the physical layout of an index across a cluster.
The diagram below illustrates this clearly:

1.1 Index → Logical Grouping of Documents

An index — such as My_Index — is only a logical grouping. Elasticsearch never stores it as one large block.
Instead, it splits the index into primary shards, which are the real physical storage units.


1.2 Primary Shards → The Authoritative Store

Primary shards store the original copy of your data.

When you index a document:

  • A routing hash (based on _id by default) selects the shard

  • The document is stored in that primary shard

  • Queries may be served by either the primary shard or any of its replicas

Developers may also specify a custom routing value for advanced use cases.


1.3 Replica Shards → High Availability + Parallel Reads

Replica shards (R1, R2, R3) are perfect copies of primary shards, stored on different nodes.

They provide:

  • High Availability - If the data node hosting P1 crashes, R1 is promoted to become the new primary instantly.

  • Increased Read Throughput - Read queries can be served by Primary shards Or any of their replicas. This allows parallelism and improved latency.


1.4 Data Nodes → Execution & Storage Engine

Data nodes:

  • Store shards

  • Execute queries

  • Perform indexing

  • Handle segment merges

  • Maintain filesystem-level structures

Each node holds a mix of primary and replica shards (as diagrammed).


1.5 Master Node (Cluster-Manager Node) → The Coordinator

The master node (officially cluster-manager node in newer versions) manages:

  • Cluster metadata

  • Shard allocation

  • Node join/leave

  • Promoting replicas to primaries

  • Rebalancing logic

It never stores data; it orchestrates the cluster.


A widely accepted guideline in the Elasticsearch and OpenSearch community is:

✔ Aim for ~30–50 GB per shard.

Why Shard size is important ?

Large shards cause:

  • Slow search performance because more data to be scanned

  • Slow recovery after a node failure

  • High heap pressure

  • Longer merge times

  • Large blast radius during node failure

Tiny shards cause:

  • Excess metadata

  • High memory overhead

  • More open file handles

  • Slow search performance because per data node threads are limited and a thread can only scan one shard at a time and with increase in number of shards per data node will result in delay in scanning across them

Shard sizing is one of the most impactful decisions in cluster design


3. Over-Sharding vs Under-Sharding

Under-Sharding (Huge Shards)Over-Sharding (Too Many Tiny Shards)
Too few shards → Each shard stores too much data (e.g., 200–500 GB).Too many shards → Each shard stores very little data.
Sluggish searches, Slow reallocations, Heap pressure and Long recovery timesToo many small Lucene indices, High memory overhead, Slowdowns during peak time, Master/manager node pressure and Unstable cluster operations

4. How Elasticsearch/OpenSearch Balances Shards

Here is the important part:

Elasticsearch balances based on shard count, not shard size.

If you have 20 shards and 5 nodes, the allocator will try to assign ~4 shards per node — regardless of how large or small they are.

This naive balancing strategy hides a dangerous imbalance.

Which brings us to the heart of this article.


5. The Hidden Limitation: Balanced Shard Count ≠ Balanced Workload

Consider this very common scenario:

  • Index A → under-sharded

    • 2 shards

    • 100 GB each

  • Index B → over-sharded

    • 10 shards

    • 1 GB each

  • Cluster → 4 data nodes

  • Total shards → 12

  • Target: 3 shards per node

On paper, perfect.
In practice, a trap.

This visual makes the imbalance obvious.

What’s happening here?

  • Nodes 1 & 2 are each carrying a 100 GB shard

  • Nodes 3 & 4 carry only tiny 1 GB shards

  • All nodes technically hold “3 shards”

But the actual data load — and therefore CPU, heap, GC pressure, and I/O — is wildly uneven.

Real-world impact:

  • Heavy nodes → slow merges, high CPU, GC spikes

  • Light nodes → mostly idle

  • Query latency fluctuates

  • Recovery time balloons

  • Cluster becomes fragile

  • Index A becomes a hotspot, slowing everything down

This is why shard count balancing can betray you.

Shards may be numerically balanced, but workload is not.

This is one of the most important realities engineers must understand.


Conclusion

Shards define everything about your Elasticsearch/OpenSearch cluster — performance, reliability, cost, and long-term stability.
By understanding how shards are stored, how replicas work, and — critically — how the cluster balances shard counts but not sizes, you can avoid silent performance killers.

A few smart decisions today can save you from outages, bottlenecks, and reindexing nightmares tomorrow.