Scalability
Scalability refers to a system’s ability to handle growing amounts of data and traffic without performance degradation. Elasticsearch achieves scalability through its distributed architecture:
Clusters and Nodes: A cluster is a collection of one or more nodes (servers) that work together. Each node stores a portion of the data and participates in query execution. Adding more nodes allows the cluster to handle larger data volumes and more concurrent requests.
Sharding: Elasticsearch divides each index into shards, which are independent Lucene instances. Shards can be primary (store the original data) or replica (copies for redundancy). Sharding enables parallel processing, allowing queries to execute across multiple nodes efficiently.
Horizontal Scaling: Adding more nodes is the preferred method to scale Elasticsearch. Since shards are distributed across nodes, adding nodes increases both capacity and throughput.
Replica Shards for Load Handling: When the system experiences increased load, you can add more replica shards. These replicas are distributed across multiple machines, ensuring that search queries and read requests are spread across the cluster. This improves both query performance and the cluster’s capacity to serve read traffic.
Primary Shards Limitation: The number of primary shards is fixed at index creation time. If you need a different number of primary shards later, you can use the resize APIs: split (to create more primary shards), shrink (to create fewer primary shards), or clone (to copy the index with the same number of primary shards, optionally with new replica settings).
These operations copy (or hard-link) Lucene segments and avoid a full re-indexing of all documents. When creating an index, you set the number of primary and replica shards in the index settings, as shown below; a sketch of adjusting replicas and splitting an index follows the example.
PUT /logs-index
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}
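The number of replicas is a dynamic setting, while changing the number of primary shards requires a resize operation. A minimal sketch, assuming the logs-index above and a hypothetical target name logs-index-split (the target shard count must be a multiple of the source’s, and the source index must block writes before a split):
PUT /logs-index/_settings
{
  "index": {
    "number_of_replicas": 2
  }
}

# A split requires the source index to be read-only first
PUT /logs-index/_settings
{
  "settings": {
    "index.blocks.write": true
  }
}

POST /logs-index/_split/logs-index-split
{
  "settings": {
    "index.number_of_shards": 6
  }
}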
Resilience
Resilience in Elasticsearch is achieved through primary and replica shards. The primary shards store the original data, while the replica shards serve as copies of the primary shards. Replica shards provide fault tolerance, ensuring that if a node fails, data remains available. Additionally, replica shards enhance read performance, since search requests can be served by either primary or replica shards.
Replicas are always placed on different nodes from the primary shard, since two copies of the same data on the same node would offer no protection if the node were to fail. As a result, for a system to support n replicas, there needs to be at least n + 1 nodes in the cluster.
When a node leaves the cluster for any reason, intentional or otherwise, the master node reacts by:
1. Promoting a replica shard to primary to replace any primaries that were on the node.
2. Allocating replica shards to replace the missing replicas (assuming there are enough nodes).
3. Rebalancing shards evenly across the remaining nodes.
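Shard placement and allocation decisions can be inspected directly. A minimal sketch, reusing the hypothetical logs-index from the earlier example, with the cat shards API and the cluster allocation explain API (the latter explains why a given shard copy is assigned or unassigned):
GET /_cat/shards/logs-index?v

GET /_cluster/allocation/explain
{
  "index": "logs-index",
  "shard": 0,
  "primary": false
}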
Example Scenario:
Node 2 loses network connectivity.
1. The master promotes a replica shard to primary for each primary shard that was on Node 2.
2. The master allocates new replicas to other nodes in the cluster.
3. Each new replica copies the entire primary shard across the network.
4. Additional shards are moved to different nodes to rebalance the cluster.
5. Node 2 returns after a few minutes.
6. The master rebalances the cluster by allocating shards to Node 2.
If the master had waited a few minutes, the missing shards could have been reallocated to Node 2 with minimal network traffic. This process is even quicker for idle shards (shards not receiving indexing requests) that have been automatically flushed.
Controlling Replica Allocation Delay: The allocation of replica shards that become unassigned when a node leaves can be delayed using the
index.unassigned.node_left.delayed_timeout dynamic setting, which defaults to 1m.
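For example, to give a departed node five minutes to rejoin before its replicas are reallocated, the setting can be raised on an index (here the hypothetical logs-index; it can also be applied to all indices via _all):
PUT /logs-index/_settings
{
  "settings": {
    "index.unassigned.node_left.delayed_timeout": "5m"
  }
}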
When a node goes down, your cluster health status changes based on shard availability:
1. Yellow: All primary shards are active, but at least one replica shard is missing or unassigned.
2. Red: At least one primary shard is missing and no replica was available to be promoted, meaning some data is currently unavailable.
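The current status is reported by the cluster health API; the response includes a status field with the value green, yellow, or red:
GET /_cluster/health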
Read & Write
Writing Data (Indexing)
Write operations (index, update, or delete) follow a strict sequential flow to ensure all shard copies stay in sync:
1. Routing: The request is sent to a coordinating node, which determines the primary shard for the document (typically based on the document ID) and forwards the request to the node holding it.
2. Primary Execution: The primary shard validates the operation and executes it locally.
3. Replication: Once successful, the primary forwards the operation to all active replica shards in parallel.
4. Acknowledgment: The primary waits for responses from all replicas in the “in-sync” set before sending a success message back to the client.
5. Durability: Every write is appended to a translog (write-ahead log) on disk for crash recovery before being committed to the permanent index.
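A single indexing request runs through all of these steps transparently. A minimal sketch against the hypothetical logs-index, with illustrative field names; routing defaults to the document ID, and an explicit routing query parameter could override it:
PUT /logs-index/_doc/1
{
  "message": "user login",
  "timestamp": "2024-05-01T10:00:00Z"
}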
Reading Data
Elasticsearch provides two main ways to retrieve data:
1. GET by ID (Real-time)
Since the exact shard is known, the request is routed directly to one active copy (primary or replica). This is real-time; it can even read from the translog before the data is formally indexed.
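For example, fetching the document indexed above by its ID is routed straight to one copy of the owning shard and served in real time (the GET API’s realtime parameter defaults to true):
GET /logs-index/_doc/1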
2. Search (Near Real-time)
A search request is broadcast to all shards (or a subset) of an index.
1. Query Phase: Each shard executes the search locally and returns a sorted list of document IDs to the coordinating node.
2. Fetch Phase: The coordinating node merges the results and requests the full document content from the relevant shards.
3. Latency: Searches are near real-time because they only see data after an index refresh (default is every 1 second), which makes new segments searchable.
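A minimal search sketch against the same hypothetical index and field; the coordinating node fans this request out to the shards, runs the query phase, and then fetches the top documents:
GET /logs-index/_search
{
  "query": {
    "match": {
      "message": "login"
    }
  }
}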
The Anatomy of an Elasticsearch Segment
A segment in Elasticsearch is a small, immutable, self-contained inverted index stored on disk. Each shard consists of one or more segments. Documents are first written to memory and periodically written out into a new segment during a refresh. Once a segment is written to disk, it is immutable. This immutability allows segments to be easily cached and searched concurrently without complex locking mechanisms.
When you search a shard, Elasticsearch searches every segment in that shard sequentially and then merges the results.
Indexing Flow: Documents are first collected in an in-memory buffer. Periodically (default every 1 second), Elasticsearch performs a refresh, writing these documents into a new segment in the filesystem cache, making them searchable.
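A refresh can also be triggered manually, and the interval is a dynamic index setting; a sketch using the hypothetical logs-index (a longer interval trades freshness for fewer, larger segments):
POST /logs-index/_refresh

PUT /logs-index/_settings
{
  "index": {
    "refresh_interval": "30s"
  }
}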
Because segments are immutable, data cannot be modified in place:
Deletions: A document is not physically removed; instead, it is marked as deleted in a separate bitset (or ".del" file).
Updates: An update is performed as a delete followed by an insert of the new version into a new segment.
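From the client’s point of view an update is a single call; internally the previous version is marked as deleted and the new version is written into a new segment. A minimal sketch reusing the hypothetical document from earlier:
POST /logs-index/_update/1
{
  "doc": {
    "message": "user logout"
  }
}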
As new documents are indexed, the number of small segments grows rapidly. Searching slows down because every segment must be checked. Elasticsearch automatically runs a merge process in the background, combining several smaller segments into one larger segment.
During merges:
1. Documents marked for deletion are physically removed and are not copied into the new segment.
2. Fewer, larger segments generally provide faster search performance than many small segments, since each open segment consumes memory, CPU, and file handles.
3. Merging keeps these overheads under control.
You can manually trigger a merge (historically called "optimize") using the Force Merge API to reduce an index to a specific number of segments, which is recommended for static, read-only indices, as shown below. Note that frequent refreshes (a low refresh interval) make new documents searchable sooner, but they create more small segments, eventually triggering more background merges and consuming significant I/O.
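A sketch of the force merge call referenced above, reducing the hypothetical read-only logs-index to a single segment:
POST /logs-index/_forcemerge?max_num_segments=1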