Shards and Replicas - ignacio-alorre/ElasticSearch GitHub Wiki
Shards
An index can potentially store a large amount of data that can exceed the hardware limits of a single node. For example, a single index of a billion documents taking up 1TB of disk space may not fit on the disk of a single node or may be too slow to serve search requests from a single node alone.
To solve this problem, Elasticsearch provides the ability to subdivide your index into multiple pieces called shards. When you create an index, you can simply define the number of shards that you want. Each shard is in itself a fully-functional and independent "index" that can be hosted on any node in the cluster.
Sharding is important for two primary reasons:
- It allows you to horizontally split/scale your content volume.
- It allows you to distribute and parallelize operations across shards (potentially on multiple nodes) thus increasing performance/throughput.
Replicas
In a network/cloud environment where failures can be expected anytime, it is very useful and highly recommended to have a failover mechanism in case a shard/node somehow goes offline or disappears for whatever reason. To this end, Elasticsearch allows you to make one or more copies of your index’s shards into what are called replica shards, or replicas for short. A replica factor of 1 means 2 copies: primary + replica.
Replication is important for two primary reasons:
- It provides high availability in case a shard/node fails. For this reason, it is important to note that a replica shard is never allocated on the same node as the original/primary shard that it was copied from.
- It allows you to scale out your search volume/throughput since searches can be executed on all replicas in parallel.
Differences:
- After the index is created, you may change the number of replicas dynamically anytime but you cannot change the number of shards after-the-fact
- By default, each index in Elasticsearch is allovated 5 primary
Shardsand 1Replica, which means that if you have at least two nodes in your cluster, your index will have 5 primary shards and another 5 replica shards (1 complete replica) for a total of 10 shards per index.
How shards works
Uncomplete...
Source:
https://stackoverflow.com/questions/15694724/shards-and-replicas-in-elasticsearch