An index can potentially store a large amount of data that can exceed the hardware limits of a single node. For example, a single index of a billion documents taking up 1TB of disk space may not fit on the disk of a single node or may be too slow to serve search requests from a single node alone. To solve this problem, Elasticsearch provides the ability to subdivide your index into multiple pieces called shards. When you create an index, you can simply define the number of shards that you want. Each shard is in itself a fully functional and independent "index" that can be hosted on any node in the cluster. Sharding is important for two primary reasons: (1) it allows you to horizontally split/scale your content volume, and (2) it allows you to distribute and parallelize operations across shards (potentially on multiple nodes), thus increasing performance/throughput.
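To make the idea of splitting an index concrete, here is a minimal sketch of how documents can be spread over a fixed number of shards. It follows the rule given in the Elasticsearch documentation (a hash of the routing value, modulo the number of primary shards), but `zlib.crc32` is only a stand-in for the hash Elasticsearch uses internally, and the names `shard_for` and `distribute` are hypothetical.

```python
import zlib


def shard_for(routing_key: str, num_primary_shards: int) -> int:
    """Pick a shard by hashing the routing key modulo the shard count."""
    # Assumption: crc32 stands in for Elasticsearch's internal hash function.
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


def distribute(doc_ids, num_primary_shards):
    """Group document ids by the shard they would be routed to."""
    shards = {n: [] for n in range(num_primary_shards)}
    for doc_id in doc_ids:
        shards[shard_for(doc_id, num_primary_shards)].append(doc_id)
    return shards


if __name__ == "__main__":
    placement = distribute([f"doc-{i}" for i in range(1000)], 5)
    for shard, docs in placement.items():
        print(f"shard {shard}: {len(docs)} documents")
```

Because the shard number depends on the shard count, changing the count would send existing documents to different shards, which is one way to see why the number of primary shards is fixed when the index is created.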
In a network/cloud environment where failures can be expected anytime, it is very useful and highly recommended to have a failover mechanism in case a shard/node somehow goes offline or disappears for whatever reason. To this end, Elasticsearch allows you to make one or more copies of your index's shards into what are called replica shards, or replicas for short. Replication is important for two primary reasons: (1) It provides high availability in case a shard/node fails. For this reason, it is important to note that a replica shard is never allocated on the same node as the original/primary shard that it was copied from. (2) It allows you to scale out your search volume/throughput, since searches can be executed on all replicas in parallel.
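The rule that a replica never shares a node with its primary can be illustrated with a toy allocator. This is only a sketch under simplifying assumptions (round-robin placement, no rebalancing, no disk limits); it is not how Elasticsearch's real allocator works, and the function name `allocate` and the node names are hypothetical.

```python
def allocate(num_primaries, num_replicas, nodes):
    """Toy allocator: place every copy of a shard on a different node.

    Returns (assignment, unassigned), where assignment maps each node to a
    list of (shard_id, role) tuples and unassigned lists the copies that
    could not be placed without sharing a node with another copy.
    """
    assignment = {node: [] for node in nodes}
    unassigned = []
    for shard in range(num_primaries):
        # Copy 0 is the primary; copies 1..num_replicas are replicas.
        for copy in range(num_replicas + 1):
            role = "primary" if copy == 0 else "replica"
            if copy < len(nodes):
                # Offsetting by the copy number guarantees distinct nodes.
                node = nodes[(shard + copy) % len(nodes)]
                assignment[node].append((shard, role))
            else:
                unassigned.append((shard, role))
    return assignment, unassigned


if __name__ == "__main__":
    two_nodes, left_over = allocate(5, 1, ["node-1", "node-2"])
    print(two_nodes, left_over)
    one_node, left_over = allocate(5, 1, ["node-1"])
    print(one_node, left_over)
```

With a single node, the replicas have nowhere to go and stay unassigned, which is why a one-node setup is often configured with zero replicas.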
curl -XPUT 'localhost:9200/_settings' -H 'Content-Type: application/json' -d '
{
  "index" : {
    "number_of_replicas" : 0
  }
}'
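Please watch this video explaining the core of ES: https://www.youtube.com/watch?v=PpX7J-G2PEo
An article on multiple indices versus multiple shards: "Elastic search, multiple indexes vs one index and types for different data sets?"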
Being a distributed search server, Elasticsearch uses a concept called a shard to distribute an index's documents across all nodes. An index can potentially store a large amount of data that can exceed the hardware limits of a single node. For example, a single index of a billion documents taking up 1TB of disk space may not fit on the disk of a single node or may be too slow to serve search requests from a single node alone. To solve this problem, Elasticsearch provides the ability to subdivide your index into multiple pieces called shards. When you create an index, you can simply define the number of shards that you want. Documents are stored in shards, and shards are allocated to nodes in your cluster. As your cluster grows or shrinks, Elasticsearch will automatically migrate shards between nodes so that the cluster remains balanced. A shard can be either a primary shard or a replica shard. Each document in your index belongs to a single primary shard, so the number of primary shards that you have determines the maximum amount of data that your index can hold. A replica shard is just a copy of a primary shard.
A replica shard is a copy of a primary shard, kept to prevent data loss in case of hardware failure. Elasticsearch allows you to make one or more copies of your index's shards into what are called replica shards, or replicas for short. An index can be replicated zero (meaning no replicas) or more times. The number of shards and replicas can be defined per index at the time the index is created. After the index is created, you may change the number of replicas dynamically at any time, but you cannot change the number of shards after the fact. By default, each index in Elasticsearch is allocated 5 primary shards and 1 replica (this was the default before version 7.0; later versions default to 1 primary shard), which means that if you have at least two nodes in your cluster, your index will have 5 primary shards and another 5 replica shards (1 complete replica), for a total of 10 shards per index.
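The shard count above is simple arithmetic: every primary exists once, plus once more per replica. The sketch below computes that total and builds the settings body one could send when creating an index. `number_of_shards` and `number_of_replicas` are real Elasticsearch index settings; the helper names `total_shards` and `index_settings` are hypothetical, and nothing is sent over the network here.

```python
import json


def total_shards(num_primaries: int, num_replicas: int) -> int:
    """Each primary is stored once, plus once more for every replica."""
    return num_primaries * (1 + num_replicas)


def index_settings(num_primaries: int, num_replicas: int) -> dict:
    """Build a request body for creating an index with these counts."""
    return {
        "settings": {
            "index": {
                "number_of_shards": num_primaries,
                "number_of_replicas": num_replicas,
            }
        }
    }


if __name__ == "__main__":
    print(total_shards(5, 1))  # 10
    print(json.dumps(index_settings(5, 1), indent=2))
```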
Elasticsearch is superbly scalable, and the credit goes to its distributed architecture, which sharding makes possible. Before going further, let us consider a simple and very common use case. Suppose you have an index that contains a huge number of documents and, for the sake of simplicity, that the size of that index is 1 TB (i.e., the sum of the sizes of all documents in that index is 1 TB). Also assume that you have two nodes, each with 512 GB of space available for storing data. Clearly, the entire index cannot be stored on either of the two nodes, and hence we need to distribute the index among them.
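The 1 TB example can be worked through in a few lines. The sketch below assumes equally sized shards, treats 1 TB as 1024 GB, and ignores replicas and the free-space headroom a real cluster needs; `place_shards` is a hypothetical helper, not an Elasticsearch API.

```python
def place_shards(index_size_gb, num_shards, node_capacities_gb):
    """Greedily put each equally sized shard on the node with most free space.

    Returns a list with the node index chosen for each shard, or None if
    some shard does not fit on any node.
    """
    shard_size = index_size_gb / num_shards
    free = list(node_capacities_gb)
    placement = []
    for _ in range(num_shards):
        # Pick the node that currently has the most free space.
        node = max(range(len(free)), key=lambda i: free[i])
        if free[node] < shard_size:
            return None
        free[node] -= shard_size
        placement.append(node)
    return placement


if __name__ == "__main__":
    # One 1024 GB shard cannot fit on a 512 GB node.
    print(place_shards(1024, 1, [512, 512]))  # None
    # Four 256 GB shards fit, two per node.
    print(place_shards(1024, 4, [512, 512]))  # [0, 1, 0, 1]
```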