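I'm trying to understand what shards and replicas are in Elasticsearch, but I haven't managed to. If I download Elasticsearch and run the startup script, then as far as I know I have started a cluster with a single node. Now this node (my PC) has 5 shards (?) and some replicas (?).

What are they? Do I have 5 duplicates of the index? If so, why? I could use some explanation.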


Current answer

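In the simplest terms, a shard is just a part of an index, stored on disk in its own separate folder:

This screenshot shows the entire Elasticsearch directory.

As you can see, all the data goes into the data directory.

Inspecting the index C-mAfLltQzuas72iMiIXNw, we see that it has five shards (folders 0 to 4).

The index JH_A8PgCRj-GK0GeQ0limw, on the other hand, has only one shard (folder 0).

pri indicates the number of primary shards.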
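The folder inspection described above can be sketched in a few lines of Python. This is only an illustration: the function name list_shard_dirs is hypothetical, and it assumes (as in the answer) that each shard lives in a folder named with a plain integer directly under the index's directory, next to non-shard entries such as _state. The exact path to an index directory varies between Elasticsearch versions, so it is left to the caller.

```python
from pathlib import Path


def list_shard_dirs(index_dir):
    """Return the sorted shard numbers found directly under one index directory.

    Assumption: shard folders are named 0, 1, 2, ... and anything else
    (for example a _state folder or stray files) is not a shard.
    """
    shard_ids = []
    for entry in Path(index_dir).iterdir():
        if entry.is_dir() and entry.name.isdigit():
            shard_ids.append(int(entry.name))
    return sorted(shard_ids)


# Hypothetical usage; replace the placeholder with a real index directory:
# print(list_shard_dirs("<path-to-data-dir>/<index-uuid>"))
```

For an index laid out like the first one in the answer, the function would report five shard numbers; for the second, a single one.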

Other answers

Not an answer, but another reference on the core concepts of Elasticsearch, which I think complements @javanna's answer very clearly.

Shards

An index can potentially store a large amount of data that can exceed the hardware limits of a single node. For example, a single index of a billion documents taking up 1TB of disk space may not fit on the disk of a single node or may be too slow to serve search requests from a single node alone. To solve this problem, Elasticsearch provides the ability to subdivide your index into multiple pieces called shards. When you create an index, you can simply define the number of shards that you want. Each shard is in itself a fully-functional and independent "index" that can be hosted on any node in the cluster. Sharding is important for two primary reasons: It allows you to horizontally split/scale your content volume. It allows you to distribute and parallelize operations across shards (potentially on multiple nodes) thus increasing performance/throughput.

Replicas

In a network/cloud environment where failures can be expected anytime, it is very useful and highly recommended to have a failover mechanism in case a shard/node somehow goes offline or disappears for whatever reason. To this end, Elasticsearch allows you to make one or more copies of your index’s shards into what are called replica shards, or replicas for short. Replication is important for two primary reasons: It provides high availability in case a shard/node fails. For this reason, it is important to note that a replica shard is never allocated on the same node as the original/primary shard that it was copied from. It allows you to scale out your search volume/throughput since searches can be executed on all replicas in parallel.

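An index is broken into shards so that it can be distributed and scaled.

Replicas are copies of the shards and provide reliability if a node is lost. The replica count often causes confusion, because a replica count of 1 means the cluster must have both the primary and one replica copy of every shard available in order to be in the green state.

For replicas to be allocated, you must have at least 2 nodes in your cluster.

You may find the definitions here easier to understand: http://www.elasticsearch.org/guide/reference/glossary/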
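The relationship between node count, replica count, and the green/yellow state can be sketched as a small model. This is a deliberate simplification, not how Elasticsearch computes health internally: the function index_health is hypothetical, it looks at a single index, and it assumes the only constraint is that two copies of the same shard never share a node.

```python
def index_health(num_nodes, num_replicas):
    """Return 'green', 'yellow' or 'red' for one index in a simplified model.

    Assumption: every copy of a shard (1 primary + num_replicas replicas)
    must sit on a different node.
    """
    if num_nodes < 1:
        # No node at all, so not even the primaries can be assigned.
        return "red"
    copies_per_shard = 1 + num_replicas
    if num_nodes >= copies_per_shard:
        # Every primary and every replica has a node to live on.
        return "green"
    # Primaries are assigned, but some replicas have nowhere to go.
    return "yellow"


print(index_health(num_nodes=1, num_replicas=1))  # single PC, one replica
print(index_health(num_nodes=2, num_replicas=1))  # second node added
```

Under this model a single node with one replica configured stays yellow, and adding a second node (or dropping the replica count to 0) turns it green.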

If you really don't like seeing the cluster status yellow, you can set the number of replicas to 0:

curl -XPUT 'localhost:9200/_settings' -H 'Content-Type: application/json' -d '
{
    "index" : {
        "number_of_replicas" : 0
    }
}
'

Note that you should only do this on a local development box.
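For anyone scripting this instead of using curl, the same request can be built with Python's standard library. This is a sketch under assumptions: the helper name build_replica_settings_request is made up for illustration, and the base URL http://localhost:9200 simply mirrors the curl command above. The request is only constructed here; sending it is left commented out because it needs a running node.

```python
import json
import urllib.request


def build_replica_settings_request(base_url, replicas):
    """Build (but do not send) a PUT /_settings request that sets the replica count."""
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_replica_settings_request("http://localhost:9200", 0)

# To apply it to a running local development node:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```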

Shards:

As a distributed search server, Elasticsearch uses a concept called a shard to distribute indexed documents across all nodes. An index can potentially store a large amount of data that can exceed the hardware limits of a single node. For example, a single index of a billion documents taking up 1TB of disk space may not fit on the disk of a single node or may be too slow to serve search requests from a single node alone. To solve this problem, Elasticsearch provides the ability to subdivide your index into multiple pieces called shards. When you create an index, you can simply define the number of shards that you want. Documents are stored in shards, and shards are allocated to nodes in your cluster. As your cluster grows or shrinks, Elasticsearch will automatically migrate shards between nodes so that the cluster remains balanced. A shard can be either a primary shard or a replica shard. Each document in your index belongs to a single primary shard, so the number of primary shards that you have determines the maximum amount of data that your index can hold. A replica shard is just a copy of a primary shard.
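How a document ends up in exactly one primary shard can be illustrated with a hash-and-modulo sketch. This is a simplified model, not Elasticsearch's actual routing code: the function pick_shard is hypothetical, and zlib.crc32 stands in for the hash function Elasticsearch really uses. The point it shows is that the chosen shard depends on the number of primary shards, which is why that number cannot simply be changed for an existing index.

```python
import zlib


def pick_shard(doc_id, num_primary_shards):
    """Map a document ID to one primary shard number (simplified model)."""
    # Stand-in hash: crc32 is used here only because it is deterministic
    # and available in the standard library.
    digest = zlib.crc32(doc_id.encode("utf-8"))
    return digest % num_primary_shards


doc_ids = [f"doc-{i}" for i in range(1000)]
with_5_shards = {d: pick_shard(d, 5) for d in doc_ids}
with_6_shards = {d: pick_shard(d, 6) for d in doc_ids}

# Count how many documents would have to live on a different shard
# if the index went from 5 to 6 primary shards.
moved = sum(1 for d in doc_ids if with_5_shards[d] != with_6_shards[d])
print(f"{moved} of {len(doc_ids)} documents would map to a different shard")
```

Because most documents would map to a different shard after such a change, the data would have to be reindexed rather than left in place.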

Replicas:

A replica shard is a copy of a primary shard, kept to prevent data loss in case of hardware failure. Elasticsearch allows you to make one or more copies of your index’s shards into what are called replica shards, or replicas for short. An index can be replicated zero (meaning no replicas) or more times. The number of shards and replicas can be defined per index at the time the index is created. After the index is created, you may change the number of replicas dynamically at any time, but you cannot change the number of shards after the fact. By default, each index in Elasticsearch is allocated 5 primary shards and 1 replica (this was the default before version 7.0; later versions default to 1 primary shard and 1 replica), which means that if you have at least two nodes in your cluster, your index will have 5 primary shards and another 5 replica shards (1 complete replica) for a total of 10 shards per index.
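The arithmetic in the last sentence can be written out as a short sketch. Both function names are hypothetical, and assigned_shards relies on the simplifying assumption that the only limit on placement is that copies of the same shard must be on different nodes.

```python
def total_shards(num_primaries, num_replicas):
    """Total shard copies an index is configured to have."""
    return num_primaries * (1 + num_replicas)


def assigned_shards(num_primaries, num_replicas, num_nodes):
    """Shard copies that can actually be placed on the available nodes.

    Assumption: each copy of a shard needs its own node, so at most
    num_nodes copies of any one shard can be assigned.
    """
    copies_per_shard = min(1 + num_replicas, num_nodes)
    return num_primaries * copies_per_shard


print(total_shards(5, 1))        # configured: 5 primaries + 5 replicas
print(assigned_shards(5, 1, 1))  # single node: only the primaries fit
print(assigned_shards(5, 1, 2))  # two nodes: all copies fit
```

This matches the situation in the question: with 5 primaries and 1 replica on a single PC, only the 5 primary shards are assigned and the 5 replicas wait for a second node.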