What do clustered and non-clustered index actually mean?

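Let me offer a textbook definition of "clustering index", taken from section 15.6.1 of Database Systems: The Complete Book:

We may also speak of clustering indexes, which are indexes on an attribute or attributes such that all the tuples with a fixed value for the search key of this index appear on roughly as few blocks as can hold them.

To understand the definition, let's look at Example 15.10 from the textbook: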

A relation R(a,b) that is sorted on attribute a and stored in that order, packed into blocks, is surely clustered. An index on a is a clustering index, since for a given a-value a1, all the tuples with that value for a are consecutive. They thus appear packed into blocks, except possibly for the first and last blocks that contain a-value a1, as suggested in Fig. 15.14. However, an index on b is unlikely to be clustering, since the tuples with a fixed b-value will be spread all over the file unless the values of a and b are very closely correlated.

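Note that the definition does not require the data blocks to be contiguous on disk; it only says that the tuples with a given search-key value are packed into as few data blocks as possible.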
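
To make the "as few blocks as possible" idea concrete, here is a small Python sketch. It is only a toy model, not anything from the textbook: the block size of 4 tuples, the sample relation, and the helper names `pack_into_blocks` and `blocks_touched` are all assumptions made for illustration.

```python
BLOCK_SIZE = 4  # tuples per block (assumed value, for illustration only)

def pack_into_blocks(tuples, block_size=BLOCK_SIZE):
    """Split a stored sequence of tuples into fixed-size blocks."""
    return [tuples[i:i + block_size] for i in range(0, len(tuples), block_size)]

def blocks_touched(blocks, column, value):
    """Count the blocks holding at least one tuple with tuple[column] == value."""
    return sum(1 for block in blocks if any(t[column] == value for t in block))

# R(a, b): 16 tuples; each a-value occurs 4 times, and b is uncorrelated with a.
relation = [(i // 4, i % 4) for i in range(16)]
stored = sorted(relation)            # stored in order of attribute a
blocks = pack_into_blocks(stored)

a_blocks = blocks_touched(blocks, 0, 1)  # blocks holding tuples with a == 1
b_blocks = blocks_touched(blocks, 1, 1)  # blocks holding tuples with b == 1
print(a_blocks, b_blocks)  # prints "1 4"
```

Four tuples fit in one block, so one block is the minimum for any fixed value. The tuples with a = 1 reach that minimum, while the tuples with b = 1 are spread over all four blocks, which is why an index on b would not be a clustering index here.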

A related concept is the clustered relation. A relation is "clustered" if its tuples are packed into roughly as few blocks as can possibly hold those tuples. In other words, from a disk block perspective, if a block contains tuples from different relations, then those relations cannot be clustered (i.e., there is a more tightly packed way to store such a relation: swap the tuples in the current block that don't belong to the relation with tuples of that relation from other disk blocks). Clearly, R(a,b) in the example above is clustered.

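To connect the two concepts: a clustered relation can have both a clustering index and nonclustering indexes. For a non-clustered relation, however, a clustering index is not possible unless the index is built on the primary key of the relation.

"Cluster" as a word is overused across all abstraction levels of the database storage side (three levels of abstraction: tuples, blocks, files). There is also a concept called "clustered file", which describes whether a file (an abstraction for a group of blocks, i.e. one or more disk blocks) contains tuples from one relation or from different relations. It is unrelated to the clustering index concept, because it is defined at the file level.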

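However, some teaching material likes to define a clustering index in terms of the clustered file definition. The two kinds of definition agree at the clustered relation level, whether they define a clustered relation in terms of disk blocks or in terms of files. From the link in this paragraph:

An index on attribute(s) A of a file is a clustering index when: all tuples with attribute value A = a are stored sequentially (= consecutively) in the data file

Storing tuples consecutively is the same as saying "tuples are packed into roughly as few blocks as can possibly hold those tuples" (the minor difference being that one talks about the file and the other about the disk). That is because storing tuples consecutively is the way to achieve "packed into roughly as few blocks as can possibly hold those tuples".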

2018-12-09 19:59:01

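Clustered

Clustered indexes sort and store the data rows in the table or view based on their key values. These are the columns included in the index definition. There can be only one clustered index per table, because the data rows themselves can be stored in only one order.

The only time the data rows in a table are stored in sorted order is when the table contains a clustered index. When a table has a clustered index, the table is called a clustered table. If a table has no clustered index, its data rows are stored in an unordered structure called a heap.

Nonclustered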

Nonclustered indexes have a structure separate from the data rows. A nonclustered index contains the nonclustered index key values and each key value entry has a pointer to the data row that contains the key value. The pointer from an index row in a nonclustered index to a data row is called a row locator. The structure of the row locator depends on whether the data pages are stored in a heap or a clustered table. For a heap, a row locator is a pointer to the row. For a clustered table, the row locator is the clustered index key.
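
The two kinds of row locator can be sketched in Python. This is a simplified model under stated assumptions, not SQL Server's actual storage format: the heap is a plain list whose positions stand in for row pointers, the clustered table is a dict keyed by the clustered index key, and all names (`heap`, `clustered`, `find_in_heap`, `find_in_clustered`) are hypothetical.

```python
# Heap: rows sit in no particular order; the row locator is the row's position.
heap = [
    {"id": 3, "name": "Cy"},
    {"id": 1, "name": "Ann"},
    {"id": 2, "name": "Bob"},
]
name_index_on_heap = {row["name"]: rid for rid, row in enumerate(heap)}

# Clustered table: rows are organized by the clustered index key (id here);
# the row locator stored in the nonclustered index is that key.
clustered = {row["id"]: row for row in sorted(heap, key=lambda r: r["id"])}
name_index_on_clustered = {row["name"]: row["id"] for row in clustered.values()}

def find_in_heap(name):
    rid = name_index_on_heap[name]        # locator = pointer to the row
    return heap[rid]

def find_in_clustered(name):
    key = name_index_on_clustered[name]   # locator = clustered index key
    return clustered[key]                 # second lookup via the clustered index

print(name_index_on_heap["Cy"], name_index_on_clustered["Cy"])  # prints "0 3"
```

Both lookups return the same row for "Cy"; what differs is the locator kept in the nonclustered index: a position (0) for the heap and the clustered key (3) for the clustered table.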

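You can add nonkey columns to the leaf level of the nonclustered index to bypass existing index key limits and execute fully covered, indexed queries. For more information, see Create Indexes with Included Columns. For details about index key limits, see Maximum Capacity Specifications for SQL Server.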
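
A rough way to picture a covered query is sketched below in Python. Everything here is an assumption for illustration: the table is a dict, `read_row` counts accesses to the base table, and `covering_index` carries a copy of the `city` column alongside the row key, loosely mimicking an included column.

```python
table = {
    1: {"name": "Ann", "city": "Oslo"},
    2: {"name": "Bob", "city": "Rome"},
}
table_reads = 0  # how many times the base table was accessed

def read_row(key):
    global table_reads
    table_reads += 1
    return table[key]

# Plain index: name -> row key. Covering index: name -> (row key, city copy).
plain_index = {row["name"]: key for key, row in table.items()}
covering_index = {row["name"]: (key, row["city"]) for key, row in table.items()}

def city_via_plain(name):
    return read_row(plain_index[name])["city"]   # index, then base table

def city_via_covering(name):
    return covering_index[name][1]               # answered from the index alone

city_a = city_via_plain("Bob")
reads_after_plain = table_reads
city_b = city_via_covering("Bob")
reads_after_covering = table_reads
print(city_a, city_b, reads_after_plain, reads_after_covering)  # Rome Rome 1 1
```

The second lookup leaves the read counter unchanged because the index already holds the requested column. The price in this model, as with real included columns, is that the index stores an extra copy of that data.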

Reference: https://learn.microsoft.com/en-us/sql/relational-databases/indexes/clustered-and-nonclustered-indexes-described

2017-08-28 00:10:59

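A clustered index means you are telling the database to store close values actually close to one another on the disk. This has the benefit of rapid scan / retrieval of records falling into some range of clustered index values.

For example, you have two tables, Customer and Order: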

Customer
----------
ID
Name
Address

Order
----------
ID
CustomerID
Price

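If you wish to quickly retrieve all orders of one particular customer, you may wish to create a clustered index on the "CustomerID" column of the Order table. This way the records with the same CustomerID will be physically stored close to each other on disk (clustered), which speeds up their retrieval.

P.S. The index on CustomerID will obviously not be unique, so you either need to add a second field to "uniquify" the index or let the database handle that for you, but that's another story.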
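
As a rough illustration in Python (a sorted in-memory list standing in for the clustered table, not how any particular engine stores rows), the orders can be kept sorted by (CustomerID, ID), with ID as the second field that makes each key unique. The sample rows, the integer customer IDs, and the helper `orders_for_customer` are assumptions.

```python
from bisect import bisect_left

# Rows are (CustomerID, ID, Price), stored sorted by (CustomerID, ID).
orders = sorted([
    (2, 101, 9.50),
    (1, 100, 20.00),
    (2, 103, 4.25),
    (3, 102, 15.00),
    (2, 104, 7.00),
])

def orders_for_customer(customer_id):
    """Return the contiguous run of rows for one customer (integer IDs assumed)."""
    lo = bisect_left(orders, (customer_id,))      # first row of this customer
    hi = bisect_left(orders, (customer_id + 1,))  # first row of the next one
    return orders[lo:hi]

print(orders_for_customer(2))
# prints [(2, 101, 9.5), (2, 103, 4.25), (2, 104, 7.0)]
```

Because all rows for a customer are adjacent, two binary searches find the boundaries and a single slice returns every order, instead of one separate lookup per order.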

Regarding multiple indexes. You can have only one clustered index per table because this defines how the data is physically arranged. If you wish an analogy, imagine a big room with many tables in it. You can either put these tables to form several rows or pull them all together to form a big conference table, but not both ways at the same time. A table can have other indexes, they will then point to the entries in the clustered index which in its turn will finally say where to find the actual data.

2009-08-09 16:01:24

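With a clustered index the rows are stored physically on the disk in the same order as the index. Therefore, there can be only one clustered index.

With a non-clustered index there is a second list that has pointers to the physical rows. You can have many non-clustered indexes, although each new index will increase the time it takes to write new records.

It is generally faster to read from a clustered index if you want to get back all the columns. You do not have to go first to the index and then to the table.

Writing to a table with a clustered index can be slower if there is a need to rearrange the data.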
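
The write cost can be illustrated with a deliberately crude Python model. Real engines typically rearrange data with page splits rather than by shifting every later row, so treat this only as a sketch of why insert position matters; `rows_moved_on_insert` is a hypothetical helper.

```python
from bisect import bisect_left

def rows_moved_on_insert(sorted_keys, new_key):
    """Insert new_key in sorted order; return how many existing rows follow it."""
    pos = bisect_left(sorted_keys, new_key)
    moved = len(sorted_keys) - pos   # rows that must make room in this toy model
    sorted_keys.insert(pos, new_key)
    return moved

keys = [10, 20, 30, 40, 50]
moved_middle = rows_moved_on_insert(keys, 25)  # lands between 20 and 30
moved_end = rows_moved_on_insert(keys, 60)     # lands after the last row
print(moved_middle, moved_end)  # prints "3 0"
```

A key that falls in the middle of the stored order forces existing rows to be rearranged, while a key larger than all existing ones is simply appended.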

2009-08-09 16:05:40

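Clustered Index - A clustered index defines the order in which data is physically stored in a table. Table data can be sorted in only one way, so there can be only one clustered index per table. In SQL Server, the primary key constraint by default automatically creates a clustered index on that particular column.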

Non-Clustered Index - A non-clustered index doesn’t sort the physical data inside the table. In fact, a non-clustered index is stored in one place and the table data is stored in another. This is similar to a textbook, where the book content is located in one place and the index is located in another. This allows for more than one non-clustered index per table.

It is important to mention here that inside the table the data will be sorted by a clustered index. However, inside the non-clustered index, data is stored in the specified order. The index contains the column values on which the index is created and the address of the record that the column value belongs to.

When a query is issued against a column on which the index is created, the database will first go to the index and look for the address of the corresponding row in the table. It will then go to that row address and fetch other column values. It is due to this additional step that non-clustered indexes are slower than clustered indexes.

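Differences between clustered and non-clustered indexes

There can be only one clustered index per table. However, you can create multiple non-clustered indexes on a single table. Clustered indexes only sort the table, so they do not consume extra storage. Non-clustered indexes are stored in a separate place from the actual table and therefore take up more storage space. Clustered indexes are faster than non-clustered indexes since they don't involve any extra lookup step.

For more information, refer to this article.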

2020-05-20 10:03:48
