在准备70-433考试时,我注意到可以用以下两种方法之一创建覆盖索引。
CREATE INDEX idx1 ON MyTable (Col1, Col2, Col3)
——或——
CREATE INDEX idx1 ON MyTable (Col1) INCLUDE (Col2, Col3)
INCLUDE条款对我来说很陌生。你为什么要使用它,在决定是否使用INCLUDE子句创建覆盖索引时,你有什么指导方针?
在准备70-433考试时,我注意到可以用以下两种方法之一创建覆盖索引。
CREATE INDEX idx1 ON MyTable (Col1, Col2, Col3)
——或——
CREATE INDEX idx1 ON MyTable (Col1) INCLUDE (Col2, Col3)
INCLUDE条款对我来说很陌生。你为什么要使用它,在决定是否使用INCLUDE子句创建覆盖索引时,你有什么指导方针?
当前回答
There is a limit to the total size of all columns inlined into the index definition. That said though, I have never had to create index that wide. To me, the bigger advantage is the fact that you can cover more queries with one index that has included columns as they don't have to be defined in any particular order. Think about is as an index within the index. One example would be the StoreID (where StoreID is low selectivity meaning that each store is associated with a lot of customers) and then customer demographics data (LastName, FirstName, DOB): If you just inline those columns in this order (StoreID, LastName, FirstName, DOB), you can only efficiently search for customers for which you know StoreID and LastName.
另一方面,在StoreID上定义索引并包括LastName、FirstName、DOB列将允许您在本质上进行两次查找——在StoreID上进行索引谓词,然后在任何包含的列上进行查找谓词。这将让你覆盖所有可能的搜索排列,只要它开始与StoreID。
其他回答
您可以使用INCLUDE将一个或多个列添加到非聚集索引的叶级,如果这样做可以“覆盖”您的查询。
假设您需要查询员工的ID、部门ID和姓。
SELECT EmployeeID, DepartmentID, LastName
FROM Employee
WHERE DepartmentID = 5
如果碰巧在(EmployeeID, DepartmentID)上有一个非聚集索引,一旦找到给定部门的员工,现在必须执行“书签查找”以获得实际的完整员工记录,只是为了获得姓氏列。如果你有很多员工的话,这在绩效方面是非常昂贵的。
如果你在你的索引中包含了这个姓氏:
CREATE NONCLUSTERED INDEX NC_EmpDep
ON Employee(DepartmentID)
INCLUDE (Lastname, EmployeeID)
然后,您需要的所有信息都可以在非聚集索引的叶级中获得。只需在非聚类索引中查找给定部门的员工,就可以获得所有必要的信息,并且不再需要在索引中查找每个员工的书签—>可以节省大量时间。
显然,您不能在每个非聚集索引中包含每一列——但是如果您确实有一些查询只需要“覆盖”一到两列(并且经常使用),那么将这些查询包含到合适的非聚集索引中会非常有帮助。
The reasons why (including the data in the leaf level of the index) have been nicely explained. The reason that you give two shakes about this, is that when you run your query, if you don't have the additional columns included (new feature in SQL 2005) the SQL Server has to go to the clustered index to get the additional columns which takes more time, and adds more load to the SQL Server service, the disks, and the memory (buffer cache to be specific) as new data pages are loaded into memory, potentially pushing other more often needed data out of the buffer cache.
如果您不需要键中的列,那么选择INCLUDE而不是键列的一个原因是文档。这使得在未来发展索引更加容易。
考虑到你的例子:
CREATE INDEX idx1 ON MyTable (Col1) INCLUDE (Col2, Col3)
如果你的查询是这样的,那么这个索引是最好的:
SELECT col2, col3
FROM MyTable
WHERE col1 = ...
当然,如果将列放在关键部分可以获得额外的好处,则不应该将列放在INCLUDE中。下面两个查询实际上都倾向于索引键中的col2列。
SELECT col2, col3
FROM MyTable
WHERE col1 = ...
AND col2 = ...
SELECT TOP 1 col2, col3
FROM MyTable
WHERE col1 = ...
ORDER BY col2
让我们假设情况并非如此,我们在INCLUDE子句中有col2,因为将它放在索引的树部分中没有任何好处。
几年过去了。
你需要调优这个查询:
SELECT TOP 1 col2
FROM MyTable
WHERE col1 = ...
ORDER BY another_col
要优化该查询,可以使用以下索引:
CREATE INDEX idx1 ON MyTable (Col1, another_col) INCLUDE (Col2)
如果你检查你在这个表上已经有哪些索引,你之前的索引可能还在那里:
CREATE INDEX idx1 ON MyTable (Col1) INCLUDE (Col2, Col3)
现在您知道Col2和Col3不是索引树的一部分,因此不能用于缩小读索引范围,也不能用于对行进行排序。Is在索引的键部分的末尾(在col1之后)添加another_column是相当安全的。打破任何东西的风险很小:
DROP INDEX idx1 ON MyTable;
CREATE INDEX idx1 ON MyTable (Col1, another_col) INCLUDE (Col2, Col3);
该指数将变得更大,这仍然有一些风险,但与引入新指数相比,扩展现有指数通常更好。
如果你有一个没有INCLUDE的索引,你不知道在Col1之后添加another_col会中断什么查询。
CREATE INDEX idx1 ON MyTable (Col1, Col2, Col3)
如果在Col1和Col2之间添加another_col会发生什么?其他查询是否会受到影响?
如果添加这些列只是为了避免从表中获取它们,那么INCLUDE与键列相比还有其他“好处”。然而,我认为文档方面是最重要的。
回答你的问题:
在决定是否使用INCLUDE子句创建覆盖索引时,您有什么建议?
如果向索引中添加一列的唯一目的是在不访问表的情况下使该列在索引中可用,请将其放入INCLUDE子句中。
如果将列添加到索引键可以带来额外的好处(例如,通过或因为它可以缩小读索引范围),则将其添加到键中。
你可以在这里阅读更长的讨论:
https://use-the-index-luke.com/blog/2019-04/include-columns-in-btree-indexes
There is a limit to the total size of all columns inlined into the index definition. That said though, I have never had to create index that wide. To me, the bigger advantage is the fact that you can cover more queries with one index that has included columns as they don't have to be defined in any particular order. Think about is as an index within the index. One example would be the StoreID (where StoreID is low selectivity meaning that each store is associated with a lot of customers) and then customer demographics data (LastName, FirstName, DOB): If you just inline those columns in this order (StoreID, LastName, FirstName, DOB), you can only efficiently search for customers for which you know StoreID and LastName.
另一方面,在StoreID上定义索引并包括LastName、FirstName、DOB列将允许您在本质上进行两次查找——在StoreID上进行索引谓词,然后在任何包含的列上进行查找谓词。这将让你覆盖所有可能的搜索排列,只要它开始与StoreID。
基本索引列被排序,但包含的列不被排序。这节省了维护索引的资源,同时仍然可以在所包含的列中提供数据来覆盖查询。因此,如果您想涵盖查询,您可以将搜索条件放在索引的已排序列中定位行,但随后“包括”其他具有非搜索数据的未排序列。它确实有助于减少索引维护中的排序和碎片量。