Counting DISTINCT over multiple columns

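If you are working with fixed-length data types, you can cast them to binary to do this very easily and very quickly. Assuming DocumentId and DocumentSessionId are both ints, and are therefore 4 bytes long...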

SELECT COUNT(DISTINCT CAST(DocumentId as binary(4)) + CAST(DocumentSessionId as binary(4)))
FROM DocumentOutputItems

My specific problem required me to divide a SUM by the COUNT of the distinct combination of various foreign keys and a date field, grouping by another foreign key and occasionally filtering by certain values or keys. The table is very large, and using a sub-query dramatically increased the query time. Due to the complexity, statistics simply weren't a viable option. The CHECKSUM solution was also far too slow in its conversion, particularly because of the various data types involved, and I couldn't risk its unreliability.

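However, using the solution above added virtually nothing to the query time (compared with simply using the SUM), and it should be completely reliable! It should be able to help others in a similar situation, so I'm posting it here.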
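Why fixed-width binary keys are safe to concatenate can be illustrated outside the database. The following is only a minimal Python sketch, not the SQL Server implementation: the sample rows and the helper names `fixed_width_key` and `naive_key` are hypothetical, and `struct.pack(">i", ...)` is used as a stand-in for `CAST(... AS binary(4))`.

```python
import struct

# Hypothetical sample rows: (DocumentId, DocumentSessionId)
rows = [(1, 23), (12, 3), (1, 23), (-1, 0)]


def fixed_width_key(doc_id: int, session_id: int) -> bytes:
    """Pack each value into exactly 4 bytes, then concatenate.

    Because every field always occupies 4 bytes, the boundary between
    the two values is unambiguous and distinct pairs give distinct keys.
    """
    return struct.pack(">i", doc_id) + struct.pack(">i", session_id)


def naive_key(doc_id: int, session_id: int) -> str:
    """Concatenate the decimal text of both values with no separator."""
    return f"{doc_id}{session_id}"


true_count = len(set(rows))
fixed_count = len({fixed_width_key(d, s) for d, s in rows})
naive_count = len({naive_key(d, s) for d, s in rows})

print(true_count, fixed_count, naive_count)
```

In this sketch the fixed-width keys agree with the true number of distinct pairs, while the variable-length text keys merge (1, 23) and (12, 3) into the same string.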

2019-09-18 00:27:22

To run this as a single query, concatenate the columns, then count the distinct values of the concatenated string.

SELECT count(DISTINCT concat(DocumentId, DocumentSessionId)) FROM DocumentOutputItems;
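One caveat with plain concatenation is that different pairs can produce the same string. The sketch below tries this in an in-memory SQLite database, which is an assumption made purely for illustration: SQLite's `||` operator stands in for `concat()`, the sample rows are hypothetical, and the `'|'` separator is one possible mitigation rather than part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DocumentOutputItems (DocumentId INTEGER, DocumentSessionId INTEGER)"
)
# Hypothetical data: two distinct pairs, one of them repeated
conn.executemany(
    "INSERT INTO DocumentOutputItems VALUES (?, ?)",
    [(1, 23), (12, 3), (1, 23)],
)


def scalar(sql: str) -> int:
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


# '1' || '23' and '12' || '3' both yield '123', so the pairs collapse
plain = scalar(
    "SELECT COUNT(DISTINCT DocumentId || DocumentSessionId) FROM DocumentOutputItems"
)

# A separator that cannot occur inside the values keeps the pairs apart
separated = scalar(
    "SELECT COUNT(DISTINCT DocumentId || '|' || DocumentSessionId) "
    "FROM DocumentOutputItems"
)

print(plain, separated)
```

With integer columns any non-digit separator avoids the collision; with free-text columns the separator would need to be a string that cannot appear in the data.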

In MySQL you can do the same thing without the concatenation step, as follows:

SELECT count(DISTINCT DocumentId, DocumentSessionId) FROM DocumentOutputItems;

This feature is mentioned in the MySQL documentation:

http://dev.mysql.com/doc/refman/5.7/en/group-by-functions.html#function_count-distinct

2016-07-28 20:21:27

If you had only one field to "DISTINCT", you could use:

SELECT COUNT(DISTINCT DocumentId) 
FROM DocumentOutputItems

and that does return the same query plan as the original, as tested with SET SHOWPLAN_ALL ON. However, you are using two fields, so you could try something crazy like:

    SELECT COUNT(DISTINCT convert(varchar(15),DocumentId)+'|~|'+convert(varchar(15), DocumentSessionId)) 
    FROM DocumentOutputItems

but you'll have issues if NULLs are involved. I'd just stick with the original query.
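The NULL problem can be reproduced in a small experiment. This is a sketch under stated assumptions: it uses an in-memory SQLite database rather than SQL Server, `||` in place of `convert(...) + ...`, hypothetical sample rows, and it assumes the "original query" was a `SELECT DISTINCT` subquery wrapped in `COUNT(*)`, since the question text is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DocumentOutputItems (DocumentId INTEGER, DocumentSessionId INTEGER)"
)
# Hypothetical data: three distinct pairs, two of them with a NULL session id
conn.executemany(
    "INSERT INTO DocumentOutputItems VALUES (?, ?)",
    [(1, None), (2, None), (3, 4)],
)

# Concatenating with NULL yields NULL, and COUNT(DISTINCT ...) skips NULLs,
# so both rows with a NULL session id vanish from the count.
concat_count = conn.execute(
    "SELECT COUNT(DISTINCT DocumentId || '|~|' || DocumentSessionId) "
    "FROM DocumentOutputItems"
).fetchone()[0]

# Assumed form of the subquery approach: DISTINCT over both columns keeps
# rows containing NULL, and COUNT(*) then counts every remaining row.
subquery_count = conn.execute(
    "SELECT COUNT(*) FROM ("
    "SELECT DISTINCT DocumentId, DocumentSessionId FROM DocumentOutputItems"
    ") AS pairs"
).fetchone()[0]

print(concat_count, subquery_count)
```

If the concatenation form has to be kept, wrapping each column in `COALESCE` with a sentinel value is a common workaround, at the cost of treating NULL and the sentinel as equal.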

2009-09-24 13:34:03

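This code applies DISTINCT to two columns and returns, for each distinct pair of values, the number of rows that have that pair. It worked like a charm for me in MySQL.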

select DISTINCT DocumentId as i,  DocumentSessionId as s , count(*) 
from DocumentOutputItems   
group by i ,s;
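Note that the grouped query yields one row per pair rather than a single total. The sketch below, run against an in-memory SQLite database with hypothetical sample rows (both assumptions for illustration), shows the per-pair result and one way to reduce it to the number of distinct pairs. The `DISTINCT` keyword is omitted because `GROUP BY` already makes each pair unique, and the alias `n` is introduced here only for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DocumentOutputItems (DocumentId INTEGER, DocumentSessionId INTEGER)"
)
# Hypothetical data: pair (1, 23) appears twice, pair (12, 3) once
conn.executemany(
    "INSERT INTO DocumentOutputItems VALUES (?, ?)",
    [(1, 23), (12, 3), (1, 23)],
)

grouped_sql = (
    "SELECT DocumentId AS i, DocumentSessionId AS s, COUNT(*) AS n "
    "FROM DocumentOutputItems GROUP BY i, s"
)

# One row per distinct pair, with the number of source rows for that pair
per_pair = conn.execute(grouped_sql + " ORDER BY i, s").fetchall()

# Counting the grouped rows gives the number of distinct pairs
total_pairs = conn.execute(
    f"SELECT COUNT(*) FROM ({grouped_sql}) AS pairs"
).fetchone()[0]

print(per_pair, total_pairs)
```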

2018-11-16 07:17:32

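Edit: changed from the less-than-reliable checksum-only query. I've found a way to do this (in SQL Server 2005) that works pretty well for me, and I can use as many columns as I need (by adding them to the CHECKSUM() function). The REVERSE() function turns the ints into varchars, which makes the distinct more reliable.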

SELECT COUNT(DISTINCT (CHECKSUM(DocumentId,DocumentSessionId)) + CHECKSUM(REVERSE(DocumentId),REVERSE(DocumentSessionId)) )
FROM DocumentOutputItems

2012-07-06 23:01:04
