Is there a better way of doing a query like this:
SELECT COUNT(*)
FROM (SELECT DISTINCT DocumentId, DocumentSessionId
FROM DocumentOutputItems) AS internalQuery
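I need to count the number of distinct items in this table, but the distinct is over two columns.
My query works fine, but I was wondering whether I can get the final result using just one query (without a sub-query).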
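Current answer
If you are working with fixed-length data types, you can cast them to binary to do this very easily and very quickly. Assuming DocumentId and DocumentSessionId are both ints, and are therefore 4 bytes long...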
SELECT COUNT(DISTINCT CAST(DocumentId as binary(4)) + CAST(DocumentSessionId as binary(4)))
FROM DocumentOutputItems
My specific problem required me to divide a SUM by the COUNT of the distinct combination of various foreign keys and a date field, grouping by another foreign key and occasionally filtering by certain values or keys. The table is very large, and using a sub-query dramatically increased the query time. Due to the complexity, statistics simply weren't a viable option. The CHECKSUM solution was also far too slow in its conversion, particularly as a result of the various data types, and I couldn't risk its unreliability.
However, using the above solution added hardly anything to the query time (compared with simply using the SUM), and it should be completely reliable! It should be able to help others in a similar situation, so I'm posting it here.
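To illustrate why the fixed-width cast is safe where naive string concatenation is not, here is a minimal Python sketch (not T-SQL; the sample rows are made up). It packs each pair of 32-bit integers into exactly 8 bytes, which mirrors the idea behind concatenating two binary(4) values, and compares the distinct counts.

```python
import struct

# Hypothetical sample rows: (DocumentId, DocumentSessionId)
rows = [(1, 23), (12, 3), (1, 23), (-1, 0), (0, -1)]


def fixed_width_key(doc_id: int, session_id: int) -> bytes:
    """Pack two signed 32-bit ints into 8 big-endian bytes (4 bytes each)."""
    return struct.pack(">ii", doc_id, session_id)


# Reference answer: distinct (DocumentId, DocumentSessionId) pairs.
distinct_pairs = len(set(rows))

# Fixed-width binary keys: every pair maps to a unique 8-byte value.
distinct_binary = len({fixed_width_key(d, s) for d, s in rows})

# Variable-width text keys: (1, 23) and (12, 3) both become "123".
distinct_text = len({f"{d}{s}" for d, s in rows})

print(distinct_pairs, distinct_binary, distinct_text)
```

Because each column always occupies the same number of bytes, the boundary between the two values is never ambiguous, so no separator is needed.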
Other answers
I have used this approach and it has worked for me.
SELECT COUNT(DISTINCT DocumentID || DocumentSessionId)
FROM DocumentOutputItems
For my case, it gives the correct result.
This works for me. In Oracle:
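Whether plain concatenation gives the right count depends on the data. As a hedged illustration, the sketch below uses Python's sqlite3 module (SQLite also uses || for concatenation) with made-up integer rows; the table contents are assumptions, not the asker's data. The pairs (1, 23) and (12, 3) both concatenate to '123', so the count comes out too low.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DocumentOutputItems (DocumentId INTEGER, DocumentSessionId INTEGER)"
)
# Hypothetical rows: two distinct pairs, one of them repeated.
conn.executemany(
    "INSERT INTO DocumentOutputItems VALUES (?, ?)",
    [(1, 23), (12, 3), (1, 23)],
)

# Concatenation without a separator: every row becomes the text '123'.
concat_count = conn.execute(
    "SELECT COUNT(DISTINCT DocumentId || DocumentSessionId) FROM DocumentOutputItems"
).fetchone()[0]

# The original sub-query approach, used here as the reference answer.
subquery_count = conn.execute(
    "SELECT COUNT(*) FROM "
    "(SELECT DISTINCT DocumentId, DocumentSessionId FROM DocumentOutputItems)"
).fetchone()[0]

print(concat_count, subquery_count)
```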
SELECT SUM(DECODE(COUNT(*),1,1,1))
FROM DocumentOutputItems GROUP BY DocumentId, DocumentSessionId;
In JPQL:
SELECT SUM(CASE WHEN COUNT(i)=1 THEN 1 ELSE 1 END)
FROM DocumentOutputItems i GROUP BY i.DocumentId, i.DocumentSessionId;
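Nested aggregates such as SUM(... COUNT(*) ...) are not accepted by every database. As a sketch of the same "one row per group, then count the groups" idea, the example below swaps in a different technique, a window function applied after GROUP BY, and runs it in SQLite through Python's sqlite3 module. It assumes SQLite 3.25 or later (window function support), and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DocumentOutputItems (DocumentId INTEGER, DocumentSessionId INTEGER)"
)
# Hypothetical rows: three distinct (DocumentId, DocumentSessionId) pairs.
conn.executemany(
    "INSERT INTO DocumentOutputItems VALUES (?, ?)",
    [(1, 10), (1, 10), (1, 11), (2, 10), (2, 10)],
)

# GROUP BY yields one row per distinct pair; COUNT(*) OVER () then counts
# those grouped rows. Every output row carries the same total, so LIMIT 1
# keeps a single copy.
row = conn.execute(
    """
    SELECT COUNT(*) OVER ()
    FROM DocumentOutputItems
    GROUP BY DocumentId, DocumentSessionId
    LIMIT 1
    """
).fetchone()

# An empty table produces no groups and therefore no row at all.
group_count = row[0] if row else 0

print(group_count)
```

Note the empty-table case: the query returns no row rather than 0, so the caller has to handle it.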
Here's a shorter version without the subselect:
SELECT COUNT(DISTINCT DocumentId, DocumentSessionId) FROM DocumentOutputItems
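It works fine in MySQL, and I think the optimizer has an easier time understanding this one.
Edit: Apparently I misread MSSQL as MySQL. Sorry about that, but maybe it helps anyway.
What is it about your existing query that you don't like? If you are concerned that DISTINCT across two columns does not return just the unique permutations, why not try it?
It certainly works as you might expect in Oracle.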
SQL> select distinct deptno, job from emp
  2  order by deptno, job
  3  /

    DEPTNO JOB
---------- ---------
        10 CLERK
        10 MANAGER
        10 PRESIDENT
        20 ANALYST
        20 CLERK
        20 MANAGER
        30 CLERK
        30 MANAGER
        30 SALESMAN

9 rows selected.

SQL> select count(*) from (
  2  select distinct deptno, job from emp
  3  )
  4  /

  COUNT(*)
----------
         9

SQL>
Edit
I went down a dead end with analytic functions, but the answer was obvious...
SQL> select count(distinct concat(deptno,job)) from emp
  2  /

COUNT(DISTINCTCONCAT(DEPTNO,JOB))
---------------------------------
                                9

SQL>
Edit 2
Given the following data, the concatenation solution above will miscount:
col1  col2
----  ----
A     AA
AA    A
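So we need to include a separator (shown here with SQL Server's + operator; in Oracle the string concatenation operator is ||)...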
select col1 + '*' + col2 from t23
/
Obviously, the chosen separator must be a character, or set of characters, that can never appear in either column.
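The effect of the separator, and of choosing one that occurs in the data, can be checked with a small sketch. It uses Python's sqlite3 module and SQLite's || operator rather than Oracle or SQL Server. The t23 rows are the ones from the example above, plus two made-up rows that contain the separator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t23 (col1 TEXT, col2 TEXT)")
conn.executemany("INSERT INTO t23 VALUES (?, ?)", [("A", "AA"), ("AA", "A")])


def scalar(sql: str) -> int:
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


# Without a separator both rows concatenate to 'AAA'.
plain = scalar("SELECT COUNT(DISTINCT col1 || col2) FROM t23")

# With a separator the rows become 'A*AA' and 'AA*A'.
delimited = scalar("SELECT COUNT(DISTINCT col1 || '*' || col2) FROM t23")

# Hypothetical rows whose values contain the separator itself:
# ('A*', 'B') and ('A', '*B') both concatenate to 'A**B'.
conn.executemany("INSERT INTO t23 VALUES (?, ?)", [("A*", "B"), ("A", "*B")])
delimited_after = scalar("SELECT COUNT(DISTINCT col1 || '*' || col2) FROM t23")
true_after = scalar("SELECT COUNT(*) FROM (SELECT DISTINCT col1, col2 FROM t23)")

print(plain, delimited, delimited_after, true_after)
```

The last two numbers differ because the separator appears inside the data, which is exactly the case the chosen separator has to rule out.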