我是一个老派的MySQL用户,总是更喜欢JOIN而不是子查询。但是现在每个人都用子查询,我讨厌它;我不知道为什么。

我缺乏理论知识来判断是否有任何不同。子查询是否与JOIN一样好,因此没有什么可担心的?


当前回答

MySQL版本:5.5.28-0ubuntu0.12.04.2-log

在我的印象中,JOIN总是比MySQL中的子查询更好,但EXPLAIN是更好的判断方式。下面是一个子查询比join更好的例子。

这是我的查询与3个子查询:

EXPLAIN SELECT vrl.list_id,vrl.ontology_id,vrl.position,l.name AS list_name, vrlih.position AS previous_position, vrl.moved_date 
FROM `vote-ranked-listory` vrl 
INNER JOIN lists l ON l.list_id = vrl.list_id 
INNER JOIN `vote-ranked-list-item-history` vrlih ON vrl.list_id = vrlih.list_id AND vrl.ontology_id=vrlih.ontology_id AND vrlih.type='PREVIOUS_POSITION' 
INNER JOIN list_burial_state lbs ON lbs.list_id = vrl.list_id AND lbs.burial_score < 0.5 
WHERE vrl.position <= 15 AND l.status='ACTIVE' AND l.is_public=1 AND vrl.ontology_id < 1000000000 
 AND (SELECT list_id FROM list_tag WHERE list_id=l.list_id AND tag_id=43) IS NULL 
 AND (SELECT list_id FROM list_tag WHERE list_id=l.list_id AND tag_id=55) IS NULL 
 AND (SELECT list_id FROM list_tag WHERE list_id=l.list_id AND tag_id=246403) IS NOT NULL 
ORDER BY vrl.moved_date DESC LIMIT 200;

解释说明:

+----+--------------------+----------+--------+-----------------------------------------------------+--------------+---------+-------------------------------------------------+------+--------------------------+
| id | select_type        | table    | type   | possible_keys                                       | key          | key_len | ref                                             | rows | Extra                    |
+----+--------------------+----------+--------+-----------------------------------------------------+--------------+---------+-------------------------------------------------+------+--------------------------+
|  1 | PRIMARY            | vrl      | index  | PRIMARY                                             | moved_date   | 8       | NULL                                            |  200 | Using where              |
|  1 | PRIMARY            | l        | eq_ref | PRIMARY,status,ispublic,idx_lookup,is_public_status | PRIMARY      | 4       | ranker.vrl.list_id                              |    1 | Using where              |
|  1 | PRIMARY            | vrlih    | eq_ref | PRIMARY                                             | PRIMARY      | 9       | ranker.vrl.list_id,ranker.vrl.ontology_id,const |    1 | Using where              |
|  1 | PRIMARY            | lbs      | eq_ref | PRIMARY,idx_list_burial_state,burial_score          | PRIMARY      | 4       | ranker.vrl.list_id                              |    1 | Using where              |
|  4 | DEPENDENT SUBQUERY | list_tag | ref    | list_tag_key,list_id,tag_id                         | list_tag_key | 9       | ranker.l.list_id,const                          |    1 | Using where; Using index |
|  3 | DEPENDENT SUBQUERY | list_tag | ref    | list_tag_key,list_id,tag_id                         | list_tag_key | 9       | ranker.l.list_id,const                          |    1 | Using where; Using index |
|  2 | DEPENDENT SUBQUERY | list_tag | ref    | list_tag_key,list_id,tag_id                         | list_tag_key | 9       | ranker.l.list_id,const                          |    1 | Using where; Using index |
+----+--------------------+----------+--------+-----------------------------------------------------+--------------+---------+-------------------------------------------------+------+--------------------------+

使用join的相同查询是:

EXPLAIN SELECT vrl.list_id,vrl.ontology_id,vrl.position,l.name AS list_name, vrlih.position AS previous_position, vrl.moved_date 
FROM `vote-ranked-listory` vrl 
INNER JOIN lists l ON l.list_id = vrl.list_id 
INNER JOIN `vote-ranked-list-item-history` vrlih ON vrl.list_id = vrlih.list_id AND vrl.ontology_id=vrlih.ontology_id AND vrlih.type='PREVIOUS_POSITION' 
INNER JOIN list_burial_state lbs ON lbs.list_id = vrl.list_id AND lbs.burial_score < 0.5 
LEFT JOIN list_tag lt1 ON lt1.list_id = vrl.list_id AND lt1.tag_id = 43 
LEFT JOIN list_tag lt2 ON lt2.list_id = vrl.list_id AND lt2.tag_id = 55 
INNER JOIN list_tag lt3 ON lt3.list_id = vrl.list_id AND lt3.tag_id = 246403 
WHERE vrl.position <= 15 AND l.status='ACTIVE' AND l.is_public=1 AND vrl.ontology_id < 1000000000 
AND lt1.list_id IS NULL AND lt2.tag_id IS NULL 
ORDER BY vrl.moved_date DESC LIMIT 200;

输出为:

+----+-------------+-------+--------+-----------------------------------------------------+--------------+---------+---------------------------------------------+------+----------------------------------------------+
| id | select_type | table | type   | possible_keys                                       | key          | key_len | ref                                         | rows | Extra                                        |
+----+-------------+-------+--------+-----------------------------------------------------+--------------+---------+---------------------------------------------+------+----------------------------------------------+
|  1 | SIMPLE      | lt3   | ref    | list_tag_key,list_id,tag_id                         | tag_id       | 5       | const                                       | 2386 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | l     | eq_ref | PRIMARY,status,ispublic,idx_lookup,is_public_status | PRIMARY      | 4       | ranker.lt3.list_id                          |    1 | Using where                                  |
|  1 | SIMPLE      | vrlih | ref    | PRIMARY                                             | PRIMARY      | 4       | ranker.lt3.list_id                          |  103 | Using where                                  |
|  1 | SIMPLE      | vrl   | ref    | PRIMARY                                             | PRIMARY      | 8       | ranker.lt3.list_id,ranker.vrlih.ontology_id |   65 | Using where                                  |
|  1 | SIMPLE      | lt1   | ref    | list_tag_key,list_id,tag_id                         | list_tag_key | 9       | ranker.lt3.list_id,const                    |    1 | Using where; Using index; Not exists         |
|  1 | SIMPLE      | lbs   | eq_ref | PRIMARY,idx_list_burial_state,burial_score          | PRIMARY      | 4       | ranker.vrl.list_id                          |    1 | Using where                                  |
|  1 | SIMPLE      | lt2   | ref    | list_tag_key,list_id,tag_id                         | list_tag_key | 9       | ranker.lt3.list_id,const                    |    1 | Using where; Using index                     |
+----+-------------+-------+--------+-----------------------------------------------------+--------------+---------+---------------------------------------------+------+----------------------------------------------+

rows列的比较表明了差异,使用join的查询使用的是using temporary;使用filesort。

当然,当我运行这两个查询时,第一个查询在0.02秒内完成,第二个查询甚至在1分钟后都没有完成,所以EXPLAIN正确地解释了这些查询。

如果我在list_tag表上没有INNER JOIN,即如果我删除

AND (SELECT list_id FROM list_tag WHERE list_id=l.list_id AND tag_id=246403) IS NOT NULL  

从第一个查询和相应的:

INNER JOIN list_tag lt3 ON lt3.list_id = vrl.list_id AND lt3.tag_id = 246403

从第二个查询开始,那么EXPLAIN为两个查询返回相同的行数,并且这两个查询的运行速度相同。

其他回答

使用EXPLAIN查看数据库如何对数据执行查询。这个答案中有一个很大的“视情况而定”……

PostgreSQL可以将子查询重写为连接,或将连接重写为子查询,如果它认为其中一个比另一个快。这完全取决于数据、索引、相关性、数据量、查询等。

在大多数情况下,join比子查询快,子查询比子查询快的情况非常罕见。

在join中,RDBMS可以为您的查询创建一个更好的执行计划,并可以预测应该加载哪些数据来处理并节省时间,不像子查询,它将运行所有的查询并加载所有的数据来进行处理。

子查询的好处是它们比join更可读:这就是为什么大多数新学习SQL的人更喜欢它们;这是简单的方法;但是当涉及到性能时,join在大多数情况下更好,尽管它们也不难读。

MySQL版本:5.5.28-0ubuntu0.12.04.2-log

在我的印象中,JOIN总是比MySQL中的子查询更好,但EXPLAIN是更好的判断方式。下面是一个子查询比join更好的例子。

这是我的查询与3个子查询:

EXPLAIN SELECT vrl.list_id,vrl.ontology_id,vrl.position,l.name AS list_name, vrlih.position AS previous_position, vrl.moved_date 
FROM `vote-ranked-listory` vrl 
INNER JOIN lists l ON l.list_id = vrl.list_id 
INNER JOIN `vote-ranked-list-item-history` vrlih ON vrl.list_id = vrlih.list_id AND vrl.ontology_id=vrlih.ontology_id AND vrlih.type='PREVIOUS_POSITION' 
INNER JOIN list_burial_state lbs ON lbs.list_id = vrl.list_id AND lbs.burial_score < 0.5 
WHERE vrl.position <= 15 AND l.status='ACTIVE' AND l.is_public=1 AND vrl.ontology_id < 1000000000 
 AND (SELECT list_id FROM list_tag WHERE list_id=l.list_id AND tag_id=43) IS NULL 
 AND (SELECT list_id FROM list_tag WHERE list_id=l.list_id AND tag_id=55) IS NULL 
 AND (SELECT list_id FROM list_tag WHERE list_id=l.list_id AND tag_id=246403) IS NOT NULL 
ORDER BY vrl.moved_date DESC LIMIT 200;

解释说明:

+----+--------------------+----------+--------+-----------------------------------------------------+--------------+---------+-------------------------------------------------+------+--------------------------+
| id | select_type        | table    | type   | possible_keys                                       | key          | key_len | ref                                             | rows | Extra                    |
+----+--------------------+----------+--------+-----------------------------------------------------+--------------+---------+-------------------------------------------------+------+--------------------------+
|  1 | PRIMARY            | vrl      | index  | PRIMARY                                             | moved_date   | 8       | NULL                                            |  200 | Using where              |
|  1 | PRIMARY            | l        | eq_ref | PRIMARY,status,ispublic,idx_lookup,is_public_status | PRIMARY      | 4       | ranker.vrl.list_id                              |    1 | Using where              |
|  1 | PRIMARY            | vrlih    | eq_ref | PRIMARY                                             | PRIMARY      | 9       | ranker.vrl.list_id,ranker.vrl.ontology_id,const |    1 | Using where              |
|  1 | PRIMARY            | lbs      | eq_ref | PRIMARY,idx_list_burial_state,burial_score          | PRIMARY      | 4       | ranker.vrl.list_id                              |    1 | Using where              |
|  4 | DEPENDENT SUBQUERY | list_tag | ref    | list_tag_key,list_id,tag_id                         | list_tag_key | 9       | ranker.l.list_id,const                          |    1 | Using where; Using index |
|  3 | DEPENDENT SUBQUERY | list_tag | ref    | list_tag_key,list_id,tag_id                         | list_tag_key | 9       | ranker.l.list_id,const                          |    1 | Using where; Using index |
|  2 | DEPENDENT SUBQUERY | list_tag | ref    | list_tag_key,list_id,tag_id                         | list_tag_key | 9       | ranker.l.list_id,const                          |    1 | Using where; Using index |
+----+--------------------+----------+--------+-----------------------------------------------------+--------------+---------+-------------------------------------------------+------+--------------------------+

使用join的相同查询是:

EXPLAIN SELECT vrl.list_id,vrl.ontology_id,vrl.position,l.name AS list_name, vrlih.position AS previous_position, vrl.moved_date 
FROM `vote-ranked-listory` vrl 
INNER JOIN lists l ON l.list_id = vrl.list_id 
INNER JOIN `vote-ranked-list-item-history` vrlih ON vrl.list_id = vrlih.list_id AND vrl.ontology_id=vrlih.ontology_id AND vrlih.type='PREVIOUS_POSITION' 
INNER JOIN list_burial_state lbs ON lbs.list_id = vrl.list_id AND lbs.burial_score < 0.5 
LEFT JOIN list_tag lt1 ON lt1.list_id = vrl.list_id AND lt1.tag_id = 43 
LEFT JOIN list_tag lt2 ON lt2.list_id = vrl.list_id AND lt2.tag_id = 55 
INNER JOIN list_tag lt3 ON lt3.list_id = vrl.list_id AND lt3.tag_id = 246403 
WHERE vrl.position <= 15 AND l.status='ACTIVE' AND l.is_public=1 AND vrl.ontology_id < 1000000000 
AND lt1.list_id IS NULL AND lt2.tag_id IS NULL 
ORDER BY vrl.moved_date DESC LIMIT 200;

输出为:

+----+-------------+-------+--------+-----------------------------------------------------+--------------+---------+---------------------------------------------+------+----------------------------------------------+
| id | select_type | table | type   | possible_keys                                       | key          | key_len | ref                                         | rows | Extra                                        |
+----+-------------+-------+--------+-----------------------------------------------------+--------------+---------+---------------------------------------------+------+----------------------------------------------+
|  1 | SIMPLE      | lt3   | ref    | list_tag_key,list_id,tag_id                         | tag_id       | 5       | const                                       | 2386 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | l     | eq_ref | PRIMARY,status,ispublic,idx_lookup,is_public_status | PRIMARY      | 4       | ranker.lt3.list_id                          |    1 | Using where                                  |
|  1 | SIMPLE      | vrlih | ref    | PRIMARY                                             | PRIMARY      | 4       | ranker.lt3.list_id                          |  103 | Using where                                  |
|  1 | SIMPLE      | vrl   | ref    | PRIMARY                                             | PRIMARY      | 8       | ranker.lt3.list_id,ranker.vrlih.ontology_id |   65 | Using where                                  |
|  1 | SIMPLE      | lt1   | ref    | list_tag_key,list_id,tag_id                         | list_tag_key | 9       | ranker.lt3.list_id,const                    |    1 | Using where; Using index; Not exists         |
|  1 | SIMPLE      | lbs   | eq_ref | PRIMARY,idx_list_burial_state,burial_score          | PRIMARY      | 4       | ranker.vrl.list_id                          |    1 | Using where                                  |
|  1 | SIMPLE      | lt2   | ref    | list_tag_key,list_id,tag_id                         | list_tag_key | 9       | ranker.lt3.list_id,const                    |    1 | Using where; Using index                     |
+----+-------------+-------+--------+-----------------------------------------------------+--------------+---------+---------------------------------------------+------+----------------------------------------------+

rows列的比较表明了差异,使用join的查询使用的是using temporary;使用filesort。

当然,当我运行这两个查询时,第一个查询在0.02秒内完成,第二个查询甚至在1分钟后都没有完成,所以EXPLAIN正确地解释了这些查询。

如果我在list_tag表上没有INNER JOIN,即如果我删除

AND (SELECT list_id FROM list_tag WHERE list_id=l.list_id AND tag_id=246403) IS NOT NULL  

从第一个查询和相应的:

INNER JOIN list_tag lt3 ON lt3.list_id = vrl.list_id AND lt3.tag_id = 246403

从第二个查询开始,那么EXPLAIN为两个查询返回相同的行数,并且这两个查询的运行速度相同。

子查询通常用于将单行作为原子值返回,不过它们也可以用于用IN关键字比较多行之间的值。在SQL语句中几乎任何有意义的地方都允许使用它们,包括目标列表、WHERE子句等等。一个简单的子查询可以用作搜索条件。例如,在一对表之间:

SELECT title 
FROM books 
WHERE author_id = (
    SELECT id 
    FROM authors 
    WHERE last_name = 'Bar' AND first_name = 'Foo'
);

注意,在子查询的结果上使用普通值操作符要求只返回一个字段。如果你想检查一个值是否存在于一组其他值中,请使用in:

SELECT title 
FROM books 
WHERE author_id IN (
    SELECT id FROM authors WHERE last_name ~ '^[A-E]'
);

这显然不同于LEFT-JOIN,你只是想连接表a和表B的东西,即使连接条件在表B中没有找到任何匹配的记录,等等。

如果你只是担心速度,你必须检查你的数据库,写一个好的查询,看看是否有显著的性能差异。

首先,为了比较这两个,首先你应该区分查询和子查询:

一个子查询类,它总是使用连接编写相应的等效查询 不能使用连接重写的子查询类

对于第一类查询,一个好的RDBMS将把联接查询和子查询视为等效的,并将产生相同的查询计划。

现在甚至mysql也这么做了。

尽管如此,有时它并不会,但这并不意味着连接总是会赢-我有在mysql中使用子查询提高性能的情况。(例如,如果有一些东西阻止mysql计划器正确估计成本,如果计划器没有看到连接变量和子查询变量相同,那么子查询可以通过强制某个路径来优于连接)。

结论是,如果您想确定哪一种查询性能更好,就应该同时测试连接和子查询变量。

对于第二个类,比较没有意义,因为这些查询不能使用连接重写,在这种情况下,子查询是完成所需任务的自然方式,您不应该歧视它们。