我是一个老派的MySQL用户,总是更喜欢JOIN而不是子查询。但是现在每个人都用子查询,我讨厌它;我不知道为什么。

我缺乏理论知识来判断是否有任何不同。子查询是否与JOIN一样好,因此没有什么可担心的?


当前回答

子查询通常用于将单行作为原子值返回,不过它们也可以用于用IN关键字比较多行之间的值。在SQL语句中几乎任何有意义的地方都允许使用它们,包括目标列表、WHERE子句等等。一个简单的子查询可以用作搜索条件。例如,在一对表之间:

SELECT title 
FROM books 
WHERE author_id = (
    SELECT id 
    FROM authors 
    WHERE last_name = 'Bar' AND first_name = 'Foo'
);

注意,在子查询的结果上使用普通值操作符要求只返回一个字段。如果你想检查一个值是否存在于一组其他值中,请使用in:

SELECT title 
FROM books 
WHERE author_id IN (
    SELECT id FROM authors WHERE last_name ~ '^[A-E]'
);

这显然不同于LEFT-JOIN,你只是想连接表a和表B的东西,即使连接条件在表B中没有找到任何匹配的记录,等等。

如果你只是担心速度,你必须检查你的数据库,写一个好的查询,看看是否有显著的性能差异。

其他回答

子查询是解决“从A获取事实,以B的事实为条件”这种形式的问题的逻辑正确方法。在这种情况下,在子查询中插入B比进行连接更具逻辑意义。从实际意义上讲,它也更安全,因为您不必担心由于与B的多个匹配而从a获得重复的事实。

然而,实际上,答案通常归结于性能。当给出连接和子查询时,一些优化器会很糟糕,而另一些则相反,这是特定于优化器、特定于dbms版本和特定于查询的。

从历史上看,显式连接通常会胜出,因此已经建立的智慧是连接更好,但优化器一直在变得更好,因此我更喜欢先以逻辑一致的方式编写查询,然后在性能限制的情况下重新构造查询。

如果你想用join加速你的查询:

对于“inner join/join”, 不要使用where条件,而是使用“ON”条件。 例如:

     select id,name from table1 a  
   join table2 b on a.name=b.name
   where id='123'

 Try,

    select id,name from table1 a  
   join table2 b on a.name=b.name and a.id='123'

对于“左/右连接”, 不要在“ON”条件下使用,因为如果你使用左/右连接,它将获得任何一个表的所有行。所以,在"开"里也没用。所以,尝试使用“Where”条件

A general rule is that joins are faster in most cases (99%). The more data tables have, the subqueries are slower. The less data tables have, the subqueries have equivalent speed as joins. The subqueries are simpler, easier to understand, and easier to read. Most of the web and app frameworks and their "ORM"s and "Active record"s generate queries with subqueries, because with subqueries are easier to split responsibility, maintain code, etc. For smaller web sites or apps subqueries are OK, but for larger web sites and apps you will often have to rewrite generated queries to join queries, especial if a query uses many subqueries in the query.

有人说“一些RDBMS可以将子查询重写为连接,或将连接重写为子查询,当它认为其中一个比另一个快时”,但这句话适用于简单的情况,当然不适用于带有子查询的复杂查询,这实际上会导致性能问题。

子查询通常用于将单行作为原子值返回,不过它们也可以用于用IN关键字比较多行之间的值。在SQL语句中几乎任何有意义的地方都允许使用它们,包括目标列表、WHERE子句等等。一个简单的子查询可以用作搜索条件。例如,在一对表之间:

SELECT title 
FROM books 
WHERE author_id = (
    SELECT id 
    FROM authors 
    WHERE last_name = 'Bar' AND first_name = 'Foo'
);

注意,在子查询的结果上使用普通值操作符要求只返回一个字段。如果你想检查一个值是否存在于一组其他值中,请使用in:

SELECT title 
FROM books 
WHERE author_id IN (
    SELECT id FROM authors WHERE last_name ~ '^[A-E]'
);

这显然不同于LEFT-JOIN,你只是想连接表a和表B的东西,即使连接条件在表B中没有找到任何匹配的记录,等等。

如果你只是担心速度,你必须检查你的数据库,写一个好的查询,看看是否有显著的性能差异。

MySQL版本:5.5.28-0ubuntu0.12.04.2-log

在我的印象中,JOIN总是比MySQL中的子查询更好,但EXPLAIN是更好的判断方式。下面是一个子查询比join更好的例子。

这是我的查询与3个子查询:

EXPLAIN SELECT vrl.list_id,vrl.ontology_id,vrl.position,l.name AS list_name, vrlih.position AS previous_position, vrl.moved_date 
FROM `vote-ranked-listory` vrl 
INNER JOIN lists l ON l.list_id = vrl.list_id 
INNER JOIN `vote-ranked-list-item-history` vrlih ON vrl.list_id = vrlih.list_id AND vrl.ontology_id=vrlih.ontology_id AND vrlih.type='PREVIOUS_POSITION' 
INNER JOIN list_burial_state lbs ON lbs.list_id = vrl.list_id AND lbs.burial_score < 0.5 
WHERE vrl.position <= 15 AND l.status='ACTIVE' AND l.is_public=1 AND vrl.ontology_id < 1000000000 
 AND (SELECT list_id FROM list_tag WHERE list_id=l.list_id AND tag_id=43) IS NULL 
 AND (SELECT list_id FROM list_tag WHERE list_id=l.list_id AND tag_id=55) IS NULL 
 AND (SELECT list_id FROM list_tag WHERE list_id=l.list_id AND tag_id=246403) IS NOT NULL 
ORDER BY vrl.moved_date DESC LIMIT 200;

解释说明:

+----+--------------------+----------+--------+-----------------------------------------------------+--------------+---------+-------------------------------------------------+------+--------------------------+
| id | select_type        | table    | type   | possible_keys                                       | key          | key_len | ref                                             | rows | Extra                    |
+----+--------------------+----------+--------+-----------------------------------------------------+--------------+---------+-------------------------------------------------+------+--------------------------+
|  1 | PRIMARY            | vrl      | index  | PRIMARY                                             | moved_date   | 8       | NULL                                            |  200 | Using where              |
|  1 | PRIMARY            | l        | eq_ref | PRIMARY,status,ispublic,idx_lookup,is_public_status | PRIMARY      | 4       | ranker.vrl.list_id                              |    1 | Using where              |
|  1 | PRIMARY            | vrlih    | eq_ref | PRIMARY                                             | PRIMARY      | 9       | ranker.vrl.list_id,ranker.vrl.ontology_id,const |    1 | Using where              |
|  1 | PRIMARY            | lbs      | eq_ref | PRIMARY,idx_list_burial_state,burial_score          | PRIMARY      | 4       | ranker.vrl.list_id                              |    1 | Using where              |
|  4 | DEPENDENT SUBQUERY | list_tag | ref    | list_tag_key,list_id,tag_id                         | list_tag_key | 9       | ranker.l.list_id,const                          |    1 | Using where; Using index |
|  3 | DEPENDENT SUBQUERY | list_tag | ref    | list_tag_key,list_id,tag_id                         | list_tag_key | 9       | ranker.l.list_id,const                          |    1 | Using where; Using index |
|  2 | DEPENDENT SUBQUERY | list_tag | ref    | list_tag_key,list_id,tag_id                         | list_tag_key | 9       | ranker.l.list_id,const                          |    1 | Using where; Using index |
+----+--------------------+----------+--------+-----------------------------------------------------+--------------+---------+-------------------------------------------------+------+--------------------------+

使用join的相同查询是:

EXPLAIN SELECT vrl.list_id,vrl.ontology_id,vrl.position,l.name AS list_name, vrlih.position AS previous_position, vrl.moved_date 
FROM `vote-ranked-listory` vrl 
INNER JOIN lists l ON l.list_id = vrl.list_id 
INNER JOIN `vote-ranked-list-item-history` vrlih ON vrl.list_id = vrlih.list_id AND vrl.ontology_id=vrlih.ontology_id AND vrlih.type='PREVIOUS_POSITION' 
INNER JOIN list_burial_state lbs ON lbs.list_id = vrl.list_id AND lbs.burial_score < 0.5 
LEFT JOIN list_tag lt1 ON lt1.list_id = vrl.list_id AND lt1.tag_id = 43 
LEFT JOIN list_tag lt2 ON lt2.list_id = vrl.list_id AND lt2.tag_id = 55 
INNER JOIN list_tag lt3 ON lt3.list_id = vrl.list_id AND lt3.tag_id = 246403 
WHERE vrl.position <= 15 AND l.status='ACTIVE' AND l.is_public=1 AND vrl.ontology_id < 1000000000 
AND lt1.list_id IS NULL AND lt2.tag_id IS NULL 
ORDER BY vrl.moved_date DESC LIMIT 200;

输出为:

+----+-------------+-------+--------+-----------------------------------------------------+--------------+---------+---------------------------------------------+------+----------------------------------------------+
| id | select_type | table | type   | possible_keys                                       | key          | key_len | ref                                         | rows | Extra                                        |
+----+-------------+-------+--------+-----------------------------------------------------+--------------+---------+---------------------------------------------+------+----------------------------------------------+
|  1 | SIMPLE      | lt3   | ref    | list_tag_key,list_id,tag_id                         | tag_id       | 5       | const                                       | 2386 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | l     | eq_ref | PRIMARY,status,ispublic,idx_lookup,is_public_status | PRIMARY      | 4       | ranker.lt3.list_id                          |    1 | Using where                                  |
|  1 | SIMPLE      | vrlih | ref    | PRIMARY                                             | PRIMARY      | 4       | ranker.lt3.list_id                          |  103 | Using where                                  |
|  1 | SIMPLE      | vrl   | ref    | PRIMARY                                             | PRIMARY      | 8       | ranker.lt3.list_id,ranker.vrlih.ontology_id |   65 | Using where                                  |
|  1 | SIMPLE      | lt1   | ref    | list_tag_key,list_id,tag_id                         | list_tag_key | 9       | ranker.lt3.list_id,const                    |    1 | Using where; Using index; Not exists         |
|  1 | SIMPLE      | lbs   | eq_ref | PRIMARY,idx_list_burial_state,burial_score          | PRIMARY      | 4       | ranker.vrl.list_id                          |    1 | Using where                                  |
|  1 | SIMPLE      | lt2   | ref    | list_tag_key,list_id,tag_id                         | list_tag_key | 9       | ranker.lt3.list_id,const                    |    1 | Using where; Using index                     |
+----+-------------+-------+--------+-----------------------------------------------------+--------------+---------+---------------------------------------------+------+----------------------------------------------+

rows列的比较表明了差异,使用join的查询使用的是using temporary;使用filesort。

当然,当我运行这两个查询时,第一个查询在0.02秒内完成,第二个查询甚至在1分钟后都没有完成,所以EXPLAIN正确地解释了这些查询。

如果我在list_tag表上没有INNER JOIN,即如果我删除

AND (SELECT list_id FROM list_tag WHERE list_id=l.list_id AND tag_id=246403) IS NOT NULL  

从第一个查询和相应的:

INNER JOIN list_tag lt3 ON lt3.list_id = vrl.list_id AND lt3.tag_id = 246403

从第二个查询开始,那么EXPLAIN为两个查询返回相同的行数,并且这两个查询的运行速度相同。