What is SQL JOIN, and what are the different types?


Current answer

Definition:


A JOIN is a way to query and combine data from multiple tables in a single statement.

Types of JOIN:


As far as RDBMS theory is concerned, there are five types of join:

Equi-Join: Combines the matching records from two tables based on an equality condition. Technically, the join uses the equality operator (=) to compare, typically, the primary-key values of one table with the foreign-key values of another, so the result set contains the common (matched) records from both tables. For the implementation, see INNER JOIN.

Natural Join: An enhanced version of the equi-join in which the SELECT omits the duplicate join column. For the implementation, see INNER JOIN.

Non-Equi-Join: The counterpart of the equi-join, where the join condition uses an operator other than equals (=), e.g. !=, <=, >=, >, < or BETWEEN. For the implementation, see INNER JOIN.

Self-Join: A join in which a table is combined with itself. This is typically needed for querying self-referencing tables (unary relationships). For the implementation, see INNER JOIN.

Cartesian Product: Cross-combines all records of both tables without any condition. Technically, it is the result set of a query over both tables that has neither a join condition nor a WHERE clause.

As far as SQL syntax is concerned, there are three types of join, and all of the RDBMS joins above can be implemented with them.

1. INNER JOIN: Combines the matched rows from two tables. Matching is based on common columns of the tables and a comparison operator. If the condition is an equality, an equi-join is performed; otherwise it is a non-equi-join.

2. OUTER JOIN: Combines the matched rows from two tables and also returns unmatched rows, padded with NULL values. Which unmatched rows are kept depends on the sub-type:

2.1. LEFT OUTER JOIN (a.k.a. LEFT JOIN): Returns the matched rows from both tables plus the unmatched rows from the LEFT (i.e. first) table only.

2.2. RIGHT OUTER JOIN (a.k.a. RIGHT JOIN): Returns the matched rows from both tables plus the unmatched rows from the RIGHT (i.e. second) table only.

2.3. FULL OUTER JOIN (a.k.a. FULL JOIN): Returns the matched rows plus the unmatched rows from both tables.

3. CROSS JOIN: Does not match rows at all; it performs a Cartesian product.

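Note: depending on the requirement, a self-join can be written as an INNER JOIN, an OUTER JOIN or a CROSS JOIN; the only condition is that the table is joined with itself.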


Examples:

1.1: INNER JOIN, equi-join implementation

SELECT *
FROM Table1 A
 INNER JOIN Table2 B ON A.<Primary-Key> = B.<Foreign-Key>;

1.2: INNER JOIN, natural-join implementation

SELECT A.*, B.Col1, B.Col2   -- B.Fk, the duplicate join column, is left out of the SELECT list
FROM Table1 A
 INNER JOIN Table2 B ON A.Pk = B.Fk;

1.3: INNER JOIN, non-equi-join implementation

SELECT *
FROM Table1 A INNER JOIN Table2 B ON A.Pk <= B.Fk;

1.4: INNER JOIN as a self-join

SELECT *
FROM Table1 A1 INNER JOIN Table1 A2 ON A1.Pk = A2.Fk;
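
To make the self-join in 1.4 concrete, here is a minimal sketch run through Python's built-in sqlite3 module. The employees table, its manager_id column and the sample rows are made up for illustration; they are not part of the answer itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical self-referencing table: manager_id points back at employees.id
    CREATE TABLE employees (
        id         INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        manager_id INTEGER REFERENCES employees(id)
    );
    INSERT INTO employees VALUES (1, 'Ada', NULL), (2, 'Bob', 1), (3, 'Cy', 1);
""")

# The same table appears twice under two aliases: e = employee, m = manager.
pairs = conn.execute("""
    SELECT e.name, m.name
    FROM employees e
    INNER JOIN employees m ON m.id = e.manager_id
    ORDER BY e.id
""").fetchall()

print(pairs)  # Ada has no manager, so the inner join leaves her out
```

Switching INNER JOIN to LEFT JOIN would keep Ada in the result, with NULL in the manager column.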

2.1: FULL OUTER JOIN

SELECT *
FROM Table1 A FULL OUTER JOIN Table2 B ON A.Pk = B.Fk;

2.2: LEFT JOIN

SELECT *
FROM Table1 A LEFT OUTER JOIN Table2 B ON A.Pk = B.Fk;

2.3: RIGHT JOIN

SELECT *
FROM Table1 A RIGHT OUTER JOIN Table2 B ON A.Pk = B.Fk;

3.1: CROSS JOIN

SELECT *
FROM Table1 A CROSS JOIN Table2 B;

3.2: CROSS JOIN as a self-join

SELECT *
FROM Table1 A1 CROSS JOIN Table1 A2;

or, equivalently, with the comma syntax:

SELECT *
FROM Table1 A1, Table1 A2;
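
The difference between the three SQL-level join types is easiest to see on a tiny data set. The following is only a sketch: the customers and orders tables and their rows are hypothetical, and it runs on SQLite through Python's sqlite3 module. RIGHT and FULL OUTER JOIN are left out because older SQLite versions do not support them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Ben'), (3, 'Cat');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
""")

# INNER JOIN: only customers that have at least one order.
inner_rows = conn.execute("""
    SELECT c.name, o.id FROM customers c
    INNER JOIN orders o ON o.customer_id = c.id
    ORDER BY c.id, o.id
""").fetchall()

# LEFT OUTER JOIN: every customer; missing orders show up as NULL (None).
left_rows = conn.execute("""
    SELECT c.name, o.id FROM customers c
    LEFT OUTER JOIN orders o ON o.customer_id = c.id
    ORDER BY c.id, o.id
""").fetchall()

# CROSS JOIN: every customer paired with every order, no condition at all.
cross_count = conn.execute(
    "SELECT COUNT(*) FROM customers CROSS JOIN orders"
).fetchone()[0]

print(inner_rows)
print(left_rows)
print(cross_count)
```

Cat has no orders, so she appears only in the LEFT JOIN result, and the cross join has 3 x 3 rows.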

Other answers

An example from W3Schools:

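Interestingly, most of the other answers suffer from these two problems:

They focus on the basic forms of join only.
They (ab)use Venn diagrams, which are an inaccurate tool for visualising joins (they are much better suited to unions).

I recently wrote an article on the topic, "A Probably Incomplete, Comprehensive Guide to the Many Different Ways to JOIN Tables in SQL", which I'll summarise here.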

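First and foremost: JOINs are cartesian products

This is why Venn diagrams explain them so inaccurately: a JOIN creates a cartesian product between the two joined tables. Wikipedia illustrates it nicely:

The SQL syntax for a cartesian product is CROSS JOIN. For example: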

SELECT *

-- This just generates all the days in January 2017
FROM generate_series(
  '2017-01-01'::TIMESTAMP,
  '2017-01-01'::TIMESTAMP + INTERVAL '1 month -1 day',
  INTERVAL '1 day'
) AS days(day)

-- Here, we're combining all days with all departments
CROSS JOIN departments

It combines all rows from one table with all rows from the other table:

Source tables:

+--------+   +------------+
| day    |   | department |
+--------+   +------------+
| Jan 01 |   | Dept 1     |
| Jan 02 |   | Dept 2     |
| ...    |   | Dept 3     |
| Jan 30 |   +------------+
| Jan 31 |
+--------+

Result:

+--------+------------+
| day    | department |
+--------+------------+
| Jan 01 | Dept 1     |
| Jan 01 | Dept 2     |
| Jan 01 | Dept 3     |
| Jan 02 | Dept 1     |
| Jan 02 | Dept 2     |
| Jan 02 | Dept 3     |
| ...    | ...        |
| Jan 31 | Dept 1     |
| Jan 31 | Dept 2     |
| Jan 31 | Dept 3     |
+--------+------------+

If we just write a comma-separated list of tables, we get the same result:

-- CROSS JOINing two tables:
SELECT * FROM table1, table2

INNER JOIN (Theta-JOIN)

An INNER JOIN is just a filtered CROSS JOIN, where the filter predicate is called Theta in relational algebra.

For instance:

SELECT *

-- Same as before
FROM generate_series(
  '2017-01-01'::TIMESTAMP,
  '2017-01-01'::TIMESTAMP + INTERVAL '1 month -1 day',
  INTERVAL '1 day'
) AS days(day)

-- Now, exclude all days/departments combinations for
-- days before the department was created
JOIN departments AS d ON day >= d.created_at

Note that the keyword INNER is optional (except in MS Access).
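
The "filtered CROSS JOIN" claim can be checked directly. The sketch below uses SQLite via Python's sqlite3 module instead of PostgreSQL's generate_series; days are plain integers and the two departments are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE days (day INTEGER);
    CREATE TABLE departments (department TEXT, created_at INTEGER);
    INSERT INTO days VALUES (1), (2), (3), (4), (5);
    INSERT INTO departments VALUES ('Dept 1', 2), ('Dept 2', 4);
""")

# Theta join: JOIN ... ON with an arbitrary (here non-equi) predicate.
theta_join = conn.execute("""
    SELECT day, department FROM days
    JOIN departments AS d ON day >= d.created_at
    ORDER BY department, day
""").fetchall()

# The same predicate applied as a filter on top of a cartesian product.
filtered_cross = conn.execute("""
    SELECT day, department FROM days
    CROSS JOIN departments AS d
    WHERE day >= d.created_at
    ORDER BY department, day
""").fetchall()

print(theta_join == filtered_cross)
```

Both queries return the same six rows: days 2 to 5 for Dept 1 and days 4 and 5 for Dept 2.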

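(See the article for result examples.)

EQUI JOIN

A special kind of Theta-JOIN is the equi JOIN, which is the one we use most. The predicate joins the primary key of one table with the foreign key of another table. Using the Sakila database for illustration, we can write: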

SELECT *
FROM actor AS a
JOIN film_actor AS fa ON a.actor_id = fa.actor_id
JOIN film AS f ON f.film_id = fa.film_id

This combines all actors with their films.

Or also, on some databases:

SELECT *
FROM actor
JOIN film_actor USING (actor_id)
JOIN film USING (film_id)

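The USING() syntax lets you specify a column that must be present on both sides of the JOIN operation's tables, and it creates an equality predicate on those two columns.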
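
One visible side effect of USING() is worth a quick sketch: in SQLite (and, as far as I know, in the other databases that support USING) SELECT * returns the shared column only once, whereas the equivalent ON predicate returns it from both tables. The two-column tables below are a cut-down, hypothetical stand-in for Sakila's actor and film_actor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE film_actor (actor_id INTEGER, film_id INTEGER);
    INSERT INTO actor VALUES (1, 'PENELOPE'), (2, 'NICK');
    INSERT INTO film_actor VALUES (1, 100), (1, 101), (2, 100);
""")

# Explicit equality predicate: actor_id comes back from both tables.
on_cursor = conn.execute(
    "SELECT * FROM actor JOIN film_actor ON actor.actor_id = film_actor.actor_id"
)
on_columns = [col[0] for col in on_cursor.description]
on_rows = on_cursor.fetchall()

# USING: same matches, but the join column appears once in SELECT *.
using_cursor = conn.execute("SELECT * FROM actor JOIN film_actor USING (actor_id)")
using_columns = [col[0] for col in using_cursor.description]
using_rows = using_cursor.fetchall()

print(len(on_columns), len(using_columns))
```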

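NATURAL JOIN

Other answers have listed this "JOIN type" separately, but that doesn't make sense. It's just a syntax sugar form of the equi JOIN, which is itself a special case of the Theta-JOIN or INNER JOIN. NATURAL JOIN simply collects all columns that are common to both tables being joined and joins USING() those columns. That is hardly ever useful, because of accidental matches (like the LAST_UPDATE columns in the Sakila database).

Here's the syntax: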

SELECT *
FROM actor
NATURAL JOIN film_actor
NATURAL JOIN film
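
The accidental-match problem is easy to reproduce. This sketch uses simplified, hypothetical versions of the Sakila tables in SQLite (via Python's sqlite3): both tables carry a last_update column, so NATURAL JOIN silently joins on it as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, first_name TEXT, last_update TEXT);
    CREATE TABLE film_actor (actor_id INTEGER, film_id INTEGER, last_update TEXT);
    INSERT INTO actor VALUES (1, 'PENELOPE', '2006-02-15 04:34:33');
    INSERT INTO film_actor VALUES
        (1, 100, '2006-02-15 05:05:03'),
        (1, 101, '2006-02-15 05:05:03');
""")

# Joining on the intended key only.
using_count = conn.execute(
    "SELECT COUNT(*) FROM actor JOIN film_actor USING (actor_id)"
).fetchone()[0]

# NATURAL JOIN joins on actor_id AND last_update, which never match here.
natural_count = conn.execute(
    "SELECT COUNT(*) FROM actor NATURAL JOIN film_actor"
).fetchone()[0]

print(using_count, natural_count)
```

The USING join finds both film_actor rows, while the NATURAL JOIN finds none because the timestamps differ.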

OUTER JOIN

Now, OUTER JOIN is a bit different from INNER JOIN, as it creates a UNION of several cartesian products. We can write:

-- Convenient syntax:
SELECT *
FROM a LEFT JOIN b ON <predicate>

-- Cumbersome, equivalent syntax:
SELECT a.*, b.*
FROM a JOIN b ON <predicate>
UNION ALL
SELECT a.*, NULL, NULL, ..., NULL
FROM a
WHERE NOT EXISTS (
  SELECT * FROM b WHERE <predicate>
)

No one wants to write the latter, so we write OUTER JOIN (which is usually better optimised by databases).
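
The equivalence of the two spellings can be verified on a toy data set. The sketch below runs both in SQLite via Python's sqlite3; the columns of a and b and their rows are made up, and the predicate is a simple key equality.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE b (a_id INTEGER, val TEXT);
    INSERT INTO a VALUES (1, 'x'), (2, 'y'), (3, 'z');
    INSERT INTO b VALUES (1, 'p'), (1, 'q'), (3, 'r');
""")

# Convenient syntax.
left_join = conn.execute("""
    SELECT a.id, a.label, b.val
    FROM a LEFT JOIN b ON b.a_id = a.id
    ORDER BY 1, 3
""").fetchall()

# Cumbersome syntax: matched rows, plus NULL-padded rows of a without a match.
union_form = conn.execute("""
    SELECT a.id, a.label, b.val
    FROM a JOIN b ON b.a_id = a.id
    UNION ALL
    SELECT a.id, a.label, NULL
    FROM a
    WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.a_id = a.id)
    ORDER BY 1, 3
""").fetchall()

print(left_join == union_form)
```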

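Like INNER, the keyword OUTER is optional here.

OUTER JOIN comes in three flavours:

LEFT [ OUTER ] JOIN: the left table of the JOIN expression is added to the union as shown above.
RIGHT [ OUTER ] JOIN: the right table of the JOIN expression is added to the union as shown above.
FULL [ OUTER ] JOIN: both tables of the JOIN expression are added to the union as shown above.

All of these can be combined with the keywords USING() or NATURAL (I actually had a real-world use case for a NATURAL FULL JOIN recently).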

Alternative syntaxes

There are some historic, deprecated syntaxes in Oracle and SQL Server, which supported OUTER JOIN before the SQL standard had a syntax for it:

-- Oracle
SELECT *
FROM actor a, film_actor fa, film f
WHERE a.actor_id = fa.actor_id(+)
AND fa.film_id = f.film_id(+)

-- SQL Server
SELECT *
FROM actor a, film_actor fa, film f
WHERE a.actor_id *= fa.actor_id
AND fa.film_id *= f.film_id

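Having said that, don't use this syntax. I only list it here so you can recognise it in old blog posts and legacy code.

Partitioned OUTER JOIN

Few people know this, but the SQL standard specifies a partitioned OUTER JOIN (and Oracle implements it). You can write things like this: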

WITH

  -- Using CONNECT BY to generate all dates in January
  days(day) AS (
    SELECT DATE '2017-01-01' + LEVEL - 1
    FROM dual
    CONNECT BY LEVEL <= 31
  ),

  -- Our departments
  departments(department, created_at) AS (
    SELECT 'Dept 1', DATE '2017-01-10' FROM dual UNION ALL
    SELECT 'Dept 2', DATE '2017-01-11' FROM dual UNION ALL
    SELECT 'Dept 3', DATE '2017-01-12' FROM dual UNION ALL
    SELECT 'Dept 4', DATE '2017-04-01' FROM dual UNION ALL
    SELECT 'Dept 5', DATE '2017-04-02' FROM dual
  )
SELECT *
FROM days 
LEFT JOIN departments 
  PARTITION BY (department) -- This is where the magic happens
  ON day >= created_at

The result looks like this:

+--------+------------+------------+
| day    | department | created_at |
+--------+------------+------------+
| Jan 01 | Dept 1     |            | -- Didn't match, but still get row
| Jan 02 | Dept 1     |            | -- Didn't match, but still get row
| ...    | Dept 1     |            | -- Didn't match, but still get row
| Jan 09 | Dept 1     |            | -- Didn't match, but still get row
| Jan 10 | Dept 1     | Jan 10     | -- Matches, so get join result
| Jan 11 | Dept 1     | Jan 10     | -- Matches, so get join result
| Jan 12 | Dept 1     | Jan 10     | -- Matches, so get join result
| ...    | Dept 1     | Jan 10     | -- Matches, so get join result
| Jan 31 | Dept 1     | Jan 10     | -- Matches, so get join result

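The point here is that all rows from the partitioned side of the join end up in the result, regardless of whether the JOIN matched anything on the "other side of the JOIN". Long story short: this is for filling up sparse data in reports. Very useful!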
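
On databases without PARTITION BY in joins, the same densified result can be approximated by cross joining the days with the distinct departments first and then left joining the departments table. The following is a sketch of that emulation in SQLite via Python's sqlite3, not the Oracle feature itself; days are integers for brevity and the table contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE days (day INTEGER);
    CREATE TABLE departments (department TEXT, created_at INTEGER);
    INSERT INTO days VALUES (1), (2), (3), (4);
    INSERT INTO departments VALUES ('Dept 1', 3), ('Dept 2', 9);
""")

rows = conn.execute("""
    SELECT d.day, p.department, dep.created_at
    FROM days AS d
    -- One "partition" per department: every day is paired with every department
    CROSS JOIN (SELECT DISTINCT department FROM departments) AS p
    -- The actual join predicate, evaluated within each partition
    LEFT JOIN departments AS dep
      ON dep.department = p.department
     AND d.day >= dep.created_at
    ORDER BY p.department, d.day
""").fetchall()

for row in rows:
    print(row)
```

Every day appears once per department; created_at stays NULL for the days before the department existed.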

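SEMI JOIN

Seriously? No other answer got this? Of course not, because unfortunately it doesn't have a native syntax in SQL (just like ANTI JOIN below). But we can use IN() and EXISTS(), e.g. to find all actors who have played in films: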

SELECT *
FROM actor a
WHERE EXISTS (
  SELECT * FROM film_actor fa
  WHERE a.actor_id = fa.actor_id
)

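The WHERE a.actor_id = fa.actor_id predicate acts as the semi join predicate. If you don't believe it, check out execution plans, e.g. in Oracle. You'll see that the database executes a SEMI JOIN operation, not the EXISTS() predicate.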
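
What separates a semi join from an ordinary inner join is that it never multiplies rows of the outer table. A small sketch in SQLite via Python's sqlite3, with hypothetical sample rows in cut-down Sakila tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE film_actor (actor_id INTEGER, film_id INTEGER);
    INSERT INTO actor VALUES (1, 'ANN'), (2, 'BEN');
    INSERT INTO film_actor VALUES (1, 100), (1, 101);
""")

# Semi join: each actor is returned at most once, however many films match.
semi = conn.execute("""
    SELECT a.first_name
    FROM actor a
    WHERE EXISTS (
        SELECT 1 FROM film_actor fa WHERE fa.actor_id = a.actor_id
    )
""").fetchall()

# Inner join: one row per matching film_actor row.
inner = conn.execute("""
    SELECT a.first_name
    FROM actor a
    JOIN film_actor fa ON fa.actor_id = a.actor_id
""").fetchall()

print(semi, inner)
```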

ANTI JOIN

This is just the opposite of SEMI JOIN (be careful not to use NOT IN though, as it has an important caveat).

Here are all the actors without films:

SELECT *
FROM actor a
WHERE NOT EXISTS (
  SELECT * FROM film_actor fa
  WHERE a.actor_id = fa.actor_id
)
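
The NOT IN caveat is, to my understanding, about NULLs: if the subquery yields even one NULL, NOT IN can never evaluate to true and the query returns nothing. A sketch in SQLite via Python's sqlite3; the film_actor row with a NULL actor_id is contrived for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE film_actor (actor_id INTEGER, film_id INTEGER);
    INSERT INTO actor VALUES (1, 'ANN'), (2, 'BEN'), (3, 'CAT');
    -- Contrived row: a film whose actor is unknown (NULL)
    INSERT INTO film_actor VALUES (1, 100), (NULL, 101);
""")

not_exists = conn.execute("""
    SELECT a.first_name
    FROM actor a
    WHERE NOT EXISTS (
        SELECT 1 FROM film_actor fa WHERE fa.actor_id = a.actor_id
    )
    ORDER BY a.actor_id
""").fetchall()

# "2 NOT IN (1, NULL)" evaluates to NULL rather than TRUE, so no row passes.
not_in = conn.execute("""
    SELECT first_name
    FROM actor
    WHERE actor_id NOT IN (SELECT actor_id FROM film_actor)
    ORDER BY actor_id
""").fetchall()

print(not_exists, not_in)
```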

Some folks (especially MySQL people) also write ANTI JOIN like this:

SELECT *
FROM actor a
LEFT JOIN film_actor fa
USING (actor_id)
WHERE film_id IS NULL

I think the historic reason is performance.
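
Both spellings of the anti join should give the same rows. A quick check in SQLite via Python's sqlite3, again on hypothetical cut-down Sakila tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE film_actor (actor_id INTEGER, film_id INTEGER);
    INSERT INTO actor VALUES (1, 'ANN'), (2, 'BEN'), (3, 'CAT');
    INSERT INTO film_actor VALUES (1, 100), (3, 101);
""")

anti_not_exists = conn.execute("""
    SELECT a.first_name
    FROM actor a
    WHERE NOT EXISTS (
        SELECT 1 FROM film_actor fa WHERE fa.actor_id = a.actor_id
    )
""").fetchall()

# LEFT JOIN keeps unmatched actors with NULLs; the filter keeps only those.
anti_left_join = conn.execute("""
    SELECT a.first_name
    FROM actor a
    LEFT JOIN film_actor fa USING (actor_id)
    WHERE fa.film_id IS NULL
""").fetchall()

print(anti_not_exists, anti_left_join)
```

The LEFT JOIN form relies on film_id never being NULL in a genuine match, which is worth keeping in mind when the tested column is nullable.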

LATERAL JOIN

OMG, this one is too cool. Am I the only one to mention it? Here's a cool query:

SELECT a.first_name, a.last_name, f.*
FROM actor AS a
LEFT OUTER JOIN LATERAL (
  SELECT f.title, SUM(amount) AS revenue
  FROM film AS f
  JOIN film_actor AS fa USING (film_id)
  JOIN inventory AS i USING (film_id)
  JOIN rental AS r USING (inventory_id)
  JOIN payment AS p USING (rental_id)
  WHERE fa.actor_id = a.actor_id -- JOIN predicate with the outer query!
  GROUP BY f.film_id
  ORDER BY revenue DESC
  LIMIT 5
) AS f
ON true

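It will find the top 5 revenue-producing films per actor. Every time you need a TOP-N-per-something query, LATERAL JOIN will be your friend. If you're a SQL Server person, then you know this JOIN type under the name APPLY: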

SELECT a.first_name, a.last_name, f.*
FROM actor AS a
OUTER APPLY (
  SELECT TOP 5 f.title, SUM(amount) AS revenue -- SQL Server uses TOP, not LIMIT
  FROM film AS f
  JOIN film_actor AS fa ON f.film_id = fa.film_id
  JOIN inventory AS i ON f.film_id = i.film_id
  JOIN rental AS r ON i.inventory_id = r.inventory_id
  JOIN payment AS p ON r.rental_id = p.rental_id
  WHERE fa.actor_id = a.actor_id -- JOIN predicate with the outer query!
  GROUP BY f.film_id, f.title -- SQL Server requires title in the GROUP BY
  ORDER BY revenue DESC
) AS f

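OK, perhaps that's cheating, because a LATERAL JOIN or APPLY expression is really a "correlated subquery" that produces several rows. But if we allow for "correlated subqueries", we can also talk about...

MULTISET

This is only really implemented by Oracle and Informix (to my knowledge), but it can be emulated in PostgreSQL using arrays and/or XML, and in SQL Server using XML.

MULTISET produces a correlated subquery and nests the resulting set of rows in the outer query. The query below selects all actors and, for each actor, collects their films in a nested collection: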

SELECT a.*, MULTISET (
  SELECT f.*
  FROM film AS f
  JOIN film_actor AS fa USING (film_id)
  WHERE a.actor_id = fa.actor_id
) AS films
FROM actor AS a

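As you have seen, there are more types of JOIN than just the "boring" INNER, OUTER and CROSS JOIN that are usually mentioned. More details are in my article. And please, stop using Venn diagrams to illustrate them.

I'm going to push my pet peeve: the USING keyword.

If both tables on either side of the JOIN have their foreign keys properly named (i.e. the same name, not just "id"), then this can be used: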

SELECT ...
FROM customers JOIN orders USING (customer_id)

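I find it very practical and readable, and not used often enough.

I have created an illustration that, in my opinion, explains this better than words: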