Set operations (SQL)
UNION operator
In SQL the UNION
clause combines the results of two SQL queries into a single table of all matching rows. The two queries must result in the same number of columns and compatible data types in order to unite. Any duplicate records are automatically removed unless UNION ALL
is used.
UNION
can be useful in data warehouse applications where tables aren't perfectly normalized.[1] A simple example would be a database having tables sales2005
and sales2006
that have identical structures but are separated because of performance considerations. A UNION
query could combine results from both tables.
Note that UNION
does not guarantee the order of rows. Rows from the second operand may appear before, after, or mixed with rows from the first operand. In situations where a specific order is desired, ORDER BY
must be used.
Note that UNION ALL
may be much faster than plain UNION
.
Examples
Given these two tables:
person | amount |
---|---|
Joe | 1000 |
Alex | 2000 |
Bob | 5000 |
person | amount |
---|---|
Joe | 2000 |
Alex | 2000 |
Zach | 35000 |
Executing this statement:
SELECT * FROM sales2005
UNION
SELECT * FROM sales2006;
yields this result set, though the order of the rows can vary because no ORDER BY
clause was supplied:
person | amount |
---|---|
Joe | 1000 |
Alex | 2000 |
Bob | 5000 |
Joe | 2000 |
Zach | 35000 |
Note that there are two rows for Joe because those rows are distinct across their columns. There is only one row for Alex because those rows are not distinct for both columns.
UNION ALL
gives different results, because it will not eliminate duplicates. Executing this statement:
SELECT * FROM sales2005
UNION ALL
SELECT * FROM sales2006;
would give these results, again allowing variance for the lack of an ORDER BY
statement:
person | amount |
---|---|
Joe | 1000 |
Joe | 2000 |
Alex | 2000 |
Alex | 2000 |
Bob | 5000 |
Zach | 35000 |
The discussion of full outer joins also has an example that uses UNION
.
INTERSECT operator
The SQL INTERSECT
operator takes the results of two queries and returns only rows that appear in both result sets. For purposes of duplicate removal the INTERSECT
operator does not distinguish between NULLs
. The INTERSECT
operator removes duplicate rows from the final result set. The INTERSECT ALL
operator does not remove duplicate rows from the final result set.
Example
The following example INTERSECT
query returns all rows from the Orders table where Quantity is between 50 and 100.
SELECT *
FROM Orders
WHERE Quantity BETWEEN 1 AND 100
INTERSECT
SELECT *
FROM Orders
WHERE Quantity BETWEEN 50 AND 200;
EXCEPT operator
The SQL EXCEPT
operator takes the distinct rows of one query and returns the rows that do not appear in a second result set. The EXCEPT ALL
operator does not remove duplicates. For purposes of row elimination and duplicate removal, the EXCEPT
operator does not distinguish between NULLs
.
Notably, the Oracle platform provides a MINUS
operator which is functionally equivalent to the SQL standard EXCEPT DISTINCT
operator .
Example
The following example EXCEPT
query returns all rows from the Orders table where Quantity is between 1 and 49, and those with a Quantity between 76 and 100.
Worded another way; the query returns all rows where the Quantity is between 1 and 100, apart from rows where the quantity is between 50 and 75.
SELECT *
FROM Orders
WHERE Quantity BETWEEN 1 AND 100
EXCEPT
SELECT *
FROM Orders
WHERE Quantity BETWEEN 50 AND 75;
Example
The following example is equivalent to the above example but without using the EXCEPT
operator.
SELECT o1.*
FROM (
SELECT *
FROM Orders
WHERE Quantity BETWEEN 1 AND 100) o1
LEFT JOIN (
SELECT *
FROM Orders
WHERE Quantity BETWEEN 50 AND 75) o2
ON o1.id = o2.id
WHERE o2.id IS NULL
See also
References
- ↑ "a
UNION ALL
views technique for managing maintenance and performance in your large data warehouse environment ... ThisUNION ALL
technique has saved many of my clients with issues related to time-sensitive database designs. These databases usually have an extremely volatile current timeframe, month, or day portion and the older data is rarely updated. Using different container DASD allocations, tablespaces, tables, and index definitions, the settings can be tuned for the specific performance considerations for these different volatility levels and update frequency situations." Terabyte Data Warehouse Table Design Choices - Part 2 (URL accessed on July 25, 2006)
External links
- MSDN documentation on UNION in Transact-SQL for SQL Server
- Naming of select list items in set operations
- UNION in MySQL with Examples
- UNION in MySQL
- UNION Clause in PostgreSQL
- SQL UNION and UNION ALL
- Sort order within UNION statement
- Designing a data flow that loads a warehouse table
- Oracle 11g documentation for UNION (ALL), INTERSECT and MINUS
- SQL Set Operators
|