SQL Correlated Subqueries
Understanding Co-related Subqueries in SQL
Introduction
SQL Correlated Subqueries retrieve data from a table referenced in the outer query. They are termed "correlated" because the subquery's execution is influenced by the outer query's rows.
When using correlated subqueries, it's essential to employ a table alias (or correlation name) to clarify the table reference intended for use within the subquery. Unlike regular subqueries, co-related subqueries are executed repeatedly, once for each row processed by the outer query
How Co-related Subqueries Work
In an uncorrelated subquery, the subquery is executed independently first, and its result is then used as a value in the outer query. In contrast, a correlated subquery is linked to the outer query by referencing a column and is executed repeatedly, once for each row processed by the outer query.
This dependency allows co-related subqueries to dynamically adapt their results based on the outer query's current row.
Tips:
An alias is a shorthand or nickname assigned to a table by placing it immediately after the table name in the FROM clause. This practice is useful when retrieving data from multiple tables simultaneously.
Syntax:
SELECT column1, column2, ... FROM table1 outer_alias WHERE column1 operator ( SELECT aggregate_function(column) FROM table2 inner_alias WHERE inner_alias.column = outer_alias.column );
Example:
Consider two tables:
- Employees: Contains employee details.
- Departments: Contains department details.
Task: Find employees whose salary is higher than the average salary of their respective departments.
SQL Code:
SELECT e.employee_id, e.name, e.salary, e.department_id
FROM employees e
WHERE e.salary > (
SELECT AVG(e2.salary)
FROM employees e2
WHERE e2.department_id = e.department_id
);
- Outer Query: Retrieves employee details.
- Inner Query: Calculates the average salary for the department of the current employee.
- The WHERE clause ensures only employees earning above their department's average salary are selected.
Uses of Co-related Subqueries
- Comparative Analysis: Comparing row-specific data with aggregated results.
- Data Validation: Ensuring rows meet conditions based on related data.
- Hierarchical Queries: Retrieving parent-child relationships in data.
- Filtering: Dynamically filtering rows based on related criteria.
Advantages
- Allows dynamic filtering based on related data.
- Provides a powerful way to handle complex queries.
Disadvantages
- Can be slower due to repeated execution for each outer query row.
- Difficult to optimize compared to regular subqueries or joins.
Example: SQL Correlated Subqueries
The following correlated subqueries retrieve ord_num, ord_amount, cust_code, and agent_code from the table orders ('a' and 'b' are aliases for the orders and agents tables), based on the following conditions -
The agent_code in the orders table must match the agent_code in the agents table, and the agent_name in the agents table must be 'Alex',
the following SQL statement can be used:
Sample table: orders
ORD_NUM ORD_AMOUNT ADVANCE_AMOUNT ORD_DATE CUST_CODE AGENT_CODE ORD_DESCRIPTION ---------- ---------- -------------- --------- --------------- --------------- ----------------- 200114 3500 2000 15-AUG-08 C00002 A008 200122 2500 400 16-SEP-08 C00003 A004 200118 500 100 20-JUL-08 C00023 A006 200119 4000 700 16-SEP-08 C00007 A010 200121 1500 600 23-SEP-08 C00008 A004 200130 2500 400 30-JUL-08 C00025 A011 200134 4200 1800 25-SEP-08 C00004 A005 200108 4000 600 15-FEB-08 C00008 A004 200103 1500 700 15-MAY-08 C00021 A005 200105 2500 500 18-JUL-08 C00025 A011 ...........
Sample table: agents
+------------+----------------------+--------------------+------------+-----------------+---------+ | AGENT_CODE | AGENT_NAME | WORKING_AREA | COMMISSION | PHONE_NO | COUNTRY | +------------+----------------------+--------------------+------------+-----------------+---------+ | A007 | Ramasundar | Bangalore | 0.15 | 077-25814763 | | | A003 | Alex | London | 0.13 | 075-12458969 | | | A008 | Alford | New York | 0.12 | 044-25874365 | | | A011 | Ravi Kumar | Bangalore | 0.15 | 077-45625874 | | | A010 | Santakumar | Chennai | 0.14 | 007-22388644 | | | A012 | Lucida | San Jose | 0.12 | 044-52981425 | | | A005 | Anderson | Brisban | 0.13 | 045-21447739 | | | A001 | Subbarao | Bangalore | 0.14 | 077-12346674 | | | A002 | Mukesh | Mumbai | 0.11 | 029-12358964 | | | A006 | McDen | London | 0.15 | 078-22255588 | | | A004 | Ivan | Torento | 0.15 | 008-22544166 | | | A009 | Benjamin | Hampshair | 0.11 | 008-22536178 | | +------------+----------------------+--------------------+------------+-----------------+---------+
SQL Code:
-- Selecting specific columns from the orders table and aliasing it as 'a'
SELECT a.ord_num, a.ord_amount, a.cust_code, a.agent_code
-- Filtering orders based on agent_code
FROM orders a
-- Specifying the condition for filtering
WHERE a.agent_code = (
-- Selecting agent_code from agents table and aliasing it as 'b'
SELECT b.agent_code
-- Filtering agents based on agent_name
FROM agents b
-- Specifying the condition for filtering
WHERE b.agent_name = 'Alex'
);
Explanation:
- This SQL query retrieves specific columns from the "orders" table, aliasing it as 'a'.
- It filters the orders based on the agent_code.
- The subquery selects the agent_code from the "agents" table, aliasing it as 'b'.
- It filters the agents based on the agent_name being 'Alex'.
- The main query then selects orders where the agent_code matches the result of the subquery, meaning it selects orders handled by the agent with the name 'Alex'.
Output:
ORD_NUM ORD_AMOUNT CUST_CODE AGENT_CODE ---------- ---------- ---------- ---------- 200127 2500 C00015 A003 200100 1000 C00015 A003
The inner of the above query returns the 'agent_code' A003.
The simplified form of above code is:
SQL Code:
SELECT a.ord_num,a.ord_amount,a.cust_code,a.agent_code
FROM orders a
WHERE a.agent_code='A003';
Visual Presentation:
Using EXISTS with a Correlated Subquery
We have already used the EXISTS operator to check the existence of a result of a subquery. EXISTS operator can be used in correlated subqueries also. Using EXISTS the following query display the employee_id, manager_id, first_name and last_name of those employees who manage other employees.
Sample table: employeesemployee_id first_name last_name email phone_number hire_date job_id salary commission_pct manager_id department_id ----------- ---------- ---------- ---------- ------------ ---------- ---------- ---------- -------------- ---------- ------------- 100 Steven King SKING 515.123.4567 6/17/1987 AD_PRES 24000 90 101 Neena Kochhar NKOCHHAR 515.123.4568 6/18/1987 AD_VP 17000 100 90 102 Lex De Haan LDEHAAN 515.123.4569 6/19/1987 AD_VP 17000 100 90 103 Alexander Hunold AHUNOLD 590.423.4567 6/20/1987 IT_PROG 9000 102 60 104 Bruce Ernst BERNST 590.423.4568 6/21/1987 IT_PROG 6000 103 60 105 David Austin DAUSTIN 590.423.4569 6/22/1987 IT_PROG 4800 103 60 106 Valli Pataballa VPATABAL 590.423.4560 6/23/1987 IT_PROG 4800 103 60 107 Diana Lorentz DLORENTZ 590.423.5567 6/24/1987 IT_PROG 4200 103 60 108 Nancy Greenberg NGREENBE 515.124.4569 6/25/1987 FI_MGR 12000 101 100 109 Daniel Faviet DFAVIET 515.124.4169 6/26/1987 FI_ACCOUNT 9000 108 100 .................................
SQL Code:
-- Selecting specific columns from the employees table and aliasing it as 'a'
SELECT employee_id, manager_id, first_name, last_name
-- Filtering employees based on the EXISTS condition
FROM employees a
-- Specifying the condition for filtering using the EXISTS keyword
WHERE EXISTS (
-- Selecting employee_id from employees table and aliasing it as 'b'
SELECT employee_id
-- Filtering employees based on manager_id
FROM employees b
-- Specifying the condition for filtering
WHERE b.manager_id = a.employee_id
);
Explanation:
- This SQL query retrieves specific columns from the "employees" table, aliasing it as 'a'.
- It filters the employees based on the existence of records in a correlated subquery.
- The subquery selects the employee_id from the "employees" table, aliasing it as 'b'.
- It filters the employees based on whether there exists an employee with a manager_id that matches the employee_id from the outer query.
- The main query then selects employees where the subquery returns at least one record, meaning it selects employees who manage other employees.
Visual Presentation:
Output:
EMPLOYEE_ID MANAGER_ID FIRST_NAME LAST_NAME ----------- ---------- -------------------- --------------- 100 Steven King 101 100 Neena Kochhar 102 100 Lex De Haan 103 102 Alexander Hunold 108 101 Nancy Greenberg 114 100 Den Raphaely 120 100 Matthew Weiss 121 100 Adam Fripp 122 100 Payam Kaufling 123 100 Shanta Vollman 124 100 Kevin Mourgos 145 100 John Russell 146 100 Karen Partners 147 100 Alberto Errazuriz 148 100 Gerald Cambrault 149 100 Eleni Zlotkey 201 100 Michael Hartstein 205 101 Shelley Higgins
Using NOT EXISTS with a Correlated Subquery
NOT EXISTS is logically opposite of EXISTS operator. NOT EXISTS is used when we need to check if rows do not exist in the results returned by a subquery. Using NOT EXISTS the following query display the employee_id, manager_id, first_name and last_name of those employees who have no manager status. This query is opposite to the previous one.
Sample table: employeesemployee_id first_name last_name email phone_number hire_date job_id salary commission_pct manager_id department_id ----------- ---------- ---------- ---------- ------------ ---------- ---------- ---------- -------------- ---------- ------------- 100 Steven King SKING 515.123.4567 6/17/1987 AD_PRES 24000 90 101 Neena Kochhar NKOCHHAR 515.123.4568 6/18/1987 AD_VP 17000 100 90 102 Lex De Haan LDEHAAN 515.123.4569 6/19/1987 AD_VP 17000 100 90 103 Alexander Hunold AHUNOLD 590.423.4567 6/20/1987 IT_PROG 9000 102 60 104 Bruce Ernst BERNST 590.423.4568 6/21/1987 IT_PROG 6000 103 60 105 David Austin DAUSTIN 590.423.4569 6/22/1987 IT_PROG 4800 103 60 106 Valli Pataballa VPATABAL 590.423.4560 6/23/1987 IT_PROG 4800 103 60 107 Diana Lorentz DLORENTZ 590.423.5567 6/24/1987 IT_PROG 4200 103 60 108 Nancy Greenberg NGREENBE 515.124.4569 6/25/1987 FI_MGR 12000 101 100 109 Daniel Faviet DFAVIET 515.124.4169 6/26/1987 FI_ACCOUNT 9000 108 100 .................................
SQL Code:
-- Selecting specific columns from the employees table and aliasing it as 'a'
SELECT employee_id, manager_id, first_name, last_name
-- Filtering employees based on the NOT EXISTS condition
FROM employees a
-- Specifying the condition for filtering using the NOT EXISTS keyword
WHERE NOT EXISTS (
-- Selecting employee_id from employees table and aliasing it as 'b'
SELECT employee_id
-- Filtering employees based on manager_id
FROM employees b
-- Specifying the condition for filtering
WHERE b.manager_id = a.employee_id
);
Explanation:
- This SQL query retrieves specific columns from the "employees" table, aliasing it as 'a'.
- It filters the employees based on the non-existence of records in a correlated subquery.
- The subquery selects the employee_id from the "employees" table, aliasing it as 'b'.
- It filters the employees based on whether there does not exist an employee with a manager_id that matches the employee_id from the outer query.
- The main query then selects employees where the subquery returns no records, meaning it selects employees who do not manage other employees.
Visual Presentation:
Output:
EMPLOYEE_ID MANAGER_ID FIRST_NAME LAST_NAME ----------- ---------- -------------------- -------------- 104 103 Bruce Ernst 105 103 David Austin 106 103 Valli Pataballa 107 103 Diana Lorentz 109 108 Daniel Faviet 110 108 John Chen 111 108 Ismael Sciarra 112 108 Jose Manuel Urman 113 108 Luis Popp 115 114 Alexander Khoo 116 114 Shelli Baida 117 114 Sigal Tobias 118 114 Guy Himuro 119 114 Karen Colmenares 125 120 Julia Nayer 126 120 Irene Mikkilineni 127 120 James Landry 128 120 Steven Markle 129 121 Laura Bissot 130 121 Mozhe Atkinson 131 121 James Marlow ........ .......
Example: Performance Consideration with Correlated Subqueries
Find orders where the order amount exceeds the average order amount handled by the same agent
SQL Code:
-- Selects order number, order amount, and agent code from the orders table aliased as 'o'.
-- Filters orders where the order amount is greater than the average order amount handled by the same agent.
SELECT ord_num, ord_amount, agent_code
FROM orders o
WHERE ord_amount > (
-- Subquery: Calculates the average order amount for orders handled by the agent of the current row in the outer query.
SELECT AVG(ord_amount)
FROM orders
WHERE agent_code = o.agent_code
);
Explanation:
- This SQL query retrieves ord_num, ord_amount, and agent_code from the orders table, aliased as 'o'.
- It filters the results to include only those orders where the ord_amount is greater than the average ord_amount handled by the same agent_code.
- The subquery calculates the average ord_amount for orders associated with the agent_code of each row (o.agent_code) in the outer query.
- This demonstrates the use of a correlated subquery to compare values within the same table based on a condition from the outer query.
Real-World Use Case of Correlated Subquery
List employees who have been with the company longer than any of their direct reports
SQL Code:
-- Selects employee_id, first_name, and last_name from the employees table aliased as 'e'.
-- Filters employees where the hire date is greater than all hire dates of employees managed by them.
SELECT employee_id, first_name, last_name
FROM employees e
WHERE hire_date > ALL (
-- Subquery: Retrieves hire dates of employees who report to the manager of the current row in the outer query.
SELECT hire_date
FROM employees
WHERE manager_id = e.employee_id
);
Explanation:
- This SQL query selects employee_id, first_name, and last_name from the employees table, aliased as 'e'.
- It filters the results to include only those employees whose hire_date is greater than the hire dates of all employees who report to them.
- The subquery retrieves hire_date values from employees who have the same manager_id as the employee_id of the current row (e.employee_id) in the outer query.
- This query demonstrates the use of the ALL operator with a correlated subquery to compare values across related rows in the same table based on a condition from the outer query.
Frequently Asked Questions (FAQ) - SQL Correlated Subqueries
1. What are SQL Correlated Subqueries?
SQL Correlated Subqueries are used to select data from a table referenced in the outer query. The subquery in a correlated subquery is related to the outer query, hence the term "correlated". Each execution of the subquery in the correlated subquery depends on the row processed by the outer query.
2. How do Correlated Subqueries differ from Uncorrelated Subqueries?
In SQL, an uncorrelated subquery executes independently and provides a single value or set of values to the outer query. In contrast, a correlated subquery references columns from the outer query and executes once for each row processed by the outer query.
3. What is a table alias (correlation name) used for in Correlated Subqueries?
A table alias, or correlation name, specifies which table reference is to be used in the context of a correlated subquery. It helps differentiate columns from the outer query and the subquery when both reference the same table.
4. When should I use SQL Correlated Subqueries?
Correlated subqueries are useful when you need to filter rows based on conditions evaluated from related rows in another table. They are typically used in scenarios where the filtering condition depends on values from the current row being processed by the outer query.
5. What are some keywords commonly used with Correlated Subqueries?
Commonly used keywords include EXISTS and NOT EXISTS, which help to check for the existence or non-existence of rows returned by a subquery in the context of a correlated subquery.
Check out our 1000+ SQL Exercises with solution and explanation to improve your skills.
Previous: Multiple Row and Column Subqueries
Next: Nested subqueries
- Weekly Trends and Language Statistics
- Weekly Trends and Language Statistics