IMPALA SQL_SUBQUERY - loukenny/atme GitHub Wiki
SUBQUERY
- nested query, useful for intermediary transformations
- subquery is processed before the main query
- mail/subquery ์ ์ ์ ํ๊ฒ ํํฐ๊ฐ ์ฌ์ฉ๋์๋์ง ํ์ธ ํ์
- ์ด๋ ์์น์์๋ ์ฌ์ฉ ๊ฐ๋ฅ -
SELECT,FROM,WHERE,GROUP BY,IN - ์ฌ๋ฌ ๊ฐ ์ฌ์ฉ ๊ฐ๋ฅ -
SELECT,FROM,WHERESELECT ์ - need to return a single value
- mathematical calculations
- ํ๋์ ์ด๋ก์๋ ์ฌ์ฉ๊ฐ๋ฅ
FROM ์ ๋ ๋ณต์กํ ๊ฒฐ๊ณผ ๋ฐํ์ ์ํ ๋, ์ฌ๋ฌ๊ฐ ๊ฐ๋ฅ(alias, join)- restructure & transform
- aggregates of aggregates
WHERE ์ ์ค์ง 1๊ฐ์ ์ด๋ง ๋ฐํํ ์ ์์
- return info - scalar quantities, list, table
Uncorrelated subquery
- s-q does not contain a reference to the outer query
- s-q can run independently of the outer query
- commonly found w
WHERE,FROM - s-q executes only once and returns the results to the outer query
Correlated subquery
- s-q contains a reference to the outer query
- s-q cannot run independently of the outer query
- commonly found w
WHERE,SELECT - s-q executes only once and returns the results to the outer query
- ๋ฉ์ธ์ฟผ๋ฆฌ์ ๊ฐ์ ์๋ธ์ฟผ๋ฆฌ๊ฐ ์ฌ์ฉํ๊ณ , ์๋ธ์ฟผ๋ฆฌ์ ๊ฐ์ ๋ฐ์์ ๋ฉ์ธ์ฟผ๋ฆฌ๊ฐ ๊ณ์ฐํ๋ ๊ตฌ์กฐ์ ์ฟผ๋ฆฌ, ์ฟผ๋ฆฌ ์ฒ๋ฆฌ ์๊ฐ์ด ๊ฑธ๋ฆฌ๊ธฐ ๋๋ฌธ์
INNER JOIN์ฌ์ฉ ๊ณ ๋ ค - re-run for every row generated in the final data set
- used for advanced joining, filtering and evaluating data
| Simple | Correlated |
|---|---|
| extracting, structuring, filtering | can't be extracted on its own |
| run _independently_from the main q | dependent on the main q to excute |
| evaluated once in the whole q | evaluated in loops, signif slows down q runtime |
โ
โก
โ = โก, WHERE ์ ์๋ธ์ฟผ๋ฆฌ๋ง ์ฐจ์ด
Nested subquery
- can be correlated or uncorrelated or combination of the a two
- can reference info from the outer subquery or main query
Set Operator
| UNION | UNION ALL | INTERSECT | MINUS |
|---|---|---|---|
| O | O | X | X |

โ LEFT JOIN, NOT EXISTS as a MINUS
Subquery as a Column
โ = โก
โ
SELECT countries.name AS country, COUNT(*) AS cities_num
FROM cities
INNER JOIN countries
ON countries.code = cities.country_code
GROUP BY country
ORDER BY cities_num DESC, country
LIMIT 9;
โก SELECT์ ์ ์๋ธ์ฟผ๋ฆฌ๋ฅผ ๋ฐ๋ก ์ด๋ก ์ฌ์ฉํ ์ ์์
SELECT countries.name AS country,
(SELECT count(*)
FROM cities
WHERE countries.code = cities.country_code) AS cities_num
FROM countries
ORDER BY cities_num desc, country
LIMIT 9;
Common Table Expressions (CTEs)
- excuted once, improves query performance
- CTE is then stored in memory
- referencing other CTEs
- referencing itself
SELF JOINโ = โก
โ
โก CTE ์ฌ์ฉํ ๊ฒฝ์ฐ
- 2๊ฐ ์ด์ CTE ํ์ฉํ ๊ฒฝ์ฐ
Presence & Absence
ways to determine whether data in one table is present, or absent, in a related table IN, EXISTS๋ ๋์ผํ ๊ฒฐ๊ณผ๋ฅผ ๋ณด์ฌ์ค์๋ ๋ถ๊ตฌํ๊ณ EXISTS๊ฐ ๋ ์ข์ ์ฑ๋ฅ, ๋จ์ํ ํน์ ์นผ๋ผ์ ๊ฐ์ ์ด์ฉํ ๋์๋ IN์ ์ด์ฉ, ์๋ธ์ฟผ๋ฆฌ๋ฅผ ์ด์ฉํ ๋๋ EXISTS ์ฌ์ฉํด์ผ ์ฑ๋ฅ์ด ๋ ์ข์
IN & EXISTS
IN()- () โ ํน์ ๊ฐ or ์๋ธ์ฟผ๋ฆฌ
- collects all the results from a s-q before passing to the outer query
- One major issue with using NOT IN is the way it handles NULL values
- if the columns in the s-q, being evaluated for a non-match, contain NULL values, no results are returned
SELECT WorldBankRegion,
CountryName,
Capital
FROM Nations
WHERE Capital NOT IN (SELECT NearestPop
FROM Earthquakes
WHERE NearestPop IS NOT NULL)
EXISTS()- () โ only ์๋ธ์ฟผ๋ฆฌ
- will stop searching the s-q when the condition is TRUE
SELECT TeamName,
TeamCode,
City
FROM Teams AS t
WHERE EXISTS
(SELECT 1 ----- ์กด์ฌ ํ๋ฉด 1์ ๋ฝ์๋ผ
FROM Earthquakes AS e
WHERE t.City = e.NearestPop);
INTERSECT & EXCEPT
INTERSECT- checks for presence
EXCEPT- checks for absence
- advantage) great for data interrogation
remove duplicates from the returned - disadvantage) the number and order of columns in the SELECT statement must be the same btwn queries
EXISTS & NOT EXISTS
both of these operators use a s-q in the WHERE filter condition
EXISTS- checks for presence
NOT EXISTS- checks for absence
- advantage) advantage over INTERSECT and EXCEPT
s-q will stop searching as soon as it evaluates to TRUE
results can contain any column from the outer query, and in any order - disadvantage) results can only contain columns from the outer query
โ restricting results to columns from the outer query only is also a disadvantage, compared to a JOIN
IN & NOT IN
IN- checks for presence
NOT IN- checks for absence
- advantage) advantage over INTERSECT and EXCEPT
results can contain any column from the outer query, and in any order - disadvantage) results can only contain columns from the outer query
โ restricting results to columns from the outer query only is also a disadvantage, compared to a JOIN
NO RESULTS RETURNED because of the way NOT IN handles NULLS in the s-q
JOIN & exclusive LOJ
INNER JOIN- checks for presence
exclusive LEFT OUTER JOIN- checks for absence
- advantage) results can contain any column, from all joined queries, in any order
โ compred to EXISTS NOT EXISTS IN NOT IN, the results are restricted to columns from the outer query only - disadvantage) exclusive LOJ requires to add the IS NULL as a WHERE filter condition