IMPALA SQL_SUBQUERY - loukenny/atme GitHub Wiki

SUBQUERY

SQL Style Guide

  • nested query, useful for intermediary transformations
  • subquery is processed before the main query
  • mail/subquery ์— ์ ์ ˆํ•˜๊ฒŒ ํ•„ํ„ฐ๊ฐ€ ์‚ฌ์šฉ๋˜์—ˆ๋Š”์ง€ ํ™•์ธ ํ•„์š”
  • ์–ด๋А ์œ„์น˜์—์„œ๋‚˜ ์‚ฌ์šฉ ๊ฐ€๋Šฅ - SELECT, FROM, WHERE, GROUP BY, IN
  • ์—ฌ๋Ÿฌ ๊ฐœ ์‚ฌ์šฉ ๊ฐ€๋Šฅ - SELECT, FROM, WHERE
    • SELECT ์ ˆ
      • need to return a single value
      • mathematical calculations
      • ํ•˜๋‚˜์˜ ์—ด๋กœ์„œ๋„ ์‚ฌ์šฉ๊ฐ€๋Šฅ
    • FROM ์ ˆ ๋” ๋ณต์žกํ•œ ๊ฒฐ๊ณผ ๋ฐ˜ํ™˜์„ ์›ํ•  ๋•Œ, ์—ฌ๋Ÿฌ๊ฐœ ๊ฐ€๋Šฅ(alias, join)
      • restructure & transform
      • aggregates of aggregates
    • WHERE ์ ˆ ์˜ค์ง 1๊ฐœ์˜ ์—ด๋งŒ ๋ฐ˜ํ™˜ํ•  ์ˆ˜ ์žˆ์Œ
  • return info - scalar quantities, list, table

Uncorrelated subquery

  • s-q does not contain a reference to the outer query
  • s-q can run independently of the outer query
  • commonly found w WHERE, FROM
  • s-q executes only once and returns the results to the outer query

Correlated subquery

  • s-q contains a reference to the outer query
  • s-q cannot run independently of the outer query
  • commonly found w WHERE, SELECT
  • s-q executes only once and returns the results to the outer query
  • ๋ฉ”์ธ์ฟผ๋ฆฌ์˜ ๊ฐ’์„ ์„œ๋ธŒ์ฟผ๋ฆฌ๊ฐ€ ์‚ฌ์šฉํ•˜๊ณ , ์„œ๋ธŒ์ฟผ๋ฆฌ์˜ ๊ฐ’์„ ๋ฐ›์•„์„œ ๋ฉ”์ธ์ฟผ๋ฆฌ๊ฐ€ ๊ณ„์‚ฐํ•˜๋Š” ๊ตฌ์กฐ์˜ ์ฟผ๋ฆฌ, ์ฟผ๋ฆฌ ์ฒ˜๋ฆฌ ์‹œ๊ฐ„์ด ๊ฑธ๋ฆฌ๊ธฐ ๋•Œ๋ฌธ์— INNER JOIN ์‚ฌ์šฉ ๊ณ ๋ ค
  • re-run for every row generated in the final data set
  • used for advanced joining, filtering and evaluating data
Simple Correlated
extracting, structuring, filtering can't be extracted on its own
run _independently_from the main q dependent on the main q to excute
evaluated once in the whole q evaluated in loops, signif slows down q runtime

โ‘ 
โ‘ก
โ‘  = โ‘ก, WHERE ์ ˆ ์„œ๋ธŒ์ฟผ๋ฆฌ๋งŒ ์ฐจ์ด

Nested subquery

  • can be correlated or uncorrelated or combination of the a two
  • can reference info from the outer subquery or main query

Set Operator

UNION UNION ALL INTERSECT MINUS
O O X X

KakaoTalk_20190725_021952689
โ†’ LEFT JOIN, NOT EXISTS as a MINUS

Subquery as a Column

โ‘  = โ‘ก
โ‘ 
   SELECT countries.name AS country, COUNT(*) AS cities_num
     FROM cities
          INNER JOIN countries
          ON countries.code = cities.country_code
   GROUP BY country
   ORDER BY cities_num DESC, country
   LIMIT 9;
โ‘ก SELECT์ ˆ์— ์„œ๋ธŒ์ฟผ๋ฆฌ๋ฅผ ๋ฐ”๋กœ ์—ด๋กœ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Œ
   SELECT countries.name AS country,
          (SELECT count(*)
             FROM cities
            WHERE countries.code = cities.country_code) AS cities_num
   FROM countries
   ORDER BY cities_num desc, country
   LIMIT 9;

Common Table Expressions (CTEs)

  • excuted once, improves query performance
    • CTE is then stored in memory
  • referencing other CTEs
  • referencing itself SELF JOIN โ‘  = โ‘ก
    โ‘ 
    โ‘ก CTE ์‚ฌ์šฉํ•œ ๊ฒฝ์šฐ
  • 2๊ฐœ ์ด์ƒ CTE ํ™œ์šฉํ•  ๊ฒฝ์šฐ

Presence & Absence

ways to determine whether data in one table is present, or absent, in a related table IN, EXISTS๋Š” ๋™์ผํ•œ ๊ฒฐ๊ณผ๋ฅผ ๋ณด์—ฌ์คŒ์—๋„ ๋ถˆ๊ตฌํ•˜๊ณ  EXISTS๊ฐ€ ๋” ์ข‹์€ ์„ฑ๋Šฅ, ๋‹จ์ˆœํžˆ ํŠน์ • ์นผ๋Ÿผ์˜ ๊ฐ’์„ ์ด์šฉํ•  ๋•Œ์—๋Š” IN์„ ์ด์šฉ, ์„œ๋ธŒ์ฟผ๋ฆฌ๋ฅผ ์ด์šฉํ•  ๋•Œ๋Š” EXISTS ์‚ฌ์šฉํ•ด์•ผ ์„ฑ๋Šฅ์ด ๋” ์ข‹์Œ

IN & EXISTS

  • IN()
    • () โ† ํŠน์ • ๊ฐ’ or ์„œ๋ธŒ์ฟผ๋ฆฌ
    • collects all the results from a s-q before passing to the outer query
    • One major issue with using NOT IN is the way it handles NULL values
      • if the columns in the s-q, being evaluated for a non-match, contain NULL values, no results are returned
SELECT WorldBankRegion,
          CountryName,
          Capital
FROM Nations 
WHERE Capital NOT IN (SELECT NearestPop
     		        FROM Earthquakes
                       WHERE NearestPop IS NOT NULL)
  • EXISTS()
    • () โ† only ์„œ๋ธŒ์ฟผ๋ฆฌ
    • will stop searching the s-q when the condition is TRUE
SELECT TeamName,
       TeamCode,   
	City
FROM Teams AS t
WHERE EXISTS
      (SELECT 1  ----- ์กด์žฌ ํ•˜๋ฉด 1์„ ๋ฝ‘์•„๋ผ  
	FROM Earthquakes AS e 	  
	WHERE t.City = e.NearestPop);

INTERSECT & EXCEPT

  • INTERSECT
    • checks for presence
  • EXCEPT
    • checks for absence
  • advantage) great for data interrogation
    remove duplicates from the returned
  • disadvantage) the number and order of columns in the SELECT statement must be the same btwn queries

EXISTS & NOT EXISTS

both of these operators use a s-q in the WHERE filter condition

  • EXISTS
    • checks for presence
  • NOT EXISTS
    • checks for absence
  • advantage) advantage over INTERSECT and EXCEPT
    s-q will stop searching as soon as it evaluates to TRUE
    results can contain any column from the outer query, and in any order
  • disadvantage) results can only contain columns from the outer query
    โ†’ restricting results to columns from the outer query only is also a disadvantage, compared to a JOIN

IN & NOT IN

  • IN
    • checks for presence
  • NOT IN
    • checks for absence
  • advantage) advantage over INTERSECT and EXCEPT
    results can contain any column from the outer query, and in any order
  • disadvantage) results can only contain columns from the outer query
    โ†’ restricting results to columns from the outer query only is also a disadvantage, compared to a JOIN
    NO RESULTS RETURNED because of the way NOT IN handles NULLS in the s-q

JOIN & exclusive LOJ

  • INNER JOIN
    • checks for presence
  • exclusive LEFT OUTER JOIN
    • checks for absence
  • advantage) results can contain any column, from all joined queries, in any order
    โ†’ compred to EXISTS NOT EXISTS IN NOT IN, the results are restricted to columns from the outer query only
  • disadvantage) exclusive LOJ requires to add the IS NULL as a WHERE filter condition