IMPALA SQL_FUNCTIONS - loukenny/atme GitHub Wiki

FUNCTIONS

CASE WHEN

  • Categorizing data; SELECT Clause


WHERE ์ ˆ ์„ค์ •

WHERE ์ ˆ 8455๋ฅผ HOME/AWAY๋กœ ์„ค์ •X โ†’ ELSE ์ ˆ ์˜ค๋ฅ˜ ๋ฐœ์ƒ
์ด๊ฐ™์€ ๋ฌธ์ œ๋ฅผ ๋ฐฉ์ง€ํ•˜๊ธฐ ์œ„ํ•ด WHERE ์ ˆ์— specific condition ์„ค์ •์ด ํ•„์š”ํ•จ
ELSE๋ฅผ ์„ค์ •ํ•˜๋˜ ์•ˆํ•˜๋˜, ์กฐ๊ฑด ์™ธ ๋‚˜๋จธ์ง€ ๊ฒฐ๊ณผ ๊ฐ’์€ ํ•ญ์ƒ NULL
ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด WHERE์ ˆ์—์„œ filter๋กœ CASE ์‚ฌ์šฉํ•ด์•ผ ํ•จ

SELECT date,
        -- ํ™ˆํŒ€, ์›์ •ํŒ€ ์„ค์ •
        CASE WHEN hometeam_id = 8634 THEN 'FC Barcelona' 
         ELSE 'Real Madrid CF' END as home,
	CASE WHEN awayteam_id = 8634 THEN 'FC Barcelona' 
         ELSE 'Real Madrid CF' END as away,
        -- ๊ฒฝ๊ธฐ ๊ฒฐ๊ณผ ๊ฒฝ์šฐ์˜ ์ˆ˜ ์„ค์ •
	CASE WHEN home_goal > away_goal AND hometeam_id = 8634 THEN 'Barcelona win!'
        WHEN home_goal > away_goal AND hometeam_id = 8633 THEN 'Real Madrid win!'
        WHEN home_goal < away_goal AND awayteam_id = 8634 THEN 'Barcelona win!'
        WHEN home_goal < away_goal AND awayteam_id = 8633 THEN 'Real Madrid win!'
        ELSE 'Tie!' END as outcome
FROM matches_spain
-- ํ™ˆํŒ€, ์›์ •ํŒ€ id ํ•œ์ •
WHERE (awayteam_id = 8634 OR hometeam_id = 8634)
      AND (awayteam_id = 8633 OR hometeam_id = 8633);
  • Filtering data; WHERE Clause ํ•„ํ„ฐ๋กœ ์‚ฌ์šฉํ•˜๋ฉด์„œ ๊ฒฐ๊ณผ๊ฐ’์—์„œ ๋ณด๊ธฐ ์›ํ•˜์ง€ ์•Š๋Š” ๋ฐ์ดํ„ฐ๋ฅผ ๊ฑฐ๋ฅผ ์ˆ˜ ์žˆ์Œ


WHERE ์ ˆ์—์„œ CASE WHEN ๊ตฌ๋ฌธ์— END IS NOT NULL ๋ช…์‹œ

  • Aggregating data
    • CASE WHEN with COUNT
    • CASE WHEN with SUM
    • CASE WHEN with AVG
โ–ท COUNT
   SELECT season,
          COUNT(CASE WHEN hometeam_id = 8650
                      AND home_goal > away_goal
                     THEN id END) AS home_wins
          COUNT(CASE WHEN awayteam_id = 8650
                      AND away_goal > home_goal
                     THEN id END) AS away_wins
   FROM match
   GROUP BY season;

โ–ท SUM
   SELECT season,
          SUM(CASE WHEN hometeam_id = 8650
                   THEN home_goal END) AS home_goals
          SUM(CASE WHEN awayteam_id = 8650
                   THEN away_goal END) AS away_goals
   FROM match
   GROUP BY season;
โ†’ ELSE ๊ฒฝ์šฐ๋Š” ์ž๋™์œผ๋กœ NULL

โ–ท AVG
   SELECT season,
          ROUND(AVG(CASE WHEN hometeam_id = 8650
                         THEN home_goal END), 2) AS home_goals
          ROUND(AVG(CASE WHEN awayteam_id = 8650
                         THEN away_goal END), 2) AS away_goals
   FROM match
   GROUP BY season;

โ–ท Percentages w CASE & AVG ***
   SELECT season,
          ROUND(AVG(CASE WHEN hometeam_id = 8455 AND home_goal > away_goal THEN 1
                         WHEN hometeam_id = 8455 AND home_goal < away_goal THEN 0
                          END), 2) AS pct_homewins,
          ROUND(AVG(CASE WHEN awayteam_id = 8455 AND away_goal > home_goal THEN 1
                         WHEN awayteam_id = 8455 AND away_goal < home_goal THEN 0
                          END), 2) AS pct_awaywins
   FROM match
   GROUP BY season;

Impala Date and Time Functions

  • date data type์€ timestamp ํ˜•ํƒœ๋งŒ ์ง€์›
    now()
    add_months(timestamp date, int months)
    adddate(timestamp startdate, int days)
    date_add(timestamp startdate, int days)
    date_part(string, timestamp)
    date_sub(timestamp startdate, int days)
    datediff(timestamp enddate, timestamp startdate)
    day(timestamp date)
    dayname(timestamp date)
    dayofweek(timestamp date)
    dayofyear(timestamp date)
    days_add(timestamp startdate, int days)
    days_sub(timestamp startdate, int days)
    extract(timestamp, string unit)
  extract(year from now());
  extract(now(), "year");

from_timestamp(datetime timestamp, pattern string)
from_unixtime(bigint unixtime[, string format])
...

  • How to Convert data type? to compare two values, they need to be of the same type ์ •์ˆ˜์™€ ์†Œ์ˆ˜๋ฅผ ๋น„๊ตํ• ๋•Œ๋„ ๋งˆ์ฐฌ๊ฐ€์ง€
    • (IMPLICIT) SQL Server converts AUTOMATICALLY, behind the scenes

      • For comparing two values in SQL Server, they need to have the same data type.
      • If the data types are different, SQL Server implicitly converts one type to another, based on data type precedence.
      • The data type with the lower precedence is converted to the data type with the higher precedence.
    • (EXPLICIT) developer explicitly converts the data, performed w FUCTIONS CAST() & CONVERT()

      • CAST(exptression AS data_type [(length)])
      • CONVERT(data_type [(length)], expression [, style])
    • Data type precedence


Window Functions

  • Perform calculations on an already generated result set (a window)
  • Processed after every part of query except ORDER BY
    • uses info in result set rather than database
  • Window ํ•จ์ˆ˜๋Š” ๋‹ค๋ฅธ ํ•จ์ˆ˜์™€ ๋‹ฌ๋ฆฌ ์ค‘์ฒฉํ•ด์„œ ์‚ฌ์šฉ ๋ถˆ๊ฐ€, ์„œ๋ธŒ์ฟผ๋ฆฌ์—์„œ๋Š” ์‚ฌ์šฉ ๊ฐ€๋Šฅ
  • Use with aggregate calculations w/o having to GROUP BY
    • similar to subqueries in SELECT
    • running totals, rankings, moving avg โ‘  = โ‘ก
      โ‘  ์„œ๋ธŒ์ฟผ๋ฆฌ ์‚ฌ์šฉ

      โ‘ก ์œˆ๋„์šฐ ํ•จ์ˆ˜ ์‚ฌ์šฉ

Window Function ์ข…๋ฅ˜

์ฐธ๊ณ 

  • Window ํ•จ์ˆ˜ OVER ๋ฌธ๊ตฌ๊ฐ€ ํ‚ค์›Œ๋“œ๋กœ ํ•„์ˆ˜ ํฌํ•จ
    • OVER; ๊ด€๋ จ ์ฐฝ ํ•จ์ˆ˜๋ฅผ ์ ์šฉํ•˜๊ธฐ ์ „์— ํ–‰ ์ง‘ํ•ฉ์˜ ๋ถ„ํ• ๊ณผ ์ˆœ์„œ๋ฅผ ๊ฒฐ์ •ํ•จ. ์ฆ‰, OVER ์ ˆ์€ ์ฟผ๋ฆฌ ๊ฒฐ๊ณผ ์ง‘ํ•ฉ ๋‚ด์˜ ์ฐฝ ๋˜๋Š” ์‚ฌ์šฉ์ž ์ง€์ • ํ–‰ ์ง‘ํ•ฉ์„ ์ •์˜ํ•จ. ๊ทธ๋Ÿฐ ๋‹ค์Œ ์ฐฝ ํ•จ์ˆ˜๊ฐ€ ์ฐฝ์˜ ๊ฐ ํ–‰์— ๋Œ€ํ•œ ๊ฐ’์„ ๊ณ„์‚ฐ โ†’ ORDER BY, GROUP BY ์—ญํ• 
  • Sliding Window?

- Sliding Window
  ROWS BETWEEN <start> AND <finish>

- specifying keywords in <start>, <finish>
  PRECEDING 
  FOLLOWING ์œ„์˜ ๋‘˜, ํ–‰์˜ ๊ฐฏ์ˆ˜ ๊ตฌ์ฒดํ™”
  UNBOUNDED PRECEDING
  UNBOUNDED FOLLOWING  ์œ„์˜ ๋‘˜, ๋ชจ๋“  ํ–‰์˜ ๊ฐฏ์ˆ˜๋ฅผ ํฌํ•จํ•ด๋ผ
  CURRENT ROW ํ˜„์žฌ ํ–‰์—์„œ ๊ณ„์‚ฐ์„ ๋ฉˆ์ถ”๊ณ  ์‹ถ๋‹ค
๊ตฌ๋ถ„ ์ข…๋ฅ˜
์ˆœ์œ„(RANK) ๊ด€๋ จ RANK, DENSE_RANK, ROW_NUMBER
์ง‘๊ณ„(AGGREGATE) ๊ด€๋ จ SUM, MAX, MIN, AVG, COUNT
๊ทธ๋ฃน ๋‚ด ๋น„์œจ ๊ด€๋ จ ํ•จ์ˆ˜ CUME_DIST, PERCENT_RANK, NTILE, RATIO_TO_REPORT
์„ ํ˜•๋ถ„์„์„ ํฌํ•จํ•œ ํ†ต๊ณ„๋ถ„์„ ํ•จ์ˆ˜ CORR, COVAR_POP, COVAR_SAMP, STDDEV, STDDEV_POP, STDDEV_SAMP, VARIANCE, VAR_POP, VAR_SAMP, REGR_(LINEAR REGRESSION), REGR_SLOPE, REGR_INTERCEPT, REGR_COUNT, REGR_R2, REGR_AVGX, REGR_AVGY, REGR_SXX, REGR_SYY, REGR_SXY

์‚ฌ์šฉ ์˜ˆ์‹œ

์ฐธ๊ณ 

  • PARTITION BY ๋Š” ์—ฌ๋Ÿฌ๊ฐœ ํ–‰์„ ํ•จ๊ป˜ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Œ
    • OVER(PARTITION BY col1, col2 ...)
  • Can partition aggregate calculations, ranks, etc

COUNT()OVER() : ์ „์ฒดํ–‰ ์นด์šดํŠธ
COUNT(
)OVER(PARTITION BY ์ปฌ๋Ÿผ) : ๊ทธ๋ฃน๋‹จ์œ„๋กœ ๋‚˜๋ˆ„์–ด ์นด์šดํŠธ

MAX(์ปฌ๋Ÿผ)OVER() : ์ „์ฒดํ–‰ ์ค‘์— ์ตœ๊ณ ๊ฐ’
MAX(์ปฌ๋Ÿผ)OVER(PARTITION BY ์ปฌ๋Ÿผ) : ๊ทธ๋ฃน๋‚ด ์ตœ๊ณ ๊ฐ’

MIN(์ปฌ๋Ÿผ)OVER() : ์ „์ฒดํ–‰ ์ค‘์— ์ตœ์†Œ๊ฐ’
MIN(์ปฌ๋Ÿผ)OVER(PARTITION BY ์ปฌ๋Ÿผ) : ๊ทธ๋ฃน๋‚ด ์ตœ์†Œ๊ฐ’

SUM(์ปฌ๋Ÿผ)OVER() : ์ „์ฒดํ–‰ ํ•ฉ
SUM(์ปฌ๋Ÿผ)OVER(PARTITION BY ์ปฌ๋Ÿผ) : ๊ทธ๋ฃน๋‚ด ํ•ฉ

AVG(์ปฌ๋Ÿผ)OVER() : ์ „์ฒดํ–‰ ํ‰๊ท 
AVG(์ปฌ๋Ÿผ)OVER(PARTITION BY ์ปฌ๋Ÿผ) : ๊ทธ๋ฃน๋‚ด ํ‰๊ท 

STDDEV(์ปฌ๋Ÿผ)OVER() : ์ „์ฒดํ–‰ ํ‘œ์ค€ํŽธ์ฐจ
STDDEV(์ปฌ๋Ÿผ)OVER(PARTITION BY ์ปฌ๋Ÿผ) : ๊ทธ๋ฃน๋‚ด ํ‘œ์ค€ํŽธ์ฐจ

RATIO_TO_REPORT(์ปฌ๋Ÿผ)OVER() : ํ˜„์žฌํ–‰๊ฐ’/SUM(์ „์ฒดํ–‰๊ฐ’), % ๋‚˜ํƒ€๋‚ผ ๋•Œ *100
RATIO_TO_REPORT(์ปฌ๋Ÿผ)OVER(PARTITION BY ์ปฌ๋Ÿผ) : ํ˜„์žฌํ–‰๊ฐ’ / SUM(๊ทธ๋ฃนํ–‰๊ฐ’), % ๋‚˜ํƒ€๋‚ผ ๋•Œ *100

Analytic Function

  • ์ง‘๊ณ„ํ•จ์ˆ˜ vs ๋ถ„์„ํ•จ์ˆ˜
    • ์ง‘๊ณ„ํ•จ์ˆ˜๋Š” ๊ทธ๋ฃน๋ณ„ ์ตœ๋Œ€, ์ตœ์†Œ, ํ•ฉ๊ณ„, ํ‰๊ท , ๊ฑด์ˆ˜ ๊ตฌํ•  ๋•Œ, ๊ทธ๋ฃน๋ณ„ 1๊ฐœ ํ–‰ ๋ฐ˜ํ™˜
    • ๋ถ„์„ํ•จ์ˆ˜๋Š” ๊ทธ๋ฃน๋‹จ์œ„๋กœ ๊ฐ’์„ ๊ณ„์‚ฐํ•œ๋‹ค๋Š” ์ ์—์„œ ์ง‘๊ณ„ํ•จ์ˆ˜์™€ ์œ ์‚ฌํ•˜์ง€๋งŒ, ๊ทธ๋ฃน๋งˆ๋‹ค๊ฐ€ ์•„๋‹ˆ๋ผ ๊ฒฐ๊ณผSet์˜ ๊ฐ ํ–‰๋งˆ๋‹ค ์ง‘๊ณ„๊ฒฐ๊ณผ๋ฅผ ๋ณด์—ฌ์ค€๋‹ค๋Š” ์ ์—์„œ ์ง‘๊ณ„ํ•จ์ˆ˜์™€ ์ƒ๋‹นํ•œ ์ฐจ์ด
    • ๋ถ„์„ํ•จ์ˆ˜๋Š” ๊ทธ๋ฃน๋ณ„ ๊ณ„์‚ฐ๊ฒฐ๊ณผ๋ฅผ ๊ฐ ํ–‰๋งˆ๋‹ค ๋ณด์—ฌ์คŒ

Dynamic Input

๊ทธ๋ฆผ1

โš ๏ธ **GitHub.com Fallback** โš ๏ธ