IMPALA SQL_FUNCTIONS - loukenny/atme GitHub Wiki
- Categorizing data; SELECT Clause
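Setting the WHERE clause
If the WHERE clause does not restrict the rows to matches where 8455 is HOME/AWAY → the ELSE branch produces wrong results
To prevent this kind of problem, the WHERE clause needs a specific condition
When ELSE is omitted, every row that matches none of the conditions always gets NULL
To drop those rows, use the CASE expression as a filter in the WHERE clause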
SELECT date,
    -- Label the home team and the away team
    CASE WHEN hometeam_id = 8634 THEN 'FC Barcelona'
         ELSE 'Real Madrid CF' END AS home,
    CASE WHEN awayteam_id = 8634 THEN 'FC Barcelona'
         ELSE 'Real Madrid CF' END AS away,
    -- Cover each possible match outcome
    CASE WHEN home_goal > away_goal AND hometeam_id = 8634 THEN 'Barcelona win!'
         WHEN home_goal > away_goal AND hometeam_id = 8633 THEN 'Real Madrid win!'
         WHEN home_goal < away_goal AND awayteam_id = 8634 THEN 'Barcelona win!'
         WHEN home_goal < away_goal AND awayteam_id = 8633 THEN 'Real Madrid win!'
         ELSE 'Tie!' END AS outcome
FROM matches_spain
-- Keep only matches between the two team ids
WHERE (awayteam_id = 8634 OR hometeam_id = 8634)
  AND (awayteam_id = 8633 OR hometeam_id = 8633);
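The ELSE pitfall described above can be reproduced on a few rows. This is only a sketch: it runs the same CASE expression through Python's built-in `sqlite3` module as a stand-in for Impala, and the three rows (including the team ids 9999 and 8888) are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE matches_spain (id INTEGER, hometeam_id INTEGER, "
    "awayteam_id INTEGER, home_goal INTEGER, away_goal INTEGER)"
)
conn.executemany(
    "INSERT INTO matches_spain VALUES (?, ?, ?, ?, ?)",
    [
        (1, 8634, 8633, 2, 1),  # Barcelona (home) beats Real Madrid
        (2, 8633, 8634, 0, 0),  # a real draw between the two teams
        (3, 9999, 8888, 3, 0),  # unrelated match, hypothetical team ids
    ],
)

outcome_case = """
    CASE WHEN home_goal > away_goal AND hometeam_id = 8634 THEN 'Barcelona win!'
         WHEN home_goal > away_goal AND hometeam_id = 8633 THEN 'Real Madrid win!'
         WHEN home_goal < away_goal AND awayteam_id = 8634 THEN 'Barcelona win!'
         WHEN home_goal < away_goal AND awayteam_id = 8633 THEN 'Real Madrid win!'
         ELSE 'Tie!' END
"""

# Without a WHERE filter, the unrelated match falls through to ELSE.
unfiltered = conn.execute(
    f"SELECT id, {outcome_case} FROM matches_spain ORDER BY id"
).fetchall()

# With the WHERE filter, only matches between the two teams remain.
filtered = conn.execute(
    f"""SELECT id, {outcome_case} FROM matches_spain
        WHERE (awayteam_id = 8634 OR hometeam_id = 8634)
          AND (awayteam_id = 8633 OR hometeam_id = 8633)
        ORDER BY id"""
).fetchall()

print(unfiltered)
print(filtered)
```

In the unfiltered result, match 3 is labelled 'Tie!' even though it was a 3-0 win between two other teams, which is exactly why the WHERE condition is needed.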
- Filtering data; WHERE Clause: used as a filter, CASE can screen out data you do not want to see in the result
In the WHERE clause, add END IS NOT NULL to the CASE WHEN expression
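A minimal sketch of the `END IS NOT NULL` filter, again using Python's `sqlite3` as a stand-in for Impala. The table is named `matches` here (an assumption, to avoid any keyword clash) and the rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE matches (id INTEGER, season TEXT, hometeam_id INTEGER, "
    "awayteam_id INTEGER, home_goal INTEGER, away_goal INTEGER)"
)
conn.executemany(
    "INSERT INTO matches VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "2011/2012", 8455, 1111, 3, 1),  # home win for team 8455
        (2, "2011/2012", 2222, 8455, 0, 2),  # away win for team 8455
        (3, "2011/2012", 8455, 3333, 0, 1),  # loss for team 8455
        (4, "2011/2012", 4444, 5555, 2, 0),  # unrelated match
    ],
)

# The CASE has no ELSE, so non-matching rows evaluate to NULL
# and are removed by the IS NOT NULL test.
wins = conn.execute(
    """SELECT id, season FROM matches
       WHERE CASE WHEN hometeam_id = 8455 AND home_goal > away_goal THEN 'win'
                  WHEN awayteam_id = 8455 AND away_goal > home_goal THEN 'win'
             END IS NOT NULL
       ORDER BY id"""
).fetchall()

print(wins)
```

Only the two wins survive; the loss and the unrelated match are filtered out because their CASE result is NULL.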
- Aggregating data
- CASE WHEN with COUNT
- CASE WHEN with SUM
- CASE WHEN with AVG
▷ COUNT
SELECT season,
    COUNT(CASE WHEN hometeam_id = 8650
                AND home_goal > away_goal
               THEN id END) AS home_wins,
    COUNT(CASE WHEN awayteam_id = 8650
                AND away_goal > home_goal
               THEN id END) AS away_wins
FROM match
GROUP BY season;
▷ SUM
SELECT season,
    SUM(CASE WHEN hometeam_id = 8650
             THEN home_goal END) AS home_goals,
    SUM(CASE WHEN awayteam_id = 8650
             THEN away_goal END) AS away_goals
FROM match
GROUP BY season;
→ Rows that fall into the ELSE case automatically become NULL
▷ AVG
SELECT season,
    ROUND(AVG(CASE WHEN hometeam_id = 8650
                   THEN home_goal END), 2) AS home_goals,
    ROUND(AVG(CASE WHEN awayteam_id = 8650
                   THEN away_goal END), 2) AS away_goals
FROM match
GROUP BY season;
▷ Percentages with CASE & AVG ***
SELECT season,
    ROUND(AVG(CASE WHEN hometeam_id = 8455 AND home_goal > away_goal THEN 1
                   WHEN hometeam_id = 8455 AND home_goal < away_goal THEN 0
              END), 2) AS pct_homewins,
    ROUND(AVG(CASE WHEN awayteam_id = 8455 AND away_goal > home_goal THEN 1
                   WHEN awayteam_id = 8455 AND away_goal < home_goal THEN 0
              END), 2) AS pct_awaywins
FROM match
GROUP BY season;
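The aggregate patterns above rely on the fact that COUNT and AVG skip NULL. The sketch below checks this on invented rows, using Python's `sqlite3` as a stand-in for Impala and a table named `matches` (an assumption).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE matches (id INTEGER, season TEXT, hometeam_id INTEGER, "
    "awayteam_id INTEGER, home_goal INTEGER, away_goal INTEGER)"
)
conn.executemany(
    "INSERT INTO matches VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "2011/2012", 8455, 1111, 3, 1),  # home win  -> 1
        (2, "2011/2012", 8455, 2222, 0, 1),  # home loss -> 0
        (3, "2011/2012", 8455, 3333, 1, 1),  # home tie  -> NULL, ignored by AVG
        (4, "2011/2012", 4444, 8455, 0, 2),  # away game -> NULL in both home CASEs
    ],
)

result = conn.execute(
    """SELECT season,
              COUNT(CASE WHEN hometeam_id = 8455 AND home_goal > away_goal
                         THEN id END) AS home_wins,
              ROUND(AVG(CASE WHEN hometeam_id = 8455 AND home_goal > away_goal THEN 1
                             WHEN hometeam_id = 8455 AND home_goal < away_goal THEN 0
                        END), 2) AS pct_homewins
       FROM matches
       GROUP BY season"""
).fetchall()

print(result)
```

The tie is excluded from the average, so `pct_homewins` is wins divided by wins plus losses (1 of 2 here), not wins divided by all home games.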
- For date data, only the timestamp type is supported
now()
add_months(timestamp date, int months)
adddate(timestamp startdate, int days)
date_add(timestamp startdate, int days)
date_part(string, timestamp)
date_sub(timestamp startdate, int days)
datediff(timestamp enddate, timestamp startdate)
day(timestamp date)
dayname(timestamp date)
dayofweek(timestamp date)
dayofyear(timestamp date)
days_add(timestamp startdate, int days)
days_sub(timestamp startdate, int days)
extract(timestamp, string unit)
extract(year from now());
extract(now(), "year");
from_timestamp(datetime timestamp, pattern string)
from_unixtime(bigint unixtime[, string format])
...
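To make the semantics of a few of these functions concrete, here is a small Python analogue. It does not call Impala; it only mirrors what `date_add`, `datediff` and `extract(..., 'year')` are expected to return, and the two timestamps are arbitrary sample values.

```python
from datetime import datetime, timedelta

start = datetime(2024, 1, 30, 8, 0, 0)
end = datetime(2024, 2, 2, 23, 30, 0)

# Analogue of date_add(startdate, 3) / days_add / adddate: shift by whole days.
shifted = start + timedelta(days=3)

# Analogue of datediff(enddate, startdate): end minus start,
# compared on the date portions only.
day_diff = (end.date() - start.date()).days

# Analogue of extract(ts, 'year').
year = start.year

print(shifted, day_diff, year)
```

Note the argument order of `datediff`: the end date comes first, so swapping the arguments flips the sign of the result.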
- How to Convert data type?
To compare two values, they need to be of the same type.
The same applies when comparing integers with decimals.
- (IMPLICIT) SQL Server converts automatically, behind the scenes
  - To compare two values in SQL Server, they need to have the same data type.
  - If the data types differ, SQL Server implicitly converts one type to the other, based on data type precedence.
  - The data type with the lower precedence is converted to the data type with the higher precedence.
- (EXPLICIT) The developer converts the data explicitly, using the functions CAST() and CONVERT()
  - CAST(expression AS data_type [(length)])
  - CONVERT(data_type [(length)], expression [, style])
- Data type precedence
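Explicit conversion with CAST can be tried out as follows. This sketch uses Python's `sqlite3`, not SQL Server or Impala: SQLite supports `CAST(expression AS type)` but has no `CONVERT()`, and its implicit conversion rules differ, so only the explicit CAST part carries over.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

text_vs_int, cast_vs_int, truncated = conn.execute(
    """SELECT '42' = 42,                   -- text literal vs integer: no conversion in SQLite
              CAST('42' AS INTEGER) = 42,  -- explicit cast makes the types match
              CAST(3.7 AS INTEGER)         -- casting a decimal to integer drops the fraction
    """
).fetchone()

print(text_vs_int, cast_vs_int, truncated)
```

SQLite returns 0 for false and 1 for true, so the first comparison fails while the explicitly cast one succeeds.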
- Perform calculations on an already generated result set (a window)
- Processed after every part of the query except ORDER BY
- Use the information in the result set rather than the database
- Unlike other functions, window functions cannot be nested, but they can be used in subqueries
- Use with aggregate calculations without having to GROUP BY
- Similar to subqueries in SELECT
- Running totals, rankings, moving averages
① = ②
① Using a subquery
② Using a window function
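- Window functions must include the OVER clause as a keyword
- OVER: determines the partitioning and ordering of a rowset before the associated window function is applied. That is, the OVER clause defines a window, or user-specified set of rows, within the query result set. The window function then computes a value for each row in the window → it plays the role of ORDER BY and GROUP BY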
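The "① = ②" equivalence between a subquery in SELECT and a window function can be checked directly. A minimal sketch, assuming Python's `sqlite3` (SQLite 3.25 or later for window functions) in place of Impala and a made-up three-row `matches` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE matches (id INTEGER, home_goal INTEGER, away_goal INTEGER)")
conn.executemany(
    "INSERT INTO matches VALUES (?, ?, ?)",
    [(1, 2, 1), (2, 0, 0), (3, 4, 2)],
)

# (1) Overall average computed by a scalar subquery in SELECT.
with_subquery = conn.execute(
    """SELECT id, home_goal + away_goal AS goals,
              (SELECT AVG(home_goal + away_goal) FROM matches) AS overall_avg
       FROM matches ORDER BY id"""
).fetchall()

# (2) The same value computed by a window function with an empty OVER().
with_window = conn.execute(
    """SELECT id, home_goal + away_goal AS goals,
              AVG(home_goal + away_goal) OVER () AS overall_avg
       FROM matches ORDER BY id"""
).fetchall()

print(with_subquery)
print(with_window)
```

Both queries keep one output row per match and attach the same overall average to each row.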
- Sliding Window
ROWS BETWEEN <start> AND <finish>
- Keywords that can be specified in <start>, <finish>
PRECEDING, FOLLOWING: specify the number of rows before or after the current row
UNBOUNDED PRECEDING, UNBOUNDED FOLLOWING: include every row back to the start or forward to the end of the window
CURRENT ROW: stop the calculation at the current row
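The frame keywords above can be seen in action with a running total and a two-row sliding sum. This is a sketch on a hypothetical `daily_goals` table, run through Python's `sqlite3` (SQLite 3.25 or later) rather than Impala.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_goals (day INTEGER, goals INTEGER)")
conn.executemany(
    "INSERT INTO daily_goals VALUES (?, ?)",
    [(1, 2), (2, 0), (3, 3), (4, 1)],
)

frames = conn.execute(
    """SELECT day, goals,
              -- running total: everything from the first row up to this row
              SUM(goals) OVER (ORDER BY day
                               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total,
              -- sliding window: the previous row plus this row
              SUM(goals) OVER (ORDER BY day
                               ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS last_two
       FROM daily_goals
       ORDER BY day"""
).fetchall()

for row in frames:
    print(row)
```

The first frame grows with every row, while the second frame always covers at most two rows and slides down the result set.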
Category | Functions |
---|---|
Ranking (RANK) | RANK, DENSE_RANK, ROW_NUMBER |
Aggregate (AGGREGATE) | SUM, MAX, MIN, AVG, COUNT |
Ratio within a group | CUME_DIST, PERCENT_RANK, NTILE, RATIO_TO_REPORT |
Statistical analysis, including linear regression | CORR, COVAR_POP, COVAR_SAMP, STDDEV, STDDEV_POP, STDDEV_SAMP, VARIANCE, VAR_POP, VAR_SAMP, REGR_ (LINEAR REGRESSION), REGR_SLOPE, REGR_INTERCEPT, REGR_COUNT, REGR_R2, REGR_AVGX, REGR_AVGY, REGR_SXX, REGR_SYY, REGR_SXY |
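The three ranking functions in the first row of the table differ only in how they treat ties. A short sketch on a hypothetical `standings` table, using Python's `sqlite3` (SQLite 3.25 or later) as a stand-in for Impala:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE standings (team TEXT, goals INTEGER)")
conn.executemany(
    "INSERT INTO standings VALUES (?, ?)",
    [("A", 10), ("B", 10), ("C", 8)],  # A and B are tied
)

ranks = conn.execute(
    """SELECT team, goals,
              RANK()       OVER (ORDER BY goals DESC)       AS rnk,        -- ties share a rank, next rank is skipped
              DENSE_RANK() OVER (ORDER BY goals DESC)       AS dense_rnk,  -- ties share a rank, no gap afterwards
              ROW_NUMBER() OVER (ORDER BY goals DESC, team) AS row_num     -- always unique
       FROM standings
       ORDER BY goals DESC, team"""
).fetchall()

for row in ranks:
    print(row)
```

Team C is third by RANK, second by DENSE_RANK, and third by ROW_NUMBER.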
- PARTITION BY can take several columns together
- OVER(PARTITION BY col1, col2 ...)
- Can partition aggregate calculations, ranks, etc.
COUNT(*) OVER() : count of all rows
COUNT(*) OVER(PARTITION BY column) : count per group
MAX(column) OVER() : maximum over all rows
MAX(column) OVER(PARTITION BY column) : maximum within the group
MIN(column) OVER() : minimum over all rows
MIN(column) OVER(PARTITION BY column) : minimum within the group
SUM(column) OVER() : sum over all rows
SUM(column) OVER(PARTITION BY column) : sum within the group
AVG(column) OVER() : average over all rows
AVG(column) OVER(PARTITION BY column) : average within the group
STDDEV(column) OVER() : standard deviation over all rows
STDDEV(column) OVER(PARTITION BY column) : standard deviation within the group
RATIO_TO_REPORT(column) OVER() : current row value / SUM(values of all rows); multiply by 100 to show a percentage
RATIO_TO_REPORT(column) OVER(PARTITION BY column) : current row value / SUM(values of the group's rows); multiply by 100 to show a percentage
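The difference between an empty `OVER()` and `OVER(PARTITION BY ...)` can be sketched as below. The `season_goals` table and its values are hypothetical, the code runs on Python's `sqlite3` (SQLite 3.25 or later) rather than Impala, and because SQLite has no `RATIO_TO_REPORT`, the ratio is written out by hand as value divided by the partition sum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE season_goals (season TEXT, goals INTEGER)")
conn.executemany(
    "INSERT INTO season_goals VALUES (?, ?)",
    [("2011/2012", 2), ("2011/2012", 6), ("2012/2013", 3), ("2012/2013", 1)],
)

partitioned = conn.execute(
    """SELECT season, goals,
              COUNT(*)   OVER ()                    AS total_rows,  -- all rows
              MAX(goals) OVER (PARTITION BY season) AS season_max,  -- per group
              -- hand-written equivalent of RATIO_TO_REPORT(goals) OVER (PARTITION BY season)
              1.0 * goals / SUM(goals) OVER (PARTITION BY season) AS share
       FROM season_goals
       ORDER BY season, goals"""
).fetchall()

for row in partitioned:
    print(row)
```

`total_rows` is the same on every row, while `season_max` and `share` are recomputed for each season.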
- Aggregate functions vs analytic functions
- Aggregate functions compute the maximum, minimum, sum, average, or count per group and return one row per group
- Analytic functions resemble aggregate functions in that they compute values per group, but they differ considerably in that the aggregate result is shown on every row of the result set rather than once per group
- Analytic functions show the per-group result on each row
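The row-count difference between the two kinds of function is easy to verify. A minimal sketch with an invented `season_goals` table, run through Python's `sqlite3` (SQLite 3.25 or later) as a stand-in for Impala:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE season_goals (season TEXT, goals INTEGER)")
conn.executemany(
    "INSERT INTO season_goals VALUES (?, ?)",
    [("2011/2012", 2), ("2011/2012", 6), ("2012/2013", 3), ("2012/2013", 1)],
)

# Aggregate function with GROUP BY: one row per group.
grouped = conn.execute(
    "SELECT season, SUM(goals) FROM season_goals GROUP BY season ORDER BY season"
).fetchall()

# Analytic (window) function: every input row is kept, each carrying its group's sum.
windowed = conn.execute(
    """SELECT season, goals, SUM(goals) OVER (PARTITION BY season)
       FROM season_goals ORDER BY season, goals"""
).fetchall()

print(len(grouped), grouped)
print(len(windowed), windowed)
```

Four input rows collapse to two rows under GROUP BY, but stay four rows under the window function.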