V1 Monitoring - 100-hours-a-week/9-team-Devths-WIKI GitHub Wiki

1. Background and Goals

In V1 Monitoring, a service failure has to be judged not by "is the server dead?" but by "is the service behaving normally?". Because external API calls and polling requests converge on the same path, a DB bottleneck quickly turns into a failure that users feel, so the Active/Pending state of the connection pool (HikariCP) was chosen as the key leading indicator.

2. Problem

Basic metrics such as JVM/GC were already being collected through the CloudWatch Agent, but the HikariCP connection pool metrics, the ones tied most directly to service stability, did not appear in CloudWatch. The metrics could be queried through Actuator/JMX, but "queryable" and "sent to CloudWatch" turned out to be two separate things.

3. Resolution and Analysis

3.1 Step 1: Attempting a basic integration through the dependency and permissions

We added micrometer-registry-cloudwatch and granted the IAM permissions needed to publish to CloudWatch (e.g. PutMetricData). Since the values were visible in Actuator/JMX, we expected them to be "sent automatically", but the metrics in the CloudWatch console stayed empty. This step confirmed that adding the dependency alone does not complete the integration.

3.2 Step 2: Forcing the export path on by registering CloudWatchMeterRegistry as a manual Bean

We judged the core cause to be that Spring Boot auto-configuration alone did not give clear control over the export interval (step), the namespace, or whether the registry was enabled at all. We therefore customized CloudWatchConfig and registered CloudWatchMeterRegistry as a manual Bean to force the export mechanism on. After this change HikariCP metrics actually appeared in CloudWatch, which verified that the "collect → register in registry → export" pipeline was working.
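The manual registration described above might look roughly like the following. This is a minimal sketch, not the project's actual class: it assumes the AWS SDK v2 flavor of the registry (the `io.micrometer.cloudwatch2` package from `micrometer-registry-cloudwatch2`), and the class name, the namespace `Devths/V1`, and the one-minute step are hypothetical placeholders.

```java
import java.time.Duration;

import io.micrometer.cloudwatch2.CloudWatchConfig;
import io.micrometer.cloudwatch2.CloudWatchMeterRegistry;
import io.micrometer.core.instrument.Clock;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import software.amazon.awssdk.services.cloudwatch.CloudWatchAsyncClient;

// Hypothetical configuration class; names and values are illustrative only.
@Configuration
public class CloudWatchMetricsConfig {

    @Bean
    public CloudWatchAsyncClient cloudWatchAsyncClient() {
        // Uses the default AWS credential and region provider chains.
        return CloudWatchAsyncClient.create();
    }

    @Bean
    public CloudWatchConfig cloudWatchConfig() {
        return new CloudWatchConfig() {
            @Override
            public String get(String key) {
                // No external property source; rely on the overrides below.
                return null;
            }

            @Override
            public String namespace() {
                return "Devths/V1"; // ASSUMPTION: placeholder namespace
            }

            @Override
            public Duration step() {
                return Duration.ofMinutes(1); // ASSUMPTION: export once per minute
            }
        };
    }

    @Bean
    public CloudWatchMeterRegistry cloudWatchMeterRegistry(CloudWatchConfig config,
                                                           CloudWatchAsyncClient client) {
        // Registering the registry explicitly makes step and namespace unambiguous.
        return new CloudWatchMeterRegistry(config, Clock.SYSTEM, client);
    }
}
```

Declaring the step and namespace in code keeps the three things that were unclear under auto-configuration (interval, namespace, whether the registry exists) visible in one place.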

3.3 Step 3: Detecting the cost risk and applying a filtering strategy

์ˆ˜๋™ ๋“ฑ๋ก ์ดํ›„ ๋ฌธ์ œ๊ฐ€ ํ•˜๋‚˜ ๋” ๋“œ๋Ÿฌ๋‚จ. HikariCP๋งŒ ์˜ฌ๋ผ์˜ค๋Š” ๊ฒƒ์ด ์•„๋‹ˆ๋ผ, JVM/Tomcat/CPU/์Šค๋ ˆ๋“œ ๋“ฑ Actuator ๊ธฐ๋ฐ˜ ์ง€ํ‘œ ์ „๋ถ€๊ฐ€ CloudWatch๋กœ ์ „์†ก๋˜๋ฉด์„œ ์‚ฌ์šฉ์ž ์ •์˜ ๋ฉ”ํŠธ๋ฆญ์ด 600๊ฐœ ์ด์ƒ ์ƒ์„ฑ๋˜๋Š” ํ˜„์ƒ์ด ๋ฐœ์ƒํ•จ. CloudWatch๋Š” ์‚ฌ์šฉ์ž ์ •์˜ ๋ฉ”ํŠธ๋ฆญ์ด ๋Š˜์–ด๋‚ ์ˆ˜๋ก ๋น„์šฉ์ด ์„ ํ˜•์ ์œผ๋กœ ์ฆ๊ฐ€ํ•˜๊ธฐ ๋•Œ๋ฌธ์—, ์ด ์ƒํƒœ๋ฅผ ์œ ์ง€ํ•˜๋ฉด ์›” ์ˆ˜์‹ญ๋งŒ ์› ๋‹จ์œ„ ๊ณผ๊ธˆ ์œ„ํ—˜์ด ํ˜„์‹คํ™”๋จ. ์ด์— ๋”ฐ๋ผ ์šด์˜ ๊ด€์ ์—์„œ โ€œ๋งŽ์ด ๋ชจ์œผ๋Š” ๋ชจ๋‹ˆํ„ฐ๋งโ€์ด ์•„๋‹ˆ๋ผ โ€œํ•„์ˆ˜๋งŒ ๋ชจ์œผ๋Š” ๋ชจ๋‹ˆํ„ฐ๋งโ€์œผ๋กœ ๋ฐฉํ–ฅ์„ ์ „ํ™˜ํ•จ.

In the end we applied a MeterFilter that allows only hikaricp-related metrics and blocks everything else. This achieved both visibility (Active/Pending monitoring) and cost control (no export of unnecessary metrics).
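An allow-list filter of this kind can be sketched with Micrometer's `MeterFilter.denyUnless`. The helper class below is hypothetical, and it assumes the HikariCP meters carry the usual `hikaricp` name prefix (for example `hikaricp.connections.active` and `hikaricp.connections.pending`).

```java
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.config.MeterFilter;

// Hypothetical helper; not the project's actual class.
public final class HikariOnlyFilter {

    private HikariOnlyFilter() {
    }

    /** Denies every meter whose name does not start with "hikaricp". */
    public static MeterFilter allowOnlyHikari() {
        // ASSUMPTION: HikariCP meters are named with the "hikaricp" prefix.
        return MeterFilter.denyUnless(id -> id.getName().startsWith("hikaricp"));
    }

    /** Attaches the allow-list to a single registry, e.g. the CloudWatch one. */
    public static void applyTo(MeterRegistry registry) {
        registry.config().meterFilter(allowOnlyHikari());
    }
}
```

Applying the filter to the CloudWatch registry only, rather than to every registry, keeps the full metric set available locally through Actuator while limiting what is exported. Filters affect meters registered after they are added, so the filter should be attached before the application starts registering meters.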

4. Why We Monitored with CloudWatch Only

4.1 The goal of the V1 stage was "operational visibility in the shortest possible time"

What mattered most in V1 was not building a perfect observability platform but securing, at minimum cost and minimum complexity, enough visibility to detect failures quickly and get close to their cause. CloudWatch was already established as the default observability channel inside our AWS infrastructure, and it can be operated right away through IAM, namespaces, and dashboards, so it matched the initial goal.

4.2 Adopting Prometheus brings "infrastructure operations responsibility" with it

Choosing Prometheus goes beyond simply viewing metrics: the list of things to operate expands to servers, storage, high availability, upgrades, backups, and security. In other words, instead of just gaining a "monitoring feature", it amounts to building one more product called "monitoring platform operations". For V1, rather than adding new operational load, it was more reasonable to concentrate on CloudWatch, which blends into the existing AWS operations flow.

4.3 From a cost perspective, Prometheus could not be assumed to always be cheaper

Prometheus does not charge per metric, but it incurs costs for cluster resources (memory/disk) and for the retention policy (Thanos, Mimir, and the like for long-term storage). In V1, "immediate failure detection" took priority over "long-term analysis", and narrowing the required metrics down to HikariCP kept the CloudWatch cost under control as well, so CloudWatch was the more realistic choice.

4.4 Conclusion: a "CloudWatch for V1, Prometheus at the scaling stage" strategy was the right fit

At the current stage we secured operational stability by first completing a minimal observability setup in CloudWatch centered on the key metrics. A natural roadmap is to expand to Prometheus (or Managed Prometheus) later, at the point where traffic growth makes SLI/SLO-based long-term analysis, high-resolution metrics and higher cardinality, and detailed cross-service comparison necessary.

5. Summary of How It Is Applied in Operations

The key HikariCP metrics (mainly Active and Pending) are exported to CloudWatch, and dashboard and alarm criteria are set from the standpoint of "early detection of DB bottlenecks". Unnecessary metrics are blocked with a MeterFilter so that monitoring does not degrade into cost debt.
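The page does not state the actual alarm thresholds, so the following is only a sketch of one plausible rule for "early detection of DB bottlenecks": raise an alarm when the Pending count stays above a threshold for several consecutive datapoints. The class name, the threshold of 0, and the window of 3 datapoints are all assumptions.

```java
import java.util.List;

// Hypothetical illustration of a "consecutive breaching datapoints" alarm rule.
public class PendingAlarmSketch {

    /**
     * Returns true when each of the most recent `periods` samples is
     * strictly greater than `threshold`.
     */
    public static boolean shouldAlarm(List<Integer> pendingSamples, int threshold, int periods) {
        if (periods <= 0 || pendingSamples.size() < periods) {
            return false; // not enough data to evaluate the window
        }
        List<Integer> recent =
                pendingSamples.subList(pendingSamples.size() - periods, pendingSamples.size());
        return recent.stream().allMatch(v -> v > threshold);
    }

    public static void main(String[] args) {
        // Threads kept waiting for a connection over the last three datapoints.
        System.out.println(shouldAlarm(List.of(0, 0, 2, 5, 7), 0, 3)); // true
        // Short spikes that clear on their own do not trigger the alarm.
        System.out.println(shouldAlarm(List.of(0, 4, 0, 1, 0), 0, 3)); // false
    }
}
```

Requiring several consecutive breaching datapoints trades a little detection latency for fewer alarms on momentary spikes; the right window depends on the export step and on how quickly pool exhaustion becomes visible to users.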

6. Lessons Learned

We confirmed that "adding a dependency" does not equal "automatic integration": in a cloud integration, the service itself has to control the export interval, registry activation, permissions, and namespace. We also came to feel that monitoring is not about accumulating as much as possible, but is a design activity of selecting the key metrics that let us detect real failures in our service earlier. The lesson from V1 is that monitoring only becomes sustainable in operations when cost is part of the design.