Metric Design - 3sam5oh/webtoon-search-service GitHub Wiki

Sharing what we worked through while designing metrics for Test Monitoring.

🔎 Metrics

  • Data of various types collected to monitor a system's performance and state. The data is expressed numerically and gives quantitative information about different aspects of the system.
  • Tools such as Prometheus make this information usable for monitoring by collecting, storing, and querying it and by alerting on it.

💊 USE / RED

USE and RED are two metric design methodologies.
USE proposes a way to design metrics centered on physical hardware (servers),
and RED proposes a way to design metrics at the application level.

📒 USE

For every resource, check utilization, saturation, and errors.
  • Understood here as a checklist used mainly for collecting metrics on physical resources.
  • Resource : every functional component of a physical server
  • Utilization : the average time the resource was busy servicing work
    -> High utilization = work may be piling up unprocessed (a bottleneck, for example).
  • Saturation : the degree of extra work the resource could not service, i.e. work beyond what the resource can handle (often queued).
  • Errors : the count of error events
  • These three indicators make low-level monitoring possible (a server's physical resources, the network).
    CPU utilization is low but saturation is high? -> Threads or processes are badly distributed, so work is piling up on a single CPU core. -> One possible fix is to use a thread pool to spread out the work that was concentrated on one thread.
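The low-utilization, high-saturation case can be illustrated with per-core numbers. This is only a sketch: the function name `find_hot_cores`, the 0.9 threshold, and the sample values are assumptions made for illustration and do not come from the project.

```python
from statistics import mean

def find_hot_cores(per_core_utilization, threshold=0.9):
    """Return the indexes of cores whose utilization is at or above the threshold."""
    return [i for i, u in enumerate(per_core_utilization) if u >= threshold]

# Hypothetical sample: one core is pinned while the other three are nearly idle.
cores = [0.98, 0.05, 0.04, 0.03]
average = mean(cores)        # the overall figure looks low
hot = find_hot_cores(cores)  # but a single core carries almost all the work
print(f"average={average:.3f} hot_cores={hot}")
```

An average alone would hide the busy core, which is why the per-resource view matters in the USE method.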
Pros and cons
  • Pros : 'it solves about 80% of server issues with 5% of the effort', 'it can be applied to systems other than servers'. Metrics designed with the USE method apply to many kinds of systems, not just servers, and deliver a large performance payoff for little effort.
  • Cons : It yields a wide range of information, but it is hard to rely on in an MSA (microservice architecture) because it is too abstract there. It can catch that a problem is occurring, but because these are low-level (infrastructure-level) metrics, they cannot tell which service of which application is causing it.

brendangregg

📕 RED

“The USE Method doesn’t really apply to services; it applies to hardware, network disks, things like this,”
“We really wanted a microservices-oriented monitoring philosophy, so we came up with the RED Method.”
“Emphasizes monitoring the request rate (Rate), the error rate (Error), and the duration (Duration) for every resource”
  • Where the USE method emphasizes collecting low-level metrics, the RED method emphasizes application-level metrics
  • Resource : can be read as a Service of the Application
  • Rate : throughput. The speed at which the application handles requests. ex) HTTP requests per second.
  • Error : failures. The proportion of requests that fail. ex) share of HTTP 500 errors
  • Duration : processing time. The time it takes to handle a request. ex) HTTP request time.
  • The RED method can be used to keep track of things like an SLA (Service Level Agreement).
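As a rough illustration of the three RED signals, the sketch below derives them from a list of handled requests in one time window. The `Request` type, the `red_summary` function, and the sample data are hypothetical, and counting only status codes of 500 and above as errors is an assumption of this example.

```python
from dataclasses import dataclass

@dataclass
class Request:
    status: int      # HTTP status code of the response
    seconds: float   # time taken to handle the request

def red_summary(requests, window_seconds):
    """Compute Rate, Error ratio, and mean Duration for one time window."""
    total = len(requests)
    errors = sum(1 for r in requests if r.status >= 500)
    return {
        "rate_per_second": total / window_seconds,
        "error_ratio": errors / total if total else 0.0,
        "mean_duration_seconds": sum(r.seconds for r in requests) / total if total else 0.0,
    }

# Hypothetical 60-second window containing four requests.
sample = [Request(200, 0.040), Request(200, 0.060), Request(500, 0.120), Request(200, 0.020)]
summary = red_summary(sample, window_seconds=60)
print(summary)
```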

The RED Method

📝 Metrics Design

Choosing metrics that fit our system, with the USE/RED methods as a reference

1. Points considered while designing the metrics

Which metrics should we collect?

1) As a baseline, every component of the system's architecture must be monitored.

  • Beyond the metrics each component provides out of the box, also collect metrics at the points where the component and the application communicate (request count, request time, error count).

2) Metrics must be collectable per service path (to find out at which point in the service a problem occurs).

  • Use tags to separate the metrics collected for each path. (Metrics with the same name are still distinguished by their tags.)
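The idea that one metric name plus different tags yields separate series can be sketched with a plain dictionary. This is not how any particular metrics library stores its data; `increment`, the `series` counter, and the `/webtoons/detail` path are hypothetical names used only for illustration.

```python
from collections import Counter

# Each time series is identified by its metric name plus its sorted tag pairs.
series = Counter()

def increment(name, **tags):
    """Increase the counter identified by the metric name and its tags."""
    key = (name, tuple(sorted(tags.items())))
    series[key] += 1
    return key

# Same metric name, different path tags: two separate series.
increment("search.request.count", path="/webtoons/search")
increment("search.request.count", path="/webtoons/search")
increment("search.request.count", path="/webtoons/detail")

for (name, tags), value in sorted(series.items()):
    print(name, dict(tags), value)
```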

3) Both the overall count and the per-method call count need to be collected.

  • Isn't this needed to see how much traffic lands on which method in production?
  • For a method that draws heavy traffic, we could attach additional architecture components to improve performance.

4) Separately from the http request time, we will also need each method's response time.

5) Wouldn't it be better to design the metrics with the emphasis on RED rather than USE?

  • The USE-type metrics that describe the infrastructure are already provided by the default actuator metrics and the like.
  • Metrics that describe the service level, on the other hand, are not provided (at most there are the http request metrics, and those only describe requests as a whole, so monitoring at the level of each service is not possible).
  • Following the RED method, think through, for each service a user touches, the request count (Rate), the error count and where the errors occur (Error), and the processing time (is the problem in the Spring code itself, or in a connected component?) (Duration), and design from there.

2. Systems to monitor


  • Backend App (Spring Actuator)
  • Frontend App (Node Exporter)
  • Nginx (Nginx Exporter)
  • Opensearch (Opensearch Exporter)
  • MySQL (MySQL Exporter)
  • Redis (Redis Exporter)
  • Node: monitors the WorkerNodes on which Spring Boot, Vue.js, and so on run (Node Exporter)

3. Metrics Spring Actuator provides by default

{
    "names": [
        # Time taken for the application to be ready to service requests
        "application.ready.time",
        # Time taken to start the application
        "application.started.time",
        # Free disk space
        "disk.free",
        # Total disk capacity
        "disk.total",
        # Number of executor tasks currently active
        "executor.active",
        # Number of completed executor tasks
        "executor.completed",
        # Core size of the executor pool
        "executor.pool.core",
        # Maximum size of the executor pool
        "executor.pool.max",
        # Current size of the executor pool
        "executor.pool.size",
        # Remaining capacity of the executor queue (how many more tasks it can accept)
        "executor.queue.remaining",
        # Number of tasks currently waiting in the executor queue
        "executor.queued",
        # Total number of HTTP server requests (a timer, so it also records duration)
        "http.server.requests",
        # Number of HTTP server requests currently being handled
        "http.server.requests.active",
        # Number of JVM buffers.
        "jvm.buffer.count",
        # Amount of JVM buffer memory in use.
        "jvm.buffer.memory.used",
        # Total capacity of the JVM buffers.
        "jvm.buffer.total.capacity",
        # Number of classes loaded in the JVM
        "jvm.classes.loaded",
        # Number of classes unloaded from the JVM
        "jvm.classes.unloaded",
        # Time the JVM has spent on compilation
        "jvm.compilation.time",
        # Size of the objects still live after GC (Garbage Collection)
        "jvm.gc.live.data.size",
        # Maximum size the long-lived heap memory pool can reach
        "jvm.gc.max.data.size",
        # Amount of memory allocated, as tracked between GCs
        "jvm.gc.memory.allocated",
        # Amount of memory promoted by GC
        "jvm.gc.memory.promoted",
        # GC overhead.
        "jvm.gc.overhead",
        # GC pause time.
        "jvm.gc.pause",
        # General information about the JVM, exposed as strings
        "jvm.info",
        # Amount of memory currently committed (reserved) for the JVM
        "jvm.memory.committed",
        # Maximum amount of memory the JVM can use
        "jvm.memory.max",
        # JVM memory usage ratio after garbage collection
        "jvm.memory.usage.after.gc",
        # Amount of memory the JVM is currently using
        "jvm.memory.used",
        # Number of daemon (background) threads currently running
        "jvm.threads.daemon",
        # Number of threads currently live
        "jvm.threads.live",
        # Peak number of live threads while the application has been running
        "jvm.threads.peak",
        # Total number of threads started while the application has been running
        "jvm.threads.started",
        # Number of threads in each thread state
        "jvm.threads.states",
        # Total number of Logback logging events
        # The value increases every time the application logs something
        "logback.events",
        # CPU usage of the current process
        "process.cpu.usage",
        # Time at which the process started
        "process.start.time",
        # How long the process has been running
        "process.uptime",
        # Number of CPU cores available to the system
        "system.cpu.count",
        # CPU usage of the whole system
        "system.cpu.usage",
        # Number of Tomcat sessions currently active
        "tomcat.sessions.active.current",
        # Maximum number of Tomcat sessions active at the same time
        "tomcat.sessions.active.max",
        # Longest time an expired Tomcat session stayed alive
        "tomcat.sessions.alive.max",
        # Total number of Tomcat sessions created by the application
        "tomcat.sessions.created",
        # Total number of expired Tomcat sessions
        "tomcat.sessions.expired",
        # Total number of rejected Tomcat sessions
        "tomcat.sessions.rejected"
    ]
}

4. Metrics we designed

  • Round 1
1. Go through the overall list of metrics to check
2. Work out how to collect Error metrics for USE and for RED
*Spring
USE
U
(disk.total - disk.free) / disk.total ( (total capacity - free capacity) / total capacity )
system.cpu.usage (CPU usage of the whole system)
process.cpu.usage (CPU usage of the process)
jvm.memory.used / jvm.memory.max (JVM memory usage)
executor.active / executor.pool.size (current pool usage)

S
executor.queued / (executor.queued + executor.queue.remaining) ( how full the thread pool's queue is )
executor.queued (number of tasks currently waiting in the thread pool's queue)
jvm.gc.overhead

E
?? Which of the default metrics could be used to count errors
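The U and S items for Spring can be computed from raw Actuator values. What follows is a minimal sketch, assuming the values have already been fetched (for example from /actuator/metrics). The function `use_snapshot`, the helper `ratio`, the dictionary layout, and the sample numbers are hypothetical. Taking memory usage as used over max and pool usage as active threads over pool size is this sketch's own choice.

```python
def ratio(part, whole):
    """Divide safely; a zero denominator yields 0.0 instead of an exception."""
    return part / whole if whole else 0.0

def use_snapshot(m):
    """Derive USE-style figures from a dict of Actuator metric values."""
    return {
        "disk_utilization": ratio(m["disk.total"] - m["disk.free"], m["disk.total"]),
        "jvm_memory_utilization": ratio(m["jvm.memory.used"], m["jvm.memory.max"]),
        "executor_utilization": ratio(m["executor.active"], m["executor.pool.size"]),
        "executor_saturation": m["executor.queued"],  # tasks waiting for a thread
    }

# Hypothetical values, shaped like readings of the metrics listed in section 3.
values = {
    "disk.total": 100e9, "disk.free": 25e9,
    "jvm.memory.used": 256e6, "jvm.memory.max": 1024e6,
    "executor.active": 6, "executor.pool.size": 8, "executor.queued": 3,
}
snapshot = use_snapshot(values)
print(snapshot)
```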

RED
R
http.server.requests (total number of HTTP requests)
http.server.requests.active ( number of HTTP requests currently being handled )
-> Taken per 1-minute window, (total HTTP requests - HTTP requests in progress) / total HTTP requests
should show what share of requests has completed, and so how many are still outstanding

E
??

D
??
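For the R item, the request count is cumulative, so a per-minute rate has to come from the difference between two samples and not from the running total itself. The sketch below assumes samples arrive as (timestamp in seconds, cumulative count) pairs. The function name and sample values are hypothetical, and counter resets (for example after a restart) are deliberately ignored.

```python
def per_minute_increase(samples):
    """Return requests per minute between each pair of consecutive samples.

    samples: list of (timestamp_seconds, cumulative_count), ordered by time.
    """
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        rates.append((c1 - c0) / (t1 - t0) * 60)
    return rates

# Hypothetical scrapes taken one minute apart.
samples = [(0, 100), (60, 160), (120, 190)]
rates = per_minute_increase(samples)
print(rates)
```

In a Prometheus setup this calculation would normally be left to a query over the scraped counter; the code only shows the arithmetic behind it.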


Opensearch
USE
U
CPU_Utilization (CPU usage)
Disk_Utilization (disk usage)
Heap_Used / Heap_Max (memory usage)
IO_ReadThroughput (amount of data read from disk over the last 5 seconds)

S
(ThreadPool_TotalThreads - ThreadPool_ActiveThreads) / ThreadPool_TotalThreads
(share of the thread pool still free)

E
Paging_MajfltRate (major page faults per second)
Paging_MinfltRate (minor page faults per second)

RED
R
HTTP_TotalRequest 
HTTP_RequestDocs
Disk_ServiceRate

E
ThreadPool_RejectedReqs (number of rejected executions)

D
Disk_WaitTime (average disk read/write response time over the last 5 seconds)
GC_Collection_Time
  • Round 2
1. Narrow the scope to Spring Boot metrics, leaving OpenSearch out (collect metrics from other components too where needed)
2. Need to check that the custom metrics are being applied properly
3. Look into what further metrics there could be (for example Redis response time and connection error metrics once Redis is connected)

search.request.count(controller)

  • Checks the total number of requests that reach this adapter (the controller)

  • Gives a view of the core business logic

  • Example output

# HELP search_request_count_total  
# TYPE search_request_count_total counter
search_request_count_total{application="webtoon-search",class="search-webtoon-controller",exception="IllegalArgumentException",method="searchWebtoon",result="failure"} 4.0
search_request_count_total{application="webtoon-search",class="search-webtoon-controller",exception="none",method="searchWebtoon",result="success"} 3.0
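The failure share can be read straight from these two samples. The sketch below parses labelled sample lines with a simplified regular expression. It is an assumption-laden illustration, not a full parser for the exposition format (it does not handle escaped quotes inside label values, for instance), and `parse_samples` is a hypothetical helper.

```python
import re

SAMPLE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')

def parse_samples(text):
    """Parse labelled sample lines, skipping blank lines and # HELP / # TYPE comments."""
    out = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = SAMPLE.match(line)
        if m:
            out.append((m["name"], dict(LABEL.findall(m["labels"])), float(m["value"])))
    return out

text = '''
# TYPE search_request_count_total counter
search_request_count_total{application="webtoon-search",class="search-webtoon-controller",exception="IllegalArgumentException",method="searchWebtoon",result="failure"} 4.0
search_request_count_total{application="webtoon-search",class="search-webtoon-controller",exception="none",method="searchWebtoon",result="success"} 3.0
'''
samples = parse_samples(text)
failures = sum(v for _, labels, v in samples if labels["result"] == "failure")
failure_ratio = failures / sum(v for _, _, v in samples)
print(f"failure ratio: {failure_ratio:.3f}")
```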

search.request.duration(controller)

  • Checks how long the webtoon search method takes to return its response

  • Captures the overall response time from the handler method through to OpenSearch

  • If nothing looks wrong here but the end-to-end response time is growing, the Tomcat ~ Nginx ~ LB ~ Client part of the architecture becomes the suspect

  • Example output

# HELP search_request_duration_seconds duration until search webtoon list
# TYPE search_request_duration_seconds summary
search_request_duration_seconds_count{application="webtoon-search",class="com.samsamohoh.webtoonsearch.adapter.web.SearchWebtoonController",endpoint="/webtoons/search",exception="IllegalArgumentException",method="searchWebtoon"} 4
search_request_duration_seconds_sum{application="webtoon-search",class="com.samsamohoh.webtoonsearch.adapter.web.SearchWebtoonController",endpoint="/webtoons/search",exception="IllegalArgumentException",method="searchWebtoon"} 7.155E-4
search_request_duration_seconds_count{application="webtoon-search",class="com.samsamohoh.webtoonsearch.adapter.web.SearchWebtoonController",endpoint="/webtoons/search",exception="none",method="searchWebtoon"} 3
search_request_duration_seconds_sum{application="webtoon-search",class="com.samsamohoh.webtoonsearch.adapter.web.SearchWebtoonController",endpoint="/webtoons/search",exception="none",method="searchWebtoon"} 0.1211729
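A summary exposes only a running sum and count, so the average duration per label set is the sum divided by the count. The sketch below applies that to the values in the example output; `mean_seconds` is a hypothetical helper name.

```python
def mean_seconds(total_seconds, count):
    """Average duration of a summary: _sum divided by _count."""
    return total_seconds / count if count else 0.0

# Values copied from the example output.
failed = mean_seconds(7.155e-4, 4)      # requests that raised IllegalArgumentException
succeeded = mean_seconds(0.1211729, 3)  # requests that completed normally
print(f"failed: {failed * 1000:.3f} ms, succeeded: {succeeded * 1000:.3f} ms")
```

Because these are averages over the whole lifetime of the process, a per-window view would need the differences between two scrapes instead.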

search.condition.null.count(service)

  • Checks whether the search-term object received from the controller is null or empty

  • When a case slips past the first line of handling on the frontend, an error is returned and then counted

  • This may turn out to be a meaningless metric (what would counting it actually gain us?)

  • Example output

# HELP search_condition_null_count_total title is null
# TYPE search_condition_null_count_total counter
search_condition_null_count_total{application="webtoon-search",class="search-webtoon-service",endpoint="/webtoons/search",method="search-webtoons"} 4.0

search.opensearch.reply.duration(adapter)

  • Checks the response time between the adapter and OpenSearch

  • Example output

# HELP search_opensearch_reply_duration_seconds duration until opensearch reply
# TYPE search_opensearch_reply_duration_seconds summary
search_opensearch_reply_duration_seconds_count{application="webtoon-search",class="com.samsamohoh.webtoonsearch.adapter.searchengine.SearchEngineAdapter",endpoint="/webtoons/search",exception="none",method="loadWebtoons"} 3
search_opensearch_reply_duration_seconds_sum{application="webtoon-search",class="com.samsamohoh.webtoonsearch.adapter.searchengine.SearchEngineAdapter",endpoint="/webtoons/search",exception="none",method="loadWebtoons"} 0.1199463
# HELP search_opensearch_reply_duration_seconds_max duration until opensearch reply
# TYPE search_opensearch_reply_duration_seconds_max gauge
search_opensearch_reply_duration_seconds_max{application="webtoon-search",class="com.samsamohoh.webtoonsearch.adapter.searchengine.SearchEngineAdapter",endpoint="/webtoons/search",exception="none",method="loadWebtoons"} 0.0971084
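Comparing this sum with the controller-level duration sum for the same successful requests gives a rough idea of where the time goes. The sketch below uses the two `_sum` values from the example outputs, both covering the same 3 requests; `time_outside` is a hypothetical helper, and the comparison only holds if both timers really do cover the same calls.

```python
def time_outside(outer_sum, inner_sum):
    """Seconds spent inside the outer span but not inside the inner one."""
    return outer_sum - inner_sum

controller_sum = 0.1211729   # search_request_duration_seconds_sum, exception="none"
opensearch_sum = 0.1199463   # search_opensearch_reply_duration_seconds_sum

outside = time_outside(controller_sum, opensearch_sum)
share = opensearch_sum / controller_sum
print(f"outside OpenSearch: {outside:.7f}s, OpenSearch share: {share:.1%}")
```

With these numbers almost all of the controller time is spent waiting for OpenSearch, so the Spring code itself is an unlikely bottleneck.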

opensearch.connection.fail.count(adapter)

  • Checks for failed responses from OpenSearch

  • Counts how often OpenSearch failed to respond

  • A high count points to a network problem between Spring and OpenSearch, or to a problem in OpenSearch itself.

  • Example output

# HELP opensearch_connection_fail_count_total metrics for opensearch connecting failure
# TYPE opensearch_connection_fail_count_total counter
opensearch_connection_fail_count_total{application="webtoon-search",class="search-engine-adapter",endpoint="/webtoons/search",method="load-webtoons"} 2.0