k8s networking guide - heshed/aPaaS GitHub Wiki

https://morioh.com/p/ecb38c8342ba

Brief summary

part 1

Kubernetes Networking Model

  • Every Pod has its own unique IP.
  • The Pod IP is shared by all containers in the Pod, and it is routable from all other Pods.
  • The "pause" containers you may see running on k8s nodes are called sandbox containers. Their only job is to reserve and hold the network namespace (netns) that is shared by all containers in the pod. ( https://www.ianlewis.org/en/almighty-pause-container )
  • Because the pause container holds the netns, the pod IP does not change even if an application container dies and is replaced by a new one.
  • A big advantage of the IP-per-pod model is that there are never IP/port collisions with the host, and you don't have to worry about which ports applications use.
  • The only requirement Kubernetes imposes is that Pod IPs are routable/accessible from all other pods, regardless of which node they run on.

Intra-node communication

  • k8s node root network namespace

The root netns contains eth0, the node's main network interface.

  • pods

Similarly, each pod has its own netns, connected to the root netns by a virtual Ethernet (veth) pair. This is basically a pipe-pair: one end sits in the root netns, the other in the pod netns.

The pod-side end is named eth0. The pod knows nothing about the host and sees this as its own root network setup. The other end, in the root netns, is named vethxxx.

You can list all the network interfaces on your node with ifconfig or ip a.

All pods on a node talk to each other through a Linux Ethernet bridge named cbr0. Docker uses a similar bridge named docker0. You can list the bridges with the brctl show command.

ํŒจํ‚ท์ด pod1์—์„œ pod2๋กœ ๊ฐ„๋‹ค๊ณ  ํ•ด๋ณด์ž.

  1. ํŒจํ‚ท์ด pod1 netns eth0์„ ๋– ๋‚˜ root netns์˜ vethxxx๋กœ ๋“ค์–ด๊ฐ„๋‹ค.
  2. ARP ๋ฆฌํ€˜์ŠคํŠธ๋ฅผ ์‚ฌ์šฉํ•ด์„œ ๋ชฉ์ ์ง€๋ฅผ ๋ฐœ๊ฒฌํ•˜๋Š” cbr0์—๊ฒŒ ๋˜์ง„๋‹ค "who has this IP?"
  3. vethyyy ๊ฐ€ ์‘๋‹ตํ•œ๋‹ค. ๋ธŒ๋ฆฟ์ง€๋Š” ํŒจํ‚ท์„ ์–ด๋””๋กœ ๋ณด๋‚ด์•ผํ• ์ง€ ์ด์ œ ์•Œ๊ฒŒ ๋˜์—ˆ๋‹ค.
  4. ํŒจํ‚ท์ด vethyyy๋กœ ๋„๋‹ฌํ•˜๋ฉด, pipe-pair๋ฅผ ๊ฑฐ์ณ์„œ pod2์˜ netns์— ๋„์ฐฉํ•œ๋‹ค.

docker๋„ ๋™์ผํ•œ ๋ฐฉ์‹์œผ๋กœ ๋™์ž‘ํ•œ๋‹ค.
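The bridge's part in the steps above can be modeled as a lookup from destination IP to the veth that answered the ARP request. This is a toy sketch of the description above, not of the kernel bridge itself (the real bridge forwards Ethernet frames by learned MAC address). The `Bridge` class, the IPs, and the port names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Bridge:
    """Toy model of cbr0: each veth port leads to one pod netns."""

    # Hypothetical mapping: veth name -> IP of the pod at the other end of the pipe-pair.
    ports: dict[str, str]
    # Answers to earlier ARP requests: IP -> veth name.
    arp_cache: dict[str, str] = field(default_factory=dict)

    def arp_request(self, ip: str) -> str | None:
        """'Who has this IP?': the port whose pod owns the IP answers."""
        for veth, pod_ip in self.ports.items():
            if pod_ip == ip:
                self.arp_cache[ip] = veth
                return veth
        return None  # no pod on this node has the IP

    def forward(self, dst_ip: str) -> str | None:
        """Return the veth to send the packet out of, asking via ARP on a cache miss."""
        return self.arp_cache.get(dst_ip) or self.arp_request(dst_ip)


cbr0 = Bridge(ports={"vethxxx": "10.244.1.2", "vethyyy": "10.244.1.3"})
print(cbr0.forward("10.244.1.3"))  # vethyyy (pod1 -> pod2)
print(cbr0.forward("10.244.2.5"))  # None: no pod on this node has that IP
```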

Inter-node communication

Pods must also be reachable across nodes, and Kubernetes doesn't care how that is done: L2 (ARP across nodes), L3 (IP routing, e.g. the route tables a cloud provider offers), an overlay network, or even carrier pigeons ( https://en.wikipedia.org/wiki/IP_over_Avian_Carriers ) (IP delivered by carrier pigeon, an April Fools' joke RFC). Every node is assigned its own unique CIDR block of pod IPs, so pod IPs never collide with those of pods on other nodes.
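A minimal sketch of per-node pod CIDRs, using Python's standard `ipaddress` module. The cluster range `10.244.0.0/16`, the `/24` per node, and the `allocate_node_cidrs` helper are assumptions for illustration; in a real cluster the allocation is done by Kubernetes or the network plugin.

```python
from __future__ import annotations

import ipaddress


def allocate_node_cidrs(
    cluster_cidr: str, nodes: list[str], prefix: int = 24
) -> dict[str, ipaddress.IPv4Network | ipaddress.IPv6Network]:
    """Hand each node its own block of pod IPs carved from the cluster range."""
    blocks = ipaddress.ip_network(cluster_cidr).subnets(new_prefix=prefix)
    allocation = {}
    for node in nodes:
        block = next(blocks, None)  # blocks never overlap, so pod IPs never collide
        if block is None:
            raise ValueError(f"{cluster_cidr} has no /{prefix} block left for {node}")
        allocation[node] = block
    return allocation


cidrs = allocate_node_cidrs("10.244.0.0/16", ["node1", "node2"])
print({node: str(block) for node, block in cidrs.items()})
# {'node1': '10.244.0.0/24', 'node2': '10.244.1.0/24'}
print(cidrs["node1"].overlaps(cidrs["node2"]))  # False
```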

In most cases, especially in cloud environments, the cloud provider's route tables make sure packets reach the right destination. Many different network plugins exist to do this job as well.

Now consider two nodes, each with its own network namespaces, network interfaces, and bridge.

Suppose a packet is going from pod1 to pod4, which runs on a different node.

  1. ํŒจํ‚ท์ด pod1 netns eth0์—์„œ root netns vethxxx ๋กœ ํ–ฅํ•œ๋‹ค.
  2. cbr0์„ ํ†ต๊ณผํ•˜๋Š”๋ฐ, cbr0์€ ๋ชฉ์ ์ง€๋ฅผ ์ฐพ๋Š” ARP ๋ฆฌํ€˜์ŠคํŠธ๋ฅผ ์ƒ์„ฑํ•œ๋‹ค.
  3. ํŒจํ‚ท์ด cbr0์„ ๋– ๋‚˜ ๋ฉ”์ธ ๋„คํŠธ์›Œํฌ ์ธํ„ฐํŽ˜์ด์Šค eth0์œผ๋กœ ๊ฐ„๋‹ค. ๋ˆ„๊ตฌ๋„ pod4 IP๋ฅผ ๊ฐ€์ง„ ๋…ธ๋“œ๊ฐ€ ์—†๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค.
  4. node1 ๋จธ์‹ ์„ ๋– ๋‚œ๋‹ค. src=pod1 ์—์„œ dst=pod4 ์ •๋ณด๋ฅผ ์—ฎ์–ด์„œ
  5. ๋ผ์šฐํŠธ ํ…Œ์ด๋ธ”์€ ๊ฐ ๋…ธ๋“œ์˜ CIDR๋ธ”๋ก์„ ์…‹์—…ํ•˜๊ณ  ์žˆ๊ณ , ํŒจํ‚ท์„ ์–ด๋–ค ๋…ธ๋“œ์˜ CIDR ๋ธ”๋ก์ด pod4 IP๋ฅผ ํฌํ•จํ•˜๊ณ  ์žˆ๋Š”์ง€ ์ฐพ๋Š”๋‹ค.
  6. ํŒจํ‚ท์ด ๋ฉ”์ธ ๋„คํŠธ์›Œํฌ ์ธํ„ฐํŽ˜์ด์Šค eth0์˜ node2์— ๋„์ฐฉํ•œ๋‹ค.
  7. pod4๊ฐ€ eth0์˜ IP๊ฐ€ ์•„๋‹ˆ๋”๋ผ๋„, ๋…ธ๋“œ๊ฐ€ IP์™€ ์—ฎ์—ฌ ์„ค์ •๋˜์–ด ์žˆ์œผ๋ฏ€๋กœ ํŒจํ‚ท์€ cbr0์—๊ฒŒ ํฌ์›Œ๋”ฉ๋œ๋‹ค.
  8. ๋…ธ๋“œ์˜ ๋ผ์šฐํŒ… ํ…Œ์ด๋ธ” pod4 IP์™€ ๋งค์นญ๋˜๋Š”์ง€ ํ™•์ธํ•œ๋‹ค. ๊ณง ๋…ธ๋“œ์˜ CIDR๋ธ”๋ก์˜ ๋ชฉ์ ์ง€๋กœ์„œ cbr0๋ฅผ ์ฐพ์•„๋‚ธ๋‹ค.
  9. route -n ๋ช…๋ น์–ด๋ฅผ ์ด์šฉํ•˜์—ฌ ๋…ธ๋“œ ๋ผ์šฐํŠธ ํ…Œ์ด๋ธ” ๋ฆฌ์ŠคํŠธ๋ฅผ ๋ณผ ์ˆ˜ ์žˆ๋‹ค. cbr0์˜ ๋ผ์šฐํŠธ๋ฅผ ๋ณด์—ฌ์ค„ ๊ฒƒ์ด๋‹ค.

  1. ๋ธŒ๋ฆฟ์ง€๊ฐ€ ํŒจํ‚ท์„ ๊ฐ€์ ธ๊ฐ€๊ณ  APR ๋ฆฌํ€˜์ŠคํŠธ๋ฅผ ํ•˜์—ฌ vethyyy IP๋ฅผ ์ฐพ๋Š”๋‹ค.
  2. ํŒจํ‚ท์ด pipe-pair ๋ฅผ ๊ฑฐ์ณ pod4์— ๋„์ฐฉํ•œ๋‹ค.

part 2

Let's look at how overlay networks work. We'll also see how the ever-changing pods are abstracted away from the apps running on k8s.

Overlay networks

์˜ค๋ฒ„๋ ˆ์ด ๋„คํŠธ์›Œํฌ๋Š” ํ•„์ˆ˜์˜ต์…˜์€ ์•„๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ์ง€๋งŒ ํŠน์ •ํ•œ ์ƒํ™ฉ์—์„œ ๋„์›€์ด ๋œ๋‹ค. IP ๊ณต๊ฐ„์ด ์ถฉ๋ถ„ํ•˜์ง€ ์•Š์„ ๊ฒฝ์šฐ, ๋„คํŠธ์›Œํฌ๊ฐ€ ์™ธ๋ถ€์˜ routes๋ฅผ ๋‹ค๋ฃจ์ง€ ๋ชปํ• ๋•Œ์™€ ๊ฐ™์€ ์ƒํ™ฉ์ด๋‹ค. ๋˜๋Š” ์˜ค๋ฒ„๋ ˆ์ด๊ฐ€ ์ œ๊ณตํ•˜๋Š” ์–ด๋–ค ๋ถ€๊ฐ€์ ์ธ ์šด์˜ ํ”ผ์ฒ˜๋“ค์„ ํ•„์š”๋กœ ํ• ์ˆ˜๋„ ์žˆ๋‹ค. ๊ณตํ†ต์ ์œผ๋กœ ๋ณด๋Š” ์ผ€์ด์Šค๋Š” ํด๋ผ์šฐ๋“œ ํ”„๋กœ๋ฐ”์ด๋”๊ฐ€ ๋ผ์šฐํŠธ ํ…Œ์ด๋ธ”์„ ์ œ๊ณตํ•  ์ˆ˜ ์žˆ๋Š” ๋ผ์šฐํŠธ ์ œํ•œ(limit)์ˆ˜์น˜๊ฐ€ ์–ผ๋งˆ์ธ๊ฐ€์ด๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด AWS ๋ผ์šฐํŠธ ํ…Œ์ด๋ธ”์€ 50 routes ๊นŒ์ง€๋Š” ๋„คํŠธ์›Œํฌ ์„ฑ๋Šฅ์— ์˜ํ–ฅ์„ ์ฃผ์ง€ ์•Š๋Š”๋‹ค. ๊ทธ๋Ÿฌ๋ฏ€๋กœ k8s ๋…ธ๋“œ๊ฐ€ 50๊ฐœ๋ฅผ ๋„˜์–ด์„ค ๊ฒฝ์šฐ์— AWS ๋ผ์šฐํŠธ ํ…Œ์ด๋ธ”์€ ์ถฉ๋ถ„์น˜ ์•Š์„ ๊ฒƒ์ด๋‹ค. ์ด๋Ÿฌํ•œ ๊ฒฝ์šฐ์— ์˜ค๋ฒ„๋ ˆ์ด ๋„คํŠธ์›Œํฌ๊ฐ€ ๋„์›€์ด ๋œ๋‹ค.

๋…ธ๋“œ ์‚ฌ์ด์— ๋„ค์ดํ‹ฐ๋ธŒ ๋„คํŠธ์›Œํฌ๋ฅผ ํšก๋‹จํ•˜๋Š” ๊ฒฝ์šฐ ํŒจํ‚ท-๋‚ด-ํŒจํ‚ท encapsulating ์ด ํ•„์ˆ˜์ ์ด๋‹ค. ๋ชจ๋“  ํŒจํ‚ท์„ ์บก์А-๋””์บก์Аํ™” ํ•ด์•ผํ•˜๋Š” latency์™€ ๋ณต์žก์„ฑ์˜ ์˜ค๋ฒ„ํ—ค๋“œ ๋•Œ๋ฌธ์— ์˜ค๋ฒ„๋ ˆ์ด ๋„คํŠธ์›Œํฌ ์‚ฌ์šฉ์„ ์›ํ•˜์ง€ ์•Š์„ ์ˆ˜๋„ ์žˆ๋‹ค. ์˜ค๋ฒ„๋ ˆ์ด ๋„คํŠธ์›Œํฌ๋Š” ์ž์ฃผ ํ•„์š”ํ•˜์ง„ ์•Š๋‹ค. ๋•Œ๋ฌธ์— ํ•„์š”์„ฑ์ด ํ™•์‹คํžˆ ์žˆ์„ ๊ฒฝ์šฐ์—๋งŒ ์‚ฌ์šฉํ•ด์•ผ ํ•œ๋‹ค.

์˜ค๋ฒ„๋ ˆ์ด ๋„คํŠธ์›Œํฌ ์œ„์—์„œ ํŠธ๋ž˜ํ”ฝ ํ๋ฆ„์ด ์–ด๋–ป๊ฒŒ ์ง„ํ–‰๋˜๋Š”์ง€๋ฅผ ์ดํ•ดํ•˜๊ธฐ ์œ„ํ•ด์„œ, flannel์˜ ์˜ˆ๋ฅผ ์‚ดํŽด๋ณด๋„๋ก ํ•˜์ž. CoreOS์˜ ์˜คํ”ˆ์†Œ์Šค์ด๋‹ค.

root netns์— flannel0 ๊ฐ€์ƒ ์ด๋”๋„ท ๋””๋ฐ”์ด์Šค๊ฐ€ ์ถ”๊ฐ€๋œ๋‹ค. flannel์€ ๊ฐ€์ƒํ™•์žฅ LAN (VXLAN)์˜ ๊ตฌํ˜„์ธ๋ฐ, ๋ฆฌ๋ˆ…์Šค์—๊ฒŒ ์žˆ์–ด์„œ๋Š” ๊ทธ์ € ๋‹ค๋ฅธ ๋„คํŠธ์›Œํฌ ์ธํ„ฐํŽ˜์ด์Šค์ผ ๋ฟ์ด๋‹ค.

๋‹ค๋ฅธ ๋…ธ๋“œ์˜ pod1์—์„œ pod4๋กœ ์ „๋‹ฌํ•˜๋Š” ํŒจํ‚ท์˜ ํ”Œ๋กœ์šฐ๋Š” ์•„๋ž˜์™€ ๊ฐ™๋‹ค.

  1. ํŒจํ‚ท์ด eth0์— pod1์˜ netns๋ฅผ ๋– ๋‚˜ vethxxx์˜ root netns ๋กœ ์ง„์ž…ํ•œ๋‹ค.

  2. ํŒจํ‚ท์ด cbr0 ์— ๋“ค์–ด์™€ APR ๋ฆฌํ€˜์ŠคํŠธ๋ฅผ ํ†ตํ•ด ๋ชฉ์ ์ง€๋ฅผ ์ฐพ๋Š”๋‹ค.

    1. ๋…ธ๋“œ์˜ ๋ˆ„๊ตฌ๋„ pod4์— ๋Œ€ํ•œ IP ์ฃผ์†Œ๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ์ง€ ์•Š๊ธฐ ๋•Œ๋ฌธ์— ๋ธŒ๋ฆฟ์ง€๊ฐ€ flannel0์œผ๋กœ ํŒจํ‚ท์„ ๋ณด๋‚ธ๋‹ค. ๋…ธ๋“œ์˜ ๋ผ์šฐํŠธ ํ…Œ์ด๋ธ”์ด flannel0์˜ pod ๋„คํŠธ์›Œํฌ range์— ๋Œ€ํ•ด์„œ ํƒ€๊ฒŸ์œผ๋กœ ์„ค์ • ๊ตฌ์„ฑ๋˜์–ด ์žˆ๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค.
    2. flanneld ๋ฐ๋ชฌ์€ ๋ชจ๋“  pod IP๋ฅผ ์•Œ๊ณ  ์žˆ๋Š” k8s apiserver ๋˜๋Š” etcd๋ฅผ ํ˜ธ์ถœํ•˜์—ฌ, ์–ด๋–ค ๋…ธ๋“œ์— ์†ํ•˜๋Š”์ง€๋ฅผ ์•Œ์•„๋‚ธ๋‹ค. flannel์€ pod IP -> ๋…ธ๋“œ IP์˜ ๋งตํ•‘์„ ๋งŒ๋“ค์–ด๋‚ธ๋‹ค. (in userspace)

    flannel0 ์ด ์ด ํŒจํ‚ท์„ ์žก์•„์„œ UDP ํŒจํ‚ท + extra ํ—ค๋”๋ฅผ ๋งŒ๋“ค๊ณ , source ์™€ destination IP๋ฅผ ๊ด€๋ จ ๋…ธ๋“œ๋กœ ๋ณ€๊ฒฝํ•˜์—ฌ ํŠน๋ณ„ํ•œ vxlan port ๋กœ ๋ณด๋‚ธ๋‹ค. (์ผ๋ฐ˜์ ์œผ๋กœ 8472 ํฌํŠธ)

    ๋งตํ•‘์ด ์œ ์ €์ŠคํŽ˜์ด์Šค(userspace)์— ์žˆ๋‹ค ํ•ด๋„, ์‹ค์ œ์ ์ธ ์บก์Аํ™”์™€ ๋ฐ์ดํ„ฐ ํ”Œ๋กœ์šฐ๋Š” ์ปค๋„ ์ŠคํŽ˜์ด์Šค์—์„œ ์ด๋ฃจ์–ด์ง„๋‹ค. ๋”ฐ๋ผ์„œ ์ƒ๋‹นํžˆ ๋น ๋ฅด๋‹ค.

    1. ์บก์Аํ™”๋œ ํŒจํ‚ท์€ ๋…ธ๋“œ ํŠธ๋ž˜ํ”ฝ์˜ ๋ผ์šฐํŒ…์— ์†ํ•ด ์žˆ๊ธฐ ๋•Œ๋ฌธ์— eth0 ์œผ๋กœ ๋ณด๋‚ด์ง„๋‹ค.
  3. ํŒจํ‚ท์€ ๋…ธ๋“œ IP๋ฅผ source, destination ์œผ๋กœ ํ•˜๊ณ  ๋…ธ๋“œ๋ฅผ ๋– ๋‚œ๋‹ค.

  4. ํด๋ผ์šฐ๋“œ ํ”„๋กœ๋ฐ”์ด๋” ๋ผ์šฐํŠธ ํ…Œ์ด๋ธ”์€ ๋…ธ๋“œ๊ฐ„์˜ ๋ผ์šฐํŠธ ํŠธ๋ž˜ํ”ฝ์ด ์–ด๋–ค ๋ฐฉ์‹์ธ์ง€๋ฅผ ์ด๋ฏธ ์•Œ๊ณ  ์žˆ์–ด์„œ ๋ชฉ์ ์ง€ node2์— ํŒจํ‚ท์„ ๋ณด๋‚ธ๋‹ค.

    1. ํŒจํ‚ท์ด node2์˜ eth0์— ๋„์ฐฉํ•œ๋‹ค. ํฌํŠธ๊ฐ€ ํŠน๋ณ„ํ•œ vxlan ํฌํŠธ์ด๊ธฐ ๋•Œ๋ฌธ์— ์ปค๋„์€ ํŒจํ‚ท์„ flannel0 ์œผ๋กœ ๋ณด๋‚ธ๋‹ค.
    2. flannel0 ์€ ํŒจํ‚ท์˜ ์บก์Аํ™”๋ฅผ ํ’€๊ณ , root netns (network namespace)์œผ๋กœ ๋˜๋Œ๋ ค ๋ณด๋‚ธ๋‹ค(emit it back)
    3. IP ํฌ์›Œ๋”ฉ์ด ๊ฐ€๋Šฅํ•˜๋ฏ€๋กœ ์ปค๋„์€ cbr0์œผ๋กœ ๋ณด๋‚ด ๋ผ์šฐํŠธ ํ…Œ์ด๋ธ”์„ ์ฐธ์กฐํ•˜๊ฒŒ ํ•œ๋‹ค.
  5. ๋ธŒ๋ฆฟ์ง€๊ฐ€ ํŒจํ‚ท์„ ์žก์•„ APR ๋ฆฌํ€˜์ŠคํŠธ๋ฅผ ๋งŒ๋“ค์–ด vethyyy ์— ์†ํ•œ IP์ž„์„ ์ฐพ๋Š”๋‹ค.

  6. ํŒจํ‚ท์ด pipe-pair๋ฅผ ๊ฐ€๋กœ์งˆ๋Ÿฌ pod4์— ๋„๋‹ฌํ•œ๋‹ค.

Implementations may differ slightly, but this is the basic way overlay networks work in k8s. There is a common misconception that you must use an overlay network with k8s. In truth, whether to use one depends entirely on the specific scenario, so use one only when it is absolutely needed.

part 3

An illustrated guide to Kubernetes Networking [Part 3]

Cluster dynamics

Due to the ever-changing, dynamic nature of Kubernetes, and of distributed systems in general, pods (and consequently their IPs) change all the time. Reasons range from desired rolling updates and scaling events to unpredictable pod or node crashes. This makes Pod IPs unreliable to use directly for communication.

Enter Kubernetes Services: a virtual IP with a group of Pod IPs as endpoints (identified via label selectors). A Service acts as a virtual load balancer whose IP stays the same while the backend Pod IPs may keep changing.
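A Service's endpoints are the pods whose labels match its selector. A minimal sketch of equality-based matching, where a pod matches when every key/value pair in the selector appears in its labels. The pod names, labels, and IPs are hypothetical, and real endpoint tracking also considers things like pod readiness, which is left out here.

```python
from __future__ import annotations


def matches(selector: dict[str, str], labels: dict[str, str]) -> bool:
    """Equality-based selector: every selector key must be present with the same value."""
    return all(labels.get(key) == value for key, value in selector.items())


def endpoints(selector: dict[str, str], pods: dict[str, tuple[dict[str, str], str]]) -> list[str]:
    """Return the IPs of pods whose labels match the Service selector."""
    return [ip for labels, ip in pods.values() if matches(selector, labels)]


pods = {
    "web-1": ({"app": "web", "tier": "frontend"}, "10.244.0.2"),
    "web-2": ({"app": "web", "tier": "frontend"}, "10.244.1.3"),
    "db-1": ({"app": "db"}, "10.244.1.4"),
}
print(endpoints({"app": "web"}, pods))  # ['10.244.0.2', '10.244.1.3']
```

When web-2 is replaced by a pod with a new IP but the same labels, the selector picks it up while the Service IP stays put.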

The whole virtual IP implementation is actually a set of iptables rules (recent versions also have an option to use IPVS, but that's another discussion) managed by a Kubernetes component, kube-proxy. The name is misleading now. kube-proxy worked as an actual proxy in its early releases (iptables mode only became the default in v1.2), which turned out to be resource-intensive and slow because of constant copying between kernel space and user space. Now it is just a controller, like many other controllers in Kubernetes, that watches the API server for endpoint changes and updates the iptables rules accordingly.

Because of these iptables rules, whenever a packet is destined for a Service IP, it is DNATed (DNAT = Destination Network Address Translation): its destination IP is changed from the Service IP to one of the endpoints, a pod IP chosen at random by iptables. This spreads the load roughly evenly across the backend pods.
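In iptables mode, kube-proxy makes the random choice with a chain of rules using the `statistic` match in random mode: with n endpoints, the first rule matches with probability 1/n, the next with 1/(n-1) of the remaining traffic, and so on, and the last rule has no probability condition. A small simulation (not kube-proxy code; the endpoint IPs are hypothetical) shows that this cascade gives each endpoint about a 1/n share.

```python
from __future__ import annotations

import random
from collections import Counter


def pick_endpoint(endpoints: list[str], rng: random.Random) -> str:
    """Walk the rules in order, like a chain of `--probability p` rules."""
    n = len(endpoints)
    for i, ep in enumerate(endpoints[:-1]):
        if rng.random() < 1 / (n - i):  # rule i matches with probability 1/(n-i)
            return ep
    return endpoints[-1]  # the last rule has no probability match: it takes the rest


pods = ["10.244.0.2", "10.244.1.3", "10.244.2.4"]
rng = random.Random(0)
counts = Counter(pick_endpoint(pods, rng) for _ in range(30_000))
print({ep: round(c / 30_000, 2) for ep, c in counts.items()})  # each close to 0.33
```

For three endpoints: the first gets 1/3, the second gets (2/3)(1/2) = 1/3, and the last gets the remaining 1/3.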

When this DNAT happens, the translation is recorded in conntrack, the Linux connection tracking table, which stores the 5-tuple translations iptables has done (protocol, srcIP, srcPort, dstIP, dstPort). When a reply comes back, the kernel can then un-DNAT it, changing the source IP from the Pod IP back to the Service IP. This way, the client is unaware of how the packet flow is handled behind the scenes.
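A toy model of the DNAT and conntrack round trip: the request's destination is rewritten from the Service IP to a pod IP, the translation is remembered under the 5-tuple the reply will carry, and the reply's source is rewritten back. The `Packet` type, addresses, and ports are hypothetical; the real table lives in the kernel.

```python
from __future__ import annotations

from typing import NamedTuple


class Packet(NamedTuple):
    proto: str
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


conntrack: dict[Packet, Packet] = {}  # expected reply 5-tuple -> original request 5-tuple


def dnat(pkt: Packet, pod_ip: str, pod_port: int) -> Packet:
    """Rewrite the Service IP:port to the chosen pod, remembering how to undo it."""
    reply = Packet(pkt.proto, pod_ip, pod_port, pkt.src_ip, pkt.src_port)
    conntrack[reply] = pkt
    return pkt._replace(dst_ip=pod_ip, dst_port=pod_port)


def un_dnat(reply: Packet) -> Packet:
    """Rewrite the reply's source from the pod back to the Service IP:port."""
    original = conntrack.get(reply)
    if original is None:
        return reply  # not a tracked connection: leave it untouched
    return reply._replace(src_ip=original.dst_ip, src_port=original.dst_port)


request = Packet("tcp", "10.244.0.2", 43512, "10.96.0.10", 80)  # client pod -> Service IP
to_pod = dnat(request, "10.244.1.3", 8080)
reply = Packet("tcp", "10.244.1.3", 8080, "10.244.0.2", 43512)  # pod answers the client
print(un_dnat(reply))
# Packet(proto='tcp', src_ip='10.96.0.10', src_port=80, dst_ip='10.244.0.2', dst_port=43512)
```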

So by using Kubernetes Services, we can use the same ports without conflicts (since ports can be remapped to endpoints). This makes service discovery super easy: we can just use the internal DNS and hard-code the Service hostnames, or even use the Service host and port environment variables preset by Kubernetes.

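Protip: Take this second approach and save a lot of unnecessary DNS calls! Just keep in mind that Kubernetes only injects these variables for Services that already existed when the pod was created.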
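The environment variables follow a naming convention: the Service name is upper-cased, dashes become underscores, and `_SERVICE_HOST` / `_SERVICE_PORT` are appended. A minimal sketch that reads them for a hypothetical Service called `redis-primary`. Since the variables can be missing, it falls back to the Service's DNS name; the `default` namespace and the `cluster.local` cluster domain in that fallback are assumptions.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def service_address(name: str, env: Mapping[str, str] | None = None) -> tuple[str, int | None]:
    """Find a Service via its injected env vars, falling back to its DNS name."""
    env = os.environ if env is None else env
    prefix = name.upper().replace("-", "_")  # "redis-primary" -> "REDIS_PRIMARY"
    host = env.get(f"{prefix}_SERVICE_HOST")
    port = env.get(f"{prefix}_SERVICE_PORT")
    if host and port:
        return host, int(port)  # no DNS lookup needed
    # Assumed namespace "default" and cluster domain "cluster.local"; the port is unknown here.
    return f"{name}.default.svc.cluster.local", None


fake_env = {"REDIS_PRIMARY_SERVICE_HOST": "10.96.12.34", "REDIS_PRIMARY_SERVICE_PORT": "6379"}
print(service_address("redis-primary", fake_env))  # ('10.96.12.34', 6379)
print(service_address("redis-primary", {}))  # ('redis-primary.default.svc.cluster.local', None)
```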

Outbound traffic

The Kubernetes Services we've talked about so far work within a cluster. However, in most practical cases, applications also need to access some external API or website.

Generally, nodes can have both private and public IPs. For internet access, there is some sort of 1:1 NAT of these public and private IPs, especially in cloud environments.

For normal communication from a node to some external IP, the source IP is changed from the node's private IP to its public IP for outbound packets, and the reverse happens for inbound reply packets. However, when a connection to an external IP is initiated by a Pod, the source IP is the Pod IP, which the cloud provider's NAT mechanism doesn't know about, so it simply drops packets whose source IP is not one of the node IPs.

So we use, you guessed it, more iptables! These rules (typically installed by the network plugin, or by an agent such as ip-masq-agent, rather than by kube-proxy itself) do SNAT (Source Network Address Translation), also known as IP MASQUERADE. They tell the kernel to use the IP of the interface the packet is going out of in place of the source Pod IP. A conntrack entry is also kept to un-SNAT the reply.
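A sketch of the masquerade decision for pod egress: a packet whose source is a pod IP and whose destination is outside the cluster's pod range gets the node's IP as its source, and the reply is mapped back through a conntrack-style entry. The pod range, node IP, and function names are hypothetical. Real rules are usually scoped more precisely, and real MASQUERADE may also rewrite the source port to avoid collisions; this sketch keeps ports unchanged.

```python
from __future__ import annotations

import ipaddress

POD_CIDR = ipaddress.ip_network("10.244.0.0/16")  # assumed cluster pod range
NODE_IP = "192.168.0.11"  # assumed IP of the interface the packet leaves from

# (src_ip, src_port, dst_ip, dst_port)
FourTuple = tuple[str, int, str, int]
snat_table: dict[FourTuple, str] = {}  # expected reply 4-tuple -> original pod IP


def egress(pkt: FourTuple) -> FourTuple:
    """MASQUERADE pod -> outside traffic; leave pod -> pod traffic untouched."""
    src_ip, src_port, dst_ip, dst_port = pkt
    leaving_cluster = ipaddress.ip_address(dst_ip) not in POD_CIDR
    if ipaddress.ip_address(src_ip) in POD_CIDR and leaving_cluster:
        snat_table[(dst_ip, dst_port, NODE_IP, src_port)] = src_ip
        return NODE_IP, src_port, dst_ip, dst_port
    return pkt


def reply_in(pkt: FourTuple) -> FourTuple:
    """Un-SNAT: send the reply back to the pod that opened the connection."""
    pod_ip = snat_table.get(pkt)
    if pod_ip is None:
        return pkt
    src_ip, src_port, _, dst_port = pkt
    return src_ip, src_port, pod_ip, dst_port


print(egress(("10.244.0.2", 40000, "203.0.113.10", 443)))
# ('192.168.0.11', 40000, '203.0.113.10', 443)
print(reply_in(("203.0.113.10", 443, "192.168.0.11", 40000)))
# ('203.0.113.10', 443, '10.244.0.2', 40000)
```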

Inbound traffic

Everything's good so far: pods can talk to each other and to the internet. But we're still missing a key piece, serving user request traffic. Currently there are two main ways to do this:

NodePort/Cloud Load Balancer (L4: IP and Port)

Setting the Service type to NodePort assigns the Service a nodePort, by default in the range 30000-32767. This nodePort is open on every node, even if no pod of the Service runs on that node. Inbound traffic on this nodePort is sent to one of the pods (possibly on some other node!) using, again, iptables.
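A minimal NodePort Service, built as a Python dict and printed as JSON (kubectl accepts JSON manifests as well as YAML). The names, ports, and the `nodeport_service` helper are hypothetical; the range check mirrors the kube-apiserver default for `--service-node-port-range`, 30000-32767, which a cluster may override.

```python
from __future__ import annotations

import json

NODE_PORT_RANGE = range(30000, 32768)  # default range 30000-32767, inclusive


def nodeport_service(name: str, app: str, port: int, target_port: int, node_port: int) -> dict:
    """Build a minimal NodePort Service manifest."""
    if node_port not in NODE_PORT_RANGE:
        raise ValueError(f"nodePort {node_port} is outside the default range 30000-32767")
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "NodePort",
            "selector": {"app": app},  # pods backing the Service
            # port: Service port; targetPort: container port; nodePort: opened on every node
            "ports": [{"port": port, "targetPort": target_port, "nodePort": node_port}],
        },
    }


print(json.dumps(nodeport_service("web", "web", 80, 8080, 30080), indent=2))
```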

A service type of LoadBalancer in cloud environments would create a cloud load balancer (ELB, for example) in front of all the nodes, hitting the same nodePort.

Ingress (L7: HTTP/TCP)

Various implementations, such as nginx, traefik, and haproxy, keep a mapping of HTTP hostnames/paths to their respective backends. Traffic still enters through a load balancer and nodePort as usual, but the advantage is that a single ingress can handle inbound traffic for all Services instead of requiring multiple nodePorts and load balancers.
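A sketch of the host/path mapping an ingress controller keeps: match the Host header exactly, then pick the longest matching path prefix, where prefixes match whole path elements (so `/api` matches `/api/orders` but not `/apix`). The rules and backend names are hypothetical, and real controllers handle more cases (wildcard hosts, exact paths, default backends).

```python
from __future__ import annotations

# Hypothetical rules: host -> list of (path prefix, backend service:port).
RULES: dict[str, list[tuple[str, str]]] = {
    "shop.example.com": [("/", "storefront:80"), ("/api", "orders-api:8080")],
    "blog.example.com": [("/", "blog:80")],
}


def prefix_match(prefix: str, path: str) -> bool:
    """Element-wise prefix match: '/api' matches '/api' and '/api/x' but not '/apix'."""
    p = prefix.rstrip("/")
    return p == "" or path == p or path.startswith(p + "/")


def route(host: str, path: str) -> str | None:
    """Pick the backend for a request: exact host, then the longest matching path prefix."""
    candidates = [(prefix, backend) for prefix, backend in RULES.get(host, []) if prefix_match(prefix, path)]
    if not candidates:
        return None  # a real controller would use its default backend here
    return max(candidates, key=lambda c: len(c[0]))[1]


print(route("shop.example.com", "/api/orders/42"))  # orders-api:8080
print(route("shop.example.com", "/apix"))  # storefront:80
print(route("blog.example.com", "/posts/1"))  # blog:80
```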

Network Policy

Think of this as security groups/ACLs for pods. NetworkPolicy rules allow or deny traffic between pods. The exact implementation depends on the network layer/CNI, but many of them just use iptables.
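A sketch of how NetworkPolicy ingress rules combine: a pod selected by no policy accepts all traffic, but once any policy selects it, only traffic allowed by at least one rule of a selecting policy gets in. Only `matchLabels`-style pod selectors on ingress are modeled; namespaces, ports, IP blocks, and egress are left out, and the labels are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


def matches(selector: dict[str, str], labels: dict[str, str]) -> bool:
    """matchLabels semantics: all pairs must be present; an empty selector matches every pod."""
    return all(labels.get(k) == v for k, v in selector.items())


@dataclass
class Policy:
    pod_selector: dict[str, str]  # which pods this policy applies to
    allow_from: list[dict[str, str]]  # ingress rules: selectors for allowed source pods


def ingress_allowed(policies: list[Policy], src: dict[str, str], dst: dict[str, str]) -> bool:
    """Default allow, unless some policy selects dst; then a matching allow rule is required."""
    selecting = [p for p in policies if matches(p.pod_selector, dst)]
    if not selecting:
        return True  # dst is not isolated
    return any(matches(rule, src) for p in selecting for rule in p.allow_from)


policies = [Policy(pod_selector={"app": "db"}, allow_from=[{"app": "api"}])]
print(ingress_allowed(policies, {"app": "api"}, {"app": "db"}))  # True
print(ingress_allowed(policies, {"app": "web"}, {"app": "db"}))  # False
print(ingress_allowed(policies, {"app": "web"}, {"app": "api"}))  # True: api pods are not selected
```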

That's all for now. In the previous parts we studied the foundation of Kubernetes networking and how overlays work. Now we know how the Service abstraction helps in a dynamic cluster and makes discovery super easy. We also covered how outbound and inbound traffic flows work, and how network policy is useful for security within a cluster.

...