kubernetes networking
- Container-to-container
- Pod-to-pod
- Pod-to-service
- Internet-to-service
+-------------------------------------------------------------------+ +-------------------------------------------------------------------+
| | | |
| +-----------------------------------+ | | +-----------------------------------+ |
| | +-------------+ +-------------+ | | | | +-------------+ +-------------+ | |
| | | Container.A | | Container.B | | | | | | Container.A | | Container.B | | |
| | +-------------+ +-------------+ | | | | +-------------+ +-------------+ | |
| | | | | | | |
| | +------------------+ | | | | +------------------+ | |
| | | Parent Container | | | | | | Parent Container | | |
| | | (i.e. pause) | | | | | | (i.e. pause) | | |
| | +------------------+ | | | | +------------------+ | |
| | | +-------------------+ | | | | +-------------------+ |
| | +-----------------+ | | | | | | +-----------------+ | | | |
| | | eth0 172.17.0.2 | Pod.1 | | 172.17.0.3 Pod.2 | | | | | eth0 172.18.0.2 | Pod.1 | | 172.18.0.3 Pod.2 | |
| +---+--+--------------+-------------+ +----------+--------+ | | +---+--+--------------+-------------+ +----------+--------+ |
| | | | | | | |
| | | | | | | |
| +---+----+ +----+---+ | | +---+----+ +----+---+ |
| | veth.x | | veth.y | | | | veth.x | | veth.y | |
| +--+--------+---------------------------------+--------+--+ | | +--+--------+---------------------------------+--------+--+ |
| | | | | | | |
| | docker0 | | | | docker0 | |
| | 172.17.0.1 Bridge | | | | 172.18.0.1 Bridge | |
| +--------+------------------------------------------------+ | | +--------+------------------------------------------------+ |
| | | | | |
| | | | | |
| +--------+--------+ | | +--------+--------+ |
| | eth0 | | | | eth0 | |
| | 192.168.180.167 | Node.1 | | | 192.168.180.168 | Node.2 |
+---+--------+--------+---------------------------------------------+ +---+--------+--------+---------------------------------------------+
| |
| +------------------+ |
| | | |
+-----------------------------------------+ Switch or Router +----------------+
+------------------+
Containers in the same pod share the same network namespace. The services in these containers must listen on different ports, and they can communicate with each other easily via `localhost`. In Kubernetes, a parent container (usually the `pause` container) sets up the network for all containers in the same pod.
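A minimal sketch of such a pod, with two containers talking over `localhost` (the pod name, images, and the polling loop are illustrative):
```yaml
# two-containers.yaml -- both containers share one network namespace
apiVersion: v1
kind: Pod
metadata:
  name: two-containers
spec:
  containers:
  - name: web
    image: nginx          # listens on port 80 inside the pod
  - name: sidecar
    image: busybox
    # the sidecar reaches the web container via localhost, no pod IP needed
    command: ["sh", "-c", "while true; do wget -qO- http://localhost:80; sleep 5; done"]
```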
When a container needs to reach a container in a different pod, it accesses it via the pod or service IP and port. This is the pod-to-pod networking described below.
One implementation uses a Linux bridge. For example, create a bridge named `docker0` in the root network namespace, then connect each pod's `eth0` to the bridge `docker0` via a veth pair. The bridge connects the node's `eth0` and the pods' `eth0` interfaces into one network, like a switch does. Then the pod `172.17.0.2` can communicate with `172.17.0.3` with the support of ARP.
Pods across different nodes can communicate depending on:
- the route table configurations of these nodes
- the networking between these nodes.
For example, the route table configured on `Node.1` routes the packet to its `eth0`. Then the networking between the nodes routes the packet to `Node.2`. Finally, the route table configured on `Node.2` routes the packet to the bridge and then to the correct pod on `Node.2`.
NOTE: the route table configuration can also be isolated via network namespaces.
In Kubernetes, a `service` is the abstraction that handles the problem of non-durable pod IP addresses: node reboots, application crashes, and scaling all cause pod IP addresses to change. The cluster IP of a `service` works as a VIP. Traffic addressed to this IP is load-balanced to the set of backing pods associated with the `service`.
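A minimal `service` definition as a sketch, assuming the backing pods carry the label `app: web` and listen on port 8080 (the names and ports are illustrative):
```yaml
# web-service.yaml -- a ClusterIP service acting as a VIP for pods labeled app: web
apiVersion: v1
kind: Service
metadata:
  name: web-service
spec:
  selector:
    app: web          # selects the set of backing pods
  ports:
  - port: 80          # the cluster IP (VIP) port
    targetPort: 8080  # the pod port
```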
The routing from `service` to pod and the load balancing across pods are controlled by `iptables`. The packet flows like this:
1. The packet leaves through the source pod's `eth0`. src in the packet: `pod-1`, dst: `service-1`.
2. The packet passes through `veth0` to the Linux bridge. src and dst in the packet are the same as in step #1.
3. The `iptables` rules configured in the root network namespace change the destination of the packet from the `service` IP to a `pod` IP. src in the packet: `pod-1`, dst: `pod-5`.
4. The updated packet is sent out through the node's `eth0`.
5. Now the packet is outside of the source pod's node. Its source is the source pod IP and its destination is the destination pod IP, instead of the service IP, so from here it works like the pod-to-pod routing above.
NOTE: The `iptables` rules are the same on all nodes, and the rules related to service-to-pod routing are installed by `kube-proxy`.
The response packet keeps the original destination pod's IP as its source until it reaches the node of the original source pod. With the `iptables` on that node, the packet's source is then changed back to the original destination `service` IP. `SNAT` is used here.
For traffic that leaves the cluster, two `SNAT` rewrites happen. One happens in the root network namespace, which rewrites the source of the packet from the `pod-1` IP to the node IP. The other happens on the gateway of the nodes' network, which rewrites the source of the packet from the node IP to the gateway IP.
The response packet will be routed back to the correct pod with the help of `SNAT`.
For internet-to-service traffic, there are three solutions:
- Use a `NodePort` type service
- Use a `LoadBalancer` type service
- Use an `Ingress` controller
NodePort service
Each node exposes the same port for the service. A request arriving at the node port is redirected to the `service`, and the `service` then routes the packet to the pods as described above.
For example, with the service definition below, the packet is routed along the path: `<node ip>:30123` -> `<cluster ip>:80` -> `<pod ip>:8080`.
```yaml
# service.yaml
...
spec:
  type: NodePort
  ports:
  - port: 80          # ClusterIP port
    targetPort: 8080  # pod port
    nodePort: 30123   # node port
```
LoadBalancer service
The `NodePort` service can be a single point of failure, because your client may be pointed at a single node for the service. If that node fails, the service becomes inaccessible.
You can create a `service` of `LoadBalancer` type instead. It is like putting a load balancer between the client and the nodes. This load balancer sits outside of the Kubernetes cluster.
The implementation of the `LoadBalancer` is provided by a cloud provider that knows how to create a load balancer for your service.
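A sketch of the NodePort example above re-exposed as a `LoadBalancer` service (the name and selector are illustrative; the cloud provider assigns the external IP):
```yaml
# service.yaml -- same ports as the NodePort example, but fronted by a cloud load balancer
apiVersion: v1
kind: Service
metadata:
  name: web-service
spec:
  type: LoadBalancer
  selector:
    app: web          # illustrative label for the backing pods
  ports:
  - port: 80          # port exposed by the load balancer / cluster IP
    targetPort: 8080  # pod port
```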
The request reaches the load balancer, which picks a node at random. The node routes the request to the `service`, then `iptables` routes the packet to a pod.
Ingress controller
An `Ingress` controller is a layer 7 load balancer: it is HTTP-aware and knows about URLs and paths. This allows you to segment your service traffic by URL path.
Why do we need `Ingress` controllers? Because there is a separate load balancer for each `LoadBalancer` service, each with its own public IP address, while an `Ingress` needs only one public IP address to provide access to multiple services.
The `Ingress` controller does not forward the request to the related `service`; instead, it picks a pod endpoint directly, based on the definition of the service.
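A sketch of an `Ingress` that segments traffic by URL path, assuming two existing services named `web-service` and `api-service` (the host and names are illustrative):
```yaml
# ingress.yaml -- one public entry point routing two URL paths to two services
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress
spec:
  rules:
  - host: example.com
    http:
      paths:
      - path: /web
        pathType: Prefix
        backend:
          service:
            name: web-service
            port:
              number: 80
      - path: /api
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 80
```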