Cilium

General

Open-source, cloud native solution for providing, securing, and observing network connectivity between workloads, built on top of eBPF. It was created to address the networking, security, and observability challenges of cloud native environments. Its main advantage is the ability to implement networking, observability, and security features directly in the kernel.

Cilium implements a simple flat Layer 3 network. By default an overlay networking model is used; it requires minimal deployment effort, only IP connectivity between hosts, and traffic is encapsulated for transport between hosts. Native routing is also supported, where the regular routing tables on the nodes are used to route traffic to Pods; this works well in native IPv6 networks, in conjunction with cloud network routers, or with pre-existing routing daemons.
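
As a rough sketch, the routing mode can be selected via Helm values similar to the following (option names vary between Cilium versions, e.g. older charts use tunnel instead of routingMode; the CIDR below is purely illustrative):

# Overlay (default): encapsulate traffic between nodes
routingMode: tunnel
tunnelProtocol: vxlan

# Native routing: use node routing tables instead
#routingMode: native
#ipv4NativeRoutingCIDR: 10.0.0.0/8
#autoDirectNodeRoutes: true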

Life of a packet.

Terminology

A Cilium Endpoint is a set of application containers that share a common IP address. This matches the Pod concept in Kubernetes, so in a Kubernetes environment a Cilium Endpoint is essentially a Pod.

A Cilium identity is determined by labels and is unique cluster-wide. An endpoint is assigned an identity based on its security relevant labels; endpoints that share the same set of security relevant labels share the same identity. A unique numeric identifier is associated with each identity and is what eBPF programs and Hubble operate on.

Security relevant labels are the meaningful labels, excluding metadata such as the creation timestamp. The user provides a list of string prefixes that mark labels as meaningful; the standard behavior is the id prefix, e.g. id.service1, id.service2.

Each agent is responsible for updating eBPF maps with the numeric identities of endpoints running locally, by watching the relevant Kubernetes resources.
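
Endpoints and identities are exposed as Kubernetes resources (Cilium CRDs) and through the agent CLI; the commands below are a small sketch for inspecting them:

# List Cilium endpoints (roughly one per Pod) and their identities
$ kubectl get ciliumendpoints -A

# List cluster-wide identities derived from security relevant labels
$ kubectl get ciliumidentities

# Inspect identities known to a particular agent
$ kubectl -n kube-system exec -ti ds/cilium -c cilium-agent -- cilium identity list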

Docs.

Architecture

Architecture diagram, component overview.

Cilium operator (a single instance) - manages duties that are handled once for the entire cluster. It is not part of the critical path for forwarding or network policy decisions, and may be briefly unavailable without affecting the cluster.

Cilium agent (daemonset):

  • synchronize cluster state with the Kubernetes API server
  • load eBPF programs and update eBPF maps via the Linux kernel
  • learn about newly scheduled workloads from the CNI plugin executable via a filesystem socket
  • create DNS and Envoy proxies
  • create Hubble gRPC services when Hubble is enabled

Cilium client (CLI) - installed alongside Cilium agent, mostly used to inspect the state of agent (client communicates via agent's REST API).
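
For example, the agent's state can be inspected from inside the agent container using standard client subcommands:

# Overall agent health, controller status, kube-proxy replacement, encryption, etc.
$ kubectl -n kube-system exec -ti ds/cilium -c cilium-agent -- cilium status

# Endpoints managed by this agent, their identities and policy enforcement state
$ kubectl -n kube-system exec -ti ds/cilium -c cilium-agent -- cilium endpoint list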

Cilium CNI plugin - installed by the agent on the node's filesystem and invoked by Kubernetes when a pod is scheduled or terminated on the node; the agent also reconfigures the node's CNI to make use of the newly installed plugin. When required, the plugin communicates with the agent via a filesystem socket.


Hubble server (embedded into Cilium agent) - runs on each node and retrieves eBPF-based visibility data from Cilium. Offers a gRPC service to retrieve flows, and Prometheus metrics.

Hubble relay - standalone component that is aware of all running Hubble servers and offers cluster-wide visibility by connecting to their respective gRPC APIs and exposing an API that represents the entire cluster. It acts as an intermediary between the Hubble gRPC services and Hubble observers. When it is enabled, Cilium agents are restarted to enable their gRPC services, and the Hubble Observer service and Hubble Peer service are added alongside; the Peer service allows the relay to detect new Hubble-enabled Cilium agents. Users interact with the Observer via the UI and CLI.

Cilium Cluster Mesh API server (optional, deployed when Cluster Mesh is enabled) - allows Kubernetes services to be shared amongst multiple clusters. It deploys an etcd key-value store in each cluster to hold information about Cilium identities, and exposes a proxy service for each of these etcd stores. Cilium agents running in any cluster that is a member of the same Cluster Mesh can securely read from each cluster's etcd proxy, gaining knowledge of Cilium identity state globally across the mesh. This makes it possible to create and access global services that span the Cluster Mesh.

IPAM modes.

Network policy

Network policies define which workloads are permitted to communicate with each other. Cilium assigns an identity to a group of applications based on their labels, e.g. Kubernetes labels, and enforces policy on identities rather than on IP-based Linux firewall rules, which scales better. Cilium supports multiple network policy formats; all can be used at the same time, but this might lead to unintended behavior.

  • standard Kubernetes NetworkPolicy (supports layer 3 and 4)
  • CiliumNetworkPolicy (supports layer 3, 4, and 7)
  • CiliumClusterwideNetworkPolicy (applies to entire cluster)
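
As a point of reference, a plain Kubernetes NetworkPolicy can only express L3/L4 rules; the sketch below (names and labels are illustrative, reusing the labels from the examples later in this page) allows Pods labelled class=tiefighter to reach Pods labelled class=deathstar on TCP port 80:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-tiefighter-to-deathstar
spec:
  podSelector:
    matchLabels:
      class: deathstar
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              class: tiefighter
      ports:
        - protocol: TCP
          port: 80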

CiliumNetworkPolicy is an extension of NetworkPolicy. It allows defining rules such as "allow HTTP GET to /foo/bar" or "require HTTP header X-Foo in all requests". Additional capabilities are:

  • L7 HTTP policy rules, limiting Ingress and Egress to specific HTTP paths
  • Additional Layer 7 protocols, e.g. DNS, Kafka, gRPC
  • Service name based Egress policy for internal cluster communication
  • L3/L4 Ingress/Egress using entity matching
  • L3 Ingress and Egress policy using DNS FQDN matching

When an L7 policy is applied and active for any endpoint on a node, Cilium starts a node-local HTTP proxy and instructs the eBPF programs to direct matching traffic to this proxy, so that it can interpret and apply the policy rules. The proxy also provides L7 observability. Path, Method, Host, and Headers can be specified to match traffic; fields that are omitted match everything. L7 policy effectively extends L4: start with an L4 rule, then add a rules section to define the L7 logic.
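
A minimal sketch of such a policy, using the same illustrative labels as above and an illustrative path: L4 ingress to the deathstar Pods on TCP port 80 is narrowed down to HTTP POST requests to a single path.

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: deathstar-l7-policy
spec:
  endpointSelector:
    matchLabels:
      class: deathstar
  ingress:
    - fromEndpoints:
        - matchLabels:
            class: tiefighter
      toPorts:
        - ports:
            - port: "80"
              protocol: TCP
          rules:
            http:
              - method: POST
                path: /v1/request-landing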

Whether to write Ingress or Egress policy depends on the intent:

  • Ingress - control which Pods can initiate communication with a particular service or endpoint
  • Egress - control which destinations a Pod can send traffic to

Observability

Hubble is the dedicated network observability component of Cilium. The identity concept is used to easily identify and filter traffic. It includes:

  • Visibility into Layer 3/4 (IP address and port) and Layer 7 (API protocol)
  • Event monitoring, e.g. dropped packet includes a lot of metadata such as full label information of sender and receiver
  • Prometheus metrics
  • Graphical UI for network traffic

Provides answers to:

  • Service dependency and communication map - which services communicate with each other, how frequently, what the dependency graph looks like, what HTTP calls were made
  • Network monitoring and alerting - whether and why network communication is failing; is the problem DNS, Layer 4 or Layer 7
  • Application monitoring - rate of 4xx or 5xx codes for a particular service, latency between applications
  • Security observability - which connections have been blocked due to network policy, which services were accessed outside of cluster

Hubble components

Hubble server - runs on each node as part of Cilium agent operations. Implements gRPC observer service, which provides access to network flows on a node; implements gRPC peer service used by Hubble Relay to discover peer Hubble servers.

Hubble peer (service) - used by Hubble Relay to discover Hubble servers.

Hubble relay (deployment) - maintains gRPC API connections to all Hubble servers (discovered via the peer service) and exposes the cluster-wide observability API.

Hubble relay (service) - used by Hubble UI, can be exposed to be used by Hubble CLI as well.

Hubble UI (deployment) - backend for UI.

Hubble UI (service) - endpoint for client.
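
Assuming cilium-cli is installed locally, the UI can be reached through a port-forward:

# Port-forwards the hubble-ui service and opens it in a browser
$ cilium hubble ui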


Hubble CLI

To install Hubble components via CLI run cilium hubble enable --ui. Hubble CLI cheatsheet.

While inside the Cilium agent container, run the following command to see the flow of events on that particular node:

$ hubble observe --follow

# Get help page describing all possible filter options
$ hubble observe --help

# Example with from and to label arguments to filter out communication between
# particular set of pods
$ kubectl -n kube-system exec -ti pod/cilium-<hash> -c cilium-agent -- hubble observe --from-label "class=tiefighter" --to-label "class=deathstar" --verdict DROPPED --last 1 -o json

Hubble CLI can also be installed locally to observe cluster-wide flows.

# First hubble service should be exposed locally
$ cilium hubble port-forward &

# Verify that hubble is accessible
$ hubble status

# Sample command
$ hubble observe --to-label "class=deathstar" --verdict DROPPED --all

Network flow

Similar to a network packet, but designed to help understand how the packet flows through the cluster; it includes context information: where the packet is coming from, where it is going, and whether it was dropped or forwarded. Since IP addresses are ephemeral in a Kubernetes environment, flows provide more durable context information. This context can also be exposed as labels in Prometheus metrics.

Features

Other features that are enabled/provided by Cilium.

Service mesh

TODO:

Cluster mesh

Common use cases:

  • HA - running multiple clusters in different regions or zones covers complete or temporary unavailability of a failure domain, or a misconfiguration.
  • Shared services - a fairly common practice is to build a cluster per tenant or per service, e.g. due to different security requirements. Managing common services such as monitoring or secrets management in a single cluster, which all other services/tenants have access to, reduces maintenance overhead.
  • Splitting stateful from stateless services - since stateless workloads are more agile and simpler to scale and migrate, it is easier to manage them separately, thereby isolating the dependency complexity of stateful applications to a smaller number of clusters.

Cilium supports both Kubernetes Ingress and the Gateway API to provide a fully functional service mesh. With Cluster Mesh, multiple clusters are effectively merged into one large unified network. Provided features:

  • Pod IP routing across clusters at native performance, via tunneling or direct-routing
  • Transparent discovery of globally available Kubernetes services
  • Network policy enforcement (either Kubernetes NetworkPolicy or CiliumNetworkPolicy)
  • Transparent encryption

Requirements:

  • Nodes across clusters have unique IPs and connectivity between each other
  • All clusters must be assigned unique podCIDR ranges to avoid pod IP overlap across the mesh
  • The network between clusters must allow inter-cluster communication so Cilium agents can access all Cluster Mesh API servers in the mesh; in managed Kubernetes services in public clouds, make sure the firewall requirements are fulfilled. The exact requirements depend on whether Cilium is configured to run in direct-routing or tunneling mode.

The architecture is based on the Cluster Mesh API server and a read-only etcd. Each cluster runs its own replica of these components; Cilium agents watch the Cluster Mesh API servers of the other clusters for changes and replicate the multi-cluster state locally (access to the API server is protected via TLS certificates). State from multiple clusters is never mixed; one cluster has read-only access to another cluster. Configuration occurs via Kubernetes secrets, which contain the address information of the remote etcd proxies, the cluster name, and the certificates required for access.

The behavior of global services is controlled by the following annotations on Kubernetes services (see the example after this list):

  • service.cilium.io/global set to true declares a service global (services in other clusters must be defined with an identical name and namespace).
  • service.cilium.io/shared (defaults to true) includes a service in global load-balancing; setting it to false excludes the service from global load-balancing (only meaningful if the global annotation is set to true).
  • service.cilium.io/affinity can be set to local, remote, or none (the default). With local, remote endpoints are used only if local ones are unavailable or unhealthy (effectively a fail-over from local to remote); remote is the opposite, useful during maintenance or other expected disruptions.
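
A small sketch of a global service definition (name and namespace are illustrative and must match in every participating cluster):

apiVersion: v1
kind: Service
metadata:
  name: rebel-base
  namespace: default
  annotations:
    service.cilium.io/global: "true"
    service.cilium.io/shared: "true"
spec:
  selector:
    app: rebel-base
  ports:
    - port: 80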

Network policies can utilize labels that include cluster name information, effectively introducing cross-cluster rules. The user is still responsible for deploying policies in the correct cluster(s) based on intent.

The transparent encryption feature must be either enabled or disabled consistently across all clusters; otherwise, a cluster without encryption won't be able to communicate with an encrypted cluster.

During installation, use the cluster.name and cluster.id Helm chart properties to set a unique cluster name and ID. After installing Cilium in the first cluster, all subsequent clusters must use the same CA that Cilium generated for the first cluster, because Cluster Mesh uses mTLS to secure access between Cluster Mesh API servers. With Helm there is also an option to prepare the CA separately beforehand.

# Install Cilium in the second cluster (remaining flags depend on the environment)
$ cilium install --context=$CLUSTER2 ...

# Copy the CA generated in the first cluster into the second one
$ kubectl --context=$CLUSTER1 get secret -n kube-system cilium-ca -o yaml | kubectl --context $CLUSTER2 create -f -

# Enable Cluster Mesh in both clusters
$ cilium clustermesh enable --service-type NodePort --context $CLUSTER1
$ cilium clustermesh enable --service-type NodePort --context $CLUSTER2
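
Once Cluster Mesh is enabled in both clusters, cilium-cli can be used to connect them and to wait for the mesh to become ready (a sketch; the connection only needs to be initiated from one side):

# Connect the two clusters
$ cilium clustermesh connect --context $CLUSTER1 --destination-context $CLUSTER2

# Wait for the Cluster Mesh to become ready
$ cilium clustermesh status --context $CLUSTER1 --wait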

Encryption

Transparent encryption may be needed when the network that the Kubernetes setup runs on is not trusted. Compliance standards such as PCI and HIPAA have also started to require encryption of data transmitted between networked services. Cilium provides this via WireGuard or IPsec (protocols that provide in-kernel transparent traffic encryption). Only traffic between nodes inside the cluster is encrypted (external traffic and node-local communication are not affected)! WireGuard is a lightweight Virtual Private Network solution built into the Linux kernel; it is a peer-based VPN that works by exchanging public keys, similar to SSH keys. IPsec is a similar, but older, FIPS-compliant solution.

Configuration tutorial.

Helm values to enable encryption:

encryption:
  enabled: true
  type: wireguard

Verification steps:

# Might need to restart daemonset after Helm upgrade to apply changes
$ kubectl rollout restart daemonset/cilium -n kube-system

# Verify that the number of peers matches the number of Cilium-enabled nodes
$ kubectl exec -n kube-system -ti ds/cilium -- cilium status | grep Encryption
# Also new network device is created, cilium_wg0
$ kubectl exec -n kube-system -ti ds/cilium -- ip link | grep cilium

Load balancing

Cilium can fully replace kube-proxy for this purpose, and can even be used as a standalone load balancer; the feature is implemented in eBPF using efficient hash tables.

kube-proxy implements the Kubernetes service model by adjusting the iptables ruleset, usually with multiple iptables rules for each backend a service is serving; with every added service the list of rules that each packet has to traverse sequentially keeps growing, leading to performance issues in large clusters.

By default Cilium only handles per-packet in-cluster load balancing of ClusterIP services, while kube-proxy handles NodePort and LoadBalancer services and ExternalIPs. Cilium can perform all these tasks, including HostPort allocations if containers define them.

Cilium CLI can detect absence of kube-proxy and modify Helm template configuration during installation.

Helm values example:

kubeProxyReplacement: strict
k8sServiceHost: <api-server host>
k8sServicePort: <api-server port>
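
Whether kube-proxy replacement is active can then be verified in the agent status output, for example:

$ kubectl -n kube-system exec -ti ds/cilium -- cilium status | grep KubeProxyReplacement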

Installation

Via CLI:

# Automatically reads current kubectl context and identifies cluster info
# such as kind of cluster, components present, etc
$ cilium install

# Get status on all Cilium components
$ cilium status

# Enable UI, status can be checked with previous command
$ cilium hubble enable --ui

# Test the setup; expect it to take at least 10 minutes (around 50 minutes on a 3-node EKS setup)
$ cilium connectivity test --request-timeout 30s --connect-timeout 10s

CLI cheatsheet.

Metrics

Both Cilium and Hubble can expose Prometheus metrics, independently of each other. Cilium provides information on how its own components are operating, while Hubble provides information on network performance and flows. Configuring metrics collection.

Cilium operator and agent Prometheus metrics are enabled via Helm chart options, which start an embedded Prometheus metrics server and annotate the pods for easy discovery. Additionally, a headless cilium-agent Kubernetes Service is defined. Agent exposed metrics, Operator exposed metrics. Helm values for enabling metrics:

prometheus:
  enabled: true
operator:
  prometheus:
    enabled: true

Hubble exposed metrics. When Hubble metrics are enabled, an annotated headless hubble-metrics Kubernetes Service is also created for Prometheus discovery. Since no Hubble metrics are enabled by default, the desired metrics have to be configured explicitly, including which parts of the flow context are mapped to Prometheus labels. Because Hubble provides very rich context, not all of it should be mapped to labels; this is configurable. Source and destination labels can be filled with whatever context information fits the concrete case best, e.g. sourceContext=ip to represent the source by its IP address. Additional labels can be populated from flow information using the labelContext option. Sample Helm values:

hubble:
  enabled: true
  metrics:
    enabled:
      - dns
      - drop
      # Add context information (as Prometheus labels)
      #- drop:sourceContext=pod;destinationContext=pod
      - tcp
      - flow
      - port-distribution
      - httpV2
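
If the Prometheus Operator is used, the chart can also create ServiceMonitor objects instead of relying on pod annotations; a sketch assuming the serviceMonitor options of recent chart versions:

prometheus:
  serviceMonitor:
    enabled: true
operator:
  prometheus:
    serviceMonitor:
      enabled: true
hubble:
  metrics:
    serviceMonitor:
      enabled: true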

Follow ups
