Fedora k8s - hpaluch/hpaluch.github.io GitHub Wiki

Fedora k8s

Single node k8s on Fedora 41.

K8s Web Dashboard

Generally I followed the official guide at https://docs.fedoraproject.org/en-US/quick-docs/using-kubernetes-kubeadm/ with a few exceptions. Please see my detailed setup guide at https://github.com/hpaluch/k8s-wordpress/blob/master/README.md#setup-k8s-with-kubeadm-on-fedora-41

[!WARNING] I recommend using a "short" hostname - without the domain part! Red Hat in the past used an FQDN as the hostname, which sometimes causes problems - some components are too smart and strip everything after the first . in the hostname, while others do not, so anything that matches against the hostname may stop working properly.
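A quick way to verify this is to read the kernel hostname and strip any domain part (a small shell sketch):

```shell
# Compare the kernel hostname with its domain part stripped; if they
# differ, the system is configured with an FQDN hostname.
full="$(cat /proc/sys/kernel/hostname)"
short="${full%%.*}"   # strip everything after the first dot
if [ "$short" = "$full" ]; then
  echo "OK: short hostname in use ($full)"
else
  echo "WARNING: FQDN hostname in use ($full) - consider: sudo hostnamectl set-hostname $short"
fi
```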

Added the metrics server. Variant 1 uses the workaround from https://github.com/kubernetes-sigs/metrics-server/issues/1221:

# tested on k8s v1.32
sudo dnf install helm
helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server/
helm install metrics-server metrics-server/metrics-server --set args="{--kubelet-insecure-tls}" -n kube-system

Variant 2 (troublesome): added the metrics server from https://devopscube.com/setup-kubernetes-cluster-kubeadm/ using:

curl -fLO https://raw.githubusercontent.com/techiescamp/kubeadm-scripts/main/manifests/metrics-server.yaml
kubectl apply -f metrics-server.yaml 

Added a simple test nginx from the same page using nginx.yaml:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:latest
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  selector:
    app: nginx
  type: NodePort
  ports:
    - port: 80
      targetPort: 80
      nodePort: 32000

And applied it with kubectl apply -f nginx.yaml
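Note that the manifest above pins nodePort 32000. kube-apiserver only accepts values inside its --service-node-port-range, which defaults to 30000-32767; anything outside that range is rejected at apply time. A quick sanity check of the chosen port:

```shell
# Default kube-apiserver --service-node-port-range is 30000-32767.
port=32000
if [ "$port" -ge 30000 ] && [ "$port" -le 32767 ]; then
  echo "nodePort $port is within the default range"
else
  echo "nodePort $port is outside the default range - kubectl apply will fail"
fi
```

Once the pods are Running, the nginx welcome page should be reachable at http://NODE_IP:32000/.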

Tip: see also my project https://github.com/hpaluch/k8s-wordpress for a WordPress deployment on a single-node k8s cluster.

Problem: pod(s) stuck in ContainerCreating state

But there was a problem - both the metrics server and nginx pods were stuck in the ContainerCreating state:

$ kubectl get po -A

NAMESPACE      NAME                                          READY   STATUS              RESTARTS   AGE
default        nginx-deployment-7c79c4bf97-2h4bs             0/1     ContainerCreating   0          9m49s
default        nginx-deployment-7c79c4bf97-z8pdp             0/1     ContainerCreating   0          9m49s
kube-flannel   kube-flannel-ds-p97kl                         1/1     Running             0          12m
kube-system    coredns-76f75df574-6gzgp                      1/1     Running             0          21m
kube-system    coredns-76f75df574-kw699                      1/1     Running             0          21m
kube-system    etcd-fed-k8s.example.com                      1/1     Running             0          21m
kube-system    kube-apiserver-fed-k8s.example.com            1/1     Running             0          21m
kube-system    kube-controller-manager-fed-k8s.example.com   1/1     Running             0          21m
kube-system    kube-proxy-k6g8s                              1/1     Running             0          21m
kube-system    kube-scheduler-fed-k8s.example.com            1/1     Running             0          21m
kube-system    metrics-server-d4dc9c4f-97d4f                 0/1     ContainerCreating   0          10m

When I ran describe on one of the pods:

$ kubectl describe po nginx-deployment-7c79c4bf97-2h4bs

...
 Warning  FailedCreatePodSandBox  13m                   kubelet            \
  Failed to create pod sandbox: rpc error: code = Unknown \
  desc = failed to create pod network sandbox k8s_nginx-deployment-7c79c4bf97-\
   2h4bs_default_925a62ee-bf5e-4a5c-bb67-7836e179a41e_0(d70b4c9cd349833b21ce1c\
   83bc74b1949a3fbe730fd31b7629982298a09bc9b8): error adding pod \
   default_nginx-deployment-7c79c4bf97-2h4bs to CNI network "cbr0": \
   plugin type="flannel" failed (add): failed to set bridge addr: "cni0" \
   already has an IP address different from 10.244.0.1/24

Googling revealed https://devops.stackexchange.com/questions/14891/cni0-already-has-an-ip-address with a tip to use sudo ip link delete cni0 type bridge, but I rather recommend rebooting the system (deleting the bridge will cause other issues when creating containers)...
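The error above comes from flannel comparing the existing cni0 bridge address against the address derived from the node's podCIDR. A shell sketch of that check, using a hypothetical stale address (on a real node you would read the actual value with ip -4 addr show cni0):

```shell
# Hypothetical values for illustration - flannel fails when the bridge
# keeps an address from a previous cluster incarnation.
expected="10.244.0.1/24"   # address derived from the node's podCIDR
actual="10.244.1.1/24"     # stale address left on cni0 (hypothetical)
if [ "$actual" != "$expected" ]; then
  echo "cni0 has stale address $actual (expected $expected) - reboot the node"
fi
```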

And that helped - hopefully it was only a transient bug.

Metrics server forbidden

Now the metrics server pod is Running but metrics are not working properly. The logs showed:

$ kubectl logs -n kube-system metrics-server-d4dc9c4f-97d4f

I1225 14:21:01.208774       1 server.go:191] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
E1225 14:21:08.871515       1 scraper.go:149] "Failed to scrape node" \
                            err="request failed, status: \"403 Forbidden\"" node="fed-k8s.example.com"

Auditing API Server (did not help)

It was a "red herring" - the 403 Forbidden was returned by the Kubelet, not by the API Server(!). Please skip to the next section for the fix...

[!CAUTION] You can easily crash the API server and make your whole k8s cluster totally unusable! Use at YOUR OWN RISK!

I thought that the metrics 403 error was reported by the API server, but that turned out not to be true. I followed https://kubernetes.io/docs/tasks/debug/debug-cluster/_print/#audit-policy

  • create a new file /etc/kubernetes/audit-policy.yaml with contents:
# Log all requests at the Metadata level.
# https://kubernetes.io/docs/tasks/debug/debug-cluster/_print/
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: Metadata
  • made a backup copy: cp /etc/kubernetes/manifests/kube-apiserver.yaml /home/USER/
  • applied following diff:
--- kube-apiserver.yaml.orig	2024-12-25 19:52:11.475000000 +0100
+++ kube-apiserver.yaml	2024-12-25 19:55:18.808000000 +0100
@@ -40,6 +40,8 @@
     - --service-cluster-ip-range=10.96.0.0/12
     - --tls-cert-file=/etc/kubernetes/pki/apiserver.crt
     - --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
+    - --audit-policy-file=/etc/kubernetes/audit-policy.yaml
+    - --audit-log-path=/var/log/kubernetes/audit/audit.log
     image: registry.k8s.io/kube-apiserver:v1.29.12
     imagePullPolicy: IfNotPresent
     livenessProbe:
@@ -85,6 +87,12 @@
     - mountPath: /etc/kubernetes/pki
       name: k8s-certs
       readOnly: true
+    - mountPath: /etc/kubernetes/audit-policy.yaml
+      name: audit
+      readOnly: true
+    - mountPath: /var/log/kubernetes/audit/
+      name: audit-log
+      readOnly: false
   hostNetwork: true
   priority: 2000001000
   priorityClassName: system-node-critical
@@ -104,4 +112,12 @@
       path: /etc/kubernetes/pki
       type: DirectoryOrCreate
     name: k8s-certs
+  - name: audit
+    hostPath:
+      path: /etc/kubernetes/audit-policy.yaml
+      type: File
+  - name: audit-log
+    hostPath:
+      path: /var/log/kubernetes/audit/
+      type: DirectoryOrCreate
 status: {}
  • now prepare the log directory:
    sudo mkdir -p /var/log/kubernetes/audit/
    sudo chmod a+rwxt /var/log/kubernetes/audit/
    
  • WARNING! Now an extremely dangerous step!!!
  • copy the modified kube-apiserver.yaml back to /etc/kubernetes/manifests/
  • K8s will detect the manifest change and redeploy the API Server pod - K8s will be completely unavailable for some time.
  • if there was no error you should see the quickly growing file /var/log/kubernetes/audit/audit.log
  • to make it easier to read I passed it through jq:
    jq < /var/log/kubernetes/audit/audit.log | tee /home/USER/audit-log.json
    
  • but I quickly found that the API Server is not returning 403:
    $ fgrep code audit-log.json | sort -u 
    
      "code": 200
      "code": 201
      "code": 404
      "code": 500
    
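The same grep | sort -u step can be reproduced without a cluster on hypothetical audit events (real ones are written one JSON object per line to /var/log/kubernetes/audit/audit.log):

```shell
# Two hypothetical audit events for illustration only.
printf '%s\n' \
  '{"kind":"Event","verb":"get","responseStatus":{"code":200}}' \
  '{"kind":"Event","verb":"create","responseStatus":{"code":403}}' \
  > /tmp/audit-sample.log
# Extract and deduplicate the response codes, as done above with fgrep:
grep -o '"code":[0-9]*' /tmp/audit-sample.log | sort -u
```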

So I decided to look into the source:

  • in my metrics-server.yaml I found image: registry.k8s.io/metrics-server/metrics-server:v0.7.1
  • so it should be following tag: https://github.com/kubernetes-sigs/metrics-server/releases/tag/v0.7.1
  • Failed to scrape node is in pkg/scraper/scraper.go
  • collectNode() contains c.kubeletClient.GetMetrics(ctx, node)
  • looking for the magic /metrics/resource path I found something in KNOWN_ISSUES:
# returns prometheus metrics
kubectl get --raw /api/v1/nodes/`hostname`/proxy/metrics/resource
# returns json
kubectl get --raw /api/v1/nodes/`hostname`/proxy/stats/summary | jq

Finally I got something:

curl -fsSkv https://`hostname -i`:10250/metrics/resource

< HTTP/2 403 
< content-type: text/plain; charset=utf-8
< content-length: 80
< date: Wed, 25 Dec 2024 19:51:55 GMT
* The requested URL returned error: 403

And:

netstat -anp | fgrep 10250
tcp        0      0 10.244.0.1:54188        10.244.0.212:10250      TIME_WAIT   -
tcp6       0      0 :::10250                :::*                    LISTEN      824/kubelet
tcp6       0      0 192.168.122.92:10250    10.244.0.212:54492      ESTABLISHED 824/kubelet

Fix for metrics 403 Forbidden

Kubelet is a standalone process that manages containers:

/usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --fail-swap-on=false --pod-manifest-path=/etc/kubernetes/manifests --cluster-dns=10.96.0.10 --cluster-domain=cluster.local --authorization-mode=Webhook --client-ca-file=/etc/kubernetes/pki/ca.crt --cgroup-driver=systemd --container-runtime-endpoint=unix:///var/run/crio/crio.sock --pod-infra-container-image=registry.k8s.io/pause:3.9

And that is exactly the difference between Ubuntu (where it works) and Fedora: the --authorization-mode=Webhook flag (Ubuntu simply omits this argument, falling back to the default AlwaysAllow)... On https://serverfault.com/a/1166711 there is a recommendation to replace --authorization-mode=Webhook with --authorization-mode=AlwaysAllow, but it is strongly discouraged for production (not my case :-)

Resulting change in /etc/systemd/system/kubelet.service.d/override.conf

[Service]
Environment="KUBELET_AUTHZ_ARGS=--authorization-mode=AlwaysAllow --client-ca-file=/etc/kubernetes/pki/ca.crt"

And run sudo systemctl daemon-reload && sudo systemctl restart kubelet - or rather restart the whole system...
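Before restarting kubelet it is worth sanity-checking the drop-in. The sketch below runs the check against a temporary copy of the file (on a real system point grep at /etc/systemd/system/kubelet.service.d/override.conf instead):

```shell
# Write a temp copy of the drop-in and confirm the authorization mode.
conf="$(mktemp)"
cat > "$conf" <<'EOF'
[Service]
Environment="KUBELET_AUTHZ_ARGS=--authorization-mode=AlwaysAllow --client-ca-file=/etc/kubernetes/pki/ca.crt"
EOF
mode="$(grep -o 'authorization-mode=[A-Za-z]*' "$conf")"
echo "$mode"
rm -f "$conf"
```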

Once metrics work, you can use for example these commands (it may take several minutes before the first metrics data arrive):

$ kubectl top nodes

NAME       CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
fed-k8s2   137m         6%     1087Mi          27%  

$ kubectl top pods -A 

NAMESPACE      NAME                               CPU(cores)   MEMORY(bytes)   
kube-flannel   kube-flannel-ds-j9ksd              8m           43Mi            
kube-system    coredns-76f75df574-k4tbs           2m           12Mi            
kube-system    coredns-76f75df574-zsdwh           2m           56Mi            
kube-system    etcd-fed-k8s2                      15m          65Mi       
...

Installing web dashboard

K8s includes a "Dashboard UI" - we will more or less follow https://kubernetes.io/docs/tasks/access-application-cluster/web-ui-dashboard/ to install it:

sudo dnf install helm
helm repo add kubernetes-dashboard https://kubernetes.github.io/dashboard/
helm upgrade --install kubernetes-dashboard \
    kubernetes-dashboard/kubernetes-dashboard \
    --create-namespace --namespace kubernetes-dashboard
# poll the command below until all pods are "Running"
kubectl get pod -n kubernetes-dashboard

To access the Dashboard you have to create a temporary proxy using:

kubectl -n kubernetes-dashboard port-forward svc/kubernetes-dashboard-kong-proxy 8443:443

However, it forwards from localhost only. You have two choices to connect:

  1. create an SSH tunnel from your Workstation to the K8s server - add to the appropriate entry in ~/.ssh/config the line LocalForward 127.0.0.1:8443 127.0.0.1:8443, then log in to the K8s node and run the above command
  2. or install on your Workstation a kubectl that has the port-forward capability

To test the latter option I did:

  • query kubectl version on your k8s server:

    # run on k8s server:
    
    $ kubectl version
    
    Client Version: v1.29.11
    Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
    Server Version: v1.29.12
    
  • now install on your Workstation kubectl following https://kubernetes.io/docs/tasks/tools/install-kubectl-linux/#install-kubectl-binary-with-curl-on-linux

  • but remember to replace v1.29.0 in the URL below with the version matching your k8s deployment.

    # run on Workstation
    
    mkdir -p ~/bin
    curl -fL -o ~/bin/kubectl  https://dl.k8s.io/release/v1.29.0/bin/linux/amd64/kubectl
    ~/bin/kubectl version
    
    Client Version: v1.29.0
    Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
    The connection to the server localhost:8080 was refused - did you specify the right host or port?
    
  • to be able to access our K8s server we need to copy ~/.kube/config from the k8s server to the Workstation using:

    # run on Workstation
    mkdir -p ~/.kube
    scp IP_OF_YOUR_K8S_SERVER:.kube/config ~/.kube/
    
  • when you run kubectl on your Workstation again, it should also report the server version, for example:

    # run on Workstation
    
    $ kubectl version
    
    Client Version: v1.29.0
    Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
    Server Version: v1.29.12
    
  • now run on your Workstation tunnel command:

    # run on Workstation
    kubectl -n kubernetes-dashboard port-forward svc/kubernetes-dashboard-kong-proxy 8443:443
    
  • and run browser on your Workstation with URL: https://127.0.0.1:8443/

  • you should get advice on how to generate a token:

    # run on any Node: workstation or k8s server:
    kubectl -n NAMESPACE create token SERVICE_ACCOUNT
    
  • we know the namespace but not the SERVICE_ACCOUNT

  • we have to follow https://github.com/kubernetes/dashboard/blob/master/docs/user/access-control/creating-sample-user.md to create an admin-user with the cluster-admin role:

Create ui-admin-user.yaml with contents:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kubernetes-dashboard

And apply it with: kubectl apply -f ui-admin-user.yaml

Now verify that a ClusterRole named cluster-admin exists:

$ kubectl get clusterrole cluster-admin

NAME            CREATED AT
cluster-admin   2024-12-26T07:30:51Z

Create file ui-admin-binding.yaml with contents:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: admin-user
  namespace: kubernetes-dashboard

And apply it with kubectl apply -f ui-admin-binding.yaml

Finally we can generate a token:

kubectl -n kubernetes-dashboard create token admin-user
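The created token is a JWT: three base64url segments joined by dots, where the payload segment names the service account. The sketch below inspects a hypothetical, locally-built token (a real token from kubectl create token uses unpadded base64url, so you may need to restore padding and run tr '_-' '/+' before decoding):

```shell
# Build a hypothetical JWT-shaped token and decode its payload segment.
payload='{"sub":"system:serviceaccount:kubernetes-dashboard:admin-user"}'
mid="$(printf '%s' "$payload" | base64 -w0)"
token="header.${mid}.signature"
# The payload is the second dot-separated field:
printf '%s' "$token" | cut -d. -f2 | base64 -d
echo
```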

In your browser, paste this very long token into the textbox labeled "Bearer token *" (it is easy to overlook - it does not look like a textbox but like a horizontal line!).

Now you will be greeted by the Web UI - but please note that you have to select the proper Namespace in the top-left corner (or "All namespaces") to list objects across all namespaces.

Now you should see a view similar to the one at the top of this page:

K8s Web Dashboard

And that's it!

Monitoring with Prometheus + Grafana

I just followed the guide at https://medium.com/@muppedaanvesh/a-hands-on-guide-to-kubernetes-monitoring-using-prometheus-grafana-%EF%B8%8F-b0e00b1ae039