Photon OS troubleshooting: starting control plane in 'kind create cluster' fails with 'failed to init node with kubeadm' - dcasota/photonos-scripts GitHub Wiki

20.03.2022: Recently, kind create cluster failed with the error 'failed to init node with kubeadm' on an rpi4 (arm64) installation. See issue.txt.

Before that, I had reinstalled an older ca-certificates package. The rpi4 had also been rebooted several times.

I didn't spend much time investigating, because the setup is a homelab installation and I assumed that downgrading the ca-certificates package was the root cause. Hence I updated it to the latest version, but the issue persisted.

systemctl status kubelet showed that the service failed with the error "Error getting node" err="node \"127.0.0.1\" not found". A restart didn't help. I uninstalled kubernetes and installed it again, without luck. I then noticed that kind get clusters still listed the cluster kind, so I deleted it. After that, kind get clusters displayed 'no kind clusters found.'.

The kubelet still reported the 'Error getting node' error.
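The stale-cluster check described above can be sketched as a small shell helper. This is only a sketch: the function name `cluster_listed` is hypothetical, and it assumes that `kind get clusters` prints one cluster name per line on standard output.

```shell
# Hypothetical helper: succeeds if the cluster name given as $1 appears
# as a whole line in the list read from standard input.
cluster_listed() {
  grep -qxF -- "$1"
}

# Only touch kind if it is installed on this machine.
if command -v kind >/dev/null 2>&1; then
  # Delete the default cluster "kind" only when kind still lists it.
  if kind get clusters 2>/dev/null | cluster_listed kind; then
    kind delete cluster --name kind
  fi
fi
```

Matching the whole line avoids deleting the wrong cluster when several clusters share a name prefix.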

root@photon-f72a20e6dbfc [ /etc/systemd ]# systemctl status kubelet
● kubelet.service - Kubernetes Kubelet Server
     Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
     Active: active (running) since Sun 2022-03-20 14:27:31 CET; 1s ago
       Docs: https://github.com/GoogleCloudPlatform/kubernetes
   Main PID: 17604 (kubelet)
      Tasks: 14 (limit: 9347)
     Memory: 59.0M
     CGroup: /system.slice/kubelet.service
             └─17604 /usr/bin/kubelet --logtostderr=true --v=0 --kubeconfig=/etc/kubernetes/kubeconfig --address=127.0.0.1 --hostname-override=127.0.0.1

Mär 20 14:27:33 photon-f72a20e6dbfc kubelet[17604]: E0320 14:27:33.096235   17604 controller.go:144] failed to ensure lease exists, will retry in 800ms, error: Get "http://127.0.0.1:8080/apis/>
Mär 20 14:27:33 photon-f72a20e6dbfc kubelet[17604]: E0320 14:27:33.188341   17604 kubelet.go:2412] "Error getting node" err="node \"127.0.0.1\" not found"
Mär 20 14:27:33 photon-f72a20e6dbfc kubelet[17604]: E0320 14:27:33.288772   17604 kubelet.go:2412] "Error getting node" err="node \"127.0.0.1\" not found"
Mär 20 14:27:33 photon-f72a20e6dbfc kubelet[17604]: E0320 14:27:33.389234   17604 kubelet.go:2412] "Error getting node" err="node \"127.0.0.1\" not found"
Mär 20 14:27:33 photon-f72a20e6dbfc kubelet[17604]: I0320 14:27:33.416244   17604 kubelet_node_status.go:71] "Attempting to register node" node="127.0.0.1"
Mär 20 14:27:33 photon-f72a20e6dbfc kubelet[17604]: E0320 14:27:33.417092   17604 kubelet_node_status.go:93] "Unable to register node with API server" err="Post \"http://127.0.0.1:8080/api/v1/>
Mär 20 14:27:33 photon-f72a20e6dbfc kubelet[17604]: E0320 14:27:33.464095   17604 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.Node: failed to list *v1.Node>
Mär 20 14:27:33 photon-f72a20e6dbfc kubelet[17604]: E0320 14:27:33.489882   17604 kubelet.go:2412] "Error getting node" err="node \"127.0.0.1\" not found"
Mär 20 14:27:33 photon-f72a20e6dbfc kubelet[17604]: E0320 14:27:33.590373   17604 kubelet.go:2412] "Error getting node" err="node \"127.0.0.1\" not found"
Mär 20 14:27:33 photon-f72a20e6dbfc kubelet[17604]: E0320 14:27:33.691013   17604 kubelet.go:2412] "Error getting node" err="node \"127.0.0.1\" not found"
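Because the journal repeats the same few messages, it can help to condense it. The following is a minimal sketch, not a definitive tool: both function names are hypothetical, and the patterns assume the klog line format visible in the output above.

```shell
# Hypothetical helper: list the distinct API server base URLs that appear
# in kubelet log lines read from standard input.
kubelet_api_endpoints() {
  grep -oE 'https?://[A-Za-z0-9.-]+(:[0-9]+)?' | sort -u
}

# Hypothetical helper: count kubelet error lines (klog severity "E")
# per source location, for example "kubelet.go:2412".
kubelet_error_sources() {
  grep -E 'kubelet\[[0-9]+\]: E[0-9]{4} ' \
    | sed -E 's/^.* ([A-Za-z0-9_]+\.go:[0-9]+)\] .*$/\1/' \
    | sort | uniq -c | sort -rn
}

# Usage on the affected machine:
#   journalctl -u kubelet --no-pager -n 200 | kubelet_api_endpoints
#   journalctl -u kubelet --no-pager -n 200 | kubelet_error_sources
```

Applied to the excerpt above, the first helper would point at http://127.0.0.1:8080, the address this kubelet is trying to reach when it fails to register the node.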

I stopped kubelet and removed kubernetes again with tdnf remove kubernetes. In addition, I deleted some leftover files that I had found on the filesystem.

rm /usr/libexec/kubernetes
rm -rf /usr/share/gocode/pkg/mod/github.com/
rm -rf /usr/share/gocode/pkg/mod/cache/download/k8s.io/kubernetes

I deleted all docker images, containers and volumes, and removed docker.

After that, I installed kubernetes and docker again. This time kind create cluster worked.
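The exact commands used for the docker cleanup were not recorded. As a rough sketch, it might look like the following. The package name `docker`, the helper names `run` and `remove_docker_completely`, and the `DRY_RUN` switch are assumptions made for illustration. On recent Docker releases, `--volumes` may leave named volumes behind, so an additional `docker volume prune` could be needed.

```shell
# Print commands instead of running them unless DRY_RUN=0 is set,
# because everything below is destructive.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Sketch of the cleanup: remove all containers, then all unused images,
# networks and volumes, then the docker package itself (assumed name).
remove_docker_completely() {
  run sh -c 'docker ps -aq | xargs -r docker rm -f'
  run docker system prune --all --volumes --force
  run tdnf remove -y docker
}

remove_docker_completely
```

By default this only prints the commands. Running it with `DRY_RUN=0` in the environment executes them.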

root@photon-f72a20e6dbfc [ / ]# kind create cluster
Creating cluster "kind" ...
 ✓ Ensuring node image (kindest/node:v1.21.1) 🖼
 ✓ Preparing nodes 📦
 ✓ Writing configuration 📜
 ✓ Starting control-plane 🕹️
 ✓ Installing CNI 🔌
 ✓ Installing StorageClass 💾
Set kubectl context to "kind-kind"
You can now use your cluster with:

kubectl cluster-info --context kind-kind

Thanks for using kind! 😊
root@photon-f72a20e6dbfc [ / ]#

The kubelet issue still persisted, however. kubeadm certs check-expiration reported that all certificates were missing. I don't yet have the experience to drill down to the root cause more efficiently and to fix it with minimal impact. For now, in the homelab, I redeployed the setup.
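A possible next diagnostic step, not verified on this system: kubeadm looks for certificates under /etc/kubernetes/pki by default, while kind runs kubeadm inside the node container (named kind-control-plane for the default cluster). The certificates of a kind cluster would therefore be expected inside that container rather than on the host. The sketch below checks both places. The helper name `pki_has_certs` is hypothetical.

```shell
# Hypothetical helper: succeeds if the directory given as $1 contains
# at least one *.crt file.
pki_has_certs() {
  [ -d "$1" ] && [ -n "$(find "$1" -maxdepth 1 -name '*.crt' -print -quit)" ]
}

# Host side: /etc/kubernetes/pki is kubeadm's default certificate directory.
if pki_has_certs /etc/kubernetes/pki; then
  echo "host: certificates found in /etc/kubernetes/pki"
else
  echo "host: no certificates in /etc/kubernetes/pki"
fi

# kind side: run the same kubeadm check inside the node container.
# "kind-control-plane" assumes the default cluster name "kind".
if command -v docker >/dev/null 2>&1 \
   && docker ps --format '{{.Names}}' 2>/dev/null | grep -qx 'kind-control-plane'; then
  docker exec kind-control-plane kubeadm certs check-expiration
fi
```

If the host reports no certificates while the container lists valid ones, the failing host kubelet service and the kind cluster are probably separate issues.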