Photon OS troubleshooting: starting control plane in 'kind create cluster' fails with 'failed to init node with kubeadm'
20.03.2022:
Recently, kind create cluster failed with the error 'failed to init node with kubeadm' on an rpi4 (arm64) installation.
issue.txt
Before that, I had reinstalled an older ca-certificates package. The rpi4 had also been rebooted several times.
I didn't spend much time investigating, because the setup is a homelab installation and I assumed that downgrading the ca-certificates package was the root cause. I therefore updated it to the latest version, but the issue persisted.
systemctl status kubelet showed that the service died with the error "Error getting node" err="node \"127.0.0.1\" not found". A restart didn't help.
I uninstalled Kubernetes and installed it again, without luck.
I noticed that kind get clusters listed a cluster named kind. I deleted it; after that, kind get clusters displayed 'no kind clusters found.'.
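The cleanup step above can be sketched as a small shell function. This is only an illustration under assumptions: the function name delete_all_kind_clusters is hypothetical, and it relies on kind get clusters printing one cluster name per line on stdout, with any status notice discarded via stderr.

```shell
#!/bin/sh
# Hypothetical helper (not part of kind): delete every cluster that kind lists.
delete_all_kind_clusters() {
    # Read cluster names line by line; stderr is discarded so that only
    # names reach the loop.
    kind get clusters 2>/dev/null | while read -r name; do
        if [ -n "$name" ]; then
            kind delete cluster --name "$name"
        fi
    done
}
```

Calling delete_all_kind_clusters and then running kind get clusters again should confirm that nothing is left.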
Kubelet still reported the 'Error getting node' error.
root@photon-f72a20e6dbfc [ /etc/systemd ]# systemctl status kubelet
● kubelet.service - Kubernetes Kubelet Server
Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Active: active (running) since Sun 2022-03-20 14:27:31 CET; 1s ago
Docs: https://github.com/GoogleCloudPlatform/kubernetes
Main PID: 17604 (kubelet)
Tasks: 14 (limit: 9347)
Memory: 59.0M
CGroup: /system.slice/kubelet.service
└─17604 /usr/bin/kubelet --logtostderr=true --v=0 --kubeconfig=/etc/kubernetes/kubeconfig --address=127.0.0.1 --hostname-override=127.0.0.1
Mär 20 14:27:33 photon-f72a20e6dbfc kubelet[17604]: E0320 14:27:33.096235 17604 controller.go:144] failed to ensure lease exists, will retry in 800ms, error: Get "http://127.0.0.1:8080/apis/>
Mär 20 14:27:33 photon-f72a20e6dbfc kubelet[17604]: E0320 14:27:33.188341 17604 kubelet.go:2412] "Error getting node" err="node \"127.0.0.1\" not found"
Mär 20 14:27:33 photon-f72a20e6dbfc kubelet[17604]: E0320 14:27:33.288772 17604 kubelet.go:2412] "Error getting node" err="node \"127.0.0.1\" not found"
Mär 20 14:27:33 photon-f72a20e6dbfc kubelet[17604]: E0320 14:27:33.389234 17604 kubelet.go:2412] "Error getting node" err="node \"127.0.0.1\" not found"
Mär 20 14:27:33 photon-f72a20e6dbfc kubelet[17604]: I0320 14:27:33.416244 17604 kubelet_node_status.go:71] "Attempting to register node" node="127.0.0.1"
Mär 20 14:27:33 photon-f72a20e6dbfc kubelet[17604]: E0320 14:27:33.417092 17604 kubelet_node_status.go:93] "Unable to register node with API server" err="Post \"http://127.0.0.1:8080/api/v1/>
Mär 20 14:27:33 photon-f72a20e6dbfc kubelet[17604]: E0320 14:27:33.464095 17604 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.Node: failed to list *v1.Node>
Mär 20 14:27:33 photon-f72a20e6dbfc kubelet[17604]: E0320 14:27:33.489882 17604 kubelet.go:2412] "Error getting node" err="node \"127.0.0.1\" not found"
Mär 20 14:27:33 photon-f72a20e6dbfc kubelet[17604]: E0320 14:27:33.590373 17604 kubelet.go:2412] "Error getting node" err="node \"127.0.0.1\" not found"
Mär 20 14:27:33 photon-f72a20e6dbfc kubelet[17604]: E0320 14:27:33.691013 17604 kubelet.go:2412] "Error getting node" err="node \"127.0.0.1\" not found"
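To see which messages dominate in output like the above, the error lines can be grouped by their source location. A minimal sketch, assuming log lines in the format shown here; the function name summarize_kubelet_errors is hypothetical and the input is expected on stdin (for example from journalctl -u kubelet --no-pager).

```shell
#!/bin/sh
# Hypothetical helper: count kubelet error lines (severity "E") per source file:line.
summarize_kubelet_errors() {
    # Keep only error-severity lines, e.g. "kubelet[17604]: E0320 ...".
    grep -E 'kubelet\[[0-9]+\]: E[0-9]{4} ' \
        | sed -E 's/.* ([A-Za-z0-9_.-]+\.go:[0-9]+)\].*/\1/' \
        | sort | uniq -c | sort -rn   # most frequent location first
}
```

In the excerpt above, most entries would fall under kubelet.go:2412, the repeated "Error getting node" message.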
I stopped kubelet and removed Kubernetes again with tdnf remove kubernetes.
In addition, I deleted some files that I found on the filesystem.
rm /usr/libexec/kubernetes
rm -rf /usr/share/gocode/pkg/mod/github.com/
rm -rf /usr/share/gocode/pkg/mod/cache/download/k8s.io/kubernetes
I deleted all Docker images, containers and volumes, and removed Docker.
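The exact commands used for this cleanup are not recorded here. One way it could be done is sketched below; the function name purge_docker_state is hypothetical, and the sketch is destructive: it removes all containers, images and volumes on the host.

```shell
#!/bin/sh
# Hypothetical helper. DESTRUCTIVE: wipes all Docker containers, images and volumes.
purge_docker_state() {
    ids=$(docker ps -aq)                 # all containers, running or stopped
    if [ -n "$ids" ]; then docker rm -f $ids; fi

    ids=$(docker images -q | sort -u)    # image IDs; the same ID can be listed once per tag
    if [ -n "$ids" ]; then docker rmi -f $ids; fi

    ids=$(docker volume ls -q)           # volume names
    if [ -n "$ids" ]; then docker volume rm $ids; fi
}
```

The function is only defined, not called, so pasting the block does not delete anything by itself. Removing the Docker package afterwards would be a separate step, presumably with tdnf remove docker by analogy with the Kubernetes removal.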
After that, I installed Kubernetes and Docker again. This time the cluster creation worked.
root@photon-f72a20e6dbfc [ / ]# kind create cluster
Creating cluster "kind" ...
✓ Ensuring node image (kindest/node:v1.21.1) 🖼
✓ Preparing nodes 📦
✓ Writing configuration 📜
✓ Starting control-plane 🕹️
✓ Installing CNI 🔌
✓ Installing StorageClass 💾
Set kubectl context to "kind-kind"
You can now use your cluster with:
kubectl cluster-info --context kind-kind
Thanks for using kind! 😊
root@photon-f72a20e6dbfc [ / ]#
However, the kubelet issue persisted, and kubeadm certs check-expiration reported that all certificates were missing.
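A quick way to cross-check such a result is to look at the certificate files directly. The following is a sketch under assumptions: the function name list_cert_expiry is hypothetical, and /etc/kubernetes/pki is assumed as the location because it is kubeadm's default certificate directory; this host may use a different one.

```shell
#!/bin/sh
# Hypothetical helper: print the expiry date of each *.crt file in a directory.
list_cert_expiry() {
    dir=${1:-/etc/kubernetes/pki}   # assumption: kubeadm's default certificate directory
    found=0
    for crt in "$dir"/*.crt; do
        [ -f "$crt" ] || continue   # the glob stays unexpanded if nothing matches
        found=1
        printf '%s: ' "$crt"
        openssl x509 -noout -enddate -in "$crt"
    done
    if [ "$found" -eq 0 ]; then
        echo "no certificates found in $dir"
        return 1
    fi
}
```

If the function reports that no certificates were found, the files are not at that location at all, which is a different situation from certificates that exist but have expired.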
I don't yet have the experience to drill down to the root cause more efficiently and to fix it with minimal impact. For now, in the homelab, I redeployed the setup.