hardening test failure - tmurakam/kubespray GitHub Wiki

The hardening test on the k8s 1.33 PR is failing. Same as the following?

Investigation

For now, let's tweak kubeadm-setup.yml under roles/kubernetes/control-plane/tasks a bit and see what happens.

"Kubeadm | Initialize first control plane node" のところ。retries を外して確認。

Note that the timeout comes from kubeadm_init_timeout, which is 300s. Should I just override it in ubuntu20-calico-all-in-one-hardning.yml?
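If the timeout really is the problem, the override would be a single variable in that test vars file, something like this (600s is an arbitrary value picked for illustration, not a recommendation):

```yaml
# Hypothetical override in the hardening test vars file;
# kubeadm_init_timeout defaults to 300s as noted above.
kubeadm_init_timeout: 600s
```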

Error log

Hmm, apparently the control plane still hasn't come up after 4 minutes. And when the task retries, the leftovers from the previous attempt are still there, so the retry fails too. What am I supposed to do about this? (One possible approach is sketched right after the log.)

TASK [kubernetes/control-plane : Kubeadm | Initialize first control plane node] ***
task path: /builds/kargo-ci/kubernetes-sigs-kubespray/roles/kubernetes/control-plane/tasks/kubeadm-setup.yml:168
fatal: [ubuntu-2004-0]: FAILED! => {
    "changed": true,
    "cmd": [
        "timeout",
        "-k",
        "300s",
        "300s",
        "/usr/local/bin/kubeadm",
        "init",
        "--config=/etc/kubernetes/kubeadm-config.yaml",
        "--ignore-preflight-errors=",
        "--skip-phases=addon/coredns",
        "--upload-certs"
    ],
    "delta": "0:04:03.440052",
    "end": "2025-05-13 08:58:22.471354",
    "failed_when_result": true,
    "rc": 1,
    "start": "2025-05-13 08:54:19.031302"
}
STDOUT:
[init] Using Kubernetes version: v1.33.0
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/ssl"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local lb-apiserver.kubernetes.local localhost ubuntu-2004-0 ubuntu-2004-0.cluster.local] and IPs [10.233.0.1 10.11.51.41 127.0.0.1 ::1]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [etcd etcd.kube-system etcd.kube-system.svc etcd.kube-system.svc.cluster.local localhost ubuntu-2004-0] and IPs [10.11.51.41 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [etcd etcd.kube-system etcd.kube-system.svc etcd.kube-system.svc.cluster.local localhost ubuntu-2004-0] and IPs [10.11.51.41 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "super-admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests"
[kubelet-check] Waiting for a healthy kubelet at http://127.0.0.1:10248/healthz. This can take up to 4m0s
[kubelet-check] The kubelet is healthy after 500.835042ms
[control-plane-check] Waiting for healthy control plane components. This can take up to 4m0s
[control-plane-check] Checking kube-apiserver at https://10.11.51.41:6443/livez
[control-plane-check] Checking kube-controller-manager at https://127.0.0.1:10257/healthz
[control-plane-check] Checking kube-scheduler at https://127.0.0.1:10259/livez
[control-plane-check] kube-apiserver is not healthy after 4m0.00088402s
[control-plane-check] kube-controller-manager is not healthy after 4m0.000886074s
[control-plane-check] kube-scheduler is not healthy after 4m0.001117499s
A control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all running Kubernetes containers by using crictl:
	- 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
	Once you have found the failing container, you can inspect its logs with:
	- 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock logs CONTAINERID'
STDERR:
W0513 08:54:19.055679    8699 initconfiguration.go:362] [config] WARNING: Ignored configuration document with GroupVersionKind kubeadm.k8s.io/v1beta4, Kind=UpgradeConfiguration
W0513 08:54:19.058281    8699 utils.go:69] The recommended value for "clusterDNS" in "KubeletConfiguration" is: [10.233.0.10]; the provided value is: [10.233.0.3]
	[WARNING SystemVerification]: cgroups v1 support is in maintenance mode, please migrate to cgroups v2
error execution phase wait-control-plane: failed while waiting for the control plane to start: [kube-apiserver check failed at https://10.11.51.41:6443/livez: Get "https://10.11.51.41:6443/livez?timeout=10s": dial tcp 10.11.51.41:6443: connect: connection refused, kube-controller-manager check failed at https://127.0.0.1:10257/healthz: Get "https://127.0.0.1:10257/healthz": dial tcp 127.0.0.1:10257: connect: connection refused, kube-scheduler check failed at https://127.0.0.1:10259/livez: Get "https://127.0.0.1:10259/livez": dial tcp 127.0.0.1:10259: connect: connection refused]
To see the stack trace of this error execute with --v=5 or higher
MSG:
non-zero return code
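As for the retry-leaves-leftovers problem mentioned above, one possible shape (my own sketch, not what the role currently does) is to wipe the half-initialized state with kubeadm reset before trying again, e.g. via block/rescue:

```yaml
# Sketch of a clean-up-then-retry pattern; task names and variables are
# assumptions, not the actual kubeadm-setup.yml contents.
- name: Kubeadm | Initialize first control plane node
  block:
    - name: Kubeadm | First init attempt
      command: "{{ bin_dir }}/kubeadm init --config={{ kube_config_dir }}/kubeadm-config.yaml --upload-certs"
  rescue:
    - name: Kubeadm | Wipe leftovers from the failed attempt
      command: "{{ bin_dir }}/kubeadm reset --force"
    - name: Kubeadm | Retry init on a clean node
      command: "{{ bin_dir }}/kubeadm init --config={{ kube_config_dir }}/kubeadm-config.yaml --upload-certs"
```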

Running it locally revealed the cause: kube-apiserver was exiting with the following error.

E0513 10:15:53.465052       1 run.go:72] "command failed" err="unrecognized feature gate: AppArmor"

The AppArmor feature gate has to be removed: AppArmor support is GA and the gate no longer exists in Kubernetes 1.33, so kube-apiserver refuses to start if it is still passed.
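Concretely, the hardening vars must be setting that gate somewhere, and the fix is to delete that entry. A sketch of what to look for (the exact variable name used by the hardening config is my assumption):

```yaml
# Somewhere in the hardening vars there should be an entry like this;
# the AppArmor gate was removed in k8s 1.33, so this entry has to go.
kube_apiserver_feature_gates:
  - AppArmor=true   # <- delete this entry for k8s 1.33
```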