Debug k8s issues
Unable to connect to the API server: x509 certificate signed by unknown authority

ubuntu@ip-172-31-32-211:~$ kubectl get nodes -o wide
Unable to connect to the server: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")
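This error usually means the kubeconfig in use embeds a stale CA certificate. A quick check before resetting anything, assuming the default kubeadm paths:

```sh
# Decode the CA embedded in the kubeconfig and compare it with the cluster CA.
grep certificate-authority-data $HOME/.kube/config | awk '{print $2}' | base64 -d > /tmp/kubeconfig-ca.crt
sudo diff /tmp/kubeconfig-ca.crt /etc/kubernetes/pki/ca.crt && echo "CA certificates match"
```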
Reset the node, then regenerate the kubeconfig from the cluster's admin credentials:

kubeadm reset
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
sudo chmod 777 $HOME/.kube/config
export KUBECONFIG=/etc/kubernetes/kubelet.conf   # node-level credentials, or:
export KUBECONFIG=/home/ubuntu/.kube/config      # the admin config copied above
kubectl get nodes
PersistentVolume and PersistentVolumeClaim stuck in Terminating state
ubuntu@ip-172-31-32-211:~$ kubectl get pv
NAME      CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS        CLAIM             STORAGECLASS   REASON   AGE
10gpv01   10Gi       RWO            Retain           Terminating   default/myclaim                           5h55m
pvvol-1   1Gi        RWX            Retain           Available                                               65m
ubuntu@ip-172-31-32-211:~$ kubectl get pvc
NAME      STATUS        VOLUME    CAPACITY   ACCESS MODES   STORAGECLASS   AGE
myclaim   Terminating   10gpv01   10Gi       RWO                           5h39m
Edit the PV and the PVC resources and delete the finalizers stanza. Finalizers are arbitrary string values that, while present, prevent a hard delete of the resource.
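To see which finalizers are blocking the deletion (resource names match the output above; on PVs and PVCs this is typically kubernetes.io/pv-protection and kubernetes.io/pvc-protection):

```sh
kubectl get pv 10gpv01 -o jsonpath='{.metadata.finalizers}{"\n"}'
kubectl get pvc myclaim -o jsonpath='{.metadata.finalizers}{"\n"}'
```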
ubuntu@ip-172-31-32-211:~$ kubectl edit pv 10gpv01
persistentvolume/10gpv01 edited
ubuntu@ip-172-31-32-211:~$ kubectl edit pvc myclaim
persistentvolumeclaim/myclaim edited
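Equivalently, the finalizers can be cleared non-interactively with kubectl patch, a sketch using the same resource names:

```sh
# A null merge patch removes the finalizers list, allowing the delete to complete.
kubectl patch pv 10gpv01 -p '{"metadata":{"finalizers":null}}'
kubectl patch pvc myclaim -p '{"metadata":{"finalizers":null}}'
```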
Unable to curl the master k8s node when using an Ingress. Only the worker node is listed in the endpoint info.
ubuntu@ip-172-31-32-211:~$ curl  -H "Host: www.giri.com" http://<master_node_public_IP> 
curl: (7) Failed to connect to 13.235.214.225 port 80: Connection refused
ubuntu@ip-172-31-32-211:~$ kubectl describe ep -n kube-system traefik-ingress-service
Name:         traefik-ingress-service
Namespace:    kube-system
Labels:       <none>
Annotations:  endpoints.kubernetes.io/last-change-trigger-time: 2020-08-31T07:18:48Z
Subsets:
  Addresses:          192.168.89.13
  NotReadyAddresses:  <none>
  Ports:
    Name   Port  Protocol
    ----   ----  --------
    admin  8080  TCP
    web    80    TCP
Events:  <none>
The master node had a taint, so the ingress controller was not scheduled on it and only one pod was launched, on the worker node. A DaemonSet launches one pod on every schedulable node, unlike a Deployment, where the number of replicas is set by the user.
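The difference is visible on the DaemonSet itself, assuming it is named traefik-ingress-controller like its pods:

```sh
# For a DaemonSet, DESIRED tracks the number of schedulable nodes, not a replica count.
kubectl -n kube-system get daemonset traefik-ingress-controller
```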
ubuntu@ip-172-31-32-211:~$ kubectl get po -n kube-system -o wide | grep traefi
traefik-ingress-controller-lv9nv           1/1     Running   0  
ubuntu@ip-172-31-32-211:~$     kubectl describe nodes | grep -i taint
Taints:             node-role.kubernetes.io/master:NoSchedule
Removing the taint on the master node let the DaemonSet schedule a pod on both nodes.
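A sketch of the taint removal, assuming the master is ip-172-31-32-211 (the node the commands are run from); the trailing - deletes the taint:

```sh
kubectl taint nodes ip-172-31-32-211 node-role.kubernetes.io/master:NoSchedule-
```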
ubuntu@ip-172-31-32-211:~$ kubectl get po -n kube-system -o wide | grep traefi
traefik-ingress-controller-d8rfb           1/1     Running   0          25m   172.31.38.235   ip-172-31-38-235   <none>           <none>
traefik-ingress-controller-ggdmj           1/1     Running   0          25m   172.31.32.211   ip-172-31-32-211   <none>           <none>
ubuntu@ip-172-31-32-211:~$ kubectl describe ep -n kube-system traefik-ingress-service
Name:         traefik-ingress-service
Namespace:    kube-system
Labels:       <none>
Annotations:  endpoints.kubernetes.io/last-change-trigger-time: 2020-08-31T08:57:25Z
Subsets:
  Addresses:          172.31.32.211,172.31.38.235
  NotReadyAddresses:  <none>
  Ports:
    Name   Port  Protocol
    ----   ----  --------
    admin  8080  TCP
    web    80    TCP
Events:  <none>
ubuntu@ip-172-31-32-211:~$
ubuntu@ip-172-31-32-211:~$ curl  -H "Host: www.shourya.com" http://k8smaster
<!DOCTYPE html>
<html>
<head>
<title>Shourya</title>
coredns pods are stuck in ContainerCreating status
Delete the pods; the Deployment recreates them, and the new pods should come up in a Running state.
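A sketch of the restart, assuming the standard k8s-app=kube-dns label on the coredns pods:

```sh
kubectl -n kube-system delete pod -l k8s-app=kube-dns
kubectl -n kube-system get pods -l k8s-app=kube-dns
```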
ubuntu@ip-172-31-32-211:~$ kubectl get nodes
Unable to connect to the server: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")
- On both the master and worker nodes, do a `kubeadm reset`.
- On both the master and worker nodes, clean the CNI configuration, the pki files and the etcd directory, and flush the iptables rules:

  rm -rf /etc/cni/net.d
  rm -rf /etc/kubernetes/pki/*
  rm -rf /var/lib/etcd/*
  iptables --flush

- On the master node, clean up the old kubeconfig:

  rm -rf /home/ubuntu/.kube/config

- Do a `kubeadm init` on the master.
- Do a `kubeadm join` on the worker (see the sketch after this list if the join command is no longer at hand).
- On the master node, do the following:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

- Enable the CNI by doing a `kubectl apply -f calico.yaml` on the master node.
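`kubeadm init` prints the exact `kubeadm join` command for the worker; if it has scrolled away, a fresh one (new token plus CA hash) can be generated on the master:

```sh
kubeadm token create --print-join-command
```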