Pod Related Issues


1. Router/Registry not deploying to the correct node

Ensure that the registry is created with the --selector flag, for example (OpenShift 3.x syntax):

   oc adm registry --credentials=/etc/origin/master/openshift-registry.kubeconfig --replicas=2 \
   --images='registry.access.redhat.com/openshift3/ose-${component}:${version}' --selector='region=Infra'

Solution
To see whether the selector is specified, run oc edit dc/docker-registry and search for nodeSelector. If this value is not defined, it is easier to delete the entire registry and re-create it with the --selector flag shown above.
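
A minimal sketch of checking for and removing a mis-placed registry (OpenShift 3.x assumed; object names are the defaults):

$ oc get dc/docker-registry -o yaml | grep -A2 nodeSelector
$ oc delete dc/docker-registry svc/docker-registry

If the grep returns nothing, no selector is set; delete the registry as above and re-create it with the --selector flag shown earlier.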

2. Application Pod fails to deploy


This can be caused by DNS issues. Turn on query logging on your DNS server. If using BIND:

rndc querylog

Then tail /var/log/messages. If you see entries like this:

github.com.ose.example.com

The likely cause is that your DNS wildcard is resolving this hostname, causing the build to fail.
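
One quick way to confirm the wildcard is the culprit (ose.example.com is a placeholder for your wildcard domain) is to resolve the bogus name directly; if it returns your router's wildcard IP rather than NXDOMAIN, the wildcard is catching external hostnames:

$ dig +short github.com.ose.example.com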

3. Pod Restarting Repeatedly


If the pod is in the CrashLoopBackOff state, it means that the container is starting, crashing, starting again, and then crashing again repeatedly.
If you see the Back-Off restarting failed container message, the container probably exited soon after Kubernetes started it.

To look for errors in the logs of the current pod, run the following command:
$ oc logs YOUR_POD_NAME

To look for errors in the logs of the previous pod that crashed, run the following command:
$ oc logs --previous YOUR_POD_NAME

Note: For a multi-container pod, you can append the container name at the end. For example:
$ oc logs POD_NAME CONTAINER_NAME
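
If you are unsure of the container names in a pod, one way to list them (the pod name is a placeholder) is:

$ oc get pod POD_NAME -o jsonpath='{.spec.containers[*].name}'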

If the Liveness probe isn't returning a successful status, verify that it is configured correctly for the application.
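
As a quick check of the probe configuration on a running pod (the pod name is a placeholder), you can describe the pod and look at the Liveness line:

$ oc describe pod YOUR_POD_NAME | grep -i liveness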

If your pod is still stuck after completing the above steps, try the following:
1.    To confirm that worker nodes exist in the cluster and are in Ready status (which allows pods to be scheduled on them), run the following command:
$ oc get nodes

The output should look similar to the following:

NAME                                          STATUS   ROLES    AGE   VERSION
ip-192-168-6-51.us-east-2.compute.internal    Ready    <none>   25d   v1.14.6-eks-5047ed
ip-192-168-86-33.us-east-2.compute.internal   Ready    <none>   25d   v1.14.6-eks-5047ed

If the nodes are not in the cluster, add worker nodes.

If the nodes are NotReady or can't join the cluster, see https://aws.amazon.com/premiumsupport/knowledge-center/eks-node-status-ready/

2.    To check the version of the cluster, run the following command:

$ oc version 
Client Version: openshift-clients-4.4.0-202006211643.p0-2-gd89e458c3
Server Version: 4.4.15

3.    To check the kubelet version on the worker nodes, run the following command:

$ oc get node -o custom-columns=NAME:.metadata.name,VERSION:.status.nodeInfo.kubeletVersion
The output should look similar to the following:

NAME                                          VERSION
ip-192-168-6-51.us-east-2.compute.internal    v1.14.6-eks-5047ed
ip-192-168-86-33.us-east-2.compute.internal   v1.14.6-eks-5047ed

4.  Based on the output from steps 2 and 3, confirm that the cluster's server version matches the kubelet version of the worker nodes within an acceptable version skew.

Important: The patch versions can be different (for example, v4.4.x for the cluster vs. v4.4.y for the worker node).

If the cluster and worker node versions are incompatible, create a new node group.
--or--
Create a new managed node group using a compatible version. Then, delete the node group with the incompatible version.

5.  To confirm that the control plane can communicate with the worker nodes, verify your firewall rules against the recommended rules, and then verify that the nodes are in Ready status.
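
Before digging into firewall rules, a quick sanity check from the cluster side (the node name is a placeholder) is to re-confirm node status and review the reported conditions:

$ oc get nodes -o wide
$ oc describe node NODE_NAME | grep -A8 'Conditions:'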

4. Pods are not running with 'restricted' SCC by default


All Pods (OpenShift 4.3) that are started without an SCC defined should adopt the default restricted SCC. Follow the steps below if Pods have instead started running with anyuid or another OpenShift system SCC from this list:
anyuid, hostaccess, hostmount-anyuid, hostnetwork, node-exporter, nonroot, privileged, restricted

Solution

Note: Resetting the system SCCs within the cluster may result in application Pods being unable to be scheduled and should be done during a cluster maintenance period.

Please ensure that appropriate backups are taken before resetting; this can be done with the following commands:

for SCC in restricted anyuid hostaccess hostmount-anyuid hostnetwork nonroot privileged; do
  oc get scc $SCC -o yaml | tee ${SCC}_backup.yaml
  oc get scc $SCC -o json | tee ${SCC}_backup.json
done

After this has successfully completed, you will be able to roll back to the OpenShift system SCCs.
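
A quick way to confirm the backup files were written before proceeding:

$ ls -l *_backup.yaml *_backup.json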

To reset them, run the following command:

DEFAULT_SCC_YAML=https://access.redhat.com/sites/default/files/attachments/4.3-default-scc-list_0.yml
curl $DEFAULT_SCC_YAML | oc apply -f -
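
To confirm that the system SCCs were re-created by the apply, you can list them again, for example:

$ oc get scc restricted anyuid hostaccess hostmount-anyuid hostnetwork nonroot privileged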

As noted above, this may result in application Pods not being able to be scheduled, usually because those Pods relied on the changes that were made to these SCCs.
To resolve this, create new SCCs with the required permissions and users, and apply them to the affected Deployments.
If you need to roll back your SCCs at any stage, change into the folder containing the SCC backups and run the following commands:

for SCC in restricted anyuid hostaccess hostmount-anyuid hostnetwork nonroot privileged; do
  oc apply -f ${SCC}_backup.yaml
done

5. Binary Build Fails, citing "BadRequest"


Running a binary build using:

oc start-build my-app --from-dir=./build-dir

fails with the following error message:

Uploading directory "oc-build" as binary input for the build ...
Error from server (BadRequest): cannot upload file to build spring-rest-2 with status New

Solution

It’s likely that there is a problem with one of your ImageStream objects. Take a look at your BuildConfig:

$ oc describe bc/spring-rest
Name:                spring-rest
Namespace:           spring-rest-dev
Created:             29 minutes ago
Labels:              application=spring-rest
                     template=generic-java-jenkins-pipeline
Annotations:         kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","kind":"BuildConfig","metadata":{"annotations":{},"labels":{"application":"spring-rest","template":"generic-java-jenkins-pipeline"},"name":"spring-rest","namespace":"spring-rest-dev"},"spec":{"output":{"to":{"kind":"ImageStreamTag","name":"spring-rest:latest"}},"source":{"binary":{},"type":"Binary"},"strategy":{"sourceStrategy":{"from":{"kind":"ImageStreamTag","name":"redhat-openjdk18-openshift:1.1","namespace":"openshift"}},"type":"Source"}}}


Latest Version:    1


Strategy:         Source
From Image:       ImageStreamTag openshift/redhat-openjdk18-openshift:1.1
Output to:        ImageStreamTag spring-rest:latest
Binary:           provided on build


Build Run Policy:  Serial
Triggered by:    <none>


Build                 Status           Duration            Creation Time
spring-rest-1         complete         50s                 2020-08-28 20:55:21 -0500 EST


Events:  <none>

Notice the two lines that say From Image: and Output to:. It’s likely that one of those image streams is either misspelled or has not yet been created. Ensure your image streams are created and correct, and try running the build again.
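
One way to confirm that both image streams exist (names and namespaces taken from the describe output above) is:

$ oc get istag redhat-openjdk18-openshift:1.1 -n openshift
$ oc get is spring-rest -n spring-rest-dev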