Pod related issues
Ensure that the registry was created with the --selector flag, for example (using oc adm registry on OpenShift 3):
oc adm registry \
--credentials=/etc/origin/master/openshift-registry.kubeconfig --replicas=2 \
--images='registry.access.redhat.com/openshift3/ose-${component}:${version}' --selector='region=Infra'
Solution
To check whether the selector is specified, run oc edit dc/docker-registry and search for nodeSelector. If this value is not defined, it is easier to delete the entire registry and re-create it with the --selector flag shown above.
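If you prefer not to open an editor, a quick way to check (assuming the registry deployment config is named docker-registry, as above) is:
$ oc get dc/docker-registry -o jsonpath='{.spec.template.spec.nodeSelector}'
Empty output means no nodeSelector is set.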
Build failures can also be caused by DNS issues. Turn on query logging on your DNS server. If you are using bind:
rndc querylog
Then tail /var/log/messages. If you see entries like this:
github.com.ose.example.com
the likely cause is that your DNS wildcard is catching this lookup and redirecting it, causing the build to fail.
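A quick way to confirm (assuming the wildcard zone is ose.example.com, as in the entry above) is to resolve a name that should not exist; if the wildcard answers instead of returning NXDOMAIN, it is intercepting the lookup:
$ dig +short github.com.ose.example.com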
If the pod is in the CrashLoopBackOff state, it means the pod is starting, crashing, and then restarting repeatedly. If you see the Back-Off restarting failed container message, the container probably exited soon after Kubernetes started it.
To look for errors in the logs of the current pod, run the following command:
$ oc logs YOUR_POD_NAME
To look for errors in the logs of the previous instance of the pod that crashed, run the following command:
$ oc logs --previous YOUR_POD_NAME
Note: For a multi-container pod, you can append the container name at the end. For example:
$ oc logs POD_NAME CONTAINER_NAME
If the Liveness probe isn't returning a successful status, verify that the Liveness probe is configured correctly for the application.
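A failing liveness probe usually shows up in the pod's events, and the configured probe can be inspected directly; a minimal check (YOUR_POD_NAME is a placeholder) is:
$ oc describe pod YOUR_POD_NAME
$ oc get pod YOUR_POD_NAME -o jsonpath='{.spec.containers[*].livenessProbe}'
In the describe output, look at the Events section for probe failure messages and at Last State for the exit code of the crashed container.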
If your pod is still stuck after completing the above steps, try the following steps:
1. To confirm that worker nodes exist in the cluster and are in Ready status (which allows pods to be scheduled on them), run the following command:
$ oc get nodes
The output should look similar to the following:
NAME STATUS ROLES AGE VERSION
ip-192-168-6-51.us-east-2.compute.internal Ready <none> 25d v1.14.6-eks-5047ed
ip-192-168-86-33.us-east-2.compute.internal Ready <none> 25d v1.14.6-eks-5047ed
If the nodes are not in the cluster, add worker nodes.
If the nodes are NotReady or can't join the cluster, see https://aws.amazon.com/premiumsupport/knowledge-center/eks-node-status-ready/
2. To check the cluster's server version, run the following command:
$ oc version
Client Version: openshift-clients-4.4.0-202006211643.p0-2-gd89e458c3
Server Version: 4.4.15
3. To check the kubelet version on the worker nodes, run the following command:
$ oc get node -o custom-columns=NAME:.metadata.name,VERSION:.status.nodeInfo.kubeletVersion
The output should look similar to the following:
NAME VERSION
ip-192-168-6-51.us-east-2.compute.internal v1.14.6-eks-5047ed
ip-192-168-86-33.us-east-2.compute.internal v1.14.6-eks-5047ed
4. Based on the output from steps 2 and 3, confirm that the cluster's server version and the worker node versions are within an acceptable version skew.
Important: The patch versions can be different (for example, v4.4.x for the cluster vs. v4.4.y for the worker node).
If the cluster and worker node versions are incompatible, create a new node group.
--or--
Create a new managed node group with a compatible version, then delete the node group with the incompatible version.
5. To confirm that the control plane can communicate with the worker nodes, verify your firewall rules against the recommended rules, and then verify that the nodes are in Ready status (a minimal node check is sketched below).
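If a node stays NotReady after checking the firewall rules, its conditions usually show the reason; a minimal check (NODE_NAME is a placeholder for one of the names from oc get nodes) is:
$ oc describe node NODE_NAME | grep -A 8 Conditions
The Conditions section reports MemoryPressure, DiskPressure, PIDPressure, and Ready, along with a reason and message when a condition is unhealthy.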
All Pods (OpenShift 4.3 cluster) that are started without an SCC defined should adopt the default restricted SCC. What to do if Pods have started running with anyuid or another OpenShift system SCC from the list below:
anyuid, hostaccess, hostmount-anyuid, hostnetwork, node-exporter, nonroot, privileged, restricted
Solution
Note: Resetting the system SCCs within the cluster may result in application Pods being unable to be scheduled, so it should be done during a cluster maintenance window.
Ensure that appropriate backups are taken before resetting; this can be done with the following commands:
for SCC in restricted anyuid hostaccess hostmount-anyuid hostnetwork nonroot privileged; do
  oc get scc $SCC -o yaml | tee ${SCC}_backup.yaml
  oc get scc $SCC -o json | tee ${SCC}_backup.json
done
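A quick sanity check that the backup files were written (assuming the loop above was run in your current working directory):
$ ls *_backup.yaml *_backup.json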
After this has successfully completed, you will be able to roll back to the OpenShift system SCCs.
To reset them, run the following command:
DEFAULT_SCC_YAML=https://access.redhat.com/sites/default/files/attachments/4.3-default-scc-list_0.yml
curl $DEFAULT_SCC_YAML | oc apply -f -
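To confirm the system SCCs were re-applied, one option is simply to list them again (a sanity check, not part of the reset itself):
$ oc get scc restricted anyuid hostaccess hostmount-anyuid hostnetwork nonroot privileged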
As noted above, this may result in application Pods not being able to be scheduled, because those Pods relied on changes made to these SCCs.
To resolve this, create new SCCs with the required permissions and users, and apply them to the affected Deployments (see the sketch below).
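As a rough sketch of that step (my-custom-scc, my-app-sa, and my-project are placeholder names, and my-custom-scc.yaml is a manifest you would write with the permissions your application needs):
$ oc create -f my-custom-scc.yaml
$ oc adm policy add-scc-to-user my-custom-scc -z my-app-sa -n my-project
Granting the custom SCC to the Deployment's service account keeps the default system SCCs intact.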
If you need to roll back your SCCs at any stage, change into the folder containing the SCC backups and run the following commands:
for SCC in restricted anyuid hostaccess hostmount-anyuid hostnetwork nonroot privileged; do
  oc apply -f ${SCC}_backup.yaml
done
Running a binary build with:
oc start-build my-app --from-dir=./build-dir
fails with the following error message:
Uploading directory "oc-build" as binary input for the build ...
Error from server (BadRequest): cannot upload file to build spring-rest-2 with status New
Solution
It's likely that there is a problem with one of your ImageStream objects. Take a look at your BuildConfig:
$ oc describe bc/spring-rest
Name: spring-rest
Namespace: spring-rest-dev
Created: 29 minutes ago
Labels: application=spring-rest
template=generic-java-jenkins-pipeline
Annotations: kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","kind":"BuildConfig","metadata":{"annotations":{},"labels":{"application":"spring-rest","template":"generic-java-jenkins-pipeline"},"name":"spring-rest","namespace":"spring-rest-dev"},"spec":{"output":{"to":{"kind":"ImageStreamTag","name":"spring-rest:latest"}},"source":{"binary":{},"type":"Binary"},"strategy":{"sourceStrategy":{"from":{"kind":"ImageStreamTag","name":"redhat-openjdk18-openshift:1.1","namespace":"openshift"}},"type":"Source"}}}
Latest Version: 1
Strategy: Source
From Image: ImageStreamTag openshift/redhat-openjdk18-openshift:1.1
Output to: ImageStreamTag spring-rest:latest
Binary: provided on build
Build Run Policy: Serial
Triggered by: <none>
Build Status Duration Creation Time
spring-rest-1 complete 50s 2020-08-28 20:55:21 -0500 EST
Events: <none>
Notice the two lines that say From Image: and Output to:. It's likely that one of those image streams is either misspelled or has not yet been created. Ensure your image streams exist and are correct, and try running the build again.
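One way to confirm both image streams exist (using the names from the example output above; adjust the names and namespaces to match your own) is:
$ oc get istag redhat-openjdk18-openshift:1.1 -n openshift
$ oc get is spring-rest -n spring-rest-dev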