DevOps: Troubleshooting 502 Bad Gateway Nginx 1.21.1 log writing failed. No space left on device @ io_write ‐ home nginx app log production.log - department-of-veterans-affairs/diffusion-marketplace GitHub Wiki
This page is a guide and SOP for troubleshooting the server error 502 Bad Gateway Nginx/1.21.1.
The first response to such an error is to:
- Check the monitoring tool. As of 8/14/2023, we use Dynatrace to monitor our Production and Staging servers. It gives you an overview of the issues to investigate further, most often disk space or CPU issues.
- Access the Jenkins UI at http://internal-dm-devops-523064655.us-gov-west-1.elb.amazonaws.com/, click Jobs, and open the job for the server that is currently down.
- Click Build Now to deploy the last commit merged to the master branch. Once the build has run, go to the console output and read through the log entries. This helps you narrow down which files or directories to inspect on the server.
- Disk space issues often cause Bad Gateway errors. If that is the case, you will find a disk space error as you go through the logs, which reads:
  `log writing failed. No space left on device @ io_write - /home/nginx/app/log/production.log`
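
To confirm the diagnosis quickly, a saved copy of the console output or application log can be searched for the error text. The following is a minimal sketch, assuming the output has been saved to a local file; the function name `has_disk_full_error` and the file name in the usage note are hypothetical.

```shell
#!/usr/bin/env bash

# Sketch: report whether a saved log contains the disk-full error.
# has_disk_full_error is a hypothetical helper name, not part of any tool.
has_disk_full_error() {
  local log_file="$1"
  # -q: no output, exit status only; -F: match the text literally, not as a regex.
  grep -qF "No space left on device" "$log_file"
}
```

For example, `has_disk_full_error console-output.txt && echo "disk space issue"` prints the message only when the error is present in the file.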
Restoring Disk space
- Access the AWS console, `ssh` into the server experiencing the downtime, and switch user with `sudo su ec2-user`.
- Run `df -h` for an overview of which directories or mount points have exhausted their disk space. If the /home directory is the one that is full, continue with the steps below.
- Run `docker ps` to see the Docker containers running on the server.
- Copy the ID of the second container with the image name `dm:vaec`. Run `docker exec -it <container-id> /bin/bash` to open a shell inside the container as the nginx user. `cd` into /home/nginx/app/log/ and run `ls`; you will see the files `production.log`, `web.stder.log`, and `web.stdout.log`.
- Run `du -h filename` to see how much disk space each file takes up.
- The aim is to restore space to the /home directory. Let the developers know that you will be clearing the contents of those files.
- Once the team has agreed, run `truncate -s 0 /home/nginx/app/log/production.log` to clear the contents of the log file. Repeat for the other log files, replacing production.log with the respective file name.
- After this is done, access the Jenkins server and repeat the same `Build Now` steps on the affected environment.
- Once the `Build Now` job has deployed successfully, open the environment's webpage in your browser.
- The webpage should be back up.
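
The disk check and log cleanup steps above can be sketched as a small script. This is a minimal sketch under assumptions, not the team's tooling: the function names `mounts_over` and `truncate_logs`, the 90 percent threshold in the usage note, and the `*.log` pattern are all hypothetical. It relies on `df -P` (the POSIX output format) and the `truncate` command already used in this guide, and it should only be run after the developers have agreed to the cleanup.

```shell
#!/usr/bin/env bash

# Print mount points whose usage is at or above a percentage threshold.
# Reads `df -P` style output on stdin so it can be tried on sample data.
# Assumes filesystem names and mount points contain no spaces.
mounts_over() {
  local threshold="$1"
  # Skip the header row; column 5 is the capacity (e.g. 97%), column 6 the mount point.
  awk -v t="$threshold" 'NR > 1 { use = $5; sub(/%/, "", use); if (use + 0 >= t) print $6 }'
}

# Empty every *.log file in a directory without deleting the files,
# so processes that hold them open keep writing to the same file.
truncate_logs() {
  local log_dir="$1"
  local f
  for f in "$log_dir"/*.log; do
    [ -f "$f" ] || continue   # no matches: the glob stays literal, so skip it
    truncate -s 0 "$f"
  done
}
```

On the server, `df -P | mounts_over 90` lists the mount points that are at least 90 percent full. Inside the container, `truncate_logs /home/nginx/app/log` empties the log files in that directory while leaving the files themselves in place.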