Troubleshooting jobs running on Databricks - ja-guzzle/guzzle_docs GitHub Wiki
- Where to look for the information
- Guzzle UI / Job info
- Guzzle job logs
- Databricks Job URL
- Driver logs
- Init script logs
- Issues in job runs
- Error: Connection reset by peer can be ignored
- Cluster resources not available
- Look for errors in both the job group logs and the job logs
- In some cases the error shows up only in the stage-level job info record
- Job logs
- Orchestration logs - these are generated for a job group run or a stage run. For a job group run, the log file id is the same as the job group instance id
They are found here
- The Guzzle job running in https://southeastasia.azuredatabricks.net/?o=143986498199862#job/4901/run/1 appears to have been successful (this can be verified using job instance id 200424142415401574). The IOException: Connection reset by peer can simply be ignored, as it looks like an internal driver/executor coordination issue during cluster shutdown.
- The job cluster that actually failed was this one (it failed due to the public IP address limit): https://southeastasia.azuredatabricks.net/?o=143986498199862#job/4907/run/1
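
When comparing several runs like the two above, it can help to pull the identifiers out of the Databricks Job URL so they can be matched against Guzzle job info records. The following is a minimal sketch, not part of Guzzle: the names `JobRunRef` and `parse_job_run_url` are hypothetical, and it assumes the URL shape seen in this page (`?o=<workspace id>#job/<job id>/run/<run number>`), where the trailing number is taken to be the run's sequence number within the job.

```python
import re
from typing import NamedTuple
from urllib.parse import parse_qs, urlparse


class JobRunRef(NamedTuple):
    """Identifiers extracted from a Databricks job run URL (hypothetical helper type)."""
    host: str
    workspace_id: str
    job_id: int
    run_number: int  # assumption: sequence number of the run within the job


# Matches the URL fragment, e.g. "job/4901/run/1".
_FRAGMENT = re.compile(r"^job/(\d+)/run/(\d+)$")


def parse_job_run_url(url: str) -> JobRunRef:
    """Split a job run URL into host, workspace id, job id and run number."""
    parts = urlparse(url)
    # The "o" query parameter carries the workspace (organization) id.
    workspace_id = parse_qs(parts.query).get("o", [""])[0]
    match = _FRAGMENT.match(parts.fragment)
    if match is None:
        raise ValueError(f"not a job run URL: {url}")
    return JobRunRef(
        host=parts.netloc,
        workspace_id=workspace_id,
        job_id=int(match.group(1)),
        run_number=int(match.group(2)),
    )


if __name__ == "__main__":
    urls = [
        "https://southeastasia.azuredatabricks.net/?o=143986498199862#job/4901/run/1",
        "https://southeastasia.azuredatabricks.net/?o=143986498199862#job/4907/run/1",
    ]
    for url in urls:
        print(parse_job_run_url(url))
```

Run against the two URLs above, this makes it easy to see that they refer to different jobs (4901 and 4907) in the same workspace, which is the distinction that matters when deciding which run actually failed.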