Troubleshooting jobs running on Databricks - ja-guzzle/guzzle_docs GitHub Wiki

Where to look for the information

Guzzle UI / Job info

  1. Look for errors at both the job group level and in the job logs.
  2. In some cases the error shows up only in the stage-level job info record.

Guzzle job logs

  1. Job logs
  2. Orchestration logs - these are generated for a job group run or a stage run. For a job group run, the log file id is the same as the job group instance id.

Databricks Job URL

  1. https://southeastasia.azuredatabricks.net/?o=143986498199862#job/4901/run/1
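When matching a Databricks job URL against Guzzle logs, it can help to pull the identifiers out of the URL. The following is a minimal sketch, assuming the URL has the same shape as the example above. The function name `parse_job_run_url` is hypothetical, and treating the trailing number as the run number within the job is an assumption.

```python
import re
from urllib.parse import urlparse, parse_qs


def parse_job_run_url(url):
    """Split a Databricks job run URL into host, org id, job id and run number.

    Hypothetical helper: assumes the '?o=<org>#job/<job>/run/<run>' shape
    seen in the example URL above.
    """
    parsed = urlparse(url)
    # The workspace (org) id is carried in the `o` query parameter.
    org_id = parse_qs(parsed.query).get("o", [None])[0]
    # The job id and run number sit in the fragment, e.g. "job/4901/run/1".
    match = re.fullmatch(r"job/(\d+)/run/(\d+)", parsed.fragment)
    if match is None:
        raise ValueError(f"not a job run URL: {url}")
    return {
        "host": parsed.netloc,
        "org_id": org_id,
        "job_id": int(match.group(1)),
        "run_number": int(match.group(2)),
    }


info = parse_job_run_url(
    "https://southeastasia.azuredatabricks.net/?o=143986498199862#job/4901/run/1"
)
print(info["job_id"], info["run_number"])
```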

Driver logs

Init scripts logs

They are found here: [image]

Issues in job runs

Error: Connection reset by peer can be ignored

  1. The Guzzle job running in https://southeastasia.azuredatabricks.net/?o=143986498199862#job/4901/run/1 was in fact successful (this can be verified using job instance id 200424142415401574). The IOException: Connection reset by peer can simply be ignored, as it appears to be an internal driver/executor coordination issue during cluster shutdown.
  2. The job cluster that actually failed was a different one, which failed due to the public IP address limit: https://southeastasia.azuredatabricks.net/?o=143986498199862#job/4907/run/1
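The triage rule described above (ignore "Connection reset by peer" only when the job itself succeeded) can be sketched as a small log filter. This is an illustrative sketch, not part of Guzzle or Databricks: the function name, the keyword checks and the sample log lines are all made up for the example.

```python
# Messages treated as harmless only when the job is known to have succeeded.
BENIGN_AFTER_SUCCESS = ("Connection reset by peer",)


def errors_needing_attention(log_lines, job_succeeded):
    """Return the error lines that still need a look.

    Hypothetical helper: a line counts as an error if it contains
    'Exception' or 'ERROR'; this is a simplification for illustration.
    """
    flagged = []
    for line in log_lines:
        if "Exception" not in line and "ERROR" not in line:
            continue
        benign = job_succeeded and any(p in line for p in BENIGN_AFTER_SUCCESS)
        if not benign:
            flagged.append(line)
    return flagged


# Made-up sample lines, not real Guzzle or Databricks output.
sample = [
    "INFO job finished",
    "java.io.IOException: Connection reset by peer",
    "ERROR cluster launch failed: public ip address limit",
]
print(errors_needing_attention(sample, job_succeeded=True))
```

If the job did not succeed, the same "Connection reset by peer" line is kept in the result, since it may then be a symptom of a real failure.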

Cluster resources not available
