Logging: Best Practices - ganong-noel/lab_manual GitHub Wiki

In builds, logging is an essential way to keep track of what happens while scripts run, and an essential tool for debugging errors. Proper logging can save you a lot of debugging time, sometimes days. At the same time, we do not want to overwhelm the log files with too much information: computations, even ones as simple as row counts, can be expensive and time-consuming, so choosing what to include in your log files matters too. This wiki provides some basic tips to incorporate into your logging habits:

  • Always name your log file something different on each run.
    • This will help you distinguish between log files from different runs of the build.
    • The easiest way to do this is to write code that automatically appends the date to the name of the log file, for example “rdfo_logfile_20240129”.
  • Include dates and timestamps in log files.
    • This is one of the most important things to do when making a log file, so that you can trace back exactly what you ran on a particular day, and even at a particular time.
    • Include timestamps for all messages.
    • Always store your date as YYYYMMDD, which helps with sorting.
  • Include a message that tells you what kind of sample you are running.
    • For example, if you have created an output using a 1% build, the log file from this run should let you know you ran a 1% sample on 29 January 2024 at 10am.
  • Avoid ambiguous messages that only you can understand. It is often better to write in full sentences and to avoid abbreviations and acronyms unless they are very common.
    • Include critical information in a form that other members of your team can understand on their own, without your intervention. Add context if needed.
  • Include statements when a table has been written.
    • Sometimes a script writes many tables in one go, which can take a lot of time. If you are monitoring a build and want to check whether an intermediate table has been written correctly, this statement tells you when the table is ready to inspect.
  • Avoid including extremely sensitive information in your log files.
    • In AWS builds, for example, you might run builds with very sensitive information such as your AWS access key and secret key. Do not include this information in your log files, as log files are visible to other people.
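
The tips above can be combined in a few lines of setup code. What follows is a minimal sketch in Python using only the standard library, not the lab's actual build code. The helper names (`make_log_path`, `setup_logger`, `RedactSecrets`), the table name `intermediate_table`, and the placeholder key are all hypothetical. The sketch also assumes you may run a build more than once a day, so it appends the time as well as the date to the file name.

```python
import logging
import tempfile
from datetime import datetime
from pathlib import Path


def make_log_path(log_dir, prefix="rdfo_logfile", now=None):
    """Build a per-run log file name ending in YYYYMMDD_HHMMSS."""
    now = now or datetime.now()
    return Path(log_dir) / f"{prefix}_{now:%Y%m%d_%H%M%S}.log"


class RedactSecrets(logging.Filter):
    """Replace known sensitive values with a placeholder before they are written."""

    def __init__(self, secrets):
        super().__init__()
        self._secrets = [s for s in secrets if s]

    def filter(self, record):
        message = record.getMessage()
        for secret in self._secrets:
            message = message.replace(secret, "[REDACTED]")
        # Store the already-formatted, redacted message on the record.
        record.msg, record.args = message, None
        return True


def setup_logger(log_path, secrets=()):
    """Return a logger that timestamps every message written to log_path."""
    log_path = Path(log_path)
    logger = logging.getLogger(f"build.{log_path.stem}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep build messages out of the root logger
    handler = logging.FileHandler(log_path, encoding="utf-8")
    handler.setFormatter(
        logging.Formatter(
            "%(asctime)s %(levelname)s %(message)s",
            datefmt="%Y%m%d %H:%M:%S",  # dates stored as YYYYMMDD
        )
    )
    handler.addFilter(RedactSecrets(secrets))
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    # Hypothetical run: write to a temporary directory and print the result.
    with tempfile.TemporaryDirectory() as log_dir:
        log_path = make_log_path(log_dir)
        logger = setup_logger(log_path, secrets=["EXAMPLE-SECRET-KEY"])
        logger.info("Starting build on the %s sample.", "1%")
        logger.info("Finished writing table %s.", "intermediate_table")
        logger.info("Connected using key %s.", "EXAMPLE-SECRET-KEY")
        for handler in list(logger.handlers):
            handler.close()
            logger.removeHandler(handler)
        print(log_path.read_text(encoding="utf-8"))
```

The redaction filter is only a safety net for values you already know are sensitive. The simpler rule still applies: do not pass keys to the logger in the first place.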