Ops 201 Reading 4 - marsecguy/reading-notes-cyberops GitHub Wiki

Ops 201 Reading 4

Troubleshooting Techniques

A core skill for IT professionals is troubleshooting: diagnosing and fixing problems. That is obvious for help desk and technician roles at non-tech companies, but it matters just as much in less obvious positions, such as developers. When their code doesn't behave as expected, troubleshooting techniques help them identify the problem and fix it so they can move on.

There are a number of methodologies for investigating the cause of a problem (root cause analysis, five whys, etc.), but they can all be distilled down to a basic core. First, the problem must be correctly and thoroughly identified. Much like in medicine, different hardware and software components have different effects on other parts of the total machine. Correctly identifying all of the symptoms can point the troubleshooter toward the possible causes and immediately rule out others. This process of elimination requires a good knowledge of how hardware, firmware, and software work together.
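The symptom-based elimination described above can be sketched in a few lines of Python. This is only an illustration: the `KNOWN_CAUSES` table, its entries, and the `candidate_causes` function are made up for the example and do not come from the source material.

```python
# Hypothetical knowledge base: each candidate cause maps to the
# symptoms it is able to explain. The entries are invented examples.
KNOWN_CAUSES = {
    "loose network cable": {"no network", "link light off"},
    "bad DNS settings": {"no network", "ping by IP works"},
    "failed power supply": {"no power", "no fan noise"},
}


def candidate_causes(observed, known_causes=KNOWN_CAUSES):
    """Keep only the causes that can explain every observed symptom."""
    observed = set(observed)
    return sorted(
        cause
        for cause, symptoms in known_causes.items()
        if observed <= symptoms  # all observed symptoms are explained
    )


# One symptom leaves two candidates; a second symptom rules one out.
print(candidate_causes(["no network"]))
print(candidate_causes(["no network", "ping by IP works"]))
```

Each additional symptom can only shrink the list, which is why gathering all of the symptoms up front pays off.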

Once the list of possible causes has been narrowed down, the causes can be ranked by how easy they are to correct. Starting with the quickest and easiest possible fix, a new process of elimination can begin: each attempted fix that doesn't work eliminates one of the remaining possibilities. This continues until either the problem is resolved or the technician exhausts all possible fixes within their skill level and the matter needs to be escalated to someone with more expertise.
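As a rough sketch of that loop, the Python below tries fixes from easiest to hardest and signals escalation when nothing works. The `Fix` type, the effort estimates, and the sample fixes are assumptions made for illustration, not part of the source.

```python
from typing import Callable, NamedTuple


class Fix(NamedTuple):
    name: str
    effort_minutes: int        # rough, hypothetical effort estimate
    apply: Callable[[], bool]  # returns True if the problem is resolved


def troubleshoot(fixes):
    """Try fixes from easiest to hardest.

    Returns (name of the fix that worked, or None, names attempted in order).
    A result of None means every fix failed and the issue should be escalated.
    """
    attempted = []
    for fix in sorted(fixes, key=lambda f: f.effort_minutes):
        attempted.append(fix.name)
        if fix.apply():
            return fix.name, attempted
    return None, attempted


# Stand-in fixes: the lambdas simulate whether each attempt succeeds.
fixes = [
    Fix("replace network card", 45, lambda: True),
    Fix("reseat cable", 1, lambda: False),
    Fix("renew DHCP lease", 5, lambda: False),
]
resolved_by, attempted = troubleshoot(fixes)
print(resolved_by, attempted)
```

The `attempted` list doubles as a record of what was ruled out, which is useful when handing the issue to someone else.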

Once the cause has been determined, the fix must be made. Many times, this will happen naturally as a part of the troubleshooting process because something the technician attempts will work. When the process of diagnosing the problem doesn't naturally lead to a fix, consideration must be given to the time and effort required to implement the solution and the impact it will have on other operations. When the fix is implemented, the troubleshooter needs to consider all operational environments and whether the solution needs to be tested in a different environment, such as for a different user.

After the fix is completed and thoroughly tested, the technician needs to make sure the documentation is complete. Detailed notes can be extremely helpful and save time if the problem occurs again. Only when the documentation is complete should the trouble ticket, or whatever tracking mechanism is used, be closed out.

A final note about documentation: looking back for similar problems in the past can be a huge time saver. Past documentation can also be used to develop new strategies and/or troubleshooting flow charts. Many organizations are great at developing lessons learned, but terrible at actually learning from them.
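One way to make past documentation searchable is to record the symptoms and resolution of each closed ticket in a structured form. The sketch below is a minimal illustration under that assumption. The `Ticket` fields, the sample history, and the overlap-based ranking are all hypothetical and do not describe any particular ticketing system.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    ticket_id: str
    symptoms: frozenset
    resolution: str


def similar_tickets(history, observed, min_overlap=1):
    """Return past tickets sharing symptoms with the current problem,
    ordered from most to fewest shared symptoms."""
    observed = set(observed)
    scored = [(len(observed & t.symptoms), t) for t in history]
    matches = [(score, t) for score, t in scored if score >= min_overlap]
    # The sort is stable, so ties keep their original history order.
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [t for _, t in matches]


# Invented ticket history for the example.
history = [
    Ticket("T-1", frozenset({"no network", "link light off"}), "reseated cable"),
    Ticket("T-2", frozenset({"no power"}), "replaced power supply"),
    Ticket("T-3", frozenset({"no network", "ping by IP works"}), "corrected DNS server address"),
]

for ticket in similar_tickets(history, ["no network", "ping by IP works"]):
    print(ticket.ticket_id, "->", ticket.resolution)
```

The resolutions of the closest matches give the technician a ready-made list of fixes to try first.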

Source: Professor Messer