Troubleshooting - wAlber47/Tech-Journal GitHub Wiki
Troubleshooting (TSHOOT):
Troubleshooting usually follows the same pattern: gather information, try possible fixes, and eventually solve the problem. There are several structured processes that you should try to follow.
One of the best guides to follow while troubleshooting is the OSI model; understanding what it is and how it works is a great tool.
The OSI Model:
Application:
The Actual Application
Presentation:
Data Formats, Encoding, and Encryption (SSL/TLS)
Session:
Opening, Maintaining, and Closing Conversations between Hosts
Transport:
TCP/UDP
Network:
IP Addresses
Data-link:
Media Access Control (MAC) Addresses
Address Resolution Protocol (ARP), which maps IP Addresses inside of a Local Area Network to MAC Addresses
Physical:
Cables and Wiring
How to use this to Troubleshoot:
Top Down:
Start at the application layer and work down from there.
Bottom Up:
Start by checking the physical connections and move up the model.
Divide and Conquer:
Start in the middle by running a ping or a traceroute. If you can ping the device, the Network, Data-link, and Physical layers are most likely working, so you can work your way up from the middle. If you can't ping, then you most likely have an issue with one of those three layers (or ICMP is being blocked somewhere along the path).
Running a ping is much easier than checking all the physical connections leading up to a computer.
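The divide and conquer decision above can be sketched in Python. This is only a minimal sketch: the function names `ping` and `next_layers` are made up for illustration, and it assumes the system `ping` command is available and that ICMP is not filtered between you and the target.

```python
import platform
import subprocess
from typing import List


def ping(host: str, count: int = 2) -> bool:
    """Return True if the system ping command exits successfully for host."""
    # Windows uses -n for the packet count; Linux and macOS use -c.
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, str(count), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=10 * count,
        )
    except (OSError, subprocess.TimeoutExpired):
        # ping is missing or never returned: treat it as a failed ping.
        return False
    return result.returncode == 0


def next_layers(ping_ok: bool) -> List[str]:
    """Return the OSI layers to check next, in the order to check them."""
    if ping_ok:
        # The lower three layers are probably fine, so work upward.
        return ["Transport", "Session", "Presentation", "Application"]
    # The fault is probably in the lower layers, so work downward.
    return ["Network", "Data-link", "Physical"]
```

For example, `next_layers(ping("192.0.2.10"))` returns the list of layers to look at next; the address here is a placeholder from the documentation range, not a real host.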
Follow the path to help find your issue.
Use network command-line tools (nslookup, ping, traceroute).
Changes commonly cause issues. Keep a baseline config so that you can compare against it when an issue arises.
You can also try swapping components, changing one thing at a time to see whether the issue still arises. This is often called "Following the Problem".
Verifying exactly what the issue is and how to re-create it is a big step toward fixing the problem.
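Comparing a running config against a saved baseline could look something like the sketch below, which uses Python's standard `difflib` module. The function name `config_diff` and the sample config lines are assumptions made up for illustration.

```python
import difflib
from typing import List


def config_diff(baseline: str, current: str) -> List[str]:
    """Return unified-diff lines between a baseline config and the current one."""
    return list(
        difflib.unified_diff(
            baseline.splitlines(),
            current.splitlines(),
            fromfile="baseline",
            tofile="current",
            lineterm="",
        )
    )


# Hypothetical configs: only the VLAN line differs.
baseline = "hostname sw1\nvlan 10\n"
current = "hostname sw1\nvlan 20\n"

for line in config_diff(baseline, current):
    print(line)
```

Lines starting with `-` exist only in the baseline and lines starting with `+` exist only in the current config. An empty result means nothing changed.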
Determine the scope of the issue: is it affecting everyone, a room of people, one VLAN, or one person? Try to find what the affected users have in common.
In a work environment, you shouldn't get tunnel vision, as there may be bigger issues that arise.
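One way to look for common similarities is to record a few attributes for each affected user and keep only the ones that match across every report. The sketch below assumes each report is a plain dictionary; the field names (`user`, `vlan`, `room`) and the function name `common_attributes` are hypothetical.

```python
from typing import Any, Dict, List


def common_attributes(reports: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Return the attribute/value pairs that every report shares."""
    if not reports:
        return {}
    shared = dict(reports[0])
    for report in reports[1:]:
        # Keep only the attributes whose value matches in this report too.
        shared = {k: v for k, v in shared.items() if report.get(k) == v}
    return shared


# Hypothetical help desk reports from two affected users.
reports = [
    {"user": "alice", "vlan": 10, "room": "101"},
    {"user": "bob", "vlan": 10, "room": "102"},
]
print(common_attributes(reports))
```

Here the only shared attribute is the VLAN, which suggests the issue is scoped to that VLAN rather than to one room or one person.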
Once you have gathered a good amount of data, you should try to pose a hypothesis as to what you think the issue is.
Also think about the risk of what could happen if something goes wrong. If there is a risk, you may want to schedule a window in which the maintenance will take place. Think about some workarounds for small-scale issues.
Always make sure to document your steps along the way, your hypothesis, and your implementation. Include why the issue happened and how to keep it from happening again (RCA, "Root Cause Analysis").
Common Documentation Types:
As Built Configurations: A configuration of software that is available as a roll back to a working version.
BOM (Bill of Materials): A list of what is included in the system.
Configs: A way to compare what changed. They can also be used almost as a "back-up", since you can reuse those settings if you need to accomplish a similar task. These take very little space and can save you a lot of time.
Help Desk Tickets: Where problems will first come into view.
Change Management System: Helps keep everyone on the same page; different groups can see what others are doing.
Life Cycle Management:
Defined as the beginning-to-end process of acquiring, installing, maintaining, tracking, and retiring an asset.