performance data - cockpit-project/cockpit GitHub Wiki
### Notes ###
- PCP archives are not turned on by default
- Needs to work for both mouse and touch
- Applies to both the dashboard and the graphs on the server page?
- How to indicate current vs. historical data?
- Needs zooming and panning
- Is PCP archiving a single on/off setting, or does it need separate controls for network, CPU, and disk?
- The network graph is the sum of the individual interfaces, but it's hard to know which interfaces to include in that sum without digging into the machine's VLAN/bond/bridge configuration. For example, if you have a bond, all traffic over that bond will be counted twice if we just sum up all interfaces.

  Maybe we should let the user explicitly configure which interfaces to include in the sum. Alternatively, we just sum up everything except `lo` and rely on the more detailed graphs on the dedicated networking page to tell a better story.

  Or we go all the way and dig out enough information from NetworkManager et al. The problem with that is that it needs to be done in a platform-independent way, or we need to allow the individual machines to contribute code to the dashboard.
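To make the double-counting problem concrete, here is a minimal Python sketch comparing the "sum everything except `lo`" option with one that skips bond members. The `Interface` record, its `bond` field, and the byte counts are hypothetical. On Linux, per-interface counters could come from `/sys/class/net/<iface>/statistics/rx_bytes` and `tx_bytes`, and bond membership from the interface's `master` link, but this sketch does not read them. It only covers the bond case: VLANs stacked on a parent interface, or bridges, would still need the configuration knowledge discussed in the note above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Interface:
    """Hypothetical snapshot of one network interface's byte counters."""
    name: str
    rx_bytes: int
    tx_bytes: int
    bond: Optional[str] = None  # hypothetical: name of the bond this interface belongs to, if any


def total_except_lo(ifaces: List[Interface]) -> int:
    """Simple option: sum all interfaces except loopback.
    Traffic over a bond is counted on the bond AND on its members."""
    return sum(i.rx_bytes + i.tx_bytes for i in ifaces if i.name != "lo")


def total_skipping_bond_members(ifaces: List[Interface]) -> int:
    """Skip loopback and bond members, so bonded traffic is counted once (on the bond).
    VLANs and bridges are NOT handled here."""
    return sum(
        i.rx_bytes + i.tx_bytes
        for i in ifaces
        if i.name != "lo" and i.bond is None
    )


# Example data: eth0 and eth1 are members of bond0, so bond0's
# counters already include their traffic.
ifaces = [
    Interface("lo", 1000, 1000),
    Interface("eth0", 100, 10, bond="bond0"),
    Interface("eth1", 50, 5, bond="bond0"),
    Interface("bond0", 150, 15),
]

print(total_except_lo(ifaces))              # 330: bonded traffic counted twice
print(total_skipping_bond_members(ifaces))  # 165
```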
### Stories ###
Robert is a sysadmin at a small IT company. They have three servers: one runs a file server, one runs their build server, and one runs the company website. People in the office have been complaining that the file server started acting strangely yesterday.

Sarah is a sysadmin at a 50-person company. They use Owncloud for their internal file sharing, but it has started running as slow as syrup.
### Workflows ###
Robert logs into Cockpit on the file server and looks at the performance graphs. The graphs look unusually high. He scrolls back through the graphs to when this started and sees a bump in both memory and CPU usage yesterday around lunchtime. He goes to the journal and sees that a software update yesterday brought in a new dependency, which now eats up resources. For now, he rolls back the OS to the version before the update. He'll have to investigate the issue more deeply, and possibly order new hardware, but at least the file server works for everyone again.

Sarah logs into the server hosting Owncloud and looks at the performance graphs. She notices that CPU and RAM usage are going through the roof, but that disk IO is almost flat. She scrolls back through the graphs and notices that CPU and RAM usage started spiking two days ago, and that disk IO went from normal to flat at the same time. She decides to take a closer look at the disk in question under Storage. She sees that the disk is completely full and therefore can no longer handle the caching. She adds a new disk to the pool that handles the cache and returns to the server page with the graphs. CPU and RAM usage are now back to normal levels.
### Feedback ###