Collecting memory snapshots during high CPU and analyzing them

Agenda

Our application randomly consumes 100500% CPU:

highcpu

We have a right to know why!

Game plan

A) We'll use ProcDump together with the Processor performance counters to capture snapshots just in time.

  1. ProcDump can be safely used in the Production environment since it does not require an installer.

  2. It must be executed with admin permissions, though.

B) Expect to get 2-3 sequential snapshots within a short interval.

C) We'll check how much CPU time each thread has spent, and find the threads whose totals grow the most between snapshots.

D) We'll see whether those threads are system threads (like GC) or user threads (like ThreadPool worker threads).

E) We'll compare the call stacks of the most CPU time-consuming threads in both snapshots.

Memory dump collection part

Configuring ProcDump

ProcDump is a powerful and lightweight tool by Sysinternals. We are going to launch it with the following arguments:

procdump64.exe -c 80 -s 2 -ma w3wp.exe -w -n 3

proc_dump

-c 80 specifies the CPU load percent (80% in this case) that triggers the rule

-s 2 the number of seconds (2 in this case) the CPU load must stay above the limit before a snapshot is taken

-ma produce a full memory snapshot (full user dump)

w3wp.exe the name or ID of the process to inspect

-w wait for the process if it has not started yet

-n 3 the number of snapshots (3 in this case) to create before exiting
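
On a production box it is handy to make the run fully unattended by auto-accepting the EULA as well. A minimal sketch of such an invocation (same arguments as above; this flag order is just one working variant):

procdump64.exe -accepteula -c 80 -s 2 -ma -w -n 3 w3wp.exe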

Notes

  1. ProcDump must be run with admin permissions.

  2. At least 2 snapshots are needed to find out the cause of the high CPU.

  3. The system should have enough free disk space (at least 3 times the process memory usage).

  4. Only that exact process instance is monitored. Do not expect snapshots if the process is recycled and a new one starts.

  5. Since the process must be fully suspended to create a memory snapshot, it will not reply to the IIS ping and could be terminated before the full snapshot is produced.

Investigation part

Please refer to the Opening Memory Snapshots generated on other machines locally article in order to load the snapshots into WinDbg.

Open a separate WinDbg instance for each memory snapshot.

Getting time-related information

You can get the snapshot generation time via the .time command:

time

For the demo, we have 2 snapshots produced 10 seconds apart.

Confirming high CPU

Next, we'll check the CPU usage at the moment the snapshot was created via the !threadpool command:

threadpool

The CPU usage in both snapshots is above 80 percent, in line with the ProcDump trigger setting.

Locating suspects

Want to see which threads took the CPU time? Easy with the !runaway command:

runaway

There are 4 threads that each took almost 5 seconds of CPU time within the 10-second interval.

Simple math shows 4 threads * 5 seconds = 20 seconds of CPU time [not 10 :)]. I have a multi-core processor, so that is fine: several cores accumulate CPU time in parallel, and the total can exceed the wall-clock interval.

NOTES

The value shown is the aggregated CPU time taken by the thread from the moment of its creation. The longer the application runs, the higher the numbers you'll get.

That is why it is vital to have at least 2 snapshots: they let you measure the growth of the aggregated time over a short interval.

The longer the delay between snapshots, the higher the chance the same operation will not be caught in both of them.

As a result, the snapshots may not bring you any closer to the root cause.

Do we suspect system threads?

Next, we'll find the system threads (e.g. GC, Finalizer...) via the !threads -special command:

threads_special

None of the suspected time-consuming threads is a system thread. That means user code is provoking the CPU usage.

NOTES

GC threads take noticeable CPU time in applications that produce high memory pressure (a lot of allocations and tons of short-lived objects).

The more objects stay alive in RAM, the more difficult it gets for the GC to clean up.
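
To picture what such memory pressure looks like in code, here is a minimal sketch (names and numbers are made up for illustration, not taken from any snapshot) of an allocation-heavy loop that keeps GC threads busy:

using System;
using System.Collections.Generic;

static class AllocationPressureDemo
{
    public static void Run()
    {
        var survivors = new List<byte[]>();

        for (var i = 0; i < 1_000_000; i++)
        {
            // Short-lived allocation: becomes garbage right after the iteration (Gen 0 pressure).
            var temp = new string('x', 1024);

            // Every 1000th buffer is kept alive, so each collection has more live objects to trace.
            if (i % 1000 == 0)
            {
                survivors.Add(new byte[64 * 1024]);
            }

            GC.KeepAlive(temp);
        }

        Console.WriteLine($"Kept {survivors.Count} buffers alive.");
    }
}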

Examine suspects one by one

We need to check the call stacks of the suspected threads one by one (via !clrstack or the sosex !mk command) and see whether the same operation is being executed in both snapshots:

comparing_callstacks

Yes! The same operation and the same object :) Although the thread carries no information about who launched it, we can find out what prevents the object from being collected (and therefore who created it) via the !gcroot command:

gcroot

It seems that StartupProcessor constructs an instance of CpuLoadProducer in its GiveHighCPU method. The code regenerated from the snapshot confirms our findings:

private void GiveHighCPU()
{
    // Produce the CPU load and keep it running for 5 minutes.
    using (new CpuLoadProducer(85))
    {
        Thread.Sleep(TimeSpan.FromMinutes(5));
    }
}
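
The article does not show CpuLoadProducer itself, but to make the picture concrete here is a minimal sketch of what such a class might look like, assuming the constructor argument is the target CPU load percentage and Dispose stops the load (all of this is an assumption, not code recovered from the snapshot):

using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical sketch only: not the real CpuLoadProducer from the snapshot.
public sealed class CpuLoadProducer : IDisposable
{
    private readonly CancellationTokenSource _cts = new CancellationTokenSource();

    public CpuLoadProducer(int targetLoadPercent)
    {
        // One busy-wait loop per logical core keeps the whole machine occupied.
        for (var i = 0; i < Environment.ProcessorCount; i++)
        {
            new Thread(() => BurnCpu(targetLoadPercent, _cts.Token)) { IsBackground = true }.Start();
        }
    }

    private static void BurnCpu(int targetLoadPercent, CancellationToken token)
    {
        var watch = Stopwatch.StartNew();
        while (!token.IsCancellationRequested)
        {
            // Busy-wait for ~targetLoadPercent ms out of every 100 ms, then yield the rest.
            if (watch.ElapsedMilliseconds > targetLoadPercent)
            {
                Thread.Sleep(100 - targetLoadPercent);
                watch.Restart();
            }
        }
    }

    public void Dispose() => _cts.Cancel();
}

Threads spinning like these are exactly what makes the aggregated !runaway numbers grow so quickly between snapshots.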

Why is StartupProcessor called?

Switching to thread 3248 (aka #3 from !runaway) shows us the exact reason:

mk_for_callee

The StartupProcessor lives inside the initialize pipeline, which is executed when the application starts.
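
For context, initialize pipeline processors are plain classes whose Process method Sitecore invokes once at application start (they are registered under the pipelines/initialize section of the Sitecore configuration). A minimal sketch, with namespace and body assumed purely for illustration, might look like this:

using Sitecore.Pipelines;

namespace Demo.Pipelines // assumed namespace, not taken from the snapshot
{
    public class StartupProcessor
    {
        // Sitecore calls this once during application start,
        // so any CPU-heavy work here shows up right after an app pool recycle.
        public void Process(PipelineArgs args)
        {
            // ... startup logic goes here ...
        }
    }
}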

Case solved.

Summary

  1. The ProcDump tool was configured to capture snapshots just in time.
  2. The resulting snapshots were loaded into WinDbg and ordered by time. The high CPU was confirmed.
  3. The CPU time accumulated per thread was compared between snapshots, and the few threads with the maximum gain were added to the suspects list.
  4. The suspects turned out not to be system threads.
  5. The same operation on the same object instance was being performed by a suspected thread in both snapshots.
  6. The source code of the callee object's class was regenerated.
  7. The logic that produced the suspect object was located, and the reason for the call was determined.

Life is not that easy

The demo shows the simplest high-CPU case; things get complicated fast in real life.

I strongly encourage you to follow the same steps locally using the processor & configuration under the link.

Good luck :)