Collecting a memory snapshot during an application crash and analyzing it - mitikov/KeepSitecoreSimple GitHub Wiki

Agenda

A Sitecore application crashes randomly. No idea why =\

Preparations

Since the process must be fully suspended while a memory snapshot is created, it will not reply to the IIS ping command and could be terminated before the full snapshot is produced.

Therefore, the ping should be disabled during memory snapshot collection to avoid producing a corrupted snapshot.
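The ping can be switched off in the application pool's advanced settings in IIS Manager. As a rough sketch, the same change could be scripted through appcmd. The appcmd path below is the usual default, the pool name `SitecoreAppPool` is hypothetical, and the `processModel.pingingEnabled` attribute should be verified against your IIS version before relying on it:

```python
import subprocess

# Assumption: default appcmd location on an IIS 7+ server.
APPCMD = r"C:\Windows\System32\inetsrv\appcmd.exe"


def ping_command(app_pool: str, enabled: bool) -> list[str]:
    """Build the appcmd call that switches worker-process pinging on or off."""
    value = "true" if enabled else "false"
    return [APPCMD, "set", "apppool", app_pool,
            f"/processModel.pingingEnabled:{value}"]


def set_ping(app_pool: str, enabled: bool) -> None:
    """Run the command; needs an elevated prompt on the IIS server."""
    subprocess.run(ping_command(app_pool, enabled), check=True)


if __name__ == "__main__":
    # "SitecoreAppPool" is a hypothetical pool name used for illustration.
    print(" ".join(ping_command("SitecoreAppPool", enabled=False)))
```

Remember to call `set_ping(pool, True)` once the snapshot has been collected, so that IIS health monitoring is restored.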

Secondly, a memory dump will be similar in size to, or larger than, the memory used by the process (this can be checked in Task Manager). Please ensure there is enough free space on the hard drive to save the snapshot.

Thirdly, the tool that produces memory snapshots must be launched with administrator credentials.
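A quick way to sanity-check the free space is sketched below. The function name and the 1.5x headroom factor are assumptions chosen for illustration. The process memory figure is passed in by hand, for example as read from Task Manager:

```python
import shutil


def has_room_for_dump(dump_dir: str, process_bytes: int, headroom: float = 1.5) -> bool:
    """Return True if the volume holding dump_dir has at least
    headroom * process_bytes of free space."""
    free = shutil.disk_usage(dump_dir).free
    return free >= process_bytes * headroom


if __name__ == "__main__":
    # Hypothetical example: a worker process using about 8 GB of memory.
    eight_gb = 8 * 1024 ** 3
    print(has_room_for_dump(".", eight_gb))
```

If more than one userdump is allowed by the rule, multiply the process size accordingly before calling the function.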

Actions

We will install the DebugDiag tool by Microsoft to collect a snapshot at exactly the moment it is needed.

We will start DebugDiag 2 Collection and configure it according to the steps below:


  1. Click Add Rule... in the DebugDiag main window
  2. Select the Crash rule type and target a specific IIS web application pool
  3. Be sure to set the Maximum userdump limit (1-2 should be enough)
  4. Select a folder with plenty of free space and activate the rule

The UserDump Count column is incremented each time the rule produces a userdump.

WinDbg part

Open snapshot

Please refer to the article Opening Memory Snapshots generated on other machines locally for the steps to load a userdump into WinDbg.

Analyze

The !Threads command (SOS.dll extension) should be executed first. It shows the ID of the thread that raised the exception.

The exception was raised by thread 66, so we switch to that thread via the ~66s command and print the call stack via the !CLRStack command.
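The command sequence described above can also be run non-interactively. The sketch below only builds a command line for cdb, the console debugger that ships alongside WinDbg. It assumes cdb.exe is on the PATH, that the dump comes from a .NET Framework 4+ process so `.loadby sos clr` can locate SOS, and that the thread index has already been read from the !Threads output. The dump file name is hypothetical:

```python
def triage_commands(thread_index: int) -> list[str]:
    """Debugger commands from this section, in the order they are executed."""
    return [
        ".loadby sos clr",      # load SOS next to the CLR (assumes .NET Framework 4+)
        "!threads",             # list managed threads and their exceptions
        f"~{thread_index}s",    # switch to the thread that raised the exception
        "!clrstack",            # print the managed call stack of that thread
    ]


def cdb_args(dump_path: str, thread_index: int) -> list[str]:
    """Build a cdb invocation: -z opens the dump, -c runs the commands, q quits."""
    script = "; ".join(triage_commands(thread_index) + ["q"])
    return ["cdb.exe", "-z", dump_path, "-c", script]


if __name__ == "__main__":
    # "w3wp.dmp" is a hypothetical dump file name.
    print(cdb_args("w3wp.dmp", 66))
```

The resulting list can be passed to `subprocess.run` on a machine where the debugging tools are installed. The same commands can also be typed one by one in the WinDbg command window.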

NOTES or HINTS

The !mk command (provided by SOSEX) produces cleaner output. Use the !mk -a parameter to try to match variables from the stack to method arguments.

Use !dumpstack when the !mk and !clrstack commands fail. It works more reliably, but its output is noisier and harder to read.

Use kb if you want to see native code as well.

Try the !analyze -v command some day.

Execute !mdso to dump all stack objects and investigate object data.

Further steps

Once the exact code that throws the exception is found, you can:

  1. Extract the code from the memory snapshot

  2. Load it into a reverse-engineering tool (dotPeek or ILSpy)

  3. Analyze the code flow using objects from the thread stack (!mdso command) to find out what is wrong.

Having the exact objects (field values obtained via the !mdt address command) and the regenerated source code should help to identify and eliminate the issue.

Have fun :)