Collecting memory snapshots during an application crash and analyzing them - mitikov/KeepSitecoreSimple GitHub Wiki
Agenda
A Sitecore application crashes randomly. No idea why =\
Preparations
Firstly, since the process must be fully suspended to create a memory snapshot, it will not reply to the IIS Ping command and could be terminated before the full snapshot is produced.
Therefore, the Ping command should be disabled during memory snapshot collection to avoid producing a corrupted snapshot.
Secondly, a memory dump will be similar in size to, or larger than, the memory used by the process (this can be checked in Task Manager). Please ensure there is enough free space on the hard drive to save the snapshot.
Thirdly, the tool that produces memory snapshots must be launched with administrator credentials.
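The first two preparations can be checked from a script before the rule is armed. The following is only a minimal sketch: the helper names, the 1.5 safety factor and the app pool name are assumptions made for illustration, and the appcmd invocation is just one possible way to toggle pinging (the same setting is available in the application pool's advanced settings in IIS Manager).

```python
import shutil
from typing import List


def enough_space_for_dump(process_bytes: int, free_bytes: int,
                          safety_factor: float = 1.5) -> bool:
    """Return True if free_bytes can hold a dump of a process using process_bytes.

    safety_factor is an assumed margin, because the dump may be larger than
    the memory figure shown in Task Manager.
    """
    return free_bytes >= process_bytes * safety_factor


def free_bytes_on(folder: str) -> int:
    """Free space, in bytes, on the drive that holds the given folder."""
    return shutil.disk_usage(folder).free


def ping_toggle_command(app_pool: str, enabled: bool) -> List[str]:
    """Build an appcmd argument list that switches IIS pinging on or off.

    appcmd.exe normally lives in %windir%\\system32\\inetsrv and must be run
    from an elevated prompt; app_pool is a hypothetical pool name.
    """
    value = "true" if enabled else "false"
    return ["appcmd.exe", "set", "apppool", app_pool,
            "/processModel.pingingEnabled:" + value]


if __name__ == "__main__":
    gib = 1024 ** 3
    # Assumed example: the worker process uses about 4 GiB.
    print(enough_space_for_dump(4 * gib, free_bytes_on(".")))
    print(" ".join(ping_toggle_command("MyAppPool", enabled=False)))
```

Remember to switch pinging back on once the snapshot has been collected.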
Actions
We will install the DebugDiag tool by Microsoft to collect a snapshot at exactly the moment it is needed.
We will start DebugDiag 2 Collection and configure it according to the steps below:
- Click Add Rule... in the DebugDiag main window
- Select the Crash type for a Specific IIS web application pool
- Ensure to set Maximum userdump limit (1-2 should be enough)
- Ensure to select a folder with plenty of free space and activate the rule
The UserDump Count column will be incremented once the rule produces a userdump.
WinDBG part
Open snapshot
Please refer to the "Opening Memory Snapshots generated on other machines locally" article for the steps to load a userdump into WinDbg.
Analyze
The !Threads command (Sos.dll extension) should be executed first. It will show the ID of the thread that produced the exception.
The exception was raised by thread 66, so we will switch to that thread via the ~66s command and print the call stack via the !CLRStack command.
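The command sequence above could also be run unattended through cdb, the console debugger that ships alongside WinDbg. The sketch below only builds the argument list. The dump path is hypothetical, and ".loadby sos clr" assumes a .NET Framework 4+ process whose SOS and DAC files are resolvable on the analysis machine (see the article referenced above if they are not).

```python
from typing import List


def analysis_commands(thread_number: int) -> List[str]:
    """Debugger commands mirroring the manual steps for the given thread."""
    return [
        ".loadby sos clr",      # load SOS next to the CLR module (assumes .NET 4+)
        "!threads",             # list managed threads and their exceptions
        f"~{thread_number}s",   # switch to the thread that raised the exception
        "!clrstack",            # print the managed call stack of that thread
        "q",                    # quit the debugger when done
    ]


def cdb_arguments(dump_path: str, thread_number: int) -> List[str]:
    """Argument list for cdb: -z opens a dump file, -c runs the commands."""
    return ["cdb.exe", "-z", dump_path,
            "-c", "; ".join(analysis_commands(thread_number))]


if __name__ == "__main__":
    # Hypothetical dump location; thread 66 is the one from the example above.
    print(cdb_arguments(r"C:\dumps\crash.dmp", 66))
```

The resulting list can be handed to subprocess.run on a machine with the debugging tools installed, with the output redirected to a file for later reading.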
NOTES or HINTS
- The !mk command (powered by sosex) will produce cleaner output. Use the !mk -a parameter to try to match variables from the stack to method arguments.
- Use !dumpstack when the !mk and !clrstack commands fail. It works in more cases but produces noisier output that is more difficult to read.
- Use kb if you want to see native frames as well.
- Try the !analyze -v command some day; it prints a verbose automated analysis of the exception.
- Execute !mdso to dump all stack objects and investigate object data.
Further steps
Once the exact code that throws the exception is found, you can:
- Analyze the code flow using objects from the thread stack (!mdso command) and find out what is wrong.
Having the exact objects (field values obtained via the !mdt address command) and the regenerated source code should help to identify and eliminate the issue.
Have fun :)