CPU and Threads - microsoft/MSO-Scripts GitHub Wiki
Why is the CPU running?
- Every 1 ms (configurable) ETW samples the execution address of every running thread.
- It also captures a stackwalk at each thread's execution point (optional).
- The Windows Performance Analyzer (WPA) aggregates the samples to reveal the most active code paths.
Why is the CPU not running?
- Each time a thread becomes blocked (waiting) or unblocked, ETW logs the execution context and time stamp.
- It also captures a stackwalk at the execution point of the unblocking and releasing threads.
- WPA aggregates the traced events to reveal the blocking and releasing code paths.
Why is the application blocked/frozen?
- ETW is able to run down all threads on the system and capture a stackwalk of each execution context, running or blocked.
- WPA exposes the context and the execution stack for every thread, both at the beginning and at the end of the trace.
- Either download and unzip a recent Release, or clone the Repository:
<> Code ↓
-
MSO-Scripts\
TraceCPU Start [-Lean] [-Lite] [-Hang] [-Loop]
Exercise the application/scenario. -
MSO-Scripts\
TraceCPU Stop [-Hang]
-
MSO-Scripts\
TraceCPU View [-Hang]
Options for reduced data collection:
-Lean
: CPU samples only, no stackwalk.
-Lite
: CPU samples only, with stackwalk.
-Hang
: Stackwalk all threads at Start and Stop. (May be slow to Start and to Stop with lots of threads.)
-Loop
: Record only the last few minutes of activity (circular tracing buffer).
List all options:
-
MSO-Scripts\
TraceCPU -?
Important
If the PowerShell script does not run, you can instead run the Batch script:
MSO-Scripts\TraceCPU.BAT ...
See: What if I can't run PowerShell scripts in my environment?
Tip
You can also trace system activity during Windows start-up or shutdown. See: Analyze Windows Boot / Shutdown
At every 1 ms interval (configurable), a snapshot is captured of each of the Logical CPUs: the execution context (process, thread, module, function), with call stacks. The Windows Performance Analyzer (WPA) aggregates these together to reveal the most active code paths.
Caution
Because of its high data logging rate, do not run this script for more than a minute or two without adding a limiting switch:
-Lean
: CPU samples only, no stackwalk.
-Lite
: CPU samples only, with stackwalk.
-Hang
: Stackwalk all threads at Start and Stop.
-Loop
: Record only the last few minutes (circular tracing buffer).
Also, you can adjust the CPU sample interval from its default 1 ms:
WPR -SetProfInt <Interval in ms * 10,000>
The interval value can be 1221 - 100,000,000 (i.e. ⅛ ms - 10,000 ms), limited by hardware.
Confirm the new CPU sample interval:
WPR -ProfInt
Reset the CPU sample interval to 1 ms:
WPR -ResetProfInt
WPA's CPU Sample Table
The CPU Samples Chart/Table answers the question: Why is the CPU running?
Legend
Count: The number of 1 ms CPU samples seen running on that thread with that call stack.
Weight: A more accurate accounting of CPU time, closer to the number of CPU-milliseconds consumed by that thread with that call stack.
Thread Start Module/Function: Identifies the ThreadProc of each executing thread.
Stack: the expandable aggregation of execution call stacks.
Tip
To apply predefined labels to each thread, use the View Editor (Ctrl+E)
to enable the Stack Tag column.
Note that the first/main thread of the process has Module = *.exe [eg. Excel.exe], and (for Native code) its call stack contains the function *WinMain*.
Note
Each sample event records both a stackwalk (except with -Lean
) and the execution location (code address / module).
To view the concentration of the most active functions / modules, even without stackwalks:
- In WPA, the CPU Samples tab (top chart/table), open the View Editor (Ctrl+E).
- Turn off the 'Stack' column.
- Turn on the 'Module' and 'Function' columns.
OR - Simply scroll to the bottom of the "CPU Samples" tab to find the bottom chart/table: "CPU Usage (Sampled) Samples by Source/Line"
Every time a thread switches becomes blocked (enters the Wait state) or gets unblocked (enters the Ready state), the execution state and stackwalk are captured in the trace. In the Windows Performance Analyzer (WPA) we can tell where and why each thread blocked / switched out, how long it stayed switched out, and what event allowed it to switch back in.
The Wait Analysis Chart/Table answers the question: Why is the CPU NOT running?
Legend
New Process/Thread: The thread which was blocked / switched out, but is now unblocked / switching back in.
Ready(ing) Process/Thread: The thread which triggered the event allowing the New Thread to unblock and switch in.
Thread Start Module/Function: Identifies the ThreadProc of each thread.
New Thread Stack: The call stack(s) where the New Thread had been blocked.
Ready Thread Stack: The call stack(s) where the Ready Thread executed that allowed the New Thread to unblock (eg. SetEvent, LeaveCriticalSection, etc.).
Wait (ms) Sum : The amount of time the New Thread spent blocked / switched out.
Wait (ms) Max : The longest single Wait period for the New Thread.
Count: Waits: The number of times that the New Thread blocked / waited.
Count: A super-set of "Count: Waits" which also includes the number of times the New Thread was preempted.
Ready (ms) Sum : The amount of time the New Thread spent not blocked, but was ready to be scheduled.
Ready (ms) Max : The longest single time period where the New Thread was not blocked, but was ready to be scheduled.
Lifecycle of a Thread
The New Thread (represented here) transitions from Waiting → Ready → Scheduled/Running.
(It is the "New Thread" because it has newly entered the pool of schedulable threads.}
The Ready(ing) Thread signals/releases the New Thread to enter the Ready state.
Q: Why was that thread blocked, consuming no CPU and making no progress (until it eventually unblocked)!?
A: Use WPA to do the Wait Analysis, step-by-step:
-
Filter to the process and thread of interest:
- Select the Wait Analysis tab.
- Right-click on your process of interest and Filter to Selection.
- Identify the New Thread of interest by its ThreadProc (New Thread Start Module/Function)
or by its Stack Tag (turn on the New Thread Stack Tag column).
Right-click that row and Filter to Selection. - Give yourself some more room: Shrink or turn off the columns to the left of New Thread Stack.
-
Zoom the timeline chart to the time range of interest (optional):
- Identify the begin/end time of interest.
Example 1:
In the table, find and select the function which begins idle processing. (Everything before that is app-launch.)
In the timeline chart, click-drag the cursor to select the app-launch time range before the idle processing.
Example 2:
In the table find and select some other long-running function of interest.
The graph/chart will select its time range, highlighted in light-blue. - Right click within the light blue selection of the graph/chart and choose Zoom.
- Identify the begin/end time of interest.
-
Expand the call stack to reveal the blocking code.
- Make sure that the table is sorted descending by the 'Wait (ms) Max' column.
- The 'Wait (ms) Sum' column shows the sum of the time that the thread blocked under that call stack.
- The 'Wait (ms) Max' column shows the longest individual blocking occurrence under that call stack.
- The 'Count: Waits' column shows the total number of blocking occurrences under that call stack.
- Select a row / function of interest, and press the right-arrow key on the keyboard until the selection stops moving.
- Observe the Ready Thread Stack to find the code which unblocked the New Thread, ex: LeaveCriticalSection, SetEvent, etc.
Expand the selection to reveal the blocking call. -
Move to the next largest blocking code.
Once a blocking call has been analyzed, either filter it out or annotate it away:- Filter it out: Right-click a row of the blocking stack and choose: Filter Out Selection
- Annotate it away: Right-click a row of the blocking stack and choose: Annotate Selection > Legacy Annotate > Give it a label.
-
Return to step 3 ↑ to reveal another section of blocking code.
Find additional analysis tips here: Getting Started with the Windows Performance Analyzer (WPA)
Note
When you see "Dpc" (Delayed/Deferred Procedure Call) or "Apc" (Asynchronous Procedure Call) in any Stack column, it is work being injected into that thread, usually by the Kernel.
TraceCPU captures (usually) a rundown of stackwalks for all threads at the start and end of each tracing session.
Therefore, if you suspect that a thread was blocked/deadlocked at the start or end of the trace, examine its Start Rundown or End Rundown Stack.
To do this ensure WPA's 'Thread Info' tab is open. (This is automatic if you launched WPA with: TraceCPU View -Hang
)
To manually open the WPA View Profile for 'Thread Info':
- In WPA: Profiles > Apply... > Browse... > MSO-Scripts\WPAP\Threads.wpaProfile
- In WPA's View Editor (Ctrl+E):
- Enable the columns: Start Rundown Stack/Tag (and the intervening Thread ID?)
- OR enable the columns: End Rundown Stack/Tag, etc.
- Find the Process, Thread, and the deadlocked(?) thread's stackwalk of interest.
Note
- To trace only stackwalks for Thread Rundown and Process/Thread Creation, run:
TraceCPU Start -Hang
- Otherwise, TraceCPU may NOT capture a start/end rundown of thread stackwalks when the number of threads on the system exceeds 2500, or when the thread count for any process exceeds 500. When this happens, it will warn you. Close as many apps as possible and try again.
- WPR requires between 0.1 and 400 ms to capture the rundown stackwalk of each thread.
The 400 ms value is a default limit controlled by this registry value (REG_DWORD in ms):HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\WMI\Trace!StackCaptureTimeout
Reducing this value to below 400 will increase the system thread count threshold limit for this tracing script. - The 'Creation Stack/Tag' columns ('Thread Info' tab) also reveal the stackwalk of the call to CreateThread (from within some other thread) which launched the selected thread.
- Expose CPU & Thread Activity on Windows with TraceCPU - Video
- CPU Analysis in WPA
- CPU Scheduling and Threads
- Understanding Critical Path and Wait Analysis
- Understanding Lock Contention in WPA
- Critical Path Analysis with Windows Performance Analyzer - Video
- Windows Performance Analyzer (WPA) Usage Guide