macOS:sample - chunhualiao/public-docs GitHub Wiki
The `sample` tool on macOS is a lightweight command-line utility for profiling processes, capturing timing data and call graphs to help diagnose performance issues. On Macs with M-series chips (Apple Silicon), it remains a valuable tool for developers and system administrators, as it’s included by default in macOS and runs natively on Apple’s ARM-based architecture. Below, I’ll explain its functionality, how it generates timing and call graph data, and considerations specific to M-series chips.
The `sample` tool collects call stack samples from a running process over a specified period, providing insights into which functions are executing and how much time they consume. It’s simpler than Apple’s Instruments (part of Xcode), focusing on raw call stack data without the overhead of a GUI. This makes it ideal for quick profiling from the Terminal, especially for non-developer applications or when minimal overhead is desired.
- Function Timing:
  - `sample` captures snapshots of a process’s call stack at regular intervals (default 1 ms). Each sample shows which functions are on the stack, so you can estimate the time spent in each function from how often it appears in the samples.
  - The output includes a call tree showing the hierarchy of function calls and the number of samples in which each function appeared. It doesn’t measure time directly; relative time is inferred from sample frequency.
- Call Graphs:
  - The tool generates a textual call graph showing the caller-callee relationships between functions. Each node in the graph represents a function, and edges indicate calls between them.
  - The call graph is dynamic, reflecting the actual execution during the sampling period, unlike static call graphs that analyze all possible code paths.
- Low Overhead:
  - `sample` is designed to have minimal impact on the profiled process, making it suitable for analyzing performance issues like hangs or high CPU usage without significantly altering the process’s behavior.
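Because time is inferred from sample frequency, a sample count can be turned into a rough time estimate by multiplying it by the sampling interval. The following is a minimal sketch of that arithmetic; `estimate_ms` and `sample_share` are hypothetical helper names, not part of `sample`, and the result is only an approximation because the achieved interval can differ from the requested one.

```shell
#!/bin/sh
# Hypothetical helper: approximate time attributed to a function.
# Usage: estimate_ms <samples_in_function> <interval_ms>
estimate_ms() {
    awk -v n="$1" -v i="$2" 'BEGIN { printf "%.1f\n", n * i }'
}

# Hypothetical helper: share of all samples in which the function appeared.
# Usage: sample_share <samples_in_function> <total_samples>
sample_share() {
    awk -v n="$1" -v t="$2" 'BEGIN { printf "%.1f%%\n", 100 * n / t }'
}

estimate_ms 500 1       # 500 samples at a 1 ms interval
sample_share 500 1000   # 500 of 1000 samples
```

With these inputs the two calls print `500.0` and `50.0%`, i.e. roughly half a second, or half of the sampled period.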
The `sample` tool is fully compatible with M-series chips (M1, M2, M3, M4, etc.), as macOS natively supports the ARM64 architecture. However, there are some considerations:
- Rosetta 2: If profiling an Intel-based application running under Rosetta 2 (Apple’s translation layer), `sample` works but may show translated function names or slightly different stack traces due to the translation.
- ARM64 Optimizations: M-series chips have unique performance characteristics (e.g., high-performance and high-efficiency cores, unified memory). `sample` doesn’t directly expose metrics like core usage or memory bandwidth, but it captures function-level behavior across all cores.
- System Libraries: Apple’s system libraries on M-series chips are optimized for ARM, and `sample` may show fewer low-level system calls in the stack compared to Intel-based Macs, as some operations are offloaded to hardware accelerators (e.g., Neural Engine).
- Find the Process ID (PID):
  - Use `ps aux | grep <process_name>` or Activity Monitor to find the PID of the target process.
- Run `sample`:
  - Basic syntax: `sample <PID> <duration> <interval> -file <output_file>`
    - `<PID>`: Process ID to profile.
    - `<duration>`: Sampling duration in seconds (e.g., 30; default 10).
    - `<interval>`: Sampling interval in milliseconds (default 1 ms).
    - `<output_file>`: Path to save the output (optional; if omitted, `sample` writes the report to a file in /tmp).
  - Example: `sudo sample 5773 30 1 -file profile.txt`
    - Profiles process 5773 for 30 seconds, sampling every 1 ms, saving to `profile.txt`.
- Analyze Output:
  - The output is a text file with a call tree, showing functions, their stack depth, and sample counts. For example:

        Call graph:
            1000 Thread_12345
              1000 main
                800 process_data
                  500 compute_heavy
                  300 helper_function
                200 idle_loop

  - Each number represents the number of samples where the function appeared, indicating relative time spent. In this example, `compute_heavy` was sampled 500 times, suggesting it’s a performance bottleneck.
- Visualizing Call Graphs:
  - The raw output is textual, but you can use tools like `filtercalltree` or third-party scripts (e.g., FlameGraph with `stackcollapse-sample.awk`) to create visual call graphs. For example: `filtercalltree profile.txt | stackcollapse-sample.awk | flamegraph.pl > profile.svg`
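The invocation described in the steps above can be sketched as a small dry-run helper that validates its arguments and prints the command line instead of executing it. `build_sample_cmd` is a hypothetical name introduced for illustration; it leaves out `sudo`, which the example above adds, and it only covers the positional arguments and `-file`.

```shell
#!/bin/sh
# Hypothetical helper: print the sample command line without running it.
# Usage: build_sample_cmd <pid> [duration_s] [interval_ms] [output_file]
build_sample_cmd() {
    pid=$1
    duration=${2:-10}   # default duration: 10 seconds
    interval=${3:-1}    # default sampling interval: 1 ms
    outfile=${4:-}

    # Reject empty values, values containing non-digits, and a bare 0.
    for value in "$pid" "$duration" "$interval"; do
        case $value in
            ''|*[!0-9]*|0)
                echo "error: expected a positive integer, got '$value'" >&2
                return 1 ;;
        esac
    done

    if [ -n "$outfile" ]; then
        echo "sample $pid $duration $interval -file $outfile"
    else
        echo "sample $pid $duration $interval"
    fi
}

build_sample_cmd 5773 30 1 profile.txt
```

The last line prints `sample 5773 30 1 -file profile.txt`. Printing first makes it easy to review the arguments before running the command on a real process.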
- No Hardware Counters: Unlike Instruments or Linux’s `perf`, `sample` doesn’t access hardware performance counters (e.g., cache misses, branch mispredictions) on M-series chips. For such metrics, use Instruments’ Counters template.
- System Calls: `sample` may not capture time spent in system calls (e.g., I/O waits) unless the process is actively running. To prioritize the process for better sampling, use `renice` (e.g., `sudo renice -20 -p <PID>`).
- Sample Overload: High sampling frequency (e.g., 1 ms) can generate excessive data, making analysis cumbersome. For severe hangs, fewer samples (or manual debugger pauses) may suffice.
- M-Series Specifics: The heterogeneous core architecture (performance vs. efficiency cores) isn’t explicitly reflected in `sample` output. To analyze core-specific behavior, Instruments’ Time Profiler or CPU Counters are better suited.
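To get a feel for the sample-overload point, the number of samples a run will attempt per thread is roughly the duration divided by the interval. A minimal sketch of that calculation follows; `expected_samples` is a hypothetical helper, and the real count can be lower if the tool cannot keep up with the requested interval.

```shell
#!/bin/sh
# Hypothetical helper: approximate samples per thread for a run.
# Usage: expected_samples <duration_s> <interval_ms>
expected_samples() {
    echo $(( $1 * 1000 / $2 ))
}

expected_samples 30 1    # 30 s at 1 ms
expected_samples 30 10   # 30 s at 10 ms
```

The two calls print `30000` and `3000`, so raising the interval from 1 ms to 10 ms cuts the data volume by a factor of ten for the same duration.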
- Simplicity: `sample` is faster to use from the Terminal and doesn’t require Xcode, unlike Instruments.
- Depth: Instruments provides richer data (e.g., CPU usage per thread, lifecycle events) and a GUI for easier navigation. Use Instruments for detailed analysis or when `sample`’s output is too coarse.
- Use Case: Use `sample` for quick, lightweight profiling or when debugging non-developer apps without a debugger. Use Instruments for in-depth analysis or GUI-based exploration.
- Focus on Bottlenecks: Look for functions with high sample counts in the call graph, as they indicate time-intensive code.
- Invert Call Trees: Tools like `filtercalltree` can invert the call tree to focus on leaf functions (where time is actually spent) rather than parent functions.
- Combine with Other Tools: Use Activity Monitor to identify high-CPU processes, then drill down with `sample` for function-level details.
- Trigger Workloads: For specific operations (e.g., rendering a view), trigger the operation repeatedly during sampling to ensure relevant code appears in the call graph.
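The idea behind inverting a call tree can be illustrated with a small script that computes "self" samples: each function's count minus the counts of its direct children. This is only a sketch under stated assumptions. It is not how `filtercalltree` works internally, `self_samples` is a hypothetical name, and the input is assumed to be a simplified tree of `<count> <name>` lines where deeper indentation means a deeper stack frame, which is not the exact layout of a real `sample` report.

```shell
#!/bin/sh
# Hypothetical helper: print self-sample counts per function, highest first.
# Input: lines of "<count> <name>", indented with spaces by call depth.
self_samples() {
    awk '
    NF < 2 { next }                          # skip blank or malformed lines
    {
        lead = $0
        sub(/[^ ].*$/, "", lead)             # keep only the leading spaces
        depth = length(lead)
        n++
        self[n] = $1; fn[n] = $2
        # Pop frames that are not ancestors of the current line.
        while (top > 0 && sdepth[top] >= depth) top--
        # Charge this subtree to its parent by subtracting it from parent self time.
        if (top > 0) self[sid[top]] -= $1
        top++; sdepth[top] = depth; sid[top] = n
    }
    END {
        for (i = 1; i <= n; i++) total[fn[i]] += self[i]
        for (name in total) if (total[name] > 0) print total[name], name
    }
    ' "$@" | sort -rn
}

self_samples <<'EOF'
1000 Thread_12345
  1000 main
    800 process_data
      500 compute_heavy
      300 helper_function
    200 idle_loop
EOF
```

For this input the script prints `500 compute_heavy`, `300 helper_function`, and `200 idle_loop`. The parent frames drop out because all of their samples are accounted for by their children.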
To profile a hanging application (e.g., PID 1234):
- Run: `sudo sample 1234 10 1 -file hang_profile.txt`
- Open `hang_profile.txt` in a text editor or use `filtercalltree` to simplify the call tree.
- Look for functions with high sample counts or repeated appearances in the stack.
- Optionally, generate a FlameGraph for visualization.
The `sample` tool is a powerful, lightweight option for profiling on macOS with M-series chips, offering function timing and call graphs with minimal setup. While it lacks the depth of Instruments, its simplicity makes it ideal for quick diagnostics, especially for non-developer apps or command-line workflows. For advanced profiling (e.g., hardware counters, thread-specific data), complement `sample` with Instruments or other tools like DTrace.