macOS:sample - chunhualiao/public-docs GitHub Wiki


The sample tool on macOS is a lightweight command-line utility for profiling processes. It captures timing data and call graphs to help diagnose performance issues. On Macs with M-series chips (Apple Silicon), it remains a valuable tool for developers and system administrators, as it’s included by default in macOS and runs natively on Apple’s ARM-based architecture. Below, I’ll explain its functionality, how it generates timing and call graph data, and considerations specific to M-series chips.

Overview of the `sample` Tool

The sample tool collects call stack samples from a running process over a specified period, providing insights into which functions are executing and how much time they consume. It’s simpler than Apple’s Instruments (part of Xcode), focusing on raw call stack data without the overhead of a GUI. This makes it ideal for quick profiling from the Terminal, especially for non-developer applications or when minimal overhead is desired.

Key Features

Function Timing:
- sample captures snapshots of a process’s call stack at regular intervals (default 1ms). Each sample shows which functions are on the stack, allowing you to estimate the time spent in each function based on their frequency in the samples.
- The output includes a call tree, showing the hierarchy of function calls and the number of samples in which each function appeared. It doesn’t measure elapsed time directly; instead, relative time is inferred from how often a function shows up in the samples.
Call Graphs:
- The tool generates a textual call graph, showing the relationships between functions (caller-callee). Each node in the graph represents a function, and edges indicate calls between them.
- The call graph is dynamic, reflecting the actual execution during the sampling period, unlike static call graphs that analyze all possible code paths.
Low Overhead:
- sample is designed to have minimal impact on the profiled process, making it suitable for analyzing performance issues like hangs or high CPU usage without significantly altering the process’s behavior.

Using `sample` on M-Series Chips

The sample tool is fully compatible with M-series chips (M1, M2, M3, M4, etc.), as macOS natively supports ARM64 architecture. However, there are some considerations:

Rosetta 2: If profiling an Intel-based application running under Rosetta 2 (Apple’s translation layer), sample still works, but the stack traces may differ from a native run because the code being executed has been translated.
ARM64 Optimizations: M-series chips have unique performance characteristics (e.g., high-performance and high-efficiency cores, unified memory). sample doesn’t directly expose metrics like core usage or memory bandwidth, but it captures function-level behavior across all cores.
System Libraries: Apple’s system libraries on M-series chips are optimized for ARM, so the low-level frames in a stack may differ from those seen on Intel-based Macs. Work that is offloaded to hardware accelerators (e.g., Neural Engine) does not run on the CPU, so it does not appear as function time in the call graph.
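
One way to anticipate whether a target will run under Rosetta 2 is to check which architectures its executable contains. The sketch below assumes the developer tools are installed so that lipo is available. The helper name needs_rosetta and the example path are hypothetical and not part of sample. It only inspects the binary, so a universal app that has been set to open using Rosetta would not be detected.

```shell
# Hypothetical helper: classify an architecture list (as printed by
# `lipo -archs <binary>`) by whether it has a native Apple Silicon slice.
needs_rosetta() {
    case " $1 " in
        *" arm64 "*|*" arm64e "*) echo "native" ;;    # has an ARM slice
        *" x86_64 "*)             echo "rosetta" ;;   # Intel-only binary
        *)                        echo "unknown" ;;
    esac
}

# Usage on a Mac (the path is only an example):
#   needs_rosetta "$(lipo -archs /Applications/Example.app/Contents/MacOS/Example)"
```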

How to Use `sample`

Find the Process ID (PID):
- Use ps aux | grep <process_name> or Activity Monitor to find the PID of the target process.
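
A plain ps aux | grep also matches the grep process itself. As a small sketch of an alternative, the hypothetical helper below filters "PID COMMAND" lines and prints only the PIDs whose command contains the given text. The function name pids_matching and the process name Safari are illustrative assumptions.

```shell
# Hypothetical helper: read "PID COMMAND" lines on stdin and print the PIDs
# whose command contains the given text (plain substring match, no regex).
pids_matching() {
    awk -v needle="$1" '
        {
            pid = $1
            cmd = $0
            sub(/^[ \t]*[0-9]+[ \t]+/, "", cmd)   # drop the PID column
            if (index(cmd, needle) > 0) print pid
        }'
}

# Usage on a Mac (the process name is only an example):
#   ps -axo pid=,comm= | pids_matching Safari
```

Because ps is asked for the command column only, commands whose paths contain spaces are still matched as a whole.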
Run sample:
- Basic syntax: sample <PID> <duration> <interval> -file <output_file>
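  - <PID>: Process ID to profile (sample also accepts a partial executable name in this position).
  - <duration>: Sampling duration in seconds (e.g., 30; default 10).
  - <interval>: Sampling interval in milliseconds (default 1ms).
  - <output_file>: Path to save the output (optional; if -file is omitted, sample writes the report to a file under /tmp and prints its path).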
- Example: sudo sample 5773 30 1 -file profile.txt
  - Profiles process 5773 for 30 seconds, sampling every 1ms, saving to profile.txt.
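
For repeated use, the invocation above can be wrapped in a small function that checks its arguments first. This is only a sketch: the name profile_pid, the 10-second default duration, and the default output file name are choices made for this example, and the function simply forwards to sample with the same positional arguments and -file flag shown above.

```shell
# Hypothetical wrapper around: sample <PID> <duration> <interval> -file <output_file>
profile_pid() {
    local pid="$1" duration="${2:-10}" interval="${3:-1}" out="${4:-profile.txt}"
    local value
    # All three numeric arguments must be non-empty strings of digits.
    for value in "$pid" "$duration" "$interval"; do
        case "$value" in
            ''|*[!0-9]*)
                echo "usage: profile_pid <pid> [seconds] [interval_ms] [output_file]" >&2
                return 2 ;;
        esac
    done
    if ! command -v sample >/dev/null 2>&1; then
        echo "sample not found (it ships with macOS only)" >&2
        return 127
    fi
    sample "$pid" "$duration" "$interval" -file "$out"
}

# Usage (prefix with sudo when the target belongs to another user):
#   profile_pid 5773 30 1 profile.txt
```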
Analyze Output:
- The output is a text file with a call tree, showing functions, their stack depth, and sample counts. For example:
```
Call graph:
    1000 Thread_12345
       1000 main
          800 process_data
             500 compute_heavy
             300 helper_function
          200 idle_loop
```
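  - Each number represents the number of samples where the function appeared, indicating relative time spent. Counts are inclusive: the 800 samples for process_data include the 500 and 300 recorded in its callees. In this example, compute_heavy was sampled 500 times, suggesting it’s a performance bottleneck.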
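
To make the "time inferred from sample frequency" idea concrete, here is a rough sketch that turns the counts in a call tree into a share of the total and an estimated duration (samples × interval). The function name estimate_time is hypothetical, the parsing assumes the simplified layout shown above (a count followed by a symbol), and the millisecond figures are only approximations because sampling is not perfectly regular.

```shell
# Hypothetical helper: print "symbol samples share estimated-ms" per entry.
# $1 = total samples for the thread, $2 = sampling interval in ms (default 1).
estimate_time() {
    awk -v total="$1" -v interval="${2:-1}" '
        {
            line = $0
            sub(/^[ +!:|]+/, "", line)      # drop indentation and tree-drawing characters
            n = split(line, f, /[ \t]+/)
            if (total > 0 && n >= 2 && f[1] ~ /^[0-9]+$/)
                printf "%s %d %.1f%% %dms\n", f[2], f[1], 100 * f[1] / total, f[1] * interval
        }'
}

# Usage: estimate_time 1000 1 < profile.txt
```

For the example tree, compute_heavy comes out at 500 samples, 50.0% of the thread's 1000 samples, or roughly 500 ms at a 1 ms interval.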
Visualizing Call Graphs:
- The raw output is textual. Tools like filtercalltree can prune or invert the text tree, and third-party scripts (e.g., FlameGraph with stackcollapse-sample.awk) can turn the sample output into a visual call graph. For example, assuming the FlameGraph scripts are on your PATH:
```
stackcollapse-sample.awk profile.txt | flamegraph.pl > profile.svg
```
  - This generates a FlameGraph, a visual representation of the call stack where width indicates time spent.

Limitations and Considerations

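No Hardware Counters: Unlike Instruments or Linux’s perf, sample doesn’t access hardware performance counters (e.g., cache misses, branch mispredictions) on M-series chips. For such metrics, use Instruments’ CPU Counters template.
System Calls: sample records the stack of every thread at each interval, whether or not the thread is running on a CPU. A thread blocked in a system call (e.g., an I/O wait) therefore accumulates samples in the waiting function, so a high count there reflects wall-clock waiting rather than CPU work. If the target is being starved of CPU time by other processes, raising its priority with renice (e.g., sudo renice -20 -p <PID>) may make the samples more representative.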
Sample Overload: High sampling frequency (e.g., 1ms) can generate excessive data, making analysis cumbersome. For severe hangs, fewer samples (or manual debugger pauses) may suffice.
M-Series Specifics: The heterogeneous core architecture (performance vs. efficiency cores) isn’t explicitly reflected in sample output. To analyze core-specific behavior, Instruments’ Time Profiler or CPU Counters are better suited.
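
To judge how much data a run will produce before starting it, the nominal number of samples per thread is simply the duration divided by the interval. The helper below is a hypothetical one-liner for that arithmetic; the real count can be lower if sampling cannot keep up with the requested interval.

```shell
# Nominal samples per thread: duration (s) * 1000 / interval (ms).
expected_samples() {
    echo $(( $1 * 1000 / $2 ))
}

expected_samples 30 1     # prints 30000
expected_samples 30 10    # prints 3000
```

Raising the interval from 1 ms to 10 ms cuts the data by a factor of ten, which is often enough for diagnosing a hang.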

Comparison with Instruments

Simplicity: sample is faster to use from the Terminal and doesn’t require Xcode, unlike Instruments.
Depth: Instruments provides richer data (e.g., CPU usage per thread, lifecycle events) and a GUI for easier navigation. Use Instruments for detailed analysis or when sample’s output is too coarse.
Use Case: Use sample for quick, lightweight profiling or when debugging non-developer apps without a debugger. Use Instruments for in-depth analysis or GUI-based exploration.

Tips for Effective Use

Focus on Bottlenecks: Look for functions with high sample counts in the call graph, as they show where the process spends its time, whether computing or waiting.
Invert Call Trees: Tools like filtercalltree can invert the call tree to focus on leaf functions (where time is actually spent) rather than parent functions.
Combine with Other Tools: Use sample alongside Activity Monitor to identify high-CPU processes, then drill down with sample for function-level details.
Trigger Workloads: For specific operations (e.g., rendering a view), trigger the operation repeatedly during sampling to ensure relevant code appears in the call graph.
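
As a rough illustration of what focusing on leaf functions means, the sketch below computes each entry's self samples (its own count minus the counts of its direct children) from an indented call tree and lists the heaviest first. The function name self_samples is hypothetical, and the parsing assumes the simplified layout shown earlier. Unlike a true inverted call tree, it does not merge a function that appears in several places.

```shell
# Hypothetical helper: print "self-samples symbol" for every entry that has
# samples not accounted for by its children, heaviest first.
self_samples() {
    awk '
        function flush(d) {                 # emit finished entries at depth >= d
            while (top > 0 && depth[top] >= d) {
                self = count[top] - kids[top]
                if (self > 0) print self, name[top]
                top--
            }
        }
        {
            match($0, /^[ +!:|]*/)          # indentation incl. tree-drawing chars
            d = RLENGTH
            n = split(substr($0, d + 1), f, /[ \t]+/)
            if (n < 2 || f[1] !~ /^[0-9]+$/) next
            flush(d)
            if (top > 0) kids[top] += f[1]  # charge this entry to its parent
            top++
            depth[top] = d; count[top] = f[1]; name[top] = f[2]; kids[top] = 0
        }
        END { flush(0) }
    ' | sort -rn
}

# Usage: self_samples < profile.txt | head
```

On the example tree from earlier, this lists compute_heavy (500), helper_function (300), and idle_loop (200), while main and process_data drop out because all of their samples belong to callees.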

Example Workflow

To profile a hanging application (e.g., PID 1234):

Run: sudo sample 1234 10 1 -file hang_profile.txt
Open hang_profile.txt in a text editor or use filtercalltree to simplify the call tree.
Look for functions with high sample counts or repeated appearances in the stack.
Optionally, generate a FlameGraph for visualization.

Conclusion

The sample tool is a powerful, lightweight option for profiling on macOS with M-series chips, offering function timing and call graphs with minimal setup. While it lacks the depth of Instruments, its simplicity makes it ideal for quick diagnostics, especially for non-developer apps or command-line workflows. For advanced profiling (e.g., hardware counters, thread-specific data), complement sample with Instruments or other tools like DTrace.
