linux perf - animeshtrivedi/notes GitHub Wiki

Bookmarks

Perf locking analysis

Perf metric groups

https://lpc.events/event/17/contributions/1514/attachments/1335/2675/LPC%202023%20Linux%20Perf%20Metrics%20Talk%20Slides.pdf (local copy: file:///home/atr/Downloads/2023-LPC-Linux%20Perf%20Tool%20Metrics%202023%20Linux%20Perf%20Metrics%20Talk%20Slides.pdf)

Composite event classes (metric groups); perf list --details shows how they are calculated:

$ sudo perf list --details 
[...]
Metric Groups:

branch_prediction:
  branch_misprediction_ratio
       [Execution-Time Branch Misprediction Ratio (Non-Speculative)]
       [d_ratio(ex_ret_brn_misp, ex_ret_brn)]

data_fabric:
  all_remote_links_outbound
       [Approximate: Outbound data bytes for all Remote Links for a node (die)]
       [remote_outbound_data_controller_0 + remote_outbound_data_controller_1 + remote_outbound_data_controller_2 + remote_outbound_data_controller_3]
  nps1_die_to_dram
       [Approximate: Combined DRAM B/bytes of all channels on a NPS1 node (die)]
       [dram_channel_data_controller_0 + dram_channel_data_controller_1 + dram_channel_data_controller_2 + dram_channel_data_controller_3 +
        dram_channel_data_controller_4 + dram_channel_data_controller_5 + dram_channel_data_controller_6 + dram_channel_data_controller_7]

decoder:
  macro_ops_dispatched
       [Macro-ops Dispatched]
       [de_dis_cops_from_decoder.disp_op_type.any_integer_dispatch + de_dis_cops_from_decoder.disp_op_type.any_fp_dispatch]

l2_cache:
  all_l2_cache_accesses
       [All L2 Cache Accesses]
       [l2_request_g1.all_no_prefetch + l2_pf_hit_l2 + l2_pf_miss_l2_hit_l3 + l2_pf_miss_l2_l3]
  all_l2_cache_hits
       [All L2 Cache Hits]
       [l2_cache_req_stat.ic_dc_hit_in_l2 + l2_pf_hit_l2]
  all_l2_cache_misses
       [All L2 Cache Misses]
       [l2_cache_req_stat.ic_dc_miss_in_l2 + l2_pf_miss_l2_hit_l3 + l2_pf_miss_l2_l3]
  ic_fetch_miss_ratio
       [Instruction Cache (32B) Fetch Miss Ratio]
       [d_ratio(ic_tag_hit_miss.instruction_cache_miss, ic_tag_hit_miss.all_instruction_cache_accesses)]
  l2_cache_accesses_from_l2_hwpf
       [L2 Cache Accesses from L2 HWPF]
       [l2_pf_hit_l2 + l2_pf_miss_l2_hit_l3 + l2_pf_miss_l2_l3]
  l2_cache_misses_from_l2_hwpf
       [L2 Cache Misses from L2 Cache HWPF]
       [l2_pf_miss_l2_hit_l3 + l2_pf_miss_l2_l3]
  op_cache_fetch_miss_ratio
       [Op Cache (64B) Fetch Miss Ratio]
       [d_ratio(op_cache_hit_miss.op_cache_miss, op_cache_hit_miss.all_op_cache_accesses)]

l3_cache:
  l3_read_miss_latency
       [Average L3 Read Miss Latency (in core clocks)]
       [xi_sys_fill_latency * 16 / xi_ccx_sdp_req1]

tlb:
  l1_itlb_misses
       [L1 ITLB Misses]
       [bp_l1_tlb_miss_l2_tlb_hit + bp_l1_tlb_miss_l2_tlb_miss]
[...]
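On recent perf versions the listing can also be narrowed down to metrics only, which is a quicker way to look up a formula (filter keywords as documented in perf-list; assuming your perf build supports them):

sudo perf list metricgroup          # just the metric group names
sudo perf list --details metric     # individual metrics with their formulas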

Metric groups only work with perf stat (not with perf record; for record you need to use the underlying raw events, as sketched after the option listing below).

sudo perf stat -M 
    -M, --metrics <metric/metric group list>
                          monitor specified metrics or metric groups (separated by ,)
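
A hedged sketch of that record-with-raw-events route: the two event names are taken from the all_l2_cache_hits formula above (AMD Zen PMU events, so availability depends on the CPU), and the target/duration mirror the stat example that follows; for plain counting the metric route below is simpler.

# sketch: record the raw events behind all_l2_cache_hits instead of the metric
sudo perf record -e l2_cache_req_stat.ic_dc_hit_in_l2,l2_pf_hit_l2 -p `pidof memset` -- sleep 10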

So, for L2 CPU cache monitoring of a running memset process:

atr@f20u24:~/src/python-fio-automation/tmp/l2-profile$ sudo perf stat -M all_l2_cache_accesses,all_l2_cache_misses,all_l2_cache_hits -p `pidof memset`  -- sleep 10

 Performance counter stats for process id '5866':

           729,753      l2_pf_miss_l2_l3                 # 13335140259.00 all_l2_cache_accesses  (66.65%)
     1,496,264,279      l2_pf_hit_l2                                                            (66.65%)
     3,596,871,668      l2_pf_miss_l2_hit_l3                                                    (66.65%)
     8,241,274,559      l2_request_g1.all_no_prefetch                                           (66.65%)
           741,878      l2_pf_miss_l2_l3                 # 7508684886.00 all_l2_cache_misses    (33.42%)
     7,167,909,193      l2_pf_miss_l2_hit_l3                                                    (33.42%)
       340,033,815      l2_cache_req_stat.ic_dc_miss_in_l2                                        (33.42%)
     1,494,462,593      l2_pf_hit_l2                     # 9448133571.00 all_l2_cache_hits      (66.77%)
     7,953,670,978      l2_cache_req_stat.ic_dc_hit_in_l2                                        (66.77%)

      10.002230649 seconds time elapsed

The formulas behind these metrics (from perf list --details above):
  all_l2_cache_accesses
       [All L2 Cache Accesses]
       [l2_request_g1.all_no_prefetch + l2_pf_hit_l2 + l2_pf_miss_l2_hit_l3 + l2_pf_miss_l2_l3]
  all_l2_cache_hits
       [All L2 Cache Hits]
       [l2_cache_req_stat.ic_dc_hit_in_l2 + l2_pf_hit_l2]
  all_l2_cache_misses
       [All L2 Cache Misses]
       [l2_cache_req_stat.ic_dc_miss_in_l2 + l2_pf_miss_l2_hit_l3 + l2_pf_miss_l2_l3]
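
As a sanity check, the metric value printed by perf stat is just this formula applied to the raw counts from the same run. For all_l2_cache_hits above:

all_l2_cache_hits = l2_cache_req_stat.ic_dc_hit_in_l2 + l2_pf_hit_l2
                  = 7,953,670,978 + 1,494,462,593
                  = 9,448,133,571      (printed as 9448133571.00)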

To reduce the statistical sampling from counter multiplexing, I dropped one of the metrics and repeated the run with -r:

atr@f20u24:~/src/python-fio-automation/tmp/l2-profile$ sudo perf stat -M all_l2_cache_accesses,all_l2_cache_hits -p `pidof memset` -r 3 -- sleep 10

 Performance counter stats for process id '5909' (3 runs):

           700,043      l2_pf_miss_l2_l3                 # 19014233156.00 all_l2_cache_accesses  ( +-  1.59% )
     1,981,499,478      l2_pf_hit_l2                                                            ( +-  0.04% )
     7,866,034,241      l2_pf_miss_l2_hit_l3                                                    ( +-  0.03% )
     9,165,999,394      l2_request_g1.all_no_prefetch                                           ( +-  0.03% )
     1,981,499,664      l2_pf_hit_l2                     # 1981502953.00 all_l2_cache_hits      ( +-  0.04% )
             3,289      l2_cache_req_stat.ic_dc_hit_in_l2                                        ( +- 37.91% )

         10.001910 +- 0.000144 seconds time elapsed  ( +-  0.00% )

atr@f20u24:~/src/python-fio-automation/tmp/l2-profile$ 
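The same check holds for the averaged run: 3,289 + 1,981,499,664 = 1,981,502,953, which matches the 1981502953.00 printed for all_l2_cache_hits.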

Generating Flame Graphs

https://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html

# git clone https://github.com/brendangregg/FlameGraph  # or download it from github
# cd FlameGraph
# perf record -F 99 -a -g -- sleep 60
# perf script | ./stackcollapse-perf.pl > out.perf-folded
# ./flamegraph.pl out.perf-folded > perf.svg
# firefox perf.svg  # or chrome, etc.

There is a script here: python-fio-automation/flame-graphs/sh-generate-graph.sh
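
A minimal sketch of what such a wrapper can look like, assuming FlameGraph is cloned into the current directory (the script name and paths here are illustrative, not the actual sh-generate-graph.sh):

#!/bin/bash
# sketch: record system-wide stacks for $1 seconds and render a flame graph
set -e
DURATION=${1:-60}
sudo perf record -F 99 -a -g -- sleep "$DURATION"
sudo perf script > out.perf                          # perf.data is root-owned
./FlameGraph/stackcollapse-perf.pl out.perf > out.perf-folded
./FlameGraph/flamegraph.pl out.perf-folded > perf.svg
echo "wrote perf.svg"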

July 30th, 2024

Most things work out of the box; I have not tested it comprehensively.

How to compile when there are RH/DEB packaging issues

  • I cannot build a deb package, so I just untar the source code that was generated during the build process and build the tool in there. The code is untarred in /usr/src/linux/.
  • There are some issues in the make, but it is mostly OK; see here.

Here is the command sequence for building the Linux perf tool:

sudo apt install libzstd1 libzstd-dev libdwarf-dev libdw-dev binutils-dev libcap-dev libelf-dev libnuma-dev python3 python3-dev libssl-dev libunwind-dev zlib1g-dev liblzma-dev libaio-dev libtraceevent-dev debuginfod libpfm4-dev libslang2-dev systemtap-sdt-dev libperl-dev libbabeltrace-dev libiberty-dev libpopt-dev libnl-3-dev libcapstone-dev

# I gave up building the other tools as they take too much time.
make tools/perf 
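
The resulting binary stays inside the source tree under tools/perf/; a quick check that the build is usable (path taken from the untarred tree used below):

/usr/src/linux-6.9.0-atr-2024-07-05/tools/perf/perf --version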

Example run for collecting and mapping traces:

atr@u24clean:~/tmp$ sudo /usr/src/linux-6.9.0-atr-2024-07-05/tools/perf/perf record -e cycles -- sleep  10 
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.011 MB perf.data (10 samples) ]
atr@u24clean:~/tmp$ sudo /usr/src/linux-6.9.0-atr-2024-07-05/tools/perf/perf report -n -m --stdio --full-source-path --source -s symbol,srcline 
# To display the perf.data header info, please use --header/--header-only options.
#
addr2line /usr/lib/debug/boot/vmlinux-6.9.0-atr-2024-07-05: could not read first record
addr2line /usr/lib/debug/boot/vmlinux-6.9.0-atr-2024-07-05: could not read first record
#
# Total Lost Samples: 0
#
# Samples: 10  of event 'cycles'
# Event count (approx.): 1178318
#
# Overhead       Samples  Symbol                          Source:Line                                                 IPC   [IPC Coverage]
# ........  ............  ..............................  ..........................................................  ....................
#
    26.36%             1  [k] pud_val                     /home/atr/src/linux/mm/memory.c:473                         -      -            
    24.27%             1  [k] ___slab_alloc               /home/atr/src/linux/mm/slub.c:3488                          -      -            
    19.48%             1  [k] mas_store_prealloc          mas_store_prealloc+102                                      -      -            
    13.58%             1  [k] kmem_cache_alloc_bulk       /home/atr/src/linux/mm/slub.c:4623                          -      -            
     8.07%             1  [.] dl_main                     ./elf/./setup-vdso.h:38                                     -      -            
     4.39%             1  [k] irqentry_exit_to_user_mode  irqentry_exit_to_user_mode+352                              -      -            
     2.02%             1  [k] psi_group_change            /home/atr/src/linux/kernel/sched/psi.c:238                  -      -            
     0.92%             1  [k] pwq_tryinc_nr_active        /home/atr/src/linux/kernel/workqueue.c:1808                 -      -            
     0.91%             1  [k] _raw_spin_lock_irq          /home/atr/src/linux/./arch/x86/include/asm/atomic.h:115     -      -            
     0.00%             1  [k] native_write_msr            /home/atr/src/linux/./arch/x86/include/asm/jump_label.h:27  -      -            


#
# (Tip: Show estimate cycles per function and IPC in annotate use perf record -b ... ; perf report --total-cycles)
#
atr@u24clean:~/tmp$ 

NOTE: addr2line resolution takes a long time, so be patient.

Commands from ZR ($PERF_RES, $VMLINUX, and ${e} are shell variables, and CMD-OF-WHAT-TO-RECORD is a placeholder):

perf record -e instructions -F 99 -o $PERF_RES/${e}.perf.out CMD-OF-WHAT-TO-RECORD
perf report --vmlinux $VMLINUX -n -m --stdio --full-source-path --source -s symbol -i $PERF_RES/${e}.perf.out >> $PERF_PARSED_LIST/perf_parsed_${e}.txt

To work around sudo not picking up the perf path, I made these aliases (the trailing space in the sudo alias makes the shell also alias-expand the word that follows it):

alias sudo='sudo '
alias perf='/usr/src/linux-6.9.0-atr-2024-07-05/tools/perf/perf' 
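
With both aliases in place, a quick check that plain invocations now resolve to the custom build:

sudo perf --version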

Unresolved

Notes taken on the 5.12 kernel

  • [Resolved] Full path expansion of the kernel symbols. It used to be there; now it just shows the binary file location, not the source file location where the symbol comes from. Somehow I remember -P used to do that.
atr@node1:/home/atr/4tb/file-io$ sudo perf kallsyms -v pick_next_entity  
/proc/{kallsyms,modules} inconsistency while looking for "[bpf]" module!
/proc/{kallsyms,modules} inconsistency while looking for "[bpf]" module!
/proc/{kallsyms,modules} inconsistency while looking for "[bpf]" module!
/proc/{kallsyms,modules} inconsistency while looking for "[bpf]" module!
/proc/{kallsyms,modules} inconsistency while looking for "[bpf]" module!
/proc/{kallsyms,modules} inconsistency while looking for "[bpf]" module!
/proc/{kallsyms,modules} inconsistency while looking for "[bpf]" module!
/proc/{kallsyms,modules} inconsistency while looking for "[bpf]" module!
pick_next_entity: [kernel] [kernel.kallsyms] 0xffffffffbd0e0fb0-0xffffffffbd0e1100 (0xffffffffbd0e0fb0-0xffffffffbd0e1100)
atr@node1:/home/atr/4tb/file-io$ 

See: https://github.com/iovisor/bcc/issues/2872

Perf resources

Further setup when symbols are not found: https://stackoverflow.com/questions/10933408/how-can-i-get-perf-to-find-symbols-in-my-program

1. compile the code with -g
2. perf record --call-graph dwarf -F 97 /path/to/my/program
3. echo 0 > /proc/sys/kernel/kptr_restrict
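
Putting the three steps together, a minimal end-to-end sketch (myprog/myprog.c are hypothetical names):

gcc -g -O2 -o myprog myprog.c                       # keep debug info
echo 0 | sudo tee /proc/sys/kernel/kptr_restrict    # expose kernel symbol addresses
sudo perf record --call-graph dwarf -F 97 ./myprog
sudo perf report --stdio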

Setup

Link the kernel source code to /usr/src/linux; perf also picks up the vmlinux file from there.
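
For example (a sketch; /home/atr/src/linux is the tree location seen in the report output below, adjust to where your tree actually lives):

sudo ln -sfn /home/atr/src/linux /usr/src/linux
ls -l /usr/src/linux/vmlinux        # built in-tree, so it is already reachable here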

Recording the instructions

sudo perf record -e instructions -a -F 99 -g -v --vmlinux /usr/src/linux/vmlinux -n -- ./perf-file-creat-test /mnt/sdb/atr/file-io/ 10000
  • -n (--no-samples) stopped it from sampling, even with the -F 99 frequency, hence no samples
  • With -g, the profile has all the cumulative (call-graph) symbols

Remove the call graph information, and the individual symbols are there:

sudo perf record -e instructions -a -v --vmlinux /usr/src/linux/vmlinux -- ./perf-file-creat-test /mnt/sdb/atr/file-io/ 10000

Remove the call graph and use the --stdio interface; there it is more intuitive to see how things accumulate.

Report

sudo perf report -k /usr/src/linux/vmlinux -n -m --demangle-kernel --full-source-path --source --stdio

Annotation (somewhat ugly!)

This shows the annotated code, to some extent:

sudo perf annotate -k /usr/src/linux/vmlinux -n -m  --source --stdio

Generating full path symbols

Sort based on the srcline, and then it generates the complete kernel source path:

sudo perf report --vmlinux /usr/src/linux/vmlinux -n -m --stdio --full-source-path --source -s symbol,srcline

Example:

# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 52K of event 'instructions'
# Event count (approx.): 3364033240
#
# Overhead       Samples  Symbol                                       Source:Line                                                                                         IPC   [IPC Coverage]
# ........  ............  ...........................................  ..................................................................................................  ....................
#
     2.18%           930  [k] fscrypt_match_name                       /mnt/sdb/atr/src/linux/fs/crypto/fname.c:488                                                        -      -            
     2.06%           854  [k] fscrypt_match_name                       /mnt/sdb/atr/src/linux/fs/crypto/fname.c:505                                                        -      -            
     1.40%           886  [k] _raw_spin_lock                           /mnt/sdb/atr/src/linux/./arch/x86/include/asm/atomic.h:202                                          -      -            
     1.27%           552  [k] __ext4_check_dir_entry                   /mnt/sdb/atr/src/linux/fs/ext4/dir.c:108                                                            -      -            
     0.98%           429  [k] ext4_find_dest_de                        /mnt/sdb/atr/src/linux/fs/ext4/namei.c:1948                                                         -      -            
     0.88%           394  [k] __ext4_check_dir_entry                   /mnt/sdb/atr/src/linux/fs/ext4/dir.c:71                                                             -      -            
     0.78%           321  [k] ext4_search_dir                          /mnt/sdb/atr/src/linux/fs/ext4/namei.c:1394                                                         -      -            
     0.77%           452  [k] update_nohz_stats                        /mnt/sdb/atr/src/linux/kernel/sched/fair.c:8362                                                     -      -            
     0.72%           493  [k] jbd2_journal_write_revoke_records        /mnt/sdb/atr/src/linux/fs/jbd2/revoke.c:549                                                         -      -            
     0.70%           497  [k] _raw_spin_lock_irqsave                   /mnt/sdb/atr/src/linux/./arch/x86/include/asm/atomic.h:202                                          -      -            
     0.70%           517  [k] read_tsc                                 /mnt/sdb/atr/src/linux/./arch/x86/include/asm/msr.h:234                                             -      -            
     0.55%           385  [k] native_sched_clock                       /mnt/sdb/atr/src/linux/./arch/x86/include/asm/msr.h:205                                             -      -            
     0.53%           224  [k] memcmp                                   /mnt/sdb/atr/src/linux/lib/string.c:938                                                             -      -            
     0.52%           223  [k] fscrypt_match_name                       /mnt/sdb/atr/src/linux/fs/crypto/fname.c:493                                                        -      -            
     0.50%           215  [k] ext4_find_dest_de                        /mnt/sdb/atr/src/linux/fs/ext4/namei.c:1955                                                         -      -            
     0.46%           305  [k] __switch_to_asm                          /mnt/sdb/atr/src/linux/arch/x86/entry/entry_64.S:253                                                -      -            
     0.45%           191  [k] fscrypt_match_name                       /mnt/sdb/atr/src/linux/./include/linux/fortify-string.h:235                                         -      -            
     0.44%           300  [k] jbd2_clear_buffer_revoked_flags          /mnt/sdb/atr/src/linux/fs/jbd2/revoke.c:494                                                         -      -            
     0.44%           187  [k] ext4_find_dest_de                        /mnt/sdb/atr/src/linux/fs/ext4/namei.c:1375                                                         -      -            
     0.39%           159  [k] ext4_search_dir                          /mnt/sdb/atr/src/linux/fs/ext4/namei.c:1360                                                         -      -            
     0.37%           229  [k] _raw_read_lock                           /mnt/sdb/atr/src/linux/./include/asm-generic/qrwlock.h:79                                           -      -            
     0.35%           150  [k] __ext4_check_dir_entry                   /mnt/sdb/atr/src/linux/fs/ext4/dir.c:88                                                             -      -            
     0.35%           146  [k] __ext4_check_dir_entry                   /mnt/sdb/atr/src/linux/fs/ext4/dir.c:77                                                             -      -            
     0.34%           304  [k] menu_select                              /mnt/sdb/atr/src/linux/drivers/cpuidle/governors/menu.c:236                                         -      -            
     0.33%           142  [k] fscrypt_match_name                       /mnt/sdb/atr/src/linux/fs/crypto/fname.c:496                                                        -      -