linux development - animeshtrivedi/notes GitHub Wiki
- https://kernelnewbies.org/LinuxVersions
- https://github.com/axboe/liburing/wiki/What%27s-new-with-io_uring-in-6.11-and-6.12
- Faster asynchronous Direct I/O using io_uring, https://kernelnewbies.org/Linux_6.6#Faster_asynchronous_Direct_I.2FO_using_io_uring
- There are two optimizations that have an effect on the performance (see the link for details)
- There are also some cache optimizations that might affect the performance
- User xattrs and direct IO, https://kernelnewbies.org/Linux_6.6#TMPFS
- Kernel Concurrency References, https://hackmd.io/@0xff07/linux-concurrency/%2F%400xff07%2FSk-G0xhY6
- Crash white paper: https://crash-utility.github.io/crash_whitepaper.html
- kernel boot params: https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html
- Elixir: https://elixir.bootlin.com/linux/v6.9/source
cat /proc/kallsyms | grep 'memset'
0000000000000000 t __pfx_text_poke_memset
0000000000000000 t text_poke_memset
0000000000000000 T __pfx_memset_io
0000000000000000 T memset_io
0000000000000000 T __pfx___memset
0000000000000000 T __pfx_memset
0000000000000000 T memset
0000000000000000 T __memset
0000000000000000 t __pfx_memset_orig
0000000000000000 t memset_orig
0000000000000000 r __ksymtab___memset
0000000000000000 r __ksymtab_memset
0000000000000000 r __ksymtab_memset_io
0000000000000000 t time_nsec_memset_show [null_ablk]
0000000000000000 b time_nsec_memset [null_ablk]
0000000000000000 t num_fcount_memset_show [null_ablk]
0000000000000000 b num_fcount_memset [null_ablk]
0000000000000000 d kobj_num_fcount_memset [null_ablk]
0000000000000000 d kobj_time_nsec_memset [null_ablk]
0000000000000000 t __pfx_time_nsec_memset_show [null_ablk]
0000000000000000 t __pfx_num_fcount_memset_show [null_ablk]
0000000000000000 t __pfx_memset_probe2 [null_ablk]
0000000000000000 t memset_probe2 [null_ablk]
0000000000000000 t memset_extent_buffer [btrfs]
0000000000000000 t memset_extent_buffer.cold [btrfs]
0000000000000000 t __pfx_memset_extent_buffer [btrfs]
https://man7.org/linux/man-pages/man1/nm.1.html: If lowercase, the symbol is usually local; if uppercase, the symbol is global (external). There are however a few lowercase symbols that are shown for special global symbols ("u", "v" and "w").
"B"/"b": The symbol is in the BSS data section. This section typically contains zero-initialized or uninitialized data, although the exact behavior is system dependent.
"D"/"d": The symbol is in the initialized data section.
"T"/"t": The symbol is in the text (code) section.
"R"/"r": The symbol is in a read-only data section.
T means that the symbol is globally visible and can be used by other kernel code. https://stackoverflow.com/questions/39120818/what-is-the-difference-between-t-and-t-in-proc-kallsyms
https://stackoverflow.com/questions/44326565/perf-kernel-module-symbols-not-showing-up-in-profiling
Make sure to add `-g -fno-omit-frame-pointer` to the gcc flags. Then `make install` so that the module shows up in `/lib/modules/$(uname -r)/[extra|updates]/`. Also do not forget to run `sudo depmod -a` afterwards.
See the OOT-nullblk Makefile/Kbuild files.
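A minimal out-of-tree module Makefile along these lines (a sketch; `mymod.o` is a hypothetical object name, and `modules_install` for external modules installs under `extra/` by default):

```make
# Sketch: out-of-tree module build that keeps debug symbols for perf/crash
obj-m := mymod.o
ccflags-y += -g -fno-omit-frame-pointer

KDIR := /lib/modules/$(shell uname -r)/build

all:
	$(MAKE) -C $(KDIR) M=$(CURDIR) modules

install:
	$(MAKE) -C $(KDIR) M=$(CURDIR) modules_install
	depmod -a
```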
Boot-time parameters are set in /etc/default/grub, followed by update-grub2:
BOOT_IMAGE=/boot/vmlinuz-6.9.0-atr root=UUID=615a8273-ed80-47b3-87ff-d967f08e23af ro amd_pstate=disable amd_prefcore=disable cpuidle.off=1 cpufreq.off=1 processor.max_cstate=0 idle=halt nosmt=force iommu=off crashkernel=512M-:192M
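After a reboot, the parameters the running kernel actually booted with can be double-checked (a quick sketch):

```shell
# show the live kernel command line
cat /proc/cmdline
# list just the idle/frequency related parameters, if any
tr ' ' '\n' < /proc/cmdline | grep -E 'idle|cstate|pstate' || true
```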
checking : https://stackoverflow.com/questions/44286683/check-for-iommu-support-on-linux
Not enabled:
$ sudo find /sys | grep dmar
$
Enabled:
$ sudo find /sys | grep dmar
/sys/class/iommu/dmar2
/sys/class/iommu/dmar0
/sys/class/iommu/dmar3
/sys/class/iommu/dmar1
[...]
- Compile with `-fno-omit-frame-pointer -g`
- Check with `objdump --syms` or `file`
animesh.trivedi@flex20:~/fio$ objdump --syms /usr/bin/fio
/usr/bin/fio: file format elf64-x86-64
SYMBOL TABLE:
no symbols
animesh.trivedi@flex20:~/fio$ file /usr/bin/fio
/usr/bin/fio: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=cc5e0dd0e9922054dbdc229d347e67e887eff56f, for GNU/Linux 3.2.0, stripped
animesh.trivedi@flex20:~/fio$ file `which fio`
/home/animesh.trivedi/local/bin//fio: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=0f4edf72789f267f27cc426deb210a916cd47bc0, for GNU/Linux 3.2.0, with debug_info, not stripped
animesh.trivedi@flex20:~/fio$ objdump --syms ./fio
./fio: file format elf64-x86-64
SYMBOL TABLE:
0000000000000000 l df *ABS* 0000000000000000 Scrt1.o
00000000000003c4 l O .note.ABI-tag 0000000000000020 __abi_tag
0000000000000000 l df *ABS* 0000000000000000 gettime.c
000000000002c450 l F .text 0000000000000038 clock_cmp
000000000002c490 l F .text 00000000000001e0 clock_thread_fn
000000000002c670 l F .text 0000000000000025 fio_get_mono_time.part.0
00000000000bbfe0 l O .rodata 0000000000000012 __PRETTY_FUNCTION__.2
000000000002c6a0 l F .text 00000000000001e8 __fio_gettime
00000000001d0190 l O .bss 0000000000000004 cycles_wrap
00000000001d01b8 l O .bss 0000000000000008 cycles_start
00000000001d0194 l O .bss 0000000000000004 max_cycles_shift
[..]
sudo btrfs filesystem mkswapfile --size 4g --uuid clear ~/swapfile
sudo swapon -p 101 ~/swapfile
swapon
https://forum.garudalinux.org/t/create-a-swapfile-afterwards-problems-with-btrfs/34326
The `cpuid` command is useful to extract the microarchitectural features of a CPU: https://linux.die.net/man/1/cpuid
sudo apt-get install cpuid
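If cpuid is not installed, /proc/cpuinfo already carries the raw feature flags on x86 (a quick sketch):

```shell
# model name and the first CPU's feature-flag line (x86)
grep -m1 '^model name' /proc/cpuinfo
grep -m1 '^flags' /proc/cpuinfo
```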
# reserve some pages
echo 512 > /proc/sys/vm/nr_hugepages
# then mount the file system
mount -t hugetlbfs -o uid=$USER,mode=700,pagesize=2M,size=2G none ~/mnt/hugetlbfs/
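A quick sanity check of the reservation and the mount (sketch; the mount line only appears if the mount above succeeded):

```shell
# how many huge pages are reserved and how many are still free
grep -E '^HugePages_(Total|Free)' /proc/meminfo
# is a hugetlbfs instance mounted?
mount | grep hugetlbfs || true
```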
Excellent introduction: https://sysprog21.github.io/lkmpg/ (https://github.com/sysprog21)
When compiling a kernel module, it will inherit all the defined CONFIG_* macros from /usr/src/linux-headers-xxx/include/generated/autoconf.h
https://wiki.ubuntu.com/KernelTeam/GitKernelBuild
make -j $(getconf _NPROCESSORS_ONLN) deb-pkg LOCALVERSION=-custom
-> Kernel hacking
-> Compile-time checks and compiler options
-> Generate BTF typeinfo (DEBUG_INFO_BTF [=n])
https://github.com/akopytov/sysbench
- https://upcloud.com/blog/evaluating-cloud-server-performance-with-sysbench
- https://wiki.gentoo.org/wiki/Sysbench
- https://towshif.github.io/site/tutorials/Linux%20Shell/benchmark-linux/
- https://blog.cloud-mercato.com/why-i-love-sysbench/
sysbench --threads=128 --time=10 memory --memory-total-size=1T --memory-block-size=$((128*1024)) --memory-scope=global --memory-access-mode=rnd --memory-oper=read run
- Excellent write up with experiments: https://wiki.linuxfoundation.org/realtime/documentation/howto/applications/cpuidle
- https://docs.kernel.org/admin-guide/pm/cpuidle.html
- https://vstinner.github.io/intel-cpus.html
- https://wiki.archlinux.org/title/CPU_frequency_scaling
AMD: amd_pstate=disable amd_prefcore=disable cpuidle.off=1 cpufreq.off=1 processor.max_cstate=0 idle=halt
Intel: intel_pstate=disable cpuidle.off=1 cpufreq.off=1 processor.max_cstate=0 intel_idle.max_cstate=0 idle=halt
There are four CPUIdle governors available, menu, TEO, ladder and haltpoll. Which of them is used by default depends on the configuration of the kernel and in particular on whether or not the scheduler tick can be stopped by the idle loop. Available governors can be read from the available_governors file, and the governor can be changed at runtime. The name of the CPUIdle governor currently used by the kernel can be read from the current_governor_ro or current_governor file under /sys/devices/system/cpu/cpuidle/ in sysfs.

Which CPUIdle driver is used, on the other hand, usually depends on the platform the kernel is running on, but there are platforms with more than one matching driver. For example, there are two drivers that can work with the majority of Intel platforms, intel_idle and acpi_idle, one with hardcoded idle states information and the other able to read that information from the system’s ACPI tables, respectively. Still, even in those cases, the driver chosen at the system initialization time cannot be replaced later, so the decision on which one of them to use has to be made early (on Intel platforms the acpi_idle driver will be used if intel_idle is disabled for some reason or if it does not recognize the processor). The name of the CPUIdle driver currently used by the kernel can be read from the current_driver file under /sys/devices/system/cpu/cpuidle/ in sysfs.
tickless vs ticked system with idle state management
The kernel can be configured to disable stopping the scheduler tick in the idle loop altogether. That can be done through the build-time configuration of it (by unsetting the CONFIG_NO_HZ_IDLE configuration option) or by passing nohz=off to it in the command line. In both cases, as the stopping of the scheduler tick is disabled, the governor’s decisions regarding it are simply ignored by the idle loop code and the tick is never stopped.
If the given system is tickless, it will use the menu governor by default and if it is not tickless, the default CPUIdle governor on it will be ladder.
flex19:~$ sudo cat /sys/devices/system/cpu/cpuidle/current_governor
menu
flex19:~$ sudo cat /sys/devices/system/cpu/cpuidle/current_governor_ro
menu
flex19:~$ sudo ls /sys/devices/system/cpu/cpuidle/
available_governors current_driver current_governor current_governor_ro
flex19:~$ sudo cat /sys/devices/system/cpu/cpuidle/available_governors
ladder menu teo
flex19:~$ sudo cat /sys/devices/system/cpu/cpuidle/current_driver
acpi_idle
animesh.trivedi@flex19:~$ cat /boot/config-`uname -r` | grep CONFIG_NO_HZ_IDLE
CONFIG_NO_HZ_IDLE=y
Kernel/AMD modules:
amd-uncore
amd-pstate
amd_freq_sensitivity
#
sudo modprobe -v amd_pstate
# Does not do anything?
For each CPU in the system, there is a /sys/devices/system/cpu/cpu&lt;N&gt;/cpuidle/ directory in sysfs, where the number &lt;N&gt; is assigned to the given CPU at the initialization time. That directory contains a set of subdirectories called state0, state1 and so on, up to the number of idle state objects defined for the given CPU minus one. Each of these directories corresponds to one idle state object and the larger the number in its name, the deeper the (effective) idle state represented by it.
/sys/devices/system/cpu/cpu0/cpuidle/
idle=???
What does it mean?
The x86 architecture support code recognizes three kernel command line options related to CPU idle time management: idle=poll, idle=halt, and idle=nomwait. The first two of them disable the acpi_idle and intel_idle drivers altogether, which effectively causes the entire CPUIdle subsystem to be disabled and makes the idle loop invoke the architecture support code to deal with idle CPUs. How it does that depends on which of the two parameters is added to the kernel command line. In the idle=halt case, the architecture support code will use the HLT instruction of the CPUs (which, as a rule, suspends the execution of the program and causes the hardware to attempt to enter the shallowest available idle state) for this purpose, and if idle=poll is used, idle CPUs will execute a more or less “lightweight” sequence of instructions in a tight loop.

[Note that using idle=poll is somewhat drastic in many cases, as preventing idle CPUs from saving almost any energy at all may not be the only effect of it. For example, on Intel hardware it effectively prevents CPUs from using P-states (see CPU Performance Scaling) that require any number of CPUs in a package to be idle, so it very well may hurt single-thread computations performance as well as energy-efficiency. Thus using it for performance reasons may not be a good idea at all.]

The idle=nomwait option prevents the use of the MWAIT instruction of the CPU to enter idle states. When this option is used, the acpi_idle driver will use the HLT instruction instead of MWAIT. On systems running Intel processors, this option disables the intel_idle driver and forces the use of the acpi_idle driver instead. Note that in either case, the acpi_idle driver will function only if all the information needed by it is in the system’s ACPI tables.
How can I boot with an older kernel version? What does GRUB_DEFAULT="1>2" mean?
ubuntu:~$ sudo grub-mkconfig | grep -iE "menuentry 'Ubuntu, with Linux" | awk '{print i++ " : "$1, $2, $3, $4, $5, $6, $7}'
0 : menuentry 'Ubuntu, with Linux 5.4.0-80-generic' --class ubuntu
1 : menuentry 'Ubuntu, with Linux 5.4.0-80-generic (recovery mode)'
2 : menuentry 'Ubuntu, with Linux 4.15.0-159-generic' --class ubuntu
3 : menuentry 'Ubuntu, with Linux 4.15.0-159-generic (recovery mode)'
4 : menuentry 'Ubuntu, with Linux 4.15.0-45-generic' --class ubuntu
5 : menuentry 'Ubuntu, with Linux 4.15.0-45-generic (recovery mode)'
Modify the `GRUB_DEFAULT=0` value as per your need. Currently my server booted with 5.4.0-80-generic:
ubuntu:~# uname -srn
Linux ubuntu 5.4.0-80-generic
So I want to boot my system with 4.15.0-45-generic, which is menu entry 4. I modified the value in /etc/default/grub to GRUB_DEFAULT="1>4" and executed the commands below to regenerate the grub config file with the modified GRUB_DEFAULT setting.
sudo update-grub
sudo systemctl reboot
Post reboot, my Ubuntu server booted with the old kernel 4.15.0-45-generic:
ubuntu:~# uname -srn
Linux ubuntu 4.15.0-45-generic
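The "1>N" format is zero-based nested indexing: top-level entry 1 is usually the "Advanced options" submenu, and N is the entry index inside it. As a config sketch:

```shell
# /etc/default/grub (fragment)
GRUB_DEFAULT="1>4"   # submenu 1 ("Advanced options"), entry index 4 within it
```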
sudo mount -t tmpfs -o size=32G,noswap,uid=$USER,mpol=prefer:0,huge=never animesh.trivedi ~/mnt/tmpfs/
https://docs.kernel.org/block/null_blk.html
sudo modprobe null_blk queue_mode=2 home_node=0 gb=32 bs=4096 nr_devices=1 irqmode=1 hw_queue_depth=8 use_per_node_hctx=1 memory_backed=1 cache_size=0 mbps=0 no_sched=1 blocking=0
- queue_mode=2 (multi-queue)
- home_node=0 (NUMA node 0)
- irqmode=1 (uses IPI; only with irqmode=2, i.e. Timer, will it simulate latency injection via completion_nsec, e.g. completion_nsec=1000)
- use_per_node_hctx=1 (use one queue per NUMA node; otherwise set this to 0 and specify the number of queues with submit_queues)
  - parm: use_per_node_hctx: Use per-node allocation for hardware context queues. Default: false (bool)
- hw_queue_depth=8 (the hardware queue depth of the device)
- memory_backed=1 (yes, actual work is done: data is stored in memory)
- no_sched=1 (bypass the I/O scheduler; 0 uses the default MQ scheduler)
- blocking=? Register as a blocking blk-mq driver device; null_blk will set the BLK_MQ_F_BLOCKING flag, indicating that it sometimes/always needs to block in its ->queue_rq() function.
- ctrl+shift+- (forward) / ctrl+alt+- (backward)
Reload window: https://stackoverflow.com/questions/60714159/is-there-a-way-to-reconnect-to-a-disconnected-vs-code-remote-ssh-connection
ctrl + shift + P and then "reload window"
VS Code needs the following include paths (expanded here) in order to index and compile the kernel module source:
${workspaceFolder}/**
/usr/src/linux-headers-6.9.0-atr-2024-07-05/arch/x86/include/
/usr/src/linux-headers-6.9.0-atr-2024-07-05/arch/x86/include/generated/
/usr/src/linux-headers-6.9.0-atr-2024-07-05/include/
/usr/src/linux-headers-6.9.0-atr-2024-07-05/arch/x86/include/uapi/
/usr/src/linux-headers-6.9.0-atr-2024-07-05/arch/x86/include/generated/uapi/
/usr/src/linux-headers-6.9.0-atr-2024-07-05/include/linux/
and also the following defines:
__GNUC__
__KERNEL__
MODULE
`make -n` (dry run) shows the exact compile commands and flags used.
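Put together as a .vscode/c_cpp_properties.json sketch (the header version string must match your installed kernel headers; the "name" field is arbitrary):

```json
{
  "configurations": [
    {
      "name": "linux-kernel-module",
      "includePath": [
        "${workspaceFolder}/**",
        "/usr/src/linux-headers-6.9.0-atr-2024-07-05/arch/x86/include",
        "/usr/src/linux-headers-6.9.0-atr-2024-07-05/arch/x86/include/generated",
        "/usr/src/linux-headers-6.9.0-atr-2024-07-05/include",
        "/usr/src/linux-headers-6.9.0-atr-2024-07-05/arch/x86/include/uapi",
        "/usr/src/linux-headers-6.9.0-atr-2024-07-05/arch/x86/include/generated/uapi",
        "/usr/src/linux-headers-6.9.0-atr-2024-07-05/include/linux"
      ],
      "defines": ["__GNUC__", "__KERNEL__", "MODULE"]
    }
  ],
  "version": 4
}
```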
Crash commands are documented here: https://crash-utility.github.io/help_pages/mod.html (or `crash> help mod`)
There are some options for changing the dump format; see the makedumpfile options: `vim /etc/default/kdump-tools`
https://hackmd.io/@0xff07/S1ASmzgun#Optional-Set-dump-file-format-in-etc
The reserved memory size is set in /etc/default/grub.d/kdump-tools.cfg, or directly in the grub file, and then run update-grub2.
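For example, on Ubuntu the makedumpfile flags live in the MAKEDUMP_ARGS variable (a sketch; check the exact variable name in your /etc/default/kdump-tools):

```shell
# /etc/default/kdump-tools (fragment)
MAKEDUMP_ARGS="-c -d 31"   # -c: compress pages; -d 31: exclude zero, cache, user and free pages
```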
OK, the key problem is: what is causing the kernel crash with the nvmev kernel module? (inconclusive)
Step-1: Can we find the precise step where the fault is happening?
crash> bt
PID: 1164 TASK: ffffa01f82b02f40 CPU: 2 COMMAND: "insmod"
#0 [ffffbe4e40e7f6f0] machine_kexec at ffffffff9d095df0
#1 [ffffbe4e40e7f748] __crash_kexec at ffffffff9d2335df
#2 [ffffbe4e40e7f808] crash_kexec at ffffffff9d233b84
#3 [ffffbe4e40e7f810] oops_end at ffffffff9d043d44
#4 [ffffbe4e40e7f830] page_fault_oops.cold at ffffffff9e0d1913
#5 [ffffbe4e40e7f8b8] exc_page_fault at ffffffff9e18d20e
#6 [ffffbe4e40e7f8e0] asm_exc_page_fault at ffffffff9e2012a6
[exception RIP: NVMEV_PCI_INIT+346]
RIP: ffffffffc0c913fa RSP: ffffbe4e40e7f990 RFLAGS: 00010282
RAX: 0000000000000000 RBX: ffffa01f84064000 RCX: 00000000ffffffff
RDX: ffffffffc0c2d880 RSI: ffffffffc0c2d8c0 RDI: 0000000000000010
RBP: 0000000000032040 R8: 0000000000000000 R9: ffffbe4e40e7f8e8
R10: ffffffff9eaffe10 R11: 0000000000000000 R12: ffffffffffffffff
R13: ffffa01f84064000 R14: ffffa01f90ae5740 R15: ffffa01f85e621c0
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#7 [ffffbe4e40e7f9c0] init_module at ffffffffc0c97310 [nvmev]
#8 [ffffbe4e40e7f9f0] do_one_initcall at ffffffff9d002a88
#9 [ffffbe4e40e7fa60] do_init_module at ffffffff9d1fb7a0
#10 [ffffbe4e40e7fa80] init_module_from_file at ffffffff9d1fe626
#11 [ffffbe4e40e7fb30] idempotent_init_module at ffffffff9d1fe791
#12 [ffffbe4e40e7fbb8] __x64_sys_finit_module at ffffffff9d1fea1e
#13 [ffffbe4e40e7fbe8] do_syscall_64 at ffffffff9e1864b2
#14 [ffffbe4e40e7ff50] entry_SYSCALL_64_after_hwframe at ffffffff9e20012f
RIP: 00007f987af2725d RSP: 00007ffda87abb18 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: 0000561c82191760 RCX: 00007f987af2725d
RDX: 0000000000000000 RSI: 0000561c821912a0 RDI: 0000000000000003
RBP: 00007ffda87abbd0 R8: 0000000000000040 R9: 0000000000000000
R10: 00007f987b003b20 R11: 0000000000000246 R12: 0000561c821912a0
R13: 0000000000000000 R14: 0000561c82191730 R15: 0000561c821912a0
ORIG_RAX: 0000000000000139 CS: 0033 SS: 002b
crash>
With this bt, we get the RIP at ffffffffc0c913fa, so if we look it up:
crash> sym ffffffffc0c913fa
ffffffffc0c913fa (T) NVMEV_PCI_INIT+346 [nvmev] /usr/src/linux-headers-6.9.0-atr-2024-07-05/./include/linux/topology.h: 96
crash>
Step-2: how did we get to this offending instruction? `-r` is for:
-r (reverse) displays all instructions from the start of the routine up to and including the designated address.
So with that we get:
crash> dis -lr NVMEV_PCI_INIT+346
/home/atr/src/nvmevirt/pci.c: 616
0xffffffffc0c912a0 <NVMEV_PCI_INIT>: nopw (%rax)
0xffffffffc0c912a4 <NVMEV_PCI_INIT+4>: nopl 0x0(%rax,%rax,1)
/home/atr/src/nvmevirt/pci.c: 617
0xffffffffc0c912a9 <NVMEV_PCI_INIT+9>: movabs $0x10000201100c51,%rsi
/home/atr/src/nvmevirt/pci.c: 616
0xffffffffc0c912b3 <NVMEV_PCI_INIT+19>: push %r15
/home/atr/src/nvmevirt/pci.c: 484
0xffffffffc0c912b5 <NVMEV_PCI_INIT+21>: movabs $0xffeffffd00000000,%rdx
/home/atr/src/nvmevirt/pci.c: 616
0xffffffffc0c912bf <NVMEV_PCI_INIT+31>: push %r14
0xffffffffc0c912c1 <NVMEV_PCI_INIT+33>: push %r13
0xffffffffc0c912c3 <NVMEV_PCI_INIT+35>: push %r12
0xffffffffc0c912c5 <NVMEV_PCI_INIT+37>: push %rbp
/usr/src/linux-headers-6.9.0-atr-2024-07-05/./include/linux/topology.h: 96
0xffffffffc0c912c6 <NVMEV_PCI_INIT+38>: mov $0x32040,%rbp
/home/atr/src/nvmevirt/pci.c: 616
0xffffffffc0c912cd <NVMEV_PCI_INIT+45>: push %rbx
/home/atr/src/nvmevirt/pci.c: 617
0xffffffffc0c912ce <NVMEV_PCI_INIT+46>: mov 0x10(%rdi),%rax
/home/atr/src/nvmevirt/pci.c: 616
0xffffffffc0c912d2 <NVMEV_PCI_INIT+50>: mov %rdi,%rbx
/home/atr/src/nvmevirt/pci.c: 617
0xffffffffc0c912d5 <NVMEV_PCI_INIT+53>: mov 0x40(%rdi),%rcx
/home/atr/src/nvmevirt/pci.c: 493
0xffffffffc0c912d9 <NVMEV_PCI_INIT+57>: and (%rax),%rdx
0xffffffffc0c912dc <NVMEV_PCI_INIT+60>: movb $0x0,0xe(%rax)
/home/atr/src/nvmevirt/pci.c: 499
0xffffffffc0c912e0 <NVMEV_PCI_INIT+64>: or %rsi,%rdx
/home/atr/src/nvmevirt/pci.c: 501
0xffffffffc0c912e3 <NVMEV_PCI_INIT+67>: mov 0x10(%rax),%esi
/home/atr/src/nvmevirt/pci.c: 495
0xffffffffc0c912e6 <NVMEV_PCI_INIT+70>: movl $0x1080201,0x8(%rax)
/home/atr/src/nvmevirt/pci.c: 502
0xffffffffc0c912ed <NVMEV_PCI_INIT+77>: mov %rdx,(%rax)
/home/atr/src/nvmevirt/pci.c: 501
0xffffffffc0c912f0 <NVMEV_PCI_INIT+80>: mov %rcx,%rdx
/home/atr/src/nvmevirt/pci.c: 504
0xffffffffc0c912f3 <NVMEV_PCI_INIT+83>: shr $0x20,%rcx
/home/atr/src/nvmevirt/pci.c: 501
0xffffffffc0c912f7 <NVMEV_PCI_INIT+87>: shr $0xe,%rdx
0xffffffffc0c912fb <NVMEV_PCI_INIT+91>: and $0x3ff9,%esi
/home/atr/src/nvmevirt/pci.c: 504
0xffffffffc0c91301 <NVMEV_PCI_INIT+97>: mov %ecx,0x14(%rax)
/home/atr/src/nvmevirt/pci.c: 529
0xffffffffc0c91304 <NVMEV_PCI_INIT+100>: movabs $0x2000807f6011,%rcx
/home/atr/src/nvmevirt/pci.c: 501
0xffffffffc0c9130e <NVMEV_PCI_INIT+110>: shl $0xe,%edx
/home/atr/src/nvmevirt/pci.c: 507
0xffffffffc0c91311 <NVMEV_PCI_INIT+113>: movq $0x370d0c51,0x2c(%rax)
/home/atr/src/nvmevirt/pci.c: 501
0xffffffffc0c91319 <NVMEV_PCI_INIT+121>: or $0x4,%edx
/home/atr/src/nvmevirt/pci.c: 511
0xffffffffc0c9131c <NVMEV_PCI_INIT+124>: movb $0x40,0x34(%rax)
/home/atr/src/nvmevirt/pci.c: 501
0xffffffffc0c91320 <NVMEV_PCI_INIT+128>: or %esi,%edx
0xffffffffc0c91322 <NVMEV_PCI_INIT+130>: mov %edx,0x10(%rax)
/home/atr/src/nvmevirt/pci.c: 514
0xffffffffc0c91325 <NVMEV_PCI_INIT+133>: mov $0xf,%edx
0xffffffffc0c9132a <NVMEV_PCI_INIT+138>: mov %dx,0x3c(%rax)
/home/atr/src/nvmevirt/pci.c: 618
0xffffffffc0c9132e <NVMEV_PCI_INIT+142>: mov 0x18(%rdi),%rdx
/home/atr/src/nvmevirt/pci.c: 524
0xffffffffc0c91332 <NVMEV_PCI_INIT+146>: mov (%rdx),%eax
0xffffffffc0c91334 <NVMEV_PCI_INIT+148>: and $0xfff80000,%eax
0xffffffffc0c91339 <NVMEV_PCI_INIT+153>: or $0x35001,%eax
0xffffffffc0c9133e <NVMEV_PCI_INIT+158>: mov %eax,(%rdx)
0xffffffffc0c91340 <NVMEV_PCI_INIT+160>: movzbl 0x4(%rdx),%eax
0xffffffffc0c91344 <NVMEV_PCI_INIT+164>: and $0xfffffff4,%eax
0xffffffffc0c91347 <NVMEV_PCI_INIT+167>: or $0x8,%eax
0xffffffffc0c9134a <NVMEV_PCI_INIT+170>: mov %al,0x4(%rdx)
/home/atr/src/nvmevirt/pci.c: 619
0xffffffffc0c9134d <NVMEV_PCI_INIT+173>: mov 0x20(%rdi),%rdx
/home/atr/src/nvmevirt/pci.c: 539
0xffffffffc0c91351 <NVMEV_PCI_INIT+177>: mov (%rdx),%rax
0xffffffffc0c91354 <NVMEV_PCI_INIT+180>: and $0x78000000,%eax
0xffffffffc0c91359 <NVMEV_PCI_INIT+185>: or %rcx,%rax
/home/atr/src/nvmevirt/pci.c: 544
0xffffffffc0c9135c <NVMEV_PCI_INIT+188>: movabs $0x100085a100020010,%rcx
/home/atr/src/nvmevirt/pci.c: 529
0xffffffffc0c91366 <NVMEV_PCI_INIT+198>: mov %rax,(%rdx)
/home/atr/src/nvmevirt/pci.c: 544
0xffffffffc0c91369 <NVMEV_PCI_INIT+201>: movabs $0xe0037000c1000000,%rax
/home/atr/src/nvmevirt/pci.c: 539
0xffffffffc0c91373 <NVMEV_PCI_INIT+211>: movl $0x8000,0x8(%rdx)
/home/atr/src/nvmevirt/pci.c: 620
0xffffffffc0c9137a <NVMEV_PCI_INIT+218>: mov 0x28(%rdi),%rdx
/home/atr/src/nvmevirt/pci.c: 559
0xffffffffc0c9137e <NVMEV_PCI_INIT+222>: and (%rdx),%rax
0xffffffffc0c91381 <NVMEV_PCI_INIT+225>: or %rcx,%rax
0xffffffffc0c91384 <NVMEV_PCI_INIT+228>: mov %rax,(%rdx)
/home/atr/src/nvmevirt/pci.c: 621
0xffffffffc0c91387 <NVMEV_PCI_INIT+231>: mov 0x30(%rdi),%rax
/home/atr/src/nvmevirt/pci.c: 570
0xffffffffc0c9138b <NVMEV_PCI_INIT+235>: movl $0x15010001,(%rax)
/home/atr/src/nvmevirt/pci.c: 575
0xffffffffc0c91391 <NVMEV_PCI_INIT+241>: movl $0x18010002,0x50(%rax)
/home/atr/src/nvmevirt/pci.c: 580
0xffffffffc0c91398 <NVMEV_PCI_INIT+248>: movl $0x19010004,0x80(%rax)
/home/atr/src/nvmevirt/pci.c: 585
0xffffffffc0c913a2 <NVMEV_PCI_INIT+258>: movl $0x2701000e,0x90(%rax)
/home/atr/src/nvmevirt/pci.c: 590
0xffffffffc0c913ac <NVMEV_PCI_INIT+268>: movl $0x2a010003,0x170(%rax)
/home/atr/src/nvmevirt/pci.c: 595
0xffffffffc0c913b6 <NVMEV_PCI_INIT+278>: movl $0x10019,0x1a0(%rax)
/home/atr/src/nvmevirt/pci.c: 626
0xffffffffc0c913c0 <NVMEV_PCI_INIT+288>: mov -0x628bf(%rip),%rax # 0xffffffffc0c2eb08 <nvmev_vdev>
0xffffffffc0c913c7 <NVMEV_PCI_INIT+295>: movb $0x0,0x130(%rdi)
/usr/src/linux-headers-6.9.0-atr-2024-07-05/./include/linux/topology.h: 96
0xffffffffc0c913ce <NVMEV_PCI_INIT+302>: movslq 0x60(%rax),%r12
0xffffffffc0c913d2 <NVMEV_PCI_INIT+306>: cmp $0x2000,%r12
0xffffffffc0c913d9 <NVMEV_PCI_INIT+313>: jae 0xffffffffc0c91672 <NVMEV_PCI_INIT+978>
0xffffffffc0c913df <NVMEV_PCI_INIT+319>: mov -0x613c72e0(,%r12,8),%rax
/home/atr/src/nvmevirt/pci.c: 395
0xffffffffc0c913e7 <NVMEV_PCI_INIT+327>: mov $0xffffffffc0c2d880,%rdx
0xffffffffc0c913ee <NVMEV_PCI_INIT+334>: mov $0x10,%edi
0xffffffffc0c913f3 <NVMEV_PCI_INIT+339>: mov $0xffffffffc0c2d8c0,%rsi
/usr/src/linux-headers-6.9.0-atr-2024-07-05/./include/linux/topology.h: 96
0xffffffffc0c913fa <NVMEV_PCI_INIT+346>: mov (%rax,%rbp,1),%eax
crash>
Not entirely clear why topology.h: 96 is the offending line. The crash utility shows the key reason being:
PANIC: "Oops: 0000 [#1] PREEMPT SMP NOPTI" (check log for details)
Step-3: I am trying to print local variable values but it does not work, so leaving it for now. issue-1
OK, here are some more examples and modes of navigating to the address: `gdb list FUNC+OFF` shows the faulting location directly. It is challenging with function inlining.
crash> gdb list *NVMEV_PCI_INIT+346
0xffffffffc0c913fa is in NVMEV_PCI_INIT (./include/linux/topology.h:96).
91 #endif
92
93 #ifndef cpu_to_node
94 static inline int cpu_to_node(int cpu)
95 {
96 return per_cpu(numa_node, cpu);
97 }
98 #endif
99
100 #ifndef set_numa_node
crash>
`help bt` has lots of help:
# Display the stack trace of the active task(s) when the kernel panicked:
crash> bt -a
# Display the stack trace of the active task(s) when the kernel panicked, and filter out the stack of the idle tasks:
crash> bt -a -n idle
# Display the stack trace of the active task on CPU 0 and 1:
crash> bt -c 0,1
# Display the stack traces of task f2814000 and PID 1592:
crash> bt f2814000 1592
# Dump the text symbols found in the current context's stack:
crash> bt -t
# Search the current stack for possible exception frames:
crash> bt -e
# Display stack frame contents using -f, -F, or -FF:
crash> bt -f | -F | -FF
# Check the kernel stack of all tasks for evidence of a stack overflow:
crash> bt -v
See the dmesg log for this crash, which also contains useful details:
crash> log
[...]
[ 20.989762] NVMeVirt: FTL physical space: 4293918720, logical space: 4013008149 (physical/logical * 100 = 107)
[ 20.989763] NVMeVirt: ns 0/1: size 3827 MiB
[ 20.989764] ------------[ cut here ]------------
[ 20.989767] UBSAN: array-index-out-of-bounds in ./include/linux/topology.h:96:9
[ 20.989782] index -1 is out of range for type 'long unsigned int [8192]'
[ 20.989790] CPU: 2 PID: 1164 Comm: insmod Kdump: loaded Tainted: G OE 6.9.0-atr-2024-07-05 #13
[ 20.989794] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 20.989795] Call Trace:
[ 20.989800] <TASK>
[ 20.989801] dump_stack_lvl+0x5d/0x80
[ 20.989819] ubsan_epilogue+0x5/0x30
[ 20.989832] __ubsan_handle_out_of_bounds.cold+0x46/0x4b
[ 20.989834] NVMEV_PCI_INIT+0x3e1/0x3f0 [nvmev]
[ 20.989842] NVMeV_init+0x4c0/0x547 [nvmev]
[ 20.989846] ? __pfx_NVMeV_init+0x10/0x10 [nvmev]
[...]
- Follow the installation setup https://ubuntu.com/server/docs/kernel-crash-dump
- Man page: https://man7.org/linux/man-pages/man8/crash.8.html
- Github: https://github.com/crash-utility/ and https://github.com/crash-utility/crash
post installation configuration command options:
sudo dpkg-reconfigure kdump-tools
# check status
kdump-config show (or status)
Where is the dump file after a crash? Once completed, the system will reboot to its normal operational mode. You will then find the kernel crash dump file, and related subdirectories, in the /var/crash directory, e.g. `ls /var/crash` produces the following:
atr@u24clean:~$ ll /var/crash/
total 48K
drwxrwxrwt 4 root root 4.0K Jul 16 11:51 ./
drwxr-xr-x 13 root root 4.0K Jul 4 10:51 ../
drwxr-xr-x 2 root root 4.0K Jul 16 11:41 202407161141/
drwxr-xr-x 2 root root 4.0K Jul 16 11:51 202407161151/
-rw-r--r-- 1 root root 0 Jul 16 11:51 kdump_lock
-rw-r--r-- 1 root root 283 Jul 17 13:44 kexec_cmd
-rw-r--r-- 1 root root 25K Jul 16 11:41 linux-image-6.9.0-atr-2024-07-05-202407161141.crash
atr@u24clean:~$
How to use the `crash` utility, and other tabs:
- old tutorial: https://www.dedoimedo.com/computers/crash-analyze.html
- kdump: https://www.kernel.org/doc/Documentation/kdump/kdump.txt
- Nick's page: https://github.com/nicktehrany/notes/wiki/Kernel-Hacking
- https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/7/html/kernel_administration_guide/kernel_crash_dump_guide#sect-crash-running-the-utility
- Setup kdump on Ubuntu 22.04 (Excellent, July 2023): https://hackmd.io/@0xff07/S1ASmzgun
- A CRASH COURSE ON DEBUGGING KERNEL CRASHES USING THE CRASH UTILITY https://walac.github.io/kernel-crashes/
- https://walac.github.io/kernel-tracing/
- Debugging the Linux kernel using the GDB, https://wiki.st.com/stm32mpu/wiki/Debugging_the_Linux_kernel_using_the_GDB
- https://github.com/crash-utility/crash/issues/47
crash> mod -s nvmev /home/atr/src/nvmevirt/nvmev.ko
https://stackoverflow.com/questions/32069887/not-able-to-load-my-module-symbols-in-crash-utility
The command with which I am trying to locate the out-of-tree build of a kernel module:
sudo crash --src /usr/src/linux-6.9.0-atr-2024-07-05/ --src /home/atr/src/nvmevirt/ /lib/debug/boot/vmlinux-6.9.0-atr-2024-07-05 /var/crash/202407161151/dump.202407161151
So this one finds the kernel symbols, but not the out-of-tree build sources.
If I pass an invalid path name then crash complains that the path is invalid, so at least it is registering it:
crash: invalid --src argument: /home/atr/src/nvmevirtxxx/
Reading the man page, crash can be extended with extension modules: https://crash-utility.github.io/extensions.html (`-x dir`)
OK, passing --mod /home/atr/src/nvmevirt/ --mod /lib/modules/$(uname -r)/ still does not help, so for now we need to do it manually.
crash> mod -s nvmev /home/atr/src/nvmevirt/nvmev.ko
MODULE NAME TEXT_BASE SIZE OBJECT FILE
ffffffffc0c2e600 nvmev ffffffffc0c90000 90112 /home/atr/src/nvmevirt/nvmev.ko
crash> mod -S
MODULE NAME TEXT_BASE SIZE OBJECT FILE
ffffffffc05cfec0 floppy ffffffffc05bc000 159744 /lib/modules/6.9.0-atr-2024-07-05/kernel/drivers/block/floppy.ko
[...]
ffffffffc0c2e600 nvmev ffffffffc0c90000 90112 /home/atr/src/nvmevirt/nvmev.ko
ffffffffc0c9b1c0 intel_uncore_frequency_common ffffffffc0c99000 16384 /lib/modules/6.9.0-atr-2024-07-05/kernel/drivers/platform/x86/intel/uncore-frequency/intel-uncore-frequency-common.ko
crash>
Still the problem is how to attach it to the source code. OK, found the fix (unsure why): instead of loading the out-of-tree kernel module specifically with `mod -s nvmev /home/atr/src/nvmevirt/nvmev.ko` (that just loads the symbols), load the source directory with `mod -S /home/atr/src/nvmevirt/`, and then call `mod -S` (without a path) to load the rest of the kernel module symbols from the standard path location. So, here is a successful sequencing:
sudo crash --src /usr/src/linux-6.9.0-atr-2024-07-05/ \
--src /home/atr/src/nvmevirt/ \
--mod /home/atr/src/nvmevirt/ \
--mod /lib/modules/`uname -r`/ \
/lib/debug/boot/vmlinux-6.9.0-atr-2024-07-05 \
/var/crash/202407161151/dump.202407161151
crash 8.0.4
Copyright (C) 2002-2022 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011, 2020-2022 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
Copyright (C) 2015, 2021 VMware, Inc.
[...]
For help, type "help".
Type "apropos word" to search for commands related to "word"...
KERNEL: /lib/debug/boot/vmlinux-6.9.0-atr-2024-07-05 [TAINTED]
DUMPFILE: /var/crash/202407161151/dump.202407161151 [PARTIAL DUMP]
CPUS: 8
DATE: Tue Jul 16 11:50:57 UTC 2024
UPTIME: 00:00:20
LOAD AVERAGE: 0.12, 0.03, 0.01
TASKS: 205
NODENAME: u24clean
RELEASE: 6.9.0-atr-2024-07-05
VERSION: #13 SMP PREEMPT_DYNAMIC Tue Jul 9 11:37:22 CEST 2024
MACHINE: x86_64 (2995 Mhz)
MEMORY: 4 GB
PANIC: "Oops: 0000 [#1] PREEMPT SMP NOPTI" (check log for details)
PID: 1164
COMMAND: "insmod"
TASK: ffffa01f82b02f40 [THREAD_INFO: ffffa01f82b02f40]
CPU: 2
STATE: TASK_RUNNING (PANIC)
crash> mod -S /home/atr/src/nvmevirt/
mod: cannot find or load object file for floppy module
[...]
mod: cannot find or load object file for intel_pmc_core module
?? Section *UND* not found for symbol __this_module
MODULE NAME TEXT_BASE SIZE OBJECT FILE
ffffffffc0c2e600 nvmev ffffffffc0c90000 90112 /home/atr/src/nvmevirt/nvmev.o
mod: cannot find or load object file for intel_uncore_frequency_common module
crash> mod -S
MODULE NAME TEXT_BASE SIZE OBJECT FILE
ffffffffc05cfec0 floppy ffffffffc05bc000 159744 /lib/modules/6.9.0-atr-2024-07-05/kernel/drivers/block/floppy.ko
[...]
ffffffffc0c74180 intel_pmc_core ffffffffc0c6d000 126976 /lib/modules/6.9.0-atr-2024-07-05/kernel/drivers/platform/x86/intel/pmc/intel_pmc_core.ko
ffffffffc0c2e600 nvmev ffffffffc0c90000 90112 /home/atr/src/nvmevirt/nvmev.o
ffffffffc0c9b1c0 intel_uncore_frequency_common ffffffffc0c99000 16384 /lib/modules/6.9.0-atr-2024-07-05/kernel/drivers/platform/x86/intel/uncore-frequency/intel-uncore-frequency-common.ko
crash> bt -s
PID: 1164 TASK: ffffa01f82b02f40 CPU: 2 COMMAND: "insmod"
#0 [ffffbe4e40e7f6f0] machine_kexec+464 at ffffffff9d095df0
#1 [ffffbe4e40e7f748] __crash_kexec+127 at ffffffff9d2335df
#2 [ffffbe4e40e7f808] crash_kexec+36 at ffffffff9d233b84
#3 [ffffbe4e40e7f810] oops_end+164 at ffffffff9d043d44
#4 [ffffbe4e40e7f830] page_fault_oops.cold+624 at ffffffff9e0d1913
#5 [ffffbe4e40e7f8b8] exc_page_fault+126 at ffffffff9e18d20e
#6 [ffffbe4e40e7f8e0] asm_exc_page_fault+38 at ffffffff9e2012a6
[exception RIP: NVMEV_PCI_INIT+346]
RIP: ffffffffc0c913fa RSP: ffffbe4e40e7f990 RFLAGS: 00010282
RAX: 0000000000000000 RBX: ffffa01f84064000 RCX: 00000000ffffffff
RDX: ffffffffc0c2d880 RSI: ffffffffc0c2d8c0 RDI: 0000000000000010
RBP: 0000000000032040 R8: 0000000000000000 R9: ffffbe4e40e7f8e8
R10: ffffffff9eaffe10 R11: 0000000000000000 R12: ffffffffffffffff
R13: ffffa01f84064000 R14: ffffa01f90ae5740 R15: ffffa01f85e621c0
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#7 [ffffbe4e40e7f9c0] init_module+1216 at ffffffffc0c97310 [nvmev]
#8 [ffffbe4e40e7f9f0] do_one_initcall+88 at ffffffff9d002a88
#9 [ffffbe4e40e7fa60] do_init_module+144 at ffffffff9d1fb7a0
#10 [ffffbe4e40e7fa80] init_module_from_file+134 at ffffffff9d1fe626
#11 [ffffbe4e40e7fb30] idempotent_init_module+289 at ffffffff9d1fe791
#12 [ffffbe4e40e7fbb8] __x64_sys_finit_module+94 at ffffffff9d1fea1e
#13 [ffffbe4e40e7fbe8] do_syscall_64+130 at ffffffff9e1864b2
#14 [ffffbe4e40e7ff50] entry_SYSCALL_64_after_hwframe+118 at ffffffff9e20012f
RIP: 00007f987af2725d RSP: 00007ffda87abb18 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: 0000561c82191760 RCX: 00007f987af2725d
RDX: 0000000000000000 RSI: 0000561c821912a0 RDI: 0000000000000003
RBP: 00007ffda87abbd0 R8: 0000000000000040 R9: 0000000000000000
R10: 00007f987b003b20 R11: 0000000000000246 R12: 0000561c821912a0
R13: 0000000000000000 R14: 0000561c82191730 R15: 0000561c821912a0
ORIG_RAX: 0000000000000139 CS: 0033 SS: 002b
crash> dis -l init_module
/home/atr/src/nvmevirt/main.c: 604
0xffffffffc0c96e50 <NVMeV_init>: endbr64
[...]
0xffffffffc0c96f21 <init_module+209>: je 0xffffffffc0c96f3e <init_module+238>
/home/atr/src/nvmevirt/main.c: 218
crash> dis -s init_module
FILE: /home/atr/src/nvmevirt/main.c
LINE: 604
599 NVMEV_INFO("Version %x.%x for >> %s <<\n",
600 (NVMEV_VERSION & 0xff00) >> 8, (NVMEV_VERSION & 0x00ff), type);
601 }
[...]
644 VDEV_FINALIZE(nvmev_vdev);
645 return -EIO;
646 }
crash>
# will list boots
journalctl --list-boots
# will show the last point of the logs
journalctl -e
# follow the log at the bottom
journalctl -e -f
# select a priority-level between 0/"emerg" and 7/"debug"
journalctl -p ###