Command examples

GPU utilization

https://askubuntu.com/questions/387594/how-to-measure-gpu-usage

pip install gpustat
gpustat -h                  # help
gpustat --id 0,1 -i 2 -cp   # watch GPUs 0 and 1, refresh every 2 s, show command (-c) and PID (-p)


Conda

conda create --name ssd
conda activate ssd 

fio commands

rwmixread=int Percentage of a mixed workload that should be reads. Default: 50.

rwmixwrite=int Percentage of a mixed workload that should be writes. If both rwmixread and rwmixwrite is given and the values do not add up to 100%, the latter of the two will be used to override the first. This may interfere with a given rate setting, if fio is asked to limit reads or writes to a certain rate. If that is the case, then the distribution may be skewed. Default: 50.
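
As an example, a minimal sketch of a 70/30 mixed random workload (the device /dev/nvme4n1 is an assumption; --rwmixread=70 implies a 30% write share):

fio --name="mixed" --bs=4K --iodepth=16 --numjobs=1 --ioengine=io_uring --direct=1 --time_based=1 --ramp_time=5s --runtime=30s --rw=randrw --rwmixread=70 --size=100% --norandommap=1 --group_reporting=1 --allow_file_create=0 --filename=/dev/nvme4n1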

All examples use the io_uring engine.

BW examples

  • Time-based, on a block device

fio --name="bandwidth" --bs=1M --iodepth=16 --numjobs=4 --cpus_allowed=0-31 --time_based=1 --ramp_time=5s --runtime=30s --ioengine=io_uring --registerfiles=1 --fixedbufs=1 --ioscheduler=none --size=100% --norandommap=1 --group_reporting=1 --direct=1 --rw=write --allow_file_create=0 --filename=/dev/nvme4n1

  • Size-based (200%): 4 threads, writing the full device twice.

fio --name="bandwidth" --bs=1M --iodepth=16 --numjobs=4 --cpus_allowed=0-31 --ioengine=io_uring --registerfiles=1 --fixedbufs=1 --ioscheduler=none --size=200% --norandommap=1 --group_reporting=1 --direct=1 --rw=write --allow_file_create=0 --filename=/dev/nvme4n1

  • Sequential writes (128KB blocks) to parallel files with all CPUs ($(nproc), time based). Each thread gets its own private file in the given --directory. See also fallocate=str.
sudo fio --name="bandwidth" --bs=128K --iodepth=16 --numjobs=$(nproc) --cpus_allowed=0-$(($(nproc)-1)) \
--ioengine=io_uring --registerfiles=1 --fixedbufs=1 --ioscheduler=none --size=1g --norandommap=1 \
--group_reporting=1 --direct=1 --rw=write --allow_file_create=1 --directory=/home/atr/rocksdb-exp/ram/ \
--nrfiles=$(nproc) --time_based=1 --ramp_time=5s --runtime=30s --fallocate=posix 

Per-interval bandwidth logging (here averaged over 2-second windows; bwavgtime and log_avg_msec are in milliseconds):

fio --name="bandwidth" --bs=1M --iodepth=16 --numjobs=4 --cpus_allowed=0-31 --ioengine=io_uring --registerfiles=1 --fixedbufs=1 --ioscheduler=none --size=10g --norandommap=1 --group_reporting=1 --direct=1 --rw=read --allow_file_create=0 --filename=/dev/nvme4n1 --bwavgtime=2000 --log_avg_msec=2000 --bandwidth-log

IOPS examples

single thread (randread):

sudo fio --name="iops" --bs=4K --iodepth=512 --numjobs=1 --cpus_allowed=0 --ioengine=io_uring --registerfiles=1 --fixedbufs=1 --ioscheduler=none --size=100% --norandommap=1 --group_reporting=1 --direct=1 --rw=randread --allow_file_create=0 --time_based=1 --ramp_time=5s --runtime=30s --filename=/dev/nvme4n1

Multiple devices can be given as a colon-separated list: --filename=/dev/nvme4n1:/dev/nvme4n2. For example (same job as above, assuming both namespaces exist):
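
sudo fio --name="iops" --bs=4K --iodepth=512 --numjobs=1 --cpus_allowed=0 --ioengine=io_uring --registerfiles=1 --fixedbufs=1 --ioscheduler=none --size=100% --norandommap=1 --group_reporting=1 --direct=1 --rw=randread --allow_file_create=0 --time_based=1 --ramp_time=5s --runtime=30s --filename=/dev/nvme4n1:/dev/nvme4n2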

Latency example

fio --name="latency" --bs=4K --iodepth=1 --numjobs=1 --cpus_allowed=0 --time_based=1 --ramp_time=5s --runtime=30s --ioengine=io_uring --registerfiles=1 --fixedbufs=1 --ioscheduler=none --size=100% --norandommap=1 --group_reporting=1 --direct=1 --rw=randread --allow_file_create=0 --filename=/dev/nvme10n1 

RocksDB

db_bench --num=1000000 --compression_type=none --value_size=400 --key_size=20 --use_direct_io_for_flush_and_compaction --use_existing_db=true --use_direct_reads --max_bytes_for_level_multiplier=10 --max_background_jobs=48 --threads=48 --enable_pipelined_write=true --allow_concurrent_memtable_write=true --wal_size_limit_MB=0 --write_buffer_size=67108864 --max_write_buffer_number=48 --histogram --report_bg_io_stats=true --report_file=./readwhilewriting-per-second-file.csv --report_interval_seconds=1 --benchmarks=readwhilewriting --seed=42 --db=/home/atr/rocksdb-exp//ssd -wal_dir=/home/atr/rocksdb-exp//ram --read_cache_size=0 --blob_cache_size=0 --cache_size=0 --compressed_cache_size=0 --prepopulate_block_cache=0 --num_file_reads_for_auto_readahead=0 --statistics --file_opening_threads=48

Linux commands

Dynamic CPU affinity with taskset

atr: verified, works

  • taskset -p <pid> - get the CPU affinity mask of a running process (PID)
  • taskset -cp <comma-separated CPU list> <pid> - set the CPU affinity of a running process; a concrete sketch follows
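
A concrete sketch (PID 1234 is hypothetical):

taskset -p 1234          # print the current affinity mask of PID 1234
taskset -cp 0,2,4 1234   # pin PID 1234 to CPUs 0, 2, and 4
taskset -p 1234          # verify: the mask should now read 15 (hex, i.e. CPUs 0, 2, 4)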

ref

Linux block device information

There are two folders:

  • /sys/block/nvme0n1/queue/ -- global, queue-related parameters.
  • /sys/block/nvme0n1/mq/ -- multi-queue (blk-mq) information: which CPUs each queue is mapped to, tag information. A few probes are sketched below.
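
For example (the nvme0n1 paths are from above; queue numbers vary per device):

cat /sys/block/nvme0n1/queue/nr_requests   # software queue depth
cat /sys/block/nvme0n1/queue/scheduler     # active I/O scheduler
ls /sys/block/nvme0n1/mq/                  # one directory per hardware queue
cat /sys/block/nvme0n1/mq/0/cpu_list       # CPUs mapped to hardware queue 0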

journalctl

ref

  • journalctl -b - since last boot
  • journalctl --list-boots - show all boots, and then use their offsets with journalctl -b -1
  • time based
    • journalctl --since "1 hour ago"
    • journalctl --since "2 days ago"
    • journalctl --since "2015-06-26 23:15:00" --until "2015-06-26 23:20:00"
  • journalctl -u nginx.service -u mysql.service - specific service
  • journalctl -f - follow
  • journalctl -n 50 --since "1 hour ago" - the 50 most recent entries from the last hour

Make CPU online/offline

# as root: bring CPUs 1-11 online
for x in /sys/devices/system/cpu/cpu{1..11}/online; do echo 1 >"$x"; done
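
The offline direction is symmetric: write 0 instead of 1. Note that cpu0 usually has no online file and cannot be taken offline, and that sudo does not apply to the redirection, so run the loop in a root shell:

for x in /sys/devices/system/cpu/cpu{1..11}/online; do echo 0 >"$x"; done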

Deleting files or directories with special characters

rm -r \~

The backslash prevents tilde expansion, so this removes the directory literally named ~ instead of your home directory.

ref

Drop Linux buffer cache

# 3 = free page cache plus dentries and inodes (1 = page cache only, 2 = dentries and inodes only)
sudo sh -c "/usr/bin/echo 3 > /proc/sys/vm/drop_caches"

Current value of kernel module parameters

Check in /sys/module/[name]/parameters/

$ cat /sys/module/nvme/parameters/sgl_threshold
32768
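
To list all parameters of a module, with their descriptions (a sketch, assuming the nvme module):

ls /sys/module/nvme/parameters/
modinfo -p nvme   # parameter names and descriptions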

Find a file or directory by name

find . -type d -name "*nvme*" 
       -type c
              File is of type c:
              b      block (buffered) special
              c      character (unbuffered) special
              d      directory
              p      named pipe (FIFO)
              f      regular file
              l      symbolic link; this is never true if the -L option or the -follow option is in effect, unless the symbolic link is broken.  If you want to search for symbolic links when -L is in effect, use -xtype.
              s      socket
              D      door (Solaris)

Mount sshfs

sshfs atr@localhost:/home/atr/ ./vm-qemu7777/ -p 7777

Mount tmpfs

sudo mount -t tmpfs -o size=32G,noswap,uid=$USER,mpol=prefer:0,huge=never $USER ~/mnt/tmpfs/

https://man7.org/linux/man-pages/man5/tmpfs.5.html
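
To check that the mount took the intended options:

findmnt ~/mnt/tmpfs   # shows target, source, fstype, and active mount options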


Get the NUMA information

lstopo  

or for NVMe

cat /sys/block/nvme1n1/device/numa_node 
0

or generic

cat /sys/class/<class>/<device>/device/numa_node

PCIe topology

lspci -t -v 

Show Linux hardware details

https://www.baeldung.com/linux/list-network-cards

sudo lshw 
sudo lshw -C network
lshw -class disk -class storage

ebpf/bcc one-liners (references)

I/O size histogram

~/src/bcc/tools/bitehist.py

On f20 it fails because I disabled PYTHONPATH for conda:

    from bcc import BPF
ModuleNotFoundError: No module named 'bcc'

Solution: https://github.com/iovisor/bcc/blob/master/FAQ.txt

  • this picks up the old path from the installed distro packages, which I do not like, but here we go:
export PYTHONPATH=$(dirname `find /usr/lib -name bcc`):$PYTHONPATH

Then it works.

CPU stack profiler

https://github.com/iovisor/bcc/blob/master/tools/profile_example.txt

Show counts for all stacks active on the CPU (filters available)

$ sudo profile 

Count stacks matching a particular pattern or filter

https://github.com/iovisor/bcc/blob/master/tools/stackcount_example.txt
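
A minimal sketch, using the same bcc checkout as above (submit_bio and the tcp_send* wildcard are just example targets):

sudo ~/src/bcc/tools/stackcount.py submit_bio    # count kernel stacks that reach submit_bio
sudo ~/src/bcc/tools/stackcount.py 'tcp_send*'   # wildcard patterns also work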

Monitoring

dstat

-c CPU, -d disk, -i interrupts, -m memory, -n network, -p process, -r I/O stats, -s swap, -y system stats. Further: --aio, --fs, --ipc, --lock.

dstat -pcmrd


vmstat

$ vmstat 1 1000 
procs -----------memory---------- ---swap-- -----io---- -system-- -------cpu-------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st gu
 1  0 7370204 1611780   4576 13788080   62  128   215   443 2135   24 11  5 84  0  0  0
 0  0 7370204 1647792   4576 13788080   32    0    32   452 7935 14916  5  4 91  0  0  0

TODO