# Useful nvidia smi queries
## Memory usage
To get the total memory used by all compute jobs on the GPU:
```bash
nvidia-smi --query-compute-apps=used_memory --format=csv,noheader | cut -d " " -f 1
```
(Then paste the output into a Python variable `x`, sum it as `sum(map(int, x.split()))`, and convert from MiB to GiB.)
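For example, with some hypothetical values pasted into an interactive Python session:

```python
# hypothetical per-process figures (MiB) pasted from the query above
x = """
1234
2048
512
"""
total_mib = sum(map(int, x.split()))  # 3794
print(total_mib / 2**10, "GiB")       # ~3.7 GiB
```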
If you want a rough idea you can use `numfmt` (something like `--from=iec --to=si --suffix=B`).
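For instance, a summed total in MiB could be piped through it roughly like this (a sketch, assuming GNU coreutils `numfmt`, with the MiB figure fed in using an iec `M` suffix):

```bash
# e.g. 5120 MiB total comes out as roughly "5.4GB"
echo "5120M" | numfmt --from=iec --to=si --suffix=B
```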
To do the whole sum non-interactively:

```bash
GPU_CORE_SIZES=$(nvidia-smi --query-compute-apps=used_memory --format=csv,noheader | cut -d " " -f 1)
# the quoting splices the shell variable into a Python triple-quoted string, then sums the MiB values
python -c 'print(sum(map(int, """'"$GPU_CORE_SIZES"'""".split())))'
```
or, to convert to GiB (the values are reported in MiB):
```bash
GPU_CORE_SIZES=$(nvidia-smi --query-compute-apps=used_memory --format=csv,noheader | cut -d " " -f 1)
python -c 'print(sum(map(int, """'"$GPU_CORE_SIZES"'""".split())) / (2**10), end="GiB\n")'
```
As a bashrc function:
```bash
function gpu_memory_usage {
    GPU_CORE_SIZES=$(nvidia-smi --query-compute-apps=used_memory --format=csv,noheader | cut -d " " -f 1)
    python -c 'print(sum(map(int, """'"$GPU_CORE_SIZES"'""".split())) / (2**10), end="GiB\n")'
}
```
which you can then call in a loop while your jobs run to get a sense of memory usage over time:

```bash
for x in {1..10000}; do clear; gpu_memory_usage; sleep 1; done
```
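If you'd rather avoid the Python round-trip altogether, the same total can be computed in a single pipeline (a sketch, assuming GNU awk and nvidia-smi's `nounits` format option):

```bash
nvidia-smi --query-compute-apps=used_memory --format=csv,noheader,nounits \
  | awk '{ s += $1 } END { printf "%.2fGiB\n", s / 1024 }'
```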
- See also nvtop (Installing nvtop)