# Useful nvidia smi queries
## Memory usage
To get the total memory used by all compute jobs on the GPU:
```bash
nvidia-smi --query-compute-apps=used_memory --format=csv,noheader | cut -d " " -f 1
```
(Then paste the output into a Python variable `x`, sum it as `sum(map(int, x.split()))`, and convert from MiB to GiB.)
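For example, with some hypothetical values pasted into an interactive Python session:

```python
# hypothetical per-process figures (MiB) pasted from the query above
x = """
1234
2048
512
"""
total_mib = sum(map(int, x.split()))  # 3794
print(total_mib / 2**10, "GiB")       # ~3.7 GiB
```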
If you want a rough idea you can use `numfmt` (something like `--from=iec --to=si --suffix=B`).
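For instance, a summed total in MiB could be piped through it roughly like this (a sketch, assuming GNU coreutils `numfmt`, with the MiB figure fed in using an iec `M` suffix):

```bash
# e.g. 5120 MiB total comes out as roughly "5.4GB"
echo "5120M" | numfmt --from=iec --to=si --suffix=B
```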
To do the whole sum non-interactively:

```bash
GPU_CORE_SIZES=$(nvidia-smi --query-compute-apps=used_memory --format=csv,noheader | cut -d " " -f 1)
# the quoting splices the shell variable into a Python triple-quoted string, then sums the MiB values
python -c 'print(sum(map(int, """'"$GPU_CORE_SIZES"'""".split())))'
```
or, to convert to GiB (the values are reported in MiB):
```bash
GPU_CORE_SIZES=$(nvidia-smi --query-compute-apps=used_memory --format=csv,noheader | cut -d " " -f 1)
python -c 'print(sum(map(int, """'"$GPU_CORE_SIZES"'""".split())) / (2**10), end="GiB\n")'
```
As a bashrc function:
```bash
function gpu_memory_usage {
    GPU_CORE_SIZES=$(nvidia-smi --query-compute-apps=used_memory --format=csv,noheader | cut -d " " -f 1)
    python -c 'print(sum(map(int, """'"$GPU_CORE_SIZES"'""".split())) / (2**10), end="GiB\n")'
}
```
which you can then call in a loop while your jobs run to get a sense of memory usage over time:

```bash
for x in {1..10000}; do clear; gpu_memory_usage; sleep 1; done
```
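If you'd rather avoid the Python round-trip altogether, the same total can be computed in a single pipeline (a sketch, assuming GNU awk and nvidia-smi's `nounits` format option):

```bash
nvidia-smi --query-compute-apps=used_memory --format=csv,noheader,nounits \
  | awk '{ s += $1 } END { printf "%.2fGiB\n", s / 1024 }'
```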
- See also nvtop (Installing nvtop)