Introduction to Profiling using nvvp

Introduction to CUDA Profiling

Relevant Links:

  1. https://devblogs.nvidia.com/parallelforall/cuda-pro-tip-nvprof-your-handy-universal-gpu-profiler/
  2. https://github.com/UoB-HPC/UoB-HPC.github.io/blob/master/_posts/2015-05-27-nvvp-import-opencl.md
  3. https://devblogs.nvidia.com/parallelforall/cuda-pro-tip-improve-nvvp-loading-large-profiles/

What is profiling?

When you are in the business of making efficient, fast code, you need tools that help you iron out inefficiencies and bottlenecks that might be slowing down your programs.

Code profiling will give you answers to questions like:

  • How long does the execution of a portion of my code take?
  • How much memory is this process occupying at runtime?
  • Is there some unseen overhead causing the code to be slower than expected?
  • How heavily are the CPU or GPU cores being used?

Tools that allow you to collate and interpret such details are called code profilers.

CUDA profiling lets you answer such questions and is therefore an indispensable tool. Nvidia provides the profiler as part of the CUDA toolkit. Up to CUDA 10.x, the command-line tool is called nvprof, and the application for viewing nvprof's output is the Nvidia Visual Profiler (NVVP). Starting with CUDA 11, both have been superseded by the Nsight tools (Nsight Systems and Nsight Compute).
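
If your CUDA installation ships only the Nsight tools, a rough command-line equivalent of the nvprof workflow below (assuming Nsight Systems and its nsys CLI are installed; the script name is a placeholder) is:

$ nsys profile -o my_report python my_pycuda_code.py

The resulting report is then opened in the Nsight Systems GUI rather than in NVVP. The rest of this page assumes the nvprof/NVVP toolchain.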

Profiling with nvprof

Run a command along the lines of the example below:

(cudaEnv)$ nvprof python my_pycuda_code.py > ./profiling_output.txt

The program's output will be piped into the text file. Note that nvprof itself prints its summary to the standard error stream, so you may want to capture that as well (see below).
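
If the text file turns out to be missing the profiler tables, redirect stderr too, or use nvprof's --log-file option. A minimal sketch, with the same placeholder file names as above:

(cudaEnv)$ nvprof python my_pycuda_code.py > ./profiling_output.txt 2>&1
(cudaEnv)$ nvprof --log-file profiling_output.txt python my_pycuda_code.py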

Make sure to add the following shebang line at the top of your Python file.

#!/usr/bin/env python

"""
.
.
.
Python Code
.
.
.
"""

Also, make sure your code is executable by running:

$ chmod u+x my_pycuda_code.py
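
The shebang and the executable bit matter because the profiling command further down invokes the script directly (./my_pycuda_code.py) instead of going through python. You can verify that the script launches on its own before profiling it (placeholder name as above):

(cudaEnv)$ ./my_pycuda_code.py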

You can specify an output file for nvprof:

(cudaEnv)$ nvprof -o my_pycuda_code-analysis.nvprof ./my_pycuda_code.py
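
If you also want NVVP's guided kernel analysis, nvprof can additionally collect the hardware metrics that analysis needs via --analysis-metrics. Expect this run to be noticeably slower, so treat it as an optional second pass; the output file name below is just a placeholder:

(cudaEnv)$ nvprof --analysis-metrics -o my_pycuda_code-metrics.nvprof ./my_pycuda_code.py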

Viewing the profiling result with NVVP

Ordinarily, one can view the profiling output in the Nvidia Visual Profiler by running:

$ nvvp my_pycuda_code_profile.nvprof

However, since we are using port forwarding to view the 'display' environment remotely on the VM (See GUI Setup), this command doesn't work out of the box.

[Screenshots: NVVP fails to launch and shows a JVM-related error dialog.]

You need to point nvvp to the OpenJDK installation on the VM. This is done by modifying the command to:

$ nvvp -vm /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java my_profile.nvprof

As you can see, this is wordy and tedious to type every time, so we will wrap it in a simple bash function.

viewprofile() {
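    # Launch NVVP with the Java 8 OpenJDK JVM and open the profile file passed as the first argument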
    nvvp -vm /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java "$1"
}

Paste this function at the end of your ~/.bashrc by opening it in a text editor in the terminal:

$ nano ~/.bashrc

And then run

$ source ~/.bashrc

Now, you can use this new viewprofile command to open .nvprof logs:

$ viewprofile my_profile.nvprof

Example

Let's look at the PyCUDA code from assignment-1 and try to profile it.

Connect to your VM through VNC Viewer as explained in Step-4 of the GUI Installation Tutorial.

You can open a terminal window by right-clicking the desktop background.

First, create the profiling output using nvprof:

$ nvprof -o hw1_prof_output.nvprof python hw1.py

The program will run and nvprof will save the profile to the current directory. You can check that it is there using ls.
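
For example, listing the file produced by the -o flag above:

$ ls -lh hw1_prof_output.nvprof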

Next, it's time to view the profiling output using NVVP:

$ viewprofile hw1_prof_output.nvprof

Additional References:

  • For nvprof

  • Nvidia Visual Profiler NVVP (from 26:15)