Introduction to Profiling using nvvp
Introduction to CUDA Profiling
Relevant Links:
- https://devblogs.nvidia.com/parallelforall/cuda-pro-tip-nvprof-your-handy-universal-gpu-profiler/
- https://github.com/UoB-HPC/UoB-HPC.github.io/blob/master/_posts/2015-05-27-nvvp-import-opencl.md
- https://devblogs.nvidia.com/parallelforall/cuda-pro-tip-improve-nvvp-loading-large-profiles/
What is profiling?
When you are in the business of making efficient, fast code, you need tools that help you iron out inefficiencies and bottlenecks that might be slowing down your programs.
Code profiling will give you answers to questions like:
- How long does the execution of a portion of my code take?
- How much memory is this process occupying at runtime?
- Is there some unseen overhead causing the code to be slower than expected?
- How are the CPU or GPU cores being utilized?
Tools that allow you to collate and interpret such details are called code profilers.
CUDA profiling allows you to find answers to such questions and is therefore an indispensable tool. Nvidia provides a profiler as part of the CUDA Toolkit. Up to CUDA v10, the command-line tool is named nvprof, and the application that lets you view its output is the Nvidia Visual Profiler (NVVP). Starting with CUDA v11, both are superseded by the Nsight tools (Nsight Systems and Nsight Compute).
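For context, on newer toolkits the rough Nsight equivalents of the workflow below look something like this (a sketch only; flags and report formats vary across versions):
$ nsys profile -o my_report python my_pycuda_code.py    # timeline profiling with Nsight Systems
$ ncu python my_pycuda_code.py                          # per-kernel metrics with Nsight Compute
The rest of this page uses nvprof and NVVP.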
Profiling with nvprof
Run a command along the lines of the example below:
(cudaEnv)$ nvprof python my_pycuda_code.py 2> ./profiling_output.txt
nvprof writes its summary to stderr, so the 2> redirection captures the profiler output in the text file. (Alternatively, pass --log-file ./profiling_output.txt to nvprof.)
Make sure to add the following line at the top of your Python file:
#!/usr/bin/env python
"""
.
.
.
Python Code
.
.
.
"""
Also, make sure your code is executable by running:
$ chmod u+x my_pycuda_code.py
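For reference, the script being profiled might look like the minimal sketch below (the file name my_pycuda_code.py and the kernel are illustrative placeholders, not part of any assignment). Note the shebang on the first line; the host-to-device copies and the kernel launch are exactly the kinds of events nvprof will report.

#!/usr/bin/env python
"""Minimal PyCUDA script used as a profiling target (hypothetical example)."""
import numpy as np
import pycuda.autoinit            # creates a CUDA context on import
import pycuda.driver as cuda
from pycuda.compiler import SourceModule

# Trivial kernel: double every element of a float vector
mod = SourceModule("""
__global__ void double_vec(float *a, int n)
{
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    if (idx < n)
        a[idx] *= 2.0f;
}
""")
double_vec = mod.get_function("double_vec")

n = 1024
a = np.random.randn(n).astype(np.float32)
a_gpu = cuda.mem_alloc(a.nbytes)
cuda.memcpy_htod(a_gpu, a)          # host -> device copy (shows up in the profile)

double_vec(a_gpu, np.int32(n), block=(256, 1, 1), grid=(n // 256, 1))

result = np.empty_like(a)
cuda.memcpy_dtoh(result, a_gpu)     # device -> host copy
print("Max error:", np.max(np.abs(result - 2 * a)))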
You can specify an output file for nvprof:
(cudaEnv)$ nvprof -o my_pycuda_code-analysis.nvprof ./my_pycuda_code.py
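Two other nvprof modes are frequently useful; the flags below are standard nvprof options, but check nvprof --help on your VM for your toolkit version:
(cudaEnv)$ nvprof --print-gpu-trace ./my_pycuda_code.py
(cudaEnv)$ nvprof --analysis-metrics -o my_pycuda_code-metrics.nvprof ./my_pycuda_code.py
The first prints one line per kernel launch and memory copy; the second collects the additional metrics NVVP needs for its guided analysis (expect it to run slower).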
Viewing the profiling result with NVVP
Ordinarily, one can view the profiling output in the Nvidia Visual Profiler by running:
$ nvvp my_pycuda_code_profile.nvprof
However, since we are using port forwarding to view the 'display' environment remotely on the VM (See GUI Setup), this command doesn't work out of the box.
One needs to point nvvp at the Java runtime (OpenJDK) installed on the VM. This is done by modifying the command to:
$ nvvp -vm /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java my_profile.nvprof
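The OpenJDK path above is an assumption about the course VM image; if it differs on your machine, you can list the installed Java runtimes with the standard Debian/Ubuntu tool:
$ update-alternatives --list java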
As you can see, this is wordy and tedious to type out every single time. To make things easier, we will create a simple bash function that wraps this command.
viewprofile() {
nvvp -vm /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java "$1"
}
Paste this function at the end of your ~/.bashrc by opening it in a text editor in the terminal:
$ nano ~/.bashrc
And then run
$ source ~/.bashrc
Now you can use this new viewprofile command to open .nvprof logs:
$ viewprofile my_profile.nvprof
Example
Let's look at the PyCUDA code from assignment-1 and try to profile it.
Connect to your VM through VNC Viewer as explained in Step-4 of the GUI Installation Tutorial.
You can open a terminal window by right-clicking the desktop background.
First, create the profiling output using nvprof:
$ nvprof -o hw1_prof_output.nvprof python hw1.py
The program will run and nvprof will save the profile to the current directory. You can check that it is there using ls.
Next, it's time to view the profiling output using NVVP.
$ viewprofile hw1_prof_output.nvprof