NVIDIA GPU Code Generation Guide - sdake/dotfiles GitHub Wiki

Overview

GPU code comes in two forms:

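  • SASS: Machine code for one specific GPU architecture. It runs only on GPUs of the same major compute capability with an equal or newer minor version.
  • PTX: Virtual-architecture intermediate code that the driver JIT-compiles to SASS at load time, so it can also run on architectures newer than the one it targets.

The compiler packages these in containers: a CUBIN holds SASS for one architecture, and a FATBIN bundles several CUBINs and PTX images into one file.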
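
The difference between the two forms shows up when a binary is loaded on a GPU it was not built for. The following is a minimal sketch of that decision, assuming the usual compatibility rules (SASS runs within one major version on an equal or newer minor version; PTX can be JIT-compiled for any equal or newer compute capability). The function names are hypothetical, and the model deliberately ignores architecture-specific targets such as `sm_90a`.

```python
from typing import Iterable, Tuple

CC = Tuple[int, int]  # compute capability as (major, minor), e.g. (8, 6)


def sass_runs_on(sass: CC, device: CC) -> bool:
    """SASS is usable only within the same major version,
    on a device whose minor version is the same or newer."""
    return sass[0] == device[0] and sass[1] <= device[1]


def ptx_runs_on(ptx: CC, device: CC) -> bool:
    """PTX can be JIT-compiled for any device at or above its target."""
    return ptx <= device


def how_kernel_loads(sass: Iterable[CC], ptx: Iterable[CC], device: CC) -> str:
    """Return 'sass', 'ptx-jit', or 'unsupported' for a hypothetical binary."""
    if any(sass_runs_on(s, device) for s in sass):
        return "sass"
    if any(ptx_runs_on(p, device) for p in ptx):
        return "ptx-jit"
    return "unsupported"


# A binary with SASS for 8.0 and PTX for 8.0, loaded on different devices:
print(how_kernel_loads([(8, 0)], [(8, 0)], (8, 6)))  # sass
print(how_kernel_loads([(8, 0)], [(8, 0)], (9, 0)))  # ptx-jit
print(how_kernel_loads([(8, 0)], [], (9, 0)))        # unsupported
```

The last case is why SASS-only builds stop working on a newer GPU generation, while builds that also embed PTX keep running at the cost of a JIT step.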

Build System Syntax

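| Usage   | Contents  | Syntax                                | Example                                          |
|---------|-----------|---------------------------------------|--------------------------------------------------|
| PyTorch | SASS only | Space separated, dotted versions      | 8.0 8.6 8.7                                      |
| PyTorch | SASS+PTX  | Space separated with +PTX suffix      | 8.0+PTX 8.6+PTX 8.7+PTX                          |
| PyTorch | PTX only  | None; +PTX always emits SASS as well  | (none)                                           |
| CMake   | SASS only | Semicolon with -real suffix           | 80-real;86-real;87-real                          |
| CMake   | SASS+PTX  | Semicolon separated                   | 80;86;87                                         |
| CMake   | PTX only  | Semicolon with -virtual suffix        | 80-virtual;86-virtual;87-virtual                 |
| NVCC    | SASS only | -gencode with code=sm_XY              | -gencode arch=compute_80,code=sm_80              |
| NVCC    | SASS+PTX  | -gencode with code=[sm_XY,compute_XY] | -gencode arch=compute_80,code=[sm_80,compute_80] |
| NVCC    | PTX only  | -gencode with code=compute_XY         | -gencode arch=compute_80,code=compute_80         |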

Environment Variables

PyTorch style:

  • SASS only: TORCH_CUDA_ARCH_LIST="8.0 8.6 8.7"
  • SASS+PTX: TORCH_CUDA_ARCH_LIST="8.0+PTX 8.6+PTX 8.7+PTX"
  • PTX only: no dedicated syntax; the +PTX suffix always emits SASS for the same architecture as well
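
As an illustration, the value can be assembled in a build script before the extension build starts. This is a minimal sketch, assuming the dotted `major.minor` form documented for `torch.utils.cpp_extension`, where a `+PTX` suffix adds PTX alongside the SASS for that entry. The helper name `torch_arch_list` is hypothetical, and adding PTX only for the newest architecture is a common convention rather than a requirement.

```python
import os
from typing import Iterable, Tuple


def torch_arch_list(ccs: Iterable[Tuple[int, int]], ptx_for_last: bool = True) -> str:
    """Format (major, minor) pairs as a TORCH_CUDA_ARCH_LIST value."""
    entries = [f"{major}.{minor}" for major, minor in sorted(ccs)]
    if ptx_for_last and entries:
        entries[-1] += "+PTX"  # also embed PTX for the newest architecture
    return " ".join(entries)


value = torch_arch_list([(8, 0), (8, 6), (8, 7)])
os.environ["TORCH_CUDA_ARCH_LIST"] = value  # must be set before the build runs
print(value)  # 8.0 8.6 8.7+PTX
```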

CMake style:

  • SASS only: CMAKE_CUDA_ARCHITECTURES="80-real;86-real;87-real"
  • SASS+PTX: CMAKE_CUDA_ARCHITECTURES="80;86;87"
  • PTX only: CMAKE_CUDA_ARCHITECTURES="80-virtual;86-virtual;87-virtual"
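
The three CMake variants differ only in the per-entry suffix, which a small helper can make explicit. This is a sketch with a hypothetical function name. It only builds the string; in practice the value is usually passed to CMake as `-DCMAKE_CUDA_ARCHITECTURES=...` on the configure command line.

```python
from typing import Iterable


def cmake_cuda_architectures(archs: Iterable[int], contents: str = "sass+ptx") -> str:
    """Build a CMAKE_CUDA_ARCHITECTURES value for the requested contents."""
    # -real requests SASS only, -virtual requests PTX only, no suffix requests both.
    suffix = {"sass": "-real", "ptx": "-virtual", "sass+ptx": ""}[contents]
    return ";".join(f"{arch}{suffix}" for arch in archs)


print(cmake_cuda_architectures([80, 86, 87], "sass"))      # 80-real;86-real;87-real
print(cmake_cuda_architectures([80, 86, 87], "sass+ptx"))  # 80;86;87
print(cmake_cuda_architectures([80, 86, 87], "ptx"))       # 80-virtual;86-virtual;87-virtual
```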

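NVCC flags (nvcc accepts -arch only once, so each architecture gets its own -gencode):

  • SASS only: CUDA_NVCC_FLAGS="-gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_87,code=sm_87"
  • SASS+PTX: CUDA_NVCC_FLAGS="-gencode arch=compute_80,code=[sm_80,compute_80] -gencode arch=compute_86,code=[sm_86,compute_86] -gencode arch=compute_87,code=[sm_87,compute_87]"
  • PTX only: CUDA_NVCC_FLAGS="-gencode arch=compute_80,code=compute_80 -gencode arch=compute_86,code=compute_86 -gencode arch=compute_87,code=compute_87"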
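
Writing these flags by hand gets error-prone as the architecture list grows, so they can be generated instead. This is a minimal sketch, assuming nvcc's documented `-gencode` semantics: `code=sm_XY` embeds SASS, `code=compute_XY` embeds PTX, and a bracketed list embeds both. The function name and the `kernel.cu` file are hypothetical.

```python
from typing import Iterable, List


def gencode_flags(archs: Iterable[int], contents: str = "sass+ptx") -> List[str]:
    """Build one -gencode pair per architecture for the requested contents."""
    code = {
        "sass": "sm_{a}",                      # machine code only
        "ptx": "compute_{a}",                  # intermediate code only
        "sass+ptx": "[sm_{a},compute_{a}]",    # both in one fat binary
    }[contents]
    flags: List[str] = []
    for a in archs:
        flags += ["-gencode", f"arch=compute_{a},code=" + code.format(a=a)]
    return flags


# Assemble a full command line as an argument list (nothing is executed here).
cmd = ["nvcc", *gencode_flags([80, 86]), "-c", "kernel.cu", "-o", "kernel.o"]
print(" ".join(cmd))
```

Keeping the command as a list and handing it to `subprocess.run(cmd)` avoids shell quoting problems with the square brackets.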