Install CUDA and graphics drivers - AD-EYE/AD-EYE_Core GitHub Wiki

Installation of NVIDIA Driver, CUDA and cuDNN

Disable Nouveau (default graphics driver on Ubuntu)

  1. Open /etc/modprobe.d/blacklist-nouveau.conf and add the following lines:

    blacklist nouveau
    options nouveau modeset=0

    Save the file (sudo privileges may be required).

  2. Run sudo update-initramfs -u and reboot the system.
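After the reboot it can be worth confirming that Nouveau is really gone. The following is a minimal sketch, not part of the original instructions: the helper name nouveau_loaded is hypothetical, and it only inspects lsmod-style text passed on standard input.

```shell
# nouveau_loaded (hypothetical helper): reads `lsmod`-style text on stdin and
# succeeds only if a module named exactly "nouveau" is listed in column 1.
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

# Typical use after the reboot (left commented so the block has no side effects):
# if lsmod | nouveau_loaded; then
#     echo "nouveau is still loaded, check /etc/modprobe.d/blacklist-nouveau.conf"
# else
#     echo "nouveau is not loaded"
# fi
```

If the module is still listed, the blacklist file was probably not picked up, and re-running sudo update-initramfs -u before rebooting again is a reasonable first thing to try.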

Install NVIDIA GPU Driver

Download the latest NVIDIA GPU driver (.run file) from http://www.nvidia.com/Download/index.aspx

  1. Set the default run level on your system such that it will boot to a VGA console, and not directly to X. Doing so will make it easier to recover if there is a problem during the installation process. On Ubuntu:
    Before installation:

    sudo systemctl enable multi-user.target
    sudo systemctl set-default multi-user.target
    

    After installation has succeeded:

    sudo systemctl enable graphical.target
    sudo systemctl set-default graphical.target
    
  2. If the graphical login screen appears, press [Alt] + [Ctrl] + [F1] and log in on the tty.

  3. Run sudo service lightdm stop to kill the X server temporarily.

  4. Remove all nvidia packages: sudo apt-get remove --purge 'nvidia*' (the quotes keep the shell from expanding the pattern itself).

  5. Run sudo sh NVIDIA-*.run, or sudo sh NVIDIA-*.run --no-opengl-files on laptops that have both an integrated graphics card and an NVIDIA GPU.

    NOTE: On a laptop, DO NOT run the NVIDIA configuration for the X windowing system at the end of the GPU driver installation, since the integrated graphics card will be used to display the desktop. The NVIDIA card will run automatically whenever it is needed.

  6. Reboot the system.

Install CUDA

Download a CUDA installer (.run file) from https://developer.nvidia.com/cuda-toolkit-archive, choosing version 10.0 or lower to avoid issues with Autoware.

  1. Run sudo sh cuda_***.run. Do not install the GPU driver contained in the CUDA installer, since you have already installed the latest one in the previous section.

  2. Once the installation completes, export PATH and LD_LIBRARY_PATH according to the installer message. For example, open ~/.bashrc and add these two lines:

    export PATH=/usr/local/cuda/bin:$PATH
    export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
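Instead of editing ~/.bashrc by hand, the two lines can be appended from the command line. The sketch below is one possible way to do that; append_once is a hypothetical helper name, and the idea is simply that re-running it does not add the same line twice.

```shell
# append_once FILE LINE (hypothetical helper): add LINE to FILE unless an
# identical line is already present, so repeated runs do not duplicate it.
append_once() {
    touch "$1"
    grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# Example use with the two lines from the step above (left commented so that
# nothing is modified by accident; single quotes keep $PATH unexpanded):
# append_once ~/.bashrc 'export PATH=/usr/local/cuda/bin:$PATH'
# append_once ~/.bashrc 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH'
```

The new variables only take effect in shells started afterwards, or after running source ~/.bashrc in the current one.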

Some additional packages may be required in order to compile the CUDA samples:

sudo apt-get install freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libgl1-mesa-glx libgl1-mesa-dev

If you receive a compiler error such as /usr/bin/ld: cannot find -lGL, this command may resolve it:

sudo ln -s /usr/lib/libGL.so.1 /usr/lib/libGL.so

Source: https://github.com/autowarefoundation/autoware/wiki/NVIDIA-Driver

https://web.archive.org/web/20190620095443/https://github.com/autowarefoundation/autoware/wiki/NVIDIA-Driver

Install cuDNN

cuDNN can be useful for heavy GPU computation such as deep learning. The installation is well explained in the source link, and no bugs were encountered during the installation and verification process.

Source: https://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html

https://web.archive.org/save/https://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html

Problems related to CUDA and graphics drivers

Login Loop

After the Ubuntu kernel is updated by a security patch, the computer might get stuck in a login loop, in which case it is not possible to log in to the user account.

The fix is to reinstall the NVIDIA drivers from the virtual console (Ctrl)+(Alt)+(F1).

Computer stuck on boot screen

If the computer does not start but hangs with an underscore in the top-left corner after installing the graphics driver or CUDA, the kernel module version of the graphics driver probably does not match the client version, that is, the version of the installed nvidia package (nvidia-version_number, which can be found using apt list --installed | grep nvidia).

To solve this issue, restart the computer, choose the advanced Ubuntu options in the GRUB menu, and select the first recovery mode option. In the recovery menu, select root. Now that you have access to the command line, remove all nvidia packages using:

apt-get remove --purge 'nvidia*'

Then type reboot; the system should now boot. No graphics driver is installed at this point, so install the driver that corresponds to the kernel version (see the next part to find the kernel version).

Finding nvidia versions after solving the unable to boot issue

The computer getting stuck at boot is generally caused by a mismatch between the client version (the one in the apt-get packages) and the kernel one.

To investigate the problem, run the script nvidia-bug-report.sh (path: /usr/bin/nvidia-bug-report.sh). It generates a log archive in the working directory. In this log file, search (Ctrl + F) for "API mismatch". The matching lines indicate both the client and the kernel versions at the time the issue occurred (look at the last occurrence). The client version was uninstalled in the previous step, so it has to be reinstalled with a version matching the kernel one:

sudo apt-get install nvidia-version_number

source: https://forums.developer.nvidia.com/t/cuda-9-1-on-ubuntu-16-04-installed-but-devicequery-fails/66945
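Searching the log by hand can also be scripted. The sketch below assumes the mismatch message contains the phrases "client has the version X" and "kernel module has the version Y", which is the wording the NVIDIA kernel module normally logs; if your log is worded differently, the patterns need adjusting. The helper name mismatch_versions is hypothetical.

```shell
# mismatch_versions (hypothetical helper): reads log text on stdin and prints
# the last client and kernel module versions mentioned in it.
# Assumption: the log uses the phrases matched by the two patterns below.
mismatch_versions() (
    log=$(cat)
    client=$(printf '%s\n' "$log" |
        grep -oE 'client has the version [0-9]+(\.[0-9]+)*' |
        tail -n 1 | awk '{ print $NF }')
    kernel=$(printf '%s\n' "$log" |
        grep -oE 'kernel module has the version [0-9]+(\.[0-9]+)*' |
        tail -n 1 | awk '{ print $NF }')
    if [ -n "$client" ] && [ -n "$kernel" ]; then
        printf 'client=%s kernel=%s\n' "$client" "$kernel"
    else
        exit 1   # no mismatch message found; the function returns failure
    fi
)

# Possible uses (left commented; the archive name is the script's usual default):
# zcat nvidia-bug-report.log.gz | mismatch_versions
# dmesg | mismatch_versions
```

The function returns a non-zero status when no mismatch message is present, which suggests looking for a different cause.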

CUDA is not working when trying to run the Caffe tests or to compile Autoware

To check whether CUDA is installed and working properly, follow these steps.

Check the graphics driver version:

$ cat /proc/driver/nvidia/version
     NVRM version: NVIDIA UNIX x86_64 Kernel Module  430.26  Tue Jun  4 17:40:52 CDT 2019
     GCC version:  gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.12) 
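When a matching nvidia package has to be chosen, it can help to pull just the version number out of that output. This is a small sketch under the assumption that the NVRM line has the same layout as the example above; kernel_module_version is a hypothetical helper name.

```shell
# kernel_module_version (hypothetical helper): reads the contents of
# /proc/driver/nvidia/version on stdin and prints the kernel module version.
# Assumption: the NVRM line looks like the example output shown above.
kernel_module_version() {
    sed -n 's/^[[:space:]]*NVRM version:.*Kernel Module  *\([0-9][0-9.]*\).*/\1/p'
}

# Possible use (left commented because the file only exists once the driver is loaded):
# ver=$(kernel_module_version < /proc/driver/nvidia/version)
# echo "kernel module version: $ver (major part: ${ver%%.*})"
```

With the example output, the full version is 430.26 and the major part is 430. Whether the apt package is named after the major part depends on the distribution and repository, so treat that mapping as an assumption to verify with apt list.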

Check the CUDA Toolkit version:

$ nvcc -V
     nvcc: NVIDIA (R) Cuda compiler driver
     Copyright (c) 2005-2018 NVIDIA Corporation
     Built on Sat_Aug_25_21:08:01_CDT_2018
     Cuda compilation tools, release 10.0, V10.0.130
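Since the CUDA section asks for a toolkit version of 10.0 or lower, the release number can be checked automatically. The sketch below is an illustration only: cuda_release and version_le are hypothetical helper names, and version_le relies on sort -V from GNU coreutils.

```shell
# cuda_release (hypothetical helper): reads `nvcc -V` output on stdin and
# prints the release number, e.g. 10.0.
cuda_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# version_le A B (hypothetical helper): succeeds if version A <= version B.
# Uses the version sort of GNU sort (-V).
version_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Possible use (left commented because it needs nvcc on the PATH):
# rel=$(nvcc -V | cuda_release)
# if version_le "$rel" 10.0; then
#     echo "CUDA $rel is within the 10.0 limit"
# else
#     echo "CUDA $rel is newer than 10.0"
# fi
```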

Verify the ability to compile the CUDA samples:

sudo apt-get install cuda-samples-10-0 -y  # only needed if the samples are not installed yet
cd /usr/local/cuda-10.0/samples
make  # may need sudo if this directory is not writable by your user

Run CUDA GPU jobs by executing the deviceQuery program:

$ '/usr/local/cuda-10.0/samples/bin/x86_64/linux/release/deviceQuery' 
     /usr/local/cuda-10.0/samples/bin/x86_64/linux/release/deviceQuery Starting...

     CUDA Device Query (Runtime API) version (CUDART static linking)

     Detected 1 CUDA Capable device(s)

     Device 0: "GeForce RTX 2080 Ti"
       CUDA Driver Version / Runtime Version          10.2 / 10.0
       CUDA Capability Major/Minor version number:    7.5
       Total amount of global memory:                 11016 MBytes (11551440896 bytes)
       (68) Multiprocessors, ( 64) CUDA Cores/MP:     4352 CUDA Cores
       GPU Max Clock rate:                            1545 MHz (1.54 GHz)
       Memory Clock rate:                             7000 Mhz
       Memory Bus Width:                              352-bit
       L2 Cache Size:                                 5767168 bytes
       Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
       Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
       Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
       Total amount of constant memory:               65536 bytes
       Total amount of shared memory per block:       49152 bytes
       Total number of registers available per block: 65536
       Warp size:                                     32
       Maximum number of threads per multiprocessor:  1024
       Maximum number of threads per block:           1024
       Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
       Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
       Maximum memory pitch:                          2147483647 bytes
       Texture alignment:                             512 bytes
       Concurrent copy and kernel execution:          Yes with 3 copy engine(s)
       Run time limit on kernels:                     Yes
       Integrated GPU sharing Host Memory:            No
       Support host page-locked memory mapping:       Yes
       Alignment requirement for Surfaces:            Yes
       Device has ECC support:                        Disabled
       Device supports Unified Addressing (UVA):      Yes
       Device supports Compute Preemption:            Yes
       Supports Cooperative Kernel Launch:            Yes
       Supports MultiDevice Co-op Kernel Launch:      Yes
       Device PCI Domain ID / Bus ID / location ID:   0 / 66 / 0
       Compute Mode:
          < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

     deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.2, CUDA Runtime Version = 10.0, NumDevs = 1
     Result = PASS

If either the compilation or the deviceQuery program fails, there is an issue with the CUDA installation or with the graphics driver.

source: https://xcat-docs.readthedocs.io/en/stable/advanced/gpu/nvidia/verify_cuda_install.html
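The deviceQuery check can be reduced to a pass or fail answer by looking for the final result line. This is a minimal sketch, assuming the output ends with a "Result = PASS" line as in the example above; device_query_passed is a hypothetical helper name.

```shell
# device_query_passed (hypothetical helper): reads deviceQuery output on stdin
# and succeeds only if it contains a line consisting of "Result = PASS".
device_query_passed() {
    grep -q '^[[:space:]]*Result = PASS[[:space:]]*$'
}

# Possible use (left commented because it needs the compiled sample and a GPU):
# if /usr/local/cuda-10.0/samples/bin/x86_64/linux/release/deviceQuery | device_query_passed; then
#     echo "CUDA looks functional"
# else
#     echo "deviceQuery did not pass, check the driver and CUDA versions"
# fi
```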


Next step: Install Caffe
Back to the overview: Installation