Install CUDA and graphics drivers - AD-EYE/AD-EYE_Core GitHub Wiki
- Open /etc/modprobe.d/blacklist-nouveau.conf and add the following lines:
    blacklist nouveau
    options nouveau modeset=0
  Save it (sudo privilege may be required).
- Run
    sudo update-initramfs -u
  and reboot the system.
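The two steps above can also be scripted. The following is a minimal sketch and not part of the original instructions: the helper names write_nouveau_blacklist and nouveau_loaded are made up for illustration, and the check assumes that lsmod prints the module name at the start of each line.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not from the wiki).

# Write the two blacklist lines to the given file. The default is the path
# used in the steps above; writing there requires root.
write_nouveau_blacklist() {
    target="${1:-/etc/modprobe.d/blacklist-nouveau.conf}"
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' > "$target"
}

# Succeed if lsmod-style output on stdin lists the nouveau module.
nouveau_loaded() {
    grep -q '^nouveau[[:space:]]'
}

# Example on a real system (as root):
#   write_nouveau_blacklist && update-initramfs -u && reboot
```

After the reboot, `lsmod | nouveau_loaded` should return a non-zero status, which would indicate that the module is no longer loaded.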
Download the latest NVIDIA GPU driver (.run file) from http://www.nvidia.com/Download/index.aspx
- Set the default run level on your system such that it will boot to a VGA console, and not directly to X. Doing so will make it easier to recover if there is a problem during the installation process. On Ubuntu:
  Before installation:
    sudo systemctl enable multi-user.target
    sudo systemctl set-default multi-user.target
  After installation has succeeded:
    sudo systemctl enable graphical.target
    sudo systemctl set-default graphical.target
- If the graphical login screen appears, press [Alt] + [Ctrl] + [F1] and log in on the tty.
- Run
    sudo service lightdm stop
  to stop the X server temporarily.
- Remove all nvidia packages:
    sudo apt-get remove --purge 'nvidia*'
- Run
    sudo sh NVIDIA-*.run
  or, on laptops that have both an integrated graphics card and an NVIDIA GPU,
    sudo sh NVIDIA-*.run --no-opengl-files
  NOTE: On a laptop, DO NOT run the NVIDIA configuration for the X windowing system at the end of the GPU driver installation, since the integrated graphics card will be used to display the desktop. The NVIDIA card will run automatically whenever needed.
- Reboot the system.
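The default-target switch used in the steps above (text console before the driver installation, graphical session afterwards) could be wrapped in a small helper so that the two phases are harder to mix up. This is only a sketch: phase_target and set_phase_target are hypothetical names, and it only calls systemctl set-default, not systemctl enable.

```shell
#!/bin/sh
# Hypothetical helper: map an installation phase to the systemd target
# named in the steps above.
phase_target() {
    case "$1" in
        before) echo multi-user.target ;;
        after)  echo graphical.target ;;
        *)      echo "usage: phase_target before|after" >&2; return 1 ;;
    esac
}

# Set the default boot target for the given phase (requires root).
set_phase_target() {
    target=$(phase_target "$1") || return 1
    systemctl set-default "$target"
}

# Example on a real system (as root):
#   set_phase_target before   # then reboot and install the driver
#   set_phase_target after    # once the installation has succeeded
```

Running `systemctl get-default` afterwards prints the target that will be used on the next boot.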
Download a CUDA installer (.run file) from https://developer.nvidia.com/cuda-toolkit-archive. Use version 10.0 or older to avoid issues with Autoware.
- Run
    sudo sh cuda_***.run
  Do not install the GPU driver contained in the CUDA installer, since you have already installed the latest one in the previous section.
- Once the installation completes, export PATH and LD_LIBRARY_PATH according to the installer message. For example, open ~/.bashrc and add these two lines:
    export PATH=/usr/local/cuda/bin:$PATH
    export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
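If the installation is repeated, the two export lines can end up in ~/.bashrc several times. A minimal sketch of an idempotent alternative to editing the file by hand follows; append_once and add_cuda_exports are hypothetical helper names, and the paths are the ones given above (the installer message may name a versioned directory instead).

```shell
#!/bin/sh
# Hypothetical helper: append a line to a file unless that exact line is
# already present, so repeated runs do not duplicate the exports.
append_once() {
    file="$1"
    line="$2"
    touch "$file"
    grep -qxF -e "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Add the two CUDA exports to the given startup file (default: ~/.bashrc).
add_cuda_exports() {
    rc_file="${1:-$HOME/.bashrc}"
    append_once "$rc_file" 'export PATH=/usr/local/cuda/bin:$PATH'
    append_once "$rc_file" 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH'
}

# Example:
#   add_cuda_exports            # edits ~/.bashrc
#   . ~/.bashrc                 # reload in the current shell
```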
Some additional packages may be required in order to compile the CUDA samples:
sudo apt-get install freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libgl1-mesa-glx libgl1-mesa-dev
If you receive a compiler error such as /usr/bin/ld: cannot find -lGL, this command may resolve it:
sudo ln -s /usr/lib/libGL.so.1 /usr/lib/libGL.so
Source: https://github.com/autowarefoundation/autoware/wiki/NVIDIA-Driver
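The symlink command above assumes that libGL.so.1 lives in /usr/lib, which may not hold on every system (some installations keep it in an architecture-specific subdirectory). As a hedged sketch, the library can be located from the `ldconfig -p` cache listing first. libgl_path is a hypothetical helper name, and the parsing assumes lines of the form `libGL.so.1 (...) => /path/to/libGL.so.1`.

```shell
#!/bin/sh
# Hypothetical helper: read `ldconfig -p` output on stdin and print the path
# of the first libGL.so.1 entry (nothing is printed if there is none).
libgl_path() {
    awk '$1 == "libGL.so.1" { print $NF; exit }'
}

# Example on a real system:
#   src=$(ldconfig -p | libgl_path)
#   [ -n "$src" ] && sudo ln -s "$src" "$(dirname "$src")/libGL.so"
```

Only the first matching entry is used, so on systems with several architectures installed it is worth checking the printed path before creating the link.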
cuDNN is useful for heavy GPU computation such as deep learning. The installation is well explained in the source link below, and no problems were encountered during the installation and verification process.
Source: https://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html
https://web.archive.org/save/https://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html
After the Ubuntu kernel has been updated (for example by a security patch), the computer might get stuck in a login loop. In that case, it is not possible to log in to the user account.
The fix is to reinstall the NVIDIA drivers from the virtual console (Ctrl)+(Alt)+(F1).
If, after installing the graphics driver or CUDA, the computer does not start but hangs with an underscore in the top-left corner, it is likely that the Kernel version of the graphics driver does not match the Client version, i.e. the installed nvidia package (nvidia-version_number, which can be found using apt list --installed | grep nvidia).
To solve this issue, restart the computer, choose the advanced Ubuntu options in the GRUB menu and select the first recovery mode option. In the recovery menu, select root. Now that you have access to the command line, remove all nvidia packages using:
    apt-get remove --purge 'nvidia*'
Rebooting should now work. Type reboot.
No graphics driver is installed anymore, so install the driver corresponding to the Kernel version (see next part to find the Kernel version).
A computer stuck at boot is generally caused by a mismatch between the Client version (the one in the apt-get packages) and the Kernel one.
To investigate the problem, run the script nvidia-bug-report.sh (path: /usr/bin/nvidia-bug-report.sh). It will generate a log archive in the working directory. In this log file, search (Ctrl + F) for "API mismatch". The matching line indicates both the Client and the Kernel versions at the time the issue occurred (look for the last occurrence). The Client version was uninstalled in the previous step, so it has to be reinstalled with a version matching the Kernel one:
    sudo apt-get install nvidia-version_number
source: https://forums.developer.nvidia.com/t/cuda-9-1-on-ubuntu-16-04-installed-but-devicequery-fails/66945
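Searching the bug report by hand can be tedious when the log is long. The sketch below prints the last line containing "API mismatch" together with the line that follows it, in case the message wraps across two lines. last_api_mismatch is a hypothetical helper name, and the archive name nvidia-bug-report.log.gz in the usage comment is an assumption about what the script writes to the working directory.

```shell
#!/bin/sh
# Hypothetical helper: read log text on stdin and print the last line that
# contains "API mismatch", plus the line right after it (if any).
last_api_mismatch() {
    awk '
        /API mismatch/ { hit = $0; nxt = ""; want_next = 1; next }
        want_next      { nxt = $0; want_next = 0 }
        END            { if (hit != "") { print hit; if (nxt != "") print nxt } }
    '
}

# Example, assuming the archive is named nvidia-bug-report.log.gz:
#   zcat nvidia-bug-report.log.gz | last_api_mismatch
```

If nothing is printed, the log contains no "API mismatch" entry and the boot problem probably has another cause.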
To check whether CUDA is installed and working properly, follow the next steps.
Check the graphics driver version:
$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 430.26 Tue Jun 4 17:40:52 CDT 2019
GCC version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.12)
Check the CUDA Toolkit version:
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
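When these checks are repeated on several machines, it can help to reduce the two outputs above to bare version numbers. The following sketch does that with sed. driver_version and cuda_release are hypothetical helper names, and the patterns assume the output formats shown above (other driver or toolkit releases may word these lines differently).

```shell
#!/bin/sh
# Hypothetical helpers that parse the two outputs shown above.

# Read /proc/driver/nvidia/version text on stdin, print the driver version.
driver_version() {
    sed -n 's/.*Kernel Module  *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Read `nvcc -V` output on stdin, print the CUDA release number.
cuda_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p' | head -n 1
}

# Example on a real system:
#   driver_version < /proc/driver/nvidia/version
#   nvcc -V | cuda_release
```

For the sample outputs above, these helpers would print 430.26 and 10.0 respectively.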
Verify the ability to compile the CUDA samples:
cd ~/
sudo apt-get install cuda-samples-10-0 -y  # if not installed
cd /usr/local/cuda-10.0/samples
make
Run a CUDA GPU job by executing the deviceQuery program:
$ '/usr/local/cuda-10.0/samples/bin/x86_64/linux/release/deviceQuery'
/usr/local/cuda-10.0/samples/bin/x86_64/linux/release/deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce RTX 2080 Ti"
CUDA Driver Version / Runtime Version 10.2 / 10.0
CUDA Capability Major/Minor version number: 7.5
Total amount of global memory: 11016 MBytes (11551440896 bytes)
(68) Multiprocessors, ( 64) CUDA Cores/MP: 4352 CUDA Cores
GPU Max Clock rate: 1545 MHz (1.54 GHz)
Memory Clock rate: 7000 Mhz
Memory Bus Width: 352-bit
L2 Cache Size: 5767168 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 1024
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 3 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 66 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.2, CUDA Runtime Version = 10.0, NumDevs = 1
Result = PASS
If either the compilation or the deviceQuery program fails, there is an issue with the CUDA installation or with the graphics driver.
source: https://xcat-docs.readthedocs.io/en/stable/advanced/gpu/nvidia/verify_cuda_install.html
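The deviceQuery check can be automated by looking for the final result line instead of reading the whole output. This is a sketch under the assumption that the output has the format shown above. devicequery_passed and devicequery_versions are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helper: succeed if deviceQuery output on stdin contains a
# line starting with "Result = PASS", as in the expected output above.
devicequery_passed() {
    grep -q '^Result = PASS'
}

# Hypothetical helper: print "<driver version> <runtime version>" taken
# from the deviceQuery summary line.
devicequery_versions() {
    sed -n 's/.*CUDA Driver Version = \([0-9.]*\), CUDA Runtime Version = \([0-9.]*\).*/\1 \2/p'
}

# Example on a real system:
#   dq=/usr/local/cuda-10.0/samples/bin/x86_64/linux/release/deviceQuery
#   "$dq" | devicequery_passed && echo "CUDA OK"
```

For the expected output shown above, devicequery_versions would print `10.2 10.0`.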