Installing CUDA - lmmx/devnotes GitHub Wiki
As Ubuntu's tutorial on GPU data processing inside LXD says, the best way to install CUDA is directly from NVIDIA's site.
Either use the local installer:
cd ~/Downloads
wget https://developer.download.nvidia.com/compute/cuda/11.2.0/local_installers/cuda_11.2.0_460.27.04_linux.run
sudo sh cuda_11.2.0_460.27.04_linux.run
or the network installer:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /"
sudo apt-get update
sudo apt-get -y install cuda
"Existing package manager installation of the driver found, it is strongly recommended that you uninstall that before continuing"
As Viacheslav Kovalevskyi writes, you should choose the .run (local) script rather than the .deb options:
I would strongly recommend use the installer script. First of all, it is agnostic to the version of the Linux that is used. Secondly, unlike some binary pre-build packages like deb file you can control where exactly CUDA library files will be installed.
You should not use an NVIDIA driver with a version number lower than the one in your package installer name: e.g. cuda_11.2.0_460.27.04_linux.run indicates CUDA 11.2 built with NVIDIA driver 460.27 (see your nvidia-smi header: for me 460.32.03 >= 460.27.04 so it'll work)
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
...
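The version comparison above can be done mechanically rather than by eye. A minimal sketch, assuming GNU sort is available (for its -V version sort); version_ge is a hypothetical helper name, not part of any NVIDIA tooling.

```shell
# Succeeds when version $1 is greater than or equal to version $2.
# Assumption: GNU sort, whose -V option orders dotted version strings.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Possible use, comparing the installed driver against the one named in the
# installer file (left commented out as it needs nvidia-smi on the machine):
#   installed=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
#   version_ge "$installed" 460.27.04 && echo "driver is new enough"
```

The smaller of the two versions sorts first, so the check passes exactly when the required version is that first line.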
If you've already got the NVIDIA driver, you'll get a warning prompt:
│ Existing package manager installation of the driver found. It is strongly │
│ recommended that you remove this before continuing. │
│ Abort │
│ Continue
If you're not sure what you do have installed for some reason, check with
apt list --installed | grep nvidia
(or if you're on Debian I think it's dpkg -l | grep nvidia)
If you've already installed the driver from the package manager you'll see packages like
nvidia-driver-460
To ignore the warning message and install just the CUDA toolkit, run
sudo ./cuda_11.2.0_460.27.04_linux.run --toolkit
(This is what is usually done.)
To instead list all the installed nvidia packages on one line, run
apt list --installed | grep nvidia | cut -d "/" -f 1 | tr "\n" " "
e.g.
libnvidia-cfg1-460 libnvidia-common-460 libnvidia-compute-460 libnvidia-compute-460 libnvidia-decode-460 libnvidia-decode-460 libnvidia-encode-460 libnvidia-encode-460 libnvidia-extra-460 libnvidia-fbc1-460 libnvidia-fbc1-460 libnvidia-gl-460 libnvidia-gl-460 libnvidia-ifr1-460 libnvidia-ifr1-460 nvidia-compute-utils-460 nvidia-dkms-460 nvidia-driver-460 nvidia-kernel-common-460 nvidia-kernel-source-460 nvidia-prime-applet nvidia-prime nvidia-settings nvidia-utils-460 xserver-xorg-video-nvidia-460
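Several names appear twice in that list, presumably once per installed architecture (the :i386 variants show up in the removal output further down). A minimal sketch of the same extraction with duplicates collapsed; pkg_names is a hypothetical helper name, and it assumes the "name/suite,now version arch [installed]" line layout that apt list prints.

```shell
# Read `apt list --installed` style lines on stdin and print each package
# name once. Lines without a "/" (such as the "Listing..." header) are skipped.
pkg_names() {
    grep '/' | cut -d '/' -f 1 | sort -u
}

# Possible use (commented out as it needs apt):
#   apt list --installed 2>/dev/null | grep nvidia | pkg_names
```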
The operating system urges you to install these upon installing (see Linux Mint 20.1 NVIDIA graphics card driver setup), specifically nvidia-driver-460, warning that "Your system is currently running without video hardware acceleration."
In theory then it should suffice to uninstall that single package to take the rest away with it, after which we can then install CUDA from the run file.
To uninstall the Debian driver package run
sudo apt-get --purge remove nvidia-driver-460
- Oops: this was just sudo apt purge nvidia-driver-460, it should be sudo apt purge --autoremove nvidia-driver-460 (see below)
This tells you that
The following packages were automatically installed and are no longer required:
libatomic1:i386 libbsd0:i386 libdrm-amdgpu1:i386 libdrm-intel1:i386
libdrm-nouveau2:i386 libdrm-radeon1:i386 libdrm2:i386 libedit2:i386 libelf1:i386
libexpat1:i386 libffi7:i386 libgl1:i386 libgl1-mesa-dri:i386 libglapi-mesa:i386
libglvnd0:i386 libglx-mesa0:i386 libglx0:i386 libllvm11:i386 libnvidia-cfg1-460
libnvidia-common-460 libnvidia-compute-460:i386 libnvidia-decode-460
libnvidia-decode-460:i386 libnvidia-encode-460 libnvidia-encode-460:i386
libnvidia-extra-460 libnvidia-fbc1-460 libnvidia-fbc1-460:i386 libnvidia-gl-460
libnvidia-gl-460:i386 libnvidia-ifr1-460 libnvidia-ifr1-460:i386 libpciaccess0:i386
libsensors5:i386 libstdc++6:i386 libvulkan1:i386 libwayland-client0:i386 libx11-6:i386
libx11-xcb1:i386 libxau6:i386 libxcb-dri2-0:i386 libxcb-dri3-0:i386 libxcb-glx0:i386
libxcb-present0:i386 libxcb-randr0:i386 libxcb-sync1:i386 libxcb-xfixes0:i386
libxcb1:i386 libxdamage1:i386 libxdmcp6:i386 libxext6:i386 libxfixes3:i386 libxnvctrl0
libxshmfence1:i386 libxxf86vm1:i386 mesa-vulkan-drivers:i386 nvidia-compute-utils-460
nvidia-dkms-460 nvidia-kernel-common-460 nvidia-kernel-source-460 nvidia-prime
nvidia-settings nvidia-utils-460 screen-resolution-extra xserver-xorg-video-nvidia-460
- Note the "screen-resolution-extra" package: if uninstalled, your resolution will be constrained
If we take off the colons and the architecture suffixes after them (saving to a file deb_uninstalls.txt):
xclip -o | tail --lines=+2 | cut -d " " -f 3- | tr " " "\n" | cut -d ":" -f 1 > deb_uninstalls.txt
and compare to the earlier list (apt list --installed | ... but without the last tr call), saved as nvidia_grepped_installs.txt, summarising with:
# diff deb_uninstalls.txt nvidia_grepped_installs.txt | grep -E "^>|<"
echo "<" $(diff deb_uninstalls.txt nvidia_grepped_installs.txt | grep -E "^<" | cut -d " " -f 2)
echo ">" $(diff deb_uninstalls.txt nvidia_grepped_installs.txt | grep -E "^>" | cut -d " " -f 2)
⇣
< libatomic1 libbsd0 libdrm-amdgpu1 libdrm-intel1 libdrm-nouveau2 libdrm-radeon1 libdrm2 libedit2
libelf1 libexpat1 libffi7 libgl1 libgl1-mesa-dri libglapi-mesa libglvnd0 libglx-mesa0 libglx0
libllvm11 libpciaccess0 libsensors5 libstdc++6 libvulkan1 libwayland-client0 libx11-6 libx11-xcb1
libxau6 libxcb-dri2-0 libxcb-dri3-0 libxcb-glx0 libxcb-present0 libxcb-randr0 libxcb-sync1
libxcb-xfixes0 libxcb1 libxdamage1 libxdmcp6 libxext6 libxfixes3 libxnvctrl0 libxshmfence1
libxxf86vm1 mesa-vulkan-drivers screen-resolution-extra
> libnvidia-compute-460 nvidia-driver-460 nvidia-prime-applet
i.e. the packages that will be uninstalled include everything except libnvidia-compute-460 and nvidia-prime-applet (obviously nvidia-driver-460 was not mentioned as it's the primary uninstallation target)
In other words we might want to remember to uninstall libnvidia-compute-460 afterwards
We can also check
apt list --installed | grep 460 | cut -d "/" -f 1 > 460_grepped_installs.txt
and do the same
# diff deb_uninstalls.txt 460_grepped_installs.txt | grep -E "^>|<"
echo "<" $(diff deb_uninstalls.txt 460_grepped_installs.txt | grep -E "^<" | cut -d " " -f 2)
echo ">" $(diff deb_uninstalls.txt 460_grepped_installs.txt | grep -E "^>" | cut -d " " -f 2)
⇣
< libatomic1 libbsd0 libdrm-amdgpu1 libdrm-intel1 libdrm-nouveau2 libdrm-radeon1 libdrm2 libedit2
libelf1 libexpat1 libffi7 libgl1 libgl1-mesa-dri libglapi-mesa libglvnd0 libglx-mesa0 libglx0
libllvm11 libpciaccess0 libsensors5 libstdc++6 libvulkan1 libwayland-client0 libx11-6 libx11-xcb1
libxau6 libxcb-dri2-0 libxcb-dri3-0 libxcb-glx0 libxcb-present0 libxcb-randr0 libxcb-sync1
libxcb-xfixes0 libxcb1 libxdamage1 libxdmcp6 libxext6 libxfixes3 libxnvctrl0 libxshmfence1
libxxf86vm1 mesa-vulkan-drivers nvidia-prime nvidia-settings screen-resolution-extra
> libnvidia-compute-460 nvidia-driver-460
Again libnvidia-compute-460 is the only one to watch out for lagging behind on the system.
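The diff-based summaries above depend on both files listing packages in the same order. A minimal sketch of an order-independent alternative using grep's fixed-string, whole-line matching; only_in_first is a hypothetical helper name, and the file names in the usage comment are the ones used above.

```shell
# Print the lines of file $1 that do not appear as whole lines in file $2.
# -F: treat patterns as fixed strings, -x: match whole lines only,
# -v: invert the match, -f: read the patterns from a file.
only_in_first() {
    grep -F -x -v -f "$2" "$1"
}

# Possible use with the files created above:
#   only_in_first nvidia_grepped_installs.txt deb_uninstalls.txt   # installed but not slated for removal
#   only_in_first deb_uninstalls.txt nvidia_grepped_installs.txt   # slated for removal, not matched by the grep
```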
!!! At this point I realised I didn't run the purge with the --autoremove option earlier, so I re-ran the checks, which gave
libatomic1:i386* libbsd0:i386* libdrm-amdgpu1:i386* libdrm-intel1:i386*
libdrm-nouveau2:i386* libdrm-radeon1:i386* libdrm2:i386* libedit2:i386* libelf1:i386*
libexpat1:i386* libffi7:i386* libgl1:i386* libgl1-mesa-dri:i386* libglapi-mesa:i386*
libglvnd0:i386* libglx-mesa0:i386* libglx0:i386* libllvm11:i386* libnvidia-cfg1-460*
libnvidia-common-460* libnvidia-compute-460:i386* libnvidia-decode-460*
libnvidia-decode-460:i386* libnvidia-encode-460* libnvidia-encode-460:i386*
libnvidia-extra-460* libnvidia-fbc1-460* libnvidia-fbc1-460:i386* libnvidia-gl-460*
libnvidia-gl-460:i386* libnvidia-ifr1-460* libnvidia-ifr1-460:i386* libpciaccess0:i386*
libsensors5:i386* libstdc++6:i386* libvulkan1:i386* libwayland-client0:i386* libx11-6:i386*
libx11-xcb1:i386* libxau6:i386* libxcb-dri2-0:i386* libxcb-dri3-0:i386* libxcb-glx0:i386*
libxcb-present0:i386* libxcb-randr0:i386* libxcb-sync1:i386* libxcb-xfixes0:i386*
libxcb1:i386* libxdamage1:i386* libxdmcp6:i386* libxext6:i386* libxfixes3:i386* libxnvctrl0*
libxshmfence1:i386* libxxf86vm1:i386* mesa-vulkan-drivers:i386* nvidia-compute-utils-460*
nvidia-dkms-460* nvidia-driver-460* nvidia-kernel-common-460* nvidia-kernel-source-460*
nvidia-prime* nvidia-settings* nvidia-utils-460* screen-resolution-extra*
xserver-xorg-video-nvidia-460*
and compared the results
xclip -o | cut -d " " -f 3- | tr " " "\n" | cut -d ":" -f 1 | cut -d "*" -f 1 > autoremoved_deb_uninstalls.txt
Then show the differences (saving this as show_install_uninstall_diff.sh):
# diff autoremoved_deb_uninstalls.txt nvidia_grepped_installs.txt | grep -E "^>|<"
echo "<" $(diff autoremoved_deb_uninstalls.txt nvidia_grepped_installs.txt | grep -E "^<" | cut -d " " -f 2)
echo
echo ">" $(diff autoremoved_deb_uninstalls.txt nvidia_grepped_installs.txt | grep -E "^>" | cut -d " " -f 2)
echo "\n\n"
# diff autoremoved_deb_uninstalls.txt 460_grepped_installs.txt | grep -E "^>|<"
echo "<" $(diff autoremoved_deb_uninstalls.txt 460_grepped_installs.txt | grep -E "^<" | cut -d " " -f 2)
echo
echo ">" $(diff autoremoved_deb_uninstalls.txt 460_grepped_installs.txt | grep -E "^>" | cut -d " " -f 2)
(Nothing changed: the first comparison gave > libnvidia-compute-460 nvidia-prime-applet and the second gave > libnvidia-compute-460, just like last time)
To clean out the NVIDIA driver you installed (note: you will often see apt-get remove --purge, which is a synonym for apt purge; use the shorter form)
- Edit - as mentioned below, in hindsight I'm curious whether this next command should target nvidia-graphics-driver-460 instead of nvidia-driver-460 (specifically to remove nvidia-driver-460 which comes with nvidia-graphics-driver-460). I think this is known as the 'metapackage'.
sudo apt purge --autoremove nvidia-driver-460 -y
sudo apt autoclean
nvidia-smi is now gone, and if you re-run the grep checks...
apt list --installed | grep nvidia | cut -d "/" -f 1 | tr "\n" " "
apt list --installed | grep 460 | cut -d "/" -f 1 | tr "\n" " "
We now get
libnvidia-compute-460 nvidia-prime-applet
libnvidia-compute-460
...as expected!
If we let bash complete nvidia- (+ tab) there are still:
nvidia-detector nvidia-optimus-offload-vulkan
nvidia-optimus-offload-glx
...so something wasn't quite thorough... I didn't check this beforehand on my installation but from what I can see on the web, nvidia-prime-applet may in fact have shipped with Mint 20.
libnvidia-compute-460 on the other hand has definitely been missed during the 460 uninstall, so I'm going to go ahead and remove that one myself.
apt show libnvidia-compute-460 shows that this came from nvidia-graphics-drivers-460, whereas I purged nvidia-driver-460, so perhaps I should have purged the metapackage...
sudo apt purge --autoremove libnvidia-compute-460 -y
sudo apt autoclean
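The set of packages a purge would take away can also be previewed without changing anything, since apt-get has a -s (simulate) flag. A minimal sketch, assuming the simulation marks removals with lines that begin with Remv or Purg followed by the package name (worth confirming against the output on your own system); simulated_removals is a hypothetical helper name.

```shell
# Read `apt-get -s ...` output on stdin and print the names of the packages
# the simulation reports as removed ("Remv") or purged ("Purg").
# Assumption: each such line has the form "<action> <package> [<version>]".
simulated_removals() {
    awk '$1 == "Purg" || $1 == "Remv" { print $2 }'
}

# Possible use (simulation only, nothing is uninstalled):
#   apt-get -s purge --autoremove nvidia-driver-460 | simulated_removals
```

Running this before and after adding --autoremove would show the difference the flag makes without copying terminal output by hand.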
There's a section on pre-installation actions for Ubuntu (and other OSs)
Viacheslav suggests:
sudo ./cuda-11.2.run --silent --toolkit --toolkitpath=/usr/local/cuda-11.2
- But note that this --silent flag will suppress the interactive prompt
- If --toolkitpath is not provided it defaults to /usr/local/cuda-11.2, so this is unnecessary to provide
- For all options see the advanced options section of the installation guide
I was going to run:
sudo ./cuda-11.2.run --toolkit
(Read the rest of this section before going ahead!)
However since I just uninstalled my NVIDIA driver package, I'm going to want to get that back, so forget about the --toolkit flag now and install all parts:
- Driver
- 460.27.04
- CUDA Toolkit 11.2
- CUDA Samples 11.2
- CUDA Demo Suite 11.2
- CUDA Documentation 11.2
So I simply run sudo ./cuda-11.2.run and accept the EULA.
- See also: CUDA release notes
To install multiple versions of CUDA, it's advised in the article above that
IMPORTANT: cuda installer creates a link /usr/local/cuda to the installation folder. Therefore, it’s important either to remove the link, or to modify it to point to the CUDA that you want to use by default.
I.e. it will point the default /usr/local/cuda symlink at the newly installed CUDA. If this is your first CUDA install, don't worry about it; if you're installing an older version, you might want to symlink it back to the newer one after installation.
However if you want to avoid this, you don't need to modify it: under Toolkit Options in the installer you can opt out of this symlink (which is why --silent is not necessarily the easy choice).
- Access the Toolkit Options either by going to Options > Toolkit Options or by pressing a with the cursor over [X] [CUDA Toolkit 11.2]
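If the link does get created and needs pointing back at another version later, it can be replaced in one step. A minimal sketch using ln; point_default_cuda is a hypothetical helper name, and the paths in the usage comment are the ones mentioned in these notes.

```shell
# Point the symlink $2 at directory $1, replacing any existing link.
# -s: symbolic link, -f: replace an existing link,
# -n: treat an existing link to a directory as a plain file, so the new
#     link replaces it instead of being created inside the old target.
point_default_cuda() {
    ln -sfn "$1" "$2"
}

# Possible use (needs root for /usr/local):
#   sudo ln -sfn /usr/local/cuda-11.2 /usr/local/cuda
```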
Hit Install to exit the wizard back onto the command line
...and "Installation failed. See log at /var/log/cuda-installer.log for details."
This log shows that
[INFO]: Driver not installed.
[INFO]: Checking compiler version...
[INFO]: gcc location: /usr/bin/gcc
[INFO]: gcc version: gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)
[INFO]: Initializing menu
[INFO]: Setup complete
[INFO]: Components to install:
[INFO]: Driver
[INFO]: 460.27.04
[INFO]: Executing NVIDIA-Linux-x86_64-460.27.04.run --ui=none --no-questions --accept-license --disable-nouveau --no-cc-version-check --install-libglvnd 2>&1
[INFO]: Finished with code: 256
[ERROR]: Install of driver component failed.
[ERROR]: Install of 460.27.04 failed, quitting
...after all that fuss demanding I uninstall the driver, it then fails to install the driver...
The CUDA installer attempted to install the 418.87.00 driver and the driver installation failed. To find out why the driver installation failed, you’ll need to check the driver installer log.
That log would typically be at:
/var/log/nvidia-installer.log
- (via)
This shows:
-> The file '/tmp/.X0-lock' exists and appears to contain the process ID '1314' of a running X server.
ERROR: You appear to be running an X server; please exit X before installing. For further details, please see the section INSTALLING THE NVIDIA DRIVER in the README available on the Linux driver download page at www.nvidia.com.
ERROR: Installation has failed. Please see the file '/var/log/nvidia-installer.log' for details. You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.
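Both logs are long, so it can help to pull out just the error lines. A minimal sketch, assuming errors start a line as either "[ERROR]:" (as in the cuda-installer.log excerpt) or "ERROR:" (as in the nvidia-installer.log excerpt); installer_errors is a hypothetical helper name.

```shell
# Print only the error lines of the installer log given as $1.
# The pattern accepts an optional pair of square brackets around ERROR.
installer_errors() {
    grep -E '^\[?ERROR\]?:' "$1"
}

# Possible use:
#   installer_errors /var/log/cuda-installer.log
#   installer_errors /var/log/nvidia-installer.log
```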
In Mint 20.1 the X server is managed by the LightDM display manager (stock Ubuntu 20.04 defaults to gdm3 instead, so the service name may differ there). To stop it, run
sudo service lightdm stop
and later sudo service lightdm start will restart it.
This will kill your X server (including any tmux server running on it), leaving you with a black screen, so save anything you have open and get ready to SSH into the machine (or figure out another way to get a terminal; I prefer to just SSH in from another machine).
Then cd back into the directory where you put your .run installer file and re-run it.
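Before re-running, it may be worth confirming that no X lock file is left, since that is what the driver installer complained about. A minimal sketch, assuming lock files follow the .X<display>-lock naming seen in the log (/tmp/.X0-lock); x_lock_files is a hypothetical helper name. A leftover file does not prove X is still running, only that the installer may object again.

```shell
# List X server lock files in directory $1 (default /tmp).
# Prints nothing when no lock file is present.
x_lock_files() {
    find "${1:-/tmp}" -maxdepth 1 -name '.X*-lock' 2>/dev/null
}

# Possible use over SSH after stopping the display manager:
#   [ -z "$(x_lock_files)" ] && echo "no X lock files found"
```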
This time I got
-> An alternate method of installing the NVIDIA driver was detected. (This is usually a package
provided by your distributor.) A driver installed via that method may integrate better with your
system than a driver installed by nvidia-installer.
Please review the message provided by the maintainer of this alternate installation method and
decide how to proceed:
The NVIDIA driver provided by Ubuntu can be installed by launching the "Software & Updates"
application, and by selecting the NVIDIA driver from the "Additional Drivers" tab.
(Answer: Continue installation)
-> For some distributions, Nouveau can be disabled by adding a file in the modprobe configuration
directory. Would you like nvidia-installer to attempt to create this modprobe file for you?
(Answer: Yes)
-> One or more modprobe configuration files to disable Nouveau have been written. For some
distributions, this may be sufficient to disable Nouveau; other distributions may require
modification of the initial ramdisk. Please reboot your system and attempt NVIDIA driver
installation again. Note if you later wish to re-enable Nouveau, you will need to delete these
files: /usr/lib/modprobe.d/nvidia-installer-disable-nouveau.conf,
/etc/modprobe.d/nvidia-installer-disable-nouveau.conf
-> Would you like to register the kernel module sources with DKMS? This will allow DKMS to
automatically build a new module, if you install a different kernel later. (Answer: No)
ERROR: You do not appear to have libc header files installed on your system. Please install your
distribution's libc development package.
ERROR: Installation has failed. Please see the file '/var/log/nvidia-installer.log' for details.
You may find suggestions on fixing installation problems in the README available on the Linux driver
download page at www.nvidia.com.
Annoyingly, although it complained that I should remove the drivers, it now tells me 'helpfully' that I can get the NVIDIA drivers from the package manager! It adds that
A driver installed via that method may integrate better with your system than a driver installed by nvidia-installer.
Thanks for that!
Bearing this in mind, I would now prefer to go back and ignore the warning about there being a driver already installed (as this is conflicting advice with the suggestion to install the driver via the system package manager).
At this point you can either restart the system or restart the X server. The installer log mentioned
Please reboot your system and attempt NVIDIA driver installation again.
but first I want to just restart the X server
sudo service lightdm start
At this point the display switches back on (from a black screen) but with low quality resolution. Immediately upon login you get the popup mentioned in Linux Mint 20.1 NVIDIA graphics card driver setup to “Check your video drivers”.
Since it was advised to shut down the machine (above) I ignored this popup and tried to shut down by clicking the power icon but for some reason it just logged out, and then clicking it again brought up a "quit" dialog box but with no button to actually shut down (?).
Instead I just ran shutdown "now" over SSH. Initially it looked like nothing happened (the SSH session closed) but then after a brief delay the machine powered down and I could boot it up again.
Upon logging back in (still with the low resolution display due to the uninstalled "screen-resolution-extra" package), the popup re-appeared, so I repeated the installation as described at the link and restarted the machine.
This time the screen resolution was back to normal, nvidia-smi was back, and checking for nvidia packages showed libnvidia-compute-460 had returned:
apt list --installed | grep nvidia | cut -d "/" -f 1 | tr "\n" " "
⇣
libnvidia-cfg1-460 libnvidia-common-460 libnvidia-compute-460 libnvidia-compute-460
libnvidia-decode-460 libnvidia-decode-460 libnvidia-encode-460 libnvidia-encode-460
libnvidia-extra-460 libnvidia-fbc1-460 libnvidia-fbc1-460 libnvidia-gl-460 libnvidia-gl-460
libnvidia-ifr1-460 libnvidia-ifr1-460 nvidia-compute-utils-460 nvidia-dkms-460 nvidia-driver-460
nvidia-kernel-common-460 nvidia-kernel-source-460 nvidia-prime-applet nvidia-prime nvidia-settings
nvidia-utils-460 xserver-xorg-video-nvidia-460
So the only thing left to do is to re-run the installer and this time just un-check the driver (but get all the samples etc.)
- I would suggest just deselecting it with a flag, but I'm not sure which flag selects the CUDA demo suite and documentation (and I don't see a good reason to exclude them); only flags for --toolkit and --samples are listed.
- At a guess, the docs might be included in the toolkit (but they're a separate bullet point in the installer wizard so I don't know for sure).
- The --no-man-page flag is likely the same as excluding "CUDA Documentation 11.2" (so just --toolkit --samples will include them), but I'm unclear what controls inclusion of the demo suite...
Edit: lastly, the samples require some 3rd party libraries:
sudo apt install g++ freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libglu1-mesa libglu1-mesa-dev
To cut to the chase here, you end up clicking install on this config:
│ CUDA Installer
│ - [ ] Driver
│ [ ] 460.27.04
│ + [X] CUDA Toolkit 11.2
│ [X] CUDA Samples 11.2
│ [X] CUDA Demo Suite 11.2
│ [X] CUDA Documentation 11.2
This will run on a single core for a while and then
===========
= Summary =
===========
Driver: Not Selected
Toolkit: Installed in /usr/local/cuda-11.2/
Samples: Installed in /home/louis/, but missing recommended libraries
Please make sure that
- PATH includes /usr/local/cuda-11.2/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-11.2/lib64, or, add /usr/local/cuda-11.2/lib64 to
/etc/ld.so.conf and run ldconfig as root
To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-11.2/bin
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of
version at least 460.00 is required for CUDA 11.2 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller>
with the name of this run file:
sudo <CudaInstaller>.run --silent --driver
Logfile is /var/log/cuda-installer.log
- Annoyingly samples were "missing recommended libraries"; I've gone back and added these as a note above. The samples have gone into ~/NVIDIA_CUDA-11.2_Samples
- I have to update my PATH to include /usr/local/cuda-11.2/bin and LD_LIBRARY_PATH to include /usr/local/cuda-11.2/lib64
export PATH="/usr/local/cuda-11.2/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-11.2/lib64:$LD_LIBRARY_PATH"
- Note that the NVIDIA guide suggests something a little different (the ${VAR:+:${VAR}} form is a conditional expansion rather than a substring: it appends the old value, colon included, only when the variable is already set and non-empty). I'm going to avoid this and do it the standard way above as it's simpler, more consistent with my .bashrc and more readable
export PATH=/usr/local/cuda-11.2/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.2/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
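NVIDIA's ${PATH:+:${PATH}} form can be tried in isolation to see why it exists. With the simpler form, an unset LD_LIBRARY_PATH ends up as "/usr/local/cuda-11.2/lib64:" and the empty entry after the trailing colon is treated as the current directory. A minimal sketch; prepend_path is a hypothetical helper name.

```shell
# Print directory $1 prepended to the colon-separated list $2.
# ${2:+:$2} expands to ":<list>" only when $2 is set and non-empty,
# so an empty list does not leave a trailing colon behind.
prepend_path() {
    printf '%s\n' "$1${2:+:$2}"
}

# Possible use:
#   export LD_LIBRARY_PATH="$(prepend_path /usr/local/cuda-11.2/lib64 "${LD_LIBRARY_PATH:-}")"
```

On a desktop where PATH is always set the two forms give the same result for PATH; the difference only shows for variables such as LD_LIBRARY_PATH that are often empty.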
To check your installed driver version, run:
nvidia-smi --query-gpu=driver_version --format=csv,noheader
⇣
460.32.03
- See man nvidia-smi for more options and here for some examples of useful queries
If you've installed a package like PyTorch for Python, you can now run (in a Python session):
import torch
torch.cuda.is_available()
⇣
True