Google Cloud VM Setup checkpoint - eecse4750/e4750_2024Fall_students_repo GitHub Wiki
The GCP platform provides several services for remote compute, storage and networking, with a particular focus on machine learning and big data. For this course, you will be using the Compute Engine service to create Virtual Machine (VM) instances, and will be executing all your assignments and projects on these VM instances.
Through your existing Google Account or CUID, you will be allowed $300 of free credit (assuming you have not used Google Cloud services before). Billing on GCP is nominal if you are responsible; if you forget to stop your instances, they can run for hours (or days), exhaust your credits, and leave you with charges.
In case you run out of the free credits, the course can accommodate student requests for additional Cloud credits through coupons.
A video tutorial for the GCP setup (up to the firewall setup) is linked here in the 2024Fall Student folder of the course gdrive.
First, you must make sure you have the Google Cloud SDK installed on your system. For Debian based distros (like Ubuntu), follow this tutorial. If you are trying this on Windows, follow this tutorial. Mac users can follow this quickstart guide.
Once you have successfully installed and setup the Google Cloud SDK and initialized the desired settings and project, proceed as outlined below.
Log in to Google Cloud using your columbia.edu account. Navigate to your console and activate your free trial. This option only appears if you are logging in to the console for the first time.
Once you've done so, create a new project from the console home page. If you are using the course coupons, make sure you choose the billing account associated with E4750 as the billing account for this project. Then, navigate to the side pane and look for Compute Engine. Click it to enable the API for your new project.
This will enable the Compute Engine API for your account. It might take a few minutes for this change to take effect. You can keep track of this from the notification bell on the top right of the screen.
Next, navigate to IAM & Admin > Quotas.
Search for GPU quotas and click on Compute Engine API: GPUs (all regions) in the quota list. You can use the filters shown in the figure below to identify the right option.
Then, click Edit Quota, which appears when you click the three dots to the right of GPUs (all regions). You will be prompted to enter a new value for the quota. Enter 1 and add the description shown below. The quota increase request is usually processed within a few minutes (unless a value much greater than 1 is requested).
You will next be prompted to enter your contact information. Since you are using your LionMail ID to set up GCP, enter the same in the email section.
You created a project in the earlier stage. Ensure that it is attached to the right billing account.
An example of how the associated billing account should look is shown below.
Go to your Google Cloud Console and look for the Compute Engine in the side pane, and click on VM instances as shown.
Click on Create Instance on the banner.
- You may choose any name you want; an example is shown.
- Select a region (e.g. us-east1) and a zone (e.g. us-east1-c). Note that this stage might prove non-trivial, as finding a region with availability of the right compute resources (an NVIDIA T4 GPU) can take a few tries.
- For uniformity in assignments, make sure you choose the same processor and GPU configurations. You may choose other options if you are using this tutorial to create a VM for a personal project, but keep in mind that the running costs will rise with some configurations. Keep an eye on the estimate on the far right of the page.
Choose the boot image as shown. The Ubuntu 22.04 LTS x86 image is used.
Ensure that you enable the two checkboxes for network access:
- Allow HTTP traffic
- Allow HTTPS traffic
Click Create. Bear in mind, as soon as GCP is done creating your image, it will boot and start running. If you do not plan to interact with it after reaching this stage, make sure you stop the instance, or you will continue to be billed for the resources it uses.
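If you have the Cloud SDK set up locally, you can also check on and stop instances from your own terminal, which makes it harder to forget a running VM. A sketch, assuming an authenticated SDK; the instance name and zone are placeholders for your own:

```shell
# List all VMs in the project and their status (RUNNING / TERMINATED)
gcloud compute instances list

# Stop the VM when you are done working (stops billing for compute)
gcloud compute instances stop my-vm --zone=us-east1-c

# Start it again for the next session
gcloud compute instances start my-vm --zone=us-east1-c
```

These commands are not a substitute for checking the console, but they make a quick "did I leave anything running?" check easy.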
To use environments like Jupyter, you need to add firewall rules that allow access to services running on certain ports.
- In GCP, go to 'VPC network -> Firewall'
- Create a firewall rule with any name you want, with the configuration below
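The same rule can be created from the Cloud SDK instead of the console. A sketch; the rule name and port are examples (match the port to the one you later put in your Jupyter config):

```shell
# Allow inbound TCP traffic on port 9999 from anywhere.
# "allow-jupyter" is just an example rule name.
gcloud compute firewall-rules create allow-jupyter \
    --direction=INGRESS \
    --allow=tcp:9999 \
    --source-ranges=0.0.0.0/0
```

Opening a port to 0.0.0.0/0 exposes it to the whole internet, so make sure Jupyter is password-protected (as set up later in this guide).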
Sometimes you will finish setting up a VM in a specific region, but when you later need to access it, no resources are available in that region. This forces you to set up another VM in a different region. Repeating the installation steps, while straightforward, consumes a bit of your time. This can be easily avoided if you create snapshots of your boot disk.
The instructions to do so are available here: https://cloud.google.com/compute/docs/disks/create-snapshots
While creating a new VM, you can then select this snapshot instead of the standard Ubuntu image. To do this, go to the Snapshots tab, next to the Public images tab where the Ubuntu image was selected earlier. Once you select this snapshot as the source of your boot disk, you won't need to install dependencies again: your Jupyter setup (as shown below) and GitHub setup will all be included by default, since you are creating a copy of your complete VM boot disk.
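The snapshot can also be taken from the Cloud SDK. A sketch, assuming the boot disk kept the default name (the same as the instance); disk, zone, and snapshot names are placeholders:

```shell
# Snapshot the boot disk of "my-vm" in zone us-east1-c.
# By default the boot disk has the same name as the instance.
gcloud compute disks snapshot my-vm \
    --zone=us-east1-c \
    --snapshot-names=my-vm-snapshot
```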
Once the VM has been created and is running, you can move to this stage. Click the SSH button next to the running VM on the instances page. This opens a terminal that lets you interact with the Ubuntu virtual machine you created. Feel free to experiment with the terminal using any Linux commands you know.
The CUDA toolkit, including the NVIDIA CUDA driver and profiling tools, can be installed by following these steps:
#1. (Build tools - prerequisites for installing the CUDA toolkit)
sudo apt update
sudo apt install build-essential
#2. (This will download the toolkit installer run file)
wget https://developer.download.nvidia.com/compute/cuda/12.6.1/local_installers/cuda_12.6.1_560.35.03_linux.run
#3. (Run the installer)
sudo sh cuda_12.6.1_560.35.03_linux.run
# This might fail initially (mostly will not), but should work after a few trials (rerun the installer).
#4. (Verify Installation)
# Run this command to check if the driver has been installed correctly. If there is an error, repeat the installation (rerun the installer).
nvidia-smi
#5. (Add cuda binaries to PATH - edit ~/.bashrc)
# This step lets the shell locate the tools installed with the CUDA toolkit when they are invoked from the terminal
# To open the bashrc file, run this command
nano ~/.bashrc
# The nano editor will open in the terminal
# Add the following lines to the end of the bashrc file
export PATH=$PATH:/usr/local/cuda-12.6/bin
export LD_LIBRARY_PATH=/usr/local/cuda-12.6/targets/x86_64-linux/lib/:$LD_LIBRARY_PATH
# save the file and exit
# Now run this command to execute the exports and add the new paths to the environment
source ~/.bashrc
#6. (Run the following to check if the PATH has been successfully updated):
nsys --version
ncu --version
nvcc --version
# Each of these should print a version string. If any of them errors out, the installation went wrong - repeat it.
A common error after a successful toolkit installation is that the profilers are denied access to the GPU performance counters. Perform the steps below to resolve it.
# In the folder /etc/modprobe.d - create a file <name>.conf
# Add the following line to the file - use nano editor as above
options nvidia NVreg_RestrictProfilingToAdminUsers=0
# save the file and close it
# run the following commands
sudo update-initramfs -u
sudo reboot # reboot system
Test ncu with the following:
ncu --target-processes all -o your_report python your_script.py
If this runs successfully, you are all set to use the CUDA toolkit.
NOTE: Once you set up the virtual environment and install dependencies in it, you can only work inside that virtual environment.
For example, if your virtual environment is called "cuda_cl", go to the directory where it was created and run
# FOR REFERENCE - DO NOT RUN
#############################
$source cuda_cl/bin/activate#
#############################
# CONTINUE TO RUN FROM NOW
#1. (Update system)
$sudo apt update && sudo apt upgrade -y
#2. Install pre-requisites
$sudo apt install python3-dev python3-pip python3-venv
#3. Create venv for development and activate
$python3 -m venv cuda_cl
$source cuda_cl/bin/activate
#(cuda_cl is a name given to the virtual environment. This can be replaced by any other name that you prefer)
# Now, your terminal prompt will look like this:
(cuda_cl)$
# This means the cuda_cl environment is active
# To deactivate the venv, you can run:
# DO NOT EXECUTE THIS AT THIS STEP - PROVIDED JUST FOR REFERENCE
(cuda_cl)$deactivate
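To see what activation actually does, here is a small self-contained demo (it uses a hypothetical throwaway venv in /tmp, so it is safe to run on any machine with python3): activating swaps which Python interpreter and site-packages are used.

```shell
# Create a throwaway venv and show that activation changes sys.prefix.
python3 -m venv /tmp/demo_venv
. /tmp/demo_venv/bin/activate
python -c 'import sys; print(sys.prefix)'   # prints /tmp/demo_venv
deactivate
```

After `deactivate`, `python3 -c 'import sys; print(sys.prefix)'` reports the system prefix again, which is why packages installed inside the venv are invisible outside it.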
# The following installations will be done within the venv
#4. (Install prerequisites)
(cuda_cl)$pip install numpy scipy pytest six matplotlib wheel mako Pillow pybind11 mock pytools
#5. (Install pycuda and pyopencl)
(cuda_cl)$pip install pycuda
(cuda_cl)$pip install pyopencl
(cuda_cl)$pip install siphash24
#6. (Verify Installation - OPTIONAL)
#Run the following code to query the device using pycuda and pyopencl
# create a file called install_check.py, add the following code, run it to test
# you can run
(cuda_cl)$touch install_check.py
(cuda_cl)$nano install_check.py
Copy the below code into the editor
######################################
# install_check.py
import pycuda.driver as cuda
import pycuda.autoinit
import pyopencl as cl
# Check OpenCL devices
platforms = cl.get_platforms()
for platform in platforms:
    devices = platform.get_devices()
    for device in devices:
        print(device.name)
# Check CUDA device
print(cuda.Device(0).name())
######################################
Once you are done adding these lines, press ctrl+s followed by ctrl+x (or ctrl+x and follow the instructions to save the file). Exit nano. Then run the Python script using
(cuda_cl)$python install_check.py
If it runs without import errors, the installation was successful. This can be followed by additional setup of JupyterLab/Notebook for developing Python code.
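Beyond importing the modules, a slightly stronger sanity check is to actually launch a kernel. The following is a minimal PyCUDA sketch (a vector add); it only runs on the GPU VM itself, not on a machine without an NVIDIA GPU and the toolkit, and the file name is just a suggestion.

```python
# vector_add.py - minimal PyCUDA sanity test (requires an NVIDIA GPU)
import numpy as np
import pycuda.autoinit          # creates the CUDA context
import pycuda.driver as cuda
from pycuda.compiler import SourceModule

# Compile a trivial elementwise-add kernel
mod = SourceModule("""
__global__ void add(float *a, float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}
""")
add = mod.get_function("add")

n = 1024
a = np.random.rand(n).astype(np.float32)
b = np.random.rand(n).astype(np.float32)
c = np.empty_like(a)

# cuda.In/cuda.Out handle the host<->device copies for us
add(cuda.In(a), cuda.In(b), cuda.Out(c), np.int32(n),
    block=(256, 1, 1), grid=(n // 256, 1))

assert np.allclose(c, a + b)
print("GPU result matches NumPy")
```

If the assertion passes, the driver, toolkit, and PyCUDA are all working together.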
# 7. SETUP Jupyter Environment inside the created venv
(cuda_cl)$pip install jupyterlab
(cuda_cl)$pip install notebook
# config jupyter notebook - will generate a file at <base root>/.jupyter/jupyter_notebook_config.py
(cuda_cl)$jupyter notebook --generate-config
# This will generate a config file and print its location. Open the file using nano
(cuda_cl)$nano <jupyter_config_file_path> # printed by the generate-config command above
# Add these lines to the file, below the first c.NotebookApp statement
#####################################################################
c.NotebookApp.ip = "*"
c.NotebookApp.open_browser = False
c.NotebookApp.port = 9999 # or another port, e.g. 8080
#####################################################################
Once you are done adding these lines, press ctrl+s followed by ctrl+x (or ctrl+x and follow instructions to save file)
# Then run this command to set up a password
(cuda_cl)$jupyter notebook password #<add custom password or press enter>
# You will be prompted to enter a password; you may press enter for an empty password
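Instead of exposing the Jupyter port through the firewall, you can also reach the server over an SSH tunnel. A sketch; the instance name and zone are placeholders, and the port should match the one in your Jupyter config:

```shell
# Forward local port 9999 to port 9999 on the VM, then browse to
# http://localhost:9999 on your own machine while this stays open.
gcloud compute ssh my-vm --zone=us-east1-c -- -L 9999:localhost:9999
```

The tunnel keeps the traffic on an authenticated SSH connection, so no firewall rule for the Jupyter port is needed in this setup.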
# Additional JupyterLab extension for profiling with NVIDIA tools
(cuda_cl)$pip install jupyterlab-nvidia-nsight
The setup above works well most of the time. Unfortunately, people have been facing issues with the NVIDIA driver crashing out of the blue, and a simple reinstall often runs into driver issues of its own. While resolutions exist for these problems, they eat up a lot of precious time that one would rather spend on solving parallel computing problems and getting better at PyCUDA/PyOpenCL.
It is recommended that everyone set up an Ubuntu VM from scratch as the guide instructs, but if driver issues are encountered, you can switch to a public image.
While the installation performed earlier is for CUDA toolkit 12.6, public images for "Deep Learning on Linux" seem to come with 12.3 at best. This is acceptable. You can use the "Deep Learning VM with CUDA 12.3 M125" image.
This image comes with the CUDA toolkit installed by default. The only required setup is the final part of Step 2 ("Give profilers GPU access"), followed by Step 3.