Google Colab Usage
Google Colab is free for individual developers and provides an excellent platform to practice GPU and CUDA programming. It offers free access to NVIDIA GPUs, and you can use CUDA from Python through libraries such as CuPy and TensorFlow.
Steps to Use Google Colab for GPU and CUDA Programming:
1. **Access Google Colab**:
   - Go to [Google Colab](https://colab.research.google.com/).
2. **Create a New Notebook**:
   - Click "New Notebook" to create a new Jupyter notebook in Google Colab.
3. **Enable GPU**:
   - Navigate to **Runtime** > **Change runtime type**.
   - Select **GPU** from the hardware accelerator dropdown menu.
   - Click **Save**.
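   Once the runtime has switched, you can check which GPU Colab assigned to your session. A quick, optional check (the GPU model varies between sessions) is to run the NVIDIA driver tool from a code cell:

   ```python
   # Show the GPU assigned to this Colab session (model, driver, memory usage).
   # The "!" prefix runs a shell command from a notebook cell.
   !nvidia-smi
   ```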
4. **Verify GPU Availability**:
   - In a new code cell, run the following code to verify that a GPU is available:

   ```python
   import tensorflow as tf

   # List the physical GPU devices visible to TensorFlow
   print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
   ```
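   Listing devices only shows that TensorFlow can see the GPU. As an optional extra check (not part of the original steps), you can pin a small matrix multiplication to the GPU and inspect where the result was placed:

   ```python
   import tensorflow as tf

   # Request placement of a small matrix multiplication on the first GPU,
   # then print the device the result actually lives on.
   with tf.device('/GPU:0'):
       a = tf.random.uniform((1024, 1024))
       b = tf.random.uniform((1024, 1024))
       c = tf.matmul(a, b)

   print(c.device)                 # should mention GPU:0 when a GPU is active
   print(float(tf.reduce_sum(c)))  # evaluate and print a scalar result
   ```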
5. **Install CuPy**:
   - To use CuPy, you need to install it. Run the following command in a new code cell:

   ```python
   !pip install cupy-cuda11x
   ```

   Replace `11x` with the appropriate version, matching the CUDA version available in Colab. As of now, you might use `cupy-cuda11x`, which is compatible with the provided CUDA runtime.
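   Before picking a wheel, it can help to check which CUDA toolkit the Colab runtime actually provides, and afterwards to confirm that CuPy can see the GPU. A minimal sketch, assuming the runtime exposes nvcc and at least one GPU:

   ```python
   # Check the CUDA toolkit version shipped with the Colab runtime.
   !nvcc --version

   import cupy as cp

   # Confirm CuPy imports cleanly and can talk to the CUDA runtime and GPU.
   print("CuPy version:", cp.__version__)
   print("CUDA runtime version:", cp.cuda.runtime.runtimeGetVersion())
   print("GPU count:", cp.cuda.runtime.getDeviceCount())
   ```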
6. **Run CUDA Code with CuPy**:
   - Here is an example of using CuPy to perform computations on the GPU:

   ```python
   import cupy as cp

   # Create a random array on the GPU using CuPy
   x = cp.random.rand(10000, 10000)

   # Perform a simple operation (matrix multiplication) on the GPU
   y = cp.dot(x, x.T)
   print(y)
   ```
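   To see what the GPU buys you, you can time the same operation in NumPy (CPU) and CuPy (GPU). This comparison is only an illustrative sketch; absolute timings depend on the GPU Colab assigns, and the explicit synchronization is needed because CuPy launches kernels asynchronously:

   ```python
   import time
   import numpy as np
   import cupy as cp

   n = 4000

   # CPU: NumPy matrix multiplication
   a_cpu = np.random.rand(n, n).astype(np.float32)
   t0 = time.time()
   np.dot(a_cpu, a_cpu.T)
   print("NumPy (CPU):", time.time() - t0, "s")

   # GPU: CuPy matrix multiplication
   a_gpu = cp.random.rand(n, n, dtype=cp.float32)
   cp.cuda.Stream.null.synchronize()   # finish setup work before timing
   t0 = time.time()
   cp.dot(a_gpu, a_gpu.T)
   cp.cuda.Stream.null.synchronize()   # wait for the GPU kernel to complete
   print("CuPy (GPU): ", time.time() - t0, "s")
   ```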
**Example of Running CUDA Kernels:**

If you want to write custom CUDA kernels, you can do that with CuPy as well:
```python
import cupy as cp

# Define a CUDA kernel that adds two vectors element-wise
kernel = cp.RawKernel(r'''
extern "C" __global__
void add(float* x1, float* x2, float* y, int n) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) {
        y[tid] = x1[tid] + x2[tid];
    }
}
''', 'add')

# Initialize input and output arrays on the GPU
n = 1024
x1 = cp.random.rand(n).astype(cp.float32)
x2 = cp.random.rand(n).astype(cp.float32)
y = cp.zeros_like(x1)

# Launch the kernel with enough blocks to cover all n elements;
# the scalar argument is passed as a 32-bit int to match the kernel signature
threads_per_block = 128
blocks_per_grid = (n + threads_per_block - 1) // threads_per_block
kernel((blocks_per_grid,), (threads_per_block,), (x1, x2, y, cp.int32(n)))

# Verify the result
print(y[:10])
```
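If you prefer not to write raw CUDA C, CuPy also provides cupy.ElementwiseKernel, which generates the indexing and launch boilerplate for you. A minimal sketch of the same vector addition, with a check against CuPy's built-in operator:

```python
import cupy as cp

# Element-wise addition kernel; CuPy generates the launch/indexing code.
add_kernel = cp.ElementwiseKernel(
    'float32 x1, float32 x2',   # input parameters
    'float32 y',                # output parameter
    'y = x1 + x2',              # per-element operation (CUDA C expression)
    'my_add')

n = 1024
x1 = cp.random.rand(n).astype(cp.float32)
x2 = cp.random.rand(n).astype(cp.float32)
y = add_kernel(x1, x2)

# Sanity check against CuPy's built-in addition
print(bool(cp.allclose(y, x1 + x2)))
```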
By using Google Colab, you can practice and experiment with CUDA programming without having to install CUDA and related software on your local machine. This is particularly useful if your own laptop doesn't have a powerful GPU.