Google Colab usage - unix1998/technical_notes GitHub Wiki

Google Colab offers a free tier that individual developers can use to practice GPU and CUDA programming. Free sessions can be assigned an NVIDIA GPU, although availability is not guaranteed and sessions are time-limited. You can use CUDA from Python through libraries such as CuPy and TensorFlow.

Steps to Use Google Colab for GPU and CUDA Programming:

  1. Access Google Colab:

    • Open https://colab.research.google.com and sign in with your Google account.
  2. Create a New Notebook:

    • Click on "New Notebook" to create a new Jupyter notebook in Google Colab.
  3. Enable GPU:

    • Navigate to Runtime > Change runtime type.
    • Select GPU from the hardware accelerator dropdown menu.
    • Click Save.
  4. Verify GPU Availability:

    • In a new code cell, run the following code to verify that a GPU is available. A count of 0 means the runtime has no GPU attached:
      import tensorflow as tf
      print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
      
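  5. Install CuPy:

    • CuPy is published as a separate package for each CUDA major version. Colab GPU runtimes may already have CuPy preinstalled, so try import cupy first. If it is missing, run the following command in a new code cell:
      !pip install cupy-cuda11x
      
      Replace cupy-cuda11x with the package that matches the CUDA version in your Colab runtime: cupy-cuda11x for CUDA 11.x, cupy-cuda12x for CUDA 12.x. You can check the installed CUDA version by running !nvcc --version in a code cell.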
  6. Run CUDA Code with CuPy:

    • Here is an example of using CuPy to perform computations on the GPU:
      import cupy as cp
      
      # Create a random 10000 x 10000 float64 array on the GPU (about 800 MB)
      x = cp.random.rand(10000, 10000)
      
      # Perform a simple operation on the GPU
      y = cp.dot(x, x.T)
      
      print(y)
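
The GPU and CUDA version checks from steps 4 and 5 can also be scripted without TensorFlow. The following is a minimal sketch, not an official Colab or CuPy utility. It assumes that `nvidia-smi` and `nvcc` are on the PATH of a GPU runtime, and the helper names (`list_gpus`, `parse_cuda_release`, `detect_cuda_release`, `cupy_wheel_for`) are hypothetical. It only considers the CUDA 11.x and 12.x package families.

```python
import re
import shutil
import subprocess
from typing import List, Optional


def _run(cmd: List[str]) -> Optional[str]:
    """Run a command and return its stdout, or None if it is missing or fails."""
    if shutil.which(cmd[0]) is None:
        return None
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return None
    return result.stdout if result.returncode == 0 else None


def list_gpus() -> List[str]:
    """Return the 'GPU N: ...' lines printed by `nvidia-smi -L` (empty if no GPU)."""
    output = _run(["nvidia-smi", "-L"]) or ""
    return [line for line in output.splitlines() if line.startswith("GPU ")]


def parse_cuda_release(nvcc_output: str) -> Optional[str]:
    """Extract e.g. '11.8' from the 'release 11.8, V11.8.89' line of nvcc output."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def detect_cuda_release() -> Optional[str]:
    """Return the CUDA toolkit release reported by nvcc, or None if unavailable."""
    output = _run(["nvcc", "--version"])
    return parse_cuda_release(output) if output else None


def cupy_wheel_for(cuda_release: str) -> Optional[str]:
    """Map a CUDA release such as '12.2' to a CuPy package name."""
    major = cuda_release.split(".")[0]
    # Assumption: only the CUDA 11.x and 12.x package families are handled here.
    return {"11": "cupy-cuda11x", "12": "cupy-cuda12x"}.get(major)


gpus = list_gpus()
print(f"GPUs found: {len(gpus)}")
release = detect_cuda_release()
if release is None:
    print("nvcc not found on this machine")
else:
    print(f"CUDA {release}: suggested package is {cupy_wheel_for(release)}")
```

On a machine without a GPU the script reports zero GPUs instead of raising an error, so it is safe to run before switching the runtime type.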
      

Example of Running CUDA Kernels:

If you want to write custom CUDA kernels, you can do that with CuPy's RawKernel as well:

import cupy as cp

# Define a CUDA kernel
kernel = cp.RawKernel(r'''
extern "C" __global__
void add(float* x1, float* x2, float* y, int n) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) {
        y[tid] = x1[tid] + x2[tid];
    }
}
''', 'add')

# Initialize data
n = 1024
x1 = cp.random.rand(n).astype(cp.float32)
x2 = cp.random.rand(n).astype(cp.float32)
y = cp.zeros_like(x1)

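# Launch the kernel
threads_per_block = 128
blocks_per_grid = (n + threads_per_block - 1) // threads_per_block  # ceiling division
# Pass n as a 32-bit integer to match the kernel's `int n` parameter;
# a plain Python int would be passed as a 64-bit value.
kernel((blocks_per_grid,), (threads_per_block,), (x1, x2, y, cp.int32(n)))

# Verify the result against CuPy's built-in addition
assert cp.allclose(y, x1 + x2)
print(y[:10])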
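
To see how the grid size is chosen and why the kernel needs the `tid < n` guard, the indexing can be emulated on the CPU. This is an illustrative sketch in plain Python with hypothetical helper names (`blocks_for`, `emulate_add`). It runs sequentially, so it shows the index arithmetic only and says nothing about GPU performance.

```python
def blocks_for(n: int, threads_per_block: int) -> int:
    """Ceiling division: the smallest block count whose threads cover n elements."""
    return (n + threads_per_block - 1) // threads_per_block


def emulate_add(x1, x2, threads_per_block=128):
    """Emulate the `add` kernel's indexing, one loop iteration per GPU thread."""
    n = len(x1)
    y = [0.0] * n
    idle = 0  # threads that fail the `tid < n` guard and do no work
    for block_idx in range(blocks_for(n, threads_per_block)):
        for thread_idx in range(threads_per_block):
            # Same formula as blockIdx.x * blockDim.x + threadIdx.x
            tid = block_idx * threads_per_block + thread_idx
            if tid < n:
                y[tid] = x1[tid] + x2[tid]
            else:
                idle += 1
    return y, idle


# 1000 is deliberately not a multiple of 128, so the last block is partly idle.
x1 = [float(i) for i in range(1000)]
x2 = [2.0 * i for i in range(1000)]
y, idle = emulate_add(x1, x2)
print(blocks_for(1000, 128), idle)  # 8 24
print(y[:3])                        # [0.0, 3.0, 6.0]
```

With n = 1000 the launch uses 8 blocks of 128 threads, which is 1024 threads in total. Without the guard, the 24 surplus threads would read and write past the end of the arrays.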

By using Google Colab, you can practice and experiment with CUDA programming without the need to install CUDA and related software on your local machine. This is particularly useful if you don't have a powerful GPU on your laptop.