🚀✨ HPC: Ethical AI & Future Trends on HPC Workshop! 🚀✨
🎯 Goal
📊 Learn how to develop Ethical AI models while leveraging High-Performance Computing (HPC) for large-scale training. We will work with a sample multimodal dataset, representative of massive open-source image-text data, to demonstrate how HPC resources can be effectively used for ethical AI model training and analysis. 🚀
📌 What You Will Learn 🧠💡
✅ Understanding Ethical AI and its challenges (bias, fairness, explainability) 🤖
✅ Exploring HPC as a solution for training large AI models responsibly 💻
✅ Training AI models on large-scale datasets using multi-GPU HPC clusters 🏋️
✅ Evaluating AI fairness, transparency, and accountability 📊
✅ Future trends in Ethical AI and responsible AI development 🚀
🔑 1. Key Terminologies in Ethical AI & HPC 📖
🤖 Ethical AI
- AI that ensures fairness, transparency, and accountability in decision-making.
- Addresses bias in datasets and models to prevent discrimination.
🚀 HPC for AI
- High-Performance Computing (HPC) enables AI training on massive datasets by distributing computations across GPUs and multiple nodes.
- Essential for training foundation models and large-scale NLP/CV applications.
⚖️ Bias & Fairness in AI
- Algorithmic Bias: When AI models make unfair decisions based on skewed data.
- Fairness Metrics: Statistical Parity, Equal Opportunity, Equalized Odds, Disparate Impact (two of these are sketched in code after this list).
- Debiasing Techniques: Data preprocessing, adversarial training, fairness constraints.
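To make these metrics concrete, here is a minimal sketch of two common fairness checks in NumPy. The `y_pred` and `group` arrays are hypothetical, for illustration only, and are not part of the workshop dataset:

```python
# Minimal sketch: statistical parity and disparate impact on synthetic data.
import numpy as np

y_pred = np.array([1, 0, 1, 1, 0, 1, 0, 0])                 # model predictions (1 = positive outcome)
group = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])  # sensitive attribute per sample

rate_a = y_pred[group == "A"].mean()  # positive-outcome rate for group A
rate_b = y_pred[group == "B"].mean()  # positive-outcome rate for group B

# Statistical parity difference: 0 means both groups receive positive outcomes at the same rate.
print(f"Statistical parity difference: {rate_a - rate_b:.2f}")

# Disparate impact ratio: values far below 1 (commonly < 0.8) suggest adverse impact.
print(f"Disparate impact ratio: {min(rate_a, rate_b) / max(rate_a, rate_b):.2f}")
```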
💻 Federated Learning & Privacy-Preserving AI
- Federated Learning: AI training across distributed devices while keeping data localized.
- Differential Privacy: Adding calibrated noise to model outputs or updates to protect sensitive data (see the sketch below).
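As a toy illustration of both ideas (a simplified sketch with made-up numbers, not a production implementation), the snippet below averages "local" model updates as in federated averaging (FedAvg) and adds Laplace noise in the spirit of differential privacy:

```python
# Minimal sketch: federated averaging plus Laplace noise (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

# Pretend three clients each computed a local weight vector without sharing raw data.
client_updates = [rng.normal(size=4) for _ in range(3)]

# FedAvg: the server only sees the average of the local updates.
global_update = np.mean(client_updates, axis=0)

# Differential privacy (simplified): add calibrated Laplace noise before release.
epsilon, sensitivity = 1.0, 0.1  # hypothetical privacy budget and query sensitivity
noisy_update = global_update + rng.laplace(scale=sensitivity / epsilon, size=global_update.shape)

print("Averaged update:", np.round(global_update, 3))
print("Noisy (private) update:", np.round(noisy_update, 3))
```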
🚀 Future Trends in Ethical AI
- Explainable AI (XAI): Making AI decision-making understandable (a gradient-saliency sketch follows this list).
- Regulatory Frameworks: EU AI Act, AI Ethics Guidelines from NIST, UNESCO.
- Sustainable AI: Reducing AI's carbon footprint through efficient HPC utilization.
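As one concrete XAI technique, gradient-based saliency asks how strongly each input pixel influences the model's output. This is a minimal sketch on a tiny random model, not the workshop's ViT:

```python
# Minimal sketch: gradient saliency on a toy model (illustrative only).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 2))  # toy classifier
image = torch.rand(1, 3, 8, 8, requires_grad=True)            # toy "image"

score = model(image)[0].max()  # score of the most likely class
score.backward()               # gradients of that score w.r.t. the input pixels

saliency = image.grad.abs().max(dim=1).values  # per-pixel importance map
print("Saliency map shape:", saliency.shape)   # torch.Size([1, 8, 8])
```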
🚀 2. Access HPC Terminal via JupyterHub
1️⃣ Go to CSUSB HPC if you are a learner or educator at CSUSB. Otherwise, have an educator from your school create an account for you using the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS CI), a U.S. government program that provides free access to HPC resources.
2️⃣ Click CI Logon to log in using your school account.
3️⃣ Select the GPU model that best fits your needs.
4️⃣ After logging in, you'll arrive at JupyterLab.
✅ You're ready to go!
📂 3. Hands-on: Loading a Multimodal Dataset on HPC
We will use CSUSB's High-Performance Computing (HPC) system to run our AI code. Follow these steps to access JupyterHub on HPC.
🔐 Step 1: Log In to HPC with CI Logon
Let's get you authenticated! Here's how:
1️⃣ Go to the CI Logon Portal
- Open CI Logon in your browser.
- Click Sign In with CI Logon.
2️⃣ Select Your Identity Provider & Log In
- Choose "California State University, San Bernardino" 🎓 from the dropdown.
- Check "Remember this selection" to save time next login. ✅
- Click Log in to proceed. 🚀
🖥️ Step 2: Launch Your JupyterHub Server
Your HPC Jupyter environment is ready, so let's start coding!
1️⃣ Check Out the Launcher Page
- You'll see several options like:
  - Notebook: Start a Jupyter Notebook (e.g., Python 3 🐍).
  - Console: Open a Python console for quick commands 💻.
  - Other: Access Terminal, Text File, Markdown File, or Help 📄.
2️⃣ Open a Notebook or File
- Click Notebook → Python 3 (ipykernel) to start coding! ✍️
- You can also browse and open existing files. 📂
3️⃣ Run Your Code & Save Your Work
- Type your Python code and press Shift + Enter to run.
- Save often to keep your work safe! 💾
❓ Why Use 2 GPUs and RTX A5000?
We are working with huge datasets and a large Vision Transformer, which need powerful GPUs for efficient computation; with two GPUs, PyTorch can split each batch across both devices (see the sketch below).
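For reference, here is a minimal sketch of using both GPUs from PyTorch, with a placeholder model for illustration. It assumes CUDA is available; `nn.DataParallel` is the simplest single-process option, while `DistributedDataParallel` is generally preferred for serious multi-GPU jobs:

```python
# Minimal sketch: wrapping a model so each batch is split across available GPUs.
import torch
import torch.nn as nn

model = nn.Linear(512, 10)  # placeholder model for illustration
if torch.cuda.device_count() > 1:
    print(f"Using {torch.cuda.device_count()} GPUs")
    model = nn.DataParallel(model)  # replicates the model; splits each input batch across GPUs
model = model.to("cuda" if torch.cuda.is_available() else "cpu")
```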
📊 Step 3: Load a Sample Dataset (a mock image set standing in for large multimodal data)
➕ Add a New Code Cell
1️⃣ Click + Code in Jupyter Notebook to add a new code cell.
2️⃣ First, install the required packages:
```python
!pip install torch torchvision transformers scikit-learn numpy matplotlib
```
(Note: the PyPI package is scikit-learn, not sklearn; matplotlib is used for the fairness plots in Section 5.)
📝 ChatGPT prompt for generating the code
3️⃣ Add a new code cell and copy and paste the following code:
```python
import os
import torch                                 # PyTorch for deep learning
import torchvision                           # For handling image datasets
import torchvision.transforms as transforms  # For image preprocessing
import numpy as np
from PIL import Image
from torch.utils.data import DataLoader

# Step 1: Check if a GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Step 2: Define image transformations
transform = transforms.Compose([
    transforms.Resize((224, 224)),        # Resize images to the size ViT expects
    transforms.ToTensor(),                # Convert images to tensors in [0, 1]
    transforms.Normalize((0.5,), (0.5,))  # Rescale pixel values to roughly [-1, 1]
])

# Step 3: Create a mock dataset (for learning purposes)
mock_dataset_path = "mock_dataset"
os.makedirs(f"{mock_dataset_path}/class1", exist_ok=True)
os.makedirs(f"{mock_dataset_path}/class2", exist_ok=True)

# Generate dummy images
for i in range(5):  # Small number of images for simplicity
    img1 = Image.fromarray(np.random.randint(0, 255, (100, 100, 3), dtype=np.uint8))
    img1.save(f"{mock_dataset_path}/class1/img{i}.jpg")
    img2 = Image.fromarray(np.random.randint(0, 255, (100, 100, 3), dtype=np.uint8))
    img2.save(f"{mock_dataset_path}/class2/img{i}.jpg")

# Step 4: Load the dataset (folder names become class labels)
train_dataset = torchvision.datasets.ImageFolder(root=mock_dataset_path, transform=transform)

# Step 5: Create a DataLoader to batch and shuffle the images
train_loader = DataLoader(train_dataset, batch_size=2, shuffle=True)

# Step 6: Explore the dataset interactively
for images, labels in train_loader:
    print(f"Batch shape: {images.shape}")  # [batch_size, channels, height, width]
    print(f"Labels: {labels}")             # Class indices for this batch
    break  # Only show the first batch for now
```
📝 ChatGPT explanation for the code
4️⃣ Click Run (▶) and check the output!
✅ The dataset should load successfully! You should now see the shape and class labels of the first batch from the mock dataset. 🖼️📊🚀
🏗️ 4. Building and Training a Deep Learning Model for Ethical AI
📌 Step 1: Define a Bias-Aware Vision Transformer (ViT) Model
➕ Add a New Code Cell
1️⃣ Click + Code in Jupyter Notebook to add a new code cell.
2️⃣ Copy and paste the following code:
📝 ChatGPT prompt for generating the code
```python
from transformers import ViTForImageClassification  # Pre-trained Vision Transformer
import torch.nn as nn                               # Neural network module from PyTorch

# Create a model for our specific number of classes (2 in our mock example)
num_classes = len(train_dataset.classes)

# Load a pre-trained Vision Transformer (ViT) model
model = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224",
    num_labels=num_classes,
    ignore_mismatched_sizes=True  # Replace the original 1000-class head with our smaller one
)

# Move the model to the available device (GPU if available, otherwise CPU)
model.to(device)

# Print a confirmation message and model info
print(f"Pre-trained Vision Transformer Model Loaded with {num_classes} output classes! 🚀")
print(f"Model is using device: {next(model.parameters()).device}")
```
📝 ChatGPT explanation for the code
3️⃣ Click Run (▶) and check the output!
✅ The pre-trained Vision Transformer model should load successfully! Your ViT model is now ready for image classification. 🎉🚀
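As an optional sanity check (a small sketch reusing the `train_loader` and `model` defined above), you can push one batch through the network before training:

```python
# Optional sanity check: run a single batch through the untrained classifier head.
images, labels = next(iter(train_loader))
with torch.no_grad():                   # No gradients needed for a quick check
    logits = model(images.to(device)).logits
print(f"Logits shape: {logits.shape}")  # Expected: [batch_size, num_classes]
```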
🔄 Step 2: Train the Model with Ethical AI Constraints
➕ Add a New Code Cell
1️⃣ Click + Code in Jupyter Notebook to add a new code cell.
2️⃣ Copy and paste the following code:
📝 ChatGPT prompt for generating the code
```python
import torch.optim as optim  # Optimizers from PyTorch

# Define the loss function (CrossEntropyLoss for classification tasks)
criterion = nn.CrossEntropyLoss()

# Define the optimizer (Adam with a learning rate of 0.0001)
optimizer = optim.Adam(model.parameters(), lr=0.0001)

# Training loop for a small number of epochs (reduced for demonstration)
num_epochs = 2
for epoch in range(num_epochs):
    running_loss = 0.0  # Cumulative loss for the epoch
    for i, (inputs, labels) in enumerate(train_loader):  # Iterate over training batches
        inputs, labels = inputs.to(device), labels.to(device)  # Move data to the GPU (if available)
        optimizer.zero_grad()              # Reset gradients before each batch
        outputs = model(inputs).logits     # Forward pass (ViT's output is stored in `.logits`)
        loss = criterion(outputs, labels)  # Compute the loss
        loss.backward()                    # Backpropagate gradients
        optimizer.step()                   # Update model parameters
        running_loss += loss.item()        # Accumulate loss

        # Print batch progress every 2 batches
        if i % 2 == 0:
            print(f"Epoch {epoch+1}, Batch {i+1}: Loss {loss.item():.4f}")

    # Print the average loss at the end of each epoch
    print(f"Epoch {epoch+1}/{num_epochs}: Average Loss {running_loss/len(train_loader):.4f}")

print("Training Complete! 🎉")

# Save the model (optional)
torch.save(model.state_dict(), "ethical_vit_model.pth")
print("Model saved to 'ethical_vit_model.pth'")
```
📝 ChatGPT explanation for the code
3️⃣ Click Run (▶) and check the output!
✅ Training should complete successfully! Your Vision Transformer model has been trained for the specified number of epochs. 🎉🚀📊
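Note that the loop above is standard supervised training; the "ethical constraint" in this workshop comes from the fairness evaluation in Section 5. If you wanted the constraint inside the loss itself, one common pattern (a hedged sketch with hypothetical group labels, not part of the workshop dataset) is to penalize gaps in average predicted scores between groups:

```python
# Minimal sketch: a fairness-regularized loss (illustrative, with made-up group labels).
import torch
import torch.nn.functional as F

def fairness_penalty(logits, groups):
    """Absolute gap in mean positive-class probability between two groups.
    Assumes both groups appear in the batch."""
    probs = F.softmax(logits, dim=1)[:, 1]  # probability of class 1 per sample
    gap = probs[groups == 0].mean() - probs[groups == 1].mean()
    return gap.abs()

# Toy usage: combine the task loss with the penalty via a weight lambda.
logits = torch.randn(4, 2, requires_grad=True)  # stand-in for model outputs
labels = torch.tensor([0, 1, 0, 1])             # ground-truth classes
groups = torch.tensor([0, 0, 1, 1])             # hypothetical sensitive attribute

lam = 0.5  # strength of the fairness constraint
loss = F.cross_entropy(logits, labels) + lam * fairness_penalty(logits, groups)
loss.backward()
print(f"Combined loss: {loss.item():.4f}")
```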
📊 5. Evaluating Model Fairness & Bias Mitigation
Evaluate Model Performance Across Demographics
➕ Add a New Code Cell
1️⃣ Click + Code in Jupyter Notebook to add a new code cell.
2️⃣ Copy and paste the following code:
📝 ChatGPT prompt for generating the code
```python
import numpy as np                           # NumPy for array operations
from sklearn.metrics import accuracy_score  # Accuracy metric
import matplotlib.pyplot as plt             # For visualization

# First, make sure the training dataset and number of classes are available.
# If they are not defined, fall back to mock values.
try:
    dataset_size = len(train_dataset)
    num_classes_value = num_classes
except NameError:
    print("Warning: train_dataset or num_classes not found, using mock data instead")
    dataset_size = 100
    num_classes_value = 2

# Define a function for fairness evaluation
def evaluate_fairness(predictions, labels, sensitive_attribute):
    """
    Evaluate fairness by computing accuracy per sensitive-attribute group.

    Args:
        predictions (np.array): Model-predicted class labels.
        labels (np.array): Ground-truth class labels.
        sensitive_attribute (np.array): Sensitive attribute per sample (e.g., gender, race).

    Returns:
        dict: Accuracy per unique group in the sensitive attribute.
    """
    unique_groups = np.unique(sensitive_attribute)  # Unique sensitive groups
    group_accuracies = {}                           # Accuracy per group
    for group in unique_groups:
        group_indices = (sensitive_attribute == group)  # Samples in the current group
        if np.sum(group_indices) > 0:
            group_accuracies[group] = accuracy_score(
                labels[group_indices],
                predictions[group_indices]
            )
        else:
            group_accuracies[group] = 0.0  # No samples for this group
    return group_accuracies

# For demonstration, create simulated demographic data.
# In a real scenario, this would be actual demographic information.
y_true = np.random.randint(0, num_classes_value, size=dataset_size)  # Mock ground-truth labels
y_pred = np.random.randint(0, num_classes_value, size=dataset_size)  # Mock predictions

# Assign random demographic groups (representing sensitive attributes)
# with an uneven distribution for demonstration purposes.
sensitive_attr = np.random.choice(["Group A", "Group B", "Group C"], size=dataset_size,
                                  p=[0.5, 0.3, 0.2])

# Intentionally bias the predictions for demonstration:
# Group C predictions have a 50% chance of being flipped to a wrong class.
for i in range(dataset_size):
    if sensitive_attr[i] == "Group C" and np.random.random() < 0.5:
        y_pred[i] = (y_true[i] + 1) % num_classes_value

# Evaluate fairness across the different groups
fairness_results = evaluate_fairness(y_pred, y_true, sensitive_attr)

# Print fairness evaluation results
print("Fairness Evaluation Results (accuracy per group):")
for group, accuracy in fairness_results.items():
    print(f"{group}: {accuracy:.2f}")

# Visualize fairness metrics
plt.figure(figsize=(10, 6))
groups = list(fairness_results.keys())
accuracies = [fairness_results[group] for group in groups]
plt.bar(groups, accuracies, color=['blue', 'green', 'red'])
plt.ylim(0, 1.0)
plt.title('Model Fairness: Accuracy Across Demographic Groups')
plt.ylabel('Accuracy')
plt.xlabel('Demographic Group')
plt.grid(axis='y', linestyle='--', alpha=0.7)
for i, acc in enumerate(accuracies):
    plt.text(i, acc + 0.05, f'{acc:.2f}', ha='center')
plt.tight_layout()
plt.show()

# Calculate the overall disparity between the best- and worst-served groups
max_disparity = max(accuracies) - min(accuracies)
print(f"Maximum accuracy disparity between groups: {max_disparity:.2f}")
if max_disparity > 0.1:
    print("⚠️ Potential fairness concern: significant accuracy disparity between groups")
else:
    print("✅ Model appears to be relatively fair across demographic groups")
```
📝 ChatGPT explanation for the code
3️⃣ Click Run (▶) and check the output!
✅ The fairness evaluation should complete successfully! You should now see accuracy metrics per sensitive-attribute group. 📊⚖️🚀
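Accuracy per group is only one lens. As a follow-up sketch (reusing `y_true`, `y_pred`, `sensitive_attr`, and `num_classes_value` from the cell above), you can compare per-group true-positive rates, which is the quantity behind the Equal Opportunity metric from Section 1:

```python
# Minimal sketch: per-group true-positive rate (an Equal Opportunity check).
from sklearn.metrics import confusion_matrix
import numpy as np

for group in np.unique(sensitive_attr):
    idx = (sensitive_attr == group)
    # Rows = true class, columns = predicted class; `labels` pins the matrix shape.
    cm = confusion_matrix(y_true[idx], y_pred[idx], labels=range(num_classes_value))
    tp, fn = cm[1, 1], cm[1, 0]  # outcomes for samples whose true class is 1
    tpr = tp / (tp + fn) if (tp + fn) > 0 else float("nan")
    print(f"{group}: true-positive rate = {tpr:.2f}")
```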
---
🎉 6. Wrap-Up & Next Steps
🎯 Congratulations! You've just built and trained an Ethical AI model using HPC! 🚀
✅ Loaded a dataset for model training 📂
✅ Built a bias-aware Vision Transformer model 🏗️
✅ Trained the model using HPC with GPU acceleration 🚀
✅ Evaluated bias and fairness across demographic groups 📊
🚀 Additional AI Resources 📚
- Project Jupyter Documentation
- Python Introduction (Use only the two green buttons "Previous" and "Next" to navigate the tutorial and avoid ads.)
- Responsible AI by Microsoft
- ACCESS CI: free access to HPC for everyone through the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS), a U.S. government program.
🎉 Keep learning and see you at the next workshop! 🚀
📝 Workshop Feedback Survey
Thanks for completing this workshop! 🎉
We'd love to hear what you think so we can make future workshops even better. 💡
📋 Survey link