HPC Ethical AI - JannatulFaika/HPC-AI-Resources GitHub Wiki

πŸš€βœ¨ HPC: Ethical AI & Future Trends on HPC Workshop! 🌟✨

🎯 Goal

πŸ“Š Learn how to develop Ethical AI models while leveraging High-Performance Computing (HPC) for large-scale training. We will work with a sample multimodal dataset, representative of massive open-source image-text data, to demonstrate how HPC resources can be effectively used for ethical AI model training and analysis. πŸš€

πŸ“Œ What You Will Learn πŸ§ πŸ’‘

βœ… Understanding Ethical AI and its challenges (bias, fairness, explainability) πŸ€–
βœ… Exploring HPC as a solution for training large AI models responsibly πŸ’»
βœ… Training AI models on large-scale datasets using multi-GPU HPC clusters πŸ—οΈ
βœ… Evaluating AI fairness, transparency, and accountability πŸ”
βœ… Future trends in Ethical AI and responsible AI development 🌍


πŸ“š 1. Key Terminologies in Ethical AI & HPC πŸ”‘

🌍 Ethical AI

  • AI that ensures fairness, transparency, and accountability in decision-making.
  • Addresses bias in datasets and models to prevent discrimination.

πŸš€ HPC for AI

  • High-Performance Computing (HPC) enables AI training on massive datasets by distributing computations across GPUs and multiple nodes (see the sketch below).
  • Essential for training foundation models and large-scale NLP/CV applications.
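
As a concrete illustration, here is a minimal single-node sketch (assuming PyTorch is installed and the node exposes more than one GPU) of splitting each batch's forward pass across all visible GPUs; multi-node training would use torch.distributed instead:

import torch
import torch.nn as nn

# Minimal sketch: DataParallel replicates the model on each visible GPU
# and splits every incoming batch across them (single node only).
model = nn.Linear(512, 10)  # placeholder model for illustration

if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
print(f"Running on {max(torch.cuda.device_count(), 1)} device(s)")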

πŸ—οΈ Bias & Fairness in AI

  • Algorithmic Bias: When AI models make unfair decisions based on skewed data.
  • Fairness Metrics: Statistical Parity, Equal Opportunity, Equalized Odds, Disparate Impact (two of these are computed in the sketch below).
  • Debiasing Techniques: Data preprocessing, adversarial training, fairness constraints.
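
To make two of these metrics concrete, here is a small NumPy sketch (the predictions and group labels are made up purely for illustration) that computes the statistical parity difference and the disparate impact ratio between two groups:

import numpy as np

# Mock binary predictions (1 = favorable outcome) for two demographic groups
y_pred = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
group = np.array(["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"])

rate_a = y_pred[group == "A"].mean()  # P(favorable outcome | group A)
rate_b = y_pred[group == "B"].mean()  # P(favorable outcome | group B)

print(f"Statistical parity difference: {rate_a - rate_b:.2f}")  # 0 is ideal
print(f"Disparate impact ratio: {rate_b / rate_a:.2f}")  # < 0.8 flags concern (four-fifths rule)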

πŸ’» Federated Learning & Privacy-Preserving AI

  • Federated Learning: AI training across distributed devices while keeping data localized.
  • Differential Privacy: Adding noise to AI models to protect sensitive data.
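
As a rough illustration of the differential privacy idea (a toy sketch, not a production mechanism; DP model training typically uses a framework such as Opacus), the snippet below adds Laplace noise calibrated to a query's sensitivity and a privacy budget epsilon:

import numpy as np

# Toy differential privacy sketch: release a noisy count instead of the true one
data = np.array([1, 0, 1, 1, 0, 1])  # hypothetical sensitive binary records
true_count = data.sum()

epsilon = 1.0      # privacy budget (smaller = more private, noisier)
sensitivity = 1.0  # adding/removing one record changes a count by at most 1
noisy_count = true_count + np.random.laplace(loc=0.0, scale=sensitivity / epsilon)

print(f"True count: {true_count}, privatized count: {noisy_count:.2f}")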

🌍 Future Trends in Ethical AI

  • Explainable AI (XAI): Making AI decision-making understandable.
  • Regulatory Frameworks: EU AI Act, AI Ethics Guidelines from NIST, UNESCO.
  • Sustainable AI: Reducing AI's carbon footprint through efficient HPC utilization.

πŸ” 2: Access HPC Terminal via JupyterHub

1️⃣ Go to CSUSB HPC if you are a learner or educator at CSUSB. Otherwise, ask an educator at your school to create an account for you through the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS CI), a U.S. government program that provides free access to HPC resources.
2️⃣ Click CI Logon to log in using your school account.
3️⃣ Select the GPU model that best fits your needs.
4️⃣ After logging in, you will land in JupyterLab. βœ… You're ready to go!


πŸ” 3. Hands-on: Loading a Multimodal Dataset on HPC

We will use CSUSB's High-Performance Computing (HPC) system to run our AI code. Follow these steps to access JupyterHub on HPC.

πŸš€ Step 1: Log In to HPC with CI Logon πŸ”

Let's get you authenticated! Here's how:

1️⃣ Go to the CI Logon Portal

  • Open CI Logon in your browser.
  • Click Sign In with CI Logon

2️⃣ Select Your Identity Provider & Log In

  • Choose "California State University, San Bernardino" πŸŽ“ from the dropdown.
  • Check "Remember this selection" to save time next login. βœ…
  • Click Log in to proceed. πŸš€

πŸ–₯️ Step 2: Launch Your JupyterHub Server

Your HPC Jupyter environment is ready, so let's start coding!

1️⃣ Check Out the Launcher Page

  • You'll see several options, including:
    • Notebook: Start a Jupyter Notebook (e.g., Python 3 🐍).
    • Console: Open a Python console for quick commands πŸ“Ÿ.
    • Other: Access Terminal, Text File, Markdown File, or Help πŸ“š.

2️⃣ Open a Notebook or File

  • Click Notebook β†’ Python 3 (ipykernel) to start coding! ✍️
  • You can also browse and open existing files. πŸ“‚

3️⃣ Run Your Code & Save Your Work

  • Type your Python code and press Shift + Enter to run.
  • Save often to keep your work safe! πŸ’Ύ

❓ Why Use 2 GPUs and RTX A5000?

Training Vision Transformers on large image-text datasets is both compute- and memory-intensive. Each RTX A5000 provides 24 GB of GPU memory, and using two GPUs lets you fit larger batches and parallelize training for faster turnaround.
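
You can confirm what hardware your session actually received with a quick check in a notebook cell:

import torch

# Quick hardware check: list the GPUs visible to this Jupyter session
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU count: {torch.cuda.device_count()}")
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 1e9:.1f} GB")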

πŸ“‚ Step 3: Load a Sample Multimodal Dataset

βž•πŸ Add a New Code Cell

1️⃣ Click + Code in Jupyter Notebook to add a new code cell.
2️⃣ First, install the required packages:

!pip install torch torchvision transformers scikit-learn numpy matplotlib

πŸ”— ChatGPT prompt for generating the code

3️⃣ Add a new code cell and copy and paste the following code:

import os
import torch  # PyTorch for deep learning
import torchvision  # For handling image datasets
import torchvision.transforms as transforms  # For image preprocessing
import numpy as np
from PIL import Image
from torch.utils.data import DataLoader

# Step 1: Check if GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Step 2: Define image transformations
transform = transforms.Compose([
    transforms.Resize((224, 224)),  # Resize images for consistency
    transforms.ToTensor(),  # Convert images to tensors
    transforms.Normalize((0.5,), (0.5,))  # Scale pixel values to roughly [-1, 1]
])

# Step 3: Create a mock dataset (for learning purposes)
mock_dataset_path = "mock_dataset"
os.makedirs(f"{mock_dataset_path}/class1", exist_ok=True)
os.makedirs(f"{mock_dataset_path}/class2", exist_ok=True)

# Generate dummy images
for i in range(5):  # Reduce number for simplicity
    img1 = Image.fromarray(np.random.randint(0, 255, (100, 100, 3), dtype=np.uint8))
    img1.save(f"{mock_dataset_path}/class1/img{i}.jpg")
    
    img2 = Image.fromarray(np.random.randint(0, 255, (100, 100, 3), dtype=np.uint8))
    img2.save(f"{mock_dataset_path}/class2/img{i}.jpg")

# Step 4: Load dataset
train_dataset = torchvision.datasets.ImageFolder(root=mock_dataset_path, transform=transform)

# Step 5: Create DataLoader
train_loader = DataLoader(train_dataset, batch_size=2, shuffle=True)

# Step 6: Explore the dataset interactively
for images, labels in train_loader:
    print(f"Batch shape: {images.shape}")  # Show shape of batch
    print(f"Labels: {labels}")  # Show class labels
    break  # Only show the first batch for now

πŸ”— ChatGPT explanation for the code

4️⃣ Click Run (β–Ά) and check the output!

βœ… Dataset should load successfully! You should now see the batch shape and class labels from the first batch of mock images. πŸ–ΌοΈπŸ“ŠπŸŽ‰
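
Optionally, you can preview the first batch with matplotlib (a small sketch; the un-normalization step assumes the (0.5, 0.5) Normalize transform used above):

import matplotlib.pyplot as plt

# Optional: display the first batch of mock images with their class names
images, labels = next(iter(train_loader))
fig, axes = plt.subplots(1, len(images), figsize=(6, 3))
for ax, img, label in zip(axes, images, labels):
    ax.imshow((img * 0.5 + 0.5).permute(1, 2, 0))  # undo normalization, CHW -> HWC
    ax.set_title(train_dataset.classes[label])
    ax.axis("off")
plt.show()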


πŸ—οΈ 4. Building and Training a Deep Learning Model for Ethical AI

πŸš€ Step 1: Define a Bias-Aware Vision Transformer (ViT) Model

βž•πŸ Add a New Code Cell

1️⃣ Click + Code in Jupyter Notebook to add a new code cell.
2️⃣ Copy and paste the following code:

πŸ”— ChatGPT prompt for generating the code

from transformers import ViTForImageClassification  # Import the pre-trained ViT model
import torch.nn as nn  # Import neural network module from PyTorch

# Create a model for our specific number of classes (2 in our mock example)
num_classes = len(train_dataset.classes)

# Load pre-trained Vision Transformer (ViT) model
model = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224", 
    num_labels=num_classes,
    ignore_mismatched_sizes=True  # Ignore size mismatch for classification head
)

# Move the model to the available device (GPU if available, otherwise CPU)
model.to(device)

# Print confirmation message and model info
print(f"Pre-trained Vision Transformer Model Loaded with {num_classes} output classes! πŸš€")
print(f"Model is using device: {next(model.parameters()).device}")

πŸ”— ChatGPT explanation for the code

3️⃣ Click Run (β–Ά) and check the output!

βœ… Pre-trained Vision Transformer Model should load successfully! Your ViT model is now ready for image classification. πŸš€πŸŽ‰
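
To get a sense of why GPU memory matters here, you can count the model's trainable parameters (ViT-Base is on the order of 86 million):

# Count trainable parameters to see why HPC resources matter for ViT
num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {num_params:,}")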

πŸš€ Step 2: Train the Model with Ethical AI Constraints

βž•πŸ Add a New Code Cell

1️⃣ Click + Code in Jupyter Notebook to add a new code cell.
2️⃣ Copy and paste the following code:

πŸ”— ChatGPT prompt for generating the code

import torch.optim as optim  # Import optimizers from PyTorch

# Define loss function (CrossEntropyLoss for classification tasks)
criterion = nn.CrossEntropyLoss()

# Define optimizer (Adam with a learning rate of 0.0001)
optimizer = optim.Adam(model.parameters(), lr=0.0001)

# Training loop for a small number of epochs (for demonstration)
num_epochs = 2  # Reduced for demonstration

for epoch in range(num_epochs):
    running_loss = 0.0  # Initialize cumulative loss for the epoch
    
    for i, (inputs, labels) in enumerate(train_loader):  # Iterate over batches in the training set
        inputs, labels = inputs.to(device), labels.to(device)  # Move data to GPU (if available)
        
        optimizer.zero_grad()  # Reset gradients before each batch
        outputs = model(inputs).logits  # Forward pass (ViT's output is stored in `.logits`)
        loss = criterion(outputs, labels)  # Compute loss
        loss.backward()  # Backpropagate gradients
        optimizer.step()  # Update model parameters
        
        running_loss += loss.item()  # Accumulate loss
        
        # Print batch progress
        if i % 2 == 0:  # Print every 2 batches
            print(f"Epoch {epoch+1}, Batch {i+1}: Loss {loss.item():.4f}")
        
    # Print loss at the end of each epoch
    print(f"Epoch {epoch+1}/{num_epochs}: Average Loss {running_loss/len(train_loader):.4f}")

print("Training Complete! πŸš€")

# Save the model (optional)
torch.save(model.state_dict(), "ethical_vit_model.pth")
print("Model saved to 'ethical_vit_model.pth'")

πŸ”— ChatGPT explanation for the code

3️⃣ Click Run (β–Ά) and check the output!

βœ… Training should complete successfully! Your Vision Transformer model has been trained for the specified number of epochs. πŸš€πŸ“ŠπŸŽ‰
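
Note that the loop above optimizes a standard cross-entropy objective. One way to turn the "fairness constraints" debiasing technique from Section 1 into code is to penalize the loss gap between sensitive groups. The sketch below is hypothetical: it assumes each batch also carries a group_ids tensor of sensitive-group membership, which our mock dataset does not provide:

import torch
import torch.nn.functional as F

def fairness_regularized_loss(logits, labels, group_ids, lam=0.1):
    """Cross-entropy plus a penalty on the loss gap between two groups.
    group_ids is a hypothetical 0/1 tensor of sensitive-group membership."""
    per_sample = F.cross_entropy(logits, labels, reduction="none")
    base_loss = per_sample.mean()
    gap = torch.tensor(0.0, device=logits.device)
    if (group_ids == 0).any() and (group_ids == 1).any():
        gap = (per_sample[group_ids == 0].mean()
               - per_sample[group_ids == 1].mean()).abs()
    return base_loss + lam * gap

# Tiny synthetic example showing the regularized loss in action
logits = torch.randn(4, 2)
labels = torch.tensor([0, 1, 0, 1])
group_ids = torch.tensor([0, 0, 1, 1])
print(fairness_regularized_loss(logits, labels, group_ids))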


πŸ† 5. Evaluating Model Fairness & Bias Mitigation

Evaluate Model Performance Across Demographics

βž•πŸ Add a New Code Cell

1️⃣ Click + Code in Jupyter Notebook to add a new code cell.
2️⃣ Copy and paste the following code:

πŸ”— ChatGPT prompt for generating the code

import numpy as np  # Import NumPy for array operations
from sklearn.metrics import accuracy_score  # Import accuracy metric
import matplotlib.pyplot as plt  # For visualization

# First, ensure we have the training dataset and number of classes.
# If train_dataset is not defined, fall back to mock data.
try:
    dataset_size = len(train_dataset)
    num_classes_value = num_classes
except NameError:
    print("Warning: train_dataset or num_classes not found, using mock data instead")
    dataset_size = 100
    num_classes_value = 2

# Define a function for fairness evaluation
def evaluate_fairness(predictions, labels, sensitive_attribute):
    """
    Evaluate fairness by computing accuracy per sensitive attribute group.
    Args:
    - predictions (np.array): Model-predicted class labels.
    - labels (np.array): Ground-truth class labels.
    - sensitive_attribute (np.array): Array of sensitive attributes (e.g., gender, race).
    Returns:
    - dict: Accuracy per unique group in the sensitive attribute.
    """
    unique_groups = np.unique(sensitive_attribute)  # Get unique sensitive groups
    group_accuracies = {}  # Dictionary to store accuracy per group

    for group in unique_groups:
        group_indices = (sensitive_attribute == group)  # Get indices for the current group
        if np.sum(group_indices) > 0:  # Check if there are any samples for this group
            group_accuracies[group] = accuracy_score(
                labels[group_indices],
                predictions[group_indices]
            )
        else:
            group_accuracies[group] = 0.0  # No samples for this group

    return group_accuracies  # Return dictionary with fairness metrics

# For demonstration, create simulated demographic data.
# In a real scenario, this would be actual demographic information.
y_true = np.random.randint(0, num_classes_value, size=dataset_size)  # Mock ground-truth labels
y_pred = np.random.randint(0, num_classes_value, size=dataset_size)  # Mock predictions

# Assign random demographic groups (representing sensitive attributes)
# with an uneven distribution.
sensitive_attr = np.random.choice(["Group A", "Group B", "Group C"], size=dataset_size,
                                  p=[0.5, 0.3, 0.2])

# Intentionally bias the predictions for demonstration: predictions for
# Group C have a 50% chance of being flipped to a wrong class.
for i in range(dataset_size):
    if sensitive_attr[i] == "Group C" and np.random.random() < 0.5:
        y_pred[i] = (y_true[i] + 1) % num_classes_value

# Evaluate fairness across different groups
fairness_results = evaluate_fairness(y_pred, y_true, sensitive_attr)

# Print fairness evaluation results
print("Fairness Evaluation Results (accuracy per group):")
for group, accuracy in fairness_results.items():
    print(f"{group}: {accuracy:.2f}")

# Visualize fairness metrics
plt.figure(figsize=(10, 6))
groups = list(fairness_results.keys())
accuracies = [fairness_results[group] for group in groups]
plt.bar(groups, accuracies, color=['blue', 'green', 'red'])
plt.ylim(0, 1.0)
plt.title('Model Fairness: Accuracy Across Demographic Groups')
plt.ylabel('Accuracy')
plt.xlabel('Demographic Group')
plt.grid(axis='y', linestyle='--', alpha=0.7)
for i, acc in enumerate(accuracies):
    plt.text(i, acc + 0.05, f'{acc:.2f}', ha='center')
plt.tight_layout()
plt.show()

# Calculate overall disparity
max_disparity = max(accuracies) - min(accuracies)
print(f"Maximum accuracy disparity between groups: {max_disparity:.2f}")
if max_disparity > 0.1:
    print("Warning: potential fairness concern, significant accuracy disparity between groups")
else:
    print("βœ… Model appears to be relatively fair across demographic groups")

πŸ”— ChatGPT explanation for the code

3️⃣ Click Run (β–Ά) and check the output!

βœ… Fairness evaluation should complete successfully! You should now see accuracy metrics per sensitive attribute group. πŸ“Šβš–οΈπŸŽ‰
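
Per-group accuracy is only one lens. Reusing the y_true, y_pred, and sensitive_attr arrays from the cell above, you can also approximate the equal-opportunity gap from Section 1 by comparing true positive rates across groups (a sketch that treats class 1 as the "positive" class):

# Sketch: compare true positive rates (recall on class 1) across groups
for group in np.unique(sensitive_attr):
    mask = (sensitive_attr == group) & (y_true == 1)
    if mask.sum() > 0:
        tpr = (y_pred[mask] == 1).mean()
        print(f"{group}: TPR = {tpr:.2f}")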


πŸŽ‰ 6. Wrap-Up & Next Steps

🎯 Congratulations! You've just built and trained an Ethical AI Model using HPC! πŸš€

βœ… Loaded a dataset for model training πŸ“‚
βœ… Built a bias-aware Vision Transformer model πŸ—οΈ
βœ… Trained the model using HPC with GPU acceleration πŸ”„
βœ… Evaluated bias and fairness across demographic groups πŸ“Š

πŸ”— Additional AI Resources πŸ“š

πŸš€ Keep learning and see you at the next workshop! πŸŽ‰


πŸ“ Workshop Feedback Survey

Thanks for completing this workshop! πŸŽ†

We'd love to hear what you think so we can make future workshops even better. πŸ’‘

πŸ“Œ Survey link