🚀✨ HPC: Ethical AI & Future Trends on HPC Workshop! 🚀✨
🎯 Goal
📊 Learn how to develop Ethical AI models while leveraging High-Performance Computing (HPC) for large-scale training. We will work with a sample multimodal dataset, representative of massive open-source image-text data, to demonstrate how HPC resources can be effectively used for ethical AI model training and analysis. 🚀
📌 What You Will Learn 🧠💡
✅ Understanding Ethical AI and its challenges (bias, fairness, explainability) 🤖
✅ Exploring HPC as a solution for training large AI models responsibly 💻
✅ Training AI models on large-scale datasets using multi-GPU HPC clusters 🏋️
✅ Evaluating AI fairness, transparency, and accountability 📊
✅ Future trends in Ethical AI and responsible AI development 🚀
🔑 1. Key Terminologies in Ethical AI & HPC 📖
🤖 Ethical AI
- AI that ensures fairness, transparency, and accountability in decision-making.
- Addresses bias in datasets and models to prevent discrimination.
🚀 HPC for AI
- High-Performance Computing (HPC) enables AI training on massive datasets by distributing computations across GPUs and multiple nodes.
- Essential for training foundation models and large-scale NLP/CV applications.
⚖️ Bias & Fairness in AI
- Algorithmic Bias: When AI models make unfair decisions based on skewed data.
- Fairness Metrics: Statistical Parity, Equal Opportunity, Equalized Odds, Disparate Impact (two of these are sketched in code after this list).
- Debiasing Techniques: Data preprocessing, adversarial training, fairness constraints.
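To make these metrics concrete, here is a minimal sketch of two common fairness checks in NumPy. The `y_pred` and `group` arrays are hypothetical, for illustration only, and are not part of the workshop dataset:

```python
# Minimal sketch: statistical parity and disparate impact on synthetic data.
import numpy as np

y_pred = np.array([1, 0, 1, 1, 0, 1, 0, 0])                 # model predictions (1 = positive outcome)
group = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])  # sensitive attribute per sample

rate_a = y_pred[group == "A"].mean()  # positive-outcome rate for group A
rate_b = y_pred[group == "B"].mean()  # positive-outcome rate for group B

# Statistical parity difference: 0 means both groups receive positive outcomes at the same rate.
print(f"Statistical parity difference: {rate_a - rate_b:.2f}")

# Disparate impact ratio: values far below 1 (commonly < 0.8) suggest adverse impact.
print(f"Disparate impact ratio: {min(rate_a, rate_b) / max(rate_a, rate_b):.2f}")
```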
💻 Federated Learning & Privacy-Preserving AI
- Federated Learning: AI training across distributed devices while keeping data localized.
- Differential Privacy: Adding calibrated noise to model outputs or updates to protect sensitive data (see the sketch below).
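As a toy illustration of both ideas (a simplified sketch with made-up numbers, not a production implementation), the snippet below averages "local" model updates as in federated averaging (FedAvg) and adds Laplace noise in the spirit of differential privacy:

```python
# Minimal sketch: federated averaging plus Laplace noise (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

# Pretend three clients each computed a local weight vector without sharing raw data.
client_updates = [rng.normal(size=4) for _ in range(3)]

# FedAvg: the server only sees the average of the local updates.
global_update = np.mean(client_updates, axis=0)

# Differential privacy (simplified): add calibrated Laplace noise before release.
epsilon, sensitivity = 1.0, 0.1  # hypothetical privacy budget and query sensitivity
noisy_update = global_update + rng.laplace(scale=sensitivity / epsilon, size=global_update.shape)

print("Averaged update:", np.round(global_update, 3))
print("Noisy (private) update:", np.round(noisy_update, 3))
```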
🚀 Future Trends in Ethical AI
- Explainable AI (XAI): Making AI decision-making understandable (a gradient-saliency sketch follows this list).
- Regulatory Frameworks: EU AI Act, AI Ethics Guidelines from NIST, UNESCO.
- Sustainable AI: Reducing AI's carbon footprint through efficient HPC utilization.
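As one concrete XAI technique, gradient-based saliency asks how strongly each input pixel influences the model's output. This is a minimal sketch on a tiny random model, not the workshop's ViT:

```python
# Minimal sketch: gradient saliency on a toy model (illustrative only).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 2))  # toy classifier
image = torch.rand(1, 3, 8, 8, requires_grad=True)            # toy "image"

score = model(image)[0].max()  # score of the most likely class
score.backward()               # gradients of that score w.r.t. the input pixels

saliency = image.grad.abs().max(dim=1).values  # per-pixel importance map
print("Saliency map shape:", saliency.shape)   # torch.Size([1, 8, 8])
```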
🚀 2. Access HPC Terminal via JupyterHub
1️⃣ Go to CSUSB HPC if you are a learner or educator at CSUSB. Otherwise, have an educator from your school create an account for you using the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS CI), a U.S. government program that provides free access to HPC resources.
2️⃣ Click CI Logon to log in using your school account.
3️⃣ Select the GPU model that best fits your needs.
4️⃣ After logging in, you'll arrive at JupyterLab.
✅ You're ready to go!
📂 3. Hands-on: Loading a Multimodal Dataset on HPC
We will use CSUSB's High-Performance Computing (HPC) system to run our AI code. Follow these steps to access JupyterHub on HPC.
🔐 Step 1: Log In to HPC with CI Logon
Let's get you authenticated! Here's how:
1️⃣ Go to the CI Logon Portal
- Open CI Logon in your browser.
- Click Sign In with CI Logon.
2️⃣ Select Your Identity Provider & Log In
- Choose "California State University, San Bernardino" 🎓 from the dropdown.
- Check "Remember this selection" to save time next login. ✅
- Click Log in to proceed. 🚀
🖥️ Step 2: Launch Your JupyterHub Server
Your HPC Jupyter environment is ready, so let's start coding!
1️⃣ Check Out the Launcher Page
- You'll see several options like:
  - Notebook: Start a Jupyter Notebook (e.g., Python 3 🐍).
  - Console: Open a Python console for quick commands 💻.
  - Other: Access Terminal, Text File, Markdown File, or Help 📄.
2️⃣ Open a Notebook or File
- Click Notebook → Python 3 (ipykernel) to start coding! ✍️
- You can also browse and open existing files. 📂
3️⃣ Run Your Code & Save Your Work
- Type your Python code and press Shift + Enter to run.
- Save often to keep your work safe! 💾
❓ Why Use 2 GPUs and RTX A5000?
We are working with huge datasets and a large Vision Transformer, which need powerful GPUs for efficient computation; with two GPUs, PyTorch can split each batch across both devices (see the sketch below).
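For reference, here is a minimal sketch of using both GPUs from PyTorch, with a placeholder model for illustration. It assumes CUDA is available; `nn.DataParallel` is the simplest single-process option, while `DistributedDataParallel` is generally preferred for serious multi-GPU jobs:

```python
# Minimal sketch: wrapping a model so each batch is split across available GPUs.
import torch
import torch.nn as nn

model = nn.Linear(512, 10)  # placeholder model for illustration
if torch.cuda.device_count() > 1:
    print(f"Using {torch.cuda.device_count()} GPUs")
    model = nn.DataParallel(model)  # replicates the model; splits each input batch across GPUs
model = model.to("cuda" if torch.cuda.is_available() else "cpu")
```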
📊 Step 3: Load a Sample Dataset (a mock image set standing in for large multimodal data)
➕ Add a New Code Cell
1️⃣ Click + Code in Jupyter Notebook to add a new code cell.
2️⃣ First, install the required packages:
```python
!pip install torch torchvision transformers scikit-learn numpy matplotlib
```
(Note: the PyPI package is scikit-learn, not sklearn; matplotlib is used for the fairness plots in Section 5.)
📝 ChatGPT prompt for generating the code
3️⃣ Add a new code cell and copy and paste the following code:
```python
import os
import torch                                 # PyTorch for deep learning
import torchvision                           # For handling image datasets
import torchvision.transforms as transforms  # For image preprocessing
import numpy as np
from PIL import Image
from torch.utils.data import DataLoader

# Step 1: Check if a GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Step 2: Define image transformations
transform = transforms.Compose([
    transforms.Resize((224, 224)),        # Resize images to the size ViT expects
    transforms.ToTensor(),                # Convert images to tensors in [0, 1]
    transforms.Normalize((0.5,), (0.5,))  # Rescale pixel values to roughly [-1, 1]
])

# Step 3: Create a mock dataset (for learning purposes)
mock_dataset_path = "mock_dataset"
os.makedirs(f"{mock_dataset_path}/class1", exist_ok=True)
os.makedirs(f"{mock_dataset_path}/class2", exist_ok=True)

# Generate dummy images
for i in range(5):  # Small number of images for simplicity
    img1 = Image.fromarray(np.random.randint(0, 255, (100, 100, 3), dtype=np.uint8))
    img1.save(f"{mock_dataset_path}/class1/img{i}.jpg")
    img2 = Image.fromarray(np.random.randint(0, 255, (100, 100, 3), dtype=np.uint8))
    img2.save(f"{mock_dataset_path}/class2/img{i}.jpg")

# Step 4: Load the dataset (folder names become class labels)
train_dataset = torchvision.datasets.ImageFolder(root=mock_dataset_path, transform=transform)

# Step 5: Create a DataLoader to batch and shuffle the images
train_loader = DataLoader(train_dataset, batch_size=2, shuffle=True)

# Step 6: Explore the dataset interactively
for images, labels in train_loader:
    print(f"Batch shape: {images.shape}")  # [batch_size, channels, height, width]
    print(f"Labels: {labels}")             # Class indices for this batch
    break  # Only show the first batch for now
```
📝 ChatGPT explanation for the code
4️⃣ Click Run (▶) and check the output!
✅ The dataset should load successfully! You should now see the shape and class labels of the first batch from the mock dataset. 🖼️📊🚀
🏗️ 4. Building and Training a Deep Learning Model for Ethical AI
📌 Step 1: Define a Bias-Aware Vision Transformer (ViT) Model
➕ Add a New Code Cell
1️⃣ Click + Code in Jupyter Notebook to add a new code cell.
2️⃣ Copy and paste the following code:
📝 ChatGPT prompt for generating the code
```python
from transformers import ViTForImageClassification  # Pre-trained Vision Transformer
import torch.nn as nn                               # Neural network module from PyTorch

# Create a model for our specific number of classes (2 in our mock example)
num_classes = len(train_dataset.classes)

# Load a pre-trained Vision Transformer (ViT) model
model = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224",
    num_labels=num_classes,
    ignore_mismatched_sizes=True  # Replace the original 1000-class head with our smaller one
)

# Move the model to the available device (GPU if available, otherwise CPU)
model.to(device)

# Print a confirmation message and model info
print(f"Pre-trained Vision Transformer Model Loaded with {num_classes} output classes! 🚀")
print(f"Model is using device: {next(model.parameters()).device}")
```
📝 ChatGPT explanation for the code
3️⃣ Click Run (▶) and check the output!
✅ The pre-trained Vision Transformer model should load successfully! Your ViT model is now ready for image classification. 🎉🚀
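As an optional sanity check (a small sketch reusing the `train_loader` and `model` defined above), you can push one batch through the network before training:

```python
# Optional sanity check: run a single batch through the untrained classifier head.
images, labels = next(iter(train_loader))
with torch.no_grad():                   # No gradients needed for a quick check
    logits = model(images.to(device)).logits
print(f"Logits shape: {logits.shape}")  # Expected: [batch_size, num_classes]
```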
🔄 Step 2: Train the Model with Ethical AI Constraints
➕ Add a New Code Cell
1️⃣ Click + Code in Jupyter Notebook to add a new code cell.
2️⃣ Copy and paste the following code:
📝 ChatGPT prompt for generating the code
```python
import torch.optim as optim  # Optimizers from PyTorch

# Define the loss function (CrossEntropyLoss for classification tasks)
criterion = nn.CrossEntropyLoss()

# Define the optimizer (Adam with a learning rate of 0.0001)
optimizer = optim.Adam(model.parameters(), lr=0.0001)

# Training loop for a small number of epochs (reduced for demonstration)
num_epochs = 2
for epoch in range(num_epochs):
    running_loss = 0.0  # Cumulative loss for the epoch
    for i, (inputs, labels) in enumerate(train_loader):  # Iterate over training batches
        inputs, labels = inputs.to(device), labels.to(device)  # Move data to the GPU (if available)
        optimizer.zero_grad()              # Reset gradients before each batch
        outputs = model(inputs).logits     # Forward pass (ViT's output is stored in `.logits`)
        loss = criterion(outputs, labels)  # Compute the loss
        loss.backward()                    # Backpropagate gradients
        optimizer.step()                   # Update model parameters
        running_loss += loss.item()        # Accumulate loss

        # Print batch progress every 2 batches
        if i % 2 == 0:
            print(f"Epoch {epoch+1}, Batch {i+1}: Loss {loss.item():.4f}")

    # Print the average loss at the end of each epoch
    print(f"Epoch {epoch+1}/{num_epochs}: Average Loss {running_loss/len(train_loader):.4f}")

print("Training Complete! 🎉")

# Save the model (optional)
torch.save(model.state_dict(), "ethical_vit_model.pth")
print("Model saved to 'ethical_vit_model.pth'")
```
📝 ChatGPT explanation for the code
3️⃣ Click Run (▶) and check the output!
✅ Training should complete successfully! Your Vision Transformer model has been trained for the specified number of epochs. 🎉🚀📊
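Note that the loop above is standard supervised training; the "ethical constraint" in this workshop comes from the fairness evaluation in Section 5. If you wanted the constraint inside the loss itself, one common pattern (a hedged sketch with hypothetical group labels, not part of the workshop dataset) is to penalize gaps in average predicted scores between groups:

```python
# Minimal sketch: a fairness-regularized loss (illustrative, with made-up group labels).
import torch
import torch.nn.functional as F

def fairness_penalty(logits, groups):
    """Absolute gap in mean positive-class probability between two groups.
    Assumes both groups appear in the batch."""
    probs = F.softmax(logits, dim=1)[:, 1]  # probability of class 1 per sample
    gap = probs[groups == 0].mean() - probs[groups == 1].mean()
    return gap.abs()

# Toy usage: combine the task loss with the penalty via a weight lambda.
logits = torch.randn(4, 2, requires_grad=True)  # stand-in for model outputs
labels = torch.tensor([0, 1, 0, 1])             # ground-truth classes
groups = torch.tensor([0, 0, 1, 1])             # hypothetical sensitive attribute

lam = 0.5  # strength of the fairness constraint
loss = F.cross_entropy(logits, labels) + lam * fairness_penalty(logits, groups)
loss.backward()
print(f"Combined loss: {loss.item():.4f}")
```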
📊 5. Evaluating Model Fairness & Bias Mitigation
Evaluate Model Performance Across Demographics
➕ Add a New Code Cell
1️⃣ Click + Code in Jupyter Notebook to add a new code cell.
2️⃣ Copy and paste the following code:
📝 ChatGPT prompt for generating the code
```python
import numpy as np                           # NumPy for array operations
from sklearn.metrics import accuracy_score  # Accuracy metric
import matplotlib.pyplot as plt             # For visualization

# First, make sure the training dataset and number of classes are available.
# If they are not defined, fall back to mock values.
try:
    dataset_size = len(train_dataset)
    num_classes_value = num_classes
except NameError:
    print("Warning: train_dataset or num_classes not found, using mock data instead")
    dataset_size = 100
    num_classes_value = 2

# Define a function for fairness evaluation
def evaluate_fairness(predictions, labels, sensitive_attribute):
    """
    Evaluate fairness by computing accuracy per sensitive-attribute group.

    Args:
        predictions (np.array): Model-predicted class labels.
        labels (np.array): Ground-truth class labels.
        sensitive_attribute (np.array): Sensitive attribute per sample (e.g., gender, race).

    Returns:
        dict: Accuracy per unique group in the sensitive attribute.
    """
    unique_groups = np.unique(sensitive_attribute)  # Unique sensitive groups
    group_accuracies = {}                           # Accuracy per group
    for group in unique_groups:
        group_indices = (sensitive_attribute == group)  # Samples in the current group
        if np.sum(group_indices) > 0:
            group_accuracies[group] = accuracy_score(
                labels[group_indices],
                predictions[group_indices]
            )
        else:
            group_accuracies[group] = 0.0  # No samples for this group
    return group_accuracies

# For demonstration, create simulated demographic data.
# In a real scenario, this would be actual demographic information.
y_true = np.random.randint(0, num_classes_value, size=dataset_size)  # Mock ground-truth labels
y_pred = np.random.randint(0, num_classes_value, size=dataset_size)  # Mock predictions

# Assign random demographic groups (representing sensitive attributes)
# with an uneven distribution for demonstration purposes.
sensitive_attr = np.random.choice(["Group A", "Group B", "Group C"], size=dataset_size,
                                  p=[0.5, 0.3, 0.2])

# Intentionally bias the predictions for demonstration:
# Group C predictions have a 50% chance of being flipped to a wrong class.
for i in range(dataset_size):
    if sensitive_attr[i] == "Group C" and np.random.random() < 0.5:
        y_pred[i] = (y_true[i] + 1) % num_classes_value

# Evaluate fairness across the different groups
fairness_results = evaluate_fairness(y_pred, y_true, sensitive_attr)

# Print fairness evaluation results
print("Fairness Evaluation Results (accuracy per group):")
for group, accuracy in fairness_results.items():
    print(f"{group}: {accuracy:.2f}")

# Visualize fairness metrics
plt.figure(figsize=(10, 6))
groups = list(fairness_results.keys())
accuracies = [fairness_results[group] for group in groups]
plt.bar(groups, accuracies, color=['blue', 'green', 'red'])
plt.ylim(0, 1.0)
plt.title('Model Fairness: Accuracy Across Demographic Groups')
plt.ylabel('Accuracy')
plt.xlabel('Demographic Group')
plt.grid(axis='y', linestyle='--', alpha=0.7)
for i, acc in enumerate(accuracies):
    plt.text(i, acc + 0.05, f'{acc:.2f}', ha='center')
plt.tight_layout()
plt.show()

# Calculate the overall disparity between the best- and worst-served groups
max_disparity = max(accuracies) - min(accuracies)
print(f"Maximum accuracy disparity between groups: {max_disparity:.2f}")
if max_disparity > 0.1:
    print("⚠️ Potential fairness concern: significant accuracy disparity between groups")
else:
    print("✅ Model appears to be relatively fair across demographic groups")
```
📝 ChatGPT explanation for the code
3️⃣ Click Run (▶) and check the output!
✅ The fairness evaluation should complete successfully! You should now see accuracy metrics per sensitive-attribute group. 📊⚖️🚀
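Accuracy per group is only one lens. As a follow-up sketch (reusing `y_true`, `y_pred`, `sensitive_attr`, and `num_classes_value` from the cell above), you can compare per-group true-positive rates, which is the quantity behind the Equal Opportunity metric from Section 1:

```python
# Minimal sketch: per-group true-positive rate (an Equal Opportunity check).
from sklearn.metrics import confusion_matrix
import numpy as np

for group in np.unique(sensitive_attr):
    idx = (sensitive_attr == group)
    # Rows = true class, columns = predicted class; `labels` pins the matrix shape.
    cm = confusion_matrix(y_true[idx], y_pred[idx], labels=range(num_classes_value))
    tp, fn = cm[1, 1], cm[1, 0]  # outcomes for samples whose true class is 1
    tpr = tp / (tp + fn) if (tp + fn) > 0 else float("nan")
    print(f"{group}: true-positive rate = {tpr:.2f}")
```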
---
🎉 6. Wrap-Up & Next Steps
🎯 Congratulations! You've just built and trained an Ethical AI model using HPC! 🚀
✅ Loaded a dataset for model training 📂
✅ Built a bias-aware Vision Transformer model 🏗️
✅ Trained the model using HPC with GPU acceleration 🚀
✅ Evaluated bias and fairness across demographic groups 📊
🚀 Additional AI Resources 📚
- Project Jupyter Documentation
- Python Introduction (Use only the two green buttons "Previous" and "Next" to navigate the tutorial and avoid ads.)
- Responsible AI by Microsoft
- ACCESS CI: free access to HPC for everyone through the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS), a U.S. government program.
🎉 Keep learning and see you at the next workshop! 🚀
📝 Workshop Feedback Survey
Thanks for completing this workshop! 🎉
We'd love to hear what you think so we can make future workshops even better. 💡
📋 Survey link