# Streamlit Docker & AWS Implementation Plan for rutgersgrid
See: Link to EmTech Cloud Repo
## Overview
This implementation plan outlines how to containerize Streamlit applications from the rutgersgrid GitHub organization and deploy them to AWS in a cost-optimized manner. The approach follows a multi-app container architecture that bundles applications for efficient resource utilization.
## Implementation Phases
### Phase 1: Docker Configuration (Week 1)
#### 1.1. Template Repository Updates
First, update the template Streamlit repository with Docker support:
- Add the following files to your template repository:
  - `Dockerfile`
  - `.dockerignore`
  - `docker-compose.yml` (for local development)
- Create a standard `Dockerfile` in the template:
```dockerfile
# Base image
FROM python:3.10-slim

# Set working directory
WORKDIR /app

# Install curl for the health check (not included in the slim base image)
RUN apt-get update && apt-get install -y --no-install-recommends curl && rm -rf /var/lib/apt/lists/*

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# Expose Streamlit port
EXPOSE 8501

# Health check
HEALTHCHECK --interval=30s --timeout=5s --start-period=30s --retries=3 \
    CMD curl -f http://localhost:8501/_stcore/health || exit 1

# Run the application
CMD ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]
```
- Create a `.dockerignore` file:
```
.git
.github
.gitignore
__pycache__/
*.py[cod]
*$py.class
.env
.env.local
venv/
.venv/
ENV/
```
- Create `docker-compose.yml` for local testing:
```yaml
version: '3'
services:
  streamlit:
    build: .
    ports:
      - "8501:8501"
    volumes:
      - .:/app
    environment:
      - STREAMLIT_SERVER_PORT=8501
      - STREAMLIT_SERVER_ADDRESS=0.0.0.0
```
- Update the README with Docker usage instructions.
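For example, the README's Docker section could include something like the following (the image name `streamlit-app` is only illustrative):

```bash
# Build the image from the repository root
docker build -t streamlit-app .

# Run the app and expose the Streamlit port
docker run --rm -p 8501:8501 streamlit-app

# Or use docker compose for local development (mounts the source as a volume)
docker compose up
```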
#### 1.2. Create Infrastructure Repository
Create a new repository named `rutgersgrid-infrastructure` to manage AWS resources and deployment:
- Initialize with the following structure:
```
rutgersgrid-infrastructure/
├── .github/
│   └── workflows/
│       ├── deploy.yml
│       └── cost-report.yml
├── terraform/
│   ├── main.tf
│   ├── variables.tf
│   ├── outputs.tf
│   ├── ecs.tf
│   ├── networking.tf
│   ├── alb.tf
│   └── security.tf
├── docker/
│   ├── Dockerfile
│   ├── nginx.conf
│   └── start.sh
├── scripts/
│   ├── bundle-apps.sh
│   └── generate-cost-report.py
└── README.md
```
- Create the multi-app Dockerfile in `docker/Dockerfile`:
```dockerfile
FROM python:3.10-slim

# Install Nginx
RUN apt-get update && apt-get install -y nginx curl && rm -rf /var/lib/apt/lists/*

# Install common Python packages
RUN pip install --no-cache-dir \
    streamlit==1.32.0 \
    pandas==2.2.0 \
    numpy==1.26.3 \
    matplotlib==3.8.2 \
    boto3==1.34.34

# Create app directories and copy in the bundled app repositories
# (the deploy workflow clones them into apps/ before the build)
RUN mkdir -p /apps
COPY apps/ /apps/

# Copy Nginx configuration
COPY docker/nginx.conf /etc/nginx/nginx.conf

# Copy startup script
COPY docker/start.sh /start.sh
RUN chmod +x /start.sh

# Expose port for Nginx
EXPOSE 80

# Add a healthcheck
HEALTHCHECK --interval=30s --timeout=5s --start-period=30s --retries=3 \
    CMD curl -f http://localhost/healthz || exit 1

CMD ["/start.sh"]
```
- Create the Nginx configuration in `docker/nginx.conf`:
```nginx
worker_processes 1;

events {
    worker_connections 1024;
}

http {
    server {
        listen 80;

        # Health check endpoint
        location /healthz {
            access_log off;
            return 200 'OK';
        }

        location / {
            return 302 /app1/;
        }

        # App locations will be dynamically generated
        # during the bundle process
    }
}
```
- Create the startup script in `docker/start.sh`:
```bash
#!/bin/bash
# Start all Streamlit instances in background
# This will be dynamically populated during build

# Start Nginx in foreground
nginx -g "daemon off;"
```
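For reference, after the bundle step injects the startup commands (see the deploy workflow in Phase 2.2), the generated script might look like the following. The app names and ports here are purely illustrative:

```bash
#!/bin/bash
# Start all Streamlit instances in background
# This will be dynamically populated during build
cd /apps/streamlit-example-one && streamlit run app.py --server.port=8501 --server.baseUrlPath=/streamlit-example-one --server.enableCORS=false --server.enableXsrfProtection=false &
cd /apps/streamlit-example-two && streamlit run app.py --server.port=8502 --server.baseUrlPath=/streamlit-example-two --server.enableCORS=false --server.enableXsrfProtection=false &

# Start Nginx in foreground
nginx -g "daemon off;"
```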
### Phase 2: AWS Infrastructure Setup (Week 2)
#### 2.1. Define Terraform Configuration
Create the following Terraform files to provision AWS infrastructure:
`terraform/main.tf`:
```hcl
provider "aws" {
  region = var.aws_region
}

# Store Terraform state in S3
terraform {
  backend "s3" {
    bucket = "rutgersgrid-terraform-state"
    key    = "streamlit-apps/terraform.tfstate"
    region = "us-east-1"
  }
}

# Resource tagging module
module "tags" {
  source      = "./modules/tags"
  project     = "rutgersgrid"
  environment = var.environment
}
```
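`main.tf` references a local `./modules/tags` module that is not spelled out in this plan. A minimal sketch of what it might contain, assuming the inputs above and the `module.tags.common_tags` output used throughout `ecs.tf`, could be:

```hcl
# terraform/modules/tags/main.tf (hypothetical sketch)
variable "project" {
  type = string
}

variable "environment" {
  type = string
}

output "common_tags" {
  value = {
    Project     = var.project      # matches the "Project" tag filtered on in the cost report
    Environment = var.environment
    ManagedBy   = "terraform"
  }
}
```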
`terraform/ecs.tf`:
```hcl
# ECS Cluster
resource "aws_ecs_cluster" "streamlit_cluster" {
  name = "rutgersgrid-streamlit-cluster"

  setting {
    name  = "containerInsights"
    value = "enabled"
  }

  tags = module.tags.common_tags
}

# ECS Task Execution Role
resource "aws_iam_role" "ecs_task_execution_role" {
  name = "rutgersgrid-streamlit-execution-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "ecs-tasks.amazonaws.com"
        }
      }
    ]
  })

  tags = module.tags.common_tags
}

# Attach policies to the execution role
resource "aws_iam_role_policy_attachment" "ecs_task_execution_role_policy" {
  role       = aws_iam_role.ecs_task_execution_role.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}

# ECS Task Definition
resource "aws_ecs_task_definition" "streamlit_bundle" {
  family                   = "rutgersgrid-streamlit-bundle"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = var.task_cpu
  memory                   = var.task_memory
  execution_role_arn       = aws_iam_role.ecs_task_execution_role.arn

  container_definitions = jsonencode([
    {
      name      = "streamlit-bundle"
      image     = "${aws_ecr_repository.streamlit_apps.repository_url}:latest"
      essential = true
      portMappings = [
        {
          containerPort = 80
          hostPort      = 80
          protocol      = "tcp"
        }
      ]
      logConfiguration = {
        logDriver = "awslogs"
        options = {
          "awslogs-group"         = aws_cloudwatch_log_group.streamlit_logs.name
          "awslogs-region"        = var.aws_region
          "awslogs-stream-prefix" = "streamlit"
        }
      }
    }
  ])

  tags = module.tags.common_tags
}

# ECS Service
resource "aws_ecs_service" "streamlit_service" {
  name            = "rutgersgrid-streamlit-service"
  cluster         = aws_ecs_cluster.streamlit_cluster.id
  task_definition = aws_ecs_task_definition.streamlit_bundle.arn
  launch_type     = "FARGATE"
  desired_count   = var.service_desired_count

  network_configuration {
    subnets          = aws_subnet.public[*].id
    security_groups  = [aws_security_group.ecs_sg.id]
    assign_public_ip = true
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.streamlit_tg.arn
    container_name   = "streamlit-bundle"
    container_port   = 80
  }

  tags = module.tags.common_tags

  depends_on = [aws_lb_listener.http]
}

# Auto-scaling configuration
resource "aws_appautoscaling_target" "ecs_target" {
  max_capacity       = var.max_capacity
  min_capacity       = var.min_capacity
  resource_id        = "service/${aws_ecs_cluster.streamlit_cluster.name}/${aws_ecs_service.streamlit_service.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}

# Scale based on CPU utilization
resource "aws_appautoscaling_policy" "ecs_policy_cpu" {
  name               = "cpu-auto-scaling"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.ecs_target.resource_id
  scalable_dimension = aws_appautoscaling_target.ecs_target.scalable_dimension
  service_namespace  = aws_appautoscaling_target.ecs_target.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
    target_value = 70.0
  }
}

# Scale during business hours
resource "aws_appautoscaling_scheduled_action" "scale_up_morning" {
  name               = "scale-up-morning"
  service_namespace  = aws_appautoscaling_target.ecs_target.service_namespace
  resource_id        = aws_appautoscaling_target.ecs_target.resource_id
  scalable_dimension = aws_appautoscaling_target.ecs_target.scalable_dimension
  schedule           = "cron(0 8 ? * MON-FRI *)"

  scalable_target_action {
    min_capacity = 1
    max_capacity = var.max_capacity
  }
}

# Scale down after hours
resource "aws_appautoscaling_scheduled_action" "scale_down_evening" {
  name               = "scale-down-evening"
  service_namespace  = aws_appautoscaling_target.ecs_target.service_namespace
  resource_id        = aws_appautoscaling_target.ecs_target.resource_id
  scalable_dimension = aws_appautoscaling_target.ecs_target.scalable_dimension
  schedule           = "cron(0 18 ? * MON-FRI *)"

  scalable_target_action {
    min_capacity = 0
    max_capacity = 0
  }
}

# CloudWatch Logs
resource "aws_cloudwatch_log_group" "streamlit_logs" {
  name              = "/ecs/rutgersgrid-streamlit"
  retention_in_days = 30

  tags = module.tags.common_tags
}

# ECR Repository
resource "aws_ecr_repository" "streamlit_apps" {
  name                 = "rutgersgrid-streamlit-apps"
  image_tag_mutability = "MUTABLE"

  image_scanning_configuration {
    scan_on_push = true
  }

  tags = module.tags.common_tags
}

# ECR Lifecycle Policy
resource "aws_ecr_lifecycle_policy" "streamlit_apps_lifecycle" {
  repository = aws_ecr_repository.streamlit_apps.name

  policy = jsonencode({
    rules = [
      {
        rulePriority = 1
        description  = "Keep only the 10 most recent images"
        selection = {
          tagStatus   = "any"
          countType   = "imageCountMoreThan"
          countNumber = 10
        }
        action = {
          type = "expire"
        }
      }
    ]
  })
}
```
- Create additional Terraform files for networking, ALB, security groups, etc.
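The ECS service above references `aws_lb_target_group.streamlit_tg` and `aws_lb_listener.http`, which would live in `terraform/alb.tf`. A hedged sketch is shown below; the `aws_security_group.alb_sg`, `aws_vpc.main`, and `aws_subnet.public` names are assumptions about what `networking.tf` and `security.tf` would define:

```hcl
# terraform/alb.tf (illustrative sketch)
resource "aws_lb" "streamlit_alb" {
  name               = "rutgersgrid-streamlit-alb"   # name used by the deploy workflow's URL lookup
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb_sg.id]
  subnets            = aws_subnet.public[*].id

  tags = module.tags.common_tags
}

resource "aws_lb_target_group" "streamlit_tg" {
  name        = "rutgersgrid-streamlit-tg"
  port        = 80
  protocol    = "HTTP"
  vpc_id      = aws_vpc.main.id
  target_type = "ip" # required for Fargate tasks in awsvpc mode

  health_check {
    path    = "/healthz" # matches the Nginx health endpoint
    matcher = "200"
  }

  tags = module.tags.common_tags
}

resource "aws_lb_listener" "http" {
  load_balancer_arn = aws_lb.streamlit_alb.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.streamlit_tg.arn
  }
}
```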
#### 2.2. GitHub Actions Workflows
Create GitHub Actions workflow files:
`.github/workflows/deploy.yml`:
```yaml
name: Deploy Streamlit Apps

on:
  push:
    branches: [ main ]
    paths-ignore:
      - '*.md'
      - 'docs/**'
  # Re-deploy when an app repository sends the "app-updated" dispatch (see Phase 3.2)
  repository_dispatch:
    types: [app-updated]
  workflow_dispatch:

permissions:
  id-token: write
  contents: read

jobs:
  discover-apps:
    runs-on: ubuntu-latest
    outputs:
      app_list: ${{ steps.find-apps.outputs.app_list }}
    steps:
      - name: Checkout Infrastructure
        uses: actions/checkout@v3
        with:
          fetch-depth: 1

      - name: Find Streamlit Apps
        id: find-apps
        run: |
          # Get list of Streamlit app repositories from rutgersgrid organization
          APPS=$(gh api orgs/rutgersgrid/repos --jq '[.[] | select(.name | startswith("streamlit-")) | .name]')
          echo "app_list=$APPS" >> $GITHUB_OUTPUT
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}

  build-and-deploy:
    runs-on: ubuntu-latest
    needs: discover-apps
    steps:
      - name: Checkout Infrastructure
        uses: actions/checkout@v3

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: ${{ secrets.AWS_REGION }}

      - name: Login to Amazon ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v1

      - name: Clone App Repositories
        run: |
          mkdir -p apps
          APP_LIST='${{ needs.discover-apps.outputs.app_list }}'
          for app in $(echo $APP_LIST | jq -r '.[]'); do
            echo "Cloning $app"
            git clone https://github.com/rutgersgrid/$app.git apps/$app
          done

      - name: Generate App Configs
        run: |
          # Generate Nginx config entries
          PORT=8501
          NGINX_LOCATIONS=""
          START_COMMANDS=""
          for app_dir in apps/*; do
            if [ -d "$app_dir" ]; then
              app_name=$(basename $app_dir)
              echo "Configuring $app_name on port $PORT"

              # Add to Nginx config
              NGINX_LOCATIONS+="
          location /$app_name/ {
              proxy_pass http://localhost:$PORT/;
              proxy_http_version 1.1;
              proxy_set_header Upgrade \$http_upgrade;
              proxy_set_header Connection \"upgrade\";
              proxy_set_header Host \$host;
              proxy_cache_bypass \$http_upgrade;
          }"

              # Add to startup commands
              START_COMMANDS+="cd /apps/$app_name && streamlit run app.py --server.port=$PORT --server.baseUrlPath=/$app_name --server.enableCORS=false --server.enableXsrfProtection=false &\n"

              # Increment port for next app
              PORT=$((PORT+1))
            fi
          done

          # Update Nginx config
          sed -i "/# App locations will be dynamically generated/a\\$NGINX_LOCATIONS" docker/nginx.conf

          # Update startup script
          sed -i "/# This will be dynamically populated during build/a\\$START_COMMANDS" docker/start.sh

      - name: Build Multi-App Docker Image
        run: |
          docker build \
            -t ${{ steps.login-ecr.outputs.registry }}/rutgersgrid-streamlit-apps:latest \
            -t ${{ steps.login-ecr.outputs.registry }}/rutgersgrid-streamlit-apps:${{ github.sha }} \
            -f docker/Dockerfile .

      - name: Push Docker Image
        run: |
          docker push ${{ steps.login-ecr.outputs.registry }}/rutgersgrid-streamlit-apps:latest
          docker push ${{ steps.login-ecr.outputs.registry }}/rutgersgrid-streamlit-apps:${{ github.sha }}

      - name: Deploy to ECS
        run: |
          aws ecs update-service \
            --cluster rutgersgrid-streamlit-cluster \
            --service rutgersgrid-streamlit-service \
            --force-new-deployment

      - name: Wait for Deployment
        run: |
          aws ecs wait services-stable \
            --cluster rutgersgrid-streamlit-cluster \
            --services rutgersgrid-streamlit-service

      - name: Output Service URL
        run: |
          ALB_DNS=$(aws elbv2 describe-load-balancers \
            --names rutgersgrid-streamlit-alb \
            --query 'LoadBalancers[0].DNSName' \
            --output text)
          echo "Streamlit applications deployed to: http://$ALB_DNS/"
```
`.github/workflows/cost-report.yml`:
```yaml
name: Generate Cost Report

on:
  schedule:
    - cron: '0 0 * * MON' # Run every Monday
  workflow_dispatch:

permissions:
  id-token: write
  contents: read

jobs:
  cost-analysis:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v3

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: ${{ secrets.AWS_REGION }}

      - name: Generate Cost Report
        run: |
          # The report script imports boto3 and tabulate, which are not on the runner by default
          pip install boto3 tabulate
          python scripts/generate-cost-report.py > cost-report.md

      - name: Upload Cost Report
        uses: actions/upload-artifact@v3
        with:
          name: cost-report
          path: cost-report.md
```
### Phase 3: CI/CD Integration (Week 3)
#### 3.1. Create App Registration System
Develop a system to register Streamlit apps for deployment:
- Add a `release-metadata.json` file to each Streamlit app repository:
```json
{
  "name": "app-name",
  "type": "streamlit",
  "description": "Description of the app",
  "route": "/app-name/",
  "version": "1.0.0",
  "tags": ["tag1", "tag2"]
}
```
- Create a script to gather this metadata during deployment.
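A minimal sketch of such a script is shown below. It assumes the cloned apps live under `apps/` (as in the deploy workflow) and writes a combined `app-manifest.json`; both of those names are illustrative, not part of the plan:

```python
#!/usr/bin/env python3
"""Collect release-metadata.json files from cloned app repositories.

Illustrative sketch only: the apps/ layout and the manifest output path
are assumptions made for this example.
"""
import json
from pathlib import Path


def gather_metadata(apps_dir: str = "apps", output: str = "app-manifest.json") -> list[dict]:
    """Read each apps/<repo>/release-metadata.json and write a combined manifest."""
    manifest = []
    for metadata_file in sorted(Path(apps_dir).glob("*/release-metadata.json")):
        with metadata_file.open() as f:
            entry = json.load(f)
        # Record which repository the metadata came from
        entry["repository"] = metadata_file.parent.name
        manifest.append(entry)

    with open(output, "w") as f:
        json.dump(manifest, f, indent=2)
    return manifest


if __name__ == "__main__":
    apps = gather_metadata()
    print(f"Collected metadata for {len(apps)} app(s)")
```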
#### 3.2. Update Template Repository with CI/CD
Add GitHub Actions workflow to the template repository:
```yaml
name: CI/CD Pipeline

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
          pip install pytest

      - name: Run tests
        run: |
          pytest

  build:
    needs: test
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Build Docker image
        run: |
          docker build -t streamlit-app:${{ github.sha }} .

      - name: Test Docker image
        run: |
          docker run -d -p 8501:8501 --name test-app streamlit-app:${{ github.sha }}
          sleep 10
          # -f makes the step fail if the health endpoint does not return success
          curl -sf http://localhost:8501/_stcore/health

      - name: Trigger infrastructure update
        uses: peter-evans/repository-dispatch@v2
        with:
          token: ${{ secrets.REPO_ACCESS_TOKEN }}
          repository: rutgersgrid/rutgersgrid-infrastructure
          event-type: app-updated
          client-payload: '{"app_name": "${{ github.repository }}", "commit_sha": "${{ github.sha }}"}'
```
#### 3.3. Cost Optimization Script
Create a script to analyze costs and provide optimization recommendations:
```python
#!/usr/bin/env python3
"""
AWS Cost Analysis Script for rutgersgrid Streamlit Applications
"""
import boto3
import datetime
import json
from tabulate import tabulate


def get_date_range():
    """Get date range for the past 30 days"""
    end_date = datetime.datetime.now()
    start_date = end_date - datetime.timedelta(days=30)
    return start_date.strftime('%Y-%m-%d'), end_date.strftime('%Y-%m-%d')


def analyze_costs():
    """Analyze AWS costs for the past 30 days"""
    client = boto3.client('ce')
    start_date, end_date = get_date_range()

    # Get costs grouped by service
    response = client.get_cost_and_usage(
        TimePeriod={
            'Start': start_date,
            'End': end_date
        },
        Granularity='MONTHLY',
        Metrics=['UnblendedCost'],
        GroupBy=[
            {
                'Type': 'DIMENSION',
                'Key': 'SERVICE'
            }
        ],
        Filter={
            'Tags': {
                'Key': 'Project',
                'Values': ['rutgersgrid']
            }
        }
    )

    # Process results
    cost_by_service = []
    total_cost = 0.0
    for result in response['ResultsByTime'][0]['Groups']:
        service = result['Keys'][0]
        amount = float(result['Metrics']['UnblendedCost']['Amount'])
        total_cost += amount
        cost_by_service.append([service, f"${amount:.2f}"])

    # Sort by cost (descending)
    cost_by_service.sort(key=lambda x: float(x[1].replace('$', '')), reverse=True)

    # Add total
    cost_by_service.append(['TOTAL', f"${total_cost:.2f}"])

    # Generate markdown report
    print(f"# Rutgersgrid Streamlit Cost Report\n")
    print(f"## Cost Analysis ({start_date} to {end_date})\n")
    print(tabulate(cost_by_service, headers=['Service', 'Cost'], tablefmt='pipe'))

    # Get daily trend
    daily_response = client.get_cost_and_usage(
        TimePeriod={
            'Start': start_date,
            'End': end_date
        },
        Granularity='DAILY',
        Metrics=['UnblendedCost'],
        Filter={
            'Tags': {
                'Key': 'Project',
                'Values': ['rutgersgrid']
            }
        }
    )

    # Generate daily cost data for trends
    days = []
    costs = []
    for result in daily_response['ResultsByTime']:
        day = result['TimePeriod']['Start']
        cost = float(result['Total']['UnblendedCost']['Amount'])
        days.append(day)
        costs.append(cost)

    print("\n## Daily Cost Trend\n")
    # Render the daily costs as a table so the trend is visible in the report
    print(tabulate([[d, f"${c:.2f}"] for d, c in zip(days, costs)],
                   headers=['Date', 'Cost'], tablefmt='pipe'))

    # Calculate monthly forecast
    days_in_month = 30
    days_passed = len(days)
    forecast = total_cost * (days_in_month / days_passed)
    print(f"\n## Monthly Forecast\n")
    print(f"Projected monthly cost: ${forecast:.2f}\n")

    # Optimization recommendations
    print("\n## Optimization Recommendations\n")
    if total_cost > 100:
        print("- Consider implementing application bundling to reduce container costs")
        print("- Verify the auto-scaling settings to ensure instances scale to zero after hours")
        print("- Review CloudWatch logs and metrics for optimization opportunities")
        print("- Consider using Fargate Spot for non-critical workloads")


if __name__ == "__main__":
    analyze_costs()
```
### Phase 4: Testing and Deployment (Week 4)
#### 4.1. Local Testing
Test the containerization locally (example commands follow this list):
- Build individual app containers
- Test the multi-app container with Nginx routing
- Verify the auto-scaling configuration
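For example (image, container, and app-route names here are illustrative, not prescribed by the plan):

```bash
# Build and smoke-test a single app image from the template repository
docker build -t streamlit-app:dev .
docker run -d --name app-test -p 8501:8501 streamlit-app:dev
curl -f http://localhost:8501/_stcore/health
docker rm -f app-test

# Build the multi-app bundle from the infrastructure repository
# (assumes apps/ has already been populated, e.g. with scripts/bundle-apps.sh)
docker build -f docker/Dockerfile -t streamlit-bundle:dev .
docker run -d --name bundle-test -p 8080:80 streamlit-bundle:dev
curl -f http://localhost:8080/healthz                 # Nginx health endpoint
curl -I http://localhost:8080/streamlit-example-one/  # hypothetical app route
docker rm -f bundle-test
```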
#### 4.2. AWS Deployment
Deploy to AWS:
- Create AWS resources using Terraform (see the example commands after this list)
- Deploy the initial container
- Set up monitoring and alerting
- Test full deployment pipeline
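The Terraform provisioning step would typically look like this, run from the `terraform/` directory (the `environment` value is only an example of the variable declared in `variables.tf`, and the S3 state bucket from `main.tf` must already exist):

```bash
cd terraform

# Initialize the S3 backend and download providers
terraform init

# Review the planned changes for the target environment
terraform plan -var="environment=prod" -out=tfplan

# Apply the reviewed plan
terraform apply tfplan
```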
#### 4.3. Documentation
Update documentation:
- Add Docker usage instructions to each repository
- Create deployment guide
- Document cost optimization strategy
- Create troubleshooting guide
## Cost Optimization Measures
Based on your cost analysis documents, implement these optimizations:
- **Application Bundling**: Bundle multiple Streamlit apps in a single container
- **Auto-Scaling**: Scale to zero during off-hours (evenings and weekends)
- **Resource Sizing**: Start with minimal resources (0.25 vCPU, 0.5 GB RAM)
- **Use Fargate Spot**: For non-critical workloads, saving roughly 70% (see the sketch after this list)
- **Shared Resources**: Single ALB for all applications
- **CloudFront Integration**: For static asset caching
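If Fargate Spot is adopted, the ECS service defined in `ecs.tf` would swap its `launch_type` for a capacity provider strategy along these lines. This is a hedged sketch of the change, not part of the Terraform shown above:

```hcl
# Register both Fargate capacity providers on the existing cluster
resource "aws_ecs_cluster_capacity_providers" "streamlit" {
  cluster_name       = aws_ecs_cluster.streamlit_cluster.name
  capacity_providers = ["FARGATE", "FARGATE_SPOT"]
}

# Inside aws_ecs_service.streamlit_service, replace `launch_type = "FARGATE"`
# with a strategy that prefers Spot but keeps some on-demand capacity:
#
#   capacity_provider_strategy {
#     capacity_provider = "FARGATE_SPOT"
#     weight            = 3
#   }
#
#   capacity_provider_strategy {
#     capacity_provider = "FARGATE"
#     weight            = 1
#   }
```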
## Estimated AWS Costs
Based on your "Annual Cost Analysis" document:

| Scale | Monthly Cost | Annual Cost |
|---|---|---|
| 1-5 Apps | $28-40 | $336-480 |
| 10 Apps | ~$89 | ~$1,070 |
| Fully Optimized (10 Apps) | ~$52 | ~$624 |
Cost optimization can provide savings of 30-66% depending on scale.
## Next Steps
After initial implementation:
- **Monitor Usage**: Track application usage patterns
- **Refine Scaling**: Adjust auto-scaling based on actual usage
- **Cost Tracking**: Implement detailed cost allocation tagging
- **Performance Optimization**: Monitor and optimize performance
## Streamlit Docker & AWS Implementation - User Stories
### Epic 1: Docker Configuration and Containerization
- As a developer, I want to have a standardized Docker template for Streamlit applications, so that I can consistently containerize any Streamlit app in the VIDA project.
- As a developer, I want to create a multi-app Docker container architecture with Nginx, so that I can efficiently bundle multiple Streamlit applications into a single deployment.
- As a DevOps engineer, I want to set up a central infrastructure repository, so that all deployment configurations and scripts are maintained in one place.
### Epic 2: AWS Infrastructure Setup
- As a cloud architect, I want to define AWS infrastructure using Terraform, so that cloud resources can be provisioned in a controlled, version-controlled manner.
- As a system administrator, I want to configure auto-scaling for ECS services, so that resources scale based on demand and reduce costs during off-hours.
- As a DevOps engineer, I want to set up a secure ECS cluster with appropriate IAM roles, so that the container environment follows security best practices.
- As a system administrator, I want to configure a load balancer for the containerized applications, so that traffic is properly distributed and the system is resilient.
### Epic 3: CI/CD Integration
- As a developer, I want to implement an app registration system, so that new Streamlit applications can be easily added to the deployment pipeline.
- As a DevOps engineer, I want to create GitHub Actions workflows for automated deployment, so that changes to applications trigger appropriate infrastructure updates.
- As a project manager, I want to implement a cost analysis and reporting system, so that AWS expenses are tracked and optimization opportunities are identified.
- As a developer, I want to update the template repository with CI/CD configuration, so that new applications automatically integrate with the deployment pipeline.
### Epic 4: Testing and Deployment
- As a QA engineer, I want to test the container architecture locally, so that issues can be identified before cloud deployment.
- As a DevOps engineer, I want to deploy the complete solution to AWS, so that the infrastructure is ready for production use.
- As a technical writer, I want to create comprehensive documentation for the system, so that team members can understand, use, and maintain the deployment infrastructure.
- As a system administrator, I want to set up monitoring and alerting, so that issues can be quickly identified and addressed.
### Epic 5: Cost Optimization
- As a finance manager, I want to implement recommended cost optimization measures, so that cloud expenses are minimized without compromising performance.
- As a system administrator, I want to implement CloudFront for static asset caching, so that delivery of static content is optimized for performance and cost.
- As a DevOps engineer, I want to set up detailed cost allocation tagging, so that expenses can be tracked at a granular level.