Creating Docker containers for deploying Python scripts - Observatorio-do-Trabalho-de-Pernambuco/documentation GitHub Wiki
This wiki provides a comprehensive guide for developing Docker containers for AWS Lambda functions and local testing using the templates in the infraestrutura-de-dados repository.
- Overview
- Container Types
- File Structure
- Core Files Explained
- Development Workflow
- Docker Commands
- Testing Locally
- Deployment to AWS Lambda
- Best Practices
- Troubleshooting
This repository contains two types of Docker containers:
- Template Containers (containers/template/): For local development and testing
- Lambda Containers (containers/lambda/): For AWS Lambda deployment
Template Containers:
- Purpose: Local development, testing, and debugging
- Base Image: python:3.11-slim
- Entry Point: Command-line interface with main.py
- Use Case: Interactive development and batch processing
Lambda Containers:
- Purpose: AWS Lambda deployment
- Base Image: public.ecr.aws/lambda/python:3.11
- Entry Point: Lambda handler function
- Use Case: Serverless execution in AWS
containers/
├── template/
│   ├── ftp_to_s3/
│   │   ├── Dockerfile
│   │   ├── main.py
│   │   ├── requirements.txt
│   │   └── README.md
│   ├── list_ftp_files/
│   │   ├── Dockerfile
│   │   ├── main.py
│   │   ├── requirements.txt
│   │   └── README.md
│   └── s3_to_parquet/
│       ├── Dockerfile
│       ├── main.py
│       ├── requirements.txt
│       └── README.md
└── lambda/
    ├── ftp_to_s3/
    │   ├── Dockerfile
    │   ├── lambda_handler.py
    │   ├── requirements.txt
    │   └── README.md
    ├── list_ftp_files/
    │   ├── Dockerfile
    │   ├── lambda_handler.py
    │   ├── requirements.txt
    │   └── README.md
    └── s3_to_parquet/
        ├── Dockerfile
        ├── lambda_handler.py
        ├── requirements.txt
        ├── README.md
        ├── test_payload.json
        └── build.sh
The Dockerfile defines how to build the container image. It specifies the base image, dependencies, and runtime configuration.
# Use Python 3.12 slim image for smaller size and efficiency
FROM python:3.12-slim
# Set working directory
WORKDIR /app
# Install system dependencies
RUN apt-get update && apt-get install -y \
gcc \
g++ \
&& rm -rf /var/lib/apt/lists/*
# Copy requirements and install Python dependencies
COPY containers/template/your_function/requirements.txt /app/
RUN pip install --no-cache-dir -r requirements.txt
# Copy the source code from the project root
COPY src/ /app/src/
# Copy the main.py script
COPY containers/template/your_function/main.py /app/
# Create a non-root user for security
RUN useradd -m -u 1000 appuser && chown -R appuser:appuser /app
USER appuser
# Set environment variables
ENV AWS_DEFAULT_REGION=us-east-1
ENV PYTHONPATH=/app/src
# Set the entrypoint to run main.py with arguments
ENTRYPOINT ["python", "/app/main.py"]
# Default CMD (can be overridden)
CMD ["--help"]
# Use AWS Lambda Python runtime as base image
FROM public.ecr.aws/lambda/python:3.11
# Install system dependencies
RUN yum update -y && yum install -y \
gcc \
gcc-c++ \
&& yum clean all
# Copy requirements and install Python dependencies
COPY containers/lambda/your_function/requirements.txt ${LAMBDA_TASK_ROOT}/
RUN pip install --no-cache-dir -r requirements.txt
# Copy the source code from the project root
COPY src/ ${LAMBDA_TASK_ROOT}/src/
# Copy the Lambda handler
COPY containers/lambda/your_function/lambda_handler.py ${LAMBDA_TASK_ROOT}/
# Set the Lambda handler
CMD ["lambda_handler.lambda_handler"]
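The `CMD` value follows a `module.function` form: the part before the last dot names the Python file (`lambda_handler.py`) and the part after it names the function inside that file. The sketch below illustrates that lookup only; it is not the Lambda runtime's actual code, `resolve_handler` is a hypothetical helper, and `json.dumps` stands in for a real handler so the example runs anywhere.

```python
import importlib


def resolve_handler(handler: str):
    """Split 'module.function' on the last dot and return the callable."""
    module_name, _, function_name = handler.rpartition(".")
    if not module_name or not function_name:
        raise ValueError(f"Expected 'module.function', got {handler!r}")
    module = importlib.import_module(module_name)
    return getattr(module, function_name)


# Stand-in example: a stdlib module is used instead of lambda_handler.py
handler = resolve_handler("json.dumps")
print(handler({"ok": True}))
```

If the file or function is renamed, the `CMD` string has to change with it, otherwise the container starts but the function cannot be located.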
Lists all Python dependencies needed for the function to run.
boto3>=1.36.0
s3fs==2024.6.0
fsspec==2024.6.0
polars>=1.22.0
py7zr>=0.22.0
tqdm>=4.67.1
google-cloud-bigquery>=3.31.0
Best Practices:
- Pin major versions for stability
- Use >= for core libraries, == for specific versions when needed
- Keep dependencies minimal to reduce image size
- Test with exact versions in production
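To see at a glance which dependencies are pinned exactly and which only have a lower bound, a small check like the one below can help. It is a rough sketch: `classify_requirements` is a hypothetical helper, and the regular expression is a simplification that ignores extras, environment markers, and multiple constraints on one line.

```python
import re

# Matches "name<op>version"; a simplification that ignores extras and markers.
REQUIREMENT = re.compile(r"^([A-Za-z0-9_.-]+)\s*(==|>=|<=|~=|!=|<|>)?\s*(\S+)?$")


def classify_requirements(text: str) -> dict:
    """Group package names by how tightly each line constrains the version."""
    groups = {"pinned": [], "ranged": [], "unconstrained": []}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        match = REQUIREMENT.match(line)
        if match is None:
            continue  # skip lines this simple pattern cannot parse
        name, operator, _version = match.groups()
        if operator == "==":
            groups["pinned"].append(name)
        elif operator:
            groups["ranged"].append(name)
        else:
            groups["unconstrained"].append(name)
    return groups


sample = """
boto3>=1.36.0
s3fs==2024.6.0
fsspec==2024.6.0
polars>=1.22.0
"""
print(classify_requirements(sample))
```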
#!/usr/bin/env python3
"""
Main entry point for [Function Name] functionality.
This script accepts command line arguments and executes the [function_name] function.
"""
import argparse
import logging
import sys
# Add the src directory to the Python path
sys.path.append("/app/src")
from extract_and_load import your_function
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
    handlers=[logging.StreamHandler(sys.stdout)],
)


def main():
    """Main function to parse arguments and execute [Function Name]."""
    parser = argparse.ArgumentParser(
        description="[Description of what the function does]",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
python main.py --param1 value1 --param2 value2
""",
    )
    # Add your arguments here
    parser.add_argument("--param1", required=True, help="Description of param1")
    parser.add_argument("--param2", required=True, help="Description of param2")
    args = parser.parse_args()

    # Log the operation details
    logging.info("Starting [Function Name] operation")
    logging.info(f"Param1: {args.param1}")
    logging.info(f"Param2: {args.param2}")

    try:
        # Execute the function
        success, error_message = your_function(
            param1=args.param1,
            param2=args.param2,
        )
        if success:
            logging.info("✅ Operation completed successfully!")
            sys.exit(0)
        else:
            logging.error(f"❌ Operation failed: {error_message}")
            sys.exit(1)
    except Exception as e:
        logging.error(f"❌ Unexpected error: {str(e)}")
        sys.exit(1)


if __name__ == "__main__":
    main()
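main.py unpacks the result of `your_function` into `success, error_message`, so whatever lives in `extract_and_load` needs to return a two-element tuple. The contents of that module are not shown here, so the following is only a hypothetical stand-in that illustrates the expected return shape, not the repository's implementation.

```python
from typing import Optional, Tuple


def your_function(param1: str, param2: str) -> Tuple[bool, Optional[str]]:
    """Hypothetical stand-in for a function in src/extract_and_load."""
    if not param1 or not param2:
        return False, "param1 and param2 must be non-empty"
    try:
        # Real work (FTP download, S3 upload, ...) would go here.
        _ = f"{param1}:{param2}"
    except Exception as exc:
        # Report failures through the return value instead of raising
        return False, str(exc)
    return True, None


print(your_function("value1", "value2"))
print(your_function("", "value2"))
```

Returning the error as a string keeps the entry point in charge of logging and of choosing the exit code.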
import json
import logging
import os
import sys
# Add the src directory to the Python path
if os.path.exists("/var/task/src"):
    # Running in Lambda
    sys.path.append("/var/task/src")
else:
    # Running locally
    sys.path.append("../../src")

from extract_and_load import your_function

# Configure logging for Lambda
logger = logging.getLogger()
logger.setLevel(logging.INFO)


def lambda_handler(event, context):
    """
    AWS Lambda handler for [Function Name] functionality.

    Expected event structure:
    {
        "param1": "value1",
        "param2": "value2"
    }

    Returns:
    {
        "statusCode": 200 or 500,
        "body": JSON string with success message or error details
    }
    """
    try:
        # Extract parameters from the event
        param1 = event.get("param1")
        param2 = event.get("param2")

        # Validate required parameters
        required_params = {
            "param1": param1,
            "param2": param2,
        }
        missing_params = [
            param for param, value in required_params.items() if not value
        ]
        if missing_params:
            error_msg = f"Missing required parameters: {', '.join(missing_params)}"
            logger.error(error_msg)
            return {
                "statusCode": 400,
                "body": json.dumps({
                    "success": False,
                    "error": error_msg,
                    "required_parameters": list(required_params.keys()),
                }),
            }

        # Log the operation details
        logger.info("Starting [Function Name] operation")
        logger.info(f"Param1: {param1}")
        logger.info(f"Param2: {param2}")

        # Execute the function
        success, error_message = your_function(
            param1=param1,
            param2=param2,
        )
        if success:
            logger.info("✅ Operation completed successfully!")
            return {
                "statusCode": 200,
                "body": json.dumps({
                    "success": True,
                    "message": "Operation completed successfully",
                    "param1": param1,
                    "param2": param2,
                }),
            }
        else:
            logger.error(f"❌ Operation failed: {error_message}")
            return {
                "statusCode": 500,
                "body": json.dumps({
                    "success": False,
                    "error": error_message,
                    "param1": param1,
                    "param2": param2,
                }),
            }
    except Exception as e:
        error_msg = f"[Function Name] failed: {str(e)}"
        logger.error(error_msg, exc_info=True)
        return {
            "statusCode": 500,
            "body": json.dumps({
                "success": False,
                "error": error_msg,
                "param1": param1 if "param1" in locals() else None,
                "param2": param2 if "param2" in locals() else None,
            }),
        }
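Because the handler serializes `body` with `json.dumps`, a caller receives a JSON string inside a dictionary and has to decode it before reading `success` or `error`. A minimal sketch of that decoding step follows; `parse_handler_response` is a hypothetical helper, and the sample mirrors the handler's missing-parameter branch.

```python
import json


def parse_handler_response(response: dict) -> dict:
    """Decode the JSON string in 'body' and attach the status code."""
    payload = json.loads(response["body"])
    payload["statusCode"] = response["statusCode"]
    return payload


# A response shaped like the handler's missing-parameter branch
sample = {
    "statusCode": 400,
    "body": json.dumps({
        "success": False,
        "error": "Missing required parameters: param2",
        "required_parameters": ["param1", "param2"],
    }),
}
result = parse_handler_response(sample)
print(result["statusCode"], result["success"], result["error"])
```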
- Create Function Directory
  mkdir -p containers/template/your_function
  mkdir -p containers/lambda/your_function
- Create requirements.txt
  - List all Python dependencies
  - Use appropriate version constraints
- Create main.py (Template)
  - Implement command-line argument parsing
  - Add proper logging
  - Handle errors gracefully
  - Return appropriate exit codes
- Create lambda_handler.py (Lambda)
  - Implement Lambda event handling
  - Add parameter validation
  - Return proper HTTP status codes
  - Include comprehensive error handling
- Create Dockerfile
  - Choose appropriate base image
  - Install system dependencies
  - Copy application files
  - Set up environment variables
  - Configure entry point
- Test Locally
  - Build the Docker image
  - Run with test data
  - Verify functionality
- Deploy to Lambda (if applicable)
  - Build for Lambda runtime
  - Push to ECR
  - Create/update Lambda function
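The first steps of this workflow (creating both directories and their files) can be scripted. The sketch below assumes the file names shown in the file structure earlier on this page; `scaffold_function` is a hypothetical helper that only creates empty placeholders, which still need to be filled in by hand.

```python
from pathlib import Path

TEMPLATE_FILES = ["Dockerfile", "main.py", "requirements.txt", "README.md"]
LAMBDA_FILES = ["Dockerfile", "lambda_handler.py", "requirements.txt", "README.md"]


def scaffold_function(repo_root: Path, name: str) -> list:
    """Create both container directories with empty placeholder files."""
    created = []
    layout = {"template": TEMPLATE_FILES, "lambda": LAMBDA_FILES}
    for kind, filenames in layout.items():
        directory = repo_root / "containers" / kind / name
        directory.mkdir(parents=True, exist_ok=True)
        for filename in filenames:
            path = directory / filename
            path.touch(exist_ok=True)  # does not overwrite existing content
            created.append(path)
    return created
```

Calling `scaffold_function(Path("."), "your_function")` from the repository root would create `containers/template/your_function/` and `containers/lambda/your_function/`.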
# Build template container
docker build -f containers/template/your_function/Dockerfile -t your-function:latest .
# Build with specific tag
docker build -f containers/template/your_function/Dockerfile -t your-function:v1.0 .
# Build Lambda container for AMD64 (recommended for AWS)
docker buildx build --platform linux/amd64 -t your-function-lambda-amd64 -f containers/lambda/your_function/Dockerfile . --load
# Build for local testing (current architecture)
docker build -f containers/lambda/your_function/Dockerfile -t your-function-lambda .
# Basic run
docker run --rm your-function:latest --help
# Run with AWS credentials
docker run --rm \
-v $HOME/.aws:/home/appuser/.aws:ro \
-e AWS_PROFILE=${AWS_PROFILE:-default} \
your-function:latest \
--param1 value1 \
--param2 value2
# Run with environment variables
docker run --rm \
-e AWS_ACCESS_KEY_ID=your_key \
-e AWS_SECRET_ACCESS_KEY=your_secret \
-e AWS_DEFAULT_REGION=us-east-1 \
your-function:latest \
--param1 value1 \
--param2 value2
# Run Lambda container locally
docker run -p 9000:8080 your-function-lambda
# Test with curl (in another terminal)
curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{
"param1": "value1",
"param2": "value2"
}'
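The same invocation can be made from Python, which is convenient when the test event is built programmatically. This is a sketch using only the standard library; `build_invocation` and `invoke` are hypothetical helper names, and the URL assumes the `-p 9000:8080` port mapping used above.

```python
import json
import urllib.request

# Endpoint exposed by the container when mapped to local port 9000
INVOKE_URL = "http://localhost:9000/2015-03-31/functions/function/invocations"


def build_invocation(event: dict, url: str = INVOKE_URL) -> urllib.request.Request:
    """Build the POST request without sending it."""
    return urllib.request.Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def invoke(event: dict, timeout: float = 30.0) -> dict:
    """Send the event to the running container and decode the JSON reply."""
    with urllib.request.urlopen(build_invocation(event), timeout=timeout) as reply:
        return json.loads(reply.read().decode("utf-8"))
```

With the container running in another terminal, `invoke({"param1": "value1", "param2": "value2"})` should return the dictionary produced by the handler.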
# List all images
docker images
# Remove specific image
docker rmi your-function:latest
# Remove all unused images
docker image prune -a
# View image details
docker inspect your-function:latest
# View image history
docker history your-function:latest
- Build the image
  docker build -f containers/template/your_function/Dockerfile -t your-function:latest .
- Test with help
  docker run --rm your-function:latest --help
- Test with sample data
  docker run --rm \
    -v $HOME/.aws:/home/appuser/.aws:ro \
    -e AWS_PROFILE=default \
    your-function:latest \
    --param1 "test_value" \
    --param2 "another_value"
- Test with different parameters
  # Test error handling
  docker run --rm your-function:latest --param1 "" --param2 "value"
  # Test with file paths
  docker run --rm \
    -v $(pwd)/test_data:/app/test_data:ro \
    your-function:latest \
    --param1 "/app/test_data/file.txt" \
    --param2 "value"
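When the same container is tested with many parameter combinations, assembling the `docker run` command in code avoids retyping it. The following is a sketch under the assumption that credentials are mounted from `~/.aws` as in the steps above; `docker_run_command` and `run_container` are hypothetical helpers.

```python
import subprocess
from pathlib import Path


def docker_run_command(image: str, params: dict, aws_profile: str = "default") -> list:
    """Assemble the docker run argument list used in the steps above."""
    command = [
        "docker", "run", "--rm",
        "-v", f"{Path.home() / '.aws'}:/home/appuser/.aws:ro",
        "-e", f"AWS_PROFILE={aws_profile}",
        image,
    ]
    for name, value in params.items():
        command += [f"--{name}", str(value)]
    return command


def run_container(image: str, params: dict) -> int:
    """Run the container and return the exit code reported by docker."""
    return subprocess.run(docker_run_command(image, params)).returncode
```

Passing the arguments as a list rather than a shell string means values with spaces or quotes need no extra escaping.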
Prerequisites:
- AWS CLI configured
  aws configure
- ECR repository created
  aws ecr create-repository --repository-name your-function-lambda
- IAM role with Lambda permissions
Deployment steps:
- Authenticate to ECR
  aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin <account-id>.dkr.ecr.us-east-1.amazonaws.com
- Build for Lambda
  docker buildx build --platform linux/amd64 -t your-function-lambda-amd64 -f containers/lambda/your_function/Dockerfile . --load
- Tag for ECR
  docker tag your-function-lambda-amd64:latest <account-id>.dkr.ecr.us-east-1.amazonaws.com/your-function-lambda:latest
- Push to ECR
  docker push <account-id>.dkr.ecr.us-east-1.amazonaws.com/your-function-lambda:latest
- Create Lambda function
  aws lambda create-function \
    --function-name your-function \
    --package-type Image \
    --code ImageUri=<account-id>.dkr.ecr.us-east-1.amazonaws.com/your-function-lambda:latest \
    --role arn:aws:iam::<account-id>:role/your-lambda-role \
    --timeout 900 \
    --memory-size 512
- Update existing function
  aws lambda update-function-code \
    --function-name your-function \
    --image-uri <account-id>.dkr.ecr.us-east-1.amazonaws.com/your-function-lambda:latest
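The same image URI appears in the tag, push, create, and update commands, so a typo in any one of them points the function at the wrong image. One way to reduce that risk is to build the URI once. The sketch below is only an illustration; `ecr_image_uri` is a hypothetical helper and the account ID in the example is a placeholder.

```python
def ecr_image_uri(account_id: str, region: str, repository: str, tag: str = "latest") -> str:
    """Compose <account-id>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>."""
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError("AWS account IDs are 12 digits")
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


# Placeholder account ID, for illustration only
uri = ecr_image_uri("123456789012", "us-east-1", "your-function-lambda")
print(uri)
```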