Creating Docker containers for deploying Python scripts - Observatorio-do-Trabalho-de-Pernambuco/documentation GitHub Wiki

Docker Container Development

This wiki page explains how to develop Docker containers for AWS Lambda functions and for local testing, using the templates in the infraestrutura-de-dados repository.

Table of Contents

  1. Overview
  2. Container Types
  3. File Structure
  4. Core Files Explained
  5. Development Workflow
  6. Docker Commands
  7. Testing Locally
  8. Deployment to AWS Lambda
  9. Best Practices
  10. Troubleshooting

Overview

This repository contains two types of Docker containers:

  • Template Containers (containers/template/): For local development and testing
  • Lambda Containers (containers/lambda/): For AWS Lambda deployment

Container Types

Template Containers

  • Purpose: Local development, testing, and debugging
  • Base Image: python:3.12-slim
  • Entry Point: Command-line interface with main.py
  • Use Case: Interactive development and batch processing

Lambda Containers

  • Purpose: AWS Lambda deployment
  • Base Image: public.ecr.aws/lambda/python:3.11
  • Entry Point: Lambda handler function
  • Use Case: Serverless execution in AWS

File Structure

containers/
├── template/
│   ├── ftp_to_s3/
│   │   ├── Dockerfile
│   │   ├── main.py
│   │   ├── requirements.txt
│   │   └── README.md
│   ├── list_ftp_files/
│   │   ├── Dockerfile
│   │   ├── main.py
│   │   ├── requirements.txt
│   │   └── README.md
│   └── s3_to_parquet/
│       ├── Dockerfile
│       ├── main.py
│       ├── requirements.txt
│       └── README.md
└── lambda/
    ├── ftp_to_s3/
    │   ├── Dockerfile
    │   ├── lambda_handler.py
    │   ├── requirements.txt
    │   └── README.md
    ├── list_ftp_files/
    │   ├── Dockerfile
    │   ├── lambda_handler.py
    │   ├── requirements.txt
    │   └── README.md
    └── s3_to_parquet/
        ├── Dockerfile
        ├── lambda_handler.py
        ├── requirements.txt
        ├── README.md
        ├── test_payload.json
        └── build.sh

Core Files Explained

1. Dockerfile

The Dockerfile defines how to build the container image. It specifies the base image, dependencies, and runtime configuration.

Template Dockerfile Pattern:

# Use Python 3.12 slim image for smaller size and efficiency
FROM python:3.12-slim

# Set working directory
WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \
    gcc \
    g++ \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements and install Python dependencies
COPY containers/template/your_function/requirements.txt /app/
RUN pip install --no-cache-dir -r requirements.txt

# Copy the source code from the project root
COPY src/ /app/src/

# Copy the main.py script
COPY containers/template/your_function/main.py /app/

# Create a non-root user for security
RUN useradd -m -u 1000 appuser && chown -R appuser:appuser /app
USER appuser

# Set environment variables
ENV AWS_DEFAULT_REGION=us-east-1
ENV PYTHONPATH=/app/src

# Set the entrypoint to run main.py with arguments
ENTRYPOINT ["python", "/app/main.py"]

# Default CMD (can be overridden)
CMD ["--help"]

Lambda Dockerfile Pattern:

# Use AWS Lambda Python runtime as base image
FROM public.ecr.aws/lambda/python:3.11

# Install system dependencies
RUN yum update -y && yum install -y \
    gcc \
    gcc-c++ \
    && yum clean all

# Copy requirements and install Python dependencies
COPY containers/lambda/your_function/requirements.txt ${LAMBDA_TASK_ROOT}/
RUN pip install --no-cache-dir -r requirements.txt

# Copy the source code from the project root
COPY src/ ${LAMBDA_TASK_ROOT}/src/

# Copy the Lambda handler
COPY containers/lambda/your_function/lambda_handler.py ${LAMBDA_TASK_ROOT}/

# Set the Lambda handler
CMD ["lambda_handler.lambda_handler"]

2. requirements.txt

Lists all Python dependencies needed for the function to run.

Example requirements.txt:

boto3>=1.36.0
s3fs==2024.6.0
fsspec==2024.6.0
polars>=1.22.0
py7zr>=0.22.0
tqdm>=4.67.1
google-cloud-bigquery>=3.31.0

Best Practices:

  • Pin major versions for stability
  • Use >= for core libraries and == only where a specific version is required
  • Keep dependencies minimal to reduce image size
  • Test against the exact versions that will run in production
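
The version-constraint guidance above can be checked mechanically before an image is built. The following is a minimal sketch, assuming a plain requirements.txt with at most one version specifier per line; the function name classify_requirements is hypothetical and not part of the repository.

```python
import re

# Matches a package name, an optional comparison operator, and an optional
# version. This is a simplification: extras, environment markers, and
# multiple constraints on one line are not handled.
_REQ = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*(==|>=|<=|~=|!=|>|<)?\s*([^\s#;,]+)?")


def classify_requirements(text):
    """Group requirement names into exact pins, ranges, and unconstrained."""
    groups = {"exact": [], "range": [], "unconstrained": []}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and pip options such as "-r other.txt".
        if not line or line.startswith(("#", "-")):
            continue
        match = _REQ.match(line)
        if match is None:
            continue
        name, operator, _version = match.groups()
        if operator == "==":
            groups["exact"].append(name)
        elif operator is None:
            groups["unconstrained"].append(name)
        else:
            groups["range"].append(name)
    return groups


example = """\
boto3>=1.36.0
s3fs==2024.6.0
polars>=1.22.0
tqdm
"""
print(classify_requirements(example))
```

Entries reported as unconstrained are the ones most likely to change between builds, so they are reasonable candidates for an explicit constraint.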

3. main.py (Template) / lambda_handler.py (Lambda)

Template main.py Pattern:

#!/usr/bin/env python3
"""
Main entry point for [Function Name] functionality.
This script accepts command line arguments and executes the [function_name] function.
"""

import argparse
import logging
import sys

# Add the src directory to the Python path
sys.path.append("/app/src")

from extract_and_load import your_function

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
    handlers=[logging.StreamHandler(sys.stdout)],
)

def main():
    """Main function to parse arguments and execute [Function Name]."""
    
    parser = argparse.ArgumentParser(
        description="[Description of what the function does]",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  python main.py --param1 value1 --param2 value2
        """,
    )

    # Add your arguments here
    parser.add_argument("--param1", required=True, help="Description of param1")
    parser.add_argument("--param2", required=True, help="Description of param2")

    args = parser.parse_args()

    # Log the operation details
    logging.info("Starting [Function Name] operation")
    logging.info(f"Param1: {args.param1}")
    logging.info(f"Param2: {args.param2}")

    try:
        # Execute the function
        success, error_message = your_function(
            param1=args.param1,
            param2=args.param2,
        )

        if success:
            logging.info("✅ Operation completed successfully!")
            sys.exit(0)
        else:
            logging.error(f"❌ Operation failed: {error_message}")
            sys.exit(1)

    except Exception as e:
        logging.error(f"❌ Unexpected error: {str(e)}")
        sys.exit(1)

if __name__ == "__main__":
    main()
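
The entry point above unpacks the result of your_function into success and error_message, so the function is expected to return a two-element tuple. The sketch below illustrates that contract only; the body is a hypothetical placeholder, and the real functions in extract_and_load may validate and fail differently.

```python
import logging

logger = logging.getLogger(__name__)


def your_function(param1, param2):
    """Hypothetical placeholder body; returns (success, error_message)."""
    if not param1 or not param2:
        # Expected failures are reported through the return value.
        return False, "param1 and param2 must be non-empty"
    try:
        # Real work would go here (for example, transferring a file).
        logger.info("Processing %s -> %s", param1, param2)
    except OSError as exc:
        # Convert an anticipated error into the (False, message) form.
        return False, f"I/O error: {exc}"
    return True, None
```

Returning a tuple for expected failures keeps the exit-code logic in main.py simple, while unexpected exceptions still reach its except block.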

Lambda lambda_handler.py Pattern:

import json
import logging
import os
import sys

# Add the src directory to the Python path
if os.path.exists("/var/task/src"):
    # Running in Lambda
    sys.path.append("/var/task/src")
else:
    # Running locally
    sys.path.append("../../src")

from extract_and_load import your_function

# Configure logging for Lambda
logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    """
    AWS Lambda handler for [Function Name] functionality.

    Expected event structure:
    {
        "param1": "value1",
        "param2": "value2"
    }

    Returns:
    {
        "statusCode": 200, 400, or 500,
        "body": JSON string with success message or error details
    }
    """

    try:
        # Extract parameters from the event
        param1 = event.get("param1")
        param2 = event.get("param2")

        # Validate required parameters
        required_params = {
            "param1": param1,
            "param2": param2,
        }

        missing_params = [
            param for param, value in required_params.items() if not value
        ]

        if missing_params:
            error_msg = f"Missing required parameters: {', '.join(missing_params)}"
            logger.error(error_msg)

            return {
                "statusCode": 400,
                "body": json.dumps({
                    "success": False,
                    "error": error_msg,
                    "required_parameters": list(required_params.keys()),
                }),
            }

        # Log the operation details
        logger.info("Starting [Function Name] operation")
        logger.info(f"Param1: {param1}")
        logger.info(f"Param2: {param2}")

        # Execute the function
        success, error_message = your_function(
            param1=param1,
            param2=param2,
        )

        if success:
            logger.info("✅ Operation completed successfully!")

            return {
                "statusCode": 200,
                "body": json.dumps({
                    "success": True,
                    "message": "Operation completed successfully",
                    "param1": param1,
                    "param2": param2,
                }),
            }
        else:
            logger.error(f"❌ Operation failed: {error_message}")

            return {
                "statusCode": 500,
                "body": json.dumps({
                    "success": False,
                    "error": error_message,
                    "param1": param1,
                    "param2": param2,
                }),
            }

    except Exception as e:
        error_msg = f"[Function Name] failed: {str(e)}"
        logger.error(error_msg, exc_info=True)

        return {
            "statusCode": 500,
            "body": json.dumps({
                "success": False,
                "error": error_msg,
                "param1": param1 if "param1" in locals() else None,
                "param2": param2 if "param2" in locals() else None,
            }),
        }
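
Because the handler returns a dictionary whose body is a JSON string, it can be exercised in-process without Docker. The following is a rough sketch of that idea: invoke_handler and fake_handler are hypothetical names, and fake_handler is a stand-in that mirrors only the parameter-validation branch of the pattern above.

```python
import json


def invoke_handler(handler, event):
    """Call a Lambda-style handler in-process and decode its JSON body."""
    # The pattern above never reads the context argument, so None is passed.
    response = handler(event, None)
    return response["statusCode"], json.loads(response["body"])


def fake_handler(event, context):
    """Stand-in that mirrors only the parameter-validation branch."""
    required = ("param1", "param2")
    missing = [name for name in required if not event.get(name)]
    if missing:
        error = f"Missing required parameters: {', '.join(missing)}"
        return {"statusCode": 400, "body": json.dumps({"success": False, "error": error})}
    return {"statusCode": 200, "body": json.dumps({"success": True})}


status, body = invoke_handler(fake_handler, {"param1": "value1"})
print(status, body["error"])
```

In a real test, the imported lambda_handler function would replace fake_handler, with your_function stubbed out so that no AWS access is needed.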

Development Workflow

Step-by-Step Process

  1. Create Function Directory

    mkdir -p containers/template/your_function
    mkdir -p containers/lambda/your_function
  2. Create requirements.txt

    • List all Python dependencies
    • Use appropriate version constraints
  3. Create main.py (Template)

    • Implement command-line argument parsing
    • Add proper logging
    • Handle errors gracefully
    • Return appropriate exit codes
  4. Create lambda_handler.py (Lambda)

    • Implement Lambda event handling
    • Add parameter validation
    • Return proper HTTP status codes
    • Include comprehensive error handling
  5. Create Dockerfile

    • Choose appropriate base image
    • Install system dependencies
    • Copy application files
    • Set up environment variables
    • Configure entry point
  6. Test Locally

    • Build the Docker image
    • Run with test data
    • Verify functionality
  7. Deploy to Lambda (if applicable)

    • Build for Lambda runtime
    • Push to ECR
    • Create/update Lambda function
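
The first steps of this workflow can be scripted. The sketch below creates both directories and empty placeholder files, using the file names from the File Structure section; the helper name scaffold_function is an assumption and does not exist in the repository.

```python
from pathlib import Path

# File names taken from the File Structure section of this page.
TEMPLATE_FILES = ("Dockerfile", "main.py", "requirements.txt", "README.md")
LAMBDA_FILES = ("Dockerfile", "lambda_handler.py", "requirements.txt", "README.md")


def scaffold_function(project_root, function_name):
    """Create empty template and Lambda directories for a new function."""
    created = []
    layout = {"template": TEMPLATE_FILES, "lambda": LAMBDA_FILES}
    for kind, filenames in layout.items():
        directory = Path(project_root) / "containers" / kind / function_name
        directory.mkdir(parents=True, exist_ok=True)
        for filename in filenames:
            path = directory / filename
            if not path.exists():  # never overwrite existing work
                path.touch()
                created.append(path)
    return created
```

Calling scaffold_function(".", "your_function") from the project root returns the list of files it created; running it a second time creates nothing.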

Docker Commands

Building Images

Template Container:

# Build template container
docker build -f containers/template/your_function/Dockerfile -t your-function:latest .

# Build with specific tag
docker build -f containers/template/your_function/Dockerfile -t your-function:v1.0 .

Lambda Container:

# Build Lambda container for AMD64 (recommended for AWS)
docker buildx build --platform linux/amd64 -t your-function-lambda-amd64 -f containers/lambda/your_function/Dockerfile . --load

# Build for local testing (current architecture)
docker build -f containers/lambda/your_function/Dockerfile -t your-function-lambda .

Running Containers

Template Container:

# Basic run
docker run --rm your-function:latest --help

# Run with AWS credentials
docker run --rm \
  -v $HOME/.aws:/home/appuser/.aws:ro \
  -e AWS_PROFILE=${AWS_PROFILE:-default} \
  your-function:latest \
  --param1 value1 \
  --param2 value2

# Run with environment variables
docker run --rm \
  -e AWS_ACCESS_KEY_ID=your_key \
  -e AWS_SECRET_ACCESS_KEY=your_secret \
  -e AWS_DEFAULT_REGION=us-east-1 \
  your-function:latest \
  --param1 value1 \
  --param2 value2

Lambda Container (Local Testing):

# Run Lambda container locally
docker run -p 9000:8080 your-function-lambda

# Test with curl (in another terminal)
curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{
    "param1": "value1",
    "param2": "value2"
}'
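
The same invocation can be made from Python, which is convenient in automated tests. This is a minimal sketch using only the standard library; the helper names build_invocation_request and invoke_local are hypothetical, and the port assumes the 9000:8080 mapping shown above.

```python
import json
import urllib.request

# Invocation path used in the curl example above.
INVOCATION_PATH = "/2015-03-31/functions/function/invocations"


def build_invocation_request(payload, host="localhost", port=9000):
    """Build the POST request for a locally running Lambda container."""
    url = f"http://{host}:{port}{INVOCATION_PATH}"
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=data, method="POST")


def invoke_local(payload, timeout=30):
    """Send the event and return (statusCode, decoded body) from the handler."""
    request = build_invocation_request(payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        result = json.loads(response.read())
    # The handler's own status code travels inside the returned JSON document.
    return result["statusCode"], json.loads(result["body"])
```

With the container running, invoke_local({"param1": "value1", "param2": "value2"}) should return the handler's status code and decoded body, assuming the handler follows the response shape described earlier.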

Managing Images

# List all images
docker images

# Remove specific image
docker rmi your-function:latest

# Remove all unused images
docker image prune -a

# View image details
docker inspect your-function:latest

# View image history
docker history your-function:latest

Testing Locally

Template Container Testing

  1. Build the image

    docker build -f containers/template/your_function/Dockerfile -t your-function:latest .
  2. Test with help

    docker run --rm your-function:latest --help
  3. Test with sample data

    docker run --rm \
      -v $HOME/.aws:/home/appuser/.aws:ro \
      -e AWS_PROFILE=default \
      your-function:latest \
      --param1 "test_value" \
      --param2 "another_value"
  4. Test with different parameters

    # Test error handling
    docker run --rm your-function:latest --param1 "" --param2 "value"
    
    # Test with file paths
    docker run --rm \
      -v $(pwd)/test_data:/app/test_data:ro \
      your-function:latest \
      --param1 "/app/test_data/file.txt" \
      --param2 "value"
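
Since main.py exits with 0 on success and 1 on failure, these manual checks can be turned into a small script that asserts on the exit code. The sketch below is one possible shape, with hypothetical helper names; it omits the credentials volume mount for brevity and assumes Docker is available on the PATH.

```python
import subprocess


def build_run_command(image, params, aws_profile=None):
    """Assemble a `docker run` argument list for a template container."""
    command = ["docker", "run", "--rm"]
    if aws_profile:
        command += ["-e", f"AWS_PROFILE={aws_profile}"]
    command.append(image)
    # Each parameter becomes a "--name value" pair for main.py.
    for name, value in params.items():
        command += [f"--{name}", str(value)]
    return command


def run_and_check(command, expected_exit_code=0):
    """Run the container and report whether it exited as expected."""
    completed = subprocess.run(command, capture_output=True, text=True)
    # main.py logs to stdout, so the captured output contains the log lines.
    return completed.returncode == expected_exit_code, completed.stdout
```

For example, run_and_check(build_run_command("your-function:latest", {"param1": "", "param2": "value"}), expected_exit_code=1) would confirm that a run expected to fail does exit with status 1.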

Deployment to AWS Lambda

Prerequisites

  1. AWS CLI configured

    aws configure
  2. ECR repository created

    aws ecr create-repository --repository-name your-function-lambda
  3. IAM role with Lambda permissions

Deployment Steps

  1. Authenticate to ECR

    aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin <account-id>.dkr.ecr.us-east-1.amazonaws.com
  2. Build for Lambda

    docker buildx build --platform linux/amd64 -t your-function-lambda-amd64 -f containers/lambda/your_function/Dockerfile . --load
  3. Tag for ECR

    docker tag your-function-lambda-amd64:latest <account-id>.dkr.ecr.us-east-1.amazonaws.com/your-function-lambda:latest
  4. Push to ECR

    docker push <account-id>.dkr.ecr.us-east-1.amazonaws.com/your-function-lambda:latest
  5. Create Lambda function

    aws lambda create-function \
      --function-name your-function \
      --package-type Image \
      --code ImageUri=<account-id>.dkr.ecr.us-east-1.amazonaws.com/your-function-lambda:latest \
      --role arn:aws:iam::<account-id>:role/your-lambda-role \
      --timeout 900 \
      --memory-size 512
  6. Update an existing function (use this instead of step 5 when the function already exists)

    aws lambda update-function-code \
      --function-name your-function \
      --image-uri <account-id>.dkr.ecr.us-east-1.amazonaws.com/your-function-lambda:latest
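
The image URI is repeated in several of the steps above, so it can help to build it once and derive the commands from it. The following sketch only assembles argument lists and runs nothing; the helper names are hypothetical, and the flags are the ones used in the commands above.

```python
def ecr_image_uri(account_id, repository, region="us-east-1", tag="latest"):
    """Return the ECR image URI in the form used by the commands above."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


def deployment_commands(account_id, repository, local_image, function_name,
                        role_arn=None, region="us-east-1"):
    """List the tag, push, and create-or-update commands as argument lists."""
    uri = ecr_image_uri(account_id, repository, region)
    commands = [
        ["docker", "tag", f"{local_image}:latest", uri],
        ["docker", "push", uri],
    ]
    if role_arn:
        # A role is only needed when the function is created for the first time.
        commands.append([
            "aws", "lambda", "create-function",
            "--function-name", function_name,
            "--package-type", "Image",
            "--code", f"ImageUri={uri}",
            "--role", role_arn,
        ])
    else:
        # Without a role, assume the function exists and only update its image.
        commands.append([
            "aws", "lambda", "update-function-code",
            "--function-name", function_name,
            "--image-uri", uri,
        ])
    return commands
```

Each returned list can be passed to subprocess.run after authenticating to ECR as in step 1.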
