Docker Container Development

This wiki page explains how to develop Docker containers for AWS Lambda functions and for local testing, using the templates in the infraestrutura-de-dados repository.

Table of Contents

Overview
Container Types
File Structure
Core Files Explained
Development Workflow
Docker Commands
Testing Locally
Deployment to AWS Lambda
Best Practices
Troubleshooting

Overview

This repository contains two types of Docker containers:

Template Containers (containers/template/): For local development and testing
Lambda Containers (containers/lambda/): For AWS Lambda deployment

Container Types

Template Containers

Purpose: Local development, testing, and debugging
Base Image: python:3.12-slim (as used in the Template Dockerfile Pattern)
Entry Point: Command-line interface with main.py
Use Case: Interactive development and batch processing

Lambda Containers

Purpose: AWS Lambda deployment
Base Image: public.ecr.aws/lambda/python:3.11
Entry Point: Lambda handler function
Use Case: Serverless execution in AWS

File Structure

containers/
├── template/
│   ├── ftp_to_s3/
│   │   ├── Dockerfile
│   │   ├── main.py
│   │   ├── requirements.txt
│   │   └── README.md
│   ├── list_ftp_files/
│   │   ├── Dockerfile
│   │   ├── main.py
│   │   ├── requirements.txt
│   │   └── README.md
│   └── s3_to_parquet/
│       ├── Dockerfile
│       ├── main.py
│       ├── requirements.txt
│       └── README.md
└── lambda/
    ├── ftp_to_s3/
    │   ├── Dockerfile
    │   ├── lambda_handler.py
    │   ├── requirements.txt
    │   └── README.md
    ├── list_ftp_files/
    │   ├── Dockerfile
    │   ├── lambda_handler.py
    │   ├── requirements.txt
    │   └── README.md
    └── s3_to_parquet/
        ├── Dockerfile
        ├── lambda_handler.py
        ├── requirements.txt
        ├── README.md
        ├── test_payload.json
        └── build.sh

Core Files Explained

1. Dockerfile

The Dockerfile defines how to build the container image. It specifies the base image, dependencies, and runtime configuration.

Template Dockerfile Pattern:

# Use Python 3.12 slim image for smaller size and efficiency
FROM python:3.12-slim

# Set working directory
WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \
    gcc \
    g++ \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements and install Python dependencies
COPY containers/template/your_function/requirements.txt /app/
RUN pip install --no-cache-dir -r requirements.txt

# Copy the source code from the project root
COPY src/ /app/src/

# Copy the main.py script
COPY containers/template/your_function/main.py /app/

# Create a non-root user for security
RUN useradd -m -u 1000 appuser && chown -R appuser:appuser /app
USER appuser

# Set environment variables
ENV AWS_DEFAULT_REGION=us-east-1
ENV PYTHONPATH=/app/src

# Set the entrypoint to run main.py with arguments
ENTRYPOINT ["python", "/app/main.py"]

# Default CMD (can be overridden)
CMD ["--help"]
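The interaction between ENTRYPOINT and CMD in the template pattern can be modeled in a few lines. This is only an illustrative sketch of Docker's exec-form behavior, not Docker code; the function name `effective_command` is hypothetical.

```python
def effective_command(entrypoint, cmd, run_args):
    """Return the argv a container starts with (exec-form ENTRYPOINT and CMD)."""
    # Arguments passed after the image name on `docker run` replace CMD;
    # ENTRYPOINT is always kept as the prefix.
    return list(entrypoint) + (list(run_args) if run_args else list(cmd))


entrypoint = ["python", "/app/main.py"]
default_cmd = ["--help"]

# `docker run image` falls back to the default CMD.
print(effective_command(entrypoint, default_cmd, []))
# `docker run image --param1 value1` replaces CMD with the given arguments.
print(effective_command(entrypoint, default_cmd, ["--param1", "value1"]))
```

This is why running the template image with no arguments prints the argparse help text, while any arguments given to `docker run` are forwarded to main.py.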

Lambda Dockerfile Pattern:

# Use AWS Lambda Python runtime as base image
FROM public.ecr.aws/lambda/python:3.11

# Install system dependencies
RUN yum update -y && yum install -y \
    gcc \
    gcc-c++ \
    && yum clean all

# Copy requirements and install Python dependencies
COPY containers/lambda/your_function/requirements.txt ${LAMBDA_TASK_ROOT}/
RUN pip install --no-cache-dir -r requirements.txt

# Copy the source code from the project root
COPY src/ ${LAMBDA_TASK_ROOT}/src/

# Copy the Lambda handler
COPY containers/lambda/your_function/lambda_handler.py ${LAMBDA_TASK_ROOT}/

# Set the Lambda handler
CMD ["lambda_handler.lambda_handler"]
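The CMD value in the Lambda pattern is a handler string of the form module.function: the file lambda_handler.py and the function lambda_handler inside it. The sketch below shows one way such a string could be resolved to a callable. It is an illustration under that assumption, not the Lambda runtime's actual implementation, and `resolve_handler` is a hypothetical name.

```python
import importlib


def resolve_handler(handler):
    """Split 'module.function' on the last dot and return the callable."""
    module_name, _, function_name = handler.rpartition(".")
    if not module_name:
        raise ValueError(f"Handler must look like 'module.function', got {handler!r}")
    module = importlib.import_module(module_name)
    return getattr(module, function_name)


# Demonstrated with a standard-library module so the example runs anywhere.
dumps = resolve_handler("json.dumps")
print(dumps({"ok": True}))
```

If the handler file or function is renamed, the CMD string has to change with it, otherwise the import fails when the function is invoked.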

2. requirements.txt

Lists all Python dependencies needed for the function to run.

Example requirements.txt:

boto3>=1.36.0
s3fs==2024.6.0
fsspec==2024.6.0
polars>=1.22.0
py7zr>=0.22.0
tqdm>=4.67.1
google-cloud-bigquery>=3.31.0

Best Practices:

Use == for packages whose versions must match exactly (such as s3fs and fsspec, which are released in lockstep)
Use >= for core libraries where any newer release is acceptable
Keep dependencies minimal to reduce image size
Test with the exact versions that will run in production
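To review which dependencies are pinned and which are open-ended, a small script can group the entries of a requirements.txt by constraint type. This is a minimal sketch that handles only simple `name<operator>version` lines; the function name `classify_requirements` is hypothetical, and environment markers or URLs are not covered.

```python
import re

# Package name, optional extras in brackets, then an optional version operator.
_REQUIREMENT = re.compile(
    r"^\s*([A-Za-z0-9_.\-]+)(?:\[[^\]]*\])?\s*(==|>=|<=|~=|!=|<|>)?"
)


def classify_requirements(text):
    """Group package names into 'pinned' (==), 'ranged' (other operators), 'unversioned'."""
    groups = {"pinned": [], "ranged": [], "unversioned": []}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue  # skip blanks, comments, and pip options such as "-r other.txt"
        match = _REQUIREMENT.match(line)
        if not match:
            continue
        name, operator = match.groups()
        if operator == "==":
            groups["pinned"].append(name)
        elif operator:
            groups["ranged"].append(name)
        else:
            groups["unversioned"].append(name)
    return groups


example = "boto3>=1.36.0\ns3fs==2024.6.0\nfsspec==2024.6.0\npolars>=1.22.0\n"
print(classify_requirements(example))
```

Entries listed under "ranged" or "unversioned" may resolve to different versions on each build, which is worth knowing before promoting an image to production.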

3. main.py (Template) / lambda_handler.py (Lambda)

Template main.py Pattern:

#!/usr/bin/env python3
"""
Main entry point for [Function Name] functionality.
This script accepts command line arguments and executes the [function_name] function.
"""

import argparse
import logging
import sys

# Add the src directory to the Python path
sys.path.append("/app/src")

from extract_and_load import your_function

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
    handlers=[logging.StreamHandler(sys.stdout)],
)

def main():
    """Main function to parse arguments and execute [Function Name]."""
    
    parser = argparse.ArgumentParser(
        description="[Description of what the function does]",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  python main.py --param1 value1 --param2 value2
        """,
    )

    # Add your arguments here
    parser.add_argument("--param1", required=True, help="Description of param1")
    parser.add_argument("--param2", required=True, help="Description of param2")

    args = parser.parse_args()

    # Log the operation details
    logging.info("Starting [Function Name] operation")
    logging.info(f"Param1: {args.param1}")
    logging.info(f"Param2: {args.param2}")

    try:
        # Execute the function
        success, error_message = your_function(
            param1=args.param1,
            param2=args.param2,
        )

        if success:
            logging.info("✅ Operation completed successfully!")
            sys.exit(0)
        else:
            logging.error(f"❌ Operation failed: {error_message}")
            sys.exit(1)

    except Exception as e:
        logging.error(f"❌ Unexpected error: {str(e)}")
        sys.exit(1)

if __name__ == "__main__":
    main()
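The pattern above unpacks `success, error_message` from the imported function, so that function is assumed to return a two-element tuple. The sketch below shows what such a function might look like. The file-copy behavior is purely hypothetical and stands in for whatever the real function in src/ does.

```python
from pathlib import Path


def your_function(param1, param2):
    """Hypothetical example: copy the text file at param1 to the path param2.

    Returns (True, None) on success, or (False, "<reason>") on an expected failure.
    """
    source = Path(param1)
    if not source.is_file():
        # Expected failure: report it through the return value, not an exception.
        return False, f"Source file not found: {param1}"
    Path(param2).write_text(source.read_text())
    return True, None
```

Returning expected failures as `(False, message)` lets main.py log the reason and exit with code 1, while unexpected exceptions still reach the `except Exception` branch.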

Lambda lambda_handler.py Pattern:

import json
import logging
import os
import sys

# Add the src directory to the Python path
if os.path.exists("/var/task/src"):
    # Running in Lambda
    sys.path.append("/var/task/src")
else:
    # Running locally
    sys.path.append("../../src")

from extract_and_load import your_function

# Configure logging for Lambda
logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    """
    AWS Lambda handler for [Function Name] functionality.

    Expected event structure:
    {
        "param1": "value1",
        "param2": "value2"
    }

    Returns:
    {
        "statusCode": 200 or 500,
        "body": JSON string with success message or error details
    }
    """

    try:
        # Extract parameters from the event
        param1 = event.get("param1")
        param2 = event.get("param2")

        # Validate required parameters
        required_params = {
            "param1": param1,
            "param2": param2,
        }

        missing_params = [
            param for param, value in required_params.items() if not value
        ]

        if missing_params:
            error_msg = f"Missing required parameters: {', '.join(missing_params)}"
            logger.error(error_msg)

            return {
                "statusCode": 400,
                "body": json.dumps({
                    "success": False,
                    "error": error_msg,
                    "required_parameters": list(required_params.keys()),
                }),
            }

        # Log the operation details
        logger.info("Starting [Function Name] operation")
        logger.info(f"Param1: {param1}")
        logger.info(f"Param2: {param2}")

        # Execute the function
        success, error_message = your_function(
            param1=param1,
            param2=param2,
        )

        if success:
            logger.info("✅ Operation completed successfully!")

            return {
                "statusCode": 200,
                "body": json.dumps({
                    "success": True,
                    "message": "Operation completed successfully",
                    "param1": param1,
                    "param2": param2,
                }),
            }
        else:
            logger.error(f"❌ Operation failed: {error_message}")

            return {
                "statusCode": 500,
                "body": json.dumps({
                    "success": False,
                    "error": error_message,
                    "param1": param1,
                    "param2": param2,
                }),
            }

    except Exception as e:
        error_msg = f"[Function Name] failed: {str(e)}"
        logger.error(error_msg, exc_info=True)

        return {
            "statusCode": 500,
            "body": json.dumps({
                "success": False,
                "error": error_msg,
                "param1": param1 if "param1" in locals() else None,
                "param2": param2 if "param2" in locals() else None,
            }),
        }
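One detail of the validation step is worth illustrating: the pattern uses `if not value`, which treats every falsy value (0, False, an empty string) as missing. If a function can legitimately receive such values, a check against None may be a better fit. The sketch below compares the two; both helper names are hypothetical.

```python
def find_falsy_params(event, required):
    """Same rule as the handler pattern: any falsy value counts as missing."""
    return [name for name in required if not event.get(name)]


def find_missing_params(event, required):
    """Stricter alternative: only absent keys or explicit None count as missing."""
    return [name for name in required if event.get(name) is None]


event = {"param1": 0, "param2": ""}
required = ["param1", "param2", "param3"]

print(find_falsy_params(event, required))    # all three are reported
print(find_missing_params(event, required))  # only param3 is reported
```

Which rule is appropriate depends on the parameters; for string paths or bucket names, rejecting empty strings as the pattern does is usually what is wanted.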

Development Workflow

Step-by-Step Process

1. Create Function Directory

mkdir -p containers/template/your_function
mkdir -p containers/lambda/your_function

2. Create requirements.txt
- List all Python dependencies
- Use appropriate version constraints

3. Create main.py (Template)
- Implement command-line argument parsing
- Add proper logging
- Handle errors gracefully
- Return appropriate exit codes

4. Create lambda_handler.py (Lambda)
- Implement Lambda event handling
- Add parameter validation
- Return proper HTTP status codes
- Include comprehensive error handling

5. Create Dockerfile
- Choose appropriate base image
- Install system dependencies
- Copy application files
- Set up environment variables
- Configure entry point

6. Test Locally
- Build the Docker image
- Run with test data
- Verify functionality

7. Deploy to Lambda (if applicable)
- Build for Lambda runtime
- Push to ECR
- Create/update Lambda function
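The first few steps above can be sketched as a small scaffolding script that creates both directories and empty placeholder files for a new function. The file lists follow the File Structure section; the function name `scaffold_function` and the choice to create empty files are assumptions made for illustration.

```python
from pathlib import Path

# Files expected in each kind of function directory (see File Structure).
SCAFFOLD = {
    "template": ["Dockerfile", "main.py", "requirements.txt", "README.md"],
    "lambda": ["Dockerfile", "lambda_handler.py", "requirements.txt", "README.md"],
}


def scaffold_function(repo_root, function_name):
    """Create empty placeholder files for a new function; never overwrite existing ones."""
    created = []
    for kind, filenames in SCAFFOLD.items():
        target = Path(repo_root) / "containers" / kind / function_name
        target.mkdir(parents=True, exist_ok=True)
        for filename in filenames:
            path = target / filename
            if not path.exists():
                path.touch()
                created.append(path)
    return created
```

Calling `scaffold_function(".", "your_function")` from the repository root would leave existing files untouched and return only the paths it created, so it is safe to run more than once.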

Docker Commands

Building Images

Template Container:

# Build template container
docker build -f containers/template/your_function/Dockerfile -t your-function:latest .

# Build with specific tag
docker build -f containers/template/your_function/Dockerfile -t your-function:v1.0 .

Lambda Container:

# Build Lambda container for AMD64 (recommended for AWS)
docker buildx build --platform linux/amd64 -t your-function-lambda-amd64 -f containers/lambda/your_function/Dockerfile . --load

# Build for local testing (current architecture)
docker build -f containers/lambda/your_function/Dockerfile -t your-function-lambda .

Running Containers

Template Container:

# Basic run
docker run --rm your-function:latest --help

# Run with AWS credentials
docker run --rm \
  -v $HOME/.aws:/home/appuser/.aws:ro \
  -e AWS_PROFILE=${AWS_PROFILE:-default} \
  your-function:latest \
  --param1 value1 \
  --param2 value2

# Run with environment variables
docker run --rm \
  -e AWS_ACCESS_KEY_ID=your_key \
  -e AWS_SECRET_ACCESS_KEY=your_secret \
  -e AWS_DEFAULT_REGION=us-east-1 \
  your-function:latest \
  --param1 value1 \
  --param2 value2

Lambda Container (Local Testing):

# Run Lambda container locally
docker run -p 9000:8080 your-function-lambda

# Test with curl (in another terminal)
curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{
    "param1": "value1",
    "param2": "value2"
}'
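The same local invocation can be made from Python, which is convenient in automated tests. This sketch uses only the standard library and the endpoint URL from the curl command above; the function names are hypothetical, and the container must already be running on port 9000 for `invoke_local_lambda` to succeed.

```python
import json
import urllib.request


def build_invoke_request(payload, host="localhost", port=9000):
    """Build the POST request for the local Lambda invocation endpoint."""
    url = f"http://{host}:{port}/2015-03-31/functions/function/invocations"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def invoke_local_lambda(payload, host="localhost", port=9000, timeout=30):
    """Send the payload to the running container and return the decoded JSON response."""
    request = build_invoke_request(payload, host, port)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the handler pattern shown earlier, the response is a dictionary containing "statusCode" and "body", where "body" is itself a JSON string and needs a second `json.loads` call to inspect.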

Managing Images

# List all images
docker images

# Remove specific image
docker rmi your-function:latest

# Remove all unused images
docker image prune -a

# View image details
docker inspect your-function:latest

# View image history
docker history your-function:latest

Testing Locally

Template Container Testing

Build the image

docker build -f containers/template/your_function/Dockerfile -t your-function:latest .

Test with help

docker run --rm your-function:latest --help

Test with sample data

docker run --rm \
  -v $HOME/.aws:/home/appuser/.aws:ro \
  -e AWS_PROFILE=default \
  your-function:latest \
  --param1 "test_value" \
  --param2 "another_value"

Test with different parameters

# Test error handling
docker run --rm your-function:latest --param1 "" --param2 "value"

# Test with file paths
docker run --rm \
  -v $(pwd)/test_data:/app/test_data:ro \
  your-function:latest \
  --param1 "/app/test_data/file.txt" \
  --param2 "value"
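These manual checks can be wrapped in a small script that runs the container and compares its exit code with the expected one. The sketch below assumes Docker is installed and the image is already built; the helper names are hypothetical.

```python
import os
import subprocess


def docker_run_command(image, args, aws_profile=None):
    """Build a `docker run` argument list for the template container."""
    command = ["docker", "run", "--rm"]
    if aws_profile:
        # Mount local AWS credentials read-only, as in the commands above.
        aws_dir = os.path.join(os.path.expanduser("~"), ".aws")
        command += ["-v", f"{aws_dir}:/home/appuser/.aws:ro"]
        command += ["-e", f"AWS_PROFILE={aws_profile}"]
    return command + [image] + list(args)


def run_and_check(image, args, expected_exit_code=0, aws_profile=None):
    """Run the container and report whether it exited with the expected code."""
    result = subprocess.run(docker_run_command(image, args, aws_profile))
    return result.returncode == expected_exit_code
```

For example, `run_and_check("your-function:latest", ["--help"])` should return True, since argparse exits with 0 after printing help. Note that argparse exits with code 2 when a required argument is omitted, whereas the main.py pattern exits with 1 for failures reported by the function itself.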

Deployment to AWS Lambda

Prerequisites

AWS CLI configured
```
aws configure
```

ECR repository created

aws ecr create-repository --repository-name your-function-lambda

IAM execution role that Lambda can assume (its ARN is passed as --role when creating the function)

Deployment Steps

Authenticate to ECR

aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin <account-id>.dkr.ecr.us-east-1.amazonaws.com

Build for Lambda

docker buildx build --platform linux/amd64 -t your-function-lambda-amd64 -f containers/lambda/your_function/Dockerfile . --load

Tag for ECR

docker tag your-function-lambda-amd64:latest <account-id>.dkr.ecr.us-east-1.amazonaws.com/your-function-lambda:latest

Push to ECR

docker push <account-id>.dkr.ecr.us-east-1.amazonaws.com/your-function-lambda:latest
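The image URI is repeated in the tag, push, create, and update commands, so it can help to build it in one place. The sketch below assembles the URI and the tag and push commands as argument lists; the helper names are hypothetical, and the account ID in the example is a placeholder.

```python
def ecr_image_uri(account_id, region, repository, tag="latest"):
    """Return '<account-id>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>'."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


def ecr_push_commands(local_image, image_uri):
    """Return the `docker tag` and `docker push` commands as argument lists."""
    return [
        ["docker", "tag", local_image, image_uri],
        ["docker", "push", image_uri],
    ]


# 123456789012 is a placeholder account ID, not a real one.
uri = ecr_image_uri("123456789012", "us-east-1", "your-function-lambda")
for command in ecr_push_commands("your-function-lambda-amd64:latest", uri):
    print(" ".join(command))
```

The printed commands match the Tag for ECR and Push to ECR steps and could be passed to `subprocess.run` once the ECR login step has succeeded.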

Create Lambda function

aws lambda create-function \
  --function-name your-function \
  --package-type Image \
  --code ImageUri=<account-id>.dkr.ecr.us-east-1.amazonaws.com/your-function-lambda:latest \
  --role arn:aws:iam::<account-id>:role/your-lambda-role \
  --timeout 900 \
  --memory-size 512

Update existing function

aws lambda update-function-code \
  --function-name your-function \
  --image-uri <account-id>.dkr.ecr.us-east-1.amazonaws.com/your-function-lambda:latest
