Migration Guide - fedjo/CEI-InOE GitHub Wiki

Migration Guide: Flyway → Alembic + SQLAlchemy

This guide documents the migration from Flyway SQL migrations to Alembic with SQLAlchemy models.

Overview of Changes

Schema Changes

| Old Name | New Name | Notes |
|---|---|---|
| device | datasource | Generic term for all data sources |
| generic_device | (merged into datasource) | Unified model |
| ingest_file | ingest_batch | Better reflects batch processing |
| file_id (FK) | batch_id | Foreign key renamed |

Architecture Changes

  • API Service: Now uses SQLAlchemy ORM for clean, type-safe queries
  • Ingestor Service: Uses SQLAlchemy Core for high-performance bulk operations
  • Shared Package: Single source of truth for models, schemas, and database utilities
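The shared models might look like the following minimal sketch. The table names (datasource, ingest_batch) come from the schema above; the column and class names are illustrative assumptions — the authoritative definitions live in shared/src/shared/models.py:

```python
# Minimal sketch of the shared models. Column names beyond the table
# names (datasource, ingest_batch) are illustrative assumptions; the
# real definitions live in shared/src/shared/models.py.
from sqlalchemy import Column, DateTime, ForeignKey, Integer, String, func
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()

class Datasource(Base):
    __tablename__ = "datasource"

    id = Column(Integer, primary_key=True)
    external_id = Column(String, nullable=False, unique=True)
    source_type = Column(String, nullable=False)

    batches = relationship("IngestBatch", back_populates="datasource")

class IngestBatch(Base):
    __tablename__ = "ingest_batch"

    id = Column(Integer, primary_key=True)
    filename = Column(String, nullable=False)
    datasource_id = Column(Integer, ForeignKey("datasource.id"), nullable=False)
    created_at = Column(DateTime, server_default=func.now())

    datasource = relationship("Datasource", back_populates="batches")
```

The API service queries these classes through the ORM, while the ingestor works against the same tables via SQLAlchemy Core.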

Step-by-Step Migration

Step 1: Backup Your Data

# Dump existing data
pg_dump $DATABASE_URL > backup_$(date +%Y%m%d).sql

# Or export specific tables
pg_dump $DATABASE_URL -t fact_energy_hourly -t fact_energy_daily > energy_backup.sql

Step 2: Set Up Environment

# From project root
cd /path/to/CEI-InOE

# Create/activate virtual environment
python -m venv venv
source venv/bin/activate

# Install shared package (editable mode)
pip install -e ./shared

# Install migration dependencies
pip install alembic psycopg2-binary

Step 3: Review Migration

Check the initial migration at alembic/versions/20260226_0001_001_initial_*.py:

# Show SQL that would be generated
alembic upgrade head --sql
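If you prefer to review DDL from Python rather than via alembic's offline mode, SQLAlchemy can compile a table's CREATE statement without touching a database. A sketch (the column list here is an illustrative assumption; the real columns come from shared/src/shared/models.py):

```python
# Compile CREATE TABLE DDL offline for review, without a live database.
# The column list is an illustrative assumption.
from sqlalchemy import Column, Integer, MetaData, String, Table
from sqlalchemy.dialects import postgresql
from sqlalchemy.schema import CreateTable

metadata = MetaData()
datasource = Table(
    "datasource",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("external_id", String, nullable=False, unique=True),
    Column("source_type", String, nullable=False),
)

# Render PostgreSQL-flavoured DDL as a string for inspection
ddl = str(CreateTable(datasource).compile(dialect=postgresql.dialect()))
print(ddl)
```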

Step 4: Apply Migration

Option A: Fresh Database (Recommended for dev)

# Drop existing schema and apply fresh (destroys ALL data -- dev only)
psql $DATABASE_URL -c "DROP SCHEMA public CASCADE; CREATE SCHEMA public;"
alembic upgrade head

Option B: Side-by-Side (For production planning)

# Create new tables alongside old ones, then migrate data
alembic upgrade head
# Data migration scripts would go here
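A side-by-side data-migration script could copy rows from the legacy tables into the new ones. A sketch, assuming the old device table has id, external_id, and source_type columns (adjust to your actual legacy schema):

```python
# Sketch of a side-by-side data migration: copy rows from the legacy
# "device" table into the new "datasource" table. Legacy column names
# are assumptions; adjust them to match the actual Flyway schema.
from sqlalchemy import MetaData, Table, insert, select

def migrate_devices(engine):
    """Copy all rows from device into datasource, preserving ids."""
    metadata = MetaData()
    device = Table("device", metadata, autoload_with=engine)
    datasource = Table("datasource", metadata, autoload_with=engine)

    with engine.begin() as conn:
        rows = conn.execute(
            select(device.c.id, device.c.external_id, device.c.source_type)
        ).all()
        if rows:
            conn.execute(
                insert(datasource),
                [
                    {"id": r.id, "external_id": r.external_id, "source_type": r.source_type}
                    for r in rows
                ],
            )
```

Preserving the old primary keys keeps downstream foreign keys (e.g. in the fact tables) valid during the cutover.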

Step 5: Seed Data

# Load datasource definitions
psql $DATABASE_URL -f db/seeds/datasources.sql

Step 6: Update Services

API Service

Replace the router imports in api/app/main.py:

# Old imports (remove)
from app.routers import devices, energy, environmental, dairy

# New imports (add)
from app.routers import datasources, energy_orm, environmental_orm, dairy_orm, batches

Update router registration:

# Old (remove)
app.include_router(devices.router)
app.include_router(energy.router)

# New (add)
app.include_router(datasources.router)
app.include_router(energy_orm.router)
app.include_router(environmental_orm.router)
app.include_router(dairy_orm.router)
app.include_router(batches.router)

Or simply use the new main_orm.py:

# Replace main.py with the ORM version
cp api/app/main_orm.py api/app/main.py

Ingestor Service

Update DAO imports in ingestor files:

# Old import
from app.dao.factory import DAOFactory

# New import  
from app.core_dao.factory import CoreDAOFactory

Update DAO usage:

# Old
factory = DAOFactory(conn)
device_dao = factory.get_device_dao()
file_dao = factory.get_ingest_file_dao()
device_id = device_dao.get_or_create_device(external_id, source_type)
file_id = file_dao.create_or_get_file(filename, device_id)

# New
factory = CoreDAOFactory(conn)
datasource_dao = factory.get_datasource_dao()
batch_dao = factory.get_batch_dao()
datasource_id = datasource_dao.get_or_create(external_id, source_type)
batch_id = batch_dao.create_or_get_batch(filename, datasource_id)
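Internally, a Core DAO's get_or_create can be as simple as a select followed by an insert. A sketch — the real implementation in ingestor/app/core_dao/ may differ, for instance by using PostgreSQL's INSERT ... ON CONFLICT for race safety:

```python
# Sketch of a SQLAlchemy Core DAO for the datasource table. This
# select-then-insert version is portable but not race-safe under
# concurrent ingestors; the real DAO may use INSERT ... ON CONFLICT.
from sqlalchemy import MetaData, Table, insert, select

class DatasourceDAO:
    def __init__(self, conn):
        self.conn = conn
        metadata = MetaData()
        # Reflect the table definition from the live connection
        self.table = Table("datasource", metadata, autoload_with=conn)

    def get_or_create(self, external_id, source_type):
        """Return the id of an existing datasource, creating it if needed."""
        row = self.conn.execute(
            select(self.table.c.id).where(self.table.c.external_id == external_id)
        ).first()
        if row is not None:
            return row.id
        result = self.conn.execute(
            insert(self.table).values(external_id=external_id, source_type=source_type)
        )
        return result.inserted_primary_key[0]
```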

Step 7: Update Pipeline Mappings

Update YAML mapping files to use new column names:

# Old mapping
foreign_keys:
  file_id: "{{ file_id }}"

# New mapping
foreign_keys:
  batch_id: "{{ batch_id }}"

Step 8: Docker Deployment

Use the new docker-compose configuration:

# Backup old compose file
mv docker-compose.yaml docker-compose.old.yaml
mv docker-compose.new.yaml docker-compose.yaml

# Run with migrations
docker-compose up -d

The new compose file includes an alembic-migrations service that runs automatically on startup.
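The relevant fragment of that compose file might look like the following sketch. Service, build-context, and environment names are assumptions — check docker-compose.new.yaml for the authoritative definition:

```yaml
# Sketch only -- service and build-context names are assumptions;
# see docker-compose.new.yaml for the real definition.
services:
  alembic-migrations:
    build: .
    command: alembic upgrade head
    environment:
      DATABASE_URL: ${DATABASE_URL}
    depends_on:
      - db

  api:
    build: ./api
    depends_on:
      alembic-migrations:
        condition: service_completed_successfully
```

The service_completed_successfully condition ensures the API only starts after migrations have finished.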


Verification Checklist

  • All tables created: datasource, ingest_batch, fact_energy_*, etc.
  • Indexes created on timestamp and foreign key columns
  • Foreign key constraints active
  • API health endpoint returns OK: curl http://localhost:8000/health
  • Ingestor can process test file
  • Grafana dashboards updated to use new table names
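Beyond the manual checks above, a small script can confirm the renamed tables exist (a sketch; it checks only the two renamed tables, and assumes DATABASE_URL is set as in the shell examples):

```python
# Quick sanity check that the renamed tables exist after migration.
import os
from sqlalchemy import create_engine, inspect

def check_tables(engine, expected=("datasource", "ingest_batch")):
    """Return the list of expected tables that are missing."""
    present = set(inspect(engine).get_table_names())
    return [t for t in expected if t not in present]

if __name__ == "__main__" and "DATABASE_URL" in os.environ:
    engine = create_engine(os.environ["DATABASE_URL"])
    missing = check_tables(engine)
    if missing:
        raise SystemExit(f"Missing tables: {missing}")
    print("All expected tables present")
```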

Rollback Plan

If issues occur:

# Tear down the Alembic-managed schema
# (note: this does NOT restore the old Flyway schema)
alembic downgrade base

# Or restore from backup
psql $DATABASE_URL < backup_YYYYMMDD.sql

File Reference

| File | Purpose |
|---|---|
| shared/src/shared/models.py | SQLAlchemy model definitions |
| shared/src/shared/schemas.py | Pydantic schemas for API |
| shared/src/shared/database.py | Database connection utilities |
| alembic/versions/*.py | Migration files |
| api/app/routers/*_orm.py | New ORM-based API routers |
| api/app/db/queries/*.py | ORM query functions |
| ingestor/app/core_dao/*.py | SQLAlchemy Core DAOs |

Common Issues

"Import 'shared' could not be resolved"

Install the shared package: pip install -e ./shared

"relation 'datasource' does not exist"

Run migrations: alembic upgrade head

"column 'file_id' does not exist"

Update code to use batch_id instead of file_id

Foreign key violations during data migration

Ensure datasource records exist before inserting ingest_batch records.