Schema Change Guide - fedjo/CEI-InOE GitHub Wiki

Schema Change Guide

This guide explains how to handle database schema changes and their impact on the CEI-InOE services.

Overview

The database schema is the source of truth for data structures. Changes flow through:

Migration → PostgreSQL → Services (Ingestor + API)

Types of Schema Changes

Change Type Complexity Downtime?
Add nullable column Low No
Add required column Medium No*
Rename column High No*
Drop column Medium No*
Add table Medium No
Add view Low No
Modify view Low No

* With proper coordination


Example Scenarios

Scenario 1: Add New Nullable Column

Requirement: Add quality_score to dairy production.

Step 1: Create Migration

-- db/migrations/V010__dairy_quality_score.sql

ALTER TABLE dairy_production 
ADD COLUMN quality_score DECIMAL(4,2);

COMMENT ON COLUMN dairy_production.quality_score IS 
    'Calculated milk quality score (0-100)';

Step 2: Update Ingestor (if field comes from source)

# ingestor/app/models.py
class DairyProductionRecord(BaseModel):
    # ...existing fields...
    quality_score: Optional[float] = None  # New field
# ingestor/app/mappings/dairy_production.yaml
columns:
  # ...existing mappings...
  quality_score:
    source: "Quality Score"
    type: float
    nullable: true

Step 3: Update API

# api/app/schemas/dairy.py
class DairyProductionRecord(BaseModel):
    # ...existing fields...
    quality_score: float | None = None  # New field
# api/app/db/queries/dairy.py
# Add quality_score to SELECT statement

Step 4: Deploy

# 1. Apply migration
docker-compose run flyway migrate

# 2. Deploy API (can read existing data, new field will be null)
docker-compose up -d api

# 3. Deploy ingestor (will start writing new field)
docker-compose up -d ingestor

Impact:

  • ✅ API works before ingestor deploys (returns null for new field)
  • ✅ No downtime required
  • ✅ Backward compatible

Scenario 2: Rename Column

Requirement: Rename energy_kwh to kwh in energy tables.

Step 1: Migration (Transition Period)

-- db/migrations/V010__energy_rename_kwh.sql

-- Add new column with trigger to keep both in sync
ALTER TABLE fact_energy_hourly 
ADD COLUMN kwh FLOAT;

UPDATE fact_energy_hourly 
SET kwh = energy_kwh;

-- Create sync trigger
CREATE OR REPLACE FUNCTION sync_energy_kwh()
RETURNS TRIGGER AS $$
BEGIN
    IF TG_OP = 'INSERT' OR TG_OP = 'UPDATE' THEN
        IF NEW.kwh IS NULL AND NEW.energy_kwh IS NOT NULL THEN
            NEW.kwh := NEW.energy_kwh;
        ELSIF NEW.energy_kwh IS NULL AND NEW.kwh IS NOT NULL THEN
            NEW.energy_kwh := NEW.kwh;
        END IF;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER tr_sync_energy_kwh
BEFORE INSERT OR UPDATE ON fact_energy_hourly
FOR EACH ROW EXECUTE FUNCTION sync_energy_kwh();

Step 2: Update API to use new column name

# api/app/db/queries/energy.py
# Change: energy_kwh as kwh → kwh

Step 3: Update Ingestor to write to new column

# ingestor/app/...
# Update to write to 'kwh' column

Step 4: Cleanup migration (after all services updated)

-- db/migrations/V011__energy_drop_old_kwh.sql

DROP TRIGGER tr_sync_energy_kwh ON fact_energy_hourly;
DROP FUNCTION sync_energy_kwh();
ALTER TABLE fact_energy_hourly DROP COLUMN energy_kwh;

Impact:

  • ⚠️ Requires two-phase deployment
  • ✅ Zero downtime with transition period
  • ✅ Can rollback during transition

Scenario 3: Add New Table

Requirement: Add water consumption tracking.

Step 1: Migration

-- db/migrations/V010__water_consumption.sql

CREATE TABLE water_consumption (
    id SERIAL PRIMARY KEY,
    device_id INTEGER REFERENCES generic_device(id),
    ts TIMESTAMP NOT NULL,
    liters DECIMAL(12,2),
    source_file UUID REFERENCES ingest_file(file_id),
    created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_water_consumption_ts ON water_consumption(ts);
CREATE INDEX idx_water_consumption_device ON water_consumption(device_id);

Step 2: Add Ingestor Components

ingestor/app/
├── models.py                    # Add WaterConsumptionRecord
├── mappings/
│   └── water_consumption.yaml   # New mapping file
└── dao/
    └── (extend data_dao or create water_dao)

Step 3: Add API Components

api/app/
├── schemas/
│   └── water.py                 # New response schemas
├── db/queries/
│   └── water.py                 # New query functions
└── routers/
    └── water.py                 # New router
# api/app/main.py
from app.routers import water

app.include_router(
    water.router, 
    prefix="/api/v1/water", 
    tags=["Water"]
)

Step 4: Deploy

# 1. Apply migration
docker-compose run flyway migrate

# 2. Deploy API (endpoints ready, no data yet)
docker-compose up -d api

# 3. Deploy ingestor (starts collecting data)
docker-compose up -d ingestor

Scenario 4: Add Aggregation View

Requirement: Add monthly energy summary view.

Step 1: Migration only

-- db/migrations/V010__energy_monthly_view.sql

CREATE OR REPLACE VIEW v_energy_monthly AS
SELECT 
    DATE_TRUNC('month', ts)::date AS month,
    device_id,
    SUM(energy_kwh) AS total_kwh,
    COUNT(*) AS reading_count
FROM fact_energy_hourly
GROUP BY DATE_TRUNC('month', ts), device_id
ORDER BY month DESC;

Step 2: Add API endpoint (optional query function)

# api/app/db/queries/energy.py
def get_monthly_summary(limit: int = 12):
    return execute_query(
        "SELECT * FROM v_energy_monthly LIMIT %(limit)s",
        {"limit": limit}
    )

Impact:

  • ✅ No ingestor changes needed
  • ✅ Grafana can also query the view
  • ✅ Simple deployment

Change Impact Matrix

Change Ingestor API Grafana
Add nullable column Update if ingesting Update if exposing May need update
Add required column Must update Must update Must update
Rename column Update mappings Update queries Update queries
Add table Add model+DAO Add router+schema Add dashboard
Add view None Can query Can query
Drop column Remove from model Remove from schema Remove from queries

Deployment Order

Safe Order (No Downtime)

1. Apply database migration
   └── Schema changes are live

2. Deploy API service
   └── Reads are safe (new columns return null)
   └── Old endpoints still work

3. Deploy Ingestor service
   └── Writes now include new columns

Why This Order?

  • API first: Read operations are inherently safe
  • New columns default to null: Existing API code handles missing data
  • Ingestor last: Ensures writes are compatible with new schema

Rollback Procedures

Column Addition Rollback

-- If something goes wrong after adding a column
ALTER TABLE table_name DROP COLUMN column_name;

Complex Change Rollback

For changes with triggers/functions:

-- Store rollback script
-- db/migrations/V010__change_ROLLBACK.sql (not applied automatically)
DROP TRIGGER IF EXISTS trigger_name ON table_name;
DROP FUNCTION IF EXISTS function_name();
ALTER TABLE table_name DROP COLUMN IF EXISTS new_column;

Pre-Change Checklist

  • Document the change requirement
  • Identify affected services (Ingestor, API, Grafana)
  • Write migration SQL
  • Test migration on local database
  • Update Ingestor models/mappings (if applicable)
  • Update API schemas/queries (if applicable)
  • Run tests for both services
  • Plan deployment order
  • Prepare rollback procedure

Post-Change Checklist

  • Migration applied successfully
  • API health check passes
  • API endpoints return expected data
  • Ingestor starts without errors
  • New data is being ingested correctly
  • Grafana dashboards work (if affected)
  • Monitor for errors in logs