Skill: Setup DDL

Purpose

Extracts DDL and catalog metadata from a live SQL Server (or Fabric Warehouse) and writes local artifact files that downstream skills consume. Produces manifest.json, per-object DDL files in ddl/, and per-object catalog JSON files in catalog/. This is the first skill in the pipeline and must complete before any discovery, scoping, or profiling work.

Invocation

/setup-ddl

No arguments. The skill runs an interactive workflow that prompts for database and schema selection before extracting anything.

Trigger phrases: "set up DDL", "extract DDL from SQL Server", "populate DDL", "connect to the remote database and get DDL", "pull DDL from the source database".

Prerequisites

  • toolbox binary on PATH -- the mssql MCP server requires genai-toolbox. Run toolbox --version to verify. Install from https://github.com/googleapis/genai-toolbox/releases if missing.

  • Environment variables set -- the mssql MCP server reads these at startup:

    | Variable | Description | Example |
    |---|---|---|
    | MSSQL_HOST | SQL Server hostname or IP | localhost |
    | MSSQL_PORT | SQL Server port | 1433 |
    | MSSQL_DB | Default database (use master if no specific default) | master |
    | SA_PASSWORD | SQL login password | (from env) |

  • Project root -- the skill confirms pwd with the user before proceeding. If manifest.json already exists, the skill reads source_database and extracted_schemas from it and skips database selection.

Pipeline

Step 1 -- Select database (interactive)

Lists user databases (database_id > 4) via sys.databases and asks the user to pick one. Skipped when manifest.json already exists.
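
A minimal sketch of that listing query, assuming only columns that sys.databases actually exposes (the exact column selection beyond name and database_id is a guess):

SELECT name, database_id, state_desc
FROM sys.databases
WHERE database_id > 4        -- database_id <= 4 are the system databases
ORDER BY name;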

Step 2 -- Select schemas (interactive)

Lists non-system schemas with object counts (tables, procedures, views, functions) and presents an "all" option. The user picks one or more schemas.
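
A rough sketch of how the per-schema counts could be gathered; the system-schema filter and the type codes (U, P, V, FN/IF/TF) are assumptions taken from the object type map in the staging-file table further down:

SELECT s.name AS schema_name,
       SUM(CASE WHEN o.type = 'U' THEN 1 ELSE 0 END)                 AS tables,
       SUM(CASE WHEN o.type = 'P' THEN 1 ELSE 0 END)                 AS procedures,
       SUM(CASE WHEN o.type = 'V' THEN 1 ELSE 0 END)                 AS views,
       SUM(CASE WHEN o.type IN ('FN', 'IF', 'TF') THEN 1 ELSE 0 END) AS functions
FROM sys.schemas s
LEFT JOIN sys.objects o ON o.schema_id = s.schema_id
WHERE s.schema_id < 16384                                    -- exclude fixed database-role schemas
  AND s.name NOT IN ('sys', 'INFORMATION_SCHEMA', 'guest')   -- exclude system schemas
GROUP BY s.name
ORDER BY s.name;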

Step 3 -- Extraction preview and confirm (interactive)

Runs count queries and catalog signal availability checks, then presents a summary:

Extraction preview for [database]
Schemas: silver, gold

  Object counts:
    Tables:     42
    Procedures: 18
    Views:      5
    Functions:  3

  Catalog signals available:
    Primary keys:    38 constraints
    Foreign keys:    22 constraints
    Identity cols:   15 columns
    CDC-tracked:     2 tables

  DDL files will be written to:     ./ddl/
  Catalog files will be written to: ./catalog/

The user must confirm before any files are written.
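
Two of the availability checks behind the preview might look like the following; the schema list mirrors the example above, and the exact query shapes are assumptions:

SELECT COUNT(*) AS primary_key_constraints
FROM sys.key_constraints kc
JOIN sys.schemas s ON s.schema_id = kc.schema_id
WHERE kc.type = 'PK' AND s.name IN ('silver', 'gold');

SELECT COUNT(*) AS cdc_tracked_tables
FROM sys.tables t
JOIN sys.schemas s ON s.schema_id = t.schema_id
WHERE t.is_tracked_by_cdc = 1 AND s.name IN ('silver', 'gold');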

Step 4 -- Write manifest (deterministic)

uv run --project <shared-path> setup-ddl write-manifest \
  --technology sql_server \
  --database <database> \
  --schemas <comma-separated-schemas>

Step 5 -- Export procedures, views, and functions (MCP + CLI)

For each object type, queries OBJECT_DEFINITION() via the mssql MCP tool, saves the result to .staging/<type>.json, then runs:

uv run --project <shared-path> setup-ddl assemble-modules \
  --input ./.staging/<type>.json \
  --type <procedures|views|functions>
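
A plausible shape for the definition query, shown for procedures; swapping the type filter to V or FN/IF/TF for views and functions is an assumption:

SELECT s.name AS schema_name,
       o.name AS object_name,
       OBJECT_DEFINITION(o.object_id) AS definition
FROM sys.objects o
JOIN sys.schemas s ON s.schema_id = o.schema_id
WHERE o.type = 'P'                       -- 'V' for views; 'FN', 'IF', 'TF' for functions
  AND s.name IN ('silver', 'gold');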

Step 6 -- Export tables (MCP + CLI)

Queries sys.tables, sys.columns, sys.types, and sys.identity_columns, saves to .staging/table_columns.json, then runs:

uv run --project <shared-path> setup-ddl assemble-tables \
  --input ./.staging/table_columns.json
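
A plausible shape for the column-metadata query behind table_columns.json; the exact column list and aliases are assumptions (the catalog table schema below only requires name, sql_type, is_nullable, and is_identity):

SELECT s.name            AS schema_name,
       t.name            AS table_name,
       c.name            AS column_name,
       ty.name           AS sql_type,
       c.max_length,
       c.precision,
       c.scale,
       c.is_nullable,
       c.is_identity,
       ic.seed_value,
       ic.increment_value
FROM sys.tables t
JOIN sys.schemas s ON s.schema_id = t.schema_id
JOIN sys.columns c ON c.object_id = t.object_id
JOIN sys.types  ty ON ty.user_type_id = c.user_type_id
LEFT JOIN sys.identity_columns ic
       ON ic.object_id = c.object_id AND ic.column_id = c.column_id
WHERE s.name IN ('silver', 'gold')
ORDER BY s.name, t.name, c.column_id;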

Step 7 -- Extract catalog signals and references (MCP + CLI)

Runs 12 catalog signal queries plus 3 DMF reference queries via MCP, saving each result to .staging/. Then processes all staging files in one pass:

uv run --project <shared-path> setup-ddl write-catalog \
  --staging-dir ./.staging \
  --database <database>
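
Each DMF reference query is a per-object call to sys.dm_sql_referenced_entities. A sketch for the example procedure used later on this page (when the DMF cannot resolve an object it raises an error, which is what produces the ERROR rows noted under Troubleshooting):

SELECT referenced_schema_name,
       referenced_entity_name,
       is_selected,
       is_updated
FROM sys.dm_sql_referenced_entities('silver.usp_load_DimCustomer', 'OBJECT');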

Step 8 -- AST enrichment (deterministic)

uv run --project <shared-path> catalog-enrich --project-root .

Augments catalog files with AST-derived references that catalog queries miss: CTAS/SELECT INTO targets, TRUNCATE targets, and indirect writers through EXEC call chains. Entries carry "detection": "ast_scan".
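
For illustration, these are the kinds of statements the catalog DMF misses but the AST scan picks up; the object names here are hypothetical:

SELECT CustomerKey, FirstName
INTO silver.DimCustomer_stage              -- CTAS / SELECT INTO target
FROM silver.DimCustomer;

TRUNCATE TABLE silver.DimCustomer_stage;   -- TRUNCATE target

EXEC silver.usp_load_DimCustomer;          -- indirect writer reached through an EXEC call chain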

Step 9 -- Report

Displays a summary of all extracted files and tells the user they can run discover or scoping next.

Reads

This skill reads from a live SQL Server via the mssql MCP tool. No local catalog files are consumed.

Writes

manifest.json

| Field | Type | Required | Description |
|---|---|---|---|
| schema_version | string | yes | Always "1.0" |
| technology | string | yes | Source technology. Enum: sql_server, fabric_warehouse, fabric_lakehouse, snowflake |
| dialect | string | yes | sqlglot dialect. Enum: tsql, spark, snowflake |
| source_database | string | yes | Name of the source database |
| extracted_schemas | string[] | yes | List of schemas included in the extraction |
| extracted_at | string | yes | ISO 8601 timestamp of extraction |
| sandbox | object | no | Sandbox metadata (run_id, database) -- added later by the test harness |

Technology-to-dialect mapping:

| Technology | Dialect | Delimiter |
|---|---|---|
| sql_server | tsql | GO |
| fabric_warehouse | tsql | GO |
| fabric_lakehouse | spark | ; |
| snowflake | snowflake | ; |

ddl/ directory

| File | Contents |
|---|---|
| ddl/tables.sql | CREATE TABLE statements, one per table |
| ddl/procedures.sql | CREATE PROCEDURE statements |
| ddl/views.sql | CREATE VIEW statements |
| ddl/functions.sql | CREATE FUNCTION statements |

catalog/tables/<schema>.<table>.json

Initial fields written by setup-ddl (profile and scoping sections are added later by other skills):

| Field | Type | Required | Description |
|---|---|---|---|
| schema | string | yes | Schema name |
| name | string | yes | Table name |
| ddl_hash | string | no | SHA-256 of normalized source DDL |
| stale | boolean | no | true when object was present in prior extraction but absent in latest |
| columns | array | no | Column definitions: name, sql_type, is_nullable, is_identity |
| primary_keys | array | yes | Declared PKs: constraint_name, columns[] |
| unique_indexes | array | yes | Unique indexes: index_name, columns[] |
| foreign_keys | array | yes | FK constraints: constraint_name, columns[], referenced_schema, referenced_table, referenced_columns[] |
| auto_increment_columns | array | yes | Identity/auto-increment columns: column, seed, increment, mechanism |
| change_capture | object | no | CDC/change tracking: enabled, mechanism (enum: cdc, change_tracking, stream, change_data_feed) |
| sensitivity_classifications | array | no | PII labels: column, label, information_type |
| referenced_by | object | yes | Inbound references grouped by type (procedures, views, functions), each split into in_scope and out_of_scope |

catalog/procedures/<schema>.<proc>.json

See Skill Analyzing Object for the full procedure catalog schema.

Catalog signal staging files

The catalog signal queries and DMF reference queries produce these staging files in .staging/:

| Staging file | Signal |
|---|---|
| table_columns.json | Column definitions with types, nullability, identity |
| pk_unique.json | Primary key and unique index constraints |
| foreign_keys.json | Foreign key relationships |
| identity_columns.json | Identity columns |
| cdc.json | CDC-tracked tables |
| change_tracking.json | Change tracking tables (graceful -- tolerates missing feature) |
| sensitivity.json | Sensitivity classifications (graceful) |
| object_types.json | Object type map (U, V, P, FN, IF, TF) |
| definitions.json | All proc/view/function definitions for routing flag scan |
| proc_params.json | Procedure parameters |
| proc_dmf.json | DMF references for procedures |
| view_dmf.json | DMF references for views |
| func_dmf.json | DMF references for functions |

JSON Format

manifest.json example

{
  "schema_version": "1.0",
  "technology": "sql_server",
  "dialect": "tsql",
  "source_database": "AdventureWorksDW",
  "extracted_schemas": ["dbo", "silver", "gold"],
  "extracted_at": "2025-03-15T14:30:00Z"
}

catalog/tables/silver.dimcustomer.json example (initial)

{
  "schema": "silver",
  "name": "DimCustomer",
  "columns": [
    { "name": "CustomerKey", "sql_type": "BIGINT", "is_nullable": false, "is_identity": true },
    { "name": "FirstName", "sql_type": "NVARCHAR(50)", "is_nullable": true, "is_identity": false },
    { "name": "Region", "sql_type": "NVARCHAR(50)", "is_nullable": true, "is_identity": false }
  ],
  "primary_keys": [
    { "constraint_name": "PK_DimCustomer", "columns": ["CustomerKey"] }
  ],
  "unique_indexes": [],
  "foreign_keys": [
    {
      "constraint_name": "FK_DimCustomer_Region",
      "columns": ["Region"],
      "referenced_schema": "silver",
      "referenced_table": "DimGeography",
      "referenced_columns": ["GeographyKey"]
    }
  ],
  "auto_increment_columns": [
    { "column": "CustomerKey", "seed": 1, "increment": 1, "mechanism": "identity" }
  ],
  "change_capture": { "enabled": true, "mechanism": "cdc" },
  "sensitivity_classifications": [
    { "column": "FirstName", "label": "Confidential", "information_type": "Name" }
  ],
  "referenced_by": {
    "procedures": {
      "in_scope": [
        { "schema": "silver", "name": "usp_load_DimCustomer", "is_selected": true, "is_updated": true, "detection": "catalog_query" }
      ],
      "out_of_scope": []
    },
    "views": { "in_scope": [], "out_of_scope": [] },
    "functions": { "in_scope": [], "out_of_scope": [] }
  }
}

Troubleshooting

| Error | Cause | Fix |
|---|---|---|
| toolbox: command not found | genai-toolbox not installed | Install from https://github.com/googleapis/genai-toolbox/releases and add to PATH |
| MSSQL_HOST is not set | Missing environment variable | Set MSSQL_HOST, MSSQL_PORT, MSSQL_DB, and SA_PASSWORD in .env or shell |
| USE statement not carrying across calls | Each MCP call is a discrete connection | Prepend USE [<database>]; to every SQL block -- the skill handles this automatically (see the sketch after this table) |
| Change tracking query fails | Feature not enabled on server | Graceful degradation -- the query uses TRY/CATCH and saves an error marker instead of failing |
| Sensitivity classifications query fails | Feature not available (e.g., SQL Server edition) | Graceful degradation -- same TRY/CATCH pattern |
| DMF reference returns ERROR: rows | sys.dm_sql_referenced_entities cannot resolve a proc (e.g., missing dependent object) | Error rows are recorded in the catalog; the skill continues with remaining objects |
| Dynamic SQL procs missing from referenced_by | sys.dm_sql_referenced_entities resolves at definition time, not runtime | Known limitation -- these procs require LLM analysis via Skill Analyzing Object |
| Re-running on existing project | ddl/ and catalog/ already exist | The skill warns and asks for confirmation -- a re-run fully rebuilds both directories |
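
A sketch of the per-call pattern behind the USE and graceful-degradation rows above; the database name comes from the manifest example, and the exact query text is an assumption:

USE [AdventureWorksDW];   -- prepended to every block; each MCP call is a discrete connection

BEGIN TRY
    SELECT s.name AS schema_name, t.name AS table_name
    FROM sys.change_tracking_tables ctt
    JOIN sys.tables t  ON t.object_id = ctt.object_id
    JOIN sys.schemas s ON s.schema_id = t.schema_id;
END TRY
BEGIN CATCH
    -- saved to .staging/change_tracking.json as an error marker instead of failing the run
    SELECT ERROR_NUMBER() AS error_number, ERROR_MESSAGE() AS error_message;
END CATCH;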