# Skill Setup DDL
Extracts DDL and catalog metadata from a live SQL Server (or Fabric Warehouse) and writes local artifact files that downstream skills consume. Produces `manifest.json`, per-object DDL files in `ddl/`, and per-object catalog JSON files in `catalog/`. This is the first skill in the pipeline and must complete before any discovery, scoping, or profiling work.
`/setup-ddl`
No arguments. The skill runs an interactive workflow that prompts for database and schema selection before extracting anything.
Trigger phrases: "set up DDL", "extract DDL from SQL Server", "populate DDL", "connect to the remote database and get DDL", "pull DDL from the source database".
Prerequisites:

- `toolbox` binary on PATH -- the `mssql` MCP server requires genai-toolbox. Run `toolbox --version` to verify. Install from https://github.com/googleapis/genai-toolbox/releases if missing.
- Environment variables set -- the `mssql` MCP server reads these at startup:

  | Variable | Description | Example |
  |---|---|---|
  | `MSSQL_HOST` | SQL Server hostname or IP | `localhost` |
  | `MSSQL_PORT` | SQL Server port | `1433` |
  | `MSSQL_DB` | Default database (use `master` if no specific default) | `master` |
  | `SA_PASSWORD` | SQL login password (from env) | |

- Project root -- the skill confirms `pwd` with the user before proceeding. If `manifest.json` already exists, the skill reads `source_database` and `extracted_schemas` from it and skips database selection.
**Database selection.** Lists user databases (`database_id > 4`) via `sys.databases` and asks the user to pick one. Skipped when `manifest.json` already exists.
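A minimal sketch of that selection query (the skill's exact SQL may differ):

```sql
-- List user databases only; database_id 1-4 are the system
-- databases (master, tempdb, model, msdb).
SELECT name, database_id
FROM sys.databases
WHERE database_id > 4
ORDER BY name;
```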
**Schema selection.** Lists non-system schemas with object counts (tables, procedures, views, functions) and presents an "all" option. The user picks one or more schemas.
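A sketch of the schema listing, assuming the counts come from `sys.objects` grouped by schema (the actual query may differ):

```sql
-- Non-system schemas with per-type object counts.
-- Type codes: U = table, P = procedure, V = view, FN/IF/TF = functions.
SELECT s.name AS schema_name,
       SUM(CASE WHEN o.type = 'U' THEN 1 ELSE 0 END)                 AS tables,
       SUM(CASE WHEN o.type = 'P' THEN 1 ELSE 0 END)                 AS procedures,
       SUM(CASE WHEN o.type = 'V' THEN 1 ELSE 0 END)                 AS views,
       SUM(CASE WHEN o.type IN ('FN', 'IF', 'TF') THEN 1 ELSE 0 END) AS functions
FROM sys.schemas s
LEFT JOIN sys.objects o ON o.schema_id = s.schema_id
WHERE s.name NOT IN ('sys', 'INFORMATION_SCHEMA', 'guest')
  AND s.schema_id < 16384   -- schema_ids 16384+ belong to fixed database roles
GROUP BY s.name
ORDER BY s.name;
```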
**Extraction preview.** Runs count queries and catalog-signal availability checks, then presents a summary:
```
Extraction preview for [database]

Schemas: silver, gold

Object counts:
  Tables:     42
  Procedures: 18
  Views:       5
  Functions:   3

Catalog signals available:
  Primary keys:  38 constraints
  Foreign keys:  22 constraints
  Identity cols: 15 columns
  CDC-tracked:    2 tables

DDL files will be written to: ./ddl/
Catalog files will be written to: ./catalog/
```
The user must confirm before any files are written.
**Write manifest.** Records the selected technology, database, and schemas:

```
uv run --project <shared-path> setup-ddl write-manifest \
    --technology sql_server \
    --database <database> \
    --schemas <comma-separated-schemas>
```

**Extract modules.** For each object type, queries `OBJECT_DEFINITION()` via the mssql MCP tool and saves the result to `.staging/<type>.json`.
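As a sketch (not the skill's exact SQL), the per-type definition pull for procedures might look like this, using the schema names from the examples on this page:

```sql
-- Pull CREATE bodies for procedures (type 'P'); views ('V') and
-- functions ('FN', 'IF', 'TF') use the same shape with a different filter.
SELECT s.name AS schema_name,
       o.name AS object_name,
       OBJECT_DEFINITION(o.object_id) AS definition
FROM sys.objects o
JOIN sys.schemas s ON s.schema_id = o.schema_id
WHERE o.type = 'P'
  AND s.name IN ('silver', 'gold');   -- the schemas chosen earlier
```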
It then assembles the module DDL files:

```
uv run --project <shared-path> setup-ddl assemble-modules \
    --input ./.staging/<type>.json \
    --type <procedures|views|functions>
```

**Extract tables.** Queries `sys.tables`, `sys.columns`, `sys.types`, and `sys.identity_columns` and saves the result to `.staging/table_columns.json`.
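A sketch of that column query, joining the catalog views named above (the skill's real query may select more attributes, e.g. precision and scale):

```sql
-- One row per column, with resolved type name, identity flag,
-- and identity seed/increment where applicable.
SELECT s.name  AS [schema],
       t.name  AS [table],
       c.name  AS [column],
       ty.name AS sql_type,
       c.is_nullable,
       CASE WHEN ic.column_id IS NOT NULL THEN 1 ELSE 0 END AS is_identity,
       ic.seed_value,
       ic.increment_value
FROM sys.tables t
JOIN sys.schemas s ON s.schema_id = t.schema_id
JOIN sys.columns c ON c.object_id = t.object_id
JOIN sys.types ty  ON ty.user_type_id = c.user_type_id
LEFT JOIN sys.identity_columns ic
       ON ic.object_id = c.object_id AND ic.column_id = c.column_id
WHERE s.name IN ('silver', 'gold')   -- the schemas chosen earlier
ORDER BY s.name, t.name, c.column_id;
```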
It then assembles the table DDL:

```
uv run --project <shared-path> setup-ddl assemble-tables \
    --input ./.staging/table_columns.json
```

**Write catalog.** Runs 12 catalog signal queries plus 3 DMF reference queries via MCP, saving each result to `.staging/`. Then processes all staging files in one pass:
```
uv run --project <shared-path> setup-ddl write-catalog \
    --staging-dir ./.staging \
    --database <database>
```

**Enrich.** Finally, the skill runs:

```
uv run --project <shared-path> catalog-enrich --project-root .
```

This augments the catalog files with AST-derived references that the catalog queries miss: CTAS/SELECT INTO targets, TRUNCATE targets, and indirect writers through EXEC call chains. These entries carry `"detection": "ast_scan"`.
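For illustration, here is a hypothetical procedure (names borrowed from the examples later on this page) containing all three patterns the AST scan detects and the catalog queries miss:

```sql
-- Illustrative only: each statement below is a reference the
-- catalog queries would not attribute to this procedure.
CREATE PROCEDURE silver.usp_rebuild_stage
AS
BEGIN
    -- TRUNCATE target
    TRUNCATE TABLE silver.StageCustomer;

    -- SELECT INTO target: the table is created at runtime
    SELECT CustomerKey, Region
    INTO silver.StageCustomer_snapshot
    FROM silver.DimCustomer;

    -- Indirect writer: reaches DimCustomer through an EXEC call chain
    EXEC silver.usp_load_DimCustomer;
END;
```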
**Summary.** Displays a summary of all extracted files and tells the user they can run discover or scoping next.
This skill reads from a live SQL Server via the `mssql` MCP tool. No local catalog files are consumed.
Fields in `manifest.json`:

| Field | Type | Required | Description |
|---|---|---|---|
| `schema_version` | string | yes | Always `"1.0"` |
| `technology` | string | yes | Source technology. Enum: `sql_server`, `fabric_warehouse`, `fabric_lakehouse`, `snowflake` |
| `dialect` | string | yes | sqlglot dialect. Enum: `tsql`, `spark`, `snowflake` |
| `source_database` | string | yes | Name of the source database |
| `extracted_schemas` | string[] | yes | List of schemas included in the extraction |
| `extracted_at` | string | yes | ISO 8601 timestamp of extraction |
| `sandbox` | object | no | Sandbox metadata (`run_id`, `database`) -- added later by the test harness |
Technology-to-dialect mapping (an example of the delimiter in an extracted DDL file follows the table):

| Technology | Dialect | Delimiter |
|---|---|---|
| `sql_server` | `tsql` | `GO` |
| `fabric_warehouse` | `tsql` | `GO` |
| `fabric_lakehouse` | `spark` | `;` |
| `snowflake` | `snowflake` | `;` |
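As an illustration, assuming the delimiter column describes how statements are joined in the `ddl/` files, a `tsql`-dialect `ddl/procedures.sql` would look like this (object names and bodies are hypothetical):

```sql
-- Hypothetical excerpt: one CREATE statement per object,
-- each in its own batch, separated by GO.
CREATE PROCEDURE silver.usp_load_DimCustomer
AS
BEGIN
    SET NOCOUNT ON;
    /* ...body elided... */
END
GO
CREATE PROCEDURE silver.usp_load_DimGeography
AS
BEGIN
    SET NOCOUNT ON;
    /* ...body elided... */
END
GO
```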
| File | Contents |
|---|---|
| `ddl/tables.sql` | CREATE TABLE statements, one per table |
| `ddl/procedures.sql` | CREATE PROCEDURE statements |
| `ddl/views.sql` | CREATE VIEW statements |
| `ddl/functions.sql` | CREATE FUNCTION statements |
Initial fields written by setup-ddl (profile and scoping sections are added later by other skills):
| Field | Type | Required | Description |
|---|---|---|---|
| `schema` | string | yes | Schema name |
| `name` | string | yes | Table name |
| `ddl_hash` | string | no | SHA-256 of normalized source DDL |
| `stale` | boolean | no | `true` when the object was present in a prior extraction but absent in the latest |
| `columns` | array | no | Column definitions: `name`, `sql_type`, `is_nullable`, `is_identity` |
| `primary_keys` | array | yes | Declared PKs: `constraint_name`, `columns[]` |
| `unique_indexes` | array | yes | Unique indexes: `index_name`, `columns[]` |
| `foreign_keys` | array | yes | FK constraints: `constraint_name`, `columns[]`, `referenced_schema`, `referenced_table`, `referenced_columns[]` |
| `auto_increment_columns` | array | yes | Identity/auto-increment columns: `column`, `seed`, `increment`, `mechanism` |
| `change_capture` | object | no | CDC/change tracking: `enabled`, `mechanism` (enum: `cdc`, `change_tracking`, `stream`, `change_data_feed`) |
| `sensitivity_classifications` | array | no | PII labels: `column`, `label`, `information_type` |
| `referenced_by` | object | yes | Inbound references grouped by type (procedures, views, functions), each split into `in_scope` and `out_of_scope` |
See Skill Analyzing Object for the full procedure catalog schema.
The 12 catalog signal queries produce these staging files in `.staging/` (a sketch of one of the DMF reference queries follows the table):

| Staging file | Signal |
|---|---|
| `table_columns.json` | Column definitions with types, nullability, identity |
| `pk_unique.json` | Primary key and unique index constraints |
| `foreign_keys.json` | Foreign key relationships |
| `identity_columns.json` | Identity columns |
| `cdc.json` | CDC-tracked tables |
| `change_tracking.json` | Change tracking tables (graceful -- tolerates missing feature) |
| `sensitivity.json` | Sensitivity classifications (graceful) |
| `object_types.json` | Object type map (U, V, P, FN, IF, TF) |
| `definitions.json` | All proc/view/function definitions for routing flag scan |
| `proc_params.json` | Procedure parameters |
| `proc_dmf.json` | DMF references for procedures |
| `view_dmf.json` | DMF references for views |
| `func_dmf.json` | DMF references for functions |
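The three `*_dmf.json` files come from `sys.dm_sql_referenced_entities`. A sketch of one such query, using the example procedure from the catalog sample below:

```sql
-- Entities referenced by a procedure, with read/write flags; these
-- surface as is_selected / is_updated in referenced_by entries.
SELECT referenced_schema_name,
       referenced_entity_name,
       is_selected,
       is_updated
FROM sys.dm_sql_referenced_entities('silver.usp_load_DimCustomer', 'OBJECT');
```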
Example `manifest.json`:

```json
{
"schema_version": "1.0",
"technology": "sql_server",
"dialect": "tsql",
"source_database": "AdventureWorksDW",
"extracted_schemas": ["dbo", "silver", "gold"],
"extracted_at": "2025-03-15T14:30:00Z"
}
```

Example table catalog file:

```json
{
"schema": "silver",
"name": "DimCustomer",
"columns": [
{ "name": "CustomerKey", "sql_type": "BIGINT", "is_nullable": false, "is_identity": true },
{ "name": "FirstName", "sql_type": "NVARCHAR(50)", "is_nullable": true, "is_identity": false },
{ "name": "Region", "sql_type": "NVARCHAR(50)", "is_nullable": true, "is_identity": false }
],
"primary_keys": [
{ "constraint_name": "PK_DimCustomer", "columns": ["CustomerKey"] }
],
"unique_indexes": [],
"foreign_keys": [
{
"constraint_name": "FK_DimCustomer_Region",
"columns": ["Region"],
"referenced_schema": "silver",
"referenced_table": "DimGeography",
"referenced_columns": ["GeographyKey"]
}
],
"auto_increment_columns": [
{ "column": "CustomerKey", "seed": 1, "increment": 1, "mechanism": "identity" }
],
"change_capture": { "enabled": true, "mechanism": "cdc" },
"sensitivity_classifications": [
{ "column": "FirstName", "label": "Confidential", "information_type": "Name" }
],
"referenced_by": {
"procedures": {
"in_scope": [
{ "schema": "silver", "name": "usp_load_DimCustomer", "is_selected": true, "is_updated": true, "detection": "catalog_query" }
],
"out_of_scope": []
},
"views": { "in_scope": [], "out_of_scope": [] },
"functions": { "in_scope": [], "out_of_scope": [] }
}
}
```

Troubleshooting (a sketch of the graceful TRY/CATCH pattern follows the table):

| Error | Cause | Fix |
|---|---|---|
| `toolbox: command not found` | genai-toolbox not installed | Install from https://github.com/googleapis/genai-toolbox/releases and add to PATH |
| `MSSQL_HOST is not set` | Missing environment variable | Set `MSSQL_HOST`, `MSSQL_PORT`, `MSSQL_DB`, and `SA_PASSWORD` in `.env` or the shell |
| `USE` statement not carrying across calls | Each MCP call is a discrete connection | Prepend `USE [<database>];` to every SQL block -- the skill handles this automatically |
| Change tracking query fails | Feature not enabled on the server | Graceful degradation -- the query uses TRY/CATCH and saves an error marker instead of failing |
| Sensitivity classifications query fails | Feature not available (e.g., SQL Server edition) | Graceful degradation -- same TRY/CATCH pattern |
| DMF reference returns `ERROR:` rows | `sys.dm_sql_referenced_entities` cannot resolve a proc (e.g., missing dependent object) | Error rows are recorded in the catalog; the skill continues with the remaining objects |
| Dynamic SQL procs missing from `referenced_by` | `sys.dm_sql_referenced_entities` resolves at definition time, not runtime | Known limitation -- these procs require LLM analysis via Skill Analyzing Object |
| Re-running on an existing project | `ddl/` and `catalog/` already exist | The skill warns and asks for confirmation -- a re-run fully rebuilds both directories |
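A minimal sketch of that graceful TRY/CATCH pattern, assuming the error marker is simply the caught message saved to the staging file:

```sql
BEGIN TRY
    -- Change tracking signal; may fail on servers or editions
    -- where the feature is unavailable.
    SELECT OBJECT_SCHEMA_NAME(object_id) AS [schema],
           OBJECT_NAME(object_id)        AS [table]
    FROM sys.change_tracking_tables;
END TRY
BEGIN CATCH
    -- Saved to .staging/change_tracking.json as an error marker
    -- instead of aborting the extraction.
    SELECT ERROR_MESSAGE() AS error;
END CATCH;
```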