Skill: Profiling Table

Purpose

Profiles a single table for migration by assembling deterministic context from catalog files, reasoning over six profiling questions (classification, primary key, foreign keys, natural vs surrogate key, watermark, PII), presenting results for user approval, and writing the approved profile to the table catalog file. The profile drives materialization, test generation, and model generation downstream.

Invocation

/profiling-table <schema.table>

Argument is the fully-qualified table name. The skill asks if missing. The writer procedure is read automatically from the catalog's scoping.selected_writer field.

Trigger phrases: "profile a table", "classify a table", "what kind of model is this table", "determine PK, FK, watermark, or PII for a migration target".

Prerequisites

manifest.json must exist in the project root. If missing, run ad-migration setup-source first.
catalog/tables/<table>.json must exist. If missing, run /listing-objects list tables to see available tables.
scoping.selected_writer must be set in the table catalog. If missing, run Skill Analyzing Table first.
The skill checks profiling readiness through migrate-util ready and stops on the surfaced code if the object is not ready.

Pipeline

1. Assemble context (deterministic)

uv run --project <shared-path> profile context --table <table>

The CLI reads the selected writer from the table's catalog scoping section. Output shape is enforced by the ProfileContext Pydantic model in lib/shared/output_models.py:

catalog_signals -- PKs, FKs, identity columns, unique indexes, change capture, sensitivity classifications
writer_references -- outbound references from the writer procedure with column-level read/write flags
proc_body -- full SQL body of the writer procedure
columns -- target table column list with types and nullability
related_procedures -- related procedure context when referenced by the writer

2. LLM profiling (reasoning)

The LLM reads the context JSON and answers six profiling questions using signal tables and pattern matching rules from the profiling-signals reference.

3. Present for approval (interactive)

Profile is presented as a structured summary. If a required question (Q1, Q2, Q4, Q5) cannot be answered with reasonable confidence, the ambiguity is presented to the user for guidance. The user must explicitly approve before the profile is persisted.

4. Write to catalog (deterministic)

mkdir -p .staging
# Write profile JSON to .staging/profile.json
uv run --project <shared-path> profile write \
  --table <table> \
  --profile-file .staging/profile.json; rm -rf .staging

Reads

File	Description
`manifest.json`	Project root validation
`catalog/tables/<table>.json`	Column list, catalog signals, scoping section
`catalog/procedures/<writer>.json`	Writer procedure references and body
`ddl/procedures.sql`	Writer procedure raw DDL

Writes

`profile` section in `catalog/tables/<table>.json`

The profile section follows table_catalog.json#/$defs/profile_section.

Field	Type	Required	Description
`status`	string	yes	Enum: `ok`, `partial`, `error`
`writer`	string	yes	FQN of the writer procedure used for profiling
`classification`	object	no	Model classification results
`primary_key`	object	no	Primary key determination
`natural_key`	object	no	Natural key determination
`watermark`	object	no	Incremental watermark column
`foreign_keys`	array	no	Foreign key relationships with types
`pii_actions`	array	no	PII column handling recommendations
`warnings`	array	no	Diagnostics entries
`errors`	array	no	Diagnostics entries

The Six Profiling Questions

Q1 -- What kind of model is this? (Required)

Determines materialization strategy, dbt tests, and whether SCD2 history logic is needed.

Write-pattern signals:

Proc pattern	Classification
Pure INSERT, no UPDATE or DELETE	`fact_transaction`
INSERT with GROUP BY (aggregation before write)	`fact_aggregate`
TRUNCATE + INSERT with descriptive VARCHAR columns	`dim_non_scd`
TRUNCATE + INSERT with measure + FK columns	`fact_periodic_snapshot`
MERGE with simple WHEN MATCHED THEN UPDATE only	`dim_scd1`
MERGE with expire matched row + insert history row (`valid_to`, `is_current`)	`dim_scd2`
INSERT then UPDATE targeting milestone date columns	`fact_accumulating_snapshot`
Cross-join INSERT of low-cardinality flag combinations	`dim_junk`

Column shape signals:

Column pattern	Signal
`valid_from` / `valid_to` / `is_current` / `current_flag`	`dim_scd2`
Multiple milestone date columns (`order_date`, `ship_date`, `close_date`)	`fact_accumulating_snapshot`
`snapshot_date` / `as_of_date` / `period_date`	`fact_periodic_snapshot`
Multiple BIT/TINYINT flag columns, all low-cardinality	`dim_junk` candidate
Surrogate PK (`_sk`) + separate natural key column	Dimension (SCD1 or SCD2)
FK columns (`_sk`) + numeric measure columns	Fact table

classification object fields:

Field	Type	Description
`resolved_kind`	string	Enum: `dim_non_scd`, `dim_scd1`, `dim_scd2`, `dim_junk`, `fact_transaction`, `fact_periodic_snapshot`, `fact_accumulating_snapshot`, `fact_aggregate`
`rationale`	string	Why this classification was chosen
`source`	string	Enum: `catalog`, `llm`, `catalog+llm`

Q2 -- Primary key candidate (Required)

Required for unique_key in incremental models. Missing it forces full-refresh or produces duplicates.

Signals beyond catalog PKs:

Source	Notes
MERGE ON clause in proc code	Business key / table grain -- strongest code-level signal
UPDATE / DELETE WHERE col = @param	Single-row lookup key

primary_key object fields:

Field	Type	Description
`columns`	string[]	PK column names
`primary_key_type`	string	Enum: `surrogate`, `natural`, `composite`, `unknown`
`rationale`	string	Why these columns were identified as PK and why this type
`source`	string	Enum: `catalog`, `llm`, `catalog+llm`

Q3 -- Foreign key candidates (Nice-to-have)

Needed for relationships tests and detecting role-playing / degenerate dimensions.

FK type resolution:

`fk_type`	Rule
`standard`	One fact column joins one dimension key with no multi-role pattern
`role_playing`	Two or more distinct fact columns join the same dimension relation+key
`degenerate`	Column behaves as business key in fact usage but no dimension join target found

foreign_keys[] entry fields:

Field	Type	Description
`column`	string	FK column name on this table
`references_source_relation`	string	Referenced table (source-side SQL Server identifier)
`references_column`	string	Referenced column
`fk_type`	string	Enum: `standard`, `role_playing`, `degenerate`
`rationale`	string	Why this FK type was chosen
`source`	string	Enum: `catalog`, `llm`, `catalog+llm`

Q4 -- Natural key vs surrogate key (Required)

Determines whether the model calls dbt_utils.generate_surrogate_key and whether the incremental unique_key is a raw column or generated hash.

Signals:

Signal	Notes
`NEWID()` / `NEWSEQUENTIALID()` / `NEXT VALUE FOR` in proc body	Definitive proc-assigned surrogate
Column name suffix `_sk` / `_guid`	Surrogate
Column name suffix `_code` / `_number` / `_num`	Natural
MERGE ON uses different column from INSERT's PK column	MERGE ON = natural key; INSERT PK = surrogate key

natural_key object fields:

Field	Type	Description
`columns`	string[]	Natural key column names
`rationale`	string	Why these columns are the natural key
`source`	string	Enum: `catalog`, `llm`, `catalog+llm`

Q5 -- Incremental watermark (Required)

Without a watermark column, the model can only be table (full refresh). Operationally unacceptable for large fact tables.

Signals:

Source	Notes
WHERE clause in proc body	`WHERE load_date > @last_run` -- nearly definitive
Column name patterns	`modified_at`, `updated_at`, `load_date`, `etl_date`, `batch_date`, `_dt`, `_ts`, `_dttm`
CDC / CT metadata	Informs strategy but does not identify the watermark column

watermark object fields:

Field	Type	Description
`column`	string	Watermark column name
`rationale`	string	Why this column was chosen
`source`	string	Enum: `catalog`, `llm`, `catalog+llm`

Q6 -- PII handling candidates (Nice-to-have)

Does not affect SQL correctness. Missing PII detection is a compliance risk.

Signals beyond catalog sensitivity classifications:

Source	Notes
Column name patterns	`email`, `ssn`, `dob`, `phone`, `address`, `zip`, `credit_card`, `passport`, `national_id`, `ip_address`, `birth_date`, `first_name`, `last_name`, `full_name`
Column type + context	VARCHAR/NVARCHAR with PII-suggestive names

pii_actions[] entry fields:

Field	Type	Description
`column`	string	Column containing PII
`entity`	string	PII entity type (e.g., `email_address`, `phone_number`)
`suggested_action`	string	Enum: `mask`, `drop`, `tokenize`, `keep`
`rationale`	string	Why this action was suggested
`source`	string	Enum: `catalog`, `llm`, `catalog+llm`

Suggested action meanings:

Action	When
`mask`	Default for confirmed PII
`drop`	Column not needed downstream
`tokenize`	Joinability must be preserved
`keep`	Explicit business justification

The `source` Attribution Field

Every decision point in the profile includes a source field indicating how the answer was derived:

Value	Meaning
`catalog`	Entirely from catalog signals (PKs, FKs, identity columns, sensitivity classifications)
`llm`	Entirely from LLM reasoning over proc body, column names, or patterns
`catalog+llm`	Catalog provided partial signal, LLM completed the answer

Catalog signals are treated as facts. The LLM fills in what the catalog does not answer.

JSON Format

Profile section example

{
  "profile": {
    "status": "ok",
    "writer": "silver.usp_load_dimcustomer",
    "classification": {
      "resolved_kind": "dim_scd1",
      "rationale": "MERGE with WHEN MATCHED THEN UPDATE on all non-key columns. No valid_from/valid_to columns present, ruling out SCD2.",
      "source": "catalog+llm"
    },
    "primary_key": {
      "columns": ["CustomerKey"],
      "primary_key_type": "surrogate",
      "rationale": "CustomerKey is an identity column (catalog PK constraint PK_DimCustomer). Auto-increment seed=1, increment=1.",
      "source": "catalog"
    },
    "natural_key": {
      "columns": ["CustomerAlternateKey"],
      "rationale": "MERGE ON clause uses CustomerAlternateKey to match source to target, confirming it as the business key.",
      "source": "llm"
    },
    "watermark": {
      "column": "ModifiedDate",
      "rationale": "Proc WHERE clause filters on ModifiedDate > @LastLoadDate. DATETIME2 type confirms suitability as watermark.",
      "source": "llm"
    },
    "foreign_keys": [
      {
        "column": "GeographyKey",
        "references_source_relation": "silver.DimGeography",
        "references_column": "GeographyKey",
        "fk_type": "standard",
        "rationale": "Declared FK constraint FK_DimCustomer_Geography. Single column, single dimension -- standard FK.",
        "source": "catalog"
      }
    ],
    "pii_actions": [
      {
        "column": "EmailAddress",
        "entity": "email_address",
        "suggested_action": "mask",
        "rationale": "Column name matches PII pattern 'email'. NVARCHAR type confirms string content.",
        "source": "llm"
      },
      {
        "column": "Phone",
        "entity": "phone_number",
        "suggested_action": "mask",
        "rationale": "Column name matches PII pattern 'phone'.",
        "source": "llm"
      }
    ],
    "warnings": [],
    "errors": []
  }
}

Troubleshooting

Error	Cause	Fix
`profile context` exit code 1	Catalog file missing for table or writer	Run Command Setup DDL and Skill Analyzing Table first
`profile context` exit code 2	IO/parse error reading catalog files	Check file permissions and JSON validity in `catalog/`
`profile write` exit code 1	Validation failure -- invalid JSON, missing required fields, or invalid enum values	Check profile JSON against the field definitions and enum values above
`profile write` exit code 2	IO error -- catalog unreadable or write failure	Check file permissions on `catalog/tables/<table>.json`
Ambiguous classification	Write pattern signals conflict with column shape signals	The skill stops and asks the user for guidance rather than auto-resolving
Missing watermark	No WHERE clause filter or datetime column found	Profile is written with `status: "partial"`. Model will fall back to full-refresh `table` materialization

Skill Profiling Table - accelerate-data/migration-utility GitHub Wiki

Skill: Profiling Table

Purpose

Invocation

Prerequisites

Pipeline

1. Assemble context (deterministic)

2. LLM profiling (reasoning)

3. Present for approval (interactive)

4. Write to catalog (deterministic)

Reads

Writes

`profile` section in `catalog/tables/<table>.json`

The Six Profiling Questions

Q1 -- What kind of model is this? (Required)

Q2 -- Primary key candidate (Required)

Q3 -- Foreign key candidates (Nice-to-have)

Q4 -- Natural key vs surrogate key (Required)

Q5 -- Incremental watermark (Required)

Q6 -- PII handling candidates (Nice-to-have)

The `source` Attribution Field

JSON Format

Profile section example

Troubleshooting

⚠️ GitHub.com Fallback ⚠️

Skill Profiling Table - accelerate-data/migration-utility GitHub Wiki

Skill: Profiling Table

Purpose

Invocation

Prerequisites

Pipeline

1. Assemble context (deterministic)

2. LLM profiling (reasoning)

3. Present for approval (interactive)

4. Write to catalog (deterministic)

Reads

Writes

profile section in catalog/tables/<table>.json

The Six Profiling Questions

Q1 -- What kind of model is this? (Required)

Q2 -- Primary key candidate (Required)

Q3 -- Foreign key candidates (Nice-to-have)

Q4 -- Natural key vs surrogate key (Required)

Q5 -- Incremental watermark (Required)

Q6 -- PII handling candidates (Nice-to-have)

The source Attribution Field

JSON Format

Profile section example

Troubleshooting

⚠️ **GitHub.com Fallback** ⚠️

`profile` section in `catalog/tables/<table>.json`

The `source` Attribution Field

⚠️ GitHub.com Fallback ⚠️