One general model guesses; a fleet of narrow ones is accountable. Each agent has one job, one input contract and one
output contract. The Conductor sequences them per asset, the knowledge base is their shared memory, and your experts
sign off at the three gates. That is how the migration stays both autonomous in the loop and accountable at the gates.
Fleet command (the Conductor, the Librarian and the Chronicler) runs across every stage.
Fleet command · runs the mission across all stages
The Conductor
conductor
The Conductor is the fleet's scheduler. It reads the prioritized backlog and the dependency graph, then sequences the right agents for each asset and runs independent assets in parallel, factory-style. It knows where the three human gates sit and stops the line for sign-off rather than racing past it.
backlog + asset graph → a scheduled, gated run plan
Runs in: Every migration
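As a rough illustration of the sequencing idea, the sketch below groups assets into parallel "waves" from a dependency graph and marks the waves that must wait for sign-off. The asset names, the `plan_waves` function and the per-asset `gated` set are hypothetical; the Conductor's real plan format is not described here.

```python
from graphlib import TopologicalSorter


def plan_waves(depends_on, gated):
    """Group assets into waves; every asset in a wave has all dependencies met."""
    sorter = TopologicalSorter(depends_on)  # maps asset -> set of upstream assets
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # these assets can run in parallel
        waves.append({
            "assets": ready,
            # Assumption: a wave touching a gated asset stops for human sign-off.
            "needs_sign_off": any(asset in gated for asset in ready),
        })
        sorter.done(*ready)
    return waves


# Hypothetical asset graph: each asset lists what it depends on.
graph = {
    "stg_orders": set(),
    "stg_customers": set(),
    "dim_customer": {"stg_customers"},
    "fact_orders": {"stg_orders", "dim_customer"},
}
waves = plan_waves(graph, gated={"fact_orders"})
```

Here the two staging assets land in the first wave, `dim_customer` in the second, and `fact_orders` in a third wave flagged for sign-off.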
The Librarian
librarian
Every Documenter writes into one shared knowledge base; the Librarian keeps it coherent. It deduplicates overlapping findings, cross-links related assets, and versions every entry so a change is never silently overwritten. When any other agent needs to know what an upstream asset does, it asks the Librarian.
documenter output → one curated, versioned knowledge base
Runs in: Every migration
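One way to picture "versioned, never silently overwritten" is an append-only store that skips exact duplicates. This is a minimal sketch under that assumption; the `KnowledgeBase` class and its methods are hypothetical names, not the Librarian's actual interface.

```python
import hashlib
import json


class KnowledgeBase:
    """Append-only store: each asset keeps every distinct version of its entry."""

    def __init__(self):
        self._versions = {}  # asset id -> list of versions, oldest first

    def write(self, asset_id, finding):
        """Store a finding and return the asset's current version number."""
        body = json.dumps(finding, sort_keys=True)
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        history = self._versions.setdefault(asset_id, [])
        if history and history[-1]["digest"] == digest:
            return len(history)  # identical finding: deduplicate, no new version
        history.append({"digest": digest, "finding": finding})
        return len(history)

    def latest(self, asset_id):
        return self._versions[asset_id][-1]["finding"]

    def history(self, asset_id):
        return [version["finding"] for version in self._versions[asset_id]]


kb = KnowledgeBase()
first = kb.write("pkg_load_sales", {"writes_to": "stg.sales"})
repeat = kb.write("pkg_load_sales", {"writes_to": "stg.sales"})
second = kb.write("pkg_load_sales", {"writes_to": "stg.sales_v2"})
```

The repeated write returns the same version number, while the changed finding becomes version 2 and the earlier entry stays readable through `history`.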
The Chronicler
chronicler
Most migrations finish with a working platform and no current documentation. The Chronicler writes that documentation as the platform is built: a data dictionary, lineage docs, operational runbooks and onboarding guides. The project ends with knowledge your team can read, not knowledge locked in someone's head.
built assets + knowledge base → living platform documentation
Runs in: Every migration
Document stage · the scribes: legacy logic, written down
The Surveyor
estate-surveyor
The Surveyor runs first. It inventories every asset across the estate in minutes, scores each one for complexity, and flags the drivers that make a migration risky: script logic, dynamic SQL, unusual coupling. The output is a prioritized backlog: what to move first, what needs care, and where the effort really sits.
raw estate (packages, repos, databases) → scored, ranked backlog
Runs in: Every migration
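To make the scoring step concrete, here is a small sketch that turns risk drivers into a score and a ranked backlog. The weights, the formula and the "simplest first" ordering are all assumptions made for illustration; the Surveyor's actual scoring model is not given in this text.

```python
from dataclasses import dataclass

# Hypothetical weights for the risk drivers named above.
WEIGHTS = {"script_logic": 3, "dynamic_sql": 2, "unusual_coupling": 2}


@dataclass
class Asset:
    name: str
    component_count: int
    drivers: frozenset = frozenset()


def complexity(asset):
    """Size plus a penalty of 10 points per weight unit of each risk driver."""
    return asset.component_count + 10 * sum(WEIGHTS[d] for d in asset.drivers)


def ranked_backlog(assets):
    """Assumption: move the simplest assets first, the riskiest last."""
    return sorted(assets, key=complexity)


estate = [
    Asset("pkg_script_heavy", 5, frozenset({"script_logic", "dynamic_sql"})),
    Asset("pkg_plain_copy", 12),
    Asset("pkg_dynamic", 8, frozenset({"dynamic_sql"})),
]
backlog = [(a.name, complexity(a)) for a in ranked_backlog(estate)]
```

With these made-up weights the plain copy package scores 12, the dynamic SQL package 28 and the script-heavy package 55, which is where the effort would sit.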
SSIS Documenter
ssis-documenter
The SSIS Documenter reads the package XML inside .dtsx and .ispac files, not just the task names. It reconstructs the control flow from precedence constraints, maps every Data Flow component, resolves expressions, captures variable and parameter scopes and connection managers, and isolates embedded T-SQL. Everything is written to the knowledge base as a functional spec of what the package does.
.dtsx / .ispac packages → functional specs in the knowledge base
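Reconstructing control flow from precedence constraints can be sketched as below. The sample is a heavily trimmed, hand-written fragment in the style of a 2012-or-later .dtsx file (real packages carry many more attributes), and `control_flow_edges` is a hypothetical helper, not the Documenter's code.

```python
import xml.etree.ElementTree as ET

DTS = "www.microsoft.com/SqlServer/Dts"  # namespace used by .dtsx package XML

# Trimmed, illustrative package: three tasks chained by two constraints.
SAMPLE = r"""<DTS:Executable xmlns:DTS="www.microsoft.com/SqlServer/Dts" DTS:ObjectName="LoadSales">
  <DTS:Executables>
    <DTS:Executable DTS:refId="Package\Truncate" DTS:ObjectName="Truncate"/>
    <DTS:Executable DTS:refId="Package\Load" DTS:ObjectName="Load"/>
    <DTS:Executable DTS:refId="Package\Archive" DTS:ObjectName="Archive"/>
  </DTS:Executables>
  <DTS:PrecedenceConstraints>
    <DTS:PrecedenceConstraint DTS:From="Package\Truncate" DTS:To="Package\Load"/>
    <DTS:PrecedenceConstraint DTS:From="Package\Load" DTS:To="Package\Archive"/>
  </DTS:PrecedenceConstraints>
</DTS:Executable>"""


def control_flow_edges(xml_text):
    """Return (from_task, to_task) pairs in document order."""
    root = ET.fromstring(xml_text)
    return [
        (pc.get(f"{{{DTS}}}From"), pc.get(f"{{{DTS}}}To"))
        for pc in root.iter(f"{{{DTS}}}PrecedenceConstraint")
    ]


edges = control_flow_edges(SAMPLE)
```

The resulting edge list is the raw material for an execution-order graph; constraint conditions and expressions would need extra attributes that this sketch ignores.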
The ADF Documenter parses the Git-integration JSON directly: pipelines and their activities, the dependsOn edges that define execution order, datasets, linked services and triggers. It traces parameter usage through the graph so runtime coupling that's invisible in the canvas becomes explicit in the knowledge base.
ADF Git JSON → pipeline behavior + dependency graph in the knowledge base
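The `dependsOn` edges are plain JSON, so extracting execution order can be sketched in a few lines. The pipeline below is a made-up, trimmed example, and `dependency_edges` is a hypothetical helper written for illustration only.

```python
import json

# Trimmed, illustrative pipeline definition as it might appear in the Git repo.
PIPELINE_JSON = """
{
  "name": "pl_load_sales",
  "properties": {
    "activities": [
      {"name": "CopyRaw", "type": "Copy"},
      {"name": "Transform", "type": "ExecuteDataFlow",
       "dependsOn": [{"activity": "CopyRaw", "dependencyConditions": ["Succeeded"]}]},
      {"name": "Notify", "type": "WebActivity",
       "dependsOn": [{"activity": "Transform", "dependencyConditions": ["Completed"]}]}
    ]
  }
}
"""


def dependency_edges(pipeline):
    """Return (upstream, downstream, condition) for every dependsOn entry."""
    edges = []
    for activity in pipeline["properties"]["activities"]:
        for dep in activity.get("dependsOn", []):
            for condition in dep.get("dependencyConditions", []):
                edges.append((dep["activity"], activity["name"], condition))
    return edges


edges = dependency_edges(json.loads(PIPELINE_JSON))
```

Keeping the condition on each edge matters: an activity that runs on `Completed` behaves differently from one that runs only on `Succeeded`.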
Synapse pipelines look like ADF but carry workspace-specific coupling, to Spark pools, dedicated SQL pools and linked services that only exist inside the workspace. This Documenter captures the hybrid Spark/SQL orchestration and every workspace dependency so nothing breaks when the orchestration is rebuilt elsewhere.
The Synapse Notebooks Documenter reads Spark notebooks cell by cell. It captures session configuration, follows %run chains across notebooks, records Delta Lake operations and surfaces the side effects and ordering dependencies that make notebook logic fragile to move blindly.
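Following `%run` chains amounts to a graph walk over notebook references. The sketch below assumes notebooks are already available as lists of cell source strings keyed by name; that in-memory shape and the `run_chain` function are assumptions, and `%run` arguments are ignored.

```python
import re

# Matches a %run magic at the start of a line and captures the notebook path.
RUN_LINE = re.compile(r"^\s*%run\s+(\S+)", re.MULTILINE)


def run_chain(notebooks, start, seen=None):
    """Return notebooks reachable from `start`, in first-reference order."""
    seen = [] if seen is None else seen
    if start in seen:
        return seen  # already visited: also guards against circular %run
    seen.append(start)
    for cell in notebooks.get(start, []):
        for target in RUN_LINE.findall(cell):
            run_chain(notebooks, target, seen)
    return seen


# Hypothetical workspace: notebook name -> cell sources.
notebooks = {
    "main": ["%run config", "df = load()", "%run helpers/io"],
    "config": ["%run secrets"],
    "secrets": [],
    "helpers/io": ["%run config"],
}
chain = run_chain(notebooks, "main")
```

The order of `chain` reflects which notebook's definitions are pulled in first, which is exactly the kind of ordering dependency that is easy to lose in a move.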
Dedicated SQL pools encode performance decisions in their physical design. This Documenter captures the schemas, the distribution and partitioning strategies, the stored procedures and the statistics and workload patterns, so the target design preserves the intent behind the layout instead of copying it by accident.
dedicated SQL pool → schema, distribution & workload spec
Serverless SQL is the queryable face of a data lake. This Documenter maps the OPENROWSET queries, external tables and external data sources, and the views layered on top of them, the logical schema that lets analysts treat lake files as tables, captured so it can be rebuilt on the target.
serverless SQL → external-table + view map over the lake
The T-SQL Documenter dissects the procedural layer: stored procedures, views and functions. It separates the business logic from the plumbing, traces dependency chains across objects and databases, and flags the dialect quirks and dynamic SQL that need careful handling on a new engine.
stored procs, views, functions → documented business logic + dependencies
The Cartographer joins what the Documenters find into one picture. It stitches lineage across every asset (through staging hops, slowly changing dimension patterns and surrogate-key joins) into a single navigable graph, so you can trace your data from source system to final mart.
all documented assets → one navigable lineage graph
Runs in: Every migration
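Tracing a mart back to its sources is a walk over the stitched lineage edges. A minimal sketch, assuming lineage is stored as (source, target) pairs; the table names and the `upstream` function are hypothetical.

```python
from collections import deque


def upstream(edges, target):
    """Breadth-first walk from `target` back to every asset that feeds it."""
    parents = {}
    for source, destination in edges:
        parents.setdefault(destination, []).append(source)
    seen, queue = [], deque([target])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.append(parent)
                queue.append(parent)
    return seen


# Hypothetical lineage: source systems -> staging -> dimension/fact -> mart.
lineage = [
    ("crm.customers", "stg_customers"),
    ("stg_customers", "dim_customer"),
    ("dim_customer", "mart_sales"),
    ("erp.orders", "stg_orders"),
    ("stg_orders", "fact_orders"),
    ("fact_orders", "mart_sales"),
]
feeds_mart = upstream(lineage, "mart_sales")
```

The walk returns the nearest hops first and the source systems last, so the result reads from mart back to origin.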
Architect stage · the design bench: one estate, one target design
The Architect
architect
The Architect turns the knowledge base into a plan. It decides workload placement (Warehouse versus Lakehouse on Microsoft Fabric, SQL versus Spark on Databricks), maps assets onto the medallion layers, and encodes your naming, error-handling and orchestration standards. The result is the Target Design Spec your experts sign off at Gate 2.
knowledge base + your standards → the Target Design Spec
Runs in: Every migration
Build stage · the foundry: designs become code
Fabric dbt Builder
fabric-dbt-builder
The Fabric dbt Builder generates a complete dbt project that targets the Microsoft Fabric Warehouse: sources, staging and mart models, snapshots for slowly changing dimensions, your macros, tests and generated docs. It writes to your layer conventions, so the project looks like one your team would have written.
Target Design Spec → a dbt project targeting the Fabric Warehouse
The Fabric Framework Builder generates into the way your team already works on Fabric: data pipelines, Warehouse T-SQL procedures and Lakehouse notebooks, following your naming, error handling and orchestration patterns. Warehouses and Lakehouses are both in scope, so the build matches your architecture rather than forcing a new one.
Target Design Spec → assets in your Fabric framework
The Databricks dbt Builder generates a dbt project that runs on Databricks SQL warehouses: models materialized to Delta tables, snapshots for slowly changing dimensions, your macros and tests. Incremental models use Databricks' MERGE so large tables update in place rather than rebuilding.
Target Design Spec → a dbt project targeting Databricks SQL
The Databricks Framework Builder generates into your existing Databricks structure: notebooks, Workflows for orchestration and declarative Delta pipelines, packaged in your repo with your patterns. The output drops into the way your team already ships, not a Plainsight house style.
Target Design Spec → assets in your Databricks framework
Test stage · the proving ground: trust is generated too
Fabric Test Agent
fabric-test-agent
The Fabric Test Agent generates the tests rather than just running them. It checks schema parity against the documented legacy behavior, reconciles row counts, verifies slowly changing dimension effective dating, and asserts null and key integrity, turning "looks right" into a suite that either passes or tells you exactly what failed.
built assets + documented behavior → a passing/failing test suite
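As one example of what a generated check might look like, the sketch below verifies slowly changing dimension effective dating. It assumes half-open validity intervals (`valid_to` of one version equals `valid_from` of the next) and a `None` end date on the current row; the column names and `scd2_violations` are hypothetical.

```python
from datetime import date
from itertools import groupby


def scd2_violations(rows):
    """Flag keys whose versions do not chain cleanly or lack an open current row."""
    problems = []
    ordered = sorted(rows, key=lambda r: (r["key"], r["valid_from"]))
    for key, group in groupby(ordered, key=lambda r: r["key"]):
        versions = list(group)
        for current, following in zip(versions, versions[1:]):
            if current["valid_to"] != following["valid_from"]:
                problems.append((key, "gap or overlap", current["valid_from"]))
        if versions[-1]["valid_to"] is not None:
            problems.append((key, "no open current row", versions[-1]["valid_from"]))
    return problems


dimension = [
    {"key": 1, "valid_from": date(2023, 1, 1), "valid_to": date(2023, 6, 1)},
    {"key": 1, "valid_from": date(2023, 6, 1), "valid_to": None},
    {"key": 2, "valid_from": date(2023, 1, 1), "valid_to": date(2023, 3, 1)},
    {"key": 2, "valid_from": date(2023, 4, 1), "valid_to": None},  # one-month gap
]
violations = scd2_violations(dimension)
```

Key 1 chains cleanly; key 2 is reported with the start date of the version that precedes the gap, which is the "exactly what failed" part.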
The Databricks Test Agent writes and runs the suite on Databricks: data quality expectations, schema and row-count parity against documented legacy behavior, and the gates that decide whether an asset is trustworthy. Failures feed straight back into the loop rather than onto a list.
built assets + documented behavior → a passing/failing test suite
The Reconciler is the proof that the new platform behaves like the old one. It runs legacy and rebuilt assets side by side and compares their outputs row by row and aggregate by aggregate. A non-zero delta is either fixed or explained in writing. There's no "close enough."
legacy output + new output → a reconciled, zero-or-explained delta
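The reason to compare both row by row and aggregate by aggregate shows up in a small example: totals can agree while individual rows differ. This is a minimal sketch with made-up data; `reconcile` and the column names are hypothetical.

```python
def reconcile(legacy, rebuilt, key, measure):
    """Compare two outputs by key and by the total of one numeric measure."""
    old = {row[key]: row for row in legacy}
    new = {row[key]: row for row in rebuilt}
    return {
        "missing_in_new": sorted(old.keys() - new.keys()),
        "unexpected_in_new": sorted(new.keys() - old.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
        "sum_delta": sum(r[measure] for r in rebuilt) - sum(r[measure] for r in legacy),
    }


legacy_rows = [
    {"id": 1, "amount": 100},
    {"id": 2, "amount": 250},
    {"id": 3, "amount": 75},
]
rebuilt_rows = [
    {"id": 1, "amount": 100},
    {"id": 2, "amount": 245},
    {"id": 4, "amount": 80},
]
report = reconcile(legacy_rows, rebuilt_rows, key="id", measure="amount")
```

Both outputs total 425, so the aggregate delta is zero, yet the row-level comparison still finds one missing key, one unexpected key and one changed row. Each of those would need a fix or a written explanation.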
The Fabric Operator closes the loop. It deploys the build to DEV, runs the pipelines for real, reads every runtime failure, routes each failure back to the builders for a fix, then re-runs. That cycle repeats autonomously until the suite is green, and every iteration is logged in the knowledge base.
built assets → a green run in DEV (via the build-test-run loop)
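The build-test-run loop can be sketched as a simple retry cycle. The callables passed in (`deploy`, `run_suite`, `request_fix`, `log`) are hypothetical stand-ins for the Operator's real integrations, and the `max_iterations` safety cap is an assumption added here, not something the text specifies.

```python
def run_until_green(deploy, run_suite, request_fix, log, max_iterations=10):
    """Deploy, run, route failures to the builders, and repeat until no failures."""
    for iteration in range(1, max_iterations + 1):
        deploy()
        failures = list(run_suite())
        log({"iteration": iteration, "failures": failures})  # every pass is recorded
        if not failures:
            return iteration  # suite is green
        for failure in failures:
            request_fix(failure)  # hand each failure back to the builders
    raise RuntimeError(f"suite still failing after {max_iterations} iterations")


# Toy stand-ins: two latent defects, surfaced and fixed one per iteration.
pending = ["missing column", "bad cast"]
history = []
iterations = run_until_green(
    deploy=lambda: None,
    run_suite=lambda: pending[:1],
    request_fix=pending.remove,
    log=history.append,
)
```

In this toy run the loop needs three passes: two that each surface one failure and a third that comes back green, with all three recorded in `history`.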
The Databricks Operator runs the same closed loop on Databricks: deploy to DEV, run the Workflows and pipelines, diagnose each failure, patch through the builders and re-run. It keeps iterating until the suite is green, logging each pass.
built assets → a green run in DEV (via the build-test-run loop)
The Cutover Agent owns the endgame. It plans the parallel-run period, keeps watermarks in sync between old and new, sequences the switchover itself, and produces the legacy decommissioning checklist, so going live is a controlled step, not a leap.
a green, signed-off build → a planned, executed cutover
Runs in: Every migration
The fleet is autonomous in the loop, accountable at the gates.
Everything the agents do inside the build-test-run loop is automatic. The three gates (assessment, design and promotion) stay human, mandatory and yours.