Autonomous in the loop. Accountable at the gates. An agentic offering by Plainsight

ADF migration, run by an agent fleet.

The Documenter reads Azure Data Factory (the logic, not just the labels) and writes it to the knowledge base. Then the fleet builds, tests and operates the result on Microsoft Fabric or Databricks, iterating until the tests pass, with your experts approving every gate.

What the Documenter reads in Azure Data Factory

Azure Data Factory (ADF) presents a clean visual surface. The logic that actually runs lives in the JSON beneath it, and in the runtime bindings the canvas never shows. The Documenter (the agent that builds the knowledge base before any rebuild begins) reads that logic, not just the labels. It parses the activity graph, resolves the expression chains, and follows the connection and compute bindings to the edge of the estate, then hands back a reviewable inventory that a human signs off on at the assessment gate. Autonomous in the loop. Accountable at the gates.

The JSON artifacts, read as ARM resources

When a factory is connected to Azure Repos Git or GitHub, every ADF artifact is persisted as a JSON file (one file per resource) conforming to Azure Resource Manager (ARM) template conventions. The Documenter reads the repo, not the live service, because the repo is the reproducible source of truth.

A pipeline file carries a name and a properties object holding activities[], parameters, concurrency, and annotations. Inside activities[] the Documenter separates the two kinds of step. Execution activities (Copy, plus transform activities such as data flow, stored procedure, and notebook) carry typeProperties, an optional linkedServiceName, a policy, and dependsOn. The policy defaults matter when reconstructing behavior: a 12-hour timeout, zero retries, a 30-second retry interval, and secureOutput switched off. Control activities (If Condition, ForEach, Until, Switch, Execute Pipeline) carry typeProperties and dependsOn but no activity-level linked service. Because a Git-backed factory exports datasets and linked services as their own files, the Documenter stitches the graph back together by resolving each referenceName.
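As an illustration of the split described above, here is a minimal sketch that classifies activities and fills in policy defaults. It is not the Documenter's implementation. The function name, the sample pipeline, and the `d.hh:mm:ss` timeout format are assumptions made for the example.

```python
import json

# Control activity types named in the text above; a real factory may contain others.
CONTROL_TYPES = {"IfCondition", "ForEach", "Until", "Switch", "ExecutePipeline"}

# Policy defaults as stated above. The "d.hh:mm:ss" timeout format is an assumption.
POLICY_DEFAULTS = {
    "timeout": "0.12:00:00",
    "retry": 0,
    "retryIntervalInSeconds": 30,
    "secureOutput": False,
}

def summarize_activities(pipeline: dict) -> list:
    """Return one row per activity: kind, upstream names, and effective policy."""
    rows = []
    for act in pipeline["properties"].get("activities", []):
        is_control = act["type"] in CONTROL_TYPES
        row = {
            "name": act["name"],
            "kind": "control" if is_control else "execution",
            "depends_on": [d["activity"] for d in act.get("dependsOn", [])],
        }
        if not is_control:
            # Values written in the file override the defaults.
            row["policy"] = {**POLICY_DEFAULTS, **act.get("policy", {})}
            ref = act.get("linkedServiceName")
            row["linked_service"] = ref["referenceName"] if ref else None
        rows.append(row)
    return rows

# Hypothetical pipeline file, trimmed to the fields the function reads.
SAMPLE = json.loads("""
{
  "name": "pl_load_orders",
  "properties": {
    "activities": [
      {"name": "CopyOrders", "type": "Copy", "policy": {"retry": 2}},
      {"name": "ForEachTable", "type": "ForEach",
       "dependsOn": [{"activity": "CopyOrders",
                      "dependencyConditions": ["Succeeded"]}]}
    ]
  }
}
""")

for row in summarize_activities(SAMPLE):
    print(row["name"], row["kind"], row.get("policy"))
```

Merging defaults into every execution activity makes the implicit behavior explicit, so a reviewer sees the effective timeout and retry settings even where the file is silent.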

Reconstructing execution order from dependsOn

dependsOn is ADF’s dependency graph and the analogue of SSIS precedence constraints. Each dependency names a prior activity and one or more dependency conditions: Succeeded, Failed, Skipped, or Completed. The Documenter reads these to rebuild not just the happy path but the error-handling branches: a Failed edge is a deliberate recovery route, and a Skipped edge propagates when an upstream activity never ran. Activities with no dependency between them may run in parallel, so concurrency is recorded as a property of the graph, not an accident of layout. The diagram shows boxes and arrows; the JSON tells you which arrow fires under which outcome.
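The reconstruction above can be sketched in a few lines, assuming the activities have already been parsed into dictionaries. The activity names are hypothetical, and the grouping into "waves" is one simple way to show what may run in parallel, not necessarily how the Documenter records it.

```python
from graphlib import TopologicalSorter

def dependency_edges(activities):
    """Return (upstream, downstream, condition) for every dependency condition."""
    edges = []
    for act in activities:
        for dep in act.get("dependsOn", []):
            for cond in dep.get("dependencyConditions", []):
                edges.append((dep["activity"], act["name"], cond))
    return edges

def execution_waves(activities):
    """Group activities into waves; members of one wave do not depend on each other.

    This is structural order only. Which activities actually fire depends on
    the condition on each edge (for example, a Failed edge).
    """
    graph = {
        a["name"]: {d["activity"] for d in a.get("dependsOn", [])}
        for a in activities
    }
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves

def dep(name, condition):
    """Build one dependsOn entry (helper for the hypothetical sample)."""
    return {"activity": name, "dependencyConditions": [condition]}

ACTIVITIES = [
    {"name": "Extract"},
    {"name": "StageA", "dependsOn": [dep("Extract", "Succeeded")]},
    {"name": "StageB", "dependsOn": [dep("Extract", "Succeeded")]},
    {"name": "Load", "dependsOn": [dep("StageA", "Succeeded"),
                                   dep("StageB", "Succeeded")]},
    {"name": "Alert", "dependsOn": [dep("Extract", "Failed")]},
]

print(execution_waves(ACTIVITIES))
print([e for e in dependency_edges(ACTIVITIES) if e[2] == "Failed"])
```

Here StageA and StageB land in the same wave because nothing links them, and the single Failed edge from Extract to Alert shows up as an explicit recovery route.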

Resolving expressions and parameters

ADF’s @-prefixed expression language is where metadata-driven behavior hides. Parameters surface as @pipeline().parameters.<name> and @dataset().parameters.<name>, and activity results chain through @activity('Name').output. Because datasets, linked services, pipelines, and data flows can all be parameterized, one generic pipeline can stand in for dozens of concrete runs. The Documenter resolves these chains to describe what the pipeline does; where the effective value depends on a control table or runtime input, it records the dependency rather than inventing a value. Embedded dynamic content and string interpolation are treated the same way: logic to be parsed and re-expressed, not free text.
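To make the idea concrete, here is a small sketch that lists what an expression depends on without evaluating it. It is a regex scan, not a parser for the expression language, and the sample expressions and parameter names are hypothetical.

```python
import re

# Patterns for the three reference forms named in the text above.
PIPELINE_PARAM = re.compile(r"pipeline\(\)\.parameters\.(\w+)")
DATASET_PARAM = re.compile(r"dataset\(\)\.parameters\.(\w+)")
ACTIVITY_OUTPUT = re.compile(r"activity\('([^']+)'\)\.output")

def expression_refs(expr: str) -> dict:
    """List the parameters and activity outputs an expression string refers to."""
    return {
        "pipeline_params": sorted(set(PIPELINE_PARAM.findall(expr))),
        "dataset_params": sorted(set(DATASET_PARAM.findall(expr))),
        "activity_outputs": sorted(set(ACTIVITY_OUTPUT.findall(expr))),
    }

# A function-style expression and a string-interpolation expression.
FOLDER_EXPR = (
    "@concat('raw/', pipeline().parameters.sourceSystem, '/', "
    "activity('LookupConfig').output.firstRow.tableName)"
)
FILE_EXPR = "@{dataset().parameters.folder}/@{pipeline().parameters.env}.csv"

print(expression_refs(FOLDER_EXPR))
print(expression_refs(FILE_EXPR))
```

The output records dependencies only. Where a value comes from a lookup or a runtime input, the sketch names the source and leaves the value unresolved, in line with the approach described above.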

Connections and compute: linked services, datasets, and integration runtimes

Linked services are the connection definitions, and datasets are the typed, parameterizable views over the data they reach. The Documenter flags a structural quirk: linked services that hold secrets directly, instead of referencing Azure Key Vault, are published to the live service immediately because their credentials must be encrypted. That is a frequent cause of drift between the Git branch and live mode.
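One way to spot that quirk in the repo is sketched below. It is a heuristic under a stated assumption: inline secrets appear as objects of type `SecureString`, and Key Vault references as objects of type `AzureKeyVaultSecret`. Secrets embedded in a plain connection string would not be caught, and the linked service names and values are hypothetical.

```python
SECRET_TYPES = ("SecureString", "AzureKeyVaultSecret")

def secret_kinds(node):
    """Collect the type of every secret-shaped object found under node."""
    found = []
    if isinstance(node, dict):
        if node.get("type") in SECRET_TYPES:
            found.append(node["type"])
        for value in node.values():
            found.extend(secret_kinds(value))
    elif isinstance(node, list):
        for value in node:
            found.extend(secret_kinds(value))
    return found

def holds_inline_secret(linked_service: dict) -> bool:
    """True when the linked service stores a secret itself (drift candidate)."""
    type_props = linked_service["properties"].get("typeProperties", {})
    return "SecureString" in secret_kinds(type_props)

# Hypothetical linked services, trimmed to the relevant fields.
LS_INLINE = {
    "name": "ls_sql_onprem",
    "properties": {"type": "SqlServer", "typeProperties": {
        "connectionString": "Server=example;Database=sales",
        "password": {"type": "SecureString", "value": "**********"},
    }},
}
LS_KEY_VAULT = {
    "name": "ls_sql_kv",
    "properties": {"type": "AzureSqlDatabase", "typeProperties": {
        "connectionString": "Server=example;Database=sales",
        "password": {
            "type": "AzureKeyVaultSecret",
            "store": {"referenceName": "ls_keyvault",
                      "type": "LinkedServiceReference"},
            "secretName": "sql-password",
        },
    }},
}

for ls in (LS_INLINE, LS_KEY_VAULT):
    print(ls["name"], holds_inline_secret(ls))
```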

Compute is the Integration Runtime (IR), and its type determines what can move. The Azure IR is managed serverless compute for cloud-to-cloud copies, mapping data flows, and activity dispatch. The self-hosted IR (SHIR) is software installed inside a private or on-premises network, required to reach restricted stores or use custom drivers. The Azure-SSIS IR is a managed VM cluster that runs only lifted .dtsx packages. Every SHIR binding is an on-premises dependency that will not migrate on its own; every Azure-SSIS package is logged as a rebuild candidate, never a runtime to claim on the target.
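The SHIR check can be sketched as a join between linked services and integration runtime files. The JSON shapes here are assumptions for illustration: an IR file with `properties.type` of `SelfHosted` or `Managed`, an `ssisProperties` block marking an Azure-SSIS IR, and a `connectVia` reference on the linked service. All resource names are hypothetical.

```python
def ir_kind(runtime: dict) -> str:
    """Classify an integration runtime file as self-hosted, azure-ssis, or azure."""
    props = runtime["properties"]
    if props["type"] == "SelfHosted":
        return "self-hosted"
    if "ssisProperties" in props.get("typeProperties", {}):
        return "azure-ssis"
    return "azure"

def shir_bound_services(linked_services, runtimes):
    """Name every linked service that connects through a self-hosted IR."""
    kinds = {rt["name"]: ir_kind(rt) for rt in runtimes}
    flagged = []
    for ls in linked_services:
        via = ls["properties"].get("connectVia")
        # A missing connectVia is treated as the default Azure IR (assumption).
        if via and kinds.get(via["referenceName"]) == "self-hosted":
            flagged.append(ls["name"])
    return flagged

RUNTIMES = [
    {"name": "shir-onprem", "properties": {"type": "SelfHosted"}},
    {"name": "ssis-ir", "properties": {
        "type": "Managed", "typeProperties": {"ssisProperties": {}}}},
    {"name": "AutoResolveIntegrationRuntime", "properties": {"type": "Managed"}},
]
LINKED_SERVICES = [
    {"name": "ls_sql_onprem", "properties": {"connectVia": {
        "referenceName": "shir-onprem", "type": "IntegrationRuntimeReference"}}},
    {"name": "ls_blob", "properties": {}},
]

print(shir_bound_services(LINKED_SERVICES, RUNTIMES))
print([(rt["name"], ir_kind(rt)) for rt in RUNTIMES])
```

Every name the first call returns is an on-premises dependency to plan for, and every runtime classified as `azure-ssis` points at packages to log as rebuild candidates.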

Triggers

The Documenter also records what starts a pipeline. Schedule triggers fire on a wall-clock cadence and relate many-to-many with pipelines. Tumbling window triggers run fixed-size, non-overlapping intervals, bind one-to-one to a pipeline, retain state, and support backfill, exposing windowStartTime and windowEndTime. Event-based triggers fire on storage events or Event Grid custom events. Capturing the trigger family alongside the activity graph lets the rebuild reproduce the cadence and the backfill semantics on the target.
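A short sketch of that capture step follows, assuming trigger files whose `properties.type` is one of `ScheduleTrigger`, `TumblingWindowTrigger`, `BlobEventsTrigger`, or `CustomEventsTrigger`. The function name, the output fields, and the sample triggers are hypothetical.

```python
TRIGGER_FAMILIES = {
    "ScheduleTrigger": "schedule",
    "TumblingWindowTrigger": "tumbling-window",
    "BlobEventsTrigger": "event",
    "CustomEventsTrigger": "event",
}

def describe_trigger(trigger: dict) -> dict:
    """Record a trigger's family, the pipelines it starts, and backfill support."""
    props = trigger["properties"]
    family = TRIGGER_FAMILIES.get(props["type"], "unknown")
    # Assumed shape: a "pipelines" list, or a single "pipeline" for tumbling windows.
    bindings = props.get("pipelines") or (
        [props["pipeline"]] if "pipeline" in props else []
    )
    return {
        "name": trigger["name"],
        "family": family,
        "pipelines": [b["pipelineReference"]["referenceName"] for b in bindings],
        "supports_backfill": family == "tumbling-window",
    }

def pipeline_ref(name):
    """Build one pipeline binding (helper for the hypothetical samples)."""
    return {"pipelineReference": {"referenceName": name,
                                  "type": "PipelineReference"}}

NIGHTLY = {"name": "tr_nightly", "properties": {
    "type": "ScheduleTrigger",
    "pipelines": [pipeline_ref("pl_load_orders"), pipeline_ref("pl_load_customers")],
}}
HOURLY_WINDOW = {"name": "tr_hourly_window", "properties": {
    "type": "TumblingWindowTrigger",
    "pipeline": pipeline_ref("pl_incremental"),
}}

print(describe_trigger(NIGHTLY))
print(describe_trigger(HOURLY_WINDOW))
```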

Your estate, in minutes

The Surveyor scores the risk before you commit.

Before a single asset moves, the Surveyor inventories your ADF estate, scores every asset for complexity, and flags the drivers that make a migration risky. You get a prioritized backlog and a clear-eyed view of where the effort really sits, typically in risk drivers like the ones scored below.

  • Every asset inventoried and complexity-scored
  • Risk drivers flagged early, not near the deadline
  • A prioritized, data-driven migration backlog

Complexity scorecard

adf_estate.summary (scale: Low, Medium, High)

  • Parameter sprawl: 0.28
  • Self-hosted IR dependencies: 0.22
  • Mapping Data Flow complexity: 0.17
  • Trigger / schedule coupling: 0.12

Average confidence: 0.89

Two destinations

Take ADF to Fabric or Databricks.

The knowledge base is target-agnostic: document once, then choose the platform that fits your estate and strategy.

Microsoft Fabric

Take ADF to Microsoft Fabric and the fleet builds the result in dbt or your own framework, iterating until the tests pass.

Migrate ADF to Fabric →

Databricks

Take ADF to Databricks and the fleet builds the result in dbt or your own framework, iterating until the tests pass.

Migrate ADF to Databricks →

Good questions

ADF migration, answered.

Does the fleet read ADF pipelines from the live service or from Git?

The Documenter reads the JSON artifacts a Git-connected factory produces: one file per resource, conforming to Azure Resource Manager (ARM) template conventions. When ADF is wired to Azure Repos Git or GitHub, every pipeline, dataset, linked service, and trigger is a separate JSON file, and cross-references are by name (`referenceName` plus a `*Reference` type such as `LinkedServiceReference` or `DatasetReference`). Reading the repo, not the portal, is what makes the inventory reproducible and reviewable at the assessment gate.
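The name-based cross-references can be checked mechanically. The sketch below walks parsed resources, collects every `referenceName` with its `*Reference` type, and reports the ones that point at nothing in the repo. Keying resources by reference type and name is a choice made for the example, and the resources themselves are hypothetical.

```python
def find_references(node):
    """Collect (reference type, referenceName) pairs anywhere under node."""
    refs = []
    if isinstance(node, dict):
        ref_type = node.get("type")
        if (isinstance(ref_type, str) and ref_type.endswith("Reference")
                and "referenceName" in node):
            refs.append((ref_type, node["referenceName"]))
        for value in node.values():
            refs.extend(find_references(value))
    elif isinstance(node, list):
        for value in node:
            refs.extend(find_references(value))
    return refs

def unresolved_references(resources: dict) -> list:
    """Return references that do not match any resource in the index."""
    missing = set()
    for document in resources.values():
        for ref in find_references(document):
            if ref not in resources:
                missing.add(ref)
    return sorted(missing)

# Index of parsed files keyed by (reference type, resource name).
RESOURCES = {
    ("PipelineReference", "pl_load"): {
        "name": "pl_load",
        "properties": {"activities": [{
            "name": "CopyOrders", "type": "Copy",
            "inputs": [{"referenceName": "ds_orders",
                        "type": "DatasetReference"}],
        }]},
    },
    ("DatasetReference", "ds_orders"): {
        "name": "ds_orders",
        "properties": {"linkedServiceName": {
            "referenceName": "ls_sql", "type": "LinkedServiceReference"}},
    },
}

print(unresolved_references(RESOURCES))
```

In this sample the dataset resolves but the linked service `ls_sql` does not, which is the kind of gap a reviewer would want surfaced before the assessment gate.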

How does the fleet handle metadata-driven pipelines that loop over a control table?

A single generic pipeline driven by a control table and looped with `ForEach` is one of the hardest ADF patterns, because the behavior is data-dependent and not evident from any one JSON file. The Documenter resolves the parameter chain (`@pipeline().parameters`, `@dataset().parameters`, and chained `@activity('Name').output` references) and records that the effective run shape depends on control metadata. It does not guess hidden values; it flags the dependency so a human can supply the control data at the design gate.
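A minimal sketch of that flagging step, assuming the common shape where a `Lookup` activity reads the control table and a `ForEach` iterates over its output through `typeProperties.items`. The activity names are hypothetical, and a real estate may drive loops in other ways that this check would miss.

```python
import re

ACTIVITY_REF = re.compile(r"activity\('([^']+)'\)\.output")

def control_table_loops(activities):
    """Flag ForEach activities whose items come from a Lookup activity's output."""
    lookups = {a["name"] for a in activities if a["type"] == "Lookup"}
    flagged = []
    for act in activities:
        if act["type"] != "ForEach":
            continue
        items = act.get("typeProperties", {}).get("items", {})
        expr = items.get("value", "") if isinstance(items, dict) else str(items)
        sources = [name for name in ACTIVITY_REF.findall(expr) if name in lookups]
        if sources:
            # Record the dependency only; the control rows are not in the repo.
            flagged.append({"loop": act["name"], "driven_by": sources})
    return flagged

LOOP_ACTIVITIES = [
    {"name": "LookupTables", "type": "Lookup"},
    {"name": "ForEachTable", "type": "ForEach", "typeProperties": {
        "items": {"value": "@activity('LookupTables').output.value",
                  "type": "Expression"}}},
    {"name": "ForEachStatic", "type": "ForEach", "typeProperties": {
        "items": {"value": "@pipeline().parameters.tables",
                  "type": "Expression"}}},
]

print(control_table_loops(LOOP_ACTIVITIES))
```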

What happens to a self-hosted integration runtime on migration?

A self-hosted IR (SHIR) is a hard dependency, not a portable artifact. Any pipeline that reaches an on-premises or network-restricted store through a SHIR is implicitly bound to that connectivity, its custom drivers, and the network topology around it. The Documenter records every SHIR-bound activity as an explicit risk item so the move is planned, never assumed. None of that connectivity moves by itself; it is re-planned for the target platform during design.

Are SSIS packages running on the Azure-SSIS IR migrated as a lift-and-shift?

No. The Azure-SSIS IR runs lifted `.dtsx` packages on a managed cluster, but native SSIS execution on Fabric or Databricks is not a generally available lift-and-shift. The Documenter inventories the packages and their highest-risk constructs (Script Tasks, dynamic SQL), and the rebuild re-expresses that logic on the target. We describe the work as a rebuild and do not claim a drop-in SSIS runtime on the destination.

Let's talk

Ready to migrate your ADF estate?

Tell us about your Azure Data Factory landscape and we'll run the assessment, score the risk, and show you the path to Fabric or Databricks.

Plan my migration

A short form, no spam. We usually reply within one business day.
