Autonomous in the loop. Accountable at the gates. An agentic offering by Plainsight

Migrate ADF to Microsoft Fabric.

Your ADF pipelines, data flows and triggers rebuilt natively on Microsoft Fabric, and proven against the originals before anyone signs off.


ADF and Microsoft Fabric are close cousins, which is exactly the trap

On paper, moving Azure Data Factory to Microsoft Fabric looks like the easiest migration in the Microsoft estate. Data Factory in Microsoft Fabric is, in Microsoft’s own words, “the next generation of Azure Data Factory.” A Fabric Data pipeline has the same shape as an ADF pipeline: activities grouped into a workflow, a Copy activity that looks identical, and the same family of control-flow activities (If Condition, Switch, For Each, Until). Microsoft states that roughly ninety percent of ADF activities are already available in Fabric. So the temptation is to treat this as a copy-paste exercise.

That closeness is the trap. The ten percent that does not map is concentrated in exactly the constructs that carry the most business logic and the most operational risk: Mapping Data Flows, self-hosted integration runtime dependencies, and the parameter sprawl of metadata-driven pipelines. A naive lift-and-shift sails through the easy ninety percent and then quietly breaks on the parts nobody can see from a single JSON file. That is why the agent fleet documents the logic, not just the labels, before it builds anything.

What actually breaks in a naive port

Mapping Data Flows have no runtime on Fabric. There is no “execute data flow” you can re-point. Microsoft maps both Mapping Data Flow and Wrangling Data Flow to Dataflow Gen2 (the Power Query experience). A flow’s real content - alter-row policies that drive SCD behavior, derived-column expressions, conditional splits, joins and aggregates - has to be re-expressed as Dataflow Gen2 steps, as set-based SQL, or as a PySpark notebook writing Delta. Copy the pipeline JSON and you copy a reference to a flow that simply will not execute.
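A first triage of that risk can be sketched in a few lines. This is a minimal illustration, assuming the `properties.activities` shape of an ADF Git export; the function name `split_by_portability` and the contents of `NEEDS_REBUILD` are hypothetical, not the fleet's actual rules, and nested activities (for example inside a ForEach) are not walked.

```python
import json

# Activity types assumed to need re-expression rather than re-pointing.
# Only ExecuteDataFlow comes from the text; extend the set for your estate.
NEEDS_REBUILD = {"ExecuteDataFlow"}


def split_by_portability(pipeline: dict) -> tuple[list[str], list[str]]:
    """Return (portable, needs_rebuild) activity names for one ADF pipeline."""
    portable, rebuild = [], []
    for activity in pipeline.get("properties", {}).get("activities", []):
        target = rebuild if activity.get("type") in NEEDS_REBUILD else portable
        target.append(activity["name"])
    return portable, rebuild


pipeline = json.loads("""
{
  "name": "PL_LoadCustomerDim",
  "properties": {
    "activities": [
      {"name": "Copy_Customers_Raw", "type": "Copy"},
      {"name": "DF_CustomerSCD2", "type": "ExecuteDataFlow"}
    ]
  }
}
""")

portable, rebuild = split_by_portability(pipeline)
print("portable:", portable)
print("needs rebuild:", rebuild)
```

The point of the split is only to show where the hidden ten percent sits; what each flagged flow actually does still has to be read from the data flow definition itself.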

Self-hosted IR dependencies do not travel. Any pipeline that reaches an on-premises or network-restricted store is implicitly bound to a self-hosted integration runtime - custom drivers, firewall rules, a particular network topology. On Fabric that role belongs to the On-premises Data Gateway, and Fabric’s managed compute removes the Azure/Auto-resolve IR entirely. None of that re-plumbs itself. If a migration assumes connectivity that no longer exists, the first real run fails at extract time.
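As a rough sketch of how such a dependency can be surfaced early, the snippet below cross-references linked services with integration runtimes. It assumes the ADF JSON convention that a linked service points at its runtime through `properties.connectVia.referenceName` and that a self-hosted runtime has `properties.type` set to `SelfHosted`; the helper name and the sample runtime and data lake names are hypothetical.

```python
def self_hosted_linked_services(linked_services: list[dict],
                                integration_runtimes: list[dict]) -> list[str]:
    """Names of linked services that connect through a self-hosted IR."""
    self_hosted = {
        ir["name"]
        for ir in integration_runtimes
        if ir.get("properties", {}).get("type") == "SelfHosted"
    }
    flagged = []
    for service in linked_services:
        via = service.get("properties", {}).get("connectVia", {})
        if via.get("referenceName") in self_hosted:
            flagged.append(service["name"])
    return sorted(flagged)


# Hypothetical sample estate: one self-hosted runtime, one managed runtime.
integration_runtimes = [
    {"name": "IR_OnPrem", "properties": {"type": "SelfHosted"}},
    {"name": "AutoResolveIntegrationRuntime", "properties": {"type": "Managed"}},
]
linked_services = [
    {
        "name": "LS_OnPremSql",
        "properties": {
            "type": "SqlServer",
            "connectVia": {
                "referenceName": "IR_OnPrem",
                "type": "IntegrationRuntimeReference",
            },
        },
    },
    # No connectVia: falls back to the default Azure runtime.
    {"name": "LS_DataLake", "properties": {"type": "AzureBlobFS"}},
]

flagged = self_hosted_linked_services(linked_services, integration_runtimes)
print(flagged)
```

Every name this returns is a connection that needs gateway and firewall planning before the first run on Fabric.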

Parameter sprawl hides the behavior. ADF rewards metadata-driven design: one generic pipeline, a control table, a ForEach loop, and dynamic-content expressions like @activity('Lookup').output threaded between activities. The consequence is that what actually runs is data-dependent and invisible in any single pipeline file. You cannot reconstruct the real workload by reading JSON; you have to resolve the parameter values and the control metadata.
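To make "resolve the parameter values" concrete, here is a minimal sketch that expands one generic query template over a control table. It assumes only simple `@{item().column}` interpolation tokens; the real ADF expression language is much richer, and the control-table columns (`schema`, `table`, `enabled`) and function names are hypothetical.

```python
import re

# Matches simple ADF string-interpolation tokens such as @{item().table}.
ITEM_TOKEN = re.compile(r"@\{item\(\)\.(\w+)\}")


def resolve(template: str, row: dict) -> str:
    """Substitute each @{item().col} token with the value from one control row."""
    return ITEM_TOKEN.sub(lambda match: str(row[match.group(1)]), template)


def expand_foreach(template: str, control_rows: list[dict]) -> list[str]:
    """Return the concrete statements a ForEach over the control table would run."""
    return [resolve(template, row) for row in control_rows if row.get("enabled", True)]


# Hypothetical control table; in practice this is read from the database.
control_rows = [
    {"schema": "dbo", "table": "Customers", "enabled": True},
    {"schema": "dbo", "table": "Orders", "enabled": True},
    {"schema": "stg", "table": "Legacy", "enabled": False},
]
template = "SELECT * FROM @{item().schema}.@{item().table}"

queries = expand_foreach(template, control_rows)
for query in queries:
    print(query)
```

One pipeline file turns into two concrete extracts here, and a third that is switched off in data. None of that is visible from the JSON alone.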

Datasets, linked services and triggers all change shape. Fabric collapses datasets and linked services into Connections, with data properties defined inline in activities. ADF triggers split into Fabric schedules and event triggers - and tumbling-window triggers specifically map to interval-based schedules. Each of these is a small translation, but there are many of them, and the dependsOn dependency graph (with its Succeeded / Failed / Skipped / Completed conditions) has to be rebuilt faithfully as activity success and failure paths or the error-handling branches silently vanish.

How the fleet handles ADF → Fabric

The Documenter parses the Git-backed JSON artifacts (pipelines, activities, the dependsOn graph, datasets, linked services and triggers) and, crucially, resolves the parameterization rather than reading it literally. It follows a ForEach back to its control table and a dynamic-content expression back to the activity output that feeds it, so the knowledge base captures what the estate does, not just what it is named. Self-hosted IR usage is detected and raised at the assessment gate. That keeps the hard parts visible to a human before a single line is built.

From the knowledge base, the builders generate one of two flavors. In the dbt flavor, transformation logic from the data flows becomes Warehouse models; SCD Type 2 flows become dbt snapshots with dbt_valid_from / dbt_valid_to history; upserts become incremental models on the dbt-fabric default merge strategy keyed on a unique_key; and a thin Fabric Data pipeline still runs the Copy activity and then triggers the dbt run, because dbt transforms data; it does not move bytes. In the your-framework flavor, the fleet rebuilds native Fabric Data pipelines with Copy activities and control-flow activities, rewrites each Mapping Data Flow as Dataflow Gen2 or a PySpark notebook, replaces linked services with Connections, and re-creates triggers as schedules and event triggers in your own orchestration conventions.

Then the build-test-run loop takes over. The test agent runs the rebuilt pipeline against real inputs and compares the output to the original ADF run: row counts, SCD history, schema. A failure is not added to a list; it goes back to the builders and the loop re-runs. The iteration log on this page is representative: a dropped alter-row policy and an unresolved pipeline parameter, each caught and fixed, until the run comes back green with a row delta of zero. Autonomous in the loop. Accountable at the gates: assessment, design, and promotion are yours.
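The comparison the test agent makes can be sketched roughly as follows. This is an illustration under simplifying assumptions: both outputs fit in memory as lists of dictionaries, and the report fields (`row_delta`, `schema_match`, `missing_rows`, `unexpected_rows`) are hypothetical names rather than the fleet's actual report format.

```python
from collections import Counter


def parity_report(original: list[dict], rebuilt: list[dict]) -> dict:
    """Compare row counts, schema and row content of two outputs."""
    def schema(rows: list[dict]) -> list[str]:
        return sorted({column for row in rows for column in row})

    def fingerprint(rows: list[dict]) -> Counter:
        # A multiset of rows, so duplicated versions are detected too.
        return Counter(tuple(sorted(row.items())) for row in rows)

    original_rows, rebuilt_rows = fingerprint(original), fingerprint(rebuilt)
    return {
        "row_delta": len(rebuilt) - len(original),
        "schema_match": schema(original) == schema(rebuilt),
        "missing_rows": sum((original_rows - rebuilt_rows).values()),
        "unexpected_rows": sum((rebuilt_rows - original_rows).values()),
    }


def is_green(report: dict) -> bool:
    """Green means identical schema and identical rows."""
    return (report["schema_match"] and report["row_delta"] == 0
            and report["missing_rows"] == 0 and report["unexpected_rows"] == 0)


original = [
    {"customer_id": 1, "region": "EU", "effective_from": "2024-01-01", "effective_to": "2024-06-01"},
    {"customer_id": 1, "region": "US", "effective_from": "2024-06-01", "effective_to": None},
]
# A rebuilt run that wrongly added an extra version for the same customer.
broken = original + [
    {"customer_id": 1, "region": "US", "effective_from": "2024-06-02", "effective_to": None},
]

green_report = parity_report(original, list(original))
red_report = parity_report(original, broken)
print(is_green(green_report), green_report)
print(is_green(red_report), red_report)
```

A red report goes back to the builders with the offending rows attached; only a green one reaches the promotion gate.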

If Spark and Unity Catalog are a better fit for your estate, Migrate ADF to Databricks covers the same ADF source on that platform.

How the fleet runs it

ADF to Microsoft Fabric, stage by stage.

For ADF → Microsoft Fabric, the same five stages run with the agents that handle this pair named and linked below. Inside the loop they iterate on their own; your experts hold the three gates.

Build → Test → Operate iterates until green on Fabric

Orchestrated end-to-end by The Conductor, The Librarian and The Chronicler across every ADF → Microsoft Fabric asset.

dbt or your framework

Built your way, on Fabric.

Take ADF to Microsoft Fabric as an open-source dbt project (free to run), or built into the framework your team has trusted for years: same migration, your standards either way.

What gets generated

dbt
  • A dbt project targeting the Fabric **Warehouse** via the dbt-fabric adapter, with staging and mart models carrying the transformation logic that used to live in Mapping Data Flows
  • Incremental models using merge (the adapter default) keyed on a unique_key, replacing dataset-level upserts; dbt **snapshots** where a flow was doing SCD Type 2
  • dbt **sources** over the Delta tables an ADF Copy step lands, with unique, not_null and relationships tests gating every model
  • A thin Fabric Data pipeline for orchestration (Copy activity plus a dbt run): dbt transforms; the pipeline still moves the bytes
  • schema.yml documentation and a generated lineage graph that re-draws the dependsOn chain as a ref() DAG
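For readers less familiar with dbt, the three generic tests named above amount to the checks sketched below. This is a plain-Python illustration of the idea, not dbt's implementation (dbt compiles these tests to SQL); the function names mirror the test names and the sample rows are hypothetical.

```python
def not_null(rows: list[dict], column: str) -> list[dict]:
    """Rows that fail: the column is missing or None."""
    return [row for row in rows if row.get(column) is None]


def unique(rows: list[dict], column: str) -> list[dict]:
    """Rows that fail: a non-null value that has already been seen."""
    seen, failures = set(), []
    for row in rows:
        value = row.get(column)
        if value is None:
            continue  # nulls are left to the not_null check
        if value in seen:
            failures.append(row)
        seen.add(value)
    return failures


def relationships(child_rows: list[dict], column: str,
                  parent_rows: list[dict], parent_column: str) -> list[dict]:
    """Rows that fail: a non-null child value with no matching parent."""
    parents = {row[parent_column] for row in parent_rows}
    return [
        row for row in child_rows
        if row.get(column) is not None and row[column] not in parents
    ]


customers = [
    {"customer_id": 1},
    {"customer_id": 2},
    {"customer_id": 2},      # duplicate key
    {"customer_id": None},   # missing key
]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 9},  # orphan: no such customer
]

null_failures = not_null(customers, "customer_id")
duplicate_failures = unique(customers, "customer_id")
orphan_failures = relationships(orders, "customer_id", customers, "customer_id")
print(len(null_failures), len(duplicate_failures), len(orphan_failures))
```

A model passes the gate only when each check returns no failing rows.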

Pattern mapping

Each ADF pattern, mapped to a deliberate Microsoft Fabric target.

| Source pattern | Microsoft Fabric · dbt | Microsoft Fabric · your framework |
| --- | --- | --- |
| ADF Copy activity (e.g. SQL → ADLS Parquet) | Fabric Data pipeline **Copy activity** lands Delta; the table becomes a dbt source() for downstream models | Fabric Data pipeline **Copy activity** (canvas: "Copy data"), or a **Copy job** for bulk/incremental/CDC movement |
| Mapping Data Flow (derived columns, joins, aggregates) | Set-based SQL in staging → mart models, materialized view/table/incremental in the Warehouse | **Dataflow Gen2** (Power Query, 300+ transforms) or a **PySpark notebook** writing Delta; Mapping Data Flow has no lift-and-shift on Fabric |
| Mapping Data Flow with SCD Type 2 (alter-row + upsert) | A dbt **snapshot** (timestamp or check strategy) producing dbt_valid_from / dbt_valid_to history | T-SQL MERGE in a Warehouse stored procedure, or Delta MERGE in a notebook |
| ADF triggers: schedule, tumbling window, storage event | Orchestration sits outside dbt: a Fabric schedule (or dbt job) runs the pipeline that runs dbt build | Fabric pipeline **schedules**, **interval-based schedules** for tumbling windows, and **event triggers** (storage/file via eventstreams + Activator) |
| Linked services + datasets (LinkedServiceReference, DatasetReference) | Warehouse connection in profiles.yml (one target); raw locations declared as dbt sources | Fabric **Connections**: datasets and linked services collapse into connections, with data properties inline in activities |
| ForEach over a control table (metadata-driven / parameter sprawl) | Parameters resolved at compile time with Jinja **macros**; the control table becomes a seed or source the models read | **For Each** activity over a **Lookup** of the control table, parameters passed via pipeline expressions |
| Stored Procedure / Script activity | Logic re-expressed as models; non-model side effects via dbt **hooks** / **operations** | Warehouse **stored procedure** called by the **Stored procedure activity** (or **Script** activity) |
| Self-hosted IR reaching an on-premises store | Out of dbt's scope; the Copy step that feeds dbt routes through the gateway | **On-premises Data Gateway** (Fabric's replacement for the self-hosted IR); flagged at the assessment gate |
| @activity('X').output / dynamic-content expressions | Re-expressed as explicit ref() dependencies and column logic in SQL | Pipeline expressions and **Set Variable** / **Lookup** outputs chained between activities |

Before & after

A real ADF snippet and what the builders generate on Microsoft Fabric.

pipelines/PL_LoadCustomerDim.json (legacy)
{
  "name": "PL_LoadCustomerDim",
  "properties": {
    "activities": [
      {
        "name": "Copy_Customers_Raw",
        "type": "Copy",
        "typeProperties": {
          "source": { "type": "AzureSqlSource", "sqlReaderQuery": "SELECT * FROM dbo.Customers" },
          "sink": { "type": "ParquetSink" }
        },
        "linkedServiceName": { "referenceName": "LS_OnPremSql", "type": "LinkedServiceReference" }
      },
      {
        "name": "DF_CustomerSCD2",
        "type": "ExecuteDataFlow",
        "dependsOn": [
          { "activity": "Copy_Customers_Raw", "dependencyConditions": [ "Succeeded" ] }
        ],
        "typeProperties": { "dataFlow": { "referenceName": "DF_CustomerSCD2", "type": "DataFlowReference" } }
      }
    ],
    "parameters": { "RunDate": { "type": "String" } }
  }
}

models/marts/dim_customer.sql + snapshots/customer.sql (dbt)
-- snapshots/customer.sql  (the Mapping Data Flow's SCD2 becomes a dbt snapshot)
{% snapshot customer %}
{{
  config(
    target_schema='snapshots',
    strategy='timestamp',
    unique_key='customer_id',
    updated_at='modified_at'
  )
}}
select * from {{ source('raw', 'customers') }}
{% endsnapshot %}

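-- models/marts/dim_customer.sql  (incremental MERGE, the dbt-fabric default)
-- Keyed on dbt_scd_id (one row per version) so the merge keeps SCD2 history
-- instead of collapsing it to one row per customer_id.
{{ config(materialized='incremental', unique_key='dbt_scd_id') }}

select
    dbt_scd_id,
    customer_id,
    customer_name,
    region,
    dbt_valid_from as effective_from,
    dbt_valid_to   as effective_to
from {{ ref('customer') }}
{% if is_incremental() %}
-- Pick up new versions and the older versions they closed in the same run.
where dbt_valid_from > (select max(effective_from) from {{ this }})
   or dbt_valid_to   > (select max(effective_from) from {{ this }})
{% endif %}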
The loop that closes itself

Iterated until green, and logged.

A failure on this pair doesn't go on a list. It goes back to the builders, and the loop re-runs until the suite passes.

adf-to-fabric · iteration log build-test-operate
iter 1 alter-row policy from `DF_CustomerSCD2` dropped: updated rows inserted as new versions, snapshot row count 2.1x the source
iter 2 `RunDate` pipeline parameter unresolved: the incremental high-water filter compared against NULL and loaded 0 rows
iter 3 SCD2 history parity vs the original ADF run, row delta 0, all `unique` / `not_null` / `relationships` tests pass

Green on iteration 3: SCD2 history and row counts match the original ADF run exactly.
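Iteration 1 is easier to follow with the SCD Type 2 rule written out. The sketch below is a simplified illustration of the behavior an alter-row policy with upsert is meant to produce: unchanged records add nothing, and changed records close the previous version before adding a new one. It is not dbt's snapshot implementation, and the function name and the `valid_from` / `valid_to` column names are hypothetical.

```python
def apply_scd2(history: list[dict], incoming: list[dict], key: str,
               tracked: list[str], as_of: str) -> list[dict]:
    """Return a new history after applying one batch with SCD Type 2 rules."""
    result = [dict(row) for row in history]  # copy so the input is not mutated
    current = {row[key]: row for row in result if row["valid_to"] is None}
    for record in incoming:
        existing = current.get(record[key])
        if existing and all(existing[column] == record[column] for column in tracked):
            continue  # unchanged: no new version
        if existing:
            existing["valid_to"] = as_of  # close the previous version
        new_row = {key: record[key]}
        new_row.update({column: record[column] for column in tracked})
        new_row.update({"valid_from": as_of, "valid_to": None})
        result.append(new_row)
        current[record[key]] = new_row
    return result


history = [
    {"customer_id": 1, "region": "EU", "valid_from": "2024-01-01", "valid_to": None},
    {"customer_id": 2, "region": "US", "valid_from": "2024-01-01", "valid_to": None},
]
incoming = [
    {"customer_id": 1, "region": "US"},  # changed: expect a new version
    {"customer_id": 2, "region": "US"},  # unchanged: expect nothing
]

updated = apply_scd2(history, incoming, key="customer_id",
                     tracked=["region"], as_of="2024-06-01")
open_versions = [row for row in updated if row["valid_to"] is None]
print(len(updated), len(open_versions))
```

Drop the "unchanged" and "close the previous version" branches and every load appends a fresh row per record, which is the kind of row-count inflation the log reports.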

Questions for this migration

ADF → Microsoft Fabric, answered.

Can a Mapping Data Flow be lifted straight onto Fabric?

No. There is no Mapping Data Flow runtime on Microsoft Fabric; Microsoft maps it to **Dataflow Gen2** (Power Query). The fleet rebuilds each flow as Dataflow Gen2 or, for code-first logic, a PySpark notebook (dbt flavor: as Warehouse models). The Documenter reads the flow's transformations (derived columns, alter-row policies, joins, aggregates) and re-expresses them set-based, then the test gate proves output parity against the original flow.

What happens to our self-hosted integration runtime?

A self-hosted IR doesn't move automatically: it implies on-premises connectivity, custom drivers and a specific network topology. On Fabric its role is taken by the **On-premises Data Gateway**. We surface every pipeline bound to a self-hosted IR at the assessment gate so the gateway and firewall work is planned, not discovered mid-build. The fleet never claims a dependency moved when it hasn't.

How do ADF triggers map to Fabric?

A Fabric pipeline runs three ways: on-demand, **scheduled**, and **event-based**. Schedule triggers become Fabric schedules; ADF **tumbling window** triggers map to Fabric **interval-based schedules**; storage/event triggers become Fabric event triggers built on eventstreams and Activator. One caveat the builders handle: a Fabric schedule needs both a start and end date (there is no open-ended option), so long-running schedules are configured with a far-future end.
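The mapping in this answer can be written down as a small planning table. The sketch below assumes the ADF trigger type names `ScheduleTrigger`, `TumblingWindowTrigger` and `BlobEventsTrigger`; the output dictionary is a hypothetical planning record, not a Fabric API payload, and the 2099 end date is an arbitrary stand-in for "far-future".

```python
from datetime import date

# ADF trigger type -> (Fabric target described in the text, needs an end date?)
TRIGGER_TARGETS = {
    "ScheduleTrigger": ("schedule", True),
    "TumblingWindowTrigger": ("interval-based schedule", True),
    "BlobEventsTrigger": ("event trigger (eventstream + Activator)", False),
}


def plan_trigger(trigger: dict, far_future_end: date = date(2099, 12, 31)) -> dict:
    """Map one ADF trigger definition to a planned Fabric target."""
    properties = trigger["properties"]
    mapping = TRIGGER_TARGETS.get(properties["type"])
    if mapping is None:
        return {"name": trigger["name"], "target": "manual review"}
    target, needs_end = mapping
    plan = {"name": trigger["name"], "target": target}
    if needs_end:
        type_properties = properties.get("typeProperties", {})
        # Schedule triggers nest endTime under recurrence; tumbling windows do not.
        end = (type_properties.get("recurrence", {}).get("endTime")
               or type_properties.get("endTime"))
        plan["end"] = end or far_future_end.isoformat()
    return plan


# Hypothetical trigger definitions for illustration.
nightly = {"name": "TR_Nightly", "properties": {
    "type": "ScheduleTrigger",
    "typeProperties": {"recurrence": {"frequency": "Day", "interval": 1,
                                      "startTime": "2024-01-01T02:00:00Z"}}}}
hourly = {"name": "TR_Hourly", "properties": {
    "type": "TumblingWindowTrigger",
    "typeProperties": {"frequency": "Hour", "interval": 1,
                       "startTime": "2024-01-01T00:00:00Z",
                       "endTime": "2026-01-01T00:00:00Z"}}}
file_arrived = {"name": "TR_FileArrived", "properties": {
    "type": "BlobEventsTrigger", "typeProperties": {}}}

plans = [plan_trigger(trigger) for trigger in (nightly, hourly, file_arrived)]
for plan in plans:
    print(plan)
```

Anything that does not match a known type falls through to manual review instead of being guessed at.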

Does dbt replace our ADF pipelines entirely?

No: dbt transforms data; it does not move bytes. On the dbt flavor the transformation logic from your Mapping Data Flows becomes Warehouse `models`, snapshots and tests, while a thin Fabric Data pipeline still does the **Copy activity** and then triggers `dbt build`. Orchestration stays in pipelines (or a dbt job / Airflow job in Data Factory); dbt owns the modeling layer and its DAG.

How do you handle metadata-driven pipelines with parameter sprawl?

Heavy parameterization (a single generic pipeline driven by a control table looped via ForEach) means the real behavior is data-dependent and not visible in any one JSON file. The Documenter resolves the parameter and control-table values to reconstruct what actually runs. In the your-framework flavor that becomes a **For Each** over a **Lookup** of the control table; in dbt the control table becomes a seed or source and parameters resolve at compile time through Jinja **macros**.

Should we land in a Lakehouse or a Warehouse?

Both share OneLake and the same SQL engine. The dbt flavor targets the **Warehouse** (the `dbt-fabric` adapter writes there, and `MERGE` is GA). The your-framework flavor can use either: a Lakehouse plus notebooks for Spark-centric flows, or the Warehouse plus T-SQL for SQL-first teams. The Documenter's reading of each flow drives the recommendation, and you confirm it at the design gate.

Let's talk

Ready to migrate ADF to Microsoft Fabric?

Tell us about your ADF estate and we'll run the assessment: the Surveyor scores it before you commit to Microsoft Fabric.

Plan my migration

A short form, no spam. We usually reply within one business day.
