A perspective from Plainsight Migrating to Fabric or Databricks? Generative BI
For data teams

Faster, cheaper runs

Incremental models process only what changed, so runs are faster, compute costs less, and the warehouse is under less strain.

The problem

Many pipelines do the same expensive thing every single run: they drop the table and rebuild it from all of history. On day one, with a thousand rows, nobody notices. Two years later, reprocessing hundreds of millions of rows to capture one day of change is slow, costly, and hard on a shared warehouse that other teams are also trying to use. The bill climbs, the run window stretches past the time anyone is awake to watch it, and the data is stale by the time it lands.

What dbt does about it

dbt makes incremental loading a first-class pattern: a model processes only the new or changed data and merges it into what is already there. You declare how new rows should be combined, and dbt generates the right SQL to do it. There are four strategies, each for a different shape of change. Append adds fresh rows and fits events and logs. Merge updates matching records and inserts the rest, the natural fit for late-arriving dimension updates. Delete and insert clears a defined window then reloads it, which suits corrections and reprocessing a time range. Insert overwrite swaps entire partitions at once, built for large partitioned tables. You can read the full mechanics in dbt’s incremental models docs.

What it looks like

A daily orders table holds two years of data. The full-reload version rescans everything each night and runs for forty minutes. Switched to an incremental merge, the same model touches only yesterday’s orders plus any late corrections, and finishes in under two. The compute it no longer burns is compute you no longer pay for, and the warehouse is freed up for the analysts who need it.

How we think about it

Incremental loading should be the default for tables big enough to feel a full reload, not a special-case optimisation bolted on later. We pick the strategy from the shape of the change, not habit, and we keep a clean full-refresh path so logic changes are safe to roll out. Faster, cheaper runs are not just a cost line: they buy fresher data and headroom to grow. The same run results that prove a build was cheap also prove it was healthy, which is where reliability you can prove picks up the story.

Questions

Faster, cheaper runs, in short.

Is incremental loading always the right choice?

It is the right default for large, append-heavy or steadily growing tables, where reprocessing all of history every run is pure waste. Small lookup and dimension tables that rebuild in seconds are fine as full refreshes. The rule of thumb: go incremental when the cost of a full reload starts to hurt, not before.

What happens on the very first run, when there is no history?

On the first run, and any time you force a full refresh, dbt builds the table from scratch over all the data. After that, each run processes only the new or changed rows. You can rebuild the whole table on demand whenever the logic changes, so you are never locked into a partial state.

How do the four strategies differ?

Append adds new rows and suits events and logs. Merge updates matching rows and inserts the rest, which fits late-arriving dimension changes. Delete and insert clears a window then reloads it, handy for corrections and reprocessing. Insert overwrite swaps whole partitions, ideal for large partitioned tables.

Does incremental loading work on both Fabric and Databricks?

Yes. The same dbt model targets either platform; dbt generates the right SQL for each, using MERGE and partition-aware writes where the platform supports them. You write the intent once and dbt adapts it to Microsoft Fabric or Databricks.

Want this for your data?

If this is how you want your team to work, we should talk.

Talk to us
Talk to us