What migrating to Databricks actually involves
Databricks is a unified analytics and data-engineering platform built on the lakehouse architecture: open Delta Lake table storage on cloud object storage, governed centrally by Unity Catalog, with Apache Spark as the compute engine. The pieces a Microsoft estate lands on are specific and current. Transformation logic runs either as notebooks (Python, SQL, Scala, R) or as Lakeflow Declarative Pipelines; queries are served through Databricks SQL on SQL warehouses; ingestion comes through Lakeflow Connect, Auto Loader or COPY INTO; and everything is orchestrated by Workflows. Governance isn’t assembled from SQL Server schemas, roles and Azure RBAC the way it was on the source. It’s built in: Unity Catalog gives one three-level catalog.schema.object namespace with access control, automatic data lineage, audit logging, and discovery across tables, views, volumes, functions and models.
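To make the three-level namespace concrete, here is a minimal sketch that builds a catalog.schema.object name and the grants a group would typically need to read one table. The catalog, schema, table and group names are hypothetical placeholders, and the helper functions are illustrative rather than part of any Databricks SDK; the code only produces SQL text and does not connect to a workspace.

```python
# Illustrative only: "sales", "curated", "orders" and "analysts" are
# hypothetical names, and these helpers are not a Databricks API.

def quote_ident(part: str) -> str:
    """Backtick-quote one identifier part, doubling any embedded backtick."""
    return "`" + part.replace("`", "``") + "`"


def qualified_name(catalog: str, schema: str, obj: str) -> str:
    """Build a three-level catalog.schema.object name."""
    return ".".join(quote_ident(p) for p in (catalog, schema, obj))


def read_grants(catalog: str, schema: str, table: str, principal: str) -> list[str]:
    """Return the GRANT statements for reading a single table.

    Reading a table in Unity Catalog needs USE CATALOG and USE SCHEMA on the
    parents in addition to SELECT on the table itself.
    """
    who = quote_ident(principal)
    return [
        f"GRANT USE CATALOG ON CATALOG {quote_ident(catalog)} TO {who}",
        f"GRANT USE SCHEMA ON SCHEMA {quote_ident(catalog)}.{quote_ident(schema)} TO {who}",
        f"GRANT SELECT ON TABLE {qualified_name(catalog, schema, table)} TO {who}",
    ]


if __name__ == "__main__":
    for statement in read_grants("sales", "curated", "orders", "analysts"):
        print(statement)
```

The same three-part name is what lineage, audit logs and discovery key on, which is why the access model does not have to be stitched together from schemas, roles and RBAC assignments.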
That difference is the whole point, and it’s also where a naive lift-and-shift goes wrong. Almost none of a Microsoft warehouse’s physical design survives a literal port. A Synapse dedicated pool’s hash distributions, round-robin staging, replicated dimensions and single-column partition switching are tuned to a 60-distribution MPP architecture that simply doesn’t exist on the lakehouse: physical layout is a Delta concern, handled by liquid clustering and OPTIMIZE, and the Spark optimizer decides broadcast joins on its own. The T-SQL dialect isn’t Spark SQL either: function names, type names, identifier quoting and IDENTITY semantics all differ. And native SSIS execution isn’t a GA lift-and-shift target here, so packages are rebuilt rather than re-hosted.
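The dialect gap is easiest to see on a small example. The sketch below is a toy, regex-based rewriter covering a handful of the differences named above: bracket quoting, a few type names, IDENTITY, and three common functions. The rule list is an illustrative assumption, not a complete or production-grade translator, and a real migration would rely on a proper SQL parser plus human review.

```python
import re

# Ordered (pattern, replacement) rules. Illustrative and deliberately
# incomplete: they match on text, not on a parsed syntax tree.
RULES = [
    # [name] -> `name`
    (r"\[([^\]]+)\]", r"`\1`"),
    # VARCHAR(n) / NVARCHAR(n) / NVARCHAR(MAX) -> STRING
    (r"\bN?VARCHAR\s*\(\s*(?:\d+|MAX)\s*\)", "STRING"),
    # DATETIME / DATETIME2 / DATETIME2(p) -> TIMESTAMP
    (r"\bDATETIME2?\b(?:\s*\(\s*\d+\s*\))?", "TIMESTAMP"),
    # BIT -> BOOLEAN
    (r"\bBIT\b", "BOOLEAN"),
    # INT IDENTITY(seed, step) -> BIGINT identity column. Delta identity
    # columns are BIGINT; choosing ALWAYS over BY DEFAULT is an assumption.
    (
        r"\b(?:BIG)?INT\s+IDENTITY\s*\(\s*(\d+)\s*,\s*(\d+)\s*\)",
        r"BIGINT GENERATED ALWAYS AS IDENTITY (START WITH \1 INCREMENT BY \2)",
    ),
    # Function renames. Semantics are close but not identical: for example,
    # T-SQL LEN ignores trailing spaces while Spark length counts them.
    (r"\bGETDATE\s*\(\s*\)", "current_timestamp()"),
    (r"\bISNULL\s*\(", "coalesce("),
    (r"\bLEN\s*\(", "length("),
]


def tsql_to_spark_sql(sql: str) -> str:
    """Apply each rewrite rule in order, case-insensitively."""
    for pattern, replacement in RULES:
        sql = re.sub(pattern, replacement, sql, flags=re.IGNORECASE)
    return sql


if __name__ == "__main__":
    ddl = (
        "CREATE TABLE [dbo].[Orders] ([OrderId] INT IDENTITY(1,1), "
        "[Note] NVARCHAR(200), [Created] DATETIME2(7), [IsOpen] BIT)"
    )
    print(tsql_to_spark_sql(ddl))
    print(tsql_to_spark_sql("SELECT ISNULL([Note], ''), LEN([Note]), GETDATE() FROM [dbo].[Orders]"))
```

Note what the rewriter has nothing to say about: distribution keys, replicated tables and partition switching have no textual equivalent to translate, which is why the physical design has to be rethought rather than ported.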
What the fleet changes is the order of operations and the accountability. The Documenter reads the source logic (not the labels) and writes it into the knowledge base: the access patterns behind a distribution key, the SCD transform behind a wizard, the dependency graph behind a set of precedence constraints. A human signs that inventory off at the assessment gate. The same logic is then rebuilt in your chosen flavor: as a dbt project against a SQL warehouse, where the merge strategy compiles to MERGE INTO and snapshots produce dbt_valid_from / dbt_valid_to history; or in your framework as Delta tables, notebooks, Lakeflow Declarative Pipelines with expectations, and Workflows. Then the build-test-run loop runs the suite and iterates until green (row counts, SCD spans and aggregate parity all matching source) before a second human approves the design and a third approves promotion. Autonomous in the loop, accountable at the gates. Object counts come straight from the source configuration, so you reach a full inventory in minutes, not weeks.
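As a rough illustration of what "green" means in that loop, the sketch below compares a source profile with a target profile on the three checks named above: row counts, aggregates and SCD spans. The TableProfile shape, the field names and the sample values are hypothetical; in practice the profiles would be computed by queries against the source system and the Delta tables.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class TableProfile:
    """Hypothetical summary of one table, computed separately on each side."""
    row_count: int
    aggregates: dict       # e.g. {"sum_amount": Decimal("1250.00")}
    scd_spans: frozenset   # {(business_key, valid_from, valid_to), ...}


def parity_failures(source: TableProfile, target: TableProfile) -> list[str]:
    """Return human-readable mismatches; an empty list means the table is green."""
    failures = []
    if source.row_count != target.row_count:
        failures.append(f"row_count: source={source.row_count} target={target.row_count}")
    for name in sorted(source.aggregates):
        expected = source.aggregates[name]
        actual = target.aggregates.get(name)
        if actual != expected:
            failures.append(f"aggregate {name}: source={expected} target={actual}")
    for span in sorted(source.scd_spans - target.scd_spans, key=str):
        failures.append(f"scd span missing in target: {span}")
    for span in sorted(target.scd_spans - source.scd_spans, key=str):
        failures.append(f"scd span not in source: {span}")
    return failures


if __name__ == "__main__":
    spans = frozenset({
        ("C001", date(2023, 1, 1), date(2024, 6, 30)),
        ("C001", date(2024, 7, 1), None),  # None marks the open, current row
    })
    source = TableProfile(2, {"sum_amount": Decimal("1250.00")}, spans)
    target = TableProfile(2, {"sum_amount": Decimal("1250.00")}, spans)
    print(parity_failures(source, target) or "green")
```

Returning a list of named mismatches rather than a single boolean gives the loop something specific to act on in the next iteration, and gives the human approver a record of what was actually compared.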