Under the hood

How a dbt project is built.

The value pages make the business case. This is the short, practical view: the file types a dbt project is made of, how the folders are laid out, and the handful of commands you actually run. When you want the full detail, the links at the bottom take you there.

Three file types

A project is mostly SQL and YAML.

There is very little to learn. Models hold the logic, YAML holds the config and the checks, and Python is there for the rare case that needs it.

SQL models

.sql

A model is a single SELECT. dbt wraps it and materialises it as a table or view, so you write the logic and dbt handles the boilerplate of creating, replacing and ordering objects.

YAML config

.yml

YAML files sit next to your models and declare sources, tests, descriptions and config. Tests and docs live with the code, so they stay in step with what actually runs.

Python models

.py (optional)

Where SQL is awkward, a model can be Python instead, returning a dataframe, on the platforms whose adapter supports Python models. It is an alternative for the few cases that need it, not a different way of working.
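As a rough illustration of what such a Python model can look like, here is a minimal sketch. The file name, the upstream model stg_orders and the status column are hypothetical, and it assumes a platform where dbt.ref returns a PySpark DataFrame.

```python
# models/marts/completed_orders.py  (hypothetical file and model names)

def model(dbt, session):
    # Config set in code, the Python counterpart of a config block in SQL.
    dbt.config(materialized="table")

    # dbt.ref returns the upstream model as a dataframe
    # (assumed here to be a PySpark DataFrame).
    orders = dbt.ref("stg_orders")

    # Keep completed orders only. dbt writes the returned dataframe
    # to the warehouse as this model's table.
    return orders.filter("status = 'completed'")
```

The function name and its two arguments are fixed by dbt; everything inside the body is ordinary dataframe code.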

How a project is laid out

folders, models, and the YAML beside them

structure

.sql defines the model: a single SELECT becomes a table or view
.yml defines config, tests and docs, kept next to the model

The code and its documentation live together, so the docs cannot drift from what actually runs.
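To make the layout concrete, the sketch below writes a tiny project skeleton to a temporary folder. Every name in it (demo_project, the shop source, stg_orders, fct_daily_orders, the column names) is made up for illustration, and the staging/marts split is one common convention rather than something dbt requires.

```python
import tempfile
from pathlib import Path

# Hypothetical project files: relative path -> file content.
FILES = {
    "dbt_project.yml": (
        "name: demo_project\n"
        "version: '1.0.0'\n"
        "config-version: 2\n"
        "profile: demo_profile\n"
        "model-paths: ['models']\n"
    ),
    # A model is a single SELECT; source() points at a raw table.
    "models/staging/stg_orders.sql": (
        "select id as order_id, ordered_at as order_date, status\n"
        "from {{ source('shop', 'orders') }}\n"
    ),
    # The YAML beside the model declares the source, docs and tests.
    # Recent dbt versions also accept `data_tests:` in place of `tests:`.
    "models/staging/stg_orders.yml": (
        "version: 2\n"
        "sources:\n"
        "  - name: shop\n"
        "    tables:\n"
        "      - name: orders\n"
        "models:\n"
        "  - name: stg_orders\n"
        "    description: One row per order, cleaned and renamed.\n"
        "    columns:\n"
        "      - name: order_id\n"
        "        description: Primary key of the order.\n"
        "        tests:\n"
        "          - unique\n"
        "          - not_null\n"
    ),
    # ref() creates the dependency that dbt uses to order the build.
    "models/marts/fct_daily_orders.sql": (
        "select order_date, count(*) as order_count\n"
        "from {{ ref('stg_orders') }}\n"
        "group by order_date\n"
    ),
}


def scaffold(root):
    """Write the skeleton under `root` and return the sorted relative paths."""
    written = []
    for rel, body in FILES.items():
        path = Path(root) / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(body)
        written.append(rel)
    return sorted(written)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for rel in scaffold(tmp):
            print(rel)
```

Running it prints the four relative paths, which is enough to see the pattern: one .sql file per model, with the .yml that documents and tests it in the same folder.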

The commands you actually use

A few verbs cover the day to day.

Most work is one of these, run locally or in CI. Add a selector to scope any of them to part of the project.

dbt run: Builds your models in dependency order, as tables or views on the warehouse.
dbt test: Runs the tests declared in YAML against the data and reports any failures.
dbt build: Runs and tests together, model by model, so bad data is caught before it flows downstream.
dbt docs generate: Generates the searchable catalog and the lineage graph from your project.

Each command takes a selector, so you can run only what you need: dbt build --select tag:daily builds and tests just the models tagged daily, and dbt build --select tag:daily+ (note the trailing plus) also includes their downstream dependents. That is how a large project stays fast to iterate on.
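The difference between selecting a tag and selecting a tag plus everything downstream of it can be sketched on a toy graph. This is a simplified model of the idea, not dbt's implementation, and the model names and tags are hypothetical.

```python
# Toy model of a tag selector and the trailing "+" graph operator.
# NOT dbt's implementation; names and tags are made up for illustration.

# Each model maps to the models it depends on (its parents).
PARENTS = {
    "stg_orders": [],
    "stg_customers": [],
    "fct_daily_orders": ["stg_orders"],
    "dim_customers": ["stg_customers"],
    "rpt_daily_revenue": ["fct_daily_orders"],
}

# Only two models carry the "daily" tag.
TAGS = {
    "stg_orders": {"daily"},
    "fct_daily_orders": {"daily"},
}


def children_of(model):
    """Return the models that depend directly on `model`."""
    return [m for m, parents in PARENTS.items() if model in parents]


def select(selector):
    """Resolve 'tag:<name>' or 'tag:<name>+' to a set of model names."""
    include_downstream = selector.endswith("+")
    tag = selector.rstrip("+").split(":", 1)[1]
    selected = {m for m in PARENTS if tag in TAGS.get(m, set())}
    if include_downstream:
        # Walk the graph outwards from the tagged models.
        frontier = list(selected)
        while frontier:
            for child in children_of(frontier.pop()):
                if child not in selected:
                    selected.add(child)
                    frontier.append(child)
    return selected


print(sorted(select("tag:daily")))
# ['fct_daily_orders', 'stg_orders']
print(sorted(select("tag:daily+")))
# ['fct_daily_orders', 'rpt_daily_revenue', 'stg_orders']
```

Without the plus, the untagged report downstream of the daily models is left alone; with it, the report is rebuilt too.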

Core or Cloud

dbt Core or the paid platform?

dbt Core is open-source and free to run, and it is what we use for about 90 percent of our implementations. You own and run it: your CI, your scheduler, your warehouse. The paid dbt platform (formerly dbt Cloud) adds a hosted UI, managed scheduling and orchestration, and a governed catalog and semantic layer on top of the same project.

dbt Core (our default)

Open-source and free to run. You own the project end to end: run it in your own CI, schedule it with the orchestrator you already have, and target the warehouse or lakehouse you already own. It is what we reach for in most engagements.

The paid dbt platform

Formerly dbt Cloud. On top of the same project it adds a hosted UI, managed scheduling and orchestration, and a governed catalog and semantic layer, so you do not build that scaffolding yourself.

When we recommend the paid platform

When non-technical users need a UI to explore and run models, when you want managed scheduling without building your own CI and orchestration, or when you need the hosted catalog and semantic layer across many teams. Otherwise, dbt Core.

Go deeper

From here, the real documentation.

This page is a launchpad. For how we structure, style and test dbt projects in practice, see the Plainsight Playbook below. For the full reference, go straight to dbt's official docs.

dbt official documentation

Want this set up properly?

We structure, style and test dbt projects on Fabric and Databricks every day. Tell us where your data sits today.

Talk to us