dbt vs Airflow: When to Use Each (And When to Use Neither)
If you work in data engineering, you have almost certainly encountered the dbt-or-Airflow question. Both tools dominate their respective categories, but they solve fundamentally different problems. The confusion arises because their use cases overlap just enough to make the distinction unclear, especially for teams building their first production data stack.
This guide breaks down what each tool actually does, when you genuinely need both, and when the complexity of running two systems outweighs the benefits.
What dbt Does (and Doesn’t Do)
dbt (data build tool) is a SQL-first transformation framework. It takes SELECT statements, wraps them in version-controlled models, and materializes them inside your data warehouse. If you think of the ELT paradigm, dbt is the T: it transforms data that is already loaded into your warehouse.
What dbt handles well:
- SQL-based transformations with Jinja templating for reuse
- Dependency graphs between models so transformations run in the right order
- Testing and documentation baked into the workflow
- Incremental models that process only new or changed rows
What dbt does not do:
- It does not extract data from source systems. You need a separate ingestion tool.
- It does not orchestrate non-SQL tasks. Python models exist, but they run only on a handful of warehouse adapters and are not a substitute for a general-purpose orchestrator.
- It does not schedule itself. dbt Core needs an external scheduler or orchestrator to run on a schedule. dbt Cloud includes built-in scheduling.
- It does not provide dashboards or visualization. You need a separate BI layer.
The key takeaway is that dbt is a compiler, not a runtime: it generates SQL, hands that SQL to your warehouse to execute, and steps aside. It does not manage infrastructure, run long-lived processes, or handle anything outside the transformation layer.
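Because dbt Core does not schedule itself, something external has to invoke the CLI on a cadence. Here is a minimal sketch in Python, assuming dbt Core is installed and the project lives at a hypothetical /opt/analytics/dbt; the caller could be cron, a CI job, or an orchestrator task:

```python
# Minimal sketch: a wrapper that an external scheduler (cron, CI, an orchestrator)
# could invoke to run dbt Core. The project path is a hypothetical example.
import subprocess
import sys


def run_dbt(project_dir: str = "/opt/analytics/dbt") -> int:
    """Build and test all models with `dbt build`, returning the CLI exit code."""
    result = subprocess.run(
        ["dbt", "build", "--project-dir", project_dir],
        check=False,  # surface failures via the exit code instead of raising
    )
    return result.returncode


if __name__ == "__main__":
    sys.exit(run_dbt())
```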
What Airflow Does (and Doesn’t Do)
Apache Airflow is a workflow orchestration platform. You define Directed Acyclic Graphs (DAGs) in Python that describe sequences of tasks, their dependencies, and their schedules. Airflow decides when each task runs and in what order.
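For reference, a minimal DAG looks roughly like this, assuming Airflow 2.4 or later with the TaskFlow API; the task names, schedule, and logic are illustrative placeholders:

```python
# Illustrative sketch of an Airflow DAG (assumes Airflow 2.4+ and the TaskFlow API).
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def example_pipeline():
    @task
    def extract() -> str:
        # Pull raw data from a source system (placeholder logic).
        return "raw_batch_001"

    @task
    def load(batch_id: str) -> None:
        # Load the extracted batch into the warehouse (placeholder logic).
        print(f"loading {batch_id}")

    load(extract())


example_pipeline()
```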
What Airflow handles well:
- Scheduling and orchestration across arbitrary task types
- Dependency management with retry logic, backfill, and catch-up
- Extensibility through a massive operator ecosystem
- Monitoring with a built-in web interface for tracking DAG runs
What Airflow does not do:
- It does not perform SQL transformations with built-in testing or documentation.
- It does not optimize SQL execution. It dispatches tasks but does not analyze query plans.
- It is not lightweight. Running Airflow requires a metadata database, a scheduler process, a web server, and workers.
- It does not include a visualization or dashboarding layer.
Airflow is infrastructure-heavy by design. It is a general-purpose orchestrator that can run anything, which means it requires significant operational investment to run reliably.
When You Need Both
The most common architecture in the modern data stack pairs Airflow as the orchestrator with dbt as the transformation engine. In this setup, Airflow handles:
- Triggering data extraction from source systems
- Running dbt models after extraction completes
- Triggering downstream processes like reverse ETL or reporting refreshes
- Handling alerts and retries when things fail
This architecture works well for teams that have dedicated data platform engineers who can maintain both systems. The Airflow DAG typically calls dbt run and dbt test as shell commands or through the Cosmos library, which maps dbt models to Airflow tasks for finer-grained control.
In practice, the integration between these tools takes several forms. The simplest approach uses a BashOperator in Airflow that invokes the dbt CLI. This is easy to set up but provides minimal observability. Airflow sees the entire dbt run as a single task, so if one of fifty models fails, Airflow cannot retry just that model. The Cosmos library improves on this by parsing the dbt manifest and creating one Airflow task per dbt model, giving you granular retry and parallelism. However, Cosmos adds another dependency to manage and can increase DAG parsing time significantly for large dbt projects.
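To make that tradeoff concrete, here is a sketch of the BashOperator approach, assuming Airflow 2.4+ and a dbt project at a hypothetical /opt/analytics/dbt. Cosmos would replace this single-task pattern with one Airflow task per dbt model:

```python
# Sketch of the simplest integration: Airflow runs the dbt CLI as shell commands,
# so the entire dbt run is a single Airflow task. Paths and names are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="dbt_transformations",
    schedule="@daily",
    start_date=datetime(2024, 1, 1),
    catchup=False,
) as dag:
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command="cd /opt/analytics/dbt && dbt run",
    )
    dbt_test = BashOperator(
        task_id="dbt_test",
        bash_command="cd /opt/analytics/dbt && dbt test",
    )

    dbt_run >> dbt_test
```

If dbt_run fails, Airflow can only retry the whole command; it has no visibility into which of the underlying models broke, which is exactly the observability gap Cosmos tries to close.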
The challenge is that, whichever approach you choose, the integration stays shallow. Airflow does not understand dbt’s dependency graph natively. Debugging a failure requires checking Airflow logs to find which dbt model failed, then switching to dbt’s own logs to understand why. Lineage tracking spans two systems with different metadata formats. Alerting typically requires configuring both Airflow’s notification system and dbt’s own failure hooks, creating two parallel alerting paths that need to stay in sync.
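As an illustration of those parallel paths, the Airflow half of the alerting setup often looks something like the callback below, with a hypothetical webhook URL; the dbt half (on-run-end hooks or dbt Cloud notifications) still has to be configured separately and kept consistent with it:

```python
# Sketch of one of the two alerting paths: an Airflow task failure callback.
# The webhook endpoint is a hypothetical placeholder.
import json
import urllib.request


def notify_on_failure(context):
    """Post a short failure summary to a chat webhook when a task fails."""
    message = (
        f"Task {context['task_instance'].task_id} in DAG "
        f"{context['dag'].dag_id} failed at {context['ts']}."
    )
    request = urllib.request.Request(
        "https://hooks.example.com/data-alerts",  # placeholder endpoint
        data=json.dumps({"text": message}).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(request)


# Attach per task, e.g. DAG(..., default_args={"on_failure_callback": notify_on_failure}).
# The equivalent dbt-side alerting lives in a different system entirely.
```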
The Cost of Running dbt + Airflow
Running both tools carries costs that extend beyond licensing fees. Understanding the total cost of ownership requires looking at infrastructure, people, and process overhead together.
Infrastructure overhead: Airflow needs a persistent scheduler, a web server, a metadata database (usually Postgres), and a message broker if you use CeleryExecutor. Even managed Airflow services like MWAA or Astronomer add their own pricing layers on top. dbt Cloud costs scale with seat count and model build volume. Self-hosted dbt Core avoids that licensing cost but adds maintenance burden, requiring someone to manage CLI versions, adapter compatibility, and CI/CD integration.
Integration maintenance: The glue code between Airflow and dbt is custom work. Whether you use Cosmos, shell operators, or the dbt Cloud API, someone on your team owns that integration. When dbt releases a breaking change or Airflow upgrades its provider interface, that integration needs updating. Version pinning helps but does not eliminate the problem; eventually you must upgrade, and upgrades in one tool can cascade into the other.
Cognitive load: Engineers context-switch between Python DAG definitions in Airflow and SQL model files in dbt. Each system has its own testing framework, its own CLI, its own deployment process, and its own way of defining environments. Onboarding a new team member means teaching two tools and their interaction patterns. Many teams find that this dual-tool onboarding adds one to two weeks to ramp-up time for new hires.
Debugging complexity: When a pipeline fails at 3 AM, the engineer on call needs to determine whether the issue is in Airflow’s scheduling, the dbt model logic, the warehouse itself, or the integration layer. Tracing a data quality issue from a dashboard back through the dbt model graph and into the Airflow DAG that triggered it involves three or four different interfaces. There is no single pane of glass that shows the full picture.
Security and access fragmentation: Each tool has its own authentication and authorization model. Ensuring that the same user has consistent access levels across Airflow, dbt Cloud, and your BI tool requires either careful manual synchronization or an identity management layer that adds yet another tool to the stack.
For large organizations with dedicated platform teams, this complexity is manageable and often justified. For teams under 10 engineers, it can consume a disproportionate share of engineering capacity, leaving less time for the actual data work that drives business value.
When a Unified Platform Makes More Sense
The dbt + Airflow combination exists because no single tool historically covered both orchestration and transformation well. That gap has been closing.
A unified data platform combines pipeline building, SQL transformation, orchestration, and visualization into a single system. Instead of writing Python DAGs to orchestrate SQL models that feed into a separate BI tool, you build the entire flow in one environment.
This approach fits teams that:
- Want to go from raw data to dashboard without managing three or four tools
- Prefer visual pipeline building over writing Python DAGs
- Need role-based access control across the entire data flow, not per-tool
- Cannot justify the infrastructure cost of running Airflow for 20 pipelines
- Want SQL compilation with built-in optimization rather than raw SQL strings
Plotono takes this approach. The visual pipeline builder replaces Airflow’s Python DAGs with a drag-and-drop graph editor. The built-in SQL compiler handles transformations that dbt would normally manage. Dashboards sit on top of the same system, so there is no integration gap between transformation output and visualization input.
This is not the right choice for every team. If you have 500 DAGs running in Airflow and a mature dbt project with thousands of models, migration is a significant undertaking. But for teams starting fresh or teams drowning in tool complexity, a unified platform eliminates an entire category of integration work.
Decision Framework
The right tool depends on your team size, existing infrastructure, and growth trajectory. Here is a practical breakdown.
Choose dbt if your team already has an orchestrator in place and your primary need is SQL-based transformation with testing and documentation. dbt excels at making SQL workflows maintainable and testable. It is particularly strong when your analytics engineers are SQL-fluent and want version-controlled, testable transformation logic. Pair it with a lightweight scheduler like cron or a simple DAG runner if you do not need Airflow’s full capabilities.
Choose Airflow if you need to orchestrate complex workflows that span multiple systems, languages, and services. Airflow is the right choice when your pipelines involve significant non-SQL work like calling external APIs, running ML training jobs, or coordinating multi-step processes across different platforms. It is also appropriate when you need advanced scheduling features such as backfill, catch-up processing, or complex dependency patterns that simpler schedulers cannot handle.
Consider both together if you need Airflow’s orchestration power and dbt’s transformation rigor, and you have the engineering capacity to maintain the integration. This is the most common pattern on mid-to-large data teams with five or more data engineers. Budget for at least one engineer spending 20 percent of their time maintaining the integration between the two systems.
Consider a unified platform if you want to reduce tool count without sacrificing capability. This path works best for teams under 20 data engineers, greenfield projects, or organizations that have hit the complexity ceiling of their multi-tool stack. Platforms like Plotono combine pipeline building, transformation, and visualization so you can focus on data work instead of infrastructure. The tradeoff is less flexibility for highly custom workflows, but the reduction in operational overhead is significant for teams that do not need that level of customization.
For a deeper comparison against each tool individually, see our detailed pages on Plotono vs dbt and Plotono vs Airflow.
Picking the Right Tool
dbt and Airflow are both excellent tools that solve different problems. dbt handles SQL transformation with testing and lineage. Airflow handles orchestration with scheduling and retry logic. Together they form a powerful but operationally demanding stack.
The question is not which tool is better, but whether your team benefits from running both, or whether a platform that combines their capabilities into a single system would let you move faster with less overhead. The answer depends on your team size, your existing investment, and how much engineering time you want to spend on tooling versus building data products.
Start by mapping your actual needs. If you need orchestration across dozens of heterogeneous systems, Airflow earns its complexity. If you need SQL transformations with testing, dbt earns its place. If you need both plus dashboards and you want it all in one system, explore what a unified platform can do.