TL;DR: dbt revolutionized data transformation by bringing software engineering practices to SQL. But as pipelines scale, the combination of SQL, Jinja, and YAML creates complexity that's hard to navigate and maintain. Malloy offers a different approach: a purpose-built declarative language where transformation logic, semantic modeling, and materialization are unified — not bolted together. The result: type-safe, composable transformations that compile in seconds and read clearly.
The dbt Revolution—and Its Growing Pains
dbt changed data engineering for the better. Before dbt, transformation logic lived in scattered scripts, scheduled jobs, and stored procedures with no version control, no testing, and no clear lineage. dbt introduced software engineering fundamentals: ref() functions for dependency management, YAML for configuration, and a model-driven approach to SQL.
Your dbt project probably started clean. A few models. Clear dependencies. Readable SQL with light Jinja templating.
But two years later, it looks different:
- 400+ models spread across dozens of directories
- Macros calling macros calling macros
- `{% if %}` blocks nested five levels deep
- YAML files that duplicate logic from SQL files
- 10-minute dbt compile times
- That one critical model nobody dares refactor
This isn't a dbt problem—it's a language problem.
SQL was designed for querying relational data, not orchestrating multi-step transformation pipelines. Jinja was designed for templating, not controlling flow. YAML was designed for configuration, not business logic. When you combine three languages that weren't designed to work together, complexity compounds. The dependency graph exists, but it's implicit — inferred from strings. Your IDE can't help you navigate it. Refactoring requires courage.
There's a better way.
Malloy: A Declarative Language Built for Data
Malloy is a high-level language for data. You express the logical structure and semantic meaning of your data — the result you want, not how to compute it. Documentation, visualization hints, and metadata all live directly in the model. Performance annotations are optional and additive — never required to understand the model itself.
SQL (and dbt) are low-level. They force you to describe how to compute what you want: join strategies, intermediate tables, CTEs, clustering keys, materializations, dependency graphs. Key metadata is scattered across SQL, YAML, macros, and downstream BI layers.
This matters beyond maintainability, in two ways that are becoming urgent:
For the engineer who owns the pipeline: when someone needs a new column or table for analysis, the work today means going deep into the weeds of dbt — understanding macro behavior, tracing implicit dependencies, hoping a change doesn't break something downstream. Even with AI agents helping write the code, the underlying complexity is dangerous. Declarative structure makes the pipeline safe to change, because the compiler catches errors and the dependency graph is explicit.
For teams that don't have a dbt expert: as AI-powered analytics tools put more queries against your warehouse, materialization becomes essential to keep costs and latency under control. But if standing up a new materialized table requires deep dbt knowledge, only one team can do it. Malloy makes materialization accessible — any team can add it, because the language is readable and the tooling catches mistakes. That's the difference between a data stack that scales with headcount and one that scales with demand.
Both of these point in the same direction: declarative structure is the foundation for an AI-ready data stack — whether that's warehouse-native ML pipelines or traditional analytics.
Here's what that looks like in practice:
Type Safety
In dbt, column references are just strings — dbt compile won't catch a bad one. The error shows up when the warehouse runs the SQL.
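As a sketch (model and column names here are hypothetical), a dbt model with a typo'd column reference compiles cleanly; the failure only surfaces when the warehouse executes the rendered SQL:

```sql
-- models/orders_enriched.sql (hypothetical model)
select
    o.order_id,
    o.custoemr_id,   -- typo: `dbt compile` still renders this fine;
                     -- the warehouse rejects it at run time
    c.customer_name
from {{ ref('stg_orders') }} o
join {{ ref('stg_customers') }} c
  on o.customer_id = c.customer_id
```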
In Malloy, every field reference is validated against the warehouse schema at compile time — before any query runs:
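A minimal Malloy sketch (table and field names are illustrative, and exact syntax may vary by Malloy version):

```malloy
source: orders is duckdb.table('orders') extend {
  // Every field reference is checked against the warehouse schema.
  //   dimension: customer is custoemr_id
  // would fail at compile time ("'custoemr_id' is not defined"),
  // before any query is sent to the warehouse.
  dimension: customer is customer_id
}
```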
Explicit Dependencies
In dbt, dependencies are strings parsed at compile time. In Malloy, they're named variables — the compiler builds the dependency graph from the language itself:
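A sketch of what that looks like (source names are illustrative): `order_facts` refers to `orders` by name, so the compiler derives the dependency edge from the language itself rather than from a parsed string.

```malloy
source: orders is duckdb.table('orders')

// The reference to `orders` below is a named definition, not a string.
// Rename or remove `orders` and this model fails to compile.
source: order_facts is orders extend {
  measure: order_count is count()
  measure: total_revenue is sum(amount)
}
```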
Breaking changes are caught before runtime. Your IDE can navigate definitions with Cmd+click.
Clean Imports and Flexible Metadata
Malloy has a proper module system and an annotation system for attaching metadata anywhere in your model:
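A sketch of both features together (file, field, and tag names are illustrative; annotation placement may vary by Malloy version):

```malloy
import "orders.malloy"   // a named import, not a string-based ref

source: order_metrics is orders extend {
  # percent
  measure: cancel_rate is count() { where: status = 'cancelled' } / count()

  # line_chart
  view: revenue_by_month is {
    group_by: order_month is order_date.month
    aggregate: revenue is sum(amount)
  }
}
```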
Tags encode documentation (#(doc)), rendering hints (# percent, # line_chart), or application-specific metadata. The system is intentionally flexible — applications interpret tags as needed.
Transformation Power Built Into the Language
dbt's transformation capabilities come from macros and packages. Malloy builds common patterns directly into the language.
The CTE Problem
In SQL, you can't reference a column alias in the same SELECT where it's defined. When transformations build on each other, you need CTEs, a pattern we hit constantly in our event logs at Credible.
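A simplified sketch of that chained-CTE pattern (column names are illustrative, not the actual event schema). Each derived column needs its own CTE because it builds on the previous one:

```sql
with keyed as (
  select *, lower(event_name) as event_key
  from events
),
flagged as (
  select *, event_key like 'checkout.%' as is_checkout
  from keyed
),
grouped as (
  select *,
    case when is_checkout then 'funnel' else 'other' end as event_group
  from flagged
),
labeled as (
  select *, concat(event_group, ': ', event_key) as event_label
  from grouped
)
select * from labeled
```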
In Malloy, dimensions reference other dimensions directly:
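The same four derived fields as a Malloy sketch (illustrative names; each dimension freely references the ones defined before it):

```malloy
source: events is duckdb.table('events') extend {
  dimension: event_key   is lower(event_name)
  dimension: is_checkout is event_key ~ 'checkout.%'
  dimension: event_group is pick 'funnel' when is_checkout else 'other'
  dimension: event_label is concat(event_group, ': ', event_key)
}
```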
Four lines instead of four CTEs. Each dimension can reference any earlier dimension — no restructuring needed when logic changes.
Developer Experience
The Malloy extension for VS Code and Cursor provides LSP support — errors are caught as you type with autocomplete, hover documentation, and go-to-definition. The Malloy CLI compiles and validates transformations. And when you need database-specific functionality, sql() blocks give you an escape hatch with type safety intact.
Materialization as a First-Class Language Feature
If you've used dbt, you know materialization: saving the result of a transformation as a table so downstream queries read from that table instead of recomputing it. Malloy brings the same concept into the language itself — no separate YAML config, no external orchestration. (Malloy calls this feature "persistence" using the #@ persist annotation, but it's the same idea as dbt's materializations.)
The #@ persist annotation tells Malloy: "This source should be materialized." The backing query's result is saved as a warehouse table, and subsequent queries read from that table instead of recomputing it. Available today as an experimental feature:
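A sketch of the annotation in use (source and table names are illustrative, and the feature is experimental, so exact syntax may change):

```malloy
source: orders is duckdb.table('orders')

// This source is materialized as a warehouse table; queries against
// daily_revenue read that table instead of recomputing the aggregate.
#@ persist
source: daily_revenue is orders -> {
  group_by: order_date
  aggregate: revenue is sum(amount)
}
```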
- Materialization is visible in the model — no separate YAML config files
- Dependencies are automatic — the compiler knows what to refresh when sources change
- Builds are incremental — each source gets a content-addressed BuildID; if the logic hasn't changed, the table isn't rebuilt
- Logic stays unified — the same model definition powers both materialization and queries
Under the hood, malloy-cli build walks the dependency graph, creates tables in order, and writes a manifest. The Malloy extension reads that manifest automatically. This becomes especially powerful as AI agents query your data more frequently — materialized tables serve results at warehouse query speed instead of recomputing transformations on every request. It's also critical for expensive operations like LLM inference and embedding generation — see Warehouse-Native ML Pipelines and Entity Matching with Embeddings for real-world examples. See the Malloy persistence documentation to try it yourself.
Example: Full Transformation Comparison
A common pattern: building customer aggregates with joins, computed fields, and persistence.
The dbt Way
dbt total: 4 SQL files + ~40 lines of YAML declaring tests, constraints, and docs separately from the SQL.
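An abbreviated sketch of the dbt side: one of the four model files, plus a slice of the YAML that re-declares the same columns (all names are illustrative):

```sql
-- models/customer_aggregates.sql
{{ config(materialized='table') }}

select
    c.customer_id,
    c.customer_name,
    count(o.order_id) as order_count,
    sum(o.amount)     as lifetime_value
from {{ ref('stg_customers') }} c
left join {{ ref('stg_orders') }} o
  on o.customer_id = c.customer_id
group by 1, 2
```

```yaml
# models/schema.yml -- the same columns, declared again
models:
  - name: customer_aggregates
    columns:
      - name: customer_id
        tests: [unique, not_null]
      - name: lifetime_value
        description: "Total order amount per customer"
```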
The Malloy Way
Malloy total: 1 file, 1 language. join_one encodes the relationship cardinality explicitly — one customer per order, one product per order. No duplicated structure definitions across files and languages — the join declaration is the documentation. #@ persist is the materialization — declared inline, right where the logic lives. Everything is in one place.
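A sketch of that single Malloy file (names are illustrative; persistence syntax follows the experimental feature described above):

```malloy
source: customers is duckdb.table('customers') extend {
  primary_key: customer_id
}
source: products is duckdb.table('products') extend {
  primary_key: product_id
}

#@ persist
source: orders is duckdb.table('orders') extend {
  join_one: customers with customer_id   // one customer per order
  join_one: products with product_id     // one product per order

  measure: order_count    is count()
  measure: lifetime_value is sum(amount)
}
```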
Get Started
The dbt revolution taught us that data transformation deserves software engineering practices. Malloy takes that further: a unified, purpose-built language for transformation, semantic modeling, and materialization. Not YAML. Not Jinja. Not SQL with config blocks. A language designed for the problem.
- Try Malloy: malloydata.dev — open source, with extensions for VS Code and Cursor
- Talk to Credible: credibledata.com — managed infrastructure, governed models, and AI-ready data at scale
- Keep reading: Warehouse-Native ML Pipelines · Entity Matching with Embeddings
