dbt Wizard: reading the graph, not the files

I read about dbt Wizard and my first reaction was that dbt Labs had quietly shipped the thing I’d been bolting together by hand. My daily setup is Claude Code as the agent, the dbt MCP server feeding it project context, and the dbt agent skills installed on top. Wizard is that same idea, except dbt Labs built it as one piece instead of three I have to wire up and maintain. So I spent some time working out what’s actually different, and whether the productized version is worth giving up the one I rolled myself.

Why a generic agent works blind

Point Claude Code at a dbt project today and it sees a folder of SQL-with-Jinja text files. What it doesn’t see is the thing that actually matters: the lineage. Which models feed this one, what’s downstream, which tests guard it, which metric it ends up in.

For a while I assumed that gap was about undocumented business logic, the stuff that only lives in someone’s head. Some of it is. But most of the context an agent needs is already in the project; it’s just not in the .sql text. It’s the DAG, the compiled manifest, the YAML. And here’s the part that took me a second: that graph is a derived thing. {{ ref('stg_orders') }} is just a string until something parses every file, resolves every ref, and builds the graph. A generic agent has no dbt parser inside it, so it can’t read the real graph. It can only fake one by reading a pile of files and inferring, which is lossy and burns tokens.
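
To make "derived" concrete, here is a minimal sketch of what resolving refs involves. It is not how dbt does it: dbt renders the Jinja properly, while this uses a regex as a stand-in, handles only the one-argument form of ref, and runs on a made-up three-model project (the model names and SQL are hypothetical).

```python
import re

# Matches {{ ref('model_name') }} with either quote style.
# Assumption: a regex stands in for real Jinja rendering, which dbt actually uses.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")


def build_graph(models: dict[str, str]) -> dict[str, set[str]]:
    """Map each model name to the set of models it refs (its parents)."""
    return {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}


# Hypothetical project: model name -> raw SQL-with-Jinja text.
models = {
    "stg_orders": "select * from raw.orders",
    "stg_customers": "select * from raw.customers",
    "fct_orders": (
        "select o.*, c.region "
        "from {{ ref('stg_orders') }} o "
        "join {{ ref('stg_customers') }} c on o.customer_id = c.id"
    ),
}

graph = build_graph(models)
for name, parents in graph.items():
    print(name, "<-", sorted(parents))
```

Even this toy version has to read every file before it can answer "what feeds fct_orders", which is the cost a generic agent pays each time it infers the graph from text.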

That explains a friction I’d felt without naming it: my setup sometimes overcomplicates a simple task. A blind agent can’t see the blast radius and can’t cheaply check its own work, so it either over-edits to be safe or makes a change it can’t verify and flails. The overcomplication was a symptom of working blind.
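
Blast radius is cheap to compute once the graph exists. The manifest.json that dbt writes to the target directory includes a child_map keyed by unique ID; the sketch below walks a dictionary of that shape. The sample entries are hypothetical, and in a real project I’d assume you load the map from the manifest with the json module rather than typing it out.

```python
from collections import deque


def downstream(child_map: dict[str, list[str]], start: str) -> set[str]:
    """Return every node reachable from `start` by following child edges."""
    seen: set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in child_map.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Hypothetical data shaped like the child_map section of manifest.json.
child_map = {
    "model.shop.stg_orders": [
        "model.shop.fct_orders",
        "test.shop.not_null_stg_orders_id",
    ],
    "model.shop.fct_orders": ["model.shop.revenue_daily"],
    "model.shop.revenue_daily": [],
    "test.shop.not_null_stg_orders_id": [],
}

blast = downstream(child_map, "model.shop.stg_orders")
print(sorted(blast))
```

An agent with this lookup available knows before it edits stg_orders that one test and two models sit downstream, instead of guessing from file names.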

The model that made it click

The cut that helped me most: Wizard isn’t a smarter agent, it’s a better-situated one. Same intelligence, better seat.

Two halves do the work. Context in: a native metadata engine, which is an index of your project’s compiled state (lineage, build status, test results, semantic definitions) baked into the agent’s loop. It’s not a set of files dumped in your repo; it’s a read model the agent consults. The dbt MCP server gets a general agent the same data, but the agent still has to decide, turn by turn, when to fetch it and which tool to call. Native grounding means that’s no longer a decision it can get wrong.

Checking out: this is the piece I underrated. Wizard says it self-validates a change before you ever review it, and my read is that’s dbt Fusion, the new Rust engine, doing the work. Fusion does static analysis: it parses your SQL into a syntax tree, resolves column references across model dependencies, and type-checks the whole thing, all at compile time, before a query touches the warehouse. So if a refactor renames a column a downstream model selects, Fusion catches it even if you never wrote a test for that column. That’s a deterministic check, not the model’s opinion. It’s a different category from “run the tests and hope,” which needs the warehouse and only catches what you thought to test.
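
A toy version of the renamed-column case shows why this kind of check needs no warehouse and no test. This is not Fusion’s implementation: a real engine derives each model’s columns by parsing the SQL, whereas here the column sets are declared by hand, and the Model class and every name in it are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    name: str
    produces: set[str]  # columns in this model's output
    # parent model name -> columns this model reads from that parent
    selects: dict[str, set[str]] = field(default_factory=dict)


def broken_references(models: dict[str, Model]) -> list[tuple[str, str, str]]:
    """List (model, parent, column) for every column a parent no longer produces."""
    errors = []
    for model in models.values():
        for parent, cols in model.selects.items():
            for col in sorted(cols - models[parent].produces):
                errors.append((model.name, parent, col))
    return errors


# Hypothetical project before the refactor: everything resolves.
models = {
    "stg_orders": Model("stg_orders", {"order_id", "amount"}),
    "fct_orders": Model(
        "fct_orders",
        {"order_id", "amount"},
        {"stg_orders": {"order_id", "amount"}},
    ),
}
before = broken_references(models)

# The refactor renames amount to amount_usd upstream only.
models["stg_orders"].produces = {"order_id", "amount_usd"}
after = broken_references(models)
print(after)
```

The check is pure set arithmetic over declared structure, so the answer is the same every time and does not depend on anyone having written a test for the amount column.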

Context in from the metadata engine, deterministic checking out from Fusion, and a diff you approve in the middle. That’s the whole shape.

The part I had wrong

I’d assumed Wizard was just a harness running on my own Claude subscription. Setting up the CLI showed me it’s more split than that. You bring your own model auth, but the options aren’t equal. With Anthropic it’s an API key, metered per token, like Claude Code on a key. With OpenAI you can log in with a ChatGPT subscription and ride a flat cost, which is the route I took. What you can’t do is point it at your Claude subscription the way Claude Code lets you. The platform version is different again, metered through dbt’s own billing, with a managed model that isn’t Claude either (dbt’s managed option runs OpenAI). It’s model-agnostic, and it isn’t free. Worth knowing before you assume your existing plan covers it.

When I’d reach for it

The honest tradeoff is grounding versus freedom. Wizard’s context and validation are better out of the box, and I don’t have to maintain a stack of bolted-together parts. What I give up is control: my own model, my own prompts, my own tools, and the freedom to point the agent at things that aren’t dbt. I’m also more tied to dbt’s platform and its tiers, and I’d be betting on a product that’s still in public beta.

I think that bet reflects where a lot of this is going. The interesting work right now isn’t a smarter general model, it’s harness design: grounding the agent in real structure and making it deterministic where determinism matters. For work that lives entirely inside dbt, that’s a good trade. I’ve got the CLI running on a real project now, pointed only at my dev schema, and I’ll judge it on whether the grounding actually holds up day to day. If it doesn’t earn its place over the setup I already have, that’s a fine answer too.
