It's Working, Why Am I Not Happy?

(Continued from: Building It Right — Staging, Compliance, and Analytics) The previous three parts of this series covered the bumpy road from a laptop running Docker containers to a cloud-deployed system with a staging environment, compliance guardrails, and real observability. By the end of that phase, I had something I could stand behind. GPT-4o-mini was holding context, product cards were working, NeMo Guardrails was intercepting bad outputs at the infrastructure level, and PostHog was showing me what was actually happening in real sessions.

It was working. And I still wasn’t happy with it.

The Problem with Telling the AI Exactly What to Do

The original system was a workflow. Every single customer message triggered the same eight-step sequence: receive input, pull inventory context, pull strain knowledge, pull customer history, build prompt, call the model, format response, return output. Every time. Whether someone asked “do you have edibles?” or said “thanks, I’ll take it” — the system ran the full sequence regardless.
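To make that fixed sequence concrete, here is a minimal sketch of the shape of such a workflow. It is not the production code: every function name and return value below is a hypothetical placeholder, and the lookups return canned strings where a real system would query a database and call a model. The point is only that all eight steps run no matter what the customer said.

```python
from dataclasses import dataclass, field


@dataclass
class Trace:
    """Records which steps ran, so the fixed cost per message is visible."""
    steps: list[str] = field(default_factory=list)


# Hypothetical stand-ins for the three data lookups (assumed names, canned data).
def pull_inventory(message: str, trace: Trace) -> str:
    trace.steps.append("pull_inventory")
    return "inventory rows"


def pull_strain_knowledge(message: str, trace: Trace) -> str:
    trace.steps.append("pull_strain_knowledge")
    return "strain notes"


def pull_customer_history(customer_id: str, trace: Trace) -> str:
    trace.steps.append("pull_customer_history")
    return "purchase history"


def call_model(prompt: str, trace: Trace) -> str:
    # Placeholder for the LLM call; it only echoes the prompt size.
    trace.steps.append("call_model")
    return f"(model reply to {len(prompt)} chars of prompt) "


def handle_message(message: str, customer_id: str) -> tuple[str, Trace]:
    """Run the same eight steps for every message, in the same order."""
    trace = Trace()
    trace.steps.append("receive_input")
    inventory = pull_inventory(message, trace)
    strains = pull_strain_knowledge(message, trace)
    history = pull_customer_history(customer_id, trace)
    trace.steps.append("build_prompt")
    prompt = "\n".join([inventory, strains, history, message])
    raw = call_model(prompt, trace)
    trace.steps.append("format_response")
    reply = raw.strip()
    trace.steps.append("return_output")
    return reply, trace


if __name__ == "__main__":
    _, closing = handle_message("thanks, I'll take it", "cust-1")
    _, question = handle_message("do you have edibles?", "cust-1")
    # Both messages trigger an identical list of steps.
    print(closing.steps == question.steps)
```

A closing remark and a product question produce the same trace, which is exactly the rigidity described above.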

This works. It’s predictable, it’s testable, and it’s easy to reason about. For a first production deployment, those are real virtues.

But it has a ceiling. A workflow doesn’t think. It executes.

A real budtender doesn’t pull up every data source for every question. They read the customer, ask questions, skip the strain education if the customer already knows what they want. A workflow can’t do any of that — it just runs its sequence and hopes the right context ends up in the prompt.

I’d been reading about agentic architectures. The idea is straightforward: instead of telling the AI exactly what steps to take, you give it a set of tools and let it decide at runtime which ones are actually needed. The agent reasons about the conversation, chooses its actions, and builds its response from what it finds — rather than from what you pre-loaded into the context window.

That’s the difference between a system that follows instructions and one that actually thinks.

Enter PydanticAI

The framework I chose was PydanticAI. For a FastAPI backend already built on Pydantic models, the integration was natural — the type safety and validation patterns I was already using extended cleanly into the agent layer.

The new agent got three tools:

– **`search_inventory`** — semantic search against the live products table in Supabase

– **`search_strain_knowledge`** — the cannabis knowledge base, 5,000+ strains with embeddings, for the “what’s the difference between indica and sativa” moments

– **`get_customer_context`** — returning customer profile and purchase history, pulled by UUID

In the old workflow, all three of those ran on every request. Now the agent decides. Ask a simple question about availability and it calls `search_inventory`. Ask about effects and it might call `search_strain_knowledge` first, then cross-reference inventory. Log in as a returning customer and it pulls your history to personalize the response without being asked.
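The control flow behind that change can be sketched without any framework. The example below is deliberately not PydanticAI's API: the three tool names come from the list above, but their bodies are placeholders, and `choose_tools` is a hypothetical keyword check standing in for the decision a real agent delegates to the model. It only illustrates that tools run when selected, not on every request.

```python
from typing import Callable, Optional


# Tool names match the post; the bodies are placeholders, not real queries.
def search_inventory(query: str) -> str:
    return f"inventory matches for {query!r}"


def search_strain_knowledge(query: str) -> str:
    return f"strain notes for {query!r}"


def get_customer_context(customer_id: str) -> str:
    return f"profile and history for {customer_id}"


# Registry of the tools that take the customer's message as input.
QUERY_TOOLS: dict[str, Callable[[str], str]] = {
    "search_inventory": search_inventory,
    "search_strain_knowledge": search_strain_knowledge,
}


def choose_tools(message: str, customer_id: Optional[str]) -> list[str]:
    """Stand-in for the model's decision (assumption: simple keyword rules).

    A real agent lets the LLM pick tools from their descriptions; this
    function exists only to make the selective control flow visible.
    """
    text = message.lower()
    chosen: list[str] = []
    if customer_id is not None:
        chosen.append("get_customer_context")
    if any(w in text for w in ("effect", "indica", "sativa", "difference")):
        chosen.append("search_strain_knowledge")
    if any(w in text for w in ("have", "stock", "edible", "beverage", "recommend")):
        chosen.append("search_inventory")
    return chosen


def run_agent(message: str, customer_id: Optional[str] = None) -> dict[str, str]:
    """Call only the selected tools and return their results by tool name."""
    results: dict[str, str] = {}
    for name in choose_tools(message, customer_id):
        if name == "get_customer_context":
            results[name] = get_customer_context(customer_id)
        else:
            results[name] = QUERY_TOOLS[name](message)
    return results


if __name__ == "__main__":
    print(list(run_agent("do you have edibles?")))
    print(list(run_agent("thanks, I'll take it")))
    print(list(run_agent("do you have edibles?", customer_id="abc-123")))
```

In this sketch an availability question touches only the inventory tool, a closing remark touches none, and a known customer ID adds the profile lookup. In a real agent the model makes those choices from the conversation rather than from keyword rules.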

The agent also handles multi-turn conversation properly: I tested a pivot from edibles to beverages mid-session, and it tracked the shift without losing the thread. That’s not a small thing. Context continuity is the whole value proposition of a personalized budtender, and the old workflow was only as good as whatever fit in the prompt window.

What’s Actually Working Now

The new agent is deployed to the Railway staging backend and serving real traffic from the Vercel frontend. New customer flows are working well — the agent responds conversationally, asks clarifying questions when it needs them, and searches inventory on its own initiative. The `/customer/lookup` endpoint is working. PostHog telemetry is firing. The health check scheduler is running clean.

But, of course, there are loose ends. There are always loose ends.

I ran into several hard walls implementing this. The details are a post of their own; “Bye, Nemo!” covers the biggest one. For now, I’ll take a small victory lap.
