(Continued from: Moving to the Cloud — and Finding the Right Model) By this point the core system was working. GPT-4o-mini was holding context, the product cards were giving customers a tangible output, and the hallucination problem was largely under control. I could have pushed to production and called it done.
Instead I built a staging environment.
The decision didn’t come from a tutorial or a best-practices article. It came from muscle memory. Before AI, I had spent years in infrastructure, and I always kept a lab: a couple of servers set aside specifically for testing software before it ever touched production. You don’t install anything on a live system without knowing how it behaves first. That instinct carried over into this build naturally. Every new component (every integration, every guardrail, every analytics hook) needed a safe place to fail before it reached real customers. Staging became that place: a mirror of production with its own database, its own Railway deployment, and its own environment variables. Anything new goes to staging first.
It’s the kind of infrastructure decision that feels like overhead until the day it saves you.
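Keeping staging and production apart mostly comes down to configuration: the same code reads a different set of environment variables in each deployment. A minimal sketch of that idea, assuming hypothetical variable names (`APP_ENV`, `DATABASE_URL`, `PRODUCTION_DATABASE_URL`, `ANALYTICS_ENABLED`) rather than whatever this build actually uses:

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical variable names for illustration; each deployment (staging or
# production) would have its own values injected by the host.
VALID_ENVS = {"staging", "production"}


@dataclass(frozen=True)
class Settings:
    env: str
    database_url: str
    analytics_enabled: bool


def load_settings(environ: Optional[Mapping[str, str]] = None) -> Settings:
    """Build per-environment settings from environment variables."""
    environ = os.environ if environ is None else environ
    # Default to staging so a missing variable never silently means "production".
    env = environ.get("APP_ENV", "staging")
    if env not in VALID_ENVS:
        raise ValueError(f"Unknown APP_ENV: {env!r}")

    database_url = environ.get("DATABASE_URL")
    if not database_url:
        raise RuntimeError("DATABASE_URL must be set for every environment")

    # Guard against the classic mistake: staging wired to the production database.
    prod_db = environ.get("PRODUCTION_DATABASE_URL")
    if env == "staging" and prod_db and database_url == prod_db:
        raise RuntimeError("Staging is pointing at the production database")

    analytics_enabled = environ.get("ANALYTICS_ENABLED", "true").lower() == "true"
    return Settings(env, database_url, analytics_enabled)
```

Passing a plain dict, as in `load_settings({"APP_ENV": "staging", "DATABASE_URL": "postgres://staging-host/app"})`, makes the function easy to test without touching the real environment. Defaulting to staging is a deliberate choice: a misconfigured deployment fails safe instead of quietly behaving like production.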
Getting Serious About Compliance
The system prompt had always included soft guardrails — instructions telling the AI not to make medical claims, not to recommend cannabis as treatment for specific conditions. California law is unambiguous on this. But instructions in a prompt are a policy, not a proof. They can drift. They can be talked around. They don’t leave an audit trail.
I’d been reading about NeMo Guardrails, NVIDIA’s open-source framework for enforcing behavioral constraints on LLM outputs at the infrastructure level. I brought it to Claude, asked whether it was worth adding to the stack, and got an enthusiastic yes. NeMo sits as a layer between the user’s input and the LLM’s response, actively intercepting and blocking prohibited content instead of hoping the model behaves. That distinction matters. A cannabis business can’t hand regulators a system prompt and call it compliance. NeMo gives the system a technical enforcement mechanism with real teeth.
It went to staging first. Tested thoroughly. Then to production.
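The difference between a prompt instruction and an enforcement layer is easier to see in code. The sketch below is not NeMo’s API; NeMo is configured through its own config files (YAML plus its Colang language). It is a plain-Python stand-in, assuming a hypothetical regex policy and an injected `llm` callable, that shows the same shape: check the input before the model runs, check the output before the customer sees it, and log every decision so there is an audit trail.

```python
import re
from typing import Callable, Dict, List

# Hypothetical, deliberately crude policy for illustration only. A real
# deployment needs a reviewed policy, not a short regex list.
MEDICAL_CLAIM_PATTERNS = [
    re.compile(r"\b(cures?|treats?|heals?)\b", re.IGNORECASE),
    re.compile(r"\b(anxiety|cancer|insomnia|ptsd|depression)\b", re.IGNORECASE),
]

REFUSAL = ("I can't give medical advice or recommend cannabis as a treatment. "
           "Please talk to a licensed healthcare provider.")


def violates_policy(text: str) -> bool:
    """Flag text that pairs a treatment verb with a named condition."""
    return all(p.search(text) for p in MEDICAL_CLAIM_PATTERNS)


def guarded_generate(user_message: str,
                     llm: Callable[[str], str],
                     audit_log: List[Dict]) -> str:
    # Input rail: block before the model is ever called.
    if violates_policy(user_message):
        audit_log.append({"stage": "input", "blocked": True, "text": user_message})
        return REFUSAL

    response = llm(user_message)

    # Output rail: check what the model actually said, not what it was told.
    if violates_policy(response):
        audit_log.append({"stage": "output", "blocked": True, "text": response})
        return REFUSAL

    audit_log.append({"stage": "output", "blocked": False})
    return response
```

The point of the design is that the refusal no longer depends on the model following instructions: even if a response drifts into a medical claim, the output check replaces it, and the log records that it happened.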
Knowing What’s Actually Happening
A system running in production without observability is a black box. I knew the AI was responding — I didn’t know how customers were actually using it, where conversations were dropping off, which queries were hitting dead ends, or whether the product recommendations were landing.
Adding PostHog wasn’t a new idea either. It was another instinct carried forward from infrastructure days. When you’ve been responsible for mission-critical systems — the kind where an outage at 2am has real consequences — you learn quickly that a monitoring dashboard isn’t optional. You need to see what’s happening in real time: what’s healthy, what’s straining, where the bottlenecks are forming before they become failures. I’d lived that lesson at scale. I wasn’t going to run a production AI system blind.
PostHog gave me that visibility. Every query, every product card generated, every session. Not just “is it running?” but “how is it actually being used?” The difference between those two questions is the difference between operating a system and understanding one.
It went to staging first. Production after.
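Answering “how is it actually being used?” starts with emitting well-named events and then asking questions of them. The sketch below assumes hypothetical event names (`query_submitted`, `product_card_generated`) and uses an in-memory recorder in place of PostHog’s client; the real SDK’s `capture` signature has changed between versions, so check it against the PostHog docs for the installed version. It then computes two of the questions raised above: which sessions hit a dead end, and how often a query led to a product card.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Set


@dataclass
class InMemoryAnalytics:
    """Stand-in for a real analytics client: records events in a list."""
    events: List[Dict[str, Any]] = field(default_factory=list)

    def capture(self, session_id: str, event: str,
                properties: Optional[Dict[str, Any]] = None) -> None:
        self.events.append(
            {"session_id": session_id, "event": event, "properties": properties or {}}
        )


def _sessions_with(events: List[Dict[str, Any]], name: str) -> Set[str]:
    """All session IDs that emitted at least one event with this name."""
    return {e["session_id"] for e in events if e["event"] == name}


def dead_end_sessions(events: List[Dict[str, Any]]) -> Set[str]:
    """Sessions where the customer asked something but never got a product card."""
    return (_sessions_with(events, "query_submitted")
            - _sessions_with(events, "product_card_generated"))


def card_conversion_rate(events: List[Dict[str, Any]]) -> float:
    """Share of querying sessions that produced at least one product card."""
    queried = _sessions_with(events, "query_submitted")
    if not queried:
        return 0.0
    carded = _sessions_with(events, "product_card_generated")
    return len(queried & carded) / len(queried)
```

In PostHog itself, a funnel from `query_submitted` to `product_card_generated` would answer the same question without hand-written code; the sketch just makes the underlying logic explicit.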
What This Phase Was Really About
The staging environment, NeMo Guardrails, and PostHog analytics might seem like polish — things you add after the real work is done. I’d argue they’re the opposite. They’re what separates a demo from a system you can stand behind. A dispensary owner asking “how do you know it’s compliant?” deserves a better answer than “we told it to behave.” A client asking “is anyone actually using it?” deserves real data, not a shrug.
This phase was about building something I could defend. Not just to an owner, but to myself.
