Moving to the Cloud — and Finding the Right Model

(Continued from: What If a Dispensary Had an AI on Staff?) The local build had done its job. I knew the concept worked. I knew what the limitations were. Now I needed something I could actually demo to a dispensary owner without carrying my laptop into the store.

The move to the cloud was surprisingly smooth. I chose Vercel for the frontend, Railway for the backend, and Supabase for the database — not because they were the most powerful options, but because all three have generous free tiers. I was self-funding this entire build. If the owner wasn’t interested, I still needed to pay rent. Cost-consciousness isn’t a limitation — it’s a design principle.

For LLM access, I chose OpenRouter rather than locking into a single provider. The local build had already taught me that model choice matters enormously, and I wasn’t going to paint myself into a corner. OpenRouter gives me the flexibility to switch models without rebuilding the integration — swap a single environment variable and the entire system points somewhere new. If token costs shift, if a better model drops, if context requirements change, I’m covered.

That flexibility turned out to be immediately necessary.

The Hallucination Problem

The first cloud version ran on Llama 3.1 70B — a massive leap from the 3B model on my laptop. The conversational coherence was night and day. It could hold context, handle nuance, and speak about cannabis knowledgeably.

It also invented products with complete confidence.

Ask it for a sleep edible and it would recommend something that sounded completely plausible — a brand name, a potency, a format — that simply did not exist in the inventory. For a general chatbot, hallucination is an annoyance. For a dispensary assistant recommending products to real customers, it’s a liability. A customer asking a budtender to pull a product that doesn’t exist isn’t a minor UX issue. It erodes trust instantly.

The fix required two things. First, tightening the system prompt with strict product rules — the AI was explicitly instructed to only recommend products appearing in the inventory context, using exact product names, never inventing. Second, adding product cards to the interface so customers could build a shopping list to hand to the budtender. This created a natural verification step: if the AI recommended something real, it would appear as a card. If it hallucinated, there would be nothing to add.

Finding the Right Model

Even with the guardrails in place, context issues persisted with the larger model. Long conversations would drift. The system prompt would get crowded out. I went back to OpenRouter and switched to GPT-4o-mini — smaller than the 70B, but purpose-built for exactly this kind of task. Fast, coherent, cost-efficient, and most importantly: it holds the thread.

It’s the model running in beta today.

The model journey — 3B to 70B to GPT-4o-mini — wasn’t a failure arc. It was calibration. Each model taught me something specific about what this application actually needs. Not the most powerful model. The right one.

Continue reading: Building It Right — Staging, Compliance, and Analytics