What If a Dispensary Had an AI on Staff?

(Continued from the AI Budtender Project)

The AI Budtender is a customer-facing assistant that helps cannabis dispensary customers find products, get personalized recommendations, and build shopping lists — all powered by live inventory and purchase history. These are the details of the oft bumpy road that lead to the MVP.

The first version was about as scrappy as it gets. I ran the whole thing on my laptop using Docker and Ollama — a setup that lets you run AI models locally without sending data to the cloud. For a cannabis application where customer privacy is paramount, that felt like the right instinct. Keep everything local, keep everything controlled.

The architecture was four Docker containers working together:

  • Ollama + Model — the local LLM runtime running Llama 3.2 3B, small enough to run on a consumer machine
  • RAG (Retrieval Augmented Generation) — a domain knowledge layer giving the AI access to cannabis-specific information it wouldn’t otherwise have
  • MCP (Model Context Protocol) — handling customer history and live inventory, so the AI knew what was actually on the shelf and who it was talking to
  • Voice — Whisper for speech-to-text and Piper for text-to-speech, so customers could actually speak to the budtender and hear it respond

That last container felt like the most exciting piece. A customer walks up, asks a question out loud, and the AI answers back. That’s not a chatbot — that’s something closer to a real interaction.

And we got it working.

Then I spent time on the actual shop floor. Music pumping, customers talking, street noise bleeding in through the door. Whisper, as good as it is, wasn’t built for that environment. It couldn’t reliably capture what a customer was saying over that kind of ambient noise. Voice was scrapped.

It wasn’t a failure — it was a field test. I know exactly how to add it back if the environment ever calls for it, perhaps a kiosk with a noise-canceling mic in a quieter corner of the store. But I made a deliberate decision not to carry it into the cloud build. Adding complexity that isn’t going into production isn’t ambition — it’s debt.

For a first attempt, the local build taught me everything I needed to know. The RAG layer showed real promise. The MCP integration proved the inventory connection was viable. And Llama 3.2 3B revealed its ceiling clearly: after a couple of exchanges, it lost the thread entirely. Ask a follow-up question and it had already forgotten what you’d discussed two messages ago. For a budtender — whose entire value is remembering what you told them and building on it — that’s not a limitation you can work around.

There was also a practical reality I couldn’t ignore: I wanted to show this to the dispensary owner. And I wasn’t about to bring my laptop to work and say “here, let me plug this in.” A real demo needs to live somewhere accessible, stable, and not dependent on my PC staying open in the break room.

It was time to move to the cloud. Continue reading: Moving to the Cloud — and Finding the Right Model