Shopper Buddy

Short Pitch

Every year, millions of people with vision loss lose something most of us take for granted: the ability to walk into a store and shop for themselves.

They depend on someone else to read labels. To count change. To decide what goes in the basket.

That's not independence. That's a workaround.

Today, we're introducing Shopper Buddy.

Point your phone at a shelf and tap the button. Our AI reads the product's name, brand, and price, and speaks them to you instantly. Tap again, and it's in your basket.

Shopper Buddy is a button-first, camera-powered shopping assistant for the visually impaired.

It uses AI to see what you can't, recognising products from a live camera feed using multimodal vision and embeddings. One tap triggers a scan. The result comes back as speech. No screen-reading required. No typing. No waiting.

You can also hold the button and speak — to add items, check your basket, or pay. But the button is always in control.

One button. Total independence.

"The most powerful thing we can build is something that gives someone their autonomy back."


Technical Details

Core Pipeline

  • User taps button → camera captures frame → image sent to AWS Bedrock
  • Claude 3 Haiku (vision) extracts: brand, name, quantity, packaging, colour, label text
  • Amazon Titan Embed Text v2 converts extracted text into 256-dim vectors
  • RAG (Retrieval-Augmented Generation): vectors queried against a pre-embedded product catalogue via cosine similarity search
  • Product catalogue built from CSV data covering the 5 largest supermarket chains in the Netherlands (last updated March 2026)
  • Matched product (name, brand, price) spoken aloud via OpenAI TTS
  • If confidence < 0.5, spoken as a probable match with disclaimer
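
A minimal sketch of this pipeline in TypeScript, assuming the AWS SDK v3 Bedrock Runtime client. The prompt wording, region, and the `catalogue` shape are illustrative stand-ins, not our production code:

```typescript
// Sketch of the scan pipeline (AWS SDK v3). The prompt wording, region,
// and catalogue shape are illustrative, not our production code.
import {
  BedrockRuntimeClient,
  InvokeModelCommand,
} from "@aws-sdk/client-bedrock-runtime";

const bedrock = new BedrockRuntimeClient({ region: "eu-central-1" });

// Hypothetical shape of one pre-embedded catalogue entry (built offline
// from the supermarket CSV data).
interface CatalogueEntry {
  name: string;
  brand: string;
  price: number;
  vector: number[]; // 256-dim, L2-normalised
}
declare const catalogue: CatalogueEntry[];

// Step 1: Claude 3 Haiku describes the product in the camera frame.
async function extractAttributes(jpegBase64: string): Promise<string> {
  const res = await bedrock.send(new InvokeModelCommand({
    modelId: "anthropic.claude-3-haiku-20240307-v1:0",
    contentType: "application/json",
    body: JSON.stringify({
      anthropic_version: "bedrock-2023-05-31",
      max_tokens: 300,
      messages: [{
        role: "user",
        content: [
          { type: "image", source: { type: "base64", media_type: "image/jpeg", data: jpegBase64 } },
          { type: "text", text: "List this product's brand, name, quantity, packaging, colour, and label text." },
        ],
      }],
    }),
  }));
  return JSON.parse(new TextDecoder().decode(res.body)).content[0].text;
}

// Step 2: Titan Embed Text v2 turns that description into a 256-dim vector.
async function embed(text: string): Promise<number[]> {
  const res = await bedrock.send(new InvokeModelCommand({
    modelId: "amazon.titan-embed-text-v2:0",
    contentType: "application/json",
    body: JSON.stringify({ inputText: text, dimensions: 256, normalize: true }),
  }));
  return JSON.parse(new TextDecoder().decode(res.body)).embedding;
}

// Step 3: cosine similarity; with normalised vectors this is a dot product.
function bestMatch(query: number[]): { entry: CatalogueEntry; score: number } {
  let best = { entry: catalogue[0], score: -Infinity };
  for (const entry of catalogue) {
    const score = entry.vector.reduce((sum, v, i) => sum + v * query[i], 0);
    if (score > best.score) best = { entry, score };
  }
  return best;
}

// Step 4: the sentence handed to TTS, hedged when confidence is below 0.5.
async function scan(jpegBase64: string): Promise<string> {
  const { entry, score } = bestMatch(await embed(await extractAttributes(jpegBase64)));
  const label = `${entry.brand} ${entry.name}, €${entry.price.toFixed(2)}`;
  return score < 0.5 ? `This is probably ${label}.` : label;
}
```

Because Titan returns unit-length vectors with `normalize: true`, cosine similarity reduces to a dot product, which keeps the in-memory catalogue search fast enough to run inside a serverless function.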

Speech

  • Voice input: hold button → OpenAI Whisper (STT, batch-transcribed on release) → transcript
  • Voice output: OpenAI GPT-4o Realtime API → streaming PCM audio playback
  • Intent parsing via rule-based situation graph
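
A sketch of the voice path, assuming the official `openai` Node SDK; the rule table is a simplified stand-in for the full situation graph:

```typescript
// Sketch of the voice path, assuming the official `openai` Node SDK.
// The rule table is a simplified stand-in for the full situation graph.
import OpenAI from "openai";
import fs from "fs";

const openai = new OpenAI();

// Button release → batch transcription with Whisper.
async function transcribe(audioPath: string): Promise<string> {
  const result = await openai.audio.transcriptions.create({
    file: fs.createReadStream(audioPath),
    model: "whisper-1",
  });
  return result.text;
}

// Each situation permits only certain intents, so a misheard word cannot
// trigger an action that is out of context.
type Situation = "browsing" | "counting" | "checkout";
type Intent = "add" | "remove" | "read_basket" | "pay" | "confirm_count" | "unknown";

const rules: Record<Situation, Array<{ pattern: RegExp; intent: Intent }>> = {
  browsing: [
    { pattern: /\b(add|take|basket)\b/i, intent: "add" },
    { pattern: /\b(remove|put back)\b/i, intent: "remove" },
    { pattern: /\bread\b.*\bbasket\b/i, intent: "read_basket" },
  ],
  counting: [{ pattern: /\b(done|stop|that's all)\b/i, intent: "confirm_count" }],
  checkout: [{ pattern: /\b(pay|confirm)\b/i, intent: "pay" }],
};

function parseIntent(transcript: string, situation: Situation): Intent {
  for (const { pattern, intent } of rules[situation]) {
    if (pattern.test(transcript)) return intent;
  }
  return "unknown"; // spoken back as "I didn't catch that"
}
```

Scoping the rules by situation means a misrecognised word can never trigger an out-of-context action, which matters when the user cannot visually verify what just happened.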

Basket & Payment

  • Tap to count quantity (TTS counts each tap aloud); 2.5s silence auto-confirms
  • Voice commands: add, remove, read basket
  • Bunq Banking API (live balance check)
  • Warns via TTS if basket exceeds available balance
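
The tap-counting loop is a single timer that resets on every tap. A sketch, with hypothetical `speak` and `addToBasket` helpers; the Bunq balance check would hook in after the quantity is confirmed, and its session handshake is omitted here:

```typescript
// Sketch of tap-to-count with 2.5s auto-confirm. `speak` (TTS) and
// `addToBasket` are hypothetical helpers, not our actual API.
declare function speak(text: string): void;
declare function addToBasket(quantity: number): void;

const CONFIRM_AFTER_MS = 2500;
let taps = 0;
let confirmTimer: ReturnType<typeof setTimeout> | undefined;

function onTap(): void {
  taps += 1;
  speak(String(taps)); // count each tap aloud: "one", "two", ...
  // Restart the silence timer; 2.5s with no further taps confirms the quantity.
  if (confirmTimer !== undefined) clearTimeout(confirmTimer);
  confirmTimer = setTimeout(() => {
    addToBasket(taps);
    speak(`Added ${taps} to your basket.`);
    taps = 0; // this is also where the Bunq balance warning would fire
  }, CONFIRM_AFTER_MS);
}
```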

UI

  • Single large button occupies bottom 30% of screen — the entire interaction surface
  • Live camera feed top 70%; minimal overlay with basket count + total
  • Dark, high-contrast theme; designed to be used without looking at the screen
  • Mobile-first (max 480px), deployed on Vercel (serverless)
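
A layout sketch in React; the colours and the `BasketOverlay` component are illustrative, not our exact styling:

```tsx
// Layout sketch of the single-screen UI. Colours and BasketOverlay are
// illustrative stand-ins for our actual components.
import React from "react";

const BasketOverlay = ({ count, total }: { count: number; total: number }) => (
  <div style={{ position: "absolute", top: 8, left: 8, padding: "4px 8px",
                background: "rgba(0,0,0,0.7)", fontSize: "1.25rem" }}>
    {count} items · €{total.toFixed(2)}
  </div>
);

export function App() {
  return (
    <div style={{ height: "100vh", maxWidth: 480, margin: "0 auto",
                  display: "flex", flexDirection: "column",
                  background: "#000", color: "#fff" }}>
      {/* Top 70%: live camera feed plus the minimal basket overlay. */}
      <div style={{ flex: 7, position: "relative" }}>
        <video autoPlay playsInline muted
               style={{ width: "100%", height: "100%", objectFit: "cover" }} />
        <BasketOverlay count={0} total={0} />
      </div>
      {/* Bottom 30%: the entire interaction surface. Tap scans; hold speaks. */}
      <button aria-label="Scan product. Hold to speak."
              style={{ flex: 3, border: "none", fontSize: "2rem",
                       background: "#ffd500", color: "#000" }}>
        SCAN
      </button>
    </div>
  );
}
```

The 70/30 flex split keeps the button where a resting thumb lands, so the screen never needs to be looked at.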

How AI Is Used

AWS Bedrock powers the multimodal product recognition pipeline. We route every camera frame through Anthropic Claude 3 Haiku (via Amazon Bedrock) to extract product attributes — brand, name, quantity, packaging, colour, and label text — then embed them with Amazon Titan Embed Text v2 (also via Bedrock) into 256-dimensional vectors for cosine-similarity search against our pre-computed Dutch supermarket catalogue. The non-text modality is image: no barcode, no manual input, just a photo.

On the output side, audio replaces the screen entirely: OpenAI's Realtime API streams speech back to the user, while OpenAI Whisper adds a second audio modality for hands-free voice control. The result is an image-in, audio-out loop that requires no sight to operate.


What's Next

As a next step, we envision partnerships with supermarket chains to integrate directly with their live product databases: real pricing, real stock, real aisle locations. On the product side, we aim to add allergen and dietary alerts spoken automatically on scan, and multi-language support beyond English.

Longer term, we see Shopper Buddy expanding beyond grocery retail into pharmacies, clothing stores, and any environment where a label stands between someone and their independence.

The technology is ready. The partnerships are next.
