Inspiration

The shopping experience has come a long way in the past two decades — from self-checkout lanes to mobile apps and online delivery services. But despite these advances, very few have been designed with visually impaired shoppers in mind. We wanted to change that by reimagining an often-overlooked part of the experience: the shopping cart. Our goal was to make in-store shopping more accessible, intuitive, and independent for those with visual impairments.

What it does

WheelyWise is a smart shopping assistant on wheels that follows you around the store using real-time camera tracking. It uses object detection to stay close without getting in the way — and knows how to ignore other shoppers. When you stop to check out a product, WheelyWise can identify it and share helpful details like the price, nutrition facts, and whether it fits your dietary needs. You can even ask it questions using your voice, thanks to its built-in conversational assistant.

How we built it

The system's hardware is built on the Trilobot robot kit, with the addition of a tray and an external web camera. We replaced the kit's Raspberry Pi 4B with our own Pi 5 so we could run Raspberry Pi OS instead of QNX.

We used computer vision to enable the Trilobot to autonomously follow people and navigate its environment. It detects people using YOLO and calculates their position in the camera frame. The robot adjusts its movement based on the detected person’s location:

  • If the person is centered, the robot moves forward.
  • If the person is to the left or right, the robot curves in that direction to re-center.
  • If the person is very close (occupying a large part of the frame), the robot stops.
  • If no person is detected for a few seconds, the robot spins in the last-seen direction to search for someone.

This approach allows the robot to dynamically follow a person, avoid collisions (by stopping if someone is too close), and resume searching if it loses track of its target. The navigation logic is tightly integrated with the vision system, making the robot responsive to real-time changes in its environment.

We used the YOLOv11n model for object detection and labelling, with a catalogue dataset for accurate product classification. An external webcam is mounted above the tray to scan products, while the Raspberry Pi camera handles user tracking. In short, the algorithm assigns bounding boxes to detected objects and controls the robot's wheels to keep the user centered in the camera's view.
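One simple way to follow a single shopper while ignoring others is to always track the largest confident "person" detection in the frame, since bounding-box area is a rough proxy for distance. This is a hedged sketch of that post-processing step; the `(label, confidence, area)` tuple format is an assumption, not the raw YOLO output.

```python
def select_target(detections, min_conf=0.5):
    """Pick the person to follow from a frame's detections.

    detections: iterable of (label, confidence, area_fraction) tuples,
                e.g. post-processed YOLO results.
    Returns the largest confident "person" detection, or None.
    """
    people = [d for d in detections
              if d[0] == "person" and d[1] >= min_conf]
    if not people:
        return None
    # Largest box area ~ closest person; other shoppers are ignored.
    return max(people, key=lambda d: d[2])
```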

We initially explored Google's Gemini TTS capabilities but ultimately transitioned to a Python-based text-to-speech library to simplify integration and reduce latency. This allowed us to convert the assistant's responses into spoken audio directly within our application. The audio is generated in real time and played back through the system's default output or Bluetooth-connected speakers. While the voice quality is more synthetic than Gemini's premium TTS, this approach improved reliability and ensured offline compatibility for our Raspberry Pi-based prototype.

Voice input is captured using Python’s speech_recognition library, allowing users to speak naturally. The assistant listens for commands or questions, transcribes them using Google Speech Recognition, and determines when the user wants to end the session with phrases like “I’m done” or “thank you.” This multi-turn conversation flow ensures a seamless and hands-free user experience.
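The end-of-session check described above can be as simple as matching the transcript against a few closing phrases. A minimal sketch (the exact phrase list and matching in our code may differ):

```python
# Phrases that signal the user wants to end the conversation.
END_PHRASES = ("i'm done", "im done", "thank you")

def wants_to_end(transcript):
    """Return True if a transcribed utterance ends the session."""
    text = transcript.lower().strip()
    return any(phrase in text for phrase in END_PHRASES)
```

In the multi-turn loop, each transcript from Google Speech Recognition would be passed through this check before being forwarded to the assistant.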

Challenges we ran into

We initially chose QNX's RTOS to power the robot for its real-time capabilities, but encountered challenges with setup and limited library support. After evaluating these constraints, we pivoted to Raspberry Pi OS, which allowed us to quickly integrate camera modules and motor controls, accelerating development and enabling smoother system interactions. We also struggled to select models that could handle both people tracking/identification and hand gesture recognition at the same time.

Accomplishments that we're proud of

We’re proud of successfully integrating hardware and AI to build a fully functional system. Despite the challenges of working with limited hardware resources and coordinating multiple components—such as real-time object detection, voice recognition, and text-to-speech synthesis—we were able to create a seamless and responsive shopping assistant. Implementing voice interactions using the Gemini API and achieving natural language responses added another layer of complexity that we’re proud to have tackled within the timeframe. Bringing together both hardware and software elements into a cohesive, user-friendly experience was a major accomplishment for our team.

What we learned

Most of us started with only basic programming knowledge, especially when it came to hardware, and learned as we went. Even getting the robot to move was a major hurdle at the beginning, let alone programming it to follow a person, classify products, and provide audio output.

What's next for WheelyWise

The next step with WheelyWise involves scaling its physical design and upgrading its onboard camera sensors to improve the accuracy and reliability of its object recognition model. Integrating LiDAR and an odometry system would significantly boost its spatial awareness, enabling more robust localization and obstacle avoidance. Finally, a centralized fleet management system could let multiple WheelyWise units work together in real time — making their movement and task coordination much more efficient across a shared space.
