Inspiration

We’ve always wanted to build an AI assistant that naturally connects human navigation with robot control using simple voice commands. WALL-E was inspired by the idea of humans and robots collaborating effortlessly in the future. The project was really driven by the challenge of making robotics easy and intuitive for all kinds of users. Plus, full disclosure: it was also inspired by us crying while watching WALL-E (no regrets).

What it does

WALL-E is a voice-first AI assistant that brings together real-time navigation and robot control in one sleek dashboard. It uses Google Maps for routing, Gemini AI for smart voice-command understanding, and TensorFlow with COCO-SSD for on-device object detection via a live camera feed. We also integrated Genesys Cloud for seamless communication and Auth0 for authentication. Users can set destinations, guide robot movements, and get contextual help, all just by talking naturally.
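
For the curious, here’s a rough sketch of how a transcribed voice command can be turned into a structured intent with Gemini, using the @google/generative-ai SDK. The model name, prompt, and parseCommand helper are illustrative, not our exact code:

```javascript
// Illustrative sketch: turn a transcribed voice command into a structured
// intent with Gemini. Prompt, schema, and model name are our own stand-ins.
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
const model = genAI.getGenerativeModel({ model: "gemini-1.5-flash" });

async function parseCommand(transcript) {
  const prompt = `You are a parser for robot and navigation commands.
Return ONLY JSON like {"action": "navigate" | "move" | "help",
"target": string | null, "direction": string | null}.
Command: "${transcript}"`;

  const result = await model.generateContent(prompt);
  // Strip any accidental code fences before parsing the JSON.
  const text = result.response.text().replace(/```(json)?/g, "").trim();
  return JSON.parse(text);
}

// e.g. parseCommand("take me to the library")
//   -> { action: "navigate", target: "library", direction: null }
```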

How we built it

We built WALL-E using a mix of hardware and software. Gemini powers the AI language understanding, while Google Maps keeps navigation reliable. The robot hardware runs on a Raspberry Pi with a camera streaming live video for object detection using TensorFlow.js and COCO-SSD. Voice recognition and speech synthesis use browser APIs and Resemble, and QNX serves as the real-time OS handling robot control. On the front end, we built a modern, responsive web dashboard mainly with JavaScript, HTML, and CSS.
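
The detection side boils down to an in-browser loop like the one below (the element ID, confidence threshold, and callback are illustrative, not our exact dashboard code):

```javascript
// Sketch of the in-browser detection loop running on the live Pi stream.
import "@tensorflow/tfjs";
import * as cocoSsd from "@tensorflow-models/coco-ssd";

async function startDetection(onObstacle) {
  const video = document.getElementById("robot-camera"); // <video> with the Pi feed
  const model = await cocoSsd.load();

  async function loop() {
    // Each prediction: { bbox: [x, y, width, height], class, score }
    const predictions = await model.detect(video);
    for (const p of predictions) {
      if (p.score > 0.6) onObstacle(p.class, p.bbox);
    }
    requestAnimationFrame(loop);
  }
  loop();
}

// startDetection((label, bbox) => console.log(`Detected ${label}`, bbox));
```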

Challenges we ran into

One major challenge was adapting our system for the demo: our real-world innovation couldn’t be fully demonstrated in a judging room, so we built a scaled-down version, recalculating real-world directions into 30 cm ranges on a mini-map of Waterloo, which required precise ratio conversions (sketched below). Syncing live voice commands with video processing and robot control for smooth responsiveness was another big hurdle. Getting robust natural language understanding to work well with ambiguous commands took lots of iteration. Integrating the Raspberry Pi, QNX, and voice synthesis while managing latency and UI performance was tough but critical.
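
Here’s the gist of the ratio conversion. The real-world span is a hypothetical number; only the 30 cm table range comes from our actual demo setup:

```javascript
// Illustrative real-world -> mini-map ratio conversion.
const REAL_SPAN_M = 2000;   // assumed real-world map width covered (metres)
const MAP_SPAN_CM = 30;     // mini-map width on the demo table (centimetres)
const SCALE = MAP_SPAN_CM / (REAL_SPAN_M * 100); // cm of map per real-world cm

function toMiniMap(realDistanceM) {
  // Convert a real-world route leg (metres) into centimetres of rover travel.
  return realDistanceM * 100 * SCALE;
}

// e.g. a 500 m leg of a Google Maps route:
// toMiniMap(500) -> 7.5 cm of rover movement on the mini-map
```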

Accomplishments that we're proud of

We are proud that WALL-E can understand and respond to natural voice commands to control navigation and a robot simultaneously. On-device object detection through the live camera feed enables dynamic obstacle awareness during navigation. Integrating Gemini AI for contextual command parsing significantly enhanced the system’s intelligence and user-friendliness. Delivering all these complex capabilities through a sleek, accessible web dashboard is a major achievement.
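
The voice loop itself is surprisingly little code thanks to browser APIs. A minimal sketch, assuming Chrome’s webkitSpeechRecognition, with handleCommand as a stand-in for handing transcripts to the parser:

```javascript
// Minimal sketch of the browser voice loop (recognition + synthesis).
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new SpeechRecognition();
recognition.continuous = true;
recognition.lang = "en-US";

recognition.onresult = (event) => {
  const last = event.results[event.results.length - 1];
  handleCommand(last[0].transcript.trim());
};

function handleCommand(transcript) {
  console.log("Heard:", transcript); // in WALL-E this feeds the command parser
}

function speak(text) {
  // Spoken responses via the browser's built-in synthesis.
  window.speechSynthesis.speak(new SpeechSynthesisUtterance(text));
}

recognition.start();
// speak("Heading to the library now.");
```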

What we learned

We learned how to integrate Raspberry Pis and use ultrasonic sensors to determine an object’s distance from the rover (see the sketch below). Most of all, we learned how to get real value out of Google Gemini: rather than building a boring chatbot, we used it to identify the objects around the rover as intel so the rover could avoid them.
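
The ultrasonic math is simple once you remember the echo covers a round trip. A sketch of the distance calculation for an HC-SR04-style sensor (the GPIO plumbing is omitted; we assume the echo pulse width arrives in microseconds):

```javascript
// Echo pulse width -> distance for an HC-SR04-style ultrasonic sensor.
const SPEED_OF_SOUND_CM_PER_US = 0.0343; // ~343 m/s at room temperature

function echoToDistanceCm(pulseWidthUs) {
  // The pulse covers the trip to the object and back, so halve it.
  return (pulseWidthUs * SPEED_OF_SOUND_CM_PER_US) / 2;
}

// e.g. a 1166 µs echo -> echoToDistanceCm(1166) ≈ 20 cm, close enough to
// tell the rover an obstacle is inside its stopping distance.
```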

What's next for WALL-E

A major future plan is integrating MappedIn, so that WALL-E can not only move through your neighbourhood but also guide you through your living room!
