Inspiration
The inspiration for WALL-E WALL-E came from a desire to bridge the gap between traditional embedded systems and modern generative AI. While most small-scale robots rely solely on hard-coded sensor logic, we wanted to create a machine with a "soul"—one that could not only navigate the physical world but also understand and converse with it using Large Language Models (LLMs) and computer vision.
What it does
WALL-E WALL-E is a multi-modal autonomous robot designed for interaction and exploration.
Intelligent Navigation: It uses a dual-layer avoidance system combining ultrasonic sensors for proximity and YOLO-based computer vision for object recognition.
Precision Pathing: It utilizes color sensors to detect and follow specific tracks or boundaries.
Natural Interaction: Through Gemini and ElevenLabs APIs, the robot provides real-time voice responses to user queries.
Gesture Control: Users can pilot the robot using hand gestures, moving beyond traditional remote controls.
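The dual-layer avoidance described above can be sketched as a simple decision function. Everything here is illustrative: the thresholds, the function name, and the stand-in sensor values are assumptions, not the robot's actual parameters; real code would poll the ultrasonic sensor and run the YOLO model each cycle.

```python
STOP_DISTANCE_CM = 20        # hypothetical hard-stop threshold for the proximity layer
YOLO_CONFIDENCE_MIN = 0.5    # hypothetical cutoff for trusting a detection

def decide_motion(distance_cm, detections):
    """Return a motion command from the two avoidance layers.

    distance_cm: latest ultrasonic reading (cm).
    detections:  list of (label, confidence) pairs from the vision model.
    """
    # Layer 1: proximity. Deterministic and always wins.
    if distance_cm < STOP_DISTANCE_CM:
        return "stop"
    # Layer 2: vision. Steer away from confidently recognised obstacles.
    if any(conf >= YOLO_CONFIDENCE_MIN for _, conf in detections):
        return "turn"
    return "forward"

print(decide_motion(15, []))                 # proximity layer triggers a stop
print(decide_motion(80, [("chair", 0.9)]))   # vision layer triggers a turn
print(decide_motion(80, [("chair", 0.2)]))   # low confidence is ignored
```

The key design point is the ordering: the cheap, deterministic proximity check runs before the probabilistic vision check, so a close obstacle always stops the robot even if the vision model sees nothing.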
How we built it
The project was built using a hybrid architecture:
Hardware: Integrated ultrasonic and color sensors with a microcontroller for low-latency movement.
Vision: Implemented a YOLO (You Only Look Once) model for real-time visual obstacle avoidance and hand gesture recognition.
Intelligence: Connected the Gemini API for natural language processing and ElevenLabs for high-fidelity text-to-speech output.
Software: Utilized Python for the core logic, drawing on previous experience with asynchronous programming to handle simultaneous sensor data and API calls.
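The asynchronous pattern mentioned above can be sketched with `asyncio`: a fast sensor loop keeps polling while a slow cloud request is in flight. The coroutines `read_ultrasonic` and `ask_llm` are hypothetical stand-ins for the real hardware and Gemini calls; the sleep durations only simulate I/O latency.

```python
import asyncio

async def read_ultrasonic():
    await asyncio.sleep(0.01)   # simulated sensor I/O latency
    return 42.0                 # stand-in distance reading in cm

async def ask_llm(prompt):
    await asyncio.sleep(0.05)   # simulated network round trip
    return f"response to {prompt!r}"

async def sensor_loop(readings, cycles=5):
    # The high-frequency loop: keeps collecting readings regardless of
    # whatever slow API work is happening elsewhere.
    for _ in range(cycles):
        readings.append(await read_ultrasonic())

async def main():
    readings = []
    # gather() runs both tasks concurrently on one event loop, so the
    # sensor loop is never blocked by the cloud request.
    _, reply = await asyncio.gather(sensor_loop(readings), ask_llm("hello"))
    return readings, reply

readings, reply = asyncio.run(main())
print(len(readings), reply)
```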
Challenges we ran into
One of the primary hurdles was managing the latency between the physical sensors and the cloud-based APIs. Ensuring that WALL-E WALL-E could stop for an obstacle via ultrasonic sensors while simultaneously processing a voice request through Gemini required careful threading and priority management. Additionally, optimizing the YOLO model to run efficiently on a portable setup without significant lag involved a steep learning curve.
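One common way to get this kind of priority behaviour is a dedicated watcher thread that trips a shared flag the instant an obstacle appears, independent of any blocking cloud call. The sketch below is a minimal illustration of that idea, not the project's actual code; the sensor readings and the "cloud" call are simulated.

```python
import threading
import time

stop_event = threading.Event()   # shared flag: set means "halt motors now"

def obstacle_watcher(distances):
    # High-priority loop: runs in its own thread so it is never delayed
    # by slow network work. Trips the flag at the first close reading.
    for d in distances:
        if d < 20:
            stop_event.set()
            return
        time.sleep(0.01)

def voice_request():
    # Low-priority work: a slow, blocking stand-in for a cloud API call.
    time.sleep(0.1)
    return "spoken reply"

watcher = threading.Thread(target=obstacle_watcher, args=([90, 60, 10],))
watcher.start()
reply = voice_request()          # blocks, but the watcher keeps monitoring
watcher.join()

print(stop_event.is_set(), reply)
```

Because the motor-control layer only needs to check `stop_event.is_set()`, the robot can halt within one sensor cycle even while a multi-second API request is still pending.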
Accomplishments that we're proud of
We are particularly proud of the seamless integration of gesture control. Being able to wave a hand and have the robot respond instantly feels like magic. We also succeeded in creating a distinctive "personality" for WALL-E WALL-E by fine-tuning the ElevenLabs voice and Gemini prompts to make the interaction feel more human and less robotic.
What we learned
This project taught us the importance of sensor fusion. We learned how to combine deterministic data (ultrasonic) with probabilistic data (YOLO vision) to create a more robust navigation system. We also deepened our understanding of integrating LLMs into hardware, specifically regarding prompt engineering for real-time edge devices.
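The deterministic-plus-probabilistic fusion described above can be illustrated as a single risk score: a close ultrasonic reading vetoes motion outright, while a vision detection only raises the score in proportion to its confidence. The thresholds and weights below are made up for illustration, not the values used on the robot.

```python
def obstacle_risk(distance_cm, vision_confidence):
    """Blend the two signals into one risk score in [0, 1]."""
    if distance_cm < 20:
        return 1.0                       # hard physical evidence: certain obstacle
    # Proximity contribution fades linearly out to an assumed 200 cm range.
    proximity = max(0.0, 1.0 - distance_cm / 200.0)
    # Weight the uncertain vision signal less than measured proximity.
    return min(1.0, 0.6 * proximity + 0.4 * vision_confidence)

print(obstacle_risk(10, 0.0))    # ultrasonic veto dominates
print(obstacle_risk(100, 0.9))   # vision raises risk at a safe distance
```

Treating the ultrasonic reading as a veto rather than just another weighted term is what makes the system robust: a false negative from the vision model can never override hard proximity data.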
What's next for WALL-E WALL-E
The next step is to improve the robot's spatial memory, allowing it to map its environment rather than just reacting to it. We also plan to move more of the AI processing locally to reduce reliance on an internet connection, making WALL-E WALL-E truly autonomous in any environment.