GitHub: https://github.com/arxvpatel/UTRA-Hacks
WEB DEMO: TRY DRIVING OUR CAR [WASD to move, B to boost, Shift to brake]
https://yellow-brownies.vercel.app/
BROKEN AXLES COUNT WOOHOO: 37
We went through more axles than we want to admit, but somehow the robot survived. So did we.
What it does
Banana Brownies is an autonomous rover built for a closed-hardware course. Its job is simple on paper: follow colors, move through zones, reach the black center target, and interact with objects using a claw.
In reality, nothing is simple. Instead of relying on precise timing or distances that break the moment a motor drifts, the robot makes decisions based on what it sees and recovers when things go wrong. If a reading is sketchy, it backs up, wiggles a bit, and tries again.
What Makes It Unique
We combine ElevenLabs Speech-to-Text and Google Gemini into a single voice-to-result pipeline. Most demos use either voice input or AI understanding — we use both. Say what you want, and the system transcribes, reasons, and highlights the matching robot parts in one smooth flow. It’s a pretty unique way to explore technical systems.
How We Use ElevenLabs
ElevenLabs Scribe V2 turns your voice into accurate text.
- Audio capture — The frontend records from your microphone (WebM/WAV).
- Streaming upload — Audio is sent to our Express backend.
- Transcription — The backend calls ElevenLabs Scribe V2 for high-accuracy transcription.
- Pipeline handoff — The transcribed text is passed straight to Gemini for part identification.
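The steps above can be sketched as a single function, using Node 18+ global `fetch`/`FormData` in place of the project's actual Express + axios code. The endpoint path is ElevenLabs' speech-to-text route; the `model_id` value and the response shape are assumptions, not the project's real configuration:

```typescript
// Hedged sketch of the transcription hop: browser audio in, text out.
async function transcribe(audio: Blob, apiKey: string): Promise<string> {
  const form = new FormData();
  form.append("file", audio, "clip.webm");
  form.append("model_id", "scribe_v1"); // assumed model id
  const res = await fetch("https://api.elevenlabs.io/v1/speech-to-text", {
    method: "POST",
    headers: { "xi-api-key": apiKey },
    body: form,
  });
  if (!res.ok) throw new Error(`transcription failed: HTTP ${res.status}`);
  return extractTranscript((await res.json()) as { text?: string });
}

// Pull the transcript out of the response; kept pure so it is easy to
// test, and so a malformed reading fails loudly instead of silently.
function extractTranscript(data: { text?: string }): string {
  if (typeof data.text !== "string") throw new Error("no transcript in response");
  return data.text.trim();
}
```

The returned string is what gets handed straight to the Gemini step.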
Immersive Audio Experience
Component-Specific Sound Effects
- Every robot part has its own signature sound generated via ElevenLabs
- Click a wheel → hear a tire rolling
- Click a sensor → hear an electronic beep/ping
- Click the Arduino → hear a microcontroller boot sound
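The click-to-sound mapping boils down to a lookup table; the sound ids below are placeholders for the ElevenLabs-generated clips, not the project's real asset names:

```typescript
// Map a clicked part type to its signature sound clip.
// Asset file names are illustrative stand-ins.
const PART_SOUNDS: Record<string, string> = {
  wheel: "tire-rolling.mp3",
  sensor: "electronic-ping.mp3",
  arduino: "boot-chime.mp3",
};

function soundForPart(part: string): string {
  // Fall back to a generic click for parts without a custom sound.
  return PART_SOUNDS[part] ?? "generic-click.mp3";
}
```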
Custom Engine Sound Recording
- Record your own voice or sounds (up to 10 seconds) using your microphone
- Use your custom sound as the engine sound
Dynamic Audio System
- Engine sounds adjust playback rate based on car speed (0.5x to 2.0x)
- Volume automatically scales with speed for realistic acceleration
- Drift sound plays when braking + turning simultaneously
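The speed-to-audio mapping above can be written as one pure function. The 0.5x–2.0x playback-rate range and the brake + turn drift trigger come from the list; the linear mapping and the volume floor are assumptions:

```typescript
interface EngineAudio {
  playbackRate: number;
  volume: number;
  drift: boolean;
}

// Map car speed (as a 0..1 fraction of top speed) plus control state
// to engine audio parameters.
function engineAudio(speed: number, braking: boolean, turning: boolean): EngineAudio {
  const s = Math.min(1, Math.max(0, speed)); // clamp to [0, 1]
  return {
    playbackRate: 0.5 + 1.5 * s,             // 0.5x at idle, 2.0x flat out
    volume: 0.2 + 0.8 * s,                   // quiet idle, full at top speed
    drift: braking && turning,               // drift sound on brake + turn
  };
}
```

Feeding `playbackRate` and `volume` into an `<audio>` element (or Web Audio node) each frame gives the rising-pitch acceleration effect.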
How We Use Gemini
Google Gemini 2.5 Flash is the reasoning layer that maps natural language to robot parts.
- Semantic matching — Queries like "Show me the part that controls speed" or "Where is the wireless communication module?" are analyzed against our robot parts database to find the most relevant components.
- Context-aware understanding — Gemini handles synonyms and context: "brain" → microcontroller, "wireless communication" → WiFi/Bluetooth, "movement" → motors and actuators.
- Structured output — A custom prompt produces validated JSON: `partIds`, `confidence`, and `reasoning`, so we can highlight parts and explain why they were chosen.
- Graceful fallback — Without API keys, the app switches to a keyword-based demo mode.
Gemini 2.5 Flash is fast (sub-second), cost-efficient, and works without training — perfect for hackathon-scale apps.
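A sketch of the validation and fallback described above. The field names (`partIds`, `confidence`, `reasoning`) match the write-up; the synonym table is illustrative, not the project's actual parts database:

```typescript
interface PartMatch {
  partIds: string[];
  confidence: number;
  reasoning: string;
}

// Validate Gemini's JSON before trusting it; anything malformed
// returns null so the caller can fall back to demo mode.
function parsePartMatch(raw: string): PartMatch | null {
  try {
    const data = JSON.parse(raw);
    if (
      Array.isArray(data.partIds) &&
      data.partIds.every((id: unknown) => typeof id === "string") &&
      typeof data.confidence === "number" &&
      typeof data.reasoning === "string"
    ) {
      return data as PartMatch;
    }
  } catch {
    /* fall through to null */
  }
  return null;
}

// Keyword demo mode used when no API key is configured.
const SYNONYMS: Record<string, string> = {
  brain: "microcontroller",
  wireless: "wifi-module",
  movement: "motors",
};

function keywordFallback(query: string): PartMatch {
  const partIds = Object.keys(SYNONYMS)
    .filter((k) => query.toLowerCase().includes(k))
    .map((k) => SYNONYMS[k]);
  return { partIds, confidence: 0.3, reasoning: "keyword demo mode" };
}
```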
The Combined Pipeline
Voice Input → ElevenLabs Transcription → Gemini Analysis → Part Highlighting
The result: hands-free exploration, semantic search instead of exact keywords, and explanations for why parts were matched.
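The whole chain can be sketched as one async pass. The stage functions here are injected stand-ins, not the project's actual code, so each hop (ElevenLabs, Gemini, the 3D viewer) stays swappable and testable:

```typescript
interface PipelineDeps {
  transcribe: (audio: Blob) => Promise<string>;   // ElevenLabs step
  analyze: (text: string) => Promise<string[]>;   // Gemini step, returns part ids
  highlight: (partIds: string[]) => void;         // glow the matched parts
}

// One voice query: audio in, highlighted part ids out.
async function runVoiceQuery(audio: Blob, deps: PipelineDeps): Promise<string[]> {
  const text = await deps.transcribe(audio);
  const partIds = await deps.analyze(text);
  deps.highlight(partIds);
  return partIds;
}
```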
Features
- Interactive 3D robot with clickable parts
- Voice commands via ElevenLabs Speech-to-Text
- Natural language queries via Google Gemini
- Text search and category filtering
- Part highlighting with glow effects
Challenges
We ran through 8 Gemini API keys because we forgot to rate limit and did not bother checking until everything stopped working.
Every time we reassembled the robot, something else broke. Motors, sensors: it was always something new.
We kept losing screws and nuts, which somehow vanished into thin air. At one point we had to wait two hours for new screws to arrive just so we could keep building.
Also, the axles. Always the axles.
Tech stack
Frontend
- React 19 with Vite and TypeScript
- Three.js using @react-three/fiber and @react-three/drei
- Zustand
- Tailwind CSS
- Axios
Backend
- Node.js with Express
- TypeScript
- Multer, dotenv, CORS
- Axios
Robot
- Arduino (.ino, C++)
- Custom C and C++ modules for navigation, sensing, and control
Final thoughts
This build was exhausting, chaotic, and held together by determination and late-night food orders. Things broke constantly. We fixed them. Then they broke again.
Would we do it again?
Absolutely. 100 percent.