GitHub: https://github.com/arxvpatel/UTRA-Hacks


WEB DEMO, TRY DRIVING OUR CAR [wasd to move, b to boost, shift to brake]

https://yellow-brownies.vercel.app/


BROKEN AXLES COUNT WOOHOO: 37

We went through more axles than we want to admit, but somehow the robot survived. So did we.


What it does

Banana Brownies is an autonomous rover built for a closed-hardware course. Its job is simple on paper: follow colors, move through zones, reach the black center target, and interact with objects using a claw.

In reality, nothing is simple. Instead of relying on precise timing or distances that break the moment a motor drifts, the robot makes decisions based on what it sees and recovers when things go wrong. If a reading is sketchy, it backs up, wiggles a bit, and tries again.


What Makes It Unique

We combine ElevenLabs Speech-to-Text and Google Gemini into a single voice-to-result pipeline. Most demos use either voice input or AI understanding — we use both. Say what you want, and the system transcribes, reasons, and highlights the matching robot parts in one smooth flow. It’s a pretty unique way to explore technical systems.


How We Use ElevenLabs

ElevenLabs Scribe V2 turns your voice into accurate text.

  1. Audio capture — The frontend records from your microphone (WebM/WAV).
  2. Streaming upload — Audio is sent to our Express backend.
  3. Transcription — The backend calls ElevenLabs Scribe V2 for high-accuracy transcription.
  4. Pipeline handoff — The transcribed text is passed straight to Gemini for part identification.
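A rough sketch of steps 2–4, assuming the public ElevenLabs speech-to-text endpoint; the `scribe_v1` model id, the `clip.webm` filename, and the helper names here are illustrative assumptions, not necessarily what our backend uses:

```typescript
// Sketch of the transcription handoff (not the exact backend code).
// Endpoint and field names follow the public ElevenLabs speech-to-text
// API; the model id below is an assumption -- check the current docs.
const STT_URL = "https://api.elevenlabs.io/v1/speech-to-text";

async function transcribeClip(audio: Blob, apiKey: string): Promise<string> {
  const form = new FormData();
  form.append("file", audio, "clip.webm"); // audio from the mic recorder
  form.append("model_id", "scribe_v1");    // assumed model id

  const res = await fetch(STT_URL, {
    method: "POST",
    headers: { "xi-api-key": apiKey },
    body: form,
  });
  if (!res.ok) throw new Error(`Transcription failed: ${res.status}`);

  return extractTranscript(await res.json());
}

// Pull the text field out of the STT response, tolerating missing data.
function extractTranscript(resp: { text?: string }): string {
  return (resp.text ?? "").trim();
}
```

The transcript string returned here is what gets handed straight to Gemini in step 4.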

Immersive Audio Experience

Component-Specific Sound Effects

  • Every robot part has its own signature sound generated via ElevenLabs
  • Click a wheel → hear a tire rolling
  • Click a sensor → hear an electronic beep/ping
  • Click the Arduino → hear a microcontroller boot sound
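A part-to-sound lookup like this can be a simple map with a default; the part ids and asset paths below are made up for illustration:

```typescript
// Hypothetical part-id -> sound mapping; ids and asset paths are
// illustrative, not the project's actual ones.
const PART_SOUNDS: Record<string, string> = {
  wheel: "/sounds/tire-roll.mp3",
  sensor: "/sounds/beep.mp3",
  arduino: "/sounds/boot.mp3",
};

// Resolve a clicked part to its signature sound, with a generic fallback.
function soundForPart(partId: string): string {
  return PART_SOUNDS[partId] ?? "/sounds/click.mp3";
}

// In the browser, playback is then just:
//   new Audio(soundForPart("wheel")).play();
```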

Custom Engine Sound Recording

  • Record your own voice or sounds (up to 10 seconds) using your microphone
  • Use your custom sound as the engine sound

Dynamic Audio System

  • Engine sounds adjust playback rate based on car speed (0.5x to 2.0x)
  • Volume automatically scales with speed for realistic acceleration
  • Drift sound plays when braking + turning simultaneously
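The speed-to-audio mapping above boils down to a linear scale; this sketch uses the 0.5x–2.0x range from the description, while the exact volume curve is an assumption:

```typescript
// Map normalized car speed (0..1) to engine playback rate and volume.
// The 0.5x..2.0x rate range comes from the description above; the
// volume curve is an illustrative assumption.
function engineAudioParams(speed: number): { rate: number; volume: number } {
  const s = Math.min(1, Math.max(0, speed)); // clamp to [0, 1]
  return {
    rate: 0.5 + 1.5 * s,   // 0.5x when idle, 2.0x at top speed
    volume: 0.3 + 0.7 * s, // quieter at idle, louder at speed
  };
}

// Applied each frame to an HTMLAudioElement, e.g.:
//   engine.playbackRate = engineAudioParams(speed).rate;
```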

How We Use Gemini

Google Gemini 2.5 Flash is the reasoning layer that maps natural language to robot parts.

  1. Semantic matching — Queries like "Show me the part that controls speed" or "Where is the wireless communication module?" are analyzed against our robot parts database to find the most relevant components.
  2. Context-aware understanding — Gemini handles synonyms and context: "brain" → microcontroller, "wireless communication" → WiFi/Bluetooth, "movement" → motors and actuators.
  3. Structured output — A custom prompt produces validated JSON: partIds, confidence, and reasoning so we can highlight parts and explain why they were chosen.
  4. Graceful fallback — Without API keys, the app switches to keyword-based demo mode.
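The fallback path (step 4) is the easiest piece to sketch; the synonym table below is an illustrative subset, not the project's real parts database, and the output mimics the JSON shape described in step 3:

```typescript
// Illustrative keyword fallback for when no Gemini API key is present.
// The synonym table is a made-up subset, not the real parts database.
const SYNONYMS: Record<string, string[]> = {
  microcontroller: ["brain", "arduino", "controller"],
  motor: ["movement", "wheel", "drive", "speed"],
  wifi: ["wireless", "bluetooth", "communication"],
};

// Return part ids whose name or synonyms appear in the query, matching
// the shape of the validated JSON: { partIds, confidence, reasoning }.
function keywordFallback(query: string): {
  partIds: string[];
  confidence: number;
  reasoning: string;
} {
  const q = query.toLowerCase();
  const partIds = Object.keys(SYNONYMS).filter((id) =>
    [id, ...SYNONYMS[id]].some((word) => q.includes(word))
  );
  return {
    partIds,
    confidence: partIds.length > 0 ? 0.4 : 0, // low confidence: demo mode
    reasoning: "keyword match (demo mode, no API key)",
  };
}
```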

Gemini 2.5 Flash is fast (sub-second), cost-efficient, and works without training — perfect for hackathon-scale apps.


The Combined Pipeline

Voice Input → ElevenLabs Transcription → Gemini Analysis → Part Highlighting

The result: hands-free exploration, semantic search instead of exact keywords, and explanations for why parts were matched.
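Wired together, the pipeline is just function composition. Here is a sketch with the three stages injected as parameters so the shape is clear; the type and function names are illustrative, not the actual code:

```typescript
// The three pipeline stages as injectable functions (illustrative types).
type Transcribe = (audio: Blob) => Promise<string>;
type Analyze = (text: string) => Promise<{ partIds: string[] }>;
type Highlight = (partIds: string[]) => void;

// Voice input -> transcription -> analysis -> highlighting, in one call.
async function voiceQuery(
  audio: Blob,
  transcribe: Transcribe,
  analyze: Analyze,
  highlight: Highlight
): Promise<string[]> {
  const text = await transcribe(audio);
  const { partIds } = await analyze(text);
  highlight(partIds);
  return partIds;
}
```

Injecting the stages also makes the pipeline easy to test with fakes, and lets the keyword fallback slot in as an alternate `Analyze` without touching the rest.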


Features

  • Interactive 3D robot with clickable parts
  • Voice commands via ElevenLabs Speech-to-Text
  • Natural language queries via Google Gemini
  • Text search and category filtering
  • Part highlighting with glow effects

Challenges

We ran through 8 Gemini API keys because we forgot to rate-limit our requests and did not bother checking until everything stopped working.

Every time we reassembled the robot, something else broke. Motors, sensors: it was always something new.

We kept losing screws and nuts, which somehow vanished into thin air. At one point we had to wait two hours for new screws to arrive just so we could keep building.

Also, the axles. Always the axles.


Tech stack

Frontend

  • React 19 with Vite and TypeScript
  • Three.js using @react-three/fiber and @react-three/drei
  • Zustand
  • Tailwind CSS
  • Axios

Backend

  • Node.js with Express
  • TypeScript
  • Multer, dotenv, CORS
  • Axios

Robot

  • Arduino (.ino, C++)
  • Custom C and C++ modules for navigation, sensing, and control

Final thoughts

This build was exhausting, chaotic, and held together by determination and late-night food orders. Things broke constantly. We fixed them. Then they broke again.

Would we do it again?

Absolutely. 100 percent.
