Inspiration
Recycling is a universal burden. Even with the best intentions, people are often confused by hyper-local municipal rules, which leads to 25% contamination in recycling streams. We wanted to build a "neighborly" assistant that removes the guesswork using real-time vision.
What it does
Recykle is a multimodal AI agent that "sees" through your camera and "hears" your questions. It identifies waste materials and cross-references them with local disposal rules to tell you exactly which bin to use. It supports real-time interruptions (barge-in) and provides structured visual feedback cards.
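The cross-referencing step can be pictured as a lookup keyed by city and detected material. This is a minimal sketch, not our production code: the `RULES` table, the city and material keys, and the `lookupRule` helper are hypothetical placeholders, not real municipal data.

```typescript
// Hypothetical rule table: city and material names are placeholders,
// not real municipal data.
type Bin = "recycling" | "compost" | "landfill" | "drop-off";

interface DisposalRule {
  bin: Bin;
  note?: string;
}

const RULES: Record<string, Record<string, DisposalRule>> = {
  "example-city-a": {
    "plastic-film": { bin: "landfill", note: "Not accepted curbside" },
    "glass-bottle": { bin: "recycling" },
  },
  "example-city-b": {
    "plastic-film": { bin: "drop-off", note: "Take to a store collection point" },
    "glass-bottle": { bin: "recycling" },
  },
};

// Returns the rule for a detected material, or null when the city or
// material is unknown, so the caller can decline to answer instead of guessing.
function lookupRule(city: string, material: string): DisposalRule | null {
  return RULES[city]?.[material] ?? null;
}
```

Returning `null` for unknown cities or materials keeps the agent from inventing a bin when no rule is on file.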
How we built it
The app is built with a Next.js frontend and a Node.js proxy hosted on Google Cloud Run. We used the Gemini 2.5 Flash Live API (v1beta1) over WebSockets to handle low-latency multimodal streams (16 kHz PCM audio and JPEG frames).
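Browsers typically deliver microphone audio as 32-bit floats, so the samples need converting to 16-bit PCM before streaming. The sketch below shows one common way to do that. The function name `floatTo16BitPCM` is ours for illustration, and it assumes the input has already been resampled to 16 kHz.

```typescript
// Convert Float32 samples in [-1, 1] to 16-bit little-endian PCM bytes.
// Assumes the samples are already at the target sample rate (16 kHz here).
function floatTo16BitPCM(samples: Float32Array): Uint8Array {
  const buffer = new ArrayBuffer(samples.length * 2);
  const view = new DataView(buffer);
  for (let i = 0; i < samples.length; i++) {
    // Clamp out-of-range values so they do not wrap around.
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Scale negatives by 0x8000 and positives by 0x7fff to cover the int16 range.
    const scaled = Math.round(s < 0 ? s * 0x8000 : s * 0x7fff);
    view.setInt16(i * 2, scaled, true); // true = little-endian
  }
  return new Uint8Array(buffer);
}
```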
Challenges we ran into
Synchronizing video frames with audio chunks to ensure the AI "sees" what the user is talking about was a significant hurdle. We also focused heavily on managing WebSocket lifecycles on Cloud Run to ensure stable, long-running sessions.
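One way to approach the frame and audio alignment is to timestamp each captured frame and, when an audio chunk is about to be sent, pick the most recent frame captured at or before that moment. The sketch below illustrates the idea only. `FrameBuffer`, `TimedFrame`, and the default limits are hypothetical names and values, and it assumes frames are pushed in capture order.

```typescript
interface TimedFrame {
  capturedAtMs: number;
  jpegBase64: string;
}

// Keeps a small rolling window of recent frames (hypothetical helper).
class FrameBuffer {
  private frames: TimedFrame[] = [];
  private readonly maxFrames: number;

  constructor(maxFrames = 8) {
    this.maxFrames = maxFrames;
  }

  // Frames are assumed to arrive in capture order.
  push(frame: TimedFrame): void {
    this.frames.push(frame);
    if (this.frames.length > this.maxFrames) this.frames.shift();
  }

  // Latest frame captured at or before audioStartMs, or null if there is
  // none or the best candidate is older than maxAgeMs.
  frameFor(audioStartMs: number, maxAgeMs = 1000): TimedFrame | null {
    for (let i = this.frames.length - 1; i >= 0; i--) {
      const f = this.frames[i];
      if (f.capturedAtMs <= audioStartMs) {
        return audioStartMs - f.capturedAtMs <= maxAgeMs ? f : null;
      }
    }
    return null;
  }
}
```

Rejecting stale frames avoids pairing a question with an image of something the user has already moved out of view.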
Accomplishments that we're proud of
We implemented a "barge-in" feature that lets the user interrupt the agent mid-sentence, which makes the conversation flow naturally. Our grounding system keeps the agent's answers consistent with city-specific rules.
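On the client, barge-in largely comes down to discarding audio that is queued but not yet played once an interruption is detected. The sketch below shows one common pattern, not necessarily how Recykle does it. `PlaybackQueue` and its generation counter are hypothetical, and how the interruption is detected is left to the caller.

```typescript
// Hypothetical playback queue: audio chunks are tagged with the generation
// (turn) they belong to, so late chunks from an interrupted turn are dropped.
class PlaybackQueue {
  private chunks: Uint8Array[] = [];
  private generation = 0;

  get currentGeneration(): number {
    return this.generation;
  }

  get pending(): number {
    return this.chunks.length;
  }

  // Returns false when the chunk belongs to a turn that was interrupted.
  enqueue(chunk: Uint8Array, generation: number): boolean {
    if (generation !== this.generation) return false;
    this.chunks.push(chunk);
    return true;
  }

  // Next chunk to play, or undefined when the queue is empty.
  next(): Uint8Array | undefined {
    return this.chunks.shift();
  }

  // Called on barge-in: drop pending audio and invalidate in-flight chunks.
  interrupt(): number {
    this.chunks = [];
    this.generation += 1;
    return this.generation;
  }
}
```

Tagging chunks with a generation matters because audio for the interrupted turn may still arrive over the socket after the user starts speaking.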
What we learned
We learned the power of the Gemini Live API's native multimodal capabilities and how to orchestrate stateful connections on serverless infrastructure.
Built With
- css
- gemini-live-api
- google-cloud-run
- next.js
- node.js
- tailwind
- typescript
- websockets