Inspiration

Recycling is a universal burden. Even with the best intentions, people are often confused by hyper-local municipal rules, which contributes to contamination rates of around 25% in recycling streams. We wanted to build a "neighborly" assistant that removes the guesswork using real-time vision.

What it does

Recykle is a multimodal AI agent that "sees" through your camera and "hears" your questions. It identifies waste materials and cross-references them with local disposal rules to tell you exactly which bin to use. It supports real-time interruptions (barge-in) and provides structured visual feedback cards.
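The cross-referencing step can be pictured as a lookup from a detected item to a local rule, with the result packaged as a feedback card. The sketch below is only an illustration: the `Bin` and `FeedbackCard` types, the `RULES` table, the `lookupDisposal` function, and the "example-city" entries are all hypothetical and are not taken from the Recykle codebase.

```typescript
// Hypothetical data model; none of these names come from the actual project.
type Bin = "recycling" | "compost" | "landfill" | "special-dropoff";

interface FeedbackCard {
  item: string;
  bin: Bin;
  note: string;
  source: "local-rule" | "fallback";
}

// Illustrative entries only. Real municipal rules would come from a maintained data source.
const RULES: Record<string, Record<string, { bin: Bin; note: string }>> = {
  "example-city": {
    "pizza box": { bin: "compost", note: "Greasy cardboard is not accepted in recycling here." },
    "glass bottle": { bin: "recycling", note: "Rinse and remove the cap." },
  },
};

// Normalize the detected label, then look it up in the city's rule table.
// When no rule matches, return an explicit fallback card instead of guessing.
function lookupDisposal(city: string, detectedItem: string): FeedbackCard {
  const item = detectedItem.trim().toLowerCase();
  const rule = RULES[city]?.[item];
  if (rule) {
    return { item, bin: rule.bin, note: rule.note, source: "local-rule" };
  }
  return {
    item,
    bin: "landfill",
    note: "No local rule found; check with your municipality.",
    source: "fallback",
  };
}
```

Marking the card's `source` lets the interface show whether an answer came from a local rule or from the fallback path.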

How we built it

The app pairs a Next.js frontend with a Node.js proxy hosted on Google Cloud Run. We used the Gemini 2.5 Flash Live API (v1beta1) over WebSockets to handle low-latency multimodal streams (16 kHz PCM audio and JPEG frames).
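Browser audio capture typically yields floating-point samples, so something has to convert them to 16-bit PCM before streaming. The following is a minimal sketch of that conversion, assuming the input is already at 16 kHz; the function names `floatTo16BitPcm` and `chunkDurationMs` are hypothetical, and the message envelope the Live API expects is deliberately not shown.

```typescript
// Convert float samples in [-1, 1] to signed 16-bit PCM.
// Out-of-range values are clamped so they do not wrap around.
function floatTo16BitPcm(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Negative values scale to -32768, positive values to 32767.
    out[i] = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
  }
  return out;
}

// Duration of a chunk in milliseconds, assuming mono audio at the given sample rate.
function chunkDurationMs(sampleCount: number, sampleRate = 16000): number {
  return (sampleCount / sampleRate) * 1000;
}
```

At 16 kHz, a 1,600-sample chunk covers 100 ms of audio, which gives a sense of how small chunks need to be for low-latency streaming.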

Challenges we ran into

Synchronizing video frames with audio chunks to ensure the AI "sees" what the user is talking about was a significant hurdle. We also focused heavily on managing WebSocket lifecycles on Cloud Run to ensure stable, long-running sessions.
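One way to think about the frame/audio alignment problem is timestamp matching: for each audio chunk, pick the most recent frame captured at or before the chunk started, and drop it if it is too old. This is a sketch under that assumption, not a description of how Recykle actually solves it; the `Frame` type, `pickFrameForAudio`, and the one-second `maxAgeMs` default are all hypothetical.

```typescript
// Hypothetical frame record: capture time plus the encoded JPEG bytes.
interface Frame {
  capturedAtMs: number;
  jpeg: Uint8Array;
}

// Return the newest frame captured at or before the audio chunk's start time,
// or undefined when no frame qualifies or the best candidate is too stale.
function pickFrameForAudio(
  frames: Frame[],
  audioStartMs: number,
  maxAgeMs = 1000,
): Frame | undefined {
  let best: Frame | undefined;
  for (const frame of frames) {
    const isBeforeAudio = frame.capturedAtMs <= audioStartMs;
    if (isBeforeAudio && (!best || frame.capturedAtMs > best.capturedAtMs)) {
      best = frame;
    }
  }
  if (best && audioStartMs - best.capturedAtMs <= maxAgeMs) {
    return best;
  }
  return undefined;
}
```

Rejecting stale frames avoids pairing a question with an image of something the user has already moved out of view.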

Accomplishments that we're proud of

We implemented a "barge-in" feature that lets the user interrupt the agent mid-sentence, creating a natural conversational flow. Our grounding system anchors the agent's answers to city-specific rules instead of relying on the model's general knowledge.
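On the client side, barge-in usually comes down to two things: dropping any queued agent audio the moment the user interrupts, and ignoring audio that arrives late from the interrupted turn. The sketch below illustrates that idea with a turn counter; the `PlaybackQueue` class and its methods are hypothetical and only model the queue, not the actual audio playback or the interruption signal from the API.

```typescript
// Hypothetical playback queue that supports interruption.
class PlaybackQueue {
  private chunks: Int16Array[] = [];
  private turn = 0;

  // Identifier of the turn currently allowed to enqueue audio.
  get currentTurn(): number {
    return this.turn;
  }

  // Number of chunks waiting to be played.
  get pending(): number {
    return this.chunks.length;
  }

  // Accept a chunk only if it belongs to the current turn.
  // Returns false for chunks from an interrupted (stale) turn.
  enqueue(turnId: number, chunk: Int16Array): boolean {
    if (turnId !== this.turn) {
      return false;
    }
    this.chunks.push(chunk);
    return true;
  }

  // Take the next chunk to play, if any.
  next(): Int16Array | undefined {
    return this.chunks.shift();
  }

  // Called when the user barges in: flush queued audio and start a new turn.
  interrupt(): number {
    this.chunks = [];
    this.turn += 1;
    return this.turn;
  }
}
```

Tagging chunks with a turn identifier means late-arriving audio from the interrupted response is dropped rather than played over the user.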

What we learned

We learned the power of the Gemini Live API's native multimodal capabilities and how to orchestrate stateful connections on serverless infrastructure.
