## Inspiration

People struggle to fix problems because instructions are static and disconnected from the real world. We wanted an AI that could see what you see and guide you instantly.

## What it does

OmniGuide AI is a real-time multimodal assistant that watches your environment through the camera, listens to your voice, and guides you with spoken instructions and visual overlays.

## How we built it

We used:

- Gemini Live API
- Google GenAI SDK
- Cloud Run
- WebRTC camera streaming
- Firestore memory

## Challenges

- Real-time video streaming
- Multimodal reasoning
- Overlay synchronization

## Accomplishments

We created an AI agent that feels like a human assistant beside you.

## What we learned

Multimodal agents require a streaming architecture and context awareness.

## What's next

- AR glasses integration
- Robotics support
- Industrial maintenance
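One way to picture the overlay-synchronization challenge: guidance from the model arrives with some latency, so each overlay has to be attached to the camera frame it actually refers to, and stale guidance must be dropped rather than drawn over the wrong scene. The sketch below is a minimal, hypothetical illustration of that idea (the `Frame`, `Overlay`, and `match_overlays` names and the `max_lag` threshold are our own, not part of the project's code): it pairs each timestamped overlay with the most recent frame captured at or before it.

```python
from dataclasses import dataclass
from bisect import bisect_right

@dataclass
class Frame:
    ts: float       # capture timestamp, in seconds
    frame_id: int

@dataclass
class Overlay:
    ts: float       # timestamp the model's guidance refers to
    label: str      # e.g. "turn the valve clockwise"

def match_overlays(frames, overlays, max_lag=0.25):
    """Attach each overlay to the latest frame captured at or before
    its timestamp; drop overlays lagging more than max_lag seconds."""
    frame_ts = [f.ts for f in frames]   # assumed sorted by capture time
    matched = []
    for ov in overlays:
        i = bisect_right(frame_ts, ov.ts) - 1
        if i >= 0 and ov.ts - frames[i].ts <= max_lag:
            matched.append((frames[i].frame_id, ov.label))
    return matched

frames = [Frame(0.0, 0), Frame(0.1, 1), Frame(0.2, 2)]
overlays = [Overlay(0.15, "turn the valve"), Overlay(0.9, "stale hint")]
print(match_overlays(frames, overlays))  # → [(1, 'turn the valve')]
```

The binary search keeps matching cheap even with a deep frame buffer, and the `max_lag` cutoff is what prevents late model output from being painted onto a scene that has already changed.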
## Built With

- Gemini Live API
- Google GenAI SDK
- Cloud Run
- WebRTC
- Firestore