Inspiration
Physical tasks like mechanical work, cooking, or DIY repairs demand hands-on focus. Pausing to check a manual or scroll through a video tutorial breaks your flow and causes frustration. We wanted to build a bridge between digital knowledge and physical action using the newest multimodal capabilities of AI.
What it does
Mentus is a "hands-free mentor." Using the Gemini 3 Live API, it watches your video stream in real time, identifies objects and actions, and provides immediate voice guidance. It's like having an expert standing right next to you.
How we built it
The core is built on Gemini 3 Flash for low-latency reasoning. We use WebSockets to stream video and audio from the client to a Node.js server, which keeps a persistent bidirectional channel open for real-time communication.
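To make the streaming step more concrete, here is a minimal sketch of how audio and video chunks could share a single WebSocket connection. The wire format, the `MediaFrame` type, and the `encodeFrame` / `decodeFrame` helpers are all hypothetical and written for illustration; they are not taken from the Mentus codebase, and the real project may frame its media differently.

```typescript
// Hypothetical wire format (an assumption, not Mentus's actual protocol):
//   byte 0      media kind (0 = audio, 1 = video)
//   bytes 1..8  capture timestamp in milliseconds, big-endian float64
//   bytes 9..   raw media payload (for example a JPEG frame or a PCM chunk)

type MediaKind = "audio" | "video";

interface MediaFrame {
  kind: MediaKind;
  timestampMs: number;
  payload: Uint8Array;
}

const HEADER_BYTES = 9;

// Pack a frame into one binary message suitable for a WebSocket send.
function encodeFrame(frame: MediaFrame): Uint8Array {
  const out = new Uint8Array(HEADER_BYTES + frame.payload.length);
  const view = new DataView(out.buffer);
  view.setUint8(0, frame.kind === "audio" ? 0 : 1);
  view.setFloat64(1, frame.timestampMs); // DataView writes big-endian by default
  out.set(frame.payload, HEADER_BYTES);
  return out;
}

// Unpack a binary message back into a frame, rejecting malformed input.
function decodeFrame(bytes: Uint8Array): MediaFrame {
  if (bytes.length < HEADER_BYTES) {
    throw new Error("frame too short");
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const kindByte = view.getUint8(0);
  if (kindByte !== 0 && kindByte !== 1) {
    throw new Error(`unknown media kind ${kindByte}`);
  }
  return {
    kind: kindByte === 0 ? "audio" : "video",
    timestampMs: view.getFloat64(1),
    payload: bytes.subarray(HEADER_BYTES), // a view into the input, not a copy
  };
}
```

Under this assumed format, the client would pass the result of `encodeFrame` to its socket's `send` method, and the Node.js server would call `decodeFrame` on each binary message before forwarding the payload to the model. Carrying a capture timestamp in the header is one way to let the server detect and drop stale frames, which matters when latency is the priority.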
What's next for Mentus
We are currently optimizing the latency pipeline and refining the "Mentor Persona" system instructions to handle complex, multi-step procedures.
Built With
- gemini-3-api
- google-cloud
- next.js
- node.js
- typescript
- websockets