Mentus: Real-time Multimodal Assistant

Łukasz Szymański posted an update — Dec 19, 2025 09:25 AM EST

Update: Mentus is alive! Multimodal mentoring established.

We just hit a major milestone in the development of Mentus! After an intense battle with real-time streaming quotas, we successfully pivoted to a robust, high-performance REST-based architecture that brings the AI Mentor to life.

What's new in this update:

Vision Integration: Mentus now captures visual data every 10 seconds to analyze the user's environment and posture.
Voice Interaction: Implemented local Speech-to-Text (STT), allowing users to ask questions hands-free while performing tasks.
Cognitive Brain: Powered by Gemini 1.5 Flash, the system provides contextual advice based on both what it sees and what it hears.
Premium UI: Completely redesigned the interface from scratch. We moved away from the "sci-fi" look to a clean, minimalist "Modern Tech" aesthetic (think Apple/Tesla), prioritizing focus and usability.
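
For readers curious how a REST-based loop like this might be wired up, here is a minimal TypeScript sketch. It is an illustration under assumptions, not the actual Mentus code: the names `MentorRequest`, `buildMentorRequest`, `isCaptureDue`, and `startCaptureLoop` are hypothetical, and the camera capture, the transcript source, and the network call are passed in as functions so that no particular browser or Gemini API is assumed. Only the 10-second interval comes from the update itself.

```typescript
// Hypothetical payload for one REST call: the latest camera frame plus
// whatever the local speech-to-text step produced since the last call.
interface MentorRequest {
  imageBase64: string;
  transcript: string | null;
  capturedAt: number; // milliseconds since epoch
}

// From the update: one frame every 10 seconds.
const CAPTURE_INTERVAL_MS = 10000;

// Pure helper: combine what the camera saw and what the user said.
// An empty or whitespace-only transcript is sent as null.
function buildMentorRequest(
  imageBase64: string,
  transcript: string | null,
  now: number
): MentorRequest {
  const trimmed = transcript === null ? "" : transcript.trim();
  return {
    imageBase64,
    transcript: trimmed.length > 0 ? trimmed : null,
    capturedAt: now,
  };
}

// Pure helper: is a new frame due, given when the last one was taken?
function isCaptureDue(
  lastCaptureAt: number | null,
  now: number,
  intervalMs: number = CAPTURE_INTERVAL_MS
): boolean {
  return lastCaptureAt === null || now - lastCaptureAt >= intervalMs;
}

// Polling loop. `capture`, `takeTranscript`, and `send` are injected
// (assumed) functions; `send` would wrap the actual HTTP request.
// Returns a function that stops the loop.
function startCaptureLoop(
  capture: () => Promise<string>,
  takeTranscript: () => string | null,
  send: (req: MentorRequest) => Promise<void>,
  onError: (err: unknown) => void = () => {},
  intervalMs: number = CAPTURE_INTERVAL_MS
): () => void {
  let inFlight = false;
  const tick = async (): Promise<void> => {
    if (inFlight) return; // skip this tick if the previous request is still pending
    inFlight = true;
    try {
      const image = await capture();
      await send(buildMentorRequest(image, takeTranscript(), Date.now()));
    } catch (err) {
      onError(err);
    } finally {
      inFlight = false;
    }
  };
  const timer = setInterval(() => { void tick(); }, intervalMs);
  return () => clearInterval(timer);
}
```

Skipping a tick while a request is still in flight keeps slow responses from piling up, which is one simple way to stay under a request quota with plain request/response calls.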

Mentus can now recognize gestures, correct camera angles, and respond to verbal inquiries—all while maintaining a smooth, stable connection. Next stop: refining the domain-specific knowledge (Cooking/DIY modes)!

#GoogleGeminiHackathon #BuildWithGemini #AI #NextJS #MultimodalAI
