Inspiration

Physical tasks like mechanics, cooking, or DIY repairs often require hands-on focus. Pausing to check a manual or scroll through a video tutorial disrupts flow and causes frustration. We wanted to build a bridge between digital knowledge and physical action using the newest multimodal capabilities of AI.

What it does

Mentus is a "hands-free mentor". By leveraging the Gemini 3 Live API, it watches your video stream in real-time, identifies objects and actions, and provides immediate voice guidance. It's like having an expert standing right next to you.

How we built it

The core is built on Gemini 3 Flash for ultra-low-latency reasoning. We use WebSockets to stream video and audio from the client to a Node.js server, ensuring real-time bidirectional communication.
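
To make the data flow concrete, here is a minimal sketch of the client-to-server leg, assuming the `ws` npm package; `forwardToGemini` is a hypothetical stub standing in for the actual model call, not our production code:

```ts
// Minimal WebSocket relay sketch (assumes the `ws` npm package).
// forwardToGemini() is a hypothetical stub for the actual model call.
import { WebSocketServer, WebSocket } from "ws";

const wss = new WebSocketServer({ port: 8080 });

wss.on("connection", (socket: WebSocket) => {
  socket.on("message", (data, isBinary) => {
    if (isBinary) {
      // Binary frames carry JPEG snapshots or audio chunks from the client.
      forwardToGemini(data as Buffer);
    } else {
      // Text frames carry control messages, e.g. {"type":"start"}.
      console.log("control:", data.toString());
    }
  });

  // Voice guidance from the model is pushed back down the same socket.
  socket.send(JSON.stringify({ type: "ready" }));
});

function forwardToGemini(chunk: Buffer): void {
  // Placeholder: in the real app this feeds the live multimodal session.
}
```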

What's next for Mentus

We are currently optimizing the latency pipeline and refining the "Mentor Persona" system instructions to handle complex, multi-step procedures.
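
For reference, the Gemini SDK lets us attach that persona as a per-session system instruction. A minimal sketch, assuming the `@google/generative-ai` package; the persona wording is illustrative, not our final prompt:

```ts
// Sketch: pinning the "Mentor Persona" via a system instruction.
// The persona text below is illustrative, not the final prompt.
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);

const mentor = genAI.getGenerativeModel({
  model: "gemini-1.5-flash", // model id as referenced in our updates
  systemInstruction:
    "You are a calm, hands-on mentor. Give one step at a time, wait for " +
    "confirmation that the step is done, and keep replies short enough " +
    "to work as spoken guidance.",
});
```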

Built With

  • gemini
  • next.js
  • node.js
  • vercel
  • websockets

Updates


Major Update: Mentus is live on Vercel

We have achieved a significant milestone in the development of Mentus. The application is now fully operational and hosted on Vercel, providing a stable environment for multimodal interaction.

Key improvements in this version:

  • Cinematic Interface: We have implemented a new, minimalist UI that prioritizes the camera feed. The interface uses a cinematic 95% viewport layout with floating glassmorphism elements, ensuring a premium and focused user experience.
  • Integrated Multimodal Engine: The system now successfully combines computer vision and local speech recognition. Mentus analyzes video snapshots and user transcripts simultaneously using Gemini 1.5 Flash to provide contextual guidance (a sketch of this pairing appears after this list).
  • Visual Feedback System: We added dedicated status indicators for Listening and Analyzing states. This provides users with clear feedback on the AI's current activity without cluttering the screen.
  • Infrastructure Stability: The backend has been migrated to a robust serverless architecture. This change resolved previous API quota issues and ensured reliable performance across different devices.
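
For the curious, here is roughly what the serverless route looks like: one snapshot plus the latest transcript in, one spoken-style instruction out. This is a simplified sketch assuming the `@google/generative-ai` SDK and a Next.js App Router handler; the route path, request fields, and prompt are illustrative, not our exact code:

```ts
// Sketch of a Next.js App Router route (e.g. app/api/mentor/route.ts) that
// pairs one snapshot with the latest transcript. Field names are assumptions.
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
const model = genAI.getGenerativeModel({ model: "gemini-1.5-flash" });

export async function POST(req: Request): Promise<Response> {
  // The client posts a base64 JPEG snapshot plus what the user just said.
  const { imageBase64, transcript } = await req.json();

  const result = await model.generateContent([
    { inlineData: { mimeType: "image/jpeg", data: imageBase64 } },
    {
      text:
        `The user asked: "${transcript}". ` +
        "Give one short, spoken-style instruction based on what you see.",
    },
  ]);

  // The reply is spoken aloud on the client via text-to-speech.
  return Response.json({ guidance: result.response.text() });
}
```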

Live Demo: You can test the current version of the hands-free mentor at the following address: https://mentus.vercel.app/

We are now focusing on refining the AI's specialized knowledge and improving response latency for the upcoming final phases of the hackathon.



Update: Mentus Goes Cinematic! (v3.1 Release)

We've just deployed the biggest visual and functional overhaul for Mentus. The goal? To make the AI Mentor feel less like a chat tool and more like a futuristic, hands-free companion.

What's New in v3.1:

  • Cinematic "Immersive" UI: We ditched the sidebar layout. The camera view now dominates 95% of the screen with a sleek, floating glassmorphism interface. It feels like FaceTime with a super-intelligent AI.
  • Voice-First Interaction: You can now speak naturally to Mentus. Our new hybrid engine captures your voice instantly in the browser and sends it alongside video frames to Gemini 1.5 Flash (see the sketch after this list).
  • Smart Indicators: New "Listening" and "Analyzing" badges give you instant feedback on what the AI is doing—so you never have to guess if it heard you.
  • Stable Architecture: We hardened the backend to handle multimodal streams reliably, ensuring the demo works flawlessly even during peak API usage.
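
Here is a rough sketch of that browser-side loop. It assumes the prefixed Web Speech API (`webkitSpeechRecognition`, so Chrome-family browsers) and a hypothetical `/api/mentor` endpoint; the real implementation differs in the details:

```ts
// Rough sketch of the voice-first loop. Assumes webkitSpeechRecognition
// (Web Speech API) and a hypothetical /api/mentor endpoint.
const recognition = new (window as any).webkitSpeechRecognition();
recognition.continuous = true;      // keep listening across utterances
recognition.interimResults = false; // only act on finalized transcripts

// Grab the current camera frame as base64 JPEG via an offscreen canvas.
function captureFrame(video: HTMLVideoElement): string {
  const canvas = document.createElement("canvas");
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  canvas.getContext("2d")!.drawImage(video, 0, 0);
  return canvas.toDataURL("image/jpeg", 0.7).split(",")[1]; // strip data: prefix
}

recognition.onresult = (event: any) => {
  // Take the latest finalized utterance and pair it with a fresh snapshot.
  const transcript = event.results[event.results.length - 1][0].transcript;
  const video = document.querySelector("video") as HTMLVideoElement;
  fetch("/api/mentor", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ transcript, imageBase64: captureFrame(video) }),
  });
};

recognition.start();
```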

Mentus is now fully capable of seeing your workspace, hearing your questions ("Is this wire connected right?"), and speaking back to guide you.

Next steps: Polishing the "personality" of the AI and preparing the final demo video!

#GoogleGeminiHackathon #BuildWithGemini #Multimodal #UIUX #NextJS



Update: Mentus is Alive! Multimodal Mentoring Established.

We just hit a major milestone in the development of Mentus! After an intense battle with real-time streaming quotas, we successfully pivoted to a robust, high-performance REST-based architecture that brings the AI Mentor to life.

What's new in this update:

  • Vision Integration: Mentus now captures visual data every 10 seconds to analyze the user's environment and posture (a sketch of this loop follows below).
  • Voice Interaction: Implemented local Speech-to-Text (STT), allowing users to ask questions hands-free while performing tasks.
  • Cognitive Brain: Powered by Gemini 1.5 Flash, the system provides contextual advice based on both what it sees and what it hears.
  • Premium UI: Completely redesigned the interface from scratch. We moved away from the "sci-fi" look to a clean, minimalist "Modern Tech" aesthetic (think Apple/Tesla), prioritizing focus and usability.
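
The vision loop itself is simple. A sketch of the 10-second cadence, where `/api/mentor` is a hypothetical stand-in for our REST backend:

```ts
// Sketch of the 10-second vision loop; the interval comes from this update,
// while the /api/mentor endpoint is a hypothetical stand-in.
const SNAPSHOT_INTERVAL_MS = 10_000;

async function sendSnapshot(video: HTMLVideoElement): Promise<void> {
  const canvas = document.createElement("canvas");
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  canvas.getContext("2d")!.drawImage(video, 0, 0);
  // One REST call per snapshot replaces the earlier streaming connection.
  await fetch("/api/mentor", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      imageBase64: canvas.toDataURL("image/jpeg").split(",")[1],
    }),
  });
}

setInterval(() => {
  const video = document.querySelector("video") as HTMLVideoElement | null;
  if (video && video.readyState >= 2) sendSnapshot(video); // frame available
}, SNAPSHOT_INTERVAL_MS);
```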

Mentus can now recognize gestures, correct camera angles, and respond to verbal inquiries—all while maintaining a smooth, stable connection. Next stop: refining the domain-specific knowledge (Cooking/DIY modes)!

#GoogleGeminiHackathon #BuildWithGemini #AI #NextJS #MultimodalAI
