VoiceMax - Voice that states your mental health
VoiceMax is a web application designed to help users gain insights into their emotional state by analyzing the nuances in their voice. Record your voice, and let our AI provide you with an understanding of your primary emotion, suggest other potential underlying feelings, and offer personalized feedback.
Inspiration
In a world where mental well-being is increasingly recognized as crucial, we were inspired by the idea of using technology to offer accessible tools for self-reflection. The human voice is rich with emotional cues, often subconscious. VoiceMax aims to tap into this, providing users with a simple yet powerful way to connect with their feelings and understand themselves better. We believe that a moment of pause and reflection, guided by insightful feedback, can be a small step towards greater emotional awareness and well-being.
What it does
VoiceMax allows users to:
- Record Audio: Easily record their voice directly in the browser using their microphone.
- Analyze Emotion: Submit the recording for AI-powered analysis, which identifies the primary emotion expressed.
- Get Additional Insights: Receive suggestions for other emotions that might be subtly present in their voice.
- Receive Personalized Feedback: Obtain tailored feedback based on the detected primary emotion, including supportive messages and, for certain emotions, suggestions for coping mechanisms like breathing exercises.
- Review and Reset: Play back their recording and easily start a new analysis.
How we built it
VoiceMax is a modern web application built with a focus on user experience and cutting-edge AI:
- Frontend:
- Next.js: For a robust React framework with server-side rendering and static site generation capabilities.
- React: For building a dynamic and interactive user interface.
- TypeScript: For enhanced code quality and type safety.
- ShadCN UI & Tailwind CSS: For a sleek, responsive, and modern component library and styling, ensuring the app looks great on all devices.
- Backend & AI:
- Genkit (by Firebase): Used as the framework for defining and managing AI flows.
- Google AI (Gemini models): Powers the core AI functionalities:
- Emotion detection from audio.
- Suggestion of related emotions.
- Generation of personalized feedback.
- Utilizing tool-calling for features like suggesting breathing exercises.
- Platform:
- Designed to be easily deployable on platforms like Firebase App Hosting.
- Key Browser APIs:
-
navigator.mediaDevices.getUserMediaandMediaRecorder API: For in-browser audio recording.
-
The AI analysis is a multi-step process:
- The recorded audio (as a data URI) is sent to a Genkit flow (
analyzeAudioEmotion) that uses a Gemini model to determine the primary emotion. - This primary emotion, along with contextual information, is then passed to another flow (
suggestAdditionalEmotions) to get nuanced suggestions for other potential emotions. - Finally, the primary emotion is used by the
providePersonalizedFeedbackflow, which may also utilize a Genkit tool (e.g., for breathing exercise suggestions), to generate supportive feedback.
Challenges we ran into
Building VoiceMax involved several interesting challenges:
- Microphone Permissions & Recording: Ensuring smooth and reliable microphone access across different browsers and handling potential errors gracefully was a key focus. Implementing the
MediaRecorderAPI and converting audio to a data URI required careful handling. - Asynchronous AI Operations: Integrating multiple sequential AI calls (emotion detection, then suggestions, then feedback) while keeping the UI responsive and informative (e.g., showing loading states) required careful state management.
- Prompt Engineering: Crafting effective prompts for the Gemini models to accurately detect emotions, provide relevant suggestions, and generate truly personalized and helpful feedback took iterative refinement. This included structuring the output with Zod schemas for reliable parsing.
- Hydration Errors: We encountered and resolved a common Next.js hydration error related to browser extensions modifying the initial HTML, which was addressed using
suppressHydrationWarning. - State Management for Multi-Step UI: Managing the application's state through the recording, analysis, and results display phases, including error states and reset functionality, needed a clear and robust approach.
Accomplishments that we're proud of
- Seamless AI Integration: Successfully integrating a multi-step AI analysis pipeline using Genkit and Gemini models to provide a comprehensive emotional insight experience.
- User-Friendly Recording Experience: Creating an intuitive in-browser audio recording feature with clear user feedback and error handling.
- Personalized & Actionable Feedback: Going beyond simple emotion detection to offer personalized feedback and relevant suggestions (like breathing exercises) that users can find genuinely helpful.
- Responsive Design: Building a clean and modern UI with ShadCN and Tailwind CSS that works well across various devices.
- Robust Error Handling: Implementing mechanisms to catch and inform users about issues, whether they're related to microphone access or AI analysis.
What we learned
This project was a great learning experience, particularly in:
- Working with Genkit: Gaining hands-on experience with Genkit for defining, managing, and deploying AI flows, including prompt engineering, schema definition with Zod, and tool usage.
- Leveraging Modern AI Models: Understanding the capabilities and nuances of using large language models like Gemini for tasks beyond text generation, such as audio analysis and structured data output.
- Browser Media APIs: Deepening our understanding of the
MediaRecorderAPI and related browser functionalities for handling user media. - State Management in React/Next.js: Refining our approach to managing complex asynchronous operations and UI states in a client-side rendered Next.js application.
- Importance of Iterative Development: The AI prompts and UI flow benefited significantly from iterative testing and refinement based on observed results.
What's next for VoiceMax
VoiceMax has a lot of potential for growth. Some future ideas include:
- Emotion Trend Tracking: Allowing users to track their emotional states over time to identify patterns or progress.
- Journaling Integration: Enabling users to add text-based journal entries alongside their voice recordings for richer context.
- Advanced Analysis: Exploring more detailed vocal feature analysis (e.g., tone, pitch, speaking rate) for deeper insights.
- Integration with Wearables/Health Platforms: Connecting with other health and wellness data sources.
- More Diverse Coping Mechanisms: Expanding the range of suggested exercises or resources based on detected emotions.
- User Accounts & Data Persistence: Allowing users to save their analysis history securely.
- Sentiment Intensity: Not just detecting the emotion, but also its intensity (e.g., slightly sad vs. very sad).
Built With
- nextjs
Log in or sign up for Devpost to join the conversation.