🎤 VocalAIAgent PRO: A Multimodal Conversational Vocal Coach
https://vocal-ai.site/
https://prezi.com/p/uulwqjra-tgm/enhancing-vocal-skills-with-voiceai/?present=1
Author: Vocal Coach AI PRO Team
Overview
This project is a comprehensive AI-powered vocal coaching system built for the Bolt.new Hackathon. The application provides a full-stack solution for personalized vocal training, combining real-time voice analysis, intelligent coaching, and conversational AI agents to create a holistic vocal development experience.
VocalAIAgent solves the problem of fragmented vocal training by combining multiple functionalities into one intuitive and multimodal assistant. This AI-powered system simplifies vocal coaching by:
- Understanding natural voice patterns and interpreting audio recordings to gather vocal characteristics
- Providing personalized recommendations based on vocal analysis and user preferences
- Fetching real-time data for vocal metrics, progress tracking, and performance insights
- Generating dynamic lesson plans tailored to the user's vocal type, skill level, and practice goals
- Offering intelligent coaching conversations grounded in real-time voice analysis data
VocalAIAgent is not just an LLM chatbot. The agent is enhanced with numerous AI capabilities, including:
- Voice Understanding: Real-time pitch detection and vocal analysis with metrics such as jitter, shimmer, and vibrato rate
- Retrieval-Augmented Generation (RAG): Providing personalized coaching tips by retrieving relevant vocal techniques from a knowledge base
- Few-Shot Prompting: Generating dynamic lesson plans and exercises based on minimal user input
- Function Calling: Executing specific functions based on user commands, such as starting voice sessions, analyzing recordings, or generating progress reports
- Long Context Window: Managing and retaining user vocal profiles and practice history across multiple sessions
- Context Caching: Storing relevant vocal data temporarily to improve response speed and reduce redundant analysis
- AI Evaluation: Using LLM-based evaluation to assess vocal progress and provide "Vocal Scores" based on improvement and consistency
- Grounding: Ensuring that coaching recommendations are grounded in real-time vocal analysis data
- Embeddings: Utilizing embeddings for effective vocal pattern matching and personalized exercise recommendations
- Multimodal Integration: Understanding both voice inputs and conversational text for comprehensive coaching
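The retrieval step behind the RAG and Embeddings items can be pictured as a cosine-similarity lookup over stored technique descriptions. The sketch below is only an illustration under stated assumptions: the three-dimensional vectors, the `KNOWLEDGE_BASE` entries, and the `retrieve_tips` helper are hypothetical, and a real system would obtain its vectors from an embedding model.

```python
import math

# Hypothetical toy knowledge base of (tip, embedding) pairs. Real embeddings
# would come from an embedding model and have many more dimensions.
KNOWLEDGE_BASE = [
    ("Practice lip trills to steady airflow.", [0.9, 0.1, 0.0]),
    ("Use sirens to smooth register transitions.", [0.1, 0.9, 0.1]),
    ("Sustain long tones with a tuner to fix pitch drift.", [0.0, 0.2, 0.9]),
]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_tips(query_embedding, top_k=2):
    """Return the top_k tips whose embeddings are most similar to the query."""
    scored = [
        (cosine_similarity(query_embedding, embedding), tip)
        for tip, embedding in KNOWLEDGE_BASE
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tip for _, tip in scored[:top_k]]
```

The retrieved tips would then be placed in the coaching prompt so that the model's advice is tied to stored techniques rather than generated from scratch.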
Problem Statement
Vocal training can be an isolated and inconsistent process. Singers and speakers often struggle with:
- Lack of real-time feedback during practice sessions
- Limited access to personalized coaching based on their specific vocal characteristics
- Difficulty tracking progress and identifying improvement areas
- Fragmented resources across multiple platforms and tools
- Inconsistent practice routines without proper guidance
VocalAIAgent addresses these challenges by providing a unified, intelligent coaching platform that combines voice analysis, personalized AI coaching, and comprehensive progress tracking in one seamless experience.
🚀 Key Features
Core Vocal Analysis
🎵 Real-Time Pitch Detection: Instant feedback during practice sessions with live pitch visualization
📊 Deep Vocal Analysis: Advanced metrics including jitter, shimmer, vibrato rate, and vocal range analysis
🎯 Voice Type Classification: Automatic classification of voice types (soprano, alto, tenor, bass)
📈 Progress Tracking: Comprehensive tracking of vocal improvements over time
AI-Powered Coaching System
🤖 Dual-AI Architecture: Proactive Fetch.ai Agent + Reactive Letta Conversational Agent
💬 Stateful Conversations: AI coach that remembers context and discusses specific progress
📋 Personalized Lesson Plans: Dynamic lesson generation based on vocal analysis and user goals
🎓 Exercise Recommendations: Tailored vocal exercises based on analysis results
Advanced Features
🗣️ VAPI Voice Integration: Real-time voice conversations with AI coach
📱 Multimodal Interface: Support for voice input, text chat, and visual feedback
🔄 Lesson Feedback Loop: Comprehensive storage and analysis of lesson completion data
📊 AI-Generated Reports: Daily summaries of performance trends and insights
🎪 Community Features: Progress sharing and vocal challenges
Data & Memory Management
💾 Persistent Memory: User preferences, vocal characteristics, and practice history retention
📤 Export Capabilities: Save vocal analyses, lesson plans, and progress reports
🔐 Secure Data Storage: Supabase integration with proper authentication and RLS
📋 Session Management: Comprehensive tracking of practice sessions and improvements
Why This Matters
Vocal training today lacks the personalized, data-driven approach that modern AI can provide. VocalAIAgent brings together voice science, conversational AI, and personalized coaching into one intelligent system, offering a more effective, engaging, and accessible vocal training experience. By combining real-time voice analysis, stateful AI conversations, and comprehensive progress tracking, this tool showcases the potential of Generative AI in revolutionizing music education and vocal development.
🛠 Tech Stack
Frontend
- React: Modern component-based UI framework
- TypeScript: Type-safe development with enhanced IDE support
- Vite: Fast build tool and development server
- Tailwind CSS: Utility-first CSS framework for responsive design
- Framer Motion: Smooth animations and transitions
Backend
- FastAPI: High-performance Python web framework with automatic API docs
- Python: Core backend language with extensive AI/ML libraries
- Uvicorn: ASGI server for production deployment
AI Services
- Letta: Stateful conversational AI with long-term memory capabilities
- Fetch.ai: Autonomous agent system for proactive analysis and reporting
- VAPI: Voice AI platform for real-time voice conversations
Database & Authentication
- Supabase: PostgreSQL database with built-in authentication and real-time features
- Row Level Security (RLS): Secure user data isolation
Voice Processing
- Web Audio API: Real-time audio processing and pitch detection
- Custom Voice Analyzer: Advanced vocal metrics calculation
Hosting & Deployment
- Netlify: Frontend hosting with automatic deployments
- Google Cloud Run: Scalable backend container hosting
- Docker: Containerized backend for consistent deployments
How It Works
Intent Recognition and Routing System
VocalAIAgent uses a sophisticated routing system that directs user requests to appropriate handlers based on vocal coaching context:
def interpret_vocal_request(user_input, session_context):
    # Build the routing prompt from the user's message and the session state
    prompt = (
        "You are a vocal coaching function router. Based on the user message and session context, "
        "output ONLY a Python dictionary.\n"
        "- If asking for voice analysis: {'intent': 'analyze_voice', 'session_type': '<TYPE>'}\n"
        "- If requesting lesson: {'intent': 'start_lesson', 'category': '<CATEGORY>', 'level': '<LEVEL>'}\n"
        "- If seeking feedback: {'intent': 'get_feedback', 'aspect': '<VOCAL_ASPECT>'}\n"
        f"Session Context: {session_context}\n"
        f"User: {user_input}"
    )
    # Route to appropriate vocal coaching handler
    # (the model call that turns this prompt into a dictionary is omitted in this excerpt)
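Because the router prompt asks the model to reply with a Python dictionary, the reply has to be parsed before it can be routed. A minimal sketch of that step, assuming the reply arrives as a plain string, is shown below. `parse_router_output`, `ALLOWED_INTENTS`, and the `unknown` fallback are hypothetical names, not part of the project's code. `ast.literal_eval` is used instead of `eval` because it only accepts literals and cannot execute code.

```python
import ast

# Intents named in the router prompt above.
ALLOWED_INTENTS = {"analyze_voice", "start_lesson", "get_feedback"}


def parse_router_output(raw_text):
    """Parse the model's reply into a dict, or return a fallback intent."""
    fallback = {"intent": "unknown"}
    try:
        # literal_eval only evaluates Python literals, never arbitrary code.
        parsed = ast.literal_eval(raw_text.strip())
    except (ValueError, SyntaxError, TypeError):
        return fallback
    if not isinstance(parsed, dict):
        return fallback
    intent = parsed.get("intent")
    if not isinstance(intent, str) or intent not in ALLOWED_INTENTS:
        return fallback
    return parsed
```

Falling back to a neutral intent keeps a malformed or off-script reply from crashing the coaching loop.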
Vocal Analysis Pipeline
The voice analysis pipeline combines signal processing and AI techniques in four stages:
- Real-time Processing: Web Audio API captures and processes audio in real-time
- Feature Extraction: Advanced algorithms extract vocal characteristics (pitch, formants, etc.)
- AI Classification: Machine learning models classify voice type and detect patterns
- Contextual Analysis: Results are interpreted within the user's vocal development context
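As a rough illustration of the feature-extraction stage, the sketch below estimates pitch with a plain autocorrelation peak search and computes local jitter and shimmer as the mean absolute cycle-to-cycle difference relative to the mean. It is written in Python for consistency with the backend snippets, whereas the project does its real-time processing in the browser with the Web Audio API. The function names, the 80 to 1000 Hz search range, and the input formats are assumptions, not the project's implementation.

```python
def estimate_pitch_hz(samples, sample_rate, fmin=80.0, fmax=1000.0):
    """Estimate the fundamental frequency from the autocorrelation peak."""
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(len(samples) - 1, int(sample_rate / fmin))
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    # None means no positive correlation peak was found (for example, silence).
    return sample_rate / best_lag if best_lag else None


def local_perturbation(values):
    """Mean absolute difference of consecutive values, relative to their mean."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    mean_value = sum(values) / len(values)
    if mean_value == 0:
        raise ValueError("mean must be non-zero")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / mean_value


def jitter_percent(periods_s):
    """Local jitter: cycle-to-cycle variation of glottal period lengths."""
    return 100.0 * local_perturbation(periods_s)


def shimmer_percent(peak_amplitudes):
    """Local shimmer: cycle-to-cycle variation of peak amplitudes."""
    return 100.0 * local_perturbation(peak_amplitudes)
```

The pitch estimate is quantized to whole-sample lags, so its resolution depends on the sample rate. A perfectly steady voice would give 0% for both perturbation measures.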
Dual-AI Architecture
Proactive Fetch.ai Agent
class VocalCoachAgent:
    """Autonomous agent for vocal analysis and report generation"""
    async def generate_daily_reports(self):
        """Automatically analyze practice sessions and generate insights"""
        users = await self.get_active_users()
        for user_id in users:
            # Analyze vocal progress
            report = await self.analyze_vocal_progress(user_id)
            # Store insights for Letta conversations
            await self.store_insights(user_id, report)
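To make the report loop concrete, here is a self-contained toy version that can be run as is. Everything in it is a stand-in: the in-memory storage, the per-session pitch-accuracy scores, and the fields of the report are hypothetical, and no Fetch.ai APIs are used.

```python
import asyncio
from statistics import mean


class InMemoryVocalCoachAgent:
    """Toy stand-in for the agent above; all data lives in memory."""

    def __init__(self, sessions_by_user):
        # Hypothetical shape: {user_id: [pitch accuracy per session, 0..1]}
        self.sessions_by_user = sessions_by_user
        self.insights = {}

    async def get_active_users(self):
        # A user counts as active if at least one session was recorded.
        return [uid for uid, scores in self.sessions_by_user.items() if scores]

    async def analyze_vocal_progress(self, user_id):
        scores = self.sessions_by_user[user_id]
        return {
            "sessions": len(scores),
            "average_accuracy": round(mean(scores), 3),
            # Positive when the latest session beat the first one.
            "change": round(scores[-1] - scores[0], 3),
        }

    async def store_insights(self, user_id, report):
        self.insights[user_id] = report

    async def generate_daily_reports(self):
        for user_id in await self.get_active_users():
            report = await self.analyze_vocal_progress(user_id)
            await self.store_insights(user_id, report)
        return self.insights
```

A caller could run it with `asyncio.run(InMemoryVocalCoachAgent({"u1": [0.6, 0.7, 0.8]}).generate_daily_reports())`. Users with no sessions are skipped.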
Reactive Letta Conversational Agent
class LettaVocalCoach:
    """Stateful conversational coach with long-term memory"""
    async def generate_response(self, context, user_message):
        """Generate contextual coaching based on vocal analysis data"""
        # Retrieve user's vocal history and analysis
        vocal_context = await self.build_vocal_context(context.user_id)
        # Generate personalized coaching response
        response = await self.letta_client.agents.messages.create(
            agent_id=self.agent_id,
            messages=[{
                "role": "user",
                "content": f"Vocal Context: {vocal_context}\nUser: {user_message}"
            }]
        )
        # Hand the agent's reply back to the caller
        return response
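The `build_vocal_context` step is not shown in this writeup. One plausible shape for it, sketched here purely as an assumption, is a small formatter that turns the stored profile, the latest analysis, and the agent's insights into a short text block for the prompt. `format_vocal_context` and all field names are hypothetical.

```python
def format_vocal_context(profile, latest_analysis, insights):
    """Render stored vocal data as a compact text block for the coaching prompt."""
    lines = [
        f"Voice type: {profile.get('voice_type', 'unknown')}",
        f"Skill level: {profile.get('skill_level', 'unknown')}",
    ]
    # Sort metric names so the prompt is stable between calls.
    for name, value in sorted(latest_analysis.items()):
        lines.append(f"{name}: {value}")
    if insights:
        lines.append("Recent insights: " + "; ".join(insights))
    return "\n".join(lines)
```

Keeping the context as a few labelled lines makes it easy to check what the conversational agent was actually told about the user.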
Session Management and Memory
VocalAIAgent maintains comprehensive session state and user memory:
# Declared async because the loop awaits user input and handlers;
# `user` is the loaded user profile, passed in by the caller.
async def start_vocal_session(user):
    session_memory = {
        'vocal_profile': {
            'voice_type': user.voice_type,
            'skill_level': user.skill_level,
            'practice_goals': user.goals,
            'vocal_range': user.range_analysis
        },
        'practice_history': [],
        'current_focus_areas': [],
        'last_analysis_results': {}
    }
    # Main coaching loop with state management
    # (session_active is assumed to be managed elsewhere in the application)
    while session_active:
        user_input = await get_user_input()
        action = interpret_vocal_request(user_input, session_memory)
        if action['intent'] == 'analyze_voice':
            await handle_voice_analysis(action, session_memory)
        elif action['intent'] == 'start_lesson':
            await handle_lesson_start(action, session_memory)
        # ... additional handlers
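As the number of intents grows, the if/elif chain can be replaced by a lookup table that maps each intent to its handler. The sketch below shows that alternative with stub handlers. The handler bodies, the return strings, and the `dispatch` helper are hypothetical and only illustrate how session memory could be updated.

```python
import asyncio


async def handle_voice_analysis(action, session_memory):
    # Stub: a real handler would run the analysis pipeline here.
    session_memory["last_analysis_results"] = {
        "session_type": action.get("session_type")
    }
    return "analysis_started"


async def handle_lesson_start(action, session_memory):
    # Stub: record which lesson was started in the practice history.
    session_memory["practice_history"].append(
        {"category": action.get("category"), "level": action.get("level")}
    )
    return "lesson_started"


# Map each intent string to the coroutine that handles it.
HANDLERS = {
    "analyze_voice": handle_voice_analysis,
    "start_lesson": handle_lesson_start,
}


async def dispatch(action, session_memory):
    """Look up and await the handler for an intent, if one is registered."""
    handler = HANDLERS.get(action.get("intent"))
    if handler is None:
        return "unsupported_intent"
    return await handler(action, session_memory)
```

Adding a new intent then means registering one more entry in `HANDLERS` rather than extending the loop body.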
Current Capabilities Demonstrated
✅ Voice Analysis & Processing
- Real-time pitch detection with Web Audio API
- Advanced vocal metrics (jitter, shimmer, vibrato)
- Voice type classification and range analysis
- Session recording and playback capabilities
✅ AI-Powered Coaching
- Fetch.ai autonomous agents for progress analysis
- Letta conversational AI with stateful memory
- VAPI real-time voice conversations
- Personalized lesson and exercise generation
✅ Data Management & Persistence
- Comprehensive user vocal profiles
- Session history and progress tracking
- Lesson feedback storage and retrieval
- Secure multi-user data isolation
✅ User Experience Features
- Modern, responsive React interface
- Real-time visual feedback during voice sessions
- Progress dashboards and analytics
- Community features and challenges
Current Errors and Solutions
Issues Identified:
- URL Construction Error: Double slash in API endpoints causing malformed URLs
- Database Connection Issues: Lesson feedback storage failing due to Supabase credential problems
- Error Handling: Generic error messages making debugging difficult
Solutions Implemented:
- Fixed URL Construction: Added trailing slash removal in frontend API calls
- Enhanced Error Logging: Improved backend error reporting with detailed messages
- Database Health Checks: Added endpoints to verify service connectivity
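The double-slash problem typically appears when a base URL that ends in `/` is concatenated with a path that starts with `/`. The actual fix was made in the frontend API calls. The same idea is shown here in Python for consistency with the other snippets, with `join_url` and the example host being hypothetical.

```python
def join_url(base_url, path):
    """Join a base URL and a path with exactly one slash between them."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")
```

For example, `join_url("https://api.example.com/", "/lessons/feedback")` yields `https://api.example.com/lessons/feedback` whether or not either side carries its own slash.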
Limitations & Future Work
Current Limitations:
- Voice Processing Accuracy: Browser-based analysis has limitations compared to specialized hardware
- AI Model Training: Limited training data for AI models specific to vocal coaching
- Scalability: Current architecture needs optimization for large-scale deployment
Future Enhancements:
High Priority:
- Enhanced Voice Processing: Integrate professional-grade voice analysis libraries
- Advanced AI Models: Fine-tune models specifically for vocal coaching contexts
- Mobile Applications: Native iOS/Android apps with enhanced voice processing
Medium Priority:
- Social Features: Enhanced community aspects with vocal challenges and peer learning
- Integration Ecosystem: Connect with music learning platforms and DAWs
- Offline Capabilities: Voice analysis and basic coaching without internet connection
Built With
Core Technologies
- React 18 with TypeScript for modern, type-safe frontend development
- FastAPI for high-performance Python backend with automatic API documentation
- Supabase for PostgreSQL database, authentication, and real-time features
- Tailwind CSS for responsive, utility-first styling
AI & Voice Technologies
- Letta for stateful conversational AI with long-term memory
- Fetch.ai for autonomous agent systems and proactive analysis
- VAPI for real-time voice AI conversations
- Web Audio API for browser-based voice processing
DevOps & Deployment
- Docker for containerized backend deployment
- Google Cloud Run for scalable, serverless backend hosting
- Netlify for frontend hosting with automatic deployments
VocalAIAgent demonstrates the transformative potential of AI in music education, combining cutting-edge voice processing, conversational AI, and personalized coaching to create a comprehensive vocal training platform.
Inspirations:
Developer 1: I’ve always loved singing, and the idea for Vocal AI came to me a long time ago when I realized I wanted to improve my vocal skills but didn’t have enough time for in-person lessons. I thought, why not create a website or app that helps with singing practice anytime?
Developer 2: I realized that no such product was available online and that AI is more accessible to people than human coaches. I think Vocal AI can solve this problem.
Challenges I ran into:
Developer 1: Balancing late nights without sleep, working on the website while preparing for exams, and pushing myself to stay focused and motivated despite the tiredness. It was tough, but I kept reminding myself that the final result would be worth it.
Developer 2: Backend integration was tough, and it took me some time to find the best solutions for the ideas I had. I did not want to leave the frontend static. Changing the images in the project generated by Bolt was also tough.
Accomplishments I’m proud of:
Developer 1: I’m really happy with the design: it’s clean, not overloaded, and easy on the eyes. There are no harsh or bright elements that distract or tire the viewer, which makes the user experience pleasant.
Developer 2: I’m proud of connecting three distinct AIs to this project and integrating numerous services.
What I learned:
We learned that ideas which seem impossible at first can be brought to life with effort and teamwork. Dedication is important: sometimes working late nights is necessary to achieve great results. We also learned to be patient, manage our emotions, and focus on simplicity in design rather than adding too many elements.
What’s next for Vocal AI:
We plan to keep improving Vocal AI by enhancing both the design and functionality. Our goal is to make it a user-friendly and helpful app that many people can use to improve their singing skills. This version of the project is just the beginning, and we're excited to continue developing it and deploying it on other platforms. We're open to investment.
Built With
- bolt
- docker
- fastapi
- google-cloud
- letta
- netlify
- groq
- react
- supabase
- vapi
- vite