🎤 VocalAIAgent: A Multimodal Conversational Vocal Coach
Author: Vocal Coach AI Team
Repository: Berkeley Hack 2025
Overview
This project is a comprehensive AI-powered vocal coaching system built for the Berkeley Hackathon 2025. The application provides a full-stack solution for personalized vocal training, combining real-time voice analysis, intelligent coaching, and conversational AI agents to create a holistic vocal development experience.
VocalAIAgent solves the problem of fragmented vocal training by combining multiple functionalities into one intuitive and multimodal assistant. This AI-powered system simplifies vocal coaching by:
- Understanding natural voice patterns and interpreting audio recordings to gather vocal characteristics
- Providing personalized recommendations based on vocal analysis and user preferences
- Fetching real-time data for vocal metrics, progress tracking, and performance insights
- Generating dynamic lesson plans tailored to the user's vocal type, skill level, and practice goals
- Offering intelligent coaching conversations grounded in real-time voice analysis data
VocalAIAgent is not just an LLM chatbot. The agent is enhanced with numerous AI capabilities, including:
- Voice Understanding: Real-time pitch detection and vocal analysis with metrics such as jitter, shimmer, and vibrato rate
- Retrieval-Augmented Generation (RAG): Providing personalized coaching tips by retrieving relevant vocal techniques from a knowledge base
- Few-Shot Prompting: Generating dynamic lesson plans and exercises based on minimal user input
- Function Calling: Executing specific functions based on user commands, such as starting voice sessions, analyzing recordings, or generating progress reports
- Long Context Window: Managing and retaining user vocal profiles and practice history across multiple sessions
- Context Caching: Storing relevant vocal data temporarily to improve response speed and reduce redundant analysis
- AI Evaluation: Using LLM-based evaluation to assess vocal progress and provide "Vocal Scores" based on improvement and consistency
- Grounding: Ensuring that coaching recommendations are grounded in real-time vocal analysis data
- Embeddings: Utilizing embeddings for effective vocal pattern matching and personalized exercise recommendations
- Multimodal Integration: Understanding both voice inputs and conversational text for comprehensive coaching
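The retrieval and embedding ideas above can be shown with a minimal sketch. It assumes a small in-memory list of technique tips. It uses a bag-of-words count vector as a stand-in for a real embedding model and ranks tips by cosine similarity. The tip texts and the `embed` and `retrieve_tips` names are hypothetical. They are not the project's knowledge base or API.

```python
import math
from collections import Counter

# Hypothetical knowledge base of vocal technique tips (illustrative only)
TECHNIQUE_TIPS = [
    "Lip trills relax the lips and help stabilize breath support across the range.",
    "Sustain a vowel on one pitch to reduce jitter and steady the tone.",
    "Slow sirens from low to high smooth the passaggio between registers.",
    "Practice staccato arpeggios to improve pitch accuracy on leaps.",
]


def embed(text):
    """Stand-in embedding: a bag-of-words count vector (a real system would call an embedding model)."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return Counter(w for w in words if w)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_tips(query, k=2):
    """Return the k tips most similar to the query, ready to be placed in the coach's prompt."""
    q = embed(query)
    ranked = sorted(TECHNIQUE_TIPS, key=lambda tip: cosine(q, embed(tip)), reverse=True)
    return ranked[:k]


print(retrieve_tips("my pitch is unsteady and has too much jitter", k=1))
```

In a full RAG setup, the retrieved tips would be prepended to the LLM prompt. That keeps the coaching advice grounded in the knowledge base.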
Problem Statement
Vocal training can be an isolated and inconsistent process. Singers and speakers often struggle with:
- Lack of real-time feedback during practice sessions
- Limited access to personalized coaching based on their specific vocal characteristics
- Difficulty tracking progress and identifying improvement areas
- Fragmented resources across multiple platforms and tools
- Inconsistent practice routines without proper guidance
VocalAIAgent addresses these challenges by providing a unified, intelligent coaching platform that combines voice analysis, personalized AI coaching, and comprehensive progress tracking in one seamless experience.
🚀 Key Features
Core Vocal Analysis
🎵 Real-Time Pitch Detection: Instant feedback during practice sessions with live pitch visualization
📊 Deep Vocal Analysis: Advanced metrics including jitter, shimmer, vibrato rate, and vocal range analysis
🎯 Voice Type Classification: Automatic classification of voice types (soprano, alto, tenor, bass)
📈 Progress Tracking: Comprehensive tracking of vocal improvements over time
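Jitter and shimmer are usually defined by their "local" variants, as in Praat. Local jitter is the mean absolute difference between consecutive glottal periods, divided by the mean period. Local shimmer is the same ratio computed on per-cycle peak amplitudes. The sketch below assumes the pitch tracker has already extracted period lengths and peak amplitudes. The function names and sample values are illustrative and are not the project's analyzer API.

```python
def mean_abs_diff_ratio(values):
    """Mean absolute difference between consecutive values, divided by the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))


def local_jitter(periods):
    """Local jitter as a fraction: cycle-to-cycle variation of the period lengths."""
    return mean_abs_diff_ratio(periods)


def local_shimmer(amplitudes):
    """Local shimmer as a fraction: cycle-to-cycle variation of the peak amplitudes."""
    return mean_abs_diff_ratio(amplitudes)


# Periods in seconds for a voice near 200 Hz, with small cycle-to-cycle variation
periods = [0.0050, 0.0051, 0.0049, 0.0050]
amplitudes = [0.80, 0.78, 0.82, 0.80]
print(f"jitter = {local_jitter(periods) * 100:.2f}%")
print(f"shimmer = {local_shimmer(amplitudes) * 100:.2f}%")
```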
AI-Powered Coaching System
🤖 Dual-AI Architecture: Proactive Fetch.ai Agent + Reactive Letta Conversational Agent
💬 Stateful Conversations: An AI coach that remembers context and discusses the user's specific progress
📋 Personalized Lesson Plans: Dynamic lesson generation based on vocal analysis and user goals
🎓 Exercise Recommendations: Tailored vocal exercises based on analysis results
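Personalized lesson plans generated with few-shot prompting can be sketched as a prompt builder. It places a couple of worked examples before the new user profile, so the model copies their format. The example lessons, the profile fields, and `build_lesson_prompt` are hypothetical. The resulting string would go to whichever LLM the backend uses.

```python
# Hypothetical worked examples that show the model the expected lesson format
FEW_SHOT_EXAMPLES = [
    {
        "profile": "voice_type=tenor, level=beginner, goal=breath control",
        "lesson": "1. Lip trills (3 min)\n2. Hissing exhale on 'sss' (4 x 20 s)\n3. Sustained 'ah' on G3 (5 x 10 s)",
    },
    {
        "profile": "voice_type=soprano, level=intermediate, goal=smooth register transitions",
        "lesson": "1. Sirens on 'ng' (3 min)\n2. Five-note scales across the passaggio (6 reps)\n3. Legato phrase practice (10 min)",
    },
]


def build_lesson_prompt(voice_type, level, goal):
    """Assemble a few-shot prompt: instructions, worked examples, then the new profile."""
    parts = ["You are a vocal coach. Write a short numbered lesson plan for the profile given."]
    for ex in FEW_SHOT_EXAMPLES:
        parts.append(f"Profile: {ex['profile']}\nLesson:\n{ex['lesson']}")
    # The final profile is left open so the model completes the lesson in the same format
    parts.append(f"Profile: voice_type={voice_type}, level={level}, goal={goal}\nLesson:")
    return "\n\n".join(parts)


print(build_lesson_prompt("alto", "beginner", "pitch accuracy"))
```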
Advanced Features
🗣️ VAPI Voice Integration: Real-time voice conversations with AI coach
📱 Multimodal Interface: Support for voice input, text chat, and visual feedback
🔄 Lesson Feedback Loop: Comprehensive storage and analysis of lesson completion data
📊 AI-Generated Reports: Daily summaries of performance trends and insights
🎪 Community Features: Progress sharing and vocal challenges
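The overview describes a "Vocal Score" based on improvement and consistency. Before the session data reaches an LLM evaluator or a daily report, those two signals could be computed numerically. This minimal sketch assumes one quality score from 0 to 100 per session. It measures improvement with a least-squares slope and consistency with the population standard deviation. The `session_trend` name and the example scores are hypothetical.

```python
import statistics


def session_trend(scores):
    """Return (slope in points per session, standard deviation) for per-session scores."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two sessions")
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = statistics.fmean(scores)
    # Ordinary least-squares slope of score against session index
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    den = sum((x - mean_x) ** 2 for x in xs)
    slope = num / den
    # A lower spread between sessions means more consistent results
    spread = statistics.pstdev(scores)
    return slope, spread


slope, spread = session_trend([62, 65, 64, 70, 73])
print(f"improvement: {slope:+.2f} points/session, spread: {spread:.2f}")
```

An evaluator prompt could then receive these two numbers alongside the raw session notes. This is easier than asking the LLM to infer trends from a long history.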
Data & Memory Management
💾 Persistent Memory: User preferences, vocal characteristics, and practice history retention
📤 Export Capabilities: Save vocal analyses, lesson plans, and progress reports
🔐 Secure Data Storage: Supabase integration with authentication and Row Level Security (RLS)
📋 Session Management: Comprehensive tracking of practice sessions and improvements
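The overview also lists context caching, which stores vocal data temporarily to avoid redundant analysis. It can be sketched as a small time-to-live cache keyed by user and recording. The class name, the key format, and the 10-minute default TTL are assumptions for illustration. They are not the project's storage layer.

```python
import time


class AnalysisCache:
    """In-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds=600, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable clock makes expiry easy to test
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if self.clock() - stored_at > self.ttl:
            # Expired: drop it so the caller re-runs the analysis
            del self._store[key]
            return None
        return value

    def put(self, key, value):
        self._store[key] = (value, self.clock())


def analyze_with_cache(cache, user_id, recording_id, analyze):
    """Return cached results if still fresh, otherwise call analyze() and cache its output."""
    key = (user_id, recording_id)
    result = cache.get(key)
    if result is None:
        result = analyze()
        cache.put(key, result)
    return result


cache = AnalysisCache(ttl_seconds=600)
print(analyze_with_cache(cache, "user-1", "rec-42", lambda: {"mean_pitch_hz": 220.0}))
```

A single-process dict is enough for a demo. Several backend containers would need a shared store instead.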
Why This Matters
Most vocal training today lacks the personalized, data-driven feedback that modern AI can provide. VocalAIAgent brings voice science, conversational AI, and personalized coaching together in one system, aiming to make vocal training more effective, engaging, and accessible. By combining real-time voice analysis, stateful AI conversations, and progress tracking, the project shows how generative AI can support music education and vocal development.
🛠 Tech Stack
Frontend
- React: Modern component-based UI framework
- TypeScript: Type-safe development with enhanced IDE support
- Vite: Fast build tool and development server
- Tailwind CSS: Utility-first CSS framework for responsive design
- Framer Motion: Smooth animations and transitions
Backend
- FastAPI: High-performance Python web framework with automatic API docs
- Python: Core backend language with extensive AI/ML libraries
- Uvicorn: ASGI server for production deployment
AI Services
- Letta: Stateful conversational AI with long-term memory capabilities
- Fetch.ai: Autonomous agent system for proactive analysis and reporting
- VAPI: Voice AI platform for real-time voice conversations
Database & Authentication
- Supabase: PostgreSQL database with built-in authentication and real-time features
- Row Level Security (RLS): Secure user data isolation
Voice Processing
- Web Audio API: Real-time audio processing and pitch detection
- Custom Voice Analyzer: Advanced vocal metrics calculation
Hosting & Deployment
- Netlify: Frontend hosting with automatic deployments
- Google Cloud Run: Scalable backend container hosting
- Docker: Containerized backend for consistent deployments
How It Works
Intent Recognition and Routing System
VocalAIAgent uses an LLM-based router that maps each user request, together with the current session context, to the appropriate coaching handler:
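import ast

def interpret_vocal_request(user_input, session_context):
    """Classify a user message into a coaching intent dict using an LLM."""
    prompt = (
        "You are a vocal coaching function router. Based on the user message and session context, "
        "output ONLY a Python dictionary.\n"
        "- If asking for voice analysis: {'intent': 'analyze_voice', 'session_type': '<TYPE>'}\n"
        "- If requesting a lesson: {'intent': 'start_lesson', 'category': '<CATEGORY>', 'level': '<LEVEL>'}\n"
        "- If seeking feedback: {'intent': 'get_feedback', 'aspect': '<VOCAL_ASPECT>'}\n"
        f"Session Context: {session_context}\n"
        f"User: {user_input}"
    )
    # call_llm is a placeholder for the backend's LLM client call (not shown here)
    raw_reply = call_llm(prompt)
    # Route to appropriate vocal coaching handler: parse the dict literal safely
    # (ast.literal_eval does not execute arbitrary code, unlike eval)
    try:
        return ast.literal_eval(raw_reply.strip())
    except (ValueError, SyntaxError):
        return {'intent': 'unknown'}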
Vocal Analysis Pipeline
The voice analysis system combines multiple AI techniques:
- Real-time Processing: Web Audio API captures and processes audio in real-time
- Feature Extraction: Advanced algorithms extract vocal characteristics (pitch, formants, etc.)
- AI Classification: Machine learning models classify voice type and detect patterns
- Contextual Analysis: Results are interpreted within the user's vocal development context
Dual-AI Architecture
Proactive Fetch.ai Agent
class VocalCoachAgent:
    """Autonomous agent for vocal analysis and report generation"""

    async def generate_daily_reports(self):
        """Automatically analyze practice sessions and generate insights"""
        # get_active_users, analyze_vocal_progress and store_insights are
        # helper coroutines defined elsewhere on the agent (not shown here)
        users = await self.get_active_users()
        for user_id in users:
            # Analyze vocal progress
            report = await self.analyze_vocal_progress(user_id)
            # Store insights for Letta conversations
            await self.store_insights(user_id, report)
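The loop in `generate_daily_reports` awaits each user's analysis one after another. Because the steps are coroutines, one possible refinement is to process users concurrently with `asyncio.gather`. An `asyncio.Semaphore` would bound concurrency so the database and LLM are not flooded. This minimal sketch uses hypothetical stub coroutines in place of the agent's real helpers. It is written as plain functions rather than as Fetch.ai agent methods.

```python
import asyncio


async def analyze_vocal_progress(user_id):
    """Hypothetical stand-in for the agent's analysis step."""
    await asyncio.sleep(0.01)  # simulate I/O such as database reads or an LLM call
    return {"user_id": user_id, "summary": "pitch stability improved"}


async def store_insights(user_id, report, store):
    """Hypothetical stand-in that saves the report where the Letta coach can read it."""
    store[user_id] = report


async def generate_daily_reports(user_ids, store, max_concurrency=5):
    """Analyze all users concurrently, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def process(user_id):
        async with semaphore:
            report = await analyze_vocal_progress(user_id)
            await store_insights(user_id, report, store)

    results = await asyncio.gather(*(process(uid) for uid in user_ids), return_exceptions=True)
    # Return failures so one bad user does not abort the whole daily run
    return [r for r in results if isinstance(r, Exception)]


insights = {}
failures = asyncio.run(generate_daily_reports(["u1", "u2", "u3"], insights))
print(sorted(insights), failures)
```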
Reactive Letta Conversational Agent
class LettaVocalCoach:
    """Stateful conversational coach with long-term memory"""

    async def generate_response(self, context, user_message):
        """Generate contextual coaching based on vocal analysis data"""
        # Retrieve user's vocal history and analysis
        vocal_context = await self.build_vocal_context(context.user_id)
        # Generate personalized coaching response
        # (assumes self.letta_client is an async Letta client)
        response = await self.letta_client.agents.messages.create(
            agent_id=self.agent_id,
            messages=[{
                "role": "user",
                "content": f"Vocal Context: {vocal_context}\nUser: {user_message}"
            }]
        )
        return response
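The project does not show the body of `build_vocal_context`. One plausible design condenses the profile and latest metrics into a few compact lines, so they fit comfortably in the Letta message. The sketch below is a standalone, synchronous helper that takes already-loaded data. In the class, the method would first fetch that data for `user_id`. The field names and metric keys are hypothetical.

```python
def build_vocal_context(profile, last_analysis, focus_areas):
    """Condense a user's profile and latest metrics into a short text block for the coach prompt."""
    lines = [
        f"Voice type: {profile.get('voice_type', 'unknown')}",
        f"Skill level: {profile.get('skill_level', 'unknown')}",
    ]
    # Only include metrics that were actually measured in the last session
    metric_labels = {
        "mean_pitch_hz": "Mean pitch (Hz)",
        "jitter_pct": "Jitter (%)",
        "shimmer_pct": "Shimmer (%)",
        "vibrato_rate_hz": "Vibrato rate (Hz)",
    }
    for key, label in metric_labels.items():
        if key in last_analysis:
            lines.append(f"{label}: {last_analysis[key]:.2f}")
    if focus_areas:
        lines.append("Current focus: " + ", ".join(focus_areas))
    return "\n".join(lines)


print(build_vocal_context(
    {"voice_type": "tenor", "skill_level": "beginner"},
    {"mean_pitch_hz": 196.0, "jitter_pct": 1.2},
    ["breath support"],
))
```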
Session Management and Memory
VocalAIAgent maintains comprehensive session state and user memory:
async def start_vocal_session(user):
    """Run one coaching session for the given user profile."""
    session_memory = {
        'vocal_profile': {
            'voice_type': user.voice_type,
            'skill_level': user.skill_level,
            'practice_goals': user.goals,
            'vocal_range': user.range_analysis
        },
        'practice_history': [],
        'current_focus_areas': [],
        'last_analysis_results': {}
    }
    # Main coaching loop with state management
    # (session_active, get_user_input and the handle_* coroutines are defined elsewhere)
    while session_active:
        user_input = await get_user_input()
        action = interpret_vocal_request(user_input, session_memory)
        if action['intent'] == 'analyze_voice':
            await handle_voice_analysis(action, session_memory)
        elif action['intent'] == 'start_lesson':
            await handle_lesson_start(action, session_memory)
        # ... additional handlers
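As more intents are added, the `if`/`elif` chain in the coaching loop could be replaced by a dispatch table. The table maps each intent to a handler coroutine, with a fallback for unrecognized intents such as the router's parse failures. The sketch below assumes that design. Its handlers are hypothetical stubs that only record what they were asked to do.

```python
import asyncio


async def handle_voice_analysis(action, session_memory):
    # Hypothetical stub: a real handler would run the voice analyzer
    session_memory["last_analysis_results"] = {"session_type": action.get("session_type")}


async def handle_lesson_start(action, session_memory):
    # Hypothetical stub: a real handler would generate and start a lesson
    session_memory["current_focus_areas"].append(action.get("category"))


async def handle_feedback(action, session_memory):
    # Hypothetical stub: a real handler would ask the coach for feedback
    session_memory["practice_history"].append(("feedback", action.get("aspect")))


async def handle_unknown(action, session_memory):
    # Fallback: record the unrecognized request instead of failing
    session_memory["practice_history"].append(("unrecognized", action))


HANDLERS = {
    "analyze_voice": handle_voice_analysis,
    "start_lesson": handle_lesson_start,
    "get_feedback": handle_feedback,
}


async def dispatch(action, session_memory):
    """Look up the handler for action['intent'], falling back to handle_unknown."""
    handler = HANDLERS.get(action.get("intent"), handle_unknown)
    await handler(action, session_memory)


memory = {"practice_history": [], "current_focus_areas": [], "last_analysis_results": {}}
asyncio.run(dispatch({"intent": "start_lesson", "category": "breathing", "level": "beginner"}, memory))
asyncio.run(dispatch({"intent": "sing_for_me"}, memory))
print(memory)
```

With this layout, adding an intent means registering one handler in `HANDLERS`. The loop body itself does not change.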
Current Capabilities Demonstrated
✅ Voice Analysis & Processing
- Real-time pitch detection with Web Audio API
- Advanced vocal metrics (jitter, shimmer, vibrato)
- Voice type classification and range analysis
- Session recording and playback capabilities
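Voice type classification from range analysis can be approximated by comparing the user's measured range with nominal classical ranges. This minimal sketch assumes the analyzer reports the lowest and highest comfortable frequencies. The nominal ranges are common textbook approximations: soprano C4 to A5, alto F3 to F5, tenor C3 to C5, and bass E2 to E4. The nearest-center rule is a simplification, not the project's classifier.

```python
import math

# Approximate nominal ranges as MIDI note numbers (middle C = C4 = 60)
NOMINAL_RANGES = {
    "soprano": (60, 81),  # C4 to A5
    "alto": (53, 77),     # F3 to F5
    "tenor": (48, 72),    # C3 to C5
    "bass": (40, 64),     # E2 to E4
}


def hz_to_midi(freq_hz):
    """Convert a frequency in Hz to a fractional MIDI note number, with A4 = 440 Hz = 69."""
    return 69 + 12 * math.log2(freq_hz / 440.0)


def classify_voice_type(low_hz, high_hz):
    """Pick the voice type whose nominal range center is closest to the user's range center."""
    center = (hz_to_midi(low_hz) + hz_to_midi(high_hz)) / 2
    return min(
        NOMINAL_RANGES,
        key=lambda name: abs(center - sum(NOMINAL_RANGES[name]) / 2),
    )


print(classify_voice_type(82.4, 329.6))  # a range of roughly E2 to E4
```

Working in MIDI note numbers keeps the comparison on a semitone scale, which matches how singers hear interval distances. Raw hertz does not.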
✅ AI-Powered Coaching
- Fetch.ai autonomous agents for progress analysis
- Letta conversational AI with stateful memory
- VAPI real-time voice conversations
- Personalized lesson and exercise generation
✅ Data Management & Persistence
- Comprehensive user vocal profiles
- Session history and progress tracking
- Lesson feedback storage and retrieval
- Secure multi-user data isolation
✅ User Experience Features
- Modern, responsive React interface
- Real-time visual feedback during voice sessions
- Progress dashboards and analytics
- Community features and challenges
Current Errors and Solutions
Issues Identified:
- URL Construction Error: Double slashes in API endpoint URLs (a base URL ending in "/" joined to a path starting with "/") produced malformed requests
- Database Connection Issues: Lesson feedback storage failing due to Supabase credential problems
- Error Handling: Generic error messages making debugging difficult
Solutions Implemented:
- Fixed URL Construction: The frontend now strips the trailing slash from the base URL before appending endpoint paths
- Enhanced Error Logging: Improved backend error reporting with detailed messages
- Database Health Checks: Added endpoints to verify service connectivity
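The double-slash bug typically appears when a base URL ending in `/` is concatenated with a path starting with `/`. The frontend fix removes the trailing slash. The same normalization, shown in Python for consistency with the other examples, might look like this. The `join_api_url` name and the example host are hypothetical.

```python
def join_api_url(base_url, path):
    """Join a base URL and an endpoint path with exactly one slash between them."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


print(join_api_url("https://api.example.com/", "/lesson-feedback"))
```

`urllib.parse.urljoin` is not a drop-in replacement here. When the path starts with `/`, it discards any path prefix on the base URL, such as `/api`.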
Limitations & Future Work
Current Limitations:
- Voice Processing Accuracy: Browser-based analysis has limitations compared to specialized hardware
- AI Model Training: Limited training data for vocal-coaching-specific AI models
- Scalability: Current architecture needs optimization for large-scale deployment
Future Enhancements:
High Priority:
- Enhanced Voice Processing: Integrate professional-grade voice analysis libraries
- Advanced AI Models: Fine-tune models specifically for vocal coaching contexts
- Mobile Applications: Native iOS/Android apps with enhanced voice processing
Medium Priority:
- Social Features: Enhanced community aspects with vocal challenges and peer learning
- Integration Ecosystem: Connect with music learning platforms and DAWs
- Offline Capabilities: Voice analysis and basic coaching without internet connection
Built With
Core Technologies
- React 18 with TypeScript for modern, type-safe frontend development
- FastAPI for high-performance Python backend with automatic API documentation
- Supabase for PostgreSQL database, authentication, and real-time features
- Tailwind CSS for responsive, utility-first styling
AI & Voice Technologies
- Letta for stateful conversational AI with long-term memory
- Fetch.ai for autonomous agent systems and proactive analysis
- VAPI for real-time voice AI conversations
- Web Audio API for browser-based voice processing
DevOps & Deployment
- Docker for containerized backend deployment
- Google Cloud Run for scalable, serverless backend hosting
- Netlify for frontend hosting with automatic deployments
VocalAIAgent demonstrates the potential of AI in music education, combining browser-based voice processing, conversational AI, and personalized coaching in a single vocal training platform.
Built With
- bolt
- cursor
- docker
- fastapi
- google-cloud
- letta
- netlify
- groq
- react
- supabase
- vapi
- vite

