🎤 VocalAIAgent: A Multimodal Conversational Vocal Coach

Author: Vocal Coach AI Team
Repository: Berkeley Hack 2025

Overview

This project is a comprehensive AI-powered vocal coaching system built for the Berkeley Hackathon 2025. The application provides a full-stack solution for personalized vocal training, combining real-time voice analysis, intelligent coaching, and conversational AI agents to create a holistic vocal development experience.

VocalAIAgent solves the problem of fragmented vocal training by combining multiple functionalities into one intuitive and multimodal assistant. This AI-powered system simplifies vocal coaching by:

  • Understanding natural voice patterns and interpreting audio recordings to gather vocal characteristics
  • Providing personalized recommendations based on vocal analysis and user preferences
  • Fetching real-time data for vocal metrics, progress tracking, and performance insights
  • Generating dynamic lesson plans tailored to the user's vocal type, skill level, and practice goals
  • Offering intelligent coaching conversations grounded in real-time voice analysis data

VocalAIAgent is not just an LLM chatbot. The agent is enhanced with numerous AI capabilities, including:

  • Voice Understanding: Real-time pitch detection and vocal analysis with metrics such as jitter, shimmer, and vibrato rate
  • Retrieval-Augmented Generation (RAG): Providing personalized coaching tips by retrieving relevant vocal techniques from a knowledge base
  • Few-Shot Prompting: Generating dynamic lesson plans and exercises based on minimal user input
  • Function Calling: Executing specific functions based on user commands, such as starting voice sessions, analyzing recordings, or generating progress reports
  • Long Context Window: Managing and retaining user vocal profiles and practice history across multiple sessions
  • Context Caching: Storing relevant vocal data temporarily to improve response speed and reduce redundant analysis
  • AI Evaluation: Using LLM-based evaluation to assess vocal progress and provide "Vocal Scores" based on improvement and consistency
  • Grounding: Ensuring that coaching recommendations are grounded in real-time vocal analysis data
  • Embeddings: Utilizing embeddings for effective vocal pattern matching and personalized exercise recommendations
  • Multimodal Integration: Understanding both voice inputs and conversational text for comprehensive coaching
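
The RAG and embeddings capabilities above reduce to similarity search over an embedded knowledge base. A minimal sketch with toy 3-dimensional vectors standing in for real embedding-model output (the knowledge-base entries are illustrative, not the project's actual data):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve_technique(query_vec, knowledge_base, top_k=1):
    """Return the top_k technique tips most similar to the query embedding."""
    ranked = sorted(
        knowledge_base,
        key=lambda item: cosine_similarity(query_vec, item["embedding"]),
        reverse=True,
    )
    return [item["tip"] for item in ranked[:top_k]]

# Toy "embeddings" standing in for real model output
kb = [
    {"tip": "Use lip trills to reduce jitter", "embedding": [0.9, 0.1, 0.0]},
    {"tip": "Slide exercises to widen range", "embedding": [0.1, 0.9, 0.2]},
]
print(retrieve_technique([0.8, 0.2, 0.1], kb))
```

The retrieved tips are then injected into the coaching prompt, grounding the LLM's advice in the knowledge base rather than its general training data.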

Problem Statement

Vocal training can be an isolated and inconsistent process. Singers and speakers often struggle with:

  • Lack of real-time feedback during practice sessions
  • Limited access to personalized coaching based on their specific vocal characteristics
  • Difficulty tracking progress and identifying improvement areas
  • Fragmented resources across multiple platforms and tools
  • Inconsistent practice routines without proper guidance

VocalAIAgent addresses these challenges by providing a unified, intelligent coaching platform that combines voice analysis, personalized AI coaching, and comprehensive progress tracking in one seamless experience.

🚀 Key Features

Core Vocal Analysis

🎵 Real-Time Pitch Detection: Instant feedback during practice sessions with live pitch visualization
📊 Deep Vocal Analysis: Advanced metrics including jitter, shimmer, vibrato rate, and vocal range
🎯 Voice Type Classification: Automatic classification of voice types (soprano, alto, tenor, bass)
📈 Progress Tracking: Comprehensive tracking of vocal improvements over time
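
Voice type classification can be sketched as bucketing the detected vocal range by its midpoint frequency. The Hz boundaries below are illustrative approximations, not the thresholds the project's classifier actually uses:

```python
# Approximate midpoint boundaries in Hz (illustrative, not the real classifier)
VOICE_TYPES = [
    ("bass", 0.0, 165.0),
    ("tenor", 165.0, 260.0),
    ("alto", 260.0, 350.0),
    ("soprano", 350.0, float("inf")),
]

def classify_voice_type(low_hz, high_hz):
    """Classify by the geometric midpoint of the detected vocal range.

    The geometric mean is used because pitch perception is logarithmic.
    """
    midpoint = (low_hz * high_hz) ** 0.5
    for name, lo, hi in VOICE_TYPES:
        if lo <= midpoint < hi:
            return name

print(classify_voice_type(100.0, 300.0))  # → tenor
```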

AI-Powered Coaching System

🤖 Dual-AI Architecture: Proactive Fetch.ai Agent + Reactive Letta Conversational Agent
💬 Stateful Conversations: AI coach that remembers context and discusses specific progress
📋 Personalized Lesson Plans: Dynamic lesson generation based on vocal analysis and user goals
🎓 Exercise Recommendations: Tailored vocal exercises based on analysis results

Advanced Features

🗣️ VAPI Voice Integration: Real-time voice conversations with AI coach
📱 Multimodal Interface: Support for voice input, text chat, and visual feedback
🔄 Lesson Feedback Loop: Comprehensive storage and analysis of lesson completion data
📊 AI-Generated Reports: Daily summaries of performance trends and insights
🎪 Community Features: Progress sharing and vocal challenges

Data & Memory Management

💾 Persistent Memory: User preferences, vocal characteristics, and practice history retention
📤 Export Capabilities: Save vocal analyses, lesson plans, and progress reports
🔐 Secure Data Storage: Supabase integration with proper authentication and RLS
📋 Session Management: Comprehensive tracking of practice sessions and improvements

Why This Matters

Vocal training today lacks the personalized, data-driven approach that modern AI can provide. VocalAIAgent brings together voice science, conversational AI, and personalized coaching into one intelligent system, offering a more effective, engaging, and accessible vocal training experience. By combining real-time voice analysis, stateful AI conversations, and comprehensive progress tracking, this tool showcases the potential of Generative AI in revolutionizing music education and vocal development.

🛠 Tech Stack

Frontend

  • React: Modern component-based UI framework
  • TypeScript: Type-safe development with enhanced IDE support
  • Vite: Fast build tool and development server
  • Tailwind CSS: Utility-first CSS framework for responsive design
  • Framer Motion: Smooth animations and transitions

Backend

  • FastAPI: High-performance Python web framework with automatic API docs
  • Python: Core backend language with extensive AI/ML libraries
  • Uvicorn: ASGI server for production deployment

AI Services

  • Letta: Stateful conversational AI with long-term memory capabilities
  • Fetch.ai: Autonomous agent system for proactive analysis and reporting
  • VAPI: Voice AI platform for real-time voice conversations

Database & Authentication

  • Supabase: PostgreSQL database with built-in authentication and real-time features
  • Row Level Security (RLS): Secure user data isolation

Voice Processing

  • Web Audio API: Real-time audio processing and pitch detection
  • Custom Voice Analyzer: Advanced vocal metrics calculation

Hosting & Deployment

  • Netlify: Frontend hosting with automatic deployments
  • Google Cloud Run: Scalable backend container hosting
  • Docker: Containerized backend for consistent deployments

How It Works

Intent Recognition and Routing System

VocalAIAgent uses a sophisticated routing system that directs user requests to appropriate handlers based on vocal coaching context:

def interpret_vocal_request(user_input, session_context):
    prompt = (
        "You are a vocal coaching function router. Based on the user message and session context, "
        "output ONLY a Python dictionary.\n"
        "- If asking for voice analysis: {'intent': 'analyze_voice', 'session_type': '<TYPE>'}\n"  
        "- If requesting lesson: {'intent': 'start_lesson', 'category': '<CATEGORY>', 'level': '<LEVEL>'}\n"
        "- If seeking feedback: {'intent': 'get_feedback', 'aspect': '<VOCAL_ASPECT>'}\n"
        f"Session Context: {session_context}\n"
        f"User: {user_input}"
    )
    # Route to appropriate vocal coaching handler
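
Because the router asks the model to emit a Python dict literal, the reply should be parsed defensively before dispatching. A minimal sketch using `ast.literal_eval` with a fallback intent (the `clarify_request` fallback name is illustrative):

```python
import ast

def parse_router_output(raw):
    """Safely parse the LLM's dict literal; never eval() untrusted output."""
    try:
        action = ast.literal_eval(raw.strip())
        if isinstance(action, dict) and "intent" in action:
            return action
    except (ValueError, SyntaxError):
        pass
    # Fall back to a clarification turn when the reply is malformed
    return {"intent": "clarify_request"}

print(parse_router_output("{'intent': 'analyze_voice', 'session_type': 'warmup'}"))
print(parse_router_output("sure, let me help!"))  # malformed → fallback intent
```

`ast.literal_eval` only accepts Python literals, so a model reply containing arbitrary code raises instead of executing.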

Vocal Analysis Pipeline

The voice analysis system combines multiple AI techniques:

  1. Real-time Processing: Web Audio API captures and processes audio in real-time
  2. Feature Extraction: Advanced algorithms extract vocal characteristics (pitch, formants, etc.)
  3. AI Classification: Machine learning models classify voice type and detect patterns
  4. Contextual Analysis: Results are interpreted within the user's vocal development context
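
Steps 2 and 3 can be illustrated with the standard local definitions of jitter (cycle-to-cycle pitch-period variation) and shimmer (cycle-to-cycle amplitude variation). The input arrays below stand in for values extracted from the audio stream; this is a sketch of the metric, not the project's analyzer code:

```python
def jitter_percent(periods):
    """Mean absolute difference between consecutive pitch periods,
    relative to the mean period (local jitter, in percent)."""
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods) / len(periods)
    return 100.0 * mean_diff / mean_period

def shimmer_percent(amplitudes):
    """Same definition applied to per-cycle peak amplitudes
    (local shimmer, in percent)."""
    diffs = [abs(a - b) for a, b in zip(amplitudes, amplitudes[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_amp = sum(amplitudes) / len(amplitudes)
    return 100.0 * mean_diff / mean_amp

# Pitch periods in ms from a steady sung note (illustrative numbers)
print(round(jitter_percent([5.0, 5.1, 4.9, 5.0, 5.05]), 2))
```

Healthy sustained phonation typically shows low jitter and shimmer; elevated values flag instability the coach can target with specific exercises.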

Dual-AI Architecture

Proactive Fetch.ai Agent

class VocalCoachAgent:
    """Autonomous agent for vocal analysis and report generation"""

    async def generate_daily_reports(self):
        """Automatically analyze practice sessions and generate insights"""
        users = await self.get_active_users()
        for user_id in users:
            # Analyze vocal progress
            report = await self.analyze_vocal_progress(user_id)
            # Store insights for Letta conversations
            await self.store_insights(user_id, report)

Reactive Letta Conversational Agent

class LettaVocalCoach:
    """Stateful conversational coach with long-term memory"""

    async def generate_response(self, context, user_message):
        """Generate contextual coaching based on vocal analysis data"""
        # Retrieve user's vocal history and analysis
        vocal_context = await self.build_vocal_context(context.user_id)

        # Generate personalized coaching response
        # Generate personalized coaching response
        response = await self.letta_client.agents.messages.create(
            agent_id=self.agent_id,
            messages=[{
                "role": "user",
                "content": f"Vocal Context: {vocal_context}\nUser: {user_message}"
            }]
        )
        return response

Session Management and Memory

VocalAIAgent maintains comprehensive session state and user memory:

async def start_vocal_session():
    session_memory = {
        'vocal_profile': {
            'voice_type': user.voice_type,
            'skill_level': user.skill_level,
            'practice_goals': user.goals,
            'vocal_range': user.range_analysis
        },
        'practice_history': [],
        'current_focus_areas': [],
        'last_analysis_results': {}
    }

    # Main coaching loop with state management
    session_active = True
    while session_active:
        user_input = await get_user_input()
        action = interpret_vocal_request(user_input, session_memory)

        if action['intent'] == 'analyze_voice':
            await handle_voice_analysis(action, session_memory)
        elif action['intent'] == 'start_lesson':
            await handle_lesson_start(action, session_memory)
        # ... additional handlers

Current Capabilities Demonstrated

Voice Analysis & Processing

  • Real-time pitch detection with Web Audio API
  • Advanced vocal metrics (jitter, shimmer, vibrato)
  • Voice type classification and range analysis
  • Session recording and playback capabilities

AI-Powered Coaching

  • Fetch.ai autonomous agents for progress analysis
  • Letta conversational AI with stateful memory
  • VAPI real-time voice conversations
  • Personalized lesson and exercise generation

Data Management & Persistence

  • Comprehensive user vocal profiles
  • Session history and progress tracking
  • Lesson feedback storage and retrieval
  • Secure multi-user data isolation

User Experience Features

  • Modern, responsive React interface
  • Real-time visual feedback during voice sessions
  • Progress dashboards and analytics
  • Community features and challenges

Current Errors and Solutions

Issues Identified:

  1. URL Construction Error: Double slash in API endpoints causing malformed URLs
  2. Database Connection Issues: Lesson feedback storage failing due to Supabase credential problems
  3. Error Handling: Generic error messages making debugging difficult

Solutions Implemented:

  1. Fixed URL Construction: Added trailing slash removal in frontend API calls
  2. Enhanced Error Logging: Improved backend error reporting with detailed messages
  3. Database Health Checks: Added endpoints to verify service connectivity
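
The double-slash fix boils down to normalizing the seam between the API base URL and the request path. The frontend applies this in its fetch helper; the same normalization sketched in Python:

```python
def join_api_url(base, path):
    """Join a base URL and a path without producing '//' at the seam."""
    return base.rstrip("/") + "/" + path.lstrip("/")

print(join_api_url("https://api.example.com/", "/v1/analyze"))
# → https://api.example.com/v1/analyze
```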

Limitations & Future Work

Current Limitations:

  • Voice Processing Accuracy: Browser-based analysis has limitations compared to specialized hardware
  • AI Model Training: Limited training data for vocal coaching specific AI models
  • Scalability: Current architecture needs optimization for large-scale deployment

Future Enhancements:

High Priority:

  1. Enhanced Voice Processing: Integrate professional-grade voice analysis libraries
  2. Advanced AI Models: Fine-tune models specifically for vocal coaching contexts
  3. Mobile Applications: Native iOS/Android apps with enhanced voice processing

Medium Priority:

  1. Social Features: Enhanced community aspects with vocal challenges and peer learning
  2. Integration Ecosystem: Connect with music learning platforms and DAWs
  3. Offline Capabilities: Voice analysis and basic coaching without internet connection

Built With

Core Technologies

  • React 18 with TypeScript for modern, type-safe frontend development
  • FastAPI for high-performance Python backend with automatic API documentation
  • Supabase for PostgreSQL database, authentication, and real-time features
  • Tailwind CSS for responsive, utility-first styling

AI & Voice Technologies

  • Letta for stateful conversational AI with long-term memory
  • Fetch.ai for autonomous agent systems and proactive analysis
  • VAPI for real-time voice AI conversations
  • Web Audio API for browser-based voice processing

DevOps & Deployment

  • Docker for containerized backend deployment
  • Google Cloud Run for scalable, serverless backend hosting
  • Netlify for frontend hosting with automatic deployments

VocalAIAgent demonstrates the transformative potential of AI in music education, combining cutting-edge voice processing, conversational AI, and personalized coaching to create a comprehensive vocal training platform.
