ConvoCoach

Inspiration

For over 70 million people globally, speech disorders turn communication—a fundamental human connection—into a source of anxiety. Stuttering, articulation challenges, and social anxiety can erode confidence, limit opportunities, and create isolation. While traditional speech therapy plays a crucial role, it remains out of reach for many due to cost, geographic barriers, and social stigma, leaving a gap that technology has the potential to bridge.

The idea for ConvoCoach emerged from a personal experience. One of our team members was once criticized in school for overusing filler words like “um,” a small comment that carried lasting emotional weight. This moment revealed a universal truth: even minor speech differences can fuel self-doubt. Many people hold back from speaking, not because they lack the words, but because they fear judgment. Over time, this fear discourages practice, leading to a cycle of avoidance that reinforces insecurities.

Speech challenges are more than technical obstacles; they are deeply human struggles. A person might remain silent in a meeting to avoid stuttering or rehearse conversations endlessly due to anxiety. ConvoCoach was created to break this cycle. By providing real-time analysis of speech patterns and filler words, it empowers users to practice and improve in a private, judgment-free space. More than just a tool, it’s a step toward making speech a source of confidence rather than anxiety, so that everyone has the opportunity to be heard.

What it does

ConvoCoach helps users reduce filler words in their speech by providing real-time AI-powered feedback. When a user records themselves speaking, the app transcribes their speech and highlights unnecessary fillers like “um,” “like,” and “uh.” An AI avatar then responds with personalized feedback on areas for improvement, such as pausing before complex phrases or cutting filler words for clearer communication. By offering instant, interactive feedback in a judgment-free environment, ConvoCoach helps users refine their speech and build confidence in their communication skills.
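The highlighting step described above can be sketched in a few lines of Python. This is a minimal illustration, not our production code: the function name `highlight_fillers`, the bracket markup, and the three-word filler list are assumptions chosen for the example.

```python
import re

# Assumed filler list: only the three fillers named in this write-up.
FILLERS = ("um", "uh", "like")

# Whole-word, case-insensitive match so "umbrella" is not flagged.
_FILLER_RE = re.compile(r"\b(" + "|".join(FILLERS) + r")\b", re.IGNORECASE)


def highlight_fillers(transcript: str) -> tuple[str, dict[str, int]]:
    """Wrap each filler word in [brackets] and count occurrences per filler."""
    counts = {filler: 0 for filler in FILLERS}

    def mark(match: re.Match) -> str:
        counts[match.group(1).lower()] += 1
        return f"[{match.group(1)}]"

    return _FILLER_RE.sub(mark, transcript), counts


marked, counts = highlight_fillers("Um, I think, like, we should uh go.")
print(marked)
print(counts)
```

A plain word match like this also flags legitimate uses of “like” (as a verb, for example), so a real system would need context from the language model to tell the two apart.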

How we built it

ConvoCoach is an AI-powered conversational trainer that helps users improve their speaking skills in real time. By leveraging advanced AI technologies, it listens to speech, transcribes it, analyzes patterns, and provides personalized coaching through a talking AI avatar. The system enhances fluency by identifying issues like filler words, pauses, stuttering, and pacing irregularities, offering actionable feedback in a natural and interactive way.

To process speech, ConvoCoach uses AssemblyAI for transcription and the OpenAI API (GPT-4) for speech analysis, detecting patterns such as overuse of filler words, extended pauses, and inconsistent pacing. Based on this, it generates concise, constructive feedback with practical suggestions like “Try slowing down a bit” or “Reduce filler words like ‘umm’ and ‘like.’” The feedback is then converted into speech by D-ID, which syncs it with a talking avatar to create a more engaging and immersive coaching experience.
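As a rough sketch of the analysis stage, the snippet below computes pacing and long pauses from word-level timestamps and folds them into a prompt for the language model. It assumes the transcription step yields a start and end time in milliseconds for each word. The `Word` class, both function names, the one-second pause threshold, and the prompt wording are all hypothetical, and the actual calls to the transcription, language model, and avatar services are left out.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One transcribed word with assumed millisecond timestamps."""
    text: str
    start_ms: int
    end_ms: int


def speech_metrics(words: list[Word], pause_threshold_ms: int = 1000) -> dict:
    """Return words per minute and the number of gaps at or above the threshold."""
    if not words:
        return {"words_per_minute": 0.0, "long_pauses": 0}
    duration_ms = words[-1].end_ms - words[0].start_ms
    wpm = len(words) * 60000 / duration_ms if duration_ms > 0 else 0.0
    long_pauses = sum(
        1
        for prev, nxt in zip(words, words[1:])
        if nxt.start_ms - prev.end_ms >= pause_threshold_ms
    )
    return {"words_per_minute": round(wpm, 1), "long_pauses": long_pauses}


def build_feedback_prompt(transcript: str, metrics: dict) -> str:
    """Assemble a coaching prompt; the wording here is illustrative only."""
    return (
        "You are a supportive speech coach. Give two short, constructive tips.\n"
        f"Transcript: {transcript}\n"
        f"Pace: {metrics['words_per_minute']} words per minute\n"
        f"Long pauses: {metrics['long_pauses']}"
    )


words = [Word("so", 0, 500), Word("um", 2000, 2500), Word("hello", 2600, 3000)]
metrics = speech_metrics(words)
print(build_feedback_prompt("so um hello", metrics))
```

Computing simple metrics locally keeps the prompt short and gives the model concrete numbers to comment on, rather than asking it to infer pacing from text alone.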

By integrating AssemblyAI for transcription, GPT-4 for analysis, D-ID for avatar interaction, and Flask for backend processing, ConvoCoach delivers a seamless, real-time speech training tool. This AI-powered system makes conversational coaching more accessible and effective, transforming speech practice into an interactive, judgment-free experience.

Challenges we ran into

Integrating a Flask API for speech processing while maintaining low latency was a major challenge. Speech transcription and real-time sentiment analysis required optimizing backend processing to ensure responses felt instant and natural. Balancing accuracy with speed proved difficult, as deeper speech analysis introduced delays that could disrupt user experience.

Developing a real-time AI avatar that delivers personalized feedback added another layer of complexity. Generating dynamic, context-aware responses required fine-tuning sentiment analysis models and ensuring that the AI’s tone and pacing adapted to user input. Creating a seamless connection between the avatar and speech analysis while avoiding response lag was a significant hurdle.

Synchronizing the full-stack architecture between Flask and React required careful API optimization. Managing user speech data efficiently, ensuring smooth frontend-to-backend communication, and refining feedback delivery were key technical challenges. Through rigorous testing and iteration, we built a responsive, interactive system that delivers instant, AI-driven speech feedback to help users refine their communication skills.

Accomplishments that we're proud of

As newcomers to the hackathon scene, we are incredibly proud of building a fully functional project within such a short timeframe. Stepping outside our comfort zones, we embraced new technologies and tackled the challenges of creating a full-stack AI-powered speech coach from scratch. Beyond the technical achievements, we’re most proud that ConvoCoach can have real-world impact as an accessible tool that helps people improve their communication skills. We also think our Kenna avatar is pretty cool.

What we learned

Through the development of ConvoCoach, we gained valuable experience in integrating multiple AI technologies into a seamless, real-time system. We deepened our understanding of machine learning for speech analysis, learning how to process live audio input, extract meaningful insights, and generate feedback in a way that feels natural and intuitive. Working with AssemblyAI, GPT-4, and D-ID, we navigated the challenges of interfacing different AI models, ensuring smooth synchronization between transcription, analysis, and avatar-based feedback to create an engaging user experience.

Building ConvoCoach also pushed us to refine our skills in real-time data processing, minimizing latency while maintaining accuracy. We optimized API calls, synchronized AI outputs, and streamlined backend operations to ensure smooth performance. Beyond the technical aspects, this project reinforced the social and educational value of AI-powered speech coaching, highlighting its potential to help students, professionals, and individuals with speech difficulties improve their communication skills. This experience has shown us how AI can be leveraged not just for automation, but for meaningful, human-centered applications that empower users to communicate more effectively.

What's next for ConvoCoach

The development of ConvoCoach has opened up exciting possibilities for expanding AI-driven speech training. Moving forward, we aim to enhance the system by introducing gamification features, allowing users to earn experience points, level up, and complete speech challenges that encourage continuous improvement. By incorporating progress tracking, users will be able to monitor their speech development over time, fostering a sense of motivation and achievement as they refine their communication skills.
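To make the idea concrete, here is one possible way progress tracking and experience points could fit together. Nothing below is built yet: the function names, the per-100-words filler rate, and the point values are placeholders for illustration.

```python
def filler_rate(filler_count: int, word_count: int) -> float:
    """Fillers per 100 spoken words; 0.0 for an empty session."""
    return 100.0 * filler_count / word_count if word_count else 0.0


def award_xp(previous_rate: float, current_rate: float,
             base_xp: int = 10, bonus_xp: int = 5) -> int:
    """Base XP for completing a session, plus a bonus if the filler rate fell."""
    return base_xp + (bonus_xp if current_rate < previous_rate else 0)


last_session = filler_rate(8, 100)   # hypothetical earlier session
this_session = filler_rate(6, 120)   # hypothetical current session
print(award_xp(last_session, this_session))
```

Rewarding every completed session, with only a small bonus for improvement, is meant to encourage regular practice without penalizing an off day.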

Beyond individual speech coaching, we see potential for ConvoCoach to be applied in education and classroom engagement. A possible extension of the platform would be an AI-powered substitute teacher that allows educators to upload lesson plans, which are then delivered interactively through AI avatars. Students would join as avatars, engage in discussions through voice or chat, and interact with AI-driven prompts in real time. By integrating with platforms like Pear Deck, this system could facilitate structured class discussions and quizzes, making learning more engaging and accessible.

The broader vision for ConvoCoach is to expand its impact on accessibility and human-centric AI solutions. The platform has the potential to assist individuals with speech difficulties, improve public speaking skills, and provide an inclusive learning environment for those who may struggle with verbal communication. As AI-driven speech coaching continues to evolve, we are committed to refining ConvoCoach into a scalable, adaptive, and widely accessible tool that empowers users to communicate with confidence in any setting.