Hackathon Project: SpeakEasy

Inspiration

We have friends with special needs who struggle with communication, especially in understanding how to respond in specific scenarios. Witnessing these challenges firsthand inspired us to create a tool that provides real-time, actionable feedback to help improve their communication skills. We wanted to make a difference by leveraging technology to support effective communication and build confidence.

What it does

SpeakEasy is an innovative platform designed to enhance communication abilities through real-time feedback on tone, grammar, pronunciation, and facial expressions. Users can record and upload videos, which are then analyzed to provide detailed, sentence-by-sentence feedback, helping them refine their speaking skills and build confidence.

How we built it

  • Frontend: Built with Next.js, allowing users to easily record and upload videos directly from the web interface.
  • Backend: Implemented with FastAPI, handling video processing, feedback generation, and communication with AI models.
  • Whisper: Used for transcribing audio with word-level timestamps for precise feedback.
  • Pydub: Utilized to split audio into manageable segments for detailed analysis.
  • Hume AI: Analyzes prosody (tone) and facial expressions in video clips to provide comprehensive feedback.
  • OpenAI GPT-4: Generates detailed feedback on tone, grammar, pronunciation, and facial expressions based on the analyzed data.
  • MoviePy: Processes video files to extract and split clips corresponding to individual sentences.
  • FFmpeg: Used for real-time audio and video processing, ensuring accurate synchronization and seamless analysis.
  • Postman: Utilized for API testing to ensure seamless communication between frontend and backend components.
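The core of the pipeline above — Whisper word-level timestamps grouped into sentences, then MoviePy cutting one clip per sentence — can be sketched as follows. The model size, file names, and punctuation heuristic are illustrative assumptions, not the project's exact code (and the `moviepy.editor` import path assumes MoviePy 1.x).

```python
# Sketch: group Whisper word timestamps into sentences, then cut per-sentence
# clips with MoviePy. Assumed names throughout; adapt to your own pipeline.

def group_words_into_sentences(segments):
    """Group Whisper word entries into sentence-sized lists, splitting on
    terminal punctuation."""
    sentences, current = [], []
    for segment in segments:
        for word in segment.get("words", []):
            current.append(word)
            if word["word"].strip().endswith((".", "?", "!")):
                sentences.append(current)
                current = []
    if current:  # trailing words with no terminal punctuation
        sentences.append(current)
    return sentences


if __name__ == "__main__":
    import whisper                             # pip install openai-whisper
    from moviepy.editor import VideoFileClip   # MoviePy 1.x import path

    model = whisper.load_model("base")
    # word_timestamps=True asks Whisper for per-word start/end times
    result = model.transcribe("upload.mp4", word_timestamps=True)

    video = VideoFileClip("upload.mp4")
    for i, words in enumerate(group_words_into_sentences(result["segments"])):
        # Cut the span from the sentence's first word to its last word
        clip = video.subclip(words[0]["start"], words[-1]["end"])
        clip.write_videofile(f"sentence_{i}.mp4", logger=None)
```

Each resulting `sentence_{i}.mp4` clip can then be sent to Hume AI and GPT-4 for per-sentence feedback.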

Challenges we ran into

  1. Rendering and Splitting Clips: Ensuring clean, accurate rendering of video clips without overlaps or errors was challenging; iterating on our splitting code with reliable, well-tested samples helped us achieve precise per-sentence analysis.
  2. Syncing Audio and Video: Initially, syncing text-to-speech audio with video clips was problematic. FFmpeg was instrumental in stitching the audio and video accurately.
  3. Fine-Tuning AI Models: Prompt engineering and tuning our generative AI models to provide precise and useful feedback required significant effort.
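The FFmpeg stitching mentioned in challenge 2 amounts to muxing a generated audio track onto a video clip. A minimal sketch, assuming hypothetical file names and a typical flag set (the project's actual command may differ):

```python
# Sketch: build and run an FFmpeg command that pairs a clip's video stream
# with a replacement audio track (e.g. text-to-speech output).
import subprocess


def build_mux_command(video_path, audio_path, out_path):
    """Return an FFmpeg argv list that swaps in the given audio track."""
    return [
        "ffmpeg", "-y",
        "-i", video_path,   # input 0: the video clip
        "-i", audio_path,   # input 1: the generated audio
        "-map", "0:v:0",    # keep the video stream from input 0
        "-map", "1:a:0",    # take the audio stream from input 1
        "-c:v", "copy",     # copy video without re-encoding
        "-shortest",        # stop at the shorter stream so A/V stay aligned
        out_path,
    ]


def mux(video_path, audio_path, out_path):
    subprocess.run(build_mux_command(video_path, audio_path, out_path),
                   check=True)
```

Using `-c:v copy` keeps the stitch fast (no video re-encode), while `-shortest` avoids drift when the TTS audio and the clip differ slightly in length.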

Accomplishments that we're proud of

  • Successfully integrating multiple AI models to achieve smooth and accurate feedback.
  • Implementing an automated process where AI models correct their own errors, enhancing feedback reliability.
  • Creating an interactive and user-friendly frontend that allows users to easily navigate and engage with the feedback.

What we learned

  • Advanced prompt engineering techniques for effective AI communication.
  • Video and audio editing using command-line tools like FFmpeg.
  • How to integrate and orchestrate multiple AI models to work seamlessly together.
  • Developing a user-centric interface that enhances the learning experience.

What's next for SpeakEasy

  • Scenario-Based Training: Implementing features that ask users to respond to various scenarios, providing context-specific feedback to improve their communication in different situations.
  • Multilingual Support: Allowing users to choose their preferred language for feedback.
  • Subtitles and Accessibility: Implementing subtitles for feedback to increase accessibility.
  • Expanded Capabilities: Extending the platform to cover more disciplines and providing additional educational resources.
  • Optimized Backend: Continuously optimizing backend requests and improving error handling for a smoother user experience.

Join us in making communication accessible and effective for everyone!
