AcousticPR

Inspiration

Effective communication is a critical life skill, one that influences everything from relationships to professional opportunities. Yet many people struggle with speech anxiety, filler words, stuttering, and unclear emotional expression. We were inspired to build AcousticPR after seeing friends and colleagues with brilliant ideas struggle to express themselves due to poor communication habits.

AcousticPR offers a data-driven practice environment to help users break unproductive habits, build confidence, and strengthen their public speaking skills. By acting like a personal speech therapist, it not only improves communication but also supports your mental well-being, both personal and social. Your voice is your life. Take control of it today.

What it does

AcousticPR is a personalized speech analytics and coaching platform. It allows users to upload voice recordings, then analyzes their communication across several dimensions:

Transcription & Word Usage
Emotion Analysis
Pitch Variation & Pace
Stuttering & Filler Word Detection
Top Word Frequency & Repetition

Using these metrics, it delivers actionable feedback, highlighting emotional tone, disfluencies, pacing, and more. Users receive tailored LLM-driven insights and visual graphs to guide improvement, making communication practice measurable and motivating.

How we built it

We used a full-stack architecture focused on performance, usability, and advanced AI/ML:

Frontend: React + Tailwind CSS for a clean, responsive, and gamified UI
Backend: Flask + Socket.IO to support real-time feedback and REST APIs
File Handling: .webm audio + wave for efficient client-to-server transfer (even for long recordings)

Speech Processing Pipeline

Transcription: Utilized OpenAI’s Whisper API for high-accuracy, punctuated speech-to-text.

Emotion Analysis:

Used SpeechBrain’s emotion-diarization-wavlm-large model, trained on diverse datasets (IEMOCAP, MELD, MSP-IMPROV, MOSEI, RAVDESS), for detecting emotional tone in voice. We quantized the model for faster inference and ran it with GPU parallelism to support near real-time feedback.

Audio Feature Extraction:

Pitch analysis via Librosa
Speech rate computed from total syllables and duration
Disfluencies (stuttering, filler words, etc.) detected with custom regex-based NLP pipelines
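The speech-rate and disfluency steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: the filler list, the vowel-group syllable heuristic, and the function names (`count_syllables`, `speech_rate`, `find_disfluencies`) are all assumptions made for the example.

```python
import re

# Hypothetical filler list; the project's real patterns are not given.
# Note that "like" will also match legitimate uses of the word.
FILLER_PATTERN = re.compile(r"\b(?:um+|uh+|er+|like|you know|i mean)\b", re.IGNORECASE)
# An immediately repeated word ("I I think") as a rough stutter signal.
REPEAT_PATTERN = re.compile(r"\b(\w+)(?:\s+\1\b)+", re.IGNORECASE)
VOWEL_GROUPS = re.compile(r"[aeiouy]+")


def count_syllables(word: str) -> int:
    """Approximate syllables as vowel groups, dropping a trailing silent 'e'."""
    word = re.sub(r"[^a-z]", "", word.lower())
    if not word:
        return 0
    if len(word) > 2 and word.endswith("e") and not word.endswith("le"):
        word = word[:-1]
    return max(1, len(VOWEL_GROUPS.findall(word)))


def speech_rate(transcript: str, duration_seconds: float) -> float:
    """Total syllables divided by duration, in syllables per second."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    total = sum(count_syllables(w) for w in transcript.split())
    return total / duration_seconds


def find_disfluencies(transcript: str) -> dict:
    """Return filler words and repeated words in order of appearance."""
    fillers = [m.group(0).lower() for m in FILLER_PATTERN.finditer(transcript)]
    repeats = [m.group(1).lower() for m in REPEAT_PATTERN.finditer(transcript)]
    return {"fillers": fillers, "repeats": repeats}
```

For example, `find_disfluencies("Um, I I think we should, uh, you know, start now.")` reports three fillers and one repeated word. A regex approach like this works on the transcript only, so it cannot catch disfluencies that the transcription step silently drops.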

Insight Generation:

Used GPT-4o mini via the OpenAI API to turn raw data into qualitative, structured feedback
Used LlamaIndex with Pydantic to organize metrics and generate repeatable, prompt-engineered insights
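One way to picture the "organize metrics, then prompt" step is the sketch below. It deliberately uses the standard library's `dataclasses` as a stand-in for the Pydantic models mentioned above, and it does not call LlamaIndex or the OpenAI API. The field names, the `SpeechMetrics` class, and the prompt wording are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SpeechMetrics:
    """Hypothetical container for the numbers the pipeline produces."""
    words_per_minute: float
    filler_count: int
    pitch_std_hz: float
    dominant_emotion: str
    top_words: list = field(default_factory=list)

    def __post_init__(self):
        # Basic validation; Pydantic would express this with field constraints.
        if self.words_per_minute < 0 or self.filler_count < 0:
            raise ValueError("rates and counts must be non-negative")


def build_prompt(metrics: SpeechMetrics) -> str:
    """Serialize metrics deterministically so the same input yields the same prompt."""
    payload = json.dumps(asdict(metrics), indent=2, sort_keys=True)
    return (
        "You are a speech coach. Using only the metrics below, give three "
        "specific, actionable suggestions.\n\nMetrics:\n" + payload
    )
```

Keeping the metrics in a validated, typed structure and serializing them with sorted keys is one plausible way to make the LLM input repeatable across runs.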

Visualization:

Integrated Chart.js to plot user metrics and progress in a visually engaging format

Challenges we ran into

Latency in heavy ML models:

Emotion detection and transcription are both computationally intensive. To resolve this, we parallelized processing tasks using thread pools and ran all ML inference on GPUs.
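The thread-pool approach might look something like the sketch below. The three worker functions are placeholders that return dummy values. In the real system they would wrap the transcription call, the emotion model, and the pitch extraction. All function names here are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def transcribe(audio_path: str) -> str:
    # Placeholder for the speech-to-text call.
    return f"transcript of {audio_path}"


def detect_emotions(audio_path: str) -> list:
    # Placeholder for the emotion model's inference.
    return [{"emotion": "neutral", "start": 0.0, "end": 1.0}]


def extract_pitch(audio_path: str) -> dict:
    # Placeholder for pitch feature extraction.
    return {"mean_hz": 0.0}


def analyze(audio_path: str) -> dict:
    """Run the independent analysis steps concurrently and collect results by name."""
    tasks = {
        "transcript": transcribe,
        "emotions": detect_emotions,
        "pitch": extract_pitch,
    }
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = {name: pool.submit(fn, audio_path) for name, fn in tasks.items()}
        # result() blocks until each task finishes and re-raises any exception.
        return {name: fut.result() for name, fut in futures.items()}
```

Threads tend to suit this kind of workload because the steps are independent and spend most of their time waiting on network calls or on native code rather than running Python bytecode.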

Unstructured-to-structured NLP:

Turning audio into truly helpful feedback (not just raw stats) required careful prompt engineering, Pydantic data modeling, and experimentation with LLMs.

Handling large audio uploads:

We made multi-minute uploads fast and reliable by using .webm compression and efficient upload handling in Flask.

Consistency in Feedback:

We needed to balance personalized advice with repeatable metrics, which required multiple design iterations on our LLM + LlamaIndex pipeline.

Accomplishments that we're proud of

Achieving near real-time feedback on 5+ minute audio files with GPU-accelerated, parallelized processing
Training and deploying a custom emotion analysis model from multiple datasets
Seamlessly integrating LLM insights into our app to deliver genuinely actionable advice
Designing a gamified UI that makes practicing communication feel fun

What we learned

Speech is a complex signal, both acoustically and emotionally. Analyzing it meaningfully requires a blend of audio processing, NLP, and creativity.
Optimization matters: parallel processing, quantization, and efficient file handling were key to creating a fast, smooth UX.
LLMs are powerful, but only when guided by strong structure and quality prompts.

What's next for AcousticPR

Live Feedback Mode: Enable users to receive analysis and coaching in real time as they speak.
Camera-based face and posture tracking for real-time visual emotion indicators

AcousticPR is more than just a speech analyzer. It’s a personal communication coach designed to help you unlock confident, clear expression while improving your mental well-being through self-awareness and growth.