Project Story
Inspiration
I used to have a hard time speaking with others. I always felt insecure about how well I sounded—was I saying the right thing, was I making sense, was I embarrassing myself?
Things got a bit better when I met my teammate. Turns out, he struggled just as much—if not more. Weirdly, that made it easier. If we both sucked, at least we weren’t alone.
At some point we decided we actually wanted to do something about it instead of just joking about it. While looking around, we found these Discord channels where people hop in and practice speaking with each other. The idea was great—real people, real conversations.
But they lacked flexibility and personalization. You couldn’t really practice for your specific situation or get feedback tailored to what you needed.
That's when we got excited about building something with multiple "rooms" - like having different practice spaces for different real-life scenarios.
What it does
BreakThrough is basically like having a really patient friend who happens to be a speech coach, available 24/7, and never gets tired of listening to you stumble through presentations.
Here's what makes it fun:
- It listens to you talk and gives you real-time feedback (but in a nice way, not like that one professor we all had)
- Uses AI to coach you, but the AI actually has personality and doesn't sound like a robot
- You can practice with friends or fly solo - whatever floats your boat
- It transcribes everything so you can see exactly where you said "um" 47 times
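The "um" counting is honestly the easy part once you have a transcript. A toy sketch of the idea (the function name and filler list are ours, not the app's actual code):

```python
import re
from collections import Counter

# Illustrative filler list, not the app's real vocabulary.
FILLERS = {"um", "uh", "er", "like"}

def count_fillers(transcript: str) -> Counter:
    """Count filler words in a transcript, case-insensitively."""
    words = re.findall(r"[a-z']+", transcript.lower())
    return Counter(w for w in words if w in FILLERS)
```

The harder part is doing this live on a streaming transcript instead of a finished one, which is where the real-time plumbing below comes in.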
We built 7 different "rooms" because honestly, practicing for a job interview while pretending you're giving a TED talk just doesn't work:
- JAM Session - For when you need to think on your feet (like when someone asks "tell us about yourself" and your brain goes blank)
- Debate Arena - Argue with AI and friends without ruining friendships
- Group Discussion - Practice not being that person who either never talks or never stops talking
- Reading Practice - Because sometimes you just need to read stuff out loud without sounding like a robot
- Interview Room - Mock interviews that won't judge you for wearing pajama pants (we can't see you anyway)
- Business Talks - Learn to sound professional without using "synergy" every other sentence
- Social Confidence - Practice small talk so you never have to discuss the weather again (unless you want to)
Each room gives you feedback that's actually useful, not just "speak louder" or "use more gestures."
How we built it
Okay, so building this thing was like assembling IKEA furniture, but the instructions were in 5 different programming languages and we kept losing the screws.
We went with the "let's use all the cool tech" approach:
- Frontend: React and TypeScript because we like our code to yell at us when we mess up
- Backend: Python and FastAPI because life's too short for slow APIs
- Database: MongoDB because sometimes you just need to throw data at something and hope it sticks
- AI: Google Gemini for the smart coaching bits (it's surprisingly good at not being condescending)
- Audio stuff: A bunch of different tools that somehow work together - Whisper for turning speech into text, and librosa for analyzing how you actually sound
The whole thing started as a simple "what if we could practice speaking to an AI?" idea. Then we got carried away and added multiplayer features because everything's better with friends, right?
Building the real-time audio stuff was... interesting. Turns out browsers really don't like it when you try to stream audio while also analyzing it while also transcribing it. Who knew? We spent way too many late nights figuring out why Chrome would work perfectly but Safari would just give us the digital equivalent of a shrug.
The UI went through about 47 different versions. We finally landed on something that doesn't make you scroll forever on your phone, which honestly felt like a bigger win than getting the AI to work.
Challenges we ran into
Oh boy, where do I start?
The Great Audio Latency Battle of 2024: Turns out, when you're trying to give someone real-time feedback on their speech, "real-time" actually matters. Who would've thought? We spent weeks trying to figure out why there was a 3-second delay between someone talking and the app responding. Spoiler alert: it was everything. The audio processing, the AI thinking time, the network, probably the phase of the moon too.
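When the answer is "it was everything," the only way forward is timing each stage separately so you know which "everything" to fix first. A minimal sketch of how we'd instrument that (the stage names are hypothetical, and the sleeps stand in for the real Whisper and Gemini calls):

```python
import time
from contextlib import contextmanager

timings: dict[str, float] = {}

@contextmanager
def timed(stage: str):
    """Record wall-clock time for one pipeline stage."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = time.perf_counter() - start

# Wrap each stage of the pipeline to see where the 3 seconds went
with timed("transcribe"):
    time.sleep(0.01)  # stand-in for the speech-to-text call
with timed("coach"):
    time.sleep(0.01)  # stand-in for the AI feedback call

worst = max(timings, key=timings.get)
```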
Browser Wars: Each browser decided to handle audio differently, just to keep things spicy. Chrome was like "sure, here's your audio data!" while Safari was more like "audio? never heard of it." Firefox was somewhere in the middle, being Firefox.
The Mobile Nightmare: Making this work on phones was... an adventure. Turns out people want to practice speaking on their phones (crazy, right?), but mobile browsers have their own special relationship with audio permissions. We probably asked users to allow microphone access about 73 different ways before finding one that worked consistently.
AI Personality Crisis: Getting the AI to give helpful feedback without sounding like a robot or, worse, like that overly enthusiastic teacher everyone had was trickier than expected. We went through phases where it was either too harsh ("your speech patterns indicate severe deficiencies") or too nice ("wow, you said words! amazing!").
The Resume Upload Saga: For the interview room, we wanted people to upload their resumes. Simple, right? Wrong. PDFs are apparently the digital equivalent of a box of chocolates - you never know what you're gonna get. Some PDFs are text, some are images, some are apparently just chaos in digital form.
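The practical fix is detecting which kind of chocolate you got: run text extraction first (e.g. pypdf's `page.extract_text()`), and if it comes back nearly empty on most pages, the PDF is probably image-only and needs OCR instead. A sketch of that heuristic (the function and threshold are ours):

```python
def looks_scanned(page_texts: list, min_chars: int = 40) -> bool:
    """Heuristic: if text extraction yields almost nothing on most pages,
    the PDF is probably a scan (image-only) and needs an OCR fallback."""
    if not page_texts:
        return True
    sparse = sum(1 for t in page_texts if len((t or "").strip()) < min_chars)
    return sparse / len(page_texts) > 0.5
```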
Multiple People Talking At Once: Turns out when you let multiple people in the same room, they sometimes talk at the same time. Revolutionary discovery, I know. Managing who's speaking when, without cutting people off mid-sentence, required some creative problem-solving and a lot of coffee.
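Under the hood this is a floor-control problem: one person holds the "speaking floor," everyone else queues instead of getting cut off. A toy first-come-first-served version (the class and policy are ours, simplified from what a real room needs):

```python
from collections import deque

class SpeakingFloor:
    """First-come-first-served floor control for a practice room.
    The current speaker keeps the floor until they release it;
    everyone else waits in line instead of being cut off."""

    def __init__(self):
        self.current = None
        self.queue = deque()

    def request(self, user):
        """Return True if `user` gets the floor immediately."""
        if self.current is None:
            self.current = user
            return True
        if user != self.current and user not in self.queue:
            self.queue.append(user)
        return False

    def release(self, user):
        """Speaker is done; hand the floor to the next person in line."""
        if user == self.current:
            self.current = self.queue.popleft() if self.queue else None
        return self.current
```

The coffee went into the part this sketch skips: deciding when someone is actually "done" talking from live audio, rather than from an explicit release.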
The Great GCP Deployment Disaster: Oh man, this deserves its own horror story. We thought "hey, let's deploy this to Google Cloud Platform, how hard could it be?" Famous last words.
First, we had to figure out how to containerize everything properly. Docker works great on your laptop, but GCP has opinions about how things should be structured. We spent days wrestling with Cloud Run, trying to get our WebSocket connections to stay alive. Turns out GCP really doesn't like long-running connections and kept killing them after 15 minutes. Nothing like having your practice session cut off mid-sentence because the cloud decided to take a nap.
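The standard workaround for idle-timeout kills is an application-level heartbeat, so the connection never looks silent (Cloud Run's request timeout is also configurable, which helps). A minimal asyncio sketch of the heartbeat loop - the names and interval are ours:

```python
import asyncio

async def keepalive(send_ping, interval=30.0, stop=None):
    """Call `send_ping` every `interval` seconds so an idle-connection
    timeout (like Cloud Run's) never sees a silent socket."""
    stop = stop or asyncio.Event()
    while not stop.is_set():
        await send_ping()
        try:
            # Wake early if the session ends, otherwise ping again.
            await asyncio.wait_for(stop.wait(), timeout=interval)
        except asyncio.TimeoutError:
            pass
```

In a FastAPI WebSocket handler, `send_ping` would send a small ping frame or message to the client, and `stop` gets set when the user leaves the room.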
Then there was the SSL certificate nightmare. We had this domain from a friend (shoutout to them for letting us use it!), but getting HTTPS working properly was like solving a puzzle where half the pieces were missing. GCP wanted us to verify domain ownership, but the domain wasn't technically ours, so we had to do this weird dance with DNS records and prayer.
The audio streaming over HTTPS was another adventure. Browsers are super picky about audio permissions over secure connections, and what worked perfectly on localhost suddenly broke when we added SSL. We discovered that Chrome treats audio differently on secure vs insecure connections, and Safari... well, Safari just does whatever Safari wants.
Database Connection Drama: MongoDB Atlas and GCP had their own little feud going on. Connection timeouts, IP whitelisting issues, and the classic "it works in development but not in production" syndrome. We probably restarted our containers about 200 times trying to figure out why the database would randomly disconnect.
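For random disconnects, the usual answer is retrying transient failures with exponential backoff around the database call (pymongo also has `retryWrites` and `serverSelectionTimeoutMS` options worth tuning). A generic sketch - this helper is ours, not pymongo's API:

```python
import time

def with_retries(op, attempts=4, base_delay=0.5, transient=(ConnectionError,)):
    """Run `op()` and retry transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return op()
        except transient:
            if attempt == attempts - 1:
                raise  # out of retries, let the caller see the failure
            time.sleep(base_delay * (2 ** attempt))
```

It beats restarting the container 200 times, at least.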
The Redis Cache Catastrophe: Redis on GCP Memory Store seemed simple enough, until we realized our WebSocket sessions weren't persisting properly across container restarts. Users would join a room, we'd deploy an update, and suddenly everyone would get kicked out. Not exactly the smooth experience we were going for.
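The fix was making room membership live in the external store rather than in process memory, so a redeploy doesn't wipe it. Here's the shape of that pattern, with a plain dict standing in for the Redis client (redis-py's `sadd`/`srem`/`smembers` would replace these calls):

```python
class RoomStore:
    """Room membership kept in an external backend so a container
    restart doesn't kick everyone out. `backend` is any set-per-key
    store; in production it would be a Redis client."""

    def __init__(self, backend=None):
        self.backend = backend if backend is not None else {}

    def join(self, room, user):
        self.backend.setdefault(f"room:{room}", set()).add(user)

    def leave(self, room, user):
        self.backend.get(f"room:{room}", set()).discard(user)

    def members(self, room):
        return set(self.backend.get(f"room:{room}", set()))
```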
Environment Variable Hell: Managing secrets across development, staging, and production environments while keeping everything secure was like juggling flaming torches. GCP Secret Manager is great in theory, but getting our containers to actually read the secrets without exposing them in logs took some creative configuration.
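The "without exposing them in logs" part is mostly discipline: read the secret once, and make sure accidentally printing it shows a mask instead of the value. A small sketch (the helper is ours; in production the value would come from Secret Manager's `access_secret_version` rather than an env var):

```python
import os

class Secret:
    """Wraps a secret so accidental logging shows a mask, not the value."""

    def __init__(self, value):
        self._value = value

    def reveal(self):
        return self._value

    def __repr__(self):
        return "Secret(****)"

def load_secret(name):
    """Read a required secret from the environment, failing loudly if absent."""
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError(f"missing required secret: {name}")
    return Secret(value)
```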
Accomplishments that we're proud of
We actually finished it! Seriously, this was not a given. There were moments when we thought we'd be stuck in audio processing hell forever.
The UI doesn't suck: After many, many iterations, we created something that actually looks good and works on your phone without making you want to throw it across the room. No endless scrolling, no tiny buttons you can't tap, no "this looks great on my 4K monitor but terrible everywhere else" situations.
7 working practice modes: Each one feels different and has its own personality. The interview AI is professional but encouraging, the debate AI is a bit feisty (in a good way), and the social confidence AI is like that supportive friend who believes in you even when you don't.
Real-time everything: When you talk, stuff happens immediately. The transcription appears as you speak, the feedback comes right away, and other people in the room can see what's happening without weird delays.
It actually helps people: This sounds obvious, but we've had people tell us they felt more confident after using it, which is honestly the best feeling ever. One person said they stopped saying "um" every third word, and another said they actually enjoyed their next job interview. That's the good stuff right there.
The AI doesn't sound like a robot: We spent a lot of time making sure the coaching feels human and helpful, not like you're talking to a customer service chatbot.
Resume analysis that works: The interview room can actually read your resume and ask relevant questions about it. No more generic "tell me about a time when..." questions that have nothing to do with your background.
What we learned
Audio is hard: Like, really hard. There's a reason most apps just do text. Audio has latency, quality issues, browser compatibility problems, and a million other things that can go wrong. But when it works, it's magic.
People want to practice in private first: We thought everyone would jump into the multiplayer rooms, but it turns out most people want to practice alone before they're ready to speak with others. Makes total sense when you think about it.
Immediate feedback beats perfect feedback: We could spend forever analyzing every aspect of someone's speech, but people just want to know "how am I doing?" right now. Quick, helpful feedback wins over comprehensive analysis every time.
Mobile-first isn't just a buzzword: We learned this the hard way. Designing for desktop first and then trying to make it work on mobile is like trying to fit a square peg in a round hole while blindfolded.
AI personality matters A LOT: The difference between "your speech contains excessive filler words" and "hey, you said 'um' a bunch - totally normal, let's work on it!" is huge. People respond way better to coaching that feels human.
Different scenarios really do need different approaches: Practicing for a job interview while the AI acts like a debate opponent just doesn't work. Context is everything.
Users will find bugs you never thought possible: Someone will always find a way to break your app in a way you never imagined. Always.
Real-time is addictive: Once people experience immediate feedback, they don't want to go back to "record, upload, wait, get feedback" workflows. Instant gratification is real.
What's next for BreakThrough
Body language analysis: Because apparently we're gluttons for punishment and want to add computer vision to our audio processing nightmare. But seriously, being able to give feedback on posture and gestures would be pretty cool.
Better analytics: Right now we tell you how you did in each session, but we want to show you how you're improving over time. Progress charts, skill tracking, maybe some fun badges because who doesn't like badges?
Custom scenarios: Let people create their own practice situations. Want to practice asking for a raise? Giving a wedding toast? Explaining to your parents why you dropped out of law school to become a professional juggler? We got you.
Team features: Companies keep asking if they can use this for training their employees. Apparently not everyone is naturally gifted at the art of corporate communication. Who knew?
More languages: English is great, but there's a whole world out there. Spanish is probably next, then we'll see where demand takes us.
VR integration: Okay, this one's a bit out there, but imagine practicing your presentation in a virtual conference room, or doing a job interview in a realistic office setting. The future is weird and we're here for it.
Offline mode: For when you want to practice but your internet is being moody. Because nothing kills the motivation to practice like a spinning loading wheel.
The dream is to make this so good that people actually look forward to practicing their communication skills. Crazy idea, right? But we think we're onto something here.
And hey, if we can help even one person feel more confident in their next presentation, job interview, or just ordering coffee without mumbling, we'll call it a win.
Built With
- docker
- elevenlabs
- eslint
- fastapi
- google-cloud-platform
- google-gemini-api
- librosa
- mongodb
- mongodb-atlas
- openai-whisper
- python
- react
- redis
- redux-toolkit
- socket.io
- typescript
- vite
- websockets