Inspiration
Scrolling Reddit threads full of creators complaining about “ums” and “ahs” sparked an idea. Market research showed a genuine pain point, and curiosity about just how far AI audio processing has come pushed me to prototype.
What it does
ClipClean ingests any audio or video file, pinpoints filler words like "um", "uh", and "ah", and regenerates the affected fragments so the speaker sounds natural and confident.
How we built it
100% vibe-coded in Bolt.new. The pipeline (sketched in code below):
- Whisper API → word-level transcript with per-word timestamps.
- GPT-4o reasons over those timestamps to decide which tokens are genuine fillers.
- ElevenLabs voice clone & TTS re-synthesises the cleaned fragments.
- Supabase Edge Functions + Postgres RLS for secure jobs, credits, and storage.
- Netlify CDN serves the final bundle at clipclean.eu.
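To make that pipeline concrete, here is a minimal TypeScript sketch of the first three stages. It is an illustrative reconstruction, not the actual Bolt.new code: the FILLER_CANDIDATES shortlist, the prompt wording, the function names, and the eleven_multilingual_v2 model choice are all assumptions, and splicing the regenerated audio back into the original track is left out.

```typescript
import fs from "node:fs";
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

type Word = { word: string; start: number; end: number };

// Hypothetical shortlist; GPT-4o makes the final call per token.
const FILLER_CANDIDATES = new Set(["um", "uh", "ah", "erm", "hmm"]);

// Step 1: word-level transcript via Whisper.
async function transcribe(path: string): Promise<Word[]> {
  const res = await openai.audio.transcriptions.create({
    file: fs.createReadStream(path),
    model: "whisper-1",
    response_format: "verbose_json",
    timestamp_granularities: ["word"],
  });
  // verbose_json responses carry per-word { word, start, end } timestamps
  return (res as { words?: Word[] }).words ?? [];
}

// Step 2: GPT-4o decides which candidate tokens are genuine disfluencies
// (e.g. "ah" as a filler vs. "ah" as a real exclamation).
async function findFillers(words: Word[]): Promise<Word[]> {
  const candidates = words.filter((w) =>
    FILLER_CANDIDATES.has(w.word.toLowerCase().replace(/[^a-z]/g, ""))
  );
  const completion = await openai.chat.completions.create({
    model: "gpt-4o",
    response_format: { type: "json_object" },
    messages: [
      {
        role: "system",
        content:
          "You will get a transcript plus candidate filler words with " +
          'timestamps. Return JSON {"fillers": [{"word", "start", "end"}]} ' +
          "containing only the tokens that are true disfluencies.",
      },
      {
        role: "user",
        content: JSON.stringify({
          transcript: words.map((w) => w.word).join(" "),
          candidates,
        }),
      },
    ],
  });
  const body = completion.choices[0].message.content ?? '{"fillers":[]}';
  return JSON.parse(body).fillers;
}

// Step 3: re-synthesise a cleaned fragment with an ElevenLabs voice clone.
async function resynthesize(text: string, voiceId: string): Promise<ArrayBuffer> {
  const res = await fetch(
    `https://api.elevenlabs.io/v1/text-to-speech/${voiceId}`,
    {
      method: "POST",
      headers: {
        "xi-api-key": process.env.ELEVENLABS_API_KEY!,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ text, model_id: "eleven_multilingual_v2" }),
    }
  );
  return res.arrayBuffer(); // audio bytes for the regenerated fragment
}
```

The cheap regex pre-filter keeps the GPT-4o prompt small: the model sees the full transcript for context but only has to rule on the shortlisted tokens, instead of hunting for candidates itself.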
Challenges we ran into
I was unfamiliar with the platform and accidentally deleted a "chat," not realizing it would remove the entire project. Without a Git backup, I had to start over from scratch. The initial version looked better, but I'm happy with the final result.
Accomplishments that we're proud of
The UI is sick. While it's not "liquid glass", it has a good vibe to it. And it's the first time I've vibe-coded an entire micro-SaaS in two days!
What we learned
AI can design when prompted correctly. Not only that: you can now build micro-SaaS apps in days without ever touching code.
What's next for ClipClean
Get it in front of users. There are competitors, Reddit threads, and general interest in these kinds of tools, so there is definitely a market. Now we just need to figure out how to reach some users and collect feedback, and we'll take it from there.
Built With
- bolt.new
- elevenlabs
- netlify
- openai
- stripe
- supabase