💡 Inspiration
- Privacy as a Utility: I believe digital privacy is a fundamental human right. It should be democratized and accessible to everyone, not just large corporations.
- Zero-Installation Web Access: I wanted to build a powerful tool that anyone can use instantly in their browser, without complex local installations or heavy app downloads.
- The Audio Privacy Gap: While text redaction is common, our voices carry large amounts of unprotected Personally Identifiable Information (PII). I wanted to answer a simple question: what if you could upload a recording or speak live and have names, phone numbers, and addresses automatically censored before they ever touch a server?
⚙️ What it does
- Real-Time Audio Masking: Listens to live microphone input, or processes uploaded audio files, and automatically detects sensitive spoken data.
- Instant PII Redaction: Identifies spoken personal identifiers such as names, phone numbers, email addresses, and physical locations, and bleeps or silences them.
- 100% Local Processing: Runs entirely in the user's browser, so audio never leaves the device and is never sent to a server.
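The bleep-or-silence step can be sketched as a transform over raw samples. This is a minimal sketch, assuming mono audio held in a `Float32Array` (the form Web Audio's `AudioBuffer.getChannelData()` returns) and PII already located as sample ranges; `applyRedaction`, `SampleRange`, and the 1 kHz tone are hypothetical choices, not the project's actual code.

```typescript
// Hypothetical type: a half-open range [start, end) of sample indices to redact.
interface SampleRange {
  start: number;
  end: number;
}

type RedactionMode = "bleep" | "silence";

/**
 * Returns a copy of `samples` in which every range is replaced by silence
 * or by a sine-wave "bleep". Ranges are clamped to the buffer bounds.
 * (Illustrative sketch; names and defaults are assumptions.)
 */
function applyRedaction(
  samples: Float32Array,
  ranges: SampleRange[],
  sampleRate: number,
  mode: RedactionMode = "bleep",
  toneHz = 1000,
  toneGain = 0.2,
): Float32Array {
  const out = new Float32Array(samples); // copy so the input stays untouched
  for (const { start, end } of ranges) {
    const s = Math.max(0, Math.floor(start));
    const e = Math.min(out.length, Math.ceil(end));
    for (let i = s; i < e; i++) {
      out[i] =
        mode === "silence"
          ? 0
          : toneGain * Math.sin((2 * Math.PI * toneHz * (i - s)) / sampleRate);
    }
  }
  return out;
}

// Usage: one second of 16 kHz audio with 0.25 s to 0.5 s replaced by a bleep.
const rate = 16000;
const demo = new Float32Array(rate).fill(0.5);
const redacted = applyRedaction(demo, [{ start: 0.25 * rate, end: 0.5 * rate }], rate);
console.log(redacted[0], redacted[4000]); // 0.5 0 (the tone starts at phase 0)
```

Returning a copy rather than mutating in place keeps the original buffer available, for example to compare before and after during testing.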
🛠️ How we built it
- The Magic of MeDo: Instead of writing complex custom algorithms from scratch or wrestling with deep-learning math, I let MeDo handle the heavy lifting. It let me orchestrate advanced operations smoothly without overthinking the underlying architecture.
- IBM Granite with WebGPU: We run IBM's efficient Granite models directly on the client's hardware, accelerated through the browser's native WebGPU support, for fast, low-latency processing.
- OpenAI Privacy Filter ONNX Model: We integrated OpenAI's Privacy Filter model, ported to ONNX, to perform fast, lightweight token classification that detects PII boundaries in a web environment.
- Web Audio API: Captures live microphone input, splits the audio into chunks, and manipulates playback in real time to produce the censored output.
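Turning token-level classifier output into redactable spans usually means decoding a BIO-style label sequence. The sketch below assumes, for illustration only, that each token arrives with a `B-`/`I-`/`O` label and character offsets into the transcript; the label names, `TokenPrediction`, and `decodeBio` are hypothetical and are not the Privacy Filter model's documented output format. Running the ONNX model itself is left out.

```typescript
// Hypothetical classifier output for one token: a BIO label plus the
// token's character offsets into the transcript.
interface TokenPrediction {
  label: string; // e.g. "B-PHONE", "I-PHONE", or "O" (assumed label scheme)
  start: number; // inclusive character offset
  end: number; // exclusive character offset
}

interface PiiSpan {
  type: string;
  start: number;
  end: number;
}

/** Merges consecutive B-/I- tokens of the same type into PII spans. */
function decodeBio(tokens: TokenPrediction[]): PiiSpan[] {
  const spans: PiiSpan[] = [];
  let current: PiiSpan | null = null;
  for (const tok of tokens) {
    const dash = tok.label.indexOf("-");
    if (tok.label === "O" || dash < 0) {
      // Outside any entity: close the open span, if there is one.
      if (current) spans.push(current);
      current = null;
      continue;
    }
    const prefix = tok.label.slice(0, dash);
    const type = tok.label.slice(dash + 1);
    if (prefix === "I" && current !== null && current.type === type) {
      current.end = tok.end; // extend the open span
    } else {
      // "B-" starts a new span; a stray "I-" is treated the same way.
      if (current) spans.push(current);
      current = { type, start: tok.start, end: tok.end };
    }
  }
  if (current) spans.push(current);
  return spans;
}

// Usage with hand-written predictions for "Call Ana at 555-0100".
const transcript = "Call Ana at 555-0100";
const spans = decodeBio([
  { label: "O", start: 0, end: 4 },
  { label: "B-PERSON", start: 5, end: 8 },
  { label: "O", start: 9, end: 11 },
  { label: "B-PHONE", start: 12, end: 15 },
  { label: "I-PHONE", start: 15, end: 20 },
]);
console.log(JSON.stringify(spans.map((s) => transcript.slice(s.start, s.end))));
// ["Ana","555-0100"]
```

Treating a stray `I-` tag as the start of a new span errs on the side of redacting more, which suits a privacy tool.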
🛑 Challenges we ran into
- In-Browser AI Compute Latency: Running heavy language models on the browser's main thread can cause severe UI lag and audio stuttering.
- Cross-Browser WebGPU Support: Navigating inconsistent browser flags and WebGPU API differences across browsers and operating systems.
- Synchronizing Audio and Text Redaction: Aligning the exact millisecond a PII word is spoken with the classified text tokens, so that the real-time "bleep" lands seamlessly.
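The synchronization problem can be framed as mapping flagged words back to sample ranges. A minimal sketch, assuming the recognizer provides word-level start and end timestamps in seconds (an assumption; the write-up does not say which component produces them): it pads each flagged word to absorb timestamp error, converts the times to sample indices, and merges overlapping ranges so adjacent PII words get one continuous bleep. `TimedWord`, `wordsToSampleRanges`, and the 50 ms padding are hypothetical.

```typescript
// Hypothetical word-level timestamp from the speech recognizer.
interface TimedWord {
  text: string;
  startSec: number;
  endSec: number;
}

// Half-open range [start, end) of sample indices.
interface SampleRange {
  start: number;
  end: number;
}

/**
 * Converts flagged words into padded, merged sample ranges.
 * `flagged` holds indices into `words` that the PII classifier marked.
 */
function wordsToSampleRanges(
  words: TimedWord[],
  flagged: Set<number>,
  sampleRate: number,
  totalSamples: number,
  padSec = 0.05, // assumed safety margin for imprecise timestamps
): SampleRange[] {
  const pad = Math.round(padSec * sampleRate);
  const ranges = Array.from(flagged)
    .filter((i) => i >= 0 && i < words.length)
    .map((i) => ({
      start: Math.max(0, Math.floor(words[i].startSec * sampleRate) - pad),
      end: Math.min(totalSamples, Math.ceil(words[i].endSec * sampleRate) + pad),
    }))
    .sort((a, b) => a.start - b.start);

  // Merge overlapping or touching ranges into one continuous redaction.
  const merged: SampleRange[] = [];
  for (const r of ranges) {
    const last = merged[merged.length - 1];
    if (last !== undefined && r.start <= last.end) {
      last.end = Math.max(last.end, r.end);
    } else {
      merged.push({ ...r });
    }
  }
  return merged;
}

// Usage: "Ana Lopez" (two adjacent words) and the phone number are flagged.
const words: TimedWord[] = [
  { text: "my", startSec: 0, endSec: 0.25 },
  { text: "name", startSec: 0.25, endSec: 0.5 },
  { text: "is", startSec: 0.5, endSec: 0.75 },
  { text: "Ana", startSec: 1.0, endSec: 1.25 },
  { text: "Lopez", startSec: 1.25, endSec: 1.75 },
  { text: "call", startSec: 2.0, endSec: 2.5 },
  { text: "555-0100", startSec: 3.0, endSec: 3.5 },
];
const ranges = wordsToSampleRanges(words, new Set([3, 4, 6]), 16000, 64000);
console.log(JSON.stringify(ranges));
// [{"start":15200,"end":28800},{"start":47200,"end":56800}]
```

Padding trades a little extra masked audio for robustness: if the recognizer's word boundaries drift by a few tens of milliseconds, the start or end of a name is still covered.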
🏆 Accomplishments that we're proud of
- No Complex Algorithms Needed: Thanks to powerful tools like MeDo, I did not have to write complex custom data structures or spend days overthinking the logic flows. MeDo just worked.
- True Zero-Server Architecture: We built an advanced AI security tool that needs no backend servers, so it costs $0 in server maintenance and has no server-side database that could be breached.
- Slick On-Device Speed: Near-instantaneous live transcription and redaction, thanks to WebGPU acceleration on the client.
📚 What we learned
- Work Smarter, Not Harder: I learned that you don't need to reinvent the wheel. Relying on ecosystems like MeDo keeps you from burning out on complex algorithm design and lets you focus on the bigger picture.
- The Power of ONNX on the Web: We learned how flexible ONNX is for loading, scaling, and sharing machine learning models across edge runtimes.
- WebGPU's Sudden Maturity: We realized how much compute power modern browsers can tap into through direct GPU access.
What's next for 隐声 InSound AI: Real-Time Audio PII Redaction
- Deeper MeDo Integration: I plan to rely on MeDo even more heavily to keep the codebase clean, efficient, and free of over-engineered logic.
- Dynamic Sound Substitution: Instead of jarring bleeps, we want to replace PII with synthetic white noise or generic filler words to preserve conversational flow.
- Offline PWA Support: Turn the web app into an installable Progressive Web App (PWA) so users can securely scrub audio files without an internet connection.
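One possible way to do the white-noise substitution is to fill the redacted span with noise whose loudness follows the surrounding speech, crossfading at the edges to avoid clicks. This is a hedged sketch of that idea, not a committed design: the 0.2 s context window, the 0.5 relative level, the 10 ms fade, and the seeded LCG (used only so the output is reproducible) are all assumptions, and the crossfade assumes the range was already padded so its edges contain no PII.

```typescript
/** Small seeded 32-bit LCG so the sketch produces reproducible noise. */
function makeLcg(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296; // uniform in [0, 1)
  };
}

/** Root-mean-square level of samples[from, to), clamped to the buffer. */
function rms(samples: Float32Array, from: number, to: number): number {
  const s = Math.max(0, from);
  const e = Math.min(samples.length, to);
  if (e <= s) return 0;
  let sum = 0;
  for (let i = s; i < e; i++) sum += samples[i] * samples[i];
  return Math.sqrt(sum / (e - s));
}

/**
 * Replaces samples[start, end) with white noise whose RMS is `level` times
 * the RMS of the surrounding audio, crossfading over `fadeSec` at each edge.
 */
function substituteNoise(
  samples: Float32Array,
  start: number,
  end: number,
  sampleRate: number,
  level = 0.5,
  fadeSec = 0.01,
  contextSec = 0.2,
  seed = 1,
): Float32Array {
  const out = new Float32Array(samples);
  const s = Math.max(0, start);
  const e = Math.min(out.length, end);
  if (e <= s) return out;

  // Estimate loudness from the audio just before and just after the span.
  const ctx = Math.round(contextSec * sampleRate);
  const sides = [rms(samples, s - ctx, s), rms(samples, e, e + ctx)].filter((v) => v > 0);
  const target = sides.length > 0 ? sides.reduce((a, b) => a + b, 0) / sides.length : 1e-3;

  // Uniform noise on [-1, 1) has RMS 1/sqrt(3), so scale by sqrt(3).
  const amp = target * level * Math.sqrt(3);
  const fade = Math.max(1, Math.round(fadeSec * sampleRate));
  const rand = makeLcg(seed);
  for (let i = s; i < e; i++) {
    // Weight ramps 0 -> 1 over the first `fade` samples and back down over the last.
    const g = Math.min(1, (i - s + 1) / fade, (e - i) / fade);
    out[i] = (1 - g) * samples[i] + g * amp * (2 * rand() - 1);
  }
  return out;
}

// Usage: mask the middle second of a 3-second stand-in signal.
const rate = 16000;
const speech = new Float32Array(3 * rate).fill(0.1);
const masked = substituteNoise(speech, rate, 2 * rate, rate);
console.log(masked[0] === speech[0], rms(masked, rate + 200, 2 * rate - 200));
```

With a constant 0.1 context level, the RMS inside the masked span should come out near 0.05, half the surrounding level, while samples outside the span are left unchanged.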
Built With
- ibmgranite
- javascript
- medo
- onnx
- openaiprivacy