I listen to a lot of livestreams and podcasts and kept noticing how hard it is to spot toxic or hateful remarks before they reach the audience. I’d sketched the idea of an “audio content filter” months ago but never had a clear path to a prototype. The Bolt.new hackathon finally gave me both the deadline and the tooling to turn the sketch into something real.

What Voice Guardian Does

• Lets anyone drop in an audio file, or record right in the browser, and see a full transcript seconds later with every problematic word highlighted.
• Runs entirely on the user's device, so there are no uploads, accounts, or API keys.
• If the advanced AI models can't load, it silently switches to browser-native speech-to-text and a keyword filter, so the pipeline never breaks (see the sketch below).
• Shows live stats and a flag timeline so creators can scrub, review, and fix trouble spots fast.
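That silent switch is the heart of the design. Here is a minimal sketch of the decision logic, assuming hypothetical helper names (loadWhisperPipeline, transcribeWithWebSpeech) rather than the actual implementation:

```typescript
// A minimal sketch of the silent fallback, assuming hypothetical helpers;
// the real app wires these to @xenova/transformers and the Web Speech API.
type Transcriber = (audio: Blob) => Promise<string>;

declare function loadWhisperPipeline(): Promise<Transcriber>; // heavy model path
declare function transcribeWithWebSpeech(audio: Blob): Promise<string>; // native path

const KEYWORDS = ['example-slur', 'example-insult']; // stand-in for the 40-word list

async function getTranscriber(): Promise<Transcriber> {
  try {
    return await loadWhisperPipeline(); // can fail in sandboxed dev containers
  } catch {
    return transcribeWithWebSpeech; // silent switch: the pipeline never breaks
  }
}

// Keyword filter used when the toxicity model itself is unavailable.
function flagKeywords(transcript: string): string[] {
  const lower = transcript.toLowerCase();
  return KEYWORDS.filter((word) => lower.includes(word));
}
```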

How We Built It

• Bolt.new scaffolded a Vite + React + Tailwind project in minutes.
• Integrated WaveSurfer.js for the interactive waveform and playback.
• Added Whisper-tiny via @xenova/transformers for in-browser STT, with a Web Speech API fallback (sketched below).
• Wired up a Hugging Face toxicity model for moderation, backed by a 40-word keyword list when the model can't load.
• Used Zustand for state and localStorage for settings persistence.
• Finished with responsive tweaks, a collapsible sidebar, and clear error banners so the demo runs on any screen.
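In rough strokes, the in-browser pipeline looks like the sketch below. Xenova/whisper-tiny.en is the documented way to load Whisper-tiny in @xenova/transformers; Xenova/toxic-bert is my stand-in for the unnamed toxicity model:

```typescript
// Sketch of the in-browser pipeline; Xenova/toxic-bert is an assumed model id.
import { pipeline } from '@xenova/transformers';

async function analyze(audioUrl: string) {
  // Whisper-tiny runs fully client-side once the weights are cached.
  const stt = await pipeline('automatic-speech-recognition', 'Xenova/whisper-tiny.en');
  const { text } = (await stt(audioUrl)) as { text: string };

  // Toxicity scoring on the finished transcript.
  const classify = await pipeline('text-classification', 'Xenova/toxic-bert');
  const scores = await classify(text); // [{ label, score }, ...]

  return { text, scores };
}
```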

Challenges

• Model loading in the sandbox: Bolt's dev container blocked large model downloads, so transcription and moderation failed until we built the dual-fallback system.
• Keeping the UI responsive while Whisper loaded: lazy imports and skeleton loaders saved the day (sketched below).
• Timeboxing: fitting a real audio pipeline, plus the polish judges expect, into a single weekend sprint.
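The lazy-import pattern is standard React; this sketch assumes a hypothetical TranscriberPanel module that pulls in the heavy model code:

```tsx
// Sketch of lazy-loading the heavy transcription view; './TranscriberPanel'
// is a hypothetical module that imports @xenova/transformers.
import { lazy, Suspense } from 'react';

const TranscriberPanel = lazy(() => import('./TranscriberPanel'));

export function TranscribeRoute() {
  return (
    // Tailwind skeleton keeps the layout stable while the model code downloads.
    <Suspense fallback={<div className="h-32 animate-pulse rounded bg-slate-200" />}>
      <TranscriberPanel />
    </Suspense>
  );
}
```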

Accomplishments We’re Proud Of

• A zero-server, zero-cost audio moderation tool that never leaves users stuck.
• A clean, mobile-first UI that anyone can try in under a minute.
• Robust error handling and user messaging that make the tech feel trustworthy, not brittle.

What We Learned

• Small, thoughtful fallbacks beat perfect-but-fragile AI every time.
• Bolt.new is great for front-end velocity, but heavy models still need special care.
• Clear UX copy ("AI transcription active" vs. "Browser transcription active") prevents confusion and builds user trust.

What’s Next

• True real-time WebRTC moderation so streamers can auto-mute trolls on the fly.
• Non-speech sound alerts with YAMNet (barks, screams, door slams).
• Webhooks and Slack pings so human mods can jump in when the filter trips.
• A plug-and-play SDK for OBS and classroom platforms.

Voice Guardian’s hackathon build proves the core concept; the roadmap turns it into a full-scale safety layer for live audio everywhere.

Built With

  • bolt.new
  • css
  • html
  • javascript
  • localstorage
  • mediarecorder-api
  • netlify
  • react-18
  • tailwind-css
  • vite
  • wavesurfer.js (waveform)
  • web-audio-api
  • web-speech-api (fallback STT)
  • xenova/transformers
  • typescript
  • zustand (state)

Updates


These are the updates I made to the app.

Server Stack & Endpoint

I built a Node.js backend using Express to expose a simple HTTP API:

Framework & Language

• Node.js (v22.16.0) with TypeScript, for strong typing and developer ergonomics.
• Express to define routes and middleware.

Core Transcription Endpoint

• POST /api/transcribe accepts a single audio file upload (handled via multer).
• Supports a ?model= query parameter so you can choose any Whisper variant (tiny, base, small, medium, large).
• A health check lives at GET /health, and GET /api/models lists the available models.

Configuration & Reliability

• Environment-based config (via a .env file) for secrets like HF_API_TOKEN.
• CORS enabled for frontend integration.
• File-size limits (100 MB), MIME-type validation, and automatic cleanup of temp files.
• Comprehensive error handling with proper HTTP status codes.
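Putting those bullets together, here is a condensed sketch of such a server; the 'audio' form-field name and the transcribe() helper are my assumptions, not the production code:

```typescript
// Minimal sketch matching the endpoints above; 'audio' field name and
// transcribe() are assumptions (the proxy itself is sketched further below).
import express from 'express';
import cors from 'cors';
import multer from 'multer';
import { unlink } from 'fs/promises';

declare function transcribe(path: string, model: string): Promise<string>;

const MODELS = ['tiny', 'base', 'small', 'medium', 'large'];

const app = express();
app.use(cors());

// 100 MB cap and audio-only MIME filter, with temp files under tmp/.
const upload = multer({
  dest: 'tmp/',
  limits: { fileSize: 100 * 1024 * 1024 },
  fileFilter: (_req, file, cb) => cb(null, file.mimetype.startsWith('audio/')),
});

app.get('/health', (_req, res) => res.json({ status: 'ok' }));
app.get('/api/models', (_req, res) => res.json(MODELS));

app.post('/api/transcribe', upload.single('audio'), async (req, res) => {
  if (!req.file) return res.status(400).json({ error: 'no audio file' });
  const model = String(req.query.model ?? 'large');
  if (!MODELS.includes(model)) return res.status(400).json({ error: 'unknown model' });
  try {
    res.json({ text: await transcribe(req.file.path, model) });
  } catch {
    res.status(502).json({ error: 'transcription failed' });
  } finally {
    await unlink(req.file.path).catch(() => {}); // temp-file cleanup
  }
});

app.listen(Number(process.env.PORT ?? 3001));
```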

Deployment on Render

I deployed the server to Render.com with a standard build pipeline:

• Clone & checkout: repository voice-guardian-server, checkout commit 23b361e… on main.
• Build: npm run build runs tsc, which compiles to dist/.
• Upload & start: build artifacts uploaded (~3.7 s, compression ~1.3 s); service launched on port 3001, with automatic port-binding support.

AI Model for Transcription

I had big problems embedding Whisper in the browser, so I now proxy requests to Hugging Face’s managed Whisper models:

Model Variants Supported

• openai/whisper-tiny.en, openai/whisper-base.en, …, up to openai/whisper-large-v3.

Default & Logging

• By default, the server logs "Using model: openai/whisper-large-v3" when it processes audio.
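Against the public Hugging Face Inference API, that proxy helper can look roughly like the sketch below; the variant-to-model mapping is illustrative:

```typescript
// Sketch of the Hugging Face proxy; assumes the public Inference API and
// the HF_API_TOKEN from .env. Node 22's built-in fetch is used.
import { readFile } from 'fs/promises';

export async function transcribe(path: string, variant: string): Promise<string> {
  // Illustrative mapping: small variants use .en checkpoints, 'large' uses v3.
  const model =
    variant === 'large' ? 'openai/whisper-large-v3' : `openai/whisper-${variant}.en`;
  console.log(`Using model: ${model}`); // matches the default log line above

  const res = await fetch(`https://api-inference.huggingface.co/models/${model}`, {
    method: 'POST',
    headers: { Authorization: `Bearer ${process.env.HF_API_TOKEN}` },
    body: await readFile(path), // raw audio bytes; HF detects the codec
  });
  if (!res.ok) throw new Error(`HF API error: ${res.status}`);

  const { text } = (await res.json()) as { text: string };
  return text;
}
```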

Why Server-Side?

No heavy client-side downloads: Browsers struggle with large model weights, especially in sandboxed dev containers. Hugging Face also supports a wide range of audio codecs and formats by default.

Reliability & Fallbacks: If a model fails to load, we can add retries or alternate endpoints server-side without shipping new client code (sketched below).
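For example, a small retry wrapper; the attempt count and delays here are made up, not taken from the deployed server:

```typescript
// Illustrative retry helper with exponential backoff; numbers are arbitrary.
async function withRetries<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      await new Promise((r) => setTimeout(r, 500 * 2 ** i)); // 0.5 s, 1 s, 2 s
    }
  }
  throw lastError;
}

// Usage: const text = await withRetries(() => transcribe(path, model));
```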

Why This Architecture?

Performance & Compatibility: Offloading inference to a server ensures smooth UX on all devices and browsers.

Maintainability: Centralizing transcription logic means updates to model versions or API changes live in one place.

Scalability: Render’s autoscaling can handle spikes in transcription requests without overloading user devices.

With this server in place, the Voice Guardian frontend can simply POST audio blobs to /api/transcribe, receive clean text back, and focus on moderation and UI—while the heavy lifting stays safely in the cloud.
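From the frontend’s point of view, that is a single fetch call; the Render URL below is hypothetical, and the 'audio' field name matches the multer sketch above:

```typescript
// Sketch of the client call; SERVER_URL is a hypothetical Render address.
const SERVER_URL = 'https://voice-guardian-server.onrender.com';

async function transcribeBlob(blob: Blob, model = 'large'): Promise<string> {
  const form = new FormData();
  form.append('audio', blob, 'recording.webm');

  const res = await fetch(`${SERVER_URL}/api/transcribe?model=${model}`, {
    method: 'POST',
    body: form, // the browser sets the multipart boundary automatically
  });
  if (!res.ok) throw new Error(`transcription failed: ${res.status}`);

  const { text } = (await res.json()) as { text: string };
  return text;
}
```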
