
These are the updates I made to the app.

Server Stack & Endpoint

I built a Node.js backend using Express to expose a simple HTTP API:

Framework & Language

• Node.js (v22.16.0) with TypeScript, for strong typing and developer ergonomics.
• Express to define routes and middleware.

Core Transcription Endpoint

• POST /api/transcribe accepts a single audio file upload (handled via multer).
• Supports a ?model= query parameter so you can choose any Whisper variant (tiny, base, small, medium, large).
• A health-check endpoint at GET /health, and GET /api/models lists the available models.

Configuration & Reliability

• Environment-based config (via a .env file) for secrets like HF_API_TOKEN.
• CORS enabled for frontend integration.
• File-size limits (100 MB), MIME-type validation, and automatic cleanup of temp files.
• Comprehensive error handling with proper HTTP status codes.

Deployment on Render

I deployed the server to Render.com with a standard build pipeline:

• Clone & checkout: clone the voice-guardian-server repository and check out commit 23b361e… on main.
• Build: npm run build runs tsc, which compiles to dist/.
• Upload & start: build artifacts uploaded (~3.7 s, compression ~1.3 s); the service launches on port 3001, with automatic port-binding support.
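The "automatic port-binding" step usually means honoring the PORT variable the platform injects while keeping 3001 as the local default. A small sketch of that resolution logic, assuming this is how the server binds:

```typescript
// Render (and similar hosts) inject PORT at runtime; fall back to
// 3001 for local development, matching the log above.
export function resolvePort(env: NodeJS.ProcessEnv): number {
  const parsed = Number(env.PORT);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : 3001;
}

// Usage (hypothetical): app.listen(resolvePort(process.env));
```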

AI Model for Transcription

I had big problems embedding Whisper in the browser, so instead I proxy requests to Hugging Face's managed Whisper models:

Model Variants Supported

• openai/whisper-tiny.en, openai/whisper-base.en, …, up to openai/whisper-large-v3.

Default & Logging

• The server defaults to openai/whisper-large-v3; the logs show "Using model: openai/whisper-large-v3" when it processes audio.

Why Server-Side?

No heavy client-side downloads: browsers struggle with large model weights, especially in sandboxed dev containers.

Broad format support: Hugging Face handles a wide range of audio codecs and formats by default.

Reliability & Fallbacks: In case of model-loading failures, we can implement retries or alternate endpoints without shipping new client code.

Why This Architecture?

Performance & Compatibility: Offloading inference to a server ensures smooth UX on all devices and browsers.

Maintainability: Centralizing transcription logic means updates to model versions or API changes live in one place.

Scalability: Render’s autoscaling can handle spikes in transcription requests without overloading user devices.

With this server in place, the Voice Guardian frontend can simply POST audio blobs to /api/transcribe, receive clean text back, and focus on moderation and UI—while the heavy lifting stays safely in the cloud.
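From the frontend's side, that whole flow collapses to one fetch. A minimal sketch, assuming the upload field is named "audio" and the server responds with { text }:

```typescript
// Hypothetical helper: build the endpoint URL with the chosen variant
export function transcribeUrl(model: string): string {
  return `/api/transcribe?model=${encodeURIComponent(model)}`;
}

// POST a recorded audio blob and get the transcript back
export async function requestTranscript(
  blob: Blob,
  model = "base",
): Promise<string> {
  const form = new FormData();
  form.append("audio", blob, "recording.webm"); // field name is an assumption
  const res = await fetch(transcribeUrl(model), {
    method: "POST",
    body: form, // browser sets the multipart boundary automatically
  });
  if (!res.ok) {
    throw new Error(`Transcription failed: ${res.status}`);
  }
  const { text } = (await res.json()) as { text: string };
  return text;
}
```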
