Inspiration
Manual dubbing and publishing workflows are slow, error-prone, and hard to audit.
We wanted a practical tool that automates multilingual video localization while keeping strict publishing controls.
What it does
VoxShift is a Gemini-first dubbing pipeline that:
- Transcribes and translates media
- Generates dubbed audio with TTS
- Produces output media, subtitles, and segment JSON
- Runs YouTube intake risk checks when a source URL is provided
- Uploads to a specific YouTube channel with metadata, dry-run validation, and audit manifests
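One concrete piece of the outputs listed above is the subtitle file. The sketch below assumes a hypothetical segment shape (`startMs`, `endMs`, `sourceText`, `translatedText`), since VoxShift's actual segment JSON schema is not shown in this writeup. It renders translated segments as numbered SRT cues in the standard `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing format.

```typescript
// Hypothetical segment shape: VoxShift's real segment JSON schema is not documented here.
interface Segment {
  startMs: number;        // segment start, in milliseconds from the beginning of the media
  endMs: number;          // segment end, in milliseconds
  sourceText: string;     // transcribed source-language text
  translatedText: string; // translated text (what the subtitles show)
}

// Format a millisecond offset as an SRT timestamp (HH:MM:SS,mmm).
function srtTimestamp(ms: number): string {
  const total = Math.max(0, Math.round(ms));
  const hours = Math.floor(total / 3600000);
  const minutes = Math.floor((total % 3600000) / 60000);
  const seconds = Math.floor((total % 60000) / 1000);
  const millis = total % 1000;
  const pad = (n: number, width: number): string => String(n).padStart(width, "0");
  return `${pad(hours, 2)}:${pad(minutes, 2)}:${pad(seconds, 2)},${pad(millis, 3)}`;
}

// Render segments as numbered SRT cues separated by blank lines.
function toSrt(segments: Segment[]): string {
  return segments
    .map((seg, i) =>
      `${i + 1}\n${srtTimestamp(seg.startMs)} --> ${srtTimestamp(seg.endMs)}\n${seg.translatedText}\n`)
    .join("\n");
}
```

The same `Segment[]` array can be serialized with `JSON.stringify(segments, null, 2)` to write a segment JSON file next to the SRT.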
How we built it
- Node.js + TypeScript CLI architecture
- Gemini API for transcription/translation and TTS
- ffmpeg/ffprobe for media processing and muxing
- YouTube Data API for intake metadata checks
- YouTube OAuth upload flow with channel-ID enforcement
- CI-style checks with typecheck, build, and smoke tests
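As an illustration of the ffmpeg muxing step, here is a minimal sketch that builds an argument list pairing a source video with the dubbed TTS track. The `MuxOptions` fields and the `buildMuxArgs` helper are hypothetical, not VoxShift's code; the ffmpeg flags themselves are standard. It copies the video stream without re-encoding, drops the original audio, and encodes the dub to AAC for MP4 output.

```typescript
// Hypothetical options for one mux step; names are illustrative, not VoxShift's API.
interface MuxOptions {
  videoPath: string;       // original media (its video stream is kept)
  dubbedAudioPath: string; // TTS output for the target language
  outputPath: string;      // e.g. "out/video.es.mp4"
  audioLanguage?: string;  // optional ISO 639-2 code such as "spa"
}

// Build ffmpeg arguments that pair the source video with the dubbed audio track.
function buildMuxArgs(opts: MuxOptions): string[] {
  const args = [
    "-y",                       // overwrite the output file if it exists
    "-i", opts.videoPath,       // input 0: source media
    "-i", opts.dubbedAudioPath, // input 1: dubbed audio
    "-map", "0:v:0",            // first video stream from input 0
    "-map", "1:a:0",            // first audio stream from input 1 (original audio is dropped)
    "-c:v", "copy",             // keep the video bitstream as-is
    "-c:a", "aac",              // encode the dub to AAC for MP4 compatibility
  ];
  if (opts.audioLanguage) {
    // Tag the output audio stream so players show the right language.
    args.push("-metadata:s:a:0", `language=${opts.audioLanguage}`);
  }
  // Stop at the end of the shortest stream so a longer dub does not extend the video.
  args.push("-shortest", opts.outputPath);
  return args;
}
```

The returned array could be passed to `spawnSync("ffmpeg", args, { stdio: "inherit" })` from `node:child_process`. Building arguments as an array rather than a shell string avoids quoting problems with file paths that contain spaces.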
Challenges we ran into
- OAuth scope mismatches (`youtube.upload` vs. channel verification needs)
- Handling structured model output reliably across edge cases
- Keeping upload automation flexible without weakening safety
- Managing API auth differences between Gemini and YouTube APIs
- Designing duplicate protection and idempotent run behavior
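For duplicate protection and idempotent runs, one common approach is to derive a deterministic key from a run's inputs and skip work whose key has already completed. This is a sketch under assumptions: the `RunInputs` fields, the in-memory `RunLedger` (a stand-in for whatever persisted store a real pipeline would use), and the choice of 32-bit FNV-1a hashing are illustrative, not VoxShift's documented design. A production version would more likely use a cryptographic hash such as SHA-256 so that key collisions are negligible.

```typescript
// Hypothetical run inputs; the fields VoxShift actually keys on are not documented.
interface RunInputs {
  sourceId: string;        // e.g. a checksum of the source file or a video ID
  targetLanguage: string;  // e.g. "es"
  ttsVoice: string;        // TTS voice or model variant
  targetChannelId: string; // channel the result is published to
}

interface RunResult {
  key: string;
  outputRef: string;
  reused: boolean;
}

// Serialize inputs with sorted keys so equal inputs always yield the same string.
function canonicalize(inputs: RunInputs): string {
  const keys = (Object.keys(inputs) as (keyof RunInputs)[]).sort();
  return JSON.stringify(keys.map((k) => [k, inputs[k]]));
}

// 32-bit FNV-1a over UTF-16 code units (illustrative; not collision-resistant).
function fnv1a32(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

function runKey(inputs: RunInputs): string {
  return fnv1a32(canonicalize(inputs));
}

// In-memory stand-in for a persisted record of completed runs.
class RunLedger {
  private readonly completed = new Map<string, string>(); // run key -> output reference

  lookup(key: string): string | undefined {
    return this.completed.get(key);
  }

  record(key: string, outputRef: string): void {
    this.completed.set(key, outputRef);
  }
}

// Execute the pipeline only if no completed run with the same key exists.
function runOnce(ledger: RunLedger, inputs: RunInputs, execute: () => string): RunResult {
  const key = runKey(inputs);
  const existing = ledger.lookup(key);
  if (existing !== undefined) {
    return { key, outputRef: existing, reused: true };
  }
  const outputRef = execute();
  ledger.record(key, outputRef);
  return { key, outputRef, reused: false };
}
```

Calling `runOnce` twice with the same inputs executes the pipeline once and returns the recorded output the second time; changing any field, such as `targetLanguage`, produces a new key and a fresh run.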
Accomplishments that we're proud of
- End-to-end dubbing pipeline with production-style outputs
- `youtube:run` supports both pipeline mode and upload-only mode
- Optional `--source-url` with policy-based intake checks
- Strong safety controls: target channel enforcement, dry-run upload, and a manifest trail
- A real speech fixture plus automated smoke-test paths, including model-variant checks
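The safety controls listed above (target channel enforcement, dry-run upload, manifest trail) can be combined into a single guard around the upload call. In this sketch the authenticated channel ID is assumed to have been looked up already (for example through the YouTube Data API's `channels.list` with `mine=true`), and the manifest fields, `guardedUpload`, and the synchronous `Uploader` stand-in are all hypothetical; a real upload is asynchronous and would go through the YouTube `videos.insert` endpoint. The title checks mirror YouTube's documented title limits.

```typescript
// Hypothetical types; the writeup does not specify VoxShift's manifest format.
type PrivacyStatus = "private" | "unlisted" | "public";

interface VideoMetadata {
  title: string;
  description: string;
  tags: string[];
  privacyStatus: PrivacyStatus;
}

interface UploadRequest {
  runId: string;
  targetChannelId: string;        // channel the operator intends to publish to
  authenticatedChannelId: string; // channel the OAuth token actually resolves to
  dryRun: boolean;
  metadata: VideoMetadata;
}

// One audit record per attempt, whatever the outcome.
interface UploadManifest extends UploadRequest {
  status: "rejected" | "validated" | "uploaded";
  reasons: string[];
  videoId?: string;
  recordedAt: string;
}

// Stand-in for the real (asynchronous) upload call; returns the new video ID.
type Uploader = (metadata: VideoMetadata) => string;

// Title rules follow YouTube's documented limits: at most 100 characters, no < or >.
// Length here counts UTF-16 code units, which approximates YouTube's character count.
function validateMetadata(metadata: VideoMetadata): string[] {
  const problems: string[] = [];
  const title = metadata.title.trim();
  if (title.length === 0) problems.push("title is empty");
  if (title.length > 100) problems.push("title is longer than 100 characters");
  if (/[<>]/.test(title)) problems.push("title contains < or >");
  return problems;
}

function guardedUpload(request: UploadRequest, upload: Uploader): UploadManifest {
  const recordedAt = new Date().toISOString();
  const reasons: string[] = [];

  // Channel enforcement: refuse to publish through a token for a different channel.
  if (request.authenticatedChannelId !== request.targetChannelId) {
    reasons.push(
      `token resolves to ${request.authenticatedChannelId}, expected ${request.targetChannelId}`,
    );
  }
  reasons.push(...validateMetadata(request.metadata));

  if (reasons.length > 0) {
    return { ...request, status: "rejected", reasons, recordedAt };
  }
  // Dry run: all checks passed, but nothing is sent to YouTube.
  if (request.dryRun) {
    return { ...request, status: "validated", reasons, recordedAt };
  }
  const videoId = upload(request.metadata);
  return { ...request, status: "uploaded", reasons, videoId, recordedAt };
}
```

Writing the returned manifest to disk for every attempt, including rejected and dry-run ones, gives an audit trail: each record states which channel the token resolved to, what metadata would have been sent, and what happened.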
What we learned
- Automation needs guardrails as much as speed
- Strong schemas and validation save time in LLM-driven pipelines
- Channel-level publishing checks are essential for real operations
- Dry-run + manifest logging dramatically improves trust and debugging
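Validating model output against an explicit schema before anything downstream consumes it is one way to apply these lessons. The sketch below checks a raw model response against a hypothetical segment shape (`startMs`, `endMs`, `sourceText`, `translatedText`); the field names and the fence-stripping step are assumptions, not VoxShift's actual schema. Even when JSON output is requested from Gemini (for example via `responseMimeType: "application/json"`), a client-side check still catches timing and content errors that are perfectly valid JSON.

```typescript
// Hypothetical segment shape; VoxShift's actual schema is not published in this writeup.
interface Segment {
  startMs: number;
  endMs: number;
  sourceText: string;
  translatedText: string;
}

type ParseResult =
  | { ok: true; segments: Segment[] }
  | { ok: false; errors: string[] };

function isNonEmptyString(value: unknown): value is string {
  return typeof value === "string" && value.trim().length > 0;
}

// Parse and validate a raw model response that should contain a JSON array of segments.
function parseSegments(raw: string): ParseResult {
  // Assumption: responses may arrive wrapped in a markdown code fence; strip it if present.
  const cleaned = raw.trim().replace(/^`{3}(?:json)?\s*/, "").replace(/\s*`{3}$/, "");

  let data: unknown;
  try {
    data = JSON.parse(cleaned);
  } catch (err) {
    return { ok: false, errors: [`invalid JSON: ${(err as Error).message}`] };
  }
  if (!Array.isArray(data)) {
    return { ok: false, errors: ["top-level value must be an array"] };
  }

  const errors: string[] = [];
  const segments: Segment[] = [];
  let previousEnd = 0;

  data.forEach((item: unknown, i: number) => {
    if (typeof item !== "object" || item === null) {
      errors.push(`[${i}] must be an object`);
      return;
    }
    const { startMs, endMs, sourceText, translatedText } = item as Record<string, unknown>;
    if (typeof startMs !== "number" || typeof endMs !== "number" ||
        !Number.isFinite(startMs) || !Number.isFinite(endMs)) {
      errors.push(`[${i}] startMs and endMs must be finite numbers`);
      return;
    }
    // Timing rules: positive duration, no negative starts, no overlap with earlier segments.
    if (endMs <= startMs) errors.push(`[${i}] endMs must be greater than startMs`);
    if (startMs < previousEnd) errors.push(`[${i}] starts before 0 or before the previous segment ends`);
    previousEnd = Math.max(previousEnd, endMs);
    if (!isNonEmptyString(sourceText) || !isNonEmptyString(translatedText)) {
      errors.push(`[${i}] sourceText and translatedText must be non-empty strings`);
      return;
    }
    segments.push({ startMs, endMs, sourceText, translatedText });
  });

  if (errors.length > 0) return { ok: false, errors };
  return { ok: true, segments };
}
```

Returning every error at once, rather than throwing on the first, makes it easier to log the full problem or to feed it back to the model in a retry prompt.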
What's next for VoxShift
- Add rights-aware source ingestion workflow (with explicit policy gates)
- Improve dubbing quality (speaker consistency, pacing, prosody control)
- Add batch job orchestration and queue-based processing
- Build a lightweight UI on top of the CLI engine
- Expand monitoring, retry logic, and publish-state observability
Built With
- gemini
- genai
- typescript