LipSync Studio
Inspiration
Video content dominates the internet — 82% of all traffic — yet producing even a simple promotional music video costs $500–$5,000 and takes weeks. Independent artists and small labels often skip video entirely because the cost is prohibitive, even though videos drive 3–5x more streams than audio-only releases. We wanted to make music video creation as easy as taking a selfie.
What it does
LipSync Studio generates AI-powered lip-synced music videos from just two inputs: a selfie photo and an audio file. Upload a portrait, pick your song, and in under 3 minutes you get a 1080x1920 video with natural lip sync, expressive facial movements, and subtle head motion — ready to share on TikTok, Instagram Reels, or YouTube Shorts. No filming, no editing, no VFX expertise required.
How we built it
- Frontend: React Native Web via Expo SDK 54, giving us a single codebase that works on iOS Safari, Android Chrome, and desktop browsers
- Backend: Three Vercel serverless functions that proxy all calls to the LTX Video API — keeping the API key server-side only and solving browser CORS restrictions
- Video Generation: LTX `audio-to-video` endpoint with the `ltx-2-3-pro` model, which takes a face image and audio track and produces synchronized lip movement
- Audio Handling: Client-side codec conversion using the Web Audio API and MediaRecorder, automatically converting unsupported formats (WAV/PCM) to Opus before upload
- Deployment: Static export to Vercel, with `npm run deploy` injecting the git commit hash into the footer for version tracking
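The proxy layer described above can be sketched as a single Vercel serverless function. This is a minimal illustration, not the actual implementation: the route path, the `LTX_API_KEY` env var name, and the upstream base URL are all assumptions.

```typescript
// Hypothetical Vercel function, e.g. api/ltx/[...path].ts. Real routes,
// header requirements, and the LTX base URL may differ.
const UPSTREAM = "https://api.ltx.video"; // assumed base URL

// Pure helper: build upstream headers, injecting the server-side key and
// dropping browser-specific headers the upstream API should never see.
export function buildUpstreamHeaders(
  clientHeaders: Record<string, string>,
  apiKey: string
): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [k, v] of Object.entries(clientHeaders)) {
    const key = k.toLowerCase();
    if (key === "host" || key === "origin" || key === "cookie") continue;
    out[key] = v;
  }
  out["authorization"] = `Bearer ${apiKey}`; // key never reaches the client
  return out;
}

// Vercel-style handler on the Web fetch API (Node 18+). The browser only
// ever talks to our own origin, so CORS never applies to the LTX call.
export default async function handler(req: Request): Promise<Response> {
  const url = new URL(req.url);
  const upstreamUrl = UPSTREAM + url.pathname.replace(/^\/api\/ltx/, "");
  const headers = buildUpstreamHeaders(
    Object.fromEntries(req.headers),
    process.env.LTX_API_KEY ?? ""
  );
  // Forward method and body verbatim. (Streaming request bodies in Node
  // fetch may additionally need duplex: "half".)
  return fetch(upstreamUrl, {
    method: req.method,
    headers,
    body: req.body as any,
  });
}
```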
Challenges we ran into
- CORS everywhere: The LTX API doesn't set CORS headers, and neither does the Google Cloud Storage pre-signed upload URL it returns. We had to proxy every external call through serverless functions — first with a local Express proxy, then properly via Vercel functions
- Audio codec rejection: Our first test upload was a WAV file. LTX returned "codec pcm_s16le not supported." We built a real-time browser-based audio transcoding pipeline using OfflineAudioContext and MediaRecorder to convert to Opus on the fly
- Safari file picker: iOS Safari grayed out .m4a files when using `type: "audio/*"`. We had to add explicit file extensions (`.m4a`, `.mp3`, etc.) alongside MIME types for the document picker to recognize them
- Expo SDK 54 breaking changes: `expo-file-system` completely changed its API, from the legacy `readAsStringAsync`/`documentDirectory` to a new `File` class with `arrayBuffer()` and `writableStream()`. No Stack Overflow answers existed yet, so we read the type definitions directly
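The codec-rejection fix above can be sketched as follows. The transcode itself only runs in a browser (it needs the Web Audio API and MediaRecorder, and a realtime AudioContext, since MediaRecorder records live streams); the format check is pure logic. The exact MIME strings treated as unsupported are assumptions for illustration.

```typescript
// MIME types assumed to be rejected by the video API (WAV/PCM variants).
const UNSUPPORTED = ["audio/wav", "audio/x-wav", "audio/wave"];

// Pure check: does this upload need client-side conversion to Opus?
export function needsTranscode(mimeType: string): boolean {
  return UNSUPPORTED.includes(mimeType.toLowerCase().split(";")[0].trim());
}

// Browser-only sketch: decode the file, replay it through a MediaStream
// destination, and record the stream as Opus-in-WebM with MediaRecorder.
export async function transcodeToOpus(file: Blob): Promise<Blob> {
  const ctx = new AudioContext();
  const buffer = await ctx.decodeAudioData(await file.arrayBuffer());
  const dest = ctx.createMediaStreamDestination();
  const src = ctx.createBufferSource();
  src.buffer = buffer;
  src.connect(dest);

  const recorder = new MediaRecorder(dest.stream, {
    mimeType: "audio/webm;codecs=opus",
  });
  const chunks: Blob[] = [];
  recorder.ondataavailable = (e) => chunks.push(e.data);

  const done = new Promise<Blob>((resolve) => {
    recorder.onstop = () => resolve(new Blob(chunks, { type: "audio/webm" }));
  });
  recorder.start();
  src.start();
  src.onended = () => recorder.stop(); // stop recording when playback ends
  return done;
}
```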
Accomplishments that we're proud of
- End-to-end in one session: From `create-expo-app` to a deployed, working production app on Vercel, including API integration, CORS solutions, codec conversion, and cross-platform compatibility
- Zero API key exposure: The LTX key never touches the client bundle; all sensitive calls go through serverless functions
- It actually works on a phone: You can open the Vercel URL on an iPhone, pick a selfie from your camera roll, choose a song, and get a lip-synced video back — no app install required
What we learned
- Video AI APIs are surprisingly accessible — the hard part isn't the AI, it's the plumbing (CORS, codecs, file formats, upload flows)
- Browser APIs are more powerful than we expected — converting audio codecs entirely client-side with Web Audio API and MediaRecorder was eye-opening
- Expo's new file system API is clean but undocumented; reading `.d.ts` files is sometimes the only way forward with bleeding-edge SDKs
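To illustrate the migration, here is a toy mirror of the new class-based shape (the real module is imported as `import { File } from "expo-file-system"` and differs in detail; `FileLike`, `byteLength`, and `MemoryFile` are hypothetical names introduced here so the sketch can run outside Expo):

```typescript
// Hypothetical mirror of the new File shape described above: the SDK 54
// class exposes arrayBuffer() where the legacy API used
// readAsStringAsync(uri, { encoding: "base64" }).
export interface FileLike {
  arrayBuffer(): Promise<ArrayBuffer>;
}

// Code written against the new interface instead of the legacy
// readAsStringAsync/documentDirectory pair.
export async function byteLength(file: FileLike): Promise<number> {
  const buf = await file.arrayBuffer();
  return buf.byteLength;
}

// In-memory stand-in so the helper can be exercised without a device.
export class MemoryFile implements FileLike {
  constructor(private bytes: Uint8Array) {}
  async arrayBuffer(): Promise<ArrayBuffer> {
    return this.bytes.buffer.slice(
      this.bytes.byteOffset,
      this.bytes.byteOffset + this.bytes.byteLength
    ) as ArrayBuffer;
  }
}
```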
What's next for LipSync Studio
- Style control: Let users choose video styles — cinematic, anime, vintage — via prompt customization
- Longer videos: Chain multiple LTX `extend` API calls to support full-length songs beyond the current 20-second limit
- Batch generation: Upload one photo and generate videos for an entire album, with different moods and camera motions per track
- TwelveLabs integration: Use Marengo to analyze generated videos for quality scoring and semantic consistency, automatically re-generating segments that don't match the audio energy
- Direct social publishing: One-tap posting to TikTok, Instagram, and YouTube directly from the results screen
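The chained-`extend` idea above could start from a simple segment planner: split the track into clips no longer than the per-call limit, then generate each one conditioned on the last. The 20-second cap comes from this write-up; the planner itself (`planSegments`) and the conditioning flow are assumptions, not the LTX API.

```typescript
// One planned generation call, in seconds of the source audio.
export interface Segment {
  start: number;
  end: number;
}

// Split a song of `durationSec` seconds into segments of at most
// `maxLenSec` seconds (20 s per the current LTX limit).
export function planSegments(durationSec: number, maxLenSec = 20): Segment[] {
  const segments: Segment[] = [];
  for (let start = 0; start < durationSec; start += maxLenSec) {
    segments.push({ start, end: Math.min(start + maxLenSec, durationSec) });
  }
  return segments;
}

// Each segment after the first would then be produced with an `extend`
// call that conditions on the previous clip's final frames (hypothetical
// flow, sketched here only as a plan).
```

For a 50-second song this yields three calls: 0–20 s, 20–40 s, and 40–50 s.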
Built With
- expo.io
- typescript
- vercel