Inspiration
Building in public is often stalled by a single bottleneck: the friction between filming and publishing. Stitch removes this barrier by making the editing process as seamless as possible.
Every creator knows the struggle of recording a great take only to spend hours manually scrubbing through shaky setups and filler words. We built an editor that learns your patterns and automates the tedious tasks, allowing you to focus entirely on storytelling.
What it does
Stitch is a web-based video editor enhanced by AI-driven automation:
- Overshoot Analysis: Detects quality signals within footage to assist with thumbnail and clip extraction.
- Learned Intro Trim: Our "Wood Wide" ML model analyzes your editing history. If you consistently trim the first few seconds of a clip, it predicts and applies that adjustment automatically.
- Smart Captions: Powered by ElevenLabs STT, this feature automatically filters out filler words like "um," "uh," and "like" (a caption-filtering sketch follows this list).
- Natural Language Copilot (WIP): Allows you to describe edits, such as "remove all pauses," and generates an automated edit plan.
- Multi-clip Timeline: Features intuitive drag-and-drop reordering with trim, crop, and split tools.
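To make the Smart Captions flow concrete, here is a minimal TypeScript sketch of normalize-then-filter-then-emit-WebVTT. The segment and word shapes, the pause thresholds, and the helper names are illustrative assumptions, not the actual ElevenLabs response types or Stitch code.

```typescript
// Hypothetical shapes -- the real ElevenLabs STT response differs; this only
// illustrates the normalize -> filter -> WebVTT flow described above.
interface Word {
  text: string;
  start: number; // seconds
  end: number;   // seconds
}

interface Segment {
  words: Word[];
}

const FILLERS = new Set(["um", "uh", "erm", "hmm"]);

// Conservative filter: only drop a word when it is unambiguously a filler.
// "like" is kept unless it stands alone between pauses (illustrative heuristic).
function isFiller(word: Word, prev?: Word, next?: Word): boolean {
  const w = word.text.toLowerCase().replace(/[.,!?]/g, "");
  if (FILLERS.has(w)) return true;
  if (w === "like") {
    const gapBefore = prev ? word.start - prev.end : 1;
    const gapAfter = next ? next.start - word.end : 1;
    return gapBefore > 0.35 && gapAfter > 0.35; // isolated by pauses on both sides
  }
  return false;
}

// Format seconds as a WebVTT timestamp (HH:MM:SS.mmm).
function toTimestamp(seconds: number): string {
  const h = Math.floor(seconds / 3600);
  const m = Math.floor((seconds % 3600) / 60);
  const s = (seconds % 60).toFixed(3).padStart(6, "0");
  return `${String(h).padStart(2, "0")}:${String(m).padStart(2, "0")}:${s}`;
}

// Build a WebVTT document, one cue per segment, with fillers removed.
export function segmentsToWebVTT(segments: Segment[]): string {
  const cues = segments
    .map((seg) => {
      const kept = seg.words.filter((w, i) => !isFiller(w, seg.words[i - 1], seg.words[i + 1]));
      if (kept.length === 0) return null;
      const start = toTimestamp(kept[0].start);
      const end = toTimestamp(kept[kept.length - 1].end);
      return `${start} --> ${end}\n${kept.map((w) => w.text).join(" ")}`;
    })
    .filter((c): c is string => c !== null);
  return "WEBVTT\n\n" + cues.join("\n\n") + "\n";
}
```

The isolated-pause heuristic for "like" mirrors the conservative filtering described under Challenges: when in doubt, the word stays in the transcript.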
How we built it
- Frontend: Next.js 16, React 19, Tailwind CSS, and Base UI components.
- ML Pipeline: A Wood Wide regression model that predicts intro_trim_seconds based on early-clip features (shaky_ratio, avg_confidence, and num_flips); a simplified prediction sketch follows this list.
- Captions: ElevenLabs STT integrated with segment normalization and filler filtering to produce WebVTT outputs.
- Tool Adapter Pattern: Capability-based routing that allows the AI assistant to call specific tools without hardcoded UI dependencies.
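As a rough illustration of the ML Pipeline bullet, the sketch below shows what the intro-trim prediction might look like once a fitted model's coefficients are applied client-side. The linear form, the weights, the clamping range, and the function name are placeholders; the real Wood Wide model is served by its API and trained on the creator's editing history.

```typescript
// Feature names come from the write-up; everything else here is illustrative.
interface EarlyClipFeatures {
  shaky_ratio: number;     // fraction of early frames flagged as shaky (0..1)
  avg_confidence: number;  // mean detection confidence over the opening seconds
  num_flips: number;       // count of rapid quality flips near the start
}

// Hypothetical coefficients standing in for a fitted regression; real values
// would be learned from the creator's editing history.
const WEIGHTS = { bias: 1.2, shaky_ratio: 6.5, avg_confidence: -2.0, num_flips: 0.4 };

export function predictIntroTrimSeconds(f: EarlyClipFeatures): number {
  const raw =
    WEIGHTS.bias +
    WEIGHTS.shaky_ratio * f.shaky_ratio +
    WEIGHTS.avg_confidence * f.avg_confidence +
    WEIGHTS.num_flips * f.num_flips;
  // Clamp to a sane range and keep one decimal so suggestions read like 4.9s, not 5s.
  return Math.round(Math.min(Math.max(raw, 0), 15) * 10) / 10;
}

// Example: a shaky, mid-confidence opening suggests trimming ~5 seconds.
// predictIntroTrimSeconds({ shaky_ratio: 0.6, avg_confidence: 0.5, num_flips: 2 }) === 4.9
```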
Challenges
- Humanizing ML: Users notice hardcoded behavior immediately. We designed Wood Wide to provide variable predictions (such as 4.6s or 5.3s) rather than static increments to ensure the automation feels natural.
- Caption Precision: Filler word removal requires a conservative approach. We had to ensure the model distinguishes between a filler "like" and a functional "like" to maintain transcript integrity.
- State Management: Coordinating a multi-clip timeline with drag-and-drop operations, splitting, and per-clip cropping required a highly robust state architecture.
What we learned
- Utility over Complexity: A simple regression model using only three features can save creators more time than a complex, over-engineered system.
- Architecture Matters: The "adapter pattern" for AI tools is a vital investment. It ensures that when the UI changes, you only need to update a single mapping instead of every assistant response (see the sketch after this list).
- Transparency Builds Trust: Showing the raw inference data and dataset IDs during a demo makes the AI's "magic" feel credible and reliable.
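For the adapter-pattern takeaway, here is a minimal TypeScript sketch of capability-based routing. The capability names, registry, and dispatch function are illustrative stand-ins, not Stitch's actual interfaces.

```typescript
// Capabilities the assistant is allowed to request (illustrative set).
type ToolCapability = "trim" | "split" | "reorder" | "caption";

interface ToolCall {
  capability: ToolCapability;
  args: Record<string, unknown>;
}

// Each adapter translates an abstract capability into a concrete editor action.
// The assistant never touches the UI layer directly.
type ToolAdapter = (args: Record<string, unknown>) => Promise<void>;

const registry = new Map<ToolCapability, ToolAdapter>();

export function registerTool(capability: ToolCapability, adapter: ToolAdapter): void {
  registry.set(capability, adapter);
}

// Single dispatch point: when the UI changes, only the registered adapter is
// updated; assistant responses keep emitting the same capability names.
export async function dispatch(call: ToolCall): Promise<void> {
  const adapter = registry.get(call.capability);
  if (!adapter) throw new Error(`No adapter registered for capability: ${call.capability}`);
  await adapter(call.args);
}

// Example registration for the timeline's trim tool (hypothetical handler).
registerTool("trim", async ({ clipId, seconds }) => {
  // In the real editor this would call into the timeline, e.g.
  // timeline.trimClip(clipId as string, seconds as number);
  console.log(`trim ${String(clipId)} by ${String(seconds)}s`);
});
```

With this shape, an assistant response only names a capability and its arguments; swapping the timeline implementation means re-registering one adapter rather than rewriting every assistant-facing handler.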
Built With
- base-ui
- class-variance-authority
- elevenlabs-api
- eslint
- expo.io
- express.js
- fal-ai
- ffmpeg
- javascript
- lucide-react
- nano-banana
- next.js
- overshoot-sdk
- postcss
- react
- react-native
- tailwind-css
- typescript
- veo-bridge
- vercel
- wood-wide-api
