Inspiration
Building in public is often stalled by a single bottleneck: the friction between filming and publishing. Stitch removes this barrier by making the editing process as seamless as possible.
Every creator knows the struggle of recording a great take only to spend hours manually scrubbing through shaky setups and filler words. We built an editor that learns your patterns and automates the tedious tasks, allowing you to focus entirely on storytelling.
What it does
Stitch is a web-based video editor enhanced by AI-driven automation:
- Overshoot Analysis: Detects quality signals within footage to assist with thumbnail and clip extraction.
- Learned Intro Trim: Our "Wood Wide" ML model analyzes your editing history. If you consistently trim the first few seconds of a clip, it predicts and applies that adjustment automatically.
- Smart Captions: Powered by ElevenLabs STT, this feature automatically filters out filler words like "um," "uh," and "like" (a caption-filtering sketch follows this list).
- Natural Language Copilot (WIP): Allows you to describe edits, such as "remove all pauses," and generates an automated edit plan.
- Multi-clip Timeline: Features intuitive drag-and-drop reordering with trim, crop, and split tools.
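To make the Smart Captions flow concrete, here is a minimal TypeScript sketch of normalize-then-filter-then-emit-WebVTT. The segment and word shapes, the pause thresholds, and the helper names are illustrative assumptions, not the actual ElevenLabs response types or Stitch code.

```typescript
// Hypothetical shapes -- the real ElevenLabs STT response differs; this only
// illustrates the normalize -> filter -> WebVTT flow described above.
interface Word {
  text: string;
  start: number; // seconds
  end: number;   // seconds
}

interface Segment {
  words: Word[];
}

const FILLERS = new Set(["um", "uh", "erm", "hmm"]);

// Conservative filter: only drop a word when it is unambiguously a filler.
// "like" is kept unless it stands alone between pauses (illustrative heuristic).
function isFiller(word: Word, prev?: Word, next?: Word): boolean {
  const w = word.text.toLowerCase().replace(/[.,!?]/g, "");
  if (FILLERS.has(w)) return true;
  if (w === "like") {
    const gapBefore = prev ? word.start - prev.end : 1;
    const gapAfter = next ? next.start - word.end : 1;
    return gapBefore > 0.35 && gapAfter > 0.35; // isolated by pauses on both sides
  }
  return false;
}

// Format seconds as a WebVTT timestamp (HH:MM:SS.mmm).
function toTimestamp(seconds: number): string {
  const h = Math.floor(seconds / 3600);
  const m = Math.floor((seconds % 3600) / 60);
  const s = (seconds % 60).toFixed(3).padStart(6, "0");
  return `${String(h).padStart(2, "0")}:${String(m).padStart(2, "0")}:${s}`;
}

// Build a WebVTT document, one cue per segment, with fillers removed.
export function segmentsToWebVTT(segments: Segment[]): string {
  const cues = segments
    .map((seg) => {
      const kept = seg.words.filter((w, i) => !isFiller(w, seg.words[i - 1], seg.words[i + 1]));
      if (kept.length === 0) return null;
      const start = toTimestamp(kept[0].start);
      const end = toTimestamp(kept[kept.length - 1].end);
      return `${start} --> ${end}\n${kept.map((w) => w.text).join(" ")}`;
    })
    .filter((c): c is string => c !== null);
  return "WEBVTT\n\n" + cues.join("\n\n") + "\n";
}
```

The isolated-pause heuristic for "like" mirrors the conservative filtering described under Challenges: when in doubt, the word stays in the transcript.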
How we built it
- Frontend: Next.js 16, React 19, Tailwind CSS, and Base UI components.
- ML Pipeline: A Wood Wide regression model that predicts intro_trim_seconds based on early-clip features (shaky_ratio, avg_confidence, and num_flips); a simplified prediction sketch follows this list.
- Captions: ElevenLabs STT integrated with segment normalization and filler filtering to produce WebVTT outputs.
- Tool Adapter Pattern: Capability-based routing that allows the AI assistant to call specific tools without hardcoded UI dependencies.
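As a rough illustration of the ML Pipeline bullet, the sketch below shows what the intro-trim prediction might look like once a fitted model's coefficients are applied client-side. The linear form, the weights, the clamping range, and the function name are placeholders; the real Wood Wide model is served by its API and trained on the creator's editing history.

```typescript
// Feature names come from the write-up; everything else here is illustrative.
interface EarlyClipFeatures {
  shaky_ratio: number;     // fraction of early frames flagged as shaky (0..1)
  avg_confidence: number;  // mean detection confidence over the opening seconds
  num_flips: number;       // count of rapid quality flips near the start
}

// Hypothetical coefficients standing in for a fitted regression; real values
// would be learned from the creator's editing history.
const WEIGHTS = { bias: 1.2, shaky_ratio: 6.5, avg_confidence: -2.0, num_flips: 0.4 };

export function predictIntroTrimSeconds(f: EarlyClipFeatures): number {
  const raw =
    WEIGHTS.bias +
    WEIGHTS.shaky_ratio * f.shaky_ratio +
    WEIGHTS.avg_confidence * f.avg_confidence +
    WEIGHTS.num_flips * f.num_flips;
  // Clamp to a sane range and keep one decimal so suggestions read like 4.9s, not 5s.
  return Math.round(Math.min(Math.max(raw, 0), 15) * 10) / 10;
}

// Example: a shaky, mid-confidence opening suggests trimming ~5 seconds.
// predictIntroTrimSeconds({ shaky_ratio: 0.6, avg_confidence: 0.5, num_flips: 2 }) === 4.9
```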
Challenges
- Humanizing ML: Users notice hardcoded behavior immediately. We designed Wood Wide to provide variable predictions (such as 4.6s or 5.3s) rather than static increments to ensure the automation feels natural.
- Caption Precision: Filler word removal requires a conservative approach. We had to ensure the model distinguishes between a filler "like" and a functional "like" to maintain transcript integrity.
- State Management: Coordinating a multi-clip timeline with drag-and-drop operations, splitting, and per-clip cropping required a highly robust state architecture.
What we learned
- Utility over Complexity: A simple regression model using only three features can save creators more time than a complex, over-engineered system.
- Architecture Matters: The "adapter pattern" for AI tools is a vital investment. It ensures that when the UI changes, you only need to update a single mapping instead of every assistant response (see the sketch after this list).
- Transparency Builds Trust: Showing the raw inference data and dataset IDs during a demo makes the AI's "magic" feel credible and reliable.
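For the adapter-pattern takeaway, here is a minimal TypeScript sketch of capability-based routing. The capability names, registry, and dispatch function are illustrative stand-ins, not Stitch's actual interfaces.

```typescript
// Capabilities the assistant is allowed to request (illustrative set).
type ToolCapability = "trim" | "split" | "reorder" | "caption";

interface ToolCall {
  capability: ToolCapability;
  args: Record<string, unknown>;
}

// Each adapter translates an abstract capability into a concrete editor action.
// The assistant never touches the UI layer directly.
type ToolAdapter = (args: Record<string, unknown>) => Promise<void>;

const registry = new Map<ToolCapability, ToolAdapter>();

export function registerTool(capability: ToolCapability, adapter: ToolAdapter): void {
  registry.set(capability, adapter);
}

// Single dispatch point: when the UI changes, only the registered adapter is
// updated; assistant responses keep emitting the same capability names.
export async function dispatch(call: ToolCall): Promise<void> {
  const adapter = registry.get(call.capability);
  if (!adapter) throw new Error(`No adapter registered for capability: ${call.capability}`);
  await adapter(call.args);
}

// Example registration for the timeline's trim tool (hypothetical handler).
registerTool("trim", async ({ clipId, seconds }) => {
  // In the real editor this would call into the timeline, e.g.
  // timeline.trimClip(clipId as string, seconds as number);
  console.log(`trim ${String(clipId)} by ${String(seconds)}s`);
});
```

With this shape, an assistant response only names a capability and its arguments; swapping the timeline implementation means re-registering one adapter rather than rewriting every assistant-facing handler.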
Built With
- base-ui
- class-variance-authority
- elevenlabs-api
- eslint
- expo.io
- express.js
- fal-ai
- ffmpeg
- javascript
- lucide-react
- nano-banana
- next.js
- overshoot-sdk
- postcss
- react
- react-native
- tailwind-css
- typescript
- veo-bridge
- vercel
- wood-wide-api
