Inspiration

As content creators, we have constantly struggled with the time-consuming process of video editing. What should have taken minutes often took hours—scrubbing through footage, finding the best moments, cutting silence, and stitching clips together. We realized that with the power of multimodal AI, we could turn video editing from a manual, tedious task into a conversation. SmartClip.ai was born from a simple question: "What if you could edit a professional video without touching a single button?"

What it does

SmartClip.ai is an AI-powered video editor that works through natural language chat. Upload a video, and Gemini Flash analyzes the entire content to create a ranked list of the best moments based on virality potential, emotional impact, and visual quality. From there, you can:

  • Chat with your editor: Tell it to "add 2 seconds to this clip" or "trim the first five seconds"—it responds instantly
  • Auto-extract transcripts with speaker diarization and precise timestamps
  • Remove silence automatically, cutting out ums, pauses, and dead air
  • Create compilation videos by stitching together your top 5 clips with one click
  • Turn images into videos by uploading photos, slides, or graphics for marketing content
  • Search by keywords across full transcripts to find specific moments

The entire workflow—from upload to export-ready video—takes about 30 seconds.
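Chat commands like the ones above get mapped to structured edit operations on the timeline. In the product that mapping is done by Gemini; the toy sketch below uses regexes instead, purely to illustrate the shape of the output (function and field names here are hypothetical):

```javascript
// Toy sketch: turn a chat instruction into an edit operation.
// The real product delegates this understanding to Gemini; this
// regex version only illustrates the resulting operation shape.
const WORDS = { one: 1, two: 2, three: 3, four: 4, five: 5 };

function wordToNumber(w) {
  // Accept either a spelled-out number ("five") or digits ("5").
  return WORDS[w.toLowerCase()] ?? Number(w);
}

function parseEditCommand(text) {
  let m = text.match(/add (\d+(?:\.\d+)?) seconds?/i);
  if (m) return { op: 'extend', seconds: Number(m[1]) };
  m = text.match(/trim the first (\w+) seconds?/i);
  if (m) return { op: 'trimStart', seconds: wordToNumber(m[1]) };
  return null; // unrecognized instruction
}
```

The point is that every chat message reduces to a small, validated operation before it ever touches the video, which keeps the timeline state predictable.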

How we built it

SmartClip.ai is a pure Google Cloud and Gemini build, using the brand-new Gemini 3.0 preview models:

Architecture:

  • Two-pass AI system:
    • Pass 1 (Gemini 3.0 Flash): Extracts transcript with speaker diarization and timestamps
    • Pass 2 (Gemini 3.0 Flash + Pro): Analyzes video content, visual elements, facial expressions, motion, and combines it with the transcript to identify viral moments
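The orchestration of the two passes can be sketched as a small async pipeline. This is an illustrative shape, not the actual SmartClip.ai code: the model callers are injected as functions, and clip objects are assumed to carry a viralityScore field.

```javascript
// Sketch of the two-pass pipeline. `transcribe` and `rankMoments` stand in
// for the Gemini 3.0 Flash and Flash + Pro calls; names are illustrative.
async function analyzeVideo(videoUri, { transcribe, rankMoments }) {
  // Pass 1: extract a diarized, timestamped transcript.
  const transcript = await transcribe(videoUri);
  // Pass 2: combine the transcript with visual analysis (expressions,
  // motion, framing) to score candidate clips.
  const clips = await rankMoments(videoUri, transcript);
  // Rank by virality score so the UI can surface the top moments first.
  return [...clips].sort((a, b) => b.viralityScore - a.viralityScore);
}
```

Separating the passes keeps the transcript reusable: silence removal, keyword search, and clip ranking all read from the same Pass 1 output instead of re-analyzing the video.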

Tech Stack:

  • Frontend: React with real-time chat interface
  • Backend: Node.js with Express
  • AI Models: Gemini 3.0 Flash and Gemini 3.0 Pro (multimodal preview models)
  • Cloud Infrastructure: Google Cloud Run for deployment, Cloud Storage for video processing
  • Video Processing: Custom API with FFmpeg integration for clip extraction and compilation

Current limits: 5-minute videos, 25MB files
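The FFmpeg side of the custom API boils down to two operations: cutting a clip out of the source and concatenating the top clips into a compilation. A minimal sketch of the command construction (helper names are ours, not the actual API):

```javascript
// Build FFmpeg args for extracting one clip. Putting -ss before -i makes
// FFmpeg seek on the input (fast), and -c copy skips re-encoding.
function extractClipArgs(input, startSec, durationSec, output) {
  return ['-ss', String(startSec), '-t', String(durationSec),
          '-i', input, '-c', 'copy', output];
}

// Build the list file consumed by FFmpeg's concat demuxer
// (ffmpeg -f concat -safe 0 -i list.txt -c copy compilation.mp4).
function concatListFile(clipPaths) {
  return clipPaths.map((p) => `file '${p}'`).join('\n');
}
```

Stream-copying (`-c copy`) instead of re-encoding is a big part of why the upload-to-export loop stays in the 30-second range.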

Challenges we ran into

  1. Gemini Flash model inconsistencies: The model sometimes returned viralityScore with thousands of decimal places or negative numbers. We implemented a robust sanitizeClip function to force all data into valid ranges before reaching the UI.
  2. GCS bucket permissions: Deployment failed with "PermissionDenied" errors for storage.objects.list access. We had to reconfigure service account permissions and bucket policies.
  3. Upload URL header proxying: The proxy wasn't forwarding the x-google-upload-url response header, requiring us to build a custom upload API.
  4. JSON parsing errors: Gemini Flash occasionally returned malformed JSON with extreme precision floats, requiring additional validation layers.
  5. File size constraints: Balancing video quality, processing speed, and API rate limits while using personal quotas.
  6. Real-time chat updates: Synchronizing AI responses with video timeline updates without lag.
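The fix for challenges 1 and 4 was to never trust model-emitted numbers. A sketch of a sanitizeClip-style guard, assuming a 0–100 virality scale (the field name comes from the write-up; the real function may validate more):

```javascript
// Guard against Gemini preview-model quirks: negative scores, NaN,
// or floats with absurd precision. Assumes viralityScore is 0–100.
function sanitizeClip(raw) {
  const score = Number(raw.viralityScore);
  return {
    ...raw,
    viralityScore: Number.isFinite(score)
      // Clamp into range, then round to one decimal place.
      ? Math.round(Math.min(100, Math.max(0, score)) * 10) / 10
      : 0, // non-numeric output falls back to zero
  };
}
```

Running every parsed clip through a guard like this means malformed model output degrades gracefully instead of breaking the UI.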

Accomplishments that we're proud of

  • Built a production-ready app in a hackathon timeframe with a polished, intuitive UI
  • Pioneered a two-pass multimodal AI system that combines transcript analysis with visual content understanding
  • Created a chat interface that makes video editing accessible to non-technical users
  • Implemented intelligent error handling to work around AI model inconsistencies
  • Designed for scale with Cloud Run architecture ready to handle enterprise workloads
  • Delivered real value: Our demo video was edited entirely using SmartClip.ai—no traditional editing software needed

What we learned

  • Multimodal AI is powerful but unpredictable: Working with Gemini 3.0 preview models taught us to build defensive validation layers and gracefully handle edge cases.
  • User experience matters more than features: A chat interface makes complex editing simple—accessibility drives adoption.
  • Google Cloud scales beautifully: Cloud Run's auto-scaling and Cloud Storage's performance made deployment seamless (after solving permissions).
  • Prompt engineering is an art: Small changes in how we structured AI prompts dramatically affected output quality.
  • Real-time AI requires architecture trade-offs: Balancing response speed, model accuracy, and cost required careful model selection (Flash vs Pro).

What's next for SmartClip.ai

  • SaaS launch for content creators and small businesses: Subscription-based model with tiered pricing
  • Extended limits: 30-minute videos, 500MB files with production API quotas
  • Advanced features:
    • Multi-language subtitle generation
    • Auto-captioning with customizable styles
    • B-roll insertion suggestions
    • Background music matching
    • Social media format optimization (TikTok, Instagram Reels, YouTube Shorts)
  • Team collaboration: Shared workspaces and approval workflows
  • API for developers: Let other apps integrate AI-powered video editing
  • Browser extension: Edit videos directly from YouTube, Vimeo, or any video platform
