Inspiration

As content creators, we have constantly struggled with the time-consuming process of video editing. What should have taken minutes often took hours—scrubbing through footage, finding the best moments, cutting silence, and stitching clips together. We realized that with the power of multimodal AI, we could turn video editing from a manual, tedious task into a conversation. SmartClip.ai was born from a simple question: "What if you could edit a professional video without touching a single button?"

What it does

SmartClip.ai is an AI-powered video editor that works through natural language chat. Upload a video, and Gemini Flash analyzes the entire content to create a ranked list of the best moments based on virality potential, emotional impact, and visual quality. From there, you can:

  • Chat with your editor: Tell it to "add 2 seconds to this clip" or "trim the first five seconds"—it responds instantly
  • Auto-extract transcripts with speaker diarization and precise timestamps
  • Remove silence automatically, cutting out ums, pauses, and dead air
  • Create compilation videos by stitching together your top 5 clips with one click
  • Turn images into videos by uploading photos, slides, or graphics for marketing content
  • Search by keywords across full transcripts to find specific moments

The entire workflow—from upload to export-ready video—takes about 30 seconds.
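Chat commands like the ones above get mapped to structured edit operations on the timeline. In the product that mapping is done by Gemini; the toy sketch below uses regexes instead, purely to illustrate the shape of the output (function and field names here are hypothetical):

```javascript
// Toy sketch: turn a chat instruction into an edit operation.
// The real product delegates this understanding to Gemini; this
// regex version only illustrates the resulting operation shape.
const WORDS = { one: 1, two: 2, three: 3, four: 4, five: 5 };

function wordToNumber(w) {
  // Accept either a spelled-out number ("five") or digits ("5").
  return WORDS[w.toLowerCase()] ?? Number(w);
}

function parseEditCommand(text) {
  let m = text.match(/add (\d+(?:\.\d+)?) seconds?/i);
  if (m) return { op: 'extend', seconds: Number(m[1]) };
  m = text.match(/trim the first (\w+) seconds?/i);
  if (m) return { op: 'trimStart', seconds: wordToNumber(m[1]) };
  return null; // unrecognized instruction
}
```

The point is that every chat message reduces to a small, validated operation before it ever touches the video, which keeps the timeline state predictable.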

How we built it

SmartClip.ai is a pure Google Cloud and Gemini build, using the brand-new Gemini 3.0 preview models:

Architecture:

  • Two-pass AI system:
    • Pass 1 (Gemini 3.0 Flash): Extracts transcript with speaker diarization and timestamps
    • Pass 2 (Gemini 3.0 Flash + Pro): Analyzes video content, visual elements, facial expressions, motion, and combines it with the transcript to identify viral moments
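The orchestration of the two passes can be sketched as a small async pipeline. This is an illustrative shape, not the actual SmartClip.ai code: the model callers are injected as functions, and clip objects are assumed to carry a viralityScore field.

```javascript
// Sketch of the two-pass pipeline. `transcribe` and `rankMoments` stand in
// for the Gemini 3.0 Flash and Flash + Pro calls; names are illustrative.
async function analyzeVideo(videoUri, { transcribe, rankMoments }) {
  // Pass 1: extract a diarized, timestamped transcript.
  const transcript = await transcribe(videoUri);
  // Pass 2: combine the transcript with visual analysis (expressions,
  // motion, framing) to score candidate clips.
  const clips = await rankMoments(videoUri, transcript);
  // Rank by virality score so the UI can surface the top moments first.
  return [...clips].sort((a, b) => b.viralityScore - a.viralityScore);
}
```

Separating the passes keeps the transcript reusable: silence removal, keyword search, and clip ranking all read from the same Pass 1 output instead of re-analyzing the video.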

Tech Stack:

  • Frontend: React with real-time chat interface
  • Backend: Node.js with Express
  • AI Models: Gemini 3.0 Flash and Gemini 3.0 Pro (multimodal preview models)
  • Cloud Infrastructure: Google Cloud Run for deployment, Cloud Storage for video processing
  • Video Processing: Custom API with FFmpeg integration for clip extraction and compilation

Current limits: 5-minute videos, 25MB files
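The FFmpeg side of the custom API boils down to two operations: cutting a clip out of the source and concatenating the top clips into a compilation. A minimal sketch of the command construction (helper names are ours, not the actual API):

```javascript
// Build FFmpeg args for extracting one clip. Putting -ss before -i makes
// FFmpeg seek on the input (fast), and -c copy skips re-encoding.
function extractClipArgs(input, startSec, durationSec, output) {
  return ['-ss', String(startSec), '-t', String(durationSec),
          '-i', input, '-c', 'copy', output];
}

// Build the list file consumed by FFmpeg's concat demuxer
// (ffmpeg -f concat -safe 0 -i list.txt -c copy compilation.mp4).
function concatListFile(clipPaths) {
  return clipPaths.map((p) => `file '${p}'`).join('\n');
}
```

Stream-copying (`-c copy`) instead of re-encoding is a big part of why the upload-to-export loop stays in the 30-second range.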

Challenges we ran into

  1. Gemini Flash model inconsistencies: The model sometimes returned viralityScore with thousands of decimal places or negative numbers. We implemented a robust sanitizeClip function to force all data into valid ranges before reaching the UI.
  2. GCS bucket permissions: Deployment failed with "PermissionDenied" errors for storage.objects.list access. We had to reconfigure service account permissions and bucket policies.
  3. Upload URL header proxying: The proxy wasn't forwarding the x-google-upload-url response header, requiring us to build a custom upload API.
  4. JSON parsing errors: Gemini Flash occasionally returned malformed JSON with extreme precision floats, requiring additional validation layers.
  5. File size constraints: Balancing video quality, processing speed, and API rate limits while using personal quotas.
  6. Real-time chat updates: Synchronizing AI responses with video timeline updates without lag.
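The fix for challenges 1 and 4 was to never trust model-emitted numbers. A sketch of a sanitizeClip-style guard, assuming a 0–100 virality scale (the field name comes from the write-up; the real function may validate more):

```javascript
// Guard against Gemini preview-model quirks: negative scores, NaN,
// or floats with absurd precision. Assumes viralityScore is 0–100.
function sanitizeClip(raw) {
  const score = Number(raw.viralityScore);
  return {
    ...raw,
    viralityScore: Number.isFinite(score)
      // Clamp into range, then round to one decimal place.
      ? Math.round(Math.min(100, Math.max(0, score)) * 10) / 10
      : 0, // non-numeric output falls back to zero
  };
}
```

Running every parsed clip through a guard like this means malformed model output degrades gracefully instead of breaking the UI.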

Accomplishments that we're proud of

  • Built a production-ready app in a hackathon timeframe with a polished, intuitive UI
  • Pioneered a two-pass multimodal AI system that combines transcript analysis with visual content understanding
  • Created a chat interface that makes video editing accessible to non-technical users
  • Implemented intelligent error handling to work around AI model inconsistencies
  • Designed for scale with Cloud Run architecture ready to handle enterprise workloads
  • Delivered real value: Our demo video was edited entirely using SmartClip.ai—no traditional editing software needed

What we learned

  • Multimodal AI is powerful but unpredictable: Working with Gemini 3.0 preview models taught us to build defensive validation layers and gracefully handle edge cases.
  • User experience matters more than features: A chat interface makes complex editing simple—accessibility drives adoption.
  • Google Cloud scales beautifully: Cloud Run's auto-scaling and Cloud Storage's performance made deployment seamless (after solving permissions).
  • Prompt engineering is an art: Small changes in how we structured AI prompts dramatically affected output quality.
  • Real-time AI requires architecture trade-offs: Balancing response speed, model accuracy, and cost required careful model selection (Flash vs Pro).

What's next for SmartClip.ai

  • SaaS launch for content creators and small businesses: Subscription-based model with tiered pricing
  • Extended limits: 30-minute videos, 500MB files with production API quotas
  • Advanced features:
    • Multi-language subtitle generation
    • Auto-captioning with customizable styles
    • B-roll insertion suggestions
    • Background music matching
    • Social media format optimization (TikTok, Instagram Reels, YouTube Shorts)
  • Team collaboration: Shared workspaces and approval workflows
  • API for developers: Let other apps integrate AI-powered video editing
  • Browser extension: Edit videos directly from YouTube, Vimeo, or any video platform
