Inspiration

Raw video footage is chaotic and hard to manage. Traditional editing workflows are slow, fragmented, and don’t scale well. We wanted a system that understands your footage before you edit it, turning hours of clips into a semantic, searchable, and programmatically editable repository. The goal was to bring AI-powered intelligence to video workflows, enabling creators to focus on storytelling, not manual asset organization.

What it does

Trem-AI is a cognitive video engine that:

  • Automatically transcribes audio using Whisper, with word-level timestamps and speaker diarization.

  • Analyzes video visually with Gemini 1.5 Flash and 3.0 Pro, detecting scenes, objects, and actions.

  • Generates a semantic repository of all assets with chapters, tags, and summaries.

  • Enables programmatic video editing using Remotion, letting you treat video like code.

  • Streams AI reasoning in real time, providing a live “Thinking…” log.

  • Handles heavy media processing entirely in-browser, leveraging Service Workers and IndexedDB for resilience and parallelization.
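To make “semantic and searchable” concrete, here is a minimal sketch of querying the repository by tags and summaries. The `AssetRecord` shape and `searchAssets` helper are illustrative assumptions, not Trem-AI’s actual API.

```typescript
// Hypothetical shape of one entry in the semantic repository.
interface AssetRecord {
  id: string;
  summary: string; // AI-generated summary of the clip
  tags: string[]; // AI-generated tags (scenes, objects, actions)
}

// Rank assets by how many query terms appear in their tags or summary.
function searchAssets(repo: AssetRecord[], query: string): AssetRecord[] {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  return repo
    .map((asset) => {
      const haystack = (asset.summary + " " + asset.tags.join(" ")).toLowerCase();
      const score = terms.filter((t) => haystack.includes(t)).length;
      return { asset, score };
    })
    .filter((r) => r.score > 0)
    .sort((a, b) => b.score - a.score)
    .map((r) => r.asset);
}
```

In the real engine the ranking would be richer (e.g. chapter-level matches), but the idea is the same: every clip becomes a queryable record instead of an opaque file.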

How we built it

Frontend: React 19 + Vite + TypeScript, with Tailwind CSS for a modern, fluid UI.

Background Processing: Service Workers + Workbox handle ingestion, AI calls, and database operations off the main thread.
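A rough sketch of the message protocol between the UI thread and the Service Worker. The message names and the pure `routeJob` dispatcher are assumptions for illustration; the real worker would receive these via `postMessage` and run the handlers off the main thread.

```typescript
// Illustrative messages the UI might post to the Service Worker.
type JobMessage =
  | { type: "INGEST"; assetId: string }
  | { type: "ANALYZE"; assetId: string }
  | { type: "PERSIST"; assetId: string };

// Pure dispatcher: pick the handler for a message and run it.
// In the worker this would wrap the actual ingestion / AI / DB calls.
function routeJob(
  msg: JobMessage,
  handlers: { [K in JobMessage["type"]]: (assetId: string) => string }
): string {
  return handlers[msg.type](msg.assetId);
}
```

Keeping the protocol as a discriminated union means TypeScript checks that every job type has a handler.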

AI Stack:

  • Whisper (via Replicate) for transcription.

  • Gemini 1.5 Flash for per-asset visual analysis.

  • Gemini 3.0 Pro (Thinking Mode) for semantic synthesis and repository structuring.
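The live “Thinking…” log is fed by the models’ streaming responses. A minimal sketch, assuming standard server-sent-events framing (`data:` lines); the payload shape is an assumption, not the exact wire format we receive.

```typescript
// Turn one raw SSE chunk into displayable "Thinking…" log lines.
// The "data:" prefix is standard SSE framing; "[DONE]" is a common end sentinel.
function parseSseChunk(chunk: string): string[] {
  return chunk
    .split("\n")
    .filter((line) => line.startsWith("data:"))
    .map((line) => line.slice("data:".length).trim())
    .filter((payload) => payload !== "" && payload !== "[DONE]");
}
```

Each parsed line is appended to the log as it arrives, so the user sees reasoning incrementally instead of waiting for the full response.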

Video Editing: Remotion + FFmpeg.wasm for programmatic editing.
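“Video as code” mostly means mapping transcript timings onto Remotion’s frame-based timeline. A sketch of that conversion, assuming a simple word shape from the transcription step (the `Word` interface is illustrative); the resulting ranges would drive Remotion sequences.

```typescript
// One transcribed word with start/end times in seconds (assumed shape).
interface Word { text: string; start: number; end: number; }

// A frame range suitable for a Remotion <Sequence from durationInFrames>.
interface FrameRange { text: string; from: number; durationInFrames: number; }

// Convert second-based word timings into frame-based ranges at a given fps.
function wordsToFrames(words: Word[], fps: number): FrameRange[] {
  return words.map((w) => ({
    text: w.text,
    from: Math.round(w.start * fps),
    durationInFrames: Math.max(1, Math.round((w.end - w.start) * fps)),
  }));
}
```

Once timings are frames, cuts, captions, and highlights become ordinary data transformations with instant preview.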

Storage: IndexedDB for local-first asset metadata and job tracking.
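The job records persisted in IndexedDB are what make the pipeline resumable. A sketch of the recovery pass that runs on startup; the state names and retry policy are illustrative, not the exact schema.

```typescript
// Assumed job lifecycle persisted per asset in IndexedDB.
type JobState = "queued" | "processing" | "done" | "failed";
interface Job { id: string; state: JobState; attempts: number; }

// On startup, any job left "processing" by a crash or closed tab is
// re-queued (up to a retry limit) so the pipeline resumes where it stopped.
function recoverJobs(jobs: Job[], maxAttempts = 3): Job[] {
  return jobs.map((job): Job => {
    if (job.state !== "processing") return job;
    const attempts = job.attempts + 1;
    return attempts > maxAttempts
      ? { ...job, state: "failed", attempts }
      : { ...job, state: "queued", attempts };
  });
}
```

In practice each state transition is written inside an IndexedDB transaction, so a crash never leaves a record half-updated.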

Challenges we ran into

  • Browser resource limits: Processing hours of video in-browser without freezing the UI required optimized batching and Service Worker orchestration.

  • Streaming AI responses: Maintaining a live “thinking” log while avoiding network timeouts required careful integration with the models’ streaming APIs.

  • Complexity of semantic synthesis: Combining visual data, transcripts, and context into a cohesive structure required iterative tuning of Gemini prompts.

  • Parallel processing & resilience: Ensuring tasks could resume after crashes or tab closures needed transactional logic in IndexedDB.
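The batching idea from the first challenge can be sketched as a concurrency limiter: run at most `limit` heavy tasks at once so the browser is never flooded. The helper name and shape are illustrative, not our exact implementation.

```typescript
// Run async tasks with at most `limit` in flight at a time,
// preserving result order. Used here as a stand-in for the
// batching that keeps heavy media work from freezing the UI.
async function runWithLimit<T>(
  tasks: (() => Promise<T>)[],
  limit: number
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const i = next++; // safe: JS is single-threaded per event loop
      results[i] = await tasks[i]();
    }
  }
  await Promise.all(
    Array.from({ length: Math.min(limit, tasks.length) }, worker)
  );
  return results;
}
```

Tuning `limit` per device is what lets hours of footage process without the main thread (or the Service Worker) choking.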

Accomplishments that we're proud of

  • Built a fully local-first AI video engine that can analyze multiple videos in parallel.

  • Achieved frame-accurate transcription with Whisper and integrated it with programmatic editing pipelines.

  • Developed a glassmorphic, real-time dashboard with live AI logs and workspace management.

  • Enabled programmatic video editing via Remotion, turning video into code with instant preview.

  • Created a robust, resilient background pipeline that survives browser crashes, closures, or database corruption.

What we learned

Large AI models can be effectively streamed and orchestrated in-browser with careful use of Service Workers.

Semantic understanding of video requires combining multimodal data (audio, visual, text) in structured ways.

Building programmatic editing interfaces for video drastically speeds up storytelling workflows.

Real-time feedback improves user trust; users like watching AI “think”.

What's next for Trem

Preset Editing Modes: One-click modes like “High-Energy Speaker” or “Cinematic Montage” powered by AI timing and FFmpeg.

Advanced Visual Understanding: Detect emotions, gestures, and scene context for richer tagging.

Collaboration Tools: Shared repositories and live editing for team workflows.

Cloud Sync: Optional hybrid mode to combine local-first processing with cloud storage for massive libraries.

Mobile & Lightweight Version: Bring Trem-AI to tablets and low-power devices while keeping local-first processing.

Built With

  • css-variables
  • ffmpeg.wasm
  • google-gemini-1.5-flash
  • google-gemini-3.0-pro
  • indexeddb
  • node.js
  • openai-whisper-(via-replicate)
  • react-19
  • remotion
  • service-workers
  • tailwindcss
  • typescript
  • vite
  • workbox
  • zustand