Omni-Operator V1
Inspiration
As engineers and creators at Operators Forge, we felt trapped in the "SaaS Tax" cycle. We were paying several monthly subscriptions for editing, transcription, and social media scheduling tools, all of them "black boxes" that take control of your data away from you. Our inspiration was to build a Sovereign AI Factory: a local-first production line where the Operator owns the infrastructure, the memory, and the reasoning logic. We wanted to prove that, with Gemini 3 Flash Preview as the reasoning engine, one person can run an entire media agency from a single local machine without relying on external SaaS platforms.
What it does
Omni-Operator V1 is an autonomous media factory that transforms raw video footage into a multi-platform content campaign.
- Analyzes: It "watches" raw MP4 files using Gemini's native multimodality to identify viral hooks. There is no separate transcription step; the model reads the scene's audio and visuals, including its energy, directly.
- Writes: It generates unique, platform-optimized strategy and copy for TikTok, YouTube, and LinkedIn, validated via PydanticAI to ensure data integrity.
- Manufactures: It executes cuts with sub-second precision and reframes the footage to vertical 9:16 using an automated FFmpeg engine.
- Remembers: It uses a local Qdrant Vector DB to store campaign data, allowing the system to learn and retrieve the creator's unique style for future missions.
- Distributes: It uses the Model Context Protocol (MCP) to autonomously organize and manage the local file system.
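To make the "Manufactures" step concrete, here is a minimal sketch of how a cut plus 9:16 reframe could be expressed as an FFmpeg invocation. This is not the project's actual media engine: the function name `build_vertical_cut_command`, the 1080x1920 target size, the center crop, and the codec choices are all assumptions made for illustration.

```python
def build_vertical_cut_command(src, dst, start_s, end_s, width=1080, height=1920):
    """Return an FFmpeg argument list that cuts [start_s, end_s) and reframes to 9:16.

    Hypothetical helper: names, defaults, and codecs are illustrative assumptions.
    """
    if end_s <= start_s:
        raise ValueError("end_s must be greater than start_s")
    duration = end_s - start_s
    # Center-crop a 9:16 window at full source height, then scale to the target size.
    video_filter = f"crop=ih*9/16:ih,scale={width}:{height}"
    return [
        "ffmpeg", "-y",               # overwrite the output file if it exists
        "-ss", f"{start_s:.3f}",      # seek to the clip start (millisecond precision)
        "-i", src,
        "-t", f"{duration:.3f}",      # clip duration in seconds
        "-vf", video_filter,
        "-c:v", "libx264",            # re-encode video; stream copy would cut on keyframes only
        "-c:a", "aac",
        dst,
    ]


if __name__ == "__main__":
    print(" ".join(build_vertical_cut_command("raw.mp4", "tiktok.mp4", 12.5, 27.25)))
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. A center crop is the simplest possible reframing; tracking the subject would need a per-clip crop offset.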
How we built it
We architected a high-density "Sovereign Stack" designed for autonomy:
- Cognitive Engine: Gemini 3 Flash Preview via the new google-genai SDK for high-speed multimodal reasoning.
- Logic & Agency: PydanticAI for type-safe agentic orchestration and structured outputs.
- Vector Memory: Qdrant running locally in Docker to manage brand experience and RAG capabilities.
- Observability: Langfuse v2 for local tracing, debugging, and cost-per-mission analysis.
- Media Engine: A custom Python service controlling FFmpeg and MoviePy to automate the rendering process.
- Tactical Interface: A professional "Mission Control" dashboard built with Next.js 16 and Tailwind 4.
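The "Logic & Agency" layer depends on rejecting malformed model output before it reaches the renderer. The sketch below shows that idea with standard-library dataclasses as a dependency-free stand-in; it is not PydanticAI code, and the `ClipPlan` fields, the platform names, and `parse_clip_plans` are hypothetical.

```python
import json
from dataclasses import dataclass

# Assumed platform identifiers; the real schema may differ.
PLATFORMS = {"tiktok", "youtube", "linkedin"}


@dataclass(frozen=True)
class ClipPlan:
    """One planned clip, validated on construction (illustrative schema)."""
    platform: str
    start_s: float
    end_s: float
    caption: str

    def __post_init__(self):
        if self.platform not in PLATFORMS:
            raise ValueError(f"unknown platform: {self.platform!r}")
        if not (0 <= self.start_s < self.end_s):
            raise ValueError("expected 0 <= start_s < end_s")
        if not self.caption.strip():
            raise ValueError("caption must not be empty")


def parse_clip_plans(raw_json: str) -> list[ClipPlan]:
    """Parse a JSON array of clip objects; missing or extra keys raise TypeError."""
    return [ClipPlan(**item) for item in json.loads(raw_json)]


if __name__ == "__main__":
    raw = '[{"platform": "tiktok", "start_s": 1.0, "end_s": 4.5, "caption": "Hook"}]'
    print(parse_clip_plans(raw))
```

A Pydantic model would add type coercion and richer error reports; the point here is only that every field is checked before any rendering starts.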
Challenges we ran into
The primary challenge was bridging the gap between "AI Reasoning" and "Technical Execution." We had to ensure that the timestamps Gemini identified lined up with the frame-accurate cut points FFmpeg requires. Additionally, orchestrating an entire enterprise-grade stack (FastAPI, Qdrant, Langfuse, Postgres) in a local Docker environment, while keeping latency low in the Next.js frontend, required rigorous network and resource management on a single machine.
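One way to reconcile model timestamps with frame boundaries is to snap each timestamp to the nearest frame before cutting. The following is a minimal sketch under that assumption, not the project's actual fix; `snap_to_frame` and `snap_clip` are hypothetical names, and the frame rate is assumed to be constant and known.

```python
from fractions import Fraction


def snap_to_frame(t_seconds, fps):
    """Snap a timestamp to the nearest frame boundary, returned as an exact Fraction."""
    fps = Fraction(fps)
    frame_index = round(Fraction(t_seconds) * fps)  # nearest whole frame
    return frame_index / fps


def snap_clip(start_s, end_s, fps):
    """Snap both ends of a clip; guarantee the result is at least one frame long."""
    start = snap_to_frame(start_s, fps)
    end = snap_to_frame(end_s, fps)
    if end <= start:
        end = start + 1 / Fraction(fps)
    return start, end


if __name__ == "__main__":
    start, end = snap_clip(12.34, 27.9, 25)
    print(float(start), float(end))
```

Using `Fraction` avoids accumulating floating-point error, which matters for rates such as `Fraction(30000, 1001)` (NTSC 29.97 fps). The snapped values can be converted with `float()` when building the FFmpeg arguments.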
Accomplishments that we're proud of
We successfully built a "Zero SaaS Tax" pipeline in which raw video and brand strategies stay inside an environment the Operator controls. We are particularly proud of the Native Multimodal Integration: by removing the need for separate speech-to-text or vision models, we've created a faster and more cost-effective production line. The system runs autonomously from one raw upload to three formatted, captioned, and sorted video assets.
What we learned
Building this project showed us that Gemini 3 Flash is a strong fit for Media-Ops. Its speed allows for real-time iteration, and its large context window lets the agent maintain a coherent narrative across a long video rather than seeing it in disconnected chunks. We also learned that the future of AI belongs to "Agents with Hands": systems that don't just chat, but operate directly on file systems and infrastructure through protocols like MCP.
What's next for omni-operator-v1
The next phase is Agentic Quality Control (AQC), where Gemini will autonomously review its own rendered clips against the original mission intent and flag any that miss it. We are also planning to integrate automated voice cloning and dubbing so creators can go global with a single click, and an auto-thumbnail generator that identifies the most visually striking frame from each cut.
Built With
- docker
- fastapi
- ffmpeg
- gemini-api-(gemini-3-flash-preview)
- langfuse
- model-context-protocol-(mcp)
- moviepy
- next.js-16
- pydanticai
- python-3.12
- qdrant-vector-db
- tailwind-css-4
- typescript
- uv