Inspiration
Creating viral short-form video content from raw footage is labor-intensive: it takes long hours of manual editing per finished minute, expensive software, and deep platform knowledge. This barrier to entry excludes educators, rural NGOs, small businesses, and community journalists. ReelsCraft democratises high-quality video processing and helps close the digital equity gap in the creator economy, sustainably.
What it does
ReelsCraft for Good is a fully autonomous, 6-agent video-to-reels production pipeline. Users upload a raw video and receive several polished, unique short-form reels. The pipeline automatically:
- Transcribes audio and identifies visual hooks, CTAs, hashtags, and captions.
- Discovers the most engaging narrative cuts using historical data.
- Maps cuts-per-minute to trending, beat-synchronized backing music.
- Performs a single-pass render with professional audio ducking and dynamic captions.
- Streams pipeline telemetry to an analytics dashboard where users can explore metrics via natural language (Text-to-SQL).
How we built it
We orchestrated the system using the Google Agent Development Kit (ADK) to enforce a deterministic, 6-agent sequential flow (Preprocessor → Perception → Director → Audio Sync → Packager → Conversational Analytics).
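The fixed stage order described above can be sketched in plain Python. This is a minimal illustration, not the ADK implementation: `PipelineState`, `make_stage`, and `run_pipeline` are hypothetical names, and each stage is a stub that only records that it ran.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PipelineState:
    """Hypothetical state object passed from stage to stage."""
    video_uri: str
    completed: list[str] = field(default_factory=list)


Stage = Callable[[PipelineState], PipelineState]

# Stage order taken from the flow described in the text.
STAGE_ORDER = [
    "preprocessor",
    "perception",
    "director",
    "audio_sync",
    "packager",
    "conversational_analytics",
]


def make_stage(name: str) -> Stage:
    """Build a stub stage; a real one would call a model or FFmpeg."""
    def run(state: PipelineState) -> PipelineState:
        state.completed.append(name)
        return state
    return run


def run_pipeline(video_uri: str) -> PipelineState:
    """Run every stage exactly once, always in the same order."""
    state = PipelineState(video_uri=video_uri)
    for name in STAGE_ORDER:
        state = make_stage(name)(state)
    return state


if __name__ == "__main__":
    print(run_pipeline("gs://example-bucket/raw.mp4").completed)
```

Because the order lives in a plain list rather than in a model's reasoning loop, no stage can be skipped or reordered at run time.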
Compute: We used Cloud Run (L4 / RTX 6000 GPU instances) for scale-to-zero inference running Gemma 4 (26B/e4b) and Gemini 3.5 Flash.
Data & Integrations: We implemented two Model Context Protocol (MCP) servers: Fivetran MCP (to continuously sync trending music catalogs and social metrics into BigQuery) and Firebase MCP (for global pipeline run tracking and KV state).
Memory: We built a tri-layer memory architecture featuring Session Memory (ADK), Vector Memory (BigQuery Vector Search for Hybrid RAG), and Entity Memory (BigQuery SQL).
Challenges we ran into
Multimodal Video Embedding: Native BigQuery ML Object Tables struggled to directly embed raw .mp4 files for semantic search. We solved this by using Gemini to extract rich textual visual_tags and combined them with Whisper transcripts. Embedding this concatenated text created highly accurate "Vibe" vectors for retrieving historical edit formats.
Compute Costs: Video processing is expensive. We needed a system that was financially viable for small creators. We solved this by utilizing BigQuery Vector Search (pay-per-query) instead of Vertex AI Vector Search (which has a ~$20-30/mo minimum) and deploying to scale-to-zero Cloud Run GPUs, achieving a true $0.00 baseline monthly cost when idle.
Rendering Latency: FFmpeg rendering can easily become a bottleneck. We moved the Audio Sync agent before the Packager agent to pre-calculate beat-maps. This allowed the Packager to execute a massive, single-pass render rather than looping through multiple passes.
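To illustrate what pre-calculating a beat-map might involve, here is a minimal sketch that assumes a constant tempo and snaps proposed cut times to the nearest beat. The function names `beat_grid` and `snap_to_beats` are hypothetical, and real beat detection on a music track would be more involved than a fixed BPM grid.

```python
def beat_grid(bpm: float, duration_s: float) -> list[float]:
    """Return beat timestamps (seconds) for a constant-tempo track."""
    interval = 60.0 / bpm
    count = int(duration_s / interval) + 1
    return [round(i * interval, 3) for i in range(count)]


def snap_to_beats(cut_times: list[float], beats: list[float]) -> list[float]:
    """Move each proposed cut to its nearest beat, dropping duplicates."""
    snapped = {min(beats, key=lambda b: abs(b - t)) for t in cut_times}
    return sorted(snapped)


if __name__ == "__main__":
    beats = beat_grid(bpm=120, duration_s=4.0)
    print(snap_to_beats([0.7, 1.9, 3.26], beats))
```

With the snapped cut list computed up front, a renderer can receive every cut point at once instead of discovering them across several passes.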
Accomplishments that we're proud of
- Engineering a completely automated, deterministic 6-agent pipeline that doesn't just generate text, but physically produces professional-grade, beat-synced .mp4 video files.
- Successfully implementing Hybrid RAG for Video Editing, where our Director agent queries historical Edit Decision Lists (EDLs) via BigQuery to mimic successful narrative pacing and transitions.
- Building a robust multi-tier caching system (Firestore) that securely skips the 100+ second Perception phase if a video upload has ≥92% similarity to a previously processed file.
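The similarity-gated cache can be sketched as follows. This is only an illustration under stated assumptions: the text does not say how similarity is measured, so cosine similarity over a numeric fingerprint vector is assumed here, an in-memory dict stands in for Firestore, and `find_cached_perception` is a hypothetical name.

```python
import math

SIMILARITY_THRESHOLD = 0.92  # the 92% figure from the text


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_cached_perception(fingerprint, cache):
    """Return the cached result of the most similar entry, or None.

    `cache` maps a key to (stored_fingerprint, perception_result).
    """
    best_result, best_score = None, 0.0
    for stored, result in cache.values():
        score = cosine_similarity(fingerprint, stored)
        if score > best_score:
            best_result, best_score = result, score
    if best_score >= SIMILARITY_THRESHOLD:
        return best_result
    return None


if __name__ == "__main__":
    cache = {"video-a": ([1.0, 0.0], {"visual_tags": ["beach"]})}
    print(find_cached_perception([1.0, 0.1], cache))  # near-duplicate upload
    print(find_cached_perception([1.0, 1.0], cache))  # too different
```

On a hit, the caller would reuse the stored Perception output and continue with the Director stage; on a miss it would run Perception and store the new fingerprint.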
What we learned
We learned that while autonomous ReAct (Reasoning and Acting) loops are great for chatbots, deterministic state orchestration is absolutely critical for compute-heavy tasks. Using the Google ADK's strictly-typed AgentEnvelope contracts ensured that our agents didn't hallucinate expensive FFmpeg rendering commands, allowing us to safely execute complex map-reduce/fan-out rendering patterns.
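The idea of a typed contract guarding render commands can be illustrated with a small sketch. The actual AgentEnvelope schema is not shown in this write-up, so `RenderStep`, `ALLOWED_OPS`, and `parse_steps` below are hypothetical stand-ins built on the standard library rather than on ADK or Pydantic.

```python
from dataclasses import dataclass

# Hypothetical whitelist of operations the renderer accepts.
ALLOWED_OPS = frozenset({"trim", "concat", "overlay_captions", "duck_audio"})


@dataclass(frozen=True)
class RenderStep:
    """One validated render instruction produced by an upstream agent."""
    op: str
    start_s: float
    end_s: float

    def __post_init__(self) -> None:
        # Reject operations that are not on the whitelist.
        if self.op not in ALLOWED_OPS:
            raise ValueError(f"unsupported render op: {self.op!r}")
        # Reject negative or empty time ranges.
        if not (0 <= self.start_s < self.end_s):
            raise ValueError("time range must satisfy 0 <= start_s < end_s")


def parse_steps(raw_steps: list[dict]) -> list[RenderStep]:
    """Convert model output into validated steps.

    Unknown or missing keys raise TypeError; bad values raise ValueError.
    """
    return [RenderStep(**item) for item in raw_steps]


if __name__ == "__main__":
    print(parse_steps([{"op": "trim", "start_s": 0.0, "end_s": 4.5}]))
```

Only steps that pass validation would be turned into FFmpeg arguments, so a malformed or invented instruction fails before any GPU time is spent.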
What's next for Reels Craft
Establish a self-improving loop: feed insights synced by Fivetran into BigQuery back to the agents, and generate enough product-researched, customized data to fine-tune them.
Built With
- adk
- agent-to-agent-(a2a)-protocols
- apis
- artifact-registry
- bash
- bigquery-(vector-search
- bqml-k-means-clustering
- cloud-build
- cloud-run-(serverless-gpu-/-l4-/-rtx-6000-compute)
- cloud-sql-(postgresql)
- cloud-storage-(gcs)
- crewai
- embeddings-api)
- events
- fastapi
- ffmpeg-/-ffprobe
- firebase-mcp-server
- firestore
- fivetran
- fivetran-mcp-server
- gemini3.5
- gemma-4-(26b-/-e4b)
- google-adk
- iam-ai-models:-gemini-3.5-flash
- javascript
- langchain
- mcp
- ollama-google-cloud-platform-(gcp)-services:-vertex-ai-(model-registry
- openai-whisper-(asr)-integrations
- pub/sub
- pydantic
- pyscenedetect
- python
- react
- secret-manager
- server-sent
- sql
- storage-write-api)
- tailwind-css
- typescript
- vertexai
- vllm