Reels Craft

Inspiration

Creating viral short-form video content from raw footage is extremely labor-intensive: it takes long hours of manual editing per finished minute, expensive software, and deep platform knowledge. This barrier to entry excludes educators, rural NGOs, small businesses, and community journalists. ReelsCraft democratises high-quality video processing and helps close the digital equity gap in the creator economy in a financially sustainable way.

What it does

ReelsCraft for Good is a fully autonomous, 6-agent video-to-reels production pipeline. Users upload a raw video and receive a set of polished, unique short-form reels. The pipeline automatically transcribes the audio and identifies visual hooks, calls to action (CTAs), hashtags, and captions; discovers the most engaging narrative cuts using historical edit data; maps the cut rate (cuts per minute) to trending, beat-synchronized backing music; performs a single-pass render with professional audio ducking and dynamic captions; and streams pipeline telemetry to an analytics dashboard where users can explore metrics in natural language (Text-to-SQL).
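The cut-rate-to-music mapping can be pictured with a minimal sketch. It assumes constant-tempo tracks with known BPM and a beat grid starting at a known offset; the function names, track ids, and the "whole number of beats per cut" fitness rule are illustrative assumptions, not ReelsCraft's actual logic.

```python
def cuts_per_minute(cut_times: list[float], duration_s: float) -> float:
    """Average number of cuts per minute across a reel of the given length."""
    return len(cut_times) * 60.0 / duration_s


def pick_track(cpm: float, tracks: dict[str, float]) -> str:
    """Return the id of the track whose BPM best fits the cut rate.

    Assumed rule: a track fits when beats-per-cut (bpm / cpm) is close to a
    whole number, so every cut can land on a beat. `tracks` maps hypothetical
    track ids to their tempo in beats per minute.
    """
    def misfit(bpm: float) -> float:
        beats_per_cut = bpm / cpm
        return abs(beats_per_cut - max(1, round(beats_per_cut)))

    return min(tracks, key=lambda track_id: misfit(tracks[track_id]))


def snap_to_beats(cut_times: list[float], bpm: float, offset_s: float = 0.0) -> list[float]:
    """Move each cut to the nearest beat of a constant-tempo grid starting at offset_s."""
    beat_s = 60.0 / bpm
    return [round(offset_s + round((t - offset_s) / beat_s) * beat_s, 3) for t in cut_times]


cuts = [1.9, 4.1, 6.2, 8.0]                    # raw cut points in seconds (made-up values)
cpm = cuts_per_minute(cuts, duration_s=10.0)   # 24.0 cuts per minute
track = pick_track(cpm, {"trk-a": 100.0, "trk-b": 120.0, "trk-c": 130.0})
print(track, snap_to_beats(cuts, bpm=120.0))   # trk-b [2.0, 4.0, 6.0, 8.0]
```

Real music rarely has a perfectly constant tempo, so a production version would snap to detected beat timestamps rather than a computed grid.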

How we built it

We orchestrated the system using the Google Agent Development Kit (ADK) to enforce a deterministic, 6-agent sequential flow (Preprocessor → Perception → Director → Audio Sync → Packager → Conversational Analytics).
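The value of a fixed, deterministic order is that each stage can check its inputs before doing expensive work. The sketch below is plain Python rather than the ADK API (ADK offers workflow agents such as `SequentialAgent` for this); the stage names come from the write-up, while the state keys (`clips`, `edl`, `beat_map`, and so on) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One pipeline step: what it needs from shared state and what it adds."""
    name: str
    requires: tuple[str, ...]
    produces: str

    def run(self, state: dict[str, str]) -> None:
        missing = [key for key in self.requires if key not in state]
        if missing:
            # Fail fast instead of letting a later, expensive stage guess its inputs.
            raise RuntimeError(f"{self.name} is missing inputs: {missing}")
        # Placeholder work; a real stage would call a model, FFmpeg, BigQuery, etc.
        state[self.produces] = f"<{self.produces} from {self.name}>"


# Stage order from the write-up; the state keys are hypothetical.
PIPELINE = (
    Stage("Preprocessor", ("video_uri",), "clips"),
    Stage("Perception", ("clips",), "transcript_and_tags"),
    Stage("Director", ("transcript_and_tags",), "edl"),
    Stage("Audio Sync", ("edl",), "beat_map"),
    Stage("Packager", ("edl", "beat_map"), "reels"),
    Stage("Conversational Analytics", ("reels",), "telemetry"),
)


def run_pipeline(video_uri: str,
                 stages: tuple[Stage, ...] = PIPELINE) -> tuple[dict[str, str], list[str]]:
    state = {"video_uri": video_uri}
    trace = []
    for stage in stages:  # strictly sequential: no model decides what runs next
        stage.run(state)
        trace.append(stage.name)
    return state, trace
```

Swapping Audio Sync and Packager in `PIPELINE` makes `run_pipeline` raise `RuntimeError` at the Packager, because `beat_map` does not exist yet.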

Compute: We used Cloud Run GPU instances (L4 / RTX 6000) for scale-to-zero inference, running Gemma 4 (26B/e4b) alongside Gemini 3.5 Flash.

Data & Integrations: We implemented two Model Context Protocol (MCP) servers: Fivetran MCP, which continuously syncs trending music catalogs and social metrics into BigQuery, and Firebase MCP, for global pipeline-run tracking and key-value (KV) state.

Memory: We built a tri-layer memory architecture: Session Memory (ADK), Vector Memory (BigQuery Vector Search for Hybrid RAG), and Entity Memory (BigQuery SQL).
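One way to read "Hybrid RAG" over the Vector and Entity layers is: apply exact structured filters first, then rank the survivors by embedding similarity. The sketch below does both in memory as a stand-in for BigQuery (where the ranking side could use the `VECTOR_SEARCH` table function). The `EditRecord` fields and the filter choice are assumptions; the embeddings would be something like the "vibe" vectors described under Challenges.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class EditRecord:
    """Hypothetical row of edit memory: one past reel, its EDL id and its stats."""
    edl_id: str
    platform: str
    avg_watch_pct: float
    embedding: tuple[float, ...]


def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_retrieve(query: tuple[float, ...], records: list[EditRecord],
                    platform: str, min_watch_pct: float = 0.0, k: int = 3) -> list[str]:
    # Structured side (stand-in for the SQL entity layer): exact filters.
    candidates = [r for r in records
                  if r.platform == platform and r.avg_watch_pct >= min_watch_pct]
    # Semantic side (stand-in for vector search): rank survivors by similarity.
    candidates.sort(key=lambda r: cosine(query, r.embedding), reverse=True)
    return [r.edl_id for r in candidates[:k]]
```

Filtering before ranking keeps low-performing edits out of the Director's context no matter how semantically close they are.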

Challenges we ran into

Multimodal Video Embedding: Native BigQuery ML Object Tables struggled to embed raw .mp4 files directly for semantic search. We solved this by using Gemini to extract rich textual visual_tags and concatenating them with Whisper transcripts. Embedding this combined text produced highly accurate "vibe" vectors for retrieving historical edit formats.

Compute Costs: Video processing is expensive, and we needed a system that was financially viable for small creators. We used BigQuery Vector Search (billed per query) instead of Vertex AI Vector Search (which has a minimum of roughly $20-30/month) and deployed to scale-to-zero Cloud Run GPUs, achieving a true $0.00 baseline monthly cost when idle.

Rendering Latency: FFmpeg rendering can easily become a bottleneck. We moved the Audio Sync agent ahead of the Packager agent so that beat-maps are pre-calculated. This lets the Packager execute one single-pass render instead of looping through multiple passes.
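A single-pass render means one FFmpeg invocation whose filtergraph does every cut, the music bed, and the ducking at once, so the source is decoded and encoded only once. The sketch below builds such a command; it assumes the segments are already snapped to the beat map and uses FFmpeg's `trim`/`atrim`, `concat`, `asplit`, `sidechaincompress`, `amix`, and optionally `subtitles` filters. The function name, threshold, and ratio are illustrative, not the project's actual settings.

```python
from typing import Optional


def build_single_pass_cmd(source: str, music: str, segments: list[tuple[float, float]],
                          out: str, captions: Optional[str] = None,
                          duck_ratio: float = 8.0) -> list[str]:
    """Return an ffmpeg argv that cuts, ducks, mixes, and encodes in one pass."""
    parts = []
    for i, (start, end) in enumerate(segments):
        # Each kept segment of the source, with timestamps reset to start at 0.
        parts.append(f"[0:v]trim=start={start}:end={end},setpts=PTS-STARTPTS[v{i}]")
        parts.append(f"[0:a]atrim=start={start}:end={end},asetpts=PTS-STARTPTS[a{i}]")
    pairs = "".join(f"[v{i}][a{i}]" for i in range(len(segments)))
    parts.append(f"{pairs}concat=n={len(segments)}:v=1:a=1[vcat][voice]")
    # A filter output label can be consumed only once, so duplicate the voice track.
    parts.append("[voice]asplit=2[voice_mix][voice_sc]")
    # Compress the music (input 1) whenever the voice sidechain is loud: ducking.
    parts.append(f"[1:a][voice_sc]sidechaincompress=threshold=0.05:ratio={duck_ratio}[ducked]")
    parts.append("[voice_mix][ducked]amix=inputs=2:duration=first[aout]")
    video_label = "[vcat]"
    if captions:
        # Burn in captions; requires an FFmpeg build with libass.
        parts.append(f"[vcat]subtitles=filename={captions}[vout]")
        video_label = "[vout]"
    return ["ffmpeg", "-y", "-i", source, "-i", music,
            "-filter_complex", ";".join(parts),
            "-map", video_label, "-map", "[aout]",
            "-c:v", "libx264", "-c:a", "aac", out]
```

The returned list can be executed with `subprocess.run(cmd, check=True)`. Caption paths containing colons or quotes would need FFmpeg filter escaping, which this sketch omits.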

Accomplishments that we're proud of

Engineering a fully automated, deterministic 6-agent pipeline that doesn't just generate text but actually renders professional-grade, beat-synced .mp4 video files.

Implementing Hybrid RAG for video editing, in which our Director agent queries historical Edit Decision Lists (EDLs) in BigQuery to mimic successful narrative pacing and transitions.

Building a robust multi-tier caching system (Firestore) that safely skips the 100+ second Perception phase when an uploaded video is ≥92% similar to a previously processed file.
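The caching idea can be sketched as two tiers: an exact content hash for byte-identical re-uploads, then a similarity check for near-duplicates. The write-up does not say how similarity is measured, so this sketch assumes a per-video fingerprint vector compared with cosine similarity against the 0.92 threshold; the class and fingerprint format are hypothetical, and a dict stands in for Firestore.

```python
import hashlib
import math
from typing import Callable


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class PerceptionCache:
    """Hypothetical two-tier cache: exact content hash, then near-duplicate fingerprint.

    In-memory stand-in for a Firestore-backed store.
    """

    def __init__(self, threshold: float = 0.92) -> None:
        self.threshold = threshold
        self._by_hash: dict[str, dict] = {}
        self._by_fingerprint: list[tuple[list[float], dict]] = []

    def get_or_run(self, video: bytes, fingerprint: list[float],
                   run_perception: Callable[[], dict]) -> tuple[dict, str]:
        digest = hashlib.sha256(video).hexdigest()
        if digest in self._by_hash:  # tier 1: byte-identical re-upload
            return self._by_hash[digest], "exact"
        best = max(self._by_fingerprint,
                   key=lambda entry: cosine(fingerprint, entry[0]), default=None)
        if best is not None and cosine(fingerprint, best[0]) >= self.threshold:
            return best[1], "similar"  # tier 2: near-duplicate upload
        result = run_perception()  # miss: pay for the full Perception phase
        self._by_hash[digest] = result
        self._by_fingerprint.append((fingerprint, result))
        return result, "miss"
```

The linear scan is fine for a sketch; at scale, the tier-2 lookup would itself be a vector search.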

What we learned

We learned that while autonomous ReAct (Reasoning and Acting) loops work well for chatbots, deterministic state orchestration is critical for compute-heavy tasks. Using strictly typed AgentEnvelope contracts with the Google ADK prevented our agents from hallucinating expensive FFmpeg rendering commands, which let us safely execute complex map-reduce/fan-out rendering patterns.
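The write-up does not show the AgentEnvelope fields, so the sketch below invents a small `RenderEnvelope` to illustrate the principle: the model's JSON is parsed into a strict contract, and only validated fields (never free-form command text) reach the renderer. It uses the standard library only; Pydantic, which the project lists among its tools, offers the same strictness through `extra="forbid"`. The field names and transition allowlist are hypothetical.

```python
import json
from dataclasses import dataclass

ALLOWED_TRANSITIONS = frozenset({"cut", "fade", "whip"})  # hypothetical allowlist
FIELDS = frozenset({"segments", "transition", "music_track_id"})


class EnvelopeError(ValueError):
    """Raised when model output does not satisfy the render contract."""


@dataclass(frozen=True)
class RenderEnvelope:
    """Hypothetical typed contract between a planning agent and the renderer."""
    segments: tuple[tuple[float, float], ...]
    transition: str
    music_track_id: str


def _is_number(x: object) -> bool:
    return isinstance(x, (int, float)) and not isinstance(x, bool)


def parse_render_envelope(raw: str, source_duration_s: float) -> RenderEnvelope:
    data = json.loads(raw)
    # Reject unknown fields outright, e.g. an invented "ffmpeg_args" key.
    if not isinstance(data, dict) or set(data) != FIELDS:
        raise EnvelopeError("envelope must be an object with exactly the contract fields")
    transition = data["transition"]
    if not isinstance(transition, str) or transition not in ALLOWED_TRANSITIONS:
        raise EnvelopeError(f"transition not allowed: {transition!r}")
    track = data["music_track_id"]
    if not isinstance(track, str) or not track:
        raise EnvelopeError("music_track_id must be a non-empty string")
    if not isinstance(data["segments"], list) or not data["segments"]:
        raise EnvelopeError("segments must be a non-empty list")
    segments = []
    for seg in data["segments"]:
        if not (isinstance(seg, list) and len(seg) == 2 and all(_is_number(x) for x in seg)):
            raise EnvelopeError(f"malformed segment: {seg!r}")
        start, end = float(seg[0]), float(seg[1])
        if not 0.0 <= start < end <= source_duration_s:
            raise EnvelopeError(f"segment outside the source video: {seg!r}")
        segments.append((start, end))
    return RenderEnvelope(tuple(segments), transition, track)
```

Because the renderer builds its own FFmpeg arguments from a `RenderEnvelope`, a hallucinated flag or an out-of-range cut fails at parse time rather than after minutes of GPU rendering.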

What's next for Reels Craft

Close the self-improving loop by feeding the insights that Fivetran syncs into BigQuery back to the agents, and generate enough product-researched, customized data to fine-tune the agent.

Built With

adk
agent-to-agent-(a2a)-protocols
apis
artifact-registry
bash
bigquery-vector-search
bqml-k-means-clustering
cloud-build
cloud-run-(serverless-gpu-/-l4-/-rtx-6000-compute)
cloud-sql-(postgresql)
cloud-storage-(gcs)
crewai
embeddings-api
fastapi
ffmpeg-/-ffprobe
firebase-mcp-server
firestore
fivetran
fivetran-mcp-server
gemini-3.5-flash
gemini3.5
gemma-4-(26b-/-e4b)
google-adk
google-cloud-platform-(gcp)
iam
javascript
langchain
mcp
ollama
openai-whisper-(asr)
pub/sub
pydantic
pyscenedetect
python
react
secret-manager
server-sent-events
sql
storage-write-api
tailwind-css
typescript
vertex-ai-model-registry
vertexai
vllm

Submitted to

Google Cloud Rapid Agent Hackathon

Created by

As an Architect, I designed a scalable, 7-tier multi-agent architecture that intelligently orchestrates specialized AI agents and routes inference across local models and managed Google Cloud endpoints.

In my full-stack role, I engineered the end-to-end platform, building a responsive React frontend and a robust FastAPI backend that handles asynchronous, heavy-duty video processing with FFmpeg and Whisper.

To ensure operational reliability as a DevOps engineer, I containerized the entire local environment with Docker and automated the cloud infrastructure provisioning using Terraform and Cloud Build.

Leading Product Research, I evaluated and integrated cutting-edge technologies like Unsloth for efficient model fine-tuning and pgvector for advanced semantic search capabilities.

Finally, taking Product Ownership, I drove the overarching vision of the platform, defining the core media generation workflows and spearheading the integration of BigQuery-driven Conversational Analytics to deliver a mature, data-driven SaaS product.

Niki Gouda

Updates

Niki Gouda started this project — Jun 11, 2026 03:32 PM EDT
