💡 Inspiration

History is often buried in dusty archival boxes and dense PDFs, making it inaccessible to the general public. While generative AI excels at summarizing this text, reading a summary in a chat UI is inherently dry. We were inspired by legendary documentary filmmakers (like Ken Burns) who can take a single static photo and make it feel alive. Our goal was to build a bridge between raw data and compelling, emotionally resonant storytelling: an autonomous "AI Co-Director" that doesn't just read your history, but films it.

🛠️ How we built it

The system operates as a two-step pipeline. First, we built a Python/Streamlit backend where users upload their messy, unorganized assets. We use the Google GenAI SDK to ingest these multimodal files directly into the model's context window, prompting the LLM to act as a Director: it analyzes the visual evidence, writes a grounded voiceover script, and chooses the best-fitting 3D cinematic template (e.g., INVESTIGATION_DESK, SURVEILLANCE). Second, the backend exports this highly structured narrative as an AST (storyboard.json), which a React/Remotion frontend renders dynamically using 3D CSS transforms and synchronized TTS audio; a minimal sketch of that export is shown below.
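As an illustration only, the exported storyboard might look roughly like the following; the field names, scene keys, and placeholder text are assumptions for this sketch, not the exact schema used in the repository.

# Hypothetical sketch of the storyboard export (field names are illustrative, not the real schema)
import json

storyboard = {
    "title": "PLACEHOLDER_DOCUMENTARY_TITLE",
    "scenes": [
        {
            "template": "INVESTIGATION_DESK",          # 3D cinematic template chosen by the Director LLM
            "asset": "uploads/archival_photo_01.jpg",  # user-uploaded historical asset
            "voiceover": "PLACEHOLDER_NARRATION_GROUNDED_IN_THE_ASSET",
            "duration_seconds": 8,                     # drives scene length in Remotion
        },
        {
            "template": "SURVEILLANCE",
            "asset": "uploads/archival_letter_02.png",
            "voiceover": "PLACEHOLDER_NARRATION_FOR_SECOND_SCENE",
            "duration_seconds": 6,
        },
    ],
}

# The React/Remotion frontend reads this file and renders each scene
# with 3D CSS transforms and synchronized TTS audio.
with open("storyboard.json", "w") as f:
    json.dump(storyboard, f, indent=2)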

🧗 Challenges & Solutions (The Math Behind the Magic)

One of our biggest hurdles was Foreground/Background Blending. Early iterations washed out the user's historical documents because the AI-generated ambient backgrounds were too bright.

To fix this programmatically, we couldn't just guess brightness values. Instead, we calculate the standard perceptual luma (ITU-R BT.601 weights) pixel by pixel for the generated ambiance:

$$ L = 0.299R + 0.587G + 0.114B $$
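As a rough sketch of that measurement (assuming a Pillow/NumPy stack, which the write-up does not specify), the backend can score each generated background like this:

# Sketch: score the brightness of a generated background (Pillow/NumPy assumed)
import numpy as np
from PIL import Image

def mean_luma(path):
    """Average BT.601 luma of an image, normalized to [0, 1]."""
    rgb = np.asarray(Image.open(path).convert("RGB"), dtype=np.float32) / 255.0
    luma = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    return float(luma.mean())

# Illustrative usage: a high score means the ambiance would overpower the
# historical asset and needs to be darkened before compositing.
# print(mean_luma("generated_background.png"))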

To ensure the viewer's eye was drawn strictly to the historical asset in the center of the frame, we derived an adaptive parabolic spotlight curve that maps the measured darkness back into a dynamic CSS vignette, peaking near the golden-ratio region (approximately 35% of frame height):

$$ f(t) = \max\left(0, 1 - \left(\frac{t - 0.35}{0.45}\right)^2\right) $$
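A minimal sketch of that curve (parameter names are ours; the actual vignette is generated as CSS in the Remotion frontend):

# Sketch: adaptive parabolic spotlight falloff, peaking at ~35% of frame height
def spotlight(t, peak=0.35, width=0.45):
    """Spotlight intensity in [0, 1] at normalized frame height t.
    Peaks at `peak`, falls off parabolically, and clamps to 0 elsewhere."""
    return max(0.0, 1.0 - ((t - peak) / width) ** 2)

# Illustrative values: brightest near the golden-ratio region, dark at the frame edges
for t in (0.0, 0.35, 0.5, 0.8, 1.0):
    print(f"t={t:.2f} -> intensity {spotlight(t):.3f}")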

Google Cloud Integration Examples in the Repository

Google Gemini API Integration

The primary application code demonstrates the use of the Google GenAI SDK to process multimodal inputs (images and text) by making calls to Gemini models.

File Link: app.py
Explanation: Inside this file, the application configures the google.genai client and structures prompts that combine user scripts with optimized image binaries. The function get_gemini_response handles the direct interaction with the Gemini API to analyze archival assets and generate the documentary storyboard sequence.

# Example snippet from app.py
from google import genai

def get_gemini_response(prompt, images, image_texts=None):
    # api_key and model_name are loaded from the app's configuration
    client = genai.Client(api_key=api_key)

    # ... asset preparation logic ...
    # (builds `content`: the Director prompt plus the uploaded image parts
    #  and any text extracted from them)

    response = client.models.generate_content(
        model=model_name,
        contents=content
    )
    return response.text
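The asset-preparation step is elided above. As an assumption about how it could look with the Google GenAI SDK (the exact logic lives in app.py and may differ), the multimodal contents can be assembled roughly like this:

# Hypothetical sketch of building the multimodal `content` list (not the repository's exact code)
from google.genai import types

def build_contents(prompt, images, image_texts=None):
    parts = [prompt]  # the Director prompt as plain text
    for i, image_bytes in enumerate(images):
        # attach each optimized image binary as an inline part
        parts.append(types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"))
        if image_texts and i < len(image_texts):
            parts.append(f"Extracted text for asset {i}: {image_texts[i]}")
    return parts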

Google Cloud Platform (GCP) Deployment

To demonstrate how the application is hosted and managed using Google Cloud infrastructure, the repository includes an automated deployment script.

File Link: deploy_gcp.sh
Explanation: This shell script manages the build and deployment process to Google Cloud Run. It authenticates with GCP, builds the Docker container, pushes the image to Artifact Registry (or Container Registry), and deploys the newly built image to a scalable Cloud Run service. This showcases the project's reliance on Google Cloud for production infrastructure.

🧠 What we learned

We learned that getting an LLM to generate a massive, unbroken 8000-token narrative requires heavy prompt engineering so it doesn't "lazily" summarize.
