Inspiration

We've all seen incredible AI images, but trying to tell a coherent story with them is a nightmare. You generate a "cyberpunk detective" in one frame, and in the next, their jacket changes color, the lighting shifts, or they turn into a different person entirely. Using standard image generators feels like playing a slot machine: you pull the lever and hope for the best.

We wanted to change the paradigm from Generating (random outputs) to Directing (planned, consistent sequences). We were inspired by traditional animation workflows where a director plans a scene, and artists execute it with strict adherence to character sheets and lighting guides. We asked: Can we build an AI Agent that acts as that Director?

What it does

FIBO Sequential Studio is an agentic creative workspace that transforms simple prompts into consistent, cinematic image sequences.

  • The AI Director: You give high-level instructions (e.g., "A calm ocean evolving into a violent storm"), and our Google Gemini Agent plans the sequence frame-by-frame, ensuring narrative logic and pacing.
  • Structured Execution: Instead of just sending text prompts, the system constructs structured JSON payloads for the Bria AI engine. This allows us to programmatically "lock" parameters like specific camera angles (low_angle), lighting conditions (golden_hour), and object consistency across the entire sequence.
  • Professional Studio Control: We built a full React-based "Studio" interface where users see the timeline, inspect the Agent's plan, edit the raw JSON for any frame, and regenerate specific shots without breaking the rest of the sequence.
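
As a rough illustration of the parameter "locking" described above, here is a minimal sketch in Python. The field names (camera_angle, lighting, description) and the apply_locks helper are assumptions made for this example, not the actual schema Bria expects or the code we shipped.

```python
from copy import deepcopy

# Hypothetical field names; the real engine schema may differ.
LOCKED = {"camera_angle": "low_angle", "lighting": "golden_hour"}


def apply_locks(frames, locked):
    """Return new frame dicts with the locked parameters forced onto each one."""
    result = []
    for frame in frames:
        patched = deepcopy(frame)   # leave the planner's original output untouched
        patched.update(locked)      # locked values override whatever the planner wrote
        result.append(patched)
    return result


plan = [
    {"description": "A calm ocean at rest", "lighting": "overcast"},
    {"description": "Waves begin to build"},
    {"description": "A violent storm breaks"},
]
locked_plan = apply_locks(plan, LOCKED)
```

Because the locks are applied after planning, the Agent stays free to vary the description from frame to frame while the shared look is enforced in one place.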

How we built it

The project relies on an agentic pipeline with three parts: a planner, a rendering engine, and a studio interface:

  • The Brain (Gemini Flash): We use Google's Gemini model as the "Director Agent." It takes the user's concept and outputs a JSON Plan: an array of structured frame definitions that describe the scene's evolution parameter by parameter.
  • The Engine (Bria v2.3): We chose Bria specifically for its Structured Prompting capability. Unlike typical diffusion models that just take text, Bria accepts JSON that follows a specific schema, which makes it perfect for the kind of programmatic control we needed.
  • The Interface (React + Flask): The frontend is built with React and Shadcn/UI for a premium, dark-mode "Studio" feel. The backend is Flask (Python), which orchestrates the agents, manages the state, and handles the "cache-busting" logic to ensure the UI always shows the freshest renders.
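
The planner-to-engine flow above can be sketched roughly as follows. This is an illustrative outline only: run_sequence, fake_director, and fake_engine are hypothetical names, and the two stubs stand in for the real Gemini and Bria calls, which are not shown here.

```python
import json
from typing import Callable


def run_sequence(
    concept: str,
    plan_fn: Callable[[str], str],
    render_fn: Callable[[dict], str],
) -> list:
    """Ask the planner for a JSON plan, then render each frame in order."""
    frames = json.loads(plan_fn(concept))
    if not isinstance(frames, list):
        raise ValueError("Director plan must be a JSON array of frames")
    timeline = []
    for index, frame in enumerate(frames):
        image_ref = render_fn(frame)  # one engine call per frame
        timeline.append({"index": index, "frame": frame, "image": image_ref})
    return timeline


def fake_director(concept: str) -> str:
    """Stub planner: returns a two-frame plan as a JSON string."""
    return json.dumps([
        {"description": f"{concept}: opening shot"},
        {"description": f"{concept}: closing shot"},
    ])


def fake_engine(frame: dict) -> str:
    """Stub renderer: returns a made-up image reference."""
    return f"render-{len(frame['description'])}.png"


timeline = run_sequence("storm at sea", fake_director, fake_engine)
```

Passing the planner and renderer in as functions keeps the orchestration testable without network access, which is one reasonable way to structure a backend like this.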

Challenges we ran into

  • The "Context" Error: We spent hours debugging a persistent 422 Unprocessable Entity error from the Bria API. It turned out that when generating strictly from JSON, the model demands a specific context field that wasn't well-documented. We had to implement a "Sanitizer" middleware to auto-patch missing fields in the Agent's output.
  • Prompt Drift: Getting an LLM to output valid JSON for another AI model is tricky. The LLM would sometimes "hallucinate" keys that Bria didn't support. We solved this by creating a strict TypeScript-like schema definition in the System Prompt to constrain the LLM's creativity to valid parameters only.
  • State Synchronization: Keeping the React state in sync with the Flask backend was tough, especially when regenerating single frames. We had to implement a polling mechanism and append a unique timestamp query parameter to each image URL so the browser fetches the new render instead of a cached copy.
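
To make the fixes above more concrete, here is a minimal sketch of a sanitizer and a cache-busting helper. The allowed keys, the placeholder default for the context field, and the "t" query parameter name are all assumptions for illustration; they are not Bria's documented schema or our exact middleware.

```python
import time
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical whitelist and defaults; the real schema may differ.
ALLOWED_KEYS = {"context", "description", "camera_angle", "lighting"}
REQUIRED_DEFAULTS = {"context": ""}  # placeholder value, not the real one


def sanitize_frame(frame):
    """Drop keys the engine does not support and patch in missing required fields."""
    cleaned = {k: v for k, v in frame.items() if k in ALLOWED_KEYS}
    for key, default in REQUIRED_DEFAULTS.items():
        cleaned.setdefault(key, default)
    return cleaned


def cache_busted(url, stamp=None):
    """Return the URL with a fresh 't' query parameter, replacing any old one."""
    if stamp is None:
        stamp = int(time.time() * 1000)  # milliseconds since the epoch
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "t"]
    query.append(("t", str(stamp)))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, sanitize_frame({"description": "x", "mood_ring": "purple"}) drops the hallucinated mood_ring key and adds the missing context field, while cache_busted("/frames/3.png", stamp=42) yields "/frames/3.png?t=42".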

Accomplishments that we're proud of

  • Frame-Level "In-Painting" Workflow: We successfully built a feature where you can edit the JSON for Frame 3, hit "Regenerate," and only that frame updates while keeping its context within the story. It works like in-painting applied to a sequence rather than to a region of a single image, and it feels like real magic.
  • Seamless Agent Handoff: The way Gemini (the planner) hands off data to Bria (the painter) feels invisible to the user. You just type a story, and images appear.
  • The UI: We're really proud of the "Studio" aesthetic. It manages to pack a lot of complexity (JSON editors, Timelines, Logs) into a clean, usable interface that doesn't overwhelm the user.

What we learned

We learned that Structured Prompting is the future of commercially safe AI. Natural language alone is often too ambiguous for production pipelines. By using an Agent to generate Structure (JSON) instead of just Pixels, we gained a level of control that feels closer to professional 3D tools like Blender, but with the speed of Generative AI.

What's next for FIBO Sequential Studio

  • Video Export: We want to integrate ffmpeg on the backend to stitch the frames into an actual .mp4 video, with frame interpolation for smoother motion.
  • Multi-Character Consistency: We plan to implement a "Character Sheet" feature where the Agent defines a character's traits (hair, clothes, face) once and injects that definition into every frame's JSON automatically.
  • Community Gallery: A feature for users to share their saved "Plans" (JSON sequences) so other creators can remix them with different styles or subjects.
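
Two of the planned features above could look roughly like this. Both helpers are sketches of work not yet built: the "character" key, the function names, and the frame rates are assumptions. The ffmpeg options used (-framerate, -vf minterpolate, -c:v libx264, -pix_fmt yuv420p) are standard ones, but the command is only constructed here, not executed.

```python
from copy import deepcopy


def inject_character(frames, sheet):
    """Copy the same character sheet into every frame definition."""
    patched_frames = []
    for frame in frames:
        patched = deepcopy(frame)
        patched["character"] = deepcopy(sheet)  # hypothetical key name
        patched_frames.append(patched)
    return patched_frames


def ffmpeg_stitch_command(pattern, output, input_fps=2, output_fps=24):
    """Build an ffmpeg argument list that stitches numbered frames into an mp4."""
    return [
        "ffmpeg",
        "-framerate", str(input_fps),              # how fast the stills are read
        "-i", pattern,                             # e.g. frame_%03d.png
        "-vf", f"minterpolate=fps={output_fps}",   # motion-interpolated in-between frames
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",                     # widely compatible pixel format
        output,
    ]


sheet = {"hair": "silver", "clothes": "red trench coat"}
frames = inject_character([{"description": "alley"}, {"description": "rooftop"}], sheet)
command = ffmpeg_stitch_command("frame_%03d.png", "sequence.mp4")
```

The resulting list could be handed to subprocess.run on a machine where ffmpeg is installed; defining the sheet once and copying it into each frame is what would keep the character description identical across the sequence.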
