NovelForge AI

A Gemini 3–powered multi-agent system that turns a single sentence into a consistent long-form story.

Input: one-line idea (+ optional constraints)
Output: Series Bible → Characters → Outline.jsonl → Slice plan → Multi-chapter drafts → Compressed memory (for consistency)


TL;DR

Long-form storytelling is not just “write better text” — it’s a structure + memory problem.
We solve it using:

  • Multi-agent specialization (like a real writing team)
  • Hierarchical generation (chapter → slice → prose)
  • Compressed memory injection (no embeddings / no vector DB)
  • Gemini 3 for planning, structured outputs, drafting, and memory compression

About the Project

What inspired us

Most LLM writing demos end at a short story, because once narratives become long, they drift:

  • characters “change personality”
  • canon facts contradict
  • pacing collapses across chapters

We wanted a simple experience:

Type one sentence. Get a structured, multi-chapter story that stays consistent.

What we built

A workflow-orchestrated, Gemini 3 multi-agent pipeline that generates:

  1. a Series Bible (world rules + plot beats)
  2. a Character System
  3. a chapter outline (outline.jsonl)
  4. a slice plan (scenes per chapter)
  5. prose drafted slice-by-slice
  6. and memory compressed after each chapter and injected into the next prompts
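End to end, the six stages above can be sketched as one sequential loop. This is a minimal illustration, not our actual code: the `call_agent` signature, the role names passed to it, and the one-line-per-chapter outline convention are all assumptions.

```python
# Illustrative sketch of the NovelForge pipeline; all names are hypothetical.
from dataclasses import dataclass

@dataclass
class Memory:
    short_term: str = ""   # current arc, recent events
    long_term: str = ""    # stable canon facts, accumulated per chapter

def run_pipeline(idea: str, call_agent) -> list[str]:
    """call_agent(role, prompt) -> str is a thin wrapper over the LLM."""
    bible = call_agent("architect", idea)                 # 1. Series Bible
    cast = call_agent("profiler", bible)                  # 2. Character System
    outline = call_agent("weaver", bible + cast)          # 3. one JSONL line per chapter
    memory = Memory()
    chapters = []
    for chapter_plan in outline.splitlines():
        slices = call_agent("slicer", chapter_plan)       # 4. slice plan
        prose = ""
        for s in slices.splitlines():                     # 5. slice-by-slice drafting
            prose += call_agent("writer", memory.long_term + memory.short_term + s)
        # 6. compress the finished chapter and carry it forward
        memory.short_term = call_agent("compressor", prose)
        memory.long_term += call_agent("canon_extractor", prose)
        chapters.append(prose)
    return chapters
```

The point of the sketch is the shape of the control flow: agents never call each other; the loop owns sequencing and memory.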

Why this matters (Market + Technical Gap)

Market opportunity

The books and publishing ecosystem is already massive. Even though headline figures describe the overall book market (not novels alone), long-form storytelling remains a high-value content segment, especially as serialized digital fiction and downstream IP adaptation expand.

The technical gap: long-form writing doesn’t fit into one model call

A standard novel commonly falls around 70k–100k words (genre dependent).
How Many Words in a Novel? (Reedsy)

Even with long-context models, “single-shot novel generation” is constrained by:

  • limited controllability,
  • output limits,
  • and the inability to preserve canon over many chapters without memory.

Public summaries for Gemini 3 Pro preview commonly cite up to ~64k output tokens and a very large input context window.
Gemini 3 Pro (Preview) - LLM Stats

So the real challenge becomes:

How do we scale storytelling beyond a single prompt while maintaining structure and long-term consistency?


Solution Overview

We scale beyond one context window by combining:

  1. specialized agents (planning, outlining, slicing, drafting)
  2. hierarchical decomposition (chapters → slices)
  3. compressed memory injection (summarize canon + recap after each chapter)

This makes long-form generation structured, stable, and explainable.


Architecture (High-Level)

Client → Orchestrator / Job Runner → Agents → Artifacts + Memory

  • The Orchestrator owns the workflow:
    • runs the chapter loop + slice loop
    • assembles prompts
    • injects compressed memory
    • validates outputs
    • persists artifacts
  • Agents stay role-focused (no messy cross-calls)
  • Memory is updated per chapter and injected into next steps
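Output validation in the orchestrator can be as small as a parse-and-retry wrapper around each agent call. A minimal sketch, assuming a simple fixed retry budget; the retry policy and the correction prompt wording are our illustration, not the exact implementation:

```python
import json

def validated_call(call_agent, role, prompt, retries=2):
    """Ask an agent for JSON; on a parse failure, retry with the error appended."""
    for _ in range(retries + 1):
        raw = call_agent(role, prompt)
        try:
            return json.loads(raw)
        except json.JSONDecodeError as err:
            # Feed the parser error back so the model can self-correct.
            prompt = f"{prompt}\n\nPrevious output was invalid JSON ({err}). Return valid JSON only."
    raise ValueError(f"{role} never produced valid JSON")
```

Because every artifact downstream of the Weaver is JSON/JSONL, this one chokepoint is where format reliability pays off.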

Agent Design (Role-Based Multi-Agent Pipeline)

Architect — Series Bible Builder

Turns the one-line idea into a Series Bible: world rules, plot beats, themes, stakes, and long-range structure.
Goal: make the story writeable for many chapters.

Profiler — Character System Designer

Builds a coherent cast: motivations, fears, goals, arcs, and relationship graph.
Goal: create characters that generate conflict naturally.

Weaver — Chapter Outline Generator

Creates the executable chapter plan: chapter summaries, turning points, pacing roles, and cliffhangers.
Goal: structure long-form narrative so drafting doesn’t drift.

Slicer — Chapter-to-Slice Decomposer

Breaks each chapter into 3–4 “slices” (scenes) with explicit objectives, conflicts, and transitions.
Goal: keep generation manageable and controllable.
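Concretely, a slice plan for one chapter might look like the following. The character, the field names, and the three-slice split are all illustrative, not our actual schema:

```python
# Hypothetical slice plan for one chapter; every field and name is illustrative.
slice_plan = [
    {"slice": 1,
     "objective": "Mira reaches the sealed archive",
     "conflict": "the archivist refuses her entry",
     "transition": "a blackout gives her an opening"},
    {"slice": 2,
     "objective": "Mira reads the forbidden ledger",
     "conflict": "what she finds implicates her mentor",
     "transition": "footsteps force her to hide"},
    {"slice": 3,
     "objective": "escape with the ledger",
     "conflict": "the mentor is waiting at the door",
     "transition": "chapter ends on the confrontation (cliffhanger)"},
]
```

Each slice hands the Writer a bounded task with an explicit exit condition, which is what keeps per-call generation controllable.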

Writer — Slice-Following Draft Writer

Drafts prose slice-by-slice, preserving tone and canon, and lands the chapter hook.
Goal: produce readable text while respecting the plan.


Memory System (Compressed Summaries, No Embeddings)

We intentionally do not use embeddings or a vector database. Instead, after each chapter, we compress the chapter into:

  • Short-term memory: current arc, recent events, immediate continuity constraints
  • Long-term memory (canon): stable facts (world rules, character truths, timeline constraints)

Then, for the next chapter/slice, the orchestrator:

  1. loads memory summaries
  2. injects them into the prompt context
  3. continues generation with continuity constraints

This yields long-form consistency without retrieval complexity.
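As a concrete sketch, the per-chapter compression step could look like this. The prompt wording, the RECAP/CANON labels, and the function names are hypothetical:

```python
# Hypothetical chapter-compression step; prompt text and names are illustrative.
COMPRESS_PROMPT = (
    "Summarize this chapter in two labeled parts.\n"
    "RECAP: recent events and open threads (short).\n"
    "CANON: new stable facts as bullet points.\n\n{chapter}"
)

def update_memory(memory: dict, chapter_text: str, call_llm) -> dict:
    """Compress a finished chapter and fold it into short/long-term memory."""
    summary = call_llm(COMPRESS_PROMPT.format(chapter=chapter_text))
    recap, _, canon = summary.partition("CANON:")
    memory["short_term"] = recap.replace("RECAP:", "").strip()  # overwritten each chapter
    memory["long_term"] = (memory["long_term"] + "\n" + canon.strip()).strip()  # accumulates
    return memory
```

Short-term memory is replaced every chapter while long-term canon only grows, which keeps the injected context small and stable.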


Why Gemini 3

Gemini 3 is not just “a good writer model” for us — it is well-suited for a workflow that repeatedly switches modes:

  • Strong planning and constraint-following. Agents like the Architect and Weaver require structured reasoning, not just fluent prose.

  • More reliable structured outputs. Our pipeline depends on JSON/JSONL artifacts, so format reliability directly affects system stability. Public summaries also highlight Gemini 3 Pro's large context window and high output capacity, which support long-form pipelines.
    Gemini 3 Pro (Preview) - LLM Stats

  • Strong summarization for memory compression. Compressing memory requires precise canon extraction plus a short recap, and Gemini 3 performs well at controlled summarization.

In short:

Gemini 3 makes multi-agent orchestration practical because it handles planning, structure, drafting, and compression reliably in one system.


What We Learned

  • Multi-agent systems are software architecture, not “more prompts”.
  • Memory is the real bottleneck for long-form LLM products.
  • Compression + injection is a strong baseline before adding retrieval/embeddings.

Built With

  • google-ai-studio