Inspiration

We all know that feeling: you read a book, fall in love with the characters, and then the movie adaptation comes out and you're like… "Wait. That’s NOT who I saw in my head." Directors, writers, and artists spend a ton of time digging through paragraphs to figure out how a character is actually described, and it’s still super easy to miss important details buried across chapters. We wanted to build a tool that bridges that gap, something that makes it easier to see what's on the page and helps creative teams stay true to the story.

What it does

Our tool takes in a book and:

  • Finds all the descriptions of each character scattered throughout the story
  • Merges those into one consistent profile per character
  • Generates visuals that actually match the text
  • Lets you explore character relationships through a network map

So in short: it turns words into a shared visual reference, something everyone on a movie or creative team can point to and agree on.
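
The relationship map can start from something as simple as chapter-level co-occurrence counts. Here's a minimal sketch of that idea (the function and sample sentences are illustrative, not our actual pipeline code):

```python
from collections import Counter
from itertools import combinations

def build_character_network(chapters, characters):
    """Weight an edge between two characters by the number of
    chapters in which both of their names appear."""
    edges = Counter()
    for text in chapters:
        present = sorted(c for c in characters if c in text)
        for pair in combinations(present, 2):
            edges[pair] += 1
    return edges

network = build_character_network(
    ["Alice met Bob by the river.",
     "Bob argued with Carol at the inn.",
     "Alice and Carol made their peace."],
    ["Alice", "Bob", "Carol"],
)
```

The edge weights feed straight into the network map on the frontend: heavier edges mean characters who share more of the story.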

How we built it

We used a combination of:

  • RAG (Retrieval-Augmented Generation) to pull every mention of a character from the text
  • Gemini 2.0 to synthesize those into one clear description
  • Imagen 3 to turn that description into consistent character visuals
  • FAISS to store and search text embeddings efficiently
  • React + Tailwind on the frontend, because we like clean builds and fast iteration
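
The retrieval step works roughly like this. The snippet below is a self-contained sketch: the bag-of-words `embed` and the sample passages are stand-ins we made up, and the brute-force matrix product is what a FAISS index replaces at real book scale.

```python
import re
import numpy as np

VOCAB = {}  # word -> dimension, assigned on first sight

def embed(text, dim=128):
    """Toy bag-of-words embedding; in the real pipeline these vectors
    come from an embedding model and live in a FAISS index."""
    v = np.zeros(dim, dtype="float32")
    for word in re.findall(r"[a-z]+", text.lower()):
        v[VOCAB.setdefault(word, len(VOCAB))] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

passages = [
    "Elena had sharp green eyes and a scar above her brow.",
    "The harbor smelled of salt and tar.",
    "Elena's red coat was patched at both elbows.",
]
matrix = np.stack([embed(p) for p in passages])

def retrieve(query, k=2):
    # Cosine similarity via inner product on unit vectors; FAISS's
    # IndexFlatIP performs the same search, just much faster.
    scores = matrix @ embed(query)
    return [passages[i] for i in np.argsort(-scores)[:k]]

hits = retrieve("What does Elena look like?")
```

The retrieved passages then go to the LLM, which merges them into one profile per character.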

We also made sure every character gets a deterministic "seed" so they look the same every time we generate them. This was huge for consistency.
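
One way to derive such a seed is to hash the book and character name, so the same character always maps to the same number. A minimal sketch (the exact seed parameter an image API accepts is an assumption; the hashing idea is the point):

```python
import hashlib

def character_seed(book_id: str, name: str) -> int:
    """Derive a stable 32-bit seed from the book and character name,
    so repeated generations of the same character reuse one seed."""
    digest = hashlib.sha256(f"{book_id}:{name}".encode()).hexdigest()
    return int(digest[:8], 16)

seed = character_seed("moby-dick", "Ahab")
```

Because the seed depends only on the inputs, regenerating a character days later still produces the same value, which is what keeps the visuals stable.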

Challenges we ran into

  • Getting consistent images was the hardest part. Most image models want to "be creative" every time, which is the opposite of what we needed.
  • Books are long. We had to make sure we could process them fast enough to demo without waiting forever.
  • Designing a UI that feels simple, even though the backend is doing a lot, took some iteration.
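
For the "books are long" problem, we split the text into overlapping chunks before embedding, so descriptions aren't cut in half at a boundary. A minimal sketch (the sizes are illustrative, not tuned values):

```python
def chunk_text(text, size=1000, overlap=200):
    """Split a long book into overlapping character-count chunks;
    each chunk is embedded and indexed independently."""
    chunks = []
    step = size - overlap
    for start in range(0, max(len(text) - overlap, 1), step):
        chunks.append(text[start:start + size])
    return chunks

chunks = chunk_text("x" * 2500)
```

Chunks are independent, so they can be embedded in parallel, which is what made demo-time processing feasible.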

Accomplishments that we're proud of

  • The characters actually look the way the book describes them.
  • The whole pipeline works end-to-end in one flow. No manual cleanup. No guessing. Just: upload -> explore -> visualize.

What we learned

  • RAG is way more powerful than just answering questions: it can shape the output of generative models.
  • Consistency > fanciness. If the character doesn't look the same twice, the whole illusion breaks.

What's next for StoryMind

  • Multi-character scene generation (same characters, same art style, same frame)
  • Style support
  • Support for screenplay imports
  • A "Director's Mode" with storyboards and palette references
  • Collaboration features for writing rooms & film teams

We want this to eventually become a tool that helps storytellers see the story together, before the camera ever rolls.
