Inspiration

We all know that feeling: you read a book, fall in love with the characters, and then the movie adaptation comes out and you're like… "Wait. That’s NOT who I saw in my head." Directors, writers, and artists spend a ton of time digging through paragraphs to figure out how a character is actually described, and it’s still super easy to miss important details buried across chapters. We wanted to build a tool that bridges that gap, something that makes it easier to see what's on the page and helps creative teams stay true to the story.

What it does

Our tool takes in a book and:

  • Finds all the descriptions of each character scattered throughout the story
  • Merges those into one consistent profile per character
  • Generates visuals that actually match the text
  • Lets you explore character relationships through a network map

So in short: it turns words into a shared visual reference, something everyone on a movie or creative team can point to and agree on.
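
The relationship map can start from something as simple as chapter-level co-occurrence counts. Here's a minimal sketch of that idea (the function and sample sentences are illustrative, not our actual pipeline code):

```python
from collections import Counter
from itertools import combinations

def build_character_network(chapters, characters):
    """Weight an edge between two characters by the number of
    chapters in which both of their names appear."""
    edges = Counter()
    for text in chapters:
        present = sorted(c for c in characters if c in text)
        for pair in combinations(present, 2):
            edges[pair] += 1
    return edges

network = build_character_network(
    ["Alice met Bob by the river.",
     "Bob argued with Carol at the inn.",
     "Alice and Carol made their peace."],
    ["Alice", "Bob", "Carol"],
)
```

The edge weights feed straight into the network map on the frontend: heavier edges mean characters who share more of the story.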

How we built it

We used a combination of:

  • RAG (Retrieval-Augmented Generation) to pull every mention of a character from the text
  • Gemini 2.0 to synthesize those into one clear description
  • Imagen 3 to turn that description into consistent character visuals
  • FAISS to store and search text embeddings efficiently
  • React + Tailwind on the frontend, because we like clean builds and fast iteration
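
The retrieval step works roughly like this. The snippet below is a self-contained sketch: the bag-of-words `embed` and the sample passages are stand-ins we made up, and the brute-force matrix product is what a FAISS index replaces at real book scale.

```python
import re
import numpy as np

VOCAB = {}  # word -> dimension, assigned on first sight

def embed(text, dim=128):
    """Toy bag-of-words embedding; in the real pipeline these vectors
    come from an embedding model and live in a FAISS index."""
    v = np.zeros(dim, dtype="float32")
    for word in re.findall(r"[a-z]+", text.lower()):
        v[VOCAB.setdefault(word, len(VOCAB))] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

passages = [
    "Elena had sharp green eyes and a scar above her brow.",
    "The harbor smelled of salt and tar.",
    "Elena's red coat was patched at both elbows.",
]
matrix = np.stack([embed(p) for p in passages])

def retrieve(query, k=2):
    # Cosine similarity via inner product on unit vectors; FAISS's
    # IndexFlatIP performs the same search, just much faster.
    scores = matrix @ embed(query)
    return [passages[i] for i in np.argsort(-scores)[:k]]

hits = retrieve("What does Elena look like?")
```

The retrieved passages then go to the LLM, which merges them into one profile per character.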

We also made sure every character gets a deterministic "seed" so they look the same every time we generate them. This was huge for consistency.
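
One way to derive such a seed is to hash the book and character name, so the same character always maps to the same number. A minimal sketch (the exact seed parameter an image API accepts is an assumption; the hashing idea is the point):

```python
import hashlib

def character_seed(book_id: str, name: str) -> int:
    """Derive a stable 32-bit seed from the book and character name,
    so repeated generations of the same character reuse one seed."""
    digest = hashlib.sha256(f"{book_id}:{name}".encode()).hexdigest()
    return int(digest[:8], 16)

seed = character_seed("moby-dick", "Ahab")
```

Because the seed depends only on the inputs, regenerating a character days later still produces the same value, which is what keeps the visuals stable.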

Challenges we ran into

  • Getting consistent images was the hardest part. Most image models want to "be creative" every time, which is the opposite of what we needed.
  • Books are long. We had to make sure we could process them fast enough to demo without waiting forever.
  • Designing a UI that feels simple, even though the backend is doing a lot, took some iteration.
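
For the "books are long" problem, we split the text into overlapping chunks before embedding, so descriptions aren't cut in half at a boundary. A minimal sketch (the sizes are illustrative, not tuned values):

```python
def chunk_text(text, size=1000, overlap=200):
    """Split a long book into overlapping character-count chunks;
    each chunk is embedded and indexed independently."""
    chunks = []
    step = size - overlap
    for start in range(0, max(len(text) - overlap, 1), step):
        chunks.append(text[start:start + size])
    return chunks

chunks = chunk_text("x" * 2500)
```

Chunks are independent, so they can be embedded in parallel, which is what made demo-time processing feasible.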

Accomplishments that we're proud of

  • The characters actually look the way the book describes them.
  • The whole pipeline works end-to-end in one flow. No manual cleanup. No guessing. Just: upload -> explore -> visualize.

What we learned

  • RAG is way more powerful than just answering questions: it can shape the output of generative models.
  • Consistency > fanciness. If the character doesn't look the same twice, the whole illusion breaks.

What's next for StoryMind

  • Multi-character scene generation (same characters, same art style, same frame)
  • Style support
  • Support for screenplay imports
  • A "Director's Mode" with storyboards and palette references
  • Collaboration features for writing rooms & film teams

We want this to eventually become a tool that helps storytellers see the story together, before the camera ever rolls.
