Video Demo

https://drive.google.com/file/d/1mTrwExPDjm965SbG3GxSt6PNgLzaYWNG/view?usp=sharing

Inspiration

I’ve always loved reading, and growing up I was obsessed with Choose Your Own Adventure books. I would reread them over and over just to explore different paths. They made reading feel active — not passive. Every decision forced me to think about consequences, character motivations, and alternative outcomes.

Looking back, those books weren’t just fun — they strengthened comprehension, cause-and-effect reasoning, and critical thinking. I wanted to recreate that experience for any book — not just the ones written in branching format.

What If? is designed to transform reading from passive consumption into active exploration. By allowing students to generate and visualize counterfactual story branches, What If? encourages:

  1. Deeper comprehension of plot and character motivations

  2. Critical thinking about cause and effect

  3. Engagement through creative exploration

  4. Stronger memory retention through interaction

What it does

What If? turns any uploaded PDF into an interactive counterfactual storytelling experience.

Users can:

  1. Upload a book (PDF)

  2. Read in an e-reader interface

  3. Highlight any passage

  4. Ask: “What if…?” and input an alternate decision

I then generate an interactive video that visualizes that counterfactual scene.

But it doesn’t stop there.

Users can:

  1. Pause the generated video

  2. Double-click on a character or object

  3. Recursively generate a new branch exploring that element

This transforms static books into explorable narrative trees.
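
Conceptually, each counterfactual becomes a node in a tree rooted at the original text. A minimal sketch of that shape in TypeScript (the names are illustrative, not the app's actual data model):

```typescript
// One node in the narrative tree; illustrative only.
interface BranchNode {
  id: string;
  sourcePassage: string;  // highlighted text (or clicked element) this branch grew from
  counterfactual: string; // the user's "What if...?" prompt
  videoUrl: string;       // the generated video for this branch
  children: BranchNode[]; // branches spawned by double-clicks inside this video
}
```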

How we built it

I ingest the text preceding the highlighted passage (preventing temporal leakage from later in the book) along with the user's counterfactual prompt, and feed both into Claude. Claude identifies the characters and locations most relevant to the counterfactual scene and generates a detailed description of each. I feed those descriptions into a Google image generation model to produce front/back/side-view images of each character and location, which are cached in Supabase. I also generate a script for the counterfactual scene, then feed the script and the relevant character/location assets into a Runway model to generate the counterfactual video.
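
As a rough sketch of that first Claude step, assuming the official TypeScript SDK (the model name and prompt wording here are placeholders, not the exact ones used):

```typescript
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// Only the text *preceding* the highlight is included, so no future plot
// details can leak into the generated scene.
async function analyzeScene(precedingText: string, whatIf: string) {
  const response = await anthropic.messages.create({
    model: "claude-3-5-sonnet-latest",
    max_tokens: 2048,
    messages: [
      {
        role: "user",
        content:
          `Story so far:\n${precedingText}\n\n` +
          `Reader's counterfactual: ${whatIf}\n\n` +
          `Identify the characters and locations most relevant to this ` +
          `scene and give a detailed visual description of each, as JSON.`,
      },
    ],
  });
  const block = response.content[0];
  return block.type === "text" ? block.text : "";
}
```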

When the user pauses the video and clicks on a particular spot in the frame, I identify which element of the frame they clicked on and recursively generate additional character/location assets and scripts as necessary. This lets the user interact meaningfully with the counterfactual world.
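
A sketch of that recursive step, reusing the BranchNode shape from the earlier sketch; identifyElement, generateAssets, and generateVideo are hypothetical stand-ins for the real pipeline stages:

```typescript
// Hypothetical pipeline stages (not real APIs):
declare function identifyElement(
  videoUrl: string, x: number, y: number,
): Promise<{ name: string; description: string }>;
declare function generateAssets(
  element: { name: string; description: string },
): Promise<string[]>; // returns cached asset URLs when the element is known
declare function generateVideo(
  element: { name: string; description: string }, assetUrls: string[],
): Promise<string>;

// Double-click on a paused frame: identify the element under the cursor,
// reuse or generate its assets, render a new video, and attach the branch.
async function branchFromClick(
  node: BranchNode, x: number, y: number,
): Promise<BranchNode> {
  const element = await identifyElement(node.videoUrl, x, y);
  const assetUrls = await generateAssets(element);
  const child: BranchNode = {
    id: crypto.randomUUID(),
    sourcePassage: element.description,
    counterfactual: `What if we follow ${element.name}?`,
    videoUrl: await generateVideo(element, assetUrls),
    children: [],
  };
  node.children.push(child);
  return child;
}
```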

Challenges we ran into

  • Goading cheaper models into consistently outputting images at the right angles, with the right level of detail, and without copying existing online adaptations was unexpectedly time-consuming.
  • I initially experimented with Figma Make to prototype UIs, but exporting the generated code into my own environment, where I intended to flesh out the backend and adapt the technical stack, proved much harder than anticipated.

Accomplishments that we're proud of

  • Successfully delivered an end-to-end counterfactual visualization pipeline that can be activated at any point in the story.
  • Character consistency across the book: a character generated at the start of the book remains consistent even when called upon at the end, unless significant new description has caused the canonical understanding of the character to diverge from the reader's initial impression (a cache-lookup sketch follows this list).
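
That consistency comes from caching reference assets rather than regenerating them. A minimal sketch of the lookup, assuming a Supabase table keyed by book and character name (the table and column names are my illustration, not the actual schema):

```typescript
import { createClient } from "@supabase/supabase-js";

const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_ANON_KEY!,
);

// Reference images are keyed by (book, character), so a character generated
// in chapter 1 is reused verbatim when they reappear in chapter 30.
async function getCharacterAssets(bookId: string, name: string) {
  const { data, error } = await supabase
    .from("character_assets") // assumed table name
    .select("front_url, back_url, side_url, description")
    .eq("book_id", bookId)
    .eq("name", name)
    .maybeSingle();
  if (error) throw error;
  return data; // null means no cache entry yet: generate fresh images
}
```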

What we learned

  1. The GPT API's image generation capabilities are significantly weaker than the ChatGPT interface's. I spent a LOT of time testing various image and video generation models, and it was interesting to learn the strengths and weaknesses of the different services.
  2. In the world of AI, building intentional, thoughtful, design- and UX-centered products is the moat against slop.
  3. Robust processing of non-deterministically formatted AI outputs still seems harder than it should be (see the parsing sketch below).
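
To illustrate point 3: model replies that are supposed to be JSON often arrive wrapped in prose or markdown fences, so every consumer needs a defensive extraction step. A sketch of the kind of parsing this requires (not the project's actual parser):

```typescript
// Pull the first {...} span out of a model reply, tolerating markdown
// fences and surrounding prose, then parse it; null signals "retry".
function parseModelJson<T>(raw: string): T | null {
  const fenced = raw.match(/```(?:json)?\s*([\s\S]*?)```/);
  const candidate = fenced ? fenced[1] : raw;
  const braces = candidate.match(/\{[\s\S]*\}/);
  if (!braces) return null;
  try {
    return JSON.parse(braces[0]) as T;
  } catch {
    return null; // caller can retry the generation with a stricter prompt
  }
}
```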

What's next for What If?

I want to make the user experience a lot more seamless, finish incorporating audio into our generated worlds, reduce costs associated with generation, improve context management, and much more. Eventually, I'd also like to add a social aspect by enabling users to share their generated worlds and counterfactuals with each other.

Built With

Claude, Google image generation, Runway, Supabase