Inspiration
For the longest time, I thought I was alone in my ADHD struggles. I struggled with simple chores that others seemed to do effortlessly. I would neglect tasks and prefer to lie in bed, daydreaming about vivid, cinematic fantasies of becoming a wizard, an ancient warrior, or a secret agent, while my real-world room stayed messy and my tasks remained untouched.
It wasn't until I saw others on social media sharing their experiences that I could put names to these phenomena: "Executive Dysfunction" and "Maladaptive Daydreaming".
Executive dysfunction means we know what we have to do but can't summon the motivation to start, because the task isn't exciting or fun, until it becomes urgent.
Maladaptive Daydreaming means we spend hours doing nothing but conjuring up worlds and fantasies in our minds, complete with plots, dialogue, relationships, and strangely cinematic details, and playing them like a "mental movie". This is where we escape to, because real life feels plain dull compared to the emotional and epic depths of our daydreams, which we use to cope with mediocrity. Saving the world from an army of orcs and goblins is definitely more palatable than filling in an Excel spreadsheet. By the time we realize it, we struggle to come back to reality and lose touch with our real commitments and obligations.
This creates a real, heavy burden, because not getting errands, chores, and work done has consequences. Rooms stay cluttered because tidying is too boring. Signing up for a membership to get discounts gets put off because filling in a form is too boring. These lapses add up to real costs, including the so-called "ADHD Tax": money lost to late fees, missed deadlines, and unclaimed discounts.
Even ADHD management apps don't really help. Most function like productivity checklists and planners, end up looking like dull calendars, and seem designed for neurotypical users. Even gamification systems, such as earning points for completed tasks, offer only cheap, low-level thrills that don't really drive motivation. For example, when I set a timer in one ADHD app to work on a task, a virtual tree grows during that time; if I leave the app or get distracted, the tree dies. But that only feels like an accomplishment if I care about tree planting. I'm not interested in tree planting, so I wouldn't care if I lost. But if you can make me imagine that mopping the floor, both hands gripping the mop, is actually casting spells with a wizard's staff, now we're talking, because you need to cater to my fantasies.
However, thanks to countless posts about this topic on social media, I realized I wasn't "lazy" or "broken". My brain, like many others, simply runs on different currencies: novelty, excitement, and idealism.
So, what if we didn't have to fight our brains? What if we turned our "maladaptive" daydreams into our greatest ally, using fantasy to fuel our motivation to overcome Executive Dysfunction?
Dreamie was born from this simple question: Why escape into an imaginary second life when your real life can be made legendary?
What it does
Dreamie introduces the concept of Productive Escapism. Most games, even augmented reality ones, offer "Negative Escapism": you enjoy a story, but your real-life tasks keep piling up as "task debt."
Dreamie flips the script. It doesn't just feed your imagination, it reframes your reality:
- Reality as Fantasy: We don't just put you in a story; we map your physical world into a "Fantasy Twin." Your messy room becomes a "Cursed Armory," and your math homework becomes "Ancient Arcane Runes."
- The Main Character: You aren't playing a generic hero. Using multimodal AI, Dreamie makes the protagonist a high-fidelity version of you.
- Live the Legend: Instead of living a double life, Dreamie lets you live one life, your real one, reframed as a fantasy. It turns the "boring" into the "epic" so that by the time the story ends, your chores are actually done.
The Value: Killing the ADHD Tax
By replacing the "boredom gap" with a steady stream of AI-generated novelty, Dreamie provides the stimulation the ADHD brain craves. It aims to turn executive dysfunction into a productive flow state, so that "escaping into a story" finally results in "getting things done."
How I built it
Ideation Phase
Idea Refinement:
I gave the Gemini app (Thinking Mode) a context dump of the problems of Executive Dysfunction and Maladaptive Daydreaming, along with my proposed idea, then collaborated with it to refine the concept.
I then asked Gemini to create a Product Requirements Document (PRD) detailing the System Architecture, Tech Stack, Agentic Pipelines, System Prompts, Front End UI design, and Phased Implementation Plan, to use later when vibe coding in Google AI Studio.
Hypothesis validation with Grounded Evidence:
I used Gemini Deep Research to gather evidence for testing my hypotheses about how specific features of Dreamie could help with the problem. Below are the prompts I used, each followed by what it was meant to validate.
"Explain the neurological mechanism of 'Reward Deficiency Syndrome' in ADHD brains and how high-novelty, immediate feedback loops (like real-time narrative progression) specifically lower the 'activation energy' required for task initiation compared to traditional gamification (e.g., points or badges)." --> Validates why "Cinematic Rewards" can work where an unstimulating productivity app fails, supporting the idea that the ADHD brain needs a high-stimulus reward to overcome executive dysfunction.
"Research the concept of 'Cognitive Reframing' and 'Narrative Therapy' for neurodivergent individuals. How does mapping a mundane, anxiety-inducing environment (like a cluttered room) into a positive, high-stakes fantasy narrative affect the amygdala’s response to task paralysis?" --> Supports the claim that the "Scouting Phase" (turning the room into a dragon’s lair) acts as a form of cognitive therapy that reduces the stress and fear associated with overwhelming chores.
"Investigate the overlap between Maladaptive Daydreaming and ADHD. Can 'Immersive Storytelling' be used as a 'harm reduction' strategy to redirect compulsive daydreaming back into reality-based productive tasks? Find case studies or theories on 'Integrated Daydreaming' as a motivation tool." --> Validates the claim that we shouldn't "fight" the daydreaming brain but "leverage" it as fuel.
"How does the psychological 'Zeigarnik Effect' (the tension caused by unfinished tasks) interact with the dopamine hit of 'Narrative Closure' in video games? Does a cinematic climax (like a Final Boss battle) help cement a sense of accomplishment in people with executive dysfunction?" --> Validates the "Final Challenge" QTE phase, suggesting that the interactive battle provides the "closure" the ADHD brain needs to feel the work was actually worth it.
Building Phase
I used Google AI Studio to vibe code the app according to the Product Requirements Document generated previously by the Gemini app.
Dreamie uses Gemini models and Google Cloud to create a seamless, interleaved flow between reality and fiction.
Here is how the flow goes:
1. The Armory (Character Building)
Users engage in a conversational interview with a Gemini Live Agent to define their "Lore Profile." Using Gemini 3.1 Flash Image, the app generates a high-fidelity protagonist image based on the user's own photo, placing them directly into the center of their chosen genre (Cyberpunk, High Fantasy, K-Pop Stardom, etc.).
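To make the Armory step more concrete, here is a minimal sketch of how the interview's result might be stored and turned into an image request. The `LoreProfile` shape, `buildAvatarPrompt`, and `buildAvatarRequestParts` are hypothetical names, not Dreamie's actual code; the only assumption about the Gemini API is that a request can carry the selfie as an inline image part (`inlineData` with a `mimeType` and base64 `data`) next to a text part.

```typescript
// Hypothetical shape of the "Lore Profile" produced by the interview.
// The write-up lists these genres among others.
type Genre = "Cyberpunk" | "High Fantasy" | "K-Pop Stardom";

interface LoreProfile {
  heroName: string;
  genre: Genre;
  archetype: string; // e.g. "wizard", "ancient warrior", "secret agent"
  traits: string[];  // personality cues picked up during the conversation
}

// Minimal local type mirroring a text part or an inline-image part.
interface Part {
  text?: string;
  inlineData?: { mimeType: string; data: string };
}

function buildAvatarPrompt(profile: LoreProfile): string {
  const traits = profile.traits.length > 0 ? profile.traits.join(", ") : "determined";
  return [
    `Portray the person in the reference photo as ${profile.heroName},`,
    `a ${profile.archetype} at the center of a ${profile.genre} world.`,
    "Keep their facial features recognizable.",
    `Personality cues: ${traits}.`,
  ].join(" ");
}

// The selfie goes in as an identity reference alongside the text prompt.
function buildAvatarRequestParts(profile: LoreProfile, selfieBase64: string): Part[] {
  return [
    { inlineData: { mimeType: "image/jpeg", data: selfieBase64 } },
    { text: buildAvatarPrompt(profile) },
  ];
}
```

Keeping the profile structured, rather than forwarding the raw interview transcript, lets later agents reuse the same fields.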
2. The Scout (Environment Mapping)
When it’s time to work, the user points their camera at their surroundings. A Scouting Agent uses Gemini Live’s vision capabilities to "see" the clutter or the homework and translate it into the chosen lore in real time.
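Later in this write-up, the Scouting Agent's findings are described as being handed to a Mapping Agent. If that hand-off were requested as JSON (an assumption; the format isn't specified), a small validator could keep malformed replies from reaching the story planner. `LoreMapping` and its `difficulty` field are hypothetical:

```typescript
interface LoreMapping {
  realObject: string; // what the camera saw, e.g. "pile of laundry"
  loreName: string;   // its fantasy twin, e.g. "Cursed Garments"
  difficulty: number; // hypothetical 1-5 rating used to size the quest
}

function isLoreMapping(value: unknown): value is LoreMapping {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const d = v.difficulty;
  return (
    typeof v.realObject === "string" &&
    typeof v.loreName === "string" &&
    typeof d === "number" &&
    d >= 1 &&
    d <= 5
  );
}

// Parses the mapping step's JSON reply, silently dropping malformed entries.
function parseLoreMappings(raw: string): LoreMapping[] {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return []; // not JSON at all: treat as "nothing mapped" so the caller can retry
  }
  return Array.isArray(data) ? data.filter(isLoreMapping) : [];
}
```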
3. The Forge (Narrative Planning)
Behind the scenes, a Planner Agent powered by Gemini 3.1 Pro splits the goal into "milestones." It instructs Gemini 3.1 Flash Image, in an iterative loop, to pre-generate a series of cinematic story assets (interleaved image and text) that maintain visual and narrative continuity. These rewards stay hidden from the user until they are "earned."
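The continuity requirement is why the Forge loop is iterative rather than parallel: each beat is prompted with the story so far. A minimal sketch, assuming a hypothetical `generateBeat` callback that wraps the actual image-model call and returns one text/image pair (the request format and response parsing are left out):

```typescript
interface Milestone {
  id: number;
  realTask: string; // checked by the Supervision Agent, e.g. "clear the desk"
  loreTask: string; // what the story calls it, e.g. "reclaim the Cursed Armory"
}

interface StoryBeat {
  milestoneId: number;
  text: string;
  imageBase64: string;
  unlocked: boolean; // stays false until the milestone is verified
}

// Hypothetical wrapper around one image-model call returning a text + image pair.
type GenerateBeat = (prompt: string) => Promise<{ text: string; imageBase64: string }>;

async function forgeStory(milestones: Milestone[], generateBeat: GenerateBeat): Promise<StoryBeat[]> {
  const beats: StoryBeat[] = [];
  for (const m of milestones) {
    // Sequential on purpose: each prompt carries the story so far for continuity.
    const storySoFar = beats.map((b) => b.text).join("\n") || "(this is the opening scene)";
    const prompt =
      `Story so far:\n${storySoFar}\n\n` +
      `Write and illustrate the next scene, the reward for completing "${m.loreTask}". ` +
      "Keep the same characters, setting, and art style.";
    const { text, imageBase64 } = await generateBeat(prompt);
    beats.push({ milestoneId: m.id, text, imageBase64, unlocked: false });
  }
  return beats;
}
```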
4. The Guardian (Live Supervision)
The Supervision Agent monitors the user through the camera using the Gemini Live model, providing real-time "body doubling" and in-character encouragement. Once a milestone is completed, the agent triggers a function call to "unlock" the next cinematic story beat. After the story scene plays out, the user returns to camera supervision mode and must complete the tasks for the next milestone before the next story scene unlocks. Think of how cutscenes are interspersed throughout video games.
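The unlock mechanism can be thought of as a small state machine driven by the agent's function calls. This sketch assumes a hypothetical tool named `unlock_next_scene` and models the call and response as plain objects shaped like Gemini function-call messages (`id`, `name`, `args` and `response`); it is an illustration, not Dreamie's implementation.

```typescript
type Phase = "supervising" | "cutscene" | "finale";

interface FunctionCall { id?: string; name?: string; args?: Record<string, unknown> }
interface FunctionResponse { id?: string; name?: string; response: Record<string, unknown> }

class QuestSession {
  phase: Phase = "supervising";
  private unlocked = 0;
  private readonly beatCount: number;

  constructor(beatCount: number) {
    this.beatCount = beatCount;
  }

  // Called when the Supervision Agent emits a tool call.
  handleToolCall(call: FunctionCall): FunctionResponse {
    if (call.name !== "unlock_next_scene" || this.phase !== "supervising") {
      return { id: call.id, name: call.name, response: { ok: false } };
    }
    this.unlocked += 1;
    this.phase = "cutscene"; // the UI now plays the newly unlocked story beat
    return { id: call.id, name: call.name, response: { ok: true, unlockedBeat: this.unlocked } };
  }

  // Called by the UI when the cutscene finishes playing.
  onCutsceneEnded(): void {
    this.phase = this.unlocked >= this.beatCount ? "finale" : "supervising";
  }
}
```

Rejecting calls outside the `supervising` phase keeps a stray or repeated tool call from skipping a scene while one is already playing.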
5. The Final Boss (Interactive Climax)
The quest ends with an epic showdown. We use Veo 3.1 Fast Generate to create custom video cinematics of the user clashing with the "Final Tribulation" of their chore. Users must complete a Quick Time Event (QTE), such as drawing arcane patterns, to determine whether they win or lose the narrative battle. If they win, a video of the protagonist standing victorious plays; if they lose, a video of the protagonist's defeat plays.
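One way to decide the QTE outcome is to check that the user's traced stroke passes near each checkpoint of the target rune, in order, before time runs out. The scoring rule, the `radius` tolerance, and the `FinaleVideos` record below are assumptions for illustration; the sketch also assumes both the win and lose clips are already available when the QTE ends, so the result can play without a generation wait.

```typescript
interface Point { x: number; y: number }

// Hypothetical rule: the rune counts as drawn if the trace comes within `radius`
// of every checkpoint, in order, and the attempt finished within the time limit.
function scoreRune(
  trace: Point[],
  checkpoints: Point[],
  radius: number,
  elapsedMs: number,
  limitMs: number,
): boolean {
  if (elapsedMs > limitMs) return false;
  let next = 0; // index of the next checkpoint the stroke must reach
  for (const p of trace) {
    if (next >= checkpoints.length) break;
    const c = checkpoints[next];
    if (Math.hypot(p.x - c.x, p.y - c.y) <= radius) next += 1;
  }
  return next === checkpoints.length;
}

interface FinaleVideos { win: string; lose: string } // URIs of the two Veo clips

function pickFinale(videos: FinaleVideos, won: boolean): string {
  return won ? videos.win : videos.lose;
}
```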
Challenges I ran into
1) Controlling Text in Interleaved Output
- It was difficult to get the Gemini 3.1 Flash Image model's interleaved output to contain only the story text and leave out meta-commentary describing the picture.
- For example, the text output would sometimes read "You entered the magical vault and cast your spell.", followed by meta-commentary such as "The image is a high resolution 8K fantasy image of sorcerer in stylized realistic style, in portrait orientation..."
- Although this happens less often now, it still occurs in about 1 out of 10 image generations.
- I'm still refining the approach to make the model output strictly the story text.
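Besides tightening the prompt, a client-side filter can act as a safety net for the remaining cases. This is a heuristic sketch, not necessarily Dreamie's approach: the `META_PATTERNS` list is invented for illustration and will miss some phrasings, and splitting on `.`, `!`, and `?` is only a rough sentence boundary.

```typescript
interface Part {
  text?: string;
  inlineData?: { mimeType: string; data: string };
}

// Illustrative patterns for image-description "meta-commentary" (not exhaustive).
const META_PATTERNS: RegExp[] = [
  /\b(the|this) (image|picture|illustration) (is|shows|depicts|features)\b/i,
  /\b(8K|4K|high[- ]resolution|portrait orientation|landscape orientation|aspect ratio)\b/i,
  /\b(stylized|photorealistic) (realistic )?style\b/i,
];

function isMetaSentence(sentence: string): boolean {
  return META_PATTERNS.some((re) => re.test(sentence));
}

// Leaves image parts untouched and removes meta sentences from text parts.
function stripMetaCommentary(parts: Part[]): Part[] {
  const out: Part[] = [];
  for (const part of parts) {
    if (part.text === undefined) {
      out.push(part);
      continue;
    }
    const sentences = part.text.match(/[^.!?]+[.!?]*/g) ?? [];
    const kept = sentences.filter((s) => !isMetaSentence(s)).join("").trim();
    if (kept.length > 0) out.push({ text: kept }); // drop parts that were pure commentary
  }
  return out;
}
```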
2) "Supervision Agent" Ambiguity
- Using the Live API (gemini-2.5-flash-native-audio) with a 1 FPS camera stream to verify real-world chores is incredibly difficult.
- The AI sometimes can't tell definitively whether a chore is "done". If the user says they are "cleaning their room", the standard is subjective, and the AI might not pass the user even when the task is done.
- If the AI is too strict, the user gets frustrated and loses their dopamine reward. If it's too lenient, the gamification loses its meaning. Tuning the prompt to understand context, look for visual deltas (before vs. after), and ask clarifying questions without being annoying is a massive balancing act.
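One way to frame the strict-versus-lenient trade-off is as three bands instead of a yes/no answer. This sketch assumes the Supervision Agent is asked to report a confidence score (itself only a rough signal from a language model); the thresholds and the `maxAsks` budget are hypothetical tuning knobs.

```typescript
type Verdict = "pass" | "ask" | "keepWatching";

interface Policy {
  passAt: number;  // confidence at or above this unlocks the next scene
  askAt: number;   // between askAt and passAt the agent asks a clarifying question
  maxAsks: number; // cap on questions per milestone, so the agent isn't annoying
}

// `confidence` is the agent's self-reported 0..1 belief that the milestone is met.
function decide(confidence: number, policy: Policy, asksSoFar: number): Verdict {
  if (confidence >= policy.passAt) return "pass";
  if (confidence >= policy.askAt) {
    // Once the question budget is spent, give the user the benefit of the doubt.
    return asksSoFar < policy.maxAsks ? "ask" : "pass";
  }
  return "keepWatching";
}
```

Capping clarifying questions addresses the "without being annoying" half of the problem; where the thresholds sit decides how strict the game feels.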
3) "Epic vs. Violent" Safety Line
- Most user fantasies are RPG-like experiences in which users fight "adversaries". However, AI safety filters are highly sensitive to violence. Prompting Gemini 3.1 Flash and Veo 3.1 to create an "epic finishing blow" without triggering safety blocks required inventing a very specific, sanitized "epic vocabulary" (e.g., using words like banish, purify, and burst of light instead of destroy, kill, and slash).
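The sanitized vocabulary can be applied as a deterministic rewrite before a prompt is sent. A minimal sketch, assuming a hand-written `EPIC_VOCABULARY` map built from the examples above; it only replaces exact whole words (so "killed" passes through unchanged), and vocabulary alone does not guarantee that a prompt clears the safety filters.

```typescript
// Hypothetical map from words that tend to trip safety filters to "epic" alternatives.
const EPIC_VOCABULARY: Record<string, string> = {
  kill: "banish",
  destroy: "purify",
  slash: "unleash a burst of light at",
};

// Replaces whole words case-insensitively, keeping a leading capital letter.
function toEpicVocabulary(prompt: string): string {
  return prompt.replace(/\b[A-Za-z]+\b/g, (word) => {
    const replacement = EPIC_VOCABULARY[word.toLowerCase()];
    if (replacement === undefined) return word;
    const capitalized = word[0] !== word[0].toLowerCase();
    return capitalized ? replacement[0].toUpperCase() + replacement.slice(1) : replacement;
  });
}
```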
4) Multi-Agent Latency & The ADHD User
- Our target audience struggles with executive dysfunction and needs immediate dopamine feedback. However, our architecture relies on a Multi-Agent Prompt Chain.
- During the "Scouting" phase, we have to wait for the Live Agent to finish -> send context to the Mapping Agent -> generate the Story -> generate the Images + Text -> generate the TTS.
- This chain can take 10-20+ seconds. For an ADHD user, a 20-second loading spinner is a deal breaker for engagement.
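Part of that wait can be hidden by overlapping the steps that don't depend on each other. A minimal sketch, with hypothetical `Steps` callbacks standing in for the real agents: mapping and story writing still run in sequence, but image generation and TTS for each beat run concurrently, and the first beat is returned as soon as it is ready while the remaining beats keep rendering.

```typescript
interface Beat { text: string }
interface Rendered { image: string; audio: string }

// Hypothetical stand-ins for the agents in the chain.
interface Steps {
  mapScene: (scoutTranscript: string) => Promise<string>;
  writeStory: (mappedScene: string) => Promise<Beat[]>;
  renderImage: (beat: Beat) => Promise<string>;
  narrate: (beat: Beat) => Promise<string>;
}

async function runPipeline(
  transcript: string,
  steps: Steps,
): Promise<{ first: Rendered; rest: Promise<Rendered[]> }> {
  // These two depend on each other, so they stay sequential.
  const scene = await steps.mapScene(transcript);
  const beats = await steps.writeStory(scene);
  if (beats.length === 0) throw new Error("story has no beats");

  // Image and TTS for one beat are independent, so run them concurrently.
  const render = (b: Beat): Promise<Rendered> =>
    Promise.all([steps.renderImage(b), steps.narrate(b)]).then(([image, audio]) => ({ image, audio }));

  const firstPromise = render(beats[0]);             // start the first beat first
  const rest = Promise.all(beats.slice(1).map(render)); // the rest render in the background
  rest.catch(() => undefined); // avoid an unhandled-rejection warning; awaiting `rest` still surfaces errors
  const first = await firstPromise;
  return { first, rest };
}
```

This shortens the time to the first scene rather than the total time, which is the delay a waiting user actually feels.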
5) Taming the Live API WebSocket State
- The gemini-2.5-flash-native-audio Live API is incredibly powerful, but managing raw WebSockets and PCM audio streams in a React SPA is complex.
- Handling user interruptions (barge-ins) is a challenge. If the Interview, Scouting or Supervision Agent is speaking and the user suddenly interrupts, we have to instantly flush the audio playback queue, send an interruption signal to the server, and seamlessly transition the UI state back to "listening." Dealing with network drops and keeping the React state perfectly synced with the WebSocket state required a very robust LiveSessionManager.
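The barge-in path can be sketched as a playback queue that schedules audio chunks back to back and can stop all of them at once. To keep it testable outside a browser, the Web Audio calls are hidden behind a hypothetical `Scheduler` callback (in a browser it would create and start an `AudioBufferSourceNode`). The sketch reacts to the `interrupted` flag that the Live API sets on `serverContent` when it detects the user speaking over the model; the `ServerMessage` type declares only that one field.

```typescript
// Minimal stand-in for an AudioBufferSourceNode: the queue only needs stop().
interface ScheduledSource {
  stop(): void;
}

// Hypothetical hook: in a browser this would build an AudioBuffer from the chunk,
// start a source node at `startAt`, and return it with the chunk's duration (s).
type Scheduler = (pcm: Int16Array, startAt: number) => { source: ScheduledSource; duration: number };

class PlaybackQueue {
  private sources: ScheduledSource[] = [];
  private nextStart = 0;
  private readonly scheduler: Scheduler;
  private readonly now: () => number;

  constructor(scheduler: Scheduler, now: () => number) {
    this.scheduler = scheduler;
    this.now = now;
  }

  // Schedules each chunk right after the previous one so playback is gapless.
  // (A real implementation would also drop sources from the list when they end.)
  enqueue(pcm: Int16Array): void {
    const startAt = Math.max(this.now(), this.nextStart);
    const { source, duration } = this.scheduler(pcm, startAt);
    this.sources.push(source);
    this.nextStart = startAt + duration;
  }

  // Barge-in: stop everything already scheduled and reset the timeline.
  flush(): void {
    for (const s of this.sources) s.stop();
    this.sources = [];
    this.nextStart = 0;
  }

  get pending(): number {
    return this.sources.length;
  }
}

// Only the field this sketch reads from a Live API server message.
interface ServerMessage {
  serverContent?: { interrupted?: boolean };
}

function handleServerMessage(msg: ServerMessage, queue: PlaybackQueue, toListening: () => void): void {
  if (msg.serverContent?.interrupted) {
    queue.flush();
    toListening(); // e.g. switch the UI state back to "listening"
  }
}
```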
Accomplishments that I'm proud of
1) Solving a Real Psychological Hurdle (Tech for Good)
- We took the clinical concept of executive dysfunction (common in ADHD) and the psychological coping mechanism of maladaptive daydreaming, and engineered a dopamine-delivery system around them.
- We aren't just telling users "here is your to-do list." We are hacking their brain's reward center by turning "folding laundry" into "purifying the cursed garments," complete with cinematic narration and video rewards.
2) Deep, Zero-Friction Personalization
- The app feels deeply personal without forcing the user to fill out long, boring forms.
- By combining a simple selfie upload with a conversational voice interview, we extract the user's "Lore" invisibly in the background.
- Using that selfie as a structural reference for gemini-3.1-flash-image-preview ensures the epic fantasy avatar actually looks like the user, instantly increasing their emotional investment in the app.
3) Dynamic, Branching Video Generation (Veo 3.1)
- Using Veo 3.1 to generate videos as conditional game assets is next-level for vibe coding, where games are usually generated in HTML or three.js with simple graphics.
- By hooking veo-3.1-fast-generate-preview into a Quick Time Event (QTE) for the "Finale", we aren't just playing a static cutscene. We are dynamically generating "Win" and "Lose" states based on the user's actual gameplay performance. This treats generative video like a real-time game engine, which is a highly innovative use case.
Learnings
1) Real-Time Vision Requires "Delta" Prompting, Not Absolute Prompting
- When building the Supervision Agent to watch the user do chores via the rear camera, I found that asking the AI, "Is the room clean?" or "Is the laundry done?" yields highly inconsistent results because "clean" is subjective.
- AI vision works best when evaluating state changes (deltas) rather than absolute states. We learned to prompt the Live Vision agent to look for specific physical actions or disappearances (e.g., "Watch for the pile of clothes on the bed to disappear into the basket" or "Confirm the user's hands are actively wiping the counter").
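Delta prompting can be packaged as a small prompt builder so every milestone asks about a visible change rather than an end state. The `DeltaCheck` shape, the wording, and the `unlock_next_scene` tool name are hypothetical:

```typescript
interface DeltaCheck {
  before: string; // what the camera should see at the start
  after: string;  // the visible change that counts as progress
}

// Builds a supervision instruction about observable changes, not a subjective end state.
function buildDeltaPrompt(milestone: string, checks: DeltaCheck[]): string {
  const lines = checks.map((c, i) => `${i + 1}. Before: ${c.before}. Watch for: ${c.after}.`);
  return [
    `Milestone: ${milestone}.`,
    "Do not judge whether the space is 'clean'. Only report the visible changes listed below.",
    ...lines,
    "When every change has been observed, call unlock_next_scene. If the view is unclear, ask the user to move the camera.",
  ].join("\n");
}
```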
2) Generative Video Can Act as a Dynamic Game Engine for Interactive Storytelling
- Interactive stories can be generated with many branching outcomes, each portrayed by a different dynamically generated video scene.
- The anticipation of not knowing what comes next, even after restarting the story, is a genuine paradigm shift.
3) WebSockets & Live Audio Require a New Frontend Paradigm
- Handling raw PCM audio streams, managing audio sample rates (16kHz microphone input, 24kHz model output), and dealing with user interruptions ("barge-ins") requires a highly robust, event-driven architecture. We had to learn how to flush audio playback queues the instant the user starts speaking to make the AI feel truly conversational and alive.
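The sample-rate and format conversions mentioned above can be sketched as three small functions. This assumes the microphone delivers Float32 samples (as the Web Audio API does) at a higher rate such as 48 kHz; the averaging downsampler is a crude anti-aliasing choice for illustration, and the write-up doesn't say how Dreamie resamples.

```typescript
// Downsamples by averaging each window of input samples (a rough low-pass plus
// decimation, not a high-fidelity resampler). Assumes inputRate >= outputRate.
function downsample(input: Float32Array, inputRate: number, outputRate: number): Float32Array {
  const ratio = inputRate / outputRate;
  const outLength = Math.floor(input.length / ratio);
  const out = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const start = Math.floor(i * ratio);
    const end = Math.min(Math.floor((i + 1) * ratio), input.length);
    let sum = 0;
    for (let j = start; j < end; j++) sum += input[j];
    out[i] = end > start ? sum / (end - start) : 0;
  }
  return out;
}

// Float32 in [-1, 1] -> signed 16-bit PCM, clamping out-of-range samples.
// Int16Array uses the platform byte order (little-endian on mainstream hardware).
function floatToPcm16(input: Float32Array): Int16Array {
  const out = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    const s = Math.max(-1, Math.min(1, input[i]));
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}

// Signed 16-bit PCM (e.g. the 24 kHz model output) -> Float32 for an AudioBuffer.
function pcm16ToFloat(input: Int16Array): Float32Array {
  const out = new Float32Array(input.length);
  for (let i = 0; i < input.length; i++) out[i] = input[i] / 0x8000;
  return out;
}
```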
What's next for Project Dreamie
1) Latency Optimization
- Generating the story image assets and the QTE video assets currently takes a long time. This needs to be optimized to deliver a good user experience.
2) Native Mobile App & Augmented Reality (AR)
- Instead of just generating 2D scene images, use ARKit/ARCore to literally overlay the "goblins" or "corruption" onto the user's messy desk through their phone camera.
3) Persistent Worlds & Long-Term Memory (RAG)
- Currently, quests are somewhat episodic. We want the user's fantasy world to feel alive, persistent, and reactive to their history.
- Integrate a Vector Database (like Pinecone or Firebase Vector Search) to give the Mapping Agent Long-Term Memory (RAG).
- The AI will remember past victories and failures. If the user puts off doing the dishes for a week, the AI will remember that the "Sink Swamp" has been festering and the enemies there have leveled up. Introduce an Inventory System and Skill Trees, where completing real-world chores drops virtual loot (swords, armor, potions) that can be used in future QTE challenges.
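A minimal in-memory sketch of both ideas: a linear cosine-similarity scan standing in for the vector database, and a rule that makes neglected locations harder over time. The `QuestMemory` fields, the embeddings, and the one-level-per-two-days rule are all hypothetical; producing real embeddings would need an embedding model, which is not shown.

```typescript
interface QuestMemory {
  location: string;         // lore name, e.g. "Sink Swamp"
  lastCompletedDay: number; // day index of the last real-world completion
  embedding: number[];      // vector used for semantic retrieval
  summary: string;          // e.g. "Cleared the grease slimes"
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Top-k most similar memories; a vector database would replace this linear scan.
function recall(memories: QuestMemory[], query: number[], k: number): QuestMemory[] {
  return [...memories]
    .sort((x, y) => cosine(y.embedding, query) - cosine(x.embedding, query))
    .slice(0, k);
}

// Hypothetical rule: enemies gain a level for every 2 days a chore is neglected, capped at 10.
function enemyLevel(memory: QuestMemory, today: number): number {
  const daysNeglected = Math.max(0, today - memory.lastCompletedDay);
  return Math.min(10, 1 + Math.floor(daysNeglected / 2));
}
```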
4) "Low-Spoon" / Low-Stimulation Modes
- Executive dysfunction fluctuates. Some days a user wants an epic, loud, high-stakes challenge. Other days, they are overstimulated and just need gentle encouragement.
- There could be a "Cozy Mode" or "Low-Spoon Mode."
- Instead of generating intense epic videos with Veo and high-energy Live Audio, the AI shifts its persona to a gentle companion (like a cozy tavern keeper or a peaceful forest spirit). The tasks become "gathering herbs" or "tending the hearth," accompanied by lo-fi music and soft, watercolor-style image generation.
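A mode switch like this could be a preset that every agent reads. The presets, field names, and the 1-5 "spoons" check-in below are hypothetical, sketched only to show how one setting could change the persona, the art style, and whether the Veo finale runs at all.

```typescript
type Mode = "epic" | "cozy";

interface ModePreset {
  persona: string;
  taskVerb: string;        // how chores are reframed in narration
  imageStyle: string;      // style hint passed to image generation
  useVideoFinale: boolean; // cozy mode skips the intense Veo finale
  voiceEnergy: "high" | "low";
}

const PRESETS: Record<Mode, ModePreset> = {
  epic: {
    persona: "battle-hardened mentor",
    taskVerb: "conquer",
    imageStyle: "cinematic, high contrast",
    useVideoFinale: true,
    voiceEnergy: "high",
  },
  cozy: {
    persona: "peaceful forest spirit",
    taskVerb: "gently tend to",
    imageStyle: "soft watercolor",
    useVideoFinale: false,
    voiceEnergy: "low",
  },
};

// Hypothetical check-in: the user reports how many "spoons" (1-5) they have today.
function choosePreset(spoons: number): ModePreset {
  return spoons <= 2 ? PRESETS.cozy : PRESETS.epic;
}

// A persona line an agent's system instruction could start with.
function personaLine(task: string, preset: ModePreset): string {
  return `You are a ${preset.persona}. Invite the hero to ${preset.taskVerb} "${task}".`;
}
```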
Built With
- cloud-firestore
- docker
- firebase
- firestore-auth
- framer-motion
- gemini-2.5-flash-native-audio-preview-09-2025
- gemini-3.1-flash-image-preview
- gemini-3.1-pro-preview
- github
- google-cloud-build
- google-cloud-run
- google-genai-sdk
- google-oauth
- https
- lucide-react
- nosql
- react
- tailwind-css
- typescript
- veo-3.1-fast-generate-preview
- vite