Inspiration

I grew up with a “cinema of the mind”: reading a sentence instantly painted a vivid, textured world in my head. Many people—children with learning differences, language learners, and those with different cognitive styles—don’t have that internal projection. Text stays flat. Vivido began as a bridge from internal imagination to shared, multimodal experience: a tool that externalizes mental cinema as images, sound, and structured story state so everyone can access the soul of a story, regardless of reading level, language, or ability.

What it does

Vivido AI - Nova Edition is an autonomous Director Agent that turns spoken, visual, or typed prompts into synchronized, cinematic story pages. For each scene the app can:

  • Generate high-fidelity stills (Nova Canvas) that stay visually consistent from page to page;
  • Produce multiple narration variants (Nova Sonic + “Sonic Brainstorm”) to audition tone, pacing, and voice;
  • Create structured scene beats and translations (Nova Pro / Nova Lite) to support multilingual narration and captions;
  • Save and restore a complete Story State locally (text, images, audio, video, cast, and metadata) for collaborative and iterative creation.
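A minimal sketch of how a Story State could be saved and restored locally, assuming a JSON-serializable shape. The `StoryState` fields, the `vivido:story:` key prefix, and the version check are illustrative assumptions, not the app's actual schema. `KeyValueStore` is the `getItem`/`setItem` subset of the Web Storage API, so `window.localStorage` can be passed in the browser and an in-memory store used elsewhere.

```typescript
// Hypothetical Story State shape; field names are illustrative only.
interface StoryPage {
  text: string;
  imageDataUrl?: string; // e.g. a base64 data URL for the page still
  audioDataUrl?: string;
  videoUrl?: string;
}

interface StoryState {
  version: number;
  title: string;
  cast: Record<string, string>; // character name -> reference description
  pages: StoryPage[];
  metadata: Record<string, string>;
}

// The subset of the Web Storage API used here.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const STORY_KEY_PREFIX = "vivido:story:"; // hypothetical key scheme
const STORY_STATE_VERSION = 1;

function saveStoryState(store: KeyValueStore, id: string, state: StoryState): void {
  store.setItem(STORY_KEY_PREFIX + id, JSON.stringify(state));
}

// Returns null when nothing is stored, the JSON is corrupt, or the saved
// version is not one this client understands.
function restoreStoryState(store: KeyValueStore, id: string): StoryState | null {
  const raw = store.getItem(STORY_KEY_PREFIX + id);
  if (raw === null) return null;
  try {
    const parsed = JSON.parse(raw) as Partial<StoryState>;
    if (parsed.version !== STORY_STATE_VERSION || !Array.isArray(parsed.pages)) {
      return null;
    }
    return parsed as StoryState;
  } catch {
    return null; // malformed JSON (or a JSON null) lands here
  }
}

// In-memory stand-in for localStorage (tests, server-side rendering).
class MemoryStore implements KeyValueStore {
  private data = new Map<string, string>();
  getItem(key: string): string | null {
    return this.data.get(key) ?? null;
  }
  setItem(key: string, value: string): void {
    this.data.set(key, value);
  }
}
```

Browsers typically cap `localStorage` at a few megabytes per origin, and `setItem` throws when the quota is exceeded, so large base64 images or audio may be better kept in IndexedDB with only references stored in the saved state.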

Personalization features let creators upload reference images to establish a persistent CAST across pages, and configure style/voice presets for different audiences and languages.

How I built it

  • Models: Integrated Amazon Nova Canvas for text→image, Nova Sonic for TTS, Nova Reel for video, and Nova Pro/Lite/Micro for narrative structuring and reasoning. Inference runs through Amazon Bedrock for managed, scalable calls, with safety directives included in the prompts.
  • Architecture: Lightweight React client (App.tsx) with a State Engine that maintains per-page VisualStateSignature objects. I implemented a rolling-context approach so the agent preserves character and scene continuity without re-sending the entire book on every call.
  • Brainstorm: A Nova-driven subflow that generates several short, alternate narrations and descriptors per text/visual/audio prompt; the UI offers instant playback.
  • Optimizations: Prompt truncation, model-specific request formats, caching, and quality toggles reduce latency and token costs. Model ID overrides stored in localStorage let the app switch models at runtime to match regional availability.
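The rolling-context idea can be sketched as follows, assuming each page is reduced to a short text summary and that a character budget is an acceptable stand-in for a token limit. `PageSummary`, `buildRollingContext`, and the option names are hypothetical. The fixed continuity block (cast and visual constraints) is always sent in full; only the most recent page summaries that fit the budget are appended.

```typescript
// Hypothetical per-page record used to carry story context forward.
interface PageSummary {
  pageIndex: number; // zero-based
  summary: string;
}

interface RollingContextOptions {
  maxPages: number; // how many recent pages to consider
  maxChars: number; // rough budget standing in for a token limit
}

// Builds the context for the next generation call: the continuity block
// first (never truncated), then the newest page summaries that fit the
// budget, emitted in reading order.
function buildRollingContext(
  continuity: string,
  history: PageSummary[],
  opts: RollingContextOptions,
): string {
  // Guard: slice(-0) would return the whole array.
  const recent = opts.maxPages > 0 ? history.slice(-opts.maxPages) : [];
  const kept: string[] = [];
  let used = continuity.length;
  // Walk backwards so the newest pages win when the budget is tight.
  for (let i = recent.length - 1; i >= 0; i--) {
    const line = `Page ${recent[i].pageIndex + 1}: ${recent[i].summary}`;
    // +1 accounts for the newline added by join().
    if (used + line.length + 1 > opts.maxChars) break;
    kept.unshift(line);
    used += line.length + 1;
  }
  return [continuity, ...kept].join("\n");
}
```

Because the continuity block is counted first, a very large cast description leaves less room for page history rather than being cut; that trade-off favors visual consistency over narrative recall.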

Challenges

  • Narrative drift: Early iterations produced inconsistent character features across pages. I encoded a persistent Visual State (immutable attributes and constraints) and fed it as structured directives to generation calls to maintain continuity.
  • Model-specific formats & availability: Nova Canvas and Nova Sonic use different request/response schemas and may be subject to region or access constraints. I added runtime model ID configuration and a separate response parser for each model type.
  • Cost & latency: High-resolution assets are expensive and slow to generate. I implemented image-size/quality toggles, caching, and context reuse to keep classroom and workshop demos responsive.
  • Safety & moderation: Multimodal outputs increase the risk of harmful content. I enforce a Safety Directive in prompts and provide moderation guidance for pilot partners.
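As an illustration of model-specific request formats, the sketch below builds InvokeModel request bodies for Nova Canvas (text-to-image) and the Nova text models (Pro/Lite/Micro), and parses their responses. The field names follow my understanding of the Amazon Bedrock documentation and should be checked against the current docs; the safety directive wording, the 1,024-character prompt cap, and all function names are assumptions. Nova Sonic, as I understand it, uses a bidirectional streaming API rather than a single JSON body, so it is not covered here.

```typescript
// Field names follow my reading of the Amazon Bedrock docs for Nova models;
// verify against current documentation before relying on them.

// Placeholder wording; the app's actual Safety Directive is not shown.
const SAFETY_DIRECTIVE =
  "Safety Directive: keep all content age-appropriate; no violence, hate, or explicit material.";

// Assumed Nova Canvas prompt-length cap; check the current limit.
const CANVAS_MAX_PROMPT_CHARS = 1024;

interface CanvasGenerationConfig {
  numberOfImages: number;
  width: number;
  height: number;
  quality: "standard" | "premium";
  seed?: number; // a fixed seed can help keep compositions stable
}

// Nova Canvas text-to-image request body.
function buildCanvasRequest(
  scenePrompt: string,
  visualDirective: string,
  config: Omit<CanvasGenerationConfig, "numberOfImages">,
) {
  // Directive first, so truncation trims the scene text, not the constraints.
  const text = [visualDirective, scenePrompt]
    .filter((part) => part.length > 0)
    .join("\n")
    .slice(0, CANVAS_MAX_PROMPT_CHARS);
  const imageGenerationConfig: CanvasGenerationConfig = { numberOfImages: 1, ...config };
  return { taskType: "TEXT_IMAGE", textToImageParams: { text }, imageGenerationConfig };
}

// Nova Pro/Lite/Micro request body (messages-v1 schema), safety directive in the system prompt.
function buildNovaTextRequest(userText: string, storyContext: string, maxTokens: number) {
  return {
    schemaVersion: "messages-v1",
    system: [{ text: `${SAFETY_DIRECTIVE}\n${storyContext}` }],
    messages: [{ role: "user", content: [{ text: userText }] }],
    inferenceConfig: { maxTokens, temperature: 0.7 },
  };
}

// Canvas responses carry base64-encoded images, or an error string.
function parseCanvasResponse(body: unknown): string[] {
  const parsed = body as { images?: string[]; error?: string | null };
  if (parsed.error) throw new Error(`Nova Canvas error: ${parsed.error}`);
  return parsed.images ?? [];
}

// Text responses nest generated text under output.message.content.
function parseNovaTextResponse(body: unknown): string {
  const parsed = body as { output?: { message?: { content?: { text?: string }[] } } };
  const parts = parsed.output?.message?.content ?? [];
  return parts.map((part) => part.text ?? "").join("");
}
```

Each body would be serialized with `JSON.stringify` and sent through the AWS SDK's `InvokeModelCommand` (from `@aws-sdk/client-bedrock-runtime`) with the matching model ID; the parsers take the decoded JSON response body.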

Accomplishments

  • Brainstorming: Creators can instantly brainstorm complete scenarios using images, multilingual voices, and text.
  • Persistent CAST + Style Pivoting: Users can lock a character’s look across pages and pivot the entire project’s aesthetic (e.g., watercolor → cyberpunk) without breaking identity or continuity (still to be improved).
  • Practical demoability: The web demo (public CloudFront link) runs the core flows, so community partners can use it without complex setup.
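One way to get style pivoting without breaking identity is to keep each character's locked traits separate from the aesthetic fields, so a pivot only rewrites the latter. This is a hypothetical sketch loosely modeled on the per-page visual state described above; `SceneVisualState`, `pivotStyle`, and `toDirective` are illustrative names, and the real VisualStateSignature may be structured differently.

```typescript
// Hypothetical split: identity traits are locked, aesthetic fields can pivot.
interface CastMember {
  readonly name: string;
  readonly lockedTraits: readonly string[]; // e.g. "red scarf", "curly black hair"
}

interface SceneVisualState {
  readonly cast: readonly CastMember[];
  readonly style: string; // e.g. "watercolor"
  readonly palette: string; // e.g. "soft pastels"
}

// Returns a new state with only the aesthetic changed; the cast array is
// reused unchanged, so identity constraints carry over to every later page.
function pivotStyle(state: SceneVisualState, style: string, palette: string): SceneVisualState {
  return { ...state, style, palette };
}

// Renders the state as a directive to place ahead of each image prompt.
function toDirective(state: SceneVisualState): string {
  const castLines = state.cast.map(
    (member) => `${member.name} (always): ${member.lockedTraits.join(", ")}`,
  );
  return [`Style: ${state.style}; palette: ${state.palette}.`, ...castLines].join("\n");
}
```

Because `pivotStyle` returns a new object and leaves the original untouched, a watercolor → cyberpunk pivot can be previewed and undone, while every page generated after it receives the same character constraints.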

What I learned

  • Context is king: To be an effective director, the agent needs a persistent memory of the story state, not just one-off image or audio calls.
  • Multimodal harmony matters: Users engage far more when visuals and narration match affectively—voice timbre, pacing, color palette, and composition should be coherent.
  • Engineering for variability: Model parameter differences, region availability, and output formats require apps to be flexible and configurable at runtime.
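Runtime model selection can be sketched as a lookup that prefers an override (for example, one stored in `localStorage`) and falls back to a default ID. The default IDs below reflect Bedrock's `amazon.nova-*-v1:0` naming as I understand it; the role names and the `vivido:model:` key prefix are hypothetical, and model availability should be confirmed per region.

```typescript
// Hypothetical roles the app assigns to models.
type ModelRole = "image" | "video" | "speech" | "text-pro" | "text-lite" | "text-micro";

// Default IDs as I understand Bedrock's naming; confirm per region.
const DEFAULT_MODEL_IDS: Record<ModelRole, string> = {
  image: "amazon.nova-canvas-v1:0",
  video: "amazon.nova-reel-v1:0",
  speech: "amazon.nova-sonic-v1:0",
  "text-pro": "amazon.nova-pro-v1:0",
  "text-lite": "amazon.nova-lite-v1:0",
  "text-micro": "amazon.nova-micro-v1:0",
};

// Read-only subset of the Web Storage API; window.localStorage satisfies it.
interface OverrideSource {
  getItem(key: string): string | null;
}

const OVERRIDE_KEY_PREFIX = "vivido:model:"; // hypothetical key scheme

// Uses a non-blank override when present, otherwise the default ID.
function resolveModelId(role: ModelRole, overrides?: OverrideSource): string {
  const raw = overrides?.getItem(OVERRIDE_KEY_PREFIX + role)?.trim();
  return raw ? raw : DEFAULT_MODEL_IDS[role];
}
```

In the browser, running `localStorage.setItem("vivido:model:text-pro", "us.amazon.nova-pro-v1:0")` would, under this sketch, route Nova Pro calls through a US cross-region inference profile without a redeploy, assuming that profile is enabled for the account.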

What's next for Vivido AI - Nova Edition

  • Interactive “What If” branching: Let readers change choices and watch the story re-branch with new visuals and narration.
  • AR previews: Project generated images into the user’s physical space for immersive reading and guided storytelling sessions.
  • Educator integrations: Build a Universal Literacy Kit tailored to neurodivergent learners with scaffolded prompts, comprehension checks, and curriculum-aligned lesson plans.
  • Pilot: Expand pilot deployments with community champions and iterate on safety/moderation workflows.
  • Open starter kit: After the hackathon, publish a starter kit (one-hour workshop, presets, README) and move the repo to a public starter branch to seed community adoption.
