Inspiration
I grew up with a “cinema of the mind”: reading a sentence instantly painted a vivid, textured world in my head. Many people—children with learning differences, language learners, and those with different cognitive styles—don’t have that internal projection. Text stays flat. Vivido began as a bridge from internal imagination to shared, multimodal experience: a tool that externalizes mental cinema as images, sound, and structured story state so everyone can access the soul of a story, regardless of reading level, language, or ability.
What it does
Vivido AI - Nova Edition is an autonomous Director Agent that turns spoken, visual, or typed prompts into synchronized, cinematic story pages. For each scene the app can:
- Generate high-fidelity stills (Nova Canvas) with consistent visual continuity;
- Produce multiple narration variants (Nova Sonic + “Sonic Brainstorm”) to audition tone, pacing, and voice;
- Create structured scene beats and translations (Nova Pro / Nova Lite) to support multilingual narration and captions;
- Save locally and restore a complete Story State (text, images, audio, video, cast, and metadata) for collaborative and iterative creation.
Personalization features let creators upload reference images to establish a persistent CAST across pages, and configure style/voice presets for different audiences and languages.
How I built it
- Models: Integrated Amazon Nova Canvas for text→image, Nova Sonic for TTS, Nova Reel for video, and Nova Pro/Lite/Micro for narrative structuring and reasoning. Inference runs through Amazon Bedrock for managed, scalable calls, with safety directives included in the prompts.
- Architecture: Lightweight React client (App.tsx) with a State Engine that maintains per-page VisualStateSignature objects. I implemented a rolling-context approach so the agent preserves character and scene continuity without re-sending the entire book for every call.
- Brainstorm: A Nova-driven subflow that generates several short, alternate narrations and descriptors per text/visual/audio prompt; the UI offers instant playback.
- Optimizations: Prompt truncation, model-specific request formats, caching, and quality toggles reduce latency and token costs. LocalStorage-based model overrides allow runtime flexibility for regional availability.
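The rolling-context idea can be sketched roughly as follows. This is a minimal illustration, not the actual App.tsx code: the fields of VisualStateSignature, the buildRollingContext helper, and the window size are all assumptions made for the example. The sketch keeps the first description recorded for each character (so identity stays fixed) and sends only the most recent page summaries.

```typescript
// Hypothetical shape: the project names VisualStateSignature, but these
// fields are assumed here for illustration only.
interface VisualStateSignature {
  page: number;
  summary: string;              // one-line recap of the page's scene
  cast: Record<string, string>; // character name -> visual description
}

// Build a compact context string: every known cast member (the first
// description seen for a character wins, so it acts as immutable) plus
// summaries of only the last `windowSize` pages.
function buildRollingContext(
  signatures: VisualStateSignature[],
  windowSize: number = 3,
): string {
  const ordered = [...signatures].sort((a, b) => a.page - b.page);

  const cast = new Map<string, string>();
  for (const sig of ordered) {
    for (const [name, look] of Object.entries(sig.cast)) {
      if (!cast.has(name)) cast.set(name, look);
    }
  }

  // slice(-0) would return the whole array, so guard the zero case.
  const recent = windowSize > 0 ? ordered.slice(-windowSize) : [];

  const castLines = [...cast.entries()].map(([name, look]) => `- ${name}: ${look}`);
  const recapLines = recent.map((sig) => `- Page ${sig.page}: ${sig.summary}`);

  return ["CAST (keep consistent):", ...castLines, "RECENT PAGES:", ...recapLines].join("\n");
}
```

The returned string would be prepended to each generation prompt, so the request size grows with the cast and the window, not with the length of the book.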
Challenges
- Narrative drift: Early iterations produced inconsistent character features across pages. I encoded a persistent Visual State (immutable attributes and constraints) and fed it as structured directives to generation calls to maintain continuity.
- Model-specific formats & availability: Nova Canvas and Nova Sonic require different request/response schemas and may have region/access constraints. I added runtime model ID configuration and parsing for each model type.
- Cost & latency: High-resolution assets are expensive and slow. I implemented image-size/quality toggles, caching, and context reuse for responsive classroom and workshop demos.
- Safety & moderation: Multimodal outputs increase risk of harmful content. I enforce a Safety Directive in prompts and provide moderation guidance for pilot partners.
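The runtime model ID configuration mentioned above could look something like this sketch. Everything named here is hypothetical: the ModelRole union, the "vivido.model." key prefix, and the default IDs are placeholders, not the project's real keys or real Bedrock model IDs. The store parameter is a small interface that the browser's localStorage satisfies, which also makes the function easy to test outside a browser.

```typescript
// Roles the app needs a model for (assumed grouping, for illustration).
type ModelRole = "image" | "speech" | "video" | "text";

// Minimal read-only view of a key/value store; window.localStorage fits it.
interface KeyValueStore {
  getItem(key: string): string | null;
}

// Hypothetical storage key prefix and placeholder defaults.
const STORAGE_PREFIX = "vivido.model.";
const DEFAULT_MODEL_IDS: Record<ModelRole, string> = {
  image: "default-image-model",
  speech: "default-speech-model",
  video: "default-video-model",
  text: "default-text-model",
};

// Return the user's override for a role if one is stored and non-blank,
// otherwise fall back to the built-in default.
function resolveModelId(
  role: ModelRole,
  defaults: Record<ModelRole, string> = DEFAULT_MODEL_IDS,
  store?: KeyValueStore,
): string {
  const override = store?.getItem(STORAGE_PREFIX + role)?.trim();
  return override ? override : defaults[role];
}
```

In the browser, the call would pass window.localStorage as the store, so a user in a region without access to one model can point that role at another ID without a rebuild.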
Accomplishments
- Brainstorming: Creators can instantly brainstorm complete scenarios from images, text, and multilingual voices.
- Persistent CAST + Style Pivoting: Users can lock a character’s look across pages and pivot the entire project’s aesthetic (e.g., watercolor → cyberpunk) without breaking identity or continuity, though this still needs improvement.
- Practical demoability: The web demo (public CloudFront link) runs the core flows, so community partners can use it without complex setup.
What I learned
- Context is king: To be an effective director, the agent needs a persistent memory of the story state, not just one-off image or audio calls.
- Multimodal harmony matters: Users engage far more when visuals and narration match affectively—voice timbre, pacing, color palette, and composition should be coherent.
- Engineering for variability: Model parameter differences, region availability, and output formats require apps to be flexible and configurable at runtime.
What's next for Vivido AI - Nova Edition
- Interactive “What If” branching: Let readers change choices and watch the story re-branch with new visuals and narration.
- AR previews: Surface images into the user’s physical space for immersive reading and guided storytelling sessions.
- Educator integrations: Build a Universal Literacy Kit tailored to neurodivergent learners with scaffolded prompts, comprehension checks, and curriculum-aligned lesson plans.
- Pilot: Expand pilot deployments with community champions and iterate on safety/moderation workflows.
- Open starter kit: After the hackathon, publish a starter kit (one-hour workshop, presets, README) and move the repo to a public starter branch to seed community adoption.
Built With
- amazon
- amazon-bedrock-amazon-nova-(micro
- amazon-cloudfront-cdn
- amazon-web-services
- canvas
- cloudfront
- css
- html
- lite
- node.js
- nova-reel
- npm
- polly
- pro
- react
- rekognition
- s3
- sonic
- tailwind
- typescript
- vite

