Vivido AI - Nova Edition

A complete story with videos, images and multilingual text and audio

Inspiration

I grew up with a “cinema of the mind”: reading a sentence instantly painted a vivid, textured world in my head. Many people—children with learning differences, language learners, and those with different cognitive styles—don’t have that internal projection. Text stays flat. Vivido began as a bridge from internal imagination to shared, multimodal experience: a tool that externalizes mental cinema as images, sound, and structured story state so everyone can access the soul of a story, regardless of reading level, language, or ability.

What it does

Vivido AI - Nova Edition is an autonomous Director Agent that turns spoken, visual, or typed prompts into synchronized, cinematic story pages. For each scene the app can:

Generate high-fidelity stills (Nova Canvas) with consistent visual continuity;
Produce multiple narration variants (Nova Sonic + “Sonic Brainstorm”) to audition tone, pacing, and voice;
Create structured scene beats and translations (Nova Pro / Nova Lite) to support multilingual narration and captions;
Save locally and restore a complete Story State (text, images, audio, video, cast, and metadata) for collaborative and iterative creation.

Personalization features let creators upload reference images to establish a persistent CAST across pages, and configure style/voice presets for different audiences and languages.
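The save-and-restore behavior for the Story State could be sketched roughly as follows. This is a minimal illustration, not the app's actual code: the field names, the `vivido.storyState` key, and the `MemoryStore` helper are all assumptions, and media is referenced by URL here because browser storage quotas are small.

```typescript
// Hypothetical shapes: the write-up lists what a Story State contains,
// but not its real field names.
interface CastMember {
  id: string;
  name: string;
  referenceImageUrl: string; // uploaded reference image
}

interface StoryPage {
  text: string;
  imageUrl?: string;
  audioUrl?: string;
  videoUrl?: string;
}

interface StoryState {
  version: 1;
  title: string;
  language: string;
  cast: CastMember[];
  pages: StoryPage[];
}

// Minimal storage contract; the browser's localStorage satisfies it structurally.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const STORY_KEY = "vivido.storyState"; // assumed key name

function saveStory(store: KeyValueStore, state: StoryState): void {
  store.setItem(STORY_KEY, JSON.stringify(state));
}

function restoreStory(store: KeyValueStore): StoryState | null {
  const raw = store.getItem(STORY_KEY);
  if (raw === null) return null;
  try {
    const parsed = JSON.parse(raw) as Partial<StoryState>;
    // Reject payloads with an unknown schema version or missing lists.
    if (parsed.version !== 1 || !Array.isArray(parsed.pages) || !Array.isArray(parsed.cast)) {
      return null;
    }
    return parsed as StoryState;
  } catch {
    return null; // corrupted or non-object JSON
  }
}

// In-memory store for tests or non-browser environments.
class MemoryStore implements KeyValueStore {
  private data = new Map<string, string>();
  getItem(key: string): string | null {
    return this.data.get(key) ?? null;
  }
  setItem(key: string, value: string): void {
    this.data.set(key, value);
  }
}
```

In a browser, `saveStory(window.localStorage, state)` would work with the same functions; the version check gives a place to handle older saved stories if the schema changes.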

How I built it

Models: Integrated Amazon Nova Canvas for text→image, Nova Sonic for TTS, Nova Reel for video, and Nova Pro/Lite/Micro for narrative structuring and reasoning. Inference runs through Amazon Bedrock for managed, scalable calls with safety directives.
Architecture: Lightweight React client (App.tsx) with a State Engine that maintains per-page VisualStateSignature objects. I implemented a rolling-context approach so the agent preserves character and scene continuity without re-sending the entire book for every call.
Brainstorm: A Nova-driven subflow that generates several short, alternate narrations and descriptors per text/visual/audio prompt; the UI offers instant playback.
Optimizations: Prompt truncation, model-specific request formats, caching, and quality toggles reduce latency and token costs. Model ID overrides stored in localStorage allow the app to switch models at runtime when regional availability differs.
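To illustrate the rolling-context idea, here is a minimal sketch under stated assumptions. Only the type name `VisualStateSignature` comes from the project; its fields, the `buildRollingContext` function, the window size, and the character budget are hypothetical. The sketch renders the latest signature as fixed directives, appends summaries of only the most recent pages, and drops the oldest summaries first when the prompt exceeds its budget.

```typescript
// Illustrative fields; the project only names the type.
interface VisualStateSignature {
  page: number;
  characters: Record<string, string>; // name -> locked appearance
  setting: string;
  style: string;
  summary: string; // one-line recap of the page
}

function buildRollingContext(
  signatures: VisualStateSignature[],
  windowSize = 2, // how many recent pages to recap (assumed default)
  maxChars = 600, // crude prompt budget in characters (assumed default)
): string {
  if (signatures.length === 0) return "";
  const latest = signatures[signatures.length - 1];

  // Continuity directives come first so they survive truncation longest.
  const directives = [
    `STYLE: ${latest.style}`,
    `SETTING: ${latest.setting}`,
    ...Object.entries(latest.characters).map(
      ([name, look]) => `CHARACTER ${name}: ${look}`,
    ),
  ];

  // Recap only the last few pages instead of the whole book.
  const window = windowSize > 0 ? signatures.slice(-windowSize) : [];
  const recent = window.map((s) => `PAGE ${s.page}: ${s.summary}`);

  const render = () => [...directives, ...recent].join("\n");
  // Drop the oldest recap lines first when over budget.
  while (render().length > maxChars && recent.length > 0) recent.shift();
  // Last resort: hard cut, which may trim the directives themselves.
  return render().slice(0, maxChars);
}
```

The returned string would be prepended to each generation prompt. A token-based budget would be more precise than a character count, but needs a tokenizer for the target model.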

Challenges

Narrative drift: Early iterations produced inconsistent character features across pages. I encoded a persistent Visual State (immutable attributes and constraints) and fed it as structured directives to generation calls to maintain continuity.
Model-specific formats & availability: Nova Canvas and Nova Sonic require different request/response schemas and may have region/access constraints. I added runtime model ID configuration and parsing for each model type.
Cost & latency: High-resolution assets are expensive and slow. I implemented image-size/quality toggles, caching, and context reuse for responsive classroom and workshop demos.
Safety & moderation: Multimodal outputs increase risk of harmful content. I enforce a Safety Directive in prompts and provide moderation guidance for pilot partners.
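The per-model request handling and runtime model ID configuration described above might look something like this sketch. Everything here is an assumption for illustration: the default model IDs, the `vivido.model.*` storage keys, the wording of `SAFETY_DIRECTIVE`, and the function names. The request body shapes follow my understanding of the Bedrock schemas for Nova Canvas and the Nova text models and should be checked against the current documentation before use.

```typescript
type ModelRole = "image" | "text";
type Quality = "standard" | "premium";

// Assumed defaults; confirm the IDs available in your region and account.
const DEFAULT_MODEL_IDS: Record<ModelRole, string> = {
  image: "amazon.nova-canvas-v1:0",
  text: "amazon.nova-lite-v1:0",
};

// Hypothetical wording; the project does not publish its directive text.
const SAFETY_DIRECTIVE = "Keep all content appropriate for young readers.";

// Read-only view of a key-value store; localStorage fits structurally.
interface OverrideStore {
  getItem(key: string): string | null;
}

// A stored override wins over the default when it is non-empty.
function resolveModelId(role: ModelRole, store: OverrideStore): string {
  const override = store.getItem(`vivido.model.${role}`)?.trim();
  return override ? override : DEFAULT_MODEL_IDS[role];
}

// Image request: a task type plus separate prompt and config objects.
function buildImageBody(prompt: string, quality: Quality, size: number) {
  return {
    taskType: "TEXT_IMAGE",
    textToImageParams: { text: `${prompt}. ${SAFETY_DIRECTIVE}` },
    imageGenerationConfig: { numberOfImages: 1, width: size, height: size, quality },
  };
}

// Text request: a messages-style body with the directive in the system turn.
function buildTextBody(systemPrompt: string, userPrompt: string) {
  return {
    schemaVersion: "messages-v1",
    system: [{ text: `${systemPrompt}\n${SAFETY_DIRECTIVE}` }],
    messages: [{ role: "user", content: [{ text: userPrompt }] }],
  };
}
```

Keeping one builder per model family isolates schema differences, so a size or quality toggle changes only the arguments passed to `buildImageBody` and not the calling code.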

Accomplishments

Brainstorming: Creators can instantly brainstorm complete scenarios using images and multilingual voices and text.
Persistent CAST + Style Pivoting: Users can lock a character’s look across pages and pivot the entire project’s aesthetic (e.g., watercolor → cyberpunk) without breaking identity or continuity, although this still needs improvement.
Practical demoability: The web demo (public CloudFront link) runs the core flows so community partners can use it without complex setup.

What I learned

Context is king: To be an effective director, the agent needs a persistent memory of the story state, not just one-off image or audio calls.
Multimodal harmony matters: Users engage far more when visuals and narration match affectively—voice timbre, pacing, color palette, and composition should be coherent.
Engineering for variability: Model parameter differences, region availability, and output formats require apps to be flexible and configurable at runtime.

What's next for Vivido AI - Nova Edition

Interactive “What If” branching: Let readers change choices and watch the story re-branch with new visuals and narration.
AR previews: Surface images into the user’s physical space for immersive reading and guided storytelling sessions.
Educator integrations: Build a Universal Literacy Kit tailored to neurodivergent learners with scaffolded prompts, comprehension checks, and curriculum-aligned lesson plans.
Pilot: Expand pilot deployments with community champions and iterate on safety/moderation workflows.
Open starter kit: After the hackathon, publish a starter kit (one-hour workshop, presets, README) and move the repo to a public starter branch to seed community adoption.

Built With

amazon
amazon-bedrock
amazon-nova (micro, lite, pro, canvas, sonic)
amazon-cloudfront-cdn
amazon-web-services
cloudfront
css
html
node.js
nova-reel
npm
polly
react
rekognition
s3
tailwind
typescript
vite

Updates

Dr. AbdElrahman Shabayek started this project — Mar 15, 2026 10:36 AM EDT
