🌟 NovaLearn — AI Study Companion

💡 Inspiration

We've all been there — watching a 45-minute YouTube lecture, nodding along, feeling productive... then remembering almost nothing the next day.

YouTube is the world's largest classroom. But it was never designed for learning — it was designed for watching.

There's no recap. No way to ask questions. No structured takeaway. Just autoplay rolling into the next video.

We built NovaLearn because passive watching isn't learning — and every student deserves better than that.

🚀 What It Does

NovaLearn is a Chrome Extension that sits alongside any YouTube video and transforms it into a fully interactive learning session — powered by Amazon Nova.

Three core features:

📄 Smart Recap

Paste the video. Get back a structured breakdown in seconds — TL;DR bullets, key concepts, timestamped chapters, and actionable takeaways. Click any timestamp to jump directly to that moment in the video. Export the whole thing to Markdown.
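
As an illustration of the click-to-jump behavior, the sketch below converts a chapter timestamp into seconds and builds a watch URL that starts at that point. The function names are hypothetical and this is not NovaLearn's actual code; inside the extension the sidebar would more likely seek the embedded player directly, and the URL form is shown only because it is easy to verify.

```python
from urllib.parse import urlencode


def timestamp_to_seconds(stamp: str) -> int:
    """Convert "SS", "MM:SS", or "HH:MM:SS" into a number of seconds."""
    seconds = 0
    for part in stamp.strip().split(":"):
        # Each colon shifts the running total up by a factor of 60.
        seconds = seconds * 60 + int(part)
    return seconds


def jump_url(video_id: str, stamp: str) -> str:
    """Build a YouTube watch URL that starts playback at the given timestamp."""
    query = urlencode({"v": video_id, "t": f"{timestamp_to_seconds(stamp)}s"})
    return f"https://www.youtube.com/watch?{query}"
```

The same seconds value could be written into a Markdown export as a link, so an exported recap keeps its jump targets.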

🎙️ Voice Buddy

Have an actual conversation about what you just watched. NovaLearn uses Amazon Nova Sonic for AI responses and OpenAI Whisper for speech transcription — so you can speak naturally, ask follow-ups, and get casual buddy-style answers grounded in the video's content. No stiff chatbot energy.

🎬 Video Animation

Turn any video's key ideas into a whiteboard-style animated explainer — fully generated by AI. A two-agent pipeline writes the script, Amazon Nova Reel renders 4 parallel scene animations, and FFmpeg stitches them into a final shareable video. From YouTube to explainer video in ~4 minutes.

🏗️ How We Built It

NovaLearn is a full-stack AI application spanning a Chrome Extension frontend and a Python backend — all orchestrated around the Amazon Nova model suite.

Frontend — Chrome Extension (MV3)

Built with React 18 + TypeScript inside a Manifest V3 Chrome Extension
A content script injects a sidebar directly into YouTube pages
A service worker handles all AI orchestration and message routing
Tailwind CSS + ShadCN for a clean, responsive 380px sidebar UI
Framer Motion for smooth tab transitions

Backend — FastAPI

Handles the heavy lifting: video generation pipeline, FFmpeg stitching, S3 uploads
Exposes clean REST endpoints consumed by both the extension and direct API clients
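
The stitching step can be sketched with FFmpeg's concat demuxer, which reads a small text file listing the clips in order. The helper names below are hypothetical, and the sketch assumes all scene clips share the same codec and resolution so that streams can be copied without re-encoding; the actual backend may re-encode or use a different FFmpeg invocation.

```python
def concat_list_text(scene_paths: list[str]) -> str:
    """Render the list file that FFmpeg's concat demuxer reads, one clip per line.

    Paths containing single quotes would need escaping; this sketch assumes none do.
    """
    return "".join(f"file '{path}'\n" for path in scene_paths)


def build_concat_command(list_path: str, output_path: str) -> list[str]:
    """Return the FFmpeg argument list for joining the clips named in list_path."""
    return [
        "ffmpeg",
        "-y",              # overwrite the output file if it already exists
        "-f", "concat",    # select the concat demuxer
        "-safe", "0",      # accept absolute paths in the list file
        "-i", list_path,
        "-c", "copy",      # copy streams as-is instead of re-encoding
        output_path,
    ]
```

The command would typically be executed with `subprocess.run(cmd, check=True)` after writing the list text to `list_path`.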

AI Stack: Amazon Nova at the Core

| Task | Model |
|---|---|
| Recap generation | Amazon Nova Lite |
| Script writing | Amazon Nova Pro |
| Voice conversations | Amazon Nova Sonic |
| Animated video scenes | Amazon Nova Reel × 4 (parallel) |
| Speech transcription | OpenAI Whisper |

Video Generation Pipeline

YouTube Transcript
      ↓
Nova Pro — Director Agent (outline)
      ↓
Nova Pro × 4 — Parallel Scene Scripts
      ↓
Nova Reel × 4 — Parallel Whiteboard Animations
      ↓
FFmpeg — Scene Stitching
      ↓
AWS S3 — Final Video Storage
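
The fan-out shape of this pipeline can be sketched with `asyncio`. Every callable passed in below is a hypothetical stand-in for the real Nova Pro, Nova Reel, and FFmpeg/S3 steps, so this shows only the orchestration pattern (one outline, then parallel scripts, then parallel renders, then a single stitch), not the project's actual code.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical stage signatures; the real stages call Bedrock, FFmpeg, and S3.
Outline = Callable[[str], Awaitable[list[str]]]   # transcript -> scene briefs
WriteScene = Callable[[str], Awaitable[str]]      # scene brief -> scene script
RenderScene = Callable[[str], Awaitable[str]]     # scene script -> clip location
Stitch = Callable[[list[str]], Awaitable[str]]    # clip locations -> final video


async def generate_video(
    transcript: str,
    outline: Outline,
    write_scene: WriteScene,
    render_scene: RenderScene,
    stitch: Stitch,
) -> str:
    """Run the director step once, then fan out per scene, then stitch."""
    briefs = await outline(transcript)
    # gather() runs the coroutines concurrently and returns results in input
    # order, so scene 1 stays scene 1 no matter which job finishes first.
    scripts = await asyncio.gather(*(write_scene(b) for b in briefs))
    clips = await asyncio.gather(*(render_scene(s) for s in scripts))
    return await stitch(list(clips))
```

Because the two `gather()` calls are sequential, all scripts finish before any render starts. Chaining each script directly into its render would shorten the critical path, at the cost of a slightly less readable flow.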

Infrastructure

AWS Bedrock for all Nova model access (SigV4 auth)
AWS S3 for video scene storage and final output
Chrome Storage API for 7-day transcript + recap caching
Supabase (optional) for persistence
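
The 7-day expiry rule is independent of where the data lives. The sketch below uses a plain dictionary as a stand-in for the storage area; in the extension the equivalent logic would run in TypeScript against the Chrome Storage API. The function names and the entry layout are assumptions made for illustration.

```python
import time
from typing import Optional

SEVEN_DAYS_SECONDS = 7 * 24 * 60 * 60


def put_cached(store: dict, video_id: str, value: dict,
               now: Optional[float] = None) -> None:
    """Save a value together with the time it was stored."""
    saved_at = time.time() if now is None else now
    store[video_id] = {"saved_at": saved_at, "value": value}


def get_cached(store: dict, video_id: str, now: Optional[float] = None,
               ttl: float = SEVEN_DAYS_SECONDS) -> Optional[dict]:
    """Return the cached value, or None if it is missing or older than ttl seconds."""
    entry = store.get(video_id)
    if entry is None:
        return None
    current = time.time() if now is None else now
    if current - entry["saved_at"] > ttl:
        del store[video_id]  # evict the stale entry so it is not served again
        return None
    return entry["value"]
```

Passing `now` explicitly keeps the expiry check easy to test without waiting a week.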

🧗 Challenges We Ran Into

1. Chrome Extension Architecture Complexity

MV3 service workers are ephemeral: they spin down unexpectedly. We had to rethink our state management entirely, moving from chrome.storage.session to chrome.storage.local and building a robust message-passing bridge between the content script, sidebar, and service worker.

2. Nova Reel API Constraints

Nova Reel has strict prompt length limits and specific input formatting requirements that aren't immediately obvious. Getting 4 parallel Reel jobs to complete reliably, and in the right format for FFmpeg, took significant iteration on both the prompt structure and the S3 URI handling.
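
One way to stay under a prompt length limit is to trim each scene prompt at a word boundary before submitting it. The helper below is a hypothetical sketch, not the project's code, and `max_chars` is deliberately left as a parameter: the exact limit should be taken from the current Nova Reel documentation rather than from this example.

```python
def fit_prompt(prompt: str, max_chars: int) -> str:
    """Collapse whitespace, then cut at the last word boundary within max_chars."""
    text = " ".join(prompt.split())
    if len(text) <= max_chars:
        return text
    # Look one character past the limit: if that character is a space,
    # the first max_chars characters already end on a whole word.
    window = text[: max_chars + 1]
    head, sep, _ = window.rpartition(" ")
    # With no space in the window, fall back to a hard cut.
    return head if sep else text[:max_chars]
```

Trimming silently drops the end of a prompt, so the most important visual instructions are best placed first.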

3. Real-Time Voice Pipeline Latency

Building a voice experience that feels natural meant fighting latency at every layer: Whisper transcription, Nova Sonic inference, and audio playback. We ended up tracking latency per message and exposing it in the UI so users could see exactly where time was being spent.
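
Per-message latency tracking of this kind can be sketched as a small timer that records how long each named stage takes. The class and the stage names are hypothetical; the extension's own tracker runs in the browser and may measure different boundaries.

```python
import time
from contextlib import contextmanager


class LatencyTracker:
    """Collect wall-clock durations, in milliseconds, for the stages of one message."""

    def __init__(self) -> None:
        self.stages: dict[str, float] = {}

    @contextmanager
    def stage(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the duration even if the stage raised an exception.
            self.stages[name] = (time.perf_counter() - start) * 1000.0

    @property
    def total_ms(self) -> float:
        return sum(self.stages.values())


# Usage: wrap each stage, then hand tracker.stages to the UI.
tracker = LatencyTracker()
with tracker.stage("transcription"):
    pass  # call the transcription step here
with tracker.stage("inference"):
    pass  # call the model here
```

Keeping the breakdown per stage, rather than one total, is what lets the UI show where the time went.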

4. Dual-Mode Operation

Supporting both frontend-only (direct API keys) and backend-enhanced (FastAPI) modes in the same codebase, without making either feel like a second-class citizen, required careful abstraction in our service layer.

🏆 Accomplishments That We're Proud Of

End-to-end video generation in ~4 minutes — from a YouTube URL to a shareable whiteboard explainer video, fully automated with zero human input after the click
A voice experience that actually feels conversational — Nova Sonic responses stay grounded in the video's content while maintaining a casual, buddy-like tone rather than sounding like a formal assistant
Sub-5-second recaps — Nova Lite generates structured, timestamped, markdown-exportable recaps faster than most people finish reading the title
A seamlessly injected sidebar that doesn't break YouTube's layout, keyboard shortcuts, or fullscreen behavior — harder than it sounds
A complete multi-provider fallback chain — Nova → OpenAI → Anthropic — so the extension degrades gracefully regardless of which keys a user has configured
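
The fallback chain in the last item can be sketched as an ordered list of providers that is walked until one succeeds. The function below is a hypothetical illustration: each provider is modeled as a plain callable that raises on failure, and a provider without a configured key is represented as `None`. The real extension wraps actual SDK or HTTP calls.

```python
from typing import Callable, Optional

Provider = Callable[[str], str]  # prompt -> completion; raises on failure


def complete_with_fallback(
    prompt: str,
    providers: list[tuple[str, Optional[Provider]]],
) -> tuple[str, str]:
    """Try providers in order and return (provider_name, completion)."""
    errors = []
    for name, call in providers:
        if call is None:
            continue  # no key configured for this provider, so skip it
        try:
            return name, call(prompt)
        except Exception as exc:  # any failure moves on to the next provider
            errors.append(f"{name}: {exc}")
    detail = "; ".join(errors) or "none configured"
    raise RuntimeError(f"No provider succeeded ({detail})")
```

Returning the provider name alongside the completion lets the UI show which model actually answered.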

📚 What We Learned

Amazon Nova is remarkably capable across modalities. Using Lite, Pro, Reel, and Sonic within a single cohesive product showed us just how much you can build when text, voice, and video generation share the same underlying platform. The consistency in response quality across models made orchestration far smoother than working across different vendors.

Chrome Extensions are deceptively complex. The three-context architecture (content script, service worker, sidebar) means every feature has to be designed around message passing from day one. There's no shortcut — and retrofitting it later is painful.

Latency is a UX feature, not just a metric. Exposing real-time generation timing in the UI transformed how users perceived wait times. A 4-minute video generation feels completely acceptable when you can see every stage progressing. Hiding it made the same wait feel broken.

Prompt engineering for video is its own discipline. Nova Reel requires a fundamentally different prompting mindset than text models. Thinking in scenes, pacing, and visual metaphors — rather than information density — was a genuine creative shift.

🔮 What's Next for NovaLearn

NovaLearn is just getting started. Here's what's on the roadmap:

📝 Quiz Generation

Auto-generate multiple choice and short answer questions from any video, with difficulty levels and instant feedback.

🗒️ Timestamp-Anchored Notes

Click anywhere in a recap to drop a personal note anchored to that exact video moment. Synced across devices via Chrome Sync storage.

📦 Anki Export

One-click export of key concepts and Q&A pairs directly into Anki-compatible flashcard decks for spaced repetition study.

🎯 Playlist Context

Maintain learning continuity across an entire YouTube playlist, so your AI buddy remembers what you covered in the last video and builds on it.

📴 Offline Mode

IndexedDB caching for recaps, transcripts, and generated videos so learning doesn't stop when the connection does.

🌍 Multi-Language Support

Whisper already handles 99 languages. We want NovaLearn to generate recaps and respond in the user's native language automatically.

Built with ❤️ for the AWS Bedrock Hackathon · 2026

Powered by Amazon Nova, the future of multimodal AI learning