Inspiration
Earlier this year, I tried to build something similar: videolearnai.com, a web-based version of NanoTutor. It ran into serious problems: it needed proxies to download YouTube transcripts, LLM API credits had to be managed, and the user experience was poor because people had to leave YouTube entirely.
I've always wanted a tool that could turn passive video watching into active learning, making YouTube videos interactive, engaging, and better for knowledge retention.
When Gemini Nano became available locally in Chrome, that dream finally became possible: no servers, no API costs, and no data leaving the browser.
What It Does
NanoTutor is a Chrome extension that turns any YouTube video into an interactive AI tutor, powered entirely by Gemini Nano, Chrome's built-in AI model.
It allows users to:
- Chat with videos: ask questions about any part of the content.
- Generate contextual quizzes: a prototype feature that currently supports True/False quizzes to test comprehension.
- Smart transcript search: RAG (retrieval-augmented generation) locates the relevant information in long videos.
- Full privacy: everything runs locally, and no data ever leaves your browser.
For long videos, NanoTutor uses a local RAG pipeline with WebGPU-powered embeddings (Transformers.js v3) to fetch only the transcript segments relevant to each query, which keeps answers fast and context-aware.
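To make the retrieval step concrete, here is a minimal sketch of picking the most relevant transcript chunks for a query. It is not NanoTutor's actual code: it assumes the embeddings have already been computed, uses a plain brute-force cosine-similarity scan rather than a vector index (the Built With list mentions mememo, which may be what the extension uses instead), and the names `EmbeddedChunk`, `cosineSimilarity`, and `topKChunks` are hypothetical.

```typescript
// Hypothetical shape of an embedded transcript chunk (an assumption for this sketch).
interface EmbeddedChunk {
  text: string;
  startSeconds: number;
  embedding: number[];
}

// Cosine similarity between two vectors of the same dimension.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Embedding dimensions do not match");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat it as unrelated to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Pick the k chunks most similar to the query, then put them back in video order
// so the model reads the context chronologically.
function topKChunks(
  queryEmbedding: number[],
  chunks: EmbeddedChunk[],
  k: number
): EmbeddedChunk[] {
  return chunks
    .map((chunk) => ({
      chunk,
      score: cosineSimilarity(queryEmbedding, chunk.embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, Math.max(0, k))
    .map((scored) => scored.chunk)
    .sort((x, y) => x.startSeconds - y.startSeconds);
}
```

Only the text of the selected chunks would then go into the prompt, which is what keeps the context small for long videos.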
Prototype notice: The quiz system is still an early prototype, limited to True/False questions, but it lays the groundwork for richer formats such as multiple choice, fill-in-the-blank, and spaced repetition.
How I Built It
Tech Stack
- Plasmo (Parcel): Chrome extension framework
- React: UI and state management
- Gemini Nano: on-device AI inference via Chrome's Prompt API
- Transformers.js v3: WebGPU-accelerated embeddings for RAG
- DOM manipulation: direct transcript extraction from YouTube's native UI
Architecture
The extension interacts directly with YouTube's DOM: it automatically clicks the transcript button and parses the transcript text, so no APIs or proxies are required. The transcript is then split into chunks for RAG processing, and Gemini Nano responses are streamed into the UI for a smooth experience.
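The chunking step can be sketched roughly as follows. This is an illustration under stated assumptions, not the extension's real logic: the segment shape, the `chunkTranscript` name, the character budget, and the one-segment overlap are all hypothetical choices. The idea is to group consecutive caption lines into chunks of roughly `maxChars` characters and repeat the tail of each chunk at the start of the next, so a sentence that straddles a boundary still appears whole in one of them.

```typescript
// Hypothetical shape of one caption line parsed from the transcript panel.
interface TranscriptSegment {
  startSeconds: number;
  text: string;
}

// A chunk keeps the start time of its first segment so answers can link back to the video.
interface TranscriptChunk {
  startSeconds: number;
  text: string;
}

// Group consecutive segments into chunks of roughly maxChars characters.
// The last `overlap` segments of a chunk are repeated at the start of the next one.
// A chunk can exceed maxChars when a single segment (or the carried overlap) is long.
function chunkTranscript(
  segments: TranscriptSegment[],
  maxChars = 500,
  overlap = 1
): TranscriptChunk[] {
  const chunks: TranscriptChunk[] = [];
  let current: TranscriptSegment[] = [];

  const joinText = (parts: TranscriptSegment[]): string =>
    parts.map((part) => part.text).join(" ");

  for (const segment of segments) {
    const candidate = joinText([...current, segment]);
    if (current.length > 0 && candidate.length > maxChars) {
      chunks.push({
        startSeconds: current[0].startSeconds,
        text: joinText(current),
      });
      // Carry at most all-but-one segment, so every new chunk makes progress.
      const carry = Math.min(overlap, current.length - 1);
      current = carry > 0 ? current.slice(-carry) : [];
    }
    current.push(segment);
  }

  if (current.length > 0) {
    chunks.push({
      startSeconds: current[0].startSeconds,
      text: joinText(current),
    });
  }
  return chunks;
}
```

Each resulting chunk would then be embedded once and stored, so later questions only need to embed the query itself.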
The quiz generator uses the same transcript context to create simple, focused True/False questions that reinforce learning, with plans to evolve into richer formats soon.
Challenges I Ran Into
1. Getting Transformers.js v3 to work with Plasmo (Parcel). The library relied on Node.js APIs that are unavailable in the browser. I had to write scripts to build the configs and tweak module resolution to make it fully compatible inside a Chrome extension.
2. Implementing a streaming UI. I wanted the real-time token streaming effect seen in Vercel's AI SDK. So far I've achieved object-level streaming (partial responses are rendered as each one completes), but true token-level streaming is still in progress. Even so, it's already a big improvement over waiting for the full response before rendering.
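One way object-level streaming can work is sketched below. This is an assumption-heavy illustration, not how NanoTutor necessarily does it: it supposes the model is prompted to emit one JSON object per line, and the `QuizItem` shape and `ObjectStreamParser` class are hypothetical. The parser buffers incoming text and hands back each object as soon as its line is complete, so the UI can render items one by one instead of waiting for the whole response.

```typescript
// Hypothetical True/False quiz item (an assumption for this sketch).
interface QuizItem {
  statement: string;
  answer: boolean;
}

// Incrementally parses newline-delimited JSON from a stream of text chunks.
class ObjectStreamParser<T> {
  private buffer = "";

  // Feed the next text chunk; returns every object completed by this chunk.
  push(chunk: string): T[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last element is an unfinished line (or "" if the chunk ended on a newline).
    this.buffer = lines.pop() ?? "";

    const completed: T[] = [];
    for (const line of lines) {
      const trimmed = line.trim();
      if (trimmed === "") continue;
      try {
        completed.push(JSON.parse(trimmed) as T);
      } catch {
        // Skip lines that are not valid JSON, e.g. stray prose from the model.
      }
    }
    return completed;
  }

  // Call once the stream has ended to parse a trailing line that has no newline.
  flush(): T[] {
    const rest = this.buffer;
    this.buffer = "";
    return rest.trim() === "" ? [] : this.push(rest + "\n");
  }
}
```

In a React component, each batch returned by `push` could be appended to state, which gives the progressive "questions appear one at a time" effect without token-level rendering.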
Accomplishments I'm Proud Of
- Fully local AI: Gemini Nano, RAG, and embeddings all run entirely on-device.
- Built in 12 days: an intense but highly productive sprint.
- Live quiz generation: questions appear progressively, improving UX.
- Fast embeddings (using WebGPU): smooth performance even on longer videos.
What I Learned
I learned how to design and optimize fully local AI workflows, from on-device inference to RAG and streaming inside the browser. I also gained deeper insight into Transformers.js v3 and React streaming UIs for dynamic AI interactions.
What's Next for NanoTutor
The current RAG implementation works but is still naive, and the quiz system is just a prototype, so there's lots of room to grow.
Near-Term Goals:
- Better quizzes: add multiple choice, fill-in-the-blank, and spaced repetition, and improve the UX and visual design.
- Smarter RAG: experiment with Graph RAG and Agentic RAG for deeper understanding.
- Token-level streaming: true real-time text generation for a smoother UX.
Future Ideas:
- Visual context: use video preview frames for visual understanding.
- Voice interaction: hands-free AI tutoring via speech-to-text.
- Automatic chapters: AI-generated timestamps and navigation.
Hybrid AI Mode:
- Client + Cloud options: choose between fully local AI or cloud models when needed.
- Social learning: optional online features like leaderboards and quiz challenges.
Built With
- indexdb
- mememo
- plasmo
- react
- transformers
- typescript