NanoTutor

💡 Inspiration

Earlier this year, I tried to build something similar: videolearnai.com, a web-based version of NanoTutor. It ran into serious challenges: proxy requirements for downloading YouTube transcripts, LLM API credit management, and a poor user experience that forced people to leave YouTube entirely.

I’ve always wanted a tool that could turn passive video watching into active learning — making YouTube videos interactive, engaging, and better for knowledge retention.

When Gemini Nano became available locally in Chrome, that dream finally became possible: no servers, no API costs, and no data leaving the browser.

⚙️ What It Does

NanoTutor is a Chrome extension that transforms any YouTube video into an interactive AI tutor, powered entirely by Gemini Nano, Chrome’s built-in AI model.

It allows users to:

💬 Chat with videos — Ask questions about any part of the content.
📝 Generate contextual quizzes — (Prototype feature) Currently supports True or False quizzes to test comprehension.
🔍 Smart transcript search — Uses RAG (retrieval-augmented generation) to locate relevant info in long videos.
🔒 Full privacy — Everything runs locally. No data ever leaves your browser.

For long videos, NanoTutor uses a local RAG pipeline with WebGPU-powered embeddings (Transformers.js v3) to fetch only relevant transcript segments per query — ensuring fast, context-aware answers.
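The retrieval step of such a pipeline can be sketched as a cosine-similarity search over chunk embeddings. This is a minimal illustration rather than NanoTutor's actual code: it assumes every transcript chunk already has an embedding (in the extension these would come from Transformers.js; here they are plain number arrays), and the names `EmbeddedChunk`, `cosineSimilarity`, and `topKChunks` are hypothetical.

```typescript
// Hypothetical shape: a transcript chunk paired with its embedding vector.
interface EmbeddedChunk {
  text: string;
  embedding: number[];
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat its similarity as 0.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k chunks most similar to the query embedding, best first.
function topKChunks(
  query: number[],
  chunks: EmbeddedChunk[],
  k: number
): EmbeddedChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}
```

Only the text of the selected chunks would then be placed in the prompt, which keeps the context small enough for an on-device model.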

⚠️ Prototype Notice: The quiz system is still in an early prototype stage — limited to True or False questions. But it lays the groundwork for richer quiz formats like multiple choice, fill-in-the-blank, and spaced repetition.

🛠️ How I Built It

Tech Stack

Plasmo (Parcel) — Chrome extension framework
React — UI and state management
Gemini Nano — On-device AI inference via Chrome’s Prompt API
Transformers.js v3 — WebGPU-accelerated embeddings for RAG
DOM manipulation — Direct transcript extraction from YouTube’s native UI

Architecture

The extension interacts directly with YouTube’s DOM, automatically clicking the transcript button and parsing its text — no APIs or proxies required. The transcript is then chunked intelligently for RAG processing, and Gemini Nano responses are streamed directly into the UI for a smooth experience.
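The write-up does not spell out the chunking strategy, so the following is only a sketch of one common approach: grouping timestamped transcript lines into chunks under a character budget, with a small overlap between neighbours. The types `TranscriptSegment` and `TranscriptChunk`, the function `chunkTranscript`, and the default sizes are all assumptions made for illustration.

```typescript
// Hypothetical shape of one transcript line scraped from the page.
interface TranscriptSegment {
  startSeconds: number;
  text: string;
}

interface TranscriptChunk {
  startSeconds: number; // start time of the first segment in the chunk
  text: string;
}

// Group consecutive segments into chunks that aim to stay within `maxChars`.
// Every chunk takes at least one new segment, so a chunk can exceed the
// budget when a single segment (plus carried-over overlap) is too long.
// The last `overlapSegments` segments of a chunk are repeated at the start
// of the next one so text near a boundary stays retrievable.
function chunkTranscript(
  segments: TranscriptSegment[],
  maxChars = 500,
  overlapSegments = 1
): TranscriptChunk[] {
  const chunks: TranscriptChunk[] = [];
  let current: TranscriptSegment[] = [];
  let fresh = 0; // segments in `current` that are not carried-over overlap

  const joinedLength = (segs: TranscriptSegment[]): number =>
    segs.map((s) => s.text).join(" ").length;

  const flush = (): void => {
    chunks.push({
      startSeconds: current[0].startSeconds,
      text: current.map((s) => s.text).join(" "),
    });
  };

  for (const segment of segments) {
    if (fresh > 0 && joinedLength([...current, segment]) > maxChars) {
      flush();
      // slice(-0) would return the whole array, hence the explicit check.
      current = overlapSegments > 0 ? current.slice(-overlapSegments) : [];
      fresh = 0;
    }
    current.push(segment);
    fresh++;
  }
  if (fresh > 0) flush();
  return chunks;
}
```

Keeping `startSeconds` on each chunk would let an answer link back to the moment in the video it came from.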

The quiz generator uses the same transcript context to create simple, focused True/False questions that reinforce learning — with plans to evolve into richer formats soon.
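Because a small on-device model can return malformed output, a quiz generator like this usually needs a validation step before rendering. The sketch below assumes the model is asked to reply with a JSON array of `{ statement, answer }` objects. That format and the names `TrueFalseQuestion`, `isTrueFalseQuestion`, and `parseQuiz` are hypothetical, not taken from the extension.

```typescript
// Hypothetical quiz item format; the real extension may use a different one.
interface TrueFalseQuestion {
  statement: string;
  answer: boolean;
}

// Type guard: checks that an unknown value has the expected shape.
function isTrueFalseQuestion(value: unknown): value is TrueFalseQuestion {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return (
    typeof candidate.statement === "string" &&
    candidate.statement.trim().length > 0 &&
    typeof candidate.answer === "boolean"
  );
}

// Parse raw model output, keeping only well-formed questions.
// Returns an empty array when the output is not a valid JSON array.
function parseQuiz(raw: string): TrueFalseQuestion[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return [];
  }
  if (!Array.isArray(parsed)) return [];
  return parsed.filter(isTrueFalseQuestion);
}
```

Dropping invalid items instead of rejecting the whole response means one bad question does not cost the user the rest of the quiz.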

🚧 Challenges I Ran Into

1. Getting Transformers.js v3 to work with Plasmo (Parcel)

The library relied on Node.js APIs unavailable in the browser. I had to write scripts to build configs and tweak module resolution to make it fully compatible inside a Chrome extension.

2. Implementing a streaming UI

I wanted a real-time token streaming effect like the one in Vercel’s AI SDK. So far I’ve achieved object-level streaming (partial responses are rendered as each one completes), but true token-level streaming is still in progress. Even so, it’s already a huge improvement over waiting for the full response before rendering!
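Object-level streaming of this kind can be approximated by rescanning the growing response text for JSON objects that have already closed. The sketch below is one possible approach under stated assumptions, not the extension's actual code: it assumes the model emits a JSON array of objects, and `extractCompleteObjects` and `parseNewObjects` are hypothetical helpers.

```typescript
// Scan a (possibly incomplete) JSON array string and return the text of
// every top-level object inside it that has already been fully received.
function extractCompleteObjects(buffer: string): string[] {
  const objects: string[] = [];
  let depth = 0; // current nesting depth of { }
  let start = -1; // index where the current top-level object began
  let inString = false;
  let escaped = false;

  for (let i = 0; i < buffer.length; i++) {
    const ch = buffer[i];
    if (inString) {
      // Inside a string, braces are plain text; only watch for the end quote.
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') {
      inString = true;
    } else if (ch === "{") {
      if (depth === 0) start = i;
      depth++;
    } else if (ch === "}" && depth > 0) {
      depth--;
      if (depth === 0) objects.push(buffer.slice(start, i + 1));
    }
  }
  return objects;
}

// Parse only the objects that completed since the last call.
function parseNewObjects<T>(buffer: string, alreadyParsed: number): T[] {
  return extractCompleteObjects(buffer)
    .slice(alreadyParsed)
    .map((json) => JSON.parse(json) as T);
}
```

Each time a new piece of text arrives, it would be appended to the buffer and `parseNewObjects` called with the number of items already rendered, so every finished question can appear without waiting for the closing bracket. Rescanning the whole buffer is quadratic in the worst case, which is acceptable for short quiz responses but would need an incremental scanner for long outputs.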

🏆 Accomplishments I’m Proud Of

✅ Fully local AI — Gemini Nano, RAG, and embeddings all run entirely on-device.
✅ Built in 12 days — Intense but highly productive sprint.
✅ Live quiz generation — Questions appear progressively, improving UX.
✅ Fast embeddings (using WebGPU) — Smooth performance even on longer videos.

📚 What I Learned

I learned how to design and optimize fully local AI workflows, from on-device inference to RAG and streaming inside the browser. I also gained deeper insight into Transformers.js v3 and React streaming UIs for dynamic AI interactions.

🚀 What’s Next for NanoTutor

The current RAG implementation works but is still naive, and the quiz system is just a prototype — there’s lots of room to grow.

Near-Term Goals:

📊 Better quizzes — Add multiple choice, fill-in-the-blank, and spaced repetition. Improve UX and visual design.
🧠 Smarter RAG — Experiment with Graph RAG and Agentic RAG for deeper understanding.
⚡ Token-level streaming — True real-time text generation for smoother UX.

Future Ideas:

🖼️ Visual context — Use video preview frames for visual understanding.
🎙️ Voice interaction — Hands-free AI tutoring via speech-to-text.
⏱️ Automatic chapters — AI-generated timestamps and navigation.

Hybrid AI Mode:

🔄 Client + Cloud options — Choose between fully local AI or cloud models when needed.
🏆 Social learning — Optional online features like leaderboards and quiz challenges.

Built With

indexdb
mememo
plasmo
react
transformers
typescript

Updates

kevin-weitgenant weitgenant started this project — Nov 01, 2025 02:06 AM EDT
