Inspiration

I was in a YouTube rabbit hole trying to study when, a few videos in, I realized I kept getting distracted by ads and the suggested videos on the side. YouTube is the world's largest learning library, and for many people like me it's the best university to learn from, but it's built for consumption, not comprehension. You watch, you get recommended another video, you add it to the queue, you forget. There's no active layer: no way to take structured notes, get tested on what you just watched, or ask questions about the content without leaving the tab and breaking your focus entirely. I built Bloc because I needed it myself. I wanted something that would sit right inside the video-watching experience and make it feel less like entertainment and more like a real study session.

What it does

Bloc transforms any YouTube video or playlist into a focused, interactive learning session. You paste the link and get a player paired with an intelligent sidebar that includes:

  • Contextual AI Chat: ask questions about the video at any timestamp, powered by Gemini Flash
  • Auto-generated Topics: the AI analyzes the transcript to identify the key chapters and learning themes in the video
  • Study Notes: a rich text editor with Markdown and LaTeX support so you can write math and structured notes without leaving the page
  • Sanity/Concept Checks: random AI-generated concept checks that pop up mid-video to confirm you're still focused and retaining the material
  • End-of-video Quiz: a 5-question multiple-choice assessment generated from the full transcript to validate understanding
  • Session Management: save, name, and organize different learning sessions across videos
  • Multi-language support: chat, topics, and quizzes work in 10 languages
  • Library: bookmark any video and return to it later, keeping your learning organized in one place
  • Playlist Sharing: share your playlists publicly so other users can clone and follow the videos within them
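
As an illustration of the end-of-video quiz, here is a minimal sketch of how a model-generated quiz could be validated before it is shown. The `QuizQuestion` shape, its field names, and the `parseQuiz` helper are assumptions made for this example, not Bloc's actual schema.

```typescript
// Hypothetical shape for one multiple-choice question (field names are assumptions).
interface QuizQuestion {
  question: string;
  options: string[];
  correctIndex: number;
}

// The end-of-video quiz has five questions.
const QUESTION_COUNT = 5;

// Type guard: checks that an arbitrary parsed value is a well-formed question.
function isQuizQuestion(value: unknown): value is QuizQuestion {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  const question = record.question;
  const options = record.options;
  const correctIndex = record.correctIndex;
  return (
    typeof question === "string" &&
    question.trim().length > 0 &&
    Array.isArray(options) &&
    options.length >= 2 &&
    options.every((o) => typeof o === "string") &&
    typeof correctIndex === "number" &&
    Number.isInteger(correctIndex) &&
    correctIndex >= 0 &&
    correctIndex < options.length
  );
}

// Parses raw model output and rejects anything that is not a complete quiz.
function parseQuiz(raw: string): QuizQuestion[] {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    throw new Error("Quiz response is not valid JSON");
  }
  if (!Array.isArray(data) || data.length !== QUESTION_COUNT) {
    throw new Error(`Expected exactly ${QUESTION_COUNT} questions`);
  }
  const questions: QuizQuestion[] = [];
  for (const item of data) {
    if (!isQuizQuestion(item)) {
      throw new Error("Malformed quiz question");
    }
    questions.push(item);
  }
  return questions;
}
```

Rejecting a malformed response outright, rather than repairing it, keeps the failure visible so the caller can retry the generation.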

How we built it

Bloc was built in layers, with the goal of making requests less time-consuming.

Frontend: React 18 + Vite + TypeScript, styled with Tailwind CSS. Routing via React Router 6, auth state via Context API. The notes editor is built on Lexical with custom extensions for rich text and LaTeX math rendering. The UI is dark, minimal and also supports light mode.

Backend: Node.js + Express + TypeScript, deployed on Vercel. It serves as the bridge between the frontend, Supabase, and the Gemini AI engine.

  • Caching: The server implements a language-aware caching layer: transcripts, video metadata, and topics are all cached in Supabase keyed by language, so Gemini is never hit twice for the same content in the same language.
  • Transcript fetching: YouTube transcripts are fetched server-side via Supadata API to avoid datacenter IP blocks that YouTube enforces on platforms like Vercel.
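
The language-aware caching described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions: `CacheStore`, `MemoryStore`, `cacheKey`, and `getOrCompute` are hypothetical names, and the in-memory map only stands in for the Supabase tables so the example runs without a database.

```typescript
// The three kinds of content the write-up says are cached.
type CacheKind = "transcript" | "metadata" | "topics";

// Minimal storage interface (an assumption); in Bloc this role is played by Supabase.
interface CacheStore {
  get(key: string): Promise<string | undefined>;
  set(key: string, value: string): Promise<void>;
}

// In-memory stand-in so the sketch is runnable on its own.
class MemoryStore implements CacheStore {
  private rows = new Map<string, string>();

  async get(key: string): Promise<string | undefined> {
    return this.rows.get(key);
  }

  async set(key: string, value: string): Promise<void> {
    this.rows.set(key, value);
  }
}

// One entry per (kind, video, language); the language code is normalised to lower case.
function cacheKey(kind: CacheKind, videoId: string, lang: string): string {
  return `${kind}:${videoId}:${lang.toLowerCase()}`;
}

// Returns the cached value if present; otherwise runs `compute` once and stores the result.
async function getOrCompute(
  store: CacheStore,
  kind: CacheKind,
  videoId: string,
  lang: string,
  compute: () => Promise<string>
): Promise<string> {
  const key = cacheKey(kind, videoId, lang);
  const hit = await store.get(key);
  if (hit !== undefined) return hit;
  const fresh = await compute(); // e.g. a transcript fetch or a Gemini call
  await store.set(key, fresh);
  return fresh;
}
```

Two requests that miss at the same moment would both call `compute` in this sketch; deduplicating in-flight work would need an extra layer.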

AI: Google Gemini Flash handles all of the intelligence: contextual chat with full conversation history and timestamp context, topic extraction from transcripts, sanity/concept check question generation, and comprehensive quiz generation.
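
To make "timestamp context" concrete, here is a minimal sketch of how a chat prompt could be assembled from the transcript lines around the current playback position plus the conversation so far. The `TranscriptSegment` and `ChatTurn` shapes, the 90-second look-back window, and the prompt wording are all assumptions for illustration; the actual request sent to Gemini is not shown in this write-up.

```typescript
// Hypothetical transcript line: start time in seconds plus the spoken text.
interface TranscriptSegment {
  start: number;
  text: string;
}

// Hypothetical chat history entry.
interface ChatTurn {
  role: "user" | "assistant";
  text: string;
}

// Formats seconds as m:ss, or h:mm:ss for videos longer than an hour.
function formatTimestamp(totalSeconds: number): string {
  const s = Math.max(0, Math.floor(totalSeconds));
  const hours = Math.floor(s / 3600);
  const minutes = Math.floor((s % 3600) / 60);
  const seconds = s % 60;
  const pad = (n: number): string => (n < 10 ? `0${n}` : `${n}`);
  return hours > 0
    ? `${hours}:${pad(minutes)}:${pad(seconds)}`
    : `${minutes}:${pad(seconds)}`;
}

// Keeps the segments spoken in the `windowSec` seconds leading up to the current position.
function segmentsNear(
  segments: TranscriptSegment[],
  currentSec: number,
  windowSec: number
): TranscriptSegment[] {
  return segments.filter(
    (seg) => seg.start >= currentSec - windowSec && seg.start <= currentSec
  );
}

// Builds one prompt string from the playback position, nearby transcript, and history.
function buildChatPrompt(
  segments: TranscriptSegment[],
  history: ChatTurn[],
  currentSec: number,
  question: string,
  windowSec: number = 90
): string {
  const nearby = segmentsNear(segments, currentSec, windowSec)
    .map((seg) => `[${formatTimestamp(seg.start)}] ${seg.text}`)
    .join("\n");
  const past = history.map((turn) => `${turn.role}: ${turn.text}`).join("\n");
  return [
    `The viewer is at ${formatTimestamp(currentSec)} in the video.`,
    `Transcript near this point:\n${nearby || "(no transcript lines in range)"}`,
    `Conversation so far:\n${past || "(none)"}`,
    `Question: ${question}`,
  ].join("\n\n");
}
```

Limiting the transcript to a window around the playhead keeps the prompt short; sending the full transcript instead is an equally valid choice when the model's context allows it.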

Database & Auth: Supabase (PostgreSQL) for user sessions, caching, and profiles. Auth uses Google OAuth.

Responsive UI: The interface takes YouTube's UI as inspiration, with the player pinned at the top and buttons below it for switching views, alongside a keyboard-aware layout that collapses the player when you're typing.
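
One way a keyboard-aware layout like this could decide when to collapse the player is sketched below. It assumes the on-screen keyboard shrinks the visual viewport relative to the layout viewport; the 150-pixel threshold and the names `ViewportSnapshot`, `isKeyboardLikelyOpen`, and `playerMode` are hypothetical, and real browsers differ in how they report these heights.

```typescript
// Heights in CSS pixels. In a browser these could be read from
// window.innerHeight (layout) and window.visualViewport.height (visual),
// and re-read whenever the visual viewport fires a "resize" event.
interface ViewportSnapshot {
  layoutHeight: number;
  visualHeight: number;
}

type PlayerMode = "expanded" | "collapsed";

// Heuristic: a large gap between layout and visual height suggests an open keyboard.
function isKeyboardLikelyOpen(
  viewport: ViewportSnapshot,
  minShrinkPx: number = 150
): boolean {
  return viewport.layoutHeight - viewport.visualHeight >= minShrinkPx;
}

// Collapse the player only while a text input is focused and the keyboard appears open.
function playerMode(viewport: ViewportSnapshot, inputFocused: boolean): PlayerMode {
  return inputFocused && isKeyboardLikelyOpen(viewport) ? "collapsed" : "expanded";
}
```

Keeping the decision in a pure function means it can be unit-tested without a browser, with the event wiring left to the component that owns the player.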

Challenges we ran into

The biggest problem was YouTube blocking transcript requests from Vercel's datacenter IPs. The entire intelligence layer depends on the transcript; without it, there's no AI context, no topics, and no quiz. After trying multiple approaches, the Supadata API turned out to be the cleanest solution that worked reliably in a serverless environment.

Because the UI was designed desktop-first, making it responsive was a lot more demanding. I had to figure out how to fit the video player, a full sidebar with multiple tabs, and a rich text editor into a mobile viewport without it feeling cramped.

Accomplishments that we're proud of

The caching structure and multi-language handling are something I'm definitely proud of. Transcripts are fetched once and cached per language, and the AI is never called twice for the same content in the same language.

The end-to-end coherence also matters here. The AI isn't just a wrapper: it shares the same session context as the notes, the queue, and the player timestamp. When you ask the AI something, it knows where you are in the video. That integration took a while to get right.

I've been using Bloc in my own time for everything from study videos to video essays. I've found that I retain more from the videos and don't drift as often as I do on YouTube, so that's definitely a win.

What we learned

Building the right flow and infrastructure around AI is harder than getting the AI itself to work. Getting transcripts reliably, caching efficiently, and keeping it all functioning across the feature set was the real engineering work.

What's next for Bloc

Exports: One-click "Export to Notion" and PDF generation for study notes.

Voice Interactivity: Allowing users to "talk" to the AI tutor during hands-on learning (like coding or cooking).
