PodLearn

Inspiration

We've always struggled with dense textbook chapters. You read 20 pages, close the book, and realize you retained almost nothing. One night we thought: what if we could just paste our notes and have an AI turn them into a podcast we could listen to while walking to class?

That idea became PodLearn, a platform where you paste any study material and it transforms into a complete learning experience: a dual-voice podcast, a visual summary, an interactive quiz, and an AI tutor you can chat with. Not just one output. The whole loop.

What It Does

PodLearn takes raw text (lecture notes, textbook chapters, research papers) and generates:

A two-voice AI podcast: a host asks questions, an expert explains. Real voices via ElevenLabs, stitched with crossfade transitions using the Web Audio API.
A synced visual lesson: an AI-generated diagram appears alongside a guided lesson card that auto-syncs to whichever concept is being discussed in the podcast.
An interactive quiz: 3–5 multiple-choice questions with instant feedback and explanations.
An AI tutor chat: ask follow-up questions and the AI responds as a subject-matter expert grounded in the lesson topic.

Everything is connected. Click a transcript line and the audio jumps there. The lesson card tracks what the speakers are saying. The quiz tests what you just heard. The chat lets you go deeper.

How We Built It

Stack: Next.js 16 (App Router), Tailwind CSS, Vercel (deployment)

AI Layer

Google Gemini 2.5 Flash: generates the podcast script, summary, quiz, image prompt, and concept annotations in a single streaming call. We designed a structured Zod schema with conceptMapping so each script turn is deterministically mapped to key concepts, with no fuzzy matching needed.
ElevenLabs: text-to-speech with two distinct voices (host + expert). We batch requests in groups of 4 to stay under the concurrent-request limit, then stitch the audio segments client-side using the Web Audio API with 50 ms crossfades.
Runware: generates educational diagrams from AI-crafted prompts. We learned the hard way that AI image generators hallucinate text, so the prompt explicitly says "no text, no labels" and the frontend overlays labels instead.
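
To make the crossfade stitching concrete, here is a minimal sketch that works on raw sample arrays. It assumes each segment has already been decoded to mono samples (for example with AudioBuffer.getChannelData) and that a linear fade is acceptable. The function name stitchWithCrossfade and the linear curve are assumptions for illustration, not PodLearn's actual code.

```typescript
// Hypothetical helper: joins decoded mono segments, overlapping each boundary
// by `fadeSeconds` and blending the overlap with a linear fade.
function stitchWithCrossfade(
  segments: Float32Array[],
  sampleRate: number,
  fadeSeconds = 0.05, // the 50 ms crossfade mentioned above
): Float32Array {
  if (segments.length === 0) return new Float32Array(0);
  const requestedFade = Math.round(sampleRate * fadeSeconds);

  let out = segments[0].slice();
  for (let s = 1; s < segments.length; s++) {
    const next = segments[s];
    // Never overlap by more samples than either side actually contains.
    const fade = Math.min(requestedFade, out.length, next.length);
    const merged = new Float32Array(out.length + next.length - fade);
    const start = out.length - fade;

    // Copy the part of the previous audio that is not overlapped.
    merged.set(out.subarray(0, start), 0);
    // Blend the overlap: previous audio fades out while the next fades in.
    for (let i = 0; i < fade; i++) {
      const t = (i + 1) / (fade + 1);
      merged[start + i] = out[start + i] * (1 - t) + next[i] * t;
    }
    // Copy the rest of the next segment after the overlap.
    merged.set(next.subarray(fade), out.length);
    out = merged;
  }
  return out;
}
```

The stitched samples could then be copied into a single AudioBuffer for playback. Each boundary shortens the total length by the overlap, which matters if transcript timestamps are derived from segment durations.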

Challenges We Faced

The Concept Sync Problem

The hardest part wasn't generating content; it was synchronizing it. When the podcast says "prophase," the lesson card needs to show Prophase. Our first approach used fuzzy text matching. Then annotation-based matching. Then look-ahead matching. After six iterations, we realized the solution was upstream: have Gemini map each script turn to concept indices at generation time. The client just reads the mapping. Zero ambiguity.
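
A rough sketch of what "the client just reads the mapping" can look like. Only the name conceptMapping comes from our schema as described here; the other field names, the Lesson shape, and the conceptsForTurn helper are assumptions made for illustration.

```typescript
// Hypothetical shapes: every name except `conceptMapping` is an assumption.
interface ScriptTurn {
  speaker: "host" | "expert";
  text: string;
}

interface Lesson {
  concepts: string[];
  script: ScriptTurn[];
  // conceptMapping[i] lists the indices into `concepts` covered by script[i].
  conceptMapping: number[][];
}

// Returns the concepts to highlight while a given script turn is playing.
// Unknown turns and out-of-range indices are ignored instead of throwing.
function conceptsForTurn(lesson: Lesson, turnIndex: number): string[] {
  const indices = lesson.conceptMapping[turnIndex] ?? [];
  return indices
    .filter((i) => Number.isInteger(i) && i >= 0 && i < lesson.concepts.length)
    .map((i) => lesson.concepts[i]);
}
```

Because the lookup is a plain array index, the lesson card needs no text matching at playback time; it only needs to know which turn is currently playing.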

ElevenLabs Concurrency Limits

Our first version fired all 8 audio requests in parallel and got rate-limited. We had to learn about batching and sequential-then-parallel strategies to stay under the 5-concurrent-request cap.
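
One simple way to do this kind of batching is sketched below. The helper name runInBatches is hypothetical, and each task stands in for a text-to-speech request; the actual ElevenLabs call is deliberately left out.

```typescript
// Hypothetical helper: runs async tasks in fixed-size batches so that at most
// `batchSize` of them are in flight at once. Results keep the input order.
async function runInBatches<T>(
  tasks: Array<() => Promise<T>>,
  batchSize = 4, // groups of 4 stay under a cap of 5 concurrent requests
): Promise<T[]> {
  if (batchSize < 1) throw new Error("batchSize must be at least 1");
  const results: T[] = [];
  for (let i = 0; i < tasks.length; i += batchSize) {
    const batch = tasks.slice(i, i + batchSize);
    // Batches run one after another; tasks inside a batch run in parallel.
    const batchResults = await Promise.all(batch.map((task) => task()));
    results.push(...batchResults);
  }
  return results;
}
```

With 8 script turns and a batch size of 4, this makes two rounds of four parallel requests. The trade-off is that each batch waits for its slowest request; a sliding-window pool would keep the connection busier at the cost of more code.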

iOS Audio Silence

A friend tested on an iPhone and heard nothing. It turns out iOS Safari keeps the AudioContext suspended until ctx.resume() is called from a user gesture. A one-line fix, but it took debugging across devices to find.
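
The fix can be sketched as a small guard called from the play button's click handler. The helper name ensureAudioRunning and the ResumableContext interface are assumptions for illustration; a real AudioContext satisfies that interface, since it exposes both state and resume().

```typescript
// Minimal structural type so the sketch also compiles outside a browser.
// A real AudioContext has both of these members.
interface ResumableContext {
  state: string;
  resume(): Promise<void>;
}

// Hypothetical helper: call it from a user-gesture handler (tap or click)
// before starting playback. Resolves to true once the context is running.
async function ensureAudioRunning(ctx: ResumableContext): Promise<boolean> {
  if (ctx.state === "suspended") {
    await ctx.resume();
  }
  return ctx.state === "running";
}
```

Calling it on every play press is harmless, because resume() is only invoked while the context is still suspended.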

AI Image Text Hallucination

Runware kept generating diagrams with misspelled labels baked into the image ("H Heavy Helnd"). The fix: tell the AI "absolutely no text in the image" and overlay clean labels from the frontend.

What We Learned

Structured output schemas are everything. When you control the shape of AI output with Zod schemas, you eliminate an entire class of client-side parsing bugs.
The best AI features feel invisible. Users shouldn't know there are 4 API calls happening — they should just press play and the whole UI comes alive.
Design matters at hackathons. Judges see 50 projects. The one that looks like a product gets remembered.

What's Next

MP3 download and shareable podcast links
Spaced repetition: resurface quiz questions you got wrong
Multi-modal input: upload PDFs, images of handwritten notes
Collaborative study rooms where friends can listen to the same podcast together