StudyO: Your tabs, turned into a study session.
Inspiration
The average student keeps 30+ tabs open while studying. Notes get lost. Lectures don't get reviewed. PDFs sit unread. YouTube videos get paused and never resumed.
We've lived this. Every study session starts with good intentions (a YouTube lecture here, a PDF there, a few Wikipedia tabs) and ends with 47 open tabs, a half-filled Notion page, and the vague feeling that nothing stuck.
The problem isn't access to information. It's the gap between consuming content and actually learning it.
Existing tools like Notion, Anki, and Quizlet all require you to do the heavy lifting: manually creating flashcards, typing summaries, building quizzes. That friction kills consistency. And none of them know how you learn best.
We built StudyO to be the missing layer between your browser and your brain: an AI-native learning OS that ingests everything you already read, watch, and record, and turns it into the exact study artifact your brain wants, in the style it learns best.
What It Does
StudyO is a personalized AI learning workspace built around three core loops:
Loop 1: Ingest Anything
| Source | How |
|---|---|
| PDF | Upload → auto-thumbnail → full text extraction |
| YouTube | Paste URL → transcript + metadata pulled automatically |
| Web URL | Paste link → Cheerio scrapes structured content |
| Audio lecture | Upload → waveform visual + HLS playback |
| Open tabs | Chrome extension bulk-ingests your active study tabs in one click |
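As a rough illustration of the routing step behind the table above, the sketch below classifies a pasted link into one of the source kinds. The `SourceKind` names, the `classifySource` helper, and the list of audio extensions are all assumptions made for this example, not StudyO's actual ingest code.

```typescript
// Hypothetical source kinds, mirroring the ingest table.
type SourceKind = "pdf" | "youtube" | "web" | "audio";

interface ClassifiedSource {
  kind: SourceKind;
  // Present only for YouTube links.
  videoId?: string;
}

// Assumed set of audio extensions; adjust to whatever uploads are accepted.
const AUDIO_EXTENSIONS = [".mp3", ".wav", ".m4a", ".ogg"];

function classifySource(input: string): ClassifiedSource {
  const url = new URL(input); // throws on malformed input
  const host = url.hostname.replace(/^www\./, "");
  const path = url.pathname.toLowerCase();

  // Short links carry the video id in the path.
  if (host === "youtu.be") {
    return { kind: "youtube", videoId: url.pathname.slice(1) };
  }
  // Watch pages carry the video id in the "v" query parameter.
  if (host === "youtube.com" || host === "m.youtube.com") {
    const id = url.searchParams.get("v");
    if (id) return { kind: "youtube", videoId: id };
  }
  if (path.endsWith(".pdf")) return { kind: "pdf" };
  if (AUDIO_EXTENSIONS.some((ext) => path.endsWith(ext))) return { kind: "audio" };

  // Everything else falls through to the generic web scraper.
  return { kind: "web" };
}
```

Each kind would then be handed to its own extractor (transcript fetch, PDF text extraction, HTML scraping), which keeps the per-source logic isolated.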
Loop 2: Generate What You Need
From any combination of sources, StudyO generates:
- Flashcards: JSON-mode, schema-validated, ready to review
- Quizzes: multiple-choice with explanations, deep-thinking mode
- Concept Maps: D3-style node/edge graphs of ideas and relationships
- Summaries: TL;DR + key points + study questions
- Canvas: interactive HTML lessons, style-aware
- AI Explainer Videos: narrated, captioned, multi-format, delivered in ~60 seconds
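To make "JSON-mode, schema-validated" concrete, here is a minimal sketch of validating a model response before it reaches the review UI. The `Flashcard` shape (`front`/`back`) and the top-level `cards` key are assumptions for illustration; the real schema may differ.

```typescript
// Hypothetical flashcard shape; the actual schema is not shown in this writeup.
interface Flashcard {
  front: string;
  back: string;
}

// Type guard: accepts only objects with non-empty string front/back fields.
function isFlashcard(value: unknown): value is Flashcard {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  const front = record["front"];
  const back = record["back"];
  return (
    typeof front === "string" && front.trim() !== "" &&
    typeof back === "string" && back.trim() !== ""
  );
}

// Parses raw model output and rejects anything that does not match the shape.
function parseFlashcards(raw: string): Flashcard[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    throw new Error("Model output is not valid JSON");
  }
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("Expected a JSON object");
  }
  const cards = (parsed as Record<string, unknown>)["cards"];
  if (!Array.isArray(cards)) {
    throw new Error('Expected a "cards" array');
  }
  const badIndex = cards.findIndex((card) => !isFlashcard(card));
  if (badIndex !== -1) {
    throw new Error(`Card at index ${badIndex} does not match the schema`);
  }
  return cards as Flashcard[];
}
```

Failing loudly on a malformed card makes it easy to retry the generation instead of showing a half-broken deck.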
Loop 3: Study Your Way
A 60-second VARK onboarding quiz (Visual / Auditory / Reading / Kinesthetic) runs once at sign-up and adapts every AI surface in the app:
| Style | Chat | Generated Content |
|---|---|---|
| Visual | ASCII diagrams, suggests Concept Map | Charts, mind-maps, color-coded sections |
| Auditory | Conversational lecturer voice | Narration player with speed slider |
| Reading/Writing | Structured prose, headings | Embedded notes, definition lists |
| Kinesthetic | "Try this" framing | Drag interactions, inline quizzes |
Plus a Vapi voice tutor grounded in your actual sources: it quizzes you, explains concepts, and runs spaced repetition sessions, all by voice.
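The onboarding quiz ultimately has to reduce a handful of answers to one style. A minimal sketch of that tally is below, assuming each answer is already tagged with the style it signals; the `VarkStyle` names, the tie-break order, and the default for an empty quiz are illustrative choices, not the app's actual scoring rules.

```typescript
type VarkStyle = "visual" | "auditory" | "reading" | "kinesthetic";

// Tie-break order is an assumption: earlier styles win ties.
const STYLE_ORDER: VarkStyle[] = ["visual", "auditory", "reading", "kinesthetic"];

function dominantStyle(answers: VarkStyle[]): VarkStyle {
  const counts: Record<VarkStyle, number> = {
    visual: 0,
    auditory: 0,
    reading: 0,
    kinesthetic: 0,
  };
  for (const answer of answers) {
    counts[answer] += 1;
  }
  // Start from the first style so an empty quiz still yields a defined result.
  let best: VarkStyle = "visual";
  for (const style of STYLE_ORDER) {
    if (counts[style] > counts[best]) best = style;
  }
  return best;
}
```

Storing the result once on the user profile lets every later AI call read it without re-running the quiz.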
How We Built It
Architecture Overview
Browser / Chrome Extension
        │
        ▼
Next.js 16 App Router ─── Clerk Auth (v7)
        │
   ┌────┴────┐
   │   API   │──── Source Ingest (PDF, YouTube, URL, Audio)
   │  Routes │──── AI Generators (Flashcards, Quiz, Summary, Concept Map)
   │         │──── Streaming Chat (SSE + VARK injection)
   └────┬────┘
        │
   ┌────┴──────────────┐
   │ MongoDB Atlas     │ (JSON-schema validated, compound indexes)
   │ Cloudinary v2     │ (upload, transform, stream, stitch)
   │ Google Gemma 4    │ (primary: 26B, thinking: "high")
   │ ElevenLabs        │ (TTS narration)
   │ Vapi + Deepgram   │ (voice agent)
   └───────────────────┘
The AI Video Pipeline (our crown jewel)
No FFmpeg server. No separate media infra. Pure Cloudinary transformation chain:
Gemini script → Scene images (Gemini Image / DALL·E) → ElevenLabs narration
      │                     │                                  │
      └──────── All uploaded to Cloudinary with tagged public_ids ┘
                            │
      fl_layer_apply chain stitches → master MP4
                            │
      ONE explicit() call → 6 eager renditions in a single API call:
        HLS m3u8 (adaptive bitrate)
        9:16 portrait (Shorts / Reels / TikTok)
        1:1 square (LinkedIn / Twitter)
        3-second animated GIF preview
        q_auto:good + f_auto MP4
        Smart poster (g_auto saliency)
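To give a feel for what a stitching chain can look like as a delivery URL, the sketch below concatenates scene clips using Cloudinary's `fl_splice` flag on `l_video:` layers, each closed with `fl_layer_apply`. It assumes every scene has already been rendered to its own video asset; the public IDs, the `demo` cloud name, and the `buildSpliceUrl` helper are hypothetical, and the exact chain StudyO uses is not shown here, so treat this as one plausible shape and check it against the Cloudinary documentation.

```typescript
// Inside a layer reference, folder separators in a public ID are written as colons.
function layerId(publicId: string): string {
  return publicId.replace(/\//g, ":");
}

// Builds a URL that plays the first scene, then splices each later scene onto the end.
function buildSpliceUrl(cloudName: string, sceneIds: string[]): string {
  if (sceneIds.length === 0) {
    throw new Error("Need at least one scene");
  }
  const [base, ...rest] = sceneIds;
  // One "open layer / apply layer" pair per appended scene.
  const splices = rest.map(
    (id) => `fl_splice,l_video:${layerId(id)}/fl_layer_apply`
  );
  const chain = splices.length > 0 ? splices.join("/") + "/" : "";
  return `https://res.cloudinary.com/${cloudName}/video/upload/${chain}${base}.mp4`;
}
```

Because the whole edit is encoded in the URL, the stitched video is derived on Cloudinary's side and no FFmpeg process has to run on the app server.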
What Cloudinary replaced for us
| Without Cloudinary | With Cloudinary |
|---|---|
| AWS S3 + signed URLs | One SDK call |
| FFmpeg server | fl_layer_apply transformation chain |
| MUX / Bento for HLS | streaming_profile: "hd" |
| Whisper for captions | Built from narration text + timings |
| Sharp for PDF thumbnails | pg_1 URL parameter |
| Custom waveform generator | fl_waveform URL parameter |
| Cloudflare CDN | Built in with q_auto + f_auto |
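Several rows in the table above come down to a single URL parameter. The helpers below sketch how those delivery URLs might be assembled: `pg_1` for the first page of a PDF, `fl_waveform` for an audio waveform image, and `sp_hd` for an HLS manifest. The function names, cloud name, and public IDs are placeholders for illustration.

```typescript
const CLOUDINARY_BASE = "https://res.cloudinary.com";

// First page of an uploaded PDF, delivered as a JPEG thumbnail.
function pdfThumbnailUrl(cloudName: string, publicId: string): string {
  return `${CLOUDINARY_BASE}/${cloudName}/image/upload/pg_1/${publicId}.jpg`;
}

// Waveform image for an audio upload (audio is delivered under the video resource type).
function waveformUrl(cloudName: string, publicId: string): string {
  return `${CLOUDINARY_BASE}/${cloudName}/video/upload/fl_waveform/${publicId}.png`;
}

// Adaptive-bitrate HLS manifest using the predefined "hd" streaming profile.
function hlsUrl(cloudName: string, publicId: string): string {
  return `${CLOUDINARY_BASE}/${cloudName}/video/upload/sp_hd/${publicId}.m3u8`;
}
```

For example, `pdfThumbnailUrl("demo", "notes/chapter1")` yields a URL that can be dropped straight into an `<img>` tag, with no thumbnailing code on the server.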
Tech Stack
| Layer | Technologies |
|---|---|
| Frontend | Next.js 16 · TypeScript strict · Zustand · Tailwind v3 · Framer Motion |
| Auth & Data | Clerk v7 · MongoDB Atlas |
| AI | Gemma 4 26B (primary) · Gemini 2.5 Flash (fallback) · GPT-4o-mini (tertiary) |
| Media | Cloudinary v2 · ElevenLabs · Vapi + Deepgram |
| Ingestion | pdf-parse · youtube-transcript · cheerio |
| Extension | Chrome MV3 |
Challenges We Ran Into
1. The Video Pipeline Was Brutally Hard to Get Right
Stitching multi-scene videos entirely through Cloudinary's transformation layer, without any server-side FFmpeg, required a deep understanding of layer composition, timing, and eager transformation sequencing. Getting captions synced to narration timing, then encoding that into WebVTT on the fly, was a weekend of pain we don't wish on anyone.
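The caption step is easier to picture with code. Below is a minimal sketch of turning narration lines and their timings into a WebVTT document; the `NarrationCue` shape and helper names are assumptions, and the timings are taken as already known (for example, from the length of each narration clip).

```typescript
// Hypothetical cue shape: one narration line with start/end times in seconds.
interface NarrationCue {
  text: string;
  startSec: number;
  endSec: number;
}

// Formats seconds as a WebVTT timestamp: HH:MM:SS.mmm
function formatTimestamp(totalSeconds: number): string {
  const totalMs = Math.round(totalSeconds * 1000);
  const ms = totalMs % 1000;
  const s = Math.floor(totalMs / 1000) % 60;
  const m = Math.floor(totalMs / 60000) % 60;
  const h = Math.floor(totalMs / 3600000);
  const pad = (n: number, width: number): string => String(n).padStart(width, "0");
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}.${pad(ms, 3)}`;
}

// Builds a WebVTT file: the header, then one numbered cue block per narration line.
function toWebVtt(cues: NarrationCue[]): string {
  const blocks = cues.map(
    (cue, index) =>
      `${index + 1}\n${formatTimestamp(cue.startSec)} --> ${formatTimestamp(cue.endSec)}\n${cue.text}`
  );
  return ["WEBVTT", ...blocks].join("\n\n") + "\n";
}
```

Deriving captions from the script and its timings avoids a speech-to-text pass, since the narration text is known before the audio exists.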
2. VARK Injection Without Prompt Bloat
Adapting every AI call to a learning style sounds simple until you're streaming SSE responses and every extra token in the system prompt costs latency. We had to carefully balance the size of the LEARNER PROFILE addendum against response quality, and write style-specific prompt templates that changed output behavior meaningfully, not cosmetically.
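One way to keep the addendum small is to rank the style hints and stop adding them once a size budget is reached. The sketch below does that with a character budget as a crude stand-in for tokens; the hint wording, the `STYLE_HINTS` table, and the `learnerProfileAddendum` helper are illustrative assumptions rather than the prompts StudyO ships.

```typescript
type VarkStyle = "visual" | "auditory" | "reading" | "kinesthetic";

// Hypothetical hints per style, ordered from most to least important.
const STYLE_HINTS: Record<VarkStyle, string[]> = {
  visual: ["Use ASCII diagrams where they help.", "Suggest a Concept Map for dense topics."],
  auditory: ["Write in a conversational lecturer voice.", "Favor sentences that read well aloud."],
  reading: ["Use structured prose with headings.", "Include definition lists for key terms."],
  kinesthetic: ["Frame explanations as 'Try this' steps.", "Add short inline practice questions."],
};

// Builds the addendum, dropping lower-priority hints that would exceed maxChars.
function learnerProfileAddendum(style: VarkStyle, maxChars: number): string {
  const header = `LEARNER PROFILE: ${style}`;
  const lines = [header];
  let used = header.length;
  for (const hint of STYLE_HINTS[style]) {
    const cost = hint.length + 3; // newline plus "- " prefix
    if (used + cost > maxChars) break;
    lines.push(`- ${hint}`);
    used += cost;
  }
  return lines.join("\n");
}
```

The returned string would be appended to the system prompt of each AI call, so the budget directly caps the latency cost of personalization.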
3. Cross-Session Master Agent Context
Building a workspace-wide agent that has coherent context across dozens of sessions and hundreds of sources, without hallucinating or losing track, required careful MongoDB schema design and a tiered context injection strategy. Too little context and it's useless; too much and it hits token limits and slows to a crawl.
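A tiered strategy can be sketched as a greedy packer: the most important tier is admitted first, and lower tiers fill whatever token budget remains. The tier meanings, the `ContextItem` shape, and the four-characters-per-token estimate below are assumptions for illustration only.

```typescript
// Hypothetical tiers: 1 = current session, 2 = recent sessions, 3 = older workspace material.
interface ContextItem {
  tier: 1 | 2 | 3;
  text: string;
}

// Rough token estimate; a real tokenizer would replace this heuristic.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Picks items in tier order, skipping any item that would overflow the budget.
function packContext(items: ContextItem[], tokenBudget: number): ContextItem[] {
  const byTier = [...items].sort((a, b) => a.tier - b.tier);
  const picked: ContextItem[] = [];
  let used = 0;
  for (const item of byTier) {
    const cost = estimateTokens(item.text);
    if (used + cost > tokenBudget) continue; // a smaller item later may still fit
    picked.push(item);
    used += cost;
  }
  return picked;
}
```

Skipping oversized items instead of stopping at the first one lets short summaries from lower tiers still make it into the prompt.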
4. Chrome Extension CORS + Auth
The Clerk cookie-based auth doesn't travel cleanly into a Manifest V3 extension service worker. We had to build a custom /api/status polling endpoint and handle the auth handoff carefully so the extension could authenticate without requiring the user to re-login from the popup.
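The polling side of that handoff might look roughly like the loop below. The actual request to `/api/status` is injected as a `check` function so the loop stays independent of how the extension reaches the endpoint; the `AuthState` values, attempt limit, and delay are assumptions, not the extension's real settings.

```typescript
type AuthState = "signed-in" | "signed-out";

// A function that asks the app whether the user currently has a session.
// In the extension this would wrap a request to the /api/status endpoint.
type StatusCheck = () => Promise<AuthState>;

// Polls until the check reports a session or the attempts run out.
async function pollUntilSignedIn(
  check: StatusCheck,
  maxAttempts: number,
  delayMs: number
): Promise<boolean> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    if ((await check()) === "signed-in") return true;
    // Wait between attempts, but not after the final one.
    if (attempt < maxAttempts - 1) {
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return false;
}
```

The popup can start this loop when it opens and switch to the signed-in view as soon as it resolves to `true`, so the user never sees a second login form.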
5. Streaming + Schema Validation Together
Running `thinking: "high"` on Gemma 4 while simultaneously streaming and schema-validating JSON output meant we couldn't just parse at the end. We had to implement a streaming JSON parser that validated structure incrementally and surfaced partial results to the UI in real time.
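A simplified version of the idea: scan the stream character by character, track brace depth and string state, and emit each top-level object as soon as its closing brace arrives. The sketch assumes the model streams a JSON array of objects; the `StreamingObjectExtractor` class is illustrative and far less general than a full streaming parser.

```typescript
// Extracts complete objects from a streamed JSON array such as [{...},{...}].
class StreamingObjectExtractor {
  private buffer = "";
  private scanPos = 0;
  private depth = 0;
  private inString = false;
  private escaped = false;
  private objectStart = -1;

  // Feed the next chunk; returns every object completed by this chunk.
  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const completed: unknown[] = [];
    for (; this.scanPos < this.buffer.length; this.scanPos++) {
      const ch = this.buffer[this.scanPos];
      if (this.inString) {
        // Braces inside strings must not affect depth tracking.
        if (this.escaped) this.escaped = false;
        else if (ch === "\\") this.escaped = true;
        else if (ch === '"') this.inString = false;
        continue;
      }
      if (ch === '"') {
        this.inString = true;
      } else if (ch === "{") {
        if (this.depth === 0) this.objectStart = this.scanPos;
        this.depth++;
      } else if (ch === "}" && this.depth > 0) {
        this.depth--;
        if (this.depth === 0 && this.objectStart !== -1) {
          const json = this.buffer.slice(this.objectStart, this.scanPos + 1);
          completed.push(JSON.parse(json));
          this.objectStart = -1;
        }
      }
    }
    return completed;
  }
}
```

Each object returned by `push` can be schema-checked on its own and rendered immediately, so the UI fills in card by card while the rest of the response is still streaming.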
Accomplishments That We're Proud Of
- A fully working AI video pipeline: from text prompt to narrated, captioned, multi-format explainer video in ~60 seconds, with zero FFmpeg and zero dedicated media server
- True learning-style personalization: not a skin, but behavioral adaptation at the model prompt level that changes how the AI explains, structures, and formats content
- 7-in-1 media infrastructure: Cloudinary replaces S3, CDN, FFmpeg, MUX, Whisper, Sharp, and a waveform generator in a single SDK
- A voice tutor that knows your sources: a Vapi agent dynamically configured with the user's actual ingested content as system context
- Zero TypeScript errors, clean build: `npx tsc --noEmit` and `npx next build` pass clean across 32 files
What We Learned
- Cloudinary is vastly underused as an AI media orchestration layer. Most teams treat it as a CDN. We used it as a video stitching engine, a streaming pipeline, and an asset tagging system, all in one.
- Learning personalization needs to go deeper than the UI. Changing font size or colors for visual learners doesn't do anything. The adaptation has to live in the prompt layer and change how the model actually structures its output.
- AI video is the killer feature nobody's built well yet. Every student has sat through a lecture they didn't understand and wished someone would just explain it differently. On-demand, source-grounded explainer videos are genuinely compelling, and technically much harder than they look.
- Thinking modes are worth the latency cost for generative tasks. Gemma 4's `thinking: "high"` on concept maps and quizzes produced measurably more accurate, interconnected output. The extra seconds are worth it when the artifact has to be study-ready.
What's Next for StudyO
Near-term
- Spaced repetition scheduler: surface the right flashcards at the right time, Anki-style, built into the app
- Learning analytics dashboard: track retention, quiz performance, and time-per-concept across sessions
- Live Study Groups: real-time collaborative sessions where multiple users study the same source set with shared AI context
Medium-term
- LMS integrations: Brightspace, Canvas, Blackboard source connectors so course materials auto-ingest
- Multilingual support: generate all artifacts in the user's native language, with narration
- Mobile app: iOS/Android with offline flashcard review and voice tutor on the go
Long-term
- StudyO for educators: let professors upload course content and provision personalized study environments for entire cohorts
- Institutional licensing: university-wide deployments with learning analytics for faculty
**Built for students who keep 30 tabs open.** *StudyO turns your chaos into a curriculum.*
Built With
- cloudinary
- elevenlabs
- mongodb
