Inspiration
I wanted a study tool that meets students where they actually are: sitting on a pile of lecture recordings the night before an exam. Most tools either stop at transcripts or hide behind paywalls and heavyweight setups. So I asked: what if one upload turned a lecture into everything you need to study, including clean captions, summaries, flashcards, quizzes, and even a calendar plan? It runs serverless on Cloud Run, so it is easy to deploy and cheap to run.
What I built
LectureLens converts any MP3/MP4 into a complete study kit:
Transcription & captions: Offline-friendly Faster-Whisper (tiny) + ffmpeg → transcript + SRT/VTT.
Insights: Summaries, key points, flashcards (Markdown), and auto-generated quizzes (JSON).
Study plan: Exports a ready-to-import .ics schedule.
Privacy-first pipeline: Uploads stream directly to Cloud Storage via V4 signed URLs; the service never proxies large files.
Optional AI enrichment: If a GEMINI_API_KEY is present, Gemini models enhance summaries, quizzes, and plans; otherwise heuristic fallbacks kick in.
Polished UX: A single-page web app (generated with Google AI Studio and iterated by hand) manages categories, notes, flashcards, and quizzes and persists progress in localStorage.
I tuned each stage of the pipeline to keep first results snappy on 2 vCPUs.
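To make the caption step concrete, here is a minimal sketch of turning timed transcript segments into SRT or WebVTT text. The `(start, end, text)` segment shape and the function names are assumptions for illustration, not LectureLens's actual code.

```python
from typing import Iterable, Tuple

# Hypothetical segment shape: (start_seconds, end_seconds, text).
Segment = Tuple[float, float, str]


def caption_timestamp(seconds: float, separator: str = ",") -> str:
    """Format seconds as HH:MM:SS<sep>mmm (SRT uses ',', WebVTT uses '.')."""
    total_ms = max(0, round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{separator}{ms:03d}"


def segments_to_srt(segments: Iterable[Segment]) -> str:
    """Render numbered SRT cues separated by blank lines."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        timing = f"{caption_timestamp(start)} --> {caption_timestamp(end)}"
        blocks.append(f"{index}\n{timing}\n{text.strip()}\n")
    return "\n".join(blocks)


def segments_to_vtt(segments: Iterable[Segment]) -> str:
    """Render WebVTT: a WEBVTT header, then cues with '.' in timestamps."""
    cues = []
    for start, end, text in segments:
        timing = (
            f"{caption_timestamp(start, '.')} --> {caption_timestamp(end, '.')}"
        )
        cues.append(f"{timing}\n{text.strip()}\n")
    return "WEBVTT\n\n" + "\n".join(cues)
```

Both functions take the same segments, so the transcript only has to be produced once.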
How I built it
Runtime: Python 3.11, FastAPI + Uvicorn behind Gunicorn.
Transcription: faster-whisper (CT2, tiny) shipped inside the image under /app/models for offline use; ffmpeg for audio extraction.
AI enrichment: Optional Gemini calls for better summaries, flashcards, quizzes, and study plans; heuristics if no key.
Storage: Google Cloud Storage for uploads; browser requests a V4 signed URL from /uploads/sign, streams bytes directly to GCS, then posts a signed download URL to /meetings/process.
Deployment: Cloud Run with --cpu 2 --memory 4Gi --timeout 600 --concurrency 1 to give ASR breathing room. Secrets via Secret Manager; no secrets in the repo.
Frontend: Lightweight vanilla-JS SPA scaffolded with AI Studio (prompts in /docs), then hand-tuned for accessibility and performance.
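The optional-enrichment pattern above can be sketched as follows. This is a rough illustration under stated assumptions: `gemini_call` is a hypothetical callable standing in for whatever Gemini client the service uses, and the frequency-based extractive summary is only one possible heuristic, not necessarily the one LectureLens ships.

```python
import os
import re
from collections import Counter
from typing import Callable, Optional

# Small illustrative stopword list (assumption; a real list would be longer).
STOPWORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "is", "are", "it",
    "that", "this", "for", "on", "with", "as", "by", "be", "was", "has",
}


def _tokens(text: str) -> list:
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def heuristic_summary(transcript: str, max_sentences: int = 3) -> str:
    """Pick the sentences whose words are most frequent in the transcript."""
    sentences = [
        s.strip() for s in re.split(r"(?<=[.!?])\s+", transcript) if s.strip()
    ]
    freq = Counter(_tokens(transcript))
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: sum(freq[t] for t in _tokens(sentences[i])),
        reverse=True,
    )
    keep = sorted(ranked[:max_sentences])  # restore original order
    return " ".join(sentences[i] for i in keep)


def summarize(
    transcript: str,
    gemini_call: Optional[Callable[[str], str]] = None,  # hypothetical client
) -> str:
    """Use the model only when a key is configured; otherwise fall back."""
    if os.environ.get("GEMINI_API_KEY") and gemini_call is not None:
        try:
            return gemini_call(transcript)
        except Exception:
            pass  # any model failure degrades to the heuristic path
    return heuristic_summary(transcript)
```

Because the fallback is deterministic, the same transcript always yields the same summary when no key is set. Note that summing word frequencies favors longer sentences; a real heuristic might normalize by length.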
Architecture (at a glance)
Browser ──> /uploads/sign ──> (V4 URL) ──┐
   │                                      │
   ├── PUT media to GCS (direct) <────────┘
   └── POST /meetings/process (signed GET link)
            │
   Cloud Run (FastAPI)
   ffmpeg → Faster-Whisper → (Gemini?)
            │
   GCS artifacts: transcript, SRT, .ics, flashcards.md, quiz.json
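As an illustration of the .ics artifact, here is a minimal sketch that schedules one study session per topic on consecutive days. The function names, the one-session-per-day layout, the 45-minute default, and the `lecturelens.example` UID domain are all assumptions; the write-up does not describe how the real plan is laid out.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional


def _utc_stamp(moment: datetime) -> str:
    """iCalendar UTC date-time form, e.g. 20250106T180000Z."""
    return moment.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


def _escape_text(value: str) -> str:
    """Escape characters that are special in iCalendar TEXT values."""
    return (
        value.replace("\\", "\\\\")
        .replace(";", "\\;")
        .replace(",", "\\,")
        .replace("\n", "\\n")
    )


def build_study_plan_ics(
    topics: List[str],
    first_session: datetime,  # must be timezone-aware
    minutes: int = 45,
    now: Optional[datetime] = None,
) -> str:
    """Return a VCALENDAR with one VEVENT per topic, one day apart."""
    stamp = _utc_stamp(now or datetime.now(timezone.utc))
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0", "PRODID:-//LectureLens sketch//EN"]
    for day, topic in enumerate(topics):
        start = first_session + timedelta(days=day)
        end = start + timedelta(minutes=minutes)
        lines += [
            "BEGIN:VEVENT",
            f"UID:study-{day}-{_utc_stamp(start)}@lecturelens.example",
            f"DTSTAMP:{stamp}",
            f"DTSTART:{_utc_stamp(start)}",
            f"DTEND:{_utc_stamp(end)}",
            f"SUMMARY:{_escape_text(topic)}",
            "END:VEVENT",
        ]
    lines.append("END:VCALENDAR")
    # iCalendar content lines are terminated with CRLF.
    return "\r\n".join(lines) + "\r\n"
```

The sketch skips line folding for long values, which a stricter exporter would add.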
What I learned
Serverless ≠ “just functions.” Long-running CPU tasks can thrive on Cloud Run with the right timeouts, memory, and low concurrency for deterministic performance.
Direct-to-GCS uploads are a superpower. Offloading large bodies from HTTP handlers avoided flaky demos and timeouts.
Model pragmatism wins. The tiny Faster-Whisper checkpoint, with VAD and downmixed mono at 16 kHz, gives great cost–quality tradeoffs for lectures.
Prompt scaffolds matter. Clear, rubric-style prompts for Gemini produced consistent flashcards and quizzes; I kept a deterministic heuristic fallback to guarantee output.
Product polish > raw AI. The .ics export, SRT captions, and local persistence make the “Aha!” moment immediate for judges and students.
Bonus: tracking word error rate helped me evaluate small ffmpeg tweaks (e.g., resampling, VAD) before changing models.
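For reference, word error rate is the word-level edit distance between a reference transcript and the ASR output, divided by the number of reference words. A minimal sketch follows; the lowercase-and-split normalization is an assumption, and the write-up does not say which tool was actually used to measure it.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only one previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion
                    curr[j - 1] + 1,     # insertion
                    prev[j - 1] + cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)
```

Scoring the same hand-checked clip before and after a preprocessing change gives a quick signal on whether the change helped.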
Challenges we faced (and how we solved them)
CORS & signed URLs: Preflight errors on PUT uploads. Fix: explicitly allowed the PUT/GET/HEAD methods and the Content-Type, Content-Range, x-goog-resumable, and x-goog-date headers in the bucket's CORS config; confirmed that the roles/storage.objectAdmin and roles/iam.serviceAccountTokenCreator IAM roles were in place.
Audio edge cases: Variable bitrates, stereo quirks, silent intros. Fix: standardized demux -ac 1 -ar 16000 -vn -sn -dn; added VAD and a short lead-in trim.
Cold starts: Model load + ffmpeg spawn could spike P95. Fix: --min-instances=1, lazy load the model, memoize tokenizer, and keep concurrency at 1.
Gemini optionality: Needed graceful degradation. Fix: guardrails that fall back to heuristic summarization and question generation, keeping the UX predictable.
Observability: Needed to explain where time goes in a demo. Fix: structured logs with durations for download, ffmpeg, asr, enrich, and artifact sizes.
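Two of the fixes above lend themselves to a short sketch: the standardized ffmpeg demux and per-stage timing logs. The helper names, the logger name, and the `-y` overwrite flag are assumptions added for illustration; the remaining flags are the ones listed above.

```python
import json
import logging
import time
from contextlib import contextmanager
from typing import Dict, Iterator, List

logger = logging.getLogger("lecturelens.sketch")  # hypothetical logger name


def ffmpeg_normalize_args(src: str, dst: str) -> List[str]:
    """Build an ffmpeg command that keeps only audio, as 16 kHz mono."""
    return [
        "ffmpeg",
        "-y",                    # overwrite dst if it exists (assumption)
        "-i", src,
        "-vn", "-sn", "-dn",     # drop video, subtitle, and data streams
        "-ac", "1",              # downmix to mono
        "-ar", "16000",          # resample to 16 kHz
        dst,
    ]


@contextmanager
def timed(stage: str, durations: Dict[str, float]) -> Iterator[None]:
    """Record how long a stage took and emit one structured log line."""
    start = time.perf_counter()
    try:
        yield
    finally:
        durations[stage] = round(time.perf_counter() - start, 3)
        logger.info(json.dumps({"stage": stage, "seconds": durations[stage]}))
```

A handler could wrap each step, for example `with timed("ffmpeg", durations): subprocess.run(ffmpeg_normalize_args(src, dst), check=True)`, and return `durations` alongside the artifacts.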
Why Cloud Run
Zero ops, full control: Container freedom with autoscale.
Cost-aware: Scale to zero when idle; tiny model; no egress for uploads.
Security by design: Signed URLs + Secret Manager + least-privilege IAM.
What’s next
Resumable uploads by default for flaky networks.
Multi-lecture bundling (batch ZIP + consolidated .ics).
Richer agents (e.g., syllabus-aware quiz generation).
GPU variant (L4) to demo larger ASR/models for the GPU track.