Inspiration

I wanted a study tool that actually meets students where they are: sitting on a pile of lecture recordings the night before an exam. Most tools either stop at transcripts or hide behind paywalls and heavyweight setups. I asked: what if one upload turned a lecture into everything you need to study—clean captions, summaries, flashcards, quizzes, and even a calendar plan? I made it serverless on Cloud Run so it’s easy to deploy and cheap to run.

What I built

LectureLens converts any MP3/MP4 into a complete study kit:

Transcription & captions: Offline-friendly Faster-Whisper (tiny) + ffmpeg → transcript + SRT/VTT.

Insights: Summaries, key points, flashcards (Markdown), and auto-generated quizzes (JSON).

Study plan: Exports a ready-to-import .ics schedule.
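
The .ics export is plain text under the hood. A minimal stdlib-only sketch of what the schedule writer produces (field layout simplified from RFC 5545; the helper name and session tuple shape are illustrative, not the service’s exact code):

```python
from datetime import datetime, timedelta

def build_ics(sessions: list[tuple[str, datetime, int]]) -> str:
    """Render (title, start, minutes) study sessions as a VCALENDAR string."""
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0", "PRODID:-//LectureLens//EN"]
    fmt = "%Y%m%dT%H%M%S"
    for title, start, minutes in sessions:
        end = start + timedelta(minutes=minutes)
        lines += ["BEGIN:VEVENT",
                  f"DTSTART:{start.strftime(fmt)}",
                  f"DTEND:{end.strftime(fmt)}",
                  f"SUMMARY:{title}",
                  "END:VEVENT"]
    lines.append("END:VCALENDAR")
    # RFC 5545 mandates CRLF line endings.
    return "\r\n".join(lines) + "\r\n"
```

Calendar apps import this directly, which is what makes the study plan feel “ready-made.”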

Privacy-first pipeline: Uploads stream directly to Cloud Storage via V4 signed URLs; the service never proxies large files.

Optional AI enrichment: If a GEMINI_API_KEY is present, Gemini models enhance summaries, quizzes, and plans; otherwise heuristic fallbacks kick in.

Polished UX: A single-page web app (generated with Google AI Studio and iterated by hand) manages categories, notes, flashcards, and quizzes and persists progress in localStorage.

I tuned each stage of the pipeline to keep first results snappy on 2 vCPUs.

How I built it

Runtime: Python 3.11, FastAPI + Uvicorn behind Gunicorn.

Transcription: faster-whisper (CT2, tiny) shipped inside the image under /app/models for offline use; ffmpeg for audio extraction.
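
The ASR step boils down to a demux plus one model call. A sketch under stated assumptions (the `/app/models/tiny` path, `int8` compute type, and helper names are illustrative):

```python
import subprocess

def ffmpeg_args(src: str, dst: str) -> list[str]:
    # Downmix to mono, resample to 16 kHz, drop video/subtitle/data streams.
    return ["ffmpeg", "-y", "-i", src,
            "-ac", "1", "-ar", "16000", "-vn", "-sn", "-dn", dst]

def transcribe(media_path: str, model_dir: str = "/app/models/tiny") -> str:
    from faster_whisper import WhisperModel  # lazy import keeps cold starts cheaper
    wav = "/tmp/audio.wav"
    subprocess.run(ffmpeg_args(media_path, wav), check=True)
    model = WhisperModel(model_dir, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(wav, vad_filter=True)
    return " ".join(seg.text.strip() for seg in segments)
```

Bundling the CTranslate2 checkpoint in the image means no model download at runtime, which matters on a scale-to-zero service.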

AI enrichment: Optional Gemini calls for better summaries, flashcards, quizzes, and study plans; heuristics if no key.

Storage: Google Cloud Storage for uploads; browser requests a V4 signed URL from /uploads/sign, streams bytes directly to GCS, then posts a signed download URL to /meetings/process.
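
The signing endpoint is small. A minimal sketch of the /uploads/sign logic, assuming the google-cloud-storage client and a `GCS_BUCKET` env var (names and the 15-minute expiry are illustrative):

```python
import datetime
import os

def sign_upload_url(object_name: str, content_type: str) -> dict:
    from google.cloud import storage  # deferred so this module imports without GCP creds
    client = storage.Client()
    blob = client.bucket(os.environ["GCS_BUCKET"]).blob(object_name)
    url = blob.generate_signed_url(
        version="v4",
        method="PUT",
        expiration=datetime.timedelta(minutes=15),
        content_type=content_type,
    )
    # The browser PUTs the media to `url`; large bytes never touch the service.
    return {"upload_url": url, "object": object_name}
```

Because the `content_type` is baked into the signature, the browser must send the same Content-Type header on the PUT, which is one of the CORS gotchas below.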

Deployment: Cloud Run with --cpu 2 --memory 4Gi --timeout 600 --concurrency 1 to give ASR breathing room. Secrets via Secret Manager; no secrets in the repo.

Frontend: Lightweight vanilla-JS SPA scaffolded with AI Studio (prompts in /docs), then hand-tuned for accessibility and performance.

Architecture (at a glance)

Browser ──> /uploads/sign ──> (V4 signed URL)
  │
  ├── PUT media to GCS (direct)
  └── POST /meetings/process (signed GET link)
         │
         ▼
  Cloud Run (FastAPI): ffmpeg → Faster-Whisper → (Gemini?)
         │
         ▼
  GCS artifacts: transcript, SRT, .ics, flashcards.md, quiz.json

What I learned

Serverless ≠ “just functions.” Long-running CPU tasks can thrive on Cloud Run with the right timeouts, memory, and low concurrency for deterministic performance.

Direct-to-GCS uploads are a superpower. Offloading large bodies from HTTP handlers avoided flaky demos and timeouts.

Model pragmatism wins. The tiny Faster-Whisper checkpoint, with VAD and downmixed mono at 16 kHz, gives great cost–quality tradeoffs for lectures.

Prompt scaffolds matter. Clear, rubric-style prompts for Gemini produced consistent flashcards and quizzes; I kept a deterministic heuristic fallback to guarantee output.

Product polish > raw AI. The .ics export, SRT captions, and local persistence make the “Aha!” moment immediate for judges and students.

Bonus: tracking word error rate helped me evaluate small ffmpeg tweaks (e.g., resampling, VAD) before changing models.
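
WER is just a word-level Levenshtein distance over the reference length. This is the standard textbook definition, not the project’s exact evaluation script:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + insertions + deletions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete everything
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert everything
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)
```

Running this on a small hand-transcribed snippet before and after a demux tweak gives a cheap, repeatable signal.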

Challenges I faced (and how I solved them)

CORS & Signed URLs: Preflight errors on PUT uploads. Fix: explicit PUT/GET/HEAD methods; allow Content-Type, Content-Range, x-goog-resumable/x-goog-date headers; validated IAM roles for storage.objectAdmin and serviceAccountTokenCreator.
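
The bucket CORS fix can be applied from a few lines of Python. A sketch assuming the google-cloud-storage client (the origin and `maxAgeSeconds` values are illustrative):

```python
def set_cors(bucket_name: str, origin: str) -> None:
    from google.cloud import storage  # deferred; needs GCP credentials at runtime
    client = storage.Client()
    bucket = client.get_bucket(bucket_name)
    bucket.cors = [{
        "origin": [origin],
        "method": ["PUT", "GET", "HEAD"],
        "responseHeader": ["Content-Type", "Content-Range",
                           "x-goog-resumable", "x-goog-date"],
        "maxAgeSeconds": 3600,
    }]
    bucket.patch()  # persist the new CORS policy
```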

Audio edge cases: Variable bitrates, stereo quirks, silent intros. Fix: standardized demux -ac 1 -ar 16000 -vn -sn -dn; added VAD and a short lead-in trim.

Cold starts: Model load + ffmpeg spawn could spike P95. Fix: --min-instances=1, lazy load the model, memoize tokenizer, and keep concurrency at 1.

Gemini optionality: Needed graceful degradation. Fix: guardrails that fall back to heuristic summarization and QG, keeping UX predictable.
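
A toy version of the heuristic summarization fallback, to show the shape (the real fallback is richer; function name and scoring are illustrative): score each sentence by the corpus frequency of its words, keep the top n in original order.

```python
import re
from collections import Counter

def heuristic_summary(text: str, n: int = 3) -> list[str]:
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence: str) -> float:
        toks = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in toks) / (len(toks) or 1)

    ranked = sorted(sentences, key=score, reverse=True)[:n]
    return [s for s in sentences if s in ranked]  # preserve original order
```

Being deterministic, it always returns something, so the UX never shows an empty panel when the API key is absent.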

Observability: Needed to explain where time goes in a demo. Fix: structured logs with durations for download, ffmpeg, asr, enrich, and artifact sizes.
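
The timing instrumentation is a small context manager. A sketch of the idea (field names like "stage" and "ms" are illustrative, not the service’s exact log schema):

```python
import contextlib
import json
import time

@contextlib.contextmanager
def timed(stage: str, log: list):
    """Append {"stage": ..., "ms": ...} to `log` when the block exits."""
    t0 = time.perf_counter()
    try:
        yield
    finally:
        log.append({"stage": stage,
                    "ms": round((time.perf_counter() - t0) * 1000, 1)})

entries = []
with timed("ffmpeg", entries):
    pass  # real work here: download, ffmpeg, asr, enrich, ...
print(json.dumps(entries[0]))
```

Wrapping each pipeline stage this way makes the demo narrative ("download took X ms, ASR took Y ms") come straight from the logs.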

Why Cloud Run

Zero ops, full control: Container freedom with autoscale.

Cost-aware: Scale to zero when idle; tiny model; no egress for uploads.

Security by design: Signed URLs + Secret Manager + least-privilege IAM.

What’s next

Resumable uploads by default for flaky networks.

Multi-lecture bundling (batch ZIP + consolidated .ics).

Richer agents (e.g., syllabus-aware quiz generation).

GPU variant (L4) to demo larger ASR/models for the GPU track.

Built With

  • css
  • ctranslate2
  • docker
  • fastapi
  • faster-whisper
  • ffmpeg
  • gemini-api
  • google-ai-studio
  • google-artifact-registry
  • google-cloud
  • google-cloud-run
  • google-secret-manager
  • gunicorn
  • html
  • ics-calendar
  • json
  • localstorage
  • markdown
  • python-3.11
  • rest-api
  • secure-by-design
  • serverless
  • signed-urls-v4
  • spa-frontend
  • srt-vtt
  • uvicorn
  • vanilla-js