Inspiration
We've all been there: frantically scribbling notes during a lecture, missing the next point because we were still writing the last one, then opening four different apps after class just to study what we half-captured. Research backs up what we felt. Locke (2015) found that the average student fails to record nearly half the material a lecturer doesn't write on the board. Aivaz & Teodorescu (2022) showed that multitasking causes measurable mental overload in college learners, and the APA confirms that task-switching cuts efficiency and raises error rates. We built Converge because no single tool solves the full lecture-to-study workflow. Students are forced to stitch together a recorder, a transcription app, a note-taker, and a flashcard tool, losing momentum at every seam.
What it does
Converge is an AI-powered lecture capture and study platform that covers the whole lecture-to-study workflow in one place. You capture live audio directly in the browser or upload existing recordings, videos, and course material. Converge transcribes in real time using Deepgram, then automatically generates structured notes, flashcards, and quizzes from the transcript. Study sessions use SM-2 spaced repetition to schedule each card's next review based on how well you recalled it, and a built-in Pomodoro timer keeps sessions focused. The platform integrates with Canvas so your study material maps directly to your courses and assignments. For students with accessibility needs, Converge includes text-to-speech, dyslexia-friendly fonts, a reading ruler, focus mode, and, uniquely, ASL fingerspelling-to-text input powered by client-side hand-landmark detection, letting deaf and hard-of-hearing students add text without a keyboard.
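For readers unfamiliar with SM-2, the scheduling rule can be sketched in a few lines. This is a minimal illustration of the published SM-2 update, not Converge's actual code: the `CardState` shape and the `sm2Review` name are hypothetical, the interval is computed with the ease factor from before the current review, and the ease factor is left unchanged on a failed review, which is one common reading of the algorithm.

```typescript
// Hypothetical card state for illustration; field names are assumptions.
interface CardState {
  repetitions: number;  // consecutive successful reviews so far
  intervalDays: number; // days to wait before the next review
  easeFactor: number;   // SM-2 "E-Factor"; starts at 2.5, never below 1.3
}

// quality: self-rated recall from 0 (blackout) to 5 (perfect); 3 or more is a pass.
function sm2Review(card: CardState, quality: number): CardState {
  if (!Number.isInteger(quality) || quality < 0 || quality > 5) {
    throw new RangeError("quality must be an integer from 0 to 5");
  }

  if (quality < 3) {
    // Failed recall: restart the repetition sequence, keep the ease factor.
    return { repetitions: 0, intervalDays: 1, easeFactor: card.easeFactor };
  }

  // Successful recall: 1 day, then 6 days, then grow by the ease factor.
  let intervalDays: number;
  if (card.repetitions === 0) {
    intervalDays = 1;
  } else if (card.repetitions === 1) {
    intervalDays = 6;
  } else {
    intervalDays = Math.round(card.intervalDays * card.easeFactor);
  }

  // Standard SM-2 ease adjustment, clamped to the 1.3 floor.
  const delta = 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02);
  const easeFactor = Math.max(1.3, card.easeFactor + delta);

  return { repetitions: card.repetitions + 1, intervalDays, easeFactor };
}
```

A new card would start as `{ repetitions: 0, intervalDays: 0, easeFactor: 2.5 }`, and each study session would call `sm2Review` with the student's self-rating to get the next due interval.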
How we built it
The frontend is React 19 + Vite 6 with TanStack Router for client-side navigation, styled with Tailwind v4 and animated with Framer Motion. PocketBase serves as our backend and database: lightweight, self-hostable, and fast to iterate on.

For real-time speech-to-text we stream audio over a Deepgram WebSocket (Nova-2 model), and for uploaded files we batch-process with OpenAI Whisper. AI generation (note structuring, flashcard creation, quiz generation) runs through GPT-4o-mini. The five-stage pipeline (STT, cleanup, notes, flashcards, quiz) retries each stage up to twice on failure and always persists partial results, so students never lose a session.

The ASL fingerspelling module runs MediaPipe Hands entirely client-side: it detects 21 landmarks per hand at 15 fps with under 50 ms inference, computes a 69-feature vector of relative positions, finger curl values, and thumb distances, and feeds that vector into a letter classifier whose output merges into the live caption stream. A prediction with confidence below 0.85 triggers a low-confidence indicator so users know when to re-sign.
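To make the fingerspelling step more concrete, here is a rough sketch of two pieces of it: normalizing the 21 hand landmarks so the features do not depend on where the hand is in the frame or how large it appears, and flagging predictions under the 0.85 confidence threshold. It is an illustration under assumptions, not our shipped code: `Landmark`, `normalizeLandmarks`, `LetterPrediction`, and `toCaption` are hypothetical names, scaling by the wrist-to-middle-knuckle distance is one plausible choice, and the sketch covers only the relative-position part of the feature vector (no curl values or thumb distances).

```typescript
// Shape of one hand landmark (x, y, z coordinates), as produced by hand tracking.
interface Landmark { x: number; y: number; z: number; }

// Indices in the 21-point MediaPipe hand model.
const WRIST = 0;
const MIDDLE_FINGER_MCP = 9;

// Returns 63 numbers: each landmark relative to the wrist, divided by a
// hand-size reference length so small and large hands look alike.
function normalizeLandmarks(landmarks: Landmark[]): number[] {
  if (landmarks.length !== 21) {
    throw new Error("expected 21 hand landmarks");
  }
  const wrist = landmarks[WRIST];
  const ref = landmarks[MIDDLE_FINGER_MCP];
  const scale = Math.hypot(ref.x - wrist.x, ref.y - wrist.y, ref.z - wrist.z);
  if (scale === 0) {
    throw new Error("degenerate hand: reference length is zero");
  }
  const features: number[] = [];
  for (const p of landmarks) {
    features.push(
      (p.x - wrist.x) / scale,
      (p.y - wrist.y) / scale,
      (p.z - wrist.z) / scale,
    );
  }
  return features;
}

// Threshold taken from the write-up; everything else here is assumed.
const LOW_CONFIDENCE_THRESHOLD = 0.85;

interface LetterPrediction { letter: string; confidence: number; }

// Decides whether the UI should show the low-confidence indicator.
function toCaption(pred: LetterPrediction): { text: string; lowConfidence: boolean } {
  return {
    text: pred.letter,
    lowConfidence: pred.confidence < LOW_CONFIDENCE_THRESHOLD,
  };
}
```

Normalizing this way handles hand position and size only; robustness to lighting depends on the landmark detector itself.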
Challenges we ran into
Getting real-time transcription and ASL recognition to co-exist in the same capture flow was the toughest integration challenge, as both systems compete for microphone and camera resources and each has its own timing model. We also had to design the five-stage AI pipeline to be fault-tolerant: any stage can fail and the user still has everything computed up to that point. Tuning the ASL classifier to be reliable across different lighting conditions and hand sizes without server infrastructure meant careful feature engineering on the client. On the UX side, keeping the interface genuinely friction-free rather than just feature-complete required several rounds of cutting things that felt useful but added cognitive load.
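The fault-tolerance idea behind the pipeline (retry a failing stage up to twice, and keep whatever earlier stages produced) can be sketched roughly as follows. This is a simplified illustration, not our production code: `Stage`, `runPipeline`, and the `persist` callback are hypothetical names, stages are assumed to run strictly in order, and "up to twice" is read as one initial attempt plus two retries.

```typescript
type StageName = "stt" | "cleanup" | "notes" | "flashcards" | "quiz";
type Outputs = Partial<Record<StageName, string>>;

// Each stage receives the outputs of the stages that ran before it.
interface Stage {
  name: StageName;
  run: (previous: Outputs) => Promise<string>;
}

interface PipelineResult {
  outputs: Outputs;      // everything computed so far
  failedAt?: StageName;  // set only if a stage exhausted its retries
}

// Runs fn once, then retries up to maxRetries more times on failure.
async function withRetries<T>(fn: () => Promise<T>, maxRetries: number): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

async function runPipeline(
  stages: Stage[],
  persist: (partial: Outputs) => Promise<void>, // e.g. a database write (assumed)
  maxRetries = 2,
): Promise<PipelineResult> {
  const outputs: Outputs = {};
  for (const stage of stages) {
    try {
      outputs[stage.name] = await withRetries(() => stage.run({ ...outputs }), maxRetries);
    } catch {
      // Earlier outputs were already persisted; report where the run stopped.
      return { outputs, failedAt: stage.name };
    }
    // Save after every successful stage so a later failure loses nothing.
    await persist({ ...outputs });
  }
  return { outputs };
}
```

Persisting after each stage rather than once at the end is what lets a student keep the transcript and notes even if, say, quiz generation fails.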
Accomplishments that we're proud of
We shipped a fully browser-native app (no installs, no extensions) that handles live capture, transcription, AI generation, spaced repetition study, and accessibility features end-to-end. Client-side ASL fingerspelling recognition is a feature none of the incumbent tools (Notion, Canvas, Wispr Flow) offer at all, and we're genuinely proud that it runs entirely in the browser with no server cost. We also closed every gap in the competitive feature matrix: live capture, batch transcription, structured notes, flashcards with SM-2, quizzes, Canvas integration, and built-in accessibility, all checked, none partial.
What we learned
We learned that "no friction" is a design constraint as demanding as any technical one. Every feature decision had to be weighed against whether it added a step between a student and their study material. We deepened our understanding of the SM-2 algorithm and why spaced repetition beats passive re-reading for long-term retention. On the technical side, we got hands-on experience designing resilient multi-stage AI pipelines and building real-time WebSocket audio streams that stay stable under variable network conditions. The MediaPipe integration taught us a lot about the gap between a model working in ideal demos and working reliably across real users.
What's next for Converge
The roadmap has three clear priorities: multi-tenant organization support so instructors can push material directly to student accounts, a native mobile app so students can capture on their phones without a laptop, and a sign-language avatar overlay that renders a signing avatar alongside captions, making Converge genuinely immersive for ASL users rather than just accessible. Longer term, we see Converge expanding beyond Canvas to other LMS platforms, and adding collaborative study features so students can share notes and quiz each other from the same lecture capture.
Built With
- framer
- mediapipe
- ollama
- pocketbase
- react
- tanstack