Notas: AI-Powered Lecture Processing Platform

Inspiration

Current lecture delivery systems are restricted by finite office hour availability, passive student engagement models, and the administrative overhead of manual assessment generation. Our objective was to develop a platform that automates lecture processing and evaluation without requiring modifications to established pedagogical workflows.

System Overview

Notas is an AI-integrated educational platform designed to process live lecture data in real-time to generate interactive assessments and structured learning materials.

Core Functionality

Professor Interface:

  • Real-time STT (Speech-to-Text): Continuous transcription of lecture audio.
  • Automated Assessment Generation: Quiz modules generated at 30-second intervals via LLM analysis of lecture telemetry.
  • Discrete Deployment: Assessment modules are pushed to student devices using a room-code protocol.
  • Visual Analytics: Computer vision-based analysis of whiteboard and slide content.
  • Asynchronous Synthesis: Lecture notes aggregated and summarized every 30 seconds.
  • Zero-Configuration Deployment: The system operates as a background process while the instructor conducts the session normally.

Student Interface:

  • Low-Latency Participation: Live quiz engagement via room-code authentication.
  • 24/7 Support: AI-driven "Office Hours" agent utilizing a photorealistic avatar interface.
  • Automated Material Delivery: Access to synthesized notes and flashcards.
  • Real-time Response Validation: Immediate feedback on quiz responses with detailed technical explanations.

Technical Architecture

  • Transcription Pipeline: Real-time streaming via ElevenLabs Scribe API.
  • Visual Analytics: Whiteboard and slide state detection using the Overshoot SDK.
  • Inference Engine: Question generation and context windowing powered by Google Gemini.
  • State Synchronization: WebSocket-based real-time synchronization for the live quiz lifecycle.
  • Agentic Interface: Photorealistic AI avatar rendering via Beyond Presence for office hours.
  • Data Synthesis: Recursive chunk aggregation and AI summarization for note generation.
  • Pedagogical Logic: Socratic method implementation within the AI agent to facilitate guided discovery.

Technical Implementation

System Architecture

The platform follows a decoupled three-tier architecture:

1. Frontend Layer (Next.js)

  • Real-time transcription client (ElevenLabs STT)
  • Visual analysis client (Overshoot SDK)
  • State-driven quiz interface
  • WebRTC video conferencing (LiveKit)

2. API Layer (FastAPI)

  • 5-second telemetry chunk aggregation service
  • 30-second window processor (Gemini integration)
  • Heuristic-based question generation and ranking engine
  • Distributed lecture note synthesis service

3. Data Layer (Supabase)

  • PostgreSQL persistence layer
  • Real-time event bus (WebSockets)
  • Identity and Access Management (IAM)

Implementation Details

Frontend Stack:

  • Next.js 16 / TypeScript: Type-safe application development.
  • ElevenLabs Scribe: Real-time STT integration.
  • Overshoot SDK: Frame-based visual analysis for static content.
  • LiveKit: WebRTC infrastructure for conferencing and agent interaction.
  • Supabase Client: Real-time database subscriptions and client-side state management.

Backend Stack:

  • Python 3.13 / FastAPI: High-performance asynchronous API endpoints.
  • Google Gemini AI: Natural language understanding and structured data generation.
  • Pydantic: Strict data validation and schema enforcement.
  • LiveKit Agents: Orchestration for voice-enabled AI entities.

Data Flow Pipeline:

  1. Ingestion: Every 5 seconds, the frontend pushes an AggregatedChunk (audio transcript + visual data) to the backend.
  2. Processing: Every 30 seconds, the backend processes a 6-chunk window using Gemini to generate notes and questions.
  3. Distribution: Questions are pushed to the professor’s dashboard in real-time via Supabase's listener pattern.
  4. Execution: Professors broadcast quizzes; students consume state updates via WebSockets and post responses.
  5. Persistence: Post-session, all artifacts (transcripts, notes, questions) are indexed for RAG-based retrieval.

AI Office Hours Implementation

The persistent AI agent for student support utilizes:

  • LiveKit Agents Framework: Orchestration of the real-time voice pipeline.
  • Beyond Presence: Photorealistic video synthesis.
  • DeepSeek AI: Reasoning and logic engine.
  • Socratic Prompt Engineering: The agent is constrained to avoid direct answer-giving, utilizing progressive questioning to guide student logic.

Technical Challenges

  1. Real-time State Sync: Managing consistent quiz states across hundreds of concurrent clients necessitated rigorous Supabase real-time subscription management to prevent race conditions during reconnections.
  2. Pedagogical Accuracy: Ensuring Gemini generated high-quality, non-redundant questions required iterative prompt engineering and the implementation of a robust context window strategy.
  3. Avatar/Agent Integration: Integrating the Beyond Presence avatar with LiveKit Agents required precise lifecycle management of the WebRTC session to ensure audio/video alignment.
  4. Data Aggregation: Handling out-of-order 5-second chunks and network latency during 30-second windowing required a server-side buffer with timeout-based reconciliation.

Engineering Accomplishments

  • Sub-30s Latency: Achieved an end-to-end pipeline from audio capture to deployed quiz in under 30 seconds.
  • Zero-Friction UX: Successfully implemented a system that requires no behavioral changes from the lecturer.
  • Architecture Scalability: The decoupled nature of the services allows for horizontal scaling of the FastAPI workers and Gemini processing nodes.

Future Roadmap

  1. RAG Optimization: Deep integration of lecture transcripts into the AI Professor's knowledge base for context-aware office hours.
  2. Predictive Analytics: A dashboard for instructors identifying specific concepts where student response data indicates low comprehension.
  3. Automated Spaced Repetition: Auto-generation of flashcards formatted for Anki/Quizlet from lecture metadata.

Local Development Setup

Backend

cd backend
pip install -e .
python main.py

Built With

Share this project:

Updates