Agentic AI Language Learning Studio

Agentic AI Language Learning Studio is a personalized language learning platform that uses AI agents to create adaptive, context-aware lessons tailored to each learner's interests and proficiency level. It combines large language models (LLMs) for content generation and analysis with speech recognition and synthesis for pronunciation feedback, and it requires learners to demonstrate mastery of vocabulary and speaking skills before they progress to new material.

The Problem We're Solving

Traditional language learning apps like Duolingo follow a rigid, one-size-fits-all curriculum that doesn't adapt to individual learning styles, interests, or real-world application needs. Learners are forced through predetermined lesson sequences regardless of their personal goals, whether they want to learn Spanish for cooking, Japanese for travel, or French for business. This approach leads to disengagement, as learners struggle to see relevance in generic vocabulary lists and disconnected exercises.

How We're Better Than Duolingo:

  1. Personalized Content Generation: Unlike Duolingo's fixed curriculum, our AI agent generates lessons on-demand based on topics the learner actually cares about. Want to learn Spanish through football? Japanese through anime? The system creates contextually relevant content instantly.

  2. Adaptive Progression System: While Duolingo allows users to skip ahead without mastery, our agentic system enforces comprehension through intelligent quizzes. Users must demonstrate proficiency in vocabulary and pronunciation from previous lessons before creating new ones, ensuring genuine learning rather than superficial progression.

  3. Real-World Application: Our AI generates daily news articles in the target language, automatically simplified for the learner's level. This provides authentic, current content that keeps learners engaged with real-world language use, something Duolingo's scripted dialogues cannot match.

  4. Comprehensive Pronunciation Feedback: Using ElevenLabs' advanced ASR, we provide granular, word-level pronunciation analysis with both textual and voice feedback. Duolingo's pronunciation checks are binary (pass/fail) without detailed guidance on what went wrong.

  5. Intelligent Vocabulary Management: Our system maintains a personalized vocabulary corpus that grows with each lesson, allowing learners to review all encountered words. It tracks which words appear in which lessons and contexts, building a record of the learner's progress.

  6. Cultural Context Integration: Each lesson includes AI-generated cultural insights, language family information, and usage tips that help learners understand not just what to say, but why and when—context that Duolingo's gamified approach often lacks.

Technical Architecture

This is an agentic AI application where multiple specialized AI agents work collaboratively to provide a comprehensive learning experience:

Agentic AI Components

  1. Lesson Generation Agent (OpenAI GPT-5-mini)

    • Analyzes user input (language + topic) to generate beginner-appropriate paragraphs (~100 words)
    • Extracts key vocabulary with translations and usage tips
    • Identifies grammar concepts with English explanations
    • Generates sentence-by-sentence translations for comprehension
    • Creates cultural facts and language family information
  2. Pronunciation Analysis Agent (OpenAI GPT-5-mini + ElevenLabs ASR)

    • Receives transcribed audio from ElevenLabs ASR
    • Compares user pronunciation against target text
    • Provides detailed, constructive feedback on mispronunciations
    • Generates voice feedback using ElevenLabs TTS to guide corrections
  3. Content Simplification Agent (OpenAI GPT-5-mini)

    • Analyzes news articles for complexity
    • Automatically generates simplified versions for language learners
    • Maintains meaning while reducing vocabulary and sentence complexity
  4. Quiz Generation Agent (OpenAI GPT-5-mini)

    • Creates multiple-choice vocabulary questions from previous lessons
    • Selects appropriate sentences for speaking practice
    • Validates answers and provides feedback
  5. Translation Agent (OpenAI GPT-5-mini)

    • Provides full article translations
    • Generates sentence-by-sentence translations for detailed comprehension
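
As a rough illustration of the comparison step in the Pronunciation Analysis Agent, the sketch below aligns an ASR transcript with the target text and flags the target words that were not heard. It is a deterministic stand-in for illustration only: the project delegates this analysis to the LLM, and `tokenize` and `compareWords` are hypothetical names that do not come from the codebase.

```javascript
// Hypothetical helper: lowercases and strips punctuation so "Hola," matches "hola".
function tokenize(text) {
  return text
    .normalize("NFC")
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s']/gu, " ")
    .split(/\s+/)
    .filter(Boolean);
}

// Aligns target and transcript with a longest-common-subsequence (LCS) table,
// then marks each target word as matched or missed.
function compareWords(targetText, transcriptText) {
  const target = tokenize(targetText);
  const heard = tokenize(transcriptText);
  const lcs = Array.from({ length: target.length + 1 }, () =>
    new Array(heard.length + 1).fill(0)
  );

  // Fill the table from the end so lcs[i][j] covers target[i..] and heard[j..].
  for (let i = target.length - 1; i >= 0; i--) {
    for (let j = heard.length - 1; j >= 0; j--) {
      lcs[i][j] =
        target[i] === heard[j]
          ? lcs[i + 1][j + 1] + 1
          : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }

  const result = [];
  let i = 0;
  let j = 0;
  while (i < target.length) {
    if (j < heard.length && target[i] === heard[j]) {
      result.push({ word: target[i], matched: true });
      i++;
      j++;
    } else if (j < heard.length && lcs[i][j + 1] >= lcs[i + 1][j]) {
      j++; // extra or substituted word in the transcript
    } else {
      result.push({ word: target[i], matched: false });
      i++;
    }
  }
  return result;
}
```

A word list like this could be passed to the LLM alongside the raw transcript so that its feedback focuses on the words that were missed.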

System Architecture

┌─────────────────────────────────────────────────────────────┐
│                     Frontend (React + Vite)                   │
│  - User Interface & State Management                         │
│  - MediaRecorder API for Audio Capture                       │
│  - Real-time Quiz & Practice Interfaces                      │
└──────────────────────┬────────────────────────────────────────┘
                       │ HTTP/REST API
┌──────────────────────▼────────────────────────────────────────┐
│              Backend (Node.js + Express)                      │
│  ┌────────────────────────────────────────────────────────┐  │
│  │  Agent Orchestration Layer                             │  │
│  │  - Routes requests to appropriate AI agents            │  │
│  │  - Manages agent workflows and data flow               │  │
│  └──────────────┬──────────────────────┬──────────────────┘  │
│                 │                      │                      │
│  ┌──────────────▼──────┐  ┌───────────▼──────────────┐    │
│  │  OpenAI API         │  │  ElevenLabs API            │    │
│  │  (GPT-5-mini)      │  │  - TTS (Text-to-Speech)   │    │
│  │  - Content Gen       │  │  - ASR (Speech Recog)     │    │
│  │  - Analysis          │  │  - Voice Synthesis        │    │
│  │  - Translation       │  └──────────────────────────┘    │
│  └─────────────────────┘                                    │
│  ┌────────────────────────────────────────────────────────┐ │
│  │  SQLite Database (better-sqlite3)                       │ │
│  │  - User accounts & preferences                           │ │
│  │  - Lesson history & vocabulary corpus                   │ │
│  │  - Quiz results & progress tracking                      │ │
│  │  - News articles cache                                   │ │
│  └────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────┘
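
One way to picture the Agent Orchestration Layer in the diagram is as a small registry that maps a task name to an agent function and chains agents into a workflow. The sketch below is an assumption about how such a layer could look, not the project's actual code: `createOrchestrator`, the task names, and the stub agents are all hypothetical, and the stubs stand in for the real OpenAI and ElevenLabs calls.

```javascript
// Hypothetical orchestration layer: maps a task name to an async agent function.
function createOrchestrator() {
  const agents = new Map();
  return {
    register(task, agent) {
      agents.set(task, agent);
      return this; // allow chained registration
    },
    async run(task, payload) {
      const agent = agents.get(task);
      if (!agent) throw new Error(`No agent registered for task "${task}"`);
      return agent(payload);
    },
    // Runs tasks in order, passing each agent's output to the next one.
    async pipeline(tasks, payload) {
      let current = payload;
      for (const task of tasks) {
        current = await this.run(task, current);
      }
      return current;
    },
  };
}

// Stub agents standing in for the ASR call and the LLM analysis call.
const orchestrator = createOrchestrator()
  .register("transcribe", async ({ audio, target }) => ({
    target,
    transcript: `heard:${audio}`,
  }))
  .register("analyze", async ({ target, transcript }) => ({
    target,
    transcript,
    feedback: "stub feedback",
  }));
```

With this shape, a pronunciation request would call `orchestrator.pipeline(["transcribe", "analyze"], payload)`, and each Express route would only need to know the name of the workflow it triggers.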

Key Technical Features

  • Agentic Workflow: The system uses multiple AI agents that communicate and coordinate to provide a seamless learning experience. Each agent specializes in a specific task (generation, analysis, simplification) and the orchestration layer manages their interactions.

  • Real-time Speech Processing: Audio is captured client-side using the Web MediaRecorder API, uploaded to the backend as multipart/form-data, transcribed by ElevenLabs ASR, and analyzed by the pronunciation agent for immediate feedback.

  • Intelligent Caching: News articles are cached per user per day, so repeated requests on the same day reuse the stored articles instead of triggering new API calls. This reduces API costs while keeping the feed fresh. The system determines when to regenerate content.

  • Progressive Learning Enforcement: The quiz system uses AI to generate questions from previous lessons, ensuring users master material before advancing—a form of adaptive learning that traditional apps lack.
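
The per-user, per-day caching described above can be sketched with a cache key built from the user ID and the current date. This is a minimal in-memory illustration under stated assumptions: the project stores the cache in SQLite, `createDailyCache` and its parameters are hypothetical names, and using the UTC date as the day boundary is an assumption.

```javascript
// Hypothetical in-memory stand-in for the SQLite news cache.
// `generate(userId, day)` is whatever produces the articles (an LLM call in practice).
// `now` is injectable so the day rollover can be tested without waiting.
function createDailyCache(generate, now = () => new Date()) {
  const store = new Map();
  return {
    get(userId) {
      const day = now().toISOString().slice(0, 10); // UTC date, e.g. "2025-01-31"
      const key = `${userId}:${day}`;
      if (!store.has(key)) {
        store.set(key, generate(userId, day)); // cache miss: regenerate once
      }
      return store.get(key);
    },
  };
}
```

Because the date is part of the key, the first request on a new day misses the cache and triggers regeneration, while later requests that day are served from the store.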

Tools & Technologies

Backend Technologies

| Tool | Purpose | Integration |
| --- | --- | --- |
| Node.js | Runtime environment | Core server platform running the Express application |
| Express.js | Web framework | RESTful API server handling all HTTP requests and routing |
| OpenAI API | Large Language Model | Primary AI agent for content generation, analysis, translation, and quiz creation. Integrated via the openai npm package using the Responses API for structured outputs |
| ElevenLabs API | Speech Services | Integrated via axios and form-data for TTS (text-to-speech) reference audio generation and ASR (automatic speech recognition) for pronunciation transcription |
| better-sqlite3 | Database | Embedded SQLite database for user data, lessons, vocabulary corpus, quizzes, and news articles. Provides synchronous, high-performance data persistence |
| multer | File Upload Handler | Middleware for processing multipart/form-data audio file uploads from the frontend |
| axios | HTTP Client | Used for making requests to ElevenLabs API endpoints (TTS and ASR) |
| form-data | Form Encoding | Constructs multipart form data for ElevenLabs API requests with audio files |
| cors | Cross-Origin Resource Sharing | Enables the frontend (running on a different port) to communicate with the backend API |
| dotenv | Environment Variables | Loads API keys and configuration from the .env file |

Frontend Technologies

| Tool | Purpose | Integration |
| --- | --- | --- |
| React | UI Framework | Component-based architecture for building interactive user interfaces with hooks (useState, useEffect, useRef) |
| Vite | Build Tool | Fast development server and optimized production builds with HMR (Hot Module Replacement) |
| MediaRecorder API | Audio Capture | Browser-native API for recording user audio directly in the browser without external dependencies |
| localStorage | Client-side Storage | Persists user session data across page refreshes for a seamless user experience |

Development Tools

| Tool | Purpose | Integration |
| --- | --- | --- |
| nodemon | Development Server | Automatically restarts the Node.js server on code changes during development |
| ESLint | Code Linting | Ensures code quality and consistency in React components |

API Endpoints

Authentication

  • POST /api/auth/login - User login/registration
  • POST /api/auth/update-language - Update user's target language

Lesson Management

  • POST /api/practice - Generate personalized lesson (language + topic)
  • POST /api/lessons/save - Save lesson to user history
  • GET /api/lessons/:userId - Retrieve user's lesson history
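
A client call to the lesson generation endpoint could be prepared as in the sketch below. The request body is an assumption: the endpoint list only says the lesson is generated from a language and a topic, so the `language` and `topic` field names, the JSON encoding, and the `buildPracticeRequest` helper are hypothetical.

```javascript
// Hypothetical client helper for POST /api/practice.
// Assumption: the endpoint accepts a JSON body with `language` and `topic` fields.
function buildPracticeRequest(language, topic, baseUrl = "http://localhost:5000") {
  if (!language || !topic) {
    throw new Error("Both language and topic are required");
  }
  return {
    url: new URL("/api/practice", baseUrl).toString(),
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ language: language.trim(), topic: topic.trim() }),
    },
  };
}

// Usage (requires a running backend):
// const { url, options } = buildPracticeRequest("Spanish", "football");
// const lesson = await fetch(url, options).then((res) => res.json());
```

Keeping request construction separate from the `fetch` call makes the payload easy to inspect and test without a running server.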

Pronunciation Practice

  • POST /api/pronunciation - Full paragraph pronunciation analysis
  • POST /api/sentence-pronunciation - Sentence-level pronunciation with voice feedback

Quiz System

  • GET /api/quiz/required/:userId - Check if quiz is required (2+ lessons)
  • POST /api/quiz/generate - Generate quiz from last lesson
  • POST /api/quiz/validate - Validate quiz answers
  • POST /api/quiz/validate-speaking - Validate speaking practice for quiz
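
The validation step for multiple-choice questions can be illustrated with a small grading function. Everything about the data shape here is an assumption made for the example: the question fields (`id`, `options`, `correctIndex`), the `answers` map, and the 0.8 pass ratio are hypothetical, and the source does not state what score counts as passing.

```javascript
// Hypothetical question shape: { id, prompt, options: string[], correctIndex }.
// `answers` maps question id -> index of the option the learner picked.
function gradeQuiz(questions, answers, passRatio = 0.8) {
  const results = questions.map((q) => ({
    id: q.id,
    correct: answers[q.id] === q.correctIndex,
    expected: q.options[q.correctIndex], // shown to the learner as feedback
  }));
  const score = results.filter((r) => r.correct).length;
  const passed = questions.length > 0 && score / questions.length >= passRatio;
  return { score, total: questions.length, passed, results };
}
```

Returning per-question results alongside the overall verdict lets the UI show which vocabulary items need review before the learner creates a new lesson.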

Vocabulary & Content

  • GET /api/corpus/:userId - Get user's vocabulary corpus
  • GET /api/news/:userId - Get personalized news feed in target language
  • POST /api/news/translate - Translate article to English
  • POST /api/news/sentence-translations - Get sentence-by-sentence translations
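
The vocabulary corpus returned by GET /api/corpus/:userId grows as lessons are saved. The sketch below shows one way such a corpus could be merged while recording which lessons each word appeared in. It uses an in-memory `Map` rather than the project's SQLite tables, and the lesson shape (`id`, `vocabulary` with `word` and `translation`) and the `addLessonToCorpus` name are assumptions.

```javascript
// Hypothetical in-memory corpus: normalized word -> { translation, lessonIds }.
function addLessonToCorpus(corpus, lesson) {
  for (const { word, translation } of lesson.vocabulary) {
    const key = word.trim().toLowerCase();
    // Keep the first translation seen; only the lesson list grows on repeats.
    const entry = corpus.get(key) ?? { translation, lessonIds: [] };
    if (!entry.lessonIds.includes(lesson.id)) {
      entry.lessonIds.push(lesson.id);
    }
    corpus.set(key, entry);
  }
  return corpus;
}
```

Normalizing the key means that a word such as "Gol" in one lesson and "gol" in another ends up as a single corpus entry with two lesson references.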

Getting Started

  1. Install dependencies

    npm install
    cd client && npm install
    
  2. Environment variables

Duplicate .env.example (in the repo root) to .env and set:

| Key | Description |
| --- | --- |
| OPENAI_API_KEY | API key for the LLM you want to use. |
| OPENAI_MODEL | Optional override (defaults to gpt-5-mini). |
| ELEVENLABS_API_KEY | ElevenLabs API key. |
| ELEVENLABS_VOICE_ID | Voice for TTS reference audio. |
| ELEVENLABS_TTS_MODEL | ElevenLabs TTS model (default eleven_turbo_v2_5). |
| ELEVENLABS_ASR_MODEL | ElevenLabs ASR model (default scribe_v1). |
| PORT | Backend port (default 5000). |

  3. Run the services

In two terminals:

   # Terminal 1 – backend
   npm run dev:server

   # Terminal 2 – frontend
   cd client
   npm run dev

The UI points to http://localhost:5000 by default. Override via VITE_API_BASE_URL.
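
For reference, the environment variables from step 2 could be read on the backend along the lines of the sketch below. The default values come from the table in step 2; treating both API keys as required, the `loadConfig` name, and the shape of the returned object are assumptions. In the real server, dotenv would populate `process.env` from .env before this runs.

```javascript
// Defaults taken from the environment variable table; everything else is assumed.
const DEFAULTS = {
  OPENAI_MODEL: "gpt-5-mini",
  ELEVENLABS_TTS_MODEL: "eleven_turbo_v2_5",
  ELEVENLABS_ASR_MODEL: "scribe_v1",
  PORT: "5000",
};

// Assumption: both API keys are treated as required at startup.
const REQUIRED = ["OPENAI_API_KEY", "ELEVENLABS_API_KEY"];

function loadConfig(env = process.env) {
  const missing = REQUIRED.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  // Empty strings fall back to the default as well.
  const pick = (key) => env[key] || DEFAULTS[key];
  return {
    openaiApiKey: env.OPENAI_API_KEY,
    openaiModel: pick("OPENAI_MODEL"),
    elevenLabsApiKey: env.ELEVENLABS_API_KEY,
    elevenLabsVoiceId: env.ELEVENLABS_VOICE_ID, // no default is documented
    ttsModel: pick("ELEVENLABS_TTS_MODEL"),
    asrModel: pick("ELEVENLABS_ASR_MODEL"),
    port: Number(pick("PORT")),
  };
}
```

Failing fast on a missing key surfaces configuration mistakes at startup instead of on the first lesson or pronunciation request.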

Key Features

  • Personalized Lesson Generation: AI creates lessons based on your interests and target language
  • Intelligent Pronunciation Feedback: Word-level analysis with both text and voice guidance
  • Adaptive Learning Path: Quiz system ensures mastery before progression
  • Real-World Content: Daily news articles in your target language, automatically simplified
  • Vocabulary Corpus: Comprehensive tracking of all learned words with context
  • Cultural Insights: Learn about language origins, families, and cultural context
  • Sentence-by-Sentence Practice: Interactive speaking practice with immediate feedback
  • Grammar Concepts: AI-identified grammar points with English explanations

License

ISC
