Agentic AI Language Learning Studio

An intelligent, personalized language learning platform that leverages advanced AI agents to create adaptive, context-aware lessons tailored to each learner's interests and proficiency level. The system combines large language models (LLMs) for content generation and analysis with state-of-the-art speech recognition and synthesis to provide comprehensive pronunciation feedback, ensuring learners master vocabulary and speaking skills before progressing to new material.

The Problem We're Solving

Traditional language learning apps like Duolingo follow a rigid, one-size-fits-all curriculum that doesn't adapt to individual learning styles, interests, or real-world application needs. Learners are forced through predetermined lesson sequences regardless of their personal goals, whether they want to learn Spanish for cooking, Japanese for travel, or French for business. This approach leads to disengagement, as learners struggle to see relevance in generic vocabulary lists and disconnected exercises.

How We're Better Than Duolingo:

Personalized Content Generation: Unlike Duolingo's fixed curriculum, our AI agent generates lessons on-demand based on topics the learner actually cares about. Want to learn Spanish through football? Japanese through anime? The system creates contextually relevant content instantly.
Adaptive Progression System: While Duolingo allows users to skip ahead without mastery, our agentic system enforces comprehension through intelligent quizzes. Users must demonstrate proficiency in vocabulary and pronunciation from previous lessons before creating new ones, ensuring genuine learning rather than superficial progression.
Real-World Application: Our AI generates daily news articles in the target language, automatically simplified for the learner's level. This provides authentic, current content that keeps learners engaged with real-world language use, something Duolingo's scripted dialogues cannot match.
Comprehensive Pronunciation Feedback: Using ElevenLabs' advanced ASR, we provide granular, word-level pronunciation analysis with both textual and voice feedback. Duolingo's pronunciation checks are binary (pass/fail) without detailed guidance on what went wrong.
Intelligent Vocabulary Management: Our system maintains a personalized vocabulary corpus that grows with each lesson, allowing learners to review all encountered words. The AI intelligently tracks which words appear in which contexts, creating a rich knowledge graph of the learner's journey.
Cultural Context Integration: Each lesson includes AI-generated cultural insights, language family information, and usage tips that help learners understand not just what to say, but why and when—context that Duolingo's gamified approach often lacks.

Technical Architecture

This is an agentic AI application where multiple specialized AI agents work collaboratively to provide a comprehensive learning experience:

Agentic AI Components

Lesson Generation Agent (OpenAI GPT-5-mini)
- Analyzes user input (language + topic) to generate beginner-appropriate paragraphs (~100 words)
- Extracts key vocabulary with translations and usage tips
- Identifies grammar concepts with English explanations
- Generates sentence-by-sentence translations for comprehension
- Creates cultural facts and language family information
Pronunciation Analysis Agent (OpenAI GPT-5-mini + ElevenLabs ASR)
- Receives transcribed audio from ElevenLabs ASR
- Compares user pronunciation against target text
- Provides detailed, constructive feedback on mispronunciations
- Generates voice feedback using ElevenLabs TTS to guide corrections
Content Simplification Agent (OpenAI GPT-5-mini)
- Analyzes news articles for complexity
- Automatically generates simplified versions for language learners
- Maintains meaning while reducing vocabulary and sentence complexity
Quiz Generation Agent (OpenAI GPT-5-mini)
- Creates multiple-choice vocabulary questions from previous lessons
- Selects appropriate sentences for speaking practice
- Validates answers and provides feedback
Translation Agent (OpenAI GPT-5-mini)
- Provides full article translations
- Generates sentence-by-sentence translations for detailed comprehension
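
The pronunciation agent's comparison of the transcript against the target text can be pictured with a small deterministic pre-pass. The sketch below is an illustration only, not the project's implementation: the helper names `tokenize` and `compareWords` are hypothetical, and the choice of longest-common-subsequence alignment is an assumption. It flags which target words are missing from the ASR transcript, which an LLM prompt could then turn into feedback.

```javascript
// Hypothetical helper: normalise text into lowercase word tokens.
function tokenize(text) {
  return text
    .toLowerCase()
    .normalize("NFC")
    .split(/[^\p{L}\p{N}']+/u)
    .filter(Boolean);
}

// Compare the target sentence against the ASR transcript word by word.
// Longest-common-subsequence alignment keeps one skipped word from
// shifting every later word out of place.
function compareWords(target, transcript) {
  const a = tokenize(target);
  const b = tokenize(transcript);

  // lcs[i][j] = length of the LCS of a[i..] and b[j..]
  const lcs = Array.from({ length: a.length + 1 }, () =>
    new Array(b.length + 1).fill(0)
  );
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] =
        a[i] === b[j]
          ? lcs[i + 1][j + 1] + 1
          : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }

  // Walk both token lists, recording a verdict for every target word.
  const results = [];
  let i = 0;
  let j = 0;
  while (i < a.length) {
    if (j < b.length && a[i] === b[j]) {
      results.push({ word: a[i], matched: true });
      i++;
      j++;
    } else if (j < b.length && lcs[i][j + 1] >= lcs[i + 1][j]) {
      j++; // extra or misheard word in the transcript: skip it
    } else {
      results.push({ word: a[i], matched: false });
      i++;
    }
  }
  return results;
}
```

For example, `compareWords("The cat sat down.", "the cat down")` marks `sat` as unmatched and the other three words as matched. Exact string matching is deliberately strict here; accents and near-homophones would need a more forgiving comparison.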

System Architecture

┌─────────────────────────────────────────────────────────────┐
│                     Frontend (React + Vite)                   │
│  - User Interface & State Management                         │
│  - MediaRecorder API for Audio Capture                       │
│  - Real-time Quiz & Practice Interfaces                      │
└──────────────────────┬────────────────────────────────────────┘
                       │ HTTP/REST API
┌──────────────────────▼────────────────────────────────────────┐
│              Backend (Node.js + Express)                      │
│  ┌────────────────────────────────────────────────────────┐  │
│  │  Agent Orchestration Layer                             │  │
│  │  - Routes requests to appropriate AI agents            │  │
│  │  - Manages agent workflows and data flow               │  │
│  └──────────────┬──────────────────────┬──────────────────┘  │
│                 │                      │                      │
│  ┌──────────────▼──────┐  ┌───────────▼──────────────┐    │
│  │  OpenAI API         │  │  ElevenLabs API            │    │
│  │  (GPT-5-mini)      │  │  - TTS (Text-to-Speech)   │    │
│  │  - Content Gen       │  │  - ASR (Speech Recog)     │    │
│  │  - Analysis          │  │  - Voice Synthesis        │    │
│  │  - Translation       │  └──────────────────────────┘    │
│  └─────────────────────┘                                    │
│  ┌────────────────────────────────────────────────────────┐ │
│  │  SQLite Database (better-sqlite3)                       │ │
│  │  - User accounts & preferences                           │ │
│  │  - Lesson history & vocabulary corpus                   │ │
│  │  - Quiz results & progress tracking                      │ │
│  │  - News articles cache                                   │ │
│  └────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────┘
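
One way to picture the Agent Orchestration Layer in the diagram is a registry that maps task names to agent functions. This is a minimal sketch under stated assumptions: the task names, payload fields, and the `dispatch` function are hypothetical, and the agents are stubs rather than real OpenAI or ElevenLabs calls.

```javascript
// Hypothetical agent registry: each agent is an async function taking a payload.
// Real agents would call the OpenAI or ElevenLabs APIs; these stubs only echo.
const agents = new Map([
  ["lesson", async ({ language, topic }) => ({ agent: "lesson", language, topic })],
  ["simplify", async ({ article }) => ({ agent: "simplify", length: article.length })],
  ["translate", async ({ text }) => ({ agent: "translate", text })],
]);

// Route a task to the matching agent, failing loudly on unknown task names.
async function dispatch(task, payload) {
  const agent = agents.get(task);
  if (!agent) {
    throw new Error(`Unknown agent task: ${task}`);
  }
  return agent(payload);
}
```

An Express route handler could then call `dispatch("lesson", { language, topic })` and return the result as JSON, keeping routing separate from agent logic.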

Key Technical Features

Agentic Workflow: The system uses multiple AI agents that communicate and coordinate to provide a seamless learning experience. Each agent specializes in a specific task (generation, analysis, simplification) and the orchestration layer manages their interactions.
Real-time Speech Processing: Audio is captured client-side using the Web MediaRecorder API, uploaded to the backend as multipart form data, transcribed by ElevenLabs ASR, and analyzed by the pronunciation agent for immediate feedback.
Intelligent Caching: News articles are cached per user per day to reduce API costs while maintaining freshness. The system intelligently determines when to regenerate content.
Progressive Learning Enforcement: The quiz system uses AI to generate questions from previous lessons, ensuring users master material before advancing—a form of adaptive learning that traditional apps lack.
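
The per-user, per-day news cache described above can be sketched as follows. This is a simplified illustration: it uses an in-memory `Map` where the project uses SQLite, and the function names, the key format, and the use of the UTC calendar day are all assumptions.

```javascript
// Build a cache key from the user id and the calendar day (UTC is an assumption).
function newsCacheKey(userId, date = new Date()) {
  const day = date.toISOString().slice(0, 10); // YYYY-MM-DD
  return `${userId}:${day}`;
}

// In-memory stand-in for the cache; the project stores articles in SQLite.
const newsCache = new Map();

// Return today's cached articles, or call `generate` once and store the result.
// `generate` is a hypothetical async function that produces the articles.
async function getDailyNews(userId, generate, date = new Date()) {
  const key = newsCacheKey(userId, date);
  if (!newsCache.has(key)) {
    newsCache.set(key, await generate(userId));
  }
  return newsCache.get(key);
}
```

Because the day is part of the key, the first request on a new day misses the cache and triggers regeneration, while repeat requests on the same day reuse the stored articles.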

Tools & Technologies

Backend Technologies

Tool	Purpose	Integration
Node.js	Runtime environment	Core server platform running Express application
Express.js	Web framework	RESTful API server handling all HTTP requests and routing
OpenAI API	Large Language Model	Primary AI agent for content generation, analysis, translation, and quiz creation. Integrated via `openai` npm package using the Responses API for structured outputs
ElevenLabs API	Speech Services	Integrated via `axios` and `form-data` for TTS (text-to-speech) reference audio generation and ASR (automatic speech recognition) for pronunciation transcription
better-sqlite3	Database	Embedded SQLite database for user data, lessons, vocabulary corpus, quizzes, and news articles. Provides synchronous, high-performance data persistence
multer	File Upload Handler	Middleware for processing multipart/form-data audio file uploads from the frontend
axios	HTTP Client	Used for making requests to ElevenLabs API endpoints (TTS and ASR)
form-data	Form Encoding	Constructs multipart form data for ElevenLabs API requests with audio files
cors	Cross-Origin Resource Sharing	Enables frontend (running on different port) to communicate with backend API
dotenv	Environment Variables	Loads API keys and configuration from `.env` file securely

Frontend Technologies

Tool	Purpose	Integration
React	UI Framework	Component-based architecture for building interactive user interfaces with hooks (useState, useEffect, useRef)
Vite	Build Tool	Fast development server and optimized production builds with HMR (Hot Module Replacement)
MediaRecorder API	Audio Capture	Browser-native API for recording user audio directly in the browser without external dependencies
localStorage	Client-side Storage	Persists user session data across page refreshes for seamless user experience

Development Tools

Tool	Purpose	Integration
nodemon	Development Server	Automatically restarts Node.js server on code changes during development
ESLint	Code Linting	Ensures code quality and consistency in React components

API Endpoints

Authentication

POST /api/auth/login - User login/registration
POST /api/auth/update-language - Update user's target language

Lesson Management

POST /api/practice - Generate personalized lesson (language + topic)
POST /api/lessons/save - Save lesson to user history
GET /api/lessons/:userId - Retrieve user's lesson history

Pronunciation Practice

POST /api/pronunciation - Full paragraph pronunciation analysis
POST /api/sentence-pronunciation - Sentence-level pronunciation with voice feedback

Quiz System

GET /api/quiz/required/:userId - Check if quiz is required (2+ lessons)
POST /api/quiz/generate - Generate quiz from last lesson
POST /api/quiz/validate - Validate quiz answers
POST /api/quiz/validate-speaking - Validate speaking practice for quiz

Vocabulary & Content

GET /api/corpus/:userId - Get user's vocabulary corpus
GET /api/news/:userId - Get personalized news feed in target language
POST /api/news/translate - Translate article to English
POST /api/news/sentence-translations - Get sentence-by-sentence translations
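
As an illustration of calling these endpoints, here is a hedged sketch of a client helper for `POST /api/practice`. The JSON field names (`language`, `topic`) and the helper names are assumptions based on the endpoint description; the response shape is not specified here, so it is returned as parsed JSON without further interpretation.

```javascript
// Mirrors the documented default; in the Vite client this would come from
// import.meta.env.VITE_API_BASE_URL instead.
const DEFAULT_API_BASE_URL = "http://localhost:5000";

// Build the fetch arguments for POST /api/practice.
// The body field names (language, topic) are assumptions.
function buildPracticeRequest(language, topic, baseUrl = DEFAULT_API_BASE_URL) {
  return {
    url: new URL("/api/practice", baseUrl).toString(),
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ language, topic }),
    },
  };
}

// Send the request and parse the JSON response.
async function requestLesson(language, topic, baseUrl) {
  const { url, options } = buildPracticeRequest(language, topic, baseUrl);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Lesson request failed with status ${response.status}`);
  }
  return response.json();
}
```

Splitting request construction from the network call keeps the first function easy to check without a running backend.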

Getting Started

Install dependencies
```
npm install
cd client && npm install
```
Environment variables

Duplicate .env.example (in the repo root) to .env and set:

| Key | Description |
| --- | --- |
| OPENAI_API_KEY | API key for the LLM you want to use. |
| OPENAI_MODEL | Optional override (defaults to gpt-5-mini). |
| ELEVENLABS_API_KEY | ElevenLabs API key. |
| ELEVENLABS_VOICE_ID | Voice for TTS reference audio. |
| ELEVENLABS_TTS_MODEL | ElevenLabs TTS model (default eleven_turbo_v2_5). |
| ELEVENLABS_ASR_MODEL | ElevenLabs ASR model (default scribe_v1). |
| PORT | Backend port (default 5000). |
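
The defaults in this table could be applied in code along the following lines. This is a sketch, not the project's actual startup code: `loadConfig` and its returned property names are hypothetical, and treating the three keys without a documented default as required is an assumption.

```javascript
// Read settings from an env-like object (for example process.env after dotenv
// has loaded .env), applying the documented defaults.
function loadConfig(env) {
  // Assumption: keys with no documented default must be present.
  const required = ["OPENAI_API_KEY", "ELEVENLABS_API_KEY", "ELEVENLABS_VOICE_ID"];
  const missing = required.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    openaiApiKey: env.OPENAI_API_KEY,
    openaiModel: env.OPENAI_MODEL || "gpt-5-mini",
    elevenLabsApiKey: env.ELEVENLABS_API_KEY,
    elevenLabsVoiceId: env.ELEVENLABS_VOICE_ID,
    elevenLabsTtsModel: env.ELEVENLABS_TTS_MODEL || "eleven_turbo_v2_5",
    elevenLabsAsrModel: env.ELEVENLABS_ASR_MODEL || "scribe_v1",
    port: Number(env.PORT) || 5000,
  };
}
```

Failing at startup when a key is missing surfaces configuration problems before the first API call rather than in the middle of a lesson request.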

Run the services

In two terminals:

   # Terminal 1 – backend
   npm run dev:server

   # Terminal 2 – frontend
   cd client
   npm run dev

The UI points to http://localhost:5000 by default. Override via VITE_API_BASE_URL.

Key Features

Personalized Lesson Generation: AI creates lessons based on your interests and target language
Intelligent Pronunciation Feedback: Word-level analysis with both text and voice guidance
Adaptive Learning Path: Quiz system ensures mastery before progression
Real-World Content: Daily news articles in your target language, automatically simplified
Vocabulary Corpus: Comprehensive tracking of all learned words with context
Cultural Insights: Learn about language origins, families, and cultural context
Sentence-by-Sentence Practice: Interactive speaking practice with immediate feedback
Grammar Concepts: AI-identified grammar points with English explanations

License

ISC

Built With

asr
elevenlabs
express.js
node.js
react
sqlite
tts
vite

Updates

Neh Majmudar started this project — Nov 18, 2025 04:06 PM EST
