Agentic AI Language Learning Studio
An intelligent, personalized language learning platform that leverages advanced AI agents to create adaptive, context-aware lessons tailored to each learner's interests and proficiency level. The system combines large language models (LLMs) for content generation and analysis with state-of-the-art speech recognition and synthesis to provide comprehensive pronunciation feedback, ensuring learners master vocabulary and speaking skills before progressing to new material.
The Problem We're Solving
Traditional language learning apps like Duolingo follow a rigid, one-size-fits-all curriculum that doesn't adapt to individual learning styles, interests, or real-world application needs. Learners are forced through predetermined lesson sequences regardless of their personal goals, whether they want to learn Spanish for cooking, Japanese for travel, or French for business. This approach leads to disengagement, as learners struggle to see relevance in generic vocabulary lists and disconnected exercises.
How We're Better Than Duolingo:
Personalized Content Generation: Unlike Duolingo's fixed curriculum, our AI agent generates lessons on-demand based on topics the learner actually cares about. Want to learn Spanish through football? Japanese through anime? The system creates contextually relevant content instantly.
Adaptive Progression System: While Duolingo allows users to skip ahead without mastery, our agentic system enforces comprehension through intelligent quizzes. Users must demonstrate proficiency in vocabulary and pronunciation from previous lessons before creating new ones, ensuring genuine learning rather than superficial progression.
Real-World Application: Our AI generates daily news articles in the target language, automatically simplified for the learner's level. This provides authentic, current content that keeps learners engaged with real-world language use, something Duolingo's scripted dialogues cannot match.
Comprehensive Pronunciation Feedback: Using ElevenLabs' advanced ASR, we provide granular, word-level pronunciation analysis with both textual and voice feedback. Duolingo's pronunciation checks are binary (pass/fail) without detailed guidance on what went wrong.
Intelligent Vocabulary Management: Our system maintains a personalized vocabulary corpus that grows with each lesson, so learners can review every word they have encountered. It also records the lessons and contexts in which each word appeared, building a record of the learner's journey.
Cultural Context Integration: Each lesson includes AI-generated cultural insights, language family information, and usage tips that help learners understand not just what to say, but why and when. Duolingo's gamified approach often lacks this kind of context.
Technical Architecture
This is an agentic AI application where multiple specialized AI agents work collaboratively to provide a comprehensive learning experience:
Agentic AI Components
Lesson Generation Agent (OpenAI GPT-5-mini)
- Analyzes user input (language + topic) to generate beginner-appropriate paragraphs (~100 words)
- Extracts key vocabulary with translations and usage tips
- Identifies grammar concepts with English explanations
- Generates sentence-by-sentence translations for comprehension
- Creates cultural facts and language family information
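Because the lesson agent returns several distinct parts, a structured-output request keeps them machine-readable. The following is a minimal sketch of how such a request could be built for the Responses API's JSON-schema structured outputs. The schema field names (paragraph, vocabulary, grammar, sentences, culturalFact, languageFamily) and the helpers buildLessonRequest and parseLesson are hypothetical, not taken from the project's code:

```javascript
// Hypothetical lesson schema: field names are illustrative, not the project's.
const stringField = { type: "string" };

// Strict structured outputs need every property listed in `required`
// and `additionalProperties: false` on every object.
function strictObject(properties) {
  return {
    type: "object",
    properties,
    required: Object.keys(properties),
    additionalProperties: false,
  };
}

const lessonSchema = strictObject({
  paragraph: stringField,
  vocabulary: {
    type: "array",
    items: strictObject({ word: stringField, translation: stringField, tip: stringField }),
  },
  grammar: {
    type: "array",
    items: strictObject({ concept: stringField, explanation: stringField }),
  },
  sentences: {
    type: "array",
    items: strictObject({ source: stringField, translation: stringField }),
  },
  culturalFact: stringField,
  languageFamily: stringField,
});

// Builds the parameter object for a Responses API call
// (e.g. client.responses.create(params) with the openai npm package).
function buildLessonRequest(language, topic, model = "gpt-5-mini") {
  return {
    model,
    instructions:
      "You write beginner language lessons. Return only JSON that matches the schema.",
    input: `Write a beginner paragraph of about 100 words in ${language} about ${topic}.`,
    text: {
      format: { type: "json_schema", name: "lesson", schema: lessonSchema, strict: true },
    },
  };
}

// Parses the model's JSON text and checks that the top-level keys are present.
function parseLesson(outputText) {
  const lesson = JSON.parse(outputText);
  for (const key of lessonSchema.required) {
    if (!(key in lesson)) throw new Error(`Lesson is missing "${key}"`);
  }
  return lesson;
}
```

With the openai package, the parameters could be passed to `client.responses.create(...)` and the response's `output_text` handed to `parseLesson`. The `strictObject` helper exists only because strict mode rejects schemas that leave properties optional or allow extra keys.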
Pronunciation Analysis Agent (OpenAI GPT-5-mini + ElevenLabs ASR)
- Receives transcribed audio from ElevenLabs ASR
- Compares user pronunciation against target text
- Provides detailed, constructive feedback on mispronunciations
- Generates voice feedback using ElevenLabs TTS to guide corrections
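In the description above, the LLM compares the ASR transcript with the target text. One way to hand the model a structured starting point is a word-level edit-distance alignment that labels each target word as ok, mispronounced, or missing and flags extra words. This is a minimal sketch of that swapped-in technique, not the project's method; tokenize and alignWords are hypothetical names. Because it compares what the ASR recognized rather than the actual sounds, it only approximates pronunciation errors:

```javascript
// Lowercase and strip punctuation so "¡Hola," and "hola" compare equal.
function tokenize(text) {
  return text
    .toLowerCase()
    .split(/\s+/)
    .map((w) => w.replace(/[^\p{L}\p{N}']/gu, ""))
    .filter(Boolean);
}

// Aligns target words with heard words using word-level edit distance
// and returns one entry per alignment step, in reading order.
function alignWords(targetText, transcriptText) {
  const t = tokenize(targetText);
  const h = tokenize(transcriptText);
  const n = t.length;
  const m = h.length;
  // dp[i][j] = edit distance between the first i target words and first j heard words
  const dp = Array.from({ length: n + 1 }, () => new Array(m + 1).fill(0));
  for (let i = 0; i <= n; i++) dp[i][0] = i;
  for (let j = 0; j <= m; j++) dp[0][j] = j;
  for (let i = 1; i <= n; i++) {
    for (let j = 1; j <= m; j++) {
      const cost = t[i - 1] === h[j - 1] ? 0 : 1;
      dp[i][j] = Math.min(dp[i - 1][j - 1] + cost, dp[i - 1][j] + 1, dp[i][j - 1] + 1);
    }
  }
  // Walk back from the bottom-right cell to recover one optimal alignment.
  const steps = [];
  let i = n;
  let j = m;
  while (i > 0 || j > 0) {
    const same = i > 0 && j > 0 && t[i - 1] === h[j - 1];
    if (i > 0 && j > 0 && dp[i][j] === dp[i - 1][j - 1] + (same ? 0 : 1)) {
      steps.push(same
        ? { status: "ok", target: t[i - 1] }
        : { status: "mispronounced", target: t[i - 1], heard: h[j - 1] });
      i--;
      j--;
    } else if (i > 0 && dp[i][j] === dp[i - 1][j] + 1) {
      steps.push({ status: "missing", target: t[i - 1] });
      i--;
    } else {
      steps.push({ status: "extra", heard: h[j - 1] });
      j--;
    }
  }
  return steps.reverse();
}
```

For example, aligning "Me gusta el fútbol." with the transcript "me gusta futbol" yields ok, ok, missing ("el"), and mispronounced ("fútbol" heard as "futbol"). That list could then be included in the prompt for the feedback agent.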
Content Simplification Agent (OpenAI GPT-5-mini)
- Analyzes news articles for complexity
- Automatically generates simplified versions for language learners
- Maintains meaning while reducing vocabulary and sentence complexity
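Before calling the simplification agent, a cheap pre-check could decide whether an article needs simplifying at all. The sketch below uses two rough, swapped-in heuristics: average words per sentence and the share of words with nine or more letters. The thresholds and the names complexityStats and needsSimplification are hypothetical and would need tuning per language:

```javascript
// Rough complexity metrics; assumes words are separated by spaces.
function complexityStats(text) {
  const sentences = text.split(/(?<=[.!?])\s+/).filter((s) => s.trim().length > 0);
  const words = text
    .split(/\s+/)
    .map((w) => w.replace(/[^\p{L}\p{N}]/gu, ""))
    .filter(Boolean);
  // Count code points so accented letters are measured correctly.
  const longWords = words.filter((w) => [...w].length >= 9).length;
  return {
    sentences: sentences.length,
    avgWordsPerSentence: sentences.length ? words.length / sentences.length : 0,
    longWordRatio: words.length ? longWords / words.length : 0,
  };
}

// Hypothetical thresholds for routing an article to the simplification agent.
function needsSimplification(text, { maxAvgWords = 12, maxLongWordRatio = 0.2 } = {}) {
  const stats = complexityStats(text);
  return stats.avgWordsPerSentence > maxAvgWords || stats.longWordRatio > maxLongWordRatio;
}
```

Because it splits on spaces, this heuristic does not suit languages such as Japanese or Chinese, where the LLM's own judgment would have to be used instead.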
Quiz Generation Agent (OpenAI GPT-5-mini)
- Creates multiple-choice vocabulary questions from previous lessons
- Selects appropriate sentences for speaking practice
- Validates answers and provides feedback
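The app has GPT-5-mini write the questions. For comparison, the multiple-choice format itself can be sketched deterministically from a lesson's vocabulary list, using other words' translations as distractors. The { word, translation } shape and the names shuffle, buildVocabQuiz and checkAnswer are hypothetical:

```javascript
// Fisher-Yates shuffle; `random` is injectable so tests can be deterministic.
function shuffle(items, random = Math.random) {
  const a = [...items];
  for (let i = a.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [a[i], a[j]] = [a[j], a[i]];
  }
  return a;
}

// One question per vocabulary entry: the correct translation plus up to
// `numDistractors` distinct translations of other words.
function buildVocabQuiz(vocabulary, { numDistractors = 3, random = Math.random } = {}) {
  return vocabulary.map((entry) => {
    const pool = [...new Set(vocabulary.map((v) => v.translation))]
      .filter((t) => t !== entry.translation);
    const distractors = shuffle(pool, random).slice(0, numDistractors);
    return {
      prompt: `What does "${entry.word}" mean?`,
      options: shuffle([entry.translation, ...distractors], random),
      answer: entry.translation,
    };
  });
}

// Validates a chosen option and reports the expected answer for feedback.
function checkAnswer(question, choice) {
  return { correct: choice === question.answer, expected: question.answer };
}
```

Drawing distractors from the same lesson keeps them plausible, since they share the lesson's topic.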
Translation Agent (OpenAI GPT-5-mini)
- Provides full article translations
- Generates sentence-by-sentence translations for detailed comprehension
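Sentence-by-sentence translation first needs reliable sentence boundaries. Here is a minimal sketch using the built-in Intl.Segmenter (Node.js 16+ and modern browsers). The per-sentence `translate` function is a hypothetical stand-in for the LLM call:

```javascript
// Splits text into sentences using locale-aware boundary rules from Intl.Segmenter.
function splitSentences(text, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: "sentence" });
  return Array.from(segmenter.segment(text), (s) => s.segment.trim()).filter(Boolean);
}

// Pairs each sentence with the result of `translate`, an async function supplied
// by the caller (in the app this would be a request to the translation agent).
async function sentenceBySentence(text, locale, translate) {
  const sentences = splitSentences(text, locale);
  const translations = await Promise.all(sentences.map((s) => translate(s)));
  return sentences.map((source, i) => ({ source, translation: translations[i] }));
}
```

Segmenting locally rather than asking the model to split the text means the source sentences shown to the learner match the original article exactly.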
System Architecture
┌─────────────────────────────────────────────────────────────┐
│  Frontend (React + Vite)                                    │
│  - User Interface & State Management                        │
│  - MediaRecorder API for Audio Capture                      │
│  - Real-time Quiz & Practice Interfaces                     │
└──────────────────────────────┬──────────────────────────────┘
                               │ HTTP/REST API
┌──────────────────────────────▼──────────────────────────────┐
│  Backend (Node.js + Express)                                │
│  ┌───────────────────────────────────────────────────────┐  │
│  │  Agent Orchestration Layer                            │  │
│  │  - Routes requests to appropriate AI agents           │  │
│  │  - Manages agent workflows and data flow              │  │
│  └───────────┬────────────────────────────┬──────────────┘  │
│              │                            │                 │
│  ┌───────────▼───────────┐  ┌─────────────▼──────────────┐  │
│  │ OpenAI API            │  │ ElevenLabs API             │  │
│  │ (GPT-5-mini)          │  │ - TTS (Text-to-Speech)     │  │
│  │ - Content Gen         │  │ - ASR (Speech Recog)       │  │
│  │ - Analysis            │  │ - Voice Synthesis          │  │
│  │ - Translation         │  │                            │  │
│  └───────────────────────┘  └────────────────────────────┘  │
│  ┌───────────────────────────────────────────────────────┐  │
│  │  SQLite Database (better-sqlite3)                     │  │
│  │  - User accounts & preferences                        │  │
│  │  - Lesson history & vocabulary corpus                 │  │
│  │  - Quiz results & progress tracking                   │  │
│  │  - News articles cache                                │  │
│  └───────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
Key Technical Features
Agentic Workflow: The system uses multiple AI agents that communicate and coordinate to provide a seamless learning experience. Each agent specializes in a specific task (generation, analysis, simplification) and the orchestration layer manages their interactions.
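The orchestration layer's routing role can be pictured as a registry that maps task names to agents, with workflows chaining one agent's output into the next. This is a minimal sketch assuming each agent is an async function of a payload; the task names, createOrchestrator, pronunciationWorkflow and the stub agents (standing in for the ElevenLabs and OpenAI calls) are hypothetical:

```javascript
// Returns a dispatcher that routes a named task to its registered agent.
function createOrchestrator(agents) {
  return async function run(task, payload) {
    // Only own properties count, so names like "toString" are not treated as agents.
    if (!Object.prototype.hasOwnProperty.call(agents, task)) {
      throw new Error(`No agent registered for task "${task}"`);
    }
    return agents[task](payload);
  };
}

// A workflow chains agents: the ASR output becomes the analysis input.
async function pronunciationWorkflow(run, audio, targetText) {
  const { transcript } = await run("transcribe", { audio });
  return run("analyzePronunciation", { targetText, transcript });
}

// Stub agents for illustration only; real ones would call external APIs.
const run = createOrchestrator({
  transcribe: async () => ({ transcript: "hola amigo" }),
  analyzePronunciation: async ({ targetText, transcript }) => ({
    exactMatch: targetText.toLowerCase() === transcript.toLowerCase(),
  }),
});
```

Keeping agents behind a single `run` interface makes it straightforward to swap a stub for a real API-backed agent in tests.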
Real-time Speech Processing: Audio is captured client-side with the browser's MediaRecorder API, uploaded to the backend as multipart form data (parsed by multer), transcribed by ElevenLabs ASR, and analyzed by the pronunciation agent for immediate feedback.
Intelligent Caching: News articles are cached per user per day to reduce API costs while keeping content fresh: a user's feed is regenerated only when no cached entry exists for the current day.
Progressive Learning Enforcement: The quiz system uses AI to generate questions from previous lessons, ensuring users master material before advancing. This is a form of adaptive learning that traditional apps lack.
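The upload-and-transcribe step could look roughly like the following. The project uses axios and form-data; this sketch swaps in the built-in fetch, FormData, Blob and Request of Node.js 18+. The endpoint URL, the xi-api-key header and the model_id/file form fields are assumptions based on ElevenLabs' public speech-to-text API and should be verified against their current documentation. The audio/webm default assumes the browser's MediaRecorder output, and buildTranscriptionRequest and transcribe are hypothetical names:

```javascript
// Builds (without sending) a multipart speech-to-text request.
// URL, header and field names are assumptions to verify against ElevenLabs' docs.
function buildTranscriptionRequest(audio, { apiKey, modelId = "scribe_v1", mimeType = "audio/webm" }) {
  const form = new FormData();
  form.append("model_id", modelId);
  form.append("file", new Blob([audio], { type: mimeType }), "recording.webm");
  return new Request("https://api.elevenlabs.io/v1/speech-to-text", {
    method: "POST",
    headers: { "xi-api-key": apiKey },
    body: form,
  });
}

// Sends the request and returns the parsed JSON response.
async function transcribe(audio, options) {
  const response = await fetch(buildTranscriptionRequest(audio, options));
  if (!response.ok) {
    throw new Error(`Transcription failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

In an Express handler, `audio` could be the `req.file.buffer` that multer provides when configured with memory storage. Separating request construction from sending keeps the former testable without network access.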
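Caching per user per day amounts to keying entries by user ID and calendar date. A minimal in-memory sketch follows (the app itself stores articles in SQLite); the class name DailyCache, the UTC day boundary and the promise-based deduplication are assumptions:

```javascript
// In-memory stand-in for the per-user, per-day news cache.
class DailyCache {
  constructor() {
    this.entries = new Map(); // "userId:YYYY-MM-DD" -> Promise of the cached value
  }

  // Calendar day in UTC; a different day produces a different key.
  static key(userId, date) {
    return `${userId}:${date.toISOString().slice(0, 10)}`;
  }

  // Returns today's value for the user, calling `create` only on a cache miss.
  getOrCreate(userId, create, now = new Date()) {
    const key = DailyCache.key(userId, now);
    if (!this.entries.has(key)) {
      const pending = Promise.resolve().then(create);
      // Forget failed generations so the next request can retry.
      pending.catch(() => this.entries.delete(key));
      this.entries.set(key, pending);
    }
    return this.entries.get(key);
  }
}
```

Storing the pending promise means two requests that arrive at the same time share one generation call. Entries from earlier days are never evicted in this sketch.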
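The gating rule can be expressed as a small predicate. This sketch assumes a quiz becomes required once the learner has at least two lessons (matching the "2+ lessons" note on the quiz endpoint) and stays required until the latest lesson has a passing result. The 0.8 pass mark, the record shapes and the name isQuizRequired are hypothetical:

```javascript
// lessons: [{ id, createdAt }], quizResults: [{ lessonId, score }] with score in 0..1
function isQuizRequired(lessons, quizResults, { minLessons = 2, passScore = 0.8 } = {}) {
  if (lessons.length < minLessons) return false;
  // The most recent lesson is the one the quiz is generated from.
  const latest = lessons.reduce((a, b) => (b.createdAt > a.createdAt ? b : a));
  const passed = quizResults.some((r) => r.lessonId === latest.id && r.score >= passScore);
  return !passed;
}
```

A lesson-creation route could call this first and refuse to generate a new lesson while it returns true.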
Tools & Technologies
Backend Technologies
| Tool | Purpose | Integration |
|---|---|---|
| Node.js | Runtime environment | Core server platform running Express application |
| Express.js | Web framework | RESTful API server handling all HTTP requests and routing |
| OpenAI API | Large Language Model | Primary AI agent for content generation, analysis, translation, and quiz creation. Integrated via openai npm package using the Responses API for structured outputs |
| ElevenLabs API | Speech Services | Integrated via axios and form-data for TTS (text-to-speech) reference audio generation and ASR (automatic speech recognition) for pronunciation transcription |
| better-sqlite3 | Database | Embedded SQLite database for user data, lessons, vocabulary corpus, quizzes, and news articles. Provides synchronous, high-performance data persistence |
| multer | File Upload Handler | Middleware for processing multipart/form-data audio file uploads from the frontend |
| axios | HTTP Client | Used for making requests to ElevenLabs API endpoints (TTS and ASR) |
| form-data | Form Encoding | Constructs multipart form data for ElevenLabs API requests with audio files |
| cors | Cross-Origin Resource Sharing | Enables the frontend (running on a different port) to communicate with the backend API |
| dotenv | Environment Variables | Loads API keys and configuration from the .env file into process.env |
Frontend Technologies
| Tool | Purpose | Integration |
|---|---|---|
| React | UI Framework | Component-based architecture for building interactive user interfaces with hooks (useState, useEffect, useRef) |
| Vite | Build Tool | Fast development server and optimized production builds with HMR (Hot Module Replacement) |
| MediaRecorder API | Audio Capture | Browser-native API for recording user audio directly in the browser without external dependencies |
| localStorage | Client-side Storage | Persists user session data across page refreshes for seamless user experience |
Development Tools
| Tool | Purpose | Integration |
|---|---|---|
| nodemon | Development Server | Automatically restarts Node.js server on code changes during development |
| ESLint | Code Linting | Ensures code quality and consistency in React components |
API Endpoints
Authentication
- POST /api/auth/login - User login/registration
- POST /api/auth/update-language - Update user's target language
Lesson Management
- POST /api/practice - Generate personalized lesson (language + topic)
- POST /api/lessons/save - Save lesson to user history
- GET /api/lessons/:userId - Retrieve user's lesson history
Pronunciation Practice
- POST /api/pronunciation - Full paragraph pronunciation analysis
- POST /api/sentence-pronunciation - Sentence-level pronunciation with voice feedback
Quiz System
- GET /api/quiz/required/:userId - Check if quiz is required (2+ lessons)
- POST /api/quiz/generate - Generate quiz from last lesson
- POST /api/quiz/validate - Validate quiz answers
- POST /api/quiz/validate-speaking - Validate speaking practice for quiz
Vocabulary & Content
- GET /api/corpus/:userId - Get user's vocabulary corpus
- GET /api/news/:userId - Get personalized news feed in target language
- POST /api/news/translate - Translate article to English
- POST /api/news/sentence-translations - Get sentence-by-sentence translations
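A vocabulary corpus that remembers where each word appeared could be modeled as a map from word to its translation and contexts. This is a minimal in-memory sketch (the app persists the corpus in SQLite); the class and method names and the { word, translation } shape are hypothetical:

```javascript
// Splits a sentence into lowercase word tokens (letters, digits, apostrophes).
function wordsOf(sentence) {
  return sentence.toLocaleLowerCase().match(/[\p{L}\p{N}']+/gu) ?? [];
}

class VocabularyCorpus {
  constructor() {
    this.words = new Map(); // lowercase word -> { word, translation, contexts }
  }

  // Records each vocabulary word and every lesson sentence that contains it.
  addLesson(lessonId, vocabulary, sentences) {
    const tokenized = sentences.map((s) => ({ sentence: s, tokens: new Set(wordsOf(s)) }));
    for (const { word, translation } of vocabulary) {
      const key = word.toLocaleLowerCase();
      if (!this.words.has(key)) this.words.set(key, { word, translation, contexts: [] });
      const entry = this.words.get(key);
      for (const { sentence, tokens } of tokenized) {
        if (tokens.has(key)) entry.contexts.push({ lessonId, sentence });
      }
    }
  }

  lookup(word) {
    return this.words.get(word.toLocaleLowerCase());
  }

  get size() {
    return this.words.size;
  }
}
```

Matching whole tokens avoids false hits such as "gol" inside "golpe", but multi-word entries like "por favor" would need phrase matching instead.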
Getting Started
Install dependencies
npm install
cd client && npm install
Environment variables
Duplicate .env.example (in the repo root) to .env and set:
| Key | Description |
| --- | --- |
| OPENAI_API_KEY | API key for the LLM you want to use. |
| OPENAI_MODEL | Optional override (defaults to gpt-5-mini). |
| ELEVENLABS_API_KEY | ElevenLabs API key. |
| ELEVENLABS_VOICE_ID | Voice for TTS reference audio. |
| ELEVENLABS_TTS_MODEL | ElevenLabs TTS model (default eleven_turbo_v2_5). |
| ELEVENLABS_ASR_MODEL | ElevenLabs ASR model (default scribe_v1). |
| PORT | Backend port (default 5000). |
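On the backend, these variables could be read once at startup with the documented defaults applied. This is a minimal sketch; the function name loadConfig and the choice to treat the two API keys and the voice ID as required are assumptions:

```javascript
// Reads the variables from the table above, applying the documented defaults
// and failing fast when a required key is missing.
function loadConfig(env = process.env) {
  const required = ["OPENAI_API_KEY", "ELEVENLABS_API_KEY", "ELEVENLABS_VOICE_ID"];
  const missing = required.filter((k) => !env[k]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    openaiApiKey: env.OPENAI_API_KEY,
    openaiModel: env.OPENAI_MODEL || "gpt-5-mini",
    elevenLabsApiKey: env.ELEVENLABS_API_KEY,
    voiceId: env.ELEVENLABS_VOICE_ID,
    ttsModel: env.ELEVENLABS_TTS_MODEL || "eleven_turbo_v2_5",
    asrModel: env.ELEVENLABS_ASR_MODEL || "scribe_v1",
    port: Number(env.PORT) || 5000,
  };
}
```

Calling `require("dotenv").config()` before `loadConfig()` populates process.env from the .env file.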
Run the services
In two terminals:
# Terminal 1 – backend
npm run dev:server
# Terminal 2 – frontend
cd client
npm run dev
The UI sends API requests to http://localhost:5000 by default; set VITE_API_BASE_URL to override this.
Key Features
- Personalized Lesson Generation: AI creates lessons based on your interests and target language
- Intelligent Pronunciation Feedback: Word-level analysis with both text and voice guidance
- Adaptive Learning Path: Quiz system ensures mastery before progression
- Real-World Content: Daily news articles in your target language, automatically simplified
- Vocabulary Corpus: Comprehensive tracking of all learned words with context
- Cultural Insights: Learn about language origins, families, and cultural context
- Sentence-by-Sentence Practice: Interactive speaking practice with immediate feedback
- Grammar Concepts: AI-identified grammar points with English explanations
License
ISC
Built With
- asr
- elevenlabs
- express.js
- node.js
- react
- sqlite
- tts
- vite