Inspiration

ML-E grew out of a gap we kept noticing: machine learning literacy matters more every year, yet high school students have few accessible, personalized tools for learning it. Traditional textbooks and online courses often fail to provide the interactive, conversational learning experience that helps students truly understand complex ML concepts.

We envisioned an AI tutor that could:

  • Explain machine learning concepts in age-appropriate language
  • Provide instant, personalized responses to student questions
  • Remember previous conversations to build upon prior knowledge
  • Optimize learning efficiency through intelligent response caching

What We Learned

Throughout the development of ML-E, we gained valuable insights into:

  1. Intelligent Caching Systems: Implementing multi-level duplicate detection taught us about the importance of optimizing API usage while maintaining response quality. We learned that even slight variations in questions (like "What is ML?" vs "What is machine learning?") should be treated as duplicates to provide consistent learning experiences.

  2. Real-time Communication Architecture: Building a robust WebSocket-based chat system showed us the complexities of maintaining session state, handling connection failures, and ensuring message persistence across user navigation.

  3. Educational AI Design: We discovered that effective AI tutoring requires careful prompt engineering to ensure responses are:

    • Age-appropriate for high school students (grades 9-10)
    • Conversational and engaging
    • Technically accurate yet accessible
    • Building upon previous conversation context

  4. Data Persistence Strategies: Implementing dual storage (MongoDB + Redis) taught us about balancing performance with reliability, ensuring that user conversations are never lost while maintaining fast response times.

How We Built It

We built ML-E in five phases:

Phase 1: Foundation (Authentication & Basic UI)

  • Implemented secure JWT-based authentication system
  • Created responsive React frontend with clean, student-friendly interface
  • Established MongoDB database with user management

Phase 2: Real-time Chat System

  • Built WebSocket-based communication using Socket.io
  • Integrated OpenAI GPT-3.5-turbo for educational responses
  • Implemented grade-aware prompting system
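
The chat wiring in this phase can be sketched roughly as follows. This is an illustration rather than ML-E's actual server code: the event names (`chat:question`, `chat:answer`, `chat:error`) and the `getAnswer` callback are hypothetical, and `io` is assumed to be a Socket.IO server instance created elsewhere.

```javascript
// Sketch of server-side chat wiring. `io` is expected to be a Socket.IO
// server; event names and `getAnswer` are hypothetical placeholders.
function registerChatHandlers(io, getAnswer) {
  io.on("connection", (socket) => {
    socket.on("chat:question", async (payload) => {
      const { sessionId, grade, text } = payload || {};
      try {
        // getAnswer hides the cache lookup and the model call.
        const answer = await getAnswer({ sessionId, grade, text });
        socket.emit("chat:answer", { sessionId, answer });
      } catch (err) {
        // Report the failure instead of leaving the student waiting.
        socket.emit("chat:error", {
          sessionId,
          message: "Could not generate a response.",
        });
      }
    });
  });
}
```

Keeping the answer logic behind a callback means the handler can be exercised with simple stand-in objects, without a live socket connection.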

Phase 3: Intelligence Layer

  • Developed multi-level duplicate detection algorithm: Current Session → Recent Sessions → Redis Cache → OpenAI API
  • Created word-overlap similarity matching, where $W_1$ and $W_2$ are the word sets of questions $Q_1$ and $Q_2$: $$\text{Similarity}(Q_1, Q_2) = \frac{|W_1 \cap W_2|}{\max(|W_1|, |W_2|)}$$
  • Implemented adaptive thresholds (80% for short questions, 70% for longer ones)
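
As a worked example of the similarity score and thresholds in this phase (the two questions are invented for illustration), take $Q_1$ = "How does a neural network learn?" and $Q_2$ = "How does a neural network actually learn?". Writing $W_1$ and $W_2$ for their word sets, all six words of $W_1$ also appear in $W_2$, which has seven words:

$$\text{Similarity}(Q_1, Q_2) = \frac{|W_1 \cap W_2|}{\max(|W_1|, |W_2|)} = \frac{6}{\max(6, 7)} = \frac{6}{7} \approx 0.86$$

Both questions are longer than three words, so the 70% threshold applies, and since $0.86 \geq 0.70$ the pair counts as a duplicate.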

Phase 4: Persistence & Analytics

  • Enhanced session management with MongoDB storage
  • Built comprehensive learning analytics system
  • Created progress tracking and visualization dashboard

Phase 5: Optimization & Polish

  • Removed unnecessary UI elements (connection status indicators)
  • Optimized response caching with clear user indicators
  • Implemented cross-session conversation continuity

Challenges We Faced

1. Session Persistence Complexity

Challenge: Users lost their chat history when navigating between pages.

Solution: Implemented session management on both the frontend and the backend:

  • Frontend: localStorage-based session service with automatic recovery
  • Backend: Dual storage strategy (MongoDB + Redis) with session continuity logic
  • Result: Seamless conversation persistence across all navigation
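
The frontend half of this approach might look like the sketch below. It is a minimal illustration, not ML-E's actual service: the storage key, the session shape, and the timestamp-based session id are assumptions, and `storage` stands for any object with `getItem`/`setItem` (such as `window.localStorage` in a browser).

```javascript
// Sketch of a localStorage-backed session service with recovery.
// The key name and session shape are assumptions for illustration.
const SESSION_KEY = "mle.chatSession";

function createSessionService(storage) {
  // Return the stored session if it parses and has the expected shape.
  function load() {
    try {
      const raw = storage.getItem(SESSION_KEY);
      const parsed = raw ? JSON.parse(raw) : null;
      if (
        parsed &&
        typeof parsed.sessionId === "string" &&
        Array.isArray(parsed.messages)
      ) {
        return parsed;
      }
    } catch (err) {
      // Corrupted JSON: fall through and start a fresh session.
    }
    return null;
  }

  function save(session) {
    storage.setItem(SESSION_KEY, JSON.stringify(session));
    return session;
  }

  // Recover the stored session, or start a new one if none is usable.
  function getOrCreate() {
    return load() || save({ sessionId: `s-${Date.now()}`, messages: [] });
  }

  // Persist every message immediately so navigation never loses it.
  function appendMessage(role, text) {
    const session = getOrCreate();
    session.messages.push({ role, text, at: Date.now() });
    return save(session);
  }

  return { getOrCreate, appendMessage };
}
```

In a browser this would be created once with `createSessionService(window.localStorage)`; a page that mounts later calls `getOrCreate()` and picks up the same conversation.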

2. Duplicate Detection Accuracy

Challenge: Determining when questions are "similar enough" to use cached responses.

Mathematical Approach: We developed an adaptive similarity algorithm:

For questions $Q_1$ and $Q_2$ with word sets $W_1$ and $W_2$:

$$\text{Similarity}(Q_1, Q_2) = \frac{|W_1 \cap W_2|}{\max(|W_1|, |W_2|)}$$

With adaptive thresholds:

  • Short questions (≤3 words): $\text{threshold} = 0.8$
  • Longer questions (>3 words): $\text{threshold} = 0.7$

Result: 95%+ accuracy in duplicate detection, significantly reducing API costs.
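
A minimal sketch of this scoring in JavaScript is shown below. Two details are assumptions, because the write-up does not specify them: questions are lowercased and split on non-alphanumeric characters, and "short" is judged on the larger of the two word sets. The function names are hypothetical.

```javascript
// Sketch of the word-overlap similarity with adaptive thresholds.
// Assumption: lowercase the question and keep runs of letters/digits.
function toWordSet(question) {
  const words = question.toLowerCase().match(/[a-z0-9]+/g) || [];
  return new Set(words);
}

// |W1 ∩ W2| / max(|W1|, |W2|)
function similarity(q1, q2) {
  const w1 = toWordSet(q1);
  const w2 = toWordSet(q2);
  const larger = Math.max(w1.size, w2.size);
  if (larger === 0) return 0; // two empty questions: treat as not similar
  let common = 0;
  for (const word of w1) {
    if (w2.has(word)) common += 1;
  }
  return common / larger;
}

// Assumption: the threshold is chosen from the larger word set.
function isDuplicate(q1, q2) {
  const size = Math.max(toWordSet(q1).size, toWordSet(q2).size);
  const threshold = size <= 3 ? 0.8 : 0.7;
  return similarity(q1, q2) >= threshold;
}
```

Under these assumptions, "What is ML?" and "What is machine learning?" share two of at most four words and score 0.5, so matching abbreviations to their expansions would need an extra normalization step that this sketch leaves out.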

3. Real-time Performance Optimization

Challenge: Balancing response speed with system reliability.

Solution: Multi-tier caching strategy:

  1. Level 1: Current session MongoDB check (~50ms)
  2. Level 2: Cross-session MongoDB search (~100ms)
  3. Level 3: Redis fallback cache (~20ms)
  4. Level 4: OpenAI API call (2-5 seconds)

Result: 70%+ of responses served from cache in <100ms.
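
The lookup order above can be sketched as a simple fall-through chain. This is an illustration under assumptions: each tier is a hypothetical object with a `name` and an async `lookup` that resolves to an answer or `null`, and `askModel` stands in for the OpenAI call.

```javascript
// Sketch of a multi-level lookup chain. Tiers are checked in order;
// the first hit wins, and only a full miss reaches the model.
async function answerQuestion(question, tiers, askModel) {
  for (const tier of tiers) {
    const cached = await tier.lookup(question);
    if (cached !== null && cached !== undefined) {
      // `cached: true` lets the UI mark the response as served from cache.
      return { answer: cached, source: tier.name, cached: true };
    }
  }
  // Every cache level missed: fall through to the slow model call.
  const answer = await askModel(question);
  return { answer, source: "openai", cached: false };
}
```

Passing the tiers in as a list keeps the order (current session, recent sessions, Redis) in one place, so a level can be reordered or dropped without touching the loop.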

4. Educational Content Quality

Challenge: Ensuring AI responses are educationally appropriate and engaging.

Solution: Developed grade-aware prompting system:

// Grade-specific prompts, keyed by grade level (prompt text abbreviated)
const gradePrompts = {
  9: "Explain like I'm a 9th grader with basic math knowledge...",
  10: "Explain for a 10th grader who understands algebra..."
};

Result: Consistently age-appropriate, engaging educational content.
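
One way such grade prompts could be combined with conversation context is sketched below. The helper `buildMessages` and its `maxHistory` parameter are hypothetical, the prompt strings are copied in their abbreviated form, and the output uses the role/content message shape of chat-style completion APIs.

```javascript
// Prompt text is kept abbreviated exactly as in the snippet above.
const gradePrompts = {
  9: "Explain like I'm a 9th grader with basic math knowledge...",
  10: "Explain for a 10th grader who understands algebra...",
};

// Hypothetical helper: grade prompt + recent history + new question.
// `history` is an array of { role, content } messages, oldest first.
function buildMessages(grade, history, question, maxHistory = 6) {
  const systemPrompt = gradePrompts[grade];
  if (!systemPrompt) {
    throw new Error(`No prompt defined for grade ${grade}`);
  }
  // Keep only the most recent turns so the request stays short.
  const recent = maxHistory > 0 ? history.slice(-maxHistory) : [];
  return [
    { role: "system", content: systemPrompt },
    ...recent,
    { role: "user", content: question },
  ];
}
```

Failing loudly on an unknown grade is a deliberate choice here: silently falling back to another grade's prompt could produce answers pitched at the wrong level.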
