Inspiration

Every CS student faces the same struggle: drowning in textbooks, scattered YouTube tutorials, disconnected coding problems, and no personalized guidance. We've all been there - spending hours searching for explanations across multiple sources, trying to connect the dots ourselves.

I built AI Study Mentor to be the tutor I wish I had - one that understands YOUR materials, adapts to YOUR learning style, and helps you master both theory AND coding practice in one unified platform.

The "aha moment" came when I realized: what if an AI could read all my textbooks, watch all my lecture videos, analyze my GitHub repos, AND help me practice LeetCode problems - all while explaining concepts using Socratic teaching methods? That's when AI Study Mentor was born.

What it does

AI Study Mentor is a comprehensive learning platform that combines:

🎓 MULTI-SOURCE RAG SYSTEM

  • Upload PDFs (textbooks, notes, papers)
  • Paste YouTube video URLs (auto-fetches transcripts with timestamps)
  • Import GitHub repositories (processes READMEs and code documentation)
  • Upload lecture recordings (auto-transcribes with Whisper AI)
  • All sources unified into one searchable knowledge base (12,000+ chunks)

🤖 ADAPTIVE TUTORING MODES

  • Ask Mode: Direct Q&A with source citations
  • Socratic Tutor: Guides you to discover answers (doesn't spoon-feed)
  • Concept Map: Visual knowledge graphs showing topic relationships
  • Synthesis Mode: Compares explanations across multiple sources
  • Adaptive Tutor: Adjusts complexity based on your understanding
  • Math Mode: LaTeX rendering and step-by-step solutions
  • 3D Visualization: Interactive embedding space exploration

💻 LEETCODE INTEGRATION (2000+ Problems)

  • Integrated 3 major LeetCode datasets
  • Search by difficulty (Easy/Medium/Hard)
  • Filter by topics (Arrays, DP, Trees, Graphs, etc.)
  • Multiple solution approaches with complexity analysis
  • AI-powered explanations connecting theory to practice
  • Step-by-step walkthroughs for interview prep

📊 STUDY ANALYTICS

  • Track learning progress over time
  • Visualize topic coverage with word clouds
  • Monitor RAG performance metrics (similarity scores, response times)
  • Knowledge gap identification
  • Study streak tracking

🎨 BEAUTIFUL UI

  • Glassmorphism design with animated gradients
  • Dark mode optimized for extended study sessions
  • Real-time streaming responses
  • Intuitive tab-based navigation
  • Professional, distraction-free interface

How we built it

ARCHITECTURE:

  • Frontend: Streamlit with custom CSS (glassmorphism, animations)
  • Backend: Python with LangChain framework
  • Vector Database: FAISS (12,368 text chunks, 384-dim embeddings)
  • LLM: Google Gemini 1.5 Flash API (cloud-based, cost-efficient)
  • Embeddings: sentence-transformers (all-MiniLM-L6-v2)
  • Data Processing: PyPDF, youtube-transcript-api, BeautifulSoup4

TECH STACK:

  • LangChain 0.1.20 - RAG orchestration
  • FAISS - Vector similarity search
  • Sentence Transformers - Text embeddings
  • Gemini API - Answer generation
  • Streamlit - Interactive UI
  • Python 3.12 - Core language

DATA PIPELINE:

  1. Document ingestion (PDFs, URLs, audio files)
  2. Text extraction and chunking (1000 chars, 200 overlap)
  3. Embedding generation (semantic vectors)
  4. Vector store indexing (FAISS)
  5. Query processing (semantic + keyword search)
  6. Context retrieval (top-k similar chunks)
  7. LLM synthesis (Gemini generates coherent answers)
  8. Source citation (transparency and verification)
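
Step 2 of the pipeline can be sketched in a few lines. This is a simplified fixed-window splitter, not the project's actual code: the write-up only gives the numbers (1000 characters, 200 overlap), and a LangChain splitter would normally also try to break on paragraph and sentence boundaries. The function name chunk_text is hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap their neighbours."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # with the defaults, a new window starts every 800 chars
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(2500))
pieces = chunk_text(sample)
print([len(p) for p in pieces])
```

The overlap means the last 200 characters of one chunk are repeated at the start of the next, so a sentence that straddles a boundary still appears whole in at least one chunk.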

LEETCODE INTEGRATION:

  • Aggregated 3 datasets: HuggingFace (kaysss/leetcode-problem-set), Kaggle (gzipchrist, mohitkumar282)
  • Normalized schema across sources
  • Semantic search over problem descriptions
  • AI-generated study guides per problem
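
A rough sketch of what the schema normalization could look like. The unified fields (title, difficulty, topics, acceptance rate) come from this write-up; the raw key names such as question_title and topic_tags are hypothetical stand-ins, since the real column names of the three datasets are not listed here.

```python
def normalize_problem(raw: dict) -> dict:
    """Map one raw record onto the unified schema (raw key names are hypothetical)."""
    title = raw.get("title") or raw.get("question_title") or ""
    difficulty = str(raw.get("difficulty") or raw.get("difficulty_level") or "")
    topics = raw.get("topics") or raw.get("topic_tags") or []
    if isinstance(topics, str):
        # Some sources store topics as one comma-separated string
        topics = [t.strip() for t in topics.split(",") if t.strip()]
    rate = raw.get("acceptance_rate")
    if rate is None:
        rate = raw.get("acceptance")
    if isinstance(rate, str):
        rate = float(rate.rstrip("%"))
    return {
        "title": title.strip(),
        "difficulty": difficulty.strip().capitalize(),
        "topics": list(topics),
        "acceptance_rate": rate,
    }


def merge_datasets(*datasets: list[dict]) -> list[dict]:
    """Normalize every record and keep the first occurrence of each title."""
    seen = set()
    merged = []
    for dataset in datasets:
        for raw in dataset:
            problem = normalize_problem(raw)
            key = problem["title"].casefold()  # case-insensitive dedupe key
            if key and key not in seen:
                seen.add(key)
                merged.append(problem)
    return merged
```

Deduplicating on the case-folded title keeps the first source's version of a problem, so the order in which datasets are passed decides which record wins.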

PERFORMANCE OPTIMIZATIONS:

  • @st.cache_resource for instant re-loads
  • Pre-computed embeddings (saved to disk)
  • Async data loading
  • Response caching (5-min TTL)
  • Lazy tab loading
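
The 5-minute response cache can be illustrated without Streamlit. In a Streamlit app the built-in st.cache_data decorator accepts a ttl argument and would likely be the simpler choice; the write-up does not say which mechanism was used, so the TTLCache class below is a hypothetical stand-alone sketch of the idea.

```python
import time
from typing import Callable


class TTLCache:
    """Keep computed answers for ttl_seconds, then recompute on the next request."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the expiry logic can be tested
        self._store: dict[str, tuple[float, str]] = {}

    def get_or_compute(self, key: str, compute: Callable[[], str]) -> str:
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # still fresh: skip the expensive call
        value = compute()
        self._store[key] = (now, value)
        return value
```

Here compute would wrap the LLM call, and key would be the normalized question text, so repeated questions inside the 5-minute window never reach the API.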

KEY FEATURES:

  • Multi-source synthesis (compare 3+ sources)
  • Timestamp citations (YouTube links to exact moment)
  • Socratic questioning (guides learning, doesn't lecture)
  • Adaptive difficulty (adjusts explanations)
  • Real-time analytics (tracks progress)
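
Timestamp citations rely on YouTube's t query parameter, which starts playback at a given second. A minimal sketch, assuming each transcript chunk keeps the video ID and start time it came from; the function names and the video ID in the usage line are made up for illustration.

```python
def youtube_timestamp_url(video_id: str, start_seconds: float) -> str:
    """Build a watch URL that starts playback at the cited moment."""
    seconds = max(0, int(start_seconds))  # the t parameter takes whole seconds
    return f"https://www.youtube.com/watch?v={video_id}&t={seconds}s"


def format_citation(title: str, video_id: str, start_seconds: float) -> str:
    """Render a citation such as 'Title [01:23] <url>'."""
    seconds = max(0, int(start_seconds))
    minutes, secs = divmod(seconds, 60)
    url = youtube_timestamp_url(video_id, seconds)
    return f"{title} [{minutes:02d}:{secs:02d}] {url}"


# "abc123XYZ_0" is a placeholder ID, not a real video
print(format_citation("Dijkstra lecture", "abc123XYZ_0", 83.7))
```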

Challenges we ran into

  1. PERFORMANCE BOTTLENECKS
     Problem: Initial version took 30+ seconds to answer queries.
     Solution: Implemented aggressive caching (@st.cache_resource), pre-computed embeddings, and switched from ChromaDB to FAISS for 5x faster similarity search.

  2. LEETCODE DATASET INTEGRATION
     Problem: 3 different datasets with incompatible schemas.
     Solution: Built a normalization pipeline to unify fields (title, difficulty, topics, acceptance rate) and deduplicate by problem title.

  3. CONTEXT QUALITY
     Problem: Retrieved chunks were sometimes irrelevant or incomplete.
     Solution: Implemented hybrid search (semantic + keyword), increased chunk overlap from 100 to 200 chars, and added re-ranking based on keyword matches.
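
The keyword re-ranking from challenge 3 might look roughly like this. It is a sketch under assumptions: the 0.3 keyword weight and the simple word-overlap score are illustrative choices, not the project's tuned values, and the similarity scores are taken as already computed by the vector search.

```python
import re


def keyword_overlap(query: str, text: str) -> float:
    """Fraction of distinct query words that also appear in the chunk."""
    query_terms = set(re.findall(r"\w+", query.lower()))
    if not query_terms:
        return 0.0
    text_terms = set(re.findall(r"\w+", text.lower()))
    return len(query_terms & text_terms) / len(query_terms)


def rerank(query: str, candidates: list[tuple[str, float]],
           keyword_weight: float = 0.3) -> list[tuple[str, float]]:
    """Blend each chunk's semantic similarity with its keyword overlap and re-sort."""
    scored = [
        (text, (1 - keyword_weight) * similarity
               + keyword_weight * keyword_overlap(query, text))
        for text, similarity in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


candidates = [
    ("A heap is a complete binary tree.", 0.62),
    ("A binary search tree keeps keys in sorted order.", 0.60),
]
print(rerank("binary search tree", candidates)[0][0])
```

In this toy case the second chunk has a slightly lower embedding score but contains every query term, so the blended score moves it to the top. That is the effect exact technical terms need.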

  4. API COST MANAGEMENT
     Problem: Gemini API calls adding up with repeated queries.
     Solution: Cached responses for 5 minutes, implemented demo mode with pre-computed answers for common questions.

  5. UI RESPONSIVENESS
     Problem: Streamlit reloading entire app on every interaction.
     Solution: Strategic use of st.session_state, lazy loading tabs, and separating heavy computations into cached functions.

  6. YOUTUBE TRANSCRIPT FAILURES
     Problem: Some videos lack transcripts or have auto-generated garbage.
     Solution: Graceful error handling, fallback to video description, user-friendly error messages explaining limitations.
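
The transcript fallback can be sketched independently of any particular library. The two fetcher arguments are hypothetical callables: in the real app they would wrap youtube-transcript-api and whatever retrieves the video description, whose exact calls are not shown in this write-up.

```python
from typing import Callable, Optional


def load_video_text(video_id: str,
                    fetch_transcript: Callable[[str], Optional[str]],
                    fetch_description: Callable[[str], Optional[str]]) -> dict:
    """Try the transcript first, then the description, and explain any fallback."""
    for source, fetch in (("transcript", fetch_transcript),
                          ("description", fetch_description)):
        try:
            text = fetch(video_id)
        except Exception:
            # Fetchers raise library-specific errors; any failure means "unavailable"
            continue
        if text and text.strip():
            warning = None
            if source == "description":
                warning = ("No transcript was available, so the video "
                           "description was indexed instead.")
            return {"source": source, "text": text, "warning": warning}
    return {"source": None, "text": "",
            "warning": "This video has no usable transcript or description and was skipped."}
```

Returning a warning string alongside the text lets the UI show the limitation to the user instead of failing silently.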

  7. EMBEDDING DIMENSION MISMATCH
     Problem: Different models produce different vector sizes.
     Solution: Standardized on sentence-transformers/all-MiniLM-L6-v2 (384 dims) across all sources for consistency.

BIGGEST LEARNING: RAG systems need constant tuning. The "right" chunk size, overlap, top-k value, and prompt template vary by use case. We iterated 10+ times to find optimal settings for CS education content.

Accomplishments that we're proud of

✨ TECHNICAL ACHIEVEMENTS:

  • Processed 977 documents into 12,368 searchable chunks in under 2 minutes
  • Achieved <2 second query response time (95th percentile)
  • Integrated 2000+ LeetCode problems with semantic search
  • Built 7 tutoring modes (Ask, Socratic, Concept Map, Synthesis, Adaptive, Math, 3D Visualization)
  • Zero-downtime deployment with cached responses
  • Beautiful glassmorphism UI that rivals commercial products

🎓 EDUCATIONAL IMPACT:

  • Multi-source synthesis - no other tool we know of compares 3+ textbook explanations side-by-side
  • Socratic tutoring - actually teaches critical thinking, not just answers
  • LeetCode integration - connects theory (textbooks) to practice (coding problems)
  • Adaptive difficulty - adjusts explanations from ELI5 to graduate-level

💪 PERSONAL GROWTH:

  • First time building a production RAG system from scratch
  • Learned prompt engineering, vector databases, and LLM integration
  • Mastered Streamlit for rapid prototyping
  • Gained experience with API cost optimization
  • Built something I'll actually use daily for my own learning

🏆 WHAT MAKES THIS SPECIAL: Most "AI tutors" are just ChatGPT wrappers. AI Study Mentor is different:

  • Learns from YOUR materials (not generic training data)
  • Cites sources (transparency and fact-checking)
  • Multiple teaching styles (not one-size-fits-all)
  • Coding practice integrated (LeetCode problems)
  • Analytics dashboard (track your progress)
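  • Local-first indexing (embeddings and the FAISS index stay on your machine; only the question and the retrieved chunks go to the Gemini API)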

This isn't vaporware - it's a fully functional platform processing real documents and answering real questions RIGHT NOW.

What we learned

TECHNICAL LESSONS:

  • RAG is harder than it looks - chunk size, overlap, retrieval strategy all matter
  • FAISS was 5x faster than ChromaDB for similarity search in our setup (<1M vectors)
  • Gemini 1.5 Flash offers best price/performance ratio ($0.00015 per 1K tokens)
  • Streamlit caching (@st.cache_resource) is essential for performance
  • sentence-transformers embeddings are surprisingly good for domain-specific search
  • Hybrid search (semantic + keyword) beats pure semantic search for technical content
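
To put the quoted price in perspective, here is a back-of-the-envelope cost estimate. It uses the $0.00015 per 1K tokens figure from the list above and, as a simplification, charges prompt and response tokens at the same rate. The token counts in the usage lines are illustrative guesses, not measurements from the app.

```python
PRICE_PER_1K_TOKENS = 0.00015  # USD, the figure quoted in this write-up


def query_cost(prompt_tokens: int, response_tokens: int,
               price_per_1k: float = PRICE_PER_1K_TOKENS) -> float:
    """Cost of one query, assuming one flat rate for all tokens."""
    return (prompt_tokens + response_tokens) / 1000 * price_per_1k


def monthly_cost(queries_per_day: int, tokens_per_query: int, days: int = 30,
                 price_per_1k: float = PRICE_PER_1K_TOKENS) -> float:
    """Projected spend for a steady daily query volume."""
    return queries_per_day * days * tokens_per_query / 1000 * price_per_1k


# Hypothetical workload: 1500 prompt tokens + 500 response tokens per query
print(f"per query: ${query_cost(1500, 500):.4f}")
print(f"per month at 100 queries/day: ${monthly_cost(100, 2000):.2f}")
```

At these assumed sizes a single user costs well under a dollar a month, which is why caching mattered more for latency and rate limits than for the bill itself.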

AI/ML INSIGHTS:

  • Prompt engineering is critical - minor template changes improve quality 30%+
  • Chain-of-thought prompting reduces hallucinations
  • Citing sources builds trust (users verify answers)
  • Top-k=3 chunks is the sweet spot (more adds noise, fewer loses context)
  • Re-ranking retrieved chunks improves relevance
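
Top-k retrieval itself is simple to show in miniature. This brute-force version ranks every chunk by cosine similarity and keeps the best three. FAISS does the same job far faster at scale, so treat this as an illustration of the idea rather than the project's retrieval code. The 2-dimensional vectors stand in for the real 384-dimensional embeddings.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], chunk_vecs: dict[str, list[float]],
          k: int = 3) -> list[tuple[str, float]]:
    """Return the k chunk IDs most similar to the query, best first."""
    scored = [(chunk_id, cosine_similarity(query_vec, vec))
              for chunk_id, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


chunks = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0],
          "d": [0.5, 0.5], "e": [-1.0, 0.0]}
print([chunk_id for chunk_id, _ in top_k([1.0, 0.0], chunks)])
```

Raising k pulls in chunks like "c" that are unrelated to the query, which is the noise the list above warns about.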

PRODUCT LESSONS:

  • Students want BOTH learning AND practice (hence LeetCode integration)
  • Socratic tutoring beats direct answers for retention
  • Visual feedback (progress bars, stats) increases engagement
  • Dark mode is non-negotiable for extended study sessions
  • Example questions solve the cold-start problem (users don't know what to ask)

BIGGEST SURPRISE: Users care MORE about source citations than answer speed. They want to verify the AI's claims. This shaped our design - we always show sources, even if it takes extra screen space.

WHAT I'D DO DIFFERENTLY:

  • Start with FAISS (not ChromaDB) - would've saved 4 hours of migration
  • Build demo mode from day 1 (pre-cached responses for presentations)
  • Add telemetry earlier (understand usage patterns sooner)
  • Use smaller embedding model (all-MiniLM-L6-v2 is overkill for this use case)

What's next for AI Study Mentor

SHORT-TERM (Next Month):

  • Notion integration - sync with student note-taking workflows
  • Google Drive sync - auto-import study folders
  • Collaborative study rooms - multiple students, shared knowledge base
  • Quiz generator - auto-create practice tests from documents
  • Flashcard export - generate Anki decks
  • Mobile app - study on the go
  • Voice mode - audio input/output for hands-free learning

MID-TERM (3-6 Months):

  • Fine-tuned model - domain-specific CS education model via LoRA
  • Peer learning - see what classmates are asking, upvote best answers
  • Study streaks - gamification (Duolingo-style)
  • Professor dashboard - instructors track class progress
  • Assignment helper - paste homework, get hints (not answers)
  • Code execution - run Python/Java snippets inline
  • LaTeX editor - write math-heavy notes with AI assistance

LONG-TERM (Vision):

  • University partnerships - official adoption by CS departments
  • Marketplace - share curated study collections
  • Multi-language support - international students
  • AR mode - point phone at textbook, get instant explanations
  • Research assistant - help with literature review, paper writing
  • Career prep - resume review, interview coaching integrated with LeetCode practice

MONETIZATION STRATEGY:

  • Free tier: 50 queries/month, 100 documents
  • Student ($5/mo): Unlimited queries, premium models (GPT-4, Claude)
  • Pro ($15/mo): Collaboration features, priority support, advanced analytics
  • University ($500/year/student): Institutional deployment, SSO, admin dashboard

IMPACT GOAL: Help 1 million students learn CS more effectively by 2026. Make AI tutoring accessible to everyone, not just those who can afford human tutors.
