Inspiration

Education is the master key, but the lock is different for everyone. I was inspired by the silent struggles of students facing cognitive or sensory challenges, specifically dyslexia and hearing impairments. I realized that the rigid structure of traditional schooling could be reshaped with Artificial Intelligence. I wanted to build a platform that lets anyone scale the "mountain of knowledge", using real-time visual techniques to break down the barrier between complex concepts and the learner.

What it does

Mount AI Scholar is an advanced, multi-modal educational platform. It features a real-time vocal analyzer, phoneme-grapheme mapping for learners with dyslexia, sound-to-visual transformations for learners with hearing impairments, and a dynamic revision system spanning 8 languages. It takes raw educational input and outputs interactive summaries, fun quizzes, and accessible learning modules.

How I built it

I designed a modern, highly decoupled architecture to ensure both speed and scalability:

  • Frontend (The Interface): Built with React and Tailwind CSS, creating a sleek, accessible, and ultra-responsive dashboard.
  • Backend & Data: Powered by FastAPI for high-performance, asynchronous API routing, with Firebase handling secure user authentication and real-time database synchronization.
  • AI & ML Pipeline (The Engine): I integrated custom Machine Learning pipelines in Python, heavily leveraging the Hugging Face ecosystem for state-of-the-art NLP, audio processing, and language translation.
  • Search & Retrieval: I used the Elastic stack to query, filter, and retrieve educational data and text embeddings with low latency.
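
Retrieval over text embeddings usually comes down to ranking stored vectors by their similarity to a query vector. The following is a minimal sketch of that idea in plain Python, assuming cosine similarity and made-up three-dimensional vectors. It does not use the Elastic API, and the names `docs` and `top_k` are hypothetical; the project's actual index layout and similarity metric are not described here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical toy embeddings; real text embeddings have hundreds of dimensions.
docs = {
    "photosynthesis": [0.9, 0.1, 0.0],
    "fractions": [0.0, 0.2, 0.9],
    "plant cells": [0.8, 0.3, 0.1],
}

def top_k(query, docs, k=2):
    """Return the k document names whose vectors are most similar to the query."""
    ranked = sorted(docs, key=lambda name: cosine(query, docs[name]), reverse=True)
    return ranked[:k]

hits = top_k([1.0, 0.2, 0.0], docs)
print(hits)  # ['photosynthesis', 'plant cells']
```

A search engine replaces the full sort with an index so that the ranking stays fast as the number of documents grows, but the scoring idea is the same.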

Challenges I ran into

The biggest hurdle was orchestrating the real-time inference pipeline. Capturing live audio or text in the React frontend, routing it through the FastAPI server to large Hugging Face models, and returning visual output with minimal latency required intense optimization. Mapping audio to visual phonemes accurately also depends on probabilistic models, so I had to understand the math behind speech-to-text algorithms, which rests on Bayes' Theorem:

$$ P(w | a) = \frac{P(a | w) P(w)}{P(a)} $$

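(Here $w$ is a candidate word or phoneme sequence and $a$ is the observed acoustic feature sequence. Recognition picks the $w$ that maximizes $P(w | a)$; because $P(a)$ does not depend on $w$, this is the same as maximizing $P(a | w) P(w)$.)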
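
To make the formula concrete, here is a small worked sketch in Python. The two candidate words and all probabilities are invented for illustration and are not taken from the project's models. It computes the posterior for each candidate and picks the most probable one, working in log space as speech decoders commonly do to avoid numerical underflow.

```python
import math

# Hypothetical toy numbers, not taken from the project.
# Prior P(w): how likely each candidate word is before hearing any audio.
prior = {"ship": 0.6, "sheep": 0.4}
# Likelihood P(a | w): how well the observed audio a fits each candidate.
likelihood = {"ship": 0.2, "sheep": 0.5}

def posterior(prior, likelihood):
    """Return P(w | a) for every candidate via Bayes' theorem."""
    # Evidence P(a) = sum over w of P(a | w) P(w); identical for every candidate.
    evidence = sum(likelihood[w] * prior[w] for w in prior)
    return {w: likelihood[w] * prior[w] / evidence for w in prior}

def decode(prior, likelihood):
    """Pick the candidate that maximizes log P(a | w) + log P(w)."""
    return max(prior, key=lambda w: math.log(likelihood[w]) + math.log(prior[w]))

post = posterior(prior, likelihood)
best = decode(prior, likelihood)
print(best, round(post[best], 3))  # sheep 0.625
```

Although "ship" has the higher prior, the audio evidence favors "sheep" strongly enough to flip the decision, which is exactly the trade-off the formula expresses.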

What I learned

I learned how to bridge the gap between experimental AI/ML and production-ready Web Engineering. Building an asynchronous API with FastAPI to serve heavy Hugging Face models taught me a lot about network bottlenecks and concurrency. I also solidified my skills in state management with React and cloud architecture with Firebase.
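
One concurrency lesson can be sketched with the standard library alone: a blocking model call inside an async handler stalls every other request unless it is moved off the event loop. The example below is a simplified illustration, not the project's code. `run_model` is a hypothetical stand-in that sleeps instead of running inference, and `asyncio.to_thread` (Python 3.9+) hands it to a worker thread.

```python
import asyncio
import time

def run_model(text: str) -> str:
    """Hypothetical stand-in for a blocking model call; sleeps instead of inferring."""
    time.sleep(0.2)
    return text.upper()

async def infer(text: str) -> str:
    # Run the blocking call in a worker thread so the event loop
    # stays free to accept other requests in the meantime.
    return await asyncio.to_thread(run_model, text)

async def main():
    start = time.perf_counter()
    # Three requests are served concurrently rather than one after another.
    results = await asyncio.gather(*(infer(t) for t in ["cat", "dog", "sun"]))
    return results, time.perf_counter() - start

results, elapsed = asyncio.run(main())
print(results, f"{elapsed:.2f}s")
```

Because the three calls overlap, the total time should be close to a single call's duration rather than the sum of all three. For CPU-bound inference the gain depends on whether the underlying library releases the interpreter lock, so a process pool or a dedicated inference server may be a better fit in that case.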

What's next for Mount AI Scholar

This Web/Python foundation is just Phase 1. My ultimate "Master Plan" is targeted at the Apple WWDC Swift Student Challenge 2027. I plan to optimize and port this entire ML pipeline to Core ML. From there, I will rebuild the user interface using SwiftUI and ARKit, creating an immersive spatial computing environment where students can see, touch, and interact with floating phonemes in their own living room.
