🧠 Inspiration

The inspiration for the Cognitive Arena module of Mount AI Scholar stemmed from a critical gap in modern EdTech: the lack of real-time, privacy-first cognitive remediation for dyslexia and phonological processing disorders. The complexity of mapping phoneme-grapheme correspondence requires more than just flashcards; it requires a dynamic environment that listens, analyzes, and adapts. I wanted to build an engine that doesn't just score a student, but algorithmically understands their phonetic deviation and generates instant, customized remediation pathways.

⚙️ What it does

The Cognitive Arena is a secured neuro-cognitive session environment. It provides:

  1. Active Phonetic Rehabilitation: Users practice pronunciation while the system performs live comparative validation against expected transcripts.
  2. AI-Driven Remediation: Leveraging Gemini (as a bridge toward my ultimate Gemma 4 Edge local inference architecture), the system detects specific graphemic confusions (e.g., b/d/p/q or Sibilants/Fricatives) and instantly creates adaptive practice regimens.
  3. Secure Patient Sessions: Every session is securely logged with 100% data segregation, ensuring absolute privacy and allowing for historical progress tracking.

🔨 How I built it

I engineered the platform using a modern, scalable full-stack approach:

  • Frontend Architecture: React 18, TypeScript, and Tailwind CSS. The UI is heavily optimized using functional components, hook-based state management, and strict render control to prevent UI blocking during real-time cognitive processing.
  • Backend & Security: Google Firebase. I implemented Google Authentication married with Firestore. For absolute data integrity, I wrote strict Firestore Security Rules ensuring that arena_sessions are inaccessible to anyone but the authenticated user.
  • AI & Voice Pipeline: I integrated the Web Speech API for zero-latency voice ingestion, passing the transcript delta to our AI remediation engine to construct the patient's cognitive profile in real-time.

Mathematically, the core challenge is minimizing the phonetic deviation $D$. If $p_i$ is the target phoneme and $\hat{p}i$ is the spoken phoneme, the system's objective function for the remediation engine is to construct exercises that minimize the error over $N$ speech samples: $$ \mathcal{L}{cognitive} = \frac{1}{N} \sum_{i=1}^{N} \omega_i \cdot \delta(p_i, \hat{p}_i) $$ Where $\omega_i$ represents the contextual weight of the specific phoneme based on the user's historical dyslexia profiles.

⚠️ Challenges I ran into

Building a synchronous AI pipeline over asynchronous voice streams in the browser was structurally complex.

  • State Management: Handling continuous audio transcripts without causing recursive render loops in React required heavy memoization and strict useEffect dependency arrays.
  • Privacy-by-Design Compliance: Writing standard NoSQL queries wasn't enough. I had to design compound queries and granular Firebase security rules to ensure patient data could theoretically pass highly strict medical/educational compliance frameworks.

🏆 Accomplishments that I'm proud of

I am incredibly proud of the seamless UX/UI integration. The Arena doesn't look like a standard, clunky medical or educational app. It features a dark, immersive "Cosmic-Slate" UI with glassmorphism, instant visual feedback, and a highly performant session history dashboard. It looks and feels like elite software, while housing highly technical cognitive-computing operations under the hood.

📚 What I learned

Managing real-time interaction between a client-side Speech API, a NoSQL data stream (Firestore), and an AI generation layer forced me to think deeply about architectural bottlenecks. I leveled up my ability to structure non-blocking UI threads and manage secure, scoped data streams in Firebase.

🚀 What's next for Mount AI Scholar

The Cognitive Arena is Phase 1. The Master Plan is to transition the cloud-based AI calls entirely to local edge inference (deploying Gemma 4 locally via Python/FastAPI) to achieve true, offline "Privacy by Design" with zero data leaving the device. Following that, Phase 2 targets the WWDC 2027 Swift Student Challenge: porting this entire engine into the Apple ecosystem using CoreML and projecting these neuro-cognitive sessions into spatial environments via ARKit and SwiftUI.

Built With

Share this project:

Updates