TypeMaster AI

Inspiration

Touch typing is a fundamental requirement for modern productivity, yet the pedagogical approach has not evolved in decades. Freely available tutors present users with static, repetitive string sets that quickly lead to disengagement and performance plateaus. I wanted to break this ceiling. My inspiration was to merge Generative AI with cognitive psychology—specifically operant conditioning—to build a system where users can practice typing topics they actually care about, while receiving real-time, gamified acoustic feedback to build motor-skill retention.

What it does

TypeMaster AI is a multimodal, generative typing environment. Instead of fixed sentences, it uses the Gemini 2.0 API to generate custom typing lessons on demand, whether that is a 50-word story about quantum physics or a valid block of Python code. It also features a multimodal OCR engine that lets users upload textbook PDFs or images and extract practice text from them. The standout feature is Melody Typing: every correct keystroke synthesizes a specific musical note in real time (characters map to frequencies such as C4 = 261.63 Hz). Typing a paragraph flawlessly lets the user literally "perform" classical or pop melodies, gamifying the entire experience.
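
To make the note mapping concrete, here is a minimal sketch of how a keystroke could be turned into a frequency. It assumes standard equal temperament (A4 = 440 Hz) and assumes that the n-th correct keystroke plays the n-th note of a melody; the writeup does not specify the exact mapping, so `MELODY`, `midiToFrequency`, and `frequencyForKeystroke` are hypothetical names used only for illustration.

```typescript
// Equal temperament: A4 (MIDI note 69) is 440 Hz, and each semitone
// multiplies the frequency by 2^(1/12).
function midiToFrequency(midiNote: number): number {
  return 440 * Math.pow(2, (midiNote - 69) / 12);
}

// Hypothetical melody (an assumption, not from the project):
// the opening of "Ode to Joy" as MIDI notes E4 E4 F4 G4 G4 F4 E4 D4.
const MELODY: number[] = [64, 64, 65, 67, 67, 65, 64, 62];

// The n-th correct keystroke plays the n-th melody note, looping at the end.
function frequencyForKeystroke(
  correctIndex: number,
  melody: number[] = MELODY
): number {
  // Fall back to A4 if the melody is empty.
  const note = melody[correctIndex % melody.length] ?? 69;
  return midiToFrequency(note);
}
```

With this formula, MIDI note 60 (C4) comes out at roughly 261.63 Hz, which matches the value quoted above.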

How we built it

I architected this as a decoupled Single Page Application (SPA) using React 19 and Vite.

The AI Layer: I implemented an AI Manager that sends structured prompts to Google Gemini 2.0 or other models. To keep generation reliable, I routed all traffic through an OpenRouter REST gateway, which avoids the strict rate limits of Gemini's free tier.
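
As a rough illustration of the AI layer, the sketch below builds a lesson request for OpenRouter's OpenAI-compatible chat completions endpoint. The function name `buildLessonRequest`, the prompt wording, and the default model slug are assumptions made for this example, not the project's actual AI Manager.

```typescript
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

interface LessonRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

const OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions";

// Builds (but does not send) the HTTP request for one generated lesson.
// The default model slug is an assumption; check OpenRouter's model list.
function buildLessonRequest(
  topic: string,
  wordCount: number,
  apiKey: string,
  model: string = "google/gemini-2.0-flash-001"
): LessonRequest {
  const messages: ChatMessage[] = [
    {
      role: "system",
      content:
        "You write plain-text typing practice passages. No markdown, no lists, no headings.",
    },
    { role: "user", content: `Write about ${wordCount} words on: ${topic}` },
  ];
  return {
    url: OPENROUTER_URL,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ model, messages }),
    },
  };
}

// Usage in a runtime with fetch (not executed here):
//   const { url, init } = buildLessonRequest("quantum physics", 50, key);
//   const res = await fetch(url, init);
//   const lesson = (await res.json()).choices[0].message.content;
```

Keeping request construction separate from the network call makes it easy to swap models or unit-test the prompt without spending API quota.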

The Audio Engine: Instead of using laggy pre-recorded MP3 files, I engineered a custom synthesis engine using the browser's native Web Audio API. It uses OscillatorNode and GainNode to generate raw sine waves on the fly, creating an exponential decay amplitude envelope for each keystroke.
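
The oscillator-plus-gain approach described above can be sketched as follows. This is an illustrative version, not the project's engine: `playNote`, the default duration, and the gain values are assumptions. The small `...Like` interfaces cover only the Web Audio members used here, so the sketch type-checks without the DOM library; in the browser the function is meant to be called with a real `AudioContext`.

```typescript
// Minimal structural types for the Web Audio members used below.
interface ParamLike {
  setValueAtTime(value: number, time: number): unknown;
  exponentialRampToValueAtTime(value: number, time: number): unknown;
}
interface OscillatorLike {
  type: string;
  frequency: ParamLike;
  connect(destination: unknown): unknown;
  start(when?: number): void;
  stop(when?: number): void;
}
interface GainLike {
  gain: ParamLike;
  connect(destination: unknown): unknown;
}
interface AudioContextLike {
  currentTime: number;
  destination: unknown;
  createOscillator(): OscillatorLike;
  createGain(): GainLike;
}

// Plays one sine-wave note with an exponentially decaying amplitude.
function playNote(
  ctx: AudioContextLike,
  frequencyHz: number,
  durationSec: number = 0.3,
  peakGain: number = 0.2
): void {
  const now = ctx.currentTime;
  const osc = ctx.createOscillator();
  const amp = ctx.createGain();

  osc.type = "sine";
  osc.frequency.setValueAtTime(frequencyHz, now);

  // An exponential ramp cannot reach 0, so decay toward a tiny positive floor.
  amp.gain.setValueAtTime(peakGain, now);
  amp.gain.exponentialRampToValueAtTime(0.0001, now + durationSec);

  // Signal path: oscillator -> gain envelope -> speakers.
  osc.connect(amp);
  amp.connect(ctx.destination);

  osc.start(now);
  osc.stop(now + durationSec);
}
```

Creating a fresh oscillator per keystroke is the usual pattern, since an oscillator node cannot be restarted once stopped; a single shared context is typically reused for all notes.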

The UI Pipeline: High-frequency keyboard listeners were memoized using React's useCallback and useRef hooks to prevent state-drift during rapid input.

Challenges we ran into

API Rate Limiting & Overload: Direct calls to Gemini's free tier kept failing during heavy generation, which crashed the app. Solution: I moved generation behind OpenRouter, using it as a managed proxy to absorb burst traffic.

Audio Latency: Standard HTML5 audio elements produced a 50-100 ms delay, which completely ruined the user's typing rhythm. Solution: Pivoting to the low-level Web Audio API allowed me to achieve sub-5 millisecond latency.

React State Bottlenecks: At typing speeds exceeding 100 WPM, React's standard useState re-renders were causing input lag. Solution: I shifted the core keystroke tracking to mutable useRef objects and optimized the concurrent rendering pipeline.
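
The idea of keeping keystroke state outside React's render cycle can be illustrated with a plain mutable tracker. This is a sketch under assumptions: `KeystrokeTracker` and its fields are hypothetical, and in a component an instance like this would typically be held in a `useRef` so that each key press mutates it without triggering a re-render, with the UI reading `snapshot()` only when it needs to repaint.

```typescript
interface KeystrokeStats {
  position: number; // index of the next expected character
  correct: number;  // count of correct key presses
  errors: number;   // count of incorrect key presses
}

class KeystrokeTracker {
  private readonly target: string;
  private position = 0;
  private correct = 0;
  private errors = 0;

  constructor(target: string) {
    this.target = target;
  }

  // Returns true when the key matches the expected character.
  // The cursor advances only on a match; a miss just counts an error.
  press(key: string): boolean {
    if (this.position >= this.target.length) return false;
    if (key === this.target[this.position]) {
      this.correct += 1;
      this.position += 1;
      return true;
    }
    this.errors += 1;
    return false;
  }

  get done(): boolean {
    return this.position >= this.target.length;
  }

  // Copy of the current counters, safe to hand to React state.
  snapshot(): KeystrokeStats {
    return {
      position: this.position,
      correct: this.correct,
      errors: this.errors,
    };
  }
}
```

Because `press` is a synchronous mutation, its cost stays constant regardless of typing speed, and the return value can be used directly to decide whether to play a note.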

Accomplishments that we're proud of

I am incredibly proud of achieving near-zero-latency (sub-5 millisecond) audio synthesis in the browser while LLM calls run in the background. Furthermore, I formalized this architecture into an academic research paper, which is now published as a preprint on Zenodo (the CERN-operated repository) with a DOI. Turning a software idea into a fully deployed application and a published preprint as an undergraduate researcher is a massive milestone.

What we learned

Building TypeMaster AI was a masterclass in full-stack performance optimization. I learned the deep mathematics behind browser audio synthesis, advanced React concurrent rendering techniques, and how to write highly specific regex pipelines to sanitize markdown formatting out of raw LLM outputs so they render properly in a typing interface.
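
A simplified version of such a sanitizing pipeline might look like the sketch below. The specific patterns and the `sanitizeLesson` name are assumptions for illustration, not the project's actual regexes. One design point worth noting: emphasis-stripping rules would damage code lessons (for example Python's `__init__` or `**kwargs`), so this sketch only unwraps code fences for code and applies the full set of rules to prose.

```typescript
type LessonKind = "prose" | "code";

function sanitizeLesson(raw: string, kind: LessonKind): string {
  // Unwrap fenced code blocks but keep their contents.
  let text = raw.replace(/```[^\n]*\n([\s\S]*?)```/g, "$1");

  if (kind === "prose") {
    text = text
      .replace(/`([^`]+)`/g, "$1")              // inline code
      .replace(/^#{1,6}\s+/gm, "")              // heading markers
      .replace(/^\s*[-*+]\s+/gm, "")            // bullet markers
      .replace(/(\*\*|__)(.+?)\1/g, "$2")       // bold
      .replace(/(\*|_)(.+?)\1/g, "$2")          // italic
      .replace(/\[([^\]]+)\]\([^)]+\)/g, "$1"); // links: keep the label
  }

  // Drop trailing spaces on each line, then trim the whole passage.
  return text.replace(/[ \t]+$/gm, "").trim();
}
```

Regex-based stripping like this is a heuristic and will miss nested or unusual markdown; asking the model for plain text in the prompt reduces how much cleanup is needed.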

What's next for TypeMaster AI

The next phase is scaling this into a B2B SaaS architecture.

Adaptive Difficulty: Implementing an analytics engine that tracks per-user WPM and error rates to dynamically instruct the LLM to increase or decrease the complexity of the generated text.
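
One way this planned feature could work is sketched below. Everything here is an assumption about a future design: the thresholds, the 40 WPM default target, and the names `wordsPerMinute`, `errorRate`, `nextAdjustment`, and `difficultyInstruction` are illustrative only. WPM uses the common convention that one word equals five characters.

```typescript
interface SessionStats {
  charsTyped: number; // correctly typed characters
  errors: number;     // incorrect key presses
  elapsedSec: number; // session duration in seconds
}

type Adjustment = "harder" | "easier" | "same";

// Conventional WPM: (characters / 5) per minute.
function wordsPerMinute(s: SessionStats): number {
  if (s.elapsedSec <= 0) return 0;
  return s.charsTyped / 5 / (s.elapsedSec / 60);
}

function errorRate(s: SessionStats): number {
  return s.charsTyped === 0 ? 0 : s.errors / s.charsTyped;
}

// Illustrative thresholds: ease off on many errors or slow typing,
// step up only when the user is both fast and accurate.
function nextAdjustment(s: SessionStats, targetWpm: number = 40): Adjustment {
  const wpm = wordsPerMinute(s);
  const err = errorRate(s);
  if (err > 0.08 || wpm < targetWpm * 0.6) return "easier";
  if (err < 0.03 && wpm >= targetWpm) return "harder";
  return "same";
}

// Text appended to the LLM prompt for the next lesson.
function difficultyInstruction(adj: Adjustment): string {
  const instructions: Record<Adjustment, string> = {
    harder: "Use longer words, more punctuation, and mixed capitalization.",
    easier: "Use short, common words and minimal punctuation.",
    same: "Keep vocabulary and punctuation at the same level as before.",
  };
  return instructions[adj];
}
```

For example, a user who types 300 characters in 60 seconds with 3 errors is at 60 WPM with a 1% error rate, so this sketch would ask the model for harder text.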

Web Speech Integration: Building a live voice-to-text drill mode where the AI generates typing content based on spoken prompts.

Multiplayer Racing: Introducing WebSockets to allow competitive, real-time typing battles synced to the generated melodies.