Inspiration
Traditional voice assistants often provide shallow answers without context or curated resources. Inspired by Marvel’s JARVIS and FRIDAY, as well as educational platforms like Perplexity AI, Khan Academy, and 3Blue1Brown, we created Monday, a voice-enabled AI companion that reasons, teaches, and integrates multimedia learning tools. Our goal is to bridge the gap between natural language voice queries and high-quality educational content, making Monday a learning partner that adapts to your needs, whether you want quick facts, deep reasoning, or comprehensive academic research.
This project is also of deep personal importance to us because our mutual friend Chatur has struggled to find accommodating learning platforms that adapt to his needs. His experience made us realize how many educational tools overlook accessibility, flexibility, and personalization, and we hoped to create that with Monday.
What it does
Monday is a voice-first learning assistant with three core modes:
- Basic Mode: quick factual answers with source citations (e.g., "What is photosynthesis?").
- Reasoning Mode: step-by-step logical explanations (activated by saying "think about").
- Deep Research Mode: thorough investigations across multiple sources, visualized as a web of connected information (activated by saying "research into").
You start by saying "Hey Monday," and interact through voice. The app uses speech recognition to capture your queries, sends them to Perplexity's API, and presents answers as text on floating panels. For visual topics like data structures or scientific models, dynamic 3D models appear and animate. The AI’s voice replies are powered by ElevenLabs text-to-speech, creating an immersive multimodal learning experience.
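The wake-word gate described above can be sketched in a few lines. This is an illustrative TypeScript snippet, not Monday's actual code; the function name and punctuation handling are assumptions.

```typescript
// Minimal sketch of the "Hey Monday" wake-word gate (names are illustrative).
// An utterance is only forwarded to the API once the wake word has been heard;
// the text after it is treated as the query.
function stripWakeWord(transcript: string): string | null {
  const normalized = transcript.trim().toLowerCase();
  const wakeWord = "hey monday";
  const index = normalized.indexOf(wakeWord);
  if (index === -1) return null; // wake word absent: ignore the utterance
  // Drop the wake word plus any trailing punctuation before the query
  return normalized.slice(index + wakeWord.length).replace(/^[,.!?\s]+/, "");
}
```

Returning `null` for non-matching utterances lets the speech loop keep listening without sending stray audio transcripts to the backend.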
Monday is also optionally VR-enabled, but designed to be fully functional without a headset, making it accessible to users with different physical abilities and technology access. The voice-first interface supports hands-free learning, which is particularly helpful for users with mobility or visual impairments. By combining audio responses, visual panels, and interactive 3D models, Monday adapts to different learning styles, whether auditory or visual, ensuring an inclusive experience for all types of learners.
Key features include:
- Voice interaction for hands-free queries.
- Real-time 3D visualizations of concepts.
- Integration with curated YouTube educational content from trusted channels such as Khan Academy and MIT OpenCourseWare.
- Multi-modal feedback combining text, speech, and spatial panels.
How we built it
Backend
- Perplexity Sonar API powers AI-generated responses in all modes.
- ElevenLabs API provides natural-sounding text-to-speech output.
- Node.js and Express for REST API and WebSocket server enabling real-time communication.
- YouTube Data API fetches and filters educational videos from trusted sources.
- Socket.IO handles messaging between frontend and backend.
- Implemented fallback mechanisms and custom ping/pong logic to improve WebSocket stability.
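The timing side of that ping/pong and reconnection logic can be sketched as two small pure functions. This is a simplified illustration of the approach, not the project's actual handlers; the constants and names are assumptions.

```typescript
// Illustrative sketch of the WebSocket stability logic: capped exponential
// backoff for reconnect attempts, plus a staleness check driven by pong timestamps.

/** Delay before the nth reconnect attempt: exponential growth with a ceiling. */
function reconnectDelayMs(attempt: number, baseMs = 500, maxMs = 10_000): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

/** Treat the connection as dead if no pong arrived within two ping intervals. */
function isConnectionStale(lastPongMs: number, nowMs: number, pingIntervalMs = 5_000): boolean {
  return nowMs - lastPongMs > 2 * pingIntervalMs;
}
```

Capping the backoff keeps a flaky network from pushing reconnect delays into minutes, while the two-interval grace period avoids tearing down a connection over a single delayed pong.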
Frontend
- React with TypeScript for a dynamic UI, styled with Tailwind CSS.
- react-speech-recognition library enables voice commands.
- Three.js and WebXR create interactive 3D visualizations and spatial learning panels.
- Automated validation of YouTube content to ensure high-quality educational resources.
- Debugged and optimized real-time sync between voice input, AI processing, and UI feedback.
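The YouTube content validation mentioned above can be illustrated as a channel allowlist applied to YouTube Data API search results. The field names mirror the API's `search.list` response shape; the allowlist contents and function name are assumptions, not the project's exact implementation.

```typescript
// Sketch of a trusted-channel filter over YouTube Data API search results.
interface SearchItem {
  snippet: { channelTitle: string; title: string };
}

// Illustrative allowlist of channels treated as high-quality sources.
const TRUSTED_CHANNELS = new Set(["Khan Academy", "MIT OpenCourseWare", "3Blue1Brown"]);

function filterTrusted(items: SearchItem[]): SearchItem[] {
  return items.filter((item) => TRUSTED_CHANNELS.has(item.snippet.channelTitle));
}
```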
Challenges we ran into
One of the main challenges was achieving seamless real-time synchronization between voice input, AI responses, and dynamic UI feedback. We spent many hours testing API responses until we could switch between models seamlessly and get satisfactory answers, all while keeping voice capture working reliably.
We also faced frequent WebSocket disconnects early on, which we solved by implementing custom ping/pong logic and automatic reconnection handlers. Detecting voice commands correctly was another hurdle because initial implementations misclassified user intent, so we refined our mode detection using natural language pattern matching.
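The refined mode detection can be sketched as a small pattern matcher over the transcript, using the trigger phrases described earlier ("think about", "research into") and defaulting to Basic Mode. The function name and regex details are illustrative.

```typescript
// Sketch of trigger-phrase mode detection: more specific phrases are checked
// first, and anything without a trigger falls back to Basic Mode.
type Mode = "basic" | "reasoning" | "research";

function detectMode(query: string): Mode {
  const q = query.toLowerCase();
  if (/\bresearch into\b/.test(q)) return "research";
  if (/\bthink about\b/.test(q)) return "reasoning";
  return "basic";
}
```

Word-boundary matching (`\b`) helps avoid misfires on words that merely contain a trigger phrase, which was the kind of intent misclassification we had to iron out.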
Unexpectedly, version control became a significant obstacle. Our Git repository began turning directories into submodules during commits, which caused conflicts and broke our local setups. After several failed attempts to fix it, we created a new repository to share code more cleanly. Until then, only one of us could commit to the old repository without constant configuration issues, which slowed our collaboration for a while.
Accomplishments that we're proud of
We're especially proud of building an end-to-end voice-to-reasoning pipeline that delivers contextual, step-by-step explanations in real time. Creating immersive 3D visualizations that respond to queries added a whole new layer to the learning experience, and building them was a rewarding challenge. We pushed ourselves further by making the platform VR-compatible, but what truly makes us proud is the accessibility breakthrough we've achieved.
By designing Monday as a completely hands-free, voice-driven learning companion, we've created something more significant than we initially envisioned. The entire system operates without any manual input: no controllers, no keyboards, no mouse clicks. For learners with mobility impairments, including those with limited hand function, paralysis, or conditions like cerebral palsy or ALS, Monday removes the physical barriers that often make digital learning tools inaccessible. They can engage with complex educational content simply by speaking, with Monday handling all the spatial arrangement and visualization automatically. Because content can be brought closer or pushed away through voice commands, even navigating the learning space requires no physical movement.
The audio-first design, with Monday's conversational responses and careful TTS implementation, also opens doors for visually impaired learners. While VR might seem counterintuitive for this community, our approach creates an immersive audio learning environment where spatial audio cues help organize information in 3D space. The reasoning chains and knowledge connections are explained verbally while also being visualized, creating a multi-sensory learning experience that adapts to different needs. For individuals with learning differences like dyslexia or ADHD, the combination of verbal explanation with dynamic 3D visualization can be transformative. Abstract concepts become tangible objects floating in space, making them easier to grasp and remember. The conversational nature of the interaction feels less like studying and more like having a patient tutor who can explain things multiple ways without judgment.
We believe Monday represents a new paradigm in accessible education technology, one where physical limitations don't determine access to knowledge, where learning adapts to the individual rather than forcing the individual to adapt to the tool, and where the most advanced AI capabilities are available to everyone regardless of their abilities. In building a "Jarvis for learning," we've actually built something more important: a learning companion that truly leaves no one behind.
What we learned
This project taught us a lot about designing natural voice user experiences, particularly around latency, feedback timing, and maintaining user flow. We gained a lot of experience integrating LLM APIs for educational use and learned how to tailor responses for different learning depths. On the frontend, managing complex application state across WebSocket events, voice input, and 3D rendering in React was a significant technical undertaking. Perhaps most importantly, we learned the importance of keeping humans in the loop for validation when building educational tools powered by generative AI.
What's next for Monday
We’re excited about expanding Monday's capabilities. First, we plan to support multiple languages, starting with Spanish and Hindi, to make the tool more inclusive. We're also working on collaborative learning features, allowing users to share spatial panels in real time for group study sessions. Beyond that, we want to introduce customizable AI avatars with distinct personalities and teaching styles, and eventually open our platform to educators through an API that lets them upload their own lesson plans and video content.
Monday is just getting started!
