Inspiration
With a single teacher often responsible for up to 60 students in rural public schools, there is no time to slow down for 30 minutes and explain to one struggling child how to carry the "1" in addition. When kids continually fall behind or get frustrated with learning, they eventually drop out. The large existing EdTech providers (e.g. Byju’s, Khan Academy) deliver passive, text-heavy video content and quiz-based assessments. They assume the student has his or her own laptop (all lessons are delivered online), has an email address, and has a high level of digital literacy. This is a serious problem for students from low-income families (living below the poverty line, or "BPL"), who typically share one smartphone among the whole household; meanwhile, neurodivergent students (such as those with ADHD or dyslexia) often cannot engage with the text-heavy interfaces on these platforms at all. It occurred to us that for technology to help solve the dropout crisis, it needs to work for the student and adapt to the student, not force the student to adapt to it. We imagined an AI with infinite patience that feels like a real friend sitting right next to the child. Instead of putting a 7-year-old in front of a computer and making them type everything, they could simply speak naturally, show the camera what they are doing with physical objects, and learn through touch.
What it does
Vidya Spark turns passive screen time into a fun, engaging, multi-modal learning experience for children under the age of 6, neurodivergent children, and families living below the poverty line. It achieves this with:

- No email addresses or complicated passwords: students sign in with a family cell phone number and receive an SMS code to log in.
- Sparky, a tutor with unlimited patience: Vidya Spark combines the Web Speech API and Computer Vision so students can speak naturally rather than type into a text box, and can hold physical objects up to the webcam for a direct, real-world evaluation.
- Interactive modules instead of static multiple-choice quizzes: children learn through hands-on activities by playing with an on-screen digital abacus, drawing free-form on an HTML5 Canvas, and building sentences by dragging and dropping words into a sentence-building tool.

While children play and learn, Vidya Spark logs all interactions, streaks, and struggles. This real-time telemetry feeds an NGO Admin Dashboard that renders multi-axis mastery charts, helping identify students at risk of falling behind and notifying the human teachers who can step in with the necessary assistance.
How we built it
From day one, Vidya Spark was built with an eye towards scalability, speed and safety by utilizing a modern, decoupled Serverless-First SaaS Architecture.
Frontend (The Face): The client side is built with React 19, Vite, and TailwindCSS 3. Vite gives us rapid build times, while Tailwind lets us create the beautiful, animated interfaces that keep kids engaged! We chose Zustand for state management to handle complex global state (including real-time point tracking and auth) without the excessive boilerplate of Redux.
Backend & AI Proxy (The Brain): Core logic runs on a Node.js and Express server. Its thin "AI Proxy" design lets us call the OpenAI API (gpt-4o-mini) securely: the React app never holds API keys, eliminating the possibility of key scraping.
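As a minimal sketch of the proxy idea (hypothetical function names; Express routing and auth middleware omitted), the server is the only place that assembles the OpenAI request, pairing the child's spoken text with an optional webcam frame:

```javascript
// Sketch of the AI Proxy's payload assembly (hypothetical names, not our exact code).
// Build a gpt-4o-mini chat request that pairs the child's spoken text
// with an optional webcam frame (base64 JPEG) for Vision.
function buildTutorRequest(spokenText, frameBase64) {
  const content = [{ type: "text", text: spokenText }];
  if (frameBase64) {
    content.push({
      type: "image_url",
      image_url: { url: `data:image/jpeg;base64,${frameBase64}` },
    });
  }
  return {
    model: "gpt-4o-mini",
    messages: [
      { role: "system", content: "You are Sparky, a patient, encouraging tutor." },
      { role: "user", content },
    ],
  };
}

// On the server, this payload is POSTed to the OpenAI API with
// Authorization: `Bearer ${process.env.OPENAI_API_KEY}` — the React app
// only ever talks to the proxy, never to OpenAI directly.
```

Because the key is read from the server environment, nothing sensitive ever ships in the client bundle.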
Database & Authentication (The Spine): We use Supabase (PostgreSQL) as our backend data layer. Supabase provides native SMS OTP authentication (BPL families can log in without an email address) and Row Level Security (RLS) for keeping telemetry data safe across thousands of students.
Module Interactivity & Hardware (The Hands): Rather than a standard chatbot-style interface, we built fully custom tactile UI components: a drag-and-drop CSS abacus, an HTML5 Canvas drawing pad, and terminal-style block-based coding. We also integrated the browser's Web Speech API for voice input/output, and captured device camera frames in real time to send to gpt-4o-mini Vision.
Deployment: To demonstrate production readiness, we packaged the React app and Node server into a single Docker image and deployed the entire ecosystem seamlessly to Hugging Face Spaces.
Challenges we ran into
Standard LLMs are pedantic and fail kids for typos. Iterating on Sparky’s system prompt to create a hyper-forgiving, empathetic, and supportive teacher—one that treats “Aep” as a near-miss for “Ape” and hints at the sound without ever providing the correct answer—was challenging.
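Our exact prompt isn't reproduced here, but the shape of the constraint can be sketched (illustrative wording only):

```javascript
// Illustrative sketch of a hyper-forgiving tutor prompt (not our exact wording).
// The hard part was banning two behaviors at once: marking typos as failures,
// and leaking the answer while hinting.
const SPARKY_SYSTEM_PROMPT = [
  "You are Sparky, a warm, endlessly patient tutor for young children.",
  "Never say 'wrong' or 'incorrect'. Treat typos and phonetic spellings",
  "(e.g. 'Aep' for 'Ape') as near-misses and praise the attempt.",
  "Give a sound-based hint that nudges the child toward the answer,",
  "but never state the correct answer outright.",
].join(" ");
```

Every model upgrade required re-testing this balance, since different models lean pedantic to different degrees.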
Managing state across the Web Speech API, live webcam frames for AI Vision, and custom tactile UIs (like our CSS abacus) created enormous complexity. We had to heavily optimize our Zustand store to eliminate UI lag and dropped telemetry events.
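The core optimization was selector-based subscriptions: a component only re-renders when the slice it selects actually changes. The sketch below is a minimal stand-in for that pattern, not the real Zustand library:

```javascript
// Minimal stand-in for the Zustand pattern (not the real library):
// components subscribe with a selector, and are notified only when the
// selected slice actually changes — this is what kept high-frequency
// webcam and telemetry updates from lagging the tactile UIs.
function createStore(initialState) {
  let state = initialState;
  const listeners = new Set();
  return {
    getState: () => state,
    setState(partial) {
      state = { ...state, ...partial };
      listeners.forEach((l) => l(state));
    },
    // subscribe(selector, onChange): fires only when selector output changes
    subscribe(selector, onChange) {
      let prev = selector(state);
      const listener = (s) => {
        const next = selector(s);
        if (!Object.is(next, prev)) {
          prev = next;
          onChange(next);
        }
      };
      listeners.add(listener);
      return () => listeners.delete(listener);
    },
  };
}
```

With this shape, the points counter subscribes only to `points`, so a new webcam frame landing in the store never touches it.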
Creating an authentication experience for a 6-year-old using a shared family phone was an incredible challenge. Implementing Supabase SMS OTPs required rigorous configuration of Postgres Row Level Security (RLS) so that each student's data stays completely siloed while remaining accessible from the NGO Admin Dashboard.
Proxying real-world webcam frames through Node.js to gpt-4o-mini Vision added inherent latency to every interaction with Sparky, so we had to finely tune the size and format of our payloads to keep the experience feeling natural and conversational.
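The biggest lever was capping the frame's longest side before encoding it to base64. A sketch of that sizing math (the actual resize happens on an offscreen canvas in the browser; `maxSide` is a tuning parameter we chose empirically):

```javascript
// Sketch of the payload-tuning step: before a frame goes to the proxy,
// cap its longest side so the base64 body stays small enough for a
// conversational round-trip. Only the pure math is shown here.
function fitWithin(width, height, maxSide) {
  if (Math.max(width, height) <= maxSide) return { width, height };
  const scale = maxSide / Math.max(width, height);
  return {
    width: Math.round(width * scale),
    height: Math.round(height * scale),
  };
}
```

Downscaling a 1280×720 frame to fit within 512 pixels, for example, cuts the payload by well over 80% while leaving the object in frame perfectly recognizable.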
Accomplishments that we're proud of
Delivering a Real SaaS MVP: We didn't just create a Figma mockup or a local script; we engineered and deployed a production-quality, fully decoupled architecture to Hugging Face Spaces—everything containerized with Docker.
Solving the "Email Barrier": We are proud of the SMS OTP authentication flow we built with Supabase. It means BPL families in India can access EdTech today with nothing more than a shared phone, removing the single biggest barrier to entry for EdTech in rural India.
Bringing "Sparky" to Life: We integrated the Web Speech API and gpt-4o-mini Vision to turn a text-only LLM into a fully multi-modal, empathetic tutor. Seeing the AI accurately evaluate physical objects through a webcam in real time was a major breakthrough for us.
Building Truly Tactile UIs: Instead of generic web forms, we created custom drag-and-drop components (like the CSS abacus and the HTML5 Canvas). In doing so, we built an environment where kinesthetic and neurodivergent learners can thrive.
Closing the Telemetry Loop: By transforming raw real-time Postgres logs into visual, actionable radar charts for NGO admins, we have closed the loop between student play and teacher intervention. Our technology truly scales the human impact on education.
What we learned
Prompt Engineering Is Empathy Engineering: Giving an AI an identity that works for children—rather than one that simply returns factual answers—was a far harder task than we expected.
True Accessibility for Low-Income Users: We quickly realized that requiring an email address or the ability to type excludes much of our target audience. Phone-number login and voice-first interaction are not nice-to-haves; they are prerequisites.
Multi-Modal State Management: Juggling the Web Speech API's continuous audio input, live webcam frames, and interactive UI components taught us how important thin components and a well-structured global store are in a React app.
Secure AI Architecture: We came to appreciate the importance of a secure SaaS architecture—our thin Node.js AI Proxy holds all sensitive API keys when interfacing with powerful LLMs like gpt-4o-mini.
What's next for Vidya Spark
We plan to use the Twilio API to send milestone updates and streak alerts straight to a parent’s WhatsApp (e.g. “Sidd achieved 90% in Abacus today! Make sure to let him know he did a great job!”).
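The message itself would be templated from the same telemetry we already collect; a hedged sketch (hypothetical helper—actual delivery would go through Twilio's WhatsApp messaging API):

```javascript
// Hypothetical formatter for the planned WhatsApp milestone alert.
// The real send would use Twilio's messages API on the server side.
function milestoneMessage(childName, activity, scorePct) {
  return (
    `${childName} achieved ${scorePct}% in ${activity} today! ` +
    "Make sure to let them know they did a great job!"
  );
}
```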
We are exploring localized text-to-speech engines such as Bhashini and ElevenLabs, beyond the standard Web Speech API, so that Sparky can ‘talk’ to kids fluently in Hindi, Marathi, and Tamil.
Since internet connectivity in rural areas is often unreliable, we will turn Vidya Spark into a Progressive Web Application. Using service workers, the drawing board and other modules will keep working even when there is no internet; once connectivity is re-established, the app will sync with the API and upload the queued telemetry data.
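The offline-sync loop we have in mind can be sketched as a simple local queue (hypothetical names; in production this would be backed by IndexedDB and triggered from a service worker):

```javascript
// Sketch of the planned offline-first telemetry sync (hypothetical names):
// events are buffered locally while the device is offline, then flushed
// in order when connectivity returns.
function createTelemetryQueue(send) {
  const pending = [];
  return {
    // Called on every interaction; buffers when offline.
    record(event, online) {
      if (online) return send(event);
      pending.push(event);
    },
    // Called when connectivity returns (e.g. from a service worker 'sync' event).
    async flush() {
      while (pending.length) await send(pending.shift());
    },
    size: () => pending.length,
  };
}
```

Flushing in order matters here: streaks and mastery charts are computed from event sequences, so out-of-order delivery would corrupt them.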
We plan to embed lightweight TensorFlow.js models such as PoseNet directly in the browser to track children’s hand movements in real time, enabling ‘air-drawing’ and physical-motion activities without any server round-trip latency.