This app is an AI-powered adaptive tutor that watches how a student reacts while learning and adjusts its explanations in real time. Instead of forcing users through fixed lessons, it responds to their facial expressions, which it uses to detect confusion, understanding, or distraction.

When a user asks a question, the system generates a clear explanation and presents it in small, digestible chunks. While the user reads or listens, the app analyzes their face through the device camera for signals such as confusion, focus, or moments of realization. If the system detects confusion, it immediately re-explains the concept in a simpler way, breaking the idea down further or using different analogies. If the user appears focused, the lesson continues normally. When the app detects a “eureka” moment, it treats the concept as understood and moves forward.

This creates a continuous feedback loop between the learner and the AI tutor. Instead of waiting for a student to ask for help, the system proactively adapts the explanation until the user understands. The result is a learning experience that feels personal, responsive, and human-like, similar to sitting with a tutor who can read your expressions and instantly change how they explain something. By combining computer vision, AI tutoring, and real-time feedback, the app turns passive explanations into an interactive learning process that adapts to each student's understanding moment by moment.

Building this project came with several challenges, especially because I worked on it entirely solo. The app combines multiple complex systems (computer vision, AI tutoring, real-time feedback, and audio), and integrating all of these components into one smooth experience required careful design and debugging. Managing both the backend AI logic and the real-time camera pipeline alone made development significantly more demanding.

Another challenge was access to APIs and resources early in development.
Many of the services that would have simplified parts of the project, such as voice synthesis platforms like ElevenLabs, require credits or paid access. Since I wasn’t able to obtain free credits at the start, I had to find workarounds, test with alternative tools, and design the system so that those services could be integrated later, once access became available.

A major technical hurdle was optimizing facial expression detection for speed. The tutoring system depends on recognizing a user’s expression almost instantly so it can adjust the lesson in real time. This meant carefully optimizing the face-analysis pipeline, reducing latency, and ensuring the model could process frames quickly enough to feel responsive. Balancing accuracy with performance was particularly challenging, especially with the pipeline running continuously on a webcam feed.

Working through these obstacles helped shape the project into a fast, responsive adaptive learning system that can react to a learner’s understanding moment by moment.
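
To make the feedback loop concrete, here is a minimal sketch of how per-frame expression labels could be turned into tutoring decisions. It is an illustration under stated assumptions, not the project's actual code: the names `Expression`, `Action`, `ExpressionSmoother`, `choose_action`, and `run_session` are hypothetical, the per-frame labels are assumed to come from some face-analysis model that is not shown, and the majority-vote window is just one possible way to trade a little latency for fewer false alarms.

```python
from collections import Counter, deque
from enum import Enum


class Expression(Enum):
    """Per-frame labels, assumed to come from a face-analysis model (not shown)."""
    CONFUSED = "confused"
    FOCUSED = "focused"
    EUREKA = "eureka"


class Action(Enum):
    """Tutor responses named in the write-up."""
    SIMPLIFY = "re-explain the current idea more simply"
    CONTINUE = "continue the lesson normally"
    ADVANCE = "move forward to the next concept"


class ExpressionSmoother:
    """Majority vote over the most recent frames (hypothetical helper).

    A single noisy frame should not trigger a re-explanation, so a label
    only counts as stable once it has at least `min_votes` occurrences
    inside the last `window` frames.
    """

    def __init__(self, window=5, min_votes=3):
        self.recent = deque(maxlen=window)
        self.min_votes = min_votes

    def update(self, label):
        """Record one frame's label; return the stable label, or None."""
        self.recent.append(label)
        top, votes = Counter(self.recent).most_common(1)[0]
        return top if votes >= self.min_votes else None

    def reset(self):
        """Forget recent frames, e.g. after the tutor has reacted."""
        self.recent.clear()


def choose_action(stable):
    """Map a stable expression (or None) to a tutor action.

    Assumption: with no stable signal the lesson simply continues.
    """
    if stable is Expression.CONFUSED:
        return Action.SIMPLIFY
    if stable is Expression.EUREKA:
        return Action.ADVANCE
    return Action.CONTINUE


def run_session(labels, window=5, min_votes=3):
    """Feed a sequence of per-frame labels through the loop."""
    smoother = ExpressionSmoother(window, min_votes)
    actions = []
    for label in labels:
        action = choose_action(smoother.update(label))
        if action is not Action.CONTINUE:
            # Start a fresh window so one reaction is not repeated every frame.
            smoother.reset()
        actions.append(action)
    return actions


if __name__ == "__main__":
    frames = [
        Expression.FOCUSED,
        Expression.CONFUSED,
        Expression.CONFUSED,
        Expression.CONFUSED,
        Expression.FOCUSED,
    ]
    print([a.name for a in run_session(frames)])
    # ['CONTINUE', 'CONTINUE', 'CONTINUE', 'SIMPLIFY', 'CONTINUE']
```

A smaller window reacts faster but is more easily fooled by a single misclassified frame; a larger one is steadier but slower. That is the same accuracy-versus-responsiveness trade-off described above, and the values 5 and 3 are placeholders rather than tuned settings.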