About the Project — AirCanvas
💡 Inspiration
The idea for AirCanvas came from a simple frustration: thinking mathematically feels natural, but expressing it digitally does not. Typing an equation like `x^2 + y^2 = r^2` on a keyboard breaks cognitive flow and adds unnecessary mental overhead. Watching students struggle more with input tools than with concepts made us ask: what if math could be expressed as naturally as drawing it in the air?
This question led us to explore embodied cognition and spatial computing as a way to reduce the gap between human intuition and machine reasoning.
🧠 What We Learned
Through this project, we gained deep insights into:
- Multimodal AI systems, especially how vision and language models can collaborate.
- Computer vision pipelines, including real-time hand landmark detection and gesture state machines.
- Human–Computer Interaction (HCI) principles like cognitive load reduction and constructivist learning.
- The importance of hybrid edge–cloud architectures for balancing latency, privacy, and energy efficiency.
We also learned that designing for humans is as challenging—and as important—as designing algorithms.
🛠️ How We Built It
AirCanvas is built as a multimodal framework:
Hand Tracking (Edge Layer)
- Used real-time computer vision to track hand landmarks at ~30 FPS.
- Implemented smoothing techniques to handle natural hand tremors.
- Designed a gesture state machine to differentiate between drawing, hovering, and clearing actions.
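The edge layer above can be sketched as two small pieces: an exponential moving average to damp hand tremor, and a tiny state machine that maps hand poses to actions. This is a minimal illustration, not the project's actual code; the landmark coordinates are assumed to come from a hand tracker such as MediaPipe Hands, and the pinch-distance threshold is an invented placeholder.

```python
class FingertipSmoother:
    """Exponential moving average to damp natural hand tremor."""

    def __init__(self, alpha=0.35):
        self.alpha = alpha   # lower alpha = smoother output, but more lag
        self.state = None

    def update(self, x, y):
        # First sample initializes the filter; later samples are blended in.
        if self.state is None:
            self.state = (x, y)
        else:
            sx, sy = self.state
            self.state = (self.alpha * x + (1 - self.alpha) * sx,
                          self.alpha * y + (1 - self.alpha) * sy)
        return self.state


def next_gesture_state(pinch_dist, palm_open):
    """Map a hand pose to one of the three actions described above:
    DRAW while pinching, HOVER otherwise, CLEAR on an open palm."""
    if palm_open:
        return "CLEAR"
    if pinch_dist < 0.05:   # normalized thumb-index distance (illustrative)
        return "DRAW"
    return "HOVER"
```

In practice the smoother runs on every frame of the fingertip landmark, and the state machine's output decides whether the smoothed point is appended to the current stroke.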
Reasoning Engine (Cloud Layer)
- Captured the drawn mathematical representation as visual input.
- Sent it to a multimodal large language model for transcription, interpretation, and step-by-step solving.
- Returned both the final answer and reasoning to the user.
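The cloud round-trip described above can be sketched as follows: the drawn canvas is encoded as a base64 data URL and packed into an OpenAI-style multimodal chat payload. This is a hedged sketch, not the project's actual integration; the model name and prompt wording are placeholders.

```python
import base64


def build_solve_request(png_bytes, model="gpt-4o"):
    """Pack a PNG snapshot of the air-drawn math into a multimodal
    chat-completions request body (OpenAI-style message format)."""
    data_url = ("data:image/png;base64,"
                + base64.b64encode(png_bytes).decode("ascii"))
    return {
        "model": model,  # placeholder model name
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Transcribe this handwritten math and solve it "
                         "step by step."},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }],
    }
```

The response would then be split into the transcription, the final answer, and the reasoning steps before being shown back to the user.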
This split architecture ensured responsiveness while keeping computationally expensive reasoning efficient.
🚧 Challenges We Faced
- Gesture Noise & Precision: Free-space drawing is inherently unstable. Filtering without harming user intent was a major challenge.
- Symbol Ambiguity: Distinguishing between similar symbols (e.g., `1` vs. `l`, `+` vs. `t`) required careful visual heuristics.
- Latency vs. Accuracy: Balancing real-time interaction with accurate AI reasoning was non-trivial.
- Designing for Accessibility: Ensuring the system works for users with different motor abilities pushed us to rethink interaction design.
🚀 Outcome
AirCanvas demonstrates that mathematics can be spatial, intuitive, and inclusive. By removing the traditional input bottleneck, we open new possibilities for AI-assisted education, especially in classrooms, accessibility-focused learning, and future spatial computing environments.