About the Project — AirCanvas
💡 Inspiration
The idea for AirCanvas came from a simple frustration: thinking mathematically feels natural, but expressing it digitally does not. Typing an equation like `x^2 + y^2 = r^2` on a keyboard breaks cognitive flow and adds unnecessary mental overhead. Watching students struggle more with input tools than with concepts made us ask: what if math could be expressed as naturally as drawing it in the air?
This question led us to explore embodied cognition and spatial computing as a way to reduce the gap between human intuition and machine reasoning.
🧠 What We Learned
Through this project, we gained deep insights into:
- Multimodal AI systems, especially how vision and language models can collaborate.
- Computer vision pipelines, including real-time hand landmark detection and gesture state machines.
- Human–Computer Interaction (HCI) principles like cognitive load reduction and constructivist learning.
- The importance of hybrid edge–cloud architectures for balancing latency, privacy, and energy efficiency.
We also learned that designing for humans is as challenging—and as important—as designing algorithms.
🛠️ How We Built It
AirCanvas is built as a multimodal framework:
Hand Tracking (Edge Layer)
- Used real-time computer vision to track hand landmarks at ~30 FPS.
- Implemented smoothing techniques to handle natural hand tremors.
- Designed a gesture state machine to differentiate between drawing, hovering, and clearing actions.
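The edge layer above can be sketched as two small pieces: an exponential moving average to damp hand tremor, and a tiny state machine that maps hand poses to actions. This is a minimal illustration, not the project's actual code; the landmark coordinates are assumed to come from a hand tracker such as MediaPipe Hands, and the pinch-distance threshold is an invented placeholder.

```python
class FingertipSmoother:
    """Exponential moving average to damp natural hand tremor."""

    def __init__(self, alpha=0.35):
        self.alpha = alpha   # lower alpha = smoother output, but more lag
        self.state = None

    def update(self, x, y):
        # First sample initializes the filter; later samples are blended in.
        if self.state is None:
            self.state = (x, y)
        else:
            sx, sy = self.state
            self.state = (self.alpha * x + (1 - self.alpha) * sx,
                          self.alpha * y + (1 - self.alpha) * sy)
        return self.state


def next_gesture_state(pinch_dist, palm_open):
    """Map a hand pose to one of the three actions described above:
    DRAW while pinching, HOVER otherwise, CLEAR on an open palm."""
    if palm_open:
        return "CLEAR"
    if pinch_dist < 0.05:   # normalized thumb-index distance (illustrative)
        return "DRAW"
    return "HOVER"
```

In practice the smoother runs on every frame of the fingertip landmark, and the state machine's output decides whether the smoothed point is appended to the current stroke.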
Reasoning Engine (Cloud Layer)
- Captured the drawn mathematical representation as visual input.
- Sent it to a multimodal large language model for transcription, interpretation, and step-by-step solving.
- Returned both the final answer and reasoning to the user.
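The cloud round-trip described above can be sketched as follows: the drawn canvas is encoded as a base64 data URL and packed into an OpenAI-style multimodal chat payload. This is a hedged sketch, not the project's actual integration; the model name and prompt wording are placeholders.

```python
import base64


def build_solve_request(png_bytes, model="gpt-4o"):
    """Pack a PNG snapshot of the air-drawn math into a multimodal
    chat-completions request body (OpenAI-style message format)."""
    data_url = ("data:image/png;base64,"
                + base64.b64encode(png_bytes).decode("ascii"))
    return {
        "model": model,  # placeholder model name
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Transcribe this handwritten math and solve it "
                         "step by step."},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }],
    }
```

The response would then be split into the transcription, the final answer, and the reasoning steps before being shown back to the user.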
This split architecture ensured responsiveness while keeping computationally expensive reasoning efficient.
🚧 Challenges We Faced
- Gesture Noise & Precision: Free-space drawing is inherently unstable. Filtering without harming user intent was a major challenge.
- Symbol Ambiguity: Distinguishing between similar symbols (e.g., `1` vs. `l`, `+` vs. `t`) required careful visual heuristics.
- Latency vs. Accuracy: Balancing real-time interaction with accurate AI reasoning was non-trivial.
- Designing for Accessibility: Ensuring the system works for users with different motor abilities pushed us to rethink interaction design.
🚀 Outcome
AirCanvas demonstrates that mathematics can be spatial, intuitive, and inclusive. By removing the traditional input bottleneck, we open new possibilities for AI-assisted education, especially in classrooms, accessibility-focused learning, and future spatial computing environments.