Inspiration:
AirNote was inspired by the idea of merging AI, augmented reality, and natural note-taking into one seamless experience. During brainstorming, we realized that while there are hundreds of note apps, none truly integrate AI-driven understanding with real-time visual input. We wanted to build a system that could see what you see, understand it, summarize it, and intelligently connect ideas across sessions, just like the human brain does.
“People without smart glasses may one day be at a ‘significant cognitive disadvantage’ compared to those who do use the tech.” - Mark Zuckerberg
So why not bring this technology into everyone's hands, making the world YOUR notebook?
What It Does:
AirNote is an AI-powered visual notebook that transforms real-world drawings, annotations, and objects into searchable, connected digital notes. Using computer vision, every bounded area or object the user interacts with is automatically captured, combined with any drawings or overlays, and sent to our backend.
Each screenshot is then analyzed by Google Gemini, which generates:
- A short summary of what's visible
- Context-aware labels and tags
- AI embeddings for semantic clustering
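In practice we ask Gemini for structured output and parse it into a note record. A minimal sketch of that parsing step (the prompt wording and the `parse_note_response` helper are illustrative, not our exact production code; the actual Gemini call is omitted since it needs an API key):

```python
import json


def build_note_prompt() -> str:
    # Prompt sent to Gemini alongside the composite screenshot
    # (illustrative wording; the real prompt is tuned more heavily).
    return (
        "Describe this captured note. Respond with JSON containing "
        '"summary" (one sentence) and "tags" (3-5 short labels).'
    )


def parse_note_response(raw: str) -> dict:
    # Gemini often wraps JSON replies in a markdown fence; strip it first.
    text = raw.strip()
    if text.startswith("```"):
        text = text.strip("`").removeprefix("json").strip()
    data = json.loads(text)
    return {"summary": data.get("summary", ""), "tags": data.get("tags", [])}
```

The parsed summary and tags are what get stored in Firestore and rendered on the dashboard.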
These labeled notes are displayed on a modern dashboard, visualized in an Obsidian-style interactive vault graph, and can be queried through an AI assistant using either text or voice.
How We Built It:
We used a multi-stack approach combining:
- Python (FastAPI) backend with Firebase Firestore + Cloud Storage
- Google Gemini API for image understanding, summarization, and embeddings
- React + Vite + Tailwind + Framer Motion frontend for a clean, dynamic UI
- react-force-graph for rendering our interactive vault graph
- Voice and AI integration for natural question-answering over captured visuals
Our system periodically saves synchronized camera and drawing screenshots, merges them into one composite image, uploads them to Firebase, and analyzes them with Gemini. The resulting notes are automatically embedded and clustered into a knowledge graph representing related ideas.
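The merge step above alpha-composites the drawing overlay onto the camera frame so Gemini sees one image. A minimal sketch with Pillow (function name is ours for illustration):

```python
from PIL import Image


def merge_layers(camera: Image.Image, overlay: Image.Image) -> Image.Image:
    """Composite a transparent drawing overlay onto an opaque camera frame."""
    base = camera.convert("RGBA")
    drawing = overlay.convert("RGBA")
    # The overlay canvas can differ from the camera resolution; align them.
    if drawing.size != base.size:
        drawing = drawing.resize(base.size)
    # alpha_composite keeps the camera pixels wherever the overlay is transparent.
    return Image.alpha_composite(base, drawing)
```

The composite is then encoded as PNG and uploaded to Firebase Cloud Storage before being handed to Gemini.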
Challenges We Faced:
- Combining Visual Layers: We had to merge the camera feed and drawing overlay into a single frame so Gemini could process the entire context.
- Gemini Integration: Ensuring the AI could accurately label and summarize complex scenes took fine-tuning prompt structures and data pipelines.
- Graph Intelligence: Creating meaningful connections between notes required embedding text and comparing similarity scores using cosine similarity.
- Realtime Updates: Syncing Firebase with React's live UI while maintaining smooth animations and state management was a major front-end challenge.
- Performance + Scaling: Processing and embedding high-resolution images on the fly without delays required optimizing how we batch and upload data.
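The graph-intelligence step boils down to comparing embedding vectors pairwise and adding an edge when similarity clears a threshold. A minimal sketch (the `build_edges` helper and the 0.8 threshold are illustrative assumptions, not our tuned values):

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_edges(embeddings: dict[str, list[float]], threshold: float = 0.8):
    """Connect every pair of notes whose embeddings are similar enough."""
    ids = sorted(embeddings)
    return [
        (i, j)
        for idx, i in enumerate(ids)
        for j in ids[idx + 1:]
        if cosine_similarity(embeddings[i], embeddings[j]) >= threshold
    ]
```

The resulting edge list is what react-force-graph renders as the vault graph.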
What We Learned:
- How to integrate multimodal AI models (image + text) into real-time applications.
- Efficiently building scalable, event-driven systems with Firebase + FastAPI.
- Visualizing data meaningfully using 2D graph layouts and clustering algorithms.
- Balancing UX design and engineering to make AI-powered tools approachable and intuitive.
The Impact:
AirNote transforms passive note-taking into an active, intelligent, and visual experience. It bridges the gap between human creativity and machine understanding, showing how AI can organize our ideas as we think them, not after.
What’s Next:
- Real-time Gemini analysis for live note summaries
- Smarter vault graph with dynamic clustering
- Voice-based interaction for hands-free use
- Multimodal input (speech + handwriting)
- Shared vaults and team collaboration
- Cloud deployment with user authentication
- Scalable backend using Firebase Functions or Cloud Run
- Polished UI/UX for public beta launch
Built With
- api
- conda
- css3
- fastapi
- firebase
- firestore
- framer-motion
- gemini
- google-cloud
- html5
- javascript
- mediapipe
- node.js
- opencv
- pillow
- python
- react
- tailwind
- typescript
- uvicorn
- vite
