Pitch deck https://canva.link/l2g8i1z6lmxd4o1
Inspiration
The most capable AI models today share two assumptions that quietly exclude the people who need them most: they assume a reliable internet connection, and they live on a server. For the 8 million completely blind individuals in India, connectivity is weakest exactly when they need help the most—an unfamiliar street or a rural bus stand. Furthermore, existing assistive wearables are often priced like premium medical devices, or force users to awkwardly hold their phones, occupying the hands they need for a cane. We built Percevia to completely dismantle these barriers: an entirely offline, highly affordable wearable ecosystem that ensures true independence is never gated behind an internet connection or an exorbitant price tag.
What it does
Percevia is an offline-first vision assistant consisting of a smartphone app and custom smart glasses. It answers the critical question, "What is in front of me?" by describing scenes, reading text, recognizing faces, and warning of obstacles via voice and dynamic haptics.
Because we built the UI specifically for muscle memory (a rigid 2x2 grid of massive buttons), users can navigate it entirely by touch. When the glasses are connected, they act as a seamless hardware extension, freeing the user's hands entirely while the phone stays in their pocket processing the environment.
How we built it
Percevia fundamentally refuses traditional cloud architecture. We built an Edge-Compute Engine where our primary vision-language model, Gemma 4, lives entirely on the user's phone, accelerated locally via Google AI Edge's LiteRT on the device's GPU.
To achieve True Spatial Awareness, we built a continuous dual-model pipeline. We divide the user's field of view into a rigid 3x3 grid. When our YOLO model detects an obstacle, we map its centroid to a sector. Concurrently, we paired this with MiDaS (a monocular depth estimation model) to generate a dense topological depth map.
Fusing these streams drives our proprietary Zone Weighting Logic for haptic feedback. The haptic intensity response $H$ is calculated dynamically:
$$H = \sum_{i=1}^{9} (O_i \cdot W_i)$$
Where $O_i$ represents the proximity scalar of an object in zone $i$, and $W_i$ is the assigned zone weight. We assigned a critical weight to the bottom-center grid (Zone 7, $W_7 = 1.0$) to trigger high-intensity continuous vibration for immediate path obstacles, while peripheral zones are dampened ($W_i = 0.2$).
The Hardware Bridge consists of custom ESP32-based smart glasses that spin up their own Wi-Fi and serve an MJPEG stream. The app pulls a clean JPEG by scanning the byte stream for frame markers, feeding the exact same Gemma 4 LiteRT inference pipeline.
Challenges we ran into
- Model Concurrency Crashes: Gemma 4 is a heavy model. We found that the LiteRT-LM engine crashes hard (SIGSEGV) if a second inference touches the model while one is still running. We solved this by wrapping every request in an exclusive lock that queues sessions safely and streaming tokens straight into our TTS engine to eliminate wait times.
- Taming the OS Screen Readers: Android's native TalkBack swallowed the double-tap gestures we used to stop speech. Instead of forcing users to buy external Bluetooth clickers, we engineered an invisible, zero-size screen-reader-only overlay that exposes a "Stop reading" semantic action that TalkBack can focus on perfectly.
Accomplishments that we're proud of
- True Offline Autonomy: Achieving high-accuracy obstacle detection and scene description entirely on the edge with zero cloud dependency.
- Seamless Hardware Handoff: Creating a robust MJPEG over Wi-Fi stream that allows cheap, commodity hardware to rival devices that cost 10x as much.
- Deep Accessibility: Designing a UI that completely rejects scrolling, opting for a static layout that can be mastered via muscle memory in minutes.
What we learned
We learned that in assistive technology, filtering out noise is just as critical as detecting the signal. Balancing the hardware constraints of microcontrollers with the heavy demands of spatial AI taught us the vital importance of software-level optimization (like our YOLO+MiDaS fusion grid) to solve hardware-level problems.
What's next for Percevia
We are actively transitioning from lab validation to field deployment. Having validated the core architecture, our immediate next step is to deploy our first pilot batch of 50 units into the hands of real users to stress-test the hardware ergonomics and edge-inference stability in daily, unpredictable environments.
Log in or sign up for Devpost to join the conversation.