Problem Statement
Visually impaired individuals face significant daily challenges in navigating unfamiliar environments, identifying common objects, and staying safe during routine activities. Existing solutions are often passive, expensive, or dependent on complex hardware setups, leaving a gap for a truly intelligent, software-driven assistive agent.
Solution Description
Z3GION bridges this gap by transforming a standard mobile device into a proactive, autonomous virtual assistant. Equipped with real-time AI-powered object recognition and navigation support, it provides intuitive audio feedback to the user, enabling independence and significantly enhancing personal safety. By seamlessly integrating camera-based environment perception, advanced AI processing, route guidance via Google Maps, and text-to-speech technology, Z3GION delivers a frictionless and empowering user experience.
Key Features
Real-Time Virtual Assistant: Provides instant, context-aware assistance to the user by describing surroundings, people, colors, and light intensity.
Advanced Navigation & Proximity Alerts: Assists with location-based guidance using the Google Maps API and actively alerts the user to nearby hazards.
Voice Commands & Multilingual Operations: Enables hands-free operation and allows the user to interact with the assistant in multiple languages.
OCR & Document Reading: Performs Optical Character Recognition to read text, documents, currency, and handwritten notes aloud.
Dynamic Model Switching: Seamlessly switches between a high-powered pre-trained model (Gemini 2.0 Flash, experimental release), custom-trained models, and offline models to ensure continuous operation even without internet access.
Secure Chat History: Provides a logged history of prompts and responses that the user can access at any time, securely stored on the device.
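The Dynamic Model Switching feature above can be pictured as an ordered fallback chain. What follows is a minimal sketch under stated assumptions, not the project's actual code: the function name `caption_with_fallback`, the `Captioner` type, and the two stand-in models are all hypothetical, and a real chain would wrap the Gemini client and the on-device models.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical type: a captioner takes raw image bytes and returns a description.
Captioner = Callable[[bytes], str]


class AllModelsFailed(RuntimeError):
    """Raised when no model in the chain produced a caption."""


def caption_with_fallback(
    image: bytes, chain: Sequence[Tuple[str, Captioner]]
) -> Tuple[str, str]:
    """Try each (name, captioner) in order; return (name, caption) from the first success."""
    errors = []
    for name, captioner in chain:
        try:
            return name, captioner(image)
        except Exception as exc:  # e.g. network loss, timeout, quota error
            errors.append(f"{name}: {exc}")
    raise AllModelsFailed("; ".join(errors))


# Stand-ins for the real models (assumptions, for illustration only).
def cloud_model(image: bytes) -> str:
    raise ConnectionError("no internet")  # simulate being offline


def offline_model(image: bytes) -> str:
    return "a door ahead"


used, caption = caption_with_fallback(
    b"...", [("cloud", cloud_model), ("offline", offline_model)]
)
print(used, "->", caption)
```

Ordering the chain from most capable to most available keeps the switching policy in one place, so adding a custom-trained model would only mean adding one more entry to the list.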
Working Implementation
Functional Prototype System
Our working prototype is a fully functional mobile application whose architecture prioritizes low latency and high accessibility.
Frontend: Built natively in Kotlin for smooth integration with device hardware such as the camera and microphone.
Backend: Developed using Python and Flask, which handles the RESTful API routing, AI model integration, and server-side processing.
Demonstrable Core Features
The current build successfully demonstrates:
Hands-free navigation initiated entirely through intuitive voice-based controls.
Multimodal feedback, converting visual surroundings into real-time auditory cues and tactile vibrations for dynamic environments.
Low-latency calls to the Gemini API for complex scene captioning and OCR.
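To illustrate how a voice-based control could be turned into an action, here is a rough sketch of keyword-based intent routing. It assumes the speech has already been transcribed to text; the intent names, trigger phrases, and the `route_command` function are hypothetical, and this English-only example does not attempt the multilingual handling the app describes.

```python
# Hypothetical intents and trigger phrases; the app's real command set is not specified.
INTENT_KEYWORDS = {
    "navigate": ("navigate to", "take me to", "directions to"),
    "read_text": ("read this", "read the text", "what does it say"),
    "describe_scene": ("describe", "what is in front", "what do you see"),
}


def route_command(transcript: str) -> tuple:
    """Return (intent, argument) for a transcribed voice command.

    The argument is whatever follows the trigger phrase, such as a destination.
    """
    # Lowercase and collapse whitespace so matching is tolerant of transcription quirks.
    text = " ".join(transcript.lower().split())
    for intent, phrases in INTENT_KEYWORDS.items():
        for phrase in phrases:
            idx = text.find(phrase)
            if idx != -1:
                argument = text[idx + len(phrase):].strip(" .?!")
                return intent, argument
    return "unknown", ""


print(route_command("Take me to the central library"))
```

Returning an explicit "unknown" intent lets the caller ask the user to repeat the command instead of guessing, which matters more for an audio-only interface than for a visual one.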
Inspiration
The world is rapidly moving toward an "Agentic Economy" where digital agents execute complex tasks autonomously, yet the physical world remains deeply inaccessible for millions of visually impaired individuals. Our inspiration for Z3GION came from a glaring gap in the market: most existing assistive tools are passive. They might tell a user "there is a bus," but they cannot help the user interact with that bus.
We realized that true independence doesn't just come from seeing the world; it comes from navigating and transacting within it seamlessly. We wanted to build a mobile-first, proactive AI agent that acts as a true digital proxy for the user. Our drive was to take complex technologies—machine learning, data science, and autonomous agents—and package them into a simple, life-changing mobile interface.
What it does
Z3GION is an autonomous, multimodal mobile agent designed to empower visually impaired users. Rather than just acting as a camera, Z3GION operates as a centralized hub that understands its environment and communicates with other digital agents on the user's behalf.
Real-Time Perception & OCR: It instantly identifies objects, reads documents (OCR), and understands complex physical scenes using the device's camera.
Proximity & Safety Alerts: It calculates the distance of approaching obstacles and delivers real-time auditory and tactile alerts to keep the user safe.
Agentic Orchestration: Instead of requiring manual input, Z3GION anticipates needs. If the user is at a transit stop, Z3GION can identify the bus and theoretically interface with a transit agent to confirm the route.
Multilingual Voice Control: Users interact entirely hands-free via natural voice commands, supported in multiple languages to ensure global accessibility.
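The Proximity & Safety Alerts item above does not say how distance is computed. One common single-camera approach is the pinhole-camera relation, distance = focal length (px) × real height (m) / apparent height (px). The sketch below assumes that approach; the function names, the alert thresholds, and the example numbers are all hypothetical.

```python
def estimate_distance_m(
    real_height_m: float, bbox_height_px: float, focal_length_px: float
) -> float:
    """Pinhole-camera distance estimate from an object's known real-world height."""
    if bbox_height_px <= 0:
        raise ValueError("bounding box height must be positive")
    return focal_length_px * real_height_m / bbox_height_px


# Hypothetical thresholds in metres, checked from nearest to farthest.
ALERT_LEVELS = [(1.0, "danger"), (3.0, "warning")]


def alert_level(distance_m: float) -> str:
    """Map a distance to an alert level that could drive audio or vibration cues."""
    for limit, level in ALERT_LEVELS:
        if distance_m <= limit:
            return level
    return "clear"


# Example: a 1.7 m tall person whose bounding box is 850 px tall,
# seen through a camera with an assumed focal length of 1000 px.
distance = estimate_distance_m(real_height_m=1.7, bbox_height_px=850, focal_length_px=1000)
print(round(distance, 2), alert_level(distance))
```

This estimate depends on knowing the object's typical size and the camera's focal length, so in practice it would likely be combined with per-class size tables or a depth sensor where one is available.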
How we built it
We architected Z3GION to be highly scalable and incredibly fast. The foundation relies on a hybrid processing approach:
Frontend (Mobile): Built natively in Kotlin for robust mobile performance and smooth camera hardware access.
Backend & Data: Powered by Python and Flask, with MongoDB handling secure, localized storage of the user's chat history, preferences, and navigation logs.
The AI Engine: We integrated the Gemini 2.0 Flash (experimental) API for heavy-duty, low-latency multimodal reasoning.
ASI: One Integration: To transition this from a simple app to an agentic business model, we utilized ASI: One during the prompt-a-thon. We used it to design our agent communication protocols, define our high-value user personas, and generate our go-to-market business narrative.
Challenges we ran into
Our biggest technical hurdle was the latency vs. accuracy trade-off. For a visually impaired user navigating a busy intersection, a two-second processing delay is dangerous. We had to heavily optimize our Python backend to handle concurrent API calls to Gemini and keep the mobile app's main UI thread from blocking while it waits on those calls.
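One way to handle several model calls concurrently while bounding the total wait is a thread pool with a deadline. This is a minimal sketch, assuming thread-based concurrency (the write-up does not say which mechanism was used); `run_with_deadline` and the two fake calls are hypothetical stand-ins for real remote requests.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def run_with_deadline(tasks, image, deadline_s):
    """Run every task(image) in parallel; return results that finish before the deadline.

    Tasks that miss the deadline or raise are reported as None so the caller
    can fall back, for example to a cached or offline answer.
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(tasks)))
    futures = {name: pool.submit(fn, image) for name, fn in tasks.items()}
    done, _ = wait(futures.values(), timeout=deadline_s)
    results = {}
    for name, future in futures.items():
        if future in done and future.exception() is None:
            results[name] = future.result()
        else:
            results[name] = None
    pool.shutdown(wait=False)  # do not hold the response for stragglers
    return results


# Stand-ins for remote model calls (assumptions, for illustration only).
def fake_caption(image):
    time.sleep(0.05)
    return "a crosswalk"


def fake_ocr(image):
    time.sleep(1.0)  # simulates a slow call that misses the deadline
    return "STOP"


results = run_with_deadline(
    {"caption": fake_caption, "ocr": fake_ocr}, b"...", deadline_s=0.4
)
print(results)
```

Returning partial results at the deadline means the user hears the scene caption promptly even when a slower call is still in flight.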
Additionally, ensuring reliable offline capability was difficult. We built a Model Switching architecture that seamlessly drops down to custom-trained, on-device offline image captioning models when the user loses internet connectivity. Developing the business narrative inside ASI: One within a single day also forced us to rapidly pivot from thinking like pure engineers to thinking like product founders.
Accomplishments that we're proud of
Zero-to-Agent in Hours: We successfully used ASI: One to map out an entire agentic business ecosystem, proving that Z3GION is not just a hack, but a scalable B2B2C product.
Sub-Second AI Responses: By tuning our Flask endpoints and leveraging Gemini 2.0 Flash, we achieved near real-time auditory feedback for the user.
Affordable Accessibility: By centralizing the heavy lifting in software and cloud AI, we eliminated the need for users to buy expensive proprietary hardware—their smartphone is all they need.
What we learned
We learned that prompt engineering is product engineering. Using ASI: One taught us how to rapidly test business logic and user flows through natural language before writing a single line of backend code. We also deepened our knowledge of how to build AI applications that prioritize human trust; when building for the visually impaired, consistency and reliability are far more important than flashy features.
What's next for Z3GION
The next phase is fully integrating Z3GION into the broader Fetch.ai decentralized network. We plan to give Z3GION a wallet so it can autonomously execute micro-transactions for the user (e.g., paying for an autonomous taxi once it verifies the license plate visually). We will also expand our MongoDB schemas to securely store personalized spatial mapping data, allowing the agent to remember the specific layout of a user's home or office for even faster, localized navigation.
Built With
- api
- cv
- deep
- gemini
- imagedecoders
- kotlin
- learning
- llm
- machine-learning
- open-images
- python
- python-package-index