💡 Inspiration

Imagine walking down a busy road in Kenya with your eyes closed. The sounds of traffic are overwhelming, and every step feels like a gamble. This is the daily reality for millions of visually impaired individuals.
My name is Abel, and as a student at Kirinyaga University and the lead at Deep Intel (the AI branch of my company, Alpha), I wanted to bridge the gap between biological sight and artificial intelligence. I realized that while LLMs are great at chat, their true potential lies in multimodal capabilities: acting as a real-time pair of eyes for those who need it most.
DeepSight was born from a simple question: Can I build an AI that doesn't just describe a photo, but actively protects a human being?
🚀 What it does

DeepSight is a mobile application that serves as a visual guide for the blind. It uses the camera to continuously scan the environment and provides real-time, spoken audio feedback.
Hazard Detection: It classifies the path ahead as SAFE, CAUTION, or DANGER.
Instant Navigation: It identifies obstacles (holes, cars, poles) and suggests how to move.
Low Latency: Optimized to speak short, critical warnings (under 6 words) to ensure the user reacts instantly.
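The SAFE/CAUTION/DANGER levels arrive from the model as plain text, so the app has to parse them before deciding how urgently to react. A minimal sketch, assuming the model keeps to the `[LEVEL]: [Advice]` format; the `HazardLevel` enum and `parseReply` helper are illustrative, not from the actual codebase:

```dart
enum HazardLevel { safe, caution, danger, unknown }

/// Splits a reply like "[DANGER]: Stop, open hole ahead" into a
/// hazard level plus the short advice string that gets spoken.
/// Hypothetical helper for illustration.
(HazardLevel, String) parseReply(String reply) {
  final match =
      RegExp(r'\[(SAFE|CAUTION|DANGER)\]\s*:\s*(.*)').firstMatch(reply);
  if (match == null) return (HazardLevel.unknown, reply);
  final level = switch (match.group(1)) {
    'SAFE' => HazardLevel.safe,
    'CAUTION' => HazardLevel.caution,
    _ => HazardLevel.danger,
  };
  return (level, match.group(2)!.trim());
}
```

Falling back to `HazardLevel.unknown` keeps the app speaking something sensible even when the model drifts from the format.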
⚙️ How we built it

The app is built with Flutter for a cross-platform mobile experience. The core intelligence is powered by Google's Gemini API.
Camera Stream: We capture live frames from the device camera.
AI Analysis: Each frame is sent to the Gemini model (specifically leveraging the newer Gemini 2.5 Flash for speed).
Prompt Engineering: We designed a strict system instruction to force the AI into a "Safety Officer" persona, outputting structured data like [DANGER_LEVEL] : [DESCRIPTION].
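The persona setup might look like the following sketch, assuming the google_generative_ai Dart package; the exact prompt wording and API-key handling here are illustrative:

```dart
import 'package:google_generative_ai/google_generative_ai.dart';

// A minimal sketch of the "Safety Officer" persona, pinned to a
// specific model version rather than a generic alias.
final model = GenerativeModel(
  model: 'gemini-2.5-flash',
  apiKey: const String.fromEnvironment('GEMINI_API_KEY'),
  systemInstruction: Content.system(
    'You are a Safety Officer guiding a blind pedestrian. '
    'Reply with at most 6 words, strictly formatted as '
    '[SAFE|CAUTION|DANGER]: [Advice].',
  ),
);
```

Putting the format constraint in the system instruction, rather than in every frame's prompt, keeps the per-request payload small.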
Text-to-Speech (TTS): The response is immediately converted to speech, guiding the user without them needing to touch the screen.
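The TTS step can be sketched with the flutter_tts plugin; the speech rate and the interrupt-first behaviour shown here are assumptions, not the project's exact settings:

```dart
import 'package:flutter_tts/flutter_tts.dart';

final FlutterTts _tts = FlutterTts();

// Speaks a warning immediately, cutting off any stale guidance
// still playing so old advice never masks a new hazard.
Future<void> _speak(String text) async {
  await _tts.stop();
  await _tts.setSpeechRate(0.5); // slower rate for clarity
  await _tts.speak(text);
}
```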
Here is a snippet of the logic that handles the continuous visual stream:

```dart
// The heartbeat of DeepSight
Future<void> _analyzeFrame() async {
  final image = await _cameraController.takePicture();

  // Sending visual data to Gemini 2.5 Flash
  final response = await _model.generateContent([
    Content.multi([
      TextPart("Describe path ahead. Hazards? Output strictly: [LEVEL]: [Advice]"),
      DataPart('image/jpeg', await image.readAsBytes()),
    ])
  ]);

  // Immediate audio feedback
  _speak(response.text ?? '');
}
```

🔧 Challenges we ran into

The biggest hurdle was API versioning and model availability. We initially struggled with 404 Not Found errors using standard model names like gemini-1.5-flash. We learned that for production-grade reliability, generic aliases (like latest) are risky.
The fix: after digging into our API key permissions, we discovered we had access to the newer Gemini 2.5 Flash and Gemini 3.0 Preview models. Switching to a specific pinned version (gemini-2.5-flash) not only resolved the 404 errors but also noticeably improved inference speed.
🧠 What I learned

State Management: Handling camera streams and async API calls without freezing the UI requires careful state management in Flutter.
AI Latency: For a safety app, "fast" is more important than "detailed." We learned to trade off verbose descriptions for rapid, 2-word warnings.
The Power of Updates: Navigating the shift from v1beta to stable v1 APIs taught me the importance of keeping SDKs and dependencies up to date.
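One pattern that helped with the state-management point above: a simple in-flight guard so a new frame is never analyzed while the previous request is still pending. A sketch; the `_isAnalyzing` flag and the periodic-timer setup are illustrative, not the project's exact code:

```dart
bool _isAnalyzing = false;

// Called on a periodic timer; skips ticks while a request is in
// flight so the UI isolate never queues overlapping camera captures,
// network calls, and TTS playback.
Future<void> _onTick() async {
  if (_isAnalyzing) return;
  _isAnalyzing = true;
  try {
    await _analyzeFrame(); // capture + Gemini call + spoken feedback
  } finally {
    _isAnalyzing = false;
  }
}
```

The `try`/`finally` matters: without it, a single failed API call would leave the flag stuck and silence the app permanently.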
🔮 What's next for DeepSight

We plan to implement Haptic Feedback (vibrating the phone when danger is imminent) and an Offline Mode using on-device models (like Gemini Nano) for areas with poor internet connectivity.
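The haptic side is already reachable with Flutter's built-in services; a minimal sketch, assuming the hazard level has been parsed out of the model reply (the escalating mapping is an assumption):

```dart
import 'package:flutter/services.dart';

// Escalating vibration intensity per hazard level (illustrative
// mapping), so danger is felt even in loud traffic.
Future<void> pulseForLevel(String level) async {
  switch (level) {
    case 'DANGER':
      await HapticFeedback.heavyImpact();
    case 'CAUTION':
      await HapticFeedback.mediumImpact();
    default:
      break; // SAFE: no vibration
  }
}
```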