💡 Inspiration

Imagine walking down a busy road in Kenya with your eyes closed. The sounds of traffic are overwhelming, and every step feels like a gamble. This is the daily reality for millions of visually impaired individuals.

My name is Abel, and as a student at Kirinyaga University and the lead at Deep Intel (the AI branch of my company, Alpha), I wanted to bridge the gap between biological sight and artificial intelligence. I realized that while LLMs are great at chat, their true potential lies in multimodal capabilities: acting as a real-time pair of eyes for those who need it most.

DeepSight was born from a simple question: Can I build an AI that doesn't just describe a photo, but actively protects a human being?

🚀 What it does

DeepSight is a mobile application that serves as a visual guide for the blind. It uses the camera to continuously scan the environment and provides real-time, spoken audio feedback.

Hazard Detection: It classifies the path ahead as SAFE, CAUTION, or DANGER.

Instant Navigation: It identifies obstacles (holes, cars, poles) and suggests how to move.

Low Latency: Optimized to speak short, critical warnings (under 6 words) to ensure the user reacts instantly.

⚙️ How we built it

The app is built using Flutter for a cross-platform mobile experience. The core intelligence is powered by Google's Gemini API.

Camera Stream: We capture live frames from the device camera.

AI Analysis: Each frame is sent to the Gemini model (specifically leveraging the newer Gemini 2.5 Flash for speed).

Prompt Engineering: We designed a strict system instruction to force the AI into a "Safety Officer" persona, outputting structured data like [DANGER_LEVEL]: [DESCRIPTION].

Text-to-Speech (TTS): The response is immediately converted to speech, guiding the user without them needing to touch the screen.
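Because the system instruction forces a structured `[LEVEL]: [advice]` reply, that reply can be split before it is handed to TTS. A minimal sketch of such a parser; the `Hazard` enum, the `parseReply` helper, and the fall-back-to-caution behaviour are my assumptions for illustration, not code from the actual app:

```dart
// Parses a "[LEVEL]: [advice]" model reply into a hazard level plus a
// short spoken advice string (hypothetical helper, Dart 3 records).
enum Hazard { safe, caution, danger }

({Hazard level, String advice}) parseReply(String reply) {
  final match = RegExp(r'^\[?(SAFE|CAUTION|DANGER)\]?\s*:\s*(.+)$',
          caseSensitive: false)
      .firstMatch(reply.trim());
  if (match == null) {
    // Malformed output: err on the side of caution instead of staying silent.
    return (level: Hazard.caution, advice: reply.trim());
  }
  return (
    level: Hazard.values.byName(match.group(1)!.toLowerCase()),
    advice: match.group(2)!,
  );
}
```

A `danger` result could then jump the TTS queue or trigger a louder voice, while `safe` replies can be rate-limited so the user isn't flooded with speech.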

Here is a snippet of the logic that handles the continuous visual stream:

```dart
// The heartbeat of DeepSight
Future<void> _analyzeFrame() async {
  final image = await _cameraController.takePicture();

  // Sending visual data to Gemini 2.5 Flash
  final response = await _model.generateContent([
    Content.multi([
      TextPart("Describe path ahead. Hazards? Output strictly: [LEVEL]: [Advice]"),
      DataPart('image/jpeg', await image.readAsBytes()),
    ])
  ]);

  // Immediate audio feedback; response.text is nullable in the SDK
  _speak(response.text ?? '');
}
```

🔧 Challenges we ran into

The biggest hurdle was API versioning and model availability. We initially struggled with 404 Not Found errors using standard model names like gemini-1.5-flash. We learned that for production-grade reliability, generic aliases (like latest) are risky.

The Fix: We dug into our API key permissions and discovered we had access to the cutting-edge Gemini 2.5 Flash and Gemini 3.0 Preview models. Switching to a specific pinned version (gemini-2.5-flash) not only solved the errors but also significantly improved inference speed.
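With the google_generative_ai Dart package, pinning comes down to passing the exact version string when constructing the model instead of a floating alias. A sketch, assuming the API key is supplied as a compile-time define (the environment-variable name is my choice):

```dart
import 'package:google_generative_ai/google_generative_ai.dart';

// Pin the exact model version rather than an alias like "latest",
// so availability and behaviour stay predictable across API updates.
final model = GenerativeModel(
  model: 'gemini-2.5-flash',
  apiKey: const String.fromEnvironment('GEMINI_API_KEY'),
);
```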

🧠 What I learned

State Management: Handling camera streams and async API calls without freezing the UI requires careful state management in Flutter.

AI Latency: For a safety app, "fast" is more important than "detailed." We learned to trade off verbose descriptions for rapid, 2-word warnings.

The Power of Updates: Navigating the shift from v1beta to stable v1 APIs taught me the importance of keeping SDKs and dependencies up to date.
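The state-management lesson above largely comes down to never letting a new frame start analysis while the previous request is still in flight. One way to sketch that guard (the names `_busy`, `_onFrame`, and `_analyzeFrame` are illustrative):

```dart
bool _busy = false;

// Called on a timer or on every Nth camera frame; drops frames while a
// Gemini request is pending so async calls never pile up behind the UI.
Future<void> _onFrame() async {
  if (_busy) return; // skip this frame, a request is already in flight
  _busy = true;
  try {
    await _analyzeFrame(); // capture, send to Gemini, speak the result
  } finally {
    _busy = false; // always release the guard, even if the request fails
  }
}
```

Dropping stale frames also helps latency: speaking a warning about a two-second-old frame is worse than skipping it and analyzing the current one.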

🔮 What's next for DeepSight

We plan to implement Haptic Feedback (vibrating the phone when danger is imminent) and an Offline Mode using on-device models (like Gemini Nano) for areas with poor internet connectivity.

Built With

  • Flutter (mobile app framework)
  • Dart (programming language)
  • Google Gemini API (models: Gemini 2.5 Flash, Gemini 3.0 Preview)
  • google-generative-ai (Dart package)
  • Android (target SDK)
  • Text-to-Speech (TTS engine)