Inspiration

My goal was to empower blind individuals to navigate the world safely and independently. Traditional accessibility tools often require users to manually capture photos and wait for processing, which lacks the real-time context needed for dynamic environments. I wanted to leverage the new Gemini Multimodal Live API to build "Ujala" (meaning light)—an always-on, real-time companion that provides immediate, conversational assistance through continuous video and audio streaming.

What it does

Ujala is a native Android app that acts as an intelligent set of eyes and an always-available companion. It streams live video and audio from an Android phone to a FastAPI backend powered by the Gemini Live API.

Users can seamlessly switch between specialized modes for tailored assistance:

  • Guardian Mode: Proactively identifies hazards (like steps, vehicles, or obstacles) and provides urgent, concise warnings.
  • Companion Mode: Acts as a warm conversational partner to keep the user company and handle ad hoc tasks like finding a seat or telling a story.
  • Product Mode: Instantly reads product names, expiry dates, and other label details.
  • Reader Mode: Instantly reads any presented text, whether typed or handwritten.
  • Currency Mode: Instantly identifies paper currency and coins.
  • Emergency Mode: Silently monitors the environment for safety. If requested, it can use the Google Maps API to find and call the nearest hospital, or send summaries via SMS to emergency contacts.

How we built it

  • Frontend: I built a native Android app using Kotlin and Jetpack Compose, with CameraX for live image capture and AudioTrack for streaming and playing raw audio.
  • Backend & AI: The backend is built with Python and FastAPI. It receives WebSocket connections from the Android app and forwards the media to the google-genai SDK using live.connect.
  • Latency Optimization: To ensure near-instant responses, I designed a custom binary framing protocol over WebSockets (e.g., [5-byte tag][N-byte payload]).
  • Tool Calling Integration: I equipped Gemini with custom tools, such as call_emergency_services (backed by the Google Maps Places API) and send_sms_summary, allowing the AI to autonomously trigger external actions without any UI interaction from the user.
  • Cloud Infrastructure: I deployed the backend on Google Cloud Run and used Firestore to store the app's user settings.
  • Google Antigravity: I used the Google Antigravity IDE for vibe coding. It handled development of both the frontend and backend, fixed errors, and even deployed the app automatically.
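
The binary framing mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not Ujala's exact protocol: the fixed 5-byte ASCII tag values and helper names are assumptions. No length prefix is needed because each WebSocket message already carries its own length.

```python
# Hypothetical frame layout: a fixed-length ASCII tag (e.g. b"AUDIO",
# b"VIDEO") followed directly by the raw payload bytes.
TAG_LEN = 5

def encode_frame(tag: bytes, payload: bytes) -> bytes:
    """Prefix a payload with its 5-byte type tag."""
    if len(tag) != TAG_LEN:
        raise ValueError(f"tag must be exactly {TAG_LEN} bytes")
    return tag + payload

def decode_frame(frame: bytes) -> tuple[bytes, bytes]:
    """Split a received frame back into (tag, payload)."""
    if len(frame) < TAG_LEN:
        raise ValueError("frame too short")
    return frame[:TAG_LEN], frame[TAG_LEN:]
```

Compared to base64-encoding media inside JSON, this keeps audio and video chunks as raw bytes with a constant five bytes of overhead per message.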
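
A tool like call_emergency_services is exposed to the model as a function declaration, and the backend dispatches the model's tool calls to real actions. The sketch below shows the general shape; the schema fields and dispatcher are illustrative assumptions, not Ujala's production definitions.

```python
# Hypothetical declaration for the call_emergency_services tool,
# written as a plain OpenAPI-style schema dict.
CALL_EMERGENCY_SERVICES = {
    "name": "call_emergency_services",
    "description": "Find the nearest hospital via the Google Maps "
                   "Places API and place a phone call to it.",
    "parameters": {
        "type": "object",
        "properties": {
            "latitude": {"type": "number"},
            "longitude": {"type": "number"},
        },
        "required": ["latitude", "longitude"],
    },
}

def handle_tool_call(name: str, args: dict) -> dict:
    """Dispatch a tool call from the model to the matching backend action."""
    if name == "call_emergency_services":
        # In the real app this would query the Places API and start a call;
        # here we just echo back a status for illustration.
        return {"status": "calling", "lat": args["latitude"], "lng": args["longitude"]}
    return {"status": "unknown_tool"}
```

Running the dispatcher in the backend (rather than on the phone) means a tool call never blocks the audio stream to the device.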

Challenges we ran into

  • Minimizing Latency: For a "live" assistant, latency is a critical safety issue. I had to move away from standard JSON payloads to raw binary framing over WebSockets to cut serialization and parsing overhead.
  • Prompt Engineering for Context: Getting the model to behave correctly depending on the mode was tricky. For example, in Emergency Mode, getting the AI to remain completely silent unless directly addressed, while still actively processing the video feed for SMS summaries, required extensive tuning.
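
The mode-dependent behaviour is steered through per-mode system instructions. A simplified sketch follows; these example prompts are illustrative stand-ins, as the real ones are much longer and heavily tuned.

```python
# Illustrative per-mode system instructions (not Ujala's actual prompts).
MODE_PROMPTS = {
    "guardian": (
        "Watch the video feed for hazards such as steps, vehicles, or "
        "obstacles. Give urgent, concise spoken warnings only."
    ),
    "emergency": (
        "Remain completely silent unless the user addresses you directly. "
        "Keep processing the video feed and maintain a running summary "
        "suitable for an SMS to emergency contacts."
    ),
}

def system_prompt(mode: str) -> str:
    """Return the system instruction for a mode, defaulting to companion."""
    return MODE_PROMPTS.get(mode, "Act as a warm, helpful companion.")
```

Switching modes then only means reconnecting the Live session with a different instruction, with no model or code changes.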

Accomplishments that we're proud of

  • Creating a versatile product that genuinely handles real-world problems (from avoiding a wet floor to identifying a 100 rupee note) through dynamic persona switching.
  • Achieving extremely low-latency, real-time audio and vision processing over WebSockets between a mobile device and the Gemini backend.
  • Successfully integrating external API tools (like Google Maps) seamlessly into the continuous Live API session without blocking the audio stream.

What we learned

  • I learned how responsive and perceptive the Gemini Live API truly is when processing continuous video frames and audio at the same time.
  • I gained a deep understanding of WebSockets, binary data streaming, and raw PCM audio handling.
  • Building anything with Antigravity is fun and simple.

What's next for Ujala

  • Integrate with Google Maps for navigation: I plan to integrate Google Maps for turn-by-turn navigation. Maps can provide step-by-step distances and directions, while Gemini guides the user around obstacles using the camera.
  • Add Memory: I will add memory to Ujala so it can store what it processes and use it for relevant context in future sessions.
  • Accessibility Polish: I want to improve the Android UI to be better optimized for TalkBack and other system accessibility services.
