Inspiration

Our inspiration for SecondEye came from the growing need for accessible, real-time visual support. As AI capabilities evolve, we saw an opportunity to leverage Gemini’s unique features to create a truly interactive and teachable assistant. This tool empowers users to complete complex vision tasks, personalize the experience to their needs, and receive real-time guidance without programming skills. We wanted to combine advanced object detection, video understanding, and live context analysis in one application that adapts to various user needs and environments.

What it does

SecondEye is an interactive and adaptable visual assistant. It can detect and define objects in images, answer specific questions about parts of an image, and provide real-time guidance for repairing, assembling, or identifying items. In video mode, users can search for specific moments, teach SecondEye with examples, and receive timestamped results. In live camera mode, it offers real-time guidance by memorizing faces, objects, or places. During screen sharing, it streamlines workflows by assisting with IT tasks, supporting programming, and making web browsing easier for users with visual impairments. Through personalized training and interaction, SecondEye learns from user feedback, tailoring itself to each individual’s needs.

How we built it

SecondEye was built on the Gemini API to take full advantage of its object detection, native video understanding, and large context windows. Our development process focused on designing intuitive user interfaces, creating seamless API integration, and ensuring real-time response accuracy. Each of Gemini's capabilities was harnessed to build a distinct, user-centered feature. For example, we used Gemini’s native video support to extract accurate timestamped information for video search, and we tuned the object-detection features to support detailed, annotated responses on images.
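For the video-search feature, the model's free-text reply has to be turned into structured, timestamped moments before the UI can jump to them. Below is a minimal sketch of that post-processing step; the `MM:SS description` line format (which we would request in the prompt) and the `Moment` type are illustrative assumptions, not the exact production code:

```python
import re
from dataclasses import dataclass


@dataclass
class Moment:
    """One timestamped match from a video search."""
    seconds: int
    description: str


# Assumed reply format: one "MM:SS description" line per match.
_LINE = re.compile(r"^\s*(\d{1,2}):(\d{2})\s+(.*\S)\s*$")


def parse_moments(response_text: str) -> list[Moment]:
    """Turn the model's free-text reply into structured moments."""
    moments = []
    for line in response_text.splitlines():
        m = _LINE.match(line)
        if m:
            minutes, secs, desc = m.groups()
            moments.append(Moment(int(minutes) * 60 + int(secs), desc))
    return moments
```

Given a reply such as `"01:05 red toolbox appears\n02:30 assembly begins"`, this yields moments at 65 and 150 seconds, which the player can seek to directly.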

Challenges we ran into

Our main challenge was balancing responsiveness and accuracy: richer analysis of each frame takes longer, but real-time guidance demands a low-latency experience.

Accomplishments that we're proud of

We’re proud to have created an accessible tool that brings the power of real-time vision AI to a broad range of users, including those with visual impairments. SecondEye’s ability to be taught by the user, adapt to personalized workflows, and provide detailed guidance in real time showcases the potential of modern AI to improve daily tasks and accessibility. Additionally, overcoming latency challenges and ensuring accurate, context-specific feedback have made SecondEye a truly unique application.

What we learned

Building SecondEye taught us about the potential and limitations of AI in real-time visual assistance. We gained insights into optimizing object-detection algorithms for enhanced accessibility, and we learned how to structure context windows and memory management for user-specific personalization. This experience also highlighted the importance of user-centered design, particularly for accessibility-driven applications, as we balanced technical performance with ease of use.
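One pattern we found useful for keeping user-specific personalization within the model's context window is a rolling memory that evicts the oldest taught facts once a size budget is exceeded. A minimal sketch under assumed parameters (the character budget and the plain-string entry format are illustrative):

```python
from collections import deque


class RollingMemory:
    """Keep recent user-taught facts within a fixed character budget."""

    def __init__(self, max_chars: int = 2000):
        self.max_chars = max_chars
        self._entries: deque[str] = deque()
        self._size = 0

    def add(self, fact: str) -> None:
        """Store a fact, evicting the oldest entries if over budget."""
        self._entries.append(fact)
        self._size += len(fact)
        while self._size > self.max_chars and len(self._entries) > 1:
            self._size -= len(self._entries.popleft())

    def as_context(self) -> str:
        """Render the memory as a block to prepend to the next prompt."""
        return "\n".join(self._entries)
```

Budgeting by characters is a crude proxy for tokens, but it keeps the sketch dependency-free; a real implementation would count tokens with the model's tokenizer.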

What's next for SecondEye

Looking ahead, we plan to expand SecondEye’s learning capabilities, allowing it to adapt even more deeply to individual user preferences. We’re exploring additional support for multi-modal feedback (such as audio and haptic responses) to improve accessibility further. We also aim to develop more sophisticated memory functions, enabling SecondEye to recognize and recall complex workflows and environments. In the long term, we envision SecondEye as an indispensable tool for personal and professional use, offering real-time visual support that seamlessly integrates with each user’s daily routine.
