Inspiration

We were inspired by the daily challenges faced by visually impaired individuals in navigating their environment independently. Simple tasks like finding lost items, reading signs, or understanding their surroundings can be incredibly difficult. We wanted to create a solution that empowers blind and low-vision users with real-time visual information through natural voice interactions, making them more self-reliant and safe in their daily lives.

What it does

FiND.iT is an AI-powered visual assistant that acts as "eyes" for visually impaired users through:

🎤 Voice Command Interface: Users can naturally speak commands like "Where are my keys?" or "Read this sign"
🔍 Object Detection: Identifies and locates objects in the user's environment using advanced computer vision
📖 Text Reading: Extracts and reads aloud text from documents, signs, and labels
👁️ Scene Description: Provides comprehensive descriptions of surroundings and spatial relationships
🚨 Emergency Assistance: Immediate emergency alert system with automated assistance calling
📍 Object Guidance: Gives clear, actionable directions to help users locate specific items

How we built it

System Architecture:

We built a modular, accessible application with three core components:

Frontend Interface (Streamlit):
• Accessible web interface with large buttons and high contrast
• Real-time camera feed and voice interaction display
• Emergency mode with visual and audio alerts

AI Processing Engine:
• Computer Vision: HuggingFace YOLOS model for object detection
• OCR Processing: Tesseract for text extraction from images
• Natural Language Understanding: Fetch.ai ASI API for intent parsing and intelligent responses
• Audio Processing: Speech recognition and text-to-speech synthesis

Accessibility Layer:
• Voice-first interaction design
• Spatial awareness descriptions
• Non-blocking audio feedback
• Emergency protocols
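To make the Computer Vision component concrete, here is a minimal sketch of how raw detections can be turned into a speakable answer instead of a data dump. It assumes the output format of HuggingFace's object-detection pipeline (a list of dicts with `label`, `score`, and `box`); the function name and the 0.5 confidence threshold are illustrative choices, not the project's actual code.

```python
# Sketch: turn raw YOLOS-style detections into a single spoken sentence.
# Assumes the HuggingFace object-detection pipeline output format:
# [{"label": str, "score": float, "box": {"xmin", "ymin", "xmax", "ymax"}}, ...]

def detections_to_speech(detections, min_score=0.5):
    """Summarize confident detections as a short, speakable sentence."""
    kept = [d["label"] for d in detections if d["score"] >= min_score]
    if not kept:
        return "I don't see anything I recognize."
    # Count duplicates so "cup, cup" becomes "2 cups".
    counts = {}
    for label in kept:
        counts[label] = counts.get(label, 0) + 1
    parts = [f"{n} {label}s" if n > 1 else f"a {label}"
             for label, n in counts.items()]
    return "I can see " + ", ".join(parts) + "."

detections = [
    {"label": "cup", "score": 0.91, "box": {"xmin": 10, "ymin": 20, "xmax": 80, "ymax": 90}},
    {"label": "cup", "score": 0.64, "box": {"xmin": 200, "ymin": 30, "xmax": 260, "ymax": 95}},
    {"label": "remote", "score": 0.31, "box": {"xmin": 5, "ymin": 5, "xmax": 40, "ymax": 25}},
]
print(detections_to_speech(detections))  # "I can see 2 cups."
```

The resulting string can be handed straight to the text-to-speech layer, keeping the audio output short and natural.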

User Voice Command → Speech-to-Text → ASI Intent Parsing → Computer Vision Processing → AI-Powered Response Generation → Text-to-Speech Output
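The intent-parsing step of the pipeline above is handled by the Fetch.ai ASI API in the real system; the keyword matcher below is only a simplified sketch of what that routing step does, mapping a transcribed command to one of the app's actions. All function names, intent labels, and keywords are illustrative.

```python
# Minimal sketch of the intent-routing step of the pipeline.
# Intents are checked in order, so the emergency intent wins ties.

INTENT_KEYWORDS = {
    "emergency": ("help me", "emergency", "call for help"),
    "read_text": ("read", "what does it say"),
    "find_object": ("where is", "where are", "find my"),
    "describe_scene": ("describe", "what's around", "what is around"),
}

def parse_intent(transcript: str) -> str:
    """Map a speech-to-text transcript to an intent label."""
    text = transcript.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(kw in text for kw in keywords):
            return intent
    return "describe_scene"  # safe default: describe the surroundings

print(parse_intent("Where are my keys?"))  # find_object
print(parse_intent("Read this sign"))      # read_text
```

Defaulting to a scene description when no intent matches keeps the interaction voice-first: the user always gets a useful spoken response rather than an error.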

Challenges we ran into

Technical Challenges:

• Real-time Object Detection: Balancing accuracy with speed for immediate user feedback
• Voice Command Ambiguity: Handling varied natural language phrasing and accents
• Camera Integration: Ensuring consistent image capture across different devices and lighting conditions
• Spatial Positioning: Converting pixel coordinates to meaningful spatial descriptions ("on your left", "in front of you")
• OCR Accuracy: Improving text extraction from varied fonts, backgrounds, and lighting situations
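The Spatial Positioning challenge above boils down to mapping a bounding box to a spoken direction. A minimal sketch, assuming pixel coordinates with (0, 0) at the top-left of the camera frame; the one-third thresholds are illustrative, not the project's tuned values.

```python
# Sketch: map a detection's bounding-box centre to a spoken direction.
# Assumes image coordinates with (0, 0) at the top-left of the frame.

def describe_position(box, frame_w):
    """Convert a bounding box to a phrase like 'on your left'."""
    cx = (box["xmin"] + box["xmax"]) / 2
    horiz = cx / frame_w  # 0.0 = far left of frame, 1.0 = far right
    if horiz < 1 / 3:
        return "on your left"
    if horiz > 2 / 3:
        return "on your right"
    return "in front of you"

box = {"xmin": 40, "ymin": 100, "xmax": 120, "ymax": 220}
print(describe_position(box, frame_w=640))  # "on your left"
```

A fuller version would also use the vertical axis and box size (a larger box usually means a closer object), but even this two-threshold split turns raw pixels into guidance a user can act on.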

Accessibility Challenges:

• Designing intuitive voice interactions for users who can't see the interface
• Creating clear, non-visual spatial descriptions
• Ensuring emergency features are immediately accessible
• Balancing detailed information with concise, actionable guidance

Accomplishments that we're proud of

• True Accessibility: Created a genuinely useful tool that addresses real-world challenges for visually impaired users
• Seamless Integration: Successfully combined multiple AI technologies into a cohesive, natural user experience
• Intelligent Responses: Developed an AI that provides contextual, helpful guidance rather than just raw detection data
• Emergency Ready: Built a reliable emergency system that could potentially save lives
• Voice-First Design: Created an interface that doesn't require any visual interaction
• Real-time Performance: Achieved responsive object detection and audio feedback suitable for real-world use

What we learned

Technical Insights:

• Advanced computer vision model integration and optimization
• Real-time AI processing pipeline design
• Voice interface best practices and natural language processing
• Spatial reasoning and description generation
• Accessible web application development

User Experience Lessons:

• The importance of concise, actionable information for visually impaired users
• How to design for complete voice interaction without visual fallbacks
• Emergency feature design considerations for high-stress situations
• Balancing detail with usability in audio descriptions

Team Growth:

• Integrating multiple complex AI systems into a single application
• Problem-solving for real-world accessibility challenges
• Building technology that can genuinely improve people's quality of life

What's next for FiND.iT

Short-term Enhancements:

• Mobile App Development: Native iOS/Android applications for on-the-go use
• Offline Mode: Local processing for situations without internet connectivity
• Multi-language Support: Expand beyond English to serve global users
• Custom Object Training: Allow users to train the system on personal items

Advanced Features:

• Navigation Assistance: Indoor and outdoor wayfinding with obstacle detection
• Facial Recognition: Identify friends, family, and caregivers
• Color Detection: Help with color matching for clothing and objects
• Currency Identification: Recognize and describe money denominations
• Product Recognition: Identify food items, medications, and common products

Platform Expansion:

• Smart Glasses Integration: Hands-free operation with wearable technology
• Home Assistant Compatibility: Integration with Alexa, Google Home, etc.
• Community Features: Shared object databases and crowd-sourced descriptions
• Professional Versions: Specialized versions for workplaces and educational settings

Accessibility Improvements:

• Haptic Feedback: Vibration patterns for additional information channels
• Custom Voice Profiles: Personalized voice preferences and speaking styles
• Learning Mode: An adaptive system that learns user preferences over time
• Accessibility Certification: Work with vision impairment organizations for official certification

FiND.iT represents a significant step toward making visual information accessible to everyone, and we're committed to continuing its development to create an even more powerful tool for independence and safety for visually impaired individuals worldwide.

Built With

  • huggingface
  • numpy
  • opencv (image processing)
  • pandas
  • pytesseract
  • python
  • pytorch
  • pyttsx3
  • streamlit
  • ultralytics yolo (object detection)
  • fetch.ai agent (optional, real-time object recognition)
  • pil/pillow (image handling)