We created an intuitive app for people with visual impairment to help them better understand their surroundings.
Describing the Scene: Using our app, users can get a description of the scene in front of them. The app provides a natural human-like caption instead of simply listing out the items detected in the scene. For example, when we tested this by taking a noisy picture at the hackathon, the app returned "a person sitting at a table with his laptop", which is in fact the most important part of the scene.
Directional Audio: We also offer directional audio that lists the objects in the scene according to their location, so users can quickly get a sense of where things are around them.
Describing Faces: The app can describe faces and help decipher the emotions of the individuals the camera is pointed at.
Visual Question Answering: The app can also answer questions the user poses about the scene. Some examples of questions are "what is the shirt color of the person in the picture?" and "what is the can of soda in front of me?".
Color Contrast Filters: The filters help friends and family simulate color blindness, and help individuals with color blindness navigate their surroundings.
OCR: The app processes and reads detected text.
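The directional audio feature described above can be sketched as follows. This is only an illustration, not the app's actual code: it assumes each detected object comes with a horizontal bounding box in pixels, and it maps the box center to a stereo pan value between -1 (far left) and +1 (far right). The names `Detection`, `stereo_pan`, and `announcement_order` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One detected object (hypothetical structure, not the app's real type)."""
    label: str
    x_min: float  # left edge of the bounding box, in pixels
    x_max: float  # right edge of the bounding box, in pixels


def stereo_pan(det: Detection, frame_width: float) -> float:
    """Map the box center to a pan value in [-1.0, 1.0] (-1 = left, +1 = right)."""
    center = (det.x_min + det.x_max) / 2.0
    pan = 2.0 * center / frame_width - 1.0
    # Clamp in case a box extends past the frame edge.
    return max(-1.0, min(1.0, pan))


def announcement_order(detections, frame_width):
    """Return (label, pan) pairs sorted from the left of the scene to the right."""
    pairs = [(d.label, stereo_pan(d, frame_width)) for d in detections]
    return sorted(pairs, key=lambda pair: pair[1])


# Example frame, 640 pixels wide (made-up detections for illustration).
detections = [
    Detection("laptop", 400, 600),
    Detection("person", 0, 200),
    Detection("cup", 250, 350),
]
for label, pan in announcement_order(detections, frame_width=640):
    print(f"{label}: pan={pan:+.2f}")
```

Each label could then be spoken with its pan value applied to the audio output, so that objects on the left are heard on the left.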
We aimed for an interface free of explicit "modes", which makes it more intuitive and quicker to use than apps where the user scrolls between modes (e.g. Apple's camera).
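For the color contrast filters listed among the features above, a per-pixel sketch might look like the following. It is an assumption-laden illustration, not the app's implementation: color inversion is taken to mean 255 minus each channel, and achromatopsia is only roughly approximated by converting to grayscale with the Rec. 601 luma weights. The function names are hypothetical.

```python
def invert(rgb):
    """Invert an 8-bit RGB pixel."""
    r, g, b = rgb
    return (255 - r, 255 - g, 255 - b)


def grayscale(rgb):
    """Rough achromatopsia preview: replace the color with its Rec. 601 luma."""
    r, g, b = rgb
    y = round(0.299 * r + 0.587 * g + 0.114 * b)
    return (y, y, y)


def apply_filter(pixels, pixel_filter):
    """Apply a per-pixel filter to an image stored as rows of RGB tuples."""
    return [[pixel_filter(p) for p in row] for row in pixels]


# A tiny 1x3 "image": pure red, pure green, pure blue.
image = [[(255, 0, 0), (0, 255, 0), (0, 0, 255)]]
print(apply_filter(image, invert))
print(apply_filter(image, grayscale))
```

A real camera preview would run this kind of transform on the GPU (for example in a shader) rather than pixel by pixel in Python; the sketch only shows the arithmetic.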
How to use our app:
Welcome to LongDog, your seeing eye dog with extra looong range!
Hold your phone in portrait mode, with the charger port on the bottom. Simply tap to find out about the things in front of you. An AI concierge will respond describing what's important.
Tap and hold to begin concierge questions mode. After the prompt, ask your A.I. concierge a question about something in front of you. Let go to send your question.
Double tap to access Settings Mode, where you can increase the voice speed by tapping the upper half of the screen, or decrease the speed by tapping the lower half of the screen. Double tap to exit Settings Mode.
Swipe left or right to change the camera's color contrast filters, including color-blind inversion and achromatopsia previews.
Triple tap any time to hear this instructional menu again.
We hope our app can help you see the world around you better! Thanks for hanging out with Looong Dog.
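The gestures in the instructions above can be summarized as a small dispatch function. This is a minimal sketch under stated assumptions, not the app's actual code: the gesture names, the filter order, the speed step, and the speed limits are all hypothetical, and the instructions do not say how swipes or holds behave inside Settings Mode, so here they behave the same everywhere.

```python
from dataclasses import dataclass

# Assumed filter names and order; the source only names inversion and achromatopsia.
FILTERS = ["none", "inversion", "achromatopsia"]
SPEED_STEP = 0.1                 # assumed step per tap
MIN_SPEED, MAX_SPEED = 0.5, 2.0  # assumed limits


@dataclass
class AppState:
    in_settings: bool = False
    voice_speed: float = 1.0  # assumed multiplier of the default speech rate
    filter_index: int = 0


def handle_gesture(state, gesture, y_fraction=0.0):
    """Update state in place and return the name of the resulting action.

    y_fraction is the vertical touch position: 0.0 is the top of the
    screen and 1.0 is the bottom.
    """
    if gesture == "triple_tap":
        return "read_instructions"
    if gesture == "double_tap":
        state.in_settings = not state.in_settings
        return "enter_settings" if state.in_settings else "exit_settings"
    if gesture == "tap":
        if not state.in_settings:
            return "describe_scene"
        # Upper half speeds the voice up, lower half slows it down.
        delta = SPEED_STEP if y_fraction < 0.5 else -SPEED_STEP
        new_speed = min(MAX_SPEED, max(MIN_SPEED, state.voice_speed + delta))
        state.voice_speed = round(new_speed, 2)
        return "speed_up" if delta > 0 else "slow_down"
    if gesture == "hold_start":
        return "start_listening"
    if gesture == "hold_release":
        return "send_question"
    if gesture in ("swipe_left", "swipe_right"):
        step = 1 if gesture == "swipe_left" else -1
        state.filter_index = (state.filter_index + step) % len(FILTERS)
        return "filter:" + FILTERS[state.filter_index]
    return "ignored"
```

Keeping every gesture in one function makes the small amount of state explicit: the only thing that changes how a tap is interpreted is whether Settings Mode is active.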
- Used Unity to develop the mobile app
- Hosted TinyYOLO locally via OpenCV for instant classification of 80 common objects
- Used Flask, gunicorn, and nginx to host the server on Heroku
- Used IBM Watson Visual Recognition to identify, then caption images
- Used Microsoft Azure to describe faces and perform OCR
- Used the Intel OpenVINO inference engine to run the VQA model
- Improve color contrast filters
- Provide a wider range of voices (IBM Watson, AWS Polly)
- Larger servers (TinyYOLO only identifies 80 classes, so we want to support e.g. YOLOv3 with 1000 classes), perhaps as a premium feature, with local YOLO processing on the free tier
- Higher-quality question answering, which will need more compute for training
- Support more features and integrate them to be more helpful to users
- Personalization of features for users
- Settings to optimize the app for different forms of visual impairment
- Use GCP for OCR, since it seems more performant than Azure
- And of course, 5G, which allows us to send images quickly to the server and get responses, anywhere!