Identity theft is a problem that affects almost everybody, but visually impaired people are affected by it more than most. Our Android application aims to address this problem using face recognition and object detection.
What it does
This is the general flow of our app: the visually impaired person wears the phone like an ID card, with the screen facing outward. Earphones are connected to the phone so that the app can deliver audio messages.
By default, the app is in detection mode. If a known person approaches the visually impaired person, the camera picks up that person's face and the app plays the person's name as an audio message.
If an unknown person approaches, the visually impaired person presses the volume-down button to switch to recognition mode and hands the phone to that person. The person states their name, and the phone's camera then scans their face and adds it to the list of known faces.
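The enroll-then-announce flow above can be illustrated with a minimal sketch. It assumes that each face is reduced to a fixed-length embedding vector and that two faces match when their Euclidean distance falls under a threshold; the write-up does not say how the app actually compares faces, so `enroll`, `identify`, `known_faces`, and `MATCH_THRESHOLD` are all hypothetical names and values.

```python
import math

# Assumption: faces are compared through embedding vectors and a distance
# threshold. The threshold value below is purely illustrative.
MATCH_THRESHOLD = 0.6

# Maps a person's name to the embedding captured in recognition mode.
known_faces = {}


def enroll(name, embedding):
    """Recognition mode: store a new face under the stated name."""
    known_faces[name] = list(embedding)


def identify(embedding):
    """Detection mode: return the closest known name, or None if no face is close enough."""
    best_name, best_dist = None, float("inf")
    for name, known in known_faces.items():
        dist = math.dist(embedding, known)  # Euclidean distance
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= MATCH_THRESHOLD else None
```

In this sketch a return value of `None` would correspond to the "unknown person" case, where the user can choose to switch to recognition mode.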
Another important feature of the app is its built-in object detection. When the visually impaired person presses the volume-up button, the app switches to sentry mode. In this mode, the user points the phone in one direction and takes a photo, and the app plays an audio message describing what is in that direction (e.g., "There are 2 persons in that direction").
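The three modes and the buttons that select them can be summarized as a small state machine. The sketch below is language-neutral logic written in Python rather than the app's Android code, and the event names are hypothetical. The write-up does not describe how the app returns to detection mode, so the `TASK_FINISHED` event is an assumption.

```python
from enum import Enum


class Mode(Enum):
    DETECTION = "detection"      # default: announce known faces
    RECOGNITION = "recognition"  # enroll an unknown person's face
    SENTRY = "sentry"            # photograph and describe the scene


# Hypothetical event names; a real app would map key events onto these.
VOLUME_DOWN = "volume_down"
VOLUME_UP = "volume_up"
TASK_FINISHED = "task_finished"  # assumption: enrolling or describing is done


def next_mode(current, event):
    """Return the mode the app should be in after `event`."""
    if event == VOLUME_DOWN:
        return Mode.RECOGNITION
    if event == VOLUME_UP:
        return Mode.SENTRY
    if event == TASK_FINISHED:
        return Mode.DETECTION
    return current  # unrecognized events leave the mode unchanged
```

Keeping the transitions in one pure function makes the button behavior easy to test independently of the camera and audio code.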
How we built it
- Sentry mode uses a single-shot object detection model, built with PyTorch and trained on the COCO dataset, to detect objects. gTTS handles text-to-speech, and the whole pipeline is deployed as an API on Google Compute Engine.
- Detection mode and recognition mode were built with Java, C++, JavaCV, TensorFlow, OkHttp, and the Camera2 API.
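As an illustration of the step between object detection and text-to-speech in sentry mode, the following sketch turns a list of detected class labels into a sentence like the one quoted earlier. It is not the project's code: `describe_detections` is a hypothetical helper, and the naive "add an s" pluralization is an assumption chosen to match the example message.

```python
from collections import Counter


def describe_detections(labels):
    """Build a spoken-style sentence from detected class labels."""
    if not labels:
        return "Nothing was detected in that direction."

    # Count each label, keeping the order in which labels first appeared.
    counts = Counter(labels)
    parts = [f"{n} {label}" + ("s" if n > 1 else "") for label, n in counts.items()]

    if len(parts) == 1:
        listing = parts[0]
    else:
        listing = ", ".join(parts[:-1]) + " and " + parts[-1]

    # Use "is" only when exactly one object was detected in total.
    verb = "is" if len(labels) == 1 else "are"
    return f"There {verb} {listing} in that direction."
```

The returned string is the kind of text that would then be handed to gTTS to produce the audio message.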
Challenges we ran into
- Implementing a face recognition system that runs natively in our app.
- Merging all the features into one app.
- Combining the speech-to-text API and hotword detection in one activity.
Accomplishments that we're proud of
- Setting up face detection and recognition that work on the phone itself, without the need for an external API.
- Deploying a Flask API to Google Cloud that runs single-shot detection on an image and returns an audio message describing the objects present in that image.
What we learned
- Deploying PyTorch models to the cloud.
- Implementing face detection and recognition that run locally in the Android app.
- Implementing speech recognition in an app.
What's next for Project Occuli
- Conduct a survey among visually impaired people to understand their difficulties, so that we can fine-tune our object detection model to their needs.
- Improve the scalability of the app.
- Find a way to attach more physical buttons to the phone to make the user experience smooth and seamless.
- Add an audio-based virtual assistant that can communicate with the visually impaired person and make the UX even better.
- Add a character recognition system to the object detection model so that any text found in sentry mode (e.g., text on billboards and signs) is conveyed as audio to the visually impaired person.