39 million people worldwide cannot see, and often have no connection to their surroundings other than a cane for feeling out their environment. I aim to bridge this gap with something almost everyone carries: a smartphone. I was inspired by a school for the blind that my family would volunteer at. It opened my eyes to how differently people live when they are so restricted.
What it does
The app detects objects in the user's surroundings when the user points the camera at the object they want to identify, and the resulting label is read aloud as speech. If the user taps the screen, a voice recording starts; whatever they say, followed by the phrase "copy", is saved to a database and can later be retrieved and pasted wherever the user wants to use that text. This alleviates the struggle blind people face when texting loved ones or emailing friends, since they can now do it through voice recognition.
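The "copy" trigger described above can be sketched as a small piece of transcript logic: once the keyword is heard, everything spoken before it becomes the message to save. This is a minimal illustration, not the app's actual implementation, and the function name is hypothetical.

```swift
// Hypothetical helper: scan a live speech transcript and, once the trigger
// word "copy" is heard, return the text spoken before it so it can be
// saved to the database.
func messageToSave(from transcript: String, trigger: String = "copy") -> String? {
    let words = transcript.lowercased().split(separator: " ").map(String.init)
    // Use the LAST occurrence of the trigger; require some speech before it.
    guard let cut = words.lastIndex(of: trigger), cut > 0 else { return nil }
    return words[..<cut].joined(separator: " ")
}
```

For example, `messageToSave(from: "see you at noon copy")` would yield `"see you at noon"`, while a transcript with no trigger yields `nil` and nothing is saved.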
How I built it
The app uses the ResNet-50 Core ML model to classify the object in a camera frame and report the label it is most confident in. To keep the UI simple enough for a visually impaired user to navigate, I built an embedded camera preview with the AVFoundation library. The Speech framework's speech recognizers turn the user's speech into text, which is updated in a real-time database for retrieval. The app also uses Firebase ML Vision to detect text on previously detected objects.
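The classification step can be sketched with the Vision framework wrapping the Core ML model. This is a hedged sketch, not the project's exact code: it assumes a `Resnet50.mlmodel` file has been added to the Xcode project (which auto-generates the `Resnet50` class) and that a `CVPixelBuffer` frame is already available from the camera.

```swift
import Vision
import CoreML

// Sketch: classify one camera frame and report the top label + confidence.
// Assumes Resnet50.mlmodel is bundled in the Xcode project.
func classify(_ frame: CVPixelBuffer,
              completion: @escaping (String, Float) -> Void) {
    guard let model = try? VNCoreMLModel(
        for: Resnet50(configuration: MLModelConfiguration()).model) else { return }

    let request = VNCoreMLRequest(model: model) { request, _ in
        // Results arrive sorted by confidence; take the top classification.
        guard let top = (request.results as? [VNClassificationObservation])?.first
        else { return }
        completion(top.identifier, top.confidence)
    }

    let handler = VNImageRequestHandler(cvPixelBuffer: frame, options: [:])
    try? handler.perform([request])
}
```

The label returned here is what would then be handed to speech synthesis so the user hears what the camera is pointed at.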
Challenges I ran into
Implementing the ResNet-50 Core ML model was difficult because of the nested closures and optional unwrapping needed before the confidence values and labels could be read. I overcame this through perseverance and careful debugging. Figuring out how to manage and start recording sessions was also difficult because of the number of classes involved.
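The recording-session wiring involves several cooperating classes (`AVAudioSession`, `AVAudioEngine`, `SFSpeechRecognizer`, a recognition request, and a recognition task), which is what made it confusing. A minimal sketch of how they fit together, assuming microphone and speech-recognition permissions have already been granted, might look like this (not the app's exact code):

```swift
import Speech
import AVFoundation

// Sketch of a live speech-to-text session: audio engine feeds buffers into
// a recognition request, and partial transcripts arrive in a callback.
final class Recorder {
    private let audioEngine = AVAudioEngine()
    private let recognizer = SFSpeechRecognizer()
    private var request: SFSpeechAudioBufferRecognitionRequest?
    private var task: SFSpeechRecognitionTask?

    func start(onTranscript: @escaping (String) -> Void) throws {
        let session = AVAudioSession.sharedInstance()
        try session.setCategory(.record, mode: .measurement, options: .duckOthers)
        try session.setActive(true, options: .notifyOthersOnDeactivation)

        let request = SFSpeechAudioBufferRecognitionRequest()
        request.shouldReportPartialResults = true
        self.request = request

        // Tap the microphone input and stream buffers to the recognizer.
        let input = audioEngine.inputNode
        input.installTap(onBus: 0, bufferSize: 1024,
                         format: input.outputFormat(forBus: 0)) { buffer, _ in
            request.append(buffer)
        }
        audioEngine.prepare()
        try audioEngine.start()

        task = recognizer?.recognitionTask(with: request) { result, _ in
            if let result = result {
                onTranscript(result.bestTranscription.formattedString)
            }
        }
    }

    func stop() {
        audioEngine.stop()
        audioEngine.inputNode.removeTap(onBus: 0)
        request?.endAudio()
        task?.cancel()
    }
}
```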
Accomplishments that I'm proud of
I am proud to have made an app that can have a real world impact and change the blind community for the better!
What I learned
I learned a lot about machine learning models, Firebase ML, and speech recognizers, all of which were new to me before this project. I also learned how to integrate a machine learning model into a project, valuable knowledge that strengthened my iOS development skills.
What's next for VisionAssistant
I plan to localize the app so that, no matter where the user is in the world, the spoken strings adapt to the region's language. I also plan to build a machine learning model that estimates the distance from the user's camera to a detected object, taking advantage of the fact that most objects of a given type share similar heights.
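The distance idea can be grounded in a simple pinhole-camera relation: if an object's real-world height is roughly known for its class, and the object spans a measurable number of pixels in the frame, then distance = (real height × focal length in pixels) / pixel height. A hedged sketch of that arithmetic (all names here are illustrative, not part of the current app):

```swift
// Pinhole-camera distance estimate: an object of known real height that
// spans `pixelHeight` pixels, seen by a camera with focal length
// `focalLengthPixels` (in pixel units), sits at this distance.
func estimateDistance(realHeightMeters: Double,
                      focalLengthPixels: Double,
                      pixelHeight: Double) -> Double {
    return realHeightMeters * focalLengthPixels / pixelHeight
}
```

For instance, a 0.3 m object spanning 600 px with a 1000 px focal length would be estimated at 0.5 m away; a learned model could refine this by predicting typical heights per detected class.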