Our world is well on the way to becoming an information society: a society in which information is used intensively in all walks of life. It has been estimated that human beings absorb as much as 80 percent of the information about their immediate surroundings by means of sight. Visually impaired people are, therefore, at greater risk of lagging behind due to poor access to information.
According to the latest global blindness and visual impairment prevalence figures published in the Lancet, about 36 million people across the planet are blind and about 217 million people have severe or moderate visual impairment. Completely blind people have great difficulty navigating and interacting with their surroundings. Activities that would otherwise be an inherent part of daily life, such as reading a book or newspaper, or something as basic as social interaction, become a challenge.
I strongly believe that advances in deep learning and artificial intelligence can play a crucial role in developing technologies for people with disabilities. TetraChrome Lenses is a step in this direction to improve the quality of life for visually impaired people.
What it does
TetraChrome Lenses is a portable, robust and efficient device capable of assisting a visually impaired person in a seamless manner. The device integrates technologies such as computer vision, embedded systems, and text-to-speech to create a powerful personal assistant for the blind. It generates audio cues and haptic feedback to help the blind person perceive and interact with their surroundings. The hardware consists of a pair of glasses with a camera, an ultrasonic rangefinder and a vibration motor mounted on it. The camera captures a live feed of the user's surroundings. This visual data is later converted into audio cues for the user. The ultrasonic rangefinder acquires depth information about the environment, which is mapped to the haptic feedback generated by the vibration motor on the frame of the lenses.
All these components are connected to a Raspberry Pi 3B+ housed inside a 3D-printed controller box. The controller box also houses other components such as batteries, a cooling fan and switches. The Raspberry Pi acts as the main processing unit, performing operations such as image processing, depth information processing and API calls. The Raspberry Pi is wirelessly connected to a Bluetooth earpiece through which the device sends the audio cues. It is also connected to a wireless hotspot for internet access. The switches on the controller box are used to trigger the different functions of the device. The device can perform 5 main functions in real time:
Image Caption Generator - One of the key features of TetraChrome Lenses is generating a textual description of the user's surroundings. The generated captions are conveyed to the user via the earpiece after speech synthesis. It is capable of describing the salient features of the surroundings with high accuracy. This feature is mainly meant to provide situational awareness to the user.
Sample Image for Image Captioning
Live Feed from Device
Obstacle Detection System - The main components of the Obstacle Detection System are an ultrasonic rangefinder, a touch sensor, a vibration motor, a microcontroller (NodeMCU) and batteries. All these components are housed inside a small casing which can be attached to the frame of the lenses. The ultrasonic sensor acquires depth information about the surroundings, which is processed by the microcontroller. The processed depth information is mapped to the intensity of the haptic feedback generated by the vibration motor. When an obstacle is too close to the person, haptic feedback is generated to alert the user. The user can also trigger this feature with the touch sensor in order to judge the distance of different objects in their surroundings from the intensity of the haptic feedback.
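The mapping from distance to vibration strength can be sketched as follows. This is a minimal illustration in Python, not the NodeMCU firmware itself; the 20 cm and 150 cm thresholds, the 10-bit PWM range and the function name are assumptions chosen for the example, not values taken from the device.

```python
def vibration_duty(distance_cm, near_cm=20.0, far_cm=150.0, max_duty=1023):
    """Map an obstacle distance to a PWM duty value for the vibration motor.

    Closer obstacles give stronger vibration; anything at or beyond far_cm
    gives none. All thresholds are illustrative assumptions.
    """
    if distance_cm <= near_cm:
        return max_duty  # very close: full intensity
    if distance_cm >= far_cm:
        return 0  # out of range: motor off
    # Linear interpolation between the near and far thresholds.
    closeness = (far_cm - distance_cm) / (far_cm - near_cm)
    return round(closeness * max_duty)


if __name__ == "__main__":
    for d in (10, 50, 100, 200):
        print(d, "cm ->", vibration_duty(d))
```

A linear ramp is only one possible choice; a stepped or exponential curve might be easier to distinguish by touch.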
Obstacle Detection System
Face Recognition - The device is capable of performing real-time Face Recognition. The system has a database of the user's friends. Using this function the user can recognize the people around him and can also distinguish between friends and strangers. The role of this function is to provide social awareness to the user. (I added Mark Wahlberg and Joaquin Phoenix as my friends in the database).
Sample Image for Face Recognition
Live Feed from Device
Emotion Recognition - TetraChrome Lenses can perform real-time emotion recognition. The user can trigger this function to read the emotions of the people around him. This function is very helpful in scenarios where the user is conversing with an individual or a group of people and he wants to know their reaction. This feature also contributes to the social awareness of the user.
Sample Image for Emotion Recognition
Live Feed from Device
Text Reading - This function uses optical character recognition to read any kind of printed text aloud to the user through the earpiece. It is very useful in day-to-day activities such as reading newspapers, posters and road signs.
Sample Image for Text Reading
Live Feed from Device
TetraChrome Lenses has a simple and intuitive user interface. The controller box has a power button and a charging port on the upper side. It also has a neck strap so that the user can wear it around the neck. The device has a 5000mAh battery and a cooling system.
How I built it
The initial work went into designing the controller box and the compartment for the Obstacle Avoidance System in such a way that the device is portable and easy to use.
Once the design was complete, the controller box was 3D printed and the Raspberry Pi was installed along with the cooling system, batteries, power switch, charging ports and neck strap. The switch board was then interfaced with the Raspberry Pi using GPIO pins.
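One way to wire a switch board to the device's functions is a lookup table from GPIO pin to handler. The sketch below is an assumption about the structure, not the project's actual code: the BCM pin numbers and handler names are hypothetical, and the handlers are stubs. On the Raspberry Pi, a GPIO library would call `on_button_press` when a switch is pressed.

```python
def caption_scene():
    return "captioning"  # stub: would capture a frame and request a caption


def recognize_faces():
    return "faces"  # stub: would run face recognition


def recognize_emotions():
    return "emotions"  # stub: would run emotion recognition


def read_text():
    return "text"  # stub: would run OCR and speak the result


# Hypothetical BCM pin numbers, one per switch on the controller box.
PIN_TO_ACTION = {
    5: caption_scene,
    6: recognize_faces,
    13: recognize_emotions,
    19: read_text,
}


def on_button_press(pin):
    """Run the function assigned to the pressed switch, if any."""
    action = PIN_TO_ACTION.get(pin)
    if action is None:
        return None  # unassigned pin: ignore the press
    return action()
```

Keeping the table separate from the handlers means a switch can be reassigned without touching the functions themselves.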
Next, I installed the ultrasonic sensor, vibration motor, touch sensor, battery and microcontroller for the obstacle avoidance system. I used a NodeMCU as the microcontroller and programmed it to obtain the depth information and map it to the vibration intensity of the motor. All of this hardware was enclosed inside a small compartment which can be attached to the frame of the lenses.
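Ultrasonic rangefinders typically measure distance from the round-trip time of a sound pulse. Assuming the sensor here works that way (the write-up does not name the sensor model), the conversion can be sketched as below. It is written in Python for readability; the NodeMCU firmware would do the equivalent. The constant assumes air at roughly 20 °C, and the function name is illustrative.

```python
SPEED_OF_SOUND_CM_PER_US = 0.0343  # about 343 m/s in air at roughly 20 °C


def echo_to_distance_cm(echo_duration_us):
    """Convert a round-trip echo time in microseconds to a one-way distance in cm."""
    # The pulse travels to the obstacle and back, so halve the path length.
    return echo_duration_us * SPEED_OF_SOUND_CM_PER_US / 2


if __name__ == "__main__":
    print(echo_to_distance_cm(1000))  # a 1 ms echo is roughly 17 cm away
```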
Then I mounted an HD camera on the lenses and began working on the software of the device. TetraChrome Lenses uses Microsoft Cognitive Services for different functions like Image Captioning, Face Recognition, Emotion Recognition, Optical Character Recognition, and Text-to-Speech.
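As an illustration of what one such call might look like, the sketch below builds a request for the Describe Image operation of the Computer Vision REST API and picks the most confident caption from the response. It assumes the v3.2 endpoint path and response shape; the project may well have used a different API version or an SDK instead. The endpoint, key and function names are placeholders.

```python
import json
import urllib.request


def build_describe_request(endpoint, key, image_bytes):
    """Build (but do not send) a Describe Image request for raw image bytes."""
    # Assumed path: Computer Vision v3.2 "describe" operation.
    url = endpoint.rstrip("/") + "/vision/v3.2/describe?maxCandidates=1"
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/octet-stream",
    }
    return urllib.request.Request(url, data=image_bytes, headers=headers, method="POST")


def best_caption(response_json):
    """Return the highest-confidence caption text, or None if there is none."""
    captions = response_json.get("description", {}).get("captions", [])
    if not captions:
        return None
    return max(captions, key=lambda c: c.get("confidence", 0.0))["text"]


if __name__ == "__main__":
    # Example response in the assumed shape; no network call is made here.
    sample = {"description": {"captions": [{"text": "a dog on a sofa", "confidence": 0.91}]}}
    print(best_caption(sample))
```

To send the request, `urllib.request.urlopen(req)` returns a response whose body can be parsed with `json.load`; the caption text would then be passed to text-to-speech.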
Challenges I ran into
I ran into numerous difficulties in both the hardware and the software of the device. One of the major hardware difficulties was fitting so many components into a final product that is both portable and easy to use. I dealt with this by designing a controller box with precise measurements to house all the components without any empty space. Although I was able to fit all the components in a portable manner, my Raspberry Pi was throttling due to overheating because the space was so tight that no air could circulate. I solved the overheating issue by installing a cooling system for the Raspberry Pi made up of several heat sinks and a cooling fan.
My next challenge was the poor performance of the device in low light. Initially, I was using a Raspberry Pi camera, which gave poor-quality images in low light and therefore poor results. My only option was to install a better camera, but I couldn't find one that was both portable and had good resolution. Eventually, I decided to use a Logitech camera, which had both good resolution and automatic low-light correction but wasn't portable. I removed some parts of the camera, such as the hinges, to make it more portable and designed a mounting bracket for attaching the camera to the lenses.
Initially, when I got the Raspberry Pi, I didn't have an HDMI screen. So I had to set up everything, including Wi-Fi, SSH and VNC, on a headless Raspberry Pi, which is quite a daunting task in itself.
On the software side, one of the key challenges was optimizing the device so that it responds quickly. I noticed that some functions, such as Optical Character Recognition, needed a high-resolution image for high accuracy, whereas Face and Emotion Recognition worked even with a low-resolution image. To take advantage of this, I wrote a function that captures images at different resolutions so that the device can switch between high and low resolution depending on the function it needs to perform.
My next task was to get acquainted with the Microsoft Face SDK. I went through the entire codebase of the SDK so that I could make use of all the different modules it offers.
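The resolution switch can be sketched as a small lookup keyed by the function being run. The specific resolutions, task names and the use of an OpenCV-style capture object are assumptions for illustration; the write-up does not say which capture library or sizes were used.

```python
# Hypothetical per-function capture sizes (width, height) in pixels.
RESOLUTIONS = {
    "text_reading": (1920, 1080),      # OCR benefits from more pixels
    "image_caption": (1280, 720),
    "face_recognition": (640, 480),    # works at low resolution
    "emotion_recognition": (640, 480),
}
DEFAULT_RESOLUTION = (640, 480)


def resolution_for(task):
    """Return the capture size to use for a given function."""
    return RESOLUTIONS.get(task, DEFAULT_RESOLUTION)


def configure_capture(capture, task):
    """Apply the task's resolution to an OpenCV-style capture object.

    `capture` is assumed to expose set(prop_id, value), as cv2.VideoCapture does.
    """
    width, height = resolution_for(task)
    capture.set(3, width)   # 3 == cv2.CAP_PROP_FRAME_WIDTH
    capture.set(4, height)  # 4 == cv2.CAP_PROP_FRAME_HEIGHT
    return width, height
```

Smaller frames also mean smaller uploads, which matters when every function involves an API call over a hotspot connection.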
Accomplishments that I'm proud of
Building the Obstacle Avoidance System was a very challenging and exciting task. A lot of brainstorming went into the design of the system: how the microcontroller, battery and charging system should be placed in such a small compartment, where the vibration motor should sit so that the haptic feedback is felt properly, how the touch sensor should work, and so on. It was an awesome experience when I first attached the system to the lenses and could feel the haptic feedback myself. I was proud of my accomplishment.
I was very happy when I first used the device with my eyes closed and experienced the different modules coming together and feeding so much information through the earpiece and the vibration motor in the form of audio cues and haptic feedback. I was able to read text, navigate my surroundings and interact socially with people, and I was quite aware of my immediate surroundings. I was proud of how this rich source of information could play a crucial part in the life of a visually impaired person.
What's next for TetraChrome Lenses
- Using a wide-angle camera to increase the field of view so that more information can be acquired from the surroundings at once.
- Adding support for other languages using language translation APIs offered by Microsoft Azure.
- Integrating a GPS module in the device so that the user can navigate easily outdoors as well.
- Adding voice support so that the user can trigger different functions using voice commands.