Self-driving cars are becoming increasingly sophisticated and capable of greater feats of autonomy, and more and more people are aware of it. Can the same computer vision principles that guide those cars help people with vision impairments navigate their own world more freely? The answer is yes.
What it does
We chained two neural networks: one recognizes common classes of objects, and the other estimates how far each detected object is from the camera. In its current form, the project continuously generates predictions from a live webcam feed.
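As a rough illustration of how the two networks fit together, here is a minimal sketch. The names `Prediction`, `describe_frame`, `fake_detect`, and `fake_distance` are hypothetical, the two stand-in functions only mimic the real models, and sorting by distance is an assumption made for the example rather than something the project is known to do.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom) in pixels


@dataclass
class Prediction:
    label: str
    box: Box
    distance_m: float


def describe_frame(
    frame: object,
    detect: Callable[[object], Sequence[Tuple[str, Box]]],
    estimate_distance: Callable[[Box], float],
) -> List[Prediction]:
    """Run the detector, then pass each detected box to the distance model."""
    predictions = [
        Prediction(label, box, estimate_distance(box))
        for label, box in detect(frame)
    ]
    # Assumption for this sketch: report the nearest objects first.
    return sorted(predictions, key=lambda p: p.distance_m)


# Hypothetical stand-ins for the two networks, for illustration only.
def fake_detect(frame):
    return [
        ("car", (100.0, 150.0, 300.0, 250.0)),
        ("person", (400.0, 100.0, 450.0, 300.0)),
    ]


def fake_distance(box):
    left, top, right, bottom = box
    return 2000.0 / (bottom - top)  # toy rule: a taller box means a closer object


results = describe_frame(None, fake_detect, fake_distance)
for p in results:
    print(f"{p.label}: about {p.distance_m:.1f} m")
```

In a live setting, `frame` would come from a webcam capture loop and the two callables would wrap the trained models.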
How we built it
The distance estimator is a regression model trained on bounding boxes from the KITTI Vision Benchmark dataset, while the other network is an object detection model, Faster R-CNN. We trained the distance model ourselves because resources for distance estimation are scarce, and we used a pre-trained Faster R-CNN model for detection.
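To make the training data more concrete, the sketch below shows one way a KITTI object label line could be turned into a training pair for a distance regressor. It relies on the KITTI label layout (class, truncation, occlusion, alpha, 2D box, 3D dimensions, 3D location, rotation). The function name `kitti_label_to_sample`, the choice of box features, and the use of the forward `z` coordinate as the target are assumptions; the writeup does not specify the actual inputs or target.

```python
from typing import List, Tuple


def kitti_label_to_sample(line: str) -> Tuple[str, List[float], float]:
    """Turn one KITTI object-label line into (class, box features, distance target)."""
    fields = line.split()
    obj_class = fields[0]
    # Indices 4-7 hold the 2D bounding box: left, top, right, bottom (pixels).
    left, top, right, bottom = (float(v) for v in fields[4:8])
    # Indices 11-13 hold the object location (x, y, z) in camera coordinates (metres).
    z_forward = float(fields[13])
    width, height = right - left, bottom - top
    # Assumed feature set: box width, height, and diagonal.
    features = [width, height, (width ** 2 + height ** 2) ** 0.5]
    return obj_class, features, z_forward


# Made-up line in the KITTI label layout (not taken from the dataset).
example = "Car 0.00 0 -1.57 100.00 150.00 400.00 550.00 1.50 1.60 4.00 2.00 1.60 20.00 -1.57"
obj_class, features, target = kitti_label_to_sample(example)
print(obj_class, features, target)
```

A regression network would then be fit to map `features` to `target` across many such lines.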
Challenges we ran into
Neural networks are computationally intensive, so evaluating every single frame of a video feed introduces noticeable lag. On the software side, we found that the regression network we had initially relied on for distance estimates had serious structural flaws, so we essentially rebuilt it from the ground up.
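One common way to reduce this kind of lag is to run the model only on every Nth frame and reuse the latest predictions in between. The sketch below shows that idea; it is not necessarily what we did, and `predict_every_nth`, `fake_predict`, and the `every` parameter are hypothetical names chosen for the example.

```python
from typing import Callable, Iterable, Iterator, Tuple, TypeVar

Frame = TypeVar("Frame")
Result = TypeVar("Result")


def predict_every_nth(
    frames: Iterable[Frame],
    predict: Callable[[Frame], Result],
    every: int = 3,
) -> Iterator[Tuple[Frame, Result]]:
    """Yield (frame, predictions), running the model only on every `every`-th frame."""
    if every < 1:
        raise ValueError("every must be at least 1")
    last_result = None
    for index, frame in enumerate(frames):
        if index % every == 0:
            last_result = predict(frame)  # the expensive model call
        # Frames in between reuse the most recent predictions.
        yield frame, last_result


calls = []


def fake_predict(frame):
    # Hypothetical stand-in for the two-network pipeline.
    calls.append(frame)
    return f"predictions for frame {frame}"


outputs = list(predict_every_nth(range(7), fake_predict, every=3))
print(len(calls), "model calls for", len(outputs), "frames")
```

The trade-off is that predictions can be up to `every - 1` frames stale, which matters more when objects move quickly.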
Accomplishments that we're proud of
This was our first project involving deep learning and computer vision, and we are glad that it produced the kind of results we hoped to see.
What we learned
Data is everything when it comes to deep learning. The Faster R-CNN model, presumably trained on massive datasets, was consistently reliable, whereas the regression network that we used for distance estimation consistently proved to be the weaker link because we trained it on smaller and more homogeneous datasets.
What's next for Computer Vision for the Vision Impaired
As mentioned earlier, the distance estimation model was trained on smaller and more homogeneous data, so it behaves inconsistently in real-world conditions, such as images with many objects and a large variation in depth. In the future, we would like to find or assemble larger and more varied datasets for training this part of the system.