Theia: Object Detection and Depth Perception Using ML

Gallery images: Bluetooth glasses (1) and (2); Raspberry Pi; OAK-D; external battery; Theia accessories; Theia being worn (front and back); backend code when Theia is in use (not full code); example of detection in Theia's view.

Inspiration

Our team had a number of inspirations. First and perhaps foremost, our engineer works for a computer vision company that utilizes machine learning (ML) techniques to prevent mass shootings and provide insight to security professionals. He has been involved with the development of a neural inference model that detects weapons in a scene and provides notifications. Building off this knowledge, he was compelled to first consider how computer vision could be used to enhance accessibility.

On the business development side, our administrative teammate had previously studied how biomimicry can solve complex human problems. Biomimicry is the emulation of biological phenomena – or examining nature to find solutions. In the case of visual impairments, animals like bats and dolphins use echolocation to expand their visual abilities. Echolocation is the use of sound to help decipher visual environments. We will expand on this more in the sections below, but our decision to build off the OAK-D system reminds us of echolocation because it fuses stereo depth technology to expand on its video features for improved spatial recognition.

Finally, we’ve named our project “Theia” after the Greek goddess of sight.

What it does

Theia gives people with visual impairments advance notice of key surroundings, beyond the reach of a standard mobility cane. For example, if a person were approaching the wearer, the wearer would hear a spoken notification of that person’s distance, such as “3 meters,” “2 meters,” etc. Theia does this by combining five key components.

First, there is the sensory component from OAK-D. This is hardware that combines a camera with stereo depth sensing and basic on-device analytics.

Second, there is the software component built by our engineer. Our engineer wrote code to detect, track, and report the distance of objects. The software can handle processing at 30 frames per second.

Third, there is an audio component. Compatible with both Bluetooth technology and classic wired headphones, this component reports the distance of an object to our user. With these options, our preferred mode of audio is a pair of sunglasses that have Bluetooth speakers on the side of the frame for easy wearability.

Fourth, there is a bridge component. Notably, our software (component 2) does not live on the OAK-D technology that we procured (component 1). In order to bridge the components together, we used a Raspberry Pi 4. This allowed our engineer to create software that could better handle high dimensionality of data, while also maintaining communication between our de-coupled components. This also gave us the freedom to implement more memory-intensive computations, such as storing track IDs as they come in.

Fifth and last is our power source. We used an external battery that can deliver at least 20 watts under high load and has a capacity of at least 10,000 mAh.

How we built it

Our business development teammate first compiled research on existing tools for people with visual impairments. The most common tool is, of course, the standard mobility cane. The average length of a mobility cane, or white cane, is half of a person’s height. With the average adult height falling roughly between 5 feet 4 inches and 5 feet 9 inches, the average cane is around 2 to 3 feet long, or just under one meter. Consulting with our engineer teammate, we considered ways to extend this distance and provide information about objects that are farther away than the cane can reach. We then wrote our software to report only objects beyond this distance, which avoids redundant information. Our tool does not replace the mobility cane; it enhances it.
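
The distance filter described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `Detection` type, the `beyond_cane_reach` helper, and the 1.0 m threshold (taken from the "just under one meter" estimate) are all assumptions made for the example.

```python
from dataclasses import dataclass

# Assumed threshold: the write-up estimates cane reach at just under one meter.
CANE_REACH_M = 1.0


@dataclass
class Detection:
    """Hypothetical container for one detected object in a frame."""
    label: str
    distance_m: float  # distance from the camera, in meters


def beyond_cane_reach(detections, reach_m=CANE_REACH_M):
    """Keep only detections that are farther away than the cane can reach."""
    return [d for d in detections if d.distance_m > reach_m]


frame = [Detection("person", 0.6), Detection("person", 2.4), Detection("car", 5.1)]
for d in beyond_cane_reach(frame):
    print(f"{d.label}: {d.distance_m:.1f} m")
```

In this sketch the object at 0.6 m is dropped because the cane would already find it, while the objects at 2.4 m and 5.1 m are passed on for notification.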

Then our engineer compiled research on existing open-source tools and appropriate hardware that would allow him to turn our idea into a reality. He invested in the equipment, including the OAK-D technology, Raspberry Pi, and audio component. He then sat down to code the software that could bring Theia to life.

The software’s object detection capacity provides the ability to detect a region within an image and classify it as a particular object. The software’s object tracking capacity is similar to short-term memory in that it can determine whether an object is brand new to the region or has been in the region for some time. By combining these two paradigms, we successfully created a system that allows a person with a visual disability to receive real-time alerts when an object is either approaching them or moving farther away.
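
One way to combine tracking with distance readings is to remember a reference distance per track ID and compare each new reading against it. The sketch below is an assumption about how this could work, not the code Theia runs: `classify_motion`, the `memory` dictionary, and the 0.25 m tolerance are hypothetical names and values.

```python
def classify_motion(memory, track_id, distance_m, tolerance_m=0.25):
    """Return 'new', 'approaching', 'receding', or 'steady' for one reading.

    `memory` maps track ID -> reference distance in meters and is updated
    in place, playing the role of the tracker's short-term memory.
    """
    reference = memory.get(track_id)
    if reference is None:
        memory[track_id] = distance_m
        return "new"
    change = distance_m - reference
    if abs(change) <= tolerance_m:
        # Keep the old reference so slow movement accumulates across frames.
        return "steady"
    memory[track_id] = distance_m
    return "approaching" if change < 0 else "receding"


memory = {}
readings = [(7, 4.0), (7, 3.9), (7, 3.6), (7, 4.2)]  # (track ID, meters)
results = [classify_motion(memory, tid, dist) for tid, dist in readings]
print(results)
```

Keeping the reference distance unchanged while the object is "steady" matters at high frame rates, where the change between two consecutive frames is usually too small to cross any tolerance on its own.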

Challenges we ran into

We went back and forth on the audio components, e.g. headphones versus Bluetooth. Our first idea was to use noise-cancelling headphones that could ensure clear, crisp audio for the user. However, we realized that in a real-world scenario, although this would provide sharp insight, it could be overbearing and block out all other stimuli for the wearer. This could make it difficult for Theia users to participate in conversations, or even put them in harm’s way. Thus, we decided to go with a more standard wired headphone or Bluetooth format, which lets the user wear eyeglasses with speakers on the sides of the frames (arms) or simply wear one speaker of a standard headphone. Either option plays audio without blocking other external sounds (e.g. traffic, general surrounding noises, etc.). We want Theia users to be able to maintain a balance between our technology and their own environment, without one overpowering the other.

Accomplishments that we're proud of

We are proud that we were able to start with a complex idea, one which had many different routes through which it could be accomplished in terms of hardware and software, and were able to execute it in a fashion that only required accessories that can fit in a pocket or small bag. We believe that in a commercial environment this would allow for more affordability and simplicity for consumers.

Both members of our team have past experience with seeking solutions for social problems and/or underrepresented communities. Further, as we are both on the cusp of the Gen-Z and Millennial generations, we recognize that we and our peers are the future of society and feel it is our duty to work on solutions that are accessible to all. There is no better time to do this. As products such as Raspberry Pis become more affordable, software and robotics engineers are unlocking enough horsepower to develop advanced edge computing applications, meaning more portable technological solutions. With recent advancements in technology, nearly everything is now possible, but not everything has been applied yet. Being able to dedicate our time to a project that focuses on accessibility, and that could theoretically benefit many people, is rewarding in and of itself.

What we learned

We learned a lot about existing hardware and open source tools that are just waiting for engineers to improve and supplement their capacity.

While our engineer has previous experience in computer vision, integrating the audio component was new for him. Inferencing at 30 frames per second means 1,800 possible notifications per minute that could be sent to an audio device. If we sent all of these alerts, the user would hear sounds constantly, rendering the product useless and maybe even dangerous. He had to teach himself an effective way to pick and choose among the frames while maintaining the tool’s integrity, allowing for more intelligent analytics, fewer stimuli, and ultimately a better product.
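
At 30 frames per second, 30 × 60 = 1,800 inference results arrive every minute, so most of them have to be skipped. The write-up does not say which selection rule Theia uses; the sketch below shows one plausible rule, under stated assumptions: speak only when a track's distance, rounded to whole meters, changes, and never more often than once per `min_gap_s` seconds. The `Announcer` class and its parameters are hypothetical.

```python
class Announcer:
    """Decide which frames produce a spoken distance (illustrative only)."""

    def __init__(self, min_gap_s=1.0):
        self.min_gap_s = min_gap_s
        self.last_spoken_at = None  # time of the most recent announcement
        self.last_meters = {}       # track ID -> last announced whole meters

    def maybe_announce(self, track_id, distance_m, now_s):
        """Return a phrase to speak, or None if this frame should stay silent."""
        meters = int(distance_m + 0.5)  # round half up for positive distances
        if self.last_meters.get(track_id) == meters:
            return None  # nothing new to say about this object
        if (self.last_spoken_at is not None
                and now_s - self.last_spoken_at < self.min_gap_s):
            return None  # too soon after the previous announcement
        self.last_meters[track_id] = meters
        self.last_spoken_at = now_s
        unit = "meter" if meters == 1 else "meters"
        return f"{meters} {unit}"


announcer = Announcer(min_gap_s=1.0)
spoken = []
for i in range(90):  # three seconds of frames at 30 fps
    t = i / 30
    # Simulated object starting at 3.4 m and closing at 0.5 m/s.
    phrase = announcer.maybe_announce(track_id=1, distance_m=3.4 - 0.5 * t, now_s=t)
    if phrase:
        spoken.append(phrase)
print(f"{len(spoken)} announcements from 90 frames: {spoken}")
```

In this simulated run only two of the 90 frames produce speech, which illustrates the kind of reduction needed to keep the audio useful rather than constant.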

What's next for Theia: Object Detection and Depth Perception Using ML

We are interested in seeking solutions to power constraints. In order to power the Raspberry Pi and the camera together, we currently need an external battery that can deliver at least 20 watts under high load and has a capacity of at least 10,000 mAh. The program is able to run effectively, but the shared power supply causes the hardware to slow down in certain circumstances, such as when there are many objects in the frame or when objects are moving quickly. In the future, we would like to explore better and/or separate batteries so our components do not have to share a power source.
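
For a rough sense of what these battery figures imply, the back-of-the-envelope estimate below reads the 10,000 figure as a capacity in milliamp-hours and treats 20 W as a sustained worst-case load. Both readings are assumptions, as are the 3.7 V nominal cell voltage (a common rating convention for power banks, not stated in this write-up) and the `runtime_hours` helper itself.

```python
def runtime_hours(capacity_mah, cell_voltage_v, load_w, efficiency=1.0):
    """Estimate runtime in hours from battery capacity and a constant load.

    capacity_mah   -- rated capacity in milliamp-hours
    cell_voltage_v -- nominal cell voltage the rating refers to (assumed)
    load_w         -- constant power draw in watts
    efficiency     -- fraction of stored energy delivered (1.0 = lossless)
    """
    energy_wh = capacity_mah / 1000 * cell_voltage_v
    return energy_wh * efficiency / load_w


# 10,000 mAh at an assumed 3.7 V is 37 Wh; at a constant 20 W that is 1.85 h.
print(f"{runtime_hours(10_000, 3.7, 20):.2f} h")
```

Real runtime would differ: the Raspberry Pi and camera are unlikely to draw the full 20 W continuously, and voltage conversion losses reduce the usable energy, so the number is only an order-of-magnitude guide.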

Additionally, due to the onboard audio chip on the Raspberry Pi, we are currently limited to a Bluetooth connection of 3 megabits per second. Although this is acceptable for an effective initial iteration, if this product were to be used at scale we would prefer a stronger Bluetooth connection with higher throughput to ensure zero data interruptions while it is being used. Further, on the audio component, we would like to explore how an iPhone app could improve our audio features. We believe an app would allow for de-coupling of several tasks currently performed by the Raspberry Pi and allow for a faster, smoother notification process. This could hypothetically allow us to provide Theia users with more information than the distance to an object, including identification of the object. For example, instead of telling the user “one meter,” we could tell them “person in one meter” or “car in one meter.” Our device can already identify several kinds of objects, but we do not name them in the notification due to the limited capability of the Raspberry Pi. De-coupling this through use of an iPhone app would allow us to provide longer phrases to Theia users without slowing down their experience.
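
The longer notification format could be produced with a small formatting step like the one sketched here. The `spoken_phrase` function and its word table are hypothetical; the sketch covers only how the text would be assembled, not how it would be sent to a phone or converted to speech.

```python
_WORDS = ["zero", "one", "two", "three", "four", "five",
          "six", "seven", "eight", "nine", "ten"]


def spoken_phrase(distance_m, label=None):
    """Build the text to speak, optionally prefixed with the object's label."""
    meters = int(distance_m + 0.5)  # round half up for positive distances
    # Spell out small numbers; fall back to digits outside the word table.
    count = _WORDS[meters] if 0 <= meters < len(_WORDS) else str(meters)
    unit = "meter" if meters == 1 else "meters"
    phrase = f"{count} {unit}"
    return f"{label} in {phrase}" if label else phrase


print(spoken_phrase(1.2))            # distance only, as in the current version
print(spoken_phrase(1.2, "person"))  # distance plus object label
```

Making the label optional keeps the current distance-only behavior available as a fallback when the longer phrase would slow notifications down.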

Finally, future iterations should be made even more wearable. We aimed to maximize this in our project by using Bluetooth-compatible eyeglasses with speakers in the frames, in addition to setting up an arm for the OAK-D camera that could be placed inside a backpack. However, we found the camera needed adjusting after our user walked a longer distance. The user is also required to carry the external battery in their backpack. While our device is portable, future versions of Theia should be simplified even further for ease of use.