Inspiration
The inspiration for this project came from grandparents who were forced to try many different types of hearing aids. The hearing aids all came from the same three manufacturers, which together control both the options and the pricing. The margins on these hearing aids are almost scandalous, since the cost of manufacturing has steadily decreased, much like the current market for graphing calculators. The goal of this project is to create a cheap pair of AR glasses that display text on a see-through display, making conversation at the dinner table with friends and family less isolating. Because people's hearing tends to degrade faster than their eyesight, communication between elderly people and their loved ones can suffer, and when it becomes difficult to understand each other, communication and understanding can break down. Insurance that covers pricey hearing aids is often itself too expensive, and this project costs far less than even low-priced hearing aids, with an estimated BOM cost of under 200 dollars.
What it does
Currently it uses a speech-to-text API with a microphone array as the input. The array applies de-noising algorithms that improve speech-to-text performance. The transcribed text is then sent over Bluetooth to an ESP32 driving an OLED display, mounted along with a battery and boost converter onto a cheap pair of magnifying glasses intended for inspection work. The magnifying glasses bring the display close to your face so the text is easy to read, while also making prototyping and integration easy.
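To make the receive path concrete, here is a minimal sketch of the glasses-side firmware. It assumes a classic Bluetooth serial link via the ESP32 BluetoothSerial library and an SSD1306-compatible display driven by Adafruit_SSD1306; the device name, I2C address, and newline-terminated protocol are our assumptions for illustration, not necessarily the exact firmware we ran.

```cpp
// Minimal ESP32 receiver sketch: read text lines over classic Bluetooth
// serial and show the most recent line on the OLED.
// Assumes an SSD1306-compatible 128x64 display on I2C and the
// Adafruit_SSD1306 / BluetoothSerial Arduino libraries.
#include <Wire.h>
#include <Adafruit_GFX.h>
#include <Adafruit_SSD1306.h>
#include <BluetoothSerial.h>

Adafruit_SSD1306 display(128, 64, &Wire, -1);  // -1 = no reset pin
BluetoothSerial SerialBT;
String line;

void setup() {
  SerialBT.begin("Dialog-Glasses");           // advertise as a BT serial device
  display.begin(SSD1306_SWITCHCAPVCC, 0x3C);  // typical I2C address
  display.clearDisplay();
  display.setTextSize(1);
  display.setTextColor(SSD1306_WHITE);
  display.display();
}

void loop() {
  while (SerialBT.available()) {
    char c = SerialBT.read();
    if (c == '\n') {             // end of one transcribed phrase
      display.clearDisplay();
      display.setCursor(0, 0);
      display.print(line);       // GFX text wrapping handles long phrases
      display.display();
      line = "";
    } else {
      line += c;
    }
  }
}
```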
How we built it
We built it by 3D printing brackets that mount to the magnifying glasses, then using a soldering iron to press threaded inserts into the brackets. M2, M3, and M4 screws along with zip ties hold the various PCBs and the display onto the repurposed magnifying glasses. On the firmware side, we used an Arduino library on the ESP32 to talk over Bluetooth, first to a phone and later to the microphone array, and to draw text on the screen. The initial proof of concept used Google's speech-to-text on an Android phone with a Bluetooth serial terminal app to send the transcribed text to the screen. Once that worked, the next step was plumbing the microphone array into the speech-to-text software and sending its output to the screen on the ESP32.
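The plumbing from the speech-to-text software to the glasses can be as simple as forwarding lines over a serial link. Below is a hedged sketch of a laptop-side bridge, assuming the transcriber prints one phrase per line to stdout and the ESP32 has been bound as a Bluetooth RFCOMM serial device; the /dev/rfcomm0 path is an assumption and depends on your setup.

```cpp
// Hypothetical laptop-side bridge: read transcribed lines from stdin
// (e.g. piped from the speech-to-text process) and forward them to the
// ESP32 over a Bluetooth RFCOMM serial device.
// The device path /dev/rfcomm0 is an assumption; substitute your own port.
#include <fstream>
#include <iostream>
#include <string>

int main() {
  std::ofstream glasses("/dev/rfcomm0");  // serial link to the ESP32
  if (!glasses) {
    std::cerr << "could not open /dev/rfcomm0\n";
    return 1;
  }
  std::string line;
  while (std::getline(std::cin, line)) {  // one transcribed phrase per line
    glasses << line << '\n';              // newline marks end-of-phrase on the ESP32
    glasses.flush();                      // push each phrase out immediately
  }
  return 0;
}
```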
Challenges we ran into
The main issues we had were related to the transparent OLED: to use SPI mode, all of the jumper pads had to be scratched off the back of the driver board, and while scratching them off we damaged the board; falling back to I2C mode on that board was unsuccessful. Luckily we brought multiple displays and used another one to at least demonstrate the proof of concept, even though it is harder to see through the glasses with it. Another challenge was running Mozilla DeepSpeech inference on one of our laptops. We spent multiple hours trying to install CUDA for a GPU-accelerated speech-to-text model, which never worked properly due to driver issues on Debian. We were able to run inference on the CPU pipeline, but it was not very fast, and unfortunately the microphone inputs on the Debian laptop also did not work because of driver issues. This made interfacing with the microphone board more difficult than we initially thought.
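For context, the SPI-versus-I2C choice mostly shows up in how the display object is constructed in firmware. The sketch below is illustrative only and assumes the Adafruit_SSD1306 library; the transparent display's actual controller and our wiring may differ, and the pin numbers are placeholders.

```cpp
// Illustrative only: how SPI vs I2C mode changes the display setup with
// Adafruit_SSD1306. Pin numbers below are placeholders, not our wiring.
#include <SPI.h>
#include <Wire.h>
#include <Adafruit_GFX.h>
#include <Adafruit_SSD1306.h>

// I2C mode: only SDA/SCL plus an optional reset pin.
Adafruit_SSD1306 oledI2C(128, 64, &Wire, /*reset=*/-1);

// Hardware SPI mode: needs DC and CS pins in addition to SCK/MOSI.
Adafruit_SSD1306 oledSPI(128, 64, &SPI, /*dc=*/16, /*reset=*/17, /*cs=*/5);

void setup() {
  // Initialize whichever interface the jumper pads on the driver board select.
  oledI2C.begin(SSD1306_SWITCHCAPVCC, 0x3C);  // 0x3C is the usual I2C address
  // oledSPI.begin(SSD1306_SWITCHCAPVCC);     // SPI variant takes no address
}

void loop() {}
```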
Accomplishments that we're proud of
We are proud that we managed to build a working hardware project despite the many issues we had to overcome. It was enjoyable to work with real hardware and test different parts of the original idea as separate hardware concepts. Getting the display to work properly with the ESP32 was a bit of a challenge, and seeing it display text for the first time was awesome. When the model started inferring properly it was also super exciting. Seeing the brackets come together, printed properly with mounts for all of our boards, was a good feeling as well, and wearing the display for the first time while it was working topped it all off.
What we learned
We learned how to interface with the ESP32, using a text buffer to draw multiple lines on the screen and animate scrolling. We also learned a decent amount about battery voltage regulation, with a single battery powering both the ESP32 and the display on the front of the AR glasses. Learning the basics of how natural language processing works, and how to actually integrate it into a project, was super interesting and will be valuable for future projects.
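As a rough illustration of the text buffer and scrolling we described, here is a sketch that keeps the last few lines and redraws them so new text scrolls upward. It again assumes an SSD1306-style display driven by Adafruit_SSD1306; the pushLine helper is a name we made up for this example, not the actual firmware.

```cpp
// Rough sketch of the multi-line text buffer: keep the last few lines and
// redraw them from the top, so new text appears to scroll upward.
// Assumes a 128x64 SSD1306-style display; pushLine() is a made-up helper.
#include <Wire.h>
#include <Adafruit_GFX.h>
#include <Adafruit_SSD1306.h>

const int kMaxLines = 6;            // 64 px tall / ~10 px per text row
Adafruit_SSD1306 display(128, 64, &Wire, -1);
String lines[kMaxLines];

void redraw() {
  display.clearDisplay();
  for (int i = 0; i < kMaxLines; i++) {
    display.setCursor(0, i * 10);   // 10 px of vertical spacing per row
    display.print(lines[i]);
  }
  display.display();
}

void pushLine(const String &text) {
  for (int i = 0; i < kMaxLines - 1; i++) {
    lines[i] = lines[i + 1];        // shift everything up one row
  }
  lines[kMaxLines - 1] = text;      // newest line goes at the bottom
  redraw();
}

void setup() {
  display.begin(SSD1306_SWITCHCAPVCC, 0x3C);
  display.setTextSize(1);
  display.setTextColor(SSD1306_WHITE);
  pushLine("Dialog ready");
}

void loop() {}
```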
What's next for Dialog
Our next goal is to design a custom PCBA for the natural language processing, using a Xilinx FPGA with the recently released Vitis software suite to run the DeepSpeech model locally on the FPGA. The FPGA can drive the display, perform the digital signal processing on the audio streams from the microphone array, and run the natural language processing itself. A nice advantage of a custom PCBA is that the board can drop to low power when it is not doing inference or signal processing, and it would also be cheaper to manufacture at higher volumes. Since hearing aid manufacturers have gotten complacent behind their continued monopoly, this is an industry ripe for change with a huge market behind it.