Vision vs Reality

We originally had a much more ambitious idea for this hackathon: an open vision doorbell that could identify who is at the door without anyone needing to go to it. The plan was to use an Amazon Echo Dot to connect our vision solution, a Logitech C270 HD webcam, with our storage, a Firebase database. The final integration step between the Echo Dot and our OpenCV services ended up being our downfall: a never-ending wave of vague errors kept getting thrown, and we never learned how to swim.

Rather than dwell on those setbacks, we want to show the progress we made over the past 36 hours, which we believe demonstrates the potential of what our idea sought to accomplish.

Using OpenCV 3 and Python 3, we built several vision solutions, including motion detection and image detection. Ultimately, we decided that a facial recognition program would be ideal for our design. It includes a vision model that has learned both Jet's face and mine, plus a catch-all "unknown" class that covers any unfamiliar or masked face. While not the most technically impressive result, it shows the solid groundwork we laid and the right first steps toward our original idea.
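A minimal sketch of how such a model could be trained with OpenCV's LBPH recognizer (an illustrative reconstruction, not our exact code): the `<dataset_dir>/<person>/*.jpg` layout, the `UNKNOWN_CUTOFF` threshold, and the helper names are assumptions, and the `cv2.face` module requires `opencv-contrib-python`.

```python
import os

# Hypothetical distance cutoff: LBPH "confidence" is a distance, so any
# prediction farther than this is treated as our catch-all "unknown" visitor.
UNKNOWN_CUTOFF = 70.0

def to_name(label, distance, names, cutoff=UNKNOWN_CUTOFF):
    """Map an LBPH prediction to a person, falling back to 'unknown'."""
    return names.get(label, "unknown") if distance < cutoff else "unknown"

def train_recognizer(dataset_dir):
    """Train an LBPH model on <dataset_dir>/<person>/*.jpg face crops."""
    import cv2           # opencv-contrib-python; imported lazily so the
    import numpy as np   # pure helper above works without OpenCV installed
    faces, labels, names = [], [], {}
    for idx, person in enumerate(sorted(os.listdir(dataset_dir))):
        names[idx] = person  # folder name doubles as the person's label
        person_dir = os.path.join(dataset_dir, person)
        for fname in os.listdir(person_dir):
            img = cv2.imread(os.path.join(person_dir, fname),
                             cv2.IMREAD_GRAYSCALE)
            if img is not None:
                faces.append(img)
                labels.append(idx)
    recognizer = cv2.face.LBPHFaceRecognizer_create()
    recognizer.train(faces, np.array(labels))
    return recognizer, names
```

LBPH reports a distance-like confidence where lower means a closer match, so tuning the cutoff trades false "unknown"s against misidentifications.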

The Development Process

These past 36 hours were full of trials and tribulations, and it would be a shame not to mention them given the result.

In the beginning, we considered using the RaisingAI platform for our vision component rather than OpenCV. However, when we attended their workshop, we saw that it relied on a Raspberry Pi, which we originally wanted to avoid due to our lack of server experience. Its performance also seemed to vary, and it did not appear to be aimed at facial recognition.

We had planned, excitedly, to use an NVIDIA Jetson because of its great performance, and we saw the NVIDIA booth using one to run a resource-intensive vision program smoothly. Unfortunately, we could not get the Jetson set up because we lacked a monitor.

After failing to get the Jetson running, we reluctantly switched to a Raspberry Pi and were pleasantly surprised at how well it performed and how easy it was to set up without a monitor. This is also when we started learning to develop for the Amazon Echo Dot. Since this was our first time using an Alexa-based device, even a simple Hello, World! application took a while to build. Still, we learned a lot about how smart devices work and got hands-on with many AWS utilities along the way.
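A Hello, World! skill ultimately boils down to an AWS Lambda handler that returns Alexa's response envelope. The sketch below is a hedged reconstruction without the ask-sdk; the speech strings and the dispatch logic are illustrative, not our submitted skill.

```python
def build_response(speech_text, should_end=True):
    """Wrap speech text in the JSON envelope the Alexa service expects."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech_text},
            "shouldEndSession": should_end,
        },
    }

def lambda_handler(event, context):
    """Entry point Alexa invokes; dispatch on the incoming request type."""
    request_type = event["request"]["type"]
    if request_type == "LaunchRequest":
        # Opening the skill: greet and keep the session open for an intent.
        return build_response("Welcome to our doorbell skill.", should_end=False)
    if request_type == "IntentRequest":
        return build_response("Hello, World!")
    return build_response("Goodbye.")
```

The skill's interaction model (invocation name, intents) is configured separately in the Alexa developer console; this handler only covers the backend half.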

As a team, we knew even at the start of the hackathon that integrating the vision system with Alexa would not be easy. Neither of us predicted just how difficult it would actually be, and this vision-Alexa integration ended up taking the majority of our development time. We also took on integrating Firebase for storage at this step, but since that was the one technology in this project we had past experience with, we expected it to be no problem.
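For the Firebase side, the plan amounted to logging each recognition event over the Realtime Database REST API. A hedged sketch of that idea follows; the `events` path, payload shape, and helper names are assumptions rather than our final schema.

```python
import json
import time
import urllib.request

def make_event(name):
    """Build the visitor-event record we planned to store per recognition."""
    return {"visitor": name, "timestamp": int(time.time())}

def log_event(base_url, name):
    """POST the event to <base_url>/events.json (Realtime Database REST API)."""
    data = json.dumps(make_event(name)).encode()
    req = urllib.request.Request(
        f"{base_url}/events.json", data=data,
        headers={"Content-Type": "application/json"}, method="POST")
    with urllib.request.urlopen(req) as resp:
        # Firebase responds with the generated push key for the new record.
        return json.load(resp)
```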

What We Built

At the end of the day (...more like morning), we created a simple Python program and dataset that show off our base vision module. It comprises three programs: facial detection on a custom dataset of images, an ML model that associates facial features with a specific person, and a script that applies that model to a live webcam feed. We also created our own Alexa skill that lets us dictate how we interact with the Echo.
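The first of those programs, face detection on dataset images, can be sketched with OpenCV's bundled Haar cascade. This is an illustrative reconstruction, not our submitted code; the function names and cascade choice are assumptions.

```python
def boxes_to_slices(boxes):
    """Convert (x, y, w, h) detections into row/column slices for cropping."""
    return [(slice(y, y + h), slice(x, x + w)) for (x, y, w, h) in boxes]

def crop_faces(image_path):
    """Return grayscale face crops found in one dataset image."""
    import cv2  # imported lazily so boxes_to_slices works without OpenCV
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2GRAY)
    boxes = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return [gray[rows, cols] for rows, cols in boxes_to_slices(boxes)]
```

The crops this produces feed the training step, and the same detect-then-crop loop runs per frame against the live webcam feed in the third program.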

Accomplishments that we're proud of

  • Learning how to use/create Amazon Skills
  • Getting our feet wet with an introduction to Raspberry Pi
  • Creating our own ML model
  • Utilizing OpenCV & Python to create a custom vision program

Future Goals

  • Figure out how to integrate Alexa and Python programs
  • Seek mentor help in a more relaxed environment
  • Use a NVIDIA Jetson
  • Create a 3D printed housing for our product / overall final product refinement
