There comes a time in everyone's life when your friends are all talking about a celebrity or an athlete and you simply have no clue who that person is. It happens to all of us, and it often leads to a huge awkwardness: your friends sit there explaining the person's significance while you politely nod, pretending to understand. Or maybe you're in the comfort of your home, but you aren't caught up with all the cinematic masterpieces you promised you would watch, and you stumble across a new actor.
What it does
Who Dat is an application that uses modern computer vision to enrich the ordinary life of the everyday person.
The process is as follows: First, a user takes a picture with their smartphone camera. The image is then sent to a server over the internet using the TCP protocol. Then, the trained computer vision network locates any features that represent a face and attempts to classify each face it finds. Finally, once the person is identified, facts and an image are sent back to the Android device, where the user can have all of their questions answered.
How I built it
I started out hacking away at the Android portion, as that is a strong suit of mine and I figured I might knock out the easier parts quickly, leaving more time for the rest. Afterwards, I designed the model that would be the basis for the recognition and found that it was quite accurate (more accurate than even I expected, so a great sign). Finally came the part of integrating the two sections together. This posed the greatest challenge for me.
Challenges I ran into
Connecting the Android front end with the Python and OpenCV back end was a tremendous challenge. I started out transferring the image from the Android app to the receiving app by converting it to base64 to generate a text stream, sending the text stream, then recreating the image. The recreation of the image was, unfortunately, my Achilles' heel. For over twelve hours, I tried to remedy the image corruption that occurred, but to no avail. It was later suggested that I create a GET/POST back end for the image transfer instead, but I did not manage to incorporate that with Docker, which leads me to my second challenge: Docker. Every time I trained my image recognition software, Docker had a tendency to shut down and prevent me from reopening it. I genuinely recreated the piece of software four times, then got annoyed and started to recreate it on my work server without Docker (oops, I forgot to tell my boss). Finally, sleep deprivation caused me to work slower and less efficiently than I normally do, which could probably explain my inability to figure out the GET/POST setup with Docker. (By the way, if anyone would be willing to help me out with this after the hackathon, I would greatly appreciate it. I really want to finish this app.)
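For what it's worth, a common cause of exactly this kind of corruption is treating TCP as message-based when it is actually a byte stream: a single recv() can return only part of the image. Below is a minimal sketch of length-prefixed transfer, one standard way to sidestep the problem; the function names are my own, not anything from this project:

```python
# Length-prefixed image transfer over TCP: the receiver reads an exact
# byte count instead of guessing where the stream ends.
import socket
import struct

def send_image(sock, image_bytes):
    # Prefix the payload with its length as a 4-byte big-endian integer.
    sock.sendall(struct.pack(">I", len(image_bytes)) + image_bytes)

def recv_exact(sock, n):
    """Loop until exactly n bytes arrive; recv() may return short reads."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-transfer")
        buf += chunk
    return buf

def recv_image(sock):
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return recv_exact(sock, length)
```

With framing like this, base64 encoding becomes unnecessary: raw bytes can be sent directly, since the receiver always knows how many to expect.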
Accomplishments that I'm proud of
What I am proud of most is that I got to experience so many new technologies. While my own project only used a set amount of tech (albeit a good portion of it new to me as well), I kept floating around and helping some of my friends, both old and new.
What I learned
I really want to say that I learned TCP communication... but I didn't exactly. So, I almost learned TCP. I also experienced Docker for the first time (and, honestly speaking, hopefully for the last). I learned a great deal about OpenCV, especially facial and object recognition. Finally, even though I didn't end up using them in my product, I learned image homography and FLANN feature matching, as these were the two technologies I originally intended to work with.
What's next for Who Dat?
First and foremost, I want to figure out the TCP photo transfer, mostly for the satisfaction of saying: yes, I finally did this. Next, I intend to write an image scraping script that gathers pictures for a giant list of celebrities so that I can identify them, not just the handful I have now. Finally, I want to include some aspects of image homography to expand this beyond just recognizing humans. I want this technology to be able to identify all sorts of everyday objects, like a can of corn in a supermarket, and then, for example, return the average price of a can of corn and the location of the cheapest one as well. Overall, I think the possibilities for expanding this product are endless.