Max was inspired to build SeeFood, a Food Shazam of sorts, this summer when he heard someone in the C2FO break room ask what the food at lunch was. Having spent the summer learning about deep learning and artificial intelligence in his free time, he saw a natural application for those technologies. An idea was born.
What it does
Given an RGB image, our model computes the likelihood that the image belongs to each of the 101 food classes in the Food 41 dataset, then outputs the most likely class along with a confidence score. Depending on that score, the mobile app either displays the prediction or a heartfelt error message.
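The predict-then-threshold step can be sketched as follows. This is a minimal illustration, not the actual SeeFood code: the `classify` helper, the threshold value, and the three class names are all made up for the example.

```python
import numpy as np

CONFIDENCE_THRESHOLD = 0.5  # hypothetical cutoff; the real app's threshold may differ

def classify(logits, class_names, threshold=CONFIDENCE_THRESHOLD):
    """Turn raw model outputs into a (label, confidence) pair, or reject."""
    # Softmax converts the raw logits (one per class) into probabilities.
    shifted = np.exp(logits - np.max(logits))
    probs = shifted / shifted.sum()
    best = int(np.argmax(probs))
    confidence = float(probs[best])
    if confidence >= threshold:
        return class_names[best], confidence
    # Below the threshold the app shows the error message instead.
    return None, confidence

# Toy example with three stand-ins for the 101 classes
classes = ["pizza", "sushi", "ramen"]
label, conf = classify(np.array([2.0, 0.5, 0.1]), classes)
```

Subtracting the max logit before exponentiating is the standard numerically stable softmax; it changes nothing mathematically but avoids overflow on large logits.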
How we built it
Cole and Max divided and conquered on SeeFood: Cole built the mobile app while Max trained the neural network and wrote the inference server. Choosing pre-trained weights and hyper-parameters for the network was an arduous process, and we ran training on a Google Compute Engine instance outfitted with a Tesla V100 GPU to speed things up. Even with that firepower, each training run took about an hour, so every decision had to be carefully weighed and every experiment thought through. Do we use a ResNet or an Inception network? Dropout or L2 regularization? Each choice ate up a significant portion of our limited time. Eventually, through diligence and a bit of good luck, we got the network to a state-of-the-art level of performance.
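The transfer-learning setup can be sketched in miniature: freeze a pre-trained backbone and train only a new classification head on top of it. Everything below is a toy stand-in (a random "backbone", synthetic data, made-up sizes), not our training code; in SeeFood the backbone was a pre-trained CNN and the head was trained with normal deep learning tooling.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a frozen pre-trained backbone (in SeeFood, a pre-trained CNN).
W_backbone = rng.normal(size=(8, 4))  # frozen: never updated below

def features(x):
    """Frozen feature extractor: the 'transfer' part of transfer learning."""
    return np.maximum(x @ W_backbone, 0.0)

# Synthetic data whose labels are linearly separable in feature space.
x = rng.normal(size=(16, 8))
w_true = np.array([1.0, -1.0, 0.5, -0.5])
y = (features(x) @ w_true > 0).astype(int)

# New trainable head: softmax regression with L2 regularization
# (one of the regularization choices we weighed against dropout).
W_head = np.zeros((4, 2))
lr, l2 = 0.1, 1e-3

for _ in range(300):
    f = features(x)
    logits = f @ W_head
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    grad = f.T @ (p - np.eye(2)[y]) / len(x) + l2 * W_head
    W_head -= lr * grad  # only the head is updated; the backbone stays frozen

acc = float(((features(x) @ W_head).argmax(axis=1) == y).mean())
```

The real decisions (ResNet vs. Inception backbone, dropout vs. L2, learning rate) change what the backbone and head look like, but the freeze-then-fine-tune structure is the same.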
Challenges we ran into
Before Hack K-State, Cole had very limited experience with machine learning and none with deep learning, so we spent some time going through the foundations of the subject to solidify his understanding and let us move forward as a team. Finding good hyper-parameters for a neural network is genuinely difficult: you have to be systematic about which hyper-parameters to change and why. It is also hard to have the patience to let a model train to completion when you come up with what feels like a much better idea just minutes after starting. The training results frustrated us, too; our validation-set results were consistently underwhelming until the very end.
Accomplishments that we're proud of
We are proud of training a state-of-the-art neural network and of using transfer learning effectively. We are proud of putting deep learning into a real system rather than just a learning exercise. And we are proud of persisting.
What we learned
We learned the importance of building a solid understanding of a subject's fundamentals before trying to apply it, which served us well. We also learned to stay systematic when dealing with processes that can seem entirely random.
What's next for SeeFood
Once we move inference to the edge, we will be able to use object detection to draw bounding boxes around multiple food items in an image in real time, without being hampered by the latency of a round trip to a data center. Doing so would let us redraw bounding boxes as quickly as the camera moves, and it would also make the system more scalable.