Inspiration

Max was inspired to build SeeFood, or "Food Shazam," this summer when he overheard someone in the C2FO break room ask what food was being served for lunch. Having spent the summer learning about deep learning and artificial intelligence in his free time, he saw a natural application for those technologies. An idea was born.

What it does

Given an RGB image, our model computes the likelihood that the image belongs to each of the 101 classes in the Food 41 dataset, picks the most likely class, and outputs it along with a confidence score. Depending on that score, the mobile app either displays the prediction or a heartfelt error message.
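The thresholding step looks roughly like the sketch below. This is a minimal illustration, not our exact server code: `model` is assumed to be the trained 101-class classifier, `class_names` a list of the dataset's labels, and the cutoff value is illustrative rather than the one we shipped.

```python
import torch
import torch.nn.functional as F

CONFIDENCE_THRESHOLD = 0.5  # illustrative cutoff, not our production value

def predict(model, image_tensor, class_names):
    model.eval()
    with torch.no_grad():
        logits = model(image_tensor.unsqueeze(0))  # add a batch dimension
        probs = F.softmax(logits, dim=1)           # likelihoods over 101 classes
        confidence, index = probs.max(dim=1)       # most likely class
    if confidence.item() < CONFIDENCE_THRESHOLD:
        return None  # the app shows the heartfelt error message
    return class_names[index.item()], confidence.item()
```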

How we built it

Cole and Max divided and conquered on SeeFood: Cole built the mobile app while Max trained the neural network and wrote the inference server. Choosing pre-trained weights and hyper-parameters for the network was an arduous process, and we ran training on a Google Compute Engine instance outfitted with a Tesla V100 GPU to speed things up. Even with that firepower, each training run took about an hour, so every decision had to be thoroughly calculated and every experiment carefully thought through. Do we use a ResNet or an Inception network? Dropout or L2 regularization? Each choice ate up a significant portion of our limited time. Eventually, through diligence and good luck, we got the network to a state-of-the-art level of performance.
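For a rough idea of what the transfer-learning setup looks like, here is a sketch assuming a ResNet-50 backbone; which layers are frozen, the learning rate, and the weight decay are all illustrative, not our final configuration.

```python
import torch
import torch.nn as nn
from torchvision import models

# Start from ImageNet pre-trained weights and swap in a new head
# for the 101 food classes.
model = models.resnet50(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 101)

# Freeze the backbone so only the new head trains at first.
for name, param in model.named_parameters():
    if not name.startswith("fc"):
        param.requires_grad = False

# Weight decay acts as L2 regularization, one of the knobs we debated.
optimizer = torch.optim.Adam(
    filter(lambda p: p.requires_grad, model.parameters()),
    lr=1e-3,
    weight_decay=1e-4,
)
```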

Challenges we ran into

Before Hack K-State, Cole had very limited experience with machine learning and none with deep learning, so we spent some time going through the foundations of the subject to solidify his understanding and let us move forward as a team. Finding good hyper-parameters for a neural network is genuinely difficult: you have to be incredibly systematic about which hyper-parameters to change, and it takes patience to let a model train to completion when a seemingly better idea strikes just minutes after the run begins. The training results frustrated us too; our validation results were consistently underwhelming until the very end.
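To make "systematic" concrete, a sweep along these lines keeps runs comparable by changing one knob at a time and logging the result. The values are illustrative, and `train_and_validate` is a hypothetical helper standing in for a full training run.

```python
import itertools

# Illustrative grid, not our actual search space.
learning_rates = [1e-3, 1e-4]
weight_decays = [0.0, 1e-4]

results = {}
for lr, wd in itertools.product(learning_rates, weight_decays):
    # train_and_validate is a hypothetical helper that trains one model
    # to completion and returns its validation accuracy.
    val_acc = train_and_validate(lr=lr, weight_decay=wd)
    results[(lr, wd)] = val_acc
    print(f"lr={lr} wd={wd} -> val_acc={val_acc:.3f}")

best_lr, best_wd = max(results, key=results.get)
```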

Accomplishments that we're proud of

We are proud of training a state-of-the-art neural network and of using transfer learning effectively. We are proud of putting deep learning into a real system meant to be more than just a learning exercise. And we are proud of persisting.

What we learned

We learned the importance of having a solid understanding of a subject's fundamentals before trying to apply it, which paid off for us here. We also learned the importance of being systematic when dealing with processes that can feel entirely random.

What's next for SeeFood

The first next step for SeeFood is moving inference to the edge. PyTorch has native iOS and Android modules, but we wanted a cross-platform app written in pure JavaScript to keep our development pace up. CocoaPods can also suck away hours of precious development time when something goes wrong, and that was not a risk we were willing to take. I am particularly excited to get this working, as we may have an opportunity to spearhead the development of a React Native module that interfaces with these PyTorch native modules.
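For reference, exporting the classifier for PyTorch's mobile runtimes could look roughly like this. It is a sketch, not code we have shipped: `model` is assumed to be the trained classifier from above, and the input shape and file name are illustrative.

```python
import torch
from torch.utils.mobile_optimizer import optimize_for_mobile

model.eval()
example = torch.rand(1, 3, 224, 224)      # dummy RGB input for tracing
traced = torch.jit.trace(model, example)  # compile to TorchScript
mobile_model = optimize_for_mobile(traced)

# Save in the format the lite interpreter on iOS/Android loads.
mobile_model._save_for_lite_interpreter("seefood.ptl")
```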

Once inference runs on the edge, we will be able to use object detection to paint bounding boxes around multiple food items in an image in real time, without being hampered by the latency of a round trip to a datacenter. Bounding boxes could be redrawn as quickly as the camera moves, and the system would scale better as well.
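A sketch of what that detection loop might look like, using torchvision's COCO-pretrained Faster R-CNN as a stand-in for a food-specific detector we have not trained yet; `frame` and `draw_bounding_box` are hypothetical placeholders for the camera feed and the app's drawing code.

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

detector = fasterrcnn_resnet50_fpn(pretrained=True)
detector.eval()

with torch.no_grad():
    # `frame` would be a [3, H, W] float tensor (values in [0, 1])
    # pulled from the camera feed.
    outputs = detector([frame])[0]

for box, score in zip(outputs["boxes"], outputs["scores"]):
    if score > 0.7:  # illustrative confidence threshold
        draw_bounding_box(box)  # hypothetical drawing helper in the app
```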

See the code

The mobile app: https://github.com/cjwillenbring/SeeFoodJS
The ML application: https://github.com/cjwillenbring/SeeFood


Updates


After training and validating multiple models (VGG11, VGG13, VGG16, ResNet18, ResNet34, ResNet50, ResNet101, ResNet152, Inception v3), our model consistently overfit the training set, a problem we were unable to solve with traditional methods like regularization and using more data (we even made the desperate blood sacrifice of our test set in order to feed the network). Max is also recovering from a serious back injury and cannot justify neglecting his health in pursuit of the hack any longer. It was a good run. Don't worry: we plan on coming back to this problem in the future, having greatly underestimated its complexity for now, and you can expect updates on GitHub in time to come.
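For the curious, "using more data" included augmentation along these lines; the exact transforms and parameter values below are assumptions for illustration, not our final recipe.

```python
from torchvision import transforms

# Illustrative training-time augmentation pipeline of the kind we tried
# against overfitting; the normalization constants are the standard
# ImageNet statistics used with pre-trained backbones.
train_transforms = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```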
