When we watched the first few seasons of the TV show Silicon Valley a couple summers ago, neither of us seriously considered pursuing computer science as a career. And although we didn’t understand a lot of the theory behind their apps, their crazy ideas still intrigued us. One of them was Jian Yang’s SeeFood app which would classify food as either “Hot Dog” or “Not Hot Dog”. In the years since watching that episode, both of us have come to understand the impact of machine learning and computer vision, and now we are both seriously pursuing computer science. It is interesting how television gimmicks can stick with people, and we even occasionally discussed how funny it would be to replicate a hotdog app, but never had time to pursue it. However, when we saw Creatica suggest it in the “Far Out Track”, we knew this hackathon would be the perfect time to achieve a years-long dream. We wanted to see if we could replicate the app and even expand on it to classify multiple types of foods. And most importantly, we wanted to have fun while learning more about machine learning and web app development.
What it does
Our web app allows users to drag or upload an image, and the app will classify it as a hotdog, banana, or carrot, with 92% validation accuracy and 94% training accuracy. The app uses a TensorFlow backend with a convolutional neural network that has been trained on thousands of images of hotdogs, bananas, and carrots, and it uses InceptionV3 for transfer learning. To use the web app, clone the GitHub repository and build it with Docker; more instructions can be found on the GitHub repository. You can also learn more at the website we created documenting the app here. There are also a few clickable links on the web app, but you’ll have to find them yourself!
We know that a fictional hotdog classifier from Silicon Valley motivated us to bring this fun idea to life, and we hope that others are also curious to see a real-life hotdog classifier! It would be a dream to have others go through our code to learn more about image classification. Since the code is very portable, it can be applied to other machine learning situations as well.
How we built it
First, we discussed our goals and what we could reasonably accomplish in two days. We had three goals in mind: 1) optimize the user experience (accurate classification, an aesthetic user interface, fast loading), 2) write well-documented code that is clear, beautiful, and simple, and 3) most importantly, have fun. We did a brief literature search on which types of ML algorithms are best at image classification, and we decided that a convolutional neural network with transfer learning was the best way to go. Next, we prototyped our project, clearly defining the architecture of our machine learning algorithm and deciding on the kind of user interface we wanted. Then we separated the tasks: Victoria was in charge of the backend and linking it to the app, while Gloria performed image validation, documented the process, and helped with front-end tasks. Ultimately, the process was split into five sections: image validation and viewing, training the model, evaluating the model, linking the model to the app, and customizing the user interface.
Image validation included writing a script that used send2trash to remove any images that could not be opened using PIL.Image. It also included manually going through all the images to make sure that the pictures we downloaded were representative of what they were supposed to be (i.e., that hotdogs were hotdogs and carrots were carrots). We also wrote a script to view the images from the command line.
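The validation pass boils down to "try to open each file; if PIL can't, trash it." A minimal sketch of that script (paths and the helper name are illustrative, and we fall back to permanent deletion if send2trash isn't installed):

```python
# Sketch of the image-validation pass: any file PIL cannot open gets removed.
from pathlib import Path

from PIL import Image

try:
    from send2trash import send2trash  # moves files to the trash, recoverable
except ImportError:
    import os
    send2trash = os.remove  # fallback: delete outright (assumption, not in original)


def validate_images(image_dir):
    """Remove every file under image_dir that PIL.Image cannot open; return their names."""
    removed = []
    for path in sorted(Path(image_dir).rglob("*")):
        if not path.is_file():
            continue
        try:
            with Image.open(path) as img:
                img.verify()  # cheap integrity check without a full decode
        except Exception:
            send2trash(str(path))
            removed.append(path.name)
    return removed
```

`Image.verify()` only checks file integrity, which is enough to catch the truncated downloads without decoding every pixel.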
After image validation, we leveraged open source to find a reasonable convolutional neural network to start with, building on J-Yash's open-source code. The code, written in TensorFlow, already incorporated transfer learning, and we modified the model to do multi-class classification instead of binary classification. We used softmax activation instead of sigmoid for the final layer of the network, and we played with the parameters to fine-tune the accuracy (notably, adding our own L2 regularizer). We realized that doing multi-class classification on the binary classification network resulted in lower accuracy, so we also added three more convolutional layers. The architecture of our final CNN is as follows: five main convolutional blocks, followed by three fully connected layers. All of our filters are 3 x 3 matrices. In the first convolutional block, we train 16 filters; in the second block, we train 32 filters, max-pool, and regularize via dropout. In the third block, we train 64 filters and regularize via dropout. In the fourth block, we train 128 filters and regularize via dropout. In the fifth block, we train two 256-filter layers before max-pooling and applying dropout. We then flatten the data to feed into the fully connected layers, which ultimately output three values, each representing a different food category. We use a softmax activation for the final output and ReLU for all other layers. The trained models were saved under the “/creatica/code/model” directory.
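The block layout above can be sketched in Keras. The input resolution, dropout rates, hidden dense widths, and L2 strength here are our illustrative guesses (the original text does not specify them); only the block/filter structure follows the description:

```python
# Sketch of the final CNN described above (hyperparameters are illustrative).
import tensorflow as tf
from tensorflow.keras import layers, regularizers

l2 = regularizers.l2(1e-4)  # assumed L2 strength

model = tf.keras.Sequential([
    layers.Input(shape=(128, 128, 3)),  # assumed input resolution
    # Block 1: 16 filters, 3x3
    layers.Conv2D(16, 3, activation="relu", kernel_regularizer=l2),
    # Block 2: 32 filters, max-pool, dropout
    layers.Conv2D(32, 3, activation="relu", kernel_regularizer=l2),
    layers.MaxPooling2D(),
    layers.Dropout(0.25),
    # Block 3: 64 filters, dropout
    layers.Conv2D(64, 3, activation="relu", kernel_regularizer=l2),
    layers.Dropout(0.25),
    # Block 4: 128 filters, dropout
    layers.Conv2D(128, 3, activation="relu", kernel_regularizer=l2),
    layers.Dropout(0.25),
    # Block 5: two 256-filter layers, max-pool, dropout
    layers.Conv2D(256, 3, activation="relu", kernel_regularizer=l2),
    layers.Conv2D(256, 3, activation="relu", kernel_regularizer=l2),
    layers.MaxPooling2D(),
    layers.Dropout(0.25),
    # Three fully connected layers; softmax over the three food classes
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```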
In the testing phase, we tested our saved models on testing images that the model had never seen before. We found that our best models gave 92% accuracy, so we decided to save that model to use in the app. As part of the testing, we also outputted pictures that our model classified incorrectly, mostly for laughs.
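Pulling out the misclassified test images amounts to comparing the argmax of the softmax outputs against the true labels. A minimal sketch with NumPy (the class ordering is illustrative; `probs` stands in for `model.predict` on the test set):

```python
# Sketch: find every test image the model got wrong, for manual inspection.
import numpy as np

CLASSES = ["banana", "carrot", "hotdog"]  # assumed class ordering


def misclassified(probs, true_labels):
    """Return (index, predicted_class, actual_class) for each wrong prediction."""
    preds = np.argmax(probs, axis=1)           # pick the highest-softmax class
    wrong = np.nonzero(preds != np.asarray(true_labels))[0]
    return [(int(i), CLASSES[preds[i]], CLASSES[true_labels[i]]) for i in wrong]
```

The indices returned can then be used to display or save the offending images.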
We linked our Keras model to an open-source Flask web app. This included modifying the app.py code significantly, since the original code used a pre-trained model that preprocessed data very differently from ours. Finally, we customized the user interface to make it more friendly, fun, and meme-y. We also tried to document our progress on a website, but we did not have time to completely finish it before the end of the hackathon. You can find part of it here, where we used an open-source website template.
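The Flask glue is essentially one upload route that hands the image bytes to the model and returns the predicted label. A skeletal sketch (route name and the `classify` stub are illustrative; in the real app the stub would preprocess with PIL and call the loaded Keras model):

```python
# Skeletal version of the app.py glue between the uploader and the model.
import io

from flask import Flask, jsonify, request

app = Flask(__name__)
CLASSES = ["banana", "carrot", "hotdog"]  # assumed class ordering


def classify(image_bytes):
    # Placeholder: the real version resizes/normalizes the image with PIL,
    # calls model.predict, and takes the argmax over the three softmax outputs.
    return CLASSES[0]


@app.route("/predict", methods=["POST"])
def predict():
    file = request.files.get("file")
    if file is None:
        return jsonify(error="no file uploaded"), 400
    return jsonify(label=classify(file.read()))
```

The key modification versus the original open-source app was making the preprocessing inside `classify` match what our own training pipeline expected.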
Challenges we ran into
Our first problems came when we started collecting training images. We found that a lot of datasets were too small or didn’t have enough variety. We used a script to download thousands of images of hot dogs, carrots, and bananas from the Stanford Vision Lab's ImageNet. However, a lot of the links to image sets, and to the images themselves, were broken, so we had to sort them out manually. We ran into an issue where we couldn’t find every broken image, leading to problems with the code. We searched for the broken images for hours, but we simply couldn’t find them among the thousands we had downloaded. So we wrote a script to delete any image that couldn’t be opened by PIL.Image.open. We also validated our images by making sure they could be loaded with ImageDataGenerator, which we used in preprocessing. In addition, the open-source code we were using employed an older version of TensorFlow, and some of the functions were already deprecated; we had to fix all of those before we could start on our own network architecture. In the backend, when we tried to implement four classes, our validation accuracy dropped significantly, so we ended up using only three, since we were aiming for accuracy. This is something we want to improve in the future. In the frontend, we had trouble linking our TensorFlow backend to a web app, so we ended up using an open-source template. However, the app code needed many modifications to work with our own model.
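The ImageDataGenerator setup in our preprocessing looked roughly like this (the directory layout, augmentation choices, and split fraction are illustrative, not taken from the original):

```python
# Sketch of the preprocessing generator; hyperparameters are illustrative.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_gen = ImageDataGenerator(
    rescale=1.0 / 255,      # scale pixel values into [0, 1]
    horizontal_flip=True,   # cheap augmentation (assumption)
    validation_split=0.2,   # hold out a validation fraction (assumption)
)

# Each class lives in its own subdirectory, e.g. data/train/hotdog/*.jpg;
# a hypothetical call to read them would look like:
# train_data = train_gen.flow_from_directory(
#     "data/train", target_size=(128, 128),
#     class_mode="categorical", subset="training")
```

Any file the generator cannot decode raises an error when iterated, which is how it doubled as a second layer of image validation for us.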
Finally, morale was occasionally a challenge because there seemed to be an endless number of bugs. Nevertheless, we prevailed, and we are happy that we created this app for the hackathon.
Accomplishments that we're proud of
We’re really proud that our validation and training accuracies are so close and so high, suggesting that our network architecture and parameters were well chosen. Furthermore, we’re happy to say that we changed the classification from binary to multi-class, and the modules can now be run from the command line as well. We’re also proud that we customized our app to have a Silicon Valley theme, paying homage to our original inspiration. Finally, we’re proud that we worked so well together during our first virtual hackathon.
What we learned
Gloria: Victoria taught me more about the architecture behind neural networks and how convolutional neural networks function. I also became more familiar with HTML code.
Victoria: This was my first time building a convolutional neural network, as well as my first time building a web app. I feel like I learned a lot about going from a prototype to the finished product. I also learned more about image validation.
What's next for Seefood Classifier 2.0
Seefood can already classify more things than the original SeeFood app from Silicon Valley, but we want to do even more. We want to add more classes of food to the app. In addition, our model could potentially be trained to classify non-food images such as MRIs/X-rays, different species, or even faces. Finally, our app has potential as an accessibility service for blind users, since it can serve as a screen reader for images that might not otherwise have captioning. There are endless possibilities from here: our neural network is versatile and portable, and it’s accessible on GitHub, so anyone who has further ideas can build on our creation as well. In terms of the app interface, we’d like to host it publicly at a URL so that it is even more accessible.