Inspiration
Initially, we had an extremely difficult time coming up with a dinosaur-themed topic, as we were only thinking about topics related to dinosaur fossils and how we might use fossil data to predict their lifestyles, diets, locations, etc. None of this particularly excited us, and we were stumped until we eventually landed on the topic of lunch. As it turns out, we all shared the experience of having eaten the most delicious food known to mankind, and decided that it was dinosaur-related enough to be the topic of our project. In short, our project is inspired by our mutual love of dinosaur-shaped chicken nuggets (dino-nuggies for short) and our shared frustration at the appearance of misshapen nuggets.
What it does and how we built it
We attempted to create a machine learning model that can take in an RGB image of a dino-nuggie and classify it as normal or misshapen.
This project took quite a bit of data processing, as we had to collect and clean the data ourselves (to our knowledge, there aren't any dino-nuggie image datasets available). With this data, we tried several models: a one-class SVM, a convolutional neural network, and a convolutional autoencoder. Neither the one-class SVM nor the CNN worked well (less than 60% accuracy), so we focused on the convolutional autoencoder.
Our idea was to train a model to learn a representation of normal nuggets, so that it would fail to reconstruct deformed ones. We then set a reconstruction-error threshold to decide whether a nugget is deformed. The encoder and decoder each consist of 3 convolution layers and 1 dense layer. The convolution layers reduce the dimensionality of the input, and the dense layer flattens the result into a 3000-dimensional latent representation. We used ReLU activations in the intermediate layers and a sigmoid on the final output, which forces the output values into the range 0 to 1, matching how the images are represented.
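The architecture above can be sketched in PyTorch. The layer widths and the 64x64 input resolution here are illustrative assumptions; only the overall shape (3 convolution layers plus 1 dense layer on each side, a 3000-dimensional latent, ReLU then sigmoid) comes from our actual model.

```python
import torch
import torch.nn as nn

class NuggetAutoencoder(nn.Module):
    """Sketch of the convolutional autoencoder for nugget anomaly detection."""

    def __init__(self, latent_dim=3000):
        super().__init__()
        # Encoder: 3 conv layers downsample, then a dense layer
        # flattens into the latent representation.
        self.encoder_conv = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),  # 16 -> 8
        )
        self.encoder_fc = nn.Linear(64 * 8 * 8, latent_dim)
        # Decoder mirrors the encoder, ending in a sigmoid so the
        # reconstruction lands in [0, 1] like the input pixels.
        self.decoder_fc = nn.Linear(latent_dim, 64 * 8 * 8)
        self.decoder_conv = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 3, stride=2, padding=1, output_padding=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder_fc(self.encoder_conv(x).flatten(1))
        h = self.decoder_fc(z).view(-1, 64, 8, 8)
        return self.decoder_conv(h)

def reconstruction_error(model, x):
    """Per-image mean squared reconstruction error."""
    with torch.no_grad():
        return ((model(x) - x) ** 2).mean(dim=(1, 2, 3))

model = NuggetAutoencoder()
batch = torch.rand(4, 3, 64, 64)  # stand-in for normalized RGB nugget images
errors = reconstruction_error(model, batch)
# A trained model would flag images with errors above a chosen threshold.
```

At inference time, an image whose reconstruction error exceeds the threshold is labeled misshapen; the threshold itself would be tuned on held-out normal nuggets.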
Challenges we ran into
Processing the images was extremely finicky at times, as we were not familiar with the various image processing packages or the best ways to work with image data. Our dataset was also fairly small, with only about 228 images of unique dino-nuggies, which could have introduced bias into our model. After augmenting our dataset to get more samples, it quickly became unwieldy: processing took a long time, and constantly retraining our models while tweaking hyperparameters became impractical. On top of this, we were not very familiar with packages like PyTorch and TensorFlow, so a lot of time was spent simply debugging. Time was also a limiting factor, as more complex models that might have performed better could not be explored.
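To give a sense of how augmentation multiplies a small dataset, here is a minimal sketch using flips and rotations. This eightfold scheme is an illustrative assumption, not our exact pipeline.

```python
import numpy as np

def augment(image):
    """Return the 8 flip/rotation variants of an H x W x 3 image."""
    variants = []
    for flipped in (image, np.fliplr(image)):   # original and mirror
        for k in range(4):                      # 0, 90, 180, 270 degree turns
            variants.append(np.rot90(flipped, k))
    return variants

nugget = np.random.rand(64, 64, 3)  # stand-in for a real nugget photo
augmented = augment(nugget)
# 8 variants per image; 228 originals would become 1824 samples.
```

Even this simple scheme shows why the augmented dataset grew quickly: every added transform multiplies the total sample count.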
Accomplishments that we're proud of
What we're most proud of is that we ended up with a working model, as well as a demo that can classify any RGB JPEG nugget image, even if it isn't extremely accurate yet.
What we learned
We learned that fully connected (non-convolutional) neural networks struggled with our images, while convolutional neural networks seemed to work quite well for image data.
For future projects, time management is essential, especially when collecting our own data.
What's next for Dino-Nuggetology
Improving our classifier and moving toward multi-class classification would be our ideal next steps. We could go beyond anomaly detection and also classify nuggets by dino-species.