Categorizing Fine Art

Who: Anna Dai (adai10), Rena Jiang (rjiang6), Ezra Muratoglu (emuratog)

Introduction: We want to implement a DL model to classify fine art images by genre, as proposed in the paper linked here: https://ieeexplore.ieee.org/abstract/document/8675906. We think that this will be a meaningful challenge because, rather than classifying something with a hard ground truth, as in object recognition, our model seeks to categorize images semantically by artistic style.

Related Work: https://ieeexplore.ieee.org/document/5967323, https://hcis-journal.springeropen.com/articles/10.1186/s13673-016-0063-4. These two papers explore a similar task of painting genre classification, though the first is limited in its number of genres/classes, and the second uses self-organizing maps instead of supervised neural nets.

Data: We will be using the Paintings Dataset for Recognizing the Art Movement (Pandora 18K) collection (http://imag.pub.ro/pandora/pandora_download.html), which comprises 18,038 images across 18 classes. The images are relatively evenly distributed across styles, with each style making up 4-7% of the total dataset.

Methods: Option 1: one-stage classification, i.e., classifying the image as a whole. Option 2: two-stage classification, i.e., splitting each painting into five patches (the four L-shaped corners and a square centerpiece), training on and classifying each patch individually, and then making a final decision based on the predictions for those five patches (see the sketch below). We can use the GoogLeNet architecture, or something similar with a reasonable number of parameters (<15M).
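A rough sketch of the patch-splitting step for Option 2, assuming images have already been standardized to 150 x 150. For simplicity this uses square corner crops rather than the L-shaped pieces described above, and the patch size is an illustrative assumption:

```python
import tensorflow as tf

def five_patches(image, p=75):
    # Sketch: split a (150, 150, 3) painting into four square corner
    # patches plus a centered square patch. Square corners stand in for
    # the L-shaped pieces described above; p is an illustrative choice.
    h, w = image.shape[0], image.shape[1]
    return [
        image[:p, :p],                       # top-left
        image[:p, w - p:],                   # top-right
        image[h - p:, :p],                   # bottom-left
        image[h - p:, w - p:],               # bottom-right
        image[(h - p) // 2:(h + p) // 2,
              (w - p) // 2:(w + p) // 2],    # center
    ]
```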

Metrics: Base goal = 40% accuracy, target goal = 50% accuracy, stretch goal = 60% accuracy. (The paper reaches a 64% baseline accuracy with the GoogLeNet architecture.)

Ethics: What is your dataset? Are there any concerns about how it was collected or labeled? Is it representative? What kind of underlying historical or societal biases might it contain? The dataset's labels may be disputable, since paintings can fall into multiple genres or present more ambiguously, and this ambiguity would be reflected in the model's predictions. The dataset is also drawn from the Western canon, which has historically dominated fine art and marginalized Eastern and Polynesian art, though this is less of a concern in this context since the movements/genres we seek to classify primarily pertain to the West.

What broader societal issues are relevant to your chosen problem space? This kind of model could serve as a digital education tool, making the consumption of art more accessible, especially as classification of artistic style is traditionally a skill that requires both time and access to fine art and other educational resources.

Division of Labor: Anna–preprocessing, Rena–one-stage classification, Ezra–two-stage classification, All–troubleshoot/debug

Final Writeup/Reflection

Fine Art Classification: Anna Dai, Ezra Muratoglu, Rena Jiang

Introduction

Fine art has always played a significant role in society as a manifestation of culture and a record of history. The presence of art education in communities and school curricula has grown, while several museums and galleries nationwide have participated in efforts to make fine art more accessible to the public. This project aims to facilitate the accessibility of fine art by identifying the genres of paintings, which contributes to our understanding of their context. A classification tool for people who encounter fine art on the internet could provide educational utility to groups who don't typically have access to fine art resources. To do this, we designed and trained a convolutional neural network to classify paintings into their respective genres.

Methodology

To preprocess our input images, we had to choose a standardized input size for all paintings, which we set at a value close to the average input height x width (150 x 150), padding all inputs smaller than this size and cropping all inputs larger than it. After training attempts on several variations of CNN architecture (varying the number of convolutional blocks, the filter number and size, the fully connected layer sizes, and the dropout layers), we found that the architecture shown in our project diagram achieved the highest testing accuracy.
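A minimal sketch of this preprocessing step using TensorFlow's resize_with_crop_or_pad, which we settled on (the normalization to [0, 1] is an assumed, typical choice rather than a detail from this writeup):

```python
import tensorflow as tf

def preprocess(image):
    # Standardize every painting to 150 x 150: smaller images are
    # zero-padded, larger ones are center-cropped.
    image = tf.image.resize_with_crop_or_pad(image, 150, 150)
    # Scale pixel values to [0, 1] (assumed normalization).
    return tf.cast(image, tf.float32) / 255.0
```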

Results

Classifying fine art is a particularly difficult task, since paintings that belong to the same genre can have incredibly diverse visual appearances, and the labels themselves are to some extent subjective. The deep learning model proposed by the paper on which this project is based was able to reach a baseline of 62.20% accuracy with AlexNet. Our model performed about half as well, reaching 32.87% accuracy.

Constraints

One of the main constraints on this project was access to computational resources. Since we were working with fairly high-resolution images and running multiple CNNs, the model trained quite slowly on our personal machines.

Challenges

Attempts we made to improve performance, and the limitations we encountered, included:

- Trying different architectures, hyperparameter values, and loss functions
- Using a smaller dataset than the paper due to time and memory constraints
- Having to crop/pad images to a uniform size, which may have removed valuable information from paintings
- Training for fewer epochs with a model with fewer trainable parameters (<20M compared to >60M) due to time and memory constraints

Reflection

How do you feel your project ultimately turned out? How did you do relative to your base/target/stretch goals?

Our base, target, and stretch goals were 40%, 50%, and 60% accuracy on our test set, and our final model reached 32.87%, falling short of the base goal. We found that our model could perform much better when restricted to a smaller number of classes (reaching 70-95% accuracy). Although we were ultimately unable to reach our goals, we came to understand why our model was not performing well, and how the dataset itself may have factored into this.

Did your model work out the way you expected it to?

While we did not have concrete expectations about the overall accuracy of our model, we were surprised to see how the accuracy behaved from a categorical lens. When we looked at the model's accuracy in classifying each category of art, we found that it assigned certain categories a probability of zero across the board, and further, that when we retrained the model, it was not always the same categories that were ruled out. We suspect this behavior might result from not having enough trainable parameters; however, we were unable to increase our model's complexity due to computational resource constraints. It might also result from convergence to a local minimum during gradient descent, where the model immediately rules out the possibility of certain genres being classified.
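One quick way to surface this behavior is to compute per-genre accuracy from held-out predictions; a hypothetical helper along those lines (not our exact code):

```python
import numpy as np

def per_class_accuracy(y_true, y_pred, num_classes):
    # y_true, y_pred: integer label arrays over the test set.
    # Classes the model never predicts show up with accuracy 0.
    acc = []
    for c in range(num_classes):
        mask = y_true == c
        acc.append((y_pred[mask] == c).mean() if mask.any() else float("nan"))
    return np.array(acc)
```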

How did your approach change over time? What kind of pivots did you make, if any?

We started off with a very basic CNN model, using one convolutional layer, three dense layers with dropout and ReLU, and a softmax output, as a baseline from which to optimize. There was a lot of experimentation and trial and error in adding max-pooling layers and batch normalization and in tuning hyperparameters. We also made several changes to the preprocessing component, including finding an input size that struck a balance between computation time and accuracy, and choosing how to handle inputs smaller or larger than the chosen dimensions. For instance, for inputs larger than 150 x 150, we could resize without preserving the aspect ratio, resize while preserving the aspect ratio and pad the empty space, or crop at a corner or centrally; similarly, for inputs smaller than 150 x 150, we could scale up and crop, or pad with zeros. Ultimately, we chose TensorFlow's resize_with_crop_or_pad (over resize and resize_with_pad).
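A minimal sketch of that starting baseline in Keras (the filter count, layer widths, dropout rates, and the pooling layer added to keep the parameter count manageable are all illustrative assumptions, not our exact values):

```python
import tensorflow as tf
from tensorflow.keras import layers

# Sketch of the starting baseline: one conv layer, three dense layers
# with ReLU and dropout, and a softmax output over the 18 genres.
def baseline_model(num_classes=18):
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(150, 150, 3)),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(4),  # assumed, to keep the dense layers small
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(64, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```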

What would you have done differently if you could do your project over again? What do you think you can further improve on if you had more time?

Further steps could include implementing a two-stage classification process, where images are split into five pieces and individually classified in an intermediate step before a final classification based on the parts.
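A sketch of what that final decision step might look like, assuming we average the five per-patch softmax outputs (one simple fusion rule among several possible):

```python
import numpy as np

def fuse_patch_predictions(patch_probs):
    # patch_probs: list of five softmax vectors, one per patch.
    # Average them and take the argmax as the painting's genre.
    return int(np.argmax(np.mean(patch_probs, axis=0)))
```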

To address the issue of the model assigning zero probability to certain categories a priori, we might consider adding a term to our loss function that penalizes weights for sitting near zero.
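A hypothetical, untested sketch of such a penalty term in Keras (the exponential form and the alpha scaling are our own assumptions):

```python
import tensorflow as tf

def make_loss(model, alpha=1e-4):
    # Cross-entropy plus a term that is largest when the final dense
    # layer's weights sit near zero (the opposite of L2 decay), to
    # discourage zeroing out whole genres a priori.
    ce = tf.keras.losses.SparseCategoricalCrossentropy()
    def loss(y_true, y_pred):
        w = model.layers[-1].kernel               # final dense layer weights
        near_zero = tf.reduce_mean(tf.exp(-tf.abs(w) * 100.0))  # ~1 when |w| is near 0
        return ce(y_true, y_pred) + alpha * near_zero
    return loss

# usage: model.compile(optimizer="adam", loss=make_loss(model))
```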

What are your biggest takeaways from this project/what did you learn?

In terms of implementation, coding the model itself was fairly straightforward using the Keras library. We found, however, that the data we used was fairly hard to work with: somewhat disorganized, and of variable size and resolution. In a course like Deep Learning, where we receive orderly, clean data for all of our assignments, we are never forced to deal with these issues, so this project was a strong reminder that data in the real world is seldom pretty; preprocessing is about half the battle!

While we were tinkering with our model, we also found that it wasn’t always intuitive what effect changing a certain hyperparameter would have on the performance of the model. We certainly had many lively debates about these decisions, but ultimately the buck stopped with trial and error. This was a good reminder that while the deep learning toolbox is a powerful one, it is seldom interpretable.


Progress Update

Challenges The data preprocessing portion involved many more steps than our class assignments. Although we were able to find an existing dataset, its images were not consistently sized, ranging between 117 x 249 and 3000 x 2530. In order to use a sequential model, we needed to resize the images (ideally to 600 x 600, which is close to the average height/width across the dataset). For images smaller than the target size, this meant padding; for images larger than the target size, we had to shrink them. In addition, this size is much larger than the images we have been using in class (MNIST images are 28 x 28). We are able to run our model with a target size of 100 x 100, but we are still looking for a workaround (perhaps using GCP credits to train) so we can increase the image sizes.
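One workaround we are considering is streaming and resizing images lazily rather than holding the full-resolution dataset in memory. A rough sketch, assuming a local copy of the dataset with one folder per genre (the directory name and batch size are assumptions; note that image_size here resizes bilinearly, which could be swapped for a crop-or-pad map):

```python
import tensorflow as tf

# Hypothetical streaming pipeline: images are decoded, resized, and
# batched on the fly, so full-resolution paintings never all sit in
# memory at once.
ds = tf.keras.utils.image_dataset_from_directory(
    "pandora18k/",            # assumed dataset location
    image_size=(100, 100),    # current working target size
    batch_size=32,
)
ds = ds.map(lambda x, y: (x / 255.0, y)).prefetch(tf.data.AUTOTUNE)
```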

Insights With 12 categories, a randomly guessing model would have an accuracy of roughly 8% (1/12 ≈ 8.3%). Currently, with a target size of 100 x 100, we have an accuracy of roughly 32%. This is much lower than our target accuracy of 50%, but hopefully, by adding additional max-pooling layers and finding a feasible way to feed larger images into our model, we can increase performance.

Plan Our foundational model is set and runs successfully. We need to dedicate more time to addressing the issues described under Insights.
