Final Reflection

Fine Art Classification: Anna Dai, Ezra Muratoglu, Rena Jiang

Introduction

Fine art has always played a significant role in society as a manifestation of culture and a record of history. The presence of art education in communities and school curricula has grown, while several museums and galleries nationwide have worked to make fine art more accessible to the public. This project aims to make fine art more accessible by identifying the genres of paintings, which contributes to our understanding of their context. A classification tool for people who encounter fine art on the internet could provide educational value to groups who don't typically have access to fine art resources. To this end, we designed and trained a convolutional neural network (CNN) to classify paintings by genre.

Methodology

To preprocess our input images, we chose a standardized input size for all paintings, 150 x 150, close to the average input height x width. Inputs smaller than this size were padded and inputs larger than it were cropped. After training several variations of CNN architectures (varying the number of convolutional blocks, the number and size of filters, the fully connected layer size, and the dropout layers), we found that the architecture detailed by the diagram on the right achieved the highest testing accuracy.
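
The crop-or-pad step can be sketched in NumPy as below. This is an illustration under assumptions, not the project's code: the helper name crop_or_pad is hypothetical, images are assumed to be H x W (x C) arrays, and the centering follows the central-crop / zero-pad behaviour that TensorFlow's tf.image.resize_with_crop_or_pad provides.

```python
import numpy as np

def crop_or_pad(image: np.ndarray, target_h: int = 150, target_w: int = 150) -> np.ndarray:
    """Hypothetical helper: centrally crop or zero-pad an image to target_h x target_w."""
    h, w = image.shape[:2]

    # Crop: keep the central window along any axis that is larger than the target.
    top = max((h - target_h) // 2, 0)
    left = max((w - target_w) // 2, 0)
    cropped = image[top:top + min(h, target_h), left:left + min(w, target_w)]

    # Pad: split missing rows/columns evenly with zeros (any odd one goes after).
    ch, cw = cropped.shape[:2]
    pad_top = (target_h - ch) // 2
    pad_left = (target_w - cw) // 2
    pad_spec = [(pad_top, target_h - ch - pad_top), (pad_left, target_w - cw - pad_left)]
    pad_spec += [(0, 0)] * (image.ndim - 2)  # leave the channel axis untouched
    return np.pad(cropped, pad_spec, mode="constant", constant_values=0)
```

For example, a tall, narrow 300 x 120 painting loses 75 rows from the top and bottom and gains 15 zero columns on each side. Both operations discard or dilute information, which is the trade-off noted under Challenges.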

Results

Classifying fine art is particularly difficult because paintings in the same genre vary widely in visual appearance, and genre labels are to some extent subjective. The deep learning model proposed in the paper on which this project is based reached a baseline accuracy of 62.20% with AlexNet. Our model performed about half as well, with 32.87% accuracy.

Constraints

One of the main constraints for this project was access to computational resources. Since we were working with fairly high-resolution images and training multiple CNNs, training was slow on our personal machines.

Challenges

Attempts we made to improve performance, and limitations we encountered, included:

- Trying different architectures, hyperparameter values, and loss functions
- Using a smaller dataset than the paper, due to time and memory constraints
- Having to crop/pad images to a uniform size, which may have removed valuable information from the paintings
- Training for fewer epochs, with a model with fewer trainable parameters (under 20 million, compared with over 60 million), due to time and memory constraints

Reflection

How do you feel your project ultimately turned out? How did you do relative to your base/target/stretch goals?

Our target/stretch goal was to reach about 50% accuracy on our test set; we reached our base goal with 32.87% accuracy. We found that our model performed much better when restricted to a smaller number of classes (reaching 70-95% accuracy). Although we were ultimately unable to reach our target/stretch goal, we came to understand why our model was not performing well and how the dataset itself may have contributed.
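
The write-up does not say how the smaller-class experiments were set up. A minimal sketch, assuming integer genre labels in a NumPy array, is to keep only the samples from the chosen genres and remap their labels to 0..k-1 so that a k-way softmax head can be trained on them. The function name restrict_to_classes is hypothetical.

```python
import numpy as np

def restrict_to_classes(images: np.ndarray, labels: np.ndarray, keep):
    """Hypothetical helper: keep samples whose label is in `keep`, remapped to 0..len(keep)-1."""
    keep = list(keep)
    mask = np.isin(labels, keep)
    # Map each retained original label to its position in `keep`.
    remap = {old: new for new, old in enumerate(keep)}
    new_labels = np.array([remap[int(l)] for l in labels[mask]], dtype=np.int64)
    return images[mask], new_labels
```

The model's output layer would then need len(keep) units instead of one per genre. Fewer, more visually distinct classes make the task easier, which may explain part of the jump in accuracy.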

Did your model work out the way you expected it to?

While we did not have concrete expectations about the overall accuracy of our model, we were surprised by how the accuracy behaved from a categorical lens. When we looked at the model's accuracy on each category of art, we found that it assigned certain categories a probability of zero across the board and, further, that retraining the model did not always remove the same categories from consideration. We suspect this behavior might result from not having enough trainable parameters; however, we were unable to increase our model's complexity due to computational resource constraints. This behavior might also result from convergence to a local minimum during gradient descent, where the model immediately rules out certain genres.
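
The per-category check described above can be sketched with a confusion matrix. The sketch assumes y_pred holds the argmax of the model's softmax outputs on the test set. The function name per_class_report is hypothetical. It returns per-class recall and the genres that are never predicted at all, so it can be rerun after each retraining to see which genres drop out.

```python
import numpy as np

def per_class_report(y_true, y_pred, num_classes: int):
    """Hypothetical helper: per-class recall and the classes the model never predicts."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # Confusion matrix: rows are true classes, columns are predicted classes.
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)
    support = cm.sum(axis=1)
    # Recall per class; classes with no test samples get 0 instead of dividing by zero.
    recall = np.divide(np.diag(cm), support, out=np.zeros(num_classes), where=support > 0)
    # A class whose column is all zeros was never chosen by the model.
    never_predicted = np.flatnonzero(cm.sum(axis=0) == 0).tolist()
    return recall, never_predicted
```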

How did your approach change over time? What kind of pivots did you make, if any?

We started with a very basic CNN using one convolutional layer, three dense layers with dropout and ReLU, and a softmax output as a baseline from which to optimize. There was a lot of experimentation and trial and error in adding max-pooling layers and batch normalization and in tuning hyperparameters. We also made several changes to the preprocessing component, including finding an input size that struck a balance between computation time and accuracy, and choosing how to handle inputs smaller or larger than the chosen dimensions. For inputs larger than 150 x 150, we could resize without preserving the aspect ratio, resize while preserving the aspect ratio and pad the empty space, or crop at a corner or centrally. Similarly, for inputs smaller than 150 x 150, we could scale up and crop, or pad with zeros. Ultimately, we chose TensorFlow's tf.image.resize_with_crop_or_pad (over resize and resize_with_crop).
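
A baseline of this shape might look like the following tf.keras sketch. The filter count, dense layer widths, dropout rate, optimizer, and NUM_CLASSES are hypothetical placeholders, since the write-up does not give them; only the overall layout (one convolution, three dense layers with ReLU and dropout, softmax output) comes from the description above.

```python
import tensorflow as tf

layers = tf.keras.layers
NUM_CLASSES = 10             # hypothetical: number of genres
INPUT_SHAPE = (150, 150, 3)  # matches the 150 x 150 preprocessing

def build_baseline(num_classes: int = NUM_CLASSES) -> tf.keras.Model:
    """One conv layer, then three dense layers (the last one softmax)."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=INPUT_SHAPE),
        layers.Conv2D(16, 3, activation="relu"),  # 16 filters of size 3x3 (assumed)
        layers.Flatten(),
        layers.Dense(64, activation="relu"),      # width assumed
        layers.Dropout(0.5),                      # rate assumed
        layers.Dense(32, activation="relu"),      # width assumed
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])

model = build_baseline()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

Even this small sketch has roughly 22.4 million parameters, almost all in the first dense layer. Without pooling, Flatten emits 148 x 148 x 16 = 350,464 features. Adding a MaxPooling2D layer after the convolution shrinks that count considerably, which may be one reason pooling layers were worth experimenting with.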

What would you have done differently if you could do your project over again? What do you think you can further improve on if you had more time?

Further steps could include implementing a two-stage classification process, where images are split into five pieces and individually classified in an intermediate step before a final classification based on the parts.
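
The write-up does not say how the five pieces would be chosen or combined. One plausible sketch, assuming four corner crops plus a centre crop and a simple average of the per-piece probabilities as the second stage, is below. The names five_crops, two_stage_predict, and classify_piece are hypothetical; classify_piece stands in for something like a wrapper around model.predict that returns one probability vector per piece. A learned second-stage classifier over the concatenated piece probabilities would be an alternative to averaging.

```python
import numpy as np

def five_crops(image: np.ndarray, crop_h: int, crop_w: int):
    """Four corner crops plus a centre crop (an assumed way to split into five pieces)."""
    h, w = image.shape[:2]
    top, left = (h - crop_h) // 2, (w - crop_w) // 2
    return [
        image[:crop_h, :crop_w],                      # top-left
        image[:crop_h, w - crop_w:],                  # top-right
        image[h - crop_h:, :crop_w],                  # bottom-left
        image[h - crop_h:, w - crop_w:],              # bottom-right
        image[top:top + crop_h, left:left + crop_w],  # centre
    ]

def two_stage_predict(image: np.ndarray, classify_piece, crop_h: int, crop_w: int):
    """Stage 1: classify each piece. Stage 2: average the piece probabilities."""
    piece_probs = np.stack([classify_piece(p) for p in five_crops(image, crop_h, crop_w)])
    final = piece_probs.mean(axis=0)
    return int(np.argmax(final)), final
```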

To address the model assigning zero probability to certain categories a priori, we might add a term to our loss function that penalizes weights near zero.
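
As a related idea, a regularizer can target the symptom directly rather than the weights. The variant sketched here is a swapped-in technique, not the one proposed above: it adds a penalty when any genre's batch-averaged predicted probability collapses toward zero. It is written in NumPy for readability; training would require the same expressions in TensorFlow ops so gradients can flow. The function name and the lam and eps values are hypothetical.

```python
import numpy as np

def penalized_cross_entropy(y_true, probs, lam: float = 0.1, eps: float = 1e-7) -> float:
    """Cross-entropy plus a penalty on classes whose average predicted mass nears zero."""
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0)
    y_true = np.asarray(y_true)
    # Standard sparse categorical cross-entropy, averaged over the batch.
    ce = -np.mean(np.log(probs[np.arange(len(y_true)), y_true]))
    # Average probability the model assigns to each class across the batch.
    class_mass = probs.mean(axis=0)
    # -log(mass) grows without bound as any class's mass approaches zero.
    penalty = -np.mean(np.log(class_mass))
    return float(ce + lam * penalty)
```

Because the penalty is smallest when every class receives equal average mass, lam should stay small on imbalanced data. Otherwise the term pulls predictions toward a uniform class distribution that the labels do not support.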

What are your biggest takeaways from this project/what did you learn?

In terms of implementation, coding the model itself was fairly straightforward using the Keras library. However, we found the data we used fairly hard to work with: somewhat disorganized, and of variable size and resolution. In a course like Deep Learning, where we receive orderly, clean data for all of our assignments, we are never forced to deal with these issues, so this project was a strong reminder that real-world data is seldom pretty. Preprocessing is about half the battle!

While we were tinkering with our model, we also found that it wasn’t always intuitive what effect changing a certain hyperparameter would have on the performance of the model. We certainly had many lively debates about these decisions, but ultimately the buck stopped with trial and error. This was a good reminder that while the deep learning toolbox is a powerful one, it is seldom interpretable.
