Inspiration

The brief for the Durham Council Challenge sounded incredibly exciting and stood out from the rest because of its real-world impact. The project would remind generations of the history of the area they grew up in, and might even let them uncover secrets to explore with their families. As the theme of this hackathon was Unity, we aimed to use this project to unite the past with the present.

What it does

The project is a modular program that determines whether two images might be connected in some way. We check for these connections by measuring the similarity between two images in two ways: through computer vision and through their existing captions.

For visual similarity, we used YOLO and OpenCV to detect objects and people in the images, mapped the recognised items to per-class count vectors, and then calculated distances between those vectors to gauge similarity based on how many of each object appears (or doesn't).

For caption similarity, we searched for images similar to a specified image based on their descriptions. The NLTK library was used to extract all the nouns from each description; we then compared the nouns of the specified image against those of every image in the dataset and picked the ones with the most nouns in common. Some descriptions were very general and matched many images, so we narrowed the results down by filtering on category and location.
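As a rough illustration of the visual-similarity step (the names below are made up for this sketch, not our exact code), each image reduces to a vector of per-class object counts, and the Euclidean distance between two such vectors scores their similarity:

```python
# Sketch of the detection-to-vector comparison; assumes YOLO has already
# produced a list of class labels per image. Names are illustrative.
import numpy as np

CLASSES = ["person", "car", "dog", "bench"]  # truncated COCO-style class list

def counts_vector(labels):
    """Map detected class labels to a fixed-length vector of counts."""
    vec = np.zeros(len(CLASSES))
    for label in labels:
        if label in CLASSES:
            vec[CLASSES.index(label)] += 1
    return vec

def image_distance(labels_a, labels_b):
    """Euclidean distance between count vectors; smaller means more similar."""
    return float(np.linalg.norm(counts_vector(labels_a) - counts_vector(labels_b)))

# image_distance(["person", "person", "dog"], ["person", "dog"]) -> 1.0
```

And a minimal sketch of the caption route, assuming captions are plain strings (the dataset handling here is simplified, and `most_similar` is an illustrative name):

```python
# Noun extraction and overlap scoring with NLTK. Download calls use the
# classic resource names; newer NLTK versions may ask for _tab/_eng variants.
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

def extract_nouns(caption):
    """Return the set of nouns (NN* part-of-speech tags) in a caption."""
    tokens = nltk.word_tokenize(caption)
    return {word.lower() for word, tag in nltk.pos_tag(tokens) if tag.startswith("NN")}

def most_similar(target_caption, all_captions, top_k=5):
    """Rank captions by the number of nouns shared with the target caption."""
    target = extract_nouns(target_caption)
    return sorted(all_captions, key=lambda c: len(target & extract_nouns(c)), reverse=True)[:top_k]
```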

How we built it

The challenge divided naturally into four sub-sections: similarity through images, similarity through textual captions, colourization, and recognition of objects and of the age and gender of people in the images. Since our team has four members, we split the work so that each member was responsible for one section, planning to combine our code at the end into one program that could perform a range of tasks on the image dataset. In the end, only two sub-sections were completed (the reasons the other two weren't finished are detailed in "Challenges we ran into"). We couldn't have built this project without certain libraries: NumPy and OpenCV were particularly useful, with OpenCV providing the image-processing tools and NumPy the array manipulation.
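A tiny illustration of why the pair works so well together (the file path is hypothetical): OpenCV loads images directly as NumPy arrays, so array operations apply immediately.

```python
import cv2
import numpy as np

img = cv2.imread("dataset/photo_0001.jpg")    # hypothetical path; a BGR NumPy array
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # OpenCV image processing
print(gray.shape, np.mean(gray))              # NumPy array manipulation
```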

Challenges we ran into

About the colourization of the images:

We ran through a number of pre-trained neural network models (skipping training our own, as that would have taken more time than we had to begin with), including one from the 2016 "Colorful Image Colorization" study by Zhang et al., using OpenCV to do the work. Having had no deep learning experience to start with, however, it was an uphill battle to implement anything, even when taken largely from online sources. Google Colab presented extra challenges, such as rejecting Python files as imports (which shut down one entire solution to the problem), and most models only accepted images of a fixed size (usually 256 x 256), which was not helpful: a large number of the 3,000 images varied in size, and cropping them all would lose a lot of information, as would resizing and upscaling.
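For reference, here is a hedged sketch of the OpenCV route we attempted, following the widely shared OpenCV sample for the Zhang et al. Caffe release (file names come from that release; the 224 x 224 input size and the trick of resizing only the lightness channel are the sample's, not ours):

```python
# Sketch only: loads the pre-trained "Colorful Image Colorization" Caffe model
# through OpenCV's dnn module. File names are from the public model release.
import cv2
import numpy as np

net = cv2.dnn.readNetFromCaffe("colorization_deploy_v2.prototxt",
                               "colorization_release_v2.caffemodel")
pts = np.load("pts_in_hull.npy")  # 313 ab colour cluster centres shipped with the model
net.getLayer(net.getLayerId("class8_ab")).blobs = [pts.transpose().reshape(2, 313, 1, 1).astype(np.float32)]
net.getLayer(net.getLayerId("conv8_313_rh")).blobs = [np.full((1, 313), 2.606, dtype=np.float32)]

img = cv2.imread("old_photo.jpg")  # hypothetical archive photograph
lab = cv2.cvtColor(img.astype(np.float32) / 255.0, cv2.COLOR_BGR2LAB)
L = lab[:, :, 0]

# The network expects a fixed-size input, so only the L channel is shrunk;
# the predicted ab channels are resized back to the original resolution.
net.setInput(cv2.dnn.blobFromImage(cv2.resize(L, (224, 224)) - 50))
ab = cv2.resize(net.forward()[0].transpose((1, 2, 0)), (img.shape[1], img.shape[0]))

colorized = np.clip(cv2.cvtColor(np.dstack((L, ab)), cv2.COLOR_LAB2BGR), 0, 1)
cv2.imwrite("colorized.jpg", (colorized * 255).astype(np.uint8))
```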

About recognition of objects, age and gender:

One issue we encountered concerns COCO, the dataset YOLO is commonly trained on. COCO contains no building or plant classes from which YOLO could learn, yet many of the archive images consist exclusively of plants or buildings; in short, we needed a different object-recognition approach. One dataset we found that could help combat this inaccuracy is Visual Genome, which comprises over 108,000 pictures. Unfortunately, the only Visual-Genome-trained model we could find was a Faster R-CNN implementation by shilrley6 (link: https://github.com/shilrley6/Faster-R-CNN-with-model-pretrained-on-Visual-Genome ), which we were unable to integrate into our program. And because training our own R-CNN or YOLO model would take many hours, we were stuck trying to make YOLO work.
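To make the limitation concrete, here is a hedged sketch (assuming the standard darknet yolov3 config, weights, and class-names files): the detector itself loads fine, but its class list can never name a building.

```python
# COCO's 80 classes simply don't include buildings, so a YOLO model trained
# on it cannot report them no matter how it is tuned.
import cv2

net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
with open("coco.names") as f:
    classes = [line.strip() for line in f if line.strip()]

print(len(classes))           # 80
print("building" in classes)  # False
print("church" in classes)    # False
# The closest plant-related label is a potted-plant class, which doesn't
# cover the trees and gardens in the archive photographs.
```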

Accomplishments that we're proud of

Despite having had no prior contact with natural language processing or deep learning, our attempts at both were quite successful. It was a pleasant introduction to these topics and prompted us to consider pursuing them further in the third year of our degree. It also led to additional research into extra-curricular concepts and areas we might want to delve into further (for example, YOLO versus R-CNN approaches to object recognition). However simple the ideas behind some of the sub-sections were, they worked out surprisingly well.

What we learned

We learned that both understanding and using deep learning programs and models takes a lot of work and effort! Theory is almost a prerequisite for understanding any deep learning code you write, and after being thrown into the deep end we aim to learn even more about the topic. In addition, these intense 24 hours gave us invaluable experience of teamwork under pressure on a completely unfamiliar task, which is a huge plus.

What's next for Image similarity detector

We would love for this project to become the first step towards adding extra user functionality and getting more use out of the council's image database. Although we only had 24 hours and did our best with the time available, the two completed sub-sections work without problems and would just need to be integrated into the website's search system. Beyond that link with the website, the two sub-sections we did not have time for could be implemented too. For textual similarity, a feature could be added so that suggested images are not repeated once they have been viewed; for visual similarity, the algorithm could be extended to consider distance metrics other than the Euclidean distance our program currently uses (see the sketch below). Combining the two approaches to make the suggestion system even better would also be a good next move.
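As a hedged sketch of that metric extension (using scipy.spatial.distance for the implementations; the vectors are illustrative object counts, not real data):

```python
# Comparing candidate distance metrics on two illustrative count vectors.
import numpy as np
from scipy.spatial import distance

a = np.array([2.0, 0.0, 1.0])    # e.g. 2 people, 0 cars, 1 dog
b = np.array([1.0, 1.0, 1.0])

print(distance.euclidean(a, b))  # what the current program uses
print(distance.cityblock(a, b))  # Manhattan: total difference in counts
print(distance.cosine(a, b))     # compares composition, ignoring overall scale
```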
