When public photographic collections are digitised, it takes GLAM staff a lot of time to describe the images, and even then the images are searchable only by text. Descriptions are also language-specific and can contain errors. To engage users in geotagging images on our crowdsourcing platform Ajapaik.ee (we are also currently setting up a Norwegian version, Fotodugnad, in collaboration with K-Lab), we first need to separate outdoor images from all the rest.
Machine learning can handle many image categorisation tasks, and GLAMs' work with picture collections would be much easier if AI image categorisation were incorporated into the workflow. More and more commercial AI services are available, but we set out to create a simple self-built solution for the task.
How we built it
We started by training a classifier with LIBSVM to distinguish outdoor from indoor images. The training and initial test images came from Estonian, Finnish and British collections that had been tagged on the crowdsourcing platform Sift.pics. Later we also retrieved images from Digitaltmuseum.no to test our algorithm.
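To illustrate how images could be prepared for LIBSVM, here is a minimal sketch. It is not our actual pipeline: the colour-histogram feature, the function names and the +1 (outdoor) / -1 (indoor) labels are assumptions made for this example. The only part taken from LIBSVM itself is the sparse input format, one example per line as `<label> <index>:<value> ...` with 1-based indices.

```python
def colour_histogram(pixels, bins_per_channel=4):
    """Normalised RGB histogram for an iterable of (r, g, b) tuples in 0..255.

    Hypothetical feature choice for illustration only.
    """
    step = 256 // bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)
    count = 0
    for r, g, b in pixels:
        # Map each channel to a bin and combine the three bins into one index.
        idx = ((r // step) * bins_per_channel ** 2
               + (g // step) * bins_per_channel
               + (b // step))
        hist[idx] += 1.0
        count += 1
    if count:
        hist = [h / count for h in hist]
    return hist


def to_libsvm_line(label, features):
    """Format one example in LIBSVM's sparse text format.

    Zero-valued features are omitted; indices start at 1.
    """
    parts = [str(label)]
    for index, value in enumerate(features, start=1):
        if value != 0:
            parts.append(f"{index}:{value:g}")
    return " ".join(parts)


# A made-up 4-pixel "image": three blue pixels (sky-like) and one black pixel.
sky_pixels = [(0, 0, 255)] * 3 + [(0, 0, 0)]
line = to_libsvm_line(1, colour_histogram(sky_pixels))
print(line)
```

Lines produced this way, one per image, can be written to a text file and passed to LIBSVM's `svm-train` tool to fit a model.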
Challenges we ran into
We had little time and were not able to perform image segmentation, which would probably have given better end results.
Accomplishments that we're proud of
We got results with around 65% accuracy! This is not especially high, but it is still helpful for first-phase categorisation.
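For readers who want to reproduce this kind of figure, accuracy is simply the share of test images whose predicted label matches the crowdsourced tag. The sketch below shows the calculation alongside a majority-class baseline, which is a useful reference point for a two-class problem. The function names and the labels are made up for illustration and are not our test data.

```python
from collections import Counter


def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("label lists must not be empty")
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return correct / len(true_labels)


def majority_baseline(true_labels):
    """Accuracy obtained by always predicting the most common label."""
    most_common_count = Counter(true_labels).most_common(1)[0][1]
    return most_common_count / len(true_labels)


# Hypothetical labels: +1 = outdoor, -1 = indoor.
true_labels = [1, 1, 1, -1, -1]
predicted = [1, 1, 1, -1, 1]

acc = accuracy(true_labels, predicted)
base = majority_baseline(true_labels)
print(f"accuracy={acc:.2f} baseline={base:.2f}")
```

A classifier is only adding value when its accuracy is clearly above the baseline for the same test set.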