We were inspired by one of our team members' intense passion for Geoguessr, combined with the impressive inference abilities of machine learning models, and decided to create an AI image recognizer that can be used as a just-for-fun Captcha, but also as a tool to help identify the locations of significant pictures, such as photos of missing people or nostalgic old family photos.
This model takes an image as input and outputs its best educated guess about where that picture was taken. There is also a Captcha/Geoguessr-esque activity where the model shows the user some pictures, the user pinpoints their best guess on a map, and the model records how much error was in that guess.
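One natural way to score the error in a map guess is the great-circle (haversine) distance between the guessed and true coordinates. The sketch below is ours, not the project's actual scoring code; the function name is hypothetical.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two (lat, lon) points, in kilometers.
    R = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * R * math.asin(math.sqrt(a))

# e.g. Paris (48.8566, 2.3522) vs. London (51.5074, -0.1278) is roughly 344 km
error_km = haversine_km(48.8566, 2.3522, 51.5074, -0.1278)
```

Scoring in kilometers rather than raw degree differences keeps guesses near the poles and near the antimeridian from being penalized unfairly.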
To build this model, we used transfer learning: we started from a pre-existing EfficientNet model and specialized it to recognize different locations. We trained it on 600 thousand images paired with latitude and longitude coordinates over 5 epochs. We also implemented a Captcha that collects user data; the guesses are averaged to find the typical user guess, which we feed back to our model so it can learn from human perception of geographical similarities and make more human-oriented educated guesses.
Although some of us had experience with prompt engineering and working with LLMs, this was our first foray into image recognition. Building a model that could pinpoint a specific location was very ambitious, and it was hard to find an open-source model that specialized in recognizing locations rather than just everyday objects. Although we found a comprehensive dataset, we didn't know whether it was filled with recent pictures or old, outdated ones. We were also unable to train the model on all of the data due to time constraints, since it took us a long time just to come up with a plan of action.
We're proud of producing a model that works end-to-end and a Captcha with a user interface that can take in data from the user. Since none of us had any experience creating user-oriented products before, this was a big step for us.
We learned a lot about what transfer learning is, and about training a model on images instead of just text. The process of training the model also gave us some insights into how to optimize our setup next time. Overall, it diversified our experience with AI.
We hope to train it on even more data; we found a dataset with 4 million images but, due to time constraints, were only able to use 600 thousand. Using the Captcha, we also hope to crowdsource human responses to further enhance our model. We also need to devise a way for it to understand cultural similarities and customs, so that it can associate different objects in pictures with a certain region and minimize error.
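One subtlety when averaging crowdsourced map guesses: naively averaging longitudes breaks near the antimeridian (a guess at 179° and one at -179° would average to 0° instead of 180°). A standard fix is to average the points as 3-D unit vectors and convert back; the sketch below is our illustration, not the project's code.

```python
import math

def mean_location(guesses):
    # Average a list of (lat, lon) guesses via 3-D unit vectors, so points
    # near the antimeridian average correctly.
    x = y = z = 0.0
    for lat, lon in guesses:
        phi, lmb = math.radians(lat), math.radians(lon)
        x += math.cos(phi) * math.cos(lmb)
        y += math.cos(phi) * math.sin(lmb)
        z += math.sin(phi)
    n = len(guesses)
    x, y, z = x / n, y / n, z / n
    # Convert the mean vector back to latitude / longitude.
    lat = math.degrees(math.atan2(z, math.hypot(x, y)))
    lon = math.degrees(math.atan2(y, x))
    return lat, lon
```

This gives a sensible "average user guess" to feed back into the model, even when individual guesses straddle the date line.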
Built With
- css
- html
- javascript
- pytorch
- sh
- yaml
- yfcc100m