Inspiration
We believe the current reCAPTCHA v.3 has a few problems. First, it is getting genuinely hard for a human to prove they are not a robot. Machine learning advances every day and the accuracy of ImageToText (computer vision) models is skyrocketing, so CAPTCHA images have to become more difficult and more vague to stay ahead. Second, the dataset behind current CAPTCHAs is limited. Questions and images repeat, which makes them predictable (you have surely answered "check all the images with traffic lights" before). Several research papers presented at Black Hat have already used machine learning models to break CAPTCHA.
What it does
Therefore, we decided to build a CAPTCHA system that generates a totally nonsensical picture and asks the human to select the description that matches that AI-created image of something 'weird'. Because the image shows something that does not exist in this world, machine learning models like ImageToText should have no idea what the matching prompt is. For a human, though, the answer is clear: even when the image is not a 100% accurate rendering of the description, it is obvious which prompt the AI tried to draw. Also, because every image is created from scratch, we do not need a database of thousands of photos and prompts. The 'I'm not a robot' question never repeats, so there is no pattern or training data for malicious programs to exploit. The result is a very easy and fun 'I'm not a robot' challenge.
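To make the "select the description" step concrete, here is a minimal sketch of how one challenge could be assembled and checked. It is an illustration, not our actual code: the names `make_challenge` and `check_answer`, the use of distractor prompts, and the HMAC-signed token are all assumptions made for this example.

```python
import hashlib
import hmac
import random
import secrets
from dataclasses import dataclass


@dataclass
class Challenge:
    options: list   # prompts shown to the user, in shuffled order
    token: str      # opaque value sent back with the user's answer


def _sign(secret: bytes, nonce: str, answer: str) -> str:
    # Bind the correct answer to a one-off nonce using a server-side secret.
    msg = f"{nonce}:{answer}".encode()
    return hmac.new(secret, msg, hashlib.sha256).hexdigest()


def make_challenge(secret: bytes, true_prompt: str, distractors: list) -> Challenge:
    # Hypothetical helper: mix the real prompt with wrong ones and shuffle.
    options = [true_prompt, *distractors]
    random.SystemRandom().shuffle(options)
    nonce = secrets.token_hex(8)
    return Challenge(options, f"{nonce}.{_sign(secret, nonce, true_prompt)}")


def check_answer(secret: bytes, token: str, chosen: str) -> bool:
    # Recompute the signature for the chosen prompt and compare in constant time.
    nonce, _, mac = token.partition(".")
    return hmac.compare_digest(mac, _sign(secret, nonce, chosen))
```

With this shape the server does not have to store the correct answer per challenge. A real deployment would still need to limit attempts and reject reused tokens, which this sketch does not do.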
How we built it
We used an AI painting model called 'Stable Diffusion', which takes a prompt as input and creates an image of that prompt. The key to our CAPTCHA is that the prompts we feed into this model are absurd and describe things that do not exist in the real world. We used the NLP APIs provided by Cohere to generate these prompts. First, we gathered 4,000 English sentences and clustered them into groups by topic similarity using Cohere's embed model. Then, from each cluster, we extracted keywords and used those keywords to generate a full-sentence prompt with Cohere's generate model. Finally, we created an image from that prompt using Stable Diffusion.
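The flow above (clusters, then keywords, then a prompt, then an image) can be sketched roughly as follows. This is a simplified stand-in rather than our real pipeline: the cluster labels are assumed to come from clustering the sentence embeddings, keyword extraction is approximated with plain word counts, and `generate_text` and `render_image` are hypothetical callables standing in for the Cohere generate call and the Stable Diffusion call.

```python
from collections import Counter, defaultdict

# Small illustrative stopword list (an assumption, not a real NLP resource).
STOPWORDS = {"a", "an", "the", "of", "in", "on", "and", "is", "to", "at"}


def group_by_cluster(sentences, labels):
    # labels[i] is the cluster id assigned to sentences[i].
    clusters = defaultdict(list)
    for sentence, label in zip(sentences, labels):
        clusters[label].append(sentence)
    return dict(clusters)


def top_keywords(sentences, k=2):
    # Approximate "keywords" as the k most frequent non-stopword tokens.
    words = (w.strip(".,!?").lower() for s in sentences for w in s.split())
    counts = Counter(w for w in words if w and w not in STOPWORDS)
    return [w for w, _ in counts.most_common(k)]


def make_captcha_items(sentences, labels, generate_text, render_image, k=2):
    # For each cluster: keywords -> full-sentence prompt -> image.
    items = []
    for _, group in sorted(group_by_cluster(sentences, labels).items()):
        prompt = generate_text(top_keywords(group, k))
        items.append((prompt, render_image(prompt)))
    return items
```

Passing the two model calls in as functions keeps the sketch runnable without API keys or a GPU; in practice they would wrap the Cohere and Stable Diffusion calls.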
Challenges we ran into
Stable Diffusion is computationally heavy and definitely needs GPU power, so we had to use a cloud GPU. However, the cloud GPU we used from Paperspace had its own firewall, which prevented us from deploying a server from the environment where we were running our tests.
Accomplishments that we're proud of
We combined several modern machine learning techniques to tackle a real-world problem and proposed a possible solution. CAPTCHA is a security measure that basically everyone who uses the internet encounters. By making it less annoying and safer, we think our project could have a positive impact on a large scale, and we are proud of that.
What we learned
We learned how to use the Cohere APIs and Stable Diffusion. We also learned a lot about computer vision and ImageToText models, which are a possible threat to all CAPTCHA versions. Additionally, we learned a lot about how to set up a server and send arguments to it in real time.
What's next for IM NOT A ROBOT - CAPTCHA v.4
Since not everyone can run Stable Diffusion on their local computer, we need to create a server that handles the computation and generates both the prompt and the image.

