Inspiration

When we discovered this hackathon and, specifically, the "PS ID-03 Dark Web Crawler" challenge, one team member had already been manually scraping clearnet data to train an intelligent flagging solution for the extremist vocabulary of violent misogynists and far-right sympathizers. A second team member, studying toward the same MSc Cyber Security, was readily available.

When we compared several screenshots of illegal items listed for sale on darknet marketplaces, we noticed similarities that are searchable, and therefore crawlable, across all geographic areas and natural languages. 1) Objects were usually displayed on a neutral background (greyscale or single-colored). 2) Next to the text describing the objects, prices were given in one or several cryptocurrencies, not in fiat currencies as with legitimate sales offers.
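The first cue can be illustrated with a rough check of how uniform an image's border is. This is a minimal sketch under our own assumptions, not the code used in the project: the function name, the border width, and the threshold are hypothetical, and it works on a plain NumPy array (the format OpenCV uses for loaded images).

```python
import numpy as np

def looks_like_neutral_background(image: np.ndarray, border: int = 8,
                                  max_std: float = 12.0) -> bool:
    """Return True if the border of an H x W x 3 uint8 image is nearly uniform.

    `border` and `max_std` are illustrative guesses, not tuned values.
    """
    h, w = image.shape[:2]
    if h <= 2 * border or w <= 2 * border:
        return False  # too small to separate border from content

    # Collect the four border strips into one (N, 3) array of pixels.
    strips = [
        image[:border, :, :],                 # top
        image[-border:, :, :],                # bottom
        image[border:-border, :border, :],    # left
        image[border:-border, -border:, :],   # right
    ]
    pixels = np.concatenate([s.reshape(-1, 3) for s in strips]).astype(np.float64)

    # A grey or single-colored backdrop shows little per-channel spread.
    return bool(pixels.std(axis=0).max() <= max_std)
```

A check like this only narrows down candidates. Product photos on legitimate shops also use neutral backdrops, which is why the second cue (prices in cryptocurrency) matters.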

What it does

Our solution scrapes the web domain of your choice for "firearms image" + "any 3- or 4-letter cryptocurrency callsign", returning image files with their URLs. The domain could be eBay or any darknet image board or marketplace. All (suspicious) image files are then tested against our custom-built computer vision model. The result is a collection of images showing firearms offered for sale illegally, which is returned for inspection by a human agent.
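The first stage of that pipeline, picking out image URLs from pages that mention a cryptocurrency callsign, could look roughly like the sketch below. To stay self-contained it uses Python's standard-library HTML parser instead of Scrapy, and the ticker set, class name, and function name are hypothetical stand-ins rather than our actual code.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical sample; the real list was compiled from CoinMarketCap.
TICKERS = {"BTC", "ETH", "XMR", "LTC", "USDT"}

class ListingParser(HTMLParser):
    """Collect image sources and text content from one HTML page."""

    def __init__(self):
        super().__init__()
        self.image_srcs = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.image_srcs.append(src)

    def handle_data(self, data):
        self.text_parts.append(data)

def candidate_images(html, page_url, tickers=TICKERS):
    """Return absolute image URLs if the page text mentions a known ticker."""
    parser = ListingParser()
    parser.feed(html)
    parser.close()

    # Standalone runs of 3 or 4 capital letters are possible callsigns.
    text = " ".join(parser.text_parts)
    tokens = set(re.findall(r"\b[A-Z]{3,4}\b", text))
    if not tokens & set(tickers):
        return []
    return [urljoin(page_url, src) for src in parser.image_srcs]
```

The returned URLs would then be downloaded and passed to the firearms detection model, with anything it flags going to a human reviewer.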

How we built it

Using the Anaconda data science platform, we installed Python 3.11, OpenCV, and Scrapy. We created our own library of 3- and 4-letter cryptocurrency callsigns taken from CoinMarketCap (Thanks for pointing us there, Albert!) and trained a custom object detection model for firearms on freely accessible image data published after police raids.
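One way such a callsign library might be applied to listing text is sketched below. Short capitalized words such as "NEW" or "GUN" look just like callsigns, so this sketch only accepts a callsign that directly follows a numeric amount. That restriction, the pattern, and the names are our illustrative assumptions, not a description of the project's actual matching rules.

```python
import re

# Hypothetical excerpt of the callsign library built from CoinMarketCap.
TICKERS = frozenset({"BTC", "ETH", "XMR", "LTC", "DOGE"})

# An amount such as "0.015" or "450", then an optional space,
# then a standalone run of 3 or 4 capital letters.
PRICE_PATTERN = re.compile(r"(?<![\w.])(\d+(?:\.\d+)?)\s?([A-Z]{3,4})\b")

def crypto_prices(text, tickers=TICKERS):
    """Return (amount, callsign) pairs whose callsign is in the library."""
    return [
        (float(amount), symbol)
        for amount, symbol in PRICE_PATTERN.findall(text)
        if symbol in tickers
    ]

listing = "Glock 17, 0.015 BTC or 1.2 XMR, NEW in box, 450 USD"
print(crypto_prices(listing))  # [(0.015, 'BTC'), (1.2, 'XMR')]
```

Here "450 USD" matches the pattern but is dropped because USD is not in the library, and "NEW" is ignored because no amount precedes it.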

Challenges we ran into

We could not legally train our model with data that is inherently illegal, e.g. CSAM or violent pornography. We therefore settled on "illegal firearms sales" for this hackathon.

Similarly, we had to rely on open-source components so that no cloud-based service could censor or reject our data and model, which would have rendered our work pointless.

Accomplishments that we're proud of

Forming a team of two in a matter of hours after learning about this hackathon.

Successfully wading through a sea of no- and low-code solutions for web crawling/scraping and data mining, plus their developers' rather aggressive marketing.

What we learned

How laws and regulations can actually hinder tech4good software development in some cases.

Not to scrape images for training purposes directly from the darknet, for security reasons.

What's next for Computer Vision Crawler for Dark Web Image Boards

We are looking for (law enforcement) agencies we could team up with for adding further categories to this crawler.

Resources we used

https://bitcoin.stackexchange.com/questions/20745/where-can-one-find-the-listing-of-the-3-call-letters-for-all-of-the-altcoins

https://coinmarketcap.com/

https://www.youtube.com/watch?v=bUoWTPaKUi4

https://www.datacamp.com/tutorial/web-scraping-using-python

https://towardsdatascience.com/how-to-build-a-weapon-detection-system-using-keras-and-opencv-67b19234e3dd

https://www.scrapestorm.com/?type=tutorial

https://azure.microsoft.com/en-us/products/ai-services/ai-custom-vision

What others have created before

https://arxiv.org/abs/2105.01058

https://viso.ai/application/weapon-detection/

https://www.unite.ai/kogniz-introduces-computer-vision-platform-for-gun-detection/

https://www.sciencedirect.com/science/article/pii/S1877050915014076
