Inspiration

Traffickers rely heavily on internet-based tools

  • Trafficking in the US has risen several-fold, coinciding with the wide availability of social media tools
  • How can we use those same tools to identify trafficking activity?

Trafficking experts are fighting the good fight - but can't be everywhere at once

  • Organizations working to combat human trafficking are uncovering invaluable insights
  • How can we get these insights into the hands of local law enforcement and other groups that operate on a smaller scale?

What it does

End users supply raw HTML (as a text file) to the tool, and the tool will:

  1. Parse the HTML to pull out key characteristics of each post:

    • Post identifier
    • Date posted
    • Time posted
    • Location (City and "Sub-Area")
    • Title and body text
  2. Compare results against known suspicious characteristics
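
As a rough illustration of step 2, here is a minimal sketch that checks a parsed post against a list of terms. The `Post` class, the `matched_terms` function, and the placeholder terms are all hypothetical names invented for this example. The writeup does not specify how the comparison works, so simple case-insensitive substring matching is assumed here.

```python
from dataclasses import dataclass


@dataclass
class Post:
    """One parsed post, holding the fields listed in step 1 (hypothetical layout)."""
    post_id: str
    date: str
    time: str
    city: str
    sub_area: str
    title: str
    body: str


# Placeholder terms only. A real list would come from expert organizations.
SUSPICIOUS_TERMS = ["placeholder phrase", "example-term"]


def matched_terms(post, terms=SUSPICIOUS_TERMS):
    """Return the terms that appear in the post's title or body, ignoring case."""
    text = f"{post.title} {post.body}".lower()
    return [term for term in terms if term.lower() in text]
```

A post would then be flagged whenever `matched_terms(post)` returns a non-empty list. Returning the matched terms rather than a bare yes/no keeps the reason for each flag visible to a reviewer.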

How we built it

We used Python's BeautifulSoup library to parse the raw HTML, followed by additional text manipulation to clean up the output.
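
The parsing step might look something like the sketch below. The sample markup, its tag and class names, and the `clean` and `parse_posts` helpers are assumptions made up for illustration; the actual page structure and our cleanup rules are not reproduced here.

```python
from bs4 import BeautifulSoup

# Invented sample markup; real pages will use different tags and classes.
SAMPLE_HTML = """
<div class="post" data-id="12345">
  <time datetime="2018-03-20T14:05:00">Mar 20</time>
  <span class="city">Chicago</span>
  <span class="sub-area">  North  Side </span>
  <h2 class="title">Example   title</h2>
  <div class="body">First line.<br>Second   line.</div>
</div>
"""


def clean(text):
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return " ".join(text.split())


def parse_posts(html):
    """Extract one dict per post from the (assumed) markup above."""
    soup = BeautifulSoup(html, "html.parser")
    posts = []
    for node in soup.find_all("div", class_="post"):
        # Split the ISO-style timestamp into its date and time parts.
        date_posted, _, time_posted = node.find("time")["datetime"].partition("T")
        posts.append({
            "post_id": node["data-id"],
            "date": date_posted,
            "time": time_posted,
            "city": clean(node.find("span", class_="city").get_text()),
            "sub_area": clean(node.find("span", class_="sub-area").get_text()),
            "title": clean(node.find("h2", class_="title").get_text()),
            # A space separator keeps text on either side of <br> from running together.
            "body": clean(node.find("div", class_="body").get_text(" ")),
        })
    return posts
```

Calling `parse_posts(SAMPLE_HTML)` yields a list with one dict whose keys mirror the characteristics the tool extracts.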

Challenges we ran into

We had wanted to cluster known suspicious personal ads to find the characteristics that should drive the flagging of trafficking-related posts. However, late issues with our data pull left us with fewer items from the library of suspicious posts than expected, so this is left as a next step (see below).

Accomplishments that we're proud of

Lots of great learning and exposure to new methods and tools!

What we learned

Ambitions easily exceed time and resource constraints in hackathon environments. (And we're too old to pull all-nighters.)

What's next for PersonalsParser

1. Refine the list of known suspicious characteristics

  • In partnership with expert anti-trafficking organizations, drawing on what they have learned

  • Several approaches, including:

    • Use keywords/phrases gleaned from partnered orgs' research
    • Use data mining on posts that experts have flagged as suspicious to identify key characteristics

2. Adapt for new sources?

  • As of this week, Craigslist has replaced all pages in its Personals category with this note:

US Congress just passed HR 1865, "FOSTA", seeking to subject websites to criminal and civil liability when third parties (users) misuse online personals unlawfully.

Any tool or service can be misused. We can't take such risk without jeopardizing all our other services, so we are regretfully taking craigslist personals offline. Hopefully we can bring them back some day.

  • Consider applying within other Craigslist categories:

    • Services category (including "skilled trade" page)
    • Gigs category (including "labor" and "domestic" pages)
  • The tool can also be adapted to the HTML structures of other social media sites
