Our project used information contained in online postings of escort services (Thorn - Digital Defenders of Children Dataset) to distinguish between posts by individuals and posts by organized syndicates. Through a combination of natural language processing and heuristic information about the posts, we identified hundreds of crime rings that operate in multiple states, soliciting the services of numerous women. By analyzing the textual information within each ring, we further determined which rings are more likely to exploit underage women. We believe that identifying crime rings will allow law enforcement to better focus their efforts to fight sexual exploitation and human trafficking.

One of the most descriptive attributes of an online escort service post is the provided phone number. These numbers have to be legitimate, as the posters want readers to be able to reach them. But we can’t simply identify crime rings by counting the posts associated with a single phone number: an individual may repeatedly post the same ad with the same contact information. We resolved this by scanning the posts for the female names they contain and by considering the geographic information. We then identify an organized crime ring as a single phone number associated with posts that contain many female names and span multiple geographic locations. The single phone number likely connects to a dispatcher who then coordinates the services. Some of the identified crime rings are shown below.

Identified crime rings. Each color corresponds to a single phone number.
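
The phone-number heuristic described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the record layout (`phone`, `names`, `location`), the function name `find_phone_rings`, and the two thresholds are all assumptions, since the write-up does not state how names were extracted or which cutoffs were used.

```python
from collections import defaultdict

# Hypothetical cutoffs; the write-up does not give the real values.
MIN_NAMES = 3
MIN_LOCATIONS = 2


def find_phone_rings(posts, min_names=MIN_NAMES, min_locations=MIN_LOCATIONS):
    """Group posts by phone number and keep numbers that look like rings.

    `posts` is an iterable of dicts with the assumed keys 'phone',
    'names' (names already extracted from the post text) and 'location'.
    """
    names_by_phone = defaultdict(set)
    locations_by_phone = defaultdict(set)
    for post in posts:
        phone = post["phone"]
        # Lowercase names so "Amy" and "amy" count as one person.
        names_by_phone[phone].update(name.lower() for name in post["names"])
        locations_by_phone[phone].add(post["location"])

    # A repeated ad from one individual yields one name and one location,
    # so it fails both checks and is filtered out here.
    return {
        phone: {
            "names": names_by_phone[phone],
            "locations": locations_by_phone[phone],
        }
        for phone in names_by_phone
        if len(names_by_phone[phone]) >= min_names
        and len(locations_by_phone[phone]) >= min_locations
    }
```

Requiring both many names and several locations is what separates a dispatcher-style number from an individual who simply reposts the same ad.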

Identifying crime rings by a single phone number is too limiting: one crime ring could easily employ multiple dispatchers. So we looked for similarity in the text of the posts across multiple phone numbers. We vectorized each post by normalizing its text, removing stop words, and converting the result to a bag of words. Using these word vectors, we then computed the average Jaccard similarity between the sets of posts belonging to different phone numbers. This allowed us to identify when multiple phone numbers actually belong to the same "crime ring". Using this method, we found 22 connections among the top 50 phone-based crime rings. The longest chain consisted of five separate phone numbers. An example of two such rings, along with similar posts from each, is shown below.

Five phone-based crime rings linked together because their postings used similar text.

An example of similar posts with different phone numbers and distinct female names. We identify these as belonging to the same crime ring.

hey gentlemen i'm sexy, sophisticated, passionate provider. sweet, bubbly personality love building exciting new relationships! im perfect life size barbie!!! enjoy sweet friendly attitude, never rush, allowing relax enjoy time together. soft sweet voice, embracing passionate touch. (15/60) (30/100) (60/150) call now!!! 312*600*8208 ask kandi;) 
hello gentlemen i'm sexy, sophisticated, upscale busty red haired bombshell. sweet, bubbly personality love building exciting new relationships! i'm perfect life size busty pin-up cutie !!! enjoy sweet friendly attitude, never rush, allowing relax enjoy time together. soft sweet voice, embracing passionate touch. specials dont forget ask them!!! xoxoxo sexy amy 857**891**0252
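
A rough sketch of how such text-based linking might work is given below. It rests on several assumptions: each post is reduced to a set of distinct words (Jaccard similarity is defined on sets, so word counts are dropped), the stop-word list is a tiny illustrative one, the 0.5 threshold is hypothetical, and chaining linked numbers with union-find is just one possible way to form groups. None of the function names come from the project.

```python
import re
from collections import defaultdict
from itertools import combinations, product

# Tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "the", "i", "i'm", "im", "to", "of", "in", "is", "my"}


def tokenize(text):
    """Lowercase the text and return its distinct non-stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return {w for w in words if w not in STOP_WORDS}


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two word sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def average_jaccard(posts_a, posts_b):
    """Mean Jaccard similarity over every cross pair of posts."""
    pairs = list(product(posts_a, posts_b))
    if not pairs:
        return 0.0
    return sum(jaccard(tokenize(x), tokenize(y)) for x, y in pairs) / len(pairs)


def link_phones(posts_by_phone, threshold=0.5):
    """Merge phone numbers whose posts are similar into groups.

    `threshold` is a hypothetical cutoff, not a value from the write-up.
    """
    phones = sorted(posts_by_phone)
    parent = {p: p for p in phones}

    def find(p):
        # Follow parent links to the group's representative.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in combinations(phones, 2):
        if average_jaccard(posts_by_phone[a], posts_by_phone[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for p in phones:
        groups[find(p)].add(p)
    # Only groups with more than one phone number are links.
    return [g for g in groups.values() if len(g) > 1]
```

Because groups are merged transitively, numbers A and C can end up in one chain through a shared neighbor B even if A and C are not directly similar.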

Focusing on the textual content of the posts within each "crime ring" also allowed us to find crime rings that are more likely to exploit underage women. The posts from some of the top crime rings were much more likely to contain keywords associated with underage prostitution (‘young’, ‘high school’, ‘barely legal’) and sex trafficking (‘new in town’, ‘visiting’). This likelihood could help law enforcement allocate its resources more effectively to combat these crimes.
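
One simple way to turn this into a per-ring score is the share of a ring's posts that mention at least one keyword. The sketch below is an assumption about how such a score could be computed, not the project's method. The keyword lists are taken from the text above, while the function name `keyword_rate` and the plain substring matching are illustrative choices.

```python
# Keyword lists taken from the write-up.
UNDERAGE_TERMS = ("young", "high school", "barely legal")
TRAFFICKING_TERMS = ("new in town", "visiting")


def keyword_rate(posts, terms):
    """Return the fraction of posts containing at least one of `terms`.

    Matching is a case-insensitive substring test, so 'young' also
    matches 'younger'; a real system would likely need stricter matching.
    """
    if not posts:
        return 0.0
    hits = sum(
        1 for text in posts if any(term in text.lower() for term in terms)
    )
    return hits / len(posts)
```

Comparing a ring's rate with the rate across all posts would indicate whether that ring uses these keywords unusually often.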
