I often rely on Facebook carpool groups to get rides home during university breaks. Carpooling is great because it cuts down on transportation emissions, but finding a ride through Facebook wasn't always easy. The ambiguous, text-based format of Facebook groups means the search bar often can't find the ride you need. I spent a lot of time scrolling through my university's carpool group to see if someone had posted the ride I was looking for.

What it does

I built an API that analyzes posts from any public Facebook carpool group. Posts are first filtered for spam, then classified as driver or rider and one-way or round-trip. Finally, the coordinates and times mentioned in each post are extracted.

Example post: "Looking for a ride to va beach tuesday morning. Checking for the off chance that anyone is leaving for Virginia Beach early-ish tuesday morning, but my plans are still tentative. Will pay for gas if I come!"

Driver: no
Routes: [{'start': {'lat': 36.8529263, 'lng': -75.97798499999999}, 'end': {'lat': 38.0335529, 'lng': -78.5079772}}]
Dates: ['11-22-2016']
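The flow above (spam filter, then role and trip classification, then location and date extraction) can be sketched as a single function. The classifier and extractor arguments here are hypothetical stand-ins for the trained models, not the real implementation:

```python
# Minimal sketch of the per-post analysis pipeline described above.
# All callables passed in are assumed, illustrative components.
def analyze_post(text, is_spam, classify_role, classify_trip,
                 extract_routes, extract_dates):
    """Return structured fields for one post, or None if it is spam."""
    if is_spam(text):
        return None
    return {
        "driver": classify_role(text) == "driver",
        "round_trip": classify_trip(text) == "round-trip",
        "routes": extract_routes(text),
        "dates": extract_dates(text),
    }
```

Each stage stays swappable, so a better classifier or extractor can be dropped in without touching the rest of the pipeline.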

How I built it

I used a Facebook scraper to collect posts from carpool groups, then annotated a small set (~300 posts), classifying each one and tagging the locations that appeared. I trained the classifiers with scikit-learn, using tf-idf to construct the feature set, and exported them with pickle. I also trained Stanford NLP's CRF-based named entity recognizer on my tagged data, and it reached around 90% accuracy. I obtained coordinates for the locations with Google's Places API. To extract dates, I used a modified Python wrapper around Stanford NLP's SUTime, a temporal extractor built on regular expressions. I ported these methods into a Django app exposed as a REST API. Lastly, I used the API to build a website that displays the routes on a map and calendar, where hovering over a route shows the original Facebook post.
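The classifier training step can be sketched with scikit-learn: tf-idf features, a linear model on top, and a pickle export. The toy posts and labels below are invented placeholders, not the real annotated data:

```python
# Sketch of training a driver/rider classifier on tf-idf features,
# then serializing it with pickle (toy data for illustration only).
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

posts = [
    "driving to dc friday afternoon, 3 seats open",
    "offering a ride to richmond saturday morning",
    "looking for a ride to nyc this weekend, will pay gas",
    "need a ride home for thanksgiving break",
]
labels = ["driver", "driver", "rider", "rider"]

# tf-idf turns each post into a sparse term-weight vector;
# the logistic regression learns weights over those terms.
clf = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    ("model", LogisticRegression()),
])
clf.fit(posts, labels)

# Serialize the fitted pipeline so the Django app can load it later.
blob = pickle.dumps(clf)
restored = pickle.loads(blob)
prediction = restored.predict(["anyone driving to dc?"])[0]
```

Pickling the whole Pipeline (vectorizer plus model) keeps the vocabulary and weights together, so the API only has to load one artifact.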

Challenges I ran into

It was challenging to get all the natural language processing and machine learning libraries to work together. The Stanford NLP library is written in Java, so I used a Python wrapper around its jar files. I ultimately decided to build a REST API to modularize the retrieval of Facebook posts and the processing of coordinates and times; this way, other teams could reuse the API for Chrome extensions or other projects. Converting colloquial place names into coordinates was also tricky! I used Google's Places API, querying with the colloquial name and the home location of the Facebook group as context.
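The place-name lookup can be sketched by building the query for Google's Places Text Search endpoint with a location bias toward the group's home. The endpoint and parameter names follow the public Places API, but the coordinates, radius, and function name here are illustrative assumptions:

```python
# Sketch of biasing a Places Text Search toward the carpool group's home
# area. Building the request separately from sending it keeps this testable.
def places_text_search(colloquial_name, home_lat, home_lng, api_key):
    """Return (url, params) for a Places Text Search biased toward home."""
    url = "https://maps.googleapis.com/maps/api/place/textsearch/json"
    params = {
        "query": colloquial_name,
        "location": f"{home_lat},{home_lng}",  # bias toward the group's area
        "radius": 50000,                       # meters; example value
        "key": api_key,
    }
    return url, params
```

Sending this request (e.g. with `requests.get(url, params=params)`) returns candidate places; the first result's `geometry.location` field carries the lat/lng used for the route.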

Accomplishments that I'm proud of

I'm really proud of how hard I worked on this project. HackDuke's theme, Code for Good, kept me motivated! I didn't have time to build the Chrome extension as planned, but I'm happy with the results.

What I learned

I learned how to adapt to changes in schedule. I also learned a lot about current natural language processing techniques and how to train classifiers and models.

What's next for Carpool Finder

Next is the Chrome extension!!
