This was my first hackathon event, and I was immediately struck by how skilled people were and how interested they were in tech. I love that hackathon events have recently exploded in popularity, but I think it's important to promote access to these events in super techy (but less advantaged) areas of Europe.
How it works
I built a web crawler (Python, the Scrapy package) to get hackathon locations from the last 1.5 years from the hackerleague website. This didn't actually work out (curse dynamically loaded pages!), so I had to clean and regex some raw text data instead. Using R, I could then visualise where, and how often, hackathons are held across Europe.
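The regex fallback looked roughly like this sketch. The sample text and the pattern are hypothetical (the real hackerleague text was messier), but the idea is the same: pull structured event records out of raw text.

```python
import re

# Hypothetical snippet of raw page text; the real scraped
# text from hackerleague looked different and needed cleaning first.
raw = """
HackZurich - Zurich, Switzerland - 2016-09-02
Junction - Helsinki, Finland - 2016-11-25
"""

# Pull out (name, city, country, date) tuples, one event per line.
pattern = re.compile(r"^(.+?) - (.+?), (.+?) - (\d{4}-\d{2}-\d{2})$", re.MULTILINE)
events = pattern.findall(raw)

for name, city, country, date in events:
    print(name, city, country, date)
```

The resulting tuples can then be exported (e.g. to CSV) and loaded into R for the mapping step.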
Next, I wanted to build up a Europe-wide map of tech interest. I did this by mining tweets that had keywords such as "hacking", "coding", "programming" and "analytics" and visualising them on a map of Europe (the Twitter mining was done both using the ArcGIS/Esri map tool and by building a Twitter bot in Python). This let me see where populations of techy geeks exist, and allowed me to compare that to the distribution of hackathon events.
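The keyword filter itself is simple; a minimal sketch (the tweet texts below are made up, the real ones came from the Twitter API):

```python
# Tech keywords used to flag a tweet as "techy".
KEYWORDS = {"hacking", "coding", "programming", "analytics"}

# Made-up sample tweets standing in for the real API results.
tweets = [
    "Spent all night coding at the hackathon!",
    "Lovely weather in Lisbon today.",
    "New analytics dashboard shipped.",
]

def is_techy(text):
    """Return True if the tweet mentions any tech keyword."""
    words = text.lower().split()
    return any(keyword in words for keyword in KEYWORDS)

techy = [t for t in tweets if is_techy(t)]
```

Each matching tweet's location (where available) then becomes a point on the Europe map.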
Finally, using the ArcGIS tool, I could visualise economic prosperity across European regions and also compare this to the provision of hackathons.
Challenges I ran into
Too many to list. Most notable among these were the difficulty of web crawling dynamically loaded pages and the lack of documentation for fetching information from JSON-format tweets in Python. I had to use workaround solutions for both problems (good old data cleaning and regex in both cases).
Accomplishments that I'm proud of
I built my first ever web crawler, and my first ever Twitter bot! I did not actually think that I would have the time to teach myself these things and implement them within a day. Pleased with myself level: Kanye West.
What I learned
I mostly code in R, and typically only use Python for support tasks like restructuring weird data into nicer formats before uploading it into a relational database. This project took me completely outside my comfort zone, and I've learned a lot.
What's next for Hackathon data meta-analysis
Most importantly, I need to figure out how to properly overlay the information sources so I can make prettier visualisations and analyse them quantitatively. Quantitative information would also allow us to track any effects of holding a hackathon in less catered-to and advantaged areas (example sensationalist headline: "holding a hackathon in a poorer area increases tech-related tweets by 400% for 1 month!"). The Twitter JSON result processing needs some serious love - this would involve using a proper JSON decoder rather than simple text mining. Also, the hackathon event data is probably quite incomplete and should be aggregated from multiple sources rather than just hackerleague.
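The JSON fix would look something like the sketch below, using Python's standard `json` module instead of regexing the raw response. The payload here is a made-up, heavily simplified tweet; real Twitter API responses carry many more fields.

```python
import json

# Hypothetical simplified tweet payload (real API responses differ).
raw = (
    '{"text": "Loving this hackathon! #coding",'
    ' "user": {"location": "Berlin, Germany"},'
    ' "coordinates": null}'
)

# Decode once, then access fields by key - no regex needed.
tweet = json.loads(raw)
text = tweet["text"]
location = tweet["user"]["location"]
```

Nested fields like the user's location fall out for free, which is exactly what the text-mining workaround struggled with.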