I'd really like to be social-media famous, but I don't want to put in the time and effort to get there. What if I could use an algorithm to suggest content? I'd be able to generate tweets that are popular with my followers, who could then spread my words of wisdom and attract new followers.

What it does

Twitter data is read into a MongoDB instance. From there, a series of asynchronous, parallelized data processes form intermediate representations and eventually build the data that can be used to create tweets.

How I built it

I built a Flask web app that serves the API exposed to consumers of the data. From there, I built a job queue service that handles requests for data asynchronously and in parallel, including special handlers for making API requests.
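
As a rough illustration of the job queue idea, here is a minimal in-process sketch using only the Python standard library. It is not the project's actual service: the `JobQueue` class, its method names, and the `count_words` job are all hypothetical, and a real deployment would sit behind the Flask API and persist job state somewhere like MongoDB.

```python
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Dict, Optional


class JobQueue:
    """Hypothetical in-process job queue: submit work, then look it up by job ID."""

    def __init__(self, max_workers: int = 4) -> None:
        # Worker threads run submitted jobs in parallel.
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._jobs: Dict[str, Future] = {}

    def submit(self, func: Callable[..., Any], *args: Any) -> str:
        """Queue a job and return immediately with its ID."""
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = self._pool.submit(func, *args)
        return job_id

    def status(self, job_id: str) -> str:
        """Report 'pending', 'done', or 'failed' for a job."""
        future = self._jobs[job_id]
        if not future.done():
            return "pending"
        return "failed" if future.exception() else "done"

    def result(self, job_id: str, timeout: Optional[float] = None) -> Any:
        """Block until the job finishes and return its result."""
        return self._jobs[job_id].result(timeout=timeout)

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)


def count_words(text: str) -> int:
    """Stand-in for a real data-processing job."""
    return len(text.split())


if __name__ == "__main__":
    queue = JobQueue(max_workers=2)
    job = queue.submit(count_words, "words of wisdom")
    print(queue.result(job, timeout=5), queue.status(job))
    queue.shutdown()
```

An API handler would call `submit` and hand the job ID back to the caller, who can poll `status` instead of waiting on the request.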

Challenges I ran into

Debugging jobs that run asynchronously was the biggest hurdle to meeting my project timeline. I was able to develop a process that made each step repeatable, but in the end I couldn't fully solve more complex operations where I'd need to track two different jobs simultaneously.
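
One way to make a step repeatable is to key its output on the step name and its input, so that rerunning it with the same input is a no-op. The sketch below is an assumption about how that could look, not the project's code: `step_key`, `run_step`, and the plain dict standing in for a MongoDB collection are all hypothetical.

```python
import hashlib
import json
from typing import Any, Callable, Dict


def step_key(step_name: str, payload: Dict[str, Any]) -> str:
    """Build a deterministic key from the step name and its input."""
    blob = json.dumps(payload, sort_keys=True).encode("utf-8")
    return f"{step_name}:{hashlib.sha256(blob).hexdigest()}"


def run_step(
    store: Dict[str, Any],
    step_name: str,
    payload: Dict[str, Any],
    func: Callable[[Dict[str, Any]], Any],
) -> Any:
    """Run func(payload) once per distinct input; later calls reuse the stored result."""
    key = step_key(step_name, payload)
    if key not in store:
        store[key] = func(payload)
    return store[key]


if __name__ == "__main__":
    store: Dict[str, Any] = {}  # stand-in for a database collection
    normalize = lambda p: p["text"].strip().lower()
    print(run_step(store, "normalize", {"text": "  Hello Twitter "}, normalize))
    # Running the same step again hits the store instead of recomputing.
    print(run_step(store, "normalize", {"text": "  Hello Twitter "}, normalize))
```

Because each result lives under a key derived from its input, a failed run can be restarted from the top without duplicating work that already succeeded.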

Accomplishments that I'm proud of

Building a successful data pipeline that has the potential to handle far larger quantities of data. At this point, I'm limited by the rate at which I can get data out of Twitter rather than by compute time or disk reads and writes.

What I learned

Start simple, and make data changes repeatable. There's no need to build more complex options into a system to head off perceived future slowdowns when the code doesn't yet run the important bits.

What's next for Oshen

More complex analytics. I'd like to move into sentiment analysis and deeper analysis of the best time of day for my tweet engagement, as well as a scraping utility that looks for new web content that matches suggested material.
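
A first pass at the time-of-day analysis could be as simple as averaging engagement per posting hour. This is only a sketch of that planned work: the field names (`created_at`, `likes`, `retweets`), the definition of engagement as likes plus retweets, and the sample records are all made up for illustration.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Tuple


def engagement_by_hour(tweets: List[dict]) -> Dict[int, float]:
    """Average engagement (likes + retweets) for each hour of the day."""
    totals: Dict[int, List[int]] = defaultdict(lambda: [0, 0])  # hour -> [sum, count]
    for tweet in tweets:
        hour = datetime.fromisoformat(tweet["created_at"]).hour
        totals[hour][0] += tweet["likes"] + tweet["retweets"]
        totals[hour][1] += 1
    return {hour: total / count for hour, (total, count) in totals.items()}


def best_hour(tweets: List[dict]) -> Tuple[int, float]:
    """Return the hour with the highest average engagement and that average."""
    averages = engagement_by_hour(tweets)
    hour = max(averages, key=averages.get)
    return hour, averages[hour]


if __name__ == "__main__":
    # Made-up sample records, not real Twitter data.
    sample = [
        {"created_at": "2024-01-01T09:15:00", "likes": 4, "retweets": 1},
        {"created_at": "2024-01-02T09:40:00", "likes": 6, "retweets": 3},
        {"created_at": "2024-01-01T18:05:00", "likes": 2, "retweets": 0},
    ]
    print(best_hour(sample))
```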
