We were inspired by the Google Cloud presentation on Machine Learning and wanted to use the knowledge we have acquired so far to explore a new topic.
So far, we have used a Reddit API to collect a large amount of data from posts on r/pics. Though we hope to eventually target all social media, Reddit was the easiest platform for us to get data from in the span of one day. Specifically, we collected each post's picture, title, and time posted, with the intent of using these three pieces of information to predict a post's upvotes and comment count.
However, the majority of our time was spent learning about Google Cloud and figuring out how to run a project using the resources available. We hit many roadblocks in this process but learned a lot about how cloud computing works. Once we got our Google Cloud virtual machine set up with the packages and capabilities we needed, we began formatting our Reddit data. This took much longer than we had anticipated and set our workflow back considerably.
At this point, we have begun prototyping a neural network using TensorFlow, focusing on image analysis. Currently, when given an 80% training and 20% test split of our Reddit post table, our program outputs: "Parameters have been trained! Train Accuracy: 0.8718593 Test Accuracy: 0.92"
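For readers who want to see what an 80/20 split looks like in practice, here is a minimal sketch in plain Python. It is not our actual pipeline: the `split_posts` helper, the fixed seed, and the row layout (image path, title, time posted, upvotes) are assumptions made purely for illustration.

```python
import random

def split_posts(posts, train_fraction=0.8, seed=0):
    """Shuffle a copy of `posts` and split it into (train, test) lists."""
    shuffled = list(posts)                     # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)      # seeded shuffle for a reproducible split
    cut = int(len(shuffled) * train_fraction)  # index where the training portion ends
    return shuffled[:cut], shuffled[cut:]

# Hypothetical rows: (image_path, title, posted_at, upvotes)
posts = [(f"img_{i}.jpg", f"title {i}", "2018-01-01T00:00:00", i) for i in range(10)]
train, test = split_posts(posts)
print(len(train), len(test))
```

Shuffling before the cut matters when the table is ordered by time posted, since otherwise the test set would contain only the newest posts.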
What's next for us is continuing to work on this program and improving the neural network in the following ways: 1) Improving the efficiency of our data processing so we can actually use the hundreds of thousands of posts we have. 2) Researching neural networks further and trying out specific methods to improve our prediction accuracy. 3) Figuring out how to take a post's datetime and title/message into account.
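As one possible starting point for the third item, the sketch below turns a post's datetime into numeric features a network could consume. This is an assumption about how we might approach it, not something we have built: the `encode_post_time` helper and the choice of a sine/cosine (cyclical) encoding are illustrative only.

```python
import math
from datetime import datetime

def encode_post_time(posted_at):
    """Map a datetime to cyclical hour-of-day and day-of-week features."""
    hour = posted_at.hour + posted_at.minute / 60.0
    hour_angle = 2 * math.pi * hour / 24.0               # position on a 24-hour circle
    day_angle = 2 * math.pi * posted_at.weekday() / 7.0  # Monday = 0 on a 7-day circle
    return [
        math.sin(hour_angle), math.cos(hour_angle),
        math.sin(day_angle), math.cos(day_angle),
    ]

# Hypothetical post time: a Monday at 06:00
features = encode_post_time(datetime(2018, 3, 5, 6, 0))
print(features)
```

The cyclical encoding keeps 23:00 and 00:00 close together in feature space, which a raw hour number would not.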
In the future, we hope to expand this program to other social media platforms like Instagram and Twitter.
Built With
- google-cloud
- python
- tensorflow