Introduction

Howdy! This may come as a surprise, but we're a small team of General Engineers and an Electrical Engineer dabbling in a little data science. What's more interesting is that we serve as leaders for Epic Movement, our Asian American Christian ministry on campus. We're in charge of reaching out to new students, and a big way we do that is through our social media accounts on Instagram and Facebook. Since we already know the struggle of always wanting the most effective social media posts, we were eager to take a deep dive into solving that problem with machine learning.

How we built it

To start us off, we learned to preprocess our data. Then, based on model results and inferences, we selected the specific features that we concluded would best represent and most heavily influence the success of a social media post.

Before training, we split our train_data in typical 80/10/10 fashion. We used convolutional layers mixed with max pooling layers and tried out different activation functions. This let us test and improve how well our model picks up on both the small values and the large values in our dataset.
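As a rough illustration of the split described above, here is a minimal sketch in plain Python. It assumes the three parts are training, validation, and test sets, in that order, and the function name `split_80_10_10` and the fixed seed are hypothetical choices for this example, not the exact code from the project.

```python
import random

def split_80_10_10(rows, seed=0):
    """Shuffle rows and split them into train/validation/test (80/10/10)."""
    rows = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(rows)  # fixed seed keeps the split reproducible
    n_train = len(rows) * 8 // 10      # 80% for training
    n_val = len(rows) // 10            # 10% for validation
    train = rows[:n_train]
    val = rows[n_train:n_train + n_val]
    test = rows[n_train + n_val:]      # the remainder, so no row is dropped
    return train, val, test

# Toy usage with 1000 placeholder row ids
train, val, test = split_80_10_10(range(1000))
print(len(train), len(val), len(test))
```

Shuffling before slicing matters when the rows are stored in some order (for example by user or by date), since otherwise the validation and test sets would not look like the training set.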

ensemble

Additionally, we applied the concept of ensembling, which allowed us to combine multiple models for more accurate results. We found that this robust method of training also reduced the bias that comes from individual architectures favoring some features over others. With the massive amounts of tabular data we were handling, we challenged ourselves to learn how to set up Google Cloud Storage buckets and other cloud services that made our quality of life a little easier.
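To make the ensembling idea concrete, here is a minimal sketch of one common variant, averaging the predicted probabilities of several models (often called soft voting). The three `model_*` functions are hypothetical stand-ins for trained models, and the 0.5 threshold is an assumption. The write-up does not say which combination rule was actually used.

```python
def ensemble_predict(models, features, threshold=0.5):
    """Average each model's approval probability, then threshold the mean."""
    probs = [model(features) for model in models]
    mean_prob = sum(probs) / len(probs)
    return mean_prob, mean_prob >= threshold

# Hypothetical stand-ins for trained models: each maps a feature dict
# to a probability that the post gets approved.
def model_a(f):
    return 0.9 if f["has_logo"] else 0.3

def model_b(f):
    return 0.6

def model_c(f):
    return 0.75 if f["font"] == "Roboto" else 0.45

post = {"has_logo": True, "font": "Roboto"}
mean_prob, approved = ensemble_predict([model_a, model_b, model_c], post)
print(round(mean_prob, 2), approved)
```

Because each model only contributes a share of the final score, a single model that leans too hard on one feature gets outvoted by the others.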

importance of features

Using Principal Component Analysis, we found that the top 3 features for our model were user_id, has_logo, and font. This makes sense, since a user's reach is built on reputation and a consistent base of returning followers. Logos are iconic, memorable, and can be the pulling factor for people to interact with content. Fonts are eye-catching, to say the least. The difference between Comic Sans and something like Roboto or a sleek serif font like Anko is night and day when it comes to appealing to audiences.

Interestingly enough, we saw that parameters for the posts themselves, such as captions, chapters, titles, tones, and photo search terms, were the least impactful on our model. This can most likely be attributed to how social media is a fast-paced environment where users may stay on a post for only a couple of seconds. Because of this, text-like objects are easily overlooked and thus make little difference in the prediction process.
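For readers curious how a feature ranking can come out of Principal Component Analysis, here is a minimal sketch in plain Python. It standardizes the columns, finds the first principal component with power iteration, and ranks features by the absolute size of their loadings. This is one heuristic among several and is an assumption about the approach, not the exact procedure from the project. The data and the names `f0`, `f1`, `f2` are toy placeholders.

```python
import math

def first_component(rows, iters=200):
    """First principal component of rows (a list of equal-length numeric lists)."""
    n, d = len(rows), len(rows[0])
    # Standardize each column so features on large scales do not dominate
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
            for j in range(d)]
    z = [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]
    # Covariance of standardized data (that is, the correlation matrix)
    cov = [[sum(zr[a] * zr[b] for zr in z) / n for b in range(d)]
           for a in range(d)]
    # Power iteration: repeatedly multiply by cov and renormalize
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Toy data: f0 and f1 move together, f2 is mostly unrelated
names = ["f0", "f1", "f2"]
rows = [[i, 2 * i + (i % 3), (i * 7) % 5] for i in range(20)]
loadings = first_component(rows)
ranked = sorted(zip(names, loadings), key=lambda p: abs(p[1]), reverse=True)
for name, loading in ranked:
    print(name, round(loading, 3))
```

A large loading only says that a feature varies strongly along the main direction of the data, so it is worth cross-checking a ranking like this against model results before trusting it.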

Accomplishments that we're proud of

confusion matrix

We're really proud that our model finished with a result. Although it isn't 100% accurate, this was our first time dealing with real machine learning and not just the data manipulation parts of the job, and it was truly eye-opening to see what we could accomplish in less than 24 hours.

What we learned

Machine learning is HARD. Like really HARD. Electrical Engineers are tasked with working with breadboards, incoming and outgoing signals, and the lowest-level programming there is. For us General Engineers, learning what data science is as a field of Computer Science, and how data can solve problems, has been a great start to our college experience.

What's next for Team 7 | Marky Social Media Post Approval Prediction

Currently, our model has yet to fully leverage image data. To get there, we plan to test a few different things:

  • Extracting information from the images as pixels and performing image classification techniques.
  • Utilizing open source AI APIs or building our own model to generate captions describing the image, and training with Natural Language Processing.
  • Designing a Multi-Modal Classifier Model that is capable of processing our varying types of sensory inputs.
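As a very rough sketch of the multi-modal idea in the last bullet, the snippet below fuses per-modality feature vectors by concatenation and scores the result with a logistic function. Everything here is hypothetical: the function names, the vector sizes, and the weights, which are untrained placeholders. A real version would learn an encoder for each modality instead.

```python
import math

def fuse(image_vec, text_vec, tabular_vec):
    """Fusion by concatenation: one flat vector from all modalities."""
    return list(image_vec) + list(text_vec) + list(tabular_vec)

def predict_approval(fused, weights, bias=0.0):
    """Logistic score over the fused vector (weights are placeholders)."""
    if len(fused) != len(weights):
        raise ValueError("weights must match the fused vector length")
    z = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Placeholder features: 2 image values, 1 caption value, 2 tabular values
fused = fuse([0.2, 0.5], [0.1], [1.0, 0.0])
untrained = [0.0] * len(fused)
print(predict_approval(fused, untrained))
```

With all-zero weights the score sits at exactly 0.5, which is a handy sanity check before any training happens.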

