Hashtags are powerful. When used correctly on a platform like Instagram, they can be tools for companies and creators alike to reach an engaged audience. That said, we are all too familiar with misused hashtags. What often goes unnoticed, however, is the artist or educator who could have put their content in front of the right people, post after post, if only they had been able to use this confusing and often unpredictable tool to its fullest potential.

What it does

Using Socialsense, you’ll be able to find the perfect image and hashtags to reach the right people. First, input your Instagram handle, select the images you are considering posting, and submit. Socialsense will predict which images will attract the most attention. You can then choose an image to post, and Socialsense will find the best hashtags to use and estimate how much engagement the post will receive with those hashtags.

How we built it

We first built a Python module that scrapes data from Instagram using the Selenium library. We used it to build a dataset of 12,000 images and the hashtags they used, which we then used to train our image similarity model. The similarity model takes two images as input and predicts how similar they are. The training labels were generated from how many hashtags the two posts have in common.

As we built the dataset and trained this model, we started writing the algorithm to rank hashtags. The algorithm uses the user’s previously used hashtags as a starting point, then scrapes hashtags related to those in order to build a large set of candidates. For each hashtag, the algorithm considers its top posts, calculating the difference between the engagement rate of each post and the average engagement rate of the account that posted it. It then takes a weighted average of these differences, with the weights being the similarity of each post to the image the user wants to post, and assigns this weighted average as the score of the hashtag. After scoring each hashtag, the algorithm takes the user’s average engagement rate and adds the mean of the hashtag scores. This predicted engagement rate is multiplied by the user’s follower count to predict how much engagement the post will receive.
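The scoring and prediction steps described above can be sketched roughly as follows. This is a minimal illustration, not our actual code: the `TopPost` class, the function names, and the choice to normalize by the sum of the similarity weights are assumptions made for the example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TopPost:
    """One top post for a hashtag (hypothetical structure for illustration)."""
    engagement_rate: float         # engagement rate of this post
    account_avg_engagement: float  # average engagement rate of the posting account
    similarity: float              # similarity to the user's image, assumed in [0, 1]


def score_hashtag(posts):
    """Similarity-weighted average of (post rate - account average rate)."""
    total_weight = sum(p.similarity for p in posts)
    if total_weight == 0:
        return 0.0  # no similar posts, so the hashtag gets a neutral score
    weighted_sum = sum(
        p.similarity * (p.engagement_rate - p.account_avg_engagement)
        for p in posts
    )
    return weighted_sum / total_weight


def predict_engagement(user_avg_rate, follower_count, hashtag_scores):
    """User's average rate plus the mean hashtag score, scaled by followers."""
    boost = mean(hashtag_scores) if hashtag_scores else 0.0
    return (user_avg_rate + boost) * follower_count
```

A hashtag whose top posts outperform their accounts' averages, and which resemble the user's image, ends up with a positive score and raises the predicted engagement.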

After building the similarity model and the hashtag ranking algorithm, we then worked on the image popularity algorithm. We ended up using a pre-trained model developed by researchers from the City University of Hong Kong that predicts the popularity of an image on Instagram.

After completing the AI, we needed to integrate it with the front-end. We built a Flask server to receive data in HTTP requests from the client, and to send responses with the AI’s predictions.

While half the team was creating the backend, the other half simultaneously worked on the front-end. We started off with an empty React application and gradually added the features of file reading, sorting, deleting, and displaying. Working hand in hand with the server, the React app served mostly to present the AI’s findings in an intuitive user interface.

Challenges we ran into

When creating the similarity model, we initially did not account for the overlap of hashtags between posts. We created a dictionary with unique hashtags as keys and the images that used each hashtag as values. Similar examples were created by selecting a hashtag and picking two images that used it, while dissimilar examples were created by picking one image from one hashtag and a second image from another random hashtag.

Training on this dataset did not return ideal results, and we soon realized there was so much overlap that samples that should have been dissimilar had a high chance of actually being similar. To fix this, we pivoted to a list of dictionaries, one per image, each storing the image along with the hashtags it used. Labels were then generated depending on how many hashtags the two posts had in common. With this structure, two random posts could be picked with a high chance of actually being dissimilar, and two posts close together in the list could be picked with a high chance of being similar, due to the way we originally gathered the post data.

The examples needed a little balancing to prevent overfitting, but overall this approach ended up working nicely. With some more tweaking (and training), the similarity model could become quite accurate, in turn increasing the value of the hashtag suggestions and the engagement prediction.
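The revised pair generation can be sketched as below. This is only an illustration under stated assumptions: the Jaccard overlap used as the label, the `window` size for "nearby" posts, the even split between nearby and random pairs, and the `image`/`hashtags` keys are all hypothetical choices, not necessarily what we used.

```python
import random


def overlap_label(tags_a, tags_b):
    """Label in [0, 1] from shared hashtags (Jaccard overlap, an assumed choice)."""
    a, b = set(tags_a), set(tags_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def sample_pairs(posts, n_pairs, window=3, seed=0):
    """Build (image_a, image_b, label) examples from a list of post dicts.

    Each post is assumed to look like {"image": ..., "hashtags": [...]}.
    Even-numbered pairs draw a neighbour from the list (likely similar);
    odd-numbered pairs draw any other post (likely dissimilar).
    Requires at least two posts.
    """
    rng = random.Random(seed)
    pairs = []
    for k in range(n_pairs):
        i = rng.randrange(len(posts))
        if k % 2 == 0:
            lo, hi = max(0, i - window), min(len(posts) - 1, i + window)
            candidates = [x for x in range(lo, hi + 1) if x != i]
        else:
            candidates = [x for x in range(len(posts)) if x != i]
        j = rng.choice(candidates)
        label = overlap_label(posts[i]["hashtags"], posts[j]["hashtags"])
        pairs.append((posts[i]["image"], posts[j]["image"], label))
    return pairs
```

Because the label always comes from the actual hashtag overlap, a randomly drawn pair that happens to share hashtags is no longer mislabeled as dissimilar, which was the flaw in the first approach.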

Handling the images, filenames, src attributes, hashtags, and scores on the front-end was another hurdle we faced. We opted for global state management with Redux, which stores most information centrally across many views.

Accomplishments that we're proud of

We are proud of developing a fully integrated web application with a diverse technology stack. We were able to effectively use powerful machine learning in PyTorch alongside a Flask web server, while scraping data with Selenium. All of this processing on the back-end works in harmony with a beautiful, responsive React web page.

What we learned

All of our members walked away with a much better understanding of AI, machine learning, and data engineering. We gained a lot of experience in full-stack development and learned how to work effectively as a software development team.

What's next for Socialsense

We are not 100% sure what we want to do with Socialsense moving forward. Our team is considering launching the app publicly. However, we would need to make a lot of changes and improvements to the codebase in order to effectively scale the app to support a large user base.

