Let’s face it: we’re not all great listeners. With a world of information to keep track of, we can’t expect ourselves to remember every birthday, address, or favorite food. Yet showing loved ones that you remember small details about them makes them feel heard. In pursuit of a life hack, we built our web app, messages wrapped, to help!

What it does

Employing machine learning and natural language processing, messages wrapped extracts, analyzes, and stores information from your direct messages. We created it in the hope that we would never again have to send a dreaded belated birthday text or ask about a close friend’s dietary restrictions.

While our original idea stopped there, our final concept evolved into a full-fledged statistical summary of a user’s direct messages. Inspired by Spotify’s annual “Spotify Wrapped”, our team included features such as contacts ranked by message frequency, group chat activity, average response time, word boards, and more, to not only capture but also celebrate each of a user’s relationships.

How we built it

First, we needed to retrieve a user’s conversation data.

We discovered that we could access iMessage’s archive file, stored in a Mac user’s Library folder, and rewrite the data into a readable CSV file. This CSV file stored all of the user’s text messages in a table that specifies the date sent, the timestamp, the phone number, and whether the message was sent or received by the user.
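As a rough illustration of working with such a table, the sketch below reads a CSV and tallies sent and received messages per phone number. The column names (`date`, `timestamp`, `phone_number`, `is_from_me`) and the sample rows are assumptions made for this example, not the actual layout of our export.

```python
import csv
import io

# Hypothetical sample data; the real export's column names may differ.
SAMPLE_CSV = """date,timestamp,phone_number,is_from_me
2021-02-13,09:15:02,+15551230001,1
2021-02-13,09:16:40,+15551230001,0
2021-02-13,10:02:11,+15551230002,0
2021-02-14,18:30:00,+15551230001,1
"""

def count_messages_by_contact(csv_file):
    """Return {phone_number: {"sent": n, "received": m}} from a CSV file object."""
    counts = {}
    for row in csv.DictReader(csv_file):
        entry = counts.setdefault(row["phone_number"], {"sent": 0, "received": 0})
        # Assumption: "1" marks a message sent by the user, anything else is received.
        direction = "sent" if row["is_from_me"] == "1" else "received"
        entry[direction] += 1
    return counts

counts = count_messages_by_contact(io.StringIO(SAMPLE_CSV))
```

With a real file, `io.StringIO(SAMPLE_CSV)` would be replaced by `open(path, newline="")`.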

Similarly, using the Facebook Messenger API, we parsed messaging data from Instagram DMs. Instagram also provides a service to download all of one’s messages, albeit in a format quite different from that of iMessage.

After resolving the two platforms’ data into a unified format, we parsed each sent message and populated a dictionary that kept track of all important information. Then, using this dictionary, we applied Python’s NLTK library to perform several natural language processing tasks, such as extracting meaningful words to use in a word cloud, determining the most common trigrams (sequences of three consecutive words), and retrieving essential information such as birthdays and allergies.
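To make the trigram step concrete, here is a minimal sketch of counting the most common trigrams across a list of messages. It uses only the standard library as a stand-in for NLTK, and the tokenizer (lowercase letters and apostrophes) and sample messages are simplifying assumptions rather than what our app actually does.

```python
import re
from collections import Counter

def top_trigrams(messages, n=3):
    """Return the n most common word trigrams as ((w1, w2, w3), count) pairs."""
    counts = Counter()
    for text in messages:
        # Simplified tokenizer: lowercase runs of letters and apostrophes.
        tokens = re.findall(r"[a-z']+", text.lower())
        # Slide a window of three words; trigrams never span two messages.
        counts.update(zip(tokens, tokens[1:], tokens[2:]))
    return counts.most_common(n)

messages = [
    "see you at the gym",
    "running late, see you at six",
    "ok see you at the usual place",
]
top = top_trigrams(messages, n=1)
```

Counting per message rather than over one concatenated string keeps phrases from being stitched together across unrelated texts.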

We then displayed this information in a dashboard built with Plotly Dash. This Python framework allowed us to build an interactive web app, complete with HTML and Bootstrap styling as well as dynamic callback functions.

Challenges we ran into

We began by accessing the data archive files that were hidden from the user, which proved to be a difficult undertaking. Little did we know that this would be the easiest part. The iMessage data in the CSV file was formatted in an obscure manner, which made it hard to analyze the data and extract meaningful insights. In order to make comparisons between different interactions, we also had to parse the user’s Apple Contacts book to match phone numbers with names.
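The name-matching step can be sketched as follows: normalize every phone number to one canonical form, then look names up by that form. The helper names, the sample contacts, and the assumption of US-style ten-digit numbers are all hypothetical choices for this example.

```python
import re

def normalize_phone(raw, default_country_code="1"):
    """Reduce a phone number to '+<digits>' so differently formatted copies match."""
    digits = re.sub(r"\D", "", raw)  # drop spaces, dashes, parentheses, '+'
    if len(digits) == 10:
        # Assumption: a bare ten-digit number is missing its country code.
        digits = default_country_code + digits
    return "+" + digits

def build_contact_lookup(contacts):
    """Map normalized phone numbers to contact names."""
    return {normalize_phone(number): name for name, number in contacts}

# Hypothetical contacts, formatted inconsistently on purpose.
contacts = [("Alex", "(555) 123-0001"), ("Sam", "+1 555-123-0002")]
lookup = build_contact_lookup(contacts)
name = lookup.get(normalize_phone("+15551230001"), "Unknown")
```

Numbers that are absent from the contacts book fall back to "Unknown" instead of raising an error.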

Working with Instagram was simpler, as its data was presented in a more readable manner. However, the data we parsed was rather dense and included a lot of unnecessary information. For example, we wanted to avoid analyzing filler messages like ‘Umm’ and ‘haha’ and system messages such as ‘video call ended.’ Here, we preprocessed the text to ignore certain words and highlight more meaningful words and phrases. Furthermore, integrating these components via Dash proved to be more difficult than expected, due to conflicts that sometimes arose between Dash’s HTML objects and standard HTML as well as the tricky nature of multiple callback functions.
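The filtering described above might look roughly like the sketch below, which drops system notices and messages made up only of filler words. The word lists are illustrative assumptions, not the lists we actually used.

```python
import re

# Hypothetical lists; a real filter would be tuned to the data.
FILLER_WORDS = {"umm", "haha", "lol", "ok"}
SYSTEM_MESSAGES = {"video call ended"}

def is_meaningful(text):
    """Return True if a message has at least one word that is not filler."""
    cleaned = text.strip().lower().rstrip(".!")
    if cleaned in SYSTEM_MESSAGES:
        return False
    tokens = re.findall(r"[a-z']+", cleaned)
    # Messages with no words at all (empty, digits only) count as not meaningful.
    return any(token not in FILLER_WORDS for token in tokens)

raw_messages = ["Umm", "Video call ended.", "haha my birthday is in May", "haha!!"]
kept = [m for m in raw_messages if is_meaningful(m)]
```

A message that mixes filler with real content is kept whole, so details such as a birthday are not lost.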

What we learned

“The goal is to turn data into information and information into insight” - Carly Fiorina (former CEO of HP)

Each of us sends and receives hundreds of messages throughout the day, from different people, on different platforms, regarding different things. Sifting through these mountains of data to extract various insights showed us why data is so highly valued. Being able to quantify aspects of texting, such as response rate, length of responses, and the ratio of responses, allowed us to paint a clearer picture of our interactions via text. By employing natural language processing techniques, we were able to further analyze and extract meaning. Most commonly used phrases, most used non-common words, important information such as birthdays and favorites – these pieces of information could be gleaned from any conversation. Using NLP, we can gather insight into important topics such as subjects of bonding and points of contention.
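As one example of quantifying a conversation, the sketch below computes the user’s average response time. The definition of a response (the first outgoing message after one or more incoming messages) and the sample timestamps are assumptions chosen for illustration.

```python
from datetime import datetime

def average_response_seconds(messages):
    """Average delay, in seconds, before the user replies.

    messages: (sent_at, is_from_me) tuples sorted by time.
    Returns None when the user never replied to anything.
    """
    delays = []
    pending = None  # time of the earliest incoming message still unanswered
    for sent_at, is_from_me in messages:
        if not is_from_me:
            if pending is None:
                pending = sent_at
        elif pending is not None:
            delays.append((sent_at - pending).total_seconds())
            pending = None
    return sum(delays) / len(delays) if delays else None

# Hypothetical conversation with two replies (300 s and 60 s).
conversation = [
    (datetime(2021, 2, 13, 9, 0), False),
    (datetime(2021, 2, 13, 9, 1), False),
    (datetime(2021, 2, 13, 9, 5), True),
    (datetime(2021, 2, 13, 10, 0), False),
    (datetime(2021, 2, 13, 10, 1), True),
    (datetime(2021, 2, 13, 10, 2), True),
]
average = average_response_seconds(conversation)
```

Follow-up messages sent without a new incoming message are ignored, so double texting does not skew the average.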

What seemed to be just an endless list of random, incoherent, badly misspelled strings proved to be a valuable source of information in the hands of these text-processing techniques.

Beyond the insight-driven approach, we also learned the importance of commemoration. Be it reading the very first text that sparked a friendship or seeing a word cloud of all the memorable moments and made-up phrases, being able to see a colorful summary of our relationships with others felt wholesome. It felt as if we were taking a step back to celebrate each journey.

Additionally, in terms of technology, we learned how to utilize Dash and Plotly to make interactive dashboards within Python. Furthermore, we better understood the NLTK library and learned how to accurately parse text and create several tools to extract insight.

What’s next for messages wrapped

In the future, we would like to expand messages wrapped to other messaging platforms including SMS, WhatsApp, Telegram, Discord, and WeChat. These integrations would also be seamless, not requiring any downloaded or intermediate files from the user’s side.

Additionally, we would continue to add interactive features, such as Spotify Wrapped’s two truths and a lie game, enabling users to better engage with and understand their messaging statistics. Likewise, an achievements guide (similar to Minecraft’s) covering conversation topics and playful metrics such as “who’s more of a simp?” would help users laugh and bond over their relationships.

On the NLP analysis front, we would also strive to use BERT to implement a search feature that accepts a generic word or phrase and returns the message sender’s name and number, along with the date and time when the relevant messages were sent or received. Furthermore, implementing BERT’s question-answering capability would allow for a smart search engine custom-tailored to each conversation.
