Inspiration
Coming in with an interest in machine learning applications in data science, our idea generation phase led us toward data-centric hacks. Combining this with our desire to work on something of meaningful value to people, we thought about pressing issues in society, which led us to think about a person’s habits and indicators. With the wealth of data available on the internet, we settled on working with downloaded Facebook data. From messaging to the group function, which allows people to interact with like-minded individuals, it is hard to dispute that much can be learned about a person from their interactions online, just as from interacting with someone face-to-face. Facebook and many other companies have been taking advantage of this wealth of data to target users with advertisements and recommendations. We noticed that users themselves have little practical ability to access and use the data collected on them. Even though Facebook provides a way to download it, the process is time-consuming, and for the average consumer the JSON or HTML output files can be confusing and frustrating to read manually. Additionally, there are valuable takeaways in this data that are difficult to interpret without advanced analytics followed by data visualization designed to surface them for the user. This is why we set out to build DataBird, an interactive, secure web app that takes in the user’s data and provides them with comprehensive and valuable outputs.
What it does
DataBird gives Facebook users access to analytical tools built on cutting-edge machine learning and artificial intelligence technology and applies them to data about themselves. With a focus on putting users’ data to their own use (rather than external parties’), DataBird gives users insights into their sleep trends, daily moods, frequently discussed topics, and best friends. These insights have applications in mental health and habit tracking.
Seeing your sleeping patterns and daily moods over time allows you to explore trends such as periods of high stress, Seasonal Affective Disorder, and more. Moreover, we used machine learning to classify message conversations by predicted topic, making it possible to see whether concerning topics (e.g. binge-drinking, depression, anxiety) are commonplace for the user.
How we built it
Our data was collected in JSON format using Facebook’s data download feature, which is available to its users. We used Python’s Pandas library to read the data folders and parse the JSON files into tabular formats suitable for wrangling and analysis.
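A minimal sketch of the parsing step, written with only the standard library so it runs anywhere. It assumes a single Messenger conversation file shaped like `{"title": ..., "messages": [{"sender_name": ..., "timestamp_ms": ..., "content": ...}, ...]}`; that layout and the function name `load_messages` are assumptions for illustration, not a guaranteed description of every export.

```python
import json
from datetime import datetime, timezone


def load_messages(raw_json: str) -> list[dict]:
    """Flatten one Messenger conversation export into a list of row dicts.

    Assumed (hypothetical) layout:
    {"title": ..., "messages": [{"sender_name": ..., "timestamp_ms": ..., "content": ...}, ...]}
    """
    conversation = json.loads(raw_json)
    title = conversation.get("title", "")
    rows = []
    for msg in conversation.get("messages", []):
        # Skip entries with no text (e.g. photos, stickers, call logs)
        if "content" not in msg:
            continue
        rows.append({
            "conversation": title,
            "sender": msg.get("sender_name", ""),
            # timestamp_ms is milliseconds since the Unix epoch
            "sent_at": datetime.fromtimestamp(msg["timestamp_ms"] / 1000, tz=timezone.utc),
            "text": msg["content"],
        })
    # Sort oldest first so downstream time-series code can rely on the ordering
    rows.sort(key=lambda row: row["sent_at"])
    return rows
```

The resulting list of dicts can be passed straight to `pandas.DataFrame(rows)` for the kind of wrangling described above.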
To analyze sleep patterns, we relied on a user’s message history. We created an algorithm that examined messages around a set nighttime flag in order to find the last message sent (before bed) and the first session initiated (after waking up). We then performed a test to ensure that this was indeed an accurate prediction of the user going to sleep and not simply a period of inactivity on social media.
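The sleep heuristic could look something like the following sketch. The 4 AM flag, the 12-hour search windows on either side of it, and the 3 to 14 hour plausibility bounds are all hypothetical values chosen for illustration; the bounds stand in for the "sleep, not just idleness" check and are not the exact test the team ran.

```python
from datetime import datetime, timedelta

# Hypothetical parameters, chosen for illustration only
CUTOFF_HOUR = 4                   # nightly "flag": 4 AM local time
WINDOW = timedelta(hours=12)      # how far to look on each side of the flag
MIN_SLEEP = timedelta(hours=3)    # shorter gaps are treated as a lull, not sleep
MAX_SLEEP = timedelta(hours=14)   # longer gaps are treated as idleness, not sleep


def estimate_sleep(timestamps: list[datetime]) -> list[tuple[datetime, datetime]]:
    """Return (went_to_bed, woke_up) estimates from naive local-time message timestamps."""
    ts = sorted(timestamps)
    if not ts:
        return []
    nights = []
    day, last_day = ts[0].date(), ts[-1].date()
    while day <= last_day:
        flag = datetime(day.year, day.month, day.day, CUTOFF_HOUR)
        before = [t for t in ts if flag - WINDOW <= t < flag]
        after = [t for t in ts if flag <= t < flag + WINDOW]
        if before and after:
            # Last message before the flag and first message after it
            bed, wake = before[-1], after[0]
            # Keep only gaps whose length is plausible for a night of sleep
            if MIN_SLEEP <= wake - bed <= MAX_SLEEP:
                nights.append((bed, wake))
        day += timedelta(days=1)
    return nights
```

Scanning the whole list for every day keeps the sketch short; with years of messages, a single pass using `bisect` on the sorted timestamps would scale better.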
Our sentiment analysis relied on natural language processing (NLP) from machine learning frameworks. We initially used IBM Watson’s Tone Analyzer API, parsing JSON-format Messenger conversations in order to aggregate the emotions experienced throughout the day as predicted by AI. However, we switched to Google’s Natural Language API on the advice of a friendly mentor, and used it to construct a mood analysis that showcases trends throughout the year. Additionally, we used this framework to classify the content of conversations by topic, allowing us to explore which topics a user discusses most.
Our data visualization was performed using Python libraries such as Matplotlib, Pandas, and WordCloud, as well as the popular data visualization tool Tableau. The generated images were then embedded in an HTML web app, as a proof of concept showcasing the intended UI design.
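One way to turn per-message sentiment into year-long mood trends is to bucket scores by month. In this sketch, `score_with_google` shows how a document-level sentiment score (ranging from -1.0 to 1.0) can be requested with the `google-cloud-language` client, and `monthly_mood` averages whatever scores it is given. The monthly bucketing and both function names are illustrative assumptions, not the team's exact pipeline.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def score_with_google(text: str) -> float:
    """Document-level sentiment score from Google's Natural Language API.

    Requires the google-cloud-language package and configured credentials.
    A real pipeline would create the client once and reuse it.
    """
    from google.cloud import language_v1  # imported lazily so the rest runs without it

    client = language_v1.LanguageServiceClient()
    document = language_v1.Document(content=text, type_=language_v1.Document.Type.PLAIN_TEXT)
    response = client.analyze_sentiment(request={"document": document})
    return response.document_sentiment.score


def monthly_mood(scored: list[tuple[datetime, float]]) -> dict[str, float]:
    """Average sentiment scores per calendar month, keyed "YYYY-MM" in chronological order."""
    buckets: dict[str, list[float]] = defaultdict(list)
    for sent_at, score in scored:
        buckets[sent_at.strftime("%Y-%m")].append(score)
    return {month: mean(scores) for month, scores in sorted(buckets.items())}
```

Keeping the scoring call separate from the aggregation also makes it easy to swap providers, as happened when the team moved from Watson to Google.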
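For the word clouds, the WordCloud library can render an image from a word-to-count mapping via `WordCloud.generate_from_frequencies`. A minimal sketch of producing that mapping from message text follows; the tiny stopword list, the three-letter minimum, and the `top_n` cutoff are hypothetical choices.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real one would be much longer
STOPWORDS = {"the", "and", "you", "for", "was", "are", "but", "that", "this", "with"}


def word_frequencies(texts: list[str], top_n: int = 50) -> dict[str, int]:
    """Count non-stopword words of three or more letters across all messages."""
    counts = Counter()
    for text in texts:
        # Lowercase and keep runs of letters/apostrophes as words
        for word in re.findall(r"[a-z']+", text.lower()):
            if len(word) >= 3 and word not in STOPWORDS:
                counts[word] += 1
    return dict(counts.most_common(top_n))
```

The result can then be rendered with something like `WordCloud().generate_from_frequencies(freqs).to_file("cloud.png")`.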
Challenges we ran into
A significant challenge for our team was the front end, as we had very little combined experience with web development. After attending an introductory workshop on React.js, we decided it would be best to use it for our web app. However, the learning curve was too steep for us to create an interactive web demonstration while also building the interface between the front end and the back end. This led us to pivot to a visualization dashboard constructed from our back end’s output in order to showcase our analytics. Another challenge was learning how to parse JSON files, something we had very little experience with. Facebook exports message conversations in formats with many inconsistencies (e.g. group chats are nested differently from individual conversations, and friend requests are sometimes nested in the middle of conversations). Overall, it was a good learning experience in data wrangling, which is always to be expected with large-scale data analytics.
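Because message lists can sit at different depths in different files, one defensive approach is to walk the parsed JSON recursively and keep any object that looks like a message, ignoring everything else. The sketch below treats a dict as a message if it has both `sender_name` and `timestamp_ms` keys; that rule, and the nested `threads` key in the test data, are assumptions for illustration, not something Facebook guarantees.

```python
from typing import Any, Optional


def collect_messages(node: Any, found: Optional[list] = None) -> list:
    """Recursively gather every dict that looks like a message, wherever it is nested."""
    if found is None:
        found = []
    if isinstance(node, dict):
        if "sender_name" in node and "timestamp_ms" in node:
            # Treat this dict as a message and do not descend into it further
            found.append(node)
        else:
            # Otherwise keep searching through its values
            for value in node.values():
                collect_messages(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_messages(item, found)
    return found
```

Anything that does not match the rule (such as a stray friend-request record) is simply skipped rather than breaking the parse.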
Finally, we had set up a framework to parse the JSON data of a user’s comment history so it could be analyzed by IBM Watson’s Natural Language Understanding API for further insights. However, having a non-premium account meant that we could not submit many requests to the cloud server for analysis. More importantly (and more unfortunately), we believe the Watson server was down for a period of time, which led us to pivot to Google’s Natural Language API.
Accomplishments that we're proud of
We're immensely proud of how we came together as a team of hackathon newcomers, who had also never worked together before, to successfully build something we're all so passionate about! We all shared similar enthusiasm when we came up with our idea, and the positive energy and excitement in the room was vital to our success.
Together, we're proud of building a platform that accurately performed the tasks assigned to it. We know this because we tested it on each of our own Facebook datasets.
We had a set list of ambitious goals, such as successfully integrating Google Cloud's AI framework for our analysis. We also faced challenges along the way, but we managed to come up with solutions to them while adapting to our situation in order to keep ourselves on track.
What we learned
This was the first hackathon for three out of four team members! This meant that we had to quickly adapt to the fast-paced nature of a hackathon and its crash-course-style workshops, which give high-level approaches to complex topics. We learned how to solve complex problems on the fly without getting fixated on tiny details (like making sure your code is commented super well!).
We also learned a lot about idea generation and brainstorming, which we found quite fun! It was exciting to see the room fill with energy as like-minded people came together with a desire to solve problems they’re passionate about. Technically speaking, we learned a great deal! The workshops were all super informative, and the mentors were willing to sit down with you and help out! We learned more about using IBM Watson’s API, Google’s Natural Language AI, Python’s libraries (huge shoutout to Pandas), and data visualization. On the front end, we learned about using React and having it interact with Flask, as well as CSS and HTML.
Using the machine learning and AI applications was super interesting, and it was fascinating seeing the output of our platform as it was applied to data about ourselves!
What's next for DataBird
The next step for this project would be to launch an interactive user interface. Our goal is to have a drag-and-drop space for users to insert their Facebook data. Then, the data would be communicated to the back-end to be analyzed by the frameworks we built this weekend, and given back to the user!
In terms of data analytics, there is plenty of room for continued exploration. Our next steps include cross-referencing our sleep-pattern estimates with session activity based on cookies and login-session information; this would give us more precise estimates of when a user falls asleep and wakes up. Furthermore, we want to apply NLP sentiment analysis to individual conversations with a user’s best friends (the friends with the most messages and shared groups), in order to show how these interactions make the user feel!
Of course, there’s always room for more analytics! We want to build machine learning models to find correlations between sleep patterns and mood on the following day, in order to provide insights into how sleep affects our users. We also want to explore how long people spend on Facebook and their associated mood, in order to contribute to the much-debated question of whether excessive social media use leads to poor mental health.
Advanced analytics such as those used by DataBird have the power to provide first-of-their-kind insights into mood and interaction with social media. The data could also be used to recommend mental health or other resources that the analysis indicates would be most beneficial to the person.
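Ranking best friends by messages and shared groups could be as simple as a weighted score. In this hypothetical sketch, each message from a friend counts as one point and each group shared with the user counts as `group_weight` points (50 by default, an arbitrary value). The input shapes, a flat list of sender names and a mapping from group name to member set, are likewise assumptions.

```python
from collections import Counter


def rank_best_friends(senders: list[str], groups: dict[str, set[str]], me: str,
                      group_weight: int = 50, top_n: int = 5) -> list[str]:
    """Rank friends by (messages they sent) + group_weight * (groups shared with the user)."""
    # One point per message sent by someone other than the user
    message_counts = Counter(name for name in senders if name != me)
    # Count the groups that contain both the user and each friend
    shared_groups = Counter()
    for members in groups.values():
        if me in members:
            for friend in members - {me}:
                shared_groups[friend] += 1
    candidates = set(message_counts) | set(shared_groups)
    scores = {f: message_counts[f] + group_weight * shared_groups[f] for f in candidates}
    # Highest score first; ties broken alphabetically for a stable result
    return sorted(scores, key=lambda f: (-scores[f], f))[:top_n]
```

The weight is the main design knob: a high value favors friends who share many communities, while a low one favors pure messaging volume.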
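A first pass at the sleep-and-mood question does not need a full model: pairing each night's sleep duration with the next day's average sentiment and computing a Pearson correlation coefficient already indicates whether the two move together. The sketch assumes sleep hours keyed by the date the night began and mood scores keyed by calendar date (both hypothetical inputs). A correlation alone would not show that sleep causes the change in mood.

```python
from datetime import date, timedelta
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one series is constant")
    return cov / (spread_x * spread_y)


def sleep_vs_next_day_mood(sleep_hours: dict[date, float], mood: dict[date, float]) -> float:
    """Correlate each night's sleep (keyed by the date the night began) with the next day's mood."""
    # Keep only nights for which a mood score exists on the following day
    pairs = [(hours, mood[night + timedelta(days=1)])
             for night, hours in sorted(sleep_hours.items())
             if night + timedelta(days=1) in mood]
    if len(pairs) < 2:
        raise ValueError("need at least two nights with next-day mood data")
    xs, ys = zip(*pairs)
    return pearson(list(xs), list(ys))
```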
One thing’s for sure: we all agreed to continue meeting as a team to implement our stretch goals and visions for DataBird!