Inspiration

I have been part of a question-and-answer (QA) Facebook group focused on 3D printing (over 40,000 members) and have observed several challenges faced by large groups where QA is a primary activity.

  • The Duplicate Question Challenge: As new members join the group, they tend to ask questions that have already been asked and answered. While the Facebook group feed is well designed to foster social interaction (threaded conversations, highlighting important content), it is not optimized for forum-like QA activity, and for many users finding existing relevant content is difficult. In 3D printing QA groups specifically, new members tend to have difficulty constructing meaningful search queries (they have not yet learned the right vocabulary to describe things) or struggle to scroll through pages of generic search results, so they resort to simply posting a new question. Unfortunately, duplicate questions frustrate existing users, who complain of diminishing benefits when content and conversations are continuously repeated. In many cases this frustration leads existing members to troll new users (sometimes using borderline abusive language), which antagonizes newcomers, diminishes their experience and eventually drives them out of the group. In this way, online groups can inadvertently become non-inclusive.

  • Group Content Curation Challenges: As groups expand (100s of posts a day), admins sometimes struggle to curate the high volume of content generated within these groups, e.g. disabling comments, enforcing compliance with group rules, deleting harmful posts or banning malicious members as needed. Admins must sometimes depend on other group members to highlight these cases (flagging posts, direct messages to admins etc.). Interestingly, while users may care about QA-related interactions, they are not always dedicated to policing the community (flagging posts).

  • Group Insights and Topical Discourse Challenge: Administrators need tools to help them obtain insights on topics discussed within the group and how these topics evolve over time. I was personally interested in this as I was looking for ways to characterize, quantify and understand interaction behaviour within Facebook groups - what types of posts are prevalent, on what day of the week are users most active, how often does the average user initiate a discussion via a post, what is the average interaction (likes, views etc.) for each post, how does post content type (links, text, images, video) influence reactions? The answers to these questions may vary from group to group, hence the need for tools that extract such insights. Results can then be used to implement better group content curation strategies, e.g. dedicating resources to specific topics.

I started working on GroupManager to address these challenges faced by both group members and group administrators.

What it does

Summary of Features.

Challenge/Feature | Target | How Addressed | Status
Duplicate Question | Group Member | Chatbot + a text similarity neural network model | v.0.1 Complete
Content Curation | Group Admin | Machine Learning + Visualizations | v.0.1 Complete
Insights and Topical Discourse | Group Admin | Machine Learning + Visualizations | v.0.1 Complete

Group Member Features

  • Addressing the Duplicate Question Challenge using Messenger Chatbots:
    • A neural network model is trained in PyTorch to extract semantic text similarity. This is used to better identify existing posts that are most similar to a new question.
    • A Messenger bot is then used to deliver the results of similarity matches. A user can ask the bot natural language questions, which are then matched against existing posts.
    • Users can also flag any results which they would like to bring to the attention of admins.
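
The matching step above can be illustrated with a minimal sketch. It is a simplification, not the app's trained model: each text is represented by the average of its word vectors and posts are ranked by cosine similarity to the question. The tiny `EMBEDDINGS` table and the helper names `embed`, `cosine` and `most_similar` are hypothetical and exist only for illustration.

```python
import math

# Hypothetical 3-dimensional word vectors; a real system would load
# pretrained embeddings with hundreds of dimensions.
EMBEDDINGS = {
    "nozzle": [0.9, 0.1, 0.0],
    "clogged": [0.8, 0.2, 0.1],
    "extruder": [0.85, 0.15, 0.05],
    "jammed": [0.75, 0.25, 0.1],
    "bed": [0.1, 0.9, 0.0],
    "leveling": [0.05, 0.95, 0.1],
    "filament": [0.6, 0.1, 0.5],
    "brand": [0.1, 0.1, 0.9],
}


def embed(text):
    """Average the vectors of known words; return None if no word is known."""
    vectors = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vectors:
        return None
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(question, posts, top_k=3):
    """Return up to top_k (score, post) pairs, most similar first."""
    query_vec = embed(question)
    if query_vec is None:
        return []
    scored = []
    for post in posts:
        post_vec = embed(post)
        if post_vec is not None:
            scored.append((cosine(query_vec, post_vec), post))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


posts = [
    "my extruder jammed again",
    "bed leveling tips",
    "which filament brand is best",
]
results = most_similar("nozzle clogged", posts, top_k=2)
for score, post in results:
    print(f"{score:.3f}  {post}")
```

With these toy vectors the question shares no words with the top-ranked post, which is the point of comparing meaning in embedding space instead of matching keywords.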

Group Admin Features

  • Addressing Group Content Curation Challenges using Visualization Summaries:
    • An admin can sign in and grant permission to access posts in a group they manage. Once access is granted, Group Manager imports posts using the Graph API. Post data is then used for subsequent visualization/ML processes. For testing purposes the app contains sample data already imported from a 3D printing group.
    • The admin dashboard provides a visualization summary of posts segmented by post type, sentiment and time period.
    • Visualization of post word cloud for a quick overview of most common words within posts.
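
The aggregations behind such dashboard summaries could look roughly like the sketch below. The post records, their key names (`created_time`, `type`, `message`) and the stopword list are assumptions made for illustration, not the app's actual schema.

```python
from collections import Counter
from datetime import datetime

# Hypothetical post records; the key names are assumptions for this sketch.
posts = [
    {"created_time": "2018-06-04T10:15:00+0000", "type": "photo",
     "message": "First layer not sticking to the bed"},
    {"created_time": "2018-06-04T18:40:00+0000", "type": "status",
     "message": "Which nozzle size for fine layer detail"},
    {"created_time": "2018-06-05T09:05:00+0000", "type": "link",
     "message": "Useful bed leveling guide"},
    {"created_time": "2018-06-09T14:30:00+0000", "type": "photo",
     "message": "Upgraded my nozzle and bed"},
]

# Small illustrative stopword list; a real one would be much longer.
STOPWORDS = {"to", "the", "for", "my", "and", "not", "which"}


def weekday_counts(posts):
    """Count posts per day of the week."""
    counts = Counter()
    for post in posts:
        created = datetime.strptime(post["created_time"], "%Y-%m-%dT%H:%M:%S%z")
        counts[created.strftime("%A")] += 1
    return counts


def type_counts(posts):
    """Count posts per post type (photo, link, status, ...)."""
    return Counter(post["type"] for post in posts)


def word_counts(posts):
    """Count non-stopword tokens across all messages (word cloud input)."""
    counts = Counter()
    for post in posts:
        for word in post["message"].lower().split():
            if word not in STOPWORDS:
                counts[word] += 1
    return counts


by_day = weekday_counts(posts)
by_type = type_counts(posts)
words = word_counts(posts)
print(by_day.most_common(), by_type.most_common(), words.most_common(3))
```

Each counter maps directly onto one chart: weekday counts to a bar chart, type counts to a segmented summary, and word counts to the word cloud.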

  • Addressing Group Insights and Topical Discourse Challenge using Topic Model Visualizations:
    • Visualization of automatically extracted topics and how they evolve over time. Topics are automatically extracted using Non-Negative Matrix Factorization (NMF). To improve the interpretability of the extracted topics, the number-of-topics parameter is selected automatically using a topic coherence metric (i.e. a range of topic counts is tested and the count yielding the most coherent topics is selected).

In the example above, posts from a 3D printing group are automatically categorized into 4 topics. Based on the example words in each topic, they can be loosely interpreted as print quality issues, asking for and receiving help, printer upgrading, and external links and resources. The first topic is the most prevalent, so admins can focus on providing additional support that addresses it.
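
The coherence-based choice of the number of topics can be sketched as follows. Two things here are assumptions: the write-up does not name its coherence metric, so UMass coherence is used as one common option, and the top-word lists in `candidates` are hard-coded stand-ins for what NMF fits at each topic count would produce. The function names are hypothetical.

```python
import math
from itertools import combinations


def umass_coherence(top_words, documents):
    """UMass coherence of one topic.

    Sums log((D(earlier, later) + 1) / D(earlier)) over ordered word pairs,
    where D counts the documents containing all the given words.
    """
    doc_sets = [set(doc.lower().split()) for doc in documents]

    def doc_freq(*words):
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for earlier, later in combinations(top_words, 2):
        denom = doc_freq(earlier)
        if denom:  # skip words that never occur in the corpus
            score += math.log((doc_freq(earlier, later) + 1) / denom)
    return score


def select_num_topics(candidates, documents):
    """Pick the topic count whose topics have the highest mean coherence.

    candidates maps a topic count k to a list of topics, each given as a
    list of top words.
    """
    mean_scores = {
        k: sum(umass_coherence(t, documents) for t in topics) / len(topics)
        for k, topics in candidates.items()
    }
    best_k = max(mean_scores, key=mean_scores.get)
    return best_k, mean_scores


documents = [
    "stringing and blobs on my print",
    "print has stringing at high temperature",
    "new hotend upgrade installed",
    "upgrade to a new hotend worth it",
]

# Hard-coded stand-ins for the top words of NMF fits with 2 and 3 topics.
candidates = {
    2: [["stringing", "print"], ["hotend", "upgrade"]],
    3: [["stringing", "hotend"], ["print", "upgrade"],
        ["temperature", "installed"]],
}

best_k, scores = select_num_topics(candidates, documents)
print(best_k, scores)
```

Topics whose top words co-occur in the same posts score higher, so the two-topic candidate wins here because the three-topic candidate mixes words from unrelated posts.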

How I built it

Module | Target | How Addressed | Status
Duplicate Question | Group Member | Chatbot + a text similarity neural network model | v.0.1 Complete
Content Curation | Group Admin | Machine Learning (CNN-Based Sentiment Model) + Visualizations | v.0.1 Complete
Insights and Topical Discourse | Group Admin | Machine Learning (NMF Topic Model) + Visualizations | v.0.1 Complete

Facebook developer tools used.

  • Artificial Intelligence
    • AI Tools and AI Research (PyTorch)
      • Sentiment Analysis (a Convolutional Neural Network for sentence classification is trained to compute sentiment locally)
      • Sentence Similarity (a siamese sentence similarity model is trained on word embeddings to determine semantic similarity between questions sent to the chatbot and existing posts)
      • Topic Model (an NMF topic model is trained to extract coherent topics from posts)
  • Business Tools
    • Facebook Login
    • Messenger Bot with generic views
    • Graph API is used to download posts.
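
For readers unfamiliar with the sentence-classification CNN listed under AI Tools above, a minimal PyTorch sketch of that kind of architecture follows. The class name `SentenceCNN`, the layer sizes and the random input batch are all assumptions for illustration; the model here is untrained and is not the app's actual sentiment model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SentenceCNN(nn.Module):
    """Convolutional sentence classifier: embed, convolve, max-pool, classify."""

    def __init__(self, vocab_size, embed_dim=50, num_filters=16,
                 kernel_sizes=(3, 4, 5), num_classes=2, dropout=0.5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # One 1D convolution per n-gram width.
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes]
        )
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, token_ids):
        # (batch, seq_len) -> (batch, embed_dim, seq_len) for Conv1d.
        x = self.embedding(token_ids).transpose(1, 2)
        # Max-pool each feature map over the sequence dimension.
        pooled = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        features = self.dropout(torch.cat(pooled, dim=1))
        return self.fc(features)


# Hypothetical sizes: vocabulary of 100 token ids, 4 sentences of 12 tokens.
model = SentenceCNN(vocab_size=100)
model.eval()  # disable dropout for inference
batch = torch.randint(1, 100, (4, 12))
with torch.no_grad():
    logits = model(batch)
probs = F.softmax(logits, dim=1)
print(probs)
```

Sentences must be at least as long as the widest kernel (5 tokens here), so shorter inputs would need padding before being passed to the model.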

Notes on Data Privacy

This app generates visualizations and runs machine learning algorithms on group data (posts, comments). To ensure data privacy, the following guidelines have been adopted in the design.

  • Data is only accessible to the group admin who provides initial access.
  • Data is never posted or sent to any external API, service or company for processing. All processing is done using machine learning models (topic modelling, sentiment, text similarity) that run locally.
  • To further ensure privacy, the project has been made open source. This way, interested users can run the application on their local servers.

Challenges I ran into

  • Training PyTorch Models: It took a while to get up to speed with PyTorch, train the models (sentiment, similarity) and configure them to work well on CPU. Sample code and issues on GitHub were extremely useful.

Accomplishments that I'm proud of

  • Training a sentiment model using a cloud-based GPU, exporting the trained model and using it locally to process sentiment for post messages within a web application.
  • Automatic selection of topic modeling parameters (NMF number of topics), generation of topic evolution data over time and visualizing this using area charts.
  • Visualizations that answer important questions about group posting behaviour, such as: "On what day of the week do people post the most?", "What is the overall sentiment within posts?", "What are the most important topics and how have they evolved over time?" and "What are the most used words?"
  • Built and deployed a Messenger Bot!

What I learned

  • PyTorch: Learned about the torchtext library and worked through samples for training, saving and loading a PyTorch model.
  • Visualizations: Worked in detail with Vega-Lite, creating visualizations for post and topic model results.

What's next for Group Manager

  • Messenger Bot Update
    • Image similarity search: allow the Messenger bot to return posts that contain images similar to an uploaded image. A first version will use a trained ResNet PyTorch model to extract features from images in existing posts; uploaded images will then be compared against these features (cosine similarity) to identify related posts. This feature will be useful for QA communities (e.g. 3D printing) where images or screenshots are a common means of sharing context.
    • User testing: The current Chatbot approach to finding similar posts has multiple steps (paste question, scroll through results etc). A detailed user interaction study is needed to help assess this interaction flow and possibly improve it.
    • Additional security testing: Testing of data end point security to ensure all requests are properly authenticated.
    • Fine-tune PyTorch models used for semantic similarity and sentiment analysis: The current version of sentence similarity is based on cosine distance and word embeddings trained on public datasets (IMDb, Wikipedia). The sentiment analysis model is also trained on IMDb review data. Next steps would be to allow admins to fine-tune word embeddings to fit the content domain, improving the relevance of similarity search results, and to perform transfer learning using post data to improve sentiment model results.
