Our code: https://github.com/LeafyDriftwood/DL-Final-Project

Link to our reflection/write up: https://docs.google.com/document/d/1Z7BSIYk_MNCkhlu-XL5RVY6j7DZe7D19881eOdphxkE/edit?usp=sharing

Title: Detecting Suicidal Intent in Tweets Using a Time-Aware Transformer Based Model

Who: April Xu (axu58), Brandon Yan (byan12), Manav Chakravarthy (mchakra3)

Introduction: What problem are you trying to solve and why? For our final project, we plan on implementing the following paper, which describes an NLP model originally written in PyTorch; we will re-implement it in TensorFlow. The model takes in English tweets and outputs the suicidal intent of each tweet (Suicidal Intent Present or Suicidal Intent Absent) based on the content of the tweet and the user's past tweet history. Suicide is a huge problem in today's world, and this paper attempts to use historical social media activity to determine suicidal ideation and provide preliminary screening of suicidal risk. This is a classification problem because we are trying to distinguish between tweets that demonstrate suicidal intent and those that don't.

Related Work: Are you aware of any, or is there any prior work that you drew on to do your project? We are not aware of any other similar projects at the moment (besides the paper we are implementing).

Data: What data are you using (if any)? If you’re using a standard dataset (e.g. MNIST), you can just mention that briefly. Otherwise, say something more about where your data come from (especially if there’s anything interesting about how you will gather it). We are using a dataset introduced by Sinha et al. (2019). Their dataset contains 34,306 tweets authored by 32,558 unique Twitter users (some tweets came from the same users). The tweets were annotated for suicidal intent and labeled by students of Clinical Psychology. Overall, there are 3,984 suicidal tweets, and the Twitter timeline for each user was collected, so we have a history of their tweets.

Preprocessing: We first remove any identifying information; then we convert everything to lowercase, strip punctuation and extra whitespace, and remove stop words. We then split the dataset by user so that there is no overlap between users in the train, validation, and test sets.
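The cleaning and user-based splitting steps above can be sketched as follows (a minimal illustration, not the project's actual code; the function names, the tiny stop-word list, and the 80/10/10 split fractions are our own assumptions):

```python
import re
import string
from collections import defaultdict

STOP_WORDS = {"a", "an", "the", "is", "and", "to", "of"}  # illustrative subset only

def clean_tweet(text):
    """Lowercase, strip punctuation, collapse whitespace, drop stop words."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = [t for t in re.split(r"\s+", text.strip()) if t and t not in STOP_WORDS]
    return " ".join(tokens)

def split_by_user(tweets, train_frac=0.8, val_frac=0.1):
    """Group tweets by user id, then assign whole users to splits,
    so no user appears in more than one of train/val/test."""
    by_user = defaultdict(list)
    for user_id, text, label in tweets:
        by_user[user_id].append((user_id, clean_tweet(text), label))
    users = sorted(by_user)
    n_train = int(len(users) * train_frac)
    n_val = int(len(users) * val_frac)
    train = [t for u in users[:n_train] for t in by_user[u]]
    val = [t for u in users[n_train:n_train + n_val] for t in by_user[u]]
    test = [t for u in users[n_train + n_val:] for t in by_user[u]]
    return train, val, test
```

Splitting on users rather than tweets matters because the same author can appear many times in the dataset, and tweet-level splitting would leak a user's writing style across sets.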

Methodology: What is the architecture of your model? We’re using STATENet, a dual-transformer-based architecture that learns the linguistic and emotional cues in tweets. Our model is composed of a few modules: a historical module, a current module, a Bi-directional LSTM module, and a Bi-directional LSTM module with attention. These modules are composed of several LSTM, linear, and dropout layers.

How are you training the model? We will potentially run a grid search for the optimal combination of hyperparameters, train for ~5 epochs, and ideally implement early stopping if performance on the validation set starts to plateau. We first assess an individual tweet by passing its words through a BERT transformer and then passing the result through fully connected layers. We model historic tweets by passing past tweets into the Plutchik EmoNet transformer, which generates emotion vectors; we then pass these vectors through a time-aware LSTM layer.

If you are implementing an existing paper, detail what you think will be the hardest part about implementing the model here. The hardest part will probably be figuring out how to implement the time-aware LSTM and how to incorporate the notion of past tweets and past tweet frequency, which is something we haven't done in class. This goes beyond the easier task of looking solely at a tweet’s words and making a prediction based only on that.
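To make the time-aware idea concrete: one common formulation (the T-LSTM of Baytas et al., which we assume is close to what the paper uses) splits the previous cell state into a learned short-term component, discounts that component by how much time has elapsed since the last tweet, and recombines. A NumPy sketch of just that adjustment (the decay function 1/log(e + Δt) is the standard T-LSTM choice; `W_d` and `b_d` would be learned parameters):

```python
import math
import numpy as np

def decay_cell_state(c_prev, delta_t, W_d, b_d):
    """Time-aware adjustment of an LSTM cell state, T-LSTM style:
    memory from long-ago tweets is partially discounted before the
    usual LSTM gates run."""
    c_short = np.tanh(W_d @ c_prev + b_d)   # learned short-term part of memory
    c_long = c_prev - c_short               # remaining long-term part
    g = 1.0 / math.log(math.e + delta_t)    # decays monotonically with elapsed time
    return c_long + g * c_short             # discounted recombination
```

When Δt = 0 the discount g is 1 and the cell state passes through unchanged; as the gap between tweets grows, the short-term component fades while the long-term component is preserved.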

Metrics: What constitutes “success?” What experiments do you plan to run? Evaluating on the held-out test set should give us an indication of our model’s performance. We may also source new tweets (via the tweepy package) and run our model on them.

For most of our assignments, we have looked at the accuracy of the model. Does the notion of “accuracy” apply for your project, or is some other metric more appropriate? A Macro F1 score is likely the best estimate of our model’s performance.
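Macro F1 is a better fit than plain accuracy here because the dataset is imbalanced (about 3,984 suicidal tweets out of 34,306): averaging per-class F1 scores unweighted means the rare positive class counts as much as the majority class. A small self-contained implementation for the binary case (illustrative only; in practice a library routine such as scikit-learn's would be used):

```python
def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Compute F1 per class, then average unweighted across classes."""
    f1s = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)
```

Note that a model predicting "no suicidal intent" for every tweet would score roughly 88% accuracy on this dataset but a very poor Macro F1, which is exactly why the metric matters.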

If you are implementing an existing project, detail what the authors of that paper were hoping to find and how they quantified the results of their model. The authors of the paper attempted to determine whether temporal cues (i.e., previous tweets that indicated suicidal intentions) could be used in conjunction with recent tweets for suicide risk assessment. They quantified results through the Macro F1 score and Recall. Recall is the fraction of truly suicidal tweets that the model correctly identifies.

What are your base, target, and stretch goals? Base: use LSTM layers to classify a single tweet as showing suicidal intent or not, based only on the words in that single tweet. Target: re-implement the entire paper with proper functionality. Stretch: detect intent of other violent acts from tweets.

Ethics: Choose 2 of the following bullet points to discuss; not all questions will be relevant to all projects so try to pick questions where there’s interesting engagement with your project. (Remember that there’s not necessarily an ethical/unethical binary; rather, we want to encourage you to think critically about your problem setup.) What broader societal issues are relevant to your chosen problem space? Our project is very relevant for broader societal issues, as suicide is a huge problem in the world today, especially with younger adults. Suicide prevention is also a very important area that we need to do more work on, and this project addresses that area of focus.

What is your dataset? Are there any concerns about how it was collected, or labeled? Is it representative? What kind of underlying historical or societal biases might it contain? One concern with how our data is collected is that it may invade the privacy of Twitter users. However, this concern is addressed because we aren’t actually working with the raw data, just the paraphrased and de-identified versions of the real tweets.

Division of labor: Briefly outline who will be responsible for which part(s) of the project. April: dataloader/preprocessing, main function. Brandon: preprocessing, some of the training steps. Manav: integrating the loss and the eval loop in training.

Checkin 2 Reflection:

Introduction: This can be copied from the proposal. Our introduction is unchanged from the proposal above.

Challenges: What has been the hardest part of the project you’ve encountered so far? So far, most of the work we have done has been centered around converting the code from PyTorch to TensorFlow. This has mostly been smooth, except for some parts of the code that were substantially different in PyTorch. For example, PyTorch’s binary cross entropy loss has an additional weight parameter that TensorFlow’s didn’t, so we needed to write separate code to handle that. Furthermore, we haven’t run everything yet, but we predict the hardest part will be debugging the code after we initially run everything.
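The weight-parameter gap mentioned above can be bridged by computing the per-example loss and rescaling it by hand before averaging, which is what `torch.nn.BCELoss`'s `weight` argument does. A framework-agnostic NumPy sketch of that workaround (not the project's actual conversion code; the clipping epsilon is our own choice to avoid log(0)):

```python
import numpy as np

def weighted_bce(probs, targets, weights):
    """Per-example weighted binary cross entropy: each element's loss is
    rescaled by its weight before averaging, mirroring the `weight`
    argument of torch.nn.BCELoss."""
    probs = np.clip(probs, 1e-7, 1 - 1e-7)  # keep log() finite at 0 and 1
    per_example = -(targets * np.log(probs) + (1 - targets) * np.log(1 - probs))
    return float(np.mean(weights * per_example))
```

With all weights equal to 1 this reduces to the ordinary mean BCE; unequal weights let the rare positive class contribute more to the gradient.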

Insights: Are there any concrete results you can show at this point? How is your model performing compared with expectations? We are not quite at the point of running everything, but we are very nearly there. Most of the code has been converted to TensorFlow; once the conversion is done we need to run everything, and we hope the model will perform well after some debugging. We were able to run all of the original PyTorch code in Google Colab, and it worked seamlessly.

Plan: Are you on track with your project? What do you need to dedicate more time to? What are you thinking of changing, if anything? We are on track with this project, and after finishing the conversion we will start running all the code very shortly. We think that after running everything, debugging will be our main time-consuming activity. Other than that, we don’t think there’s anything we want to change, unless the model is very inaccurate, in which case we might change some hyperparameters.
