Our code: https://github.com/LeafyDriftwood/DL-Final-Project
Link to our reflection/write up: https://docs.google.com/document/d/1Z7BSIYk_MNCkhlu-XL5RVY6j7DZe7D19881eOdphxkE/edit?usp=sharing
Title: Detecting Suicidal Intent in Tweets Using a Time-Aware Transformer Based Model
Who: April Xu (axu58), Brandon Yan (byan12), Manav Chakravarthy (mchakra3)
Introduction: What problem are you trying to solve and why? For our final project, we plan to implement the following paper, which describes an NLP model; we will build our version in TensorFlow. The model takes in English tweets and outputs the suicidal intent of each tweet (Suicidal Intent Present or Suicidal Intent Absent), based both on the tweet's content and on the user's history of past tweets. Suicide is a major problem in today's world, and this paper attempts to use historical social-media activity to detect suicidal ideation and provide a preliminary screening of suicide risk. This is a classification problem because we are trying to distinguish tweets that demonstrate suicidal intent from those that do not.
Related Work: Are you aware of any, or is there any prior work that you drew on to do your project? We are not aware of any other similar projects at the moment (besides the paper we are implementing).
Data: What data are you using (if any)? If you’re using a standard dataset (e.g. MNIST), you can just mention that briefly. Otherwise, say something more about where your data come from (especially if there’s anything interesting about how you will gather it). We are using a dataset introduced by Sinha et al. (2019). Their dataset contains 34,306 tweets authored by 32,558 unique Twitter users (some users contributed more than one tweet). The tweets were annotated for suicidal intent by Clinical Psychology students and given labels. Overall, 3,984 of the tweets are labeled suicidal, and the Twitter timeline of each user was also collected, so we have a history of their tweets.
Preprocessing: We first remove any identifying information; then we convert all text to lowercase, strip punctuation and extra whitespace, and remove stop words. Finally, we split the dataset by user so that there is no overlap of users between the train, validation, and test sets.
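The preprocessing and user-based splitting described above can be sketched in plain Python. The stop-word list and function names here are illustrative placeholders (a real pipeline would likely use a fuller list such as NLTK's), and the 80/10/10 split ratio is our assumption, not a figure from the paper:

```python
import random
import string

# Illustrative stop-word list; the real pipeline would use a fuller one
# (e.g. NLTK's English stop words).
STOP_WORDS = {"a", "an", "the", "is", "are", "and", "or", "to", "of"}

def preprocess_tweet(text):
    """Lowercase, strip punctuation and extra whitespace, drop stop words."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in text.split() if tok not in STOP_WORDS]

def split_by_user(user_ids, seed=0):
    """Split USERS (not tweets) into train/val/test sets so that no user's
    tweets appear in more than one set. 80/10/10 ratio assumed."""
    users = sorted(set(user_ids))
    random.Random(seed).shuffle(users)
    n = len(users)
    train = set(users[: int(0.8 * n)])
    val = set(users[int(0.8 * n): int(0.9 * n)])
    test = set(users[int(0.9 * n):])
    return train, val, test
```

Splitting by user rather than by tweet matters because a user's tweets are correlated; letting the same user appear in both train and test would inflate evaluation scores.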
Methodology: What is the architecture of your model? We’re using STATENet, a dual transformer-based architecture that learns the linguistic and emotional cues in tweets. Our model is composed of a few modules: a current-tweet module, a historical module, a Bi-directional LSTM module, and a Bi-directional LSTM module with attention. These modules are built from several LSTM, linear, and dropout layers.
How are you training the model? We first assess the current tweet by passing its words through a BERT transformer and feeding the result through fully connected layers. For historic tweet modeling, we pass past tweets into the Plutchik-based EmoNet transformer, which generates emotion vectors; we then pass these vectors through a time-aware LSTM layer. We plan to train for ~5 epochs, potentially run a grid search for the optimal combination of hyperparameters, and ideally implement early stopping if performance on the validation set starts to plateau.
If you are implementing an existing paper, detail what you think will be the hardest part about implementing the model here. The hardest part will probably be implementing the time-aware LSTM and figuring out how to incorporate the notion of past tweets and past tweet frequency, neither of which we have done in class. This goes beyond the easier task of looking only at a tweet’s words and making a prediction from those alone.
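The core idea of a time-aware LSTM can be sketched without any framework code. The decay form below, g(Δt) = 1/log(e + Δt), is the one from the T-LSTM paper (Baytas et al., 2017); whether STATENet uses exactly this form is our assumption, and plain Python lists stand in for tensors:

```python
import math

def time_decay(delta_t):
    """Monotonically decreasing discount of memory with elapsed time,
    g(dt) = 1 / log(e + dt), as in T-LSTM (Baytas et al., 2017).
    STATENet's exact choice of decay function may differ."""
    return 1.0 / math.log(math.e + delta_t)

def adjust_cell_memory(c_prev, c_short, delta_t):
    """Split the previous cell state into long- and short-term parts and
    discount only the short-term part by the time elapsed since the last
    tweet. In the real model, c_short would come from a learned projection
    of c_prev; here it is passed in directly for illustration."""
    g = time_decay(delta_t)
    return [(c - s) + s * g for c, s in zip(c_prev, c_short)]
```

The intuition: a distressed tweet from yesterday should influence the hidden state more than an identical tweet from a year ago, which is exactly what ordinary LSTMs cannot express.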
Metrics: What constitutes “success?” What experiments do you plan to run? Evaluating on the test dataset should give us an indication of our model’s performance. We may also source new tweets (e.g. via the tweepy package) and run our model on them.
For most of our assignments, we have looked at the accuracy of the model. Does the notion of “accuracy” apply for your project, or is some other metric more appropriate? A Macro F1 score is likely the best estimate of our model’s performance.
If you are implementing an existing project, detail what the authors of that paper were hoping to find and how they quantified the results of their model. The authors of the paper attempted to determine whether temporal cues (i.e. previous tweets that indicated suicidal intentions) could be used in conjunction with recent tweets for suicide risk assessment. They quantified results through the Macro F1 score and the Recall score. Recall here is the fraction of truly suicidal tweets (per the labels) that the model correctly identifies.
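A minimal plain-Python sketch of these two metrics (in practice, scikit-learn's `recall_score` and `f1_score(average="macro")` would be the standard implementations):

```python
def recall(y_true, y_pred, positive=1):
    """Fraction of truly positive (suicidal) examples the model recovers."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    actual = sum(1 for t in y_true if t == positive)
    return tp / actual if actual else 0.0

def macro_f1(y_true, y_pred, classes=(0, 1)):
    """Unweighted mean of per-class F1, so the rare suicidal class counts
    as much as the majority class -- important given only ~12% of the
    dataset's tweets are labeled suicidal."""
    f1s = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)
```

Macro-averaging is the right call here precisely because plain accuracy would reward a model that labels every tweet non-suicidal.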
What are your base, target, and stretch goals?
- Base: Use LSTM layers to classify a single tweet as showing suicidal intent or not, based only on the words in that single tweet.
- Target: Re-implement the entire paper, with proper functionality.
- Stretch: Detect intent of other violent acts from tweets.
Ethics: Choose 2 of the following bullet points to discuss; not all questions will be relevant to all projects so try to pick questions where there’s interesting engagement with your project. (Remember that there’s not necessarily an ethical/unethical binary; rather, we want to encourage you to think critically about your problem setup.) What broader societal issues are relevant to your chosen problem space? Our project is highly relevant to broader societal issues: suicide is a major problem in the world today, especially among younger adults. Suicide prevention is an important area that needs more work, and this project addresses that area of focus.
What is your dataset? Are there any concerns about how it was collected, or labeled? Is it representative? What kind of underlying historical or societal biases might it contain? One concern with how our data was collected is that it may invade the privacy of Twitter users. However, this concern is mitigated because we aren’t actually working with the raw data, only paraphrased and de-identified versions of the real tweets.
Division of labor: Briefly outline who will be responsible for which part(s) of the project.
- April: dataloader/preprocessing, main function
- Brandon: preprocessing, some of the training steps
- Manav: integrating loss, and the eval loop in training
Built With
- python
- tensorflow
