Preprocessing is a vital part of machine learning to get clean data. Data cleansing improves data quality and improves productivity by leaving only significant and meaningful data. In this implementation for statistical analysis, phrases such as tags and hyperlinks were removed from the text as they rarely signified sentiment. Various machine learning algorithms were used for sentiment analysis on social media status updates and compared based on the evaluation metric. Initially, I tried deep learning techniques such as XGBoost and LSTM. Deep learning is one of the most advanced and recent ML methods that are powerful because of their hidden layers. However, both did not significantly improve accuracy while being computationally expensive to train. I finally tried Logistic Regression which improved accuracy and was able to train in a reasonable amount of time. Logistic regression is a simple yet efficient machine learning algorithm for sentiment analysis.
Log in or sign up for Devpost to join the conversation.