Identifying Fact vs. Opinion

Inspiration

With how easily positive and negative news spreads, whether true or false, due to the pervasiveness of social media platforms, we wanted to find a way to determine the amount of negativity, positivity, and neutrality in opinionated texts.

What it does

Our project first identifies whether a piece of text is fact-based or opinion-based. The user can either type in the text to analyze or supply a text file that contains an article. If the text is found to be opinionated, our project then calculates the percentage of negative, positive, and neutral words and emojis it contains, using a database within the program that labels specific words and emojis as negative, positive, or neutral. This step is known as sentiment analysis, and we have implemented it only for opinionated text.
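As a rough illustration of the percentage calculation described above, here is a minimal sketch. The word and emoji sets, the function name `sentiment_percentages`, and the whitespace-based tokenization are all assumptions made for this example; the project itself relies on an open-source sentiment analysis program whose database is not shown here.

```python
import string

# Hypothetical mini-lexicon, for illustration only.
POSITIVE = {"good", "great", "love", "happy", "😀"}
NEGATIVE = {"bad", "terrible", "hate", "sad", "😡"}


def sentiment_percentages(text):
    """Return the share of positive, negative, and neutral tokens as percentages."""
    # Split on whitespace and strip ASCII punctuation from each token.
    tokens = [t.strip(string.punctuation) for t in text.lower().split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return {"positive": 0.0, "negative": 0.0, "neutral": 0.0}

    positive = sum(t in POSITIVE for t in tokens)
    negative = sum(t in NEGATIVE for t in tokens)
    neutral = len(tokens) - positive - negative
    total = len(tokens)
    return {
        "positive": 100 * positive / total,
        "negative": 100 * negative / total,
        "neutral": 100 * neutral / total,
    }


if __name__ == "__main__":
    print(sentiment_percentages("I love this phone but hate the battery 😡"))
```

Any token that is not in either set counts as neutral, so the three percentages always add up to 100 for non-empty input.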

How we built it

First, we had to determine whether a text was fact-based or opinion-based. To do this, we created a dictionary that uses words as keys and numbers as values. Each number is either positive or negative: positive numbers mark fact-based words and negative numbers mark opinion-based words. Our program analyzes a given text word by word and updates a running score using the values associated with those words. We compiled the lists of factual and opinionated words by hand to prove our concept: we manually labeled some articles as fact-based or opinion-based and searched them for characteristic words. These lists could be extended later using machine learning and more data. Next, the program decides whether the text is factual or opinionated based on the range its score falls into. If the text is found to be opinionated, our project runs an open-source sentiment analysis program to determine the percentage of negative, positive, and neutral words. Using that data, we can conclude whether an opinionated text expresses a positive or negative sentiment.
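The scoring approach described above could look roughly like the sketch below. The words, their weights, the threshold values, and the names `WORD_SCORES` and `classify` are hypothetical placeholders chosen for illustration; they are not the project's actual dictionary or score ranges.

```python
import string

# Hypothetical weights: positive values lean factual, negative values lean opinionated.
WORD_SCORES = {
    "reported": 2, "according": 2, "percent": 1, "data": 1,
    "believe": -2, "think": -2, "should": -1, "best": -1, "worst": -1,
}

# Assumed score ranges; the real cut-offs are not given in this write-up.
FACT_MIN = 1
OPINION_MAX = -1


def classify(text):
    """Score a text word by word and map the total to a label."""
    score = 0
    for raw in text.lower().split():
        word = raw.strip(string.punctuation)
        score += WORD_SCORES.get(word, 0)  # unknown words leave the score unchanged

    if score >= FACT_MIN:
        label = "fact"
    elif score <= OPINION_MAX:
        label = "opinion"
    else:
        label = "undetermined"
    return label, score


if __name__ == "__main__":
    print(classify("According to the data, sales rose 5 percent."))
    print(classify("I think this is the worst idea."))
```

A text whose score lands between the two thresholds is reported as undetermined here, which is one possible way to handle articles that contain no listed words.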

Challenges we ran into

In the initial stages, we struggled to think of a good idea since we knew that our idea would be the foundation for our program and we wanted to do something that has an impact and scalability potential. Another challenge we faced was discerning words that are used more often in factual texts vs opinionated ones and applying the connotation algorithm to the results. We realized that some words can be interpreted in multiple ways depending on the context and that phrases might sometimes be better indicators of the nature of an article. In terms of coding, we had issues with getting our program to analyze the articles that we chose. We realized we had to keep the format of the articles consistent and after implementing this, we were able to resolve this issue.

Accomplishments that we're proud of

Given how widely social media is used and how easily information spreads from one end of the world to the other, whether factual or not and whether positive or negative, we like that our project idea applies to a prevalent, ongoing problem. Currently, our project is a small portion of a bigger plan that we would like to implement in the future. The presence of disinformation is a huge challenge, and our mini project is a first step in addressing it. While we cannot eradicate disinformation, we can raise awareness. We are excited to be a part of such a large project that has potential for multiple disciplines and immense scalability opportunities.

What we learned

Throughout the research we have done, we’ve learned so much about combating disinformation. We came across multiple techniques such as opinion mining, sentiment analysis, text mining, computer vision, optical character recognition, and blockchain. Optical character recognition is already used for image search on Google. From what we read, combining these techniques should result in a higher accuracy rate in identifying disinformation than using any one of them alone. We were able to explore the different possibilities of coding, as this was our first time working on a project that wasn’t assigned in class. This allowed us to learn some new techniques for coding in Python. Along with that, we were amazed by how much we were able to finish in a short period of time once we identified the goals and divided tasks amongst ourselves.

What's next for Untitled

Currently, our program only analyzes individual words, but later we plan to analyze phrases as well in order to classify text as opinion-based or fact-based. In the future, our team plans to implement a multi-faceted plan that combats disinformation as a whole, since the threat is constantly evolving. Our plan consists of using machine learning, computer vision, and blockchain to help keep companies secure on a global scale and combat the problem at hand. The first step of the plan is the implementation of artificial intelligence and machine learning: we will use text mining to collect data and text analytics to apply sentiment analysis and other algorithms to infer whether the information should be flagged as inaccurate or opinion-based. The algorithm would run through a post and, if the post is fact-based, an accuracy scale would be used to check the reliability of the information. The information pulled from these algorithms will allow users to make informed decisions. Further, we would create a computer vision algorithm based on artificial intelligence to detect pictures or videos that are being used as deepfakes. Images with text are also a huge part of social media platforms, so Optical Character Recognition (OCR) can be used to analyze the text found in images to help determine if a visual has been altered. Lastly, we will use blockchain, which allows for verification and re-verification, so it can ensure that original content has not been modified. Blockchain can also allow us to keep track of the chain of custody by tracing misinformation back to its source. And since this database is immutable, the data stored in the blockchain could not be manipulated, which would help reestablish trust in social media platforms.
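To make the verification idea concrete, here is a minimal sketch of a hash chain, the basic building block behind a blockchain. It is a simplified, single-machine illustration and not a distributed ledger; the function names and the block layout are assumptions made for this example, since this part of the plan has not been built yet.

```python
import hashlib

GENESIS_HASH = "0" * 64  # assumed placeholder hash for the first block


def block_hash(content, prev_hash):
    """Hash a block's content together with the previous block's hash."""
    return hashlib.sha256((prev_hash + content).encode("utf-8")).hexdigest()


def build_chain(posts):
    """Link each post to the one before it through its hash."""
    chain = []
    prev = GENESIS_HASH
    for content in posts:
        current = block_hash(content, prev)
        chain.append({"content": content, "prev_hash": prev, "hash": current})
        prev = current
    return chain


def verify_chain(chain):
    """Recompute every hash; any edited content breaks the chain."""
    prev = GENESIS_HASH
    for block in chain:
        if block["prev_hash"] != prev:
            return False
        if block["hash"] != block_hash(block["content"], prev):
            return False
        prev = block["hash"]
    return True


if __name__ == "__main__":
    chain = build_chain(["original post", "first repost"])
    print(verify_chain(chain))            # chain is intact
    chain[0]["content"] = "edited post"   # tamper with the first block
    print(verify_chain(chain))            # tampering is detected
```

Because each block's hash depends on the previous one, changing any stored post changes its recomputed hash and the check fails, which is the property the re-verification step relies on.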

We’re also excited about the possibility of scaling this idea into different disciplines. For example, professional research projects use software to conduct linguistic coding of qualitative data. Our approach of identifying whether information is opinionated or factual, and whether it takes a negative or positive perspective, could contribute to these existing methods. Qualitative analysis is applicable to many fields, such as medicine, business, psychology, politics, and more. We could modify our project further so that it can be integrated into any such discipline.

Built With

machine-learning
pycharm
python
sentiment-analysis

Updates

Stuti Chaurasia started this project — Nov 14, 2021 10:58 AM EST
