The world is suffering from a pandemic of fake news surrounding the coronavirus. Claims such as "emissions from crematoriums in China could be seen from space", "Russia unleashed lions to keep people indoors", "doctors in London are being mugged", "vitamins or certain oils cure the disease", and "gargling with warm water, salt and vinegar cures the disease" are not only misleading but also create a false picture of the virus. If we believe misinformation rather than facts, we are headed down a dark path that leads nowhere.
The two major types of false information are disinformation and misinformation. Disinformation is spread intentionally by people acting in bad faith; it appears that certain political groups and state agents may wish to propagate chaos for the sake of political gain. Misinformation, the other kind of fake news, is spread innocently despite being incorrect. In the case of COVID-19, there has been disinformation blaming racial groups, illegal immigrants and even governments for the spread of the virus.
Fake news also has an advantage when it comes to being shared. When we share information online, we give it very little scrutiny. We are also more inclined to share bad news, and a lot of the news related to COVID-19 is bad, so there is a serious need to contain the spread of false news about it.
What it does
The model being built is an attempt to contain some of the misinformation and disinformation floating around on the internet.
The model learns from Covid19 news items that have already been labelled FAKE or REAL, and then uses what it has learned from this training data to classify a new news item as either FAKE or REAL.
How we built it
The proposed system uses the naïve Bayes algorithm to detect fake news. The data is divided into Train and Test data. The model is trained on the Train data, learning the characteristics of the news items in each group (FAKE and REAL). Once the model is trained, each item in the Test data is assigned to the group whose characteristics it most closely resembles.
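The train-then-classify flow described above can be sketched as follows. This is a minimal illustration in plain Python, assuming a multinomial naïve Bayes with add-one smoothing; the headlines, labels, and function names are made up for the example and are not the project's actual dataset or code. In practice a library implementation (for example scikit-learn's `MultinomialNB`) would likely be used instead.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def train_naive_bayes(docs, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(text, model):
    """Return the class with the highest log-probability for the text."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # words never seen in training are ignored
            # Add-one (Laplace) smoothing avoids zero probabilities.
            likelihood = (word_counts[label][token] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy data, for illustration only.
train_docs = [
    "gargling salt water cures coronavirus",
    "vitamin oil cures coronavirus",
    "health officials recommend washing hands",
    "officials recommend wearing masks",
]
train_labels = ["FAKE", "FAKE", "REAL", "REAL"]
test_docs = ["vinegar cures coronavirus", "officials recommend masks"]

model = train_naive_bayes(train_docs, train_labels)
predictions = [predict(doc, model) for doc in test_docs]
print(predictions)
```

Each test headline is assigned to the class under which its words are most probable, which is the "group with similar characteristics" idea in code form.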
The naïve Bayes classifier estimates how likely a news item is to be fake, which in turn can help stop misinformation. Every word is given a weight: the least important words receive the lowest weights and the most important words the highest. A TF-IDF vectorizer is used to count the occurrences of each word and the number of unique words, and at the same time to allot a weight to every word. Unimportant words are not taken into consideration; only the important words are matched against the dataset. This approach helps assess the credibility of a news item.
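To make the word-weighting idea concrete, here is a small sketch of TF-IDF in plain Python. It assumes the textbook form (term frequency multiplied by the log of inverse document frequency); the sample sentences and the `tfidf` helper are hypothetical. Library vectorizers such as scikit-learn's `TfidfVectorizer` use a smoothed and normalized variant, so their exact numbers will differ.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {word: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights


# Hypothetical sample sentences, for illustration only.
docs = [
    "the virus spreads through the air",
    "the vaccine stops the virus",
    "the lions roam the streets",
]
weights = tfidf(docs)

for doc, doc_weights in zip(docs, weights):
    ranked = sorted(doc_weights.items(), key=lambda item: item[1], reverse=True)
    print(doc, "->", [(word, round(weight, 3)) for word, weight in ranked])
```

A word that appears in every document (here "the") gets a weight of zero and so contributes nothing, while a word unique to one document (such as "lions") gets a comparatively high weight.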
Challenges we ran into
One of the key challenges was procuring a good dataset pertaining to Covid19 misinformation. The partnered platforms, viz. Google Cloud, AWS and Ascend, did not have the kind of information we were looking for, and it took a lot of effort to find a suitable dataset. Procuring the right data is key to a better assessment, and in this case it was a big challenge. A more detailed list of misinformation could have helped the model learn better and reach a higher accuracy.
Accomplishments that we are proud of
This is our first attempt at building a model that can help detect fake news and misinformation pertaining to Covid19, and we are proud of what we have achieved so far. We will continue to evolve the existing model as more varied datasets become available and allow for a better design.
What's next for COVID19 - Fake News Analyzer
As we obtain a more detailed misinformation repository pertaining to Covid19, we intend to evolve the model for better analysis and to carve out any other hidden or meaningful information that can be extracted. Additionally, we wish to build a Credibility Score for information pertaining to Covid, which can help in analyzing whether a news item is FAKE or REAL. We also wish to extend this analysis from text to visual content and social media, viz. Twitter.