Introduction

Many people are misled by misinformation and fake news, particularly on social media. According to [1], as many as 62% of U.S. adults consume news on social media, so the ability to identify fake content in online sources is a pressing need. Automatic fake news detection has accordingly attracted growing attention in academia and industry. However, existing detectors are typically embedded in other applications such as Messenger, leaving a gap for a lightweight, user-friendly, standalone web application for fake news detection. We developed FATECTOR, a chatbot-style web application for automatic fake news detection whose machine learning pipeline is based on [2].

Technical Background

FATECTOR evaluates potential fake news along four dimensions: punctuation, psycholinguistics, readability, and syntax.

- Punctuation: both [3] and [4] suggest that the use of punctuation can help differentiate deceptive from truthful texts.
- Psycholinguistics: LIWC [5] is built on large lexicons of word categories that represent psycholinguistic processes, as well as part-of-speech categories; [6] showed that LIWC is a valuable tool for deception detection.
- Readability: FATECTOR computes six readability metrics, including Flesch-Kincaid, Flesch Reading Ease, Gunning Fog, and the Automated Readability Index.
- Syntax: FATECTOR builds context-free grammar (CFG) parse trees for the input and extracts features from syntactic production patterns.
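As a concrete illustration, here is a minimal sketch of the punctuation and readability feature families, assuming the textstat package. The function name extract_features and the punctuation set are our own choices, and the LIWC and CFG features are omitted here (LIWC requires its licensed lexicons).

```python
# Minimal sketch of the punctuation and readability features (our own
# illustration; the LIWC and CFG feature families are omitted).
from collections import Counter

import textstat  # pip install textstat

PUNCT = list(".,;:!?\"'()-")

def extract_features(text: str) -> dict:
    # Punctuation: relative frequency of each punctuation mark.
    counts = Counter(ch for ch in text if ch in PUNCT)
    n = max(len(text), 1)
    features = {f"punct_{p}": counts[p] / n for p in PUNCT}
    # Readability: four of the six metrics FATECTOR evaluates.
    features.update({
        "flesch_reading_ease": textstat.flesch_reading_ease(text),
        "flesch_kincaid_grade": textstat.flesch_kincaid_grade(text),
        "gunning_fog": textstat.gunning_fog(text),
        "automated_readability_index": textstat.automated_readability_index(text),
    })
    return features
```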

Application Usage

FATECTOR provides a chatbot-style web interface: users simply follow the chatbot's prompts and enter plain text. FATECTOR then returns a fake-degree score, produced by a model pre-trained on fakeNewsDataset [2], along with feedback on which classes of clues most influenced the score.
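To make the scoring and feedback concrete, below is a self-contained sketch of how a linear classifier, in the spirit of [2], can yield both a score and per-clue-class feedback. The toy training data, the one-feature-per-class layout, and all names here are our own illustration, not FATECTOR's actual model.

```python
# Sketch: a linear classifier over the four clue classes, with per-class
# contributions read off the learned weights. Toy data and the
# one-feature-per-class layout are placeholders, not the real model.
import numpy as np
from sklearn.svm import LinearSVC

CLUE_CLASSES = ["punctuation", "psycholinguistic", "readability", "syntax"]

rng = np.random.default_rng(0)
X = rng.random((200, 4))              # placeholder training features
y = rng.integers(0, 2, size=200)      # 0 = legitimate, 1 = fake

clf = LinearSVC().fit(X, y)

x = rng.random(4)                     # features of the user's input text
fake_degree = clf.decision_function([x])[0]
contribs = clf.coef_[0] * x           # how much each clue class pushes the score
for name, c in sorted(zip(CLUE_CLASSES, contribs), key=lambda t: -abs(t[1])):
    print(f"{name}: {c:+.3f}")
print(f"fake degree: {fake_degree:+.3f}")
```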

Challenges

From the algorithm perspective, we replicated the core algorithm of [2] and encapsulated it in a Python module for web developers. Replicating a top-tier NLP paper in 36 hours is quite challenging. From the development perspective, effective communication between the front end and back end was critically important.
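For the front-end/back-end interface, a minimal sketch of the kind of HTTP contract involved is shown below, assuming a Flask server. The /score route, JSON field names, and the predict() stub are illustrative assumptions, not FATECTOR's actual API.

```python
# Minimal sketch of a back-end contract, assuming Flask. Route, JSON
# fields, and the predict() stub are illustrative, not the real API.
from flask import Flask, jsonify, request

app = Flask(__name__)

def predict(text: str) -> dict:
    # Stub: in FATECTOR this would call the encapsulated classifier module.
    return {"score": 0.5, "top_clues": [["readability", 0.41]]}

@app.route("/score", methods=["POST"])
def score():
    text = request.get_json(force=True).get("text", "")
    return jsonify(predict(text))

if __name__ == "__main__":
    app.run(port=5000)
```

Fixing a small JSON contract like this early lets the chatbot front end and the scoring module be developed in parallel.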

Accomplishments

We managed to build the website and the core underlying algorithm with fairly good accuracy. The project can even be used for fun: any sentence, whether taken from the media or written by yourself, can serve as input for fake news detection.

Future Work

We would like to further train the model on various deception datasets, so that its accuracy can approach or even surpass human performance. We could also add more functions to the chatbot, for example suggesting highly rated relevant news articles when the input text scores poorly. The chatbot is currently template-based; future work also includes a neural text generation model for the chatbot, which could make this project even more fun.

Acknowledgements

We gratefully thank the Language and Information Technologies group of the University of Michigan, Department of EECS, for providing software support for LIWC and technical guidance. We also thank the anonymous staff of MHacks 2018; without their effort, we could not have built this work. For any questions regarding this project, please contact zxycarol@umich.edu.

References

[1] Victoria L. Rubin, Niall J. Conroy, Yimin Chen, and Sarah Cornwell. 2016. Fake news or truth? Using satirical cues to detect potentially misleading news. In Proceedings of NAACL-HLT, pages 7–17.
[2] Verónica Pérez-Rosas, Bennett Kleinberg, Alexandra Lefevre, and Rada Mihalcea. 2018. Automatic detection of fake news. In Proceedings of COLING.
[3] Victoria L. Rubin, Yimin Chen, and Niall J. Conroy. 2015. Deception detection for news: Three types of fakes. Proceedings of the Association for Information Science and Technology, 52(1):1–4.
[4] Myle Ott, Yejin Choi, Claire Cardie, and Jeffrey Hancock. 2011. Finding deceptive opinion spam by any stretch of the imagination. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, HLT '11, pages 309–319, Stroudsburg, PA, USA. Association for Computational Linguistics.
[5] James W. Pennebaker, Ryan L. Boyd, Kayla Jordan, and Kate Blackburn. 2015. The development and psychometric properties of LIWC2015. Technical report.
[6] Myle Ott, Claire Cardie, and Jeffrey T. Hancock. 2013. Negative deceptive opinion spam. In Proceedings of HLT-NAACL, pages 497–501.

--Tianyi Wu, Chaoyi Shen, Yimin Wang, Xinyi Zheng
