PCRs (Press Credibility and Readability scoring)

A demonstration of credibility and readability scoring in a Chrome extension for a news article.

Inspiration

Our team wanted to help improve the average level of health literacy in our community.

What it does

Our Chrome extension gives non-clinical receptionists who work in health care settings (physicians’ offices, hospitals, virtual medical services, etc.) a way to direct patients to credible, easy-to-understand health education resources.

How we built it

Criteria for credibility
Each credibility score consists of two sub-scores, one for accuracy and one for neutrality, each out of 5.
Sampling: news outlets and corporate sources were randomly selected, and COVID-19-related news articles were also randomly selected. The COVID-19 topics covered are vaccine boosters, masking guidelines, and testing guidelines.
Scoring: scores were assigned manually by two pharmacy students who read through the articles. The final score is the average of the two students' scores.
      How we define accuracy: we compare the content of each news article against CDC guidelines, which we chose as our standard. Scores range from 1 (least accurate) to 5 (most accurate), and each piece of misinformation costs one point.
      How we define neutrality: we subjectively judge whether the information is provocative. Scores range from 1 (least neutral) to 5 (most neutral), and each provocative statement deducts one point.
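The manual rubric can be written as a small calculation. This is a minimal sketch, assuming each rater starts from 5 and loses one point per flagged item; clamping at 1 is our assumption, since the rubric only says 1 is the lowest score. The function names are hypothetical.

```python
def rubric_score(deductions: int, max_score: int = 5, min_score: int = 1) -> int:
    """Start at max_score and subtract one point per flagged item.
    Clamping at min_score is an assumption, not stated in the rubric."""
    return max(min_score, max_score - deductions)


def credibility_scores(rater_counts):
    """rater_counts: one (misinformation_count, provocative_count) pair per rater.
    Returns the accuracy and neutrality scores averaged across raters."""
    accuracy = [rubric_score(misinfo) for misinfo, _ in rater_counts]
    neutrality = [rubric_score(provocative) for _, provocative in rater_counts]
    return {
        "accuracy": sum(accuracy) / len(accuracy),
        "neutrality": sum(neutrality) / len(neutrality),
    }


# Two raters: the first flags 1 misinformation item and 0 provocative statements,
# the second flags 2 and 1.
print(credibility_scores([(1, 0), (2, 1)]))  # {'accuracy': 3.5, 'neutrality': 4.5}
```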

Criteria for readability
We based our readability score on the Flesch-Kincaid (F-K) readability tests. It consists of an F-K score, the Flesch Reading Ease score, and an F-K grade level. The scores depend on the average number of words per sentence (total words divided by total sentences) and the average number of syllables per word (total syllables divided by total words). The grade level indicates that the text should be understandable to readers who have completed at least that U.S. school grade.
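For reference, the two standard Flesch-Kincaid formulas combine these counts as follows. A higher Reading Ease means easier text, while the grade level approximates a U.S. school grade.

```latex
\text{Reading Ease} = 206.835 - 1.015\left(\frac{\text{total words}}{\text{total sentences}}\right) - 84.6\left(\frac{\text{total syllables}}{\text{total words}}\right)

\text{Grade Level} = 0.39\left(\frac{\text{total words}}{\text{total sentences}}\right) + 11.8\left(\frac{\text{total syllables}}{\text{total words}}\right) - 15.59
```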

How to calculate credibility
      Recurrent neural networks (RNNs) are very effective for natural language processing and other sequence tasks.
      LSTM (long short-term memory) is a recurrent neural network architecture used in deep learning. It reads a sequence input one element at a time and keeps a cell state that stores contextual information from earlier time steps and passes it on to later steps.
      Our model is bidirectional: it reads contextual information from both the beginning and the end of the text sequence.
      An embedding layer gives a featurized (dense vector) representation of the words, which can capture correlations between words.
      A dropout layer with a 20% dropout rate is added after each dense layer to prevent overfitting.
      A softmax function generates the output probability for each of the 5 score levels, and the final prediction is the class with the highest probability.
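To make the layer stack concrete, here is a minimal NumPy sketch of the forward pass only: embedding, forward and backward LSTMs, a dense layer with 20% dropout, and a 5-way softmax. It is not our trained model. The weights are random and untrained, and the layer sizes, the ReLU activation, the gate ordering, and all names (`BiLSTMScorer`, `lstm_final_state`, etc.) are hypothetical choices for illustration. In practice a framework such as Keras would provide these layers and the training loop.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def init_lstm(rng, input_dim, hidden):
    # Weights for the four gates (input, forget, candidate, output), stacked column-wise.
    return (rng.normal(0, 0.1, (input_dim, 4 * hidden)),
            rng.normal(0, 0.1, (hidden, 4 * hidden)),
            np.zeros(4 * hidden))


def lstm_final_state(xs, W, U, b):
    """Read xs (T x D) one step at a time and return the last hidden state."""
    hidden = U.shape[0]
    h = np.zeros(hidden)
    c = np.zeros(hidden)  # cell state: carries context from earlier time steps
    for x in xs:
        z = x @ W + h @ U + b
        i = sigmoid(z[:hidden])
        f = sigmoid(z[hidden:2 * hidden])
        g = np.tanh(z[2 * hidden:3 * hidden])
        o = sigmoid(z[3 * hidden:])
        c = f * c + i * g
        h = o * np.tanh(c)
    return h


class BiLSTMScorer:
    """Hypothetical, untrained: embedding -> BiLSTM -> dense -> dropout -> softmax."""

    def __init__(self, vocab_size, embed_dim=16, hidden=8, dense_dim=16, n_classes=5, seed=0):
        self.rng = np.random.default_rng(seed)
        self.embedding = self.rng.normal(0, 0.1, (vocab_size, embed_dim))
        self.forward_lstm = init_lstm(self.rng, embed_dim, hidden)
        self.backward_lstm = init_lstm(self.rng, embed_dim, hidden)
        self.W_dense = self.rng.normal(0, 0.1, (2 * hidden, dense_dim))
        self.b_dense = np.zeros(dense_dim)
        self.W_out = self.rng.normal(0, 0.1, (dense_dim, n_classes))
        self.b_out = np.zeros(n_classes)
        self.dropout_rate = 0.2

    def predict_proba(self, token_ids, training=False):
        xs = self.embedding[token_ids]                           # (T, embed_dim)
        h_fwd = lstm_final_state(xs, *self.forward_lstm)         # reads start -> end
        h_bwd = lstm_final_state(xs[::-1], *self.backward_lstm)  # reads end -> start
        h = np.concatenate([h_fwd, h_bwd])
        d = np.maximum(0.0, h @ self.W_dense + self.b_dense)     # ReLU dense layer
        if training:
            # Inverted dropout: zero out ~20% of units and rescale the rest.
            keep = self.rng.random(d.shape) >= self.dropout_rate
            d = d * keep / (1.0 - self.dropout_rate)
        return softmax(d @ self.W_out + self.b_out)

    def predict_score(self, token_ids):
        # Classes 0..4 map to scores 1..5; pick the most probable class.
        return int(np.argmax(self.predict_proba(token_ids))) + 1
```

Usage: `BiLSTMScorer(vocab_size=100).predict_score([3, 14, 15, 92])` returns an integer from 1 to 5. With random weights the number is meaningless until the model is trained on scored articles.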

How to calculate readability
Readability scores are generated from the formulas proposed by Flesch and Kincaid (F-K).
      See: https://en.wikipedia.org/wiki/Flesch–Kincaid_readability_tests
Specifically, they are the F-K score and the F-K grade level.
      Source code: https://github.com/cdimascio/py-readability-metrics
      Input: a string containing the content to be assessed
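The linked library performs this computation for us. To show what happens under the hood, here is a minimal, dependency-free sketch, assuming a crude vowel-group syllable counter and sentence splitting on `.`, `!` and `?`. Its numbers will not exactly match the library's, and the function names are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count runs of vowels, then drop a silent trailing 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> dict:
    """Compute Flesch Reading Ease and F-K grade level for a plain-text string."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return {
        "flesch_reading_ease": 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word,
        "fk_grade_level": 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59,
    }


print(readability("The cat sat on the mat."))
```

For the one-sentence example (6 one-syllable words), this gives a Reading Ease of about 116.1 and a grade level of about -1.45. Very short, simple sentences fall below grade 0 on this scale.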

Challenges we ran into

It was the first hackathon for all of us.

Two of us had limited coding experience, and two had none.

We were not able to collect a large enough sample.

Our software could not distinguish the article text on a web page from other content.

We could only score purpose (bias) subjectively.

We had no experience in front-end and back-end development or in building machine learning models.

We could only score the news articles manually.

Accomplishments that we're proud of

We have a good idea :-)
We have a prototype for future automation.

What we learned

We learned a lot about the CDC's COVID-19 guidelines, gained coding experience (including beginner-level JavaScript), and learned about BioBERT. We also learned from each other.

What's next for PCRs (Press Credibility and Readability scoring)

Refine categories for news entities and news article types
Refine criteria for assessing accuracy and purpose (neutrality)
Refine criteria for assessing readability
Collect more samples to build a database for training the deep learning model to predict scores for news articles
Optimize the current model code
Improve recognition of the article text within a website
Add functionality to pop up automatically when the user visits an article
Add functionality to get the article's URL for automated computation of scores
Design and add icons for better aesthetics
learning.html is the popup window that makes up the Chrome extension.
Front-end development: add JavaScript (.js) files to automate computation of scores (this requires bridging the gap between the Python and JavaScript modules)
Improve Chrome extension aesthetics
Automate converting a web page into a text string. This is currently done manually because recommendations, ads, and other unrelated content must be excluded.
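One possible starting point for automating the web-page-to-string step is sketched here using only Python's standard-library `html.parser`. It assumes, hypothetically, that the article body sits in `<p>` tags inside an `<article>` element and that ads and recommendations live in tags such as `<aside>` or `<nav>`. Real news sites vary, so this heuristic would need per-site tuning. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class ArticleTextExtractor(HTMLParser):
    # Assumption: article body = <p> text inside <article>; these tags hold unrelated content.
    SKIP_TAGS = {"script", "style", "nav", "aside", "footer", "header", "form"}

    def __init__(self):
        super().__init__()
        self.article_depth = 0  # how many <article> elements we are inside
        self.skip_depth = 0     # how many skipped elements we are inside
        self.in_paragraph = False
        self.buffer = []
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.article_depth += 1
        elif tag in self.SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "p":
            self.in_paragraph = True
            self.buffer = []

    def handle_endtag(self, tag):
        if tag == "article" and self.article_depth:
            self.article_depth -= 1
        elif tag in self.SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag == "p" and self.in_paragraph:
            self.in_paragraph = False
            text = " ".join("".join(self.buffer).split())  # collapse whitespace
            if text:
                self.paragraphs.append(text)

    def handle_data(self, data):
        # Keep text only inside a paragraph, inside an article, outside skipped tags.
        if self.in_paragraph and self.article_depth and not self.skip_depth:
            self.buffer.append(data)


def extract_article_text(html: str) -> str:
    parser = ArticleTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.paragraphs)
```

The resulting string could then be passed as the input to the readability and credibility scoring steps.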

Updates

Feiran Liang started this project — Oct 24, 2021 04:49 AM EDT
