Team Dental Clinic

Title: Dental Care Recommendation System

Team members: Yijing Gao (ygao98), Yutong Liu (yliu381), Qingyan Guo (qguo11), Wangsan Tian (wtian8)

Team Name: Dental Clinic

0. Links

1. Introduction

The team aims to solve a new problem by applying deep learning to develop a recommender system for dental care. The topic is based on an existing research paper: Development of a Recommender System for Dental Care Using Machine Learning (Hung et al., 2019). That paper builds a model for dental care recommendations based on individualized needs, using machine learning classification algorithms (SVM, random forest, k-nearest neighbour, CART). The model classifies the need to see a dentist into three categories based on the level of urgency. Our group designed and implemented multiple deep learning networks to replace the original methods, aiming for better performance.

2. Related Work

Development of a Recommender System for Dental Care Using Machine Learning: The authors used five machine learning models (random forests, CART, KNN, SVM, and logistic regression) to predict the overall recommendation for care, and used LASSO to select the top 8 features associated with the recommendation for dental care. URL
Exploring the Intersection between Social Determinants of Health and Unmet Dental Care Needs Using Deep Learning: The authors used machine learning models to determine the top predictors of unmet dental care needs and to build a risk prediction model that identifies people with unmet dental care needs. They also proposed deep learning models with five sequential blocks to predict the outcome. URL

3. Data

We use data sets from the CDC’s National Health and Nutrition Examination Survey (NHANES), https://wwwn.cdc.gov/nchs/nhanes/. NHANES is a program designed to assess the health and nutritional status of adults and children in the United States. The data sets contain physical examination and questionnaire data, covering multiple aspects including interviewees' demographics and current health conditions. In our study, we focus mainly on the oral health-related data, which range from 2011 to 2017 and contain approximately 40,000 samples.

4. Methodology

For our project, we use deep learning methods to replace the original machine learning methods. We ran a comparative study of fully connected neural networks with and without re-sampling, a CNN with reshaping, a CNN with tab2img, and TabNet to find a more efficient predictive model for dental care recommendations than previous contributions. We investigated the importance of each feature in the provided data sets with LASSO regression, and tried to use the selected features to interpret our deep learning models and to find the most influential factors regarding dental health conditions.
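To illustrate the "CNN with reshaping" idea, here is a minimal sketch of turning tabular rows into small single-channel images that a convolutional layer could consume. The write-up does not specify how the reshaping was done, so the zero-padding scheme, the square grid, and the function name `tabular_to_images` are all assumptions made for illustration; tab2img and TabNet are separate approaches and are not shown here.

```python
import math

import numpy as np


def tabular_to_images(X, side=None):
    """Zero-pad each feature vector and reshape it into a (side, side, 1) grid.

    Assumption: features are placed row by row in their original column order.
    This is one simple layout, not necessarily the one used in the project.
    """
    X = np.asarray(X, dtype=np.float32)
    n_samples, n_features = X.shape
    if side is None:
        # Smallest square grid that can hold every feature (ceil of sqrt).
        side = math.isqrt(n_features - 1) + 1
    if side * side < n_features:
        raise ValueError("grid is too small to hold all features")
    padded = np.zeros((n_samples, side * side), dtype=np.float32)
    padded[:, :n_features] = X
    # Trailing axis of size 1 is the channel dimension expected by most CNN layers.
    return padded.reshape(n_samples, side, side, 1)


# Hypothetical data: 2 samples with 10 features each.
X_demo = np.arange(20).reshape(2, 10)
images = tabular_to_images(X_demo)
print(images.shape)
```

With 10 features the smallest square grid is 4 x 4, so each sample carries 6 zero-padded cells. Because neighbouring cells in this layout have no inherent spatial meaning, the column order is a design choice that may affect what the convolution filters can learn.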

5. Metrics

First, we will clean the data to handle missing values and run exploratory data analysis to examine the correlation between pairs of variables, since we have a large number of features. Second, the dataset will be split into training and test sets. After training the models based on the two architectures above, we will evaluate them on the test set. For this project, micro-average accuracy and F1 score will be used to assess model performance. Since the target variable (overall recommendation for care) has 4 classes and the classes are imbalanced, we prefer micro-averaging, which aggregates the contributions of all classes to compute the average metric. The baseline model predicts the majority class of the target variable for every sample. We define success by two outcomes: our two model architectures perform better than the baseline model, and both models can predict the minority class (see a dentist immediately) in the target variable.

Our goals for the project are:
Base goal: Implement two deep learning models and the baseline model
Target goal: Feature selection
Stretch goal: Model interpretability for explaining the predictions
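As an illustration of the metric and baseline described in this section, the following sketch computes a majority-class baseline, the micro-averaged F1 score, and per-class recall in plain Python. The function names and the toy labels (classes 0 to 3, with class 3 standing in for the rare "see a dentist immediately" class) are hypothetical and are not taken from the project code.

```python
from collections import Counter


def majority_baseline(train_labels, n_predictions):
    """Predict the most frequent training label for every test sample."""
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    return [majority_label] * n_predictions


def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP, and FN over all classes, then compute F1."""
    tp = fp = fn = 0
    for label in set(y_true) | set(y_pred):
        for t, p in zip(y_true, y_pred):
            if p == label and t == label:
                tp += 1
            elif p == label:
                fp += 1
            elif t == label:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def recall_for_class(y_true, y_pred, label):
    """Fraction of samples of the given class that were predicted correctly."""
    total = sum(1 for t in y_true if t == label)
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    return hits / total if total else 0.0


# Hypothetical labels with an imbalanced distribution.
y_train = [0, 0, 0, 1, 2, 0, 3]
y_test = [0, 0, 1, 2, 0, 3]

baseline_pred = majority_baseline(y_train, len(y_test))
print(micro_f1(y_test, baseline_pred))             # 0.5
print(recall_for_class(y_test, baseline_pred, 3))  # 0.0
```

For single-label multi-class predictions, the micro-averaged F1 score equals plain accuracy, so the baseline can score reasonably well while never predicting the minority class. Reporting per-class recall alongside the micro-average makes the second success criterion (predicting the minority class) directly checkable.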

6. Ethics

There have been some successful examples of using machine learning models to improve oral health delivery and outcomes. Our data ranges from ordinary demographic data to more complicated questionnaire data, which means basic machine learning models may be poorly suited to the volume, complexity, and uncertainty of the data. Dentistry is one area of medicine that can benefit from deep learning. Deep learning can help accurately assess how urgently a person should see a dentist. Our methods could help address both the public health burden and the financial impact, as the models can identify the impact of evidence-based community interventions.

We use the data directly from the CDC’s National Health and Nutrition Examination Survey (NHANES). The data is collected through a series of interviews, questionnaires, and health examinations to gain information on lifestyle, diet, overall health status, socioeconomic status, and demographics. All data is de-identified, which greatly reduces concerns about data privacy and data leakage. Because of the way the data is collected, it represents several different kinds of information. To better construct our deep learning models, we need to select reasonable and suitable data both manually and automatically to make the data more targeted. We expect that training the deep learning models on this more representative data will yield better results.

7. Division of labor

Dataset collection: Wangsan Tian
Dataset preprocessing: Qingyan Guo, Yijing Gao, Yutong Liu
Model training and testing for fully connected neural networks: Yutong Liu, Wangsan Tian
Model training and testing for CNNs: Qingyan Guo, Yijing Gao
Model training and testing for TabNet: Wangsan Tian
Explain the predictions of the classifier: Yijing Gao, Yutong Liu, Qingyan Guo, Wangsan Tian