Paediatric Bone Age Assessment Using Deep Convolutional Neural Networks
Who? Team Bones!
Joel Kim (jkim631), Mitchell (msalomo2), Andrew Xu (axu49)
Introduction
The paper we chose implements a deep learning approach for the 2017 Pediatric Bone Age Challenge organized by the Radiological Society of North America (RSNA). We chose it for its detailed description of its preprocessing and its use of the VGG architecture, which will also let us compare the performance of VGG against ResNet. The purpose of the model is to predict bone age, in months, from a radiological image, which makes this a regression problem.
Related Work
Summarizing takeaways from this paper:
The rationale for posing this problem as a competition was to spur innovation in building the best machine learning model for a common dataset and to promote research in the area. Interestingly, a model that combined the 2nd-place and 4th-place teams' solutions outperformed the first-place solution, which shows the value of collaboration and of sharing work openly. The top-performing teams in the original challenge had several things in common: most used deep learning, some form of data augmentation to enlarge the dataset (such as flipping the images or adding noise), preprocessing (breaking the image up into subcomponents), and a combination of multiple algorithms. The margins between the top five solutions were also very small, which suggests we are approaching the limits of either the current "atlas" methods of estimating bone age or current deep learning practices in this domain.
Living list of public implementations:
https://github.com/neuro-inc/ml-recipe-bone-age
https://www.kaggle.com/code/kmader/attention-on-pretrained-vgg16-for-bone-age
https://www.kaggle.com/datasets/kmader/rsna-bone-age/code
List of other related research papers:
https://pubs.rsna.org/doi/10.1148/radiol.2018180736
https://pubs.rsna.org/doi/10.1148/radiol.2018182657
https://pubs.rsna.org/doi/10.1148/radiol.2017170236
Data
Our data comes from the Radiological Society of North America's 2017 AI Challenge. It consists of 12,611 training images, 1,425 validation images, and 200 test images. The sample is split roughly 50/50 between male and female patients, with a mean age of 11-12 years. The task is to predict a patient's skeletal age in months from the radiograph, with ages ranging from 0 to 216 months. The data has been pre-annotated (labeled) by radiologists.
Methodology
We will use ResNet-50 as our model and train it on the 12,611 training images. Because the ResNet-50 architecture is already defined, this simplifies the model design process and lets us compare the performance of VGG-16 and ResNet-50 more directly. The most challenging part of implementing this model will likely be tuning hyperparameters such as the learning rate, batch size, or activation function.
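As a rough illustration of the setup described above, the sketch below attaches a single-output regression head to the ResNet-50 backbone that ships with Keras. It is not our final model. The 224x224x3 input shape, the global average pooling, the Adam optimizer, and the default learning rate are all assumptions made for the example, and `build_bone_age_model` is a hypothetical helper name.

```python
import tensorflow as tf


def build_bone_age_model(input_shape=(224, 224, 3), learning_rate=1e-4):
    """Hypothetical helper: ResNet-50 backbone with one linear output (months)."""
    # weights=None trains from scratch; weights="imagenet" would load
    # pretrained weights instead. Input size and pooling are assumptions.
    backbone = tf.keras.applications.ResNet50(
        include_top=False,
        weights=None,
        input_shape=input_shape,
        pooling="avg",
    )

    inputs = tf.keras.Input(shape=input_shape)
    features = backbone(inputs)
    # A single linear unit, because bone age is a continuous target.
    outputs = tf.keras.layers.Dense(
        1, activation="linear", name="bone_age_months"
    )(features)

    model = tf.keras.Model(inputs, outputs, name="resnet50_bone_age")
    # Optimizing MAE directly keeps the loss in the same units as our metric.
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss="mae",
    )
    return model


if __name__ == "__main__":
    build_bone_age_model().summary()
```

The learning rate is exposed as an argument because it is one of the hyperparameters we expect to tune. The batch size would be set later, in the call to `model.fit`.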
Metrics
We intend to use the CNN to perform regression on the images and predict bone age from a radiograph. Accuracy is measured as mean absolute error (MAE), in months. For the project to be successful, we aim to achieve an MAE of around 36 months (base goal). Our target goal is an MAE of 12 months. If things go very well, we hope to match the results of the paper's authors, with an MAE of around 4-5 months (stretch goal).
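To make the metric concrete, here is a minimal pure-Python sketch of how MAE could be computed and compared against our three goals. The sample ages are made up for illustration, `goal_tier` is a hypothetical helper, and using 5 months as the stretch threshold is an assumption based on the "around 4-5" figure above.

```python
def mean_absolute_error(true_months, predicted_months):
    """Average absolute difference between true and predicted ages (months)."""
    if len(true_months) != len(predicted_months) or not true_months:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(t - p) for t, p in zip(true_months, predicted_months))
    return total / len(true_months)


def goal_tier(mae_months):
    """Hypothetical helper: map an MAE to the best goal it satisfies."""
    if mae_months <= 5:    # stretch goal (assumed threshold)
        return "stretch"
    if mae_months <= 12:   # target goal
        return "target"
    if mae_months <= 36:   # base goal
        return "base"
    return "none"


if __name__ == "__main__":
    # Made-up example values, not real results.
    true_ages = [120, 60, 180, 36]
    predicted_ages = [110, 72, 175, 40]
    mae = mean_absolute_error(true_ages, predicted_ages)
    print(f"MAE: {mae:.2f} months, goal reached: {goal_tier(mae)}")
```

In this toy example the absolute errors are 10, 12, 5, and 4 months, so the MAE is 7.75 months, which would meet the target goal but not the stretch goal.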
Ethics
What is your dataset? Are there any concerns about how it was collected, or labeled? Is it representative? What kind of underlying historical or societal biases might it contain?
The dataset consists of a total of 14,236 X-ray images of children's left hands. Each image was manually labeled with a skeletal age in months. The dataset is already divided into training, validation, and testing sets.
Our dataset was developed by groups from Stanford University and the University of Colorado as part of a competition organized by RSNA (Radiological Society of North America). The actual data comes from Lucile Packard Children's Hospital (right next to Stanford) and Children's Hospital Colorado in Aurora, CO. Thus, the dataset is likely representative only of the populations in Palo Alto, CA and Aurora, CO. Interestingly, according to the US Census, Palo Alto's median household income was $194,782 in 2021, which is significantly higher than the median household income of the United States ($70,784). The city also has a lower poverty rate than the national average (5% vs 11.6%). This could have a big impact on the data, as the patients whose data were collected are likely to have access to high-quality healthcare and good nutrition, both of which affect bone growth. Thus, any neural network trained on this data may only perform well on similarly affluent demographics. Hopefully this effect is lessened by the inclusion of data from Aurora, Colorado, whose demographics are much closer to the US average (for example, its median income is $72,052).
We place relatively high confidence in how the data was labeled, as the labels were derived from a combination of clinical radiology reports and the assessments of six expert pediatric radiologists. However, it seems patient consent for the data was "waived", which might be a concern.
Who are the major “stakeholders” in this problem, and what are the consequences of mistakes made by your algorithm?
This problem primarily affects pediatric radiologists and endocrinologists, along with their patients. Chronological age (the years since birth) and bone age (the years corresponding to the maturation of the bones) are distinct. In medicine, bone age is important for evaluating impaired or accelerated growth, early or delayed puberty, and short or tall stature. These evaluations can assist in diagnosis: delayed bone age, for example, is common in malnourished conditions associated with chronic diseases such as celiac disease or cystic fibrosis. Delayed bone age may also indicate psychiatric conditions such as anorexia, or psychosocial stress or abuse. Outside medicine, bone age assessment is often required in immigration programs, where it is especially important for protecting children who immigrate without parents or identifying documents. Given how much depends on an accurate bone age, mistakes by our algorithm could hinder diagnosis or identification, even though bone age estimates are rarely used without other corroborating information.
Deliverables
Built With
- python
- tensorflow