Brain Tumor Imaging Software

Poster Submission

Title: Summarizes the main idea of your project.

"Enhancing Tumor Detection with ResNet: Developing Advanced Recognition Software for Medical Imaging"

Who: Names and logins of all your group members.

Manuel Dal Bo (mdalbo), Nathan Kim (nykim), Priyam Parekh (paparekh)

Introduction: What problem are you trying to solve and why?

The detection and diagnosis of tumors pose significant challenges in the medical field, impacting patient outcomes and healthcare efficiency. Tumors and other medical abnormalities, if not identified early and accurately, can lead to severe consequences, including delayed treatment and an increased mortality rate. Traditional methods of tumor detection rely heavily on the expertise of radiologists who interpret medical imaging studies such as MRI, CT scans, and X-rays. However, these methods can be subject to human error and variability, leading to cases where tumors go undetected or are misdiagnosed. To address this critical issue, our project aims to harness the power of advanced machine learning technologies, specifically a modified ResNet architecture combined with additional convolutional layers and using regression, to develop software capable of identifying tumors with high accuracy and consistency. By integrating this technology into the diagnostic process, we strive to enhance detection capabilities, reduce the likelihood of human error, and ultimately provide a reliable tool that supports medical professionals in making more informed decisions.

If you are implementing an existing paper, describe the paper’s objectives and why you chose this paper.

The paper’s objective was to show that a neural network could detect tumors with high accuracy. We chose it for several reasons: we were interested in implementing a medical imaging diagnosis model, good data seemed relatively easy to find, and it is immediately clear how such a model could be used to make the world a better place and save lives. We settled on this paper and on brain tumors in particular because the paper lays out its model very clearly, and the data is clean and easy to access.

What kind of problem is this? Classification? Regression? Structured prediction? Reinforcement Learning? Unsupervised Learning?

This is a binary classification problem. We are trying to classify whether or not the patient in the MRI image has a brain tumor.

Related Work: Are you aware of any, or is there any prior work that you drew on to do your project?

A fair number of publications tackle diagnosis based on MRI images. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9854739/ describes many of the recent advances in this field.

Please read and briefly summarize at least one paper/article/blog relevant to your topic beyond the paper you are re-implementing/novel idea you are researching.

The article "Scientists employ AI to predict brain cancer outcomes" from the Stanford Medicine News Center describes new ways in which neural networks are being used to turn MRI images into 3D representations of the cells. This new information can then be fed into another neural network to predict whether the cells are linked to more or less favorable cancer outcomes. This is a somewhat more complicated model than ours, since it also has to encode the MRI images into something with more spatial awareness. That being said, it is another way of using deep learning to hopefully save people’s lives.

In this section, also include URLs to any public implementations you find of the paper you’re trying to implement. Please keep this as a “living list”–if you stumble across a new implementation later down the line, add it to this list.

https://www.sciencedirect.com/science/article/pii/S0306987720301717?ref=pdf_download&fr=RR-2&rr=872f62e21ee041cd

Data: What data are you using (if any)?

We will be using datasets from Kaggle, an online platform that hosts many datasets in which each image is labeled by whether or not it shows a tumor. For instance: https://www.kaggle.com/datasets/navoneel/brain-mri-images-for-brain-tumor-detection/data?select=yes

How big is it? Will you need to do significant preprocessing?

Most datasets range in size from 100 MB to 1 GB. We would like to combine multiple datasets to cover a wider range of angles, sizes, and tumor types, so we expect a 5-10 GB dataset in the end. We will most likely need some preprocessing to ensure that all images have the same size and are properly formatted.
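
As a minimal sketch of the kind of preprocessing meant here, the following pure-Python example resizes grayscale images of different sizes to one common shape with nearest-neighbor sampling and scales pixel values to [0, 1]. The function names, the 4x4 target size, and the 8-bit intensity range are assumptions made for illustration; an actual pipeline would likely use an image library and a much larger target size.

```python
def resize_nearest(image, out_h, out_w):
    """Resize a 2-D grayscale image (list of rows) with nearest-neighbor sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def scale_to_unit(image, max_value=255):
    """Map pixel intensities in [0, max_value] to floats in [0, 1]."""
    return [[pixel / max_value for pixel in row] for row in image]


def preprocess(image, size=(4, 4)):
    """Resize to a common shape, then normalize intensities."""
    return scale_to_unit(resize_nearest(image, size[0], size[1]))


# Two toy "scans" of different sizes (made-up values, not real MRI data)
# end up with the same shape after preprocessing.
small = [[0, 255], [255, 0]]
large = [[10 * (r + c) for c in range(6)] for r in range(8)]
batch = [preprocess(small), preprocess(large)]
```

After this step every image in `batch` has the same height and width, which is what a fixed-input network requires.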

Methodology: What is the architecture of your model?/How are you training the model?

We will train our model using standard backpropagation and gradient descent. Because this is a binary classification task, binary cross-entropy is a suitable loss function. The ResNet50 layers will make up a large portion of the model, and a series of additional layers is intended to increase accuracy further.
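
To make the training setup concrete, here is a small sketch of binary cross-entropy and gradient descent on a single sigmoid output unit, written in plain Python. It is not the model from the paper: the two-dimensional feature vectors are hypothetical stand-ins for the features a ResNet50 backbone would produce, and the learning rate and epoch count are arbitrary choices for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def train_head(features, labels, lr=0.5, epochs=200):
    """Fit one sigmoid unit on fixed feature vectors by gradient descent."""
    n_features = len(features[0])
    weights, bias = [0.0] * n_features, 0.0
    history = []  # loss measured before each update
    for _ in range(epochs):
        probs = [
            sigmoid(sum(w * x for w, x in zip(weights, row)) + bias)
            for row in features
        ]
        history.append(binary_cross_entropy(labels, probs))
        # For sigmoid + BCE, the gradient w.r.t. the logit is (p - y).
        errors = [p - y for p, y in zip(probs, labels)]
        for j in range(n_features):
            grad_j = sum(e * row[j] for e, row in zip(errors, features)) / len(features)
            weights[j] -= lr * grad_j
        bias -= lr * sum(errors) / len(errors)
    return weights, bias, history


# Hypothetical feature vectors standing in for ResNet50 outputs.
features = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.3]]
labels = [1, 1, 0, 0]  # 1 = tumor, 0 = no tumor
weights, bias, history = train_head(features, labels)
```

The values in `history` should decrease over the epochs. In the full model, the same loss would be backpropagated through the additional layers as well.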

If you are implementing an existing paper, detail what you think will be the hardest part about implementing the model here.

The hardest part of implementing this paper will most likely be building the different layers and connecting them correctly, so that the ResNet50 layers and the additional layers work together to produce an accurate result.

Metrics: What constitutes “success?”

For this project, success means correctly identifying whether a given image shows that cancer is present. Since this is a yes-or-no diagnosis, there are only two possible outcomes. Given that the paper reached 97% accuracy, we would consider an accuracy over 95% a success.

For most of our assignments, we have looked at the accuracy of the model. Does the notion of “accuracy” apply for your project, or is some other metric more appropriate?

Yes, the notion of “accuracy” does apply for our project as we are trying to accurately predict whether a patient has a brain tumor based on their MRI.

If you are implementing an existing project, detail what the authors of that paper were hoping to find and how they quantified the results of their model.

The authors simply aimed to prove that it was possible to get accurate predictions of whether or not somebody had a brain tumor. They quantified their result in a few different ways. The first is plain accuracy. The second is sensitivity, the fraction of actual positive cases that the model correctly predicts as positive. The third is specificity, the fraction of actual negative cases that the model correctly predicts as negative. Other metrics such as false positive and false negative rates were also used.
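
The sketch below shows how these metrics can be computed from predictions using their standard definitions: sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The function names and the toy labels are made up for illustration and are not results from the paper.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 = tumor, 0 = no tumor."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    positives, negatives = tp + fn, tn + fp
    nan = float("nan")  # returned when a class is absent from y_true
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / positives if positives else nan,
        "specificity": tn / negatives if negatives else nan,
        "false_positive_rate": fp / negatives if negatives else nan,
        "false_negative_rate": fn / positives if positives else nan,
    }


# Toy example: 4 patients with a tumor, 6 without.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = classification_metrics(y_true, y_pred)
```

In this toy example accuracy is 0.7, while sensitivity is 0.75 and specificity is about 0.67, which shows why the three numbers are reported separately.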

What are your base, target, and stretch goals?

Base goal: train a model with over 90% accuracy on the datasets listed in the articles. Target goal: incorporate additional data from different datasets to raise accuracy above 95%, close to the 97% the paper reports. Stretch goal: improve on their model and achieve an accuracy higher than the paper, so over 97%. This could be achieved either by incorporating more data or by fine-tuning/adding new layers.

Ethics: Choose 2 of the following bullet points to discuss; not all questions will be relevant to all projects so try to pick questions where there’s interesting engagement with your project. (Remember that there’s not necessarily an ethical/unethical binary; rather, we want to encourage you to think critically about your problem setup.)

What broader societal issues are relevant to your chosen problem space?

Firstly, there is the issue of accessibility and equity in healthcare. Access to advanced health services is usually available only to middle- and high-income individuals and families. This discrepancy can further exacerbate existing healthcare disparities, particularly affecting underprivileged communities who do not have access to new health technology. We aim to mitigate this issue by providing our software for free to any medical professional. Another issue concerns patient privacy and security. Since medical data is highly sensitive, we aim to maintain strict confidentiality to protect patients' information. To that end, we will train our model only on publicly available (anonymized) datasets. Furthermore, when a medical professional uses our software, we will not ask for any personally identifiable information.

Who are the major “stakeholders” in this problem, and what are the consequences of mistakes made by your algorithm?

Since we are developing tumor detection software to be used by medical professionals, we have multiple stakeholders:

-Medical Professionals (Radiologists, Oncologists, etc.): These professionals rely on diagnostic tools to inform their clinical decisions. The reliability of the software impacts their diagnostic accuracy, influencing treatment plans and ultimately patient care. Errors in the software could lead to misdiagnosis, potentially causing inappropriate treatment plans which can affect the professionals' credibility and the trust patients place in the healthcare system.

-Medical Researchers: Researchers are stakeholders since our tool can be used in developing new treatment and detection methods. The accuracy and reliability of the software are critical for ensuring that the research is valid and accurate.

-Patients: Patients are the primary beneficiaries of our medical diagnostic tool. Accurate tumor recognition can lead to timely and appropriate treatments, improving outcomes and survival rates. However, they also bear the risks associated with any errors made by the algorithm. False negatives can result in a failure to diagnose a condition until it's potentially too late, while false positives might lead to unnecessary stress, additional testing, and potentially harmful treatment interventions.