Detecting Malaria-Infected Blood Cells with Machine Learning

Images: Test Results, Graph of Test Results, Sample Data and Model Predictions, Model Layout

Malaria

Malaria is a deadly disease, and nearly half of the world's population lives in areas at risk of transmission. The World Health Organization estimated over 200 million cases and more than 600,000 deaths in 2020. At that scale, examining every blood sample image by hand is slow, particularly in low-resource areas, and it delays diagnosis. This inspired us to develop a model that detects malaria in images of individual cells, making screening faster and more consistent.

What the program does

The program combines image analysis with a convolutional neural network (CNN) to classify whether a red blood cell is infected with malaria. For each cell image it outputs a score between 0 and 1, where values near 0 indicate that the cell contains malaria and values near 1 indicate that the cell is normal. The model reached 98.5% accuracy on test data that it had not seen during training.
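To make the output convention concrete, here is a minimal sketch of turning a single score into a label. The function name `interpret_score`, the 0.5 cutoff, and the returned "confidence" value are assumptions made for illustration; the write-up only states that 0 corresponds to an infected cell and 1 to a normal one.

```python
def interpret_score(score, threshold=0.5):
    """Map a model output in [0, 1] to a label and a rough confidence.

    Convention from the write-up: 0 = parasitized, 1 = normal (uninfected).
    The 0.5 threshold is an assumed default, not a value from the project.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= threshold:
        return "uninfected", score
    # Scores near 0 mean "parasitized", so confidence is the distance from 1.
    return "parasitized", 1.0 - score


for s in (0.02, 0.25, 0.97):
    label, confidence = interpret_score(s)
    print(f"score={s:.2f} -> {label} (confidence {confidence:.2f})")
```

The confidence here is just the score read from the side of the predicted class; it is not a calibrated probability. In a screening setting the threshold would probably be tuned to favor catching infected cells rather than left at 0.5.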

How we built it

We wrote the program in Python in Google Colab, using libraries such as NumPy, Pandas, and TensorFlow. We used a convolutional neural network trained with the RMSprop optimizer to distinguish images of infected cells from images of uninfected ones. We imported the image data from Kaggle and grouped the images into 603 batches of 32. We trained the model on two-thirds of this data for 8 epochs (full passes over the training set), with the accuracy changing from epoch to epoch, and used the remaining data to test and validate the model. Once we were satisfied with the model's performance, we used it to analyze other images and determine whether those cells were parasitized by malaria.
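The pipeline described above could look roughly like the following Keras sketch. It is not the project's actual code: the 64x64 input size, the layer widths, the learning rate, the dropout rate, and the helper names `load_split` and `build_model` are all assumptions. Only the batch size of 32, the two-thirds training split, the RMSprop optimizer, the use of Dropout, and the 8 epochs come from the write-up.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

IMAGE_SIZE = (64, 64)  # assumed resize target; the write-up does not give one
BATCH_SIZE = 32        # batch size stated in the write-up


def load_split(data_dir, subset):
    """Load the 'training' or 'validation' subset from a class-per-folder directory.

    Class folders are indexed alphabetically, so a 'Parasitized' folder maps
    to label 0 and an 'Uninfected' folder to label 1.
    """
    return tf.keras.utils.image_dataset_from_directory(
        data_dir,
        label_mode="binary",
        validation_split=1 / 3,  # two-thirds of the images go to training
        subset=subset,
        seed=42,                 # same seed for both subsets keeps them disjoint
        image_size=IMAGE_SIZE,
        batch_size=BATCH_SIZE,
    )


def build_model(dropout_rate=0.5):
    """Small CNN with one sigmoid output (illustrative architecture only)."""
    model = models.Sequential([
        layers.Input(shape=IMAGE_SIZE + (3,)),
        layers.Rescaling(1.0 / 255),           # scale pixel values to [0, 1]
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dropout(dropout_rate),          # regularization against overfitting
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-3),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model


model = build_model()

# Training would then look like this ("cell_images" is a hypothetical path):
# train_ds = load_split("cell_images", "training")
# val_ds = load_split("cell_images", "validation")
# model.fit(train_ds, validation_data=val_ds, epochs=8)
```

Because the labels follow alphabetical folder order, a sigmoid output near 0 corresponds to a parasitized cell and an output near 1 to an uninfected one, which matches the convention described earlier in this write-up.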

Challenges we ran into

One of our main challenges was overfitting. Large neural networks tend to overfit by memorizing the training data and its labels instead of learning patterns that generalize. We spotted the problem when we noticed that the training accuracy was much higher than the validation accuracy. We fixed it by adding Dropout layers, which randomly deactivate a fraction of the units during training and so regularize the model. We also lowered the number of epochs so that training stops before training accuracy diverges from validation accuracy.
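To illustrate what a Dropout layer does, here is a small pure-Python sketch of "inverted" dropout, the variant Keras uses: during training each value is zeroed with probability `rate` and the survivors are scaled up by 1 / (1 - rate), while at inference time the input passes through unchanged. The function name, the 0.5 rate, and the toy input are assumptions for illustration, not values from the project.

```python
import random


def dropout(activations, rate, rng, training=True):
    """Apply inverted dropout to a list of floats.

    rate is the fraction of units to drop; rng is a random.Random instance
    so that results can be reproduced.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)  # inference: pass values through unchanged
    keep_prob = 1.0 - rate
    # Keep each unit with probability keep_prob and rescale the survivors so
    # the expected sum of activations stays the same as without dropout.
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


rng = random.Random(0)
activations = [1.0] * 10
train_out = dropout(activations, rate=0.5, rng=rng)
eval_out = dropout(activations, rate=0.5, rng=rng, training=False)
print("training: ", train_out)
print("inference:", eval_out)
```

Because a different random subset of units is silenced on every training step, the network cannot rely on any single unit to memorize a training example, which is why dropout tends to narrow the gap between training and validation accuracy.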

What's next for the project

Next, we hope to extend the model to detect other red blood cell disorders, such as sickle cell anemia. We believe the project could help detect such diseases early and prevent serious medical complications for many patients.