Inspiration

Breast cancer is one of the most common cancers among women in the U.S. That makes it imperative to find strategies for optimizing the allocation of mammography resources and facilities among the U.S. states.

What our project does

Our project helps the FDA optimize mammography allocation to maximize impact. We calculated the expected number of breast cancer deaths per facility in each state, segmented by age and race, and visualized the results on U.S. maps.

How we built it

In this project, we optimize via age and race segmentation: we use the age and racial composition of each state to estimate relative access to mammography facilities. For age, we summed the expected number of breast cancer deaths among women across all age ranges and divided by the number of facilities in the state, giving the expected number of deaths per facility per state. (The number of facilities in each state was extracted from our original dataset, which includes information on certified mammography facilities in U.S. states and territories.) For race, we computed the expected number of breast cancer deaths among women of each race per facility in each state. In both cases, a larger ratio of expected deaths to facilities implies a greater need for federal funding in that state, especially for uncertified facilities: if more deaths occur per facility, the state needs higher-quality facilities, as well as greater access to facilities in general, to provide adequate healthcare.
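The ratio described above can be sketched in a few lines of plain Python. This is a minimal illustration, not our actual pipeline: the mortality rates, age bands, populations, facility counts, and the names `expected_deaths` and `deaths_per_facility` are all made up for the example. The same pattern would apply to race segmentation by keying the dictionaries on race instead of age band.

```python
# Placeholder mortality rates (deaths per 100,000 women per year) by age band.
# Illustrative values only, NOT real statistics.
RATES_PER_100K = {"40-49": 10.0, "50-64": 30.0, "65+": 80.0}


def expected_deaths(women_by_age, rates_per_100k=RATES_PER_100K):
    """Sum expected deaths across age bands: population * rate / 100,000."""
    return sum(
        count * rates_per_100k[band] / 100_000
        for band, count in women_by_age.items()
    )


def deaths_per_facility(women_by_age, n_facilities, rates_per_100k=RATES_PER_100K):
    """Expected deaths in a state divided by its number of facilities."""
    if n_facilities <= 0:
        raise ValueError("n_facilities must be positive")
    return expected_deaths(women_by_age, rates_per_100k) / n_facilities


# Hypothetical states with made-up populations and facility counts.
states = {
    "State A": {
        "women_by_age": {"40-49": 200_000, "50-64": 300_000, "65+": 100_000},
        "facilities": 50,
    },
    "State B": {
        "women_by_age": {"40-49": 100_000, "50-64": 100_000, "65+": 50_000},
        "facilities": 10,
    },
}

ratios = {
    name: deaths_per_facility(data["women_by_age"], data["facilities"])
    for name, data in states.items()
}

# States with the highest ratio come first, i.e. greatest estimated need.
ranking = sorted(ratios, key=ratios.get, reverse=True)
```

In this toy data, State B has fewer expected deaths overall but far fewer facilities, so it ranks ahead of State A. That is the behavior the ratio is meant to capture.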

Challenges we ran into

The data was messy. State codes were in the wrong column, and some entries were even international, forcing us to come up with clever ways of cleaning the data. The first external dataset we brought in came from GitHub and turned out to be unreliable after the final computations, forcing us to find a more reliable dataset from the ACS and redo our progress.
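One way to handle misplaced state codes is sketched below, under stated assumptions. The column names (`state`, `zip`, `city`), the sample records, and the helper `find_state_code` are hypothetical and do not reflect the real dataset's schema. The idea is to search a few candidate columns for a valid U.S. state or territory code and drop rows where none is found, which also filters out international entries.

```python
# Two-letter codes for the 50 states, DC, and five U.S. territories.
US_CODES = set(
    "AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD MA MI MN MS "
    "MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC SD TN TX UT VT VA WA WV "
    "WI WY DC PR GU VI AS MP".split()
)


def find_state_code(record, columns=("state", "zip", "city")):
    """Return the first valid U.S. code found in the candidate columns, else None.

    The column names are assumptions made for this example.
    """
    for column in columns:
        value = str(record.get(column, "")).strip().upper()
        if value in US_CODES:
            return value
    return None


# Made-up records: one clean, one with swapped columns, one international.
records = [
    {"city": "Austin", "state": "TX", "zip": "78701"},
    {"city": "Boise", "state": "83702", "zip": "ID"},
    {"city": "Toronto", "state": "ON", "zip": "M5V"},
]

cleaned = []
for record in records:
    code = find_state_code(record)
    if code is not None:
        # Copy the record and overwrite only the state field with the code found.
        cleaned.append(dict(record, state=code))
```

This sketch only normalizes the state field and leaves the other columns as they were. A fuller cleanup would also move the displaced values back to their proper columns.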

Accomplishments we're proud of

As freshmen with little experience, we are proud that we finished a complete project with potential real-world implications. We are also proud that we learned how to create neat visualizations and use pandas, numpy, and plotly.

What we learned

For the dataset: According to our analysis of age, South Carolina, New Mexico, California, Washington, and Vermont are most in need of federal funding from the FDA. These states had the highest expected number of deaths of women from breast cancer per facility. Our analysis of race yielded varying results according to the particular race under investigation. However, we have concluded that marginalized populations of women (namely Black and Hispanic women) are most in need of aid in Maryland, New Mexico, Texas, and California.

What's next for Optimizing mammography allocation via age/race segmentation

We plan to segment by other demographic variables and calculate the death rate per facility by state, then compare the results with the age and race segmentations.
