What it does
Uses a random forest to predict whether the AIDS infection rate for a given zip code will fall in the top 25% of AIDS rates.
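One way to turn "top 25% of AIDS rates" into a yes/no target is to cut at the 75th percentile. This is a minimal sketch of that idea, not necessarily how we labeled our data. The zip codes are placeholders, the rates are made up, and the column names `zip_code`, `aids_rate`, and `high_rate` are assumptions for illustration.

```python
import pandas as pd

# Placeholder zip codes with made-up rates (illustrative only, not real data).
rates = pd.DataFrame({
    "zip_code": ["00001", "00002", "00003", "00004",
                 "00005", "00006", "00007", "00008"],
    "aids_rate": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
})

# The 75th percentile is the cutoff for the top 25% of rates.
cutoff = rates["aids_rate"].quantile(0.75)

# Binary target: 1 if the zip code's rate is above the cutoff, else 0.
rates["high_rate"] = (rates["aids_rate"] > cutoff).astype(int)

print(rates)
```

The `high_rate` column is then what a classifier would be trained to predict.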
How we built it
We used scikit-learn to train a random forest model on the AIDS data set. We also found a data set of tax data for zip codes in North Carolina and used it to estimate the average income for each zip code.
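A rough sketch of what training a random forest with scikit-learn looks like. The data here is synthetic and randomly generated, so the score means nothing. The feature names `avg_income` and `population`, the train/test split, and the hyperparameters are all assumptions for illustration, not our actual setup.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for a per-zip-code feature table (not the project's data).
n = 200
features = pd.DataFrame({
    "avg_income": rng.normal(50_000, 15_000, n),
    "population": rng.integers(1_000, 60_000, n),
})

# Synthetic target: 1 marks a zip code in the top 25% of rates.
target = (rng.random(n) < 0.25).astype(int)

# Hold out a quarter of the rows, keeping the class balance similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.25, random_state=0, stratify=target
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Because only about a quarter of the zip codes are positives, plain accuracy can look good even for a model that always predicts 0, so it is worth checking precision and recall as well.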
Challenges we ran into
Getting the data was the largest challenge. Another frustration was that we were focusing on the AIDS data, which is aggregated by zip code, while much of the other demographic data we wanted to use (age, income, race, etc.) is aggregated by census tract in other data sets. And getting the data.
Accomplishments that we're proud of
We didn't do too badly for being fairly new to scikit-learn.
What we learned
How to convert a BSV file to CSV in pandas.
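For reference, a small sketch of that conversion, assuming BSV here means bar-separated (pipe-delimited) values. The sample text and column names are made up, and an in-memory string stands in for the file so the example runs on its own.

```python
import io

import pandas as pd

# Made-up bar-separated sample standing in for a real BSV file.
bsv_text = "zip_code|avg_income\n00001|41000\n00002|58500\n"

# sep="|" splits on the bar; dtype=str keeps leading zeros in zip codes.
df = pd.read_csv(io.StringIO(bsv_text), sep="|", dtype=str)

# With no path given, to_csv returns the comma-separated text as a string.
csv_text = df.to_csv(index=False)
print(csv_text)
```

With real files, the same two calls take paths instead: read with `pd.read_csv(path, sep="|")` and write with `df.to_csv(out_path, index=False)`.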
What's next for HackathonCLT MMXIX