City Improvement Using ML

Inputs and Outputs of program
Part of the data we used in Excel

Inspiration

We wanted to learn machine learning because it is a large part of the current computer science field and we believed it could help answer complex societal problems. We were inspired by the simple linear regression you can do in Excel, and by the question of what we could do without Excel's limitations.

What it does

The user inputs a city and a list of statistical attributes of that city, and the program outputs which attribute would benefit the city most if it were increased or decreased. Target outcomes include increasing the city's income, decreasing its murder rate, and decreasing its rate of depression. We chose this project because we believe that machine learning can improve cities and, more specifically, people's lives.
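One way to picture the "which attribute helps most" step is a simple what-if loop: nudge each attribute up and down by a fixed fraction and compare the model's prediction against the baseline. This is a minimal sketch and not necessarily how our program does it. The function `best_single_change`, the 10% step, the attribute names, and the toy formula standing in for a trained model are all hypothetical.

```python
from typing import Callable, Dict, Tuple


def best_single_change(
    predict: Callable[[Dict[str, float]], float],
    city: Dict[str, float],
    step: float = 0.10,
    higher_is_better: bool = True,
) -> Tuple[str, str, float]:
    """Return (attribute, direction, predicted_change) for the most helpful nudge."""
    baseline = predict(city)
    best = ("", "none", 0.0)
    for name, value in city.items():
        for direction, factor in (("increase", 1 + step), ("decrease", 1 - step)):
            trial = dict(city)            # copy so the original city is untouched
            trial[name] = value * factor  # change exactly one attribute
            change = predict(trial) - baseline
            # For outcomes like murder rate, a drop in the prediction is the gain.
            gain = change if higher_is_better else -change
            best_gain = best[2] if higher_is_better else -best[2]
            if gain > best_gain:
                best = (name, direction, change)
    return best


# Toy stand-in for a trained model: a made-up formula, not real data.
def toy_income_model(attrs: Dict[str, float]) -> float:
    return 30000 + 200 * attrs["graduation_rate"] - 50 * attrs["unemployment_rate"]


city = {"graduation_rate": 80.0, "unemployment_rate": 6.0}
result = best_single_change(toy_income_model, city)
print(result)
```

Changing one attribute at a time ignores interactions between attributes, so a result like this is a suggestion to investigate rather than a causal claim.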

How we built it

We used the Python library scikit-learn (sklearn) to train machine learning models that estimate which input variables likely contribute to certain output variables. To avoid overfitting, we learned and applied bagging (bootstrap aggregating): each model is trained on part of the data and graded on how well it fits the rest of the data, which it did not see during training.
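The idea of training on part of the data and grading on the rest can be sketched with scikit-learn's `BaggingRegressor` and its out-of-bag score. This is an illustration under assumptions, not our exact code: the data is synthetic, and the base model, number of estimators, and random seed are arbitrary choices.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for city data: 200 "cities", 3 attributes (assumption).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 3))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 1.0, size=200)

# Each tree is fit on a bootstrap sample (rows drawn with replacement).
# Rows a tree never saw are its "out-of-bag" rows and are used for scoring.
bagger = BaggingRegressor(
    DecisionTreeRegressor(),
    n_estimators=100,
    bootstrap=True,
    oob_score=True,
    random_state=0,
)
bagger.fit(X, y)

# R^2 computed only from predictions on rows each tree did not train on.
oob_r2 = bagger.oob_score_
print(f"out-of-bag R^2: {oob_r2:.3f}")
```

An out-of-bag score far below the training score is a sign that the model is overfitting.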

Challenges we ran into

Because there are so many machine learning models and they each have their own long list of hyper-parameters, it was difficult to figure out which model to use with our data. Models like linear regression predicted extremely inaccurate outputs when given values outside the range of their training data. We settled on random forest regression because it gave more reliable accuracy and overfit the data less.
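The out-of-range problem is easy to reproduce on toy data. In this sketch, the saturating curve, the query point at x = 100, and the forest settings are assumptions chosen to make the effect visible, not values from our dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

# Training data covers x in [0, 10]; the true relationship levels off near 10.
X = np.linspace(0, 10, 200).reshape(-1, 1)
y = 10.0 * (1.0 - np.exp(-X[:, 0]))

linear = LinearRegression().fit(X, y)
forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Query far outside the training range.
X_far = np.array([[100.0]])
linear_pred = float(linear.predict(X_far)[0])
forest_pred = float(forest.predict(X_far)[0])

print(f"true value at x=100 is about {10.0 * (1.0 - np.exp(-100.0)):.2f}")
print(f"linear regression: {linear_pred:.2f}")
print(f"random forest:     {forest_pred:.2f}")
```

The linear model keeps following its fitted slope, so its prediction drifts far from anything seen in training. A random forest averages training targets stored in its leaves, so its prediction cannot leave the range of the training outputs. That keeps it bounded, but it does not mean the forest can extrapolate.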

Accomplishments that we're proud of

We are proud that we got a machine learning model running in the first place, since it is a pretty complicated subject. In addition, we got it working accurately enough to understand which aspects of cities contribute to positive and negative effects and how the cities might be improved.

What we learned

We learned the basics of supervised machine learning in Python, and that data collection and data analysis demand precision because data can be inaccurate or misleading.

What's next for City Improvement Using ML

There is much we want to improve in both the program and the idea. Due to time limitations, we couldn't collect enough data for as many attributes as we wanted. The model could also be improved if we learn which models to use in which situations and the most effective ways to set hyper-parameters.
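One possible starting point for setting hyper-parameters is a cross-validated grid search, which scikit-learn provides as `GridSearchCV`. This is a sketch of something we have not built yet: the data is synthetic and the parameter grid is a small hypothetical example, not a recommendation.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for city data (assumption).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(150, 3))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 1.0, size=150)

# Hypothetical grid: every combination is scored with 5-fold cross-validation.
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=5,
    scoring="r2",
)
search.fit(X, y)

best_params = search.best_params_
best_score = search.best_score_
print(best_params, f"mean CV R^2: {best_score:.3f}")
```

The same pattern works for comparing different model types: run one search per model and compare the cross-validated scores.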