Submission for Hack for the People
I wanted to challenge myself because I've only created projects that target environmental issues, so I decided to pick a sociocultural and economic topic. This led me to build the data app you see here.
The front end (all the graphs) is built with Plotly Dash and a bit of CSS. I gathered data from the U.S. justice system, U.S. Census, and U.S. government websites and cleaned it with a combination of PySpark and pandas. Then I built a TensorFlow model that predicts the total correctional population from the input features you see in the app. Unfortunately, my first model, which used state as an input feature, had an absolute error four times smaller than the current model's, but I was unable to one-hot encode the user inputs correctly. To be able to provide a model at all, I had to rely only on the numeric inputs.
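For anyone hitting the same one-hot encoding problem, here is a minimal sketch of one way it could be handled. This is not what the app does. The idea is to fix the list of categories once, at training time, and reuse that same list to encode whatever the user picks in the app, so the feature vector always has the same length and column order the model was trained on. The names `STATES`, `one_hot`, and `build_feature_row`, the truncated state list, and the example numeric values are all hypothetical and only for illustration.

```python
# Hypothetical category list. In practice this would be saved from the
# training data so that inference uses exactly the same order.
STATES = ["Alabama", "Alaska", "Arizona"]  # truncated for illustration


def one_hot(value, categories):
    """Return a one-hot list for `value`.

    A value that is not in `categories` yields all zeros, so an unexpected
    user input never changes the length of the feature vector.
    """
    return [1.0 if value == category else 0.0 for category in categories]


def build_feature_row(state, numeric_inputs, categories=STATES):
    """Concatenate numeric inputs with the one-hot state vector, in a fixed order."""
    return list(numeric_inputs) + one_hot(state, categories)


# Example: two made-up numeric inputs plus the selected state.
row = build_feature_row("Alaska", [1200.0, 0.35])
print(row)  # [1200.0, 0.35, 0.0, 1.0, 0.0]
```

The resulting row could then be wrapped in a batch of size one and passed to the model, assuming the model was trained on columns in this same order.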
What would I do differently?
I would gather much more data, such as ages per state and gender, and just more data in general. There were only about 200 values to train the model on, which is one of the reasons the error was less than desirable. I would also use more categorical input features, such as gender, state, and democratic affiliation, because those are also very large factors in incarceration rates.
Where did I get my data?
You can find where I got my data on my GitHub.