We performed regression on house prices in the Boston dataset. Team JARL also performed a spatial analysis of all the housing locations.
We found that we could get our Random Forest and autoencoder regression models to be very accurate (with an R^2 of predicted vs. actual prices of around 0.9). The tools we used: Keras and scikit-learn! We also found that the number of rooms per dwelling had the most impact on prices, closely followed by the percentage of lower-status population.
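To make the Random Forest part concrete, here is a minimal sketch of how an R^2 score and a feature-importance ranking can be obtained with scikit-learn. It is not our actual pipeline: the data is synthetic (generated so that a hypothetical `rooms` column matters most and `lstat` second), and the column names, coefficients and hyperparameters are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the Boston dataset): price depends mostly on
# rooms, then on the lower-status percentage; "noise" is irrelevant.
rng = np.random.default_rng(0)
n = 500
rooms = rng.uniform(3, 9, n)
lstat = rng.uniform(2, 35, n)
noise_feature = rng.normal(0, 1, n)
price = 6.0 * rooms - 0.5 * lstat + rng.normal(0, 1.0, n)

X = np.column_stack([rooms, lstat, noise_feature])
feature_names = ["rooms", "lstat", "noise"]  # hypothetical column names

# Hold out a quarter of the rows so R^2 is measured on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.25, random_state=0
)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# R^2 of predicted vs. actual prices on the held-out rows.
r2 = r2_score(y_test, model.predict(X_test))

# Impurity-based importances; they sum to 1 across features.
importances = dict(zip(feature_names, model.feature_importances_))
ranked = sorted(importances, key=importances.get, reverse=True)

print(f"R^2 on held-out data: {r2:.2f}")
print("Features by importance:", ranked)
```

Measuring R^2 on a held-out split rather than on the training rows gives a fairer picture of accuracy, since a Random Forest can fit its training data almost perfectly.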
More importantly, we were able to correct the longitudes and latitudes of the locations in the dataset (which helped all the other teams too!) and plot the locations on a map of Boston using Folium. Some very good insights came from there: for example, we found that the wealthiest areas were outside the main city center, while the least valuable homes were inside town. We attribute this to the socio-economic situation when the dataset was made (1970s), when there were lots of riots around.
The challenges we faced were optimising our models, getting the locations working, and producing a working video and report in time for the deadline ;)