Abstract
The housing market in Binghamton is seriously undervalued. Real-estate developers need smart portfolio-management tools to make informed decisions. Our model shows significant outliers within the zip codes.

Methodology and process
We used two methods for linear regression:

Jeremy: gradient descent with the Adam optimizer
- Normalized all data to the range 0 to 1 using MinMaxScaler from sklearn.preprocessing
- Used IsolationForest to remove outliers
- Held out 10 percent of the data as a validation set
- Used five hidden layers with [200, 100, 50, 25, 12] neurons

Peter: normal equation
- Rejected the zip codes that place properties outside the immediate Binghamton area (removing Cortland, Ithaca, the Adirondacks, Rochester, Sidney, etc.)
- Dropped the zip code column from the dataset because there was not enough time to find a way to import location data
- Shuffled the dataset and split it into 90% training data and 10% verification data
- Created categories that sum the ratings for subsets of the kitchen, bathroom, living room, and bedroom ratings into a single rating for each category
- Preserved categories that contain furnishings
- Created categories for the ratios space/bedroom, parking/bedroom, and bathroom/bedroom
- Deleted some obviously erroneous entries; for example, two listings had 3 bathrooms, no bedrooms, and each cost $425
- Used the standard squared error to check the accuracy of the algorithm
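The preprocessing steps described above could be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: the data is synthetic, the column meanings are hypothetical, and `contamination=0.1` and `random_state=0` are assumed settings. The sketch also fits the scaler on the training split only, which is one reasonable ordering and not necessarily the one the team used.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the listings table (hypothetical columns:
# square footage, bedrooms, bathrooms). The real dataset is not shown here.
rng = np.random.default_rng(0)
X = rng.normal(loc=[1200.0, 3.0, 1.5], scale=[300.0, 1.0, 0.5], size=(500, 3))
y = 0.8 * X[:, 0] + 50.0 * X[:, 1] + rng.normal(scale=40.0, size=500)

# Flag outliers: fit_predict returns +1 for inliers and -1 for outliers.
# contamination=0.1 is an assumed setting, not a value from the project.
forest = IsolationForest(contamination=0.1, random_state=0)
inlier = forest.fit_predict(X) == 1
X_in, y_in = X[inlier], y[inlier]

# Hold out 10 percent of the remaining rows for validation.
X_train, X_val, y_train, y_val = train_test_split(
    X_in, y_in, test_size=0.1, random_state=0
)

# Scale every feature to [0, 1] using statistics from the training split only,
# then apply the same transformation to the validation split.
scaler = MinMaxScaler()
X_train_s = scaler.fit_transform(X_train)
X_val_s = scaler.transform(X_val)
```

Because the scaler only sees the training rows, validation values can fall slightly outside [0, 1]; that is expected and avoids leaking validation statistics into training.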

These methods were chosen to cover both an iterative and a noniterative approach. Iterative methods scale better to large numbers of features, so working with both gives us predictive ability now and keeps it available in the future as more data are used.
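To make the iterative versus noniterative contrast concrete, here is a small sketch that fits the same linear model both ways. It is illustrative only: the data is synthetic, and plain batch gradient descent is swapped in for the Adam-optimized network described above to keep the example short. The learning rate and iteration count are assumptions chosen for this toy problem.

```python
import numpy as np

# Synthetic data with features already scaled to [0, 1].
rng = np.random.default_rng(1)
n, d = 200, 3
X = rng.uniform(0.0, 1.0, size=(n, d))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.3 + rng.normal(scale=0.01, size=n)

# Prepend a column of ones so the first weight acts as the intercept.
Xb = np.hstack([np.ones((n, 1)), X])

# Noniterative: solve the normal equation (Xb^T Xb) w = Xb^T y directly.
w_normal = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)

# Iterative: batch gradient descent on the mean squared error.
w_gd = np.zeros(d + 1)
learning_rate = 0.1  # assumed value, small enough to be stable here
for _ in range(5000):
    gradient = (2.0 / n) * Xb.T @ (Xb @ w_gd - y)
    w_gd -= learning_rate * gradient

print("normal equation: ", w_normal)
print("gradient descent:", w_gd)
```

Both approaches should land on essentially the same weights for a problem this small. The difference shows up as the feature count grows: the direct solve works with a (d+1) by (d+1) matrix whose cost grows roughly cubically in the number of features, while each gradient step only needs matrix-vector products.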

Conclusions/Results
We found that ~10% of the data were outliers.

Future work
We would like to track changes in housing prices over time. Because we have included both iterative and noniterative methods, we could feasibly add many orders of magnitude more features to our model and get even better results using the iterative method. We are also interested in the feature limits of our noniterative method, and in the point at which that model becomes too computationally expensive.

- Weather data, flood data, etc. could be used as additional features.
- The predicted housing prices could be used to predict which features houses have, especially houses with missing data.
- A larger dataset would be desirable and should lead to higher accuracy.
- Further work is needed to improve the data input.
- To make use of zip codes, we need to map each zip code to the distance between a property and downtown Binghamton.
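One possible way to turn a zip code into a distance-to-downtown feature is sketched below, using the haversine great-circle formula. Everything specific here is an assumption: the downtown coordinates are approximate, the `ZIP_CENTROIDS` table is a hypothetical lookup with placeholder keys and coordinates, and a real version would need an actual zip-code centroid dataset.

```python
import math

# Approximate coordinates of downtown Binghamton (assumed for illustration).
DOWNTOWN = (42.0987, -75.9180)

# Hypothetical lookup from zip code to centroid (latitude, longitude).
# Keys and coordinates are placeholders, not real zip-code data.
ZIP_CENTROIDS = {
    "00001": (42.10, -75.92),
    "00002": (42.44, -76.50),
}

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def distance_feature(zip_code):
    """Return the distance to downtown in km, or None for unknown zip codes."""
    centroid = ZIP_CENTROIDS.get(zip_code)
    if centroid is None:
        return None
    return haversine_km(centroid, DOWNTOWN)


for code in ZIP_CENTROIDS:
    print(code, round(distance_feature(code), 1), "km")
```

A centroid-based distance is coarse, since every property in a zip code gets the same value, but it would replace a categorical code with a single numeric feature that either regression method can use directly.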

