We had data, and we wanted to have fun with it.
What it does
Our project asks you for information about your apartment, then uses a combination of machine learning models to predict its rental price. For now, it only covers apartments in the D.C. metropolitan area.
How it works
We use Typeform to collect your apartment info, then run it through a scikit-learn model trained on 800 data points gathered by Apartments.com (CoStar). Finally, we email the result to you.
Of the 20 or so models we tried, decision-tree ensembles tended to work best. Specifically, gradient-boosted trees were the best performers overall, and their advantage held up across different feature configurations.
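Here is a rough sketch of what that kind of model comparison can look like in scikit-learn. It is an illustration only, not our actual code: the data is synthetic (`make_regression`), the three candidates are just a small sample of model types, and the fold count and hyperparameters are assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the ~800 listings (assumption: the real features differ).
X, y = make_regression(n_samples=800, n_features=6, noise=20.0, random_state=0)

# A few candidate models; the names and settings here are illustrative.
candidates = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}

# Score every candidate on the same shuffled 5-fold split so results are comparable.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="r2").mean()
    for name, model in candidates.items()
}

best = max(scores, key=scores.get)
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: mean R^2 = {score:.3f}")
print("best:", best)
```

Which model comes out on top depends entirely on the data, so the ranking on this synthetic set says nothing about the ranking on real listings.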
Challenges we ran into
Machine learning is hard. It took us a while to figure out how best to clean the data and train the models. Overfitting was a minor hurdle, and we also worked very hard to validate our models properly, without leaking testing data into the training set.
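One common way to keep testing data out of training is to split first and put every data-dependent cleaning step inside a scikit-learn pipeline, so that it is fit on the training rows only. The sketch below shows the idea under stated assumptions: the data is randomly generated, the column meanings (square feet, bedrooms, bathrooms) are hypothetical, and median imputation stands in for whatever cleaning the real data needs.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# Synthetic stand-in data; hypothetical columns: square feet, bedrooms, bathrooms.
X = np.column_stack([
    rng.uniform(400, 1500, 800),
    rng.integers(0, 4, 800),
    rng.integers(1, 3, 800),
]).astype(float)
y = 800 + 1.2 * X[:, 0] + 150 * X[:, 1] + rng.normal(0, 100, 800)
X[rng.random(X.shape) < 0.05] = np.nan  # simulate skipped survey answers

# Split BEFORE any cleaning so test rows never influence the training statistics.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# The imputer's medians are learned from X_train only, inside the pipeline.
model = make_pipeline(
    SimpleImputer(strategy="median"),
    GradientBoostingRegressor(random_state=0),
)
model.fit(X_train, y_train)

test_r2 = model.score(X_test, y_test)  # R^2 on rows the model never saw
print(f"held-out R^2: {test_r2:.3f}")
```

Passing the same pipeline object to `cross_val_score` gives the same guarantee per fold, because the pipeline is refit from scratch on each training split.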
Accomplishments that we're proud of
Even with rigorous cross-validation, our model is able to consistently explain 70% of the variation in our testing set.
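"Explaining 70% of the variation" is usually read as a coefficient of determination (R^2) of 0.70, which is the assumption made here. The sketch below computes R^2 by hand as 1 minus the ratio of residual to total sum of squares. The rents in the example are invented purely to show the arithmetic.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Squared error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared error of always predicting the mean (the baseline).
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up monthly rents and predictions, for illustration only.
actual = [1500, 2000, 2500, 3000]
predicted = [1600, 1900, 2600, 2900]
print(round(r_squared(actual, predicted), 3))  # 0.968
```

A score of 1.0 means perfect predictions, and 0.0 means the model does no better than always guessing the average rent.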
What we learned
Decision trees are awesome!
We also learned to use (and abuse) Typeform, scikit-learn, and machine learning for fun and profit.
What's next for How much can I rent my apartment for?
We're thinking about expanding our data to the whole United States and adding more geolocation-based features. From what we found in the D.C. metropolitan area, adding GPS coordinates doesn't seem to improve the accuracy of the model.
However, over a wider spread of data points, location might become a more important factor in determining price.
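As one example of what a geolocation-based feature could look like, the sketch below turns raw GPS coordinates into a distance from a reference point using the haversine formula. This is a hypothetical feature, not something we have built: the function name, the choice of the White House as the reference point, and the approximate coordinates are all assumptions for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical reference point: the White House (approximate coordinates).
REFERENCE = (38.8977, -77.0365)

# Example listing location: near the U.S. Capitol (approximate coordinates).
listing = (38.8899, -77.0091)
distance_km = haversine_km(*listing, *REFERENCE)
print(f"distance to reference: {distance_km:.2f} km")
```

A derived distance like this can be added as an extra column next to, or in place of, raw latitude and longitude. Whether it helps would have to be checked with the same cross-validation as everything else.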