We are all passionate about large-scale economics and wanted to build something useful. We were inspired by the housing crisis of 2007-08, which affected many people not only in the US but worldwide, including, and especially, in our own countries. As international students at Brown, we feel very much a part of this problem. We believe that data-driven models can help avoid such mistakes because they are not swayed by emotion.
What it does
We built an ML model that predicts a house's price from parameters such as land_value, tax_amt, and condition.
How we built it
We used sqlite3 to clean the data and load it into a SQL database, which we then read into a pandas DataFrame. We then used time series analysis with SARIMAX to make the data stationary (time-invariant). Finally, we trained XGBoost on the result to get the final model.
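The cleaning step can be sketched roughly as follows. This is a minimal illustration, not our actual script: the table names, the sample rows, the 1-5 scale for condition, and the validity rules are all assumptions made up for the example. Only the column names land_value, tax_amt, and condition come from the project.

```python
import sqlite3

# Hypothetical raw rows: (land_value, tax_amt, condition, price).
# None and out-of-range numbers stand in for the bad and invalid values.
raw_rows = [
    (120000.0, 2400.0, 3, 310000.0),
    (None, 1800.0, 2, 250000.0),      # missing land value
    (95000.0, -50.0, 4, 205000.0),    # negative tax amount
    (150000.0, 3100.0, 5, 420000.0),
    (80000.0, 1500.0, 9, 180000.0),   # condition outside the assumed 1-5 scale
]


def build_clean_table(conn, rows):
    """Load raw rows into SQLite, then copy only the valid rows into `houses`."""
    conn.execute(
        """
        CREATE TABLE raw_houses (
            land_value REAL, tax_amt REAL, "condition" INTEGER, price REAL
        )
        """
    )
    conn.executemany("INSERT INTO raw_houses VALUES (?, ?, ?, ?)", rows)
    # The validity rules below are assumptions chosen for illustration.
    conn.execute(
        """
        CREATE TABLE houses AS
        SELECT * FROM raw_houses
        WHERE land_value IS NOT NULL AND land_value > 0
          AND tax_amt IS NOT NULL AND tax_amt >= 0
          AND "condition" BETWEEN 1 AND 5
          AND price IS NOT NULL AND price > 0
        """
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM houses").fetchone()[0]


conn = sqlite3.connect(":memory:")
kept = build_clean_table(conn, raw_rows)
print(f"kept {kept} of {len(raw_rows)} rows")
```

From here, a call such as pandas.read_sql_query("SELECT * FROM houses", conn) would read the cleaned table into a DataFrame.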
Challenges we ran into
The data was not well organized, and we found many bad and invalid values. We had to scrap more than half of the data because of this. On top of that, none of us has proper ML experience, so it was very tough to 1) choose the right model and 2) actually implement it. We first used OLS because that is what we understood best, but then switched to XGBoost based on suggestions we found online. It was also very tough to get the accuracy up, again partly because of the bad data.
Accomplishments that we're proud of
We got 78% accuracy! That is much higher than we initially expected, given our serious lack of experience. We learned a lot about data cleaning and about ways to choose the best factors. We also split the work very evenly among us, which was very beneficial.
What we learned
How to use XGBoost and OLS on data. How to split data into training and testing sets. Teamwork. How to make a time series stationary. Data cleaning.
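Two of these ideas fit in a few lines. The sketch below splits an ordered series so that the test set is strictly later than the training set, then applies first-order differencing, which is the operation the d order of an ARIMA-family model performs. The helper names and the monthly prices are hypothetical, and this is not the SARIMAX step itself.

```python
def chronological_split(series, test_fraction=0.2):
    """Split an ordered series so the test set comes after the training set."""
    n_test = max(1, round(len(series) * test_fraction))
    cut = len(series) - n_test
    return series[:cut], series[cut:]


def difference(series, lag=1):
    """Return series[t] - series[t - lag]; with lag=1 this removes a linear trend."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


# Hypothetical monthly median prices (in thousands) with a steady upward trend.
monthly_price = [200, 204, 209, 213, 218, 222, 227, 231, 236, 240]

train, test = chronological_split(monthly_price)
train_diff = difference(train)

print("train:", train)
print("test: ", test)
print("differenced train:", train_diff)
```

Splitting by time rather than at random keeps future prices out of the training set, and the differenced series fluctuates around a constant level instead of trending upward.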
What's next for Estatimation
We plan to increase accuracy by normalizing the data and getting more of it! We also think that further tuning of the model's parameters would help.
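Normalization could look something like the z-score sketch below. The helper names and the tax_amt values are made up, and z-scoring is only one of several possible schemes. The scaler is fitted on the training values only and then reused on the test values, so nothing from the test set leaks into training.

```python
from statistics import mean, pstdev


def fit_scaler(values):
    """Learn the mean and standard deviation from the training values only."""
    mu = mean(values)
    sigma = pstdev(values)
    # Guard against constant columns, which would otherwise divide by zero.
    return mu, (sigma if sigma > 0 else 1.0)


def transform(values, mu, sigma):
    """Apply z-score scaling with previously fitted statistics."""
    return [(v - mu) / sigma for v in values]


# Hypothetical tax_amt values.
train_tax = [1500.0, 2400.0, 3100.0, 1800.0]
test_tax = [2200.0]

mu, sigma = fit_scaler(train_tax)
train_scaled = transform(train_tax, mu, sigma)
test_scaled = transform(test_tax, mu, sigma)

print("train:", [round(v, 3) for v in train_scaled])
print("test: ", [round(v, 3) for v in test_scaled])
```

Scaling tends to matter most for linear models such as OLS. Tree ensembles split on thresholds, so they are largely insensitive to it, which means most of the gain for XGBoost may come from more data and parameter tuning.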