Inspiration

I've always wanted to start learning machine learning, but never had the chance to start. This hack-a-thon was the perfect opportunity for me to do so. What was the first step? One word: Kaggle.

What it does

I applied supervised machine learning to predict the sale prices of houses in Ames, Iowa, from a large set of non-trivial property features.

How I built it

I used NumPy and pandas to cleanse the data into something model-friendly, then used scikit-learn's linear regression to predict the final sale price of the aforementioned houses.
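A minimal sketch of that pipeline, using a hypothetical miniature of the Ames data (the column names here are illustrative, not the actual dataset schema):

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Tiny stand-in for the cleansed Ames data; real data has many more columns.
df = pd.DataFrame({
    "GrLivArea": [1500, 2000, 1200, 1800],
    "OverallQual": [6, 8, 5, 7],
    "SalePrice": [180000, 260000, 140000, 220000],
})

X = df[["GrLivArea", "OverallQual"]].to_numpy()  # model-friendly feature matrix
y = df["SalePrice"].to_numpy()                   # regression target

model = LinearRegression().fit(X, y)
preds = model.predict(X)
```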

Challenges I ran into

Categorical features were EVERYWHERE! Even some columns we thought were purely numeric had to be one-hot encoded due to their categorical nature. The dataset was HUGE. We had to look through dozens of columns with thousands of rows just to identify a handful of NaNs, which we eventually had to impute. After putting in all that effort to cleanse the data, we finally moved on to the machine learning portion of the project.
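The two cleansing steps above can be sketched like this (column names are hypothetical, not the real Ames schema):

```python
import numpy as np
import pandas as pd

# Illustrative columns with the two problems described above.
df = pd.DataFrame({
    "Neighborhood": ["OldTown", "NridgHt", "OldTown", None],
    "LotFrontage": [65.0, np.nan, 80.0, 70.0],
})

# Imputation: fill numeric NaNs with the median, categorical with a sentinel.
df["LotFrontage"] = df["LotFrontage"].fillna(df["LotFrontage"].median())
df["Neighborhood"] = df["Neighborhood"].fillna("Missing")

# One-hot encoding: expand the categorical column into binary indicator columns.
encoded = pd.get_dummies(df, columns=["Neighborhood"])
```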

Our accuracy was 1%. We were heartbroken, but then we realized we had used classification instead of regression, making it an easy fix. Fortunately, that was the last challenge we faced. Soon after, our predictions were within $16,000 of the actual sale prices!
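The fix amounts to swapping the classifier for a regressor and scoring with mean absolute error (average dollars off) instead of accuracy. A sketch on synthetic prices, just to show the shape of the evaluation:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
X = rng.uniform(1000, 3000, size=(100, 1))         # e.g. living area in sq ft
y = 100 * X[:, 0] + rng.normal(0, 5000, size=100)  # synthetic sale prices

model = LinearRegression().fit(X, y)               # regressor, not classifier
mae = mean_absolute_error(y, model.predict(X))     # average dollars off
```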

Accomplishments that I'm proud of

I had little to no knowledge of machine learning before this hackathon, and I came out with a working grasp of supervised machine learning on structured data using scikit-learn!

What I learned

I learned data-cleansing methods such as one-hot encoding and imputation, a range of supervised models from linear regression to support vector machines, and techniques for improving model accuracy, such as k-fold cross-validation.
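K-fold cross-validation splits the data into k folds and holds each one out in turn as a validation set, giving a more honest accuracy estimate than a single split. A minimal sketch on synthetic data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(42)
X = rng.uniform(1000, 3000, size=(50, 2))
y = X @ np.array([100.0, 50.0]) + rng.normal(0, 1000, size=50)

# 5-fold CV: each fold serves once as the held-out validation set.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LinearRegression(), X, y, cv=cv, scoring="r2")
```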

What's next for House Price Predictor

We should tune hyperparameters with a grid search to improve the accuracy of our model, and try XGBoost for potentially better results.
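A minimal sketch of what that grid search could look like with scikit-learn's GridSearchCV. Ridge regression stands in here so the example is self-contained; in practice, XGBoost's XGBRegressor and its parameter grid would be dropped in the same way:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(0, 0.1, size=60)

# Exhaustively evaluate each candidate alpha with 5-fold cross-validation.
grid = GridSearchCV(Ridge(), {"alpha": [0.01, 0.1, 1.0, 10.0]}, cv=5)
grid.fit(X, y)
best_alpha = grid.best_params_["alpha"]
```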
