Inspiration

Citizens Bank.

What it does

It uses an autoencoder and the k-nearest neighbors (KNN) algorithm to predict the selling prices of houses from past sales data.

Stages

Preprocessing

We trimmed the dataset to isolate the most useful fields, which we chose by their correlation with the amount the house sold for. We also converted the categorical fields to numerical ones by assigning each category a rank based on its average selling price; this was the most computationally intensive step. Finally, we stripped outliers from the selling price column and replaced NaNs with the average of the non-null values.
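
The three cleaning steps can be sketched roughly as follows. This is a minimal illustration in plain Python, not the project's actual code: the column names (`zone`, `area`, `price`) are hypothetical, missing values are represented as `None` rather than NaN, and the interquartile-range rule for outliers is an assumption, since the writeup does not say which rule was used.

```python
from statistics import mean, quantiles

def rank_encode(rows, column, target="price"):
    """Replace each category in `column` with its rank by mean target value."""
    by_category = {}
    for row in rows:
        by_category.setdefault(row[column], []).append(row[target])
    # Order categories from lowest to highest mean selling price.
    ordered = sorted(by_category, key=lambda cat: mean(by_category[cat]))
    ranks = {cat: rank for rank, cat in enumerate(ordered)}  # 0 = lowest mean
    for row in rows:
        row[column] = ranks[row[column]]
    return ranks

def strip_price_outliers(rows, target="price", whisker=1.5):
    """Drop rows whose target lies outside the IQR fences (assumed rule)."""
    q1, _, q3 = quantiles([row[target] for row in rows], n=4)
    low, high = q1 - whisker * (q3 - q1), q3 + whisker * (q3 - q1)
    return [row for row in rows if low <= row[target] <= high]

def fill_missing(rows, column):
    """Replace None in `column` with the mean of the non-missing values."""
    fill = mean(row[column] for row in rows if row[column] is not None)
    for row in rows:
        if row[column] is None:
            row[column] = fill
```

Each row here is a plain dict, so a typical call sequence would be `strip_price_outliers` first, then `rank_encode` on each categorical column, then `fill_missing` on each numeric column.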

Autoencoder

The autoencoder compresses each row to about a third of its original size, giving the KNN algorithm a more compact representation to work with. It's trained to minimize the mean squared error between its reconstruction and the original vector.
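
To make the idea concrete, here is a small sketch of an autoencoder trained on squared reconstruction error. It is an assumption-heavy stand-in: the writeup does not describe the network's layers, activations, or framework, so this version is purely linear, written in plain Python, and uses hypothetical names and hyperparameters (`train_autoencoder`, `code_dim`, `lr`).

```python
import random

def train_autoencoder(rows, code_dim, epochs=100, lr=0.005, seed=0):
    """Fit a linear autoencoder with per-row gradient steps on squared error.

    Returns (encode, decode, history), where history holds the mean squared
    reconstruction error measured after each epoch.
    """
    rng = random.Random(seed)
    dim = len(rows[0])
    # Small random starting weights; the outcome depends on this initialization.
    enc = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(code_dim)]
    dec = [[rng.gauss(0, 0.1) for _ in range(code_dim)] for _ in range(dim)]

    def encode(x):
        return [sum(w * v for w, v in zip(weights, x)) for weights in enc]

    def decode(code):
        return [sum(w * c for w, c in zip(weights, code)) for weights in dec]

    def mean_squared_error():
        total = 0.0
        for x in rows:
            total += sum((y - v) ** 2 for y, v in zip(decode(encode(x)), x))
        return total / (len(rows) * dim)

    history = []
    for _ in range(epochs):
        for x in rows:
            code = encode(x)
            err = [y - v for y, v in zip(decode(code), x)]
            # Gradient of the squared error with respect to the code
            # (the constant factor 2 is folded into the learning rate).
            grad_code = [sum(err[i] * dec[i][j] for i in range(dim))
                         for j in range(code_dim)]
            for i in range(dim):
                for j in range(code_dim):
                    dec[i][j] -= lr * err[i] * code[j]
            for j in range(code_dim):
                for k in range(dim):
                    enc[j][k] -= lr * grad_code[j] * x[k]
        history.append(mean_squared_error())
    return encode, decode, history
```

Calling `train_autoencoder(rows, code_dim=len(rows[0]) // 3)` would mirror the "about a third" compression described above; the returned `encode` function is what would feed the KNN stage.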

KNN Algorithm

This is the actual predictive algorithm. It takes in the compressed representation of a new transaction, finds the 10 most similar transactions in its training set, and returns the average of their selling prices.
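
A minimal sketch of that prediction step is shown below. The function and argument names are hypothetical, and Euclidean distance is assumed as the similarity measure, since the writeup does not name one.

```python
import math

def knn_predict(train_codes, train_prices, query_code, k=10):
    """Average the prices of the k training rows closest to query_code."""
    # Indices of the training rows, sorted by Euclidean distance to the query.
    by_distance = sorted(
        range(len(train_codes)),
        key=lambda i: math.dist(train_codes[i], query_code),
    )
    nearest = by_distance[:k]  # fewer than k if the training set is small
    return sum(train_prices[i] for i in nearest) / len(nearest)
```

For example, with training codes `[[0, 0], [1, 0], [0, 1], [10, 10]]`, prices `[100, 200, 300, 1000]`, and `k=3`, a query at `[0.1, 0.1]` ignores the distant fourth row and averages the first three prices.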

Challenges we ran into

Preprocessing the data proved tedious and highly time-consuming. The application also struggled to convert the categorical fields to numerical data, which forced us to use only about a quarter of the dataset. The accuracy of the autoencoder's reconstructions tended to depend on its initialization. We were also cold.

Accomplishments that we're proud of

It works, sometimes!
