This project is a recommender system based!!

Inspiration

We wanted to explore different aspects of data science and machine learning.
Since the dataset we were working with was already rich enough to cover these aspects, building a recommender system seemed like the best idea.

What it does

Given a home from the dataset, the recommender returns the 5 most similar homes.
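The core retrieval step can be sketched as follows. This is a minimal sketch that assumes each home is already encoded as a row of scaled numeric features. That is a simplification, since the full system also uses clusters, qualitative features and location. Under that assumption, the five most similar homes are the five nearest rows by Euclidean distance, excluding the query itself. The function `top_k_similar` and the toy data are hypothetical.

```python
import numpy as np

def top_k_similar(features: np.ndarray, query_idx: int, k: int = 5) -> np.ndarray:
    """Return indices of the k rows closest to row `query_idx` (Euclidean), excluding itself."""
    dists = np.linalg.norm(features - features[query_idx], axis=1)
    dists[query_idx] = np.inf  # never recommend the query home to itself
    # argsort is O(n log n); np.argpartition would be cheaper for very large n
    return np.argsort(dists)[:k]

# Hypothetical toy data: 7 homes described by two already-scaled numeric features
homes = np.array([
    [0.0, 0.0],
    [0.1, 0.0],
    [0.0, 0.2],
    [0.3, 0.3],
    [1.0, 1.0],
    [0.5, 0.4],
    [2.0, 2.0],
])
print(top_k_similar(homes, query_idx=0))  # → [1 2 3 5 4]
```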

How we built it

The first step was to rearrange the data into a format the models could use. Then we ran k-means to split the data into clusters. These clusters were combined with similarity measures on the qualitative features (using Jaccard distance) and on the quantitative features (using KNN). The final factor we consider is the location of the houses.
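The write-up does not say exactly how these pieces are combined. One plausible reading is sketched below under stated assumptions. First, k-means restricts the candidates to the query's cluster. Then each candidate is ranked by a weighted sum of three distances: Euclidean distance on quantitative features (a brute-force nearest-neighbour search), Jaccard distance on set-valued qualitative features such as amenities, and haversine distance between coordinates. The weights, the 10 km cap, the function names and the data are all hypothetical.

```python
import math
import numpy as np

def kmeans(X: np.ndarray, k: int, n_iter: int = 50, seed: int = 0) -> np.ndarray:
    """Plain Lloyd's k-means: returns one cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every home to its nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its members (keep it if the cluster is empty)
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def jaccard_distance(a: set, b: set) -> float:
    """1 - |intersection| / |union| for qualitative (set-valued) features."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = p2 - p1, math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def recommend(idx, numeric, amenities, coords, labels, k=5,
              weights=(0.5, 0.3, 0.2), max_km=10.0):
    """Return up to k homes from idx's cluster, ranked by a hypothetical weighted distance."""
    w_num, w_qual, w_geo = weights
    scored = []
    for j in range(len(numeric)):
        if j == idx or labels[j] != labels[idx]:
            continue  # only compare against other homes in the same cluster
        d_num = float(np.linalg.norm(numeric[idx] - numeric[j]))
        d_qual = jaccard_distance(amenities[idx], amenities[j])
        d_geo = min(haversine_km(*coords[idx], *coords[j]) / max_km, 1.0)  # capped at 1
        scored.append((w_num * d_num + w_qual * d_qual + w_geo * d_geo, j))
    return [j for _, j in sorted(scored)[:k]]

# Hypothetical data: two well-separated groups of four homes each
numeric = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
                    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1]])
amenities = [{"pool", "garden"}, {"pool"}, {"pool", "garden"}, {"garage"},
             {"lift"}, {"lift"}, {"terrace"}, set()]
coords = [(41.39, 2.17), (41.39, 2.17), (41.40, 2.17), (41.39, 2.18),
          (40.42, -3.70), (40.42, -3.70), (40.43, -3.70), (40.42, -3.71)]
labels = kmeans(numeric, k=2)
print(recommend(0, numeric, amenities, coords, labels))  # → [2, 1, 3]
```

Filtering by cluster first shrinks the number of pairwise comparisons, which matters on a large dataset. The weights would need tuning against real listings.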

Challenges we ran into

The main challenge we faced was handling the very large volume of data.

Accomplishments that we're proud of

What worked best in our group was autonomous learning, decision-making and team building.

What we learned

Processing the large dataset was much more time-consuming than we expected, so we built an incremental, parallelized pipeline to read and preprocess the data. In addition, none of us had ever used Streamlit before, so we took the opportunity to try it.
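The write-up does not describe the pipeline itself. The following is a minimal sketch of the incremental, parallel idea, assuming a CSV input with hypothetical columns `id`, `price` and `rooms` and a hypothetical cleaning step. The file is read a fixed number of rows at a time, and the chunks are cleaned concurrently. Threads keep the sketch simple. For CPU-bound cleaning, `concurrent.futures.ProcessPoolExecutor` offers the same interface with true parallelism, but it requires picklable, top-level functions.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Iterable, Iterator

def read_in_chunks(lines: Iterable[str], chunk_size: int) -> Iterator[list]:
    """Yield the CSV as lists of row dicts, chunk_size rows at a time."""
    reader = csv.DictReader(lines)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk

def preprocess_chunk(rows: list) -> list:
    """Hypothetical cleaning step: drop rows without a price and cast numeric columns."""
    cleaned = []
    for row in rows:
        if not row["price"]:
            continue
        cleaned.append({"id": row["id"], "price": float(row["price"]),
                        "rooms": int(row["rooms"] or 0)})
    return cleaned

def run_pipeline(lines: Iterable[str], chunk_size: int = 10_000, workers: int = 4) -> list:
    """Clean the chunks concurrently and concatenate the results in their original order."""
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map submits every chunk up front but yields results in input order
        for cleaned in pool.map(preprocess_chunk, read_in_chunks(lines, chunk_size)):
            results.extend(cleaned)
    return results

# Hypothetical sample input: the second row has no price and is dropped
sample = io.StringIO("id,price,rooms\n1,100000,3\n2,,2\n3,250000,4\n")
print(run_pipeline(sample, chunk_size=2))
```

Because `Executor.map` submits all chunks immediately, a file that does not fit in memory would need the chunks submitted in bounded batches instead.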

What's next for This project is a recommender system based!!

Some ideas had to be cut from the original plan because of the time limit. For example, the API could be improved with a map on which you could directly select the desired area. We also skipped reducing the dimensionality of the data with PCA. There were also features we wanted to include but knew from the start were not feasible, such as adding user profile data to improve the results or precomputing the results for the whole dataset.