My main inspiration comes from being an international student. I have also had the opportunity to live in several countries, and I noticed significant differences between cities and countries.
For instance, the education system in most South American countries is public and free, while in the USA and Canada much of it, especially higher education, is private or tuition-based. On the other hand, cities in the USA and Canada tend to have a higher quality of life.
Each of these factors can be vital when choosing a city to live in.
What it does
The project helps users to:
1) Visualize the worldwide distribution of cities along with their essential features (e.g. average rent, crime rate, health care, population, or even weather).
2) Visualize the top cities for relevant features (e.g. the top 20 cities by crime rate or by quality of life).
3) Understand the correlation between certain features (e.g. high crime rate vs. low quality of life).
4) Perform clustering (k-means and agglomerative) to find patterns among the features in the dataset.
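As a rough illustration of items 2 and 3, the pandas sketch below ranks cities by one feature and computes the correlation between two features. The DataFrame, its column names (`crime_rate`, `quality_of_life`), its values, and the helper `top_cities` are all hypothetical and made up for illustration; the project's actual dataset and code may differ.

```python
import pandas as pd

# Hypothetical toy data: city names, column names, and values are invented.
cities = pd.DataFrame({
    "city": ["A", "B", "C", "D", "E"],
    "crime_rate": [70.1, 35.4, 55.0, 20.3, 62.8],
    "quality_of_life": [90.5, 160.2, 120.7, 185.9, 101.3],
})

def top_cities(df: pd.DataFrame, column: str, n: int = 20) -> pd.DataFrame:
    """Hypothetical helper: return the n rows with the largest values in `column`."""
    return df.nlargest(n, column)

# n=3 only because the toy table has five rows (the writeup mentions a top 20).
print(top_cities(cities, "crime_rate", n=3)[["city", "crime_rate"]])

# Series.corr uses the Pearson correlation coefficient by default.
r = cities["crime_rate"].corr(cities["quality_of_life"])
print(f"Pearson r = {r:.2f}")
```

A strongly negative `r` would match the "high crime rate vs. low quality of life" pattern; with real data, a scatter plot alongside the coefficient helps spot outliers that a single number can hide.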
How I built it
I used Jupyter Notebook as my coding platform, along with several Python packages (pandas, seaborn, plotly, statsmodels, sklearn, and scipy).
Challenges I ran into
The main challenges were cleaning, preparing, and managing the data. I had to study the dataset carefully to locate missing values, duplicated rows, and even cities that share a name but are in different countries.
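The three cleaning problems described above can be sketched with pandas roughly as follows. This is a minimal sketch, not the project's code: the rows and the column names (`city`, `country`, `avg_rent`) are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical raw data; column names are assumptions, not the project's schema.
raw = pd.DataFrame({
    "city": ["London", "London", "London", "Paris", "Lima"],
    "country": ["United Kingdom", "United Kingdom", "Canada", "France", "Peru"],
    "avg_rent": [2100.0, 2100.0, 1200.0, None, 450.0],
})

# 1) Count missing values per column (None becomes NaN in a float column).
missing = raw.isna().sum()

# 2) Drop rows that are exact duplicates of an earlier row.
clean = raw.drop_duplicates().reset_index(drop=True)

# 3) Flag city names that appear in more than one country.
countries_per_city = clean.groupby("city")["country"].nunique()
ambiguous = countries_per_city[countries_per_city > 1].index.tolist()

# Build a combined key so "London, United Kingdom" and "London, Canada" stay distinct.
clean["city_key"] = clean["city"] + ", " + clean["country"]

print(missing)
print(ambiguous)
print(clean)
```

Keying on city plus country, rather than city alone, keeps later joins and plots from silently merging distinct places.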
Accomplishments that I'm proud of
I'm proud of the visualizations of my data. I'm very new to machine learning, and I found plotly to be an excellent package for interactive visualizations.
I'm also very proud of my clustering results. I feel they reveal distinct trends and patterns, and these patterns were consistent across both clustering algorithms.
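The writeup does not say how the two clusterings were compared. One common way to check that k-means and agglomerative clustering agree is the adjusted Rand index (ARI), sketched below with scikit-learn. The synthetic `make_blobs` data stands in for the city feature matrix, and the choice of 3 clusters, Ward linkage, and ARI are all assumptions for illustration.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a city feature matrix (rows = cities, columns = features).
X, _ = make_blobs(n_samples=150, n_features=4, centers=3,
                  cluster_std=0.5, random_state=42)

# Standardize so no single feature (e.g. rent in dollars) dominates the distances.
X_scaled = StandardScaler().fit_transform(X)

# Fit both algorithms with the same (assumed) number of clusters.
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_scaled)
agglo_labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X_scaled)

# ARI is 1.0 for identical partitions (regardless of label names)
# and close to 0 for chance-level agreement.
ari = adjusted_rand_score(kmeans_labels, agglo_labels)
print(f"Adjusted Rand index: {ari:.2f}")
```

Because cluster IDs are arbitrary (cluster 0 in one algorithm may be cluster 2 in the other), a label-invariant score such as ARI is a more reliable check than comparing raw labels directly.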
What's next for City Search Tool
I would definitely like to add more features to my dataset and try other, possibly more sophisticated, clustering methods.