title: "Looking for Your Dream City? You are at the Right Place!"
author: Sandipan Pramanik, Rochita Das, Anirban Chakraborty, Abhisek Chakraborty

output: html_document


What does it do?

Are you working from home and looking for a city to move to? We can make your life simpler by doing the search for you. Using our R Shiny app, choose the factors that matter to you for your big move, and we will show you the cities worldwide that best match your preferences.

Factors about a city that might be important to you:

  1. Movehub Rating.
  2. Purchasing Power that you would enjoy.
  3. Health Care Index.
  4. Pollution Index.
  5. Quality of Life.
  6. Crime Rating.
  7. Number of Sunny Hours per Year.
  8. Average Speed of the Internet.
  9. Peace Index.
  10. Happiness Index.
  11. Impact of COVID-19: Number of Deaths per 1 Million.
  12. COVID-19 Response: Number of Tests per 1 Million.

How to use it?

Select the factors you care about from our collection and tell us how much they matter. We will list the best cities for you, and the worst ones too.

Step 1. Drag the sliders for the variables that are important to you under the "Input Your Preference(s)" tab.

Step 2. Sit back and enjoy! You already have what you want!

Step 3. Go back to Step 1 until you find a city to call home.

What's happening behind the curtain

Suppose we observe data on $n$ cities around the world such that each city $i$ has a $p$-dimensional vector of continuous, real-valued factors, say $\boldsymbol{x}_i \in \mathbb{R}^p$. Based on these data and the preferences the user provides on these $p$ factors, we calculate a score between $0$ and $100$ for each city and show the user the 5 best and 5 worst cities according to it.

Avoiding the mathematical details, our approach can be summarized in the following steps:

Step 1. Suppose the user provides a preference other than "None" for $k$ of the $p$ factors. Since the user has no preference for the remaining $p-k$ factors, we drop them and focus on the $k$ selected factors in the next steps.

Step 2. We jointly standardize the $k$ factors to remove underlying joint correlations.

Step 3. Based on the preferences provided by the user, we first find a "direction" (a vector) in $\mathbb{R}^k$. This is the direction preferred by the user. We then project all $n$ points in that space onto this vector and sort the projections: the point lying furthest along the direction is the user's most preferred city.
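The three steps above can be sketched in a few lines of code. This is a minimal illustration in plain Python, not the app's actual R implementation, and it rests on several assumptions the text does not spell out: preferences are encoded as signed weights (positive means "higher is better", negative means "lower is better", 0 means "None"), the joint standardization is done by Cholesky whitening of the sample covariance, and the $0$ to $100$ score is a min-max rescaling of the projections. The names `cholesky` and `rank_cities` and the toy data are hypothetical.

```python
import math


def cholesky(a):
    """Return lower-triangular L with L * L^T = a (a symmetric positive definite)."""
    k = len(a)
    low = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def rank_cities(names, rows, weights):
    """Rank cities by projecting whitened factors onto the preference direction.

    Assumes more cities than selected factors and a non-degenerate covariance.
    """
    # Step 1: keep only the factors with a non-"None" (non-zero) preference.
    keep = [j for j, w in enumerate(weights) if w != 0]
    x = [[row[j] for j in keep] for row in rows]
    w = [weights[j] for j in keep]
    n, k = len(x), len(keep)

    # Step 2: centre the factors and whiten them (assumed: Cholesky whitening),
    # so the transformed factors have identity sample covariance.
    mean = [sum(r[j] for r in x) / n for j in range(k)]
    xc = [[r[j] - mean[j] for j in range(k)] for r in x]
    cov = [[sum(r[a] * r[b] for r in xc) / (n - 1) for b in range(k)]
           for a in range(k)]
    low = cholesky(cov)
    z = []
    for r in xc:
        y = []
        for i in range(k):  # forward substitution: solve low * y = r
            y.append((r[i] - sum(low[i][m] * y[m] for m in range(i))) / low[i][i])
        z.append(y)

    # Step 3: project every city onto the unit preference direction.
    norm = math.sqrt(sum(v * v for v in w))
    u = [v / norm for v in w]
    proj = [sum(a * b for a, b in zip(y, u)) for y in z]

    # Assumed scoring rule: min-max rescale the projections to [0, 100].
    lo, hi = min(proj), max(proj)
    span = (hi - lo) or 1.0  # avoid division by zero if all projections tie
    scores = [100.0 * (p - lo) / span for p in proj]
    return sorted(zip(names, scores), key=lambda t: t[1], reverse=True)


# Toy data (made up): columns are purchasing power, an ignored factor, pollution.
cities = ["City A", "City B", "City C", "City D"]
data = [[80, 5, 20], [60, 7, 50], [70, 6, 90], [90, 4, 40]]
for name, score in rank_cities(cities, data, [1, 0, -1]):
    print(f"{name}: {score:.1f}")
```

In this sketch the best city always scores 100 and the worst 0, so the scores are relative to the cities in the data set rather than absolute. The top and bottom 5 would simply be the first and last five entries of the returned list.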

Shiny App:

https://city-search-engine-2020.shinyapps.io/DATATHON2020/

Challenges we faced

The following are some of the many challenges we faced:

1. Although we were provided with some factors for each city, we felt we needed more. This year the world has experienced not only unprecedented chaos from the COVID-19 pandemic but also rising social injustice, among other issues. To incorporate some of these issues into the project, we searched extensively for such data. We were partially successful: we were able to include COVID-19 deaths (per million) and tests (per million), a happiness index, and a peace index for each city.

2. Once we had collected the data for 216 cities, the next challenge was to come up with an algorithm that makes sense. To summarize: (i) the algorithm should account for the joint correlation of the factors; (ii) the cities it returns should be relevant to the user's input; (iii) it should sort the cities in the multi-dimensional space in a robust way; and (iv) it should assign a score to each city in the output. Our proposed ML-based method is a vector projection-based method that addresses all of these points and provides a fast and relevant choice of solutions.

Future direction

Nonetheless, much more could be done. For one, more data covering more cities and variables would help us find better solutions. We could also try a better metric for sorting the cities.
