We believe that the story of home buying can be told starting from the individual. The underlying propensity to buy a home is affected by macroeconomic factors such as house prices, salary, and age. The sparsity of data at the zip5 level creates so much randomness that the signal is not easy to detect. To combat this issue, we decided to take a top-down approach and use a 3-layer hierarchical model. We first gathered data from external sources and joined it with citizen’s demographic data and labels at the zip5 level. We believed that aggregating up would allow us to separate the signal from the noise more easily. Using a classification model, we separated inactive zip5 areas from active home-buying ones. For the active areas, we trained a regression-type model to predict the actual count of home purchases within each area. To project back down to zip9 for the final result, we trained an allocation model on the monthly financial data to estimate each zip9’s share of home-buying propensity relative to its zip5 region. Multiplying the counts by the proportions, we calculated the final prediction for each zip9 region.
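The final combination step can be sketched as follows. This is a minimal illustration rather than our actual code: the function name `combine_predictions`, the dictionary formats, and the sample ZIP codes are all hypothetical, and it assumes each zip9 is a 9-character string whose first five characters are its zip5. It also assumes the allocation model emits non-negative raw scores that are normalized into proportions within each zip5.

```python
from collections import defaultdict


def combine_predictions(active, counts, zip9_scores):
    """Combine the three layers into zip9-level predictions.

    active:      {zip5: bool}   layer 1, classifier output (hypothetical format)
    counts:      {zip5: float}  layer 2, predicted purchases per active zip5
    zip9_scores: {zip9: float}  layer 3, non-negative raw allocation scores
    """
    # Sum the raw scores within each zip5 so they can be turned into proportions.
    totals = defaultdict(float)
    for zip9, score in zip9_scores.items():
        totals[zip9[:5]] += score  # assumption: first 5 characters are the zip5

    predictions = {}
    for zip9, score in zip9_scores.items():
        zip5 = zip9[:5]
        # Inactive areas (or areas with no allocation mass) are predicted as zero.
        if not active.get(zip5, False) or totals[zip5] == 0:
            predictions[zip9] = 0.0
            continue
        proportion = score / totals[zip5]
        # Final prediction = zip5 count * zip9 share of that zip5.
        predictions[zip9] = counts[zip5] * proportion
    return predictions


# Made-up inputs purely for illustration.
active = {"02139": True, "02140": False}
counts = {"02139": 10.0}
scores = {"021391234": 3.0, "021395678": 1.0, "021400001": 2.0}

preds = combine_predictions(active, counts, scores)
print(preds)
# {'021391234': 7.5, '021395678': 2.5, '021400001': 0.0}
```

Because the proportions within an active zip5 sum to one, the zip9 predictions add back up to the zip5 count predicted by the regression layer.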
