Inspiration
Tired of not being able to find accurate quotes for your HDB apartment? Tired of property agents artificially jacking up prices just to siphon more of your net worth into their own pockets? Your nightmares will soon come to an end, because the HDB Price Estimator is here to save the day!
What it does
The HDB Price Estimator is an essential tool for both buyers and sellers. It provides fair quotes based on the factors that the buyer or seller enters. These factors are fed into our machine learning model, which was trained on past HDB sales data from 2012 onward.
How we built it
We used HDB sales data from Data.gov.sg.
Data Preparation:
- Cleaning out NULL values
- Adding new features - Geocoding. By geocoding the street name in the raw data into the longitude and latitude of each listing, we were able to add more features that improve the accuracy of the model:
  - Distance to the closest MRT station
  - Distance to the closest mall
  - Distance to the CBD
- Adding new features - Economic Environment:
  - GDP per capita for the past 10 years (quarterly)
  - SIBOR (Singapore Interbank Offered Rate) for the past 10 years (quarterly)
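Once every listing has a latitude and longitude, the distance features can be derived with a great-circle formula. The sketch below is one plausible way to do it, not necessarily what we ran: it uses the haversine formula, and the function names and the sample coordinates are illustrative placeholders rather than real station data.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_to_closest(lat, lon, places):
    """Return (distance_km, name) of the nearest place.

    `places` is an iterable of (name, lat, lon) tuples, for example MRT
    stations or malls.
    """
    return min(
        (haversine_km(lat, lon, p_lat, p_lon), name)
        for name, p_lat, p_lon in places
    )


# Placeholder coordinates, for illustration only.
stations = [("Station A", 1.30, 103.80), ("Station B", 1.35, 103.90)]
dist_km, nearest = distance_to_closest(1.34, 103.89, stations)
```

The same helper covers all three features: pass the list of MRT stations, the list of malls, or a single-entry list holding the CBD reference point.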
Modeling:
- Handling categorical data in regression
  - One-Hot Encoding
  - Feature Engineering:
    - Transforming categorical data into numeric data by assigning numeric values based on the deviation from the median resale price of a specific category (see feature engineering in Modeling)
    - Assigning a boolean value to every category of each categorical feature
- ML techniques
  - Linear Regression
  - Gradient Boosting Regression (ensemble model)
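To make the two ways of handling categorical data concrete, here is a minimal sketch in plain Python. It rests on an assumption: "deviation from the median" is read here as the median resale price of a category minus the overall median resale price, which is only one possible interpretation of the approach above. The function names, category labels, and prices are made up for illustration.

```python
from statistics import median


def one_hot(values):
    """Turn each value into a dict of 0/1 flags, one flag per category."""
    categories = sorted(set(values))
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]


def median_deviation_encoding(categories, prices):
    """Map each category to (its median price - overall median price)."""
    overall = median(prices)
    grouped = {}
    for category, price in zip(categories, prices):
        grouped.setdefault(category, []).append(price)
    return {c: median(ps) - overall for c, ps in grouped.items()}


# Made-up sample: three towns, prices in thousands of dollars.
towns = ["A", "A", "B", "B", "C"]
prices = [300, 320, 500, 520, 400]

flags = one_hot(towns)                                # one column per town
encoding = median_deviation_encoding(towns, prices)   # one number per town
numeric_town = [encoding[t] for t in towns]
```

One-hot encoding adds a column per category, while the median-based encoding keeps a single numeric column per feature, which is the trade-off between the two approaches.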
We also built a Flutter application to bring this feature to the end user. The application is multi-page and fully interactive. It sends API requests to a Node.js webserver that we deployed; the webserver processes the user input by passing it to a Python child process that runs a pickled version of our machine learning model. The Python script then returns the estimated HDB value to the webserver, which sends it back to the application.
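The Python side of that hand-off can be sketched roughly as follows. This is an illustration under assumptions, not our exact script: it assumes the webserver writes one JSON object to the child process's standard input and reads one JSON object from its standard output, and that the pickled model exposes a scikit-learn-style `predict` method. The feature names in `FEATURE_ORDER` and the `estimated_price` key are hypothetical.

```python
import json
import pickle
import sys

# Hypothetical feature order; it must match the order used during training.
FEATURE_ORDER = ["floor_area_sqm", "dist_mrt_km", "dist_cbd_km"]


def load_model(path):
    """Load a pickled model. Only unpickle files from a trusted source."""
    with open(path, "rb") as f:
        return pickle.load(f)


def handle_request(model, raw_json):
    """Turn one JSON request into one JSON response string."""
    payload = json.loads(raw_json)
    row = [float(payload[name]) for name in FEATURE_ORDER]
    prediction = model.predict([row])[0]
    return json.dumps({"estimated_price": float(prediction)})


def main(model_path, stdin=sys.stdin, stdout=sys.stdout):
    """Read a request from stdin and write the estimate to stdout."""
    model = load_model(model_path)
    stdout.write(handle_request(model, stdin.read()) + "\n")
```

A script built this way would call `main(sys.argv[1])` as its entry point, and the webserver would spawn it with the path to the pickle file as its only argument. Keeping `handle_request` separate from the I/O makes it easy to test without starting the webserver.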
Challenges we ran into
- The data provided by HDB did not include postal codes, only street addresses and block numbers. To find the longitude and latitude of every block, we wrote a script that uses OneMap's API to retrieve the postal code for every address. However, there were almost 200,000 rows of data to update, so retrieving and updating the postal codes took about 8 hours. To avoid losing data if something failed during this time-consuming operation, we stored all relevant API responses in a .csv file. This allowed easy resumption after a crash or error and saved us from rerunning the script from the start.
- Data analysis and machine learning are often done in Python with Jupyter Notebook. However, our webserver is written in Node.js, so we had to find a way for the Node.js backend to interoperate with the Python machine learning model that we created. In the end, we pickled the trained model from our Jupyter Notebook and passed it as a byte stream to another Python script that handles the communication between the webserver and the ML model.
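The resumable lookup described in the first challenge can be sketched like this. It is a simplified illustration: `lookup` stands in for whatever function calls the geocoding API (no real API call is shown), and the column names and function names are assumptions.

```python
import csv
import os

FIELDS = ["address", "postal_code"]  # hypothetical checkpoint columns


def load_done(path):
    """Read previously saved results so they are not requested again."""
    if not os.path.exists(path):
        return {}
    with open(path, newline="") as f:
        return {row["address"]: row["postal_code"] for row in csv.DictReader(f)}


def geocode_all(addresses, lookup, checkpoint_path):
    """Look up every address, appending each result to a CSV checkpoint.

    `lookup` is any callable mapping an address to a postal code. If it
    raises midway, rerunning this function skips the saved addresses.
    """
    done = load_done(checkpoint_path)
    needs_header = (not os.path.exists(checkpoint_path)
                    or os.path.getsize(checkpoint_path) == 0)
    with open(checkpoint_path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if needs_header:
            writer.writeheader()
        for address in addresses:
            if address in done:
                continue  # already retrieved in an earlier run
            postal_code = lookup(address)
            writer.writerow({"address": address, "postal_code": postal_code})
            f.flush()  # persist each row before the next request
            done[address] = postal_code
    return done
```

Flushing after every row costs a little speed, but it means a crash loses at most the request that was in flight.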
Accomplishments that we're proud of
“Data scientists spend 45% of the time prepping the data” (according to research conducted by Anaconda).
This is what happened to us while dealing with almost 200,000 entries from real government datasets, geocoding them, and adding new features such as the distance to the closest MRT station or shopping mall, as well as data representing the state of the economy, such as GDP per capita and SIBOR, chosen based on a literature review.
Experimenting with two ways of handling categorical data (Feature Engineering and One-Hot Encoding) while getting accurate results from both with gradient boosting regression (95% and 96%) was also not an easy task.
Last but definitely not least, it was difficult to connect the extracted model to the user interface so that it returned correct output quickly. Still, our team was able to deliver a nearly complete UI with accurate and fast responses in the short time available.
These additional efforts allowed us to build more accurate models and actually deploy them in real life to help more young adults (or anyone who wishes to purchase an HDB) make a wise purchase decision.
What we learned
Besides the accomplishments stated above, which definitely helped us develop our ML skills and stimulated our creativity, the model itself revealed some interesting facts.
As we built 5 different ML models for 5 different room types, some interesting differences emerged. For example, the weight of each variable in the regression model can differ considerably depending on the room type of the HDB: while SIBOR accounts for only around 1% of the weighting in the resale price for the Executive room type, it accounts for nearly 20% when predicting the resale price of the 4_room type.
Moreover, it was surprising to see that some features that seemed intuitively minor at the beginning turned out to be important, and in some instances the opposite. For example, SIBOR (an indicator of the current interest rate) was actually an important feature, while the floor area wasn’t really significant in most of the models.
What's next for HDB Price Estimator
First of all, a more complete UI can be built. We wish to return a more well-rounded result to the user: not only the suggested price but also reasonable pricing in a given area, a visualization of the resale price trend over the past few months, etc.
Moreover, since the relative importance of each variable is known from the different models, combining them with more features, such as supply and demand in the HDB resale market, would make it possible to predict future resale prices given sufficient data. In that case, we could advance our model into a forward-looking “predictor” that better assists the user with purchase and investment decisions, e.g. whether to purchase a BTO or a resale HDB at the moment, by comparing the given price of the BTO (from the government) with our prediction for an HDB of a certain type.