- Installed Python 3.7 and the Spyder IDE using Anaconda Navigator
- Learned Python basics using the pandas, NumPy and Matplotlib libraries
Building constructions are mapped to ISO construction codes based on their materials and other construction details. In real-world scenarios, however, these details are buried in the free-text construction descriptions provided by brokers, so the ISO construction codes have to be identified by extracting the relevant details from those descriptions.
We analyzed the use case and started working on it using the following approach:
➢ Read the training data with the pandas library
➢ Remove duplicate rows from the data
➢ Remove special characters from the construction descriptions
➢ Review different regression algorithms, such as linear regression, polynomial regression and random forest regression
➢ Use random forest regression to predict the ISO construction codes
from sklearn.ensemble import RandomForestRegressor
regressor = RandomForestRegressor(n_estimators=10, random_state=0)
Random forest is suitable because the diversity among its trees (each tree is trained on a different bootstrap sample of the training data) leads to more robust overall predictions. When making a prediction, the random forest regression model averages the estimates of all its individual decision trees.
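The averaging behaviour described above can be checked directly with scikit-learn. This is a minimal sketch on random toy data (not the construction dataset); `X`, `y` and `X_new` are illustrative, and it relies on the fitted forest exposing its trees through the `estimators_` attribute.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Random toy regression data, for illustration only
rng = np.random.RandomState(0)
X = rng.rand(100, 3)
y = 3 * X[:, 0] + rng.rand(100)

# Same settings as above: 10 trees and a fixed seed
regressor = RandomForestRegressor(n_estimators=10, random_state=0)
regressor.fit(X, y)

X_new = rng.rand(5, 3)

# Each fitted decision tree is stored in estimators_; collect its individual estimates
tree_preds = np.stack([tree.predict(X_new) for tree in regressor.estimators_])

# The forest's prediction is the mean of the individual tree estimates
forest_pred = regressor.predict(X_new)
print(np.allclose(forest_pred, tree_preds.mean(axis=0)))  # True
```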
➢ Read the test data and, based on the predicted values, update the ISO construction codes for the given descriptions
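The steps above can be sketched end to end as follows. This is a minimal, hypothetical sketch rather than the project's actual code: the column names `description` and `iso_code`, the toy rows, the TF-IDF feature extraction (the notes do not say how descriptions are turned into numeric features) and the rounding of the regression output to the nearest code are all assumptions.

```python
import io
import re

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy data; in practice use pd.read_csv(<path to the training/test file>)
TRAIN_CSV = """description,iso_code
"Wood frame, vinyl siding",1
"Wood frame, vinyl siding",1
"Joisted masonry; brick walls & wood roof",2
"Steel frame w/ metal deck roof",3
"Reinforced concrete, fire-resistive!",6
"""

TEST_CSV = """description
"wood frame w/ vinyl siding"
"concrete (fire resistive)"
"""


def clean_text(text: str) -> str:
    """Lower-case the text, replace special characters with spaces, squeeze whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


# 1. Read the training data
train = pd.read_csv(io.StringIO(TRAIN_CSV))
# 2. Remove duplicate rows
train = train.drop_duplicates().reset_index(drop=True)
# 3. Remove special characters from the descriptions
train["clean"] = train["description"].map(clean_text)

# Assumed feature extraction: TF-IDF weights of the words in each description
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train["clean"])

# 4. Train the random forest regressor with the settings used above
regressor = RandomForestRegressor(n_estimators=10, random_state=0)
regressor.fit(X_train, train["iso_code"])

# 5. Read the test data, predict, and write the codes back
test = pd.read_csv(io.StringIO(TEST_CSV))
X_test = vectorizer.transform(test["description"].map(clean_text))
raw = regressor.predict(X_test)

# Assumption: round to the nearest integer code within the range seen in training
lo, hi = train["iso_code"].min(), train["iso_code"].max()
test["iso_code"] = np.clip(np.rint(raw), lo, hi).astype(int)
print(test)
```

The rounding step is needed because the regressor returns continuous values, while ISO construction codes are discrete.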