Inspiration

We wanted to build a good data model, one that is free of bugs and highly accurate, and to keep improving it so that we could reach the finals!

What it does

  1. Remove unnecessary columns, with a rationale for each removal.
  2. Split the numerical and non-numerical columns.
  3. Undersample the dataframe.
  4. Apply SMOTE to the dataframe to deal with class imbalance.
  5. Run feature selection to find the features that affect the target variable most.
  6. Build the model.
  7. Check the classification reports to see how accurate the models are.
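Steps 2 and 3 above could be sketched roughly as follows. This is only an illustration on toy data: the column names, the values, and the choice to keep twice as many majority rows as minority rows are all assumptions, not the settings we actually used.

```python
import pandas as pd

# Toy data: every column name and value here is made up for illustration.
df = pd.DataFrame({
    "age": [25, 40, 31, 58, 46, 37, 29, 52],
    "income": [3000.0, 5200.0, 4100.0, 7000.0, 6100.0, 4500.0, 3300.0, 6800.0],
    "gender": ["F", "M", "F", "M", "M", "F", "F", "M"],
    "target": [0, 0, 0, 0, 0, 0, 1, 1],
})

features = df.drop(columns="target")

# Step 2: split numerical and non-numerical columns by dtype.
numeric_cols = features.select_dtypes(include="number").columns.tolist()
categorical_cols = features.select_dtypes(exclude="number").columns.tolist()

# Step 3: undersample the majority class.
counts = df["target"].value_counts()
majority_label = counts.idxmax()
# Assumed ratio: keep two majority rows per minority row.
majority_keep = int(2 * counts.min())

majority = df[df["target"] == majority_label].sample(n=majority_keep, random_state=0)
minority = df[df["target"] != majority_label]
undersampled = pd.concat([majority, minority])

print(numeric_cols, categorical_cols)
print(undersampled["target"].value_counts())
```

Leaving some imbalance after undersampling means less of the majority class is thrown away, and the remaining gap can then be closed by oversampling the minority class.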

How we built it

We chose Python as our main tool and used several of its libraries, such as scikit-learn, pandas and NumPy, to carry out the data analysis and modelling step by step.

Challenges we ran into

  1. We had trouble interpreting the data. For example, we could not tell whether NaN meant a missing value or "not applicable". In the f_ever_declined_la column, 1 means having been declined, so we inferred that NaN means never having been declined before.
  2. One-hot encoding was tough.
  3. We had trouble applying SMOTE because we couldn't import imblearn.
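The first two challenges can be illustrated with a small pandas sketch. Only f_ever_declined_la comes from the real data; the occupation column and all values are hypothetical, and treating NaN as 0 is the inference described above, not a documented rule of the dataset.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "f_ever_declined_la": [1.0, np.nan, np.nan, 1.0],
    # Hypothetical categorical column, used only to show one-hot encoding.
    "occupation": ["engineer", "teacher", np.nan, "engineer"],
})

# Inferred meaning: NaN in this flag column means "never declined", so fill with 0.
df["f_ever_declined_la"] = df["f_ever_declined_la"].fillna(0).astype(int)

# One-hot encode the categorical column. dummy_na=True adds a separate
# indicator column for missing categories instead of silently dropping them.
encoded = pd.get_dummies(df, columns=["occupation"], dummy_na=True, dtype=int)

print(encoded)
```

Keeping a separate indicator for missing categories avoids having to decide up front whether NaN means "missing" or "not applicable" for that column.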

Accomplishments that we're proud of

  1. When we couldn't import imblearn to apply SMOTE, we searched for solutions on Stack Overflow and fixed the problem ourselves.
  2. We each finished courses and read books that made us more familiar with data analysis libraries like pandas and scikit-learn, and the project ran much more smoothly because of that. It was encouraging to see that we used methods beyond the code given to us to improve our model or clean the data better.
  3. After our first draft, we kept improving our cleaning process, and we are proud to see the model's F-score improve each time.
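For reference, the F-score mentioned above (assuming the usual F1 variant) combines precision and recall. The sketch below computes it from confusion-matrix counts; the counts are made up purely to show how the score moves and are not our actual results.

```python
def f_score(tp, fp, fn):
    """F1 score from true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Made-up counts for two hypothetical versions of a model.
before = f_score(tp=20, fp=30, fn=30)
after = f_score(tp=30, fp=20, fn=20)

print(round(before, 2), round(after, 2))
```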

What we learned

  1. Data cleaning: how to impute missing values and how to interpret NaN values.
  2. Feature selection: how to select the features that best predict whether someone buys the investment.
  3. Combining numerical and non-numerical columns to build the model.
  4. Using SMOTE to handle imbalanced datasets.
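The idea behind SMOTE is to create synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. The function below is a simplified NumPy sketch of that idea with hypothetical names and toy data. It is not imblearn's implementation; in practice the library route is `SMOTE().fit_resample(X, y)` from `imblearn.over_sampling`.

```python
import numpy as np

def smote_like_oversample(X_minority, n_new, k=2, seed=0):
    """Create n_new synthetic points between minority points and their neighbours."""
    rng = np.random.default_rng(seed)
    X_minority = np.asarray(X_minority, dtype=float)
    n = len(X_minority)

    # Pairwise Euclidean distances between minority points.
    diffs = X_minority[:, None, :] - X_minority[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour

    # Indices of the k nearest minority neighbours of each point (requires k < n).
    neighbours = np.argsort(dists, axis=1)[:, :k]

    synthetic = np.empty((n_new, X_minority.shape[1]))
    for i in range(n_new):
        base = rng.integers(n)
        neighbour = neighbours[base, rng.integers(k)]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic[i] = X_minority[base] + gap * (X_minority[neighbour] - X_minority[base])
    return synthetic

# Toy minority-class points with two numerical features.
X_min = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0]])
new_points = smote_like_oversample(X_min, n_new=5)
print(new_points)
```

Because each synthetic point lies on a line segment between two real minority points, it stays inside the region the minority class already occupies. Interpolation like this only makes sense for numerical features, which is one reason to handle numerical and non-numerical columns separately.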

What's next for DataVision

We really had fun learning and working together over these 4 days. If we get the chance to do this again, we would love to, and we would learn even more next time, as we are beginners no more...
