Inspiration

Invasive species are typically introduced when people accidentally carry organisms from one country into another. Our inspiration for this project came from the wasted resources and economic damage these species cause. We decided to approach the problem locally, here in Canada.

What it does

The model predicts which group of invasive species inhabits a given geographical location in British Columbia based on four features: time, latitude, longitude and UTM zone.

How we built it

First, we had to research and find a dataset with many features related to the location of invasive species. We found one in the British Columbia Catalogue. We then cleaned it by removing null values and dropping unnecessary columns. Next, we ran a correlation and feature importance test on the data, and were finally able to begin training. For training, we split the data into 60% train, 20% test and 20% validation. After this, we had a complete, functioning model.
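As a rough sketch of the cleaning and 60/20/20 split described above, the snippet below uses pandas and scikit-learn on synthetic data. The column names (`year`, `utm_zone`, `notes`, `species_group`), the synthetic values and the choice of a random forest classifier are all assumptions made for illustration, not our actual dataset schema or model.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset; every column name is hypothetical.
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "year": rng.integers(2000, 2023, n).astype(float),
    "latitude": rng.uniform(48.3, 60.0, n),
    "longitude": rng.uniform(-139.0, -114.0, n),
    "utm_zone": rng.integers(7, 12, n).astype(float),
    "notes": "n/a",  # stands in for an unnecessary column
    "species_group": rng.choice(["plant", "insect", "fish"], n),
})
# Simulate missing values in 50 rows.
df.loc[rng.choice(n, 50, replace=False), "latitude"] = np.nan

# Cleaning: drop unnecessary columns, then drop rows with null values.
clean = df.drop(columns=["notes"]).dropna()

features = ["year", "latitude", "longitude", "utm_zone"]
X, y = clean[features], clean["species_group"]

# 60/20/20 split: hold out 20% for test, then take 25% of the
# remaining 80% (20% of the total) for validation.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0, stratify=y
)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0, stratify=y_rest
)

# Assumed classifier; the write-up does not name the model type.
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)
val_accuracy = model.score(X_val, y_val)
```

Calling `train_test_split` twice is one common way to get three partitions, since the function only produces two at a time.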

Challenges we ran into

The initial model accuracy was quite low, which required us to make changes. Instead of a standard 80% train, 20% test ratio, we added a validation split, since we had plenty of data available for it. We also had to do a lot of feature testing to see how many features would give the desired accuracy. During feature testing we also had to consider the effect on overfitting and underfitting, and found that 4 features performed best.
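One way to run this kind of feature-count test is to rank features by importance and then score a model on the validation set using only the top k features, for each k. The sketch below does this on synthetic data from `make_classification`; the data, the random forest ranker and the variable names are assumptions for illustration, not our exact procedure.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the cleaned dataset (assumption).
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=4,
    n_redundant=0, n_classes=3, random_state=0,
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Rank features from most to least important using a fitted forest.
ranker = RandomForestClassifier(n_estimators=100, random_state=0)
ranker.fit(X_train, y_train)
order = np.argsort(ranker.feature_importances_)[::-1]

# Train on the top-k features for each k and record validation accuracy.
scores = {}
for k in range(1, X.shape[1] + 1):
    cols = order[:k]
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_train[:, cols], y_train)
    scores[k] = clf.score(X_val[:, cols], y_val)

# The k with the highest validation accuracy.
best_k = max(scores, key=scores.get)
```

Scoring on the validation set rather than the test set keeps the test set untouched for the final accuracy estimate. A validation score that stops improving or drops as k grows can hint at overfitting, while a low score at small k can hint at underfitting.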

Accomplishments that we're proud of

The lasting impact this project can have: it is open-ended, and researchers can use our model to aid in getting rid of invasive species. We are also proud that the model functions with good accuracy.

What we learned

How to use scikit-learn for classification, feature importance testing, data cleaning and model selection.

What's next for Invasive Species Prediction

The main future improvement would be increasing the scope of our dataset. We only covered 20% of invasive species; increasing this to the 80 to 100% range would dramatically increase the impact of our project. The next improvement would be to create a container holding our data and model on Azure, and then build an app that uses the container so that researchers across the globe can access our research and model.
