Earthquake Prédicteur

Inspiration

In the Pacific Northwest, the "Big One" earthquake is expected to hit at some point in the future, so we thought it would be useful to build a predictive model that helps estimate when earthquakes will happen. We also saw this as a great opportunity to learn more about data science, data analysis, and predictive modeling.

What it does

This project trains a regression model to predict the time, in seconds, until an earthquake occurs. Training uses publicly available seismic data labeled with the actual time remaining before each simulated earthquake.

How we built it

The CatBoost library was very useful: it encapsulates the regression model framework and let us train and predict at a high level of abstraction. CatBoost also provided the feature selection model. We used NumPy and several other data-focused libraries to engineer features from subsets of the raw acoustic signal data.
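
As a rough illustration of the workflow described above, the sketch below splits a raw signal into windows, summarizes each window with a few NumPy statistics, and hands the result to a CatBoost regressor. The specific statistics, the window size, the function names, and the CatBoost settings (including the `MAE` loss) are assumptions made for this example; the write-up does not list the features or parameters the team actually used.

```python
import numpy as np


def segment_features(segment):
    """Summarize one window of raw acoustic samples as a flat feature dict."""
    x = np.asarray(segment, dtype=np.float64)
    return {
        "mean": float(x.mean()),
        "std": float(x.std()),
        "min": float(x.min()),
        "max": float(x.max()),
        "q05": float(np.quantile(x, 0.05)),
        "q95": float(np.quantile(x, 0.95)),
        "abs_mean": float(np.abs(x).mean()),
    }


def build_feature_matrix(signal, window):
    """Split a 1-D signal into non-overlapping windows and featurize each one."""
    n_windows = len(signal) // window
    rows = [
        segment_features(signal[i * window:(i + 1) * window])
        for i in range(n_windows)
    ]
    names = list(rows[0].keys()) if rows else []
    matrix = np.array([[row[name] for name in names] for row in rows])
    return names, matrix


def train_catboost(X, y):
    """Fit a CatBoost regressor on the feature matrix.

    catboost is imported here so the feature code above runs without it.
    The hyperparameters are illustrative, not the project's settings.
    """
    from catboost import CatBoostRegressor

    model = CatBoostRegressor(iterations=200, loss_function="MAE", verbose=False)
    model.fit(X, y)
    return model
```

Each row of the matrix corresponds to one window of the signal, and the target for that row would be the labeled time remaining at the end of the window.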

The website was developed with Streamlit, an open-source framework for building data science apps directly in Python code.

Challenges we ran into

Some of the features that we researched and judged important for model training were very difficult to understand. Once we understood them, we also had to implement them in Python using the limited number of libraries available for signal processing.

The training dataset was extremely large (on the order of gigabytes), so repeatedly training and testing the model was very time-consuming. By the end of development we had ideas for improving the model, but not enough time to reload the dataset and retrain.
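
One common way to reduce the cost of reloading a multi-gigabyte file is to stream it and process one fixed-size segment at a time, so only a single segment sits in memory. The sketch below shows that idea using only the standard library. It is not what the team did; the column name `signal` and both helper names are hypothetical.

```python
import csv
from itertools import islice


def read_column(lines, column):
    """Lazily yield one numeric column from CSV text lines (e.g. an open file)."""
    for row in csv.DictReader(lines):
        yield float(row[column])


def iter_segments(samples, segment_size):
    """Yield consecutive fixed-size lists from any iterable of samples.

    Only one segment is held in memory at a time; a trailing partial
    segment is dropped.
    """
    it = iter(samples)
    while True:
        segment = list(islice(it, segment_size))
        if len(segment) < segment_size:
            return
        yield segment
```

With an open file handle `f`, `iter_segments(read_column(f, "signal"), 150_000)` would walk the file once, and the features for each segment could be cached to disk so later training runs skip the raw data entirely.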

Accomplishments that we're proud of

We are really proud that we were able to create not only a reasonably accurate predictive model within the time frame, but also a web page to go along with it. We are also proud of the amount of research and testing it took to learn about each feature we wanted in our model and then implement it in code.

What we learned

We learned a lot about data analysis and statistical inference in this process. The feature engineering portion involved a lot of research into sample parameters, data distributions, and signal processing statistics. Machine learning is a very esoteric field, and it was fun for all of us to learn more about the types of models we used in development and about the modeling choices that have to be made, such as the loss function.
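
To make the loss function choice concrete, here is a small sketch comparing two common regression losses, mean absolute error (MAE) and root mean squared error (RMSE), on made-up numbers. The write-up does not say which loss the team chose, so this only illustrates the trade-off: RMSE penalizes a single large miss much more heavily than MAE does.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: the average size of the miss, in seconds."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: squares each miss, so outliers dominate."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


# Made-up times to failure (seconds); only the last prediction is off, by 4 s.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 8.0]

print(mae(y_true, y_pred))   # 1.0
print(rmse(y_true, y_pred))  # 2.0
```

The same four predictions score 1.0 under MAE and 2.0 under RMSE, which is why the choice of loss changes what the model treats as a bad prediction.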

What's next for Earthquake Prédicteur

The next steps would be to refine the model further to increase its overall accuracy. We could also experiment with other types of predictive models beyond gradient boosting.

Built With

big-query
catboost
datastudio
jupyter-notebook
python
streamlit

Submitted to

Oregon State Beaverhacks Winter 2022 - Data
- Winner Third Place - Overall

Created by

I worked on researching the machine learning concepts and features as well as performing exploratory data analysis. Afterwards, I worked on the interactive web app. I also provided free compute power with GPU for everyone on the team to use :)

Crypto :)
I worked on feature engineering research and development, and on researching which models to use for training and feature selection.

Jonathan Macias
I worked on implementing feature engineering and model prediction in Python utilizing the libraries Pandas, NumPy, and CatBoost.

Adam Graneto
I am a recent computer science graduate from Oregon State University, where I did a focus area in data science.

Updates

Crypto :) started this project — Jan 09, 2022 04:25 PM EST
