Rice Datathon 2024

What it does

Our project uses data analytics and machine learning to predict the peak oil production rate for wells in unconventional assets. Accurate forecasts of peak production help operators plan and allocate resources, which in turn supports more efficient and sustainable oil extraction.

How we built it

We started with the complex dataset we were given. After an initial exploratory data analysis (EDA) to identify patterns, correlations, and anomalies, we cleaned the data, handling missing values (NaN) and outliers. We then engineered relevant features and selected the most impactful ones using variance inflation factor (VIF) and correlation analysis. For model development, we trained and compared multiple models, including XGBoost, Explainable Boosting Machine (EBM), and Random Forest, and found that Random Forest performed best. Finally, we fine-tuned the model with cross-validation, tuned its hyperparameters with Optuna, and refined the feature set using VIF, correlation analysis, and feature clustering.
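To make the correlation-based part of the feature selection concrete, here is a minimal sketch of one way to prune redundant features. It is not our exact pipeline: the function names, the 0.9 threshold, and the toy columns are all assumptions chosen for illustration, and it uses only the Python standard library so that it runs anywhere. The idea is to rank features by how strongly they correlate with the target, then keep a feature only if it is not highly correlated with one already kept.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def prune_correlated(features, target, threshold=0.9):
    """Return the names of features to keep.

    features:  dict mapping a feature name to a list of floats
    target:    list of floats, same length as each feature column
    threshold: hypothetical cutoff; pairs with |r| >= threshold are redundant
    """
    # Rank features by absolute correlation with the target, strongest first.
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], target)),
        reverse=True,
    )
    kept = []
    for name in ranked:
        # Keep a feature only if it is not redundant with one already kept.
        if all(abs(pearson(features[name], features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy data, made up for illustration (not from the datathon dataset).
toy_features = {
    "lateral_length": [1.0, 2.0, 3.0, 4.0, 5.0],
    "lateral_length_ft": [3.3, 6.6, 9.9, 13.2, 16.5],  # same signal, other units
    "proppant": [2.0, 1.0, 4.0, 3.0, 6.0],
}
toy_peak_rate = [1.1, 2.0, 3.2, 3.9, 5.1]

print(prune_correlated(toy_features, toy_peak_rate))
```

On this toy data only one of the two lateral-length columns survives, since they carry the same information. Pairwise correlation only catches redundancy between two features at a time; VIF complements it by flagging a feature that is well explained by a combination of several others.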

Challenges we ran into

One of the significant challenges we faced was addressing the numerous null or NaN values in the dataset. Handling these missing values required careful consideration to ensure the integrity and reliability of our analysis. Additionally, as our team initially lacked background knowledge in the oil and gas sector, comprehending the complexities of the industry and the nuances of the data posed a substantial learning curve. We had to quickly ramp up our understanding of the domain to make informed decisions during the data preprocessing and model development stages.
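As an illustration of the kind of decision involved, the sketch below measures how much of a column is missing and fills the gaps with the median of the observed values. Median imputation is only one possible strategy and is an assumption here, not a record of what we did for every column; the function names and the sample values are hypothetical.

```python
import math
from statistics import median


def is_missing(value):
    """Treat both None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def missing_fraction(column):
    """Share of entries in the column that are missing (0.0 to 1.0)."""
    return sum(1 for v in column if is_missing(v)) / len(column)


def impute_median(column):
    """Replace missing entries with the median of the observed values."""
    observed = [v for v in column if not is_missing(v)]
    if not observed:
        raise ValueError("column has no observed values to impute from")
    fill = median(observed)
    return [fill if is_missing(v) else v for v in column]


# Made-up column with two gaps, for illustration only.
sample = [1.0, float("nan"), 3.0, None, 5.0]
print(missing_fraction(sample))
print(impute_median(sample))
```

Checking the missing fraction first matters because a column that is mostly empty is usually better dropped than imputed. The median is a common default for skewed measurements because it is less sensitive to outliers than the mean.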

Accomplishments that we're proud of

We are particularly proud of developing a model that not only predicts accurately but also provides insight into the key factors affecting oil production rates. Building a robust predictive model is a testament to our team's analytical and problem-solving skills. We are also proud of how quickly we learned throughout this challenge. We immersed ourselves in a realistic dataset that reflects the complexities typically encountered in the oil and gas industry, and adapted to it. This experience has significantly improved our practical understanding of data science applications in real-world scenarios.

What we learned

Through our participation in the Rice Datathon 2024, we gained valuable insights and knowledge. We learned advanced techniques for handling and interpreting complex, realistic datasets, especially those common in the oil and gas industry. Working with so many null or NaN values deepened our understanding of effective data cleaning and preprocessing strategies. Having no prior background in oil and gas also pushed us to absorb domain-specific knowledge quickly, which improved our ability to apply data science techniques in industry-specific contexts. This datathon has not only improved our technical skills in predictive modeling and feature engineering but also sharpened our ability to learn and adapt quickly in data-rich, real-world environments.

What's next for Rice Datathon 2024

Moving forward, we aim to refine our model further by exploring more sophisticated algorithms and incorporating real-time data processing capabilities. We also plan to delve deeper into predictive analytics to extend our model's applications to other aspects of resource management and sustainability in the energy sector.