Inspiration

After discussing how the markets might behave in conjunction with the environment, we realized it would be difficult to decompose what is responsible for market changes. Our exploratory analysis confirmed that many of our environmental metrics worsened over time while the market grew; however, these trends were not necessarily systematically related. The goal of our analysis became exposing the underlying relationship between the environment and market behavior once the growth of the US economy is accounted for.

What it does

Our project takes in environmental, historical ticker, and GDP data and provides a prediction of stock performance while offering insight into how these factors independently influence the market.

How we built it

We first trained a linear model on stocks using GDP in order to capture the structural changes in the market. We chose a linear model because it achieved an adjusted R-squared of 0.77 and there were no indications that a more flexible model was needed. The residuals of this model were used as the response in the second model, which looks to explain the remaining variance in market performance.
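The first stage can be sketched in a few lines. This is a minimal illustration in Python using only the standard library, not the project's R/mlr3 code; the GDP and price values are made-up placeholders, and the helper names `fit_simple_ols` and `residuals` are hypothetical.

```python
# Stage-one sketch: regress a market series on GDP and keep the residuals.
# All numbers are made-up illustration values, not the project's data.
from statistics import mean


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y ~ a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def residuals(x, y, intercept, slope):
    """Observed response minus the fitted line, one value per observation."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical annual observations (one row per year).
gdp = [14.5, 15.0, 15.6, 16.3, 16.9, 17.6, 18.3, 18.8]
market = [60.0, 66.0, 64.0, 75.0, 88.0, 95.0, 101.0, 99.0]

intercept, slope = fit_simple_ols(gdp, market)
stage1_residuals = residuals(gdp, market, intercept, slope)
```

By construction, least-squares residuals sum to zero and are uncorrelated with GDP, which is what lets a later model work only on the variation that GDP does not explain.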

For the second model, we used K-Nearest-Neighbors to model the relationship between the remaining variation in the market (the residuals of the first model) and the principal components of the environmental factors. We chose this model because, out of all the models tested, it achieved the best out-of-sample RMSE.
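As a rough illustration of this step, the sketch below implements K-Nearest-Neighbors regression and scores each candidate k by out-of-sample RMSE. It is Python with the standard library only, not the project's mlr3 pipeline. Leave-one-out resampling is an assumption (the write-up does not say which resampling scheme was used), and the component scores and residual values are invented for the example.

```python
# KNN regression sketch with leave-one-out RMSE (assumed resampling scheme).
import math


def knn_predict(train_x, train_y, query, k):
    """Average the responses of the k training points closest to query."""
    by_distance = sorted(
        zip(train_x, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )
    nearest = by_distance[:k]
    return sum(y for _, y in nearest) / len(nearest)


def loo_rmse(xs, ys, k):
    """Leave-one-out RMSE: predict each point from all the others."""
    squared_errors = []
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        pred = knn_predict(rest_x, rest_y, xs[i], k)
        squared_errors.append((ys[i] - pred) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical inputs: three principal-component scores per year,
# and the first model's residual for the same year.
pc_scores = [
    (-1.9, 0.4, 0.1), (-1.2, -0.3, 0.2), (-0.8, 0.6, -0.4), (-0.1, -0.5, 0.3),
    (0.5, 0.2, -0.2), (1.0, -0.6, -0.1), (1.1, 0.5, 0.3), (1.4, -0.3, -0.2),
]
resid = [-2.5, 1.8, -4.1, 0.9, 3.6, 2.2, -1.0, -0.9]

# Compare a few candidate neighborhood sizes and keep the lowest-RMSE one.
rmse_by_k = {k: loo_rmse(pc_scores, resid, k) for k in (1, 2, 3)}
best_k = min(rmse_by_k, key=rmse_by_k.get)
```

With so few annual observations, leave-one-out is one reasonable way to get an out-of-sample estimate, though other resampling schemes would work the same way in this sketch.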

Lastly, our predictive model used the market performance as the response and the environmental factors and GDP as predictors. This model was also K-Nearest-Neighbors, as it achieved the best out-of-sample RMSE. All models were implemented and tuned with mlr3.

Challenges we ran into

The first major problem we ran into was collinearity between environmental degradation and growth of the markets over the observed time frame. We felt any assertions surrounding causality would be unfounded if structural changes in the market were not adjusted for. We addressed this by using two models, with the second model trained on the residuals of a model that used GDP as the sole feature. This way, the second model could only explain the variance in the response that is not associated with the structural economic changes.

Our analysis was fundamentally limited by the number of market observations we had. The sample was small because we were normalizing with annual GDP, so we took market samples only at annual resolution. By contrast, we had a large feature space of environmental factors. To address this problem, we used principal component analysis (PCA) as a dimension-reduction technique, reducing our environmental features to three principal components.
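To make the dimension-reduction step concrete, here is a minimal PCA sketch in Python (standard library only, not the project's R code). It standardizes the features, builds the covariance matrix, and extracts the leading components by power iteration with deflation, which is a simple stand-in for the eigendecomposition a statistics library would use. The feature values are invented, and standardizing before PCA is an assumption rather than something the write-up states.

```python
# PCA sketch: standardize, compute covariance, extract leading components.
import math


def standardize(rows):
    """Center each column and scale it to unit sample variance."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
        for j in range(p)
    ]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]


def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, p = len(rows), len(rows[0])
    return [
        [sum(r[a] * r[b] for r in rows) / (n - 1) for b in range(p)]
        for a in range(p)
    ]


def top_components(cov, n_components, iterations=500):
    """Leading eigenvectors of cov via power iteration with deflation."""
    p = len(cov)
    cov = [row[:] for row in cov]  # work on a copy
    components = []
    for _ in range(n_components):
        v = [float(j + 1) for j in range(p)]  # fixed, non-uniform start
        for _ in range(iterations):
            w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        cv = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        eigenvalue = sum(v[a] * cv[a] for a in range(p))
        components.append(v)
        # Deflate: subtract the variance explained by this component.
        for a in range(p):
            for b in range(p):
                cov[a][b] -= eigenvalue * v[a] * v[b]
    return components


def project(rows, components):
    """Score each row on each component (dot products)."""
    return [[sum(x * c for x, c in zip(r, comp)) for comp in components]
            for r in rows]


# Hypothetical data: 8 annual rows, 5 environmental metrics each.
env = [
    [5.1, 0.61, 33.9, 9.8, 2.1], [5.3, 0.65, 33.8, 9.5, 2.6],
    [5.2, 0.68, 33.8, 9.9, 2.4], [5.4, 0.75, 33.7, 9.1, 3.1],
    [5.3, 0.90, 33.7, 8.6, 3.0], [5.2, 1.01, 33.6, 8.4, 3.6],
    [5.1, 0.92, 33.6, 8.7, 3.9], [5.3, 0.85, 33.5, 8.9, 4.2],
]

scaled = standardize(env)
components = top_components(covariance(scaled), n_components=3)
scores = project(scaled, components)  # 8 rows x 3 component scores
```

The resulting three scores per year replace the original feature columns as model inputs, which keeps the number of predictors small relative to the number of annual observations.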

Accomplishments that we are proud of

As a team, we were able to adapt to and use technologies that were foreign to us at the beginning of the datathon and come up with a working model, video, and write-up!

We are also proud of how we framed the question in a way that offered more novel insights into the relationship of the environment with the markets as opposed to just using black-box modeling techniques.

What we learned

Throughout the process, we gained knowledge of the different sectors in the United States and how they are captured by different ETFs. Additionally, we researched the various environmental variables offered in the dataset and chose those that indicated a significant effect on the environment as a whole. Working on this project also tested our ability to balance a macro and micro viewpoint while adapting to the limitations that came up along the way.

What's next for Let's Stock About the Environment!

Our project can be extended to sector-specific ETFs. We ran our models on US-based sector ETFs and had promising results for XLE (energy ETF) and XLB (materials ETF). We are really excited about these because the preliminary models achieved lower RMSE than the model for the whole market. Additionally, further automation could streamline the process for a non-technical user.

Ticker                    RMSE (using KNN)
VTI (total market)        12.6
XLE (energy market)       7.0
XLB (materials market)    3.7

Built With

  • mlr3
  • powerbi
  • r
  • rstudio
  • snowflake
  • sql