Investing in The Best Case Scenario

Inspiration

Humans have been trying to break free from gravity and leave the Earth's surface for thousands of years. Since the Cold War space race of the 20th century, the space industry has grown rapidly. Today, even as technology advances quickly, we still face a big question: how should we expand space exploration? Like any industry, the space industry has limited resources, and misallocating resources and capital can slow its development. That is why, inspired by the dataset, we seek to offer insights for resource allocation decisions that maximize efficiency and help the space industry grow.

What it does

Our project has two parts: identifying the industries with the greatest impact on the space economy, and identifying those with the greatest impact on employment.

How we built it

For the first part, we used five machine learning models (Ridge, ElasticNet, PLS, BayesianRidge, and XGBoost) to model the growth pattern of the space economy. For each model, we calculated the importance of each feature (industry), then computed a weighted average of each industry's importance rank across models, weighting each model by its precision: the more precise the model, the more its feature-importance ranking counts toward the final ranking. The result is a ranking of each industry's contribution to predicting the target across multiple models.
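The weighting step described above can be sketched roughly as follows. This is a minimal illustration, not our actual code: the function names, the industry names, the importance values, and the model scores are all hypothetical placeholders, and it assumes a higher score means a more precise model.

```python
def rank_features(importances):
    """Return {feature: rank}, where rank 1 is the most important feature."""
    # Rank by absolute value so large negative coefficients also count.
    ordered = sorted(importances, key=lambda f: abs(importances[f]), reverse=True)
    return {feature: position + 1 for position, feature in enumerate(ordered)}


def weighted_rank(model_importances, model_scores):
    """Average each feature's rank across models, weighted by model score."""
    total = sum(model_scores.values())
    weights = {model: score / total for model, score in model_scores.items()}
    combined = {}
    for model, importances in model_importances.items():
        for feature, rank in rank_features(importances).items():
            combined[feature] = combined.get(feature, 0.0) + weights[model] * rank
    # Lowest weighted rank first, i.e. the most important feature overall.
    return sorted(combined.items(), key=lambda item: item[1])


# Hypothetical placeholder values, not results from the project.
model_importances = {
    "Ridge": {"industry_a": 0.5, "industry_b": -0.25, "industry_c": 0.125},
    "ElasticNet": {"industry_a": 0.25, "industry_b": 0.5, "industry_c": 0.0},
}
model_scores = {"Ridge": 0.75, "ElasticNet": 0.25}  # assumed: higher = more precise

final_ranking = weighted_rank(model_importances, model_scores)
```

Here the more precise model (Ridge, in this made-up example) pulls the final order toward its own ranking, which is the behavior the weighting is meant to produce.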

Challenges we ran into

Data cleaning was the most challenging step for us. We cleaned the data both manually in Excel and in our code using numpy. The compensation data was not adjusted for inflation, so we adjusted it in Excel to obtain a more accurate dataset. The dataset is also very sparse, with only 12 data points, which constrained our choice of machine learning models. For example, XGBoost did not produce anything useful because the dataset is too small, so its decision trees did not split at all. In response, we added more regression models and tuned the regularization strength to avoid over- or under-fitting.
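We did the inflation adjustment in Excel, but the same idea can be sketched in Python. The snippet below assumes the common approach of rescaling nominal values by a price index relative to a base year; the function name, the years, and every number are made-up placeholders rather than values from our dataset.

```python
def to_real_dollars(nominal_by_year, price_index_by_year, base_year):
    """Convert nominal values to base-year dollars using a price index."""
    base_index = price_index_by_year[base_year]
    return {
        year: value * base_index / price_index_by_year[year]
        for year, value in nominal_by_year.items()
    }


# Placeholder figures for illustration only.
nominal = {2012: 100.0, 2013: 110.0}
price_index = {2012: 100.0, 2013: 110.0}

real = to_real_dollars(nominal, price_index, base_year=2013)
```

In this toy case the nominal rise from 100 to 110 is entirely explained by the price index, so both years come out equal in 2013 dollars.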

Accomplishments that we're proud of

We derived insights from a sparse dataset and found a pattern in it. We also used time series cross-validation on a relatively small dataset to check the precision of our models.
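To illustrate what time series cross-validation looks like on 12 data points, here is a small pure-Python sketch of an expanding-window split, where each fold trains only on observations that come before its test window. The function name and the choice of 3 splits are assumptions for this example; scikit-learn's `TimeSeriesSplit` provides a similar scheme.

```python
def expanding_window_splits(n_samples, n_splits):
    """Return (train_indices, test_indices) pairs that never look ahead in time."""
    test_size = n_samples // (n_splits + 1)
    if test_size < 1:
        raise ValueError("Too many splits for this number of samples.")
    splits = []
    for i in range(n_splits):
        # Each test window starts where the previous one ended.
        test_start = n_samples - (n_splits - i) * test_size
        train_idx = list(range(test_start))
        test_idx = list(range(test_start, test_start + test_size))
        splits.append((train_idx, test_idx))
    return splits


# 12 observations as in our dataset; 3 splits is an assumed setting.
splits = expanding_window_splits(n_samples=12, n_splits=3)
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

Unlike shuffled k-fold, the training set here only grows forward in time, so a model is never scored on years earlier than the ones it was fitted on.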

What we learned

We learned how an analysis starts from raw data and how to turn that data into useful information for decision-making and resource allocation. During model selection, we also learned the basic logic underlying each model, which gave us a deeper understanding of their characteristics.

What's next for Investing in The Best Case Scenario

With more data, we expect to find a more precise and stronger correlation between what is happening in different industries and the growth of the space economy. On the other hand, we aim to produce a

Built With

datascience
numpy
pandas
pyplot
python
sklearn
squarify
xgboost

Updates

Jinghan He started this project — Sep 21, 2025 10:59 AM EDT
