Inspiration

"How much wood would a woodchuck chuck if a woodchuck could chuck wood?" We have all heard the tongue twister, BUT no one has ever actually tried to ANSWER it with data science.

Could we predict wood-chucking capacity 500 years into the future? Should we? Would the woodchucks approve? We decided the answer to all three questions was "absolutely."

What it does

An interactive heatmap with a slider that predicts the hotspots of woodchucking by Pennsylvanian woodchucks over the next 500 years. It also comes with a "chuck graph" that plots the predicted total wood chucked in an area for each year, and "log rain" (it literally rains logs to visualise the amount of wood chucked by woodchucks).

How we built it

Data Sourcing

  • GBIF - Occurrence data for Marmota monax (groundhogs)
  • Pennsylvania Game Commission, Bureau of Wildlife Management, Annual Project Report - Harvest per 100 hunter days by species (woodchuck)
  • Census.gov - Population density in PA per county
  • Forest Service U.S. Department of Agriculture NFI data - Wood density

Data Cleaning

We used the Python pandas library to clean the datasets and combine them into one CSV file.
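As a rough illustration of that cleaning-and-combining step, here is a minimal sketch. It is not the project's actual pipeline: the column names (county, year, sightings, people_per_sq_mile), the helper clean_county, and all of the values are made up for the example, and the real datasets may well be keyed differently.

```python
import pandas as pd

# Hypothetical, hand-made stand-ins for two of the source tables.
# Every value below is invented purely for illustration.
occurrences = pd.DataFrame({
    "county": ["Centre", "Centre", "Erie", "erie ", None],
    "year": [2019, 2020, 2019, 2020, 2020],
    "sightings": [4, 6, 3, 5, 1],
})
population = pd.DataFrame({
    "county": ["Centre", "Erie"],
    "people_per_sq_mile": [10.0, 20.0],
})


def clean_county(df):
    """Drop rows with no county and normalise the county spelling."""
    out = df.dropna(subset=["county"]).copy()
    out["county"] = out["county"].str.strip().str.title()
    return out


# Left join keeps every occurrence row; validate guards against
# duplicate counties sneaking into the lookup table.
combined = clean_county(occurrences).merge(
    clean_county(population),
    on="county",
    how="left",
    validate="many_to_one",
)

# With no path given, to_csv returns the CSV text instead of writing a file.
csv_text = combined.to_csv(index=False)
```

Passing a filename to to_csv instead would write the combined table to disk for the web app to load.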

Prediction

(i) Random Forest Regressor (SCRAPPED)

  • Used a machine learning model to predict the amount of wood chucked
  • The algorithm worked, but it required too much computing power and wasn't as accurate as we needed

(ii) Exponential Growth Projection with Stochastic Variation

  • Used Polars to project the trajectory of the amount of wood chucked
  • Generated a CSV file with all the data needed for our website
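To make the idea of an exponential growth projection with stochastic variation concrete, here is a minimal sketch in plain Python. It is one common way to set this up, not necessarily the model used here: the function name project_chucked_wood, its parameters, and the choice of normally distributed yearly shocks are all assumptions made for the example.

```python
import math
import random


def project_chucked_wood(initial_kg, growth_rate, volatility, years, seed=None):
    """Return one noisy exponential trajectory as a list of yearly values.

    Each year the value is multiplied by exp(growth_rate + shock), where
    shock is drawn from a normal distribution with standard deviation
    `volatility`. With volatility=0 this reduces to plain exponential growth.
    All names and the noise model are illustrative assumptions.
    """
    rng = random.Random(seed)  # a seeded generator keeps runs reproducible
    value = initial_kg
    trajectory = [value]
    for _ in range(years):
        shock = rng.gauss(0.0, volatility)
        value *= math.exp(growth_rate + shock)
        trajectory.append(value)
    return trajectory


# One noise-free baseline and one noisy scenario over 500 years.
baseline = project_chucked_wood(100.0, 0.01, 0.0, years=500)
scenario = project_chucked_wood(100.0, 0.01, 0.05, years=500, seed=42)
```

Multiplying by an exponential each year keeps every projected value positive, which suits a quantity like wood chucked.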

Web App

  • Developed with HTML, CSS, and JavaScript to visualise our data

Challenges we ran into

The Small Data Problem

With only 2-7 observations per location, traditional time series models (ARIMA, SARIMAX) were completely unusable. We had to get creative about factoring in real-world uncertainty to keep our results as accurate as possible.

500-Year Extrapolation

Predicting half a millennium into the future with single-digit observations is... ambitious. We had to balance generating interesting predictions while acknowledging the massive epistemic uncertainty involved with tiny datasets.
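One way to see how quickly that uncertainty grows is to run many random trajectories and compare the spread of outcomes at a short and a long horizon. The sketch below is illustrative only: the function names, the growth and volatility values, and the simple index-based percentile are assumptions for the example, not the project's code.

```python
import math
import random


def simulate_final_values(initial, growth_rate, volatility, years, runs, seed=0):
    """Monte Carlo: return the final value of `runs` noisy growth paths."""
    rng = random.Random(seed)
    finals = []
    for _ in range(runs):
        # Work in log space: yearly growth plus a normal shock just adds up.
        log_value = math.log(initial)
        for _ in range(years):
            log_value += growth_rate + rng.gauss(0.0, volatility)
        finals.append(math.exp(log_value))
    return finals


def percentile(sorted_values, q):
    """Pick the value at fraction q/100 of the way through a sorted list."""
    index = round(q / 100 * (len(sorted_values) - 1))
    return sorted_values[index]


def uncertainty_band(initial, growth_rate, volatility, years, runs=1000):
    """Summarise the simulated outcomes as 5th, 50th and 95th percentiles."""
    finals = sorted(
        simulate_final_values(initial, growth_rate, volatility, years, runs)
    )
    return {
        "p5": percentile(finals, 5),
        "p50": percentile(finals, 50),
        "p95": percentile(finals, 95),
    }


near = uncertainty_band(100.0, 0.01, 0.05, years=10)
far = uncertainty_band(100.0, 0.01, 0.05, years=500)

# Ratio of optimistic to pessimistic outcome at each horizon.
near_spread = near["p95"] / near["p5"]
far_spread = far["p95"] / far["p5"]
```

Under these assumptions the 500-year band comes out far wider than the 10-year band, which is the practical meaning of "ambitious" here: the further out the slider goes, the more the output is a scenario rather than a forecast.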

Visualisation Performance

Animating 500 years × 200+ locations with smooth transitions required optimisation. We had to balance visual appeal with computational efficiency.

Accomplishments that we're proud of

We actually answered the question (sort of)

After all this work, we proved the tongue twister was right: woodchucks chuck as much wood as they COULD chuck. The limiting factor is capacity itself (they don't chuck wood).

Made data science fun

We took a ridiculous premise and executed it seriously enough to be impressive but playfully enough to be entertaining.

Working end-to-end system

From raw data to interactive visualisation, everything works: data pipeline, forecasting methods, and polished presentation.

Honest about uncertainty

We didn't pretend our 500-year predictions are accurate. We embraced the chaos and made uncertainty part of the story.

What we learned

Technical

  • How to handle forecasting with extremely limited data per group
  • Stochastic modelling and Monte Carlo simulation techniques
  • The difference between prediction and scenario exploration

Meta

  • You can take a silly premise seriously and have both technical rigour AND humour
  • Constraints breed creativity (our small dataset forced innovative solutions)
  • Uncertainty isn't a failure; it's information

And, of course, about woodchucks

  • They don't actually chuck wood
  • But if they did, it would vary significantly by location and external factors (like motivation, maybe)
  • The tongue twister contains more wisdom than we initially thought

What's next for "How much wood can a Woodchuck chuck?"

  • MORE DATA
  • Expand to other states/regions for comparative analysis
  • Add real-time climate data integration to improve environmental factors
  • Get cited in an actual woodchuck research paper