Weatherboy

Original Proposal

Project Idea: Short Description

Our project is a modified reimplementation of the 2021 paper by Civitarese et al., “Extreme Precipitation Seasonal Forecasting Using a Transformer Neural Network” (link: https://arxiv.org/abs/2107.06846).

The paper takes the temporal fusion transformer (TFT), a transformer architecture designed for multi-horizon forecasting, and adapts it, for the first time at publication, to forecast extreme weather. The TFT performs especially well on time series data because it combines specialized components that identify relevant features with gating layers that suppress less relevant ones.
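As a rough illustration of the gating idea, the sketch below implements a gated linear unit (GLU), the gating mechanism described in the original TFT paper, in plain Python. The learned linear layers are omitted, and `gate_logits` and `values` are hypothetical stand-ins for their outputs, so this is a simplification rather than the paper's implementation.

```python
import math


def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def glu(gate_logits, values):
    """Gated linear unit: elementwise sigmoid(gate) * value.

    A strongly negative gate logit drives the output toward zero
    (the feature is suppressed); a strongly positive one passes the
    value through almost unchanged.
    """
    return [sigmoid(g) * v for g, v in zip(gate_logits, values)]


# Hypothetical inputs: two features with the same value but opposite gates.
features = [5.0, 5.0]
gated = glu([-20.0, 20.0], features)
print(gated)  # first entry is close to 0, second is close to 5
```

In the full model the gate logits are learned, so the network itself decides which inputs to damp.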

Variables used in the paper include temperature and soil moisture, which are historical variables (they vary with time), and location and altitude, which are static variables (unchanging with time). Florida and Rio de Janeiro were the geographical sites. To add novelty and complexity to our reimplementation, we plan to use data from different geographical locations, and perhaps modify the independent variables (e.g. use only soil moisture).

We use normalized quantile loss (q-risk) for validation testing. At a high level, quantile loss measures the accuracy of a model’s quantile predictions (i.e. its estimates of the 10th, 50th, or 90th percentile of the target). Q-risk is quantile loss averaged across all time points. One reason q-risk is more appropriate than, for instance, MSE for a forecasting problem like ours is that the future is inherently noisy, and we value distributional accuracy over point accuracy. It also lets us apply asymmetric penalties, which reflects the real-world concern that underestimating the possibility of extreme weather may be costlier than overestimating it for the purposes of staying prepared (i.e. when calculating the quantile loss for the 90th percentile, too-low predictions are penalized more harshly than too-high ones).
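To make the asymmetry concrete, here is a minimal sketch of the quantile (pinball) loss and a normalized q-risk in plain Python. The normalization (twice the summed quantile loss divided by the summed absolute targets) is assumed from the original TFT paper; the exact normalization we end up using may differ. The rainfall numbers are made-up placeholders.

```python
def quantile_loss(y_true, y_pred, q):
    """Pinball loss for one observation at quantile level q (0 < q < 1).

    Under-prediction (y_pred < y_true) is weighted by q,
    over-prediction by (1 - q).
    """
    diff = y_true - y_pred
    return max(q * diff, (q - 1.0) * diff)


def q_risk(y_true, y_pred, q):
    """Normalized quantile loss over a series (assumed normalization)."""
    total = sum(quantile_loss(y, p, q) for y, p in zip(y_true, y_pred))
    norm = sum(abs(y) for y in y_true)
    return 2.0 * total / norm


# Placeholder values: the same 2-unit error, once too low and once too high.
under = quantile_loss(10.0, 8.0, 0.9)   # prediction too low
over = quantile_loss(10.0, 12.0, 0.9)   # prediction too high
print(under, over)  # the too-low prediction costs about 9x more

risk = q_risk([10.0, 20.0], [8.0, 22.0], 0.9)
print(risk)
```

At q = 0.9 an underestimate costs nine times as much as an overestimate of the same size, which is the behavior we want when missing an extreme event is the expensive mistake.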

The paper compares the TFT model to two simpler, standard benchmark models: the climatology model and the S5 model (see paper for more information). The TFT does notably better for quantile 0.9 in Rio and Florida respectively (1.08% and 3.70% difference from climatology, 29.54% and 41.87% from S5). Since we’re using a different dataset and possibly less compute than the authors, our preliminary goal is a 1% improvement over the climatology model and a 20% improvement over the S5 model.

What are some key limitations you anticipate facing when working on this project?

Since the dataset used in this paper only extends up to 2019, it is uncertain how well the model performs on more recent data. Therefore, a data exploration step should be conducted to identify any significant distributional shifts in newer datasets. In addition, several architectural components of the Temporal Fusion Transformer (TFT) are not fully explained in the paper. For instance, it is unclear how the model embeds categorical and continuous variables, and how the distances between the resulting embedded vectors can be interpreted (compare language models, where opposite meanings often correspond to opposite directions and vector arithmetic can reveal relational patterns). Furthermore, the variable selection step is described only briefly, leaving questions about how “relevance” is determined and which features are prioritized during training.
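One simple way to run the distribution-shift check mentioned above is a two-sample Kolmogorov-Smirnov (KS) statistic, the largest gap between the empirical CDFs of an older and a newer sample. The choice of the KS statistic is our assumption, not something the paper prescribes, and the sample values below are made-up placeholders rather than real rainfall data.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: max |ECDF_a(x) - ECDF_b(x)| over all x."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both pointers past every value <= x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


# Placeholder samples standing in for pre-2019 and post-2019 values.
older = [1.0, 2.0, 3.0, 4.0]
newer_same = [1.0, 2.0, 3.0, 4.0]
newer_shifted = [5.0, 6.0, 7.0, 8.0]

print(ks_statistic(older, newer_same))     # identical samples
print(ks_statistic(older, newer_shifted))  # fully separated samples
```

A value near 0 suggests similar distributions and a value near 1 suggests a strong shift. In practice a library routine with p-values would be preferable to this hand-rolled version.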

Project Data Ideas

ERA5: https://www.ecmwf.int/en/forecasts/dataset/ecmwf-reanalysis-v5 ERA5 provides hourly estimates of a large number of atmospheric, land, and oceanic climate variables. The data cover the Earth on a 31 km grid and resolve the atmosphere using 137 levels from the surface up to a height of 80 km.

CHIRPS v2: https://www.chc.ucsb.edu/data Climate Hazards Center InfraRed Precipitation with Station data (CHIRPS v2) is a 30+ year quasi-global rainfall data set. Spanning 50°S-50°N (and all longitudes) and running from 1981 to near-present, CHIRPS combines 0.05° resolution satellite imagery with in-situ station data to create gridded rainfall time series for trend analysis and seasonal drought monitoring.

The original paper analyzes data from Florida, USA and Rio de Janeiro, Brazil. However, the data sources above cover locations around the world, which lets us choose different study regions.

Further Modifications

(1) Systematic Component Ablation and Substitution. We will perform systematic ablation studies on the paper’s model architecture, seeking to determine which parts of the model drive performance the most. Performance is defined both as overall accuracy and as the ability to predict certain relationships particularly effectively. We will construct an ablation table to quantify the efficacy of each component. Then, we’ll aim to improve on our model through architecture substitution and hyperparameter optimization. This can take the form of:

- Further improving high-impact components through hyperparameter optimization or substitution. For instance, this article describes 5 alternative self-attention architectures: https://medium.com/@dr.teck/efficient-alternatives-to-transformer-self-attention-397851f324ab
- Improving low-performing components through hyperparameter optimization or algorithm substitution.
- Attempting to make the model less computationally expensive without significantly compromising performance, by replacing or removing components (e.g. by replacing LSTM modules with GRUs, or by reducing layers or encoder-decoder blocks).
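A minimal sketch of how the ablation table might be assembled, assuming we record one validation q-risk per model variant. The variant names and q-risk values are hypothetical placeholders chosen only to show the layout; they are not results.

```python
def ablation_table(full_q_risk, variant_q_risks):
    """Rank ablated variants by how much they worsen q-risk.

    Returns (name, q_risk, percent_change) tuples, where percent_change
    is relative to the full model and positive means worse (higher q-risk).
    """
    rows = []
    for name, q_risk in variant_q_risks.items():
        change_pct = 100.0 * (q_risk - full_q_risk) / full_q_risk
        rows.append((name, q_risk, change_pct))
    # Largest degradation first: those components matter most.
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


# Placeholder numbers, not measured results.
rows = ablation_table(
    0.50,
    {
        "no gating layers": 0.55,
        "GRU instead of LSTM": 0.51,
        "no variable selection": 0.60,
    },
)
for name, q_risk, change_pct in rows:
    print(f"{name:<24} {q_risk:.3f} {change_pct:+.1f}%")
```

Components whose removal causes the largest increase in q-risk are the high-impact ones to tune further, while those with near-zero change are candidates for removal or cheaper substitutes.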

(2) Introducing an Additional Independent Predictive Variable. To extend the model to handle additional weather phenomena or longer forecast horizons, we will include additional independent predictive variables. Rather than relying solely on the paper’s inputs, we will expand the feature space by incorporating other hydrometeorological predictors such as temperature, dew point temperature, and soil moisture. These additions may increase the accuracy of our model, because the model can learn how much weight to give the additional information. Phenomena like warm or cold fronts or humidity can then be taken into account when predicting extreme precipitation. Overall, this change amounts to increasing the scope and size of the data behind our predictions.

(3) (Optional, Time-Permitting) Physics-Informed Machine Learning. Finally, we will explore a lightweight physics-informed modification. Specifically, we will augment the original loss function L(Ω,W) used in the paper, which is based on the summed quantile loss, with physically meaningful constraints derived from the water-balance equation:

$$P - E - R \approx \Delta W$$

where P = precipitation, E = evapotranspiration, R = runoff, and ΔW = change in soil water (computed from soil-moisture time series). We will incorporate this physical knowledge by adding a regularization term of the form:

$$ L_{water} = \sum_t (P_t - E_t - R_t - \Delta W_t)^2$$

where E and R are obtained from external hydrological datasets and ΔW can be estimated directly from the existing soil-moisture data. In addition, we will encourage spatial smoothness in the predictions by adding a second regularization term:

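$$L_{smooth} = \sum_i \sum_{j \in \mathcal{N}(i)} (\hat{y}_i - \hat{y}_j)^2$$

where N(i) denotes the set of nearby grid points around location i. The full training objective then combines the three terms:

$$L_{total} = L(\Omega, W) + \lambda_1 L_{water} + \lambda_2 L_{smooth}$$

where λ1 and λ2 control the strength of the physics and smoothness constraints.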
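The two regularizers and their combination with the quantile loss could look roughly like the plain-Python sketch below. It assumes the terms are combined as a weighted sum with weights λ1 and λ2, that P is the model's predicted precipitation, and that neighborhoods are given as a dictionary from grid index to neighbor indices. The function names, the `neighbors` structure, and all numbers are hypothetical.

```python
def water_balance_penalty(P, E, R, dW):
    """Sum over time of the squared water-balance residual (P - E - R - dW)."""
    return sum((p - e - r - d) ** 2 for p, e, r, d in zip(P, E, R, dW))


def smoothness_penalty(y_hat, neighbors):
    """Sum of squared differences between each grid point and its neighbors.

    `neighbors` maps a grid index i to the indices in N(i) (assumed layout).
    """
    return sum(
        (y_hat[i] - y_hat[j]) ** 2
        for i, nbrs in neighbors.items()
        for j in nbrs
    )


def total_loss(quantile_term, P, E, R, dW, y_hat, neighbors, lam1, lam2):
    """Quantile loss plus weighted physics and smoothness regularizers."""
    return (
        quantile_term
        + lam1 * water_balance_penalty(P, E, R, dW)
        + lam2 * smoothness_penalty(y_hat, neighbors)
    )


# Placeholder values for two time steps and three grid points in a line.
P, E, R, dW = [5.0, 3.0], [1.0, 1.0], [1.0, 1.0], [3.0, 0.0]
y_hat = [1.0, 2.0, 4.0]
neighbors = {0: [1], 1: [0, 2], 2: [1]}

loss = total_loss(0.5, P, E, R, dW, y_hat, neighbors, lam1=0.1, lam2=0.01)
print(loss)
```

In a real training loop these terms would be written with tensor operations so gradients flow through the predictions. Setting `lam1` or `lam2` to zero recovers the unregularized variants for comparison.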