Web Application: http://ichack.tmikey.tech/

Introduction

Our Aim

To use statistics and software engineering to understand climate change and its impacts on our world. We look at the extreme-value distributions of maximum and minimum temperatures, and of maximum precipitation, across the whole of the US. This has important applications in disaster prevention, agricultural modelling and climate change research.

Help the public learn more about how climate change affects their lives.

Enable researchers to present their results easily.

Scientific Platform

  • Flexible and scalable statistical machine learning models: Autoregressive time series models, Bayesian Additive Regression Trees (BART) and geostatistical Gaussian processes providing temporal, spatial and spatiotemporal (!) predictions

Computational Platform

  • The model training was done on a 384 GB AWS EC2 server over 30 years of data

  • The "Maths NextGen Compute Cluster" provided by the Department of Mathematics, Imperial College London. "HPC (High-Performance Compute) cluster of 34 Linux computer servers providing 340 processors plus additional servers for test & development purposes ... of short or lengthy computation jobs, either singly or in parallel."

  • Time series modelling: We used parallel programming to fit an ARIMA(p,d,q) model at each spatial knot individually. Parallelising the fits reduced the computation time from half an hour to a couple of seconds.

  • Spatial modelling: Since the computational complexity of BART is almost O(n), we could fit the whole spatial domain in a single model using Markov chain Monte Carlo and perform inference using Monte Carlo integration.

  • Spatiotemporal modelling: We used computationally intensive inference techniques via Gaussian processes, and these were run on the EC2 server and the Maths NextGen Compute Cluster. In statistics, prediction with Gaussian processes of this kind is known as kriging.
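
The per-knot time series fitting described above parallelises well because each knot's model is independent of the others. The following is a minimal sketch of that pattern only, not the project's code: it swaps in a least-squares AR(1) fit as a stand-in for ARIMA(p,d,q), and the names `fit_ar1`, `fit_all_knots` and `simulate_ar1` are hypothetical.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from statistics import fmean


def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi).

    Stand-in for a full ARIMA(p,d,q) fit. Assumes the series is not constant,
    otherwise the variance below is zero.
    """
    prev, curr = series[:-1], series[1:]
    mean_prev, mean_curr = fmean(prev), fmean(curr)
    var_prev = sum((p - mean_prev) ** 2 for p in prev)
    cov = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    phi = cov / var_prev
    return mean_curr - phi * mean_prev, phi


def fit_all_knots(series_by_knot, executor: Executor):
    """Fit one independent model per knot, dispatching each fit to the executor."""
    knots = list(series_by_knot)
    fits = executor.map(fit_ar1, (series_by_knot[k] for k in knots))
    return dict(zip(knots, fits))


def simulate_ar1(c, phi, n, x0=0.0):
    """Noise-free AR(1) path, used here only to produce toy input data."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(c + phi * xs[-1])
    return xs


# Toy data: two hypothetical knots with known coefficients.
toy = {
    "knot_a": simulate_ar1(1.0, 0.5, 20),
    "knot_b": simulate_ar1(2.0, -0.3, 20),
}

# Threads keep the sketch portable; for CPU-bound fits a
# concurrent.futures.ProcessPoolExecutor can be passed instead.
with ThreadPoolExecutor(max_workers=4) as pool:
    fitted = fit_all_knots(toy, pool)
```

Because `fit_all_knots` accepts any `concurrent.futures.Executor`, the same call works with a thread pool, a process pool, or a cluster-backed executor, which is one way the work could be spread across many processors.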

Application Platform

  • The web page is hosted on a separate EC2 instance, together with MongoDB. The same server runs RStudio Server to enable rapid prototyping.

Data Platform

  • Latest satellite imagery and data scraped with the Google Earth Engine Python API. Exploratory data analysis was completed in the Google Earth Engine JavaScript platform.

Discussion

  • Due to time constraints, we were not able to tune our advanced models; given more time, we would focus on this.

  • Naturally, if we draw samples from an unknown distribution F and look at the distribution of their maxima and minima, then by the Fisher-Tippett-Gnedenko theorem the suitably normalised extremes follow a generalised extreme value distribution in the limit. Furthermore, one can also look at max-stable processes.
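
For reference, the limit result mentioned above can be written out as follows. The notation (M_n for the sample maximum, normalising sequences a_n > 0 and b_n, and the location, scale and shape parameters μ, σ, ξ) is the standard textbook one and is an assumption here, since the write-up does not fix any symbols. If the normalised maximum of independent draws from F has a non-degenerate limit distribution G, then G is a generalised extreme value distribution:

```latex
\[
\lim_{n\to\infty} \Pr\!\left(\frac{M_n - b_n}{a_n} \le x\right) = G(x),
\qquad M_n = \max(X_1,\dots,X_n),
\]
\[
G(x) = \exp\left\{-\left[1 + \xi\,\frac{x-\mu}{\sigma}\right]^{-1/\xi}\right\},
\qquad 1 + \xi\,\frac{x-\mu}{\sigma} > 0,\quad \sigma > 0,
\]
\[
\text{with the case } \xi = 0 \text{ read as the limit }
G(x) = \exp\left\{-\exp\left(-\frac{x-\mu}{\sigma}\right)\right\}.
\]
```

Minima are covered by the same result through the identity min(X_1, ..., X_n) = -max(-X_1, ..., -X_n).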

Interface

  • Easy-to-use web application, built on Flask and Plotly, for interactive display of time series and spatial distributions of climate data

  • Some data may be missing because the data source does not provide it; the models we built can predict values to fill these gaps
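
The gap-filling idea above can be sketched as a small helper that replaces each missing value with a model prediction based on the values before it. This is an illustration under assumptions, not the project's implementation: `fill_missing` and `mean_of_last` are hypothetical names, missing values are assumed to be encoded as `None`, and the moving-average predictor is a simple stand-in for the fitted time series or spatial models.

```python
from statistics import fmean


def fill_missing(values, predict):
    """Return a copy of values with each None replaced by predict(history).

    history is the list of values (observed or already filled) that precede
    the gap, so consecutive gaps are filled one step at a time.
    """
    filled = []
    for value in values:
        if value is None:
            value = predict(filled)
        filled.append(value)
    return filled


def mean_of_last(k):
    """Stand-in predictor: the average of the last k available values."""
    def predict(history):
        if not history:
            raise ValueError("cannot predict before any value is available")
        return fmean(history[-k:])
    return predict


# Toy daily maximum temperatures with two gaps (values are made up).
observed = [10.0, 12.0, None, 14.0, None]
completed = fill_missing(observed, mean_of_last(2))
```

Any callable that maps the preceding values to a prediction can be passed as `predict`, so a fitted model's forecast function could take the place of the moving average.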

Further Applications

  • The tool can serve as a framework for scientific research and publications, since its components are lightweight and flexible.
