Towards global-scale species distribution models

What we do

In our project we aim to 1) develop a generic approach to obtain representative and unbiased species distribution models (SDMs) from publicly available global presence-only data from IUCN (range maps) and GBIF (point records) and 2) apply the SDMs obtained to identify species-specific spatial bias in the GBIF data.

Case: freshwater fish

We select freshwater fish as case-study species because dispersal barriers tend to be more important for fully aquatic than for terrestrial species. There is therefore a clear need to account for dispersal barriers when delineating background sampling regions in the construction of SDMs. We intend to establish global-scale SDMs for ~6,000 freshwater fish species with range maps available through IUCN (www.iucnredlist.org).

Our approach

We take a hierarchical approach to species distribution modelling in which the occurrence probability of a species results from a set of nested environmental filters. We first model the potential distribution of each species as a function of large-scale climate variables, using IUCN range maps to retrieve species presence records and sampling pseudo-absences from the surrounding biogeographic realms. Next, we model the actual distribution of each species by relating point records from GBIF to fine-grained, local habitat characteristics. Here we sample pseudo-absences from the same freshwater ecoregions and account for spatial bias by including a sampling bias grid according to the target-group approach. Finally, we will apply the SDMs to project the probability of occurrence of each species onto a map, using species-specific thresholds to translate the probabilistic output into binary maps (presence and absence). We will then compare the projected range maps with the full set of GBIF records available for each species, in order to identify areas that are environmentally suitable for the species but not yet represented in GBIF.
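
As a rough illustration of two of these steps, the Python sketch below draws bias-weighted background cells and flags cells that are predicted suitable but hold no records. It is not the project's actual code. The function names, the toy grids, the threshold of 0.5 and the choice to sample with replacement are all assumptions made for the example.

```python
import random

def sample_background(bias_grid, n, seed=0):
    """Draw n background cells (row, col), weighted by target-group record counts.

    Cells with more target-group records are drawn more often, so the
    background shares the sampling bias of the presence records.
    Sampling is with replacement, which is a simplifying assumption.
    """
    cells = [(r, c) for r, row in enumerate(bias_grid) for c in range(len(row))]
    weights = [bias_grid[r][c] for r, c in cells]
    return random.Random(seed).choices(cells, weights=weights, k=n)

def unrecorded_suitable_cells(prob_map, recorded, threshold):
    """Return cells with probability >= threshold that hold no records."""
    return [
        (r, c)
        for r, row in enumerate(prob_map)
        for c, p in enumerate(row)
        if p >= threshold and (r, c) not in recorded
    ]

# Toy 2 x 2 grids; all values are made up for illustration.
bias_grid = [[0, 5], [3, 0]]         # target-group record counts per cell
prob_map = [[0.9, 0.2], [0.7, 0.4]]  # projected occurrence probability
recorded = {(0, 0)}                  # cells with records of the focal species

background = sample_background(bias_grid, n=10)
gaps = unrecorded_suitable_cells(prob_map, recorded, threshold=0.5)
print(background)
print(gaps)
```

In this toy case, background cells can only come from the two cells that contain target-group records, and the single gap is the cell that is predicted suitable but has no record of the focal species. In practice the threshold would be chosen per species rather than fixed.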

Challenges

Data quality issues are a major challenge for species distribution modelling based on GBIF data. To balance the quality and quantity of the records, we will try different selection criteria in terms of spatial accuracy and temporal coverage, and evaluate how these affect the number of records available per species as well as the goodness-of-fit and ecological plausibility of the resulting models.
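
The trade-off between record quality and quantity can be explored with a simple filter, sketched below in Python. The field names follow the Darwin Core terms used in GBIF occurrence downloads (species, coordinateUncertaintyInMeters, year). The example records, the candidate cut-offs and the decision to drop records with missing values are assumptions for illustration only, not the selection criteria of the project.

```python
from collections import Counter

def filter_records(records, max_uncertainty_m, min_year):
    """Keep records that meet both the spatial accuracy and the date criterion.

    Records with a missing uncertainty or year are dropped, which is an
    assumption made for this sketch.
    """
    kept = []
    for rec in records:
        uncertainty = rec.get("coordinateUncertaintyInMeters")
        year = rec.get("year")
        if uncertainty is None or year is None:
            continue
        if uncertainty <= max_uncertainty_m and year >= min_year:
            kept.append(rec)
    return kept

def records_per_species(records):
    """Count the remaining records per species."""
    return Counter(rec["species"] for rec in records)

# Made-up example records.
records = [
    {"species": "Salmo trutta", "coordinateUncertaintyInMeters": 100, "year": 2005},
    {"species": "Salmo trutta", "coordinateUncertaintyInMeters": 5000, "year": 2010},
    {"species": "Esox lucius", "coordinateUncertaintyInMeters": 50, "year": 1950},
    {"species": "Esox lucius", "coordinateUncertaintyInMeters": None, "year": 2015},
]

# Hypothetical (max uncertainty in metres, earliest year) combinations.
criteria = [(1000, 1990), (10000, 1940)]
counts_by_criteria = {
    crit: records_per_species(filter_records(records, *crit)) for crit in criteria
}
for crit, counts in counts_by_criteria.items():
    print(crit, dict(counts))
```

Comparing the per-species counts across criteria shows how quickly stricter filters reduce the number of usable records, which can then be weighed against model fit and ecological plausibility.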