What is sampbias?
Sampbias is a statistical method to evaluate and visualise geographic sampling biases in species distribution datasets, implemented as an R package and a Shiny app with a graphical user interface.
Species occurrence datasets derived from biological collections or human observations are widely used in the biological sciences, including ecology, conservation, systematics and evolution. However, such data are often geographically biased, with remote areas strongly undersampled. Although spatial and taxonomic biases are widely recognised by the scientific community, few attempts have been made to quantify their strength or to distinguish among different sources of bias. The implications of ignoring these biases in biodiversity research have not yet been thoroughly assessed, but are likely to be substantial. Any study dealing with species occurrence data, whether carefully validated or directly downloaded, should therefore assess the biases covered by this package.
Sampbias is a method and tool to 1) quantify geographic sampling bias in any user-provided dataset, 2) quantify the biasing effect of geographic features related to human accessibility, such as proximity to cities or roads, and 3) create publication-quality graphs of these biasing effects in space.
Sampbias evaluates the biasing effect of geographic features by comparing the distribution of distances to those features observed in a user-provided dataset with a simulated distribution expected under random sampling. The method is scale-independent, and any set of multi-species occurrence records can be tested against any set of geographic gazetteers; reliability increases with dataset size. Default large-scale gazetteers for airports, cities, rivers and roads are provided with the package. Species occurrence data downloaded from the data portal of the Global Biodiversity Information Facility (GBIF) can be used directly as input for sampbias. The output of the package includes measures of bias effect and comparisons between different gazetteers (e.g. the biasing effects of roads vs. rivers), different taxa (e.g. birds vs. flowering plants) and different datasets (e.g. specimens vs. human observations).
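The comparison described above can be illustrated with a small, self-contained sketch. This is not the sampbias implementation or its R interface: every function name, the toy coordinates, the use of planar Euclidean distance and the choice of a two-sample Kolmogorov-Smirnov statistic as the summary measure are assumptions made purely for illustration. The sketch measures the distance from each record to its nearest feature, does the same for points drawn uniformly at random from the study area, and reports how far apart the two distance distributions are.

```python
import math
import random


def nearest_distance(point, features):
    """Planar Euclidean distance from a point to its closest feature."""
    px, py = point
    return min(math.hypot(px - fx, py - fy) for fx, fy in features)


def ks_statistic(sample_a, sample_b):
    """Largest gap between the empirical CDFs of two samples (0 = identical)."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step both empirical CDFs past every value <= x, then compare them.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    return gap


def bias_effect(records, features, bounds, n_sim=2000, seed=0):
    """Compare observed distances with those expected under random sampling.

    bounds is (xmin, xmax, ymin, ymax); the simulation draws points
    uniformly inside that rectangle (an assumption of this sketch).
    """
    rng = random.Random(seed)
    xmin, xmax, ymin, ymax = bounds
    simulated = [
        (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)) for _ in range(n_sim)
    ]
    observed_dist = [nearest_distance(p, features) for p in records]
    expected_dist = [nearest_distance(p, features) for p in simulated]
    return ks_statistic(observed_dist, expected_dist)


# Hypothetical toy data: two "cities" in a 10 x 10 study area.
cities = [(2.0, 2.0), (8.0, 8.0)]
bounds = (0.0, 10.0, 0.0, 10.0)
rng = random.Random(1)

# Records clustered around the cities (accessibility-biased sampling).
biased = []
for _ in range(200):
    cx, cy = rng.choice(cities)
    biased.append((rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)))

# Records spread uniformly over the study area (no accessibility bias).
unbiased = [(rng.uniform(0.0, 10.0), rng.uniform(0.0, 10.0)) for _ in range(200)]

biased_effect = bias_effect(biased, cities, bounds)
unbiased_effect = bias_effect(unbiased, cities, bounds)
```

In this toy setting the clustered records should yield a clearly larger value than the uniformly spread ones, which is the kind of contrast the method is meant to expose. Real occurrence data would use geographic coordinates and a suitable distance measure rather than planar distance.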
The results of sampbias can be used to identify priorities for further collection or digitalisation efforts, provide bias surfaces for species distribution modelling, or assess the reliability of scientific results based on publicly available species distribution data.
Sampbias thus offers an efficient, largely automated means for biodiversity scientists and non-specialists alike to further explore species occurrence data.