The Primary Biodiversity Information Species Index


Our goal is to assess knowledge gaps in Primary Biodiversity Data (PBD) at the species level for the ~35,000 tetrapod species. By developing a metric that helps ecologists, conservationists and policy makers find out which types of information are available for each species, this will bring a new perspective to the exploration of PBD. It will also be possible to explore PBD gaps at different taxonomic levels. For example, we may need to know how much PBD is available for amphibians assessed as vulnerable to climate change. A simple way to scan the types of occurrence records and their temporal coverage can be crucial for conservation science and for the development of conservation policies. The PBD Species Index (PBD-SI) will incorporate different metrics, such as the number and types of records available, their temporal coverage, and the number of sampling locations for a given species. The main product will be a visual interactive landscape of the PBD-SI for mammals, birds, reptiles and amphibians, linked to GBIF or hosted on the GBIF site.

What is new compared to what has already been explored on PBD gaps?

The PBD Species Index will provide a new way to explore data gaps and opportunities, adding new information and a different scale of analysis to existing geographical gap analyses. While the assessment of geographical gaps in biodiversity information has been one of the most explored topics in macroecology and conservation (Collen et al. 2008, Boakes et al. 2010, Amano et al. 2016), PBD gaps at the species level remain largely unexplored. A few recent analyses of tetrapod PBD show that more knowledge has accumulated for birds and mammals, followed by reptiles and amphibians. For example, Meyer et al. (2015) report the percentage of species with at least 50 sampling locations globally, which is the minimum number required for species distribution models (SDMs) (Boitani et al. 2011, Feeley et al. 2011 and Wisz et al. 2008). They found that approximately 54.9, 79.2, and 91.3% of the species with records in GBIF reach this number for birds, mammals and amphibians respectively. Additionally, Meyer et al. (2016) analyzed three occurrence metrics: the record count per species, the coverage of a species' range with records, and the geographical bias in how the records represent different parts of the range. All three metrics revealed severe species-level biases. Moreover, data limitations were mainly linked to species range size and shape, and to the within-range geography of socio-economic conditions (Meyer et al. 2016). However, there is still much more to explore on data gaps at the species level, such as the temporal extent of occurrences, the types of records available, and their representation across time.

How we built it

We have already compiled the number of GBIF records for all tetrapod species and made a first tree map visualization with R (Figure 1). The next step is to develop a PBD Species Index (PBD-SI) that combines different occurrence metrics to summarise the types of records available across all tetrapods. The framework we propose for the index is based on several attributes of GBIF data: the number of records per species, the type of record (i.e. the basis of record, such as "fossil", "human observation", etc.), whether the species has at least 50 sampling locations globally, and the temporal coverage. Based on this framework we plan to hold an expert workshop with GBIF data users, the GBIF core team, and experts ranging from conservation NGOs to multilateral institutions such as IUCN. The aim of the workshop is to define an index that shows, in its simplest form, the types of PBD that have been accumulated per species across time. This will allow GBIF users and developers to see how data collection and digitalisation have progressed over time for each species. With the index it will also be possible to determine the extent of the PBD available, over time and by type of record, for particular groups of taxa such as threatened or invasive species. Moreover, our goal is to explore how the PBD-SI compares with other areas of knowledge for each species, such as genetic information or demographic traits and data (see Figures 2 and 3).
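The attributes listed above can be sketched as a per-species summary. The example below is only an illustration under stated assumptions, not the index itself, which is still to be defined at the workshop. It is written in Python for brevity, although the project computes in R. The field names follow the Darwin Core terms used in GBIF occurrence downloads (species, basisOfRecord, decimalLatitude, decimalLongitude, year). Treating coordinates rounded to two decimals as one "sampling location" is our assumption, and the toy records are made up.

```python
from collections import Counter, defaultdict

MIN_LOCATIONS = 50  # threshold for SDMs quoted in the text


def species_metrics(records, min_locations=MIN_LOCATIONS, precision=2):
    """Summarise occurrence records per species.

    records: iterable of dicts with the keys species, basisOfRecord,
    decimalLatitude, decimalLongitude and year (the last three may be None).
    """
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["species"]].append(rec)

    summary = {}
    for species, recs in grouped.items():
        # Assumption: a "sampling location" is a distinct coordinate pair
        # after rounding to `precision` decimal places.
        locations = {
            (round(r["decimalLatitude"], precision),
             round(r["decimalLongitude"], precision))
            for r in recs
            if r.get("decimalLatitude") is not None
            and r.get("decimalLongitude") is not None
        }
        years = [r["year"] for r in recs if r.get("year") is not None]
        summary[species] = {
            "n_records": len(recs),
            "basis_of_record": dict(Counter(r["basisOfRecord"] for r in recs)),
            "n_locations": len(locations),
            "meets_location_threshold": len(locations) >= min_locations,
            "first_year": min(years) if years else None,
            "last_year": max(years) if years else None,
        }
    return summary


# Made-up toy records, for illustration only.
toy_records = [
    {"species": "Panthera onca", "basisOfRecord": "HUMAN_OBSERVATION",
     "decimalLatitude": -3.10, "decimalLongitude": -60.02, "year": 2012},
    {"species": "Panthera onca", "basisOfRecord": "PRESERVED_SPECIMEN",
     "decimalLatitude": -3.10, "decimalLongitude": -60.02, "year": 1935},
    {"species": "Panthera onca", "basisOfRecord": "HUMAN_OBSERVATION",
     "decimalLatitude": 17.25, "decimalLongitude": -88.77, "year": None},
]
metrics = species_metrics(toy_records)
```

How these separate metrics would be weighted into a single index value is deliberately left open here, since that is the question the expert workshop is meant to answer.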

The index will be built using GBIF data and computed with R. To link the PBD-SI database with other species-level indices we will use the COL ids, which have already been retrieved with taxize (Chamberlain and Szöcs 2013). The interactive visualizations will be built using R and JavaScript.
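Linking indices through a shared identifier amounts to a keyed join. The sketch below, in Python for illustration only, assumes each index is available as a table keyed by COL id. The ids, species names and scores are made-up placeholders, and `link_indices` is a hypothetical helper rather than part of taxize or any GBIF tool.

```python
def link_indices(pbd_si, other_indices):
    """Left-join PBD-SI rows with other indices keyed by COL id.

    pbd_si: dict mapping COL id -> dict of PBD-SI fields.
    other_indices: dict mapping index name -> dict of COL id -> score.
    """
    linked = {}
    for col_id, row in pbd_si.items():
        merged = dict(row)  # copy so the input table is left untouched
        for name, table in other_indices.items():
            # None marks a gap: the species is absent from that index.
            merged[name] = table.get(col_id)
        linked[col_id] = merged
    return linked


# Placeholder ids and values, not real COL ids or index scores.
pbd_si = {
    "col-0001": {"species": "Species A", "n_records": 120},
    "col-0002": {"species": "Species B", "n_records": 3},
}
other = {
    "disko": {"col-0001": 0.8},
    "genindex": {"col-0001": 0.5, "col-0002": 0.1},
}
linked = link_indices(pbd_si, other)
```

A left join keeps every species in the PBD-SI table, so a species missing from another index shows up as an explicit gap instead of silently dropping out.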

Challenges we ran into

The main challenge has been the taxonomic standardisation and the retrieval of GBIF and Catalogue of Life ids for all the species. Although we did most of it automatically with taxize (Chamberlain and Szöcs 2013), we still needed some manual corrections. One of our goals is therefore to support the further development of taxize, with the aim of merging data from GBIF with the IUCN Red List, COL and GenBank, and with the indices we have already developed or are developing.

Accomplishments that we're proud of

We have mapped the number of PBD occurrences for all tetrapod species in GBIF (see Figure 1). We are proud to have already developed a Demographic Index of Species Knowledge, DISKo (Figure 2), and we are in the process of finishing ZooIndex and developing GenIndex (Figure 3). Note that Figure 2 shows DISKo across bird families for fertility and survival, although we have also computed it for mammals, amphibians and reptiles, using 24 life history traits and demographic data repositories.

What we learned

We learned that developing a species knowledge index for a particular area, such as demographics, zoo information or genetic information, does not rely only on technical and computational expertise. Collaboration and the exchange of ideas with key participants, including the developers of the databases from which we extract the data, taxon expert groups, modelers and data users, is essential.

Through our exploration of GBIF we have also learned that PBD covers 47% of all tetrapod species. Per class, GBIF has information for 41% of amphibians, 53% of birds, 50% of mammals and 43% of reptiles. Through the development of this index we will learn more about the types of records available and obtain a simple measure of their potential for studying each tetrapod species.
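Coverage figures of this kind can be computed by comparing a species checklist with the set of species that have records. The Python sketch below is a minimal illustration, assuming that "covered" means a species has at least one occurrence record. The species names are placeholders and the numbers it produces are unrelated to the percentages above.

```python
def coverage_by_class(checklist, species_with_records):
    """Percentage of checklist species with at least one record.

    checklist: dict mapping species name -> taxonomic class.
    species_with_records: set of species names that have records.
    Returns a dict of class -> percent, plus an "all" entry.
    """
    if not checklist:
        return {}
    totals, covered = {}, {}
    for species, cls in checklist.items():
        totals[cls] = totals.get(cls, 0) + 1
        if species in species_with_records:
            covered[cls] = covered.get(cls, 0) + 1
    result = {cls: 100.0 * covered.get(cls, 0) / n for cls, n in totals.items()}
    result["all"] = 100.0 * sum(covered.values()) / sum(totals.values())
    return result


# Placeholder checklist, for illustration only.
checklist = {
    "bird_a": "Aves", "bird_b": "Aves",
    "frog_a": "Amphibia", "frog_b": "Amphibia",
}
with_records = {"bird_a", "bird_b", "frog_a", "not_on_checklist"}
coverage = coverage_by_class(checklist, with_records)
```

Names that appear in the records but not in the checklist are ignored, which is why taxonomic standardisation has to come before a step like this.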

What's next for Species Index of Primary Biodiversity Information

We will compare the PBD-SI metrics with DISKo, the Demographic Index of Species Knowledge (Conde et al., Figure 2), a metric of the extent of data on fertility and survival for 99.2% of the described tetrapod species (as listed by the Catalogue of Life). We will also be able to compare it with the ZooDataIndex (Silva et al. in prep) and the Genetic Knowledge Species Index (Staerk et al. in prep, Figure 2). The ZooDataIndex is a metric of the individual-level data available to estimate demographic traits for tetrapod species. These data are being standardized and collated by the Species360 organisation, across more than 1000 zoos since 1975. The Genetic Knowledge Species Index that we are currently developing is a metric of the availability, number and types of sequences in GenBank and of the species in Genome 10K. We believe that publishing these four indices through GBIF, together with a new way to visualize and explore the PBD-SI, will allow policy makers, conservationists and funding institutions to further prioritise the collection and digitalization of data for particular species of interest. For example, it will be feasible to visualize the extent and types of PBD available for a particular group of threatened taxa. Furthermore, users will be able to easily determine for which species there are PBD records collected prior to major alterations of native landscapes. Those records provide the ideal data to measure changes in biodiversity over time (Graham et al. 2004). We believe our project will be complementary to studies of geographical PBD gaps and will further advance the exploration of PBD at the species level.
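Finding species with records that predate a landscape change can be sketched as a simple filter on record year. In the Python illustration below, the cutoff year, the records and the helper name `species_with_baseline` are all hypothetical. In practice the cutoff would differ by region and would have to come from an external source on land-use change.

```python
def species_with_baseline(records, cutoff_year):
    """Return the sorted species that have a dated record before cutoff_year."""
    found = {
        r["species"]
        for r in records
        if r.get("year") is not None and r["year"] < cutoff_year
    }
    return sorted(found)


# Made-up records and an arbitrary cutoff, for illustration only.
records = [
    {"species": "Species A", "year": 1920},
    {"species": "Species A", "year": 2005},
    {"species": "Species B", "year": 1998},
    {"species": "Species C", "year": None},  # undated records cannot qualify
]
baseline = species_with_baseline(records, cutoff_year=1950)
```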

Publishing the PBD-SI, DISKo and GenIndex through GBIF will allow ecologists and conservationists to see the extent of data across some of the key axes of information for conservation: knowledge of a species' habitat (based on PBD), its demographics, and its genetics. This information will contribute to one of the key tasks of the Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services (IPBES), which focuses on identifying biodiversity knowledge gaps with the aim of catalysing efforts to generate new data. With the PBD-SI, it will be possible for IPBES to assess which types of data are available for particular species, and for which periods of time. The main product will be a visual interactive landscape of the PBD-SI for mammals, birds, reptiles and amphibians. Figure 1 shows one possible way to visualize the PBD-SI. In this case we only show the number of occurrences per species, since we want to define the index itself together with different experts and the core GBIF team. Additionally, we show a preliminary exploration of GenBank (Figure 2) and DISKo (Figure 3, where we show the DISKo for birds). One of our goals is to provide visualizations that link species occurrence data with other types of data, such as species traits and genetics, and so enhance workflows in biodiversity research and conservation. These will contribute towards developing further collaborations for biodiversity assessments, such as those proposed by Guralnick et al. (2007).


Amano, T. et al. Spatial gaps in global biodiversity information and the role of citizen science. BioScience 66, 393–400 (2016).

Boakes, E. H. et al. Distorted views of biodiversity: spatial and temporal bias in species occurrence data. PLoS Biol. 8, e1000385 (2010).

Boitani, L. et al. What spatial data do we need to develop global mammal conservation strategies? Philos. Trans. R. Soc. Lond. B. Biol. Sci. 366, 2623–2632 (2011).

Chamberlain, S. A. & Szöcs, E. taxize: taxonomic search and retrieval in R. F1000Research 2 (2013).

Collen, B. et al. The tropical biodiversity data gap: addressing disparity in global monitoring. Trop. Conserv. Sci. 1, 75–88 (2008).

Feeley, K. J. & Silman, M. R. Keep collecting: accurate species distribution modelling requires more collections than previously thought. Divers. Distrib. 17, 1132–1140 (2011).

Graham, C. H. et al. New developments in museum-based informatics and applications in biodiversity analysis. Trends Ecol. Evol. 19, 497–503 (2004).

Guralnick, R. P. et al. Towards a collaborative, global infrastructure for biodiversity assessment. Ecol. Lett. 10, 663–672 (2007).

Meyer, C. et al. Range geometry and socio-economics dominate species-level biases in occurrence information. Glob. Ecol. Biogeogr. 25, 1181–1193 (2016).

Meyer, C. et al. Global priorities for an effective information basis of biodiversity distributions. Nat. Commun. 6 (2015).

Wisz, M. S. et al. Effects of sample size on the performance of species distribution models. Divers. Distrib. 14, 763–773 (2008).
