The Global Biodiversity Information Facility (GBIF) facilitates access to 13,233 species occurrence datasets, collectively holding more than 570 million records. GBIF dataset pages are important access points to GBIF-mediated data (e.g. via DOIs) and currently show dataset metadata, a map of georeferenced occurrences, some basic statistics, and a paged table of download events. Users who want to know more about the occurrences a dataset contains have to filter or page through a table of occurrences, or download the data. Neither is a convenient way to get quick insights or assess fitness for use.
For the 2015 GBIF Ebbe Nielsen Challenge, we developed a proof of concept for enhancing GBIF dataset pages with aggregated occurrence metrics. These metrics are visualized as stacked bar charts showing the occurrence distribution for basis of record, coordinates, multimedia, and taxa matched with the GBIF backbone, complemented by an interactive taxonomy partition and a recent downloads chart. Metrics that score particularly well are highlighted as achievements. Collectively, these features not only inform the user what a dataset contains and whether it is fit for use, but also help data publishers discover which aspects could be improved.
The proof of concept consists of two parts: 1) an extraction and aggregation module that processes GBIF occurrence downloads and calculates, aggregates, and stores the metrics for each dataset, and 2) a Google Chrome extension that lets you view these metrics in context on the GBIF website.
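As an illustration of what the extraction and aggregation step might look like, here is a minimal sketch in Python. It is not the module's actual code: the function name `aggregate_metrics`, the metric labels, and the sample rows are all hypothetical, and the column names (`datasetKey`, `basisOfRecord`, `decimalLatitude`, `decimalLongitude`, `mediaType`) are assumed to match those of a tab-delimited GBIF occurrence download.

```python
import csv
import io
from collections import Counter, defaultdict


def aggregate_metrics(rows):
    """Count occurrences per dataset, split by a few hypothetical metrics.

    `rows` is any iterable of dicts keyed by column name (assumed names).
    """
    metrics = defaultdict(lambda: {
        "occurrences": 0,
        "basis_of_record": Counter(),
        "coordinates": Counter(),
        "multimedia": Counter(),
    })
    for row in rows:
        m = metrics[row["datasetKey"]]
        m["occurrences"] += 1
        # Empty or missing values are grouped under a placeholder label.
        m["basis_of_record"][row.get("basisOfRecord") or "UNKNOWN"] += 1
        # A record counts as georeferenced only if both coordinates are filled in.
        has_coords = bool(row.get("decimalLatitude")) and bool(row.get("decimalLongitude"))
        m["coordinates"]["with" if has_coords else "without"] += 1
        # Any non-empty media type counts as having multimedia.
        m["multimedia"]["with" if row.get("mediaType") else "without"] += 1
    return dict(metrics)


# Made-up sample rows standing in for a real occurrence download.
SAMPLE = (
    "datasetKey\tbasisOfRecord\tdecimalLatitude\tdecimalLongitude\tmediaType\n"
    "d1\tHUMAN_OBSERVATION\t51.05\t3.72\tStillImage\n"
    "d1\tPRESERVED_SPECIMEN\t\t\t\n"
    "d2\tHUMAN_OBSERVATION\t50.85\t4.35\t\n"
)


def demo():
    """Aggregate the sample rows and return the per-dataset metrics."""
    reader = csv.DictReader(io.StringIO(SAMPLE), delimiter="\t")
    return aggregate_metrics(reader)


if __name__ == "__main__":
    for dataset_key, m in demo().items():
        print(dataset_key, m["occurrences"], dict(m["coordinates"]))
```

Because the function only reads rows one at a time and keeps counters per dataset, the same approach could in principle be applied to a full download by streaming the file instead of the in-memory sample. The per-dataset counters map naturally onto the segments of a stacked bar chart.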
For Round 2 of the 2015 GBIF Ebbe Nielsen Challenge, we added a sample of the images referenced in a dataset's occurrences. Together with the multimedia bar and achievement, this sample highlights the currently undervalued multimedia richness of some datasets. We also improved our extraction and aggregation module to process all GBIF occurrences on the Amazon EC2 infrastructure and are now able to provide metrics for all GBIF occurrence datasets. We strongly believe, however, that the functionality of our proof of concept, if considered useful, should be implemented on the GBIF infrastructure. For our motivation, including the challenges and opportunities involved, see our feedback to the jury comments.
You can use the extension by installing it from the Chrome Web Store and going to a GBIF dataset page. Prefer a preview without installing the extension? See this demo page. Source code and documentation for all the features are available in this repository, which is also attached as a zip file.