
Enhancing georeferenced biodiversity inventories: automated information extraction from literature records reveal the gaps

By Bjørn Tore Kopperud, Scott Lidgard, Lee Hsiang Liow

Posted 17 Jan 2020
bioRxiv DOI: 10.1101/2020.01.16.908962

Aim: We compare and combine data from a public georeferenced biodiversity database and from the published literature to identify strengths and gaps in global marine biogeographic knowledge. Using these data, we estimate the latitudinal species diversity distribution for a commonly occurring but under-studied clade, cheilostome Bryozoa, which has long been hypothesized to show a non-canonical latitudinal diversity gradient (LDG).

Location: Global.

Major taxa studied: Cheilostomata, Bryozoa, with around 5000 described extant species.

Methods: We use natural language processing (NLP) to retrieve location data for cheilostome species (text-mined occurrences [TMO]) in an automated procedure. We compare and combine these results with data from the Ocean Biogeographic Information System (OBIS). Using OBIS and TMO data separately and in combination, we present latitudinal species richness curves using standard estimators (Chao2 and the Jackknife) and range-through approaches.

Results: Our combined OBIS and TMO species richness curves quantitatively document a bimodal global latitudinal diversity gradient for cheilostomes for the first time, with peaks in the temperate zones. Of the georeferenced species we retrieved from TMO (N = 1780) and OBIS (N = 2453), 79% are non-overlapping, and the two sources underestimate known species richness even in combination.

Main conclusions: Despite clear indications that global location data compiled for cheilostome bryozoans should be improved with concerted effort, our study supports the view that latitudinal species richness patterns deviate from the canonical LDG. Moreover, combining online biodiversity databases with automated information retrieval from the published literature is a promising avenue for expanding taxon-location datasets.
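The abstract names Chao2, the Jackknife, and range-through approaches without spelling them out. The following is a minimal sketch of what such estimators could look like, not the authors' implementation. It assumes incidence data given as a list of sampling units (each a list of species names), uses the bias-corrected form of Chao2 and the first-order jackknife (the abstract does not say which variants were used), and bins latitudes into bands of a hypothetical width of 10 degrees. All function and parameter names are illustrative.

```python
from collections import defaultdict


def incidence_counts(samples):
    """For each species, count the sampling units in which it occurs."""
    counts = defaultdict(int)
    for unit in samples:
        for species in set(unit):  # presence/absence only, ignore repeats
            counts[species] += 1
    return dict(counts)


def chao2(samples):
    """Bias-corrected Chao2 richness estimate (assumed variant)."""
    m = len(samples)
    if m == 0:
        return 0.0
    counts = incidence_counts(samples)
    s_obs = len(counts)
    q1 = sum(1 for c in counts.values() if c == 1)  # species in exactly one unit
    q2 = sum(1 for c in counts.values() if c == 2)  # species in exactly two units
    return s_obs + ((m - 1) / m) * q1 * (q1 - 1) / (2 * (q2 + 1))


def jackknife1(samples):
    """First-order jackknife richness estimate (assumed variant)."""
    m = len(samples)
    if m == 0:
        return 0.0
    counts = incidence_counts(samples)
    q1 = sum(1 for c in counts.values() if c == 1)
    return len(counts) + q1 * (m - 1) / m


def range_through_richness(occurrences, band_width=10):
    """Richness per latitudinal band, counting each species in every band
    between its southernmost and northernmost record.

    occurrences: iterable of (species, latitude in decimal degrees).
    Returns a dict mapping each band's southern edge to its richness.
    """
    n_bands = 180 // band_width

    def band(lat):
        # Clamp so that latitude 90 falls in the northernmost band.
        return min(int((lat + 90) // band_width), n_bands - 1)

    span = {}
    for species, lat in occurrences:
        b = band(lat)
        lo, hi = span.get(species, (b, b))
        span[species] = (min(lo, b), max(hi, b))

    richness = [0] * n_bands
    for lo, hi in span.values():
        for b in range(lo, hi + 1):
            richness[b] += 1
    return {-90 + i * band_width: r for i, r in enumerate(richness)}
```

In this sketch a species recorded at 35°S and 12°N is counted in every band in between, including bands with no record of it, which is the sense in which range-through curves smooth over sampling gaps.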

Download data

  • Downloaded 413 times
  • Download rankings, all-time:
    • Site-wide: 93,071
    • In ecology: 2,542
  • Year to date:
    • Site-wide: 45,186
  • Since beginning of last month:
    • Site-wide: 137,581
