sampbias, a method for quantifying geographic sampling biases in species distribution data

By Alexander Zizka, Alexandre Antonelli, Daniele Silvestro

Posted 14 Jan 2020
bioRxiv DOI: 10.1101/2020.01.13.903757 (published DOI: 10.1111/ecog.05102)

Georeferenced species occurrences from public databases have become essential to biodiversity research and conservation. However, geographical biases are widely recognized as a limiting factor that could severely affect the usefulness of such data for understanding species diversity and distributions. In particular, differences in sampling intensity across a landscape due to differences in human accessibility are ubiquitous but may differ in strength among taxonomic groups and datasets. Although several factors are known to influence human access (such as the presence of roads, rivers, airports and cities), quantifying their specific and combined effects on recorded occurrence data remains challenging. Here we present sampbias, an algorithm and software for quantifying the effect of accessibility biases in species occurrence datasets. Sampbias uses a Bayesian approach to estimate how sampling rates vary as a function of proximity to one or multiple bias factors. The results are comparable among bias factors and datasets. We demonstrate its use on a dataset of mammal occurrences from the Indonesian island of Borneo, showing a high biasing effect of cities and a moderate effect of roads and airports. Sampbias is implemented as a well-documented, open-access and user-friendly R package that we hope will become a standard tool for anyone working with species occurrences in ecology, evolution, conservation and related fields.
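The abstract does not state the functional form of the model, so the following is only an illustrative sketch of what "sampling rate as a function of proximity to bias factors" could look like. It assumes that the expected number of records in a grid cell decays exponentially with distance to each factor and that counts are Poisson-distributed. These assumptions, the function names, and the toy numbers are all hypothetical. The sketch is written in Python for illustration and is not the sampbias R package's API.

```python
import math


def sampling_rate(q, weights, distances):
    """Expected records in a cell: q * exp(-sum_j w_j * d_j) (assumed form)."""
    return q * math.exp(-sum(w * d for w, d in zip(weights, distances)))


def poisson_log_likelihood(counts, rates):
    """Log-likelihood of per-cell counts under independent Poisson rates."""
    return sum(
        k * math.log(r) - r - math.lgamma(k + 1)
        for k, r in zip(counts, rates)
    )


# Hypothetical toy data: distance (km) from each cell to the nearest
# city and the nearest road, plus the number of records in that cell.
cell_distances = [(0.0, 0.0), (10.0, 2.0), (50.0, 20.0)]
observed_counts = [12, 5, 0]

q = 10.0                # assumed baseline rate at zero distance
weights = (0.05, 0.02)  # assumed decay strength per bias factor

rates = [sampling_rate(q, weights, d) for d in cell_distances]
log_lik = poisson_log_likelihood(observed_counts, rates)
```

Under these assumptions, a larger weight means the sampling rate drops faster with distance from that factor, which is one way the strength of different bias factors could be compared. A Bayesian analysis would additionally place priors on `q` and the weights and sample from their posterior; the sketch only evaluates the likelihood for fixed values.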

Download data

  • Downloaded 748 times
  • Download rankings, all-time:
    • Site-wide: 45,103
    • In ecology: 997
  • Year to date:
    • Site-wide: 132,178
  • Since beginning of last month:
    • Site-wide: 166,813
