biosurvey: Biological Survey Planning Considering Hutchinson’s Duality - rstats-gsoc/gsoc2020 GitHub Wiki

Background

One of the challenges in biodiversity conservation is to complete a fine-scale inventory of the species that exist in the world. This challenge is pressing in light of the increasing number of threats to global biodiversity due to anthropogenic activities. Although various developed countries have already planned how to accomplish this task and have obtained good results, many developing countries are still in the initial phases of this titanic challenge. Among the multiple limitations for inventorying biodiversity, the complications related to survey planning stand out. These complications derive not only from economic capacities but also from the difficulty of designing a survey system that guarantees registering most of the species in a region. Registering most taxa in a planned survey system is challenging because the geographic configuration of an area alone does not reveal all the factors relevant to species distributions.

Species distributions depend on complicated relationships between accessible areas, environmental conditions, and biotic interactions (Soberón and Peterson 2005). Because planning a survey system aims to register the species present in a region, biotic interactions can be set aside in this case. However, the relationship between environmental conditions and the geographic configuration of an area is of crucial importance when trying to identify key sites for biodiversity surveys. The relationship between these two spaces (environmental and geographic) has been called Hutchinson’s Duality (Colwell and Rangel 2009), and although it has been somewhat overlooked, it plays an important role in various aspects of distributional ecology.

Related work

We are aware of two R packages, BiodiversityR and WhereNext, that have relatively similar functionalities. In BiodiversityR, the function spatialsample allows establishing sites for biodiversity surveys using methods that locate sites randomly, on a grid, or on a random grid (Kindt and Coe 2005). WhereNext is an R package that implements statistical analyses to identify places where surveys can be performed (Velásquez-Tibatá 2019). The main goal of WhereNext is to show users where a given region presents novel environmental conditions, considering existing information about previous surveys for a biological group of organisms. This package is based on generalized dissimilarity modeling (Ferrier et al. 2007), which allows identifying how similar previously sampled areas are to areas that have not been sampled.

The approach that I want to implement differs from existing packages in that it allows users to explore both the geographic and the environmental characteristics of a given region, and to use these configurations to select sites for future surveys. Exploring the environmental space of a region shows how common or rare some environments are in the area. Exploring geography helps in identifying how related distinct sites in the region are, as sites with similar environmental conditions may or may not be close to each other. My proposed package will explicitly explore Hutchinson’s Duality to help in selecting sites that better represent most environments in the region, as well as similar environments in distant areas. By doing so, both common and rare species are more likely to be included in surveys. None of the existing R packages or other available software performs these tasks.

Details of your coding project

We expect the development of functions to perform the analyses listed in the following modules. We like the idea of presenting the package in a modular way, so that it is easier to use and its functions or modules can be reused in further implementations. Most functions will require the development of small helper functions to make the package more modular. All functions, including the small ones, need to be documented, and at least one vignette for the package will be required.

Data preparation module

In this module, functions will allow users to prepare data for further analyses. Initial data will generally be raster layers representing the environmental space of a region and spatial polygons representing such a region. The functions will help to:

  • Prepare an S3 object that will serve as the base to perform all further analyses.
  • Prepare a presence-absence matrix (PAM) using tables of species records.
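
As a language-neutral illustration of the second item, the sketch below builds a small PAM from a table of species records by assigning each record to a square grid cell. It is written in Python only to keep the example self-contained; it is not the package's R interface. The function name `build_pam`, the record layout (species, longitude, latitude), and the grid-cell approach are assumptions made for this example.

```python
from collections import defaultdict


def build_pam(records, cell_size):
    """Build a presence-absence matrix from a list of species records.

    records: list of (species, longitude, latitude) tuples (hypothetical layout).
    cell_size: side length of the square grid cells, in coordinate units.
    Returns (cells, species_names, matrix), where matrix[i][j] is 1 if
    species_names[j] was recorded in cells[i] and 0 otherwise.
    """
    presences = defaultdict(set)
    for species, lon, lat in records:
        # Floor division maps each coordinate to the index of its grid cell.
        cell = (int(lon // cell_size), int(lat // cell_size))
        presences[cell].add(species)

    species_names = sorted({species for species, _, _ in records})
    cells = sorted(presences)
    matrix = [
        [1 if species in presences[cell] else 0 for species in species_names]
        for cell in cells
    ]
    return cells, species_names, matrix


# Illustrative records only; these are not real survey data.
records = [
    ("sp_a", 0.2, 0.3),
    ("sp_b", 0.7, 0.1),
    ("sp_a", 1.5, 0.4),
    ("sp_c", 1.2, 1.8),
]
cells, species_names, matrix = build_pam(records, cell_size=1.0)
```

Each row of `matrix` corresponds to one grid cell with at least one record, and each column to one species. Cells without records are omitted here, which is one possible design choice among several.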

Analysis module

The functions of this module are the core of the package. All analyses described in this module will be done using the initial S3 object prepared with functions from the previous module. The tools created here will help in performing the analyses listed below:

  • Regionalization of a bi-dimensional environmental space.
  • Selecting survey sites based on an overdispersion of points in environmental space only.
  • Selecting survey sites based on an overdispersion of points in geography only.
  • Selecting survey sites randomly.
  • Sampling relevant blocks in environmental space based on a user-defined criterion.
  • Selecting survey sites based on overdispersion of points in environmental space but considering geographic clusters of areas with similar environmental characteristics.
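
Two of the analyses listed above can be sketched in a few lines. The Python example below is an illustration under stated assumptions, not the package's R implementation: `assign_blocks` regionalizes a two-dimensional environmental space with a regular grid, and `select_overdispersed` uses greedy farthest-point selection as one possible way to obtain overdispersed points. Both function names, the equal-width grid, and the greedy strategy are choices made for this sketch. The two environmental axes are assumed to be on comparable scales (for example, two principal components).

```python
import math


def assign_blocks(env_points, n_cols, n_rows):
    """Label each (x, y) environmental point with the grid block it falls in."""
    xs = [p[0] for p in env_points]
    ys = [p[1] for p in env_points]
    x_min, y_min = min(xs), min(ys)
    # Block width and height; fall back to 1.0 if all values are identical.
    x_step = (max(xs) - x_min) / n_cols or 1.0
    y_step = (max(ys) - y_min) / n_rows or 1.0
    blocks = []
    for x, y in env_points:
        # Points on the upper edge are placed in the last block.
        col = min(int((x - x_min) / x_step), n_cols - 1)
        row = min(int((y - y_min) / y_step), n_rows - 1)
        blocks.append((col, row))
    return blocks


def select_overdispersed(points, n_sites, start=0):
    """Greedy farthest-point selection; returns indices of the chosen points."""
    selected = [start]
    # Distance from every point to its nearest already-selected point.
    nearest = [math.dist(p, points[start]) for p in points]
    while len(selected) < min(n_sites, len(points)):
        # Pick the point that is farthest from everything selected so far.
        idx = max(range(len(points)), key=nearest.__getitem__)
        selected.append(idx)
        nearest = [
            min(d, math.dist(p, points[idx])) for d, p in zip(nearest, points)
        ]
    return selected


# Illustrative values in a two-dimensional environmental space.
env = [(0.0, 0.0), (0.1, 0.1), (1.0, 0.0), (0.9, 0.1), (0.5, 1.0)]
blocks = assign_blocks(env, n_cols=2, n_rows=2)
sites = select_overdispersed(env, n_sites=3)
```

Passing geographic coordinates instead of environmental values to `select_overdispersed` gives the geography-only variant, with the caveat that Euclidean distance on longitude and latitude is only an approximation. The combined approach in the last item would additionally need a clustering step in geographic space, which this sketch does not attempt.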

Testing module

In this module, functions will allow users to compare how complete an inventory will be if distinct sets of survey sites are used. This type of analysis will be possible only for regions where previous biodiversity surveys have been performed and have a certain degree of completeness. The tools created here will help to:

  • Create PAMs for survey sites of interest, by reducing larger PAMs for a given region.
  • Compare species richness represented in distinct sets of survey sites.
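
The two steps above can be illustrated with a minimal Python sketch (again, not the package's R interface). It assumes a PAM stored as a mapping from site identifier to a row of 0/1 values, one per species; the names `reduce_pam` and `species_richness` and the example data are hypothetical.

```python
def reduce_pam(pam, site_ids):
    """Keep only the PAM rows that correspond to the selected survey sites."""
    return {site: pam[site] for site in site_ids}


def species_richness(pam):
    """Count species (columns) that are present in at least one site (row)."""
    rows = list(pam.values())
    if not rows:
        return 0
    # zip(*rows) iterates over columns, i.e. over species.
    return sum(1 for column in zip(*rows) if any(column))


# Illustrative PAM for four sites and four species; not real survey data.
full_pam = {
    "s1": [1, 1, 0, 0],
    "s2": [1, 0, 0, 0],
    "s3": [0, 0, 1, 0],
    "s4": [0, 1, 0, 1],
}
richness_a = species_richness(reduce_pam(full_pam, ["s1", "s2"]))
richness_b = species_richness(reduce_pam(full_pam, ["s3", "s4"]))
richness_all = species_richness(full_pam)
```

In this toy case the second set of sites captures three of the four species and the first set only two, which is the kind of comparison the testing module is meant to support.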

Plotting module

This module will help in preparing graphic representations of the data, intermediate results, and final results. A particularity of most of these functions is that they will have two components, one for each of the two spaces of interest: environment and geography. The functions will help to:

  • Produce plots to show all candidate sites to be sampled in a given region, and their configuration in environmental and geographic space.
  • Create plots to represent how the environmental space of a region has been blocked.
  • Prepare plots to show the blocks that have been selected based on a user-defined criterion.
  • Create plots to show sites that have been selected to serve as survey stations based on distinct methods implemented in this package.
  • Produce plots for comparing the diversity of species represented considering sites selected using distinct approaches.

Expected impact

Researchers, conservation planners, and students who use R in fields related to biodiversity inventory and conservation will benefit from the new tools generated by this project. The set of tools to be created will facilitate simultaneous explorations in environmental and geographic spaces that would otherwise require extensive analyses in distinct software platforms. These tools will help in planning where to locate survey stations based on information that is relevant for all taxa, and will help users make informed decisions.

Mentors

Students, please contact the mentors listed here after completing at least one of the tests below.

  • EVALUATING MENTOR: Narayani Barve [email protected] is a biodiversity informatics scientist and was a GSoC student (2014) as well as a mentor (2016-2019) with the R project organization. She developed the package ENMGadgets and has contributed to various other R packages. She has extensive experience working with spatial information.
  • Vijay Barve [email protected] is a biodiversity data scientist who has been a GSoC student and mentor since 2012 with the R project organization. Vijay is the author and maintainer of bdvis and has contributed to several packages on CRAN.
  • Andrew Townsend Peterson [email protected] is a scientist with years of experience in biological surveys and in exploring the relationship of environmental and geographic characteristics of distinct regions of the world (Hutchinson’s Duality). Town has been a mentor in GSoC with the R project organization, and he has contributed to creating several R packages; for instance, kuenm, ellipsenm, and nichevol.

References

  • Colwell, R. K., and Rangel, T. F. (2009). Hutchinson’s Duality: The once and future niche. Proceedings of the National Academy of Sciences 106(Supplement 2): 19651–19658.
  • Ferrier, S., Manion, G., Elith, J., and Richardson, K. (2007). Using generalized dissimilarity modelling to analyse and predict patterns of beta diversity in regional biodiversity assessment. Diversity and Distributions 13(3): 252–264.
  • Kindt, R., and Coe, R. (2005). Tree diversity analysis. A manual and software for common statistical methods for ecological and biodiversity studies. World Agroforestry Centre (ICRAF), Nairobi, Kenya. ISBN 92-9059-179-X.
  • Soberón, J., and Peterson, A. T. (2005). Interpretation of Models of Fundamental Ecological Niches and Species’ Distributional Areas. Biodiversity Informatics 2: 1–10.
  • Velásquez-Tibatá, J. (2019). WhereNext: Biological Survey Recommending System Based on General Dissimilarity Modeling. R package version 1.0.0.