Species richness as a measure of biodiversity

EstimateS software

Download the software from: Estimates

EstimateS works basically with two kinds of data files:

  • Individual-based abundance data (a single sample of abundance data)
  • Sample-based incidence or abundance data (several related samples of incidence or abundance data)

In this exercise, we will focus on both data files

Get some data to work with

We will use the datafile “aurora_test.txt

Remember the appropriate format for EstimateS (from the manual)

The two required header records (rows) for this filetype are:

Record #1 (Title Record): Datafile Title <tab> [SampleSet] <tab> [Format Code] <tab> [Skip rows] <tab> [Skip columns] The first record (line) of the Input File must contain a title in the first field (column); any text will do. The second field of this Title Record should read (exactly) SampleSet (including the asterisks; no spaces).

Record #2 (Parameter Record): Number of SpeciesNumber of Sampling Units in the Sample Set The second record (line) of the Input File must contain two obligatory control parameters: the number of species and the number of sampling units, separated by a <tab> character.

Record #3 etc.: The rest of the Input File contains the input data

To plot the results, we can go to EXCEL and plot samples/individuals vs estimated richness (based on rarefaction) or richness estimator (Chao1)

R and RStudio

We should already have R and RStudio installed and ready in our computers!

We can use R and the ‘vegan’ package to compute and plot most of the EstimateS results. For instance, function specaccum finds species accumulation curves or the number of species for a certain number of sampled sites or individuals (by default, based on exact sample-based rarefaction). The function poolaccump estimates the common richness estimators and plots all of them. NOTE: Both functions, speaccum and poolaccum seem to work with sample-based abundance data (several related samples of abundance data for each) instead of individual-based abundance data (a single site/sample with abundance data). Also, note that the format for the datafile is different from that of EstimateS. In R, the format will also be a tab-delimited text but simply with species (rows) by sites (columns) matrix of abundance/incidence data.

Install the required packages

#Dowload the packages from a mirror (choose a mirror when it asks us to)
install.packages("vegan")
install.packages("iNEXT")

#Load the packages into the R session
library("vegan")
library("iNEXT")

Get the data

#Read the data into R (the '~' in the path corresponds to your working directory. If you're already there, just write the name of the file, if not then write the whole path to the file)
aurora <- read.table("~aurora_for_R.txt",sep="\t",row.names = 1)
#Take a look at the data
head(aurora)
#Data has species as rows and sites as columns, but we need the opposite. Thus, tranpose the original data
auroraT <- t(aurora)
#Now the data is as follows: rows are sites, columns are species
#Check it!

Use the vegan package calculate some richness estimators and build the species accumulation curve.

Richness estimators

#get richness estimators (for each sample, cumulative)
pool.aurora <- poolaccum(auroraT)
#plot all: obs richness and  estimators
plot(pool.aurora)

Species accumulation curve

#build the species accumulation curve & rarefaction curve (expected)
aurora.specaccum <- specaccum(auroraT,method = "rarefaction")
#plot the curve with some predefined settings
plot(aurora.specaccum,ci.type="poly", col="blue", lwd=2, ci.lty=0, ci.col="lightblue")

#build a expected curve (randomization for boxplot comparison)
aurora.specaccum.rand <- specaccum(auroraT, "random")
#plot both curves ("observed" vs "randomized")
plot(aurora.specaccum,ci.type="poly", col="blue", lwd=2, ci.lty=0, ci.col="lightblue")
boxplot(aurora.specaccum.rand, col="yellow", add=TRUE, pch="+")

Use the iNEXT package calculate some richness estimators and build the species accumulation curve. The iNEXT package is mainly aimed for comparisons between/among sites. So, to use it for the aurora data, we need to transform the data into a single site with species abundance (just for the sake of this exercise). Otherwise, each day within the data would be treated as a sample/site.

#Sum abundances for the aurora site; i.e. consider the aurora data as a single site with spp abundances (summed over the sampling days)
aurora.sum <- rowSums(aurora)

#apply `iNEXT` main function
aurora.sum.inext <- iNEXT(aurora.sum,datatype = "abundance")
#look at the data
aurora.sum.inext
#plot the results
ggiNEXT(aurora.sum.inext)

Comparing diversity of different sites

Richness estimators can also be used to compare between/among sites based on diversity indices or dissimilarity (beta diversity) or simply on rarefaction curves. Because richness is inherently sample-size dependent, however, any such comparison must be done at equivalent sample sizes, which is why we rarefy (and extrapolate) (EstimateS manual).

using the Tan database from the Parc Botanique et Zoologique de Tsimbazaza (P.B.Z.T.)

(Ramanandraisoa V, Rakotomanjaka A (2016). tan-database. Version 1.2. Parc Botanique et Zoologique de Tsimbazaza (P.B.Z.T.). Occurrence Dataset https://doi.org/10.15468/bpx7wi accessed via GBIF.org on 2017-06-18.)

This data was downloaded from GBIF and contains data on the herbarium of Tsimbazaza. This data will be treated in QGIS (see the file Madagascar_QGIS_exercise.html). In QGIS, we plotted, selected, and imported data two natural parks (Tsingy and Tsaratanana)

Import data from QGIS

#import points/occurrences for the two reserves/parks (Tsingy (PA_1_tanpoints) and Tsaratanana (PA_2_tanpoints)). NOTE: the Tsingy (PA_1) data table may have some issues. If so, select only those columns with taxonomy and coordinates
mdg.park1 <- read.table("PA_1_tanpoints_2.txt",h=T,sep="\t")
mdg.park2 <- read.table("PA_2_tanpoints.txt",sep="\t",h=T)

Import shapefiles of Madagascar and its parks

#Install and load the required package
install.packages("maptools")
library("maptools")

# Madagascar shapefile
mdg.shp <- readShapePoly("Madagascar_shp/madagascar.shp")
# Madagascar parks shapefile
mdg.parks <- readShapePoly("PAs/WDPA_June2017_MDG-shapefile-polygons.shp")

Plot the occurrences

plot(mdg.shp)
plot(mdg.parks,add=T,col="red")
points(mdg.park1$decimallon,mdg.park1$decimallat,col="blue")
points(mdg.park2$decimallon,mdg.park2$decimallat,col="green")

Get the number of occurrences per species (as a proxy for abundance)

#park1
park1.spp.abund <- as.numeric(table(mdg.park1$species))

#park2
park2.spp.abund <- as.numeric(table(mdg.park2$species))

#combine both datasets
parks.spp <- vector("list") 
parks.spp$park1 <- park1.spp.abund
parks.spp$park2 <- park2.spp.abund

Comparison between parks

NOTE: can’t use vegan functions (specaccum and poolaccum) for each park because these are individual-based abundance data (only one site/sample, instead of a collection of sites/samples). This can be done by converting each abundance vector into a matrix (columns=spp, row=site) and running the rarecurve function in the vegan package.

rarecurve(t(as.matrix(park1.spp.abund)))
park1.rarecurve <- rarecurve(t(as.matrix(park1.spp.abund)))
head(park1.rarecurve)

ALSO: this curve (and the estimators) can be obtained directy from EstimateS, preparing and treating each vector as an individual-based abundance data, and then plotting the results using EXCEL

Species accumulation curve & rarefaction curve (expected) using iNEXT package.

# iNEXT: individual-based abundance data (comparing sites)
mdg.parks.inext <- iNEXT(parks.spp, q=0, datatype="abundance")
ggiNEXT(mdg.parks.inext)

How does it look?

Do the parks have the same biodiversity?

Which park “needs” more sampling?