Biodiversity informatics workshop - Namibia University of Science and Technology JRS Biodiversity Foundation
Download the software from: Estimates
EstimateS works basically with two kinds of data files:
In this exercise, we will focus on both data files
We will use the datafile “aurora_test.txt”
Remember the appropriate format for EstimateS (from the manual)
The two required header records (rows) for this filetype are:
Record #1 (Title Record): Datafile Title <tab>
[SampleSet] <tab>
[Format Code] <tab>
[Skip rows] <tab>
[Skip columns] The first record (line) of the Input File must contain a title in the first field (column); any text will do. The second field of this Title Record should read (exactly) SampleSet (including the asterisks; no spaces).
Record #2 (Parameter Record): Number of Species<tab>
character.
Record #3 etc.: The rest of the Input File contains the input data
To plot the results, we can go to EXCEL and plot samples/individuals vs estimated richness (based on rarefaction) or richness estimator (Chao1)
We should already have R and RStudio installed and ready in our computers!
We can use R and the ‘vegan’ package to compute and plot most of the EstimateS results. For instance, function specaccum
finds species accumulation curves or the number of species for a certain number of sampled sites or individuals (by default, based on exact sample-based rarefaction). The function poolaccum
p estimates the common richness estimators and plots all of them. NOTE: Both functions, speaccum
and poolaccum
seem to work with sample-based abundance data (several related samples of abundance data for each) instead of individual-based abundance data (a single site/sample with abundance data). Also, note that the format for the datafile is different from that of EstimateS. In R, the format will also be a tab-delimited text but simply with species (rows) by sites (columns) matrix of abundance/incidence data.
Install the required packages
#Dowload the packages from a mirror (choose a mirror when it asks us to)
install.packages("vegan")
install.packages("iNEXT")
#Load the packages into the R session
library("vegan")
library("iNEXT")
Get the data
#Read the data into R (the '~' in the path corresponds to your working directory. If you're already there, just write the name of the file, if not then write the whole path to the file)
aurora <- read.table("~aurora_for_R.txt",sep="\t",row.names = 1)
#Take a look at the data
head(aurora)
#Data has species as rows and sites as columns, but we need the opposite. Thus, tranpose the original data
auroraT <- aurora
#Now the data is as follows: rows are sites, columns are species
#Check it!
Use the vegan
package calculate some richness estimators and build the species accumulation curve.
Richness estimators
#get richness estimators (for each sample, cumulative)
pool.aurora <- poolaccum(auroraT)
#plot all: obs richness and estimators
plot(pool.aurora)
Species accumulation curve
#build the species accumulation curve & rarefaction curve (expected)
aurora.specaccum <- specaccum(auroraT,method = "rarefaction")
#plot the curve with some predefined settings
plot(aurora.specaccum,ci.type="poly", col="blue", lwd=2, ci.lty=0, ci.col="lightblue")
#build a expected curve (randomization for boxplot comparison)
aurora.specaccum.rand <- specaccum(auroraT, "random")
#plot both curves ("observed" vs "randomized")
plot(aurora.specaccum,ci.type="poly", col="blue", lwd=2, ci.lty=0, ci.col="lightblue")
boxplot(aurora.specaccum.rand, col="yellow", add=TRUE, pch="+")
Use the iNEXT
package calculate some richness estimators and build the species accumulation curve. The iNEXT
package is mainly aimed for comparisons between/among sites. So, to use it for the aurora data, we need to transform the data into a single site with species abundance (just for the sake of this exercise). Otherwise, each day within the data would be treated as a sample/site.
#Sum abundances for the aurora site; i.e. consider the aurora data as a single site with spp abundances (summed over the sampling days)
aurora.sum <- rowSums(aurora)
#apply `iNEXT` main function
aurora.sum.inext <- iNEXT(aurora.sum,datatype = "abundance")
#look at the data
aurora.sum.inext
#plot the results
ggiNEXT(aurora.sum.inext)
Richness estimators can also be used to compare between/among sites based on diversity indices or dissimilarity (beta diversity) or simply on rarefaction curves. Because richness is inherently sample-size dependent, however, any such comparison must be done at equivalent sample sizes, which is why we rarefy (and extrapolate) (EstimateS manual).
The data was obtained from GBIF and treated in QGIS (see the file QGIS_exercise.html
). This data consists of herp records from localities across Angola. In QGIS, we plotted, selected, and imported data for the herps within two natural parks (Iona and Luengue-Luiana)
Import data from QGIS
#import points/occurrences for chordates in two reserves/parks (Iona (park1) & Luengue-Luiana (park2))
anglherps.park1 <- read.table("iona_herps.txt",sep="\t",h=T)
anglherps.park2 <- read.table("luengue_luiana_herps.txt",sep="\t",h=T)
Import shapefiles of Angola and its parks
#Install and load the required package
install.packages("maptools")
library("maptools")
# Angola shapefile
angola.shp <- readShapePoly("angola_municipalities/ANG.shp")
# Angolan parks shapefile
angola.parks <- readShapePoly("NamibiaCourse2016/WDPA_Apr2016_AGO-shapefile/WDPA_Apr2016_AGO-shapefile-polygons.shp")
Plot the occurrences
plot(angola.shp)
plot(angola.parks,add=T,col="red")
points(anglherps.park1$decimalLon,anglherps.park1$decimalLat,col="blue")
points(anglherps.park2$decimalLon,anglherps.park2$decimalLat,col="green")
Get the number of occurrences per species (as a proxy for abundance)
#park1
park1.spp.abund <- as.numeric(table(anglherps.park1$Species))
#park2
park2.spp.abund <- as.numeric(table(anglherps.park2$Species))
#combine both datasets
parks.spp <- vector("list")
parks.spp$park1 <- park1.spp.abund
parks.spp$park2 <- park2.spp.abund
Comparison between parks
NOTE: can’t use vegan
functions (specaccum
and poolaccum
) for each park because these are individual-based abundance data (only one site/sample, instead of a collection of sites/samples). This can be done by converting each abundance vector into a matrix (columns=spp, row=site) and running the rarecurve
function in the vegan
package.
rarecurve(t(as.matrix(park1.spp.abund)))
park1.rarecurve <- rarecurve(t(as.matrix(park1.spp.abund)))
head(park1.rarecurve)
ALSO: this curve (and the estimators) can be obtained directy from EstimateS, preparing and treating each vector as an individual-based abundance data, and then plotting the results using EXCEL
Species accumulation curve & rarefaction curve (expected) using iNEXT package.
# iNEXT: individual-based abundance data (comparing sites)
angl.parks.inext <- iNEXT(parks.spp, q=0, datatype="abundance")
ggiNEXT(angl.parks.inext)
How does it look?
Do the parks have the same biodiversity?
Which park “needs” more sampling?