Comparing the richness between two parks
Biodiversity Informatics Workshop
Madagascar
Download the software from: Estimates
EstimateS works basically with two kinds of data files:
In this exercise, we will focus on both data files
We will use the datafile “aurora_test.txt”
Remember the appropriate format for EstimateS (from the manual)
The two required header records (rows) for this filetype are:
Record #1 (Title Record): Datafile Title <tab>
[SampleSet] <tab>
[Format Code] <tab>
[Skip rows] <tab>
[Skip columns] The first record (line) of the Input File must contain a title in the first field (column); any text will do. The second field of this Title Record should read (exactly) SampleSet (including the asterisks; no spaces).
Record #2 (Parameter Record): Number of Species<tab>
character.
Record #3 etc.: The rest of the Input File contains the input data
To plot the results, we can go to EXCEL and plot samples/individuals vs estimated richness (based on rarefaction) or richness estimator (Chao1)
We should already have R and RStudio installed and ready in our computers!
We can use R and the ‘vegan’ package to compute and plot most of the EstimateS results. For instance, function specaccum
finds species accumulation curves or the number of species for a certain number of sampled sites or individuals (by default, based on exact sample-based rarefaction). The function poolaccum
p estimates the common richness estimators and plots all of them. NOTE: Both functions, speaccum
and poolaccum
seem to work with sample-based abundance data (several related samples of abundance data for each) instead of individual-based abundance data (a single site/sample with abundance data). Also, note that the format for the datafile is different from that of EstimateS. In R, the format will also be a tab-delimited text but simply with species (rows) by sites (columns) matrix of abundance/incidence data.
Install the required packages
#Dowload the packages from a mirror (choose a mirror when it asks us to)
install.packages("vegan")
install.packages("iNEXT")
#Load the packages into the R session
library("vegan")
library("iNEXT")
Get the data
#Read the data into R (the '~' in the path corresponds to your working directory. If you're already there, just write the name of the file, if not then write the whole path to the file)
aurora <- read.table("~aurora_for_R.txt",sep="\t",row.names = 1)
#Take a look at the data
head(aurora)
#Data has species as rows and sites as columns, but we need the opposite. Thus, tranpose the original data
auroraT <- t(aurora)
#Now the data is as follows: rows are sites, columns are species
#Check it!
Use the vegan
package calculate some richness estimators and build the species accumulation curve.
Richness estimators
#get richness estimators (for each sample, cumulative)
pool.aurora <- poolaccum(auroraT)
#plot all: obs richness and estimators
plot(pool.aurora)
Species accumulation curve
#build the species accumulation curve & rarefaction curve (expected)
aurora.specaccum <- specaccum(auroraT,method = "rarefaction")
#plot the curve with some predefined settings
plot(aurora.specaccum,ci.type="poly", col="blue", lwd=2, ci.lty=0, ci.col="lightblue")
#build a expected curve (randomization for boxplot comparison)
aurora.specaccum.rand <- specaccum(auroraT, "random")
#plot both curves ("observed" vs "randomized")
plot(aurora.specaccum,ci.type="poly", col="blue", lwd=2, ci.lty=0, ci.col="lightblue")
boxplot(aurora.specaccum.rand, col="yellow", add=TRUE, pch="+")
Use the iNEXT
package calculate some richness estimators and build the species accumulation curve. The iNEXT
package is mainly aimed for comparisons between/among sites. So, to use it for the aurora data, we need to transform the data into a single site with species abundance (just for the sake of this exercise). Otherwise, each day within the data would be treated as a sample/site.
#Sum abundances for the aurora site; i.e. consider the aurora data as a single site with spp abundances (summed over the sampling days)
aurora.sum <- rowSums(aurora)
#apply `iNEXT` main function
aurora.sum.inext <- iNEXT(aurora.sum,datatype = "abundance")
#look at the data
aurora.sum.inext
#plot the results
ggiNEXT(aurora.sum.inext)
Richness estimators can also be used to compare between/among sites based on diversity indices or dissimilarity (beta diversity) or simply on rarefaction curves. Because richness is inherently sample-size dependent, however, any such comparison must be done at equivalent sample sizes, which is why we rarefy (and extrapolate) (EstimateS manual).
(Ramanandraisoa V, Rakotomanjaka A (2016). tan-database. Version 1.2. Parc Botanique et Zoologique de Tsimbazaza (P.B.Z.T.). Occurrence Dataset https://doi.org/10.15468/bpx7wi accessed via GBIF.org on 2017-06-18.)
This data was downloaded from GBIF and contains data on the herbarium of Tsimbazaza. This data will be treated in QGIS (see the file Madagascar_QGIS_exercise.html
). In QGIS, we plotted, selected, and imported data two natural parks (Tsingy and Tsaratanana)
Import data from QGIS
#import points/occurrences for the two reserves/parks (Tsingy (PA_1_tanpoints) and Tsaratanana (PA_2_tanpoints)). NOTE: the Tsingy (PA_1) data table may have some issues. If so, select only those columns with taxonomy and coordinates
mdg.park1 <- read.table("PA_1_tanpoints_2.txt",h=T,sep="\t")
mdg.park2 <- read.table("PA_2_tanpoints.txt",sep="\t",h=T)
Import shapefiles of Madagascar and its parks
#Install and load the required package
install.packages("maptools")
library("maptools")
# Madagascar shapefile
mdg.shp <- readShapePoly("Madagascar_shp/madagascar.shp")
# Madagascar parks shapefile
mdg.parks <- readShapePoly("PAs/WDPA_June2017_MDG-shapefile-polygons.shp")
Plot the occurrences
plot(mdg.shp)
plot(mdg.parks,add=T,col="red")
points(mdg.park1$decimallon,mdg.park1$decimallat,col="blue")
points(mdg.park2$decimallon,mdg.park2$decimallat,col="green")
Get the number of occurrences per species (as a proxy for abundance)
#park1
park1.spp.abund <- as.numeric(table(mdg.park1$species))
#park2
park2.spp.abund <- as.numeric(table(mdg.park2$species))
#combine both datasets
parks.spp <- vector("list")
parks.spp$park1 <- park1.spp.abund
parks.spp$park2 <- park2.spp.abund
Comparison between parks
NOTE: can’t use vegan
functions (specaccum
and poolaccum
) for each park because these are individual-based abundance data (only one site/sample, instead of a collection of sites/samples). This can be done by converting each abundance vector into a matrix (columns=spp, row=site) and running the rarecurve
function in the vegan
package.
rarecurve(t(as.matrix(park1.spp.abund)))
park1.rarecurve <- rarecurve(t(as.matrix(park1.spp.abund)))
head(park1.rarecurve)
ALSO: this curve (and the estimators) can be obtained directy from EstimateS, preparing and treating each vector as an individual-based abundance data, and then plotting the results using EXCEL
Species accumulation curve & rarefaction curve (expected) using iNEXT package.
# iNEXT: individual-based abundance data (comparing sites)
mdg.parks.inext <- iNEXT(parks.spp, q=0, datatype="abundance")
ggiNEXT(mdg.parks.inext)
How does it look?
Do the parks have the same biodiversity?
Which park “needs” more sampling?