Modified Forbes Metric
This method is not my own. It was developed in two papers by John Alroy, as a refinement of a method from 1907. I just coded it into R for my own use.
Forbes, S. A. 1907. On the local distribution of certain Illinois fishes: an essay in statistical ecology. Bulletin of the Illinois State Laboratory of Natural History (7): 272–303.
Alroy, J. 2015. A new twist on a very old binary similarity coefficient. Ecology (96): 575–586.
Alroy, J. 2015. A simple way to improve multivariate analyses of paleoecological data sets. Paleobiology (41): 377–386.
Indices of faunal similarity or difference have been the foundation of biogeographic study since the beginning of the 20th century. They can form the basis of a simple comparison of the species present in two localities, or be used for broader-scale studies involving principal coordinate analyses, hierarchical cluster analyses or the study of beta diversity.
There is a huge variety of metrics available, all largely variations on the same principle: given two localities or bioregions, compare the number of species shared by the two localities to the total number of species. The precise equation varies but the same theoretical principles apply. The metrics generally vary between 0 and 1. If you have a similarity metric, a value of 0 means no species are shared between the two localities, and 1 means all species are shared; vice versa if it’s a distance metric. I prefer to work with distances rather than similarities, as they allow you to do cluster and principal coordinate analyses, but you can convert one to the other easily enough: just subtract the value from 1.
Back in 2015, John Alroy published two papers (references above) focusing on one of the oldest similarity metrics: the Forbes metric (Forbes 1907). He described some of its more desirable properties and made some changes to the equation to make it more suitable for incompletely sampled data. The final equation for the modified Forbes index (F’) is:
F’ = a(n + √n) / [a(n + √n) + 3/2(b*c)]
Yes, I know it looks horrible, but there are only four numbers you need to get from your data, and then you can just use the code here to plug them into the equation. If you have two localities being compared: a is the number of species found in both localities, b is the number found in the first only, c is the number found in the second only, and n is the total number of species in your sample.
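To make the four numbers concrete, here is a minimal R sketch that counts them for two made-up localities and plugs them into the equation. The function name forbes_similarity() and the species lists are hypothetical (this is not the Alroy_Forbes() code), and the sketch assumes that n = a + b + c, i.e. the number of species seen in at least one of the two localities.

```r
# Hypothetical helper: modified Forbes similarity (F') from the three counts.
# Assumption: n = a + b + c (species seen in at least one of the two localities).
forbes_similarity <- function(a, b, c) {
  n <- a + b + c
  w <- a * (n + sqrt(n))      # the a(n + sqrt(n)) term, used top and bottom
  w / (w + 1.5 * b * c)
}

# Two made-up localities, given as vectors of species names
loc1 <- c("sp1", "sp2", "sp3", "sp4", "sp5")
loc2 <- c("sp3", "sp4", "sp5", "sp6")

n_both   <- length(intersect(loc1, loc2))  # a: found in both localities
n_first  <- length(setdiff(loc1, loc2))    # b: found in the first only
n_second <- length(setdiff(loc2, loc1))    # c: found in the second only

similarity <- forbes_similarity(n_both, n_first, n_second)
distance   <- 1 - similarity               # convert similarity to distance

print(c(similarity = similarity, distance = distance))
```

With a = 3, b = 2 and c = 1 the similarity comes out at roughly 0.89, so the corresponding distance is roughly 0.11.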
I’m not going to reiterate all of Alroy’s arguments for why this should be the metric of choice for analysing faunal similarity; he makes them clearly in the two papers. I do, however, want to briefly discuss one interesting property of this metric that, while not unique to it, is not universal. Most metrics will produce a similarity of 1 (or a distance of 0) only if the taxa in the two localities being compared are identical, i.e. if every species found in one is also found in the other. The Forbes metric, and a couple of others, will also produce a similarity of 1 if one of the localities is a subset of the other: for example, if locality 1 contains species A, B, C, D and E, while locality 2 contains species A, B, C and D. This subset behaviour is desirable because metrics that require identical faunas are heavily influenced by sampling heterogeneity. If you have two localities with identical faunas (a true similarity of 1), but one is sampled better than the other, the better sampled one will appear to contain more species. So, if you’re using a metric that requires an identical set of species for a similarity of 1, the apparent extra species in the better sampled locality will artificially lower the similarity.
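The subset case can be checked by hand. The sketch below uses the two localities from the example (A to E, and A to D) and compares F’ with the Jaccard index, a / (a + b + c), which I have picked here only as a familiar stand-in for a metric that requires identical faunas. As before, it assumes n = a + b + c, and the variable names are mine.

```r
# Localities from the example in the text: locality 2 is a subset of locality 1
loc1 <- c("A", "B", "C", "D", "E")
loc2 <- c("A", "B", "C", "D")

a  <- length(intersect(loc1, loc2))  # shared species: 4
b  <- length(setdiff(loc1, loc2))    # only in locality 1: 1 (species E)
cc <- length(setdiff(loc2, loc1))    # only in locality 2: 0
n  <- a + b + cc                     # assumption: n = a + b + c

# Jaccard similarity: drops below 1 as soon as either locality has a unique species
jaccard <- a / n

# Modified Forbes similarity: the b * c term vanishes when either b or c is 0
w      <- a * (n + sqrt(n))
forbes <- w / (w + 1.5 * b * cc)

print(c(jaccard = jaccard, forbes = forbes))
```

Here the Jaccard similarity is 0.8 while F’ is exactly 1, because the only term that can pull F’ below 1 is the product b*c, and that is zero whenever one locality is a subset of the other.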
Anyway, at the time of writing (to the best of my knowledge), none of the available R packages for biogeography contains a function to carry out this method, so I’ve written one. It’s very simple, and I’ve given a quick tutorial on how to use it (it’s also used internally in my RAC Beta Diversity method):
1. Download the Alroy_Forbes() function via the link here and read it into R.*
*Note: the function here calculates a distance metric, so the value given by the equation above has been subtracted from 1.
2. Download the example.csv file and place it in your working directory. You can check what your working directory is with the line:
getwd()
3. Load the data and store it as an object called “dataset”. The data need to be a matrix of presences/absences or abundances, with taxa in rows and localities in columns.** The example.csv file provided may be read into R using the following line:
dataset <- read.csv("example.csv", row.names = 1, header = TRUE)
**Note: row names and column names are required, and each must be unique.
4. Run the analysis, storing the output as an object called results, and have a look at it:
results <- Alroy_Forbes(dataset)
results
5. The results object is of class dist: a pairwise distance matrix showing the F’ distance between each pair of localities.
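If you want to see the whole workflow without downloading anything, the sketch below builds a small made-up data set in the layout described in step 3 and passes it through a stand-in function. To be clear, forbes_dist_sketch() is hypothetical and is not the Alroy_Forbes() code: it simply follows the equation above, assumes n = a + b + c, and assumes every locality contains at least one species. The site and species names are invented.

```r
# Stand-in for Alroy_Forbes(): NOT the author's code, only a sketch that
# follows the equation and data layout described in the text.
# Assumptions: n = a + b + c, and every locality has at least one species.
forbes_dist_sketch <- function(x) {
  pa <- as.matrix(x) > 0                 # abundances -> presence/absence
  k  <- ncol(pa)
  d  <- matrix(0, k, k, dimnames = list(colnames(pa), colnames(pa)))
  for (i in seq_len(k)) {
    for (j in seq_len(k)) {
      a  <- sum(pa[, i] & pa[, j])       # taxa in both localities
      b  <- sum(pa[, i] & !pa[, j])      # taxa in locality i only
      cc <- sum(!pa[, i] & pa[, j])      # taxa in locality j only
      n  <- a + b + cc
      w  <- a * (n + sqrt(n))
      d[i, j] <- 1 - w / (w + 1.5 * b * cc)  # distance = 1 - F'
    }
  }
  as.dist(d)
}

# Made-up presence/absence data: taxa in rows, localities in columns,
# with unique row and column names
dataset <- data.frame(
  site1 = c(1, 1, 1, 1, 0, 0),
  site2 = c(1, 1, 1, 0, 0, 0),
  site3 = c(0, 0, 1, 1, 1, 1),
  site4 = c(0, 0, 0, 0, 1, 1),
  row.names = paste0("sp", 1:6)
)

results <- forbes_dist_sketch(dataset)
print(results)

# A dist object can go straight into a hierarchical cluster analysis
tree   <- hclust(results, method = "average")
groups <- cutree(tree, k = 2)            # split the tree into two groups
print(groups)
```

In this toy data set site2 is a subset of site1 and site4 is a subset of site3, so both of those pairs have a distance of 0, while pairs that share no species (such as site1 and site4) have a distance of 1. The same dist object could equally be passed to cmdscale() for a principal coordinate analysis.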