1. Overview
Once organisms in a habitat have been sampled, ecologists use statistics to turn raw counts into meaningful conclusions. Two questions come up again and again:
- How diverse is this habitat? — answered with Simpson's index of diversity ($D$), which combines species richness (how many species) with species evenness (how balanced their numbers are).
- Is a species' abundance or distribution linked to an environmental factor? — answered with a correlation test: Spearman's rank when data are ranked or not normally distributed, and Pearson's linear when both variables are normally distributed and a straight-line relationship is expected.
Correlations are especially useful for testing whether biotic and abiotic factors affect where species live and in what numbers.
Symbol warning: this topic reuses the same letters for unrelated things. In Simpson's index, $n$ = the count of one species and $N$ = the total count. In Spearman's formula, $D$ = a rank difference (nothing to do with the diversity index $D$) and $n$ = the number of pairs. Read each formula's own key before substituting.
Key Definitions
- Biodiversity: the variety of living organisms in an area, which at the species level depends on both the number of species present and how evenly individuals are spread among them.
- Species richness: the number of different species present in a habitat.
- Species evenness: how similar the population sizes of the different species in a habitat are to one another.
- Simpson's index of diversity: a measure of biodiversity calculated from the proportion of individuals in each species, giving values from 0 to 1, where higher values indicate greater diversity.
- Correlation: a statistical relationship between two variables, in which a change in one is associated with a change in the other.
- Spearman's rank correlation coefficient: a value between minus 1 and plus 1 that measures the strength and direction of a relationship between two variables after their values have been ranked.
- Pearson's linear correlation coefficient: a value between minus 1 and plus 1 that measures the strength and direction of a straight-line relationship between two variables that are normally distributed.
- Biotic factor: a living component of an ecosystem, such as predation, competition or food availability, that affects the distribution and abundance of a species.
- Abiotic factor: a non-living component of an ecosystem, such as temperature, light intensity, pH or soil moisture, that affects the distribution and abundance of a species.
Content
Simpson's index of diversity (D)
A simple count of species (species richness) is not enough to describe biodiversity, because a habitat dominated by one species is less diverse than one where individuals are spread evenly across the same number of species. Simpson's index of diversity captures both ideas by using the proportion of individuals in each species.
The index is calculated from:
$$D = 1 - \sum \left(\frac{n}{N}\right)^2$$
where $n$ is the number of individuals of one species and $N$ is the total number of individuals of all species. For each species, work out the fraction $n/N$ and square it. Add these squared fractions together, then subtract the total from 1.
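The steps above can be sketched in Python. This is a minimal illustration that assumes the counts for each species are stored in a plain list. `simpsons_index` is a hypothetical helper name, not a standard function. The code keeps full precision throughout and rounds only when printing. The counts are the leaf-litter figures from the worked example later in this section.

```python
def simpsons_index(counts):
    """Return Simpson's index of diversity, D = 1 - sum((n / N) ** 2).

    counts: number of individuals (n) of each species in the sample.
    """
    total = sum(counts)  # N, the total number of individuals of all species
    if total == 0:
        raise ValueError("sample contains no individuals")
    # Square each proportion n / N, add them up, then subtract from 1
    sum_of_squares = sum((n / total) ** 2 for n in counts)
    return 1 - sum_of_squares


# Leaf-litter counts: woodlice, springtails, mites, beetles
leaf_litter = [8, 12, 5, 3]
print(f"D = {simpsons_index(leaf_litter):.3f}")  # D = 0.691
```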
Values of $D$ run from 0 to 1:
- A value near 0 means low diversity — usually one species dominates, or there are very few species. Such communities tend to be unstable and are easily disrupted by environmental change.
- A value near 1 means high diversity — many species with fairly even numbers. These communities are generally more stable and better able to resist change, because losing one species is less likely to upset the whole community.
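Evenness matters as well as richness. To show this, the sketch below compares two invented samples that have the same richness (4 species) and the same total (40 individuals). The counts are hypothetical and were chosen only for illustration.

```python
def simpsons_index(counts):
    # D = 1 - sum of squared proportions (n / N)
    total = sum(counts)
    return 1 - sum((n / total) ** 2 for n in counts)


# Two hypothetical samples, each with 4 species and 40 individuals
even_community = [10, 10, 10, 10]    # individuals spread evenly
dominated_community = [34, 2, 2, 2]  # one species dominates

print(f"even:      D = {simpsons_index(even_community):.3f}")       # D = 0.750
print(f"dominated: D = {simpsons_index(dominated_community):.3f}")  # D = 0.270
```

For four species the highest possible value is $1 - 1/4 = 0.75$. The even sample reaches it, while the dominated sample scores far lower despite having identical richness.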
Comparing $D$ between sites (for example a managed field versus ancient woodland), or for the same site over time, reveals the effect of habitat change, pollution or conservation work.
Worked example
Exam-style question: A student sampled small invertebrates in leaf litter and recorded: woodlice 8, springtails 12, mites 5, beetles 3. Using Simpson's index of diversity, $D = 1 - \sum (n/N)^2$, calculate $D$ for this sample and comment on what the value shows. [3]
Model answer:
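- Total individuals $N = 8 + 12 + 5 + 3 = 28$.
- Sum of squared proportions $= (8/28)^2 + (12/28)^2 + (5/28)^2 + (3/28)^2 = (64 + 144 + 25 + 9)/784 = 242/784 = 0.3087$.
- $D = 1 - 0.3087 = 0.691$ (3 s.f.). This is closer to 1 than to 0, so the community has a moderately high diversity with no single species strongly dominating.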
Why measure correlations
The distribution (where a species is found) and abundance (how many there are) of an organism are often controlled by environmental factors. These divide into:
- Abiotic factors — non-living: light, temperature, pH, soil moisture, salinity.
- Biotic factors — living: predation, competition, disease, food supply.
A correlation test asks whether two such measurements vary together in a consistent way — for example, does the abundance of a plant species change as soil moisture changes along a transect?
A correlation coefficient (symbol $r$ for Pearson, $r_s$ for Spearman) always lies between minus 1 and plus 1:
- close to $+1$ = strong positive correlation (as one variable rises, so does the other);
- close to $-1$ = strong negative correlation (as one rises, the other falls);
- close to $0$ = little or no correlation.
Before calculating, it is good practice to plot a scatter graph to see whether the points suggest any trend at all. In the scatter graph below, points rising from bottom-left to top-right indicate a positive correlation between species abundance and the abiotic factor. (The axes are illustrative. The figure shows the shape of a positive correlation, not specific data values.)
Choosing Spearman or Pearson
The two tests answer the same kind of question but suit different data:
| | Spearman's rank | Pearson's linear |
|---|---|---|
| Use when | data can be ranked, or are not normally distributed, or the relationship is not a straight line | both variables are normally distributed and a straight-line (linear) relationship is expected |
| Works on | the ranks of the values | the actual values |
| Tests for | a general trend (monotonic) | a specifically linear trend |
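The difference in the last row of the table can be illustrated with invented data that rise steadily but along a curve ($y = x^3$). This sketch assumes there are no tied values, so the ranking helper is deliberately simple. `pearson_r`, `ranks` and `spearman_rs` are hypothetical helper names.

```python
import math


def pearson_r(x, y):
    # Pearson's r from deviations about the means
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    # Rank from smallest (= 1); assumes no tied values, for simplicity
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def spearman_rs(x, y):
    # Supplied formula: r_s = 1 - 6 * sum(D^2) / (n^3 - n)
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n ** 3 - n)


# Hypothetical data: y always rises with x, but along a curve, not a straight line
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]

print(f"Spearman's r_s = {spearman_rs(x, y):.3f}")  # r_s = 1.000: perfect monotonic trend
print(f"Pearson's r    = {pearson_r(x, y):.3f}")    # below 1: the points curve
```

Spearman's rank gives exactly 1 because the order of the values matches perfectly. Pearson's value falls below 1 because the points do not lie on a straight line.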
The formulae are provided in the exam, so you do not need to memorise them, but you must be able to substitute correctly and interpret the result.
Carrying out Spearman's rank
The supplied formula is:
$$r_s = 1 - \frac{6 \sum D^2}{n^3 - n}$$
where $D$ is the difference between the two ranks for each pair, and $n$ is the number of pairs of measurements. The method is:
- Rank the values of variable 1 (smallest = 1), and separately rank variable 2.
- For tied values, give each the mean of the ranks they would otherwise occupy.
- Find the rank difference $D$ for each pair, then square it to give $D^2$.
- Add up all the $D^2$ values to give $\sum D^2$ and substitute into the formula.
Worked example
Exam-style question: Along a transect, a student measured soil moisture and the abundance of a plant species at six points:
| Point | Soil moisture (%) | Plant abundance |
|---|---|---|
| 1 | 12 | 4 |
| 2 | 18 | 11 |
| 3 | 22 | 8 |
| 4 | 27 | 14 |
| 5 | 31 | 17 |
| 6 | 35 | 17 |
State which correlation test is appropriate and why, then use Spearman's rank correlation, $r_s = 1 - \frac{6 \sum D^2}{n^3 - n}$, to calculate $r_s$ and interpret the result. [4]
Model answer:
- Spearman's rank is appropriate because there are only six sampling points, so we cannot assume that plant abundance is normally distributed. A test based on ranks rather than the raw values is the safer choice.
- Rank each variable from smallest (= 1) to largest. The two abundance values of 17 (points 5 and 6) are tied for ranks 5 and 6, so each takes the mean rank $(5 + 6)/2 = 5.5$.
| Point | Moisture rank | Abundance rank | $D$ (rank diff.) | $D^2$ |
|---|---|---|---|---|
| 1 | 1 | 1 | 0 | 0 |
| 2 | 2 | 3 | 1 | 1 |
| 3 | 3 | 2 | 1 | 1 |
| 4 | 4 | 4 | 0 | 0 |
| 5 | 5 | 5.5 | 0.5 | 0.25 |
| 6 | 6 | 5.5 | 0.5 | 0.25 |
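- $\sum D^2 = 0 + 1 + 1 + 0 + 0.25 + 0.25 = 2.5$, with $n = 6$ pairs.
- $\frac{6 \sum D^2}{n^3 - n} = \frac{6 \times 2.5}{216 - 6} = \frac{15}{210} = 0.0714$.
- $r_s = 1 - 0.0714 = 0.929$ (3 s.f.).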
- This is close to $+1$, so there is a strong positive correlation: plant abundance generally increases as soil moisture increases, even though points 2 and 3 are slightly out of order. (A correlation alone does not prove that moisture causes the change. A biological reason, such as the plant needing damp soil, is still needed.)
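The ranking rules and the worked example can be checked with a short Python sketch. It applies the supplied formula directly. `rank_with_ties` and `spearman_rs` are hypothetical helper names. The tie rule, in which each tied value takes the mean of the ranks it shares, follows the method described above.

```python
def rank_with_ties(values):
    """Rank values from smallest (= 1); tied values share the mean of their ranks."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1           # lowest rank position this value occupies
        count = ordered.count(v)               # how many values are tied with it
        ranks.append(first + (count - 1) / 2)  # mean of ranks first .. first + count - 1
    return ranks


def spearman_rs(x, y):
    """Spearman's rank using the supplied formula r_s = 1 - 6*sum(D^2)/(n^3 - n)."""
    n = len(x)
    d_squared = [(rx - ry) ** 2 for rx, ry in zip(rank_with_ties(x), rank_with_ties(y))]
    return 1 - 6 * sum(d_squared) / (n ** 3 - n)


# Transect data from the worked example
moisture = [12, 18, 22, 27, 31, 35]
abundance = [4, 11, 8, 14, 17, 17]

print(rank_with_ties(abundance))                         # [1.0, 3.0, 2.0, 4.0, 5.5, 5.5]
print(f"r_s = {spearman_rs(moisture, abundance):.3f}")   # r_s = 0.929
```

When ranks are tied, the supplied formula gives a close approximation. Calculating Pearson's $r$ on the ranks themselves gives about 0.928 for these data, so the conclusion is unchanged.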
Carrying out Pearson's linear
Pearson's coefficient uses the actual measurements rather than ranks. The supplied formula is:
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$$
where $x$ and $y$ are the paired values and $\bar{x}$, $\bar{y}$ are their means. In words, it compares how the two variables vary together with how much each varies on its own, giving a value between $-1$ and $+1$.
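The formula translates directly into code. This is a minimal sketch that assumes plain lists of paired values. `pearson_r` is a hypothetical helper name, and the `light` and `plants` numbers are invented for illustration rather than taken from a real study.

```python
import math


def pearson_r(x, y):
    """Pearson's r = sum((x - x_mean)(y - y_mean)) / sqrt(sum((x - x_mean)^2) * sum((y - y_mean)^2))."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired values")
    x_mean = sum(x) / len(x)
    y_mean = sum(y) / len(y)
    # How the two variables vary together
    co_variation = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    # How much each variable varies on its own
    x_spread = sum((a - x_mean) ** 2 for a in x)
    y_spread = sum((b - y_mean) ** 2 for b in y)
    return co_variation / math.sqrt(x_spread * y_spread)


# Hypothetical paired measurements
light = [1, 2, 3, 4, 5]
plants = [2, 4, 5, 4, 5]
print(f"r = {pearson_r(light, plants):.3f}")  # r = 0.775

print(pearson_r([1, 2, 3], [2, 4, 6]))  # 1.0: rising straight line
print(pearson_r([1, 2, 3], [6, 4, 2]))  # -1.0: falling straight line
```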
Pearson is only valid when both sets of data are roughly normally distributed and the scatter graph suggests a straight-line pattern. If those conditions are not met, use Spearman's rank instead.
Interpreting $r$: suppose a study of light intensity and the number of a sun-loving plant returns a value of $r$ very close to $+1$. You should report both features of the value:
- Sign: positive — abundance rises as light intensity rises.
- Strength: very close to $+1$, indicating a strong, near-linear relationship.
A small negative value just below 0 would instead show a weak negative relationship that is barely a trend at all.
Interpreting the result honestly
A high correlation coefficient shows that two variables change together — it does not by itself prove that one causes the other. Both might be driven by a third factor. To claim that an abiotic or biotic factor controls a species' distribution, you need a biological mechanism to explain the link as well as the statistical correlation.
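A small invented example shows how a third factor can produce a strong correlation between two measured variables that have no causal link. In this hypothetical model, light intensity alone sets the abundance of two plant species, A and B. Neither species affects the other, yet their abundances correlate perfectly. `pearson_r` is a hypothetical helper name.

```python
import math


def pearson_r(x, y):
    # Pearson's r from deviations about the means
    x_mean, y_mean = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    sxx = sum((a - x_mean) ** 2 for a in x)
    syy = sum((b - y_mean) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical sites: light intensity (the third factor) drives both species
light_intensity = [10, 12, 14, 16, 18]
species_a = [2 * L + 5 for L in light_intensity]   # depends only on light
species_b = [3 * L - 10 for L in light_intensity]  # also depends only on light

# A and B correlate perfectly, although in this model neither affects the other
print(f"r = {pearson_r(species_a, species_b):.3f}")  # r = 1.000
```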
Key Equations
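Simpson's index of diversity:
$$D = 1 - \sum \left(\frac{n}{N}\right)^2$$
where $n$ = number of individuals of one species, $N$ = total number of individuals of all species.
Spearman's rank correlation coefficient:
$$r_s = 1 - \frac{6 \sum D^2}{n^3 - n}$$
where $D$ = difference in rank for each pair, $n$ = number of pairs.
Pearson's linear correlation coefficient:
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$$
where $x$, $y$ are paired values and $\bar{x}$, $\bar{y}$ are their means. Both $r$ and $r_s$ lie between $-1$ and $+1$. All these formulae are supplied in the exam.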
Common Mistakes to Avoid
- Rounding part-way through a Simpson's index calculation. Squaring small fractions magnifies rounding errors. Keep full decimals (or stored values) for every intermediate step and round only the final $D$ to three significant figures.
- Forgetting the final "$1 -$" step in Simpson's index. The squared proportions add up to a number that increases with dominance, so you must subtract it from 1 so that a higher $D$ means higher diversity.
- Mixing up $n$ and $N$. Lower-case $n$ is the count for one species; capital $N$ is the total for all species. Using the wrong one in $n/N$ gives a meaningless answer.
- Treating species richness as the whole story. Two habitats can have the same number of species but very different diversity; $D$ rewards an even spread of individuals as well as richness.
- Picking the wrong correlation test. Use Pearson only when both variables are normally distributed and the relationship looks linear; otherwise use Spearman's rank — and state your reason if asked.
- Mishandling tied ranks in Spearman's. Tied values must each take the mean of the ranks they share, not be skipped or given the same whole-number rank twice.
- Claiming correlation proves causation. A significant correlation shows variables change together; always back up any causal claim with a biological explanation.
- Confusing the two conservation bodies. The IUCN (International Union for Conservation of Nature) assesses how threatened species are and places them on its Red List of Threatened Species. CITES (the Convention on International Trade in Endangered Species of Wild Fauna and Flora) is a separate international agreement, and it is CITES that actually regulates and restricts trade in those species. Do not write that the IUCN bans trade.
- Muddling Domains with Kingdoms. The three Domains are Bacteria, Archaea and Eukarya. Groups such as Prokaryota, Protoctista, Fungi or Plantae are Kingdoms, not Domains — do not list them at the Domain level.
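The first mistake in the list, rounding part-way through a Simpson's index calculation, becomes clear if you rework the leaf-litter example both ways. This is a minimal sketch. Rounding each proportion to 2 decimal places is an assumption chosen to mimic a typical hand calculation.

```python
# Leaf-litter counts from the Simpson's worked example
counts = [8, 12, 5, 3]
total = sum(counts)

# Correct: keep full precision, round only the final answer
d_full = 1 - sum((n / total) ** 2 for n in counts)

# Mistake: round each proportion n/N to 2 decimal places before squaring
d_rounded_early = 1 - sum(round(n / total, 2) ** 2 for n in counts)

print(f"full precision: D = {d_full:.4f}")           # D = 0.6913
print(f"rounded early:  D = {d_rounded_early:.4f}")  # D = 0.6865
```

The early-rounded value differs from the correct answer of 0.691 in the third significant figure, which is exactly the figure an exam answer is marked on.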
Exam Tips
- The formulae for Simpson's index, Spearman's rank and Pearson's are given in the exam — your marks come from correct substitution and clear interpretation, so practise these rather than memorising the formulae.
- Show your working line by line (for Simpson's: $N$, the sum of squared proportions, then $D$; for Spearman's: the rank table, $\sum D^2$, then $r_s$). Method marks are available even if the final figure is slightly off.
- Quote the final answer to three significant figures with no units, because both $D$ and the correlation coefficients are dimensionless.
- When interpreting a correlation coefficient, comment on both its sign (positive or negative) and its strength (how close to $\pm 1$).
- Watch the symbol clash: $D$ and $n$ mean different things in Simpson's and Spearman's formulae, so check which formula you are in before substituting.
- For data-handling questions, link the statistic back to the biology. For example, a high $D$ suggests a stable community, and a strong correlation suggests that an abiotic or biotic factor may influence a species' distribution.
- If asked which test to use, name the test and justify it from the type and distribution of the data.