18.2 A2 Level

Biodiversity

1. Overview

Once organisms in a habitat have been sampled, ecologists use statistics to turn raw counts into meaningful conclusions. Two questions come up again and again:

  • How diverse is this habitat? This is answered with Simpson's index of diversity ($D$), which combines species richness (how many species) with species evenness (how balanced their numbers are).
  • Is a species' abundance or distribution linked to an environmental factor? This is answered with a correlation test: Spearman's rank when data are ranked or not normally distributed, and Pearson's linear when both variables are normally distributed and a straight-line relationship is expected.

Correlations are especially useful for testing whether biotic and abiotic factors affect where species live and in what numbers.

Symbol warning: this topic reuses the same letters for unrelated things. In Simpson's index, $n$ = the count of one species and $N$ = the total count. In Spearman's formula, $D$ = a rank difference (nothing to do with diversity) and $n$ = the number of pairs. Read each formula's own key before substituting.

Key Definitions

  • Biodiversity: the variety of living organisms in an area, which at the species level depends on both the number of species present and how evenly individuals are spread among them.
  • Species richness: the number of different species present in a habitat.
  • Species evenness: how similar the population sizes of the different species in a habitat are to one another.
  • Simpson's index of diversity: a measure of biodiversity calculated from the proportion of individuals in each species, giving values from 0 to 1, where higher values indicate greater diversity.
  • Correlation: a statistical relationship between two variables, in which a change in one is associated with a change in the other.
  • Spearman's rank correlation coefficient: a value between minus 1 and plus 1 that measures the strength and direction of a relationship between two variables after their values have been ranked.
  • Pearson's linear correlation coefficient: a value between minus 1 and plus 1 that measures the strength and direction of a straight-line relationship between two variables that are normally distributed.
  • Biotic factor: a living component of an ecosystem, such as predation, competition or food availability, that affects the distribution and abundance of a species.
  • Abiotic factor: a non-living component of an ecosystem, such as temperature, light intensity, pH or soil moisture, that affects the distribution and abundance of a species.

Content

Simpson's index of diversity (D)

A simple count of species (species richness) is not enough to describe biodiversity, because a habitat dominated by one species is less diverse than one where individuals are spread evenly across the same number of species. Simpson's index of diversity captures both ideas by using the proportion of individuals in each species.

The index is calculated from:

$$D = 1 - \left( \sum \left( \frac{n}{N} \right)^2 \right)$$

where $n$ is the number of individuals of one species and $N$ is the total number of individuals of all species. For each species, work out the fraction $\tfrac{n}{N}$, square it, then add these squared fractions together; finally subtract the total from 1.

Values of $D$ run from 0 to 1:

  • A value near 0 means low diversity — usually one species dominates, or there are very few species. Such communities tend to be unstable and are easily disrupted by environmental change.
  • A value near 1 means high diversity — many species with fairly even numbers. These communities are generally more stable and better able to resist change, because losing one species is less likely to upset the whole community.

Comparing $D$ between sites (for example a managed field versus ancient woodland), or for the same site over time, reveals the effect of habitat change, pollution or conservation work.
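To see why evenness matters as well as richness, the short calculation below compares two hypothetical communities. The counts are invented purely for illustration and are not from any real survey. Both communities have four species and 100 individuals, so their species richness is identical.

```latex
% Community A (hypothetical): 25, 25, 25, 25 individuals, N = 100
D_A = 1 - 4 \times \left( \frac{25}{100} \right)^2 = 1 - 0.25 = 0.75

% Community B (hypothetical): 85, 5, 5, 5 individuals, N = 100
D_B = 1 - \left[ \left( \frac{85}{100} \right)^2 + 3 \times \left( \frac{5}{100} \right)^2 \right]
    = 1 - (0.7225 + 0.0075) = 0.27
```

Community B has the same richness but a much lower $D$, because one species dominates.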

Worked example

Exam-style question: A student sampled small invertebrates in leaf litter and recorded: woodlice 8, springtails 12, mites 5, beetles 3. Using Simpson's index of diversity, $D = 1 - \left( \sum \left( \tfrac{n}{N} \right)^2 \right)$, calculate $D$ for this sample and comment on what the value shows. [3]

Model answer:

  • Total individuals $N = 8 + 12 + 5 + 3 = 28$.
  • Sum of squared proportions $= \left(\tfrac{8}{28}\right)^2 + \left(\tfrac{12}{28}\right)^2 + \left(\tfrac{5}{28}\right)^2 + \left(\tfrac{3}{28}\right)^2 = 0.08163 + 0.18367 + 0.03189 + 0.01148 = 0.30867$.
  • $D = 1 - 0.30867 = \mathbf{0.691}$ (3 s.f.). This is closer to 1 than to 0, so the community has a moderately high diversity with no single species strongly dominating.
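The same calculation can be checked with a few lines of Python. This is a minimal sketch rather than a required method: the function name `simpson_index` and the dictionary `leaf_litter` are illustrative names, and only the counts come from the question above.

```python
def simpson_index(counts):
    """Return Simpson's index of diversity, D = 1 - sum((n / N)^2)."""
    counts = list(counts)          # n for each species
    total = sum(counts)            # N, the total number of individuals
    if total == 0:
        raise ValueError("at least one individual is needed")
    # Square each proportion n/N, add them up, then subtract from 1.
    return 1 - sum((n / total) ** 2 for n in counts)


# Counts from the leaf-litter worked example.
leaf_litter = {"woodlice": 8, "springtails": 12, "mites": 5, "beetles": 3}
d = simpson_index(leaf_litter.values())
print(round(d, 3))  # 0.691
```

Rounding happens only at the end, which mirrors the advice to keep full decimals for the intermediate steps.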

Why measure correlations

The distribution (where a species is found) and abundance (how many there are) of an organism are often controlled by environmental factors. These divide into:

  • Abiotic factors — non-living: light, temperature, pH, soil moisture, salinity.
  • Biotic factors — living: predation, competition, disease, food supply.

A correlation test asks whether two such measurements vary together in a consistent way — for example, does the abundance of a plant species change as soil moisture changes along a transect?

A correlation coefficient (symbol $r$ for Pearson, $r_s$ for Spearman) always lies between $-1$ and $+1$:

  • close to $+1$ = strong positive correlation (as one variable rises, so does the other);
  • close to $-1$ = strong negative correlation (as one rises, the other falls);
  • close to $0$ = little or no correlation.

Before calculating, it is good practice to plot a scatter graph so you can see whether the points suggest a trend at all. In the scatter graph below, the points climb from bottom-left to top-right, which signals a positive correlation between species abundance and the abiotic factor. (The axes are illustrative: the figure shows the shape of a positive correlation, not specific data values.)

[Figure: scatter graph of species abundance (y-axis) against an abiotic factor (x-axis), with a best-fit line sloping upward to illustrate a positive correlation.]

Choosing Spearman or Pearson

The two tests answer the same kind of question but suit different data:

| | Spearman's rank | Pearson's linear |
|---|---|---|
| Use when | data can be ranked, or are not normally distributed, or the relationship is not a straight line | both variables are normally distributed and a straight-line (linear) relationship is expected |
| Works on | the ranks of the values | the actual values |
| Tests for | a general trend (monotonic) | a specifically linear trend |
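The decision rule in the comparison above can be written as a tiny helper. This is only a sketch of the rule as stated in these notes: the function name and its two yes/no inputs are hypothetical, and judging normality or linearity is still up to you (for example from a scatter graph).

```python
def choose_correlation_test(both_normal, looks_linear):
    """Pick a test from two yes/no judgements about the data."""
    # Pearson needs BOTH conditions; anything else falls back to Spearman.
    if both_normal and looks_linear:
        return "Pearson's linear"
    return "Spearman's rank"


print(choose_correlation_test(both_normal=True, looks_linear=True))   # Pearson's linear
print(choose_correlation_test(both_normal=False, looks_linear=True))  # Spearman's rank
```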

The formulae are provided in the exam, so you do not need to memorise them, but you must be able to substitute correctly and interpret the result.

Carrying out Spearman's rank

The supplied formula is:

$$r_s = 1 - \frac{6 \sum D^2}{n^3 - n}$$

where $D$ is the difference between the two ranks for each pair, and $n$ is the number of pairs of measurements. The method is:

  1. Rank the values of variable 1 (smallest = 1), and separately rank variable 2.
  2. For tied values, give each the mean of the ranks they would otherwise occupy.
  3. Find the rank difference $D$ for each pair, then square it to give $D^2$.
  4. Add up all the $D^2$ values and substitute into the formula.

Worked example

Exam-style question: Along a transect, a student measured soil moisture and the abundance of a plant species at six points:

| Point | Soil moisture (%) | Plant abundance |
|---|---|---|
| 1 | 12 | 4 |
| 2 | 18 | 11 |
| 3 | 22 | 8 |
| 4 | 27 | 14 |
| 5 | 31 | 17 |
| 6 | 35 | 17 |

State which correlation test is appropriate and why, then use Spearman's rank correlation, $r_s = 1 - \dfrac{6 \sum D^2}{n^3 - n}$, to calculate $r_s$ and interpret the result. [4]

Model answer:

  • Spearman's rank is appropriate here because both sets of measurements can be ranked and we cannot assume plant abundance is normally distributed, so a test based on ranks rather than the raw values is the safer choice.
  • Rank each variable from smallest (= 1) to largest. The two abundance values of 17 (points 5 and 6) are tied for ranks 5 and 6, so each takes the mean rank $\tfrac{5+6}{2} = 5.5$.

| Point | Moisture rank | Abundance rank | $D$ (rank diff.) | $D^2$ |
|---|---|---|---|---|
| 1 | 1 | 1 | 0 | 0 |
| 2 | 2 | 3 | $-1$ | 1 |
| 3 | 3 | 2 | 1 | 1 |
| 4 | 4 | 4 | 0 | 0 |
| 5 | 5 | 5.5 | $-0.5$ | 0.25 |
| 6 | 6 | 5.5 | 0.5 | 0.25 |

  • $\sum D^2 = 0 + 1 + 1 + 0 + 0.25 + 0.25 = 2.5$, with $n = 6$ pairs.
  • $n^3 - n = 6^3 - 6 = 216 - 6 = 210$.
  • $r_s = 1 - \dfrac{6 \times 2.5}{210} = 1 - \dfrac{15}{210} = 1 - 0.07143 = \mathbf{+0.929}$ (3 s.f.).
  • This is close to $+1$, so there is a strong positive correlation: plant abundance generally increases as soil moisture increases, even though points 2 and 3 are slightly out of order. (A correlation alone does not prove moisture causes the change; a biological reason, such as the plant needing damp soil, is still needed.)
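The ranking and substitution steps can be reproduced in Python as a check on the hand calculation. This is a minimal sketch that follows the exam formula exactly. The helper names `mean_ranks` and `spearman_rs` are illustrative, and statistical libraries may treat tied ranks differently, so their output can differ slightly from this hand method.

```python
def mean_ranks(values):
    """Rank from smallest (= 1); tied values share the mean of their ranks."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1         # lowest rank this value occupies
        last = first + ordered.count(v) - 1  # highest rank this value occupies
        ranks.append((first + last) / 2)     # mean of the shared ranks
    return ranks


def spearman_rs(x, y):
    """Spearman's rank coefficient using r_s = 1 - 6*sum(D^2) / (n^3 - n)."""
    rank_x, rank_y = mean_ranks(x), mean_ranks(y)
    n = len(x)                               # number of pairs
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - (6 * sum_d2) / (n ** 3 - n)


# Transect data from the worked example.
moisture = [12, 18, 22, 27, 31, 35]
abundance = [4, 11, 8, 14, 17, 17]

print(mean_ranks(abundance))                       # [1.0, 3.0, 2.0, 4.0, 5.5, 5.5]
print(round(spearman_rs(moisture, abundance), 3))  # 0.929
```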

Carrying out Pearson's linear

Pearson's coefficient uses the actual measurements rather than ranks. The supplied formula is:

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \; \sum (y - \bar{y})^2}}$$

where $x$ and $y$ are the paired values and $\bar{x}$, $\bar{y}$ are their means. In words: it compares how the two variables vary together with how much each varies on its own, giving a value $r$ between $-1$ and $+1$.

Pearson is only valid when both sets of data are roughly normally distributed and the scatter graph suggests a straight-line pattern. If those conditions are not met, use Spearman's rank instead.
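The formula can be followed term by term in Python. This is a sketch of the arithmetic only: the function name `pearson_r` is illustrative, and the transect data from the Spearman example are reused just to show the substitution. The worked example chose Spearman for those data because normality could not be assumed, so this is not a recommendation to use Pearson on them.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r = sum((x - mx)(y - my)) / sqrt(sum((x - mx)^2) * sum((y - my)^2))."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # How the two variables vary together (numerator).
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # How much each variable varies on its own (inside the square root).
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable does not vary")
    return sxy / sqrt(sxx * syy)


# Arithmetic illustration only, reusing the transect data.
moisture = [12, 18, 22, 27, 31, 35]
abundance = [4, 11, 8, 14, 17, 17]
print(round(pearson_r(moisture, abundance), 3))
```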

Interpreting $r$: suppose a study of light intensity and the number of a sun-loving plant returns $r = +0.92$. You should report both features of the value:

  • Sign: positive, so abundance rises as light intensity rises.
  • Strength: very close to $+1$, so a strong, near-linear relationship.

A value such as $r = -0.15$ would instead show a weak negative relationship that is barely a trend at all.
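Reporting sign and strength together can be sketched as a small function. The cut-offs used here (0.7 for "strong", 0.3 for "weak") are arbitrary assumptions chosen for illustration; they do not come from these notes or from any mark scheme, and the function name is hypothetical.

```python
def describe_correlation(r, strong=0.7, weak=0.3):
    """Describe a coefficient by sign and strength (cut-offs are illustrative)."""
    if not -1 <= r <= 1:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    if r == 0:
        return "no correlation"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)                      # strength ignores the sign
    if size >= strong:
        strength = "strong"
    elif size >= weak:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {direction} correlation"


print(describe_correlation(0.92))   # strong positive correlation
print(describe_correlation(-0.15))  # weak negative correlation
```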

Interpreting the result honestly

A high correlation coefficient shows that two variables change together — it does not by itself prove that one causes the other. Both might be driven by a third factor. To claim that an abiotic or biotic factor controls a species' distribution, you need a biological mechanism to explain the link as well as the statistical correlation.

Key Equations

Simpson's index of diversity:

$$D = 1 - \left( \sum \left( \frac{n}{N} \right)^2 \right)$$

where $n$ = number of individuals of one species, $N$ = total number of individuals of all species.

Spearman's rank correlation coefficient:

$$r_s = 1 - \frac{6 \sum D^2}{n^3 - n}$$

where $D$ = difference in rank for each pair, $n$ = number of pairs.

Pearson's linear correlation coefficient:

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \; \sum (y - \bar{y})^2}}$$

where $x$, $y$ are paired values and $\bar{x}$, $\bar{y}$ their means. Both $r$ and $r_s$ lie between $-1$ and $+1$. All these formulae are supplied in the exam.

Common Mistakes to Avoid

  • Rounding part-way through a Simpson's index calculation. Squaring small fractions magnifies rounding errors. Keep full decimals (or stored values) for every intermediate step and round only the final $D$ to three significant figures.
  • Forgetting the final "$1 -$" step in Simpson's index. The squared proportions add up to a number that increases with dominance; you must subtract it from 1 so that a higher $D$ means higher diversity.
  • Mixing up $n$ and $N$. Lower-case $n$ is the count for one species; capital $N$ is the total for all species. Using the wrong one in $\tfrac{n}{N}$ gives a meaningless answer.
  • Treating species richness as the whole story. Two habitats can have the same number of species but very different diversity; $D$ rewards an even spread of individuals as well as richness.
  • Picking the wrong correlation test. Use Pearson only when both variables are normally distributed and the relationship looks linear; otherwise use Spearman's rank — and state your reason if asked.
  • Mishandling tied ranks in Spearman's. Tied values must each take the mean of the ranks they share, not be skipped or given the same whole-number rank twice.
  • Claiming correlation proves causation. A significant correlation shows variables change together; always back up any causal claim with a biological explanation.
  • Confusing the two conservation bodies. One organisation assesses how threatened species are and places them on a Red List of threatened species; a separate international agreement is what actually regulates and restricts trade in those species. Do not write that the assessing body bans trade.
  • Muddling Domains with Kingdoms. The three Domains are Bacteria, Archaea and Eukarya. Groups such as Prokaryota, Protoctista, Fungi or Plantae are Kingdoms, not Domains — do not list them at the Domain level.

Exam Tips

  • The formulae for Simpson's index, Spearman's rank and Pearson's are given in the exam — your marks come from correct substitution and clear interpretation, so practise these rather than memorising the formulae.
  • Show your working line by line (for Simpson's: $N$, the sum of squared proportions, then $D$; for Spearman's: the rank table, $\sum D^2$, then $r_s$). Method marks are available even if the final figure is slightly off.
  • Quote the final answer to three significant figures with no units: both $D$ and the correlation coefficients are dimensionless.
  • When interpreting a correlation coefficient, comment on both its sign (positive or negative) and its strength (how close to $\pm 1$).
  • Watch the symbol clash: $D$ and $n$ mean different things in Simpson's and Spearman's formulae, so check which formula you are in before substituting.
  • For data-handling questions, link the statistic back to the biology, e.g. a high $D$ suggests a stable community, or a strong correlation suggests an abiotic/biotic factor may influence a species' distribution.
  • If asked which test to use, name the test and justify it from the type and distribution of the data.
