Overview
Evolution is the gradual change in the allele frequencies of a population's gene pool from one generation to the next.
It works because natural selection acts on the variation created by random mutation: alleles that give a selective advantage become more common, while disadvantageous alleles become rarer. Over very long timescales these accumulating genetic changes can lead to speciation - the formation of new species from pre-existing ones.
This topic covers three things:
- how allele frequencies change to drive evolution;
- how comparing DNA and protein sequences measures how closely related species are;
- the two main routes to speciation - allopatric (with a geographical barrier) and sympatric (within the same area).
Key Definitions
- Evolution: the change in the allele frequencies (the gene pool) of a population over many generations, which can lead to the formation of new species.
- Gene pool: all of the alleles of all of the genes present in a population at a given time.
- Allele frequency: the proportion of all the alleles of a gene in a population's gene pool that are one particular allele.
- Species: a group of organisms with similar features that can interbreed to produce fertile offspring, and that are reproductively isolated from other such groups.
- Speciation: the formation of one or more new species from a pre-existing species, occurring when populations become reproductively isolated.
- Reproductive isolation: any barrier that prevents two populations from interbreeding to produce fertile offspring, so that genes can no longer be exchanged between them.
- Allopatric speciation: speciation that occurs when populations are separated by a geographical barrier.
- Sympatric speciation: speciation that occurs without a geographical barrier, when populations in the same area become reproductively isolated by ecological, behavioural or temporal differences.
- Molecular clock: the use of the roughly steady accumulation of DNA or protein sequence differences over time to estimate how long ago two species shared a common ancestor.
Content
The theory of evolution
A population is a group of organisms of the same species living in the same place at the same time and able to interbreed. The gene pool is the complete set of alleles in that population. Evolution is a change in the allele frequencies of this gene pool over many generations. No single individual evolves; it is the population that changes as the proportions of its alleles shift through time.
The process depends on three linked ideas:
- Variation arises from random mutation. Mutation creates new alleles and is the ultimate source of all genetic variation. Crucially, mutations occur at random - the environment does not direct which mutations appear. A selection pressure simply determines which existing alleles happen to be advantageous.
- Natural selection acts on this variation. A selection pressure (such as a predator, disease, competition or a limited food supply) means that individuals with advantageous alleles are more likely to survive, reproduce and pass those alleles on. Those with disadvantageous alleles are less likely to do so.
- Allele frequencies change over generations. Advantageous alleles become more frequent in the gene pool and disadvantageous alleles become less frequent. Over a long time this can change the characteristics of the whole population.
If these changes continue until two populations can no longer interbreed to produce fertile offspring, the populations have become separate species. In this way new species form from pre-existing species - the central claim of the theory of evolution.
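The way selection shifts allele frequencies over generations can be sketched numerically. The example below is a minimal illustration, not a full population-genetics model: it assumes a single gene with two alleles, treats each allele as having its own relative fitness (a simple haploid model), and ignores mutation and chance effects. The function names and the fitness values (1.0 and 0.9) are hypothetical choices made for illustration.

```python
def next_frequency(p, w_adv=1.0, w_dis=0.9):
    """Frequency of the advantageous allele after one generation of selection.

    p     -- current frequency of the advantageous allele (between 0 and 1)
    w_adv -- assumed relative fitness of the advantageous allele
    w_dis -- assumed relative fitness of the disadvantageous allele
    """
    # Mean fitness of the population, weighted by allele frequency.
    mean_fitness = p * w_adv + (1 - p) * w_dis
    # Alleles are passed on in proportion to their relative fitness.
    return p * w_adv / mean_fitness


def simulate(p0, generations, w_adv=1.0, w_dis=0.9):
    """Return the allele frequency in every generation, starting from p0."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], w_adv, w_dis))
    return freqs


# A rare advantageous allele (1% of the gene pool) followed for 100 generations.
history = simulate(p0=0.01, generations=100)
print(f"start: {history[0]:.3f}  after 100 generations: {history[-1]:.3f}")
```

Under these assumptions the advantageous allele rises in frequency every generation while the alternative allele falls, which is the pattern described in the three points above. No individual changes; only the proportions in the gene pool do.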
Using DNA sequence data to show evolutionary relationships
The key idea: two species that share a recent common ancestor have fewer differences in their DNA base sequences than two species whose common ancestor is more distant. This is because, after two lineages diverge, each accumulates mutations independently over time. So counting these differences measures how closely two species are related.
How the comparison is carried out: the same gene (or region) is taken from each species, the two sequences are aligned, and the differences are then counted position by position (base by base for DNA or mRNA, or amino acid by amino acid for proteins). The total number of differences is the measure used:
- Few differences in the DNA (or protein) sequence indicate a close relationship and a recent common ancestor.
- Many differences indicate a distant relationship and a more ancient common ancestor.
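The position-by-position count described above can be sketched in a few lines. This is an illustrative example only: the three sequences are made up, and they are assumed to be already aligned and of equal length (real comparisons need an alignment step first). The function name `count_differences` is a hypothetical helper.

```python
def count_differences(seq_a, seq_b):
    """Count the positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


# Made-up aligned DNA sequences for the same gene region in three species.
species_1 = "ATGGCCTTAGCA"
species_2 = "ATGGCCTTGGCA"
species_3 = "ATGACCTCGGTA"

print(count_differences(species_1, species_2))  # 1
print(count_differences(species_1, species_3))  # 4
print(count_differences(species_2, species_3))  # 3
```

With these invented sequences, species 1 and 2 differ at only one position, so they would be interpreted as sharing the most recent common ancestor of the three.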
Which molecular data can be compared?
| Type of data | What it is | Why it works |
|---|---|---|
| DNA base sequences | The order of bases in a particular gene | Aligned and compared directly, base by base |
| mRNA base sequences | The transcript of a gene | Reflects the gene's base sequence, so compared the same way |
| Amino acid sequences | The order of amino acids in a protein (e.g. cytochrome c, haemoglobin) | The gene codes for the protein, so fewer amino acid differences mean a closer relationship |
Note: mRNA and amino acid comparisons are not separate "rival" evidence - they ultimately reflect the underlying DNA differences, because the gene determines both.
Two further points to learn:
- The molecular clock. Because mutations accumulate at a roughly steady average rate, the number of differences can act as a molecular clock to estimate how long ago two species diverged.
- Why molecular evidence is powerful. It is more objective and quantitative than comparing visible features alone, and it can reveal relationships that body structure hides - for example, showing that two similar-looking species are only distantly related, or that two very different-looking species are close relatives.
This data is used to build phylogenetic trees that show patterns of common ancestry.
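The molecular clock idea can be turned into a rough estimate of divergence time. The sketch below assumes a constant substitution rate per site per year in each lineage and ignores repeated changes at the same site; the rate and the sequence data are invented for illustration. The factor of 2 appears because both lineages accumulate changes independently after they split.

```python
def divergence_time(differences, sequence_length, rate_per_site_per_year):
    """Estimate years since two species shared a common ancestor.

    differences            -- number of positions that differ between the sequences
    sequence_length        -- number of aligned positions compared
    rate_per_site_per_year -- assumed substitution rate in ONE lineage
    """
    differences_per_site = differences / sequence_length
    # Both lineages have been changing since the split, hence the factor of 2.
    return differences_per_site / (2 * rate_per_site_per_year)


# Hypothetical data: 12 differences across 1000 bases, assumed rate of 1e-9.
years = divergence_time(differences=12, sequence_length=1000,
                        rate_per_site_per_year=1e-9)
print(f"estimated divergence: about {years / 1e6:.1f} million years ago")
```

Real rates vary between genes and lineages, so estimates like this are usually calibrated against dated fossils rather than taken at face value.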
How speciation occurs
Speciation requires reproductive isolation: a barrier that stops two populations interbreeding, so that gene flow (the exchange of alleles) between them stops. Once isolated, each population accumulates different mutations; natural selection acts differently on each population where the two face different conditions, and genetic drift (chance changes in allele frequency) also differs between them. Their gene pools diverge until, even if the populations met again, they could no longer interbreed to produce fertile offspring - they have become separate species.
The logic chain is always the same:
isolation → gene flow stops → gene pools change independently (mutation + selection) → reproductive isolation → new species
The two routes differ only in how the isolation begins.
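The effect of gene flow on this chain can be illustrated with a simple deterministic sketch. It assumes two populations that start with the same allele frequency, opposite selection pressures on one gene with two alleles, and a `migration` parameter giving the fraction of each gene pool exchanged per generation. All names and numbers are hypothetical, and the model tracks only allele frequencies; it does not model reproductive isolation itself.

```python
def select(p, w_a, w_b):
    """Frequency of allele A after selection; w_a and w_b are relative fitnesses."""
    return p * w_a / (p * w_a + (1 - p) * w_b)


def diverge(generations, migration):
    """Return the final frequency of allele A in each of two populations."""
    p1 = p2 = 0.5  # both populations start with identical gene pools
    for _ in range(generations):
        # Different selection pressures: A is favoured in population 1,
        # the alternative allele is favoured in population 2.
        p1 = select(p1, 1.0, 0.95)
        p2 = select(p2, 0.95, 1.0)
        # Gene flow: a fraction of each gene pool comes from the other population.
        p1, p2 = ((1 - migration) * p1 + migration * p2,
                  (1 - migration) * p2 + migration * p1)
    return p1, p2


isolated = diverge(generations=200, migration=0.0)   # barrier: no gene flow
connected = diverge(generations=200, migration=0.1)  # gene flow continues

gap_isolated = abs(isolated[0] - isolated[1])
gap_connected = abs(connected[0] - connected[1])
print(f"difference with no gene flow: {gap_isolated:.3f}")
print(f"difference with gene flow:    {gap_connected:.3f}")
```

In this sketch the allele frequencies of the isolated populations end up much further apart than those of the connected populations, because gene flow keeps mixing the two gene pools back together.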
Allopatric speciation (geographical separation)
Allo- means "other place". A geographical barrier - such as a new river, a mountain range, an area of unsuitable habitat, or the sea separating an island from the mainland - physically splits one population into two.
- These remain populations of the same species; they are not yet different species.
- Because gene flow is prevented, the two gene pools change independently: different mutations arise, and the populations may face different selection pressures (climate, food, predators).
- Over many generations the gene pools diverge so far that the populations become reproductively isolated. Speciation is now complete, and even if the barrier is removed they cannot interbreed successfully.
Sympatric speciation (ecological and behavioural separation)
Sym- means "together". Here new species arise within the same geographical area, with no physical barrier. Reproductive isolation develops through differences that stop two groups breeding together even though they live alongside one another. Isolating mechanisms include:
- Behavioural isolation: differences in courtship behaviour, mating calls or display, so individuals only respond to and mate with members of their own group.
- Temporal (seasonal) isolation: groups breed or flower at different times of year or day, so their breeding periods do not overlap.
- Ecological isolation: groups occupy different habitats or niches within the same area, so they rarely meet to mate.
- Mechanical or genetic incompatibility: differences (for example a change in chromosome number) that prevent successful mating or the production of fertile offspring.
Once any of these mechanisms stops interbreeding, the two gene pools diverge independently and, given enough time, become separate species.
Worked example
Exam-style question: A species of insect lives on a single large grassland. Some individuals begin to feed and breed only on a newly arrived plant, while the rest continue to feed and breed on the original plant. Over many generations the two groups can no longer interbreed. Name the type of speciation taking place and explain how reproductive isolation could have led to the formation of two species. [5]
Model answer:
- This is sympatric speciation, because the two groups occur in the same geographical area with no geographical barrier between them.
- The groups become ecologically and behaviourally isolated because they feed and breed on different plants, so members of one group rarely mate with members of the other and gene flow stops.
- With no gene flow, the two gene pools change independently: different random mutations arise in each, and natural selection acts differently on each group because the two plants present different selection pressures.
- The allele frequencies of the two gene pools therefore diverge over many generations.
- Eventually the differences are so great that the two groups are reproductively isolated - they can no longer interbreed to produce fertile offspring - so they are now separate species.
Worked example
Exam-style question: The table shows the number of amino acid differences in the protein cytochrome c when it is compared between four species, W, X, Y and Z.
| Pair compared | Amino acid differences |
|---|---|
| W and X | 3 |
| W and Y | 14 |
| W and Z | 15 |
| X and Y | 13 |
| X and Z | 14 |
| Y and Z | 4 |
(a) State which two species are the most closely related and which two are the most distantly related. (b) Explain how the number of differences indicates the relationships. (c) State which pair shares the most recent common ancestor. [5]
Model answer:
- (a) The most closely related pair is W and X (only 3 differences); the most distantly related pair is W and Z (15 differences).
- (b) After two species diverge from a common ancestor, each lineage accumulates random mutations independently over time.
- So fewer amino acid differences mean less time has passed since the two species shared a common ancestor, indicating a closer relationship; more differences mean a more distant relationship.
- (c) W and X share the most recent common ancestor, because they have the fewest sequence differences of any pair.
- Note: "state" needs no justification, but the mark is only awarded if the named pair matches the data. Parts (a) and (c) must agree - the most closely related pair always shares the most recent common ancestor, so both answers should be W and X.
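The reading of the table in this worked example can be checked with a short script. It is only a sketch of the reasoning: the dictionary holds the values from the question's table, and the variable names are chosen for illustration.

```python
# Amino acid differences in cytochrome c, copied from the table in the question.
differences = {
    ("W", "X"): 3,
    ("W", "Y"): 14,
    ("W", "Z"): 15,
    ("X", "Y"): 13,
    ("X", "Z"): 14,
    ("Y", "Z"): 4,
}

# Fewest differences = most closely related; most differences = most distant.
closest = min(differences, key=differences.get)
most_distant = max(differences, key=differences.get)

print(f"most closely related: {closest[0]} and {closest[1]} "
      f"({differences[closest]} differences)")
print(f"most distantly related: {most_distant[0]} and {most_distant[1]} "
      f"({differences[most_distant]} differences)")
```

The pair with the minimum count (W and X) is also the answer to part (c), since the fewest differences indicate the most recent common ancestor.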
Key Equations
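This topic is largely qualitative. Allele frequencies themselves can be written numerically as a proportion of the gene pool:
allele frequency = number of copies of that allele in the gene pool ÷ total number of copies of all alleles of that gene in the gene pool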
Quantitative work on allele frequencies - including the Hardy-Weinberg principle - is covered in the population-genetics section of this unit.
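As a small illustration of writing an allele frequency as a proportion, the sketch below counts alleles from genotype numbers. It assumes a diploid organism and a gene with exactly two alleles, A and a; the genotype counts and the function name are invented for the example.

```python
def allele_frequencies(n_AA, n_Aa, n_aa):
    """Return (frequency of A, frequency of a) from genotype counts."""
    # Each diploid individual carries two alleles of the gene.
    total_alleles = 2 * (n_AA + n_Aa + n_aa)
    freq_A = (2 * n_AA + n_Aa) / total_alleles
    freq_a = (2 * n_aa + n_Aa) / total_alleles
    return freq_A, freq_a


# Hypothetical population of 100 individuals.
freq_A, freq_a = allele_frequencies(n_AA=36, n_Aa=48, n_aa=16)
print(f"frequency of A = {freq_A:.2f}, frequency of a = {freq_a:.2f}")
```

The two frequencies always add up to 1 when the gene has only two alleles, which is a quick check on any calculation of this kind.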
Common Mistakes to Avoid
- Calling the separated populations "different species" too soon. Straight after a barrier appears they are still populations of the same species; only call them different species once they are reproductively isolated and can no longer interbreed to produce fertile offspring.
- Naming the wrong type of speciation. If the question says (or implies) there is no geographical barrier, the speciation is sympatric, caused by behavioural, temporal or ecological isolation - do not default to allopatric speciation.
- Saying the environment "causes" the right mutations. Mutations are random; selection pressures do not create them. State clearly that selection only determines which existing alleles are advantageous.
- Confusing low genetic variation with a single rare allele. Low variation means there are few different alleles at a gene locus in the gene pool, regardless of how common each one is - not simply that one allele has a low frequency.
- Saying individuals evolve. Individuals do not evolve; it is the population's gene pool (its allele frequencies) that changes over generations.
- Treating molecular differences the wrong way round. More sequence differences mean a more distant relationship; fewer differences mean a closer, more recent common ancestor.
- Saying two species "evolved from each other". Related species do not evolve from one another - they both diverged from a shared common ancestor. Always phrase relationships in terms of a common ancestor, not a direct line from one living species to another.
- Treating protein evidence as separate from DNA evidence. Differences in mRNA or amino acid sequences ultimately reflect differences in the DNA, because the gene codes for the protein. Do not present protein data as if it could contradict the DNA evidence.
Exam Tips
- Build every speciation answer around the same logic chain: isolation → gene flow stops → independent change in gene pools (mutation + selection) → reproductive isolation → new species. Stating each step earns the marks.
- Always finish a speciation explanation with "can no longer interbreed to produce fertile offspring" - this is the definition of separate species and is a key marking point.
- When asked to compare allopatric and sympatric speciation, lead with the key distinction: allopatric has a geographical barrier, sympatric does not.
- For DNA-comparison questions, link the number of base (or amino acid) differences to how recently species shared a common ancestor, and remember to explain why - independent accumulation of mutations after lineages diverge.
- Use precise terms: gene pool, allele frequency, reproductive isolation and gene flow score better than vague phrases like "genes change".
- If you do any allele-frequency calculation, keep intermediate values in your calculator and round only the final answer to the required number of significant figures.