-
How can you obtain estimates of contemporary and ancient events?
- By estimating population parameters for different genetic markers
- Some processes are recurrent and reach equilibrium, whereas other events occur only once
- Recall that DNA sequence variation captures the long-term evolutionary history of the population
-
If FST ≤ 0.05, is differentiation negligible?
No. As Wright points out, differentiation is by no means negligible even when FST is ≤ 0.05
-
What are the test statistics for geographic structure?
-
Sequence-based statistics work best when your data have:
___ mutation rate and ___ n
High mutation rate and small n
-
Haplotype-based statistics work best when your data have:
___ mutation rate and ___ n
Low mutation rate and large n
-
How can you interpret FST when:
0.00 to 0.05
0.05 to 0.15
0.15 to 0.25
> 0.25
- 0.00 to 0.05 = little genetic differentiation
- 0.05 to 0.15 = moderate genetic differentiation
- 0.15 to 0.25 = great genetic differentiation
- > 0.25 = very great genetic differentiation
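A minimal sketch of the ranges above as a lookup. The function name `interpret_fst` is hypothetical, and two details are assumptions: FST is taken to lie between 0 and 1, and each boundary value (0.05, 0.15, 0.25) is assigned to the lower category, since the ranges themselves overlap at the boundaries.

```python
def interpret_fst(fst: float) -> str:
    """Map an FST value onto the qualitative categories listed above."""
    # Assumption: FST lies in [0, 1].
    if not 0.0 <= fst <= 1.0:
        raise ValueError("FST is expected to lie between 0 and 1")
    # Assumption: a boundary value falls into the lower category.
    if fst <= 0.05:
        return "little genetic differentiation"
    if fst <= 0.15:
        return "moderate genetic differentiation"
    if fst <= 0.25:
        return "great genetic differentiation"
    return "very great genetic differentiation"
```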
-
Heterozygosity
haplotype diversity for haploid data
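A short sketch of how haplotype diversity could be computed, assuming Nei's unbiased estimator H = n/(n-1) × (1 − Σ p_i²), where p_i is the sample frequency of haplotype i. The function name and the list-of-labels input are illustrative choices.

```python
from collections import Counter

def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum(p_i^2))."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled haplotypes")
    counts = Counter(haplotypes)
    # Sum of squared haplotype frequencies (expected homozygosity).
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)
```

A sample in which every haplotype is identical gives 0, and a sample in which every haplotype is unique gives 1.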
-
Which two tests are analogous in interpretation to the haplotype-based statistic GST?
- FST: a frequency-based test statistic for haplotype diversity
- KST: tests for diversity in populations using sequence-based statistics
-
How can you increase the power of Hudson's test to detect subdivision?
Hudson's test (KST) can have increased power in the presence of recombination
-
How can the presence of recombination increase the power of KST to detect subdivision?
- It increases haplotype diversity in the total population
- It gives rise to unique recombination blocks that distinguish the different subpopulations (clades)
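A rough sketch of a KST-style statistic with a permutation test, assuming the form KST = 1 − KS/KT, where KS is a weighted mean of the within-subpopulation average pairwise differences and KT is the average pairwise difference in the pooled sample. The weights n_i/n are an assumption (other weightings exist), and all function names are hypothetical.

```python
import random
from itertools import combinations

def mean_pairwise_diff(seqs):
    """Average number of differences over all pairs of aligned sequences."""
    pairs = list(combinations(seqs, 2))
    return sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs) / len(pairs)

def k_st(pops):
    """1 - K_S / K_T, with K_S weighted by sample size (assumed weights)."""
    pooled = [s for pop in pops for s in pop]
    n = len(pooled)
    k_s = sum(len(pop) / n * mean_pairwise_diff(pop) for pop in pops)
    return 1.0 - k_s / mean_pairwise_diff(pooled)

def permutation_p_value(pops, permutations=1000, seed=0):
    """Share of random relabelings with K_ST at least as large as observed."""
    rng = random.Random(seed)
    observed = k_st(pops)
    pooled = [s for pop in pops for s in pop]
    sizes = [len(pop) for pop in pops]
    hits = 0
    for _ in range(permutations):
        rng.shuffle(pooled)
        start, shuffled = 0, []
        for size in sizes:  # re-split into groups of the original sizes
            shuffled.append(pooled[start:start + size])
            start += size
        if k_st(shuffled) >= observed:
            hits += 1
    return hits / permutations
```

The sketch needs at least two sequences per subpopulation and some variation in the pooled sample; otherwise a division by zero occurs.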
-
Give the strengths and weaknesses for the following test statistic for geographic structure:
χ²
- Strengths: Useful in almost all cases, especially when n > 50. Can have unequal sample sizes
- Weaknesses: poor at high mutation rates and small sample sizes
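A minimal sketch of the χ² statistic on a table of haplotype counts, assuming rows are sampling localities and columns are haplotypes. Row totals may differ, which is why unequal sample sizes are not a problem. The function name is hypothetical, and significance testing (for example by permuting locality labels) is not shown.

```python
def chi_square_statistic(table):
    """Pearson chi-square for a localities x haplotypes table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if haplotype frequencies were equal everywhere.
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat
```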
-
Give the strengths and weaknesses for the following test statistic for geographic structure:
HST
- Strengths: Useful with high haplotype diversity. Can have unequal sample sizes
- Weaknesses: Poor with small sample size and low diversity
-
Give the strengths and weaknesses for the following test statistic for geographic structure:
KS
- Strengths: Powerful with high mutation rates, small sample size, and recombination
- Weaknesses: must have equal sample sizes
-
Give the strengths and weaknesses for the following test statistic for geographic structure:
Z
- Strengths: Can have unequal sample sizes. Powerful with high mutation rates and small sample size
- Weaknesses: n/a
-
What are examples of different clustering methods?
- Agglomerative
- Gap Statistic
- K-means
- Calinski-Harabasz
-
Agglomerative
- (UPGMA – Unweighted Pair Group Method with Arithmetic mean)
- Hierarchical clustering method with a ‘bottom up’ approach; each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy.
- Uses a dissimilarity matrix based on the Euclidean distances.
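The bottom-up procedure can be sketched in plain Python as follows. This is an illustrative implementation under simple assumptions (points given as coordinate tuples, ties broken by insertion order); the function name `upgma` and the returned merge-history format are hypothetical.

```python
import math
from itertools import combinations

def upgma(points):
    """Return the merge history [(members_a, members_b, height), ...]."""
    # Each observation starts in its own cluster (bottom-up approach).
    clusters = {i: (i,) for i in range(len(points))}
    # Dissimilarity matrix of Euclidean distances, keyed by cluster-id pair.
    dist = {
        frozenset((i, j)): math.dist(points[i], points[j])
        for i, j in combinations(clusters, 2)
    }
    merges = []
    next_id = len(points)
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)  # closest pair of clusters
        a, b = sorted(pair)
        size_a, size_b = len(clusters[a]), len(clusters[b])
        merges.append((clusters[a], clusters[b], dist[pair]))
        # UPGMA update: the size-weighted mean of the two old distances
        # equals the arithmetic mean over all pairs of members.
        for c in clusters:
            if c not in (a, b):
                d_ac = dist.pop(frozenset((a, c)))
                d_bc = dist.pop(frozenset((b, c)))
                new_d = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
                dist[frozenset((next_id, c))] = new_d
        del dist[pair]
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return merges
```

Each entry of the result records which two clusters were joined and at what average distance, which is the information needed to draw the dendrogram.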
-
Gap statistic
Compares the observed within-cluster dispersion with its expected value under a null reference distribution
-
K-means
Aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean
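A minimal sketch of the standard iterative procedure for this (Lloyd's algorithm). The random choice of k observations as starting means, the fixed seed, and the function name `k_means` are assumptions made for illustration.

```python
import math
import random

def k_means(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: returns (labels, centroids)."""
    rng = random.Random(seed)
    # Assumption: the initial means are k observations chosen at random.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each observation joins the nearest mean.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: recompute each mean from its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous mean
                centroids[c] = tuple(
                    sum(x) / len(members) for x in zip(*members)
                )
    return labels, centroids
```

The result can depend on the starting means, so in practice several random starts are usually compared.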
-
Calinski-Harabasz
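Variance ratio criterion: the ratio of between-cluster to within-cluster dispersion, each scaled by its degrees of freedom; higher values indicate better-separated clusters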
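A short sketch of the index, assuming the usual definition CH = [B/(k − 1)] / [W/(n − k)], where B is the between-cluster and W the within-cluster sum of squares. The function name and input format are illustrative.

```python
def calinski_harabasz(points, labels):
    """Variance ratio criterion: [B / (k - 1)] / [W / (n - k)]."""
    n = len(points)
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    k = len(groups)
    if not 1 < k < n:
        raise ValueError("need between 2 and n - 1 clusters")

    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    overall = mean(points)
    between = within = 0.0
    for members in groups.values():
        centre = mean(members)
        between += len(members) * sq_dist(centre, overall)
        within += sum(sq_dist(p, centre) for p in members)
    # Undefined when within == 0 (every point sits on its cluster mean).
    return (between / (k - 1)) / (within / (n - k))
```

Computing the index for several candidate values of k and keeping the k with the highest score is one common way to use it.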
-
What is population stratification?
The grouping of a population into classes, categories, or clusters
-
What is a problem when calculating population stratification, and how can you correct for it?
- Stratification may induce false positives
- Correct for it with PCA, as in the EIGENSTRAT method, which identifies cryptic structure in the data
-
How can you use principal component analysis to correct for stratification in population data that may have cryptic structures?
By regressing the top PCs (e.g., ev1) out of the genotype data, as in the EIGENSTRAT method, we can identify cryptic structure in the data (e.g., on ev3).
- The CHD and JPT populations are so close to each other that they can be distinguished only on eigenvector 3
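A simplified sketch of the "regress out the top PC" step, in plain Python. It is not the EIGENSTRAT implementation: it assumes a small genotype matrix (rows are individuals, columns are SNPs coded 0/1/2), finds only the leading PC by power iteration, and omits the per-SNP normalization and the phenotype adjustment. All names are hypothetical.

```python
import math

def top_pc(rows, iterations=200):
    """Centre each SNP, then find the leading PC of the individuals."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    x = [[r[j] - means[j] for j in range(m)] for r in rows]
    v = [float(i + 1) for i in range(n)]  # arbitrary deterministic start
    for _ in range(iterations):
        # Multiply by X X^T without forming the n x n matrix.
        t = [sum(x[i][j] * v[i] for i in range(n)) for j in range(m)]
        v = [sum(x[i][j] * t[j] for j in range(m)) for i in range(n)]
        norm = math.sqrt(sum(a * a for a in v))
        v = [a / norm for a in v]
    return x, v

def regress_out(x, v):
    """Remove the component along unit vector v from every SNP column."""
    n, m = len(x), len(x[0])
    coef = [sum(x[i][j] * v[i] for i in range(n)) for j in range(m)]
    return [[x[i][j] - coef[j] * v[i] for j in range(m)] for i in range(n)]
```

After the adjustment every SNP column is uncorrelated with the removed PC, so any remaining structure has to show up on later eigenvectors.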

-
What is STRUCTURE?
An alternative approach for determining the optimal number of genetic clusters
-
Why use STRUCTURE?
- It is one of the most widely used population analysis tools for assessing patterns of genetic structure in a set of samples
- It identifies populations from the data and assigns each individual to the population that best fits its pattern of variation
-
Compatibility methods (Le Quesne):
- Evaluate compatibility of genetic data with different evolutionary scenarios.
- Non-genealogical (frequency-based)
-
Population parameter estimators (θ):
- Estimate mutation rates and effective population sizes
- Non-genealogical (frequency-based)
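Two common estimators of θ can be sketched as follows: Watterson's estimator, based on the number of segregating sites S, and the mean number of pairwise differences π. The input is assumed to be a list of aligned, equal-length sequences, and the function names are illustrative. For a diploid autosomal locus θ = 4Neμ, which is how θ ties together mutation rate and effective population size.

```python
from itertools import combinations

def segregating_sites(seqs):
    """Number of alignment columns showing more than one state (S)."""
    return sum(len(set(col)) > 1 for col in zip(*seqs))

def watterson_theta(seqs):
    """Watterson's estimator: S / a1, with a1 = sum of 1/i for i < n."""
    a1 = sum(1.0 / i for i in range(1, len(seqs)))
    return segregating_sites(seqs) / a1

def nucleotide_diversity(seqs):
    """Mean number of pairwise differences (pi, per locus, not per site)."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / len(pairs)
```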
-
Neutrality tests (Tajima, Fu):
- Used to detect deviations from neutrality, indicating selection or demographic changes
- Non-genealogical (frequency-based)
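As an illustration, Tajima's D compares two estimates of θ: the mean number of pairwise differences (π) and S/a1, where S is the number of segregating sites. The sketch below assumes the standard constants from Tajima's 1989 formulation; the function name and argument order are hypothetical.

```python
import math

def tajimas_d(n, s, pi):
    """Tajima's D from sample size n, segregating sites s, and pi."""
    if n < 2 or s < 1:
        raise ValueError("need n >= 2 sequences and at least one variable site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Difference between the two theta estimates, scaled by its
    # estimated standard deviation.
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))
```

D is zero when the two estimates agree, negative when there is an excess of rare variants, and positive when intermediate-frequency variants are in excess.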
-
Genetic Diversity measures (Nei):
- Assess genetic variation within a population
- Non-genealogical (frequency-based)
-
Classical Wright’s FST statistics:
- Measures genetic differentiation among populations
- Non-genealogical (frequency-based)

-
Population subdivision (Hudson):
- Examines how populations are divided and structured
- Non-genealogical (frequency-based)