The principal component analysis clearly separated the Finnish regions and Eastern and Western counties from the Swedish as well as the Finnish regions and counties from each other (Figure 2C and 2D). Geneland showed three clusters (Figure 3B), roughly corresponding to Sweden, Eastern Finland and Western Finland. Thus, Geneland was able to correctly identify the country of origin of the individuals despite the lower quality of the Swedish data. Interestingly, the county-level PCA (Figure 2D) and Geneland (Figure 3B) placed the Finnish subpopulation of Swedish-speaking Ostrobothnia closest to Sweden. This minority population originates from the 13th century, when Swedish settlers inhabited areas of coastal Finland . Our result is in congruence with earlier studies where intermediate allele frequencies between Finns and Swedes have been observed in the Swedish speaking Finns .
If this assumption is correct, the power of inferring clusters increases; if the assumption is incorrect, it will lead to a loss of power but generally not to inference of spurious clusters (in the case of weak spatial organization, Geneland tends to perform like Structure in terms of inferred clusters ). Besides, in previous studies with similar goals it has been estimated that Structure needs a minimum of 65 to 100 random markers to separate continental groups and that the number of markers rather than samples is the most important parameter determining statistical power [13, 37]. The differences between and within the neighbouring countries studied here are presumably smaller than those between continents and not large enough to be detected by Structure.The abstract:
The detection of three clusters by Geneland versus one single cluster by Structure can thus be interpreted as an example of increased power in spatially structured populations.
[. . .]
Our results from the Geneland algorithm demonstrate the benefit of including spatial information in clustering individuals according to their genetic similarity, particularly at low levels of differentiation. Although Geneland has successfully clustered individuals into groups with low or moderate FST in ecological studies [44-46], to the best of our knowledge, this is the first time the algorithm has been used for human or SNP data.
Population substructure in Finland and Sweden revealed by the use of spatial coordinates and a small number of unlinked autosomal SNPs
Ulf Hannelius, Elina Salmela, Tuuli Lappalainen, Gilles Guillot, Cecilia M Lindgren, Ulrika von Dobeln, Paivi Lahermo and Juha Kere
BMC Genetics 2008, 9:54doi:10.1186/1471-2156-9-54
Published: 19 August 2008
Despite several thousands of years of close contacts, there are genetic differences between the neighbouring countries of Finland and Sweden. Within Finland, signs of an east-west duality have been observed, whereas the population structure within Sweden has been suggested to be more subtle. With a fine-scale substructure like this, inferring the cluster membership of individuals requires a large number of markers. However, some studies have suggested that this number could be reduced if the individual spatial coordinates are taken into account in the analysis.
We genotyped 34 unlinked autosomal single nucleotide polymorphisms (SNPs), originally designed for zygosity testing, from 2044 samples from Sweden and 657 samples from Finland, and 30 short tandem repeats (STRs) from 465 Finnish samples. We saw significant population structure within Finland but not between the countries or within Sweden, and isolation by distance within Finland and between the countries. In Sweden, we found a deficit of heterozygotes that we could explain by simulation studies to be due to both a small non-random genotyping error and hidden substructure caused by immigration. Geneland, a model-based Bayesian clustering algorithm, clustered the individuals into groups that corresponded to Sweden and Eastern and Western Finland when spatial coordinates were used, whereas in the absence of spatial information, only one cluster was inferred.
We show that the power to cluster individuals based on their genetic similarity is increased when including information about the spatial coordinates. We also demonstrate the importance of estimating the size and effect of genotyping error in population genetics in order to strengthen the validity of the results.