race/history/evolution notes: Fine-scale genetic substructure in Finland and Sweden

Compared to the recent Europe-wide genetic structure paper, this paper has more samples from Finland and Sweden, with better geographic characterization, but typed at far fewer loci. The authors detect an east-west duality in Finland. They fail to detect substructure within Sweden, though poorer-quality data or the presence of non-European immigrants in their Swedish sample may be obscuring it. Nonetheless:

The principal component analysis clearly separated the Finnish regions and Eastern and Western counties from the Swedish as well as the Finnish regions and counties from each other (Figure 2C and 2D). Geneland showed three clusters (Figure 3B), roughly corresponding to Sweden, Eastern Finland and Western Finland. Thus, Geneland was able to correctly identify the country of origin of the individuals despite the lower quality of the Swedish data. Interestingly, the county-level PCA (Figure 2D) and Geneland (Figure 3B) placed the Finnish subpopulation of Swedish-speaking Ostrobothnia closest to Sweden. This minority population originates from the 13th century, when Swedish settlers inhabited areas of coastal Finland [34]. Our result is in congruence with earlier studies where intermediate allele frequencies between Finns and Swedes have been observed in the Swedish speaking Finns [35].

Geneland is an algorithm which "in contrast with Structure, assumes that population membership is structured across space":

If this assumption is correct, the power of inferring clusters increases; if the assumption is incorrect, it will lead to a loss of power but generally not to inference of spurious clusters (in the case of weak spatial organization, Geneland tends to perform like Structure in terms of inferred clusters [27]). Besides, in previous studies with similar goals it has been estimated that Structure needs a minimum of 65 to 100 random markers to separate continental groups and that the number of markers rather than samples is the most important parameter determining statistical power [13, 37]. The differences between and within the neighbouring countries studied here are presumably smaller than those between continents and not large enough to be detected by Structure.

The detection of three clusters by Geneland versus one single cluster by Structure can thus be interpreted as an example of increased power in spatially structured populations.

[. . .]

Our results from the Geneland algorithm demonstrate the benefit of including spatial information in clustering individuals according to their genetic similarity, particularly at low levels of differentiation. Although Geneland has successfully clustered individuals into groups with low or moderate FST in ecological studies [44-46], to the best of our knowledge, this is the first time the algorithm has been used for human or SNP data.

The abstract:

Population substructure in Finland and Sweden revealed by the use of spatial coordinates and a small number of unlinked autosomal SNPs

Ulf Hannelius, Elina Salmela, Tuuli Lappalainen, Gilles Guillot, Cecilia M Lindgren, Ulrika von Döbeln, Päivi Lahermo and Juha Kere

BMC Genetics 2008, 9:54 doi:10.1186/1471-2156-9-54
Published: 19 August 2008

Abstract (provisional)

Background
Despite several thousands of years of close contacts, there are genetic differences between the neighbouring countries of Finland and Sweden. Within Finland, signs of an east-west duality have been observed, whereas the population structure within Sweden has been suggested to be more subtle. With a fine-scale substructure like this, inferring the cluster membership of individuals requires a large number of markers. However, some studies have suggested that this number could be reduced if the individual spatial coordinates are taken into account in the analysis.

Results
We genotyped 34 unlinked autosomal single nucleotide polymorphisms (SNPs), originally designed for zygosity testing, from 2044 samples from Sweden and 657 samples from Finland, and 30 short tandem repeats (STRs) from 465 Finnish samples. We saw significant population structure within Finland but not between the countries or within Sweden, and isolation by distance within Finland and between the countries. In Sweden, we found a deficit of heterozygotes that we could explain by simulation studies to be due to both a small non-random genotyping error and hidden substructure caused by immigration. Geneland, a model-based Bayesian clustering algorithm, clustered the individuals into groups that corresponded to Sweden and Eastern and Western Finland when spatial coordinates were used, whereas in the absence of spatial information, only one cluster was inferred.

Conclusions
We show that the power to cluster individuals based on their genetic similarity is increased when including information about the spatial coordinates. We also demonstrate the importance of estimating the size and effect of genotyping error in population genetics in order to strengthen the validity of the results.
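
The heterozygote deficit mentioned in the Results is what one expects whenever a sample pools groups with different allele frequencies (the Wahlund effect), which is why immigration can produce it. The following is a minimal sketch of that arithmetic for a single biallelic locus, not the authors' simulation procedure: the function name, the allele frequencies, and the group proportions are all made up for illustration, and each group is assumed to be in Hardy-Weinberg equilibrium internally.

```python
def pooled_heterozygote_deficit(freqs, weights):
    """Heterozygote deficit from pooling subpopulations at one biallelic locus.

    freqs   -- allele frequency of one allele in each subpopulation (hypothetical values)
    weights -- relative size of each subpopulation in the pooled sample

    Assumes each subpopulation is itself in Hardy-Weinberg equilibrium.
    Returns (observed heterozygosity, expected heterozygosity, F),
    where F = 1 - observed / expected.
    """
    total = sum(weights)
    shares = [w / total for w in weights]

    # Observed heterozygosity: weighted average of within-group 2pq.
    observed = sum(s * 2 * p * (1 - p) for s, p in zip(shares, freqs))

    # Expected heterozygosity if the pooled sample were one random-mating population.
    p_bar = sum(s * p for s, p in zip(shares, freqs))
    expected = 2 * p_bar * (1 - p_bar)

    return observed, expected, 1 - observed / expected


if __name__ == "__main__":
    # Two equally sized groups with frequencies 0.3 and 0.7 (illustrative only).
    obs, exp, f = pooled_heterozygote_deficit([0.3, 0.7], [1, 1])
    print(f"observed={obs:.3f} expected={exp:.3f} F={f:.3f}")

    # A small admixed minority in a large sample (again, invented numbers).
    obs, exp, f = pooled_heterozygote_deficit([0.3, 0.8], [95, 5])
    print(f"observed={obs:.3f} expected={exp:.3f} F={f:.3f}")
```

With two equal groups at frequencies 0.3 and 0.7, observed heterozygosity is 0.42 against an expected 0.50, so F = 0.16. A single homogeneous group gives F = 0, and F is never negative under these assumptions, so any hidden substructure pushes the pooled sample toward a deficit rather than an excess.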

3 comments:

Anonymous said...: Yeah, only 34 SNPs...not sure how helpful that is.

Obviously, it doesn't register any geographic substructure within Sweden that makes sense...however, it's interesting that there are Swedish provinces much closer to Finns than Uppsala, which was the province sampled in the Kayser 2008 study.; August 20, 2008 at 8:17 AM
n/a said...: A failure to detect structure in the combined Finnish/Swedish sample would suggest insufficient power. One can't blame poor resolution for the structure which was detected.

I can see three major factors affecting the power of studies of this sort:

(1) The number of markers tested and the accuracy of the testing.
(2) The numbers of samples.
(3) The suitability of the samples for answering the question under investigation / the provenance of the samples (including information on ethnic and geographic background).

Factor (1) will take care of itself. Genotyping will become increasingly cheap and easy in the future. At the same time, it will become increasingly difficult to find large numbers of people who trace all their ancestry to particular locales. So I'd be happy to see researchers focus on (3) and (2) for now, even if some of the conclusions drawn are tentative. Hopefully, the samples will be genotyped more fully in the future.; August 20, 2008 at 9:38 PM
Anonymous said...: From the original version of the study, I thought this might be better since the new edition is flawed with semantic gimmicks.

"Clear East-West duality was observed when the Finnish individuals were clustered using Geneland. Individuals from the Swedish-speaking part of Ostrobothnia clustered with Sweden when a joint analysis was performed on Swedish and Finnish autosomal genotypes".; August 25, 2008 at 5:20 PM