Northern European population structure

As mentioned elsewhere by Tuuli Lappalainen, a new analysis of Northern European genome-wide SNP data is available today at PLoS One ("Genome-Wide Analysis of Single Nucleotide Polymorphisms Uncovers Population Structure in Northern Europe"). The results are broadly consistent with those of previous studies. British, German, and American samples are most similar to each other. Eastern Finns are most distinctive.
After genotyping on Affymetrix 250K Sty SNP arrays (see Methods and Table S1 for success rates and quality criteria), the data from 1003 European individuals were first compared without prior population assignment in the analyses of pairwise identities by state (IBS) and calculations with the Structure software. In multidimensional scaling of the IBS distances, there were four clusters: Eastern Finns, Western Finns, Swedes, and a group including the Germans, British and CEU (from now on called "Central Europeans"; Fig. 2a,b, Fig. S1a). [. . .] The Structure analysis (Fig. 3, Fig. S2a,b) found most support for three or four clusters, one dominated by the Eastern Finns, one by the Swedes, and one by the Central Europeans; increasing the number of clusters did not bring out further differences. When only the Finnish samples were analysed with Structure, they formed two clusters (Fig. S2c), consisting of the Eastern and Western Finns, with only 1.8% of the samples associating more strongly to the cluster not corresponding to their geographic origin (data not shown). A Structure analysis of the three Central European populations combined found only one cluster.

[. . .]

Quantile-quantile plots of pairwise allele frequency differences (Fig. 5) and FST calculations (Table 1) showed a pattern of the largest differences being between Eastern Finland versus Great Britain, Germany and Sweden (FST = 0.0072–0.0094) and the smallest between the British and Germans (FST = 0.0005).
Incidentally, a different study reports an FST of 0.00054 between samples from north-east and southern Germany [1], suggesting the British and north Germans may be slightly more similar than some German subpopulations are to each other (although the data may not be strictly comparable).
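
To give a sense of scale for FST values this small, here is a minimal sketch of a Wright-style FST computed from per-SNP allele frequencies in two populations. It is not the estimator used in the new paper or by Steffens et al. (the excerpts don't say which one they used). The function name `fst_two_pops`, the equal weighting of the two populations, and the absence of any sample-size correction are all my assumptions.

```python
def fst_two_pops(p1, p2):
    """Wright-style FST from per-SNP allele frequencies in two populations.

    Illustrative assumptions: biallelic SNPs, both populations weighted
    equally, no sample-size correction (so this is not Weir-Cockerham).
    Returns a ratio of sums over SNPs: (H_T - H_S) / H_T.
    """
    if len(p1) != len(p2):
        raise ValueError("p1 and p2 must list the same SNPs")

    h_total_sum = 0.0
    h_within_sum = 0.0
    for a, b in zip(p1, p2):
        p_bar = (a + b) / 2.0
        # Expected heterozygosity if the two populations were pooled.
        h_total_sum += 2.0 * p_bar * (1.0 - p_bar)
        # Average expected heterozygosity within each population.
        h_within_sum += (2.0 * a * (1.0 - a) + 2.0 * b * (1.0 - b)) / 2.0

    if h_total_sum == 0.0:
        raise ValueError("all SNPs are monomorphic; FST is undefined")
    return (h_total_sum - h_within_sum) / h_total_sum
```

Under this definition, a single SNP at 52% in one population and 48% in the other gives FST = 0.0016, so an FST around 0.0005 corresponds to a frequency difference of only about two percentage points at a common SNP.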

The new paper touches on a couple of items I've mentioned recently:
The differences between populations detected with FST and other measures accounted for such a small proportion of the total genetic variation that large numbers of SNPs are needed to observe them, once again illustrating how most of the human genetic variation is found between individuals instead of populations [39]. Even small differences between populations can be interesting regarding population history, but elucidating their phenotypic significance will require further studies.

The MDS plot of the European populations showed a pattern of population differences that was consistent with our other analyses and earlier observations of a greater degree of differentiation in the geographical extremes of Europe [3], [5], [7], [9]–[11]. Our German, British and CEU samples formed a single cluster, possibly due to a lack of neighbouring reference populations, and contrary to studies with a more comprehensive sampling from Central Europe [7], [9]. The Swedes showed a wider spreading than the other populations, but this was supported neither by diversity calculations nor by a more detailed comparison of the IBS and MDS distance matrices (results not shown). Thus, the differential spread was at least partly an artefact of the MDS, where the representation in a few dimensions likely fails to capture all aspects of complex data. Thus, as visually attractive as the MDS plots are, they must be interpreted with caution and, if sample sizes allow, be accompanied with analyses based on allele frequencies.
In fact, the differences between Eastern and Western Finns were of the same magnitude as the differences between Swedes and British, and much stronger than those between British and Germans. Thus, relevant units of genetic variation often do not correspond to preconceived political, linguistic or even cultural borders.
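
The IBS and MDS steps described in the excerpts can be sketched roughly as follows. This is a toy illustration under my own assumptions, not the paper's pipeline: genotypes are coded as 0, 1 or 2 copies of one allele with no missing data, the IBS distance is taken as the mean absolute difference in allele counts divided by 2, and the function names `ibs_distance` and `classical_mds` are hypothetical. The sketch uses classical (Torgerson) MDS; the excerpts don't specify which MDS variant the authors used.

```python
import numpy as np


def ibs_distance(genotypes):
    """Pairwise IBS distance for an (individuals x SNPs) matrix coded 0/1/2.

    Distance is the mean absolute difference in allele counts, scaled to
    the range [0, 1]. Builds an n x n x m array, so toy-sized input only.
    """
    g = np.asarray(genotypes, dtype=float)
    diff = np.abs(g[:, None, :] - g[None, :, :])
    return diff.mean(axis=2) / 2.0


def classical_mds(dist, n_components=2):
    """Classical (Torgerson) MDS: embed a distance matrix in a few dimensions."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Double-centre the squared distances to get an inner-product matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Clip tiny negative eigenvalues caused by rounding or non-Euclidean input.
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)


# Toy example: two identical individuals and one maximally different one.
toy = [[0, 0, 0, 0],
       [0, 0, 0, 0],
       [2, 2, 2, 2]]
coords = classical_mds(ibs_distance(toy), n_components=2)
```

Keeping only the first two coordinates throws away whatever variation sits in the remaining eigenvalues, which is the sense in which the spread of a population on a 2D MDS plot can be partly an artefact of the projection.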

[1] Steffens et al. SNP-Based Analysis of Genetic Substructure in the German Population. Hum Hered. 2006;62(1):20-9. Epub 2006 Sep 21.


Anonymous said...

Incidentally, a different study reports an FST of 0.00054 between samples from north-east and southern Germany [1], suggesting the British and north Germans may be slightly more similar than some German subpopulations are to each other (although the data may not be strictly comparable).

There was another northern German population in the Steffens paper, from Schleswig-Holstein (northwestern Bundesland that borders Denmark). The FST measure between the southern sample (from Augsburg, Bavaria) and that from Schleswig-Holstein was 0.00017. (Between the 2 northern populations, FST was 0.000013).

Anonymous said...

Correction: between the northern populations, the FST estimate is -0.000013 (missed the negative sign). So, there is essentially no difference detectable using their combined marker set (combines 1 intragenic and 2 intergenic marker sets). Looking at Table 3, I notice that the 2 intergenic marker sets show very low levels of differentiation (FST 0.00009 and 0.00008) whereas the estimate with the intragenic marker set (from protein-coding genes, nonsynonymous or 'effective promoter alteration') is negative (-0.00017). The FST estimate for southern vs. northern populations is also lower using the intragenic SNPs. For Augsburg/Schleswig-Holstein, the estimate is 0.00003 with a standard error of 0.00014. Each marker set only had ~70 SNPs. Using a larger commercial chip would improve these estimates.

It will be nice when large samples from European populations are publicly available (preferably from villages or small towns). Some of the larger studies with European populations unfortunately don't provide FST tables (I also had no luck in getting one when contacting the authors).