Northern European population structure

As mentioned elsewhere by Tuuli Lappalainen, a new analysis of Northern European genome-wide SNP data is available today at PLoS One ("Genome-Wide Analysis of Single Nucleotide Polymorphisms Uncovers Population Structure in Northern Europe"). The results are broadly consistent with those of previous studies. British, German, and American samples are most similar to each other. Eastern Finns are most distinctive.
After genotyping on Affymetrix 250K Sty SNP arrays (see Methods and Table S1 for success rates and quality criteria), the data from 1003 European individuals were first compared without prior population assignment in the analyses of pairwise identities by state (IBS) and calculations with the Structure software. In multidimensional scaling of the IBS distances, there were four clusters: Eastern Finns, Western Finns, Swedes, and a group including the Germans, British and CEU (from now on called “Central Europeans”; Fig. 2a,b, Fig. S1a). [. . .] The Structure analysis (Fig. 3, Fig. S2a,b) found most support for three or four clusters, one dominated by the Eastern Finns, one by the Swedes, and one by the Central Europeans; increasing the number of clusters did not bring out further differences. When only the Finnish samples were analysed with Structure, they formed two clusters (Fig. S2c), consisting of the Eastern and Western Finns, with only 1.8% of the samples associating more strongly to the cluster not corresponding to their geographic origin (data not shown). A Structure analysis of the three Central European populations combined found only one cluster.

[. . .]

Quantile-quantile plots of pairwise allele frequency differences (Fig. 5) and FST calculations (Table 1) showed a pattern of the largest differences being between Eastern Finland versus Great Britain, Germany and Sweden (FST = 0.0072–0.0094) and the smallest between the British and Germans (FST = 0.0005).
Incidentally, a different study reports an FST of 0.00054 between samples from north-east and southern Germany [1], suggesting the British and north Germans may be slightly more similar than some German subpopulations are to each other (although the data may not be strictly comparable).
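To get a feel for how small these FST values are, here is a minimal sketch of Wright's FST for a single biallelic SNP in two equally weighted populations. This is a textbook heterozygosity-based definition, not necessarily the estimator used in either paper (which average over many SNPs and correct for sample size); the function name and the example frequencies are purely illustrative.

```python
def fst_two_populations(p1, p2):
    """Wright's FST for one biallelic SNP, two equally weighted populations.

    p1, p2: frequency of the same allele in each population.
    FST = (H_T - H_S) / H_T, where H_T is the expected heterozygosity of the
    pooled population and H_S is the mean within-population heterozygosity.
    """
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    if h_total == 0:
        # Allele fixed or absent in both populations: no variation to partition.
        return 0.0
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies, chosen only to bracket the reported range.
for p1, p2 in [(0.50, 0.52), (0.50, 0.55), (0.50, 0.60)]:
    print(f"p1={p1:.2f} p2={p2:.2f} FST={fst_two_populations(p1, p2):.4f}")
```

On this simplified definition, a two-point frequency difference at a common SNP gives an FST of roughly 0.0004, and a ten-point difference gives roughly 0.01, which is the general neighborhood of the British/German and Eastern Finnish comparisons quoted above.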

The new paper touches on a couple items I've mentioned recently:
The differences between populations detected with FST and other measures accounted for such a small proportion of the total genetic variation that large numbers of SNPs are needed to observe them, once again illustrating how most of the human genetic variation is found between individuals instead of populations [39]. Even small differences between populations can be interesting regarding population history, but elucidating their phenotypic significance will require further studies.

The MDS plot of the European populations showed a pattern of population differences that was consistent with our other analyses and earlier observations of a greater degree of differentiation in the geographical extremes of Europe [3], [5], [7], [9]–[11]. Our German, British and CEU samples formed a single cluster, possibly due to a lack of neighbouring reference populations, and contrary to studies with a more comprehensive sampling from Central Europe [7], [9]. The Swedes showed a wider spreading than the other populations, but this was supported neither by diversity calculations nor by a more detailed comparison of the IBS and MDS distance matrices (results not shown). Thus, the differential spread was at least partly an artefact of the MDS, where the representation in a few dimensions likely fails to capture all aspects of complex data. Thus, as visually attractive as the MDS plots are, they must be interpreted with caution and, if sample sizes allow, be accompanied with analyses based on allele frequencies.
In fact, the differences between Eastern and Western Finns were of the same magnitude as the differences between Swedes and British, and much stronger than those between British and Germans. Thus, relevant units of genetic variation often do not correspond to preconceived political, linguistic or even cultural borders.

[1] Steffens et al. SNP-Based Analysis of Genetic Substructure in the German Population. Hum Hered. 2006;62(1):20-9. Epub 2006 Sep 21.

DNA analysis and small-scale admixture events

This paper ("Calculating expected DNA remnants from ancient founding events in human population genetics"; provisional PDF) concludes, based on a series of computer simulations:
while genetic data may be sensitive and powerful in large genetic studies, caution must be used when applying genetic information to small, recent admixture events. For some parameter sets, genetic data will not be adequate to detect historic admixture. In such cases, studies should consider anthropologic, archeological, and linguistic data where possible.
The authors point out that:
While genetic studies can provide considerable information, they are also accompanied by variation and stochasticity. Because of these limitations, even the most complete studies of human populations have been called “not unequivocal”[21] or “sobering”[22] by those conducting the research. Recent reports have also addressed the limited depth of current genetic studies[23], indicating that most studies make conclusions after sequencing less than 1% of subjects’ genomes, and sampling only small numbers of a population. Such methods can be especially problematic when dealing with historic admixture events that are very small. The difficulty is a function of the current architecture of genetic studies: researchers sample loci from a group of individuals and categorize individuals into groups based on which alleles they have at the loci tested[24, 25]. These categorizations are determined based on the most prevalent or probable genetic markers in an individual’s genome. The results of these studies, then, can overlook genetic markers that simply are not sampled, which is common in small admixture events. Additionally, stochastic events can lead to allele fixation and further complicate matters, particularly in small populations. It has been suggested that studies of even the largest migrations should couple genetic information with archeological, anthropological, and linguistic data[26].
The simulation results aren't too surprising:
The sizes of the migrant and native populations are fundamental for an understanding of expected allele frequency. With time since admixture as low as those we consider in our simulations, the most important factors are the sizes of the migrating and native populations. In our simulations, if the native population is large, changing the migrating population size results in a change of mean final allele frequency from .0243 to .0010. If the native population is small, those numbers change to .5016 and .0407. These are the most significant differences illustrated by our simulations and they attest to the important role of population sizes. Researchers should not expect to find many alleles from a small migratory group of 50 individuals in a large population today, even if sampling methods are exhaustive.
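The excerpt doesn't give the paper's simulation model or parameters, so the following is only a rough sketch of the general idea: a neutral Wright-Fisher model in which the migrant allele starts at frequency migrants / (migrants + natives) and then drifts. The population sizes, generation count, replicate count, and all function names are assumptions made for illustration, as is the premise that the allele is fixed in the migrants and absent in the natives.

```python
import random


def initial_migrant_frequency(n_migrants, n_natives):
    """Starting frequency, assuming the allele is fixed in migrants and absent in natives."""
    return n_migrants / (n_migrants + n_natives)


def drift_final_frequency(p0, n_individuals, generations, rng):
    """Neutral Wright-Fisher drift in a constant-size diploid population."""
    copies = 2 * n_individuals
    p = p0
    for _ in range(generations):
        if p == 0.0 or p == 1.0:
            break  # allele lost or fixed; nothing more can change
        # Each gene copy in the next generation is drawn independently.
        count = sum(1 for _ in range(copies) if rng.random() < p)
        p = count / copies
    return p


def simulate(n_migrants, n_natives, generations=20, replicates=100, seed=1):
    """Return (starting frequency, mean final frequency, list of final frequencies)."""
    rng = random.Random(seed)
    p0 = initial_migrant_frequency(n_migrants, n_natives)
    n_total = n_migrants + n_natives
    finals = [drift_final_frequency(p0, n_total, generations, rng)
              for _ in range(replicates)]
    return p0, sum(finals) / len(finals), finals


# Hypothetical scenario: 50 migrants joining a small vs. a larger native population.
for natives in (150, 950):
    p0, mean_final, finals = simulate(50, natives)
    print(f"natives={natives}: start={p0:.3f} mean final={mean_final:.3f} "
          f"range={min(finals):.3f}-{max(finals):.3f}")
```

Drift doesn't change the expected frequency, so the mean stays near the starting value set by the two population sizes, while the spread across replicates is wider in the smaller population.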

Sample size matters. Or, to kick a (hopefully) dead horse, why DNAprint is crap:
The average final allele frequency of the migrant allele in our population from the second simulation was 1.017%. We calculated the cumulative density function (CDF) for a genetic study that samples 50 loci for each individual and where the probability of detecting the migrant allele is equal to the probability found in our simulations. The CDF demonstrates that in 60% of individuals sequenced for 50 loci, we would not expect to find a single migrant allele (Figure 7a). Furthermore, we will only find more than one migrant allele in 9% of the subjects examined.

In the case of a large study with as many as 933 loci, based upon the expected migrant allele frequency of 1.017%, almost every subject would demonstrate at least one migrant allele (Figure 7b). In fact, most subjects would demonstrate more than 9 migrant alleles. However, while large studies would expect to succeed in finding more migrant alleles in today’s population, this alone cannot link the admixed population to the migrant population. The migrant alleles will still only represent, on average, 1% of every allele sequenced in the entire study. Therefore, although 9 migrant alleles may, on average, be found in each subject, it is hard to know if the migrant alleles will be redundant among loci and subjects or spread evenly throughout all the loci in the study. Additionally, these numbers could be considerably lower depending on the allele frequency in the migrating population.
As time increases, genetic drift causes the spread of final allele frequencies to increase, particularly when the population sizes are small. Thus, as the time since the admixture event increases, sample size for both loci and subjects becomes increasingly important.

In our second simulation, most of the migrant alleles are present in less than 2% of the population. In a study of a population where few subjects from many human populations are studied, alleles from a small-scale admixture will usually not be recovered at all. And these rare alleles could easily be ignored in favor of haplotypes that better categorize the population into clusters.
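The 50-locus and 933-locus figures quoted above can be reproduced with a plain binomial model. This is a sketch under my own assumptions (loci are independent and each carries the migrant allele with probability 1.017%); the paper's exact CDF calculation may differ, and the function names are mine.

```python
from math import comb

P_MIGRANT = 0.01017  # mean final migrant allele frequency quoted from the paper


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def summarize(n_loci, p=P_MIGRANT):
    """Expected migrant-allele count and tail probabilities for one subject."""
    p_none = binom_pmf(0, n_loci, p)
    p_one = binom_pmf(1, n_loci, p)
    return {
        "expected": n_loci * p,
        "p_none": p_none,
        "p_more_than_one": 1 - p_none - p_one,
    }


for n_loci in (50, 933):
    s = summarize(n_loci)
    print(f"{n_loci} loci: expected={s['expected']:.2f} "
          f"P(none)={s['p_none']:.4f} P(>1)={s['p_more_than_one']:.4f}")
```

With 50 loci this gives about a 60% chance of seeing no migrant allele and about a 9% chance of seeing more than one, matching the quoted figures. With 933 loci the expected count is about 9.5 and the chance of seeing none is negligible.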
The authors conclude:
DNA data have been touted as a panacea for recovering information about the past, but their use depends so extensively on factors that are beyond our control that its applicability is not always appropriate. It is imperative, therefore, that researchers understand the implications of the variables we have presented and not rely solely on DNA sequence data when researching small, recent human migrations.

We can only hope to understand basic details of population history when quantifying genetic data and even valid results derived from genetic data may still be misleading if viewed unilaterally, as demonstrated by Harpending et al.[46, 47]

Our results, however, are not completely ominous. Carefully designed studies should be able to draw specific and valid conclusions from genetic data. One area for major improvement is the number of individuals and loci sampled. Our results indicate that a large sample size and large number of loci are needed to obtain robust results. Studies that are unable to sample sufficiently do not have the power to draw appropriate conclusions and should be interpreted with caution.

[. . .]

The random nature of admixed genetic data seen in these simulations demonstrates that the utility of genetic data is dependent on the context of each individual study. Increasing the number of loci and the number of individuals sampled will increase the probability of detecting small traces of signal, but other sources of evidence should always be considered where possible.

DRD2*A1 and obesity

Frequencies of the DRD2*A1 allele are about twice as high in Africans and Asians as in Europeans.

The AP reports today:
WASHINGTON - Drink a milkshake and the pleasure center in your brain gets a hit of happy — unless you're overweight. It sounds counterintuitive. But scientists who watched young women savor milkshakes inside a brain scanner concluded that when the brain doesn't sense enough gratification from food, people may overeat to compensate.
[. . .]
A healthy diet and plenty of exercise are the main factors in whether someone is overweight. But scientists have long known that genetics also play a major role in obesity — and one big culprit is thought to be dopamine, the brain chemical that's key to sensing pleasure.
[. . .]
Yet that brain region was far less active in overweight people than in lean people, and in those who carry that A1 gene variant, the researchers reported. Moreover, women with that gene version were more likely to gain weight over the coming year.

It's a small study with few gene carriers, and thus must be verified, Volkow stressed.

Still, it could have important implications. Volkow, who heads NIH's National Institute of Drug Abuse, notes that "dopamine is not just about pleasure." It also plays a role in conditioning — dopamine levels affect drug addiction — and the ability to control impulses.

She wonders if instead of overeating to compensate for the lack of pleasure — Stice's conclusion — the study really might show that these people with malfunctioning dopamine in fact eat because they're impulsive.
The study covered in the above article:
Relation Between Obesity and Blunted Striatal Response to Food Is Moderated by TaqIA A1 Allele
E. Stice, S. Spoor, C. Bohon, and D. M. Small (17 October 2008)
Science 322 (5900), 449. [DOI: 10.1126/science.1161550]
Individuals whose reward centers of the brain respond sluggishly after eating prefer calorie-dense foods, which may account for their greater propensity to gain weight.
Podcast interview with Eric Stice.

Alcoholism, genetics, and race

Part-time GNXP houseboy / full-time mestizo failfuck "birch barlow" attributes his problems with alcohol and crack cocaine to "Anglo-Celtic" genes, while he raves about the quality genetics of small brown Asian women. It seems the self-proclaimed "cogelite" never bothered to actually check what genetic research suggests about relative group propensities for addictive behavior. So we're going to do it for him.

SNPedia lists several SNPs for which associations with alcoholism have been claimed, probably the most-studied of which is rs1800497:
rs1800497, a SNP also known as the TaqIA (or Taq1A) polymorphism of the dopamine D2 receptor DRD2 gene (even though it is actually located over 10,000bp downstream of the gene), gives rise to the DRD2*A1 allele. This allele (rs1800497(T)) is associated with a reduced number of dopamine binding sites in the brain [PMID 9672901], and has been postulated to play a role in alcoholism, smoking, and certain neuropsychiatric disorders.

The reduced number of dopamine binding sites may play a role in nicotine addiction by causing an "understimulated" state that can be relieved by smoking (and/or use of other drugs). [PMID 8873216]
The HapMap Phase III data indicate the following frequencies of the T (A1, increased addiction risk) allele:
44% MEX (Mexican ancestry in Los Angeles, California)
44% CHB (Han Chinese in Beijing, China)
42% CHD (Chinese in Metropolitan Denver, Colorado)
41% YRI (Yoruba in Ibadan, Nigeria)
40% ASW (African ancestry in Southwest USA)
40% JPT (Japanese in Tokyo, Japan)
28% GIH (Gujarati Indians in Houston, Texas)
21% CEU (Utah residents with Northern and Western European ancestry)
18% TSI (Toscani in Italia)

Not a good start for birch's imaginary mixed-race spawn. Let's look at another SNP:
rs1076560 is located in intron 6 of the dopamine receptor D2 gene.

In one study of Japanese males, rs1076560(A) alleles were 1.3 fold more associated with Alcoholism than the rs1076560(C) alleles. [PMID 17196743]

The DRD2 risk allele A was more prevalent in the alcoholic patients than in the healthy controls. These data identify rs1076560 as a potentially important variable in the development of alcoholism.
HapMap Phase III population frequencies for the A allele:
11% YRI (Yoruban)
12% TSI (Toscan)
14% ASW (Afram)
14% CEU (NW Euro)
27% GIH (Asian Indian)
36% MEX (Mexican)
40% JPT (Japanese)
45% CHB (Chinese)
46% CHD (Chinese)

Still not looking good. Next SNP:
The rs1799971(G) allele in exon 1 of the mu opioid receptor gene causes the normal amino acid at residue 40, asparagine, to be replaced by aspartic acid.

Carriers of at least one rs1799971(G) allele appear to have stronger cravings for alcohol than carriers of two rs1799971(A) alleles, and are thus hypothesized to be at higher risk for alcoholism. [PMID 17207095]

Allele frequencies for rs1799971 (G):
4% ASW (Afram)
16% CEU (NW Euro)
18% TSI (Toscan)
20% MEX (Mexican)
33% CHB (Chinese)
42% GIH (Asian Indian)
46% JPT (Japanese)
47% CHD (Chinese)
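Since the SNPedia claim is about carriers of at least one G allele, the allele frequencies above understate how many people are affected. A rough conversion, assuming Hardy-Weinberg proportions within each sample (my assumption, not something HapMap or SNPedia states here), is carrier frequency = 1 - (1 - q)^2. The function and variable names below are illustrative.

```python
def carrier_frequency(q):
    """Fraction of people with at least one copy of an allele of frequency q,
    assuming Hardy-Weinberg proportions."""
    return 1 - (1 - q) ** 2


# rs1799971(G) allele frequencies as listed above (HapMap Phase III).
g_allele_freq = {
    "ASW": 0.04, "CEU": 0.16, "TSI": 0.18, "MEX": 0.20,
    "CHB": 0.33, "GIH": 0.42, "JPT": 0.46, "CHD": 0.47,
}

for pop, q in g_allele_freq.items():
    print(f"{pop}: allele {q:.0%}, carriers {carrier_frequency(q):.0%}")
```

On that assumption, a 16% allele frequency corresponds to roughly 29% carriers, and a 46% allele frequency to roughly 71%.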

By this point, birch should perhaps be thankful he's incapable of holding down a job long enough to buy that priced-to-sell Cambodian bride he has his eye on. Moving on:

Findings showed that SNP rs2232165 of the GHS-R1A gene was associated with heavy alcohol consumption (and therefore presumably alcohol dependence). SNP rs2948694 of the same gene as well as haplotypes of both the pro-ghrelin and the GHS-R1A genes were associated with an increased body mass in individuals consuming heavy amounts of alcohol.

Allele frequencies for rs2232165 (T):
8.3% YRI (Yoruban)
2.5% CEU (NW Euro)
0% CHB (Chinese)
0% JPT (Japanese)

Allele frequencies for rs2948694 (G):
33.8% JPT (Japanese)
32.4% CHB (Chinese)
9.6% YRI (Yoruban)
7.4% CEU (NW Euro)

But wait:
rs671 is a classic SNP, well known in a sense through the phenomenon known as the "alcohol flush", also known as the "Asian Flush" or "Asian blush", in which certain individuals, often of Asian descent, have their face, neck and sometimes shoulders turn red after drinking alcohol. [PMID 6582480]

The rs671(A) allele of the ALDH2 gene is the culprit, in that it encodes a form of the aldehyde dehydrogenase 2 protein that is defective at metabolizing alcohol. This allele is known as the ALDH2*2 form, and individuals possessing either one or two copies of it show alcohol-related sensitivity responses including facial flushing, and severe hangovers (and hence they are usually not regular drinkers). Perhaps not surprisingly they appear to suffer less from alcoholism and alcohol-related liver disease. [PMID 511165, PMID 16046871]
So much for "high executive function" saving Asians from the bottle. But only around a third of East Asians have genotypes associated with the flush reaction, and defective alcohol metabolism can't protect against opium (or cigarette) smoking. Unfortunately for birch:
"We have shown that Native Americans, who have a high rate of alcoholism, do not have these protective genes. The one that is particularly effective is a mutation of the gene for the enzyme aldehyde dehydrogenase, which plays a major role in metabolizing alcohol. The mutation is found very frequently in Chinese and Japanese populations but is less common among other Asian groups, including Koreans, the Malayo-Polynesian group, and others native to the Pacific Rim. We've also looked at Euro-Americans, Native Americans, and Eskimos, and they don't have that gene mutation," says Li.
Amerindians also apparently have the world's highest frequencies of the DRD2 A1 allele, suggesting birch might be closer to the mark if he decided to blame his failure on his Amerindian ancestors.

"Genomics" and intra-European variation

Guessedworker claims:
As for any attempt to reify Germans over Slavs, Englishmen over Irishmen, Nordics over Alpines and Mediterraneans, those are screamingly obviously on the “Idealist” or “thought” side of the equation. But they are falsified by the genomic components on the “empirical” or “experience” side.
I assume GW's assertion reflects a misinterpretation of studies such as this one, which involve principal components analysis of European SNP genotype data. I understand GW to mean he somehow believes these studies indicate no two European populations differ genetically in any systematic way which could lend support to "intra-European supremacist" arguments. Naturally, GW is wrong.

Granting that "superior" and "inferior" are subjective judgments rather than scientific universals, essentially any demonstration of a population's genetic distinctiveness can be seen to support both preservationist and "supremacist" arguments. Studies (from Cavalli-Sforza's work on "classical" markers to the recent analyses of 500k+ SNP microarray datasets) have repeatedly demonstrated sub-European genetic distinctiveness (particularly along a N/S or NW/SE axis).

SNP/PCA studies can (and do) demonstrate distinctiveness. They can't (and haven't) proven the absence of intra-European differences in genes influencing IQ and personality, for example -- if for no other reason than that no one has so far been able to use SNP genotypes to explain much variation in phenotypes like IQ. (In addition, 2-dimensional PCA plots typically leave plenty of variation unaccounted for, so -- even if we limit ourselves to considering common SNP variants -- samples which have identical values on the first two PCs might turn out to vary in some important way.)

Even the largest commonly-used SNP microarrays capture only a small fraction of human genetic variation, and definitive answers on many issues will await complete sequencing of large numbers of genomes.

In the meantime:

Compared to southern Euros, NW Europeans are demonstrably "superior" at digesting lactose as adults (92% LP in Utah vs. 11% in S. Italy), and -- though this may shock GW -- have demonstrably higher frequencies of alleles associated with light pigmentation.

Recent and ongoing (and probably accelerating) human evolution is a reality. "Genomically", Southern Europeans are more similar to Ashkenazi Jews than to Northern Europeans. Strangely, AJs and SEs don't have identical average IQ scores and personalities, and one doubts the finding would lull a Jewish supremacist into calling for a merger between AJs and SEs. "Small" genetic differences may have large phenotypic effects.

Even if, say, the English and Sicilians sprang from identical pools of ancestors 15,000 or 10,000 or even 5,000 years ago (and the genetic evidence says this was not the case), there's been plenty of time for differences to accumulate and plenty of reason to believe they have. See, e.g., Gregory Clark (thanks TGGP for that particular link). I find it hard to imagine radical differences in culture between Eastern and Western Europe (or, to a lesser extent, between England and Ireland) haven't engendered (and/or been engendered by) some degree of genetic differentiation. Again, even if you could show large German and Polish samples plot identically on a 2-d PCA chart (they don't), you would not have demonstrated genetic identity between them.

Misc. links

Y DNA and surnames in Britain:
Dr King’s research showed that between two men who share the same surname there is a 24% chance of sharing a common ancestor through that name but that this increases to nearly 50% if the surname they have is rare.

The limits of mtDNA phylogeography: complex patterns of population history in a highly structured Iberian lizard are only revealed by the use of nuclear markers.

Admixture as the basis for genetic mapping.

Noah Webster's 250th birthday:
His dictionary, and earlier spellers and readers widely used in schools, would help a new nation achieve unity and cultural independence at a time when most were focused on political freedom.

"He was the shaper of our language and the shaper of American identity," said Joshua Kendall, who is working on a biography about Webster. "Webster at last bonded us through our language." [. . .]

Webster was later astounded when he heard all the languages spoken by the Continental Army.

"The language of the new nation was up for grabs," Kendall said. "Webster said we're going to speak American English."

[Wikipedia: Noah Webster was born on October 16, 1758, in the West Division of Hartford, Connecticut, to a family who had lived in Connecticut since colonial days. His father, Noah, Sr. (1722-1813), was a farmer and a sower. His father was a descendant of Connecticut Governor John Webster; his mother, Mercy (née Steele; d. 1794), was a descendant of Governor William Bradford of Plymouth Colony. Noah had two brothers, Abraham (1751-1831) and Charles (b. 1762), and two sisters, Mercy (1749-1820) and Jerusha (1756-1831).]