Northern European population structure

As mentioned elsewhere by Tuuli Lappalainen, a new analysis of Northern European genome-wide SNP data is available today at PLoS One ("Genome-Wide Analysis of Single Nucleotide Polymorphisms Uncovers Population Structure in Northern Europe"). The results are broadly consistent with those of previous studies. British, German, and American samples are most similar to each other. Eastern Finns are most distinctive.
After genotyping on Affymetrix 250K Sty SNP arrays (see Methods and Table S1 for success rates and quality criteria), the data from 1003 European individuals were first compared without prior population assignment in the analyses of pairwise identities by state (IBS) and calculations with the Structure software. In multidimensional scaling of the IBS distances, there were four clusters: Eastern Finns, Western Finns, Swedes, and a group including the Germans, British and CEU (from now on called "Central Europeans"; Fig. 2a,b, Fig. S1a). [. . .] The Structure analysis (Fig. 3, Fig. S2a,b) found most support for three or four clusters, one dominated by the Eastern Finns, one by the Swedes, and one by the Central Europeans; increasing the number of clusters did not bring out further differences. When only the Finnish samples were analysed with Structure, they formed two clusters (Fig. S2c), consisting of the Eastern and Western Finns, with only 1.8% of the samples associating more strongly to the cluster not corresponding to their geographic origin (data not shown). A Structure analysis of the three Central European populations combined found only one cluster.

[. . .]

Quantile-quantile plots of pairwise allele frequency differences (Fig. 5) and FST calculations (Table 1) showed a pattern of the largest differences being between Eastern Finland versus Great Britain, Germany and Sweden (FST = 0.0072–0.0094) and the smallest between the British and Germans (FST = 0.0005).
Incidentally, a different study reports an FST of 0.00054 between samples from north-east and southern Germany [1], suggesting the British and north Germans may be slightly more similar than some German subpopulations are to each other (although the data may not be strictly comparable).
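For a feel of how small these FST values are, here is a minimal sketch using the simplest textbook definition for a single biallelic SNP in two equally weighted populations, FST = (HT - HS) / HT. This is not necessarily the estimator behind Table 1, and the allele frequencies are made up purely for illustration.

```python
def fst_two_pops(p1: float, p2: float) -> float:
    """Textbook FST for one biallelic locus and two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    """
    p_bar = (p1 + p2) / 2
    # Expected heterozygosity if the two populations were pooled
    h_total = 2 * p_bar * (1 - p_bar)
    # Mean expected heterozygosity within the populations
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    if h_total == 0:
        return 0.0  # locus is monomorphic in both populations
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies, not taken from the paper
print(f"2-point frequency gap: FST = {fst_two_pops(0.50, 0.52):.4f}")
print(f"9-point frequency gap: FST = {fst_two_pops(0.50, 0.59):.4f}")
```

On this simplified definition, a frequency gap of about two percentage points at a common SNP already gives an FST near the British-German figure, while a gap of about nine points lands in the Eastern Finnish range.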

The new paper touches on a couple items I've mentioned recently:
The differences between populations detected with FST and other measures accounted for such a small proportion of the total genetic variation that large numbers of SNPs are needed to observe them, once again illustrating how most of the human genetic variation is found between individuals instead of populations [39]. Even small differences between populations can be interesting regarding population history, but elucidating their phenotypic significance will require further studies.

The MDS plot of the European populations showed a pattern of population differences that was consistent with our other analyses and earlier observations of a greater degree of differentiation in the geographical extremes of Europe [3], [5], [7], [9]–[11]. Our German, British and CEU samples formed a single cluster, possibly due to a lack of neighbouring reference populations, and contrary to studies with a more comprehensive sampling from Central Europe [7], [9]. The Swedes showed a wider spreading than the other populations, but this was supported neither by diversity calculations nor by a more detailed comparison of the IBS and MDS distance matrices (results not shown). Thus, the differential spread was at least partly an artefact of the MDS, where the representation in a few dimensions likely fails to capture all aspects of complex data. Thus, as visually attractive as the MDS plots are, they must be interpreted with caution and, if sample sizes allow, be accompanied with analyses based on allele frequencies.
And:
In fact, the differences between Eastern and Western Finns were of the same magnitude as the differences between Swedes and British, and much stronger than those between British and Germans. Thus, relevant units of genetic variation often do not correspond to preconceived political, linguistic or even cultural borders.

[1] Steffens et al. SNP-Based Analysis of Genetic Substructure in the German Population. Hum Hered. 2006;62(1):20-9. Epub 2006 Sep 21.

DNA analysis and small-scale admixture events

This paper ("Calculating expected DNA remnants from ancient founding events in human population genetics"; provisional PDF) concludes, based on a series of computer simulations:
while genetic data may be sensitive and powerful in large genetic studies, caution must be used when applying genetic information to small, recent admixture events. For some parameter sets, genetic data will not be adequate to detect historic admixture. In such cases, studies should consider anthropologic, archeological, and linguistic data where possible.
The authors point out that:
While genetic studies can provide considerable information, they are also accompanied by variation and stochasticity. Because of these limitations, even the most complete studies of human populations have been called “not unequivocal”[21] or “sobering”[22] by those conducting the research. Recent reports have also addressed the limited depth of current genetic studies[23], indicating that most studies make conclusions after sequencing less than 1% of subjects’ genomes, and sampling only small numbers of a population. Such methods can be especially problematic when dealing with historic admixture events that are very small. The difficulty is a function of the current architecture of genetic studies: researchers sample loci from a group of individuals and categorize individuals into groups based on which alleles they have at the loci tested[24, 25]. These categorizations are determined based on the most prevalent or probable genetic markers in an individual’s genome. The results of these studies, then, can overlook genetic markers that simply are not sampled, which is common in small admixture events. Additionally, stochastic events can lead to allele fixation and further complicate matters, particularly in small populations. It has been suggested that studies of even the largest migrations should couple genetic information with archeological, anthropological, and linguistic data[26].
The simulation results aren't too surprising:
The sizes of the migrant and native populations are fundamental for an understanding of expected allele frequency. With time since admixture as low as those we consider in our simulations, the most important factors are the sizes of the migrating and native populations. In our simulations, if the native population is large, changing the migrating population size results in a change of mean final allele frequency from .0243 to .0010. If the native population is small, those numbers change to .5016 and .0407. These are the most significant differences illustrated by our simulations and they attest to the important role of population sizes. Researchers should not expect to find many alleles from a small migratory group of 50 individuals in a large population today, even if sampling methods are exhaustive.

Sample size matters. Or, to kick a (hopefully) dead horse, why DNAprint is crap:
The average final allele frequency of the migrant allele in our population from the second simulation was 1.017%. We calculated the cumulative density function (CDF) for a genetic study that samples 50 loci for each individual and where the probability of detecting the migrant allele is equal to the probability found in our simulations. The CDF demonstrates that in 60% of individuals sequenced for 50 loci, we would not expect to find a single migrant allele (Figure 7a). Furthermore, we will only find more than one migrant allele in 9% of the subjects examined.

In the case of a large study with as many as 933 loci, based upon the expected migrant allele frequency of 1.017%, almost every subject would demonstrate at least one migrant allele (Figure 7b). In fact, most subjects would demonstrate more than 9 migrant alleles. However, while large studies would expect to succeed in finding more migrant alleles in today’s population, this alone cannot link the admixed population to the migrant population. The migrant alleles will still only represent, on average, 1% of every allele sequenced in the entire study. Therefore, although 9 migrant alleles may, on average, be found in each subject, it is hard to know if the migrant alleles will be redundant among loci and subjects or spread evenly throughout all the loci in the study. Additionally, these numbers could be considerably lower depending on the allele frequency in the migrating population.
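The 60% and 9% figures can be reproduced with a plain binomial model. This is a sketch under my own assumptions (one allele scored per locus, loci independent, each with a 1.017% chance of carrying the migrant allele), not the authors' code.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


P_MIGRANT = 0.01017  # mean final migrant allele frequency quoted above

# Small study: 50 loci per individual
p_zero = binom_pmf(0, 50, P_MIGRANT)
p_one = binom_pmf(1, 50, P_MIGRANT)
p_more_than_one = 1 - p_zero - p_one

# Large study: 933 loci per individual
expected_933 = 933 * P_MIGRANT

print(f"50 loci, no migrant allele:        {p_zero:.3f}")
print(f"50 loci, more than one allele:     {p_more_than_one:.3f}")
print(f"933 loci, expected migrant alleles: {expected_933:.2f}")
```

Under these assumptions, about 60% of individuals show no migrant allele at 50 loci and about 9% show more than one, while the expected count at 933 loci is roughly 9.5.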
And:
As time increases, genetic drift causes the spread of final allele frequencies to increase, particularly when the population sizes are small. Thus, as the time since the admixture event increases, sample size for both loci and subjects becomes increasingly important.

In our second simulation, most of the migrant alleles are present in less than 2% of the population. In a study of a population where few subjects from many human populations are studied, alleles from a small-scale admixture will usually not be recovered at all. And these rare alleles could easily be ignored in favor of haplotypes that better categorize the population into clusters.
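The point about drift widening the spread of final allele frequencies is easy to see in a toy Wright-Fisher simulation. The sketch below is not the paper's simulation: the starting frequency, population sizes, generation counts and replicate count are all hypothetical values chosen for illustration.

```python
import random
from statistics import pstdev


def final_frequency(p0, n_diploid, generations, rng):
    """Neutral Wright-Fisher drift: resample 2N gene copies each generation."""
    copies = 2 * n_diploid
    p = p0
    for _ in range(generations):
        count = sum(1 for _ in range(copies) if rng.random() < p)
        p = count / copies
    return p


def spread(p0, n_diploid, generations, replicates=100, seed=1):
    """Standard deviation of the final allele frequency across replicate runs."""
    rng = random.Random(seed)
    finals = [
        final_frequency(p0, n_diploid, generations, rng)
        for _ in range(replicates)
    ]
    return pstdev(finals)


P0 = 0.10  # hypothetical starting frequency of a migrant allele

short_small = spread(P0, n_diploid=100, generations=5)
long_small = spread(P0, n_diploid=100, generations=50)
long_large = spread(P0, n_diploid=500, generations=50)

print(f"N=100,  5 generations: sd = {short_small:.3f}")
print(f"N=100, 50 generations: sd = {long_small:.3f}")
print(f"N=500, 50 generations: sd = {long_large:.3f}")
```

The spread should come out widest for the small population after many generations, which is the combination the authors warn about.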
The authors conclude:
DNA data have been touted as a panacea for recovering information about the past, but their use depends so extensively on factors that are beyond our control that its applicability is not always appropriate. It is imperative, therefore, that researchers understand the implications of the variables we have presented and not rely solely on DNA sequence data when researching small, recent human migrations.

We can only hope to understand basic details of population history when quantifying genetic data and even valid results derived from genetic data may still be misleading if viewed unilaterally, as demonstrated by Harpending et al.[46, 47]

Our results, however, are not completely ominous. Carefully designed studies should be able to draw specific and valid conclusions from genetic data. One area for major improvement is the number of individuals and loci sampled. Our results indicate that a large sample size and large number of loci are needed to obtain robust results. Studies that are unable to sample sufficiently do not have the power to draw appropriate conclusions and should be interpreted with caution.

[. . .]

The random nature of admixed genetic data seen in these simulations demonstrates that the utility of genetic data is dependent on the context of each individual study. Increasing the number of loci and the number of individuals sampled will increase the probability of detecting small traces of signal, but other sources of evidence should always be considered where possible.

DRD2*A1 and obesity

Frequencies of the DRD2*A1 allele are about twice as high in Africans and Asians as in Europeans.

The AP reports today:
WASHINGTON - Drink a milkshake and the pleasure center in your brain gets a hit of happy — unless you're overweight. It sounds counterintuitive. But scientists who watched young women savor milkshakes inside a brain scanner concluded that when the brain doesn't sense enough gratification from food, people may overeat to compensate.
[. . .]
A healthy diet and plenty of exercise are the main factors in whether someone is overweight. But scientists have long known that genetics also play a major role in obesity — and one big culprit is thought to be dopamine, the brain chemical that's key to sensing pleasure.
[. . .]
Yet that brain region was far less active in overweight people than in lean people, and in those who carry that A1 gene variant, the researchers reported. Moreover, women with that gene version were more likely to gain weight over the coming year.

It's a small study with few gene carriers, and thus must be verified, Volkow stressed.

Still, it could have important implications. Volkow, who heads NIH's National Institute of Drug Abuse, notes that "dopamine is not just about pleasure." It also plays a role in conditioning — dopamine levels affect drug addiction — and the ability to control impulses.

She wonders if instead of overeating to compensate for the lack of pleasure — Stice's conclusion — the study really might show that these people with malfunctioning dopamine in fact eat because they're impulsive.
The study covered in the above article:
Relation Between Obesity and Blunted Striatal Response to Food Is Moderated by TaqIA A1 Allele
E. Stice, S. Spoor, C. Bohon, and D. M. Small (17 October 2008)
Science 322 (5900), 449. [DOI: 10.1126/science.1161550]
Individuals whose reward centers of the brain respond sluggishly after eating prefer calorie-dense foods, which may account for their greater propensity to gain weight.
Podcast interview with Eric Stice.

Alcoholism, genetics, and race

Part-time GNXP houseboy / full-time mestizo failfuck "birch barlow" attributes his problems with alcohol and crack cocaine to "Anglo-Celtic" genes, while he raves about the quality genetics of small brown Asian women. It seems the self-proclaimed "cogelite" never bothered to actually check what genetic research suggests about relative group propensities for addictive behavior. So we're going to do it for him.

SNPedia lists several SNPs for which associations with alcoholism have been claimed, probably the most-studied of which is rs1800497:
rs1800497, a SNP also known as the TaqIA (or Taq1A) polymorphism of the dopamine D2 receptor DRD2 gene (even though it is actually located over 10,000bp downstream of the gene), gives rise to the DRD2*A1 allele. This allele (rs1800497(T)) is associated with a reduced number of dopamine binding sites in the brain [PMID 9672901], and has been postulated to play a role in alcoholism, smoking, and certain neuropsychiatric disorders.

The reduced number of dopamine binding sites may play a role in nicotine addiction by causing an "understimulated" state that can be relieved by smoking (and/or use of other drugs). [PMID 8873216]
The HapMap Phase III data indicates the following frequencies of the T/A (increased addiction risk) allele:
44% MEX (Mexican ancestry in Los Angeles, California)
44% CHB (Han Chinese in Beijing, China)
42% CHD (Chinese in Metropolitan Denver, Colorado)
41% YRI (Yoruba in Ibadan, Nigeria)
40% ASW (African ancestry in Southwest USA)
40% JPT (Japanese in Tokyo, Japan)
28% GIH (Gujarati Indians in Houston, Texas)
21% CEU (Utah residents with Northern and Western European ancestry)
18% TSI (Toscans in Italy)

Not a good start for birch's imaginary mixed-race spawn. Let's look at another SNP:
rs1076560 is located in intron 6 of the dopamine receptor D2 gene.

In one study of Japanese males, rs1076560(A) alleles were 1.3 fold more associated with Alcoholism than the rs1076560(C) alleles. [PMID 17196743]

The DRD2 risk allele A was more prevalent in the alcoholic patients than in the healthy controls. These data identify rs1076560 as a potentially important variable in the development of alcoholism.
HapMap Phase III population frequencies for the A allele:
11% YRI (Yoruban)
12% TSI (Toscan)
14% ASW (Afram)
14% CEU (NW Euro)
27% GIH (Asian Indian)
36% MEX (Mexican)
40% JPT (Japanese)
45% CHB (Chinese)
46% CHD (Chinese)

Still not looking good. Next SNP:
The rs1799971(G) allele in exon 1 of the mu opioid receptor gene causes the normal amino acid at residue 40, asparagine, to be replaced by aspartic acid.

Carriers of at least one rs1799971(G) allele appear to have stronger cravings for alcohol than carriers of two rs1799971(A) alleles, and are thus hypothesized to be at higher risk for alcoholism. [PMID 17207095]

Allele frequencies for rs1799971 (G):
4% ASW (Afram)
16% CEU (NW Euro)
18% TSI (Toscan)
20% MEX (Mexican)
33% CHB (Chinese)
42% GIH (Asian Indian)
46% JPT (Japanese)
47% CHD (Chinese)
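Since the claimed effect applies to carriers of at least one G allele, the allele frequencies above understate how many people are affected. A rough conversion, assuming Hardy-Weinberg proportions (my assumption, not something the HapMap figures guarantee), is carrier fraction = 1 - (1 - q)^2:

```python
# rs1799971(G) allele frequencies as listed above
g_allele_freq = {
    "ASW": 0.04,
    "CEU": 0.16,
    "TSI": 0.18,
    "MEX": 0.20,
    "CHB": 0.33,
    "GIH": 0.42,
    "JPT": 0.46,
    "CHD": 0.47,
}


def carrier_fraction(q: float) -> float:
    """Fraction carrying at least one copy, assuming Hardy-Weinberg proportions."""
    return 1 - (1 - q) ** 2


for pop, q in g_allele_freq.items():
    print(f"{pop}: allele {q:.0%}, carriers ~{carrier_fraction(q):.0%}")
```

On that assumption, roughly 29% of CEU would carry at least one G allele, against roughly 71% of JPT.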

By this point, birch should perhaps be thankful he's incapable of holding down a job long enough to buy that priced-to-sell Cambodian bride he has his eye on. Moving on:

Findings showed that SNP rs2232165 of the GHS-R1A gene was associated with heavy alcohol consumption (and therefore presumably alcohol dependence). SNP rs2948694 of the same gene as well as haplotypes of both the pro-ghrelin and the GHS-R1A genes were associated with an increased body mass in individuals consuming heavy amounts of alcohol.

Allele frequencies for rs2232165 (T):
8.3% YRI (Yoruban)
2.5% CEU (NW Euro)
0% CHB (Chinese)
0% JPT (Japanese)

Allele frequencies for rs2948694 (G):
33.8% JPT (Japanese)
32.4% CHB (Chinese)
9.6% YRI (Yoruban)
7.4% CEU (NW Euro)

But wait:
rs671 is a classic SNP, well known in a sense through the phenomena known as the "alcohol flush", also known as the "Asian Flush" or "Asian blush", in which certain individuals, often of Asian descent, have their face, neck and sometimes shoulders turn red after drinking alcohol.[PMID 6582480]

The rs671(A) allele of the ALDH2 gene is the culprit, in that it encodes a form of the aldehyde dehydrogenase 2 protein that is defective at metabolizing alcohol. This allele is known as the ALDH*2 form, and individuals possessing either one or two copies of it show alcohol-related sensitivity responses including facial flushing, and severe hangovers (and hence they are usually not regular drinkers). Perhaps not surprisingly they appear to suffer less from alcoholism and alcohol-related liver disease. [PMID 511165, PMID 16046871]
So much for "high executive function" saving Asians from the bottle. But only around a third of East Asians have genotypes associated with the flush reaction, and defective alcohol metabolism can't protect against opium (or cigarette) smoking. Unfortunately for birch:
"We have shown that Native Americans, who have a high rate of alcoholism, do not have these protective genes. The one that is particularly effective is a mutation of the gene for the enzyme aldehyde dehydrogenase, which plays a major role in metabolizing alcohol. The mutation is found very frequently in Chinese and Japanese populations but is less common among other Asian groups, including Koreans, the Malayo-Polynesian group, and others native to the Pacific Rim. "We've also looked at Euro-Americans, Native Americans, and Eskimos, and they don't have that gene mutation," says Li.
Amerindians also apparently have the world's highest frequencies of the DRD2 A1 allele, suggesting birch might be closer to the mark if he decided to blame his failure on his Amerindian ancestors.

"Genomics" and intra-European variation

Guessedworker claims:
As for any attempt to reify Germans over Slavs, Englishmen over Irishmen, Nordics over Alpines and Mediterraneans, those are screamingly obviously on the “Idealist” or “thought” side of the equation. But they are falsified by the genomic components on the “empirical” or “experience” side.
I assume GW's assertion reflects a misinterpretation of studies such as this one, which involve principal components analysis of European SNP genotype data. I understand GW to mean he somehow believes these studies indicate no two European populations differ genetically in any systematic way which could lend support to "intra-European supremacist" arguments. Naturally, GW is wrong.

Granting that "superior" and "inferior" are subjective judgments rather than scientific universals, essentially any demonstration of a population's genetic distinctiveness can be seen to support both preservationist and "supremacist" arguments. Studies (from Cavalli-Sforza's work on "classical" markers to the recent analyses of 500k+ SNP microarray datasets) have repeatedly demonstrated sub-European genetic distinctiveness (particularly along a N/S or NW/SE axis).

SNP/PCA studies can (and do) demonstrate distinctiveness. They can't (and haven't) proven the absence of intra-European differences in genes influencing IQ and personality, for example -- if for no other reason than that no one has so far been able to use SNP genotypes to explain much variation in phenotypes like IQ. (In addition, 2-dimensional PCA plots typically leave plenty of variation unaccounted for, so -- even if we limit ourselves to considering common SNP variants -- samples which have identical values on the first two PCs might turn out to vary in some important way.)

Even the largest commonly-used SNP microarrays capture only a small fraction of human genetic variation, and definitive answers on many issues will await complete sequencing of large numbers of genomes.

In the meantime:

Compared to southern Euros, NW Europeans are demonstrably "superior" at digesting lactose as adults (92% LP in Utah vs. 11% in S. Italy), and -- though this may shock GW -- have demonstrably higher frequencies of alleles associated with light pigmentation.

Recent and ongoing (and probably accelerating) human evolution is a reality. "Genomically", Southern Europeans are more similar to Ashkenazi Jews than to Northern Europeans. Strangely, AJs and SEs don't have identical average IQ scores and personalities, and one doubts the finding would lull a Jewish supremacist into calling for a merger between AJs and SEs. "Small" genetic differences may have large phenotypic effects.

Even if, say, the English and Sicilians sprang from identical pools of ancestors 15,000 or 10,000 or even 5,000 years ago (and the genetic evidence says this was not the case), there's been plenty of time for differences to accumulate and plenty of reason to believe they have. See, e.g., Gregory Clark (thanks TGGP for that particular link). I find it hard to imagine radical differences in culture between Eastern and Western Europe (or, to a lesser extent, between England and Ireland) haven't engendered (and/or been engendered by) some degree of genetic differentiation. Again, even if you could show large German and Polish samples plot identically on a 2-d PCA chart (they don't), you would not have demonstrated genetic identity between them.

Misc. links

Y DNA and surnames in Britain:
Dr King’s research showed that between two men who share the same surname there is a 24% chance of sharing a common ancestor through that name but that this increases to nearly 50% if the surname they have is rare.

The limits of mtDNA phylogeography: complex patterns of population history in a highly structured Iberian lizard are only revealed by the use of nuclear markers.

Admixture as the basis for genetic mapping.

Noah Webster's 250th birthday:
His dictionary, and earlier spellers and readers widely used in schools, would help a new nation achieve unity and cultural independence at a time when most were focused on political freedom.

"He was the shaper of our language and the shaper of American identity," said Joshua Kendall, who is working on a biography about Webster. "Webster at last bonded us through our language." [. . .]

Webster was later astounded when he heard all the languages spoken by the Continental Army.

"The language of the new nation was up for grabs," Kendall said. "Webster said we're going to speak American English."

[Wikipedia: Noah Webster was born on October 16, 1758, in the West Division of Hartford, Connecticut, to a family who had lived in Connecticut since colonial days. His father, Noah, Sr. (1722-1813), was a farmer and a sower. His father was a descendant of Connecticut Governor John Webster; his mother, Mercy (née Steele; d. 1794), was a descendant of Governor William Bradford of Plymouth Colony. Noah had two brothers, Abraham (1751-1831) and Charles (b. 1762), and two sisters, Mercy (1749-1820) and Jerusha (1756-1831).]

Measuring differentiation among populations at different levels of genetic integration

This (provisional PDF) looks interesting. The not-unexpected conclusion seems to be that deltas between populations are larger when considering multi-locus genotypes (as opposed to population gene frequencies).
This new approach to the analysis of genetic differentiation among populations demonstrates that the consideration of gene associations within populations adds a new quality to studies on population differentiation that is overlooked when viewing only gene-pools.

Northern Euros whiter than Southern Euros

More ASHG 2008:
Frequency distribution and selection in 4 pigmentation genes in Europe. M. P. Donnelly, W. C. Speed, J. R. Kidd, A. J. Pakstis, K. K. Kidd Dept Genetics, Yale Univ Sch Med.

Pigmentation is one of the more obvious forms of variation in humans, particularly in Europeans where one sees more within group variation in hair and eye pigmentation than in the rest of the world. We studied 4 genes (SLC24A5, SLC45A2, OCA2 and MC1R) that are believed to contribute to the pigment phenotypes in Europeans. SLC24A5 has a single functional variant that leads to lighter skin pigmentation. Data on 83 populations worldwide (including 55 from our lab) show the variant (at rs1426654) has almost reached fixation in Europe, Southwest Asia, and North Africa, has moderate to high frequencies (.2-.9) throughout Central Asia, and has frequencies of .1-.3 in East and South Africa. The variant is essentially absent elsewhere. SLC45A2 also has a single functional variant (at rs16891982) associated with light skin pigmentation in Europe. Data on 84 populations worldwide show the light skin allele is nearly fixed in Northern Europe but has lower frequencies in Southern Europe, the Middle East and Northern Africa. In Central Asia the frequency of the SLC45A2 variant declines more quickly than the SLC24A5 variant. It is absent in both East and South Africa. In OCA2 we typed 4 SNPs (rs4778138, rs4778241, rs7495174, rs12913832) with a haplotype associated with blue eyes in Europeans. This haplotype shows a Southeastern to Northwestern pattern in Europe with frequencies of .25 (.05 homozygous) in the Adygei to .85 (.75 homozygous) in the Danes. In MC1R we typed 5 SNPs (rs3212345, rs3212357, rs3212363, C_25958294_10, rs7191944) that cover the entire MC1R gene and found a predominantly European haplotype that ranges in frequency from .35 to .65 in Europe, reaching its highest levels in Southwest Asia and Northwestern Europe. Extended Haplotype Heterozygosity (EHH) and normalized Haplosimilarity (nHS) show evidence of selection at SLC24A5 in not only our European and Southwest Asian populations but also our East African populations. 
Neither SLC45A2 nor OCA2 showed evidence of selection in either test. MC1R did not show evidence of selection for our European specific haplotype but we did see some evidence both upstream and downstream in our nHS test in Europe.

Genetic differentiation in the UK

An ASHG 2008 poster abstract on the People of the British Isles project:
The need for a well characterised UK Control Population. B. Winney, A. Boumertit, R. Bowden, D. Davison, S. Day, E. Echeta, I. Evseeva, K. Nicodemus, S. Tonks, X. Yang, P. Donnelly, W. Bodmer Dept. Clinical Pharmacology, University of Oxford, Oxford, OX3 7DQ, UK.

Until the recent advent of Whole Genome Association studies (WGAs), there were problems replicating significant associations between gene variation and complex diseases in studies that were generally underpowered. Population structure was widely considered to be the most significant reason. A powerful approach to this problem may be to characterise genetically both the cases and controls. Individuals from the controls can then be chosen to match the cases so as to minimise the stochastic differences between the two populations. Such a well-characterised control population would complement the current generation of WGAs. Importantly, the samples would be a resource that could be key to the search for rare variants that can be associated with disease susceptibility. We are assembling a UK control population as a resource for future studies. It will comprise 3,500 samples (3,200 collected so far), which will have been carefully selected from throughout the UK. Rural regions are targeted to avoid the admixture observed in large urban environments and volunteers are sought who were born in the same place as their parents and grandparents to ensure historical integrity. The collection will be genotyped for around 3,000 markers, with the aim of identifying about 200 ancestrally informative markers, which will then be used to match controls to cases. DNA from the samples will then be made available as a resource for future studies. An initial pilot project on about 400-500 samples, using a variety of markers, indicates that this approach is valid. MC1R data suggest structure differentiating the Celtic Fringe from Eastern England, whilst NRY data show evidence of Norse incursions into Orkney. Preliminary analyses of a larger pilot project, comprising about 700 samples and 400 markers, including HLA, provide further signals of population structure when all the samples are combined. 
There is also evidence of differentiation between some pairs of populations and simple admixture analyses suggest that there is an east-west gradient of Anglo-Saxon ancestry across England.

Going by the website, the project is now up to 3453 samples collected (out of 3500 sought). Looking forward to further results. Should shut some people up.

Admixture analysis: 23andMe vs. deCODEme

Customers of deCODE and 23andMe may have noticed that deCODE seems to overestimate non-European ancestry in European-descended people. A 23andMe employee explains:
Both companies will walk along each of this person's chromosomes, and will tend to find that each stretch is found in all three reference populations, but is most likely from Europe. This would be expressed by Decode as a high chance, maybe 80-90%, that the stretch comes from Europe, and a smaller chance, maybe more like 5-10% apiece, that the stretch comes from Africa or Asia. Adding up all the stretches, you'll tend to get 80-90% European ancestry, and 5-10% African and Asian ancestry, each, for the Northern European -- this is consistent with the Decode chromosomal ancestry analyses of Northern Europeans that I've seen. This is a reasonable way to show the data.

The reasoning behind 23andMe's Ancestry Painting is, while it's true that each stretch is found all over the world, we know or are willing to assume that the stretch can only have come from /one/ population, and it chooses the most likely population. For our example Northern European, for just about every stretch along their genome, the single most likely origin will almost always be European, and would be expressed in final ancestry proportion estimates of about 100% European, and about 0% African and Asian.

This is how Ancestry Painting and Decode can end up with different total ancestry estimates for the same person.
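The difference between the two summaries, as the quote describes them, can be sketched in a few lines. The segment lengths and per-segment probabilities below are invented for illustration, and neither function is either company's actual algorithm.

```python
# Hypothetical chromosome stretches for a Northern European:
# (length in Mb, probability that the stretch comes from each reference population)
segments = [
    (100, {"European": 0.85, "African": 0.08, "Asian": 0.07}),
    (150, {"European": 0.90, "African": 0.05, "Asian": 0.05}),
    (50, {"European": 0.80, "African": 0.10, "Asian": 0.10}),
]


def soft_totals(segs):
    """Add up the per-stretch probabilities, weighted by stretch length."""
    total_len = sum(length for length, _ in segs)
    totals = {}
    for length, probs in segs:
        for pop, prob in probs.items():
            totals[pop] = totals.get(pop, 0.0) + length * prob
    return {pop: value / total_len for pop, value in totals.items()}


def hard_totals(segs):
    """Assign each whole stretch to its single most likely population."""
    total_len = sum(length for length, _ in segs)
    totals = {pop: 0.0 for _, probs in segs for pop in probs}
    for length, probs in segs:
        best = max(probs, key=probs.get)
        totals[best] += length
    return {pop: value / total_len for pop, value in totals.items()}


soft = soft_totals(segments)
hard = hard_totals(segments)
print("summed probabilities:", {pop: round(v, 3) for pop, v in soft.items()})
print("most likely origin:  ", {pop: round(v, 3) for pop, v in hard.items()})
```

With these made-up inputs the summed-probability approach reports about 87% European and 6-7% each African and Asian, while the most-likely-origin approach reports 100% European from exactly the same stretches.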

So, even leaving aside the question of data quality, deCODE's claims regarding James Watson's ancestry are revealed to be a hoax. Watson's alleged complement of 16% "black genes" was compared to a figure (originating with Kari Stefansson, I believe) of "no more than 1%" for "most people of European descent" -- a number clearly not arrived at using deCODE's platform. In fact, deCODE's technique can be expected to give most if not all Europeans implausibly high readings of non-European ancestry.

SNPs better than STRs for inferring population structure

Another ASHG 2008 abstract:
Inferring Human Population Structure: STR or SNP? S. Xu1, L. Jin1,2 1) Chinese Academy of Sciences and Max Planck Society (CAS-MPG) Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China; 2) Ministry of Education (MOE) Key Laboratory of Contemporary Anthropology and Center for Evolutionary Biology, School of Life Sciences and Institutes of Biomedical Sciences, Fudan University, Shanghai 200433, China.

Both microsatellites (STRs) and single nucleotide polymorphisms (SNPs) have played important roles in inferring population structure. With the availability of genome-wide STR and SNP data for the same collection of world-wide human population samples (HGDP-CEPH panel), we now have the opportunity to compare the usefulness of the two types of data in inferring population structure. We selected the same set of 940 unrelated HGDP individuals in which both 783 STRs and 650,000 SNPs were genotyped, and performed both classical (phylogenetic and principal component) analysis and STRUCTURE analysis. We found for all analyses, with the same allele number, SNP data perform better and generate more reasonable results than STR data. Notably, a) SNP data offer superior clustering of individuals and populations; b) the phylogenetic tree reconstructed using SNP data is consistent better with the geographical distribution of populations; c) SNP data reveals fine structures and reasonable admixture pattern in STRUCTURE analysis for both full samples and subset of samples, but STR can not.
Yes, Rienzi, if it wasn't already obvious, thousands of SNPs beat 32 STR loci. Why would anyone "agonize" over an inconsequential result?

Breeding sex ratio higher in Europe than sub-Saharan Africa

Another blow to Peter Frost's goofy-ass theories.
One interpretation of these results is that there is strong evidence for an unequal female and male Ne in at least three of our six populations, with estimates of the breeding sex ratio (i.e., the effective size of females to males) ranging from 2.1 in the San to 12.5 in the Basque.

[Hammer MF, Mendez FL, Cox MP, Woerner AE, Wall JD (2008) Sex-Biased Evolutionary Forces Shape Genomic Patterns of Human Diversity. PLoS Genet 4(9): e1000202. doi:10.1371/journal.pgen.1000202]
Figure 2 shows that Nx/Na (the ratio of X-chromosomal to autosomal effective population size) is higher in Basques than in any of the sampled African populations (Mandenka, Biaka, and San).
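
For intuition about how a breeding sex ratio relates to Nx/Na, here is a minimal sketch based on the textbook effective-size formulas for unequal numbers of breeding females (Nf) and males (Nm): Na = 4*Nf*Nm/(Nf+Nm) for autosomes and Nx = 9*Nf*Nm/(4*Nm+2*Nf) for the X chromosome. This is an idealized model (no selection, constant population size) and not the estimation procedure used by Hammer et al.; the function name is made up for illustration, and the values it returns are model expectations, not figures reported in the paper.

```python
def nx_over_na(breeding_sex_ratio: float) -> float:
    """Expected Nx/Na for a ratio b = Nf/Nm of breeding females to males.

    Follows from the standard formulas
        Na = 4*Nf*Nm / (Nf + Nm)
        Nx = 9*Nf*Nm / (4*Nm + 2*Nf)
    which give Nx/Na = 9*(1 + b) / (8*(2 + b)).
    """
    if breeding_sex_ratio <= 0:
        raise ValueError("breeding sex ratio must be positive")
    b = breeding_sex_ratio
    return 9.0 * (1.0 + b) / (8.0 * (2.0 + b))


if __name__ == "__main__":
    # 2.1 and 12.5 are the breeding sex ratios quoted above for San and Basque.
    for label, b in [("equal sexes", 1.0), ("San", 2.1), ("Basque", 12.5)]:
        print(f"{label}: b = {b}, model Nx/Na = {nx_over_na(b):.3f}")
```

Under this model the ratio is 0.75 when equal numbers of males and females breed, and it approaches 9/8 as the excess of breeding females grows.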

"Americans" are not homogeneous

Via Dienekes, another ASHG 2008 abstract. This research seems to reproduce earlier findings of separate, distinct clusters among "European Americans":
Ethnicity-Confirmed Genetic Structure in New Hampshire.

Genetic population structure is known to result from shared ancestry. Though there have been several studies of genetic structure within and among different geographic regions and ethnic groups, little is known of the genetic structure of highly admixed US populations or whether the structure is concordant with self-reported ancestry. In this study, 1529 single nucleotide polymorphisms (SNPs) from 864 healthy control individuals from New Hampshire were measured as part of a bladder cancer epidemiology study. The SNPs were from approximately 500 cancer susceptibility genes scattered throughout the genome. Of these, 960 Tag SNPs were used to cluster individuals using the Structure algorithm for between 2 and 5 subpopulations. Subtle genetic structure was found, suggesting the appropriate number of subpopulations to be either 4 or 5 (FSTs 4 populations: 0.0377, 0.0399, 0.0363, 0.0340; 5 populations: 0.0452, 0.0536, 0.0585, 0.0534, 0.0521). We coded the individuals' self-reported ancestries in a genotype fashion (i.e. 0= not reporting that ancestry, 1= reporting part that ancestry, 2= reporting only that ancestry) and conducted a Spearman's rank correlation between each ancestry and the structure q value, which represents the proportion of an individual that originated from a certain genetic subpopulation. Those of Russian, Polish and Lithuanian ancestry most consistently clustered together. The ancestry results support either 4 or 5 subpopulations. In order to investigate linkage disequilibrium (LD), the complete set of SNPs from the 7 most densely genotyped genes were used to make haploview plots between the different groups. The results vary by gene, though for one gene in particular, GHR, the results are very different for 4 subpopulations.
These results suggest that despite New Hampshire's admixture and presumed homogeneity, there are 4 or 5 distinct genetic subgroups within the population that can be linked to self-reported ancestry and display differences in patterns of LD.
Previous research has identified at least three clusters among "European Americans", corresponding to Northern and Central European, Southern European, and Ashkenazi Jewish ancestry groups. I'm curious exactly which 4 or 5 subgroups have been identified here. From the context, it's not immediately clear whether the Russian/Polish/Lithuanian cluster consists of ethnic E. Euros or Ashkenazi Jews.
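
The abstract's test for concordance, a Spearman rank correlation between a 0/1/2 ancestry code and a Structure q value, can be sketched as follows. This is a minimal pure-Python illustration, not the authors' code: the helper names and the eight data points are invented, and a real analysis would more likely use a statistics library.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


# Invented example: ancestry code (0 = not reported, 1 = part, 2 = only)
# and each person's membership proportion q in one Structure cluster.
ancestry_code = [0, 0, 1, 1, 2, 2, 0, 1]
cluster_q = [0.05, 0.10, 0.40, 0.35, 0.80, 0.90, 0.20, 0.30]
rho = spearman_rho(ancestry_code, cluster_q)
print(f"Spearman rho = {rho:.3f}")
```

Because the ancestry code takes only three values, ties are unavoidable, which is why the ranks are averaged rather than relying on the shortcut formula that assumes distinct ranks.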

Greater genetic diversity in S. Europe due in part to African ancestry

Recent papers using genome-wide SNP data have indicated higher levels of haplotype diversity in Southern vs. Northern Europe. The authors have typically attributed this finding to south to north migrations within Europe. While such migrations no doubt must have occurred (at some remote date), it seems strange to me to ignore the subsequent extra-European gene flow into Southern Europe which also undoubtedly occurred. An ASHG 2008 abstract, after repeating the intra-European migration theory, includes some additional observations:
Interestingly, we find that within Europe there is a south-to-north gradient with decreasing levels of haplotype diversity moving north, consistent with south to north migrations. We also find that the southwestern European sample has higher haplotype diversity than the southeastern European sample. Additionally, a higher proportion of haplotypes are shared between the southwestern European sample and the Yoruba sample than between southeastern European sample and the Yoruba sample. These two patterns are consistent with recent admixture across the Mediterranean from Northern Africa.

(Inference of human demographic parameters using haplotype patterns from genome-wide SNP data.)
Increased genetic affinity between SW Euros and Yorubans is consistent with mtDNA data indicating sub-Saharan admixture in Iberia (Spain and particularly Portugal) at levels higher than typically observed elsewhere in Europe. My guess is a similar analysis would show an increased similarity between SE Euros and NE Africans -- as well as between SE Euros and Middle Easterners, obviously.

Digit ratio not a proxy for prenatal androgen exposure?

Selective Breeding for a Behavioral Trait Changes Digit Ratio:
The ratio of the length of the second digit (index finger) divided by the fourth digit (ring finger) tends to be lower in men than in women. This 2D:4D digit ratio is often used as a proxy for prenatal androgen exposure in studies of human health and behavior. For example, 2D:4D ratio is lower (i.e. more “masculinized”) in both men and women of greater physical fitness and/or sporting ability. Lab mice have also shown variation in 2D:4D as a function of uterine environment, and mouse digit ratios seem also to correlate with behavioral traits, including daily activity levels. Selective breeding for increased rates of voluntary exercise (wheel running) in four lines of mice has caused correlated increases in aerobic exercise capacity, circulating corticosterone level, and predatory aggression. Here, we show that this selection regime has also increased 2D:4D. This apparent “feminization” in mice is opposite to the relationship seen between 2D:4D and physical fitness in human beings. The present results are difficult to reconcile with the notion that 2D:4D is an effective proxy for prenatal androgen exposure; instead, it may more accurately reflect effects of glucocorticoids, or other factors that regulate any of many genes.

Yan RHY, Malisch JL, Hannon RM, Hurd PL, Garland T Jr (2008) Selective Breeding for a Behavioral Trait Changes Digit Ratio. PLoS ONE 3(9): e3216. doi:10.1371/journal.pone.0003216

State-level differences in personality

Update (9/14/08): Vanishing American comments; full text (pdf, final version).

The surprising results:
New York is home to the most neurotic and unfriendly people in America while North Dakota is where the nicest people live, according to a Cambridge University "personality map" of the USA.

[. . .]

Researchers created the first ever map of its kind, which is based on the results of a six-year online survey of 620,000 people.

They claim it reveals how certain types of people are more likely to live and flourish in different parts of the country and shows links between personality traits and social phenomena, like crime rates.

[. . .]

The report, "The Geography Of Personality; A Theory of the Emergence, Persistence and Expression of Geographic Variation in Basic Traits" is published in the journal, Perspectives On Psychological Science.

Key findings:

EXTRAVERSION
Personality traits: Sociable, energetic and enthusiastic

Highest-scoring states: North Dakota, Wisconsin, District of Columbia, Nebraska, Minnesota, Georgia, South Dakota, Utah, Illinois, Florida.

Lowest-scoring states: Vermont, Washington, Alaska, New Hampshire, Maryland, Idaho, Virginia, Oregon, Montana, Massachusetts.

AGREEABLENESS
Personality traits: Warm, compassionate, co-operative and friendly.

Highest-scoring states: North Dakota, Minnesota, Mississippi, Utah, Wisconsin, Tennessee, North Carolina, Georgia, Oklahoma, Nebraska.

Lowest-scoring states: New York, Nevada, Wyoming, District of Columbia, Alaska, Maine, Rhode Island, Virginia, Connecticut, Montana.

CONSCIENTIOUSNESS
Personality traits: Dutiful, responsible, self-disciplined.

Highest-scoring states: New Mexico, North Carolina, Georgia, Utah, Kansas, Oklahoma, Nebraska, Florida, Arizona, Missouri.

Lowest-scoring states: Wyoming, Rhode Island, Hawaii, Maine, Alaska, Connecticut, New Jersey, New Hampshire, Massachusetts, New York.

NEUROTICISM
Personality traits: Anxious, stressful and impulsive.

Highest-scoring states: West Virginia, Rhode Island, New York, Mississippi, New Jersey, Pennsylvania, Kentucky, Louisiana, Ohio, Arkansas.

Lowest-scoring states: Alaska, Oregon, South Dakota, Colorado, Utah, Washington, Arizona, Nebraska, North Dakota, Nevada.

OPENNESS
Personality traits: Curious, intellectual, creative.

Highest-scoring states: District of Columbia, New York, Oregon, Massachusetts, Washington, California, Vermont, Colorado, Nevada, Maryland.

Lowest-scoring states: Wisconsin, Alabama, Alaska, Wyoming, North Dakota, Hawaii, Kentucky, Nebraska, Iowa, Delaware.


Manuscript (pdf). Abstract.

Body type and perceived personality

The perceived associations of the participants in this study are reminiscent of--though not identical to--the associations argued for by Sheldon.
Through the use of many photographs and measurements of nude figures (mainly Ivy League students), Sheldon assigned people into three categories of body types in the 1940s: endomorphic, mesomorphic, and ectomorphic. He also assigned personality traits to the body types as well. Endomorphics had fat, soft, and round body types, and their personality was described as relaxed, fond of eating, and sociable. Mesomorphics were muscular, rectangular, strong, and personality-wise were filled with energy, courage, and assertive tendencies. Ectomorphics were thin, long, fragile, as well as brainy, artistic, and introverted; they would think about life, rather than consuming it or acting on it.
This study looks at perceptions of six different body forms (three different body types, with and without abdominal obesity):
Physical characters were associated with the appropriate body forms as expected. The physical traits strong, rough and tough and physically aggressive were associated with the muscular non-obese [M−] figure. Lethargic was associated with F+. Disease prone was significantly associated with L− [lean, without central obesity] on the one hand and F+ [feminine, with central obesity] on the other indicating that people negatively associate both the extremes with health. The trait swift was also strongly associated with L−. The traits that are not obviously physical were also strongly associated with certain body forms. Brave, conscious about looks, influential, dominating, status conscious, modern and confident were associated with M−; physical risk avoider, money minded, political, rich, stupid, selfish and greedy were associated most strongly with F+; friendly, intelligent, methodical, business risk avoider, successful, loving, kind, and honest were associated with F− [feminine without central obesity]; and L− was the commonest choice for swift, physical risk avoider, talkative and the trait depressed was associated with L+ [table 1].

The abstract:
PLoS ONE 3(9): e3187. doi:10.1371/journal.pone.0003187

Obesity as a Perceived Social Signal

Manasee Mankar et al.

Fat accumulation has been classically considered as a means of energy storage. Obese people are theorized as metabolically ‘thrifty’, saving energy during times of food abundance. However, recent research has highlighted many neuro-behavioral and social aspects of obesity, with a suggestion that obesity, abdominal obesity in particular, may have evolved as a social signal. We tested here whether body proportions, and abdominal obesity in particular, are perceived as signals revealing personality traits. Faceless drawings of three male body forms namely lean, muscular and feminine, each with and without abdominal obesity were shown in a randomized order to a group of 222 respondents. A list of 30 different adjectives or short descriptions of personality traits was given to each respondent and they were asked to allocate the most appropriate figure to each of them independently. The traits included those directly related to physique, those related to nature, attitude and moral character and also those related to social status. For 29 out of the 30 adjectives people consistently attributed specific body forms. Based on common choices, the 30 traits could be clustered into distinct ‘personalities’ which were strongly associated with particular body forms. A centrally obese figure was perceived as “lethargic, greedy, political, money-minded, selfish and rich”. The results show that body proportions are perceived to reflect personality traits and this raises the possibility that in addition to energy storage, social selection may have played some role in shaping the biology of obesity.

23 and Me v2: new markers / major price reduction

I make no endorsement, but this may be of interest to some.
23andMe Democratizes Personal Genomics With New Analytical Platform

23andMe is proud to announce another step toward our goal of democratizing genetic information by giving as many people as much information as possible about their DNA.

With the introduction of v2, our next-generation analytical platform, 23andMe customers will have access to an even more powerful set of the SNPs we use to probe their unique genetic composition. And thanks to advances by Illumina, the provider of our genetic analysis technology, that information will now be available at the reduced price of $399. By making genetic data more affordable and accessible, we hope this development will spur the evolution of personal genomics as a potent force not just in science but also in medicine and everyday life.

Sexual imprinting in humans

BBC article:
Page last updated at 07:05 GMT, Wednesday, 3 September 2008 08:05 UK

Women pick men who look like dad

Women tend to choose husbands who look like their fathers, a study shows.

And it works both ways - the women in the Proceedings B study also resembled their partner's mother.

The latest work from the University of Pécs in Hungary provides yet more evidence for the phenomenon, known as sexual imprinting.

[. . .]

They found significant correlations between the young men and their fathers-in-law, especially on facial proportions belonging to the central area of face - nose and eyes.

Women also showed resemblance to their mothers-in-law in the facial characteristics of their lower face - lips and jaw.

Lead researcher Tamas Bereczkei said: "Our results support the sexual imprinting hypothesis which states that children shape a mental template of their opposite-sex parents and search for a partner who resembles that perceptual schema."

[. . .]

Experts say there may be an advantage to selecting a mate somewhat similar to themselves genetically.

Dr Lynda Boothroyd from the University of Durham, a psychologist who has carried out similar research, said: "There is an argument that a certain degree of similarity makes people more fertile and genetically compatible."
The abstract:
Facialmetric similarities mediate mate choice: sexual imprinting on opposite-sex parents
DOI 10.1098/rspb.2008.1021
Online Date Tuesday, September 02, 2008

Tamas Bereczkei, Gabor Hegedus, Gabor Hajnal

Former studies have suggested that imprinting-like processes influence the shaping of human mate preferences. In this study, we provide more direct evidence for assessing facial resemblance between subjects' partner and subjects' parents. Fourteen facial proportions were measured on 312 adults belonging to 52 families, and the correlations between family members were compared with those of pairs randomly selected from the population. Spouses proved to be assortatively mated in the majority of measured facial proportions. Significant correlations have been found between the young men and their partner's father (but not his mother), especially on facial proportions belonging to the central area of the face. Women also showed resemblance to their partner's mother (but not to their father) in the facial characteristics of their lower face. Replicating our previous studies, facial photographs of participants were also matched by independent judges who ascribed higher resemblance between partners, and subjects and their partners' opposite-sex parents, compared with controls. Our results support the sexual imprinting hypothesis which states that children shape a mental template of their opposite-sex parents and search for a partner who resembles that perceptual schema. The fact that only the facial metrics of opposite-sex parents showed resemblance to the partner's face tends to rule out the role of familiarity in shaping mating preferences. Our findings also reject several other rival hypotheses. The adaptive value of imprinting-related human mating is discussed, and a hypothesis is made of why different facial areas are involved in males' and females' search for resemblance.

Keywords
facial resemblance, sexual imprinting, homogamy

"Race" and European genetic substructure

The New Scientist article on the work of Novembre et al. asserts the following (my emphasis):
By reading single-letter DNA differences in the genomes of thousands of Europeans, researchers can tell a Finn from a Dane and a German from a Brit. In fact a visual genetic map mirrors the geopolitical map of the continent, right down to Italy's boot.

"It tells us that geography matters," says John Novembre, a population geneticist at the University of California, Los Angeles, who led one of the studies. Despite language, immigration and intermarriage, genetic differences between Europeans are almost entirely related to where they were born.

This, however, does not mean that the citizens of each European nation represent miniature races. "The genetic diversity in Europe is very low. There isn't really much," says Manfred Kayser, a geneticist at Erasmus University Rotterdam in the Netherlands, who led the other study.

The question of how genetic diversity in Europe relates to national borders is an empirical one. Finns probably constitute a local "race" or subrace (or two) distinct from other Europeans, while -- at the national level -- the Swiss apparently do not.

But I strongly take issue with the suggestion that "low" levels of genetic diversity are of no taxonomic significance. The paper under discussion (and many before it) pretty clearly demonstrates the opposite. One can imagine plenty of forensic, genealogical, preservationist, and other applications that may benefit from knowledge of these "small" genetic differences (and those which remain to be discovered). Precisely how one chooses to (sub-)classify Europeans will depend on one's objectives and the available statistical tools and data, but the existence of substructure is not in question. Choosing to call the resulting sub-European taxonomic units "miniature races", "subraces", "local types", or so on, is merely a matter of semantics.

Some relevant discussion of the race concept, from Anthropology A to Z [1]:
But race is, from the biological point of view, not a static, but a dynamic condition. Within the constantly changing movement of "life" it represents a breeding unit continuously modifying itself, by infinitesimal degrees, through mutations. This mobile condition was most aptly formulated by the American geneticist Dobzhansky, who said: "Race is a process." His statement at last fits the concept of race meaningfully into the history of life, and at the same time makes race understandable as the smallest ever-changing taxonomic unit (genetic population) by which we can interpret the total course of organic evolution.

To identify a race, it is theoretically sufficient to note the prevalence of a new gene and the consequent characteristic trait or traits that occur predominantly in a particular genetic population and distinguish it from neighboring populations. But the one-trait basis of differentiation, considering the vast numbers of genes in mammals alone, would lead to an unjustifiable multiplicity of taxonomic units such as species and subspecies. Therefore, Fischer's early postulate -- of the gene groups in man which unite to produce certain characteristic patterns of traits, thus permitting a clear differentiation of the various genetic populations of a species within a large framework -- has been used as a basis for the determination and classification of races. [. . .]

There is some disagreement concerning the numbers of races, even though the races are always differentiated in accordance with the same basic principles of genetics. The differences in the estimated numbers result mostly from differences in emphasis. Some writers tend to base their estimate on the number of observable regional genetic populations; others are concerned with more general considerations. Of course, the more local races that are subsumed under related super-regional entities, or "major races" (i.e., Caucasoid, Mongoloid, Congoid), the smaller the number of single traits within each characteristic combination that can be taken into consideration. [n/a: Conversely, as we become technically able to measure very large numbers of traits (e.g., hundreds of thousands of SNPs using gene chips), we can refine classifications.] At the same time, authors will differ in assigning local groups to one or the other major race, especially in the contact zones. These differences in taxonomy and in point of view do not militate, however, against the validity of the definition of race itself; they arise solely from the fact that various writers evaluate racial traits differently and assign a different significance to them. Such conceptual differences also underline the fact that in talk about races we are dealing with developing life, with processes not easily subsumed under the necessarily rigid schemes which our need for methodical classification demands.

[pp. 4-5]

A race is a group of individuals who belong to the same reproductive community, and are characterized by the possession of certain genes that differ from those of corresponding populations of the same species. The original gene pool of a population is steadily, but very slowly, enriched through mutations, of which only a limited number will survive selection pressures over the long term. For the accumulation of favorable (partially new) genes within a population to an extent that might become characteristic, restriction of propagative activity for a certain time is necessary, so that the gene flow between it and neighboring genetic populations is stopped. This process is called isolation. Its result depends largely on the duration and the severity of the mating restrictions -- how long the gene flow in and out of the propagation circle is actually stopped. This leads us to a prerequisite for the formation of a race. It is not possible for several new races to develop sympatrically -- that is, from the same original population and simultaneously in the same geographic space. The formation of a race can occur only allopatrically -- that is, in separate living areas, or isolates. The original population, just before splitting into daughter populations which in turn become isolated, was essentially uniform in the racial sense, and the daughter populations have come out of the same gene pool. Differentiation starts with the actual isolation. If differentiation continues for a certain span of time and a corresponding number of generations, the phylogenetic direction is established.

The isolation of populations is normally due to geographic barriers -- seas, deserts, dry prairies, obstructive forests or tropical-rain-forest zones, edge locations (peninsulas or continental tips).

[pp. 82-83]

Thus the foundations were laid for what seems almost self-evident to us today; namely, that races are groups of individuals with similar gene compositions, which were built from mutant alleles that combined in various ways and became differentiated in their geographic regions, but which may be disintegrated in the process of cross-breeding. Difference in race does not constitute a barrier to the production of offspring. It became possible to make the process of race formation increasingly understandable in terms of cause and effect. Genes go through mutation; then, through the mechanism of natural selection, they are either increased in frequency, becoming part of combinations in characteristic concentrations, or else they fail to perpetuate themselves.

[pp. 116-117]


[1] Heberer, Gerhard, Carleton S. Coon, Edward E. Hunt, Gottfried Kurth, and Ilse Schwidetzky-Roesing. 1963. Anthropology A to Z: Based on the work of Gerhard Heberer, Gottfried Kurth, and Ilse Schwidetzky-Roesing. New York: Grosset & Dunlap.