race/history/evolution notes: 2012

Mismodeling Indo-European Origins: The Assault On Historical Linguistics

Reply to some self-important dork whining about this talk at Dienekes':

"You simply cannot criticize a new, rapidly-evolving and improving model just based on its trivial, known shortcomings. Such a thing is ludicrous and paints a truly bad picture of the talk presenters."

I'm afraid your effeminate idea of proper protocol has no bearing on actual science. Gray and Atkinson's "innovation" is insisting that Bayesian phylogenetics with limited and sometimes questionable inputs of data can produce highly accurate and precise readouts of linguistic history that supersede all previous linguistic and archaeological knowledge. Their results may dazzle twits like you and appeal to those who find their results politically or ethnically congenial. But the first question a serious person would ask is how closely Gray and Atkinson's attempts at reconstruction recapitulate recent/known linguistic history. That they frequently fail to do so is extremely germane to the question of how much faith one should put in their deeper reconstructions.

Statistical models are not magic. Bayesian tree building is not magic. Even with large corpuses of genetic data, the "most likely" tree is often overwhelmingly likely to be wrong. For genetics, where there's an explosion of data with comparatively few human analysts and little or no historical context, such results are useful, being often the best we have until additional data and further refinements of models appear. On the other hand, in linguistics, where on the PIE question relatively many human analysts have been poring over a comparatively limited corpus for many decades, it's up to Gray and Atkinson to demonstrate they have something useful to contribute. Every indication says they do not.

Frank Salter on multiculturalism

Multiculturalism in the life of a society 7

23andMe price drop

For those interested: "23andMe Raises More Than $50 Million in New Financing / Company Sets Growth Goal Of One Million Customers, Reduces Price to $99 from $299"

The GenoChip: A New Tool for Genetic Anthropology

Preprint at arXiv:

The Genographic Project is an international effort using genetic data to chart human migratory history. The project is non-profit and non-medical, and through its Legacy Fund supports locally led efforts to preserve indigenous and traditional cultures. In its second phase, the project is focusing on markers from across the entire genome to obtain a more complete understanding of human genetic variation. Although many commercial arrays exist for genome-wide SNP genotyping, they were designed for medical genetic studies and contain medically related markers that are not appropriate for global population genetic studies. GenoChip, the Genographic Project's new genotyping array, was designed to resolve these issues and enable higher-resolution research into outstanding questions in genetic anthropology. We developed novel methods to identify AIMs and genomic regions that may be enriched with alleles shared with ancestral hominins. Overall, we collected and ascertained AIMs from over 450 populations. Containing an unprecedented number of Y-chromosomal and mtDNA SNPs and over 130,000 SNPs from the autosomes and X-chromosome, the chip was carefully vetted to avoid inclusion of medically relevant markers. The GenoChip results were successfully validated. To demonstrate its capabilities, we compared the FST distributions of GenoChip SNPs to those of two commercial arrays for three continental populations. While all arrays yielded similarly shaped (inverse J) FST distributions, the GenoChip autosomal and X-chromosomal distributions had the highest mean FST, attesting to its ability to discern subpopulations. The GenoChip is a dedicated genotyping platform for genetic anthropology and promises to be the most powerful tool available for assessing population structure and migration history.
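As a reminder of what the FST comparison in the abstract measures, here is a minimal sketch of Wright's FST for a single biallelic SNP, computed from subpopulation allele frequencies. It assumes equally weighted subpopulations; the function name and the example frequencies are hypothetical and are not taken from the preprint, whose exact estimator isn't stated here.

```python
def wright_fst(freqs: list[float]) -> float:
    """Wright's FST = (H_T - H_S) / H_T for one biallelic SNP.

    freqs holds the frequency of one allele in each subpopulation;
    subpopulations are weighted equally (a simplifying assumption).
    """
    if not freqs:
        raise ValueError("need at least one subpopulation frequency")
    p_bar = sum(freqs) / len(freqs)
    # Expected heterozygosity of the pooled population.
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # monomorphic SNP carries no structure signal
    # Mean expected heterozygosity within subpopulations.
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    return (h_t - h_s) / h_t


# Hypothetical frequencies in three continental populations.
print(round(wright_fst([0.1, 0.5, 0.9]), 3))  # strongly differentiated SNP
print(round(wright_fst([0.3, 0.3, 0.3]), 3))  # no differentiation
```

A panel with a higher mean FST, as claimed for the GenoChip, is one enriched for SNPs of the first kind.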

Let's be clear: the "most powerful tool available for assessing population structure and migration history" is whole genome sequencing. The Genographic Project, which represents a large fraction of the global spending on its type of population genetics research, unnecessarily hobbled itself from the outset in hopes of pre-emptively appeasing rent-seeking shrill self-appointed advocates for "indigenous peoples". I don't think Spencer Wells and company thought they were giving up much, since the short-sighted original plan was to examine only uniparental markers. In that light, perhaps we can be thankful that they've come up with a way of sidestepping the restrictions they placed on themselves and generating at least some useful autosomal data.

Several steps were taken to ensure that the genetic results would not be exploited for pharmaceutical, medical, and biotechnology purposes. First, participant samples were maintained in a completely anonymous status during GenoChip analysis. Second, no phenotypic or medical data were collected from the participants. Third, we included only SNPs in noncoding regions without any known functional association, as reported in dbSNP build 132. Lastly, we filtered our SNP collection against a 1.5 million SNP data set containing all variants that have potential, known, or suspected associations with diseases.

But however they'd like to spin it, there's nothing ideal about ignoring "functional" variation or limiting the number of SNPs tested. Razib has a bizarre post up at his Discover blog in which he confuses SNP ascertainment and "Ancestry Informative Marker" ascertainment, and I see that the authors of the paper themselves appear to be eliding the distinction. But the overwhelming majority of the "450 populations" from which "AIMs" were "ascertained" for the GenoChip had merely been typed on existing microarrays -- which goes no way towards addressing the issue the Affymetrix Human Origins array was designed to solve (putting together SNP panels with known ascertainment, starting by sequencing individuals from multiple populations). Ultimately, the most useful and complete picture of human genetic history will come from whole genome sequencing, which should be cheap enough within a few years for use by the Genographic Project. The question is whether they have permanently handicapped themselves from applying the actual best tool for their stated mission, or whether we will eventually see at least some whole genome data for their 75,000 indigenous samples (no doubt with at minimum coding regions redacted).

Protective buttressing of the human fist and the evolution of hominin hands

FIGHTING SHAPED HUMAN HANDS. Protective buttressing of the human fist and the evolution of hominin hands

The derived proportions of the human hand may provide supportive buttressing that protects the hand from injury when striking with a fist. Flexion of digits 2–5 results in buttressing of the pads of the distal phalanges against the central palm and the palmar pads of the proximal phalanges. Additionally, adduction of the thenar eminence to abut the dorsal surface of the distal phalanges of digits 2 and 3 locks these digits into a solid configuration that may allow a transfer of energy through the thenar eminence to the wrist. To test the hypothesis of a performance advantage, we measured: (1) the forces and rate of change of acceleration (jerk) from maximum effort strikes of subjects striking with a fist and an open hand; (2) the static stiffness of the second metacarpo-phalangeal (MCP) joint in buttressed and unbuttressed fist postures; and (3) static force transfer from digits 2 and 3 to digit 1 also in buttressed and unbuttressed fist postures. We found that peak forces, force impulses and peak jerk did not differ between the closed fist and open palm strikes. However, the structure of the human fist provides buttressing that increases the stiffness of the second MCP joint by fourfold and, as a result of force transfer through the thenar eminence, more than doubles the ability of the proximal phalanges to transmit ‘punching’ force. Thus, the proportions of the human hand provide a performance advantage when striking with a fist. We propose that the derived proportions of hominin hands reflect, in part, sexual selection to improve fighting performance.

Human hands have 'evolved for fighting'

Compared with apes, humans have shorter palms and fingers and longer, stronger flexible thumbs.
Experts have long assumed these features evolved to help our ancestors make and use tools.
But new evidence from the US suggests it was not just dexterity that shaped the human hand, but violence also.
Hands largely evolved through natural selection to form a punching fist, it is claimed.
''The role aggression has played in our evolution has not been adequately appreciated,'' said Professor David Carrier, from the University of Utah.
''There are people who do not like this idea but it is clear that compared with other mammals, great apes are a relatively aggressive group with lots of fighting and violence, and that includes us. We're the poster children for violence.'' [. . .]
''Individuals who could strike with a clenched fist could hit harder without injuring themselves, so they were better able to fight for mates and thus be more likely to reproduce,'' he said. [. . .]
To test the theory Prof Carrier conducted experiments with volunteers aged 22 to 50 who had boxing or martial arts experience.
In one, participants were asked to hit a punchbag as hard as possible from different directions with their hands in a range of shapes, from open palms to closed fists.
The results, published in the Journal of Experimental Biology, show that tightly clenched fists are much more efficient weapons than open or loosely curled hands.
A punch delivers up to three times more force to the same amount of surface area as a slap. And the buttressing provided by a clenched fist increases the stiffness of the knuckles fourfold, while doubling the ability of the fingers to deliver a punching force. [. . .]
''Human-like hand proportions appear in the fossil record at the same time our ancestors started walking upright four million to five million years ago. An alternative possible explanation is that we stood up on two legs and evolved these hand proportions to beat each other.''
Manual dexterity could have evolved without the fingers and palms getting shorter, he said. But he added: ''There is only one way you can have a buttressed, clenched fist: the palms and fingers got shorter at the same time the thumb got longer.''
Prof Carrier cited other evidence pointing to the role of fighting in the evolution of human hands.
:: No ape other than humans hits with a clenched fist.
:: Humans use fists instinctively as threat displays. ''If you are angry, the reflexive response is to form a fist,'' said Prof Carrier. ''If you want to intimidate somebody, you wave your fist.''
:: Sexual dimorphism, or the difference in body size between the sexes, tends to be greater among primates when there is more competition between males. In humans the difference is mainly in the upper body and arms, especially the hands. ''It's consistent with the hand being a weapon,'' said Prof Carrier.
In their paper the professor and colleague Michael Morgan, a University of Utah medical student, ponder on the paradoxical nature of the human hand.
''It is arguably our most important anatomical weapon, used to threaten, beat and sometimes kill to resolve conflict. Yet it is also the part of our musculoskeletal system that crafts and uses delicate tools, plays musical instruments, produces art, conveys complex intentions and emotions, and nurtures,'' they write.
''More than any other part of our anatomy, the hand represents the identity of Homo sapiens. Ultimately, the evolutionary significance of the human hand may lie in its remarkable ability to serve two seemingly incompatible but intrinsically human functions.''

War of words: The language paradox explained

New Scientist article (free copy) by Mark Pagel (via Jason Malloy's bookmarks). Some mostly worthwhile paragraphs precede the requisite pollyannaish-on-globalism denouement.

This highlights an intriguing paradox at the heart of human communication. If language evolved to allow us to exchange information, how come most people cannot understand what most other people are saying? This perennial question was famously addressed in the Old Testament story of the Tower of Babel, which tells of how humans developed the conceit that they could use their shared language to cooperate in the building of a tower that would take them to heaven. God, angered at this attempt to usurp his power, destroyed the tower and to ensure it would not be rebuilt he scattered the people and confused them by giving them different languages. The myth leads to the amusing irony that our separate languages exist to prevent us from communicating. The surprise is that this might not be far from the truth. [. . .]
Of course that still leaves the question of why people would want to form into so many distinct groups. For the myriad biological species in the tropics, there are advantages to being different because it allows each to adapt to its own ecological niche. But humans all occupy the same niche, and splitting into distinct cultural and linguistic groups actually brings disadvantages, such as slowing the movement of ideas, technologies and people. It also makes societies more vulnerable to risks and plain bad luck. So why not have one large group with a shared language?
An answer to this question is emerging with the realisation that human history has been characterised by continual battles. Ever since our ancestors walked out of Africa, beginning around 60,000 years ago, people have been in conflict over territory and resources. In my book Wired for Culture (Norton/Penguin, 2012) I describe how, as a consequence, we have acquired a suite of traits that help our own particular group to outcompete the others. Two traits that stand out are "groupishness" - affiliating with people with whom you share a distinct identity - and xenophobia, demonising those outside your group and holding parochial views towards them. In this context, languages act as powerful social anchors of our tribal identity. How we speak is a continual auditory reminder of who we are and, equally as important, who we are not. Anyone who can speak your particular dialect is a walking, talking advertisement for the values and cultural history you share. What's more, where different groups live in close proximity, distinct languages are an effective way to prevent eavesdropping or the loss of important information to a competitor.
In support of this idea, I have found anthropological accounts of tribes deciding to change their language, with immediate effect, for no other reason than to distinguish themselves from neighbouring groups. For example, a group of Selepet speakers in Papua New Guinea changed its word for "no" from bia to bune to be distinct from other Selepet speakers in a nearby village. Another group reversed all its masculine and feminine nouns - the word for he became she, man became woman, mother became father, and so on. One can only sympathise with anyone who had been away hunting for a few days when the changes occurred.
The use of language as identity is not confined to Papua New Guinea. People everywhere use language to monitor who is a member of their "tribe". We have an acute, and sometimes obsessive, awareness of how those around us speak, and we continually adapt language to mark out our particular group from others. In a striking parallel to the Selepet examples, many of the peculiar spellings that differentiate American English from British - such as the tendency to drop the "u" in words like colour - arose almost overnight when Noah Webster produced the first American Dictionary of the English Language at the start of the 19th century. He insisted that: "As an independent nation, our honor [sic] requires us to have a system of our own, in language as well as government."

The Myth of American Meritocracy: How corrupt are Ivy League admissions?

Via Sailer. Note: the article is by Ron Unz, though in this case his numbers appear consistent with my own impressions and previous knowledge.

The Myth of American Meritocracy:

The evidence of the recent NMS semifinalist lists seems the most conclusive of all, given the huge statistical sample sizes involved. As discussed earlier, these students constitute roughly the highest 0.5 percent in academic ability, the top 16,000 high school seniors who should be enrolling at the Ivy League and America’s other most elite academic universities. In California, white Gentile names outnumber Jewish ones by over 8-to-1; in Texas, over 20-to-1; in Florida and Illinois, around 9-to-1. Even in New York, America’s most heavily Jewish state, there are more than two high-ability white Gentile students for every Jewish one. Based on the overall distribution of America’s population, it appears that approximately 65–70 percent of America’s highest ability students are non-Jewish whites, well over ten times the Jewish total of under 6 percent.
Needless to say, these proportions are considerably different from what we actually find among the admitted students at Harvard and its elite peers, which today serve as a direct funnel to the commanding heights of American academics, law, business, and finance. Based on reported statistics, Jews approximately match or even outnumber non-Jewish whites at Harvard and most of the other Ivy League schools, which seems wildly disproportionate. Indeed, the official statistics indicate that non-Jewish whites at Harvard are America’s most under-represented population group, enrolled at a much lower fraction of their national population than blacks or Hispanics, despite having far higher academic test scores. [. . .]
Just as striking as these wildly disproportionate current numbers have been the longer enrollment trends. In the three decades since I graduated Harvard, the presence of white Gentiles has dropped by as much as 70 percent, despite no remotely comparable decline in the relative size or academic performance of that population; meanwhile, the percentage of Jewish students has actually increased. This period certainly saw a very rapid rise in the number of Asian, Hispanic, and foreign students, as well as some increase in blacks. But it seems rather odd that all of these other gains would have come at the expense of whites of Christian background, and none at the expense of Jews.
Furthermore, the Harvard enrollment changes over the last decade have been even more unusual when we compare them to changes in the underlying demographics. Between 2000 and 2011, the relative percentage of college-age blacks enrolled at Harvard dropped by 18 percent, along with declines of 13 percent for Asians and 11 percent for Hispanics, while only whites increased, expanding their relative enrollment by 16 percent. However, this is merely an optical illusion: in fact, the figure for non-Jewish whites slightly declined, while the relative enrollment of Jews increased by over 35 percent, probably reaching the highest level in Harvard’s entire history. Thus, the relative presence of Jews rose sharply while that of all other groups declined, and this occurred during exactly the period when the once-remarkable academic performance of Jewish high school students seemed to suddenly collapse. [. . .]
Each year, the Ivy League colleges enroll almost 10,000 American whites and Asians, of whom over 3000 are Jewish. Meanwhile, each year the NMS Corporation selects and publicly names America’s highest-ability 16,000 graduating seniors; of these, fewer than 1000 are Jewish, while almost 15,000 are non-Jewish whites and Asians. Even if every single one of these high-ability Jewish students applied to and enrolled at the Ivy League—with none going to any of America’s other 3000 colleges—Ivy League admissions officers are obviously still dipping rather deep into the lower reaches of the Jewish ability-pool, instead of easily drawing from some 15,000 other publicly identified candidates of far greater ability but different ethnicity. [. . .]
The situation becomes even stranger when we focus on Harvard, which this year accepted fewer than 6 percent of over 34,000 applicants and whose offers of admission are seldom refused. Each Harvard class includes roughly 400 Jews and 800 Asians and non-Jewish whites; this total represents over 40 percent of America’s highest-ability Jewish students, but merely 5 percent of their equally high-ability non-Jewish peers. It is quite possible that a larger percentage of these top Jewish students apply and decide to attend than similar members from these other groups, but it seems wildly implausible that such causes could account for roughly an eight-fold difference in apparent admissions outcome. Harvard’s stated “holistic” admissions policy explicitly takes into account numerous personal characteristics other than straight academic ability, including sports and musical talent. But it seems very unlikely that any remotely neutral application of these principles could produce admissions results whose ethnic skew differs so widely from the underlying meritocratic ratios.
One datapoint strengthening this suspicion of admissions bias has been the plunge in the number of Harvard’s entering National Merit Scholars, a particularly select ability group, which dropped by almost 40 percent between 2002 and 2011, falling from 396 to 248. This exact period saw a collapse in Jewish academic achievement combined with a sharp rise in Jewish Harvard admissions, which together might easily help to explain Harvard’s strange decline in this important measure of highest student quality. [. . .]
It is important to note that these current rejection rates of top scoring applicants are vastly higher than during the 1950s or 1960s, when Harvard admitted six of every seven such students and Princeton adopted a 1959 policy in which no high scoring applicant could be refused admission without a detailed review by a faculty committee.[78] An obvious indication of Karabel’s obtuseness is that he describes and condemns the anti-meritocratic policies of the past without apparently noticing that they have actually become far worse today. An admissions framework in which academic merit is not the prime consideration may be directly related to the mystery of why Harvard’s ethnic skew differs in such extreme fashion from that of America’s brightest graduating seniors. In fact, Harvard’s apparent preference for academically weak Jewish applicants seems to be reflected in their performance once they arrive on campus.[79]

Racial Intermarriage and Household Production

For the edification of Whiskey and SeanTodRoy (pdf):

ASHG 2012 twitter copy/paste

Charley Farley @charley_farley Very cool PoBI talk from Leslie. I'm one of the red squares - Anglo-Saxon/pre-Roman British admix. Incredible UK structure #ASHG2012

Nick Eriksson @nkeriks Lovely talk by Stephen Leslie about fineSTRUCTURE and the fine structure of UK populations. learned lots about UK history. #ASHG2012

Luke Ward @luke__ward M.Wilson Sayres: Ultralow Y diversity due not only to diff reproductive success of sexes but also selection, both coding+noncoding #ashg2012

Yaniv erlich @erlichya WS: selection on the Y chromosome does not only act on the coding regions. #ASHG2012

Yaniv erlich @erlichya WS: reproductive success difference between male&female is not enough to explain the genetic signal from the Y chromosome #ASHG2012

Yaniv erlich @erlichya I am excited to hear Wilson Sayres talk about the Y chromosome! One of my favorite chromosomes #ASHG2012

23andMe ASHG 2012 poster presentations

Here. From "Genome wide association study of sexual orientation in a large, web-based cohort" (pdf):

Our GWAS results did not identify any genetic loci reaching genomewide significance at p < 5 x 10^-8 among men or women. Among men, the peak (non-significant) hit was in chromosome 8q12.3 (chr8:63532921 in NKAIN3, p = 7.1 x 10^-8).
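For context on the cutoff quoted above: 5e-8 is the conventional genome-wide significance threshold, usually motivated as a Bonferroni correction of a 0.05 error rate over roughly one million independent common-variant tests. The one-million figure is the customary assumption, not something stated in the poster. A quick Python check of how close the reported peak hit came:

```python
# Conventional genome-wide threshold: a 0.05 family-wise error rate spread
# over ~1,000,000 independent tests (an assumed, customary figure).
ALPHA = 0.05
ASSUMED_INDEPENDENT_TESTS = 1_000_000


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff under a Bonferroni correction."""
    return alpha / n_tests


threshold = bonferroni_threshold(ALPHA, ASSUMED_INDEPENDENT_TESTS)
peak_p = 7.1e-8  # peak hit among men, as quoted from the poster

print(f"threshold = {threshold:.1e}")
print(f"peak hit significant: {peak_p < threshold}")
print(f"peak p / threshold = {peak_p / threshold:.2f}")
```

The peak p-value is about 1.4 times the cutoff, so it misses genome-wide significance narrowly, consistent with the poster's "non-significant" label.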

More interesting (if generally unsurprising) are the phenotypic associations:

We examined the correlation between sexual identity and ~1000 phenotypes already characterized in the 23andMe database through other surveys. These analyses are preliminary; we have not checked for outliers or confounders beyond what is listed in the methods. We replicated previous findings showing a positive association between lesbians and alcoholism, and between lesbians and gay men and several psychiatric conditions.

A commenter at the 23andMe blog:

The phenotypic information is interesting if I’m reading it correctly: Gay men are less likely to have played common US sports, and are more likely to cry easily or to have had liposuction. Lesbians are less likely to shave their legs. Surprisingly, gay men are less likely to be atheist or agnostic.

"New" R1a1 SNPs

New Y-chromosome binary markers improve phylogenetic resolution within haplogroup R1a1:

Despite the limited data available for Z280 and Z93, some general inferences can be drawn from the geographic distributions of these two haplogroups. The R1a1-Z280 subclade is a strong candidate for covering the R1a1a* (xM458) in Eastern Europe, which was found in high frequency by Underhill et al. (2010). The tested set of 53 Malaysian Indian samples presented 100% frequency for the R1a1-Z93 subclade, without co-existence Z280 or M458 sub-haplogroups. Inner and Central Asia seem to be the overlap zones for the R1a1-Z280 and R1a1-Z93 chromosomes as both forms were observed at low frequencies. This is again consistent with the observations described for R1a1a* spread in Central Asia and in the Altai region by Underhill et al. (2010). This pattern suggests that the origin of R1a1-M198 arguably occurred somewhere between South Asia and Eastern Europe. Potential candidates could be the Eurasian Steppes (Ukraine – Southern Russia – Kazakhstan – Caucasus) or the Middle East. European populations showed higher M458 and Z280, whereas Asian populations presented higher Z93 frequencies, indicating that the new markers can be effectively used to distinguish between the European and Asian branches of the haplogroup R1a1-M198. [. . .]
The coalescent time calculated by us for R1a1-M458 carriers is consistent with the age calculated by Underhill et al. (2010) in Europe yielding 7.3 KYA versus 7.9 KYA (thousands of years ago). Underhill et al. (2010) also noted the potential association of R1a1-M458 with the Linear Pottery Neolithic culture in the territory of present-day Hungary—this observation is supported by our data. The TMRCA calculated for R1a1-Z280 diversification (10.3 KYA) is approximately in agreement with the estimation of Underhill et al. (2010) for R1a1a*(xM458) chromosomes in Eastern Europe (11 KYA). However, the coalescent age of 10.3 KYA for R1a1-Z93 chromosomes in this study is lower than that of populations of the Indus Valley (14 KYA) for the STR associated diversity of R1a1a*(xM458) chromosomes calculated by Underhill et al. (2010).

Of course, these markers and other markers defining additional layers of structure under M417 have been known for over a year. Budgetary constraints and the magic of peer review combine to render this paper relatively uninformative. One of the authors explains:

I have to agree with all, but those who never tried to push an article through a serious academic journal has no idea how difficult this is. The first version was submitted like 1 year ago, and also contained pedigree rates plus 500+ FTDNA samples from different ethnic groups. But unfortunately the reviewers were so narrow-minded that we had finally to drop all FTDNA samples plus the pedigree calcs.
Personally I also do not consider Zhiv. rate valid, but I had to accept this compromise to get the paper accepted. Anyway, as Lukasz pointed out, the main goal was to introduce Z93 and Z280 into the "academic circles" so in the future we may have a comprehensive paper from a more wealthy lab. The Budapest forensics are not full of money so we had no chance to have more than 12 markers tested and "low-chance SNPs" like Z284 in Hungary. Actually we submitted the first draft before Z283 was established securely on the FTDNA tree so we could not include it later...

My comments from last year on the dna-forums postings of an academic affiliated with Underhill (whose lab brought us Zhivotovsky "evolutionary" mutation rates) stand:

Another poster points out: "Dividing by 3 [to bring the estimate more in line with real mutation rates] gives an age of 3300 years, almost exactly the estimate from Nordtvedt's spreadsheet." Someone else recently estimated the TMRCA for L342.2+ at around 3,600 years. So: if current patterns hold, the bulk of South Asian R1a unambiguously falls within European R1a variation. While I fully expect, when we eventually see results for these markers in large academic samples published, the papers will feature evolutionary mutation rates and less than parsimonious attempts to fit the distribution of M417 sublineages to archaeology, it's pretty clear to me Z93 and L342.2 originated on the Steppe within the past 4000 years or so and spread with Indo-Iranian.

Again: the most straightforward interpretation of the evidence is that Z93 is a relatively young branch of an evidently European lineage. Accurate, unbiased dates using SNPs instead of STRs should be here soon enough, definitively settling this and other issues.
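The "dividing by 3" adjustment quoted above works because, for a fixed amount of observed STR diversity, an estimated TMRCA is inversely proportional to the assumed mutation rate. A rough Python sketch follows; the factor of 3 is taken from the quoted post, the function name is hypothetical, and applying it to the paper's 10.3 KYA figure for Z93 only illustrates the arithmetic rather than re-dating the lineage properly.

```python
def rescale_tmrca(tmrca_years: float, rate_ratio: float) -> float:
    """Rescale an age estimate when the assumed mutation rate is multiplied by rate_ratio.

    For the same observed diversity, age is inversely proportional to rate,
    so a rate three times faster gives an age three times younger.
    """
    if rate_ratio <= 0:
        raise ValueError("rate_ratio must be positive")
    return tmrca_years / rate_ratio


RATE_RATIO = 3.0  # from the quoted "dividing by 3"; an approximation

z93_evolutionary_rate_age = 10_300.0  # years, Z93 coalescent age reported in the paper
print(round(rescale_tmrca(z93_evolutionary_rate_age, RATE_RATIO)))  # prints 3433
```

Under that assumption the rescaled figure falls inside the "past 4000 years or so" window argued for above.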

Moment Magazine's great (Jewish) DNA experiment

23andMe:

In September, Moment Magazine got all nerdy and wrote about their Great DNA Experiment, in which they look at the 23andMe results of 15 notable Americans of Jewish ancestry and make some interesting genetic connections. It’s a good illustration of how our DNA can tell us about our interconnectedness.
The piece shows it’s not “six degrees” that separates these individuals from each other, but, in all but one case, no degrees of separation. This means that these individuals are all directly related to one another, albeit in most cases distantly. This was also news to the 15 participants.
All but one of the individuals has Ashkenazi Jewish ancestry. The one exception is Linda Chavez, the political commentator, who descended from Conversos, individuals of Jewish and Muslim ancestry who converted to Catholicism during the Inquisition. Her ancestors eventually settled in New Mexico. But even in her case, although she isn’t directly related to any of the group, she is connected to each of the other individuals focused on in the piece by just one other individual in the 23andMe database.
The connections shown in the article are what prompted The New York Times columnist David Brooks and NPR “All Things Considered” host Robert Siegel to joke about learning they were distant cousins, sharing a common ancestor several generations back. Brooks kidded that he was “most surprised that our ancestors worked together on National Schetl Radio, on a program called ‘All Pogroms Considered.’” (Maybe the line needs a drum roll to work.)
The article shows the genetic connections between people like Mayim Bialik, the actress on the Big Bang Theory, and Stephen Dubner, co-author of Freakonomics. Or the connections between NPR’s Siegel, and Harvard Law professor, Alan Dershowitz, or his connections to 23andMe’s CEO’s mother Esther Wojcicki, a journalist and teacher. The magazine shows the connections and the amount of shared DNA, measured in centimorgans (cMs), to illustrate the “relatedness” of any two individuals. More closely related individuals share more DNA.

In the full article at the Moment Magazine website we also learn, for example, that Stephen Dubner's "first cousin once removed was Ethel Greenglass, wife of Julius Rosenberg", or that "Esther Wojcicki's maternal-grandparents gave each of the boys among their 13 children a different surname in order to help them avoid conscription in the Russian army".

Our findings are typical of what geneticists would expect from a group of people of mostly Ashkenazi Jewish origin, says 23andMe's Mike Macpherson. Today's Ashkenazi Jews descend from approximately 25,000 ancestors who survived plagues and massacres in the 12th and 13th centuries. Survivors of these "bottlenecks" and other similar occurrences then married one another, sharing their DNA with millions of descendants.

Miscellaneous links

Mangan's is back.

New video depicts human migration across generations

A new video created by Whitehead Institute in collaboration with the genealogical website Geni.com shows the births of millions of people, from the Middle Ages through the early 20th Century, as single dots on a black background. As time advances, those births define the coastlines and countries of Europe and Great Britain, then the Pilgrims’ voyage to the New World, the migration to Australia, the overland expansion of the United States through the Oregon Trail and Gold Rush, and the founding of Johannesburg, South Africa. [. . .]
In the future, Erlich and Daniel MacArthur, a Group Leader in Genetics at Massachusetts General Hospital and the Broad Institute, will be partnering with Geni to delve even deeper into the information submitted to the world’s largest collaborative genealogical website.

Richard III skeleton reveals 'hunchback king'

DNA tests are expected to take 12 weeks. The team will compare samples from the skeletal remains with the DNA of a direct descendant of the king's sister, Michael Ibsen, 55, a Canadian furniture maker who lives in London.

Biology and ideology: The anatomy of politics / From genes to hormone levels, biology may help to shape political behaviour.

Four Species of Homo You’ve Never Heard Of (not a profile of the writers and editors of Counter Currents)

Kennewick Man update

Kennewick Man bones not from Columbia Valley, scientist tells tribes

The skeleton, more than 9,500 years old, has long been at the center of a rift between tribal members and scientists, led by Doug Owsley, a physical anthropologist at the Smithsonian Institution's National Museum of Natural History who spearheaded the legal challenge to gain access to the skeleton for scientific study.
Owsley says study shows that not only wasn't Kennewick Man Indian, he wasn't even from the Columbia Valley, which was inhabited by prehistoric Plateau tribes. [. . .]
Isotopes in the bones told scientists Kennewick Man was a hunter of marine mammals, such as seals, Owsley said. "They are not what you would expect for someone from the Columbia Valley," he said. "You would have to eat salmon 24 hours a day and you would not reach these values.
"This is a man from the coast, not a man from here. I think he is a coastal man." [. . .]
Pressed by Armand Minthorn of the Umatilla Board of Trustees, who asked Owsley directly, "Is Kennewick Man Native American?" Owsley said no. "There is not any clear genetic relationship to Native American peoples," Owsley said. "I do not look at him as Native American ... I can't see any kind of continuity. He is a representative of a very different people."
His skull, Owsley said, was most similar to an Asian Coastal people whose characteristics are shared with people, later, of Polynesian descent.
And, while tribes want the remains returned for reburial, Owsley said there is still much more to learn from the skeleton, which has largely been inaccessible but for two instances, in which a team of about 15 scientists could study it for a total of about two weeks.

Note: my own understanding is that Kennewick Man is broadly similar to other Paleoindians, and that historical Amerindians probably derive most of their ancestry from Paleoindians (with some later Asian gene flow and evolution in a more Mongoloid direction). On the other hand, it appears W. Eurasian-affiliated ancient Central Asians did contribute significantly to the ancestry of Paleoindians (and, to a lesser extent, to the ancestry of modern E. Eurasians in general), which is what I expect most of the heightened affinity between Northern Europeans and Amerindians found by Reich et al. is attributable to.

Testosterone Administration Reduces Lying in Men

From the PLoS ONE article:

Testosterone is known to influence brain development and reproductive physiology but also plays an important role in social behavior [4]–[9]. While most studies have investigated a potential association between testosterone and aggressive behavior, two recent studies suggest that testosterone may also increase prosocial behavior or lead to less selfish behavior in certain situations [6], [9]. We therefore investigate a link between testosterone and self-serving lying. A prominent interpretation of the existing evidence on the role of testosterone in social behavior is that the hormone enhances dominance behavior, i.e., behavior intended to gain high social status [6]–[8], [10]–[14], which in humans can be aggressive or prosocial depending on the context. Recent research suggests that pride may have evolved as an affective mechanism for motivating such status seeking behavior [15]. Pride is indirectly linked to status seeking because it is an inward directed emotion that signals high status or ego. It has been speculated that testosterone helps translate such motivation into action, for example, in acts of heroic altruism [16], [17]. Importantly, an effect of testosterone on behavior via pride should also work if behavior cannot be observed by others and an individual’s status in the eyes of the others may therefore not be directly affected. [. . .]
Our main finding is a lower incidence of self-serving lies in the testosterone group. [. . .]
While we can rule out a belief effect we cannot ultimately conclude whether our findings are driven by a direct influence of testosterone on prosocial preferences or via increased status concerns. A potential interpretation for our findings is that testosterone administration affects a concern for self-image [25], or pride [16], i.e., enhances behavior which will make a subject feel proud and leads to the avoidance of behavior considered “cheap” or dishonorable. Subjects in our testosterone group may therefore lie less. This is intriguing because pride could be an affective mechanism underlying a link between testosterone and dominance behavior. An interpretation of our findings in terms of pride is in line with anecdotal and correlational evidence indicating that testosterone plays a positive part in heroic altruism [17]. It is also in line with reports that high testosterone individuals display more disobedient behavior in prison environments where proud individuals may be less willing to follow the strict rules and comply with orders [26], [27]. Finally, a relation between pride, testosterone, and the willingness to engage in “cheap” behavior also fits the observation that the five inmates with the lowest testosterone levels in a sample of 87 female prison inmates were characterized as “sneaky” and “treacherous” by prison staff members [27]. Further experiments manipulating whether lying is an honorable action (e.g., lying for charity) or not (lying for self) are needed to clarify the role of pride in the effect of testosterone on human social behavior. An alternative interpretation of our results, which we cannot rule out, is that testosterone has a direct effect on prosocial behavior, making people more honest per se.

The press release:

The researchers compared the results from the testosterone group to those from the control group. "This showed that the test subjects with the higher testosterone levels had clearly lied less frequently than untreated test subjects," reports the economist Prof. Dr. Armin Falk, who is one of the CENS co-directors with Prof. Weber. "This result clearly contradicts the one-dimensional approach that testosterone results in anti-social behavior." He added that it is likely that the hormone increases pride and the need to develop a positive self-image. "Against this background, a few euros are obviously not a sufficient incentive to jeopardize one's feeling of self-worth," Prof. Falk reckons.

Dutch ancestry - two NYT articles

The Van Dusens of New Amsterdam:

As with the Old Testament patriarch who gave birth to a nation, it all began with Abraham, whose forebears were from the town of Duersen in northern Brabant. Known in official documents as “Abraham the miller,” or “Abraham Pieterszen,” as in son of Peter, he landed on the island of “Manatus” some time before February 1627. Nearly 400 years later, he has more than 200,000 descendants over 15 generations scattered across the Americas, according to several genealogical experts who have built on intensive studies of the family over the centuries. In the 1880 census, there were 3,000 heads of household with the name Van Dusen — or Van Deusen, Van Deursen, Van Duzer and other common variants — all, the experts say, traceable back to Abraham the miller.
Theirs is among a small cohort of large, long-running Dutch families — including under-the-radar Rapeljes, with more than a million descendants, and the more prominent Kips and Rikers, with their names on neighborhoods and institutions — whose well-documented histories provide a compelling window into the development of what would become New York and, later, the United States.
Two of Abraham’s progeny — Martin Van Buren, a great-great-great-grandson; and Franklin Delano Roosevelt (add four more greats) — served as presidents of the United States. A third, Eliza Kortright (Generation 7), married one, James Monroe. Egbert Benson (Generation 6) was the first attorney general of postcolonial New York. The Rev. Dr. Henry Pitney Van Dusen, a theologian (Generation 10), made the cover of Time magazine in 1954.
There were family members on both sides of the early border wars between New York and Massachusetts, the War of Independence and the Civil War. At the Battle of Gettysburg, Pvt. William Jackson Raburn of Indiana’s “Fighting 300” died of a gunshot wound on July 2, 1863; a day later, Matthew Henry Van Dusen — Raburn’s fourth cousin twice removed (by marriage) — a “reb” with the fabled Hood’s Texas Brigade, was sidelined with a head injury.
Cornelis Kortright (Generation 5) owned slaves accused of participating in a “Negro plot” in 1741. Jan Van Deusen Jr., Kortright’s second cousin, saved New York’s historical records when the British burned the state’s first capital to the ground in 1777. [. . .]
Phoebe shares her father’s fascination with the family, particularly since she read some of the excerpts from her great-great-great-great-grandfather’s Civil War diary. “It kind of amazed me that I knew someone who was part of what I was studying in school in textbooks,” she said. “A lot of my friends’ parents just came here and don’t speak English yet. And some came here two generations ago. The one who has been here the longest came from Scotland, and that’s only a hundred years.”

Jets’ Tebow Can Trace His Lineage to New Jersey:

Tim Tebow arrives in New Jersey, where the Jets practice and play, as the world’s most famous backup quarterback. It is a homecoming, of sorts, centuries in the making, because Tebow appears to be the great-great-great-great-great-great-great-great-grandson of a man from Hackensack.
MetLife Stadium, home of the Jets and the Giants in East Rutherford, is about 10 miles from where an immigrant, Andries Tebow (spelled variously as Thybaut, Tibout, TeBow and other derivations), settled down after landing from Europe in the late 1600s. One of his children was Pieter, born in Hackensack and baptized there in 1696, records show.
More than 300 years and 10 generations later, Tim Tebow brings the family name full circle, according to the amateur genealogist — and Tebow’s fourth cousin, once removed — Dean Enderlin. [. . .]
It is unclear how much Tebow knows about his genealogy. While his own recent background is well chronicled — born to Christian missionaries in the Philippines, raised in Florida, now a preacher in a championship quarterback’s body — little has been examined about his deeper roots.
But there is no doubt that early generations of Tebows settled in what is now Bergen County, and Tim Tebow appears to be the latest link in a long chain of North Jersey arrivals. [. . .]
Enderlin said that, like many Tebows in the country, he and Tim Tebow can be traced to Andries Tebow, who sailed to the New World out of Bruges, Belgium. Enderlin is unsure where Andries lived — either Belgium or Holland — but he believes his family was Walloon, a French-speaking minority rooted in southern Belgium.
“Belgium was governed by the Catholic rulers of Spain and persecuted Protestants, forcing many to flee,” Myra Vanderpool Gormley wrote in an article for Genealogy Magazine titled, “Belgian Migrations: Walloons Arrived Early in America.”
“Many went to the northern parts of the Netherlands,” she wrote. “It was from their exile in Holland that they emigrated again.”

Dutch / English / Old American ancestry

Greg Cochran writes:

When responding to the Census, more than five million Americans claim to be of Dutch descent. And they mostly are, at least a little. Now you might wonder how they compare with the Dutch back in the Netherlands: you might wonder about the relative academic or economic success of these two groups, which presumably have a common ancestry. But you would be wrong to do so. You would be comparing apples and House of Orangemen.
There were four or five different Dutch waves of settlement in this country. The first is pretty well-known, the Dutch colony in New York. Of course, it was only about half Dutch in origin: the rest were Walloons and French Huguenots. Lots of people have some ancestry from that group, including people I know. Why, if there was any justice, Henry Harpending would own a fine farm on Manhattan Island right now.
Of course, Henry isn’t all that Dutch. His surname is. He comes from an area of New York State that really did have some Dutch settlement. The thing is, white Protestants in this country have been intermarrying rather freely for several hundred years: it is rare to find someone in that category whose ancestors all come from one ethnicity. I would be surprised if Henry is 1/8th Dutch. In much the same way, my patrilineal lineage is Ulster Scot (who fears mention the battle of the Boyne!?) but the rest includes English, Welsh, Scottish, Green Irish, and a component that, I suspect, only became Dutch in 1918, and was Bavarian before that. We’re talking about ye olde Americans, not Ellis Island types. Not that they haven’t mixed as well, but less so… [. . .]
Most of the people who self-identify as Dutch-Americans are mostly something else. Why? Sometimes a family tradition, or a surname, but more than anything else, fashion.
Fashions change. For example, the fraction of Americans who report English ancestry has dropped drastically since 1980 – so much so that you would have to wonder about secret death camps if you took it seriously. But it’s fashion. I looked at the census numbers for my home county, and then looked at the phone book: Census result was 20% English ancestry, real number was more like 80%. Of course this means that people in the US claiming a particular ethnicity can not only have limited ancestry from that group, but be oddly unrepresentative as well.

Henry Harpending confirms:

I would probably put “Dutch” on a census form if an answer were required. I am either 1/32 or 1/64 Dutch, and worse the supposed Dutch ancestor was a Huguenot or something like that, so I am likely really 0% Dutch. No matter…….

I've commented on this phenomenon before (e.g.), but a periodic reminder is useful. I don't see a problem with someone identifying with his patrilineal national origin for census purposes while remaining aware of his overall ancestry. What I find irritating is the eagerness of some with American ancestry to identify as "Scotch-Irish" after reading a review of Albion's Seed, or "Celtic" in the name of Celtic Southronism, or "German" because they had a German great-grandfather, and then declare themselves at war with or at least safely distinct from evil/culpable "WASPs" / "Anglo-Saxons" (which appellations in reality describe the core of the breeding population from which the newly self-identified Borderer/Celt/German sprung).

Hilarious satire of Colin Laney / wintermute

Very funny take-off of CL/WM's tendency to come up with convoluted arguments in support of his view that Anglo-Saxon=bad and continental European=good.

ASHG 2012 abstracts (3): miscellaneous

The Myth of Random Mating: Evidence of ancestry-related assortative mating across 3 generations in Framingham, MA. R. Sebro^1,2, G. Peloso^3,4, J. Dupuis^5,6, N. Risch^1,7,8 1) Institute for Human Genetics, University of California, San Francisco, San Francisco, CA; 2) Department of Radiology and Biomedical Imaging, University of California, San Francisco, San Francisco, CA; 3) Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA; 4) Program in Medical and Population Genetics, Broad Institute, Cambridge, MA; 5) Department of Biostatistics, Boston University School of Public Health, Boston MA; 6) The National Heart, Lung and Blood Institute’s Framingham Heart Study, Framingham, MA; 7) Department of Biostatistics and Epidemiology, University of California, San Francisco, San Francisco, CA; 8) Division of Research, Kaiser Permanente, Oakland, CA.

   The factors that influence spouse selection are important to geneticists because the mating pattern determines the genetic structure of a population. There has been evidence of positive assortative mating (PAM) related to several phenotypic traits like height. Ancestrally-related PAM is necessary for genetic population stratification, which means spouses are more likely to share genes of common ancestry. Prior studies have shown strong ancestry-related assortative mating among Latino populations. Here, Caucasian spouse pairs from the Framingham Heart Study (FHS) Original and Offspring cohorts (N=885) genotyped on Affymetrix 500K were analyzed using principal components (PC) analysis. Data from individuals genotyped in HapMap and the Human Genome Diversity Project (HGDP) were projected onto these PCs to facilitate interpretation. Based on these and other data, the first principal component delineates the prominent northwest-to-southeast European cline. In our data, there was clear clustering on this axis, probably separating individuals of English/Irish/German ancestry from those of Italian ancestry. The second principal component also reveals strong clustering, and likely reveals individuals of Ashkenazi Jewish ancestry. In the Original (older) cohort, there is a very strong correlation in PC1 between the spouses (r=0.73, P=2e-22) and also for PC2 (r=0.80, P=4e-29). In the Offspring cohort the spouse correlations were lower but still highly significant: r=0.38, P=3e-28 for PC1 and r=0.45, P=9e-40 for PC2. Examination of scatter plots for spouse pairs in the two generations reveals both a reduction in clustering and lower but still evident correlation in the Offspring cohort. Of genetic impact, we observed highly significant Hardy-Weinberg disequilibrium (homozygote excess) for SNPs loading heavily on PC1 and PC2 across 3 generations, and also highly significant linkage disequilibrium between the same set of SNPs located on different chromosomes.
These results are consistent with demographic patterns of social homogamy which have existed in Framingham over several generations, and a general trend of reduced homogamy over time. While Framingham is not representative of the general US population, its historic mating patterns serve as a reminder that assumptions of Hardy Weinberg and Linkage Equilibrium need to be made with caution when applied to genetic loci that are related to ancestry in any population.
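
Note: the homozygote excess reported above is what one expects whenever groups with different allele frequencies mate mostly among themselves (the Wahlund effect). A minimal sketch of the arithmetic follows, assuming the extreme case of strictly within-group random mating. The function name, the group weights and the allele frequencies are made up for illustration and are not taken from the Framingham data.

```python
def pooled_heterozygosity(subpops):
    """Compare observed and Hardy-Weinberg-expected heterozygosity in a pooled sample.

    subpops: list of (weight, allele_freq) pairs. Each subpopulation is
    assumed (hypothetically) to mate at random within itself only.
    Returns (pooled allele frequency, observed het, expected het, F).
    """
    total = sum(w for w, _ in subpops)
    # Allele frequency in the pooled sample.
    p_bar = sum(w * p for w, p in subpops) / total
    # Observed heterozygosity: weighted mean of the within-group 2pq values.
    het_obs = sum(w * 2 * p * (1 - p) for w, p in subpops) / total
    # Heterozygosity expected if the pooled sample were one random-mating population.
    het_exp = 2 * p_bar * (1 - p_bar)
    # Inbreeding-like coefficient: fractional deficit of heterozygotes.
    f = 1 - het_obs / het_exp
    return p_bar, het_obs, het_exp, f


# Two equally sized groups with illustrative allele frequencies 0.2 and 0.6.
p_bar, het_obs, het_exp, f = pooled_heterozygosity([(1, 0.2), (1, 0.6)])
print(f"pooled p={p_bar:.2f}  observed het={het_obs:.3f}  "
      f"expected het={het_exp:.3f}  F={f:.3f}")
```

The deficit of heterozygotes equals the variance in allele frequency between groups divided by p̄(1 − p̄), so it is largest for exactly the SNPs that differ most between ancestries, i.e. those loading heavily on the ancestry PCs. Partial rather than strict assortative mating would shrink F toward zero.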

A web-based initiative to accelerate research on genetics and disease in African Americans. K. E. Barnholt¹, A. K. Kiefer¹, H. L. Gates, Jr.², M. Nelson¹, M. Mullins¹, E. Baker³, J. Frank¹, C. D. Bustamante⁴, T. W. Love⁵, R. A. Kittles⁶, N. Eriksson¹, J. L. Mountain¹ 1) 23andMe, Inc., Mountain View, CA; 2) W.E.B. Du Bois Institute for African and African American Studies, Harvard University, Cambridge, MA; 3) 23andYou.com; 4) Department of Genetics, Stanford University School of Medicine, Stanford, CA; 5) Onyx Pharmaceuticals, Inc., South San Francisco, CA; 6) College of Medicine, University of Illinois at Chicago, Chicago, IL.

   Little is known about the connections between DNA and disease in African Americans, in part because most genetics research has involved only those of European ancestry. Greater understanding of such connections could improve diagnoses and lead to opportunities for more personalized health care. In 2011 23andMe, Inc., a personal genomics and research company, launched the Roots into the Future initiative, which aims to enroll 10,000 African Americans in an innovative research project. The study seeks to determine whether genetic associations previously identified in Europeans are relevant to African Americans and to discover other genetic markers linked to conditions of particular relevance to the African American community. Currently the 23andMe cohort includes nearly 10,000 African Americans, over 5700 of whom were recruited through the Roots into the Future initiative. Each of these individuals (58% female, 42% male; mean age: 44) has submitted a saliva sample for genotyping via 23andMe’s custom genotyping array, which includes approximately 1 million single nucleotide polymorphisms. Participants are currently contributing information about their health and traits through online surveys. To date over 6200 participants have completed an average of 10.6 surveys. Using the genetic data we estimated the percent African and European ancestry of each participant. Median estimates were 73% and 23% respectively (with 4% uncertain). As expected, the higher a person’s proportion of European ancestry, the greater the chance that person carries variants that are more common among Europeans than among Africans, such as those linked to HIV-resistance and alpha-1 antitrypsin deficiency. Furthermore, the higher a person’s proportion of African ancestry, the more likely that person reported having curly hair, high blood pressure and type 2 diabetes, and the less likely that person reported having facial wrinkles, rosacea and Parkinson’s Disease. 
Based on data for over 8700 individuals likely to self-identify as African American, we replicated over 25 genetic associations reported previously for African Americans, including those for body-mass index, type 2 diabetes, lupus, height, and osteoporosis. For conditions for which we have already accrued at least 500 cases among this cohort, such as asthma, migraines, and uterine fibroids, we anticipate having power either to replicate associations identified through previous studies of Europeans or to find new associations.

Hidden heritability and risk prediction based on genome-wide association studies. N. Chatterjee¹, B. Wheeler², J. Sampson¹, P. Hartge¹, S. Chanock¹, J. Park¹ 1) National Institute of Health, Rockville, MD, USA; 2) Information management system, Rockville, MD.

   Known discoveries from genome-wide association studies have limited predictive ability for individual traits, but recent estimates of “hidden heritability” suggest that in the future performance of predictive models can be potentially enhanced by incorporation of a large number of SNPs each with individually small effects. We use a novel theoretical model, discoveries from the largest genome-wide association studies and recent estimates of hidden heritability to project the predictive performance of polygenic models for ten complex traits as a function of the number and distribution of effect sizes for the underlying susceptibility SNPs, the sample size of the training dataset and the balance of true and false positives associated with the SNP selection criterion. We project, for example, that while 45% of the total variance of adult height has been attributed to common variants, a predictive model built based on as many as one million people may only explain 33.4% of variance of the trait in an independent sample. For rare highly familial conditions, such as Type 1 diabetes and Crohn’s disease, risk models including family history and optimal polygenic scores built based on current GWAS can identify a large proportion (e.g., 80-90%) of cases by targeting a small group of high-risk individuals (e.g., subjects with top 20% risk). In contrast, for more common conditions with modest familial components, such as Type 2 diabetes (T2D), coronary heart disease (CAD) and prostate cancer (PrCA), risk models built based on GWAS with current or foreseeable sample sizes (e.g., triple in size) can miss a large proportion (>50%) of cases by targeting a small group of high-risk individuals. For these common diseases, the proportion of the population that can be identified to have 2-fold or higher risk than an average person in the population ranged between 1.1% (CAD) and 7.0% (PrCA) for polygenic models built based on current GWAS.
If the sample size for future studies could be tripled, these proportions could range between 6.1% (CAD) and 18.8% (T2D). Our analyses suggest that the predictive utility of polygenic models depends not only on heritability, but also on achievable sample sizes, effect-size distribution and information on other risk-factors, including family history.

GWAS Identifies Biologically Relevant SNP Associations with Sexual Partnering Behavior. J. Gelernter^1,2, H. R. Kranzler³, R. Sherva⁴, R. Koesterer⁴, L. Almasy⁵, H. Zhao¹, L. A. Farrer⁴ 1) Yale University School of Medicine, New Haven, CT; 2) VA CT Healthcare Center, West Haven, CT; 3) University of Pennsylvania School of Medicine, Philadelphia, PA; 4) Boston University School of Medicine, Boston, MA; 5) Texas Biomedical Research Institute, San Antonio, TX.

   The specific factors influencing human sexual partnering are poorly understood. Arguably, in the pre-modern era, multiple mating may have been tied to selection for traits related to survival including resistance to infection and starvation, strength, and certain behaviors. Recently, we completed a GWAS using the Illumina Omni-Quad microarray in ~5800 African- and European-American (AA and EA) participants in genetic studies of alcohol, cocaine, and opioid dependence. Subjects were interviewed using the Semi-Structured Assessment for Drug Dependence and Alcoholism (SSADDA) - an instrument that covers all major DSM-IV diagnoses as well as other numerous psychiatric and lifestyle traits. One of these is the response to: “How many sexual partners have you had in your life?” Association of age-adjusted residuals of this variable with more than 3 million SNPs reliably imputed using the 1K Genomes reference panel was tested in each sex*population subgroup using generalized estimating equations. Results from subgroup analyses were combined by meta analysis. SNPs with p-values <1E-06 were genotyped in a replication sample of ~2300 subjects. 
Genomewide-significant results were obtained for 13 SNPs including ones that map to genes coding proteins involved in reproductive-related functions (e.g., rs74738626 in KCNU1 which encodes a testes-specific K+ channel [p=1.2E-12], rs78227383 in NME5, a nucleoside diphosphate kinase which may have a specific function in the phosphotransfer network involved in spermatogenesis [p=4.0E-11 in EAs only], and rs76221611 in CCND2 which encodes cyclin D2, shown to be highly expressed in ovarian and testicular tumors [p=3.3E-11 in AAs only]), immune response (e.g., rs2709778 in GARS which encodes glycyl-tRNA synthetase shown to be a target of autoantibodies in human autoimmune diseases [p=1.0E-10 in males only]), and other genes of biological interest (e.g., rs10849971 in ALDH2, an alcohol-metabolizing enzyme that is also an alcohol dependence risk locus [p=9.6E-09 in females only]). These findings have clear implications with respect to normal sexual function and potentially for risk of sexually transmitted disease.
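
Note: the abstract says only that the sex*population subgroup results "were combined by meta analysis" and does not name the method. One common choice, assumed here purely for illustration, is fixed-effect inverse-variance weighting. The sketch below uses invented per-subgroup effect sizes and standard errors, and the function name is my own.

```python
import math


def fixed_effect_meta(estimates):
    """Inverse-variance-weighted fixed-effect meta-analysis.

    estimates: list of (beta, standard_error) pairs, one per subgroup.
    Returns (pooled beta, pooled SE, z score, two-sided p-value).
    """
    # Each subgroup is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, z, p


# Hypothetical per-SNP results for four subgroups (AA/EA by male/female).
subgroups = [(0.12, 0.06), (0.09, 0.05), (0.15, 0.08), (0.10, 0.07)]
beta, se, z, p = fixed_effect_meta(subgroups)
print(f"pooled beta={beta:.3f}  se={se:.3f}  z={z:.2f}  p={p:.2e}")
```

A fixed-effect model assumes the SNP has the same true effect in every subgroup. Several of the reported hits are significant in only one sex or one population, which is the situation where that assumption is weakest.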

Ubiquitous polygenicity of human complex traits: genome-wide analysis of 49 traits in Koreans. J. Yang¹, T. Lee², J. Kim³, S. Cho⁴, P. Visscher^1,5, H. Kim^2,3,4 1) University of Queensland Diamantina Institute, University of Queensland, Brisbane, Queensland, Australia; 2) Department of Agricultural Biotechnology, Seoul National University, Seoul, Korea; 3) Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Korea; 4) C&K Genomics, Seoul, Korea; 5) The Queensland Brain Institute, The University of Queensland, Brisbane, Queensland 4072, Australia.

   Recent studies in populations of European ancestry have shown that 30-50% of heritability for human complex traits such as height (Yang et al. 2010) and body mass index (Yang et al. 2011), and common diseases such as schizophrenia (Lee et al. 2012) and rheumatoid arthritis (Stahl et al. 2012) can be captured by common SNPs, and that genetic variation can be attributed to chromosomes, in proportion to their length. Using genome-wide estimation and partitioning approaches, we analyzed 49 human quantitative traits, many of which are relevant to human diseases, in 7,170 unrelated Korean individuals genotyped on 326,591 SNPs. For 43 of the 49 traits, we estimated a significant (P < 0.05) proportion of variance explained by all SNPs (h2G). On average across 47 of the 49 traits for which the estimate of h2G is non-zero, 13.4% (range of 3.4% to 31.6%) of phenotypic variance can be explained by all the SNPs being analysed, or approximately one-third (range of 7.8% to 76.8%) of narrow sense heritability. In contrast, on average across 25 of the 49 traits, the top associated SNPs at genome-wide significance level (P < 5e-8) explain 1.5% (range of 0.5% to 3.8%) of phenotypic variance. The majority (~92%) of explained variation estimated from all SNPs is captured by the SNPs with p-values < 0.031 in single SNP association analyses. Longer genomic segments tend to explain more phenotypic variation, with a correlation of 0.78 between the estimate of variance explained by individual chromosomes and their physical length. This correlation was stronger (0.81) for intergenic regions. Despite the fact that there are a few SNPs with large effects for most traits, these results suggest that polygenicity is ubiquitous for most human complex traits, and that a substantial proportion of heritability is captured by common SNPs.

What is the total SNP-associated heritability for alcohol dependence? N. G. Martin¹, G. Zhu¹, P. A. Lind¹, A. C. Heath², P. A. F. Madden², M. L. Pergadia², G. W. Montgomery¹, J. B. Whitfield¹ 1) Genetic Epidemiology, Queensland Institute of Medical Research, Brisbane, Australia; 2) Department of Psychiatry, Washington University School of Medicine, St Louis, MO, USA.

   Background. Much has been written about the so-called “missing heritability” for complex traits. Nowhere is this more pertinent than for alcohol and nicotine dependence (AD, ND) for which there are estimates of heritability of up to 65% from twin studies, yet few causal variants have been replicated from GWAS studies, despite large sample sizes, suggesting that individual effect sizes of SNPs must be very small. Recently new statistical genetic techniques have been developed which allow estimation of the total variance associated with all SNPs on a GWAS chip, but this has yet to be applied to AD and ND. Methods. The current analysis is based on AD and ND symptom count data from over 8000 participants in our population-based twin-family studies who have used either alcohol or cigarettes at some stage of their lives. They were individually genotyped with Illumina 370K or 660K chips and 7.034M genotypes were imputed from HapMap 3 and 1000-Genomes data. The GCTA program of Yang, Visscher et al is used first to detect the degree of relatedness between apparently unrelated subjects, based on a set of about 300,000 SNPs pruned for LD. Phenotypic similarity is then regressed on IBS sharing for all possible relative pairs to estimate the total amount of variance due to SNPs on the chip. Results. Based on GCTA analysis for other complex traits we expect to find SNP-associated variance accounting for about half the heritability estimated from conventional genetic epidemiology designs. However, these estimates are highly sensitive to population stratification so great care will be taken to remove all traces of population stratification during the analysis. Conclusions. The gap between the SNP-associated variance estimated by GCTA and twin and family estimates of heritability is most likely due to several factors. First, the tag SNPs on the chip are not in perfect LD with the causal SNPs; for other traits, simulation has shown that correcting for imperfect LD raises the SNP “heritability” by about 10%. Another major factor is that commercial chips only interrogate common SNPs so large effects of rare SNPs are simply not captured. Reasonable estimates from simulations suggest that this could account for another 20% of variance. Finally, we recognize that there are large sections of the genome containing highly repetitive DNA which are very poorly tagged by current chips, and where substantial proportions of genetic variance may be hidden.
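
The regression step in the Methods can be made concrete with a small sketch. What follows is a Haseman-Elston-style illustration of "phenotypic similarity regressed on sharing", assuming standardized phenotypes and a precomputed pairwise relatedness matrix. The function name `he_slope` and its inputs are hypothetical, and GCTA itself estimates the variance component by REML rather than by the plain least-squares slope shown here.

```python
def he_slope(phenos, grm):
    """Least-squares slope of z_i * z_j on relatedness A_ij over all pairs i < j.

    phenos: list of phenotype values, one per person (hypothetical input).
    grm:    square list-of-lists of pairwise genomic relatedness (hypothetical input).
    Under a simple additive model the slope is a rough estimate of the
    fraction of phenotypic variance tagged by the SNPs.
    """
    n = len(phenos)
    mean = sum(phenos) / n
    var = sum((y - mean) ** 2 for y in phenos) / n
    z = [(y - mean) / var ** 0.5 for y in phenos]  # standardize phenotypes

    xs, ys = [], []
    for i in range(n):
        for j in range(i + 1, n):
            xs.append(grm[i][j])    # genetic sharing for the pair
            ys.append(z[i] * z[j])  # phenotypic similarity for the pair

    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx
```

Because every pair contributes one point to the regression, stratification that raises both relatedness and phenotypic similarity in some subgroup inflates the slope directly, which is the sensitivity the abstract warns about.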

Vascular Stiffness in a Healthy High Risk African American Population is Modified by the Extent of European Admixture. D. Vaidya, R. A. Mathias, L. R. Yanek, L. C. Becker, D. M. Becker Medicine, Johns Hopkins University, Baltimore, MD.

   Background: Compared to European Americans (EA), African-Americans (AA) have stiffer peripheral vessels, reflected in reduced carotid distensibility coefficient (DC). To determine whether this racial difference may be genetically determined, we examined the extent to which the variance in carotid distensibility in AA could be explained by EA admixture at either a global or a local genomic level. Methods: We examined data from 344 AA, 62% women, aged 25-76 years, enrolled in a large study (GeneSTAR) of apparently healthy people with a family history of early-onset coronary artery disease. DC was assessed using 2D ultrasound, calculated as 2*(fractional change in diameter from diastole to systole)/(systolic - diastolic blood pressure). By its calculation DC is inherently corrected for blood pressure levels. EA admixture was determined using a panel of 50,000 ancestry informative markers (deCODE Genetics), and local ancestry was calculated on Illumina Human 1M genomewide SNP panel using LAMP. Associations of log-transformed DC were tested using mixed model regressions adjusted for age, sex, sex*age interaction and within-family correlations. LAMP models were adjusted for population stratification PCAs derived from the Illumina 1M SNPs (EIGENSTRAT). Results: The median [interquartile range] of the DC was 0.0017 [0.0012-0.0024] mmHg^-1. Every 10% incremental level of EA admixture was associated with 5% higher DC (95% CI: 1% to 9%, p=0.005), reflecting more distensibility, and less stiffness. In genomewide local ancestry analysis adjusted for sex, age, sex*age interaction, population stratification PCAs and within-family correlations, of 2756 genome segments in local ancestry LD, the highest association for local ancestry was found in Chromosome 8, positions 8.3M to 10M (Build 37.3), p=0.0012. On adjusting for local ancestry in this region, population stratification PCA1 representing global Caucasian ancestry was no longer significantly associated with DC (p=0.93). Conclusions: The racial difference in arterial distensibility between AA and EA is likely to have a basis in genetic admixture. We have identified a candidate region on chromosome 8 that may be responsible for this global admixture association.
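
For readers who want the DC arithmetic spelled out, here is a minimal sketch of the formula quoted in the Methods, plus one reading of the "5% higher DC per 10% admixture" result. The function names and example measurements are hypothetical. Treating the effect as multiplicative is an assumption that follows from the outcome being log-transformed; the abstract does not state it in these terms.

```python
def distensibility_coefficient(d_diastole, d_systole, sbp, dbp):
    """DC = 2 * (fractional diameter change) / (pulse pressure), in mmHg^-1.

    Diameters may be in any common unit; pressures are in mmHg.
    """
    fractional_change = (d_systole - d_diastole) / d_diastole
    return 2.0 * fractional_change / (sbp - dbp)


def dc_ratio(delta_admixture, effect_per_step=0.05, step=0.10):
    """Expected DC ratio for a given difference in admixture fraction.

    Assumes the reported 5% increase per 10% admixture compounds
    multiplicatively, as it would in a linear model on log(DC).
    """
    return (1.0 + effect_per_step) ** (delta_admixture / step)


# Hypothetical measurements: 6.0 mm diastolic and 6.3 mm systolic diameter
# at a blood pressure of 120/80 mmHg.
example_dc = distensibility_coefficient(6.0, 6.3, 120, 80)
```

With those made-up inputs the result lands near the upper end of the interquartile range reported in the abstract, which is a useful sanity check on units.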

A population isolate reveals enriched recessive deleterious variants underlying neurodevelopmental traits. O. Pietilainen^1,2,3, J. Suvisaari⁵, W. Hennah², V. Leppa², T. Paunio^2,3,4, M. Torniainen⁵, S. Ripatti^1,2, S. Ala-Mello⁶, K. Rehnstrom¹, A. Tuulio-Henriksson⁵, T. Varilo², J. Tallila¹, K. Kristiansson², M. Isohanni⁷, J. Kaprio², J. Eriksson⁸, M. Jarvelin⁹, R. Durbin¹, J. Lonnqvist^4,5, M. Hurles², H. Stefansson¹⁰, N. Freimer¹¹, M. Daly¹², A. Palotie^1,2,12 1) The Wellcome Trust Sanger Institute, Cambridge, Cambridge, United Kingdom; 2) Institute for Molecular Medicine Finland FIMM, Helsinki, Finland; 3) National Institute for Health and Welfare, Public Health Genomics Unit, Helsinki, Finland; 4) University of Helsinki and Helsinki University Central Hospital, Department of Psychiatry, Helsinki, Finland; 5) National Institute for Health and Welfare, Department of Mental Health and Substance Abuse Services, Helsinki, Finland; 6) Helsinki University Central Hospital, Department of Clinical Genetics, Helsinki, Finland; 7) Department of Psychiatry, Institute of Clinical Medicine, University of Oulu, Finland; 8) National Institute for Health and Welfare, Chronic Disease Epidemiology and Prevention, Helsinki, 90014, Finland; 9) Institute of Health Sciences, University of Oulu, Oulu, Finland; 10) deCODE genetics, 101 Reykjavik, Iceland; 11) Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, UCLA, Los Angeles, California, USA; 12) The Broad Institute of MIT and Harvard University, Cambridge, Ma, USA.

   Low frequency variants (MAF <5%) likely contribute to susceptibility for complex traits, but their study is challenging in admixed populations. We hypothesize that population isolates that have experienced bottlenecks would have an enrichment of specific low frequency variants some of which could be predisposing to complex traits. This enrichment could benefit especially identification of variants with recessive effects. To test this hypothesis, we studied homozygous deletions in a prospective birth cohort from an isolated Northern Finnish population (N=4,931). The role of rare deletions being clearly established in abnormal neuronal development led us to constrain our initial analysis to seven supposedly relevant phenotypes including diagnosis of schizophrenia, intellectual deficit, learning difficulties, epilepsy, neonatal convulsion, impaired hearing and cerebral palsy/perinatal brain damage. The analysis included 32,487 homozygous deletions in 205 loci of which 11% included exons of one or more genes. Among the seven traits studied, the strongest association was found with impaired hearing and a deletion on 15q15.3, overlapping STRC, previously associated with deafness (p = 10^-4). The largest identified homozygous deletion was 240 kb on 22q11.22 and was associated with intellectual deficit (p<0.02). The deletion showed significant regional enrichment in an internal north-eastern isolate with 3-fold risk of schizophrenia compared to elsewhere in the country. Follow up of the deletion in 265 schizophrenia patients and 5140 controls revealed an allelic association with schizophrenia (p= 0.02, OR = 1.9) and was further replicated in 9,539 cases and 15,677 controls of European origin (p = 0.03, OR = 2.1). After screening over 13,106 Finns, we identified four individuals being homozygous for the deletion, all diagnosed with schizophrenia and/or intellectual disability. The deletion overlaps a gene encoding for TOP3B and was found to downregulate its expression to half among heterozygous carriers and zero in homozygous carriers (p < 10^-10). Our results demonstrate the effect of multiple consecutive population bottlenecks in the enrichment of sizable deletions contributing to abnormal neuronal development. In addition the findings highlight the usefulness of population isolates in studying rare and low frequency variants in complex traits.

Identifying age- and sex- associated gene expression profiles in >7,000 whole-blood samples. M. J. Peters^1,2,17, R. Joehanes^3,17, T. Esko^4,17, K. Heim^5,17, H. Völzke^6,17, L. Pilling^7,17, J. Brody^8,17, Y. F. Ramos^9,17, B. E. Stranger^10,11, M. W. Christiansen⁸, S. Gharib⁸, R. Hanson¹², A. Hofman^2,13, J. Kettunen¹⁴, D. Levy³, P. Munson³, C. O’Donnell³, B. Psaty⁸, F. Rivadeneira^1,2,13, A. Suchy-Dicey⁸, A. G. Uitterlinden^1,2,13, H. Westra¹⁵, I. Meulenbelt^2,9,17, D. Enquobahrie^8,17, T. Frayling^7,17, A. Teumer^16,17, H. Prokisch^5,17, A. Metspalu^4,17, J. B. J. Van Meurs^1,2,17, A. D. Johnson^3,17 on behalf of the CHARGE Gene Expression Working Group. 1) Department of Internal Medicine, Erasmus Medical Centre, Rotterdam, the Netherlands; 2) Netherlands Genomics Initiative-Sponsored by the Netherlands Consortium for Healthy Aging, Rotterdam and Leiden, the Netherlands; 3) Framingham Heart Study, National Heart, Lung and Blood Institute, Framingham, USA; 4) Estonian Genome Center and Institute of Molecular and Cell Biology of University of Tartu, Estonia; 5) Institute of Human Genetics, Technische Universität München, Munich, Germany; 6) Institute for Community Medicine, University Medicine Greifswald, Germany; 7) Epidemiology and Public Health, Peninsula College of Medicine and Dentistry, University of Exeter, UK; 8) Cardiovascular Health Research Unit, Departments of Medicine and Epidemiology, University of Washington, Seattle, WA, United States; 9) Department of Molecular Epidemiology, Leiden University Medical Center, Leiden, the Netherlands; 10) Division of Genetics, Department of Medicine, Brigham and Women’s Hospital, Boston, MA, USA; 11) Broad Institute of Harvard and MIT, Cambridge, USA; 12) Phoenix Epidemiology and Clinical Research Branch, NIDDK, National Institute of Health, Phoenix, AZ, USA; 13) Department of Epidemiology, Erasmus Medical Centre, Rotterdam, the Netherlands; 14) Institute for Molecular Medicine Finland FIMM, University of Helsinki, Finland; 15) Department of Genetics, University of Groningen, University Medical Center Groningen, the Netherlands; 16) Interfaculty Institute for Genetics and Functional Genomics, Ernst-Moritz-Arndt University Greifswald, Germany; 17) Contributed Equally.

   Genome-Wide Expression Profiles (GWEPs) have been assayed in a growing number of cohort studies, but few attempts have been made to meta-analyse and cross-validate expression datasets. Consequently, many expression studies have been underpowered. Therefore, we established a large-scale multi-cohort GWEP meta-analysis. The aim of this study was to robustly identify novel gene expression signatures associated with age and sex, two major risk factors for many diseases. We analyzed 6,993 European-ancestry PAXgene (whole-blood) samples from 6 cohort studies (RS, FHS, EGCUT, KORA, SHIP, INCHIANTI). GWEPs were quantile-normalized, log2-transformed, probe-centered and sample-z-transformed prior to analysis. In the discovery stage we meta-analysed age- and sex-associated signals for samples hybridized to an Illumina or Affymetrix array separately. All analyses were adjusted for plate ID, RNA quality, fasting- and smoking status, and cell counts (when available). The age analysis was additionally adjusted for sex. All significant signals were cross-validated between the Illumina and Affymetrix platforms. We examined the top-associated GWEPs in 3 additional studies: HVH (n=348), GARP (n=134), and NIDDK/PIMA (n=1457). We identified 396 age-associated transcripts with p<1E-5 and same direction in both platforms. NELL2, a protein kinase C-binding protein, was the most significant result with gene expression levels decreasing with age (Illumina p=8.2E-81, Affymetrix p=3.2E-64). NELL2 is involved in cell growth regulation and differentiation, and there is evidence for developmental fluctuation in puberty. We identified 347 transcripts differentially expressed between males and females (p<1E-5, same direction both platforms), of which >200 show mapping to sex chromosomes. The top autosomal gender-differentiated transcript is DACT1, which has higher mRNA levels in females (Illumina p=2.4E-47, Affymetrix p=1.6E-75). DACT1 is an antagonist of beta-catenin and prior work indicates it to be differentially methylated in testes. It is a biomarker for semen and DACT1 knockout mice showed developmental defects. Both the NELL2 and the DACT1 signals were replicated in all 3 additional cohorts. With the GWEP meta-analysis, we gained power relative to individual cohort analyses, and were able to identify novel replicable significant age- and sex- associated loci. These loci may have implications for age-related disease biology, gender biology, and in sample forensics.
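
The preprocessing chain named in this abstract (quantile-normalize, log2-transform, probe-center, sample-z-transform) is easier to follow as code. The sketch below is one plausible reading of those four steps on a small probes-by-samples matrix. The function names are hypothetical, ties are not handled, and nothing here is taken from the consortium's actual pipeline.

```python
import math
from statistics import mean, pstdev


def quantile_normalize(matrix):
    """matrix[probe][sample]; force every sample (column) onto one shared distribution."""
    n_probes, n_samples = len(matrix), len(matrix[0])
    columns = [[matrix[p][s] for p in range(n_probes)] for s in range(n_samples)]
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean across samples of the value at each rank.
    rank_means = [mean(sc[r] for sc in sorted_cols) for r in range(n_probes)]
    out = [[0.0] * n_samples for _ in range(n_probes)]
    for s, col in enumerate(columns):
        order = sorted(range(n_probes), key=lambda p: col[p])
        for rank, p in enumerate(order):
            out[p][s] = rank_means[rank]
    return out


def preprocess(matrix):
    """Quantile-normalize, log2-transform, center each probe, z-score each sample."""
    logged = [[math.log2(v) for v in row] for row in quantile_normalize(matrix)]
    centered = []
    for row in logged:
        row_mean = mean(row)  # probe centering: subtract the probe's mean
        centered.append([v - row_mean for v in row])
    n_probes, n_samples = len(centered), len(centered[0])
    result = [[0.0] * n_samples for _ in range(n_probes)]
    for s in range(n_samples):
        col = [centered[p][s] for p in range(n_probes)]
        m, sd = mean(col), pstdev(col)
        for p in range(n_probes):
            # Guard against a constant column, which would give sd == 0.
            result[p][s] = (col[p] - m) / sd if sd > 0 else 0.0
    return result
```

The point of doing all this before meta-analysis is that expression values from different cohorts and array platforms end up on a comparable scale, so effect directions and p-values can be cross-validated as the abstract describes.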

Genetic variants in pigmentation genes, skin color, and risk of skin cancer in Japanese. T. Suzuki¹, Y. Abe¹, J. Yoshizawa¹, Y. Hozumi¹, T. Nakamura², G. Tamiya² 1) Dept Dermatology, Yamagata Univ Sch Med, Yamagata, Japan; 2) Advanced Molecular Epidemiology Research Institute, Yamagata Univ Sch Med, Yamagata, Japan.

   Melanin pigmentation plays an important role in shielding the body from ultraviolet (UV) radiation and may serve as a scavenger for reactive oxygen species. More than 150 genes have been implicated in determining pigmentation in mice, and include transcription factors, membrane and structural proteins, enzymes, and several kinds of receptors and their ligands, most of which have human orthologues. Although many molecular mechanisms involved in melanin pigmentation are being determined, relatively little is understood about the genetic component responsible for variations in skin color within or between human populations. First, in order to reveal their genetic contribution to skin color, we examined the association of pigmentation-related gene variants and variations in the melanin index in members of the general Japanese population whose skin color was objectively measured by reflectometry. The multiple regression showed that OCA2 A481T rs74653330 (p = 6.18e-8) and OCA2 H615R rs1800414 (p = 5.72e-6) were strongly associated with the mean of the melanin index in the female population. Three variants (SLC45A2 T500P rs11568737 p = 0.048, OCA2 T387M p = 0.015, TYR D125Y rs13312741 p = 0.022) were also significantly associated with melanin index. However, no significant associations were found between age and melanin index for variants of MC1R. Second, we evaluated the associations of the pigmentation-related gene variants and the risk of skin cancer. The statistical analysis revealed that only OCA2 H615R was associated with the risk of all skin cancers, especially malignant melanoma. We could not find any statistical significance in the associations of other variants, including OCA2 A481T, or melanin index with the risk of skin cancer. This is the first report on the association between the genetic variants in pigmentation genes and the risk of skin cancer in an East Asian population.

You may contact the first author (during and after the meeting) at tamsuz@med.id.yamagata-u.ac.jp

Molecular phylogeny of an autosomal region under natural selection. V. A. Canfield¹, A. Berg¹, S. Peckins¹, S. Oppenheimer², K. C. Cheng¹ 1) Penn State College of Medicine, Hershey, PA; 2) Oxford University, Oxford, UK.

   The derived (A111T) variant of SLC24A5 is associated with lighter skin pigmentation compared to the ancestral allele. A111T is fixed or nearly fixed in most European, North African and Middle Eastern populations, extending east to Pakistan. In Europeans, a large genomic region of diminished variation on chromosome 15, nearly 150 kb in extent, includes SLC24A5. We analyzed the haplotypes in this region using existing genomic data. Eleven haplotypes, defined on the basis of 16 SNPs that span a 76 kb genomic region in which recombination was rare, account for 95% of the total. A single haplotype (here called C11) carries A111T, suggesting that its origin did not long predate the onset of selection. Haplotype C11 was the product of recombination between haplotypes C3 and C10, followed by the A111T mutation. C3 and C10 are both present in East Asia and the New World but virtually absent in Africa, suggesting that C11 originated outside of Africa, most likely in the Middle East. The current distribution of A111T is consistent with the view that it originated after the divergence between populations that settled Europe and those that settled East Asia.

You may contact the first author (during and after the meeting) at vac3@psu.edu


Sharing by descent, phasing, rare variants and population structure. A. Kong deCODE Genet., Reykjavik, Iceland. Session Descriptions: Identity by descent (IBD) is fundamental to genetics and has diverse applications. Recently developed statistical methods and genome-wide SNP data have made it possible to detect haplotypes shared identically by descent between individuals with common ancestry up to 25-50 generations ago. With sequence data, shared haplotypes from even more distant ancestry can be detected. Patterns of IBD segment sharing within and between populations reveal important population demographic features including recent effective population size and migration patterns. IBD segment sharing is directly relevant to disease gene mapping and estimation of heritability. Individuals who share a genetic basis for a trait are more likely to have IBD sharing compared to randomly chosen individuals, and this forms the basis for IBD mapping and heritability estimation. Analysis of data from extended pedigrees was extremely difficult with standard linkage approaches, but is now possible using approaches based on detected IBD segments. Detected IBD can be present across pedigrees, which enhances power to detect association with the trait. Further, in population samples there is potential to utilize detected IBD segments to improve power to detect association when multiple variants within a gene influence the trait. IBD segments can also be used to greatly improve haplotype phase estimates, which is critical to understanding the functional consequence of genetic variation. IBD-based long-range phasing has previously been shown to be effective in isolated populations such as Iceland, but recent advances have extended its application to large outbred populations. In this session, we explore these exciting new developments.

ASHG 2012 abstracts (2): physical traits

Chromosome X revisited - Variants in Xq21.1 associate with adult stature in a meta-analysis of 14,700 Finns. T. Tukiainen¹, J. Kettunen^1,2, A.-P. Sarin^1,2, J. G. Eriksson^3,4,5,6,7, A. Jula⁸, V. Salomaa³, O. T. Raitakari^9,10, M.-R. Järvelin^11,12, S. Ripatti^1,2,13 1) Institute for Molecular Medicine Finland FIMM, University of Helsinki, Finland; 2) Unit of Public Health Genomics, National Institute for Health and Welfare, Helsinki, Finland; 3) Department of Chronic Disease Prevention, National Institute for Health and Welfare, Finland; 4) Department of General Practice and Primary Healthcare, University of Helsinki, Finland; 5) Unit of General Practice, Helsinki University Central Hospital, Finland; 6) Folkhälsan Research Center, Helsinki, Finland; 7) Vaasa Central Hospital, Vaasa, Finland; 8) Population Studies Unit, Department of Chronic Disease Prevention, National Institute for Health and Welfare, Turku, Finland; 9) Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku, Finland; 10) Department of Clinical Physiology, Turku University Hospital, Finland; 11) Department of Epidemiology and Biostatistics, Faculty of Medicine, Imperial College London, United Kingdom; 12) Institute of Health Sciences, Biocenter Oulu, University of Oulu, Finland; 13) Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

   Genome-wide association studies (GWAS) provide a powerful tool to assess genetic associations between common marker alleles and complex traits in large numbers of individuals. Typically these studies have focused on testing the markers in the 22 autosomal chromosomes while the X-chromosome has been omitted from the analyses. The chromosome X, however, constitutes approximately 5% of genomic DNA encoding for more than 1000 genes, and thus also likely contains genetic variation contributing to common traits and disorders.
   We set out to test associations between 560,000 genotyped and imputed SNP markers and eight anthropometric (BMI, stature, WHR) and biochemical (CRP, HDL, LDL, TC, TG) traits in 14,710 individuals (7468 males, 7242 females) from five Finnish cohorts.
   A region in chromosome Xq21.1 was associated with adult stature (meta-analysis p-value = 3.32×10^-10). The lead SNP in the locus explained up to 0.55% of the variance in height in 31-year-old women corresponding to 1.09 cm difference between minor and major allele homozygotes. The associated lead variant (MAF = 0.31) is located upstream of ITM2A, a gene encoding for a membrane protein that plays a role in osteo- and chondrogenic differentiation. As this is among the first studies using the X chromosome reference haplotypes from the 1000 Genomes project, we are currently validating the imputation with genotyping methods.
   The findings pinpoint the value of including chromosome X in the GWAS of complex traits to identify further relevant gene regions that also account for some of the missing heritability. The study illustrates that the 1000 Genomes reference haplotypes allow for high-resolution investigations of the genetic variants in chromosome X even with relatively modest sample sizes compared to the current-day GWAS meta-analyses. Our finding demonstrates that the same analysis strategy is also likely to be useful in the meta-analyses of the large consortia with complex traits.

Dissection of polygenic variation for human height into individual variants, specific loci and biological pathways from a GWAS meta-analysis of 250,000 individuals. T. Esko¹, A. R. Wood², S. Vedantam^3,4,5, J. Yang⁶, S. Gustaffsson⁷, S. I. Berndt⁸, J. Karjalainen⁹, H. M. Kang¹⁰, A. E. Locke¹¹, A. Scherag¹², D. C. Croteau-Chonka¹³, F. Day¹⁴, R. Magi¹, T. Ferreira¹⁵, J. Randall¹⁵, T. W. Winkler¹⁶, T. Fall⁷, Z. Kutalik¹⁷, T. Workalemahu¹⁸, G. Abecasis¹⁰, M. E. Goddard⁶, L. Franke⁹, R. J. F. Loos^14,19, M. N. Weedon², E. Ingelsson⁷, P. M. Visscher⁶, J. N. Hirschhorn^3,4,5, T. M. Frayling², GIANT Consortium 1) Estonian Genome Center, University of Tartu, Tartu, Tartumaa, Estonia; 2) Genetics of Complex Traits, Peninsula College of Medicine and Dentistry, University of Exeter, Exeter, UK; 3) Divisions of Genetics and Endocrinology and Program in Genomics, Children's Hospital, Boston, Massachusetts 02115, USA; 4) Metabolism Initiative and Program in Medical and Population Genetics, Broad Institute, Cambridge, Massachusetts 02142, USA; 5) Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA; 6) University of Queensland Diamantina Institute, University of Queensland, Princess Alexandra Hospital, Brisbane, Queensland, Australia; 7) Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, 171 77 Stockholm, Sweden; 8) Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Department of Health and Human Services, Bethesda, Maryland 20892, USA; 9) Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands; 10) Department of Biostatistics, Center for Statistical Genetics, University of Michigan, Ann Arbor, Michigan 48109, USA; 11) Center for Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA; 12) Institute for Medical Informatics, Biometry and Epidemiology, University of Duisburg-Essen, Germany; 13) Department of Genetics, University of North Carolina, Chapel Hill, North Carolina 27599, USA; 14) MRC Epidemiology Unit, Institute of Metabolic Science, Cambridge, UK; 15) Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, OX3 7BN, UK; 16) Public Health and Gender Studies, Institute of Epidemiology and Preventive Medicine, Regensburg University Medical Center, Regensburg, Germany; 17) Department of Medical Genetics, University of Lausanne, 1005 Lausanne, Switzerland; 18) Department of Nutrition, Harvard School of Public Health, Boston, Massachusetts 02115, USA; 19) Mount Sinai School of Medicine, New York, NY, USA.

   Adult human height is a highly heritable polygenic trait. Previous genome-wide analyses have identified 180 independent loci explaining an estimated 1/8th of the heritable component (80%). Our aims were a) to increase the understanding of the role of common genetic variation in a model quantitative trait, and b) to help understand the biology of normal growth and development. Within the GIANT consortium, we performed a GWAS of ~250,000 individuals of European ancestry. We tested for the presence of multiple signals at individual loci using an approximate conditional and joint multiple SNP regression analysis. We identified 698 independent variants associated with height at p<5x10^-8, which fell in 424 loci (+/-500kb from lead SNP) and altogether explained 1/4 of the inherited component in adult height. Half of the loci contained multiple signals of association. By applying a novel pathway analysis approach that uses co-expression data from 80,000 samples to predict the biological function of poorly annotated genes, we observed enrichment for novel and biologically relevant pathways in these loci. For example, for more than 10% of the loci a gene was found in their vicinity with a predicted "regulation of ossification" function (GO:0030278, WMW P < 10^-34), including newly identified genes such as PRRX1 and SNAI1. Other genes and pathways newly highlighted by pathway analysis include WNT (WNT2B, WNT4, WNT7A) and FGF (FGF2, FGF18) signaling and osteoglycin. We also noted an excess of signals across the entire genome, with the median test statistic twice that expected under null (lambda = 2.0). This result is consistent with either a very deep polygenic component to height that covers most of the genome or population stratification contributing partly to the results, or a combination of the two. Encouragingly, initial results from family based analyses and mixed models that correct for distant relatedness across samples indicate that a large proportion of the discovered signals are genuine height-associated variants rather than confounded by stratification. In conclusion, data from 250,000 individuals show that adult height is highly polygenic with, typically, multiple signals of association per locus now accounting for ¼ of heritability. Furthermore, these results suggest that increasing GWAS sample sizes can continue to uncover substantial new insights into the aetiological pathways involved in common human phenotypes.
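
The "lambda = 2.0" figure is the genomic inflation factor: the median of the observed association test statistics divided by the median expected under the null. A minimal sketch follows, assuming the statistics are 1-degree-of-freedom chi-square values; the function name and the constant's name are my own.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(chisq_stats):
    """Genomic inflation factor (lambda) from 1-df chi-square statistics."""
    return median(chisq_stats) / CHI2_1DF_MEDIAN
```

A lambda near 1 means the bulk of the test statistics look null. A value of 2 means the typical SNP's statistic is double what chance predicts, which on its own cannot distinguish pervasive polygenicity from stratification; hence the abstract's appeal to family-based analyses and mixed models.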

Over 250 novel associations with human morphological traits. N. Eriksson, C. B. Do, J. Y. Tung, A. K. Kiefer, D. A. Hinds, J. L. Mountain, U. Francke 23andMe, Mountain View, CA.

   External morphological features are by definition visible and are typically easy to measure. They also generally happen to be highly heritable. As such, they have played a fundamental role in the development of the field of genetics. As morphological traits have frequently been the target of natural selection, their genetics may also provide clues into our evolutionary history. Many rare diseases include dysmorphologic features among their symptoms. However, aside from height and BMI, currently little is known about the genetics of common variation in human morphology. Here we present a series of genome-wide association studies across 18 self-reported morphological traits in a total of over 55,000 people of European ancestry from the customer base of 23andMe. The phenotypes studied include hair traits (baldness, unibrow, hair curl, upper and lower back hair, widow’s peak), as well as many soft tissue and skeletal traits (chin dimple, nose shape, dimples, earlobe attachment, nose-wiggling ability, the presence of a gap between the top incisors, joint hypermobility, finger and toe relative lengths, arch height, foot direction, height-normalized shoe size). Across the 18 phenotypes, we find a total of 281 genome-wide significant associations (including 53 for unibrow and 29 each for hair curl and chin dimple). Almost all of these associations are novel; we believe this is the largest set of novel associations ever described in a single report. Many of these SNPs show pleiotropic effects, e.g., a SNP near GDF5 is associated with hypermobility, arch height, relative toe length, shoe size, and foot direction; another near AUTS is associated with both back hair and baldness. Nearby genes are significantly enriched to be transcription factors (p<1e-14) and to be involved in rare disorders that cause cleft palate, ear, limb, or skull abnormalities (p<1e-7). A SNP near ZEB2 is associated with both widow’s peak and chin dimple; mutations in ZEB2 cause Mowat-Wilson syndrome, which includes distinctive facial features such as a pronounced chin. Morphology-associated SNPs are also enriched within regions that have been identified as undergoing selection since the divergence from Neanderthals (18 associations in 11 regions, p = 4e-5). The abundance of these SNPs, which include the ZEB2 and GDF5 associations above, suggests that physical traits may have played a significant role in driving the natural selection processes that gave rise to modern humans.

Genome-wide association study of Tanner puberty staging in males and females. D. Cousminer¹, N. Timpson², D. Berry³, W. Ang⁴, I. Ntalla⁵, M. Groen-Blokhuis⁶, M. Guxens⁷, M. Kähönen⁸, J. Viikari⁹, T. Lehtimäki¹⁰, K. Panoutsopoulou¹¹, D. Boomsma⁶, E. Zeggini¹¹, G. Dedoussis⁵, C. Pennell⁴, O. Raitakari¹², E. Hyppönen³, G. Davey Smith², M. McCarthy¹³, E. Widén¹, The Early Growth Genetics (EGG) Consortium 1) Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Finland; 2) The Medical Research Council (MRC) Centre for Causal Analyses in Translational Epidemiology, School of Social and Community Medicine, University of Bristol, Bristol, UK; 3) Centre for Paediatric Epidemiology and Biostatistics, MRC Centre for Epidemiology of Child Health, UCL Institute of Child Health, London, UK; 4) University of Western Australia, Perth, Western Australia, Australia; 5) Harokopio University of Athens, Department of Dietetics and Nutrition, Athens; 6) Netherlands Twin Register, Department of Biological Psychology, VU University, Amsterdam, The Netherlands; 7) Center for Research in Environmental Epidemiology (CREAL), Barcelona, Catalonia, Spain; 8) Department of Clinical Physiology, University of Tampere and Tampere University Hospital, Finland; 9) Department of Medicine, University of Turku, Finland; 10) Department of Clinical Chemistry, Fimlab Laboratories, University Hospital and University of Tampere, Finland; 11) Wellcome Trust Sanger Institute, Hinxton, UK; 12) Department of Clinical Physiology and Nuclear Medicine, University of Turku, Finland; 13) Wellcome Trust Centre for Human Genetics, Roosevelt Drive, University of Oxford, Oxford, UK.

   Puberty is a complex trait with large variation in timing and tempo in the population, and extremes in pubertal timing are a common cause for referral to pediatric specialists. Recently, large genome-wide association studies (GWAS) have revealed 42 common variant loci associated with age at menarche (AAM), and some implicated genes are known from severe single-gene disorders. However, little remains known of the genetic architecture underlying normal variation in the onset of puberty, especially in males.
   Tanner staging, a 5-stage scale assessing female breast and male genital development, is a commonly used measure of pubertal development. While AAM is a late event during puberty, Tanner staging during mid-puberty may correlate more closely with the central activation of puberty. With Tanner scale data at the comparable ages of 11-12 yrs in girls and 13-14 yrs in boys, we performed GWAS meta-analyses across 10 cohorts with up to 9,900 samples. The combined male and female analysis showed evidence for association near LIN28B (P=1.95x10^-8), previously implicated in AAM and height growth in both sexes. Our data confirms that this locus is also important for male pubertal development and may be part of the pubertal initiation program upstream of sex-specific mechanisms. A novel signal (P= 4.95 x 10^-8) with a consistent direction of effect across contributing datasets locates on chromosome 1 at an intronic transcription factor binding-site cluster within the gene CAMTA1. Furthermore, the primary analyses revealed suggestive evidence for male-specific loci, e.g. nearby MKL2 (P=4.68 x 10^-7), which may be confirmed by follow-up genotyping. MAGENTA gene-set enrichment analysis of the combined-gender GWAS results showed enrichment of genes involved in expected pathways given the known biology underlying activation of puberty via the HPG axis. Novel genes near suggestively associated loci may also pinpoint novel regulatory mechanisms; CAMTA1 is a calmodulin-binding transcriptional activator, while MKL2 is also a transcriptional activator involved in cell differentiation and development. These results suggest the presence of multiple real signals beneath the genome-wide significant threshold, and further exploration of enriched pathways may reveal new insights into the biology of pubertal development.

Heritability estimation of height from common genetic variants in a large sample of African Americans. F. Chen¹, G. K. Chen¹, R. C. Millikan², E. M. John^3,4, C. B. Ambrosone⁵, L. Berstein⁶, W. Zheng⁷, J. J. Hu⁸, R. G. Ziegler⁹, S. L. Deming⁷, E. V. Bandera¹⁰, W. J. Blot^7,11, S. S. Strom¹², S. I. Berndt⁹, R. A. Kittles¹³, B. A. Rybicki¹⁴, W. Issacs¹⁵, S. A. Ingles¹, J. L. Stanford¹⁶, W. R. Diver¹⁷, J. S. Witte¹⁸, L. B. Signorello^7,11, S. J. Chanock⁹, L. Le Marchand¹⁹, L. N. Kolonel¹⁹, B. E. Henderson¹, C. A. Haiman¹, D. O. Stram¹ 1) Preventive Medicine, University of Southern California, Los Angeles, CA; 2) Epidemiology, Gillings School of Global Public Health, and Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, NC; 3) Northern California Cancer Center, Fremont, CA; 4) School of Medicine, Stanford University, and Stanford Cancer Center, Stanford, CA; 5) Cancer Prevention and Control, Roswell Park Cancer Institute, Buffalo, NY; 6) Cancer Etiology, Population Science, Beckman Research Institute, City of Hope, CA; 7) Epidemiology, Vanderbilt Epidemiology Center and Vanderbilt-Ingram Cancer Center, Vanderbilt University, Nashville, TN; 8) Sylvester Comprehensive Cancer Center, Department of Epidemiology and Public Health, University of Miami, Miami, FL; 9) Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD; 10) The Cancer Institute of New Jersey, New Brunswick, NJ; 11) International Epidemiology Institute, Rockville, MD; 12) Epidemiology, The University of Texas M.D. Anderson Cancer Center, Houston, TX; 13) Medicine, University of Illinois at Chicago, Chicago, IL; 14) Biostatistics and Research Epidemiology, Henry Ford Hospital, Detroit, MI; 15) James Buchanan Brady Urological Institute, Johns Hopkins Hospital and Medical Institutions, Baltimore, MD; 16) Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA; 17) Epidemiology Research, American Cancer Society, Atlanta, GA; 18) Institute of Human Genetics, Dept of Epidemiology and Biostatistics, University of California, San Francisco, CA; 19) Epidemiology, Cancer Research Center, University of Hawaii, Honolulu, HI.

   Height has an extremely polygenic pattern of inheritance. Genome-wide association studies (GWAS) have revealed hundreds of common variants that are associated with human height at genome-wide levels of significance. Each of these common variants has a very modest effect, and only a small fraction of phenotypic variation can be explained by the aggregate of these common variants. In this large study of African-American men and women, we genotyped and analyzed 975,519 autosomal SNPs across the entire genome using a variance components approach, and found that 46.4% of phenotypic variation can be explained by these SNPs in a sample of 9,779 evidently unrelated individuals. We noted that in two samples of close relatives defined by probability of identical-by-descent (IBD) alleles sharing (Pr (IBD=1)>=0.3 and Pr (IBD=1)>=0.4), the proportion of phenotypic variation explained by the same set of SNPs increased to 75.5% (se: 14.8%) and 70.3% (26.9%), respectively. We conclude that the additive component of genetic variation for height may have been overestimated in earlier studies (~80%) and argue that this proportion also includes variation from epistatic effects. Using simulation, we showed that by using common SNPs that are only weakly correlated with causal SNPs, the model could explain a large proportion of heritability. We therefore argue that the heritability estimate from the variance components approach is not necessarily the variation explained by a given set of SNPs, but also possibly reflects distant relatedness between nominally unrelated participants. Finally, we explored the performance of the variance components approach and concluded that the approach fails when a large number of independent variables are included in the model as the structure of the two components becomes similar. 
Thus some degree of population stratification seems to be required in order for the method to perform well for very large numbers of SNPs; however, when modest stratification is present there is a risk of misattribution of effects of unmeasured (and untagged) variants to measured variants.

A multi-SNP locus-association method reveals a substantial fraction of the missing heritability. Z. Kutalik^1,2, G. Ehret^3,4, D. Lamparter^1,2, C. Hoggart⁵, J. Whittaker⁶, J. Beckmann^1,7, GIANT consortium 1) Med Gen, Univ Lausanne, Lausanne, Switzerland; 2) Swiss Institute of Bioinformatics, Switzerland; 3) Division of Cardiology, Geneva University Hospital, Geneva, Switzerland; 4) McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Baltimore, Maryland, United States of America; 5) Department of Pediatrics, Imperial College London, London, United Kingdom; 6) Quantitative Sciences, GlaxoSmithKline, Stevenage, UK; 7) Service of Medical Genetics, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland.

   There are many known examples of multiple (semi-)independent associations at individual loci, which may arise either because of true allelic heterogeneity or imperfect tagging of an unobserved causal variant. This phenomenon is of great importance in monogenic traits but has not yet been systematically investigated and quantified in complex trait GWAS. We describe a multi-SNP association method that estimates the effect of loci harbouring multiple association signals using GWAS summary statistics. Applying the method to a large anthropometric GWAS meta-analysis (GIANT), we show that for height, BMI, and waist-hip ratio (WHR), an additional 10%, 9%, and 8% of phenotypic variance, respectively, can be explained on top of the previously reported 10%, 1.5%, and 1%. The method also substantially increased the number of loci that replicate in a discovery-validation design. Specifically, we identified in total 263 loci at which the multi-SNP explains significantly more variance than the best individual SNP at the locus. A detailed analysis of multi-SNPs shows that most of the additional variability explained is derived from SNPs not in LD with the lead SNP, suggesting a major contribution of allelic heterogeneity to the missing heritability.

Hundreds of loci contribute to body fat distribution and central adiposity. A. E. Locke¹, D. Shungin^2,3,4, T. Ferreira⁵, T. W. Winkler⁶, D. C. Croteau-Chonka⁷, R. Magi^5,8, T. Workalemahu⁹, K. Fischer⁸, J. Wu¹⁰, R. J. Strawbridge¹¹, A. Justice¹², F. Day¹³, N. Heard-Costa^14,15, C. S. Fox¹⁴, M. C. Zillikens¹⁶, E. K. Speliotes^17,18, H. Völzke¹⁹, L. Qi⁹, I. Barroso^20,21, I. M. Heid⁶, K. E. North¹², P. W. Franks^2,4,9, M. I. McCarthy²², J. N. Hirschhorn²³, L. A. Cupples^10,14, E. Ingelsson²⁴, A. P. Morris⁵, R. J. F. Loos^13,25, C. M. Lindgren⁵, K. L. Mohlke⁷, Genetic Investigation of ANthropometric Traits (GIANT) Consortium 1) Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI; 2) Genetic and Molecular Epidemiology Group, Department of Public Health and Clinical Medicine, Umeå University, Umeå, Sweden; 3) Department of Odontology, Umeå University, Umeå, Sweden; 4) Department of Clinical Sciences, Skåne University Hospital, Lund University, Malmö, Sweden; 5) Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK; 6) Regensburg University Medical Center, Department of Epidemiology and Preventive Medicine, Regensburg, Germany; 7) Department of Genetics, University of North Carolina, Chapel Hill, NC; 8) Estonian Genome Center, University of Tartu, Estonia; 9) Department of Nutrition, Harvard School of Public Health, Boston, MA; 10) Department of Biostatistics, School of Public Health, Boston University, Boston, MA; 11) Cardiovascular Genetics and Genomics Group, Karolinska Institutet, Stockholm, Sweden; 12) Department of Epidemiology and Carolina Center for Genome Sciences, University of North Carolina, Chapel Hill, NC; 13) MRC Epidemiology Unit, Institute of Metabolic Science, Addenbrooke's Hospital, Cambridge, UK; 14) National Heart, Lung, and Blood Institute, Framingham, MA; 15) Department of Neurology, Boston University School of Medicine, Boston, MA; 16) Department of Internal Medicine, Erasmus MC Rotterdam, the Netherlands; 17) Department of Internal Medicine, Division of Gastroenterology, and Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI; 18) Broad Institute, Cambridge, MA; 19) Institute for Community Medicine, Ernst-Moritz-Arndt-University Greifswald, Greifswald, Germany; 20) Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK; 21) University of Cambridge Metabolic Research Labs, Institute of Metabolic Sciences; 22) University of Oxford, Oxford, UK; 23) Department of Genetics, Harvard Medical School, Boston, MA; 24) Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden; 25) Charles R. Bronfman Institute of Personalized Medicine, Child Health and Development Institute, Department of Preventive Medicine, Mount Sinai School of Medicine, New York, NY.

   Central adiposity and body fat distribution are risk factors for type 2 diabetes and cardiovascular disease and can be measured using waist circumference (WC), hip circumference (HIP), and waist-to-hip ratio (WHR). Adjusting for body mass index (BMI) differentiates effects from those for overall obesity. We performed fixed effects inverse variance meta-analysis for these traits with 72,919 individuals from 30 studies in a prior genome-wide association study (GWAS) meta-analysis, 71,139 individuals from 24 additional GWAS, and 67,163 individuals from 28 studies genotyped on Metabochip by the GIANT consortium. We identified 48 independent genome-wide significant (p<5x10^-8) associations for WHR adjusted for BMI, including all 14 previously published signals. Twelve signals are located near genes for transcription factors, including developmental homeobox-containing proteins. Among them, two are in the HOXC gene cluster near HOXC8 and miR-196a2. HOXC8 is expressed in white adipose tissue and is a regulator of brown adipogenesis, while miR-196a inhibits Hoxc8 expression. Signals are located near PPARG, encoding a transcription factor known to regulate adipocyte differentiation, and near HMGA1 and CEBPA, encoding transcription factors that act downstream of insulin receptor and leptin signaling, respectively. Further novel signals are located near genes involved in angiogenesis (PLXND1, VEGFB, and MEIS1). Among the other five traits, we estimate that a significant proportion of the genetic effects for WC and HIP adjusted for BMI are correlated with height (0.59, p<5x10^-79 and 0.83, p<2x10^-40, respectively). Despite this strong correlation, an appreciable proportion of the genetic contributions to these traits will be independent of height. Association meta-analysis for the five additional traits identified an additional 148 independent signals (p<5x10^-8), 32 of which have not been reported previously for an anthropometric trait. 
These novel signals suggest regulation of adipose gene expression (KLF14) and transcriptional control of cell patterning and differentiation in early development (HLX, SOX11, ZNF423, and HMGXB4) affect fat distribution. Meta-analyses for WHR, WC, and HIP, with and without adjustment for BMI, identified a total of 196 independent loci, 66 novel, affecting fat deposition and body shape, and implicating genes involved in development, adipose gene expression and tissue differentiation, response to metabolic signaling, and angiogenesis.
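
For readers unfamiliar with the "fixed effects inverse variance meta-analysis" named in the abstract above, here is a minimal sketch of the standard textbook formula: each study's effect estimate is weighted by the inverse of its squared standard error. The function name and the input numbers are made up for illustration; this is not the GIANT consortium's actual pipeline.

```python
import math

def fixed_effects_meta(betas, ses):
    """Pool per-study effect estimates using inverse-variance weights."""
    if len(betas) != len(ses) or not betas:
        raise ValueError("need equal-length, non-empty inputs")
    # Weight each study by 1 / SE^2, so more precise studies count more.
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)   # standard error of the pooled estimate
    return beta, se, beta / se    # pooled effect, its SE, and the z-score

# Hypothetical effects of one SNP in two equally precise studies.
beta, se, z = fixed_effects_meta([0.10, 0.20], [0.1, 0.1])
```

With two equally precise studies the pooled effect is simply their average, and the pooled standard error shrinks by a factor of the square root of two.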

Prediction of human height with large panels of SNPs - insights into genetic architecture. Y. C. Klimentidis¹, A. I. Vazquez¹, G. de los Campos² 1) Energetics, University of Alabama at Birmingham, Birmingham, AL; 2) Biostatistics, University of Alabama at Birmingham, Birmingham, AL.

   Prediction of complex traits from genetic information is an area of major clinical and scientific interest. Height is a model trait since it is highly heritable and easily measured. Substantial strides in understanding the genetic basis of height have recently been made through genome-wide association studies (GWAS), and whole-genome prediction (WGP) which fits thousands of SNPs jointly. Here, we attempt to gain insight into the genetic architecture of human height by examining how WGP accuracy is affected by the choice of single-nucleotide polymorphisms (SNPs). Specifically, we compare the prediction accuracy of models using: 1) SNPs selected based on the ‘top hits’ of the GIANT consortium meta-analysis for height at different p-value thresholds, and 2) SNPs in genomic regions that surround the most significant ‘top hits’. We use the Framingham Heart Study and GENEVA datasets, imputed up to 10 million SNPs with 1000 Genomes reference data. The predictive accuracy of each model was evaluated in cross-validation. We find that prediction accuracy increases up to a certain point with the inclusion of more ‘top hits’ from the GIANT study, that including SNPs from the regions surrounding ‘top hits’ contributes minimally to prediction accuracy, and that prediction accuracy increases with the size of the training dataset. Finally, we find that prediction accuracy is greatest for individuals at the phenotypic extremes of height. Our results suggest that improvement of genomic prediction models will require the use of information from a large number of selected SNPs, and that these models may be most useful at the phenotypic extremes.

Evidence of Inbreeding Depression on Human Height. J. F. Wilson¹, N. Eklund^2,3, N. Pirastu⁴, M. Kuningas⁵, B. P. McEvoy⁶, T. Esko⁷, T. Corre⁸, G. Davies⁹, P. d'Adamo⁴, N. D. Hastie¹⁰, U. Gyllensten¹¹, A. F. Wright¹⁰, C. M. van Duijn⁵, M. Dunlop¹⁰, I. Rudan¹, P. Gasparini⁴, P. P. Pramstaller¹², I. J. Deary⁹, D. Toniolo⁸, J. G. Eriksson³, A. Jula³, O. T. Raitakari¹³, A. Metspalu⁷, M. Perola^2,3,7, M. R. Jarvelin^14,15, A. Uitterlinden⁵, P. M. Visscher⁶, H. Campbell¹, R. McQuillan¹, ROHgen 1) Centre for Population Health Sciences, Univ Edinburgh, Edinburgh, United Kingdom; 2) Institute for Molecular Medicine Finland (FIMM), Helsinki, Finland; 3) Department of Chronic Disease Prevention, National Institute for Health and Welfare, Helsinki, Finland; 4) Institute for Maternal and Child Health, IRCCS “Burlo Garofolo”, Trieste, University of Trieste, Italy; 5) Department of Epidemiology, Erasmus University Medical Center, Rotterdam, The Netherlands; 6) Queensland Institute of Medical Research, 300 Herston Road, Brisbane, Queensland 4006, Australia; 7) Estonian Genome Center, University of Tartu, Tartu, Estonia; 8) Division of Genetics and Cell Biology, San Raffaele Research Institute, Milano, Italy; 9) Department of Psychology, The University of Edinburgh, 7 George Square, Edinburgh EH8 9JZ, UK; 10) MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, EH4 2XU, Scotland; 11) Department of Immunology, Genetics and Pathology, SciLifeLab Uppsala, Rudbeck Laboratory, Uppsala University, Uppsala, Sweden; 12) Centre for Biomedicine, European Academy Bozen/Bolzano (EURAC), Bolzano, Italy - Affiliated Institute of the University of Lübeck, Lübeck, Germany; 13) Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku, Turku, Finland; 14) Biocenter Oulu, University of Oulu, Finland; 15) Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London, MRC Health 
Protection Agency (HPA) Centre for Environment and Health, Imperial College London, London, UK.

   Stature is a classical and highly heritable complex trait, with 80-90% of variation explained by genetic factors. In recent years, genome-wide association studies (GWAS) have successfully identified many common additive variants influencing human height; however, little attention has been given to the potential role of recessive genetic effects. Here, we investigated genome-wide recessive effects by an analysis of inbreeding depression on adult height in over 35,000 people from 21 different population samples. We found a highly significant inverse association between height and genome-wide homozygosity, equivalent to a height reduction of up to 3 cm in the offspring of first cousins compared with the offspring of unrelated individuals, an effect which remained after controlling for the effects of socio-economic status, an important confounder. There was, however, a high degree of heterogeneity among populations: whereas the direction of the effect was consistent across most population samples, the effect size differed significantly among populations. It is likely that this reflects true biological heterogeneity: whether or not an effect can be observed will depend on both the variance in homozygosity in the population and the chance inheritance of individual recessive genotypes. These results predict that multiple, rare, recessive variants influence human height. Although this exploratory work focuses on height alone, the methodology developed is generally applicable to heritable quantitative traits (QT), paving the way for an investigation into inbreeding effects, and therefore genetic architecture, on a range of QT of biomedical importance.
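
To put the "up to 3 cm in the offspring of first cousins" figure in context, a rough sketch follows. It relies on the standard pedigree result that the offspring of first cousins have an inbreeding coefficient of F = 1/16, and it assumes, purely for illustration, that the height reduction scales linearly with F. The abstract measures genome-wide homozygosity rather than pedigree F, so the function and its default are hypothetical, not the authors' model.

```python
# Pedigree inbreeding coefficient for the offspring of first cousins.
FIRST_COUSIN_F = 1.0 / 16.0

def expected_height_reduction_cm(f, reduction_at_first_cousin_cm=3.0):
    """Upper-bound height reduction (cm) at inbreeding coefficient f,
    assuming a linear effect anchored at the first-cousin figure."""
    slope_cm_per_unit_f = reduction_at_first_cousin_cm / FIRST_COUSIN_F
    return slope_cm_per_unit_f * f

first_cousins = expected_height_reduction_cm(1.0 / 16.0)   # anchor point
second_cousins = expected_height_reduction_cm(1.0 / 64.0)  # F = 1/64
```

Under that linearity assumption the upper bound for the offspring of second cousins (F = 1/64) would be a quarter of the first-cousin figure.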

Empirical and theoretical studies on genetic variance of rare variants for complex traits using whole genome sequencing in the CHARGE Consortium. C. Zhu¹, A. Morrison², J. Reid³, C. J. O’Donnell⁴, B. Psaty⁵, L. A. Cupples^4,6, R. Gibbs³, E. Boerwinkle^2,3, X. Liu² 1) Department of Agronomy, Kansas State University, Manhattan, KS; 2) Human Genetics Center, School of Public Health, University of Texas Health Science Center at Houston, Houston, TX; 3) Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX; 4) NHLBI Framingham Heart Study, Framingham, MA; 5) Cardiovascular Health Research Unit, University of Washington, Seattle, WA; 6) Department of Biostatistics, Boston University School of Public Health, Boston, MA.

   As the frontier of human genetic studies has shifted from genome-wide association studies (GWAS) towards whole exome and whole genome sequencing studies, we have witnessed an explosion of new DNA variants, especially rare variants. An important but not yet answered question is the contribution of rare variants to the heritabilities of complex traits, which determine, in part, the gain in power from rare variants to discover new disease-associated genes. Here we present theoretical and empirical results on this question.
    Our theoretical study was based upon the distribution of allele frequencies incorporating mutation, random genetic drift, and the possibility of purifying selection against susceptibility mutations. It shows that in most cases rare variants only contribute a small proportion to the overall genetic variance of a trait, but under certain conditions they may explain as much as 50% of additive genetic variance when both susceptible alleles are under purifying selection and the rate of mutations compensating the susceptible alleles (i.e. repair rate) is high.
    In our empirical study, we estimated the proportion of total phenotypic variance contributed by the additive genetic variance (σ_g²) of rare variants for six complex traits (BMI, height, LDL-C, HDL-C, triglyceride and total cholesterol) using whole genome sequences (8x coverage) of 962 European Americans from the CHARGE-S study. The results show that the estimated σ_g² of rare variants (MAF≤1%) ranged from 2% to 8% across the six traits. However, the standard errors (s.e.) of the estimated variance components from rare variants are relatively large compared to those of common variants. Using HDL-C as an example, the estimated σ_g² values are 0.08 (s.e. 0.10), 0.05 (s.e. 0.05) and 0.58 (s.e. 0.05) for rare, low-frequency (1%<MAF≤5%) and common (MAF>5%) variants, respectively.

Leveraging admixture analysis to resolve missing and cross-population heritability in GWAS. N. Zaitlen¹, A. Gusev¹, B. Pasaniuc¹, G. Bhatia², S. Pollack¹, A. Tandon³, E. Stahl³, R. Do⁴, B. Vilhjalmsson¹, E. Akylbekova⁵, A. Cupples⁶, M. Fornage⁷, L. Kao⁸, L. Lange⁹, S. Musani⁵, G. Papanicolaou¹⁰, J. Rotter¹¹, I. Ruczinksi¹², D. Siscovick¹³, X. Zhu¹⁴, S. McCarroll³, G. Lettre¹⁵, J. Hirschhorn¹⁶, N. Patterson⁴, D. Reich³, J. Wilson⁵, S. Kathiresan⁴, A. Price¹, CAC. CARe Analysis Core⁵ 1) Genetic Epidemiology, Harvard School of Public Health, Boston, MA; 2) Harvard-MIT Division of Health, Science and Technology; 3) Department of Genetics, Harvard Medical School, Boston, MA, USA; 4) Broad Institute of Harvard and Massachusetts Institute of Technology (MIT), Cambridge, Massachusetts, USA; 5) Jackson Heart Study, Jackson State University, Jackson, MS, USA; 6) Boston University, Boston, MA, USA; 7) Institute of Molecular Medicine and Division of Epidemiology School of Public Health, University of Texas Health Sciences Center at Houston, Houston, TX, 77030, USA; 8) Department of Epidemiology, Johns Hopkins University, Baltimore, Maryland, United States of America; 9) University of North Carolina, Chapel Hill, NC, USA; 10) National Heart, Lung, and Blood Institute (NHLBI), Division of Cardiovascular Sciences, NIH, Bethesda, MD 20892, USA; 11) Cedars-Sinai Medical Center, Medical Genetics Institute, Los Angeles, CA, USA; 12) Johns Hopkins University, Baltimore, Maryland, United States of America; 13) University of Washington, Seattle, WA, USA; 14) Department of Epidemiology and Biostatistics, School of Medicine, Case Western Reserve University, Cleveland, USA; 15) Département de Médecine, Université de Montréal, C.P. 6128, succursale Centre-ville, Montréal, Québec, Canada; 16) Divisions of Genetics and Endocrinology and Program in Genomics, Children’s Hospital Boston, Boston, MA, USA.

   Resolving missing heritability, the difference between phenotypic variance explained by associated SNPs and estimates of narrow-sense heritability (h2), will inform strategies for disease mapping and prediction of complex traits. Possible explanations for missing heritability include rare variants not captured by genotyping arrays, or biased estimates of h2 due to epistatic interactions [Zuk et al. 2012]. Here, we develop a novel approach to estimating h2 based on sharing of local ancestry segments between pairs of unrelated individuals in an admixed population. Unlike recent approaches for estimating the heritability explained by genotyped markers (h2g) [Yang et al. 2010], our approach captures the total h2, because local ancestry estimated from genotyping array data captures the effects of all variants, not just those on the array. Our approach uses only unrelated individuals, and is thus not susceptible to biases caused by epistatic interactions or shared environment that can confound genealogy-based estimates of h2. Theory and simulations show that the variance explained by local ancestry (h2γ) is related to h2, Fst, and genome-wide ancestry proportion (θ): h2γ = h2*2*Fst*θ*(1-θ). Thus, we can estimate h2γ and then infer h2 from h2γ. We apply our method to 5,040 African Americans from the CARe cohort and estimate the autosomal h2 for HDL cholesterol (0.39±0.11), LDL cholesterol (0.18±0.09), and height (0.55±0.13). As expected, these h2 estimates were higher than estimates of h2g from the same data using standard approaches: 0.22±0.07, 0.16±0.07 and 0.31±0.07, consistent with previous estimates. The difference between h2 and h2g suggests that rare variants contribute substantial missing heritability that can be quantified using local ancestry information. Larger sample sizes will enable h2 estimates with even lower standard errors, so that the possible contribution of epistasis to previous estimates of h2 can be precisely quantified. 
We additionally use local ancestry to estimate the fraction of phenotypic variance shared between European and African genomes that is explained by genotyped markers, by estimating h2g in European segments, h2g in African segments, and h2g shared between European and African segments. Given that most GWAS to date have been carried out in individuals of European descent, these estimates shed light on the importance of collecting data from non-European populations for mapping disease in those populations.
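
The relation quoted in the abstract above, h2γ = h2*2*Fst*θ*(1-θ), can be inverted to recover h2 from an estimate of h2γ. The sketch below only rearranges that formula. The Fst and θ values are illustrative assumptions (the abstract does not report them), and the function names are hypothetical.

```python
def h2_gamma_from_h2(h2, fst, theta):
    """Variance explained by local ancestry, per the abstract's relation."""
    return h2 * 2.0 * fst * theta * (1.0 - theta)

def h2_from_h2_gamma(h2_gamma, fst, theta):
    """Invert the relation to infer narrow-sense heritability h2."""
    scale = 2.0 * fst * theta * (1.0 - theta)
    if scale <= 0.0:
        raise ValueError("need fst > 0 and 0 < theta < 1")
    return h2_gamma / scale

# Assumed values for illustration only: Fst = 0.15, theta = 0.8.
# h2 = 0.55 is the height estimate reported in the abstract.
h2_gamma = h2_gamma_from_h2(0.55, fst=0.15, theta=0.8)
h2_back = h2_from_h2_gamma(h2_gamma, fst=0.15, theta=0.8)
```

With these assumed inputs the scale factor is 0.048, so h2γ comes out to a few percent of phenotypic variance. A quantity that small has to be divided by a small number to recover h2, which is one way to see why the standard errors on the resulting h2 estimates are wide.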

Genome-wide association meta-analyses in over 210,000 individuals identify 20 sexually dimorphic genetic variants for body fat distribution. T. W. Winkler¹, D. C. Croteau-Chonka², T. Ferreira³, K. Fischer⁴, A. E. Locke⁵, R. Mägi^3,4, D. Shungin^6,7,8, T. Workalemahu⁹, J. Wu¹⁰, F. Day¹¹, A. U. Jackson⁵, A. Justice¹², R. Strawbridge¹³, H. Völzke¹⁴, L. Qi⁹, M. C. Zillikens¹⁵, C. S. Fox¹⁶, E. K. Speliotes^17,18, I. Barroso^19,20, E. Ingelsson²¹, J. N. Hirschhorn²², M. I. McCarthy²³, P. W. Franks^6,8,9, A. P. Morris³, L. A. Cupples^10,24, K. E. North¹², K. L. Mohlke², R. J. F. Loos^11,25, I. M. Heid¹, C. M. Lindgren³, GIANT Consortium 1) Public Health and Gender Studies, Institute of Epidemiology and Preventive Medicine, Regensburg University Medical Center, Regensburg, Germany; 2) Department of Genetics, University of North Carolina, Chapel Hill, NC; 3) Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK; 4) Estonian Genome Center, University of Tartu, Tartu, Estonia; 5) Department of Biostatistics, University of Michigan, Ann Arbor, MI; 6) Department of Clinical Sciences, Skåne University Hospital, Lund University, Malmö, Sweden; 7) Department of Odontology, Umeå University, Umeå, Sweden; 8) Genetic and Molecular Epidemiology Group, Department of Public Health and Clinical Medicine, Section for Medicine, Umeå University, Umeå, Sweden; 9) Department of Nutrition, Harvard School of Public Health, Boston, MA; 10) Department of Biostatistics, School of Public Health, Boston University, Boston, MA; 11) MRC Epidemiology Unit, Institute of Metabolic Science, Addenbrooke's Hospital, Cambridge, UK; 12) Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC; 13) Cardiovascular Genetics and Genomics Group, Karolinska Institute, Stockholm, Sweden; 14) Institute for Community Medicine, Ernst-Moritz-Arndt-University Greifswald, Greifswald, Germany; 15) Department of Internal Medicine, Erasmus MC Rotterdam, the Netherlands; 
16) National Heart, Lung, and Blood Institute, Framingham, MA; 17) Broad Institute, Cambridge, MA; 18) Department of Internal Medicine, Division of Gastroenterology, and Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI; 19) University of Cambridge Metabolic Research Labs, Institute of Metabolic Science Addenbrooke's Hospital, Cambridge, UK; 20) Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK; 21) Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden; 22) Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA; 23) University of Oxford, Oxford, UK; 24) Framingham Heart Study, Framingham, MA; 25) Charles R. Bronfman Institute of Personalized Medicine, Child Health and Development Institute, Department of Preventive medicine, Mount Sinai School of Medicine, New York, NY 10029, USA.

   It is well-known that body fat distribution differs between men and women, a circumstance that may be due to innate, genetic differences between sexes. Previously, we performed a large-scale meta-analysis of GWAS of waist-to-hip ratio adjusted for BMI (WHR), a measure of body fat distribution independent of overall adiposity and found that of the 14 loci established in men and women combined, seven showed a significant sex-difference. In a subsequent genome-wide analysis that was specifically tailored to detect sex-differential genetic effects for WHR, we identified two additional loci with significant sex-difference. Despite these findings, the genetic basis affecting the sexual dimorphism of WHR as well as the genetic architecture of WHR in general are still poorly understood. We therefore conducted sex-combined and sex-stratified meta-analyses comprising >210,000 individuals (>116,000 women; >94,000 men) of European ancestry from 57 GWAS studies and 28 studies genotyped on the MetaboChip within the GIANT consortium. The sex-combined analysis yielded 39 loci with genome-wide significant association (P<5x10^-8), of which 11 loci showed significant sex-difference (Bonferroni-corrected P<0.05/39). Six of these loci influence WHR in women only without any effect in men (near COBLL1, LYPLAL1, PPARG, PLXND1, MACROD1, FAM13A); four loci have an effect in women and a less pronounced effect in men (near VEGFA, ADAMTS9, HOXC13, RSPO3); and one locus has only an effect in men (near GDF5). The sex-stratified analyses identified nine additional female-specific loci that had been missed in the sex-combined analysis due to the lack of effect in men (near MAP3K1, BCL2, TNFAIP8, CMIP, NKX3-1, NMU, SFXN2, HMGA1, KCNJ2). No additional loci were identified in the male-specific analysis. We confirmed all previously established sexually dimorphic variants for WHR. 
Of particular interest is the PPARG region that is a well-known target in type 2 diabetes treatments and shows a female-specific association with WHR. The enrichment of female-specific associations, i.e. 19 of the 20 sexually dimorphic loci, is consistent with the heritability of WHR as estimated in the Framingham Heart Study; we found that WHR is more heritable in women (h2~46%) compared to men (h2~19%). Our results highlight the importance of sex-stratified analyses and can help to better understand the genetics underpinning the sex-differences of body fat distribution.
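
The "Bonferroni-corrected P<0.05/39" cutoff in the abstract above is easy to make concrete. The sketch below computes that threshold and pairs it with a simple z-test for a difference between female and male effect sizes. The z-test is an assumption: it is a common choice that treats the two sex-specific estimates as independent, and the abstract does not say which test the authors used. The effect sizes are made up.

```python
import math

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance cutoff after Bonferroni correction."""
    return alpha / n_tests

def sex_difference_test(beta_f, se_f, beta_m, se_m):
    """Two-sided z-test for beta_f != beta_m, assuming the female and
    male estimates are independent (an assumption, not from the source)."""
    z = (beta_f - beta_m) / math.sqrt(se_f ** 2 + se_m ** 2)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return z, p

threshold = bonferroni_threshold(0.05, 39)  # 39 loci tested, as in the abstract

# Hypothetical locus: an effect in women, none in men.
z, p = sex_difference_test(beta_f=0.05, se_f=0.01, beta_m=0.0, se_m=0.01)
is_dimorphic = p < threshold
```

For this made-up locus the difference clears the corrected threshold, so it would be called sexually dimorphic under these assumptions.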