Those who read Sailer learned of this study a couple weeks ago. I've finally gotten around to looking at the actual paper, which seems convincing enough to me in doing what it says it does -- demonstrating "human intelligence is highly heritable and polygenic".
TGGP draws attention to comments by a blogger (Kevin Mitchell) who claims the paper "failed to establish the polygenic nature of the trait", but I don't see that Mitchell has a case. Mitchell:
I would interpret these findings very differently. What the authors do is analyse GWAS data in a very unusual way – they are not interested in finding specific SNPs affecting the trait, they simply use the SNPs to measure genetic relatedness between individuals.
As Mitchell then acknowledges, the paper does include a standard GWAS, the results of which are negative: at the level of individual SNPs not a single "replicable genome-wide significant association" is found. This is not surprising given the relatively small sample size and the (for me) expected polygenic nature of intelligence, but it (along with previous negative findings) tends to rule out any significant role for common variants of large effect in determining IQ.
What Mitchell is claiming here is that the results could be explained by cryptic relatedness and/or population structure. However, the researchers address both issues: they exclude samples that appear to be more closely related to another sample than 4th cousins, and they include the first few components of an MDS analysis as covariates in their models. For non-close relatives in unstructured populations, how similar two individuals are on chromosome 1 tells us nothing about how similar they are on any other chromosome. Visscher was more explicit on this point in a commentary on the height paper:
The study uses SNPs across the genome to measure this relatedness and then shows it correlates with phenotypic similarity – i.e., the trait is heritable. We knew that already.
What they claim is that you can break down this effect by chromosome or by subregion. When they use the SNPs along longer chromosomes they seem to get a bigger effect – “explaining more of the phenotypic variance”. The inference is that thousands of SNPs, scattered across the whole genome, contribute to the trait or, more specifically to variance in the trait across the population (the implication is that they contribute to the value of the trait in individuals).
There is an alternative explanation for this effect, however, which is that using more SNPs simply gives a better estimate of genetic relatedness. So, the SNPs on chromosomes 1 (the longest) give a better estimate than those on chromosome 21 (the shortest) – they index relatedness with more precision. As a result, they correlate better with phenotypic similarity – this looks like you have “explained more of the variance”. In fact, getting such a signal from SNPs on chromosome 1 does not mean that any of the causal variants are actually on chromosome 1. Nor does the fact that such signals can be derived from anywhere in the genome mean that there are thousands of variants across the genome affecting the trait.
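The precision half of Mitchell's argument can be written down as a back-of-envelope calculation. This is only a sketch under idealized assumptions that come from neither paper: two truly unrelated individuals j and k, M mutually independent SNPs with allele frequencies p_i, and allele counts x_ij standardized to mean 0 and variance 1.

```latex
\hat{A}_{jk} = \frac{1}{M}\sum_{i=1}^{M} z_{ij}\,z_{ik},
\qquad
z_{ij} = \frac{x_{ij} - 2p_i}{\sqrt{2p_i(1-p_i)}}

\mathbb{E}\big[\hat{A}_{jk}\big] = 0,
\qquad
\operatorname{Var}\big(\hat{A}_{jk}\big)
  = \frac{1}{M^{2}}\sum_{i=1}^{M}\mathbb{E}\big[z_{ij}^{2}\big]\,\mathbb{E}\big[z_{ik}^{2}\big]
  = \frac{1}{M}
```

Under these assumptions the noise in a relatedness estimate does shrink as more SNPs are used, so a long chromosome gives a more precise estimate than a short one. Real SNPs are in LD, so the effective M is smaller than the raw count. What the calculation does not say is whether relatedness estimated on one chromosome tracks relatedness elsewhere in the genome, and that is the question the rest of the disagreement turns on.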
What is the evidence that population structure is not causing the observed effects?
We took several steps to avoid population structure inflating the estimate of the variance explained by the SNPs. We excluded one individual from any pair that had an estimated relationship > 0.025 (approximately equivalent to between 3rd and 4th cousins). We fitted the first 20 principal components from the relationship matrix in the statistical model so that any population substructure that they picked up was excluded from the variance explained by the SNPs. Critically, we then estimated the correlation between the relationship matrices estimated from different chromosomes and did not find significant correlation. We tested a set of SNPs that are ancestry-informative in Europe for association with height and did not observe inflation of the test-statistics.
For the purpose of this paper, we performed an additional simulation experiment (inspired by comments from Dan Stram) by assuming that the causal variants were all carried on one set of chromosomes (odd numbers) and another set of chromosomes (even numbers) carried SNPs from which we estimated relatedness. If there is structure in the population then this would imply that a pair of individuals that are closely related on odd chromosomes will also be closely related on even chromosomes. We used the observed genotype data of 3,925 individuals and 295K SNPs as the basis of the simulation, and simulated 1,000 causal variants on the odd chromosomes with a total heritability of 80%. Then we performed a restricted maximum likelihood (REML) analysis of the simulated phenotypes on the genetic relationship matrix estimated from the SNPs on the even chromosomes. The estimates and standard errors (SEs) from 10 simulation replicates are shown in Table 1. Since REML estimates of variance are always positive, if the true variance explained is zero, we expect half the replicates to return an estimate of 0.0 and half to return an estimate with mean value 0.8 times the standard error. This is exactly what happened. Therefore we conclude (again) that there is no structure in the data that would inflate the estimate of the variance explained by the SNPs.
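The logic of the two checks described above (uncorrelated relationship matrices across chromosomes, and the odd/even simulation) can be sketched in a few lines of NumPy. This is a toy illustration and not the authors' pipeline. It assumes fully independent SNPs and unrelated individuals, uses much smaller sizes than the 3,925 individuals and 295K SNPs in the commentary, and swaps REML for Haseman-Elston regression (regressing phenotype products on pairwise relatedness) to keep the code short. All function names and parameter values are made up for the example.

```python
import numpy as np


def simulate_genotypes(n, m, rng):
    """Draw n unrelated individuals at m independent SNPs (0/1/2 allele counts)."""
    freqs = rng.uniform(0.1, 0.9, size=m)
    return rng.binomial(2, freqs, size=(n, m)).astype(float)


def standardize(geno):
    """Centre and scale each SNP column using its sample mean and SD."""
    return (geno - geno.mean(axis=0)) / geno.std(axis=0)


def grm(z):
    """Genetic relationship matrix: average product of standardized genotypes."""
    return z @ z.T / z.shape[1]


def off_diagonal(mat):
    """Entries above the diagonal, one per pair of individuals."""
    return mat[np.triu_indices_from(mat, k=1)]


def he_regression(pheno, rel):
    """Haseman-Elston slope through the origin (a stand-in for REML here)."""
    y = (pheno - pheno.mean()) / pheno.std()
    products = off_diagonal(np.outer(y, y))
    a = off_diagonal(rel)
    return float(a @ products / (a @ a))


def run_replicate(seed, n=1500, m=400, n_causal=100, h2=0.8):
    """One replicate: causal variants on the 'odd' set, relatedness from each set."""
    rng = np.random.default_rng(seed)
    # Two independent SNP sets stand in for odd and even chromosomes.
    z_odd = standardize(simulate_genotypes(n, m, rng))
    z_even = standardize(simulate_genotypes(n, m, rng))

    # Genetic values come only from causal SNPs in the odd set.
    causal = rng.choice(m, size=n_causal, replace=False)
    g = z_odd[:, causal] @ rng.normal(size=n_causal)
    g *= np.sqrt(h2) / g.std()  # rescale so genetic variance equals h2
    pheno = g + rng.normal(scale=np.sqrt(1 - h2), size=n)

    grm_odd, grm_even = grm(z_odd), grm(z_even)
    corr = np.corrcoef(off_diagonal(grm_odd), off_diagonal(grm_even))[0, 1]
    return {
        "grm_corr": float(corr),  # relatedness on one set vs the other
        "h2_odd": he_regression(pheno, grm_odd),  # should sit near h2
        "h2_even": he_regression(pheno, grm_even),  # should sit near zero
    }


if __name__ == "__main__":
    for seed in range(3):
        print(run_replicate(seed))
```

With no structure built into the simulated population, relatedness on the even set should carry essentially no information about the phenotype, which is the pattern Visscher reports for the real data. Unlike REML, the Haseman-Elston estimate is not truncated at zero, so in this sketch the even-set estimate scatters on both sides of zero. The "0.8 times the standard error" in the commentary is what truncation gives under a normal sampling distribution: the mean of the positive half of a zero-centred normal is sqrt(2/pi), about 0.80, standard errors.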
Steve Hsu correctly points out:
If I understand correctly, you want to claim that the observed population variation could be due to a few rare variants of large effect. But then it would be surprising for this study to have found .5 of the total variation to be associated with SNPs — compare to earlier studies using twins/adoptions/siblings that found narrow sense heritability of about .6 or so. I would not expect the rare alleles you hypothesize to be in good LD with SNPs (which are designed to tag common variants), so we would expect to lose a big chunk of the .6 additive heritability.
For example, in the Visscher paper on height they had to hand wave about imperfect LD to recover the full .8 or so of heritability. In this case the global fit comes out very close to .6, which suggests common rather than rare variants (at least, they are well tagged by SNPs). But if they are common variants their individual effect sizes must be small and there are a lot of them. Let me know if I am missing something.
I don’t think the population variation is caused by “a few” rare variants – I think it is (or could be at least) caused by a larger number of rare variants – different ones in different people.
This is getting to be a pretty silly argument: "different ones in different people" would add up to a very large number, which sounds "polygenic" enough to me (regardless of how many people have the major allele at most variable sites). And again: rare variants will be tagged less effectively (if at all) by common SNPs, so the causal variants whose effects are being estimated in this study can't be too rare. The contribution of rare variants to variability in intelligence is likely largely on top of the effect identified here, and probably mostly negative: an unusually high number of rare, deleterious mutations will tend to interfere with brain development and diminish IQ; an unusually low number will result in a higher IQ on average, explaining at least in part the associations commonly found between intelligence and other markers of "good genes" (health, physical attractiveness, and so on). A priori, though, it makes no sense to expect this type of variation to be the only or overwhelming source of genetic variability in IQ. Clearly, a very large number of genes affect brain development, and I expect pretty much all of these genes to be polymorphic. It's also clear tradeoffs affecting IQ exist (such as between brain size and energy expenditure) and that specific IQ-influencing alleles will have varying effects on fitness in different times and places. So it seems obvious to me common variants should be expected to play a major role in inter-individual and inter-population IQ differences.
Incidentally, looking again at the supplementary material for the height paper recently, I noticed the following addition:
In the version of this supplementary file originally posted online, Supplementary Fig. 2a and 2b were incorrect. The legend stated that in Supplementary Fig. 2a, PC1 versus PC2 was plotted when in fact PC2 versus PC3 was shown. Similarly, in Supplementary Fig. 2b, PC4 versus PC5 was plotted rather than PC3 versus PC4 as stated. This error is purely graphical and does not in any way affect the results or conclusions presented in the article.
Dasein spotted the strange-looking PCA at the time. I didn't think it materially affected that paper's conclusion, but I'm pleased to see that confirmed and the issue resolved.