Reply to Rienzi on DNAprint, part 2

Rienzi takes issue with the "unfavorable comparison" of DNAprint's efforts with higher-resolution explorations of genetic structure.

The issue is not up for debate.

As DNAprint themselves acknowledge, "increasing the number of AIMs is expected to increase the precision of the individual ancestry estimates". Whether we are dealing with personal genetic testing or academic studies using admixture mapping, more markers are unquestionably better (at least well into the hundreds of thousands--not hundreds--of SNPs). Compare the results Rosenberg et al. obtained in 2002 using 377 autosomal STRs with those of Stanford's more recent 650,000 SNP analysis. Intra-European structure is indistinct in the earlier analysis; at higher resolution, the same population samples are cleanly separable.
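To put rough numbers on "more markers are better", here is a minimal sketch of how the standard error of a maximum-likelihood admixture estimate should shrink with panel size. It assumes a toy two-population model, independent markers, Hardy-Weinberg genotypes, and perfectly known parental allele frequencies; the function names and the 0.7/0.3 frequencies are made up for illustration, and none of this is DNAprint's or deCODE's actual method.

```python
import math

def admixture_standard_error(m, freqs_a, freqs_b):
    """Asymptotic standard error of the ML estimate of admixture proportion m.

    Each genotype is modeled as Binomial(2, p) with p = m*pa + (1-m)*pb,
    so the Fisher information per marker is 2*(pa-pb)^2 / (p*(1-p)).
    """
    info = 0.0
    for pa, pb in zip(freqs_a, freqs_b):
        p = m * pa + (1 - m) * pb  # expected allele frequency given admixture m
        info += 2 * (pa - pb) ** 2 / (p * (1 - p))
    return 1 / math.sqrt(info)

def uniform_panel(n_markers, pa=0.7, pb=0.3):
    """Hypothetical panel in which every marker has the same frequency gap."""
    return [pa] * n_markers, [pb] * n_markers

if __name__ == "__main__":
    for n in (100, 1000, 10000, 100000):
        fa, fb = uniform_panel(n)
        se = admixture_standard_error(0.9, fa, fb)
        print(f"{n:>6} markers: approx. standard error {se:.4f}")
```

Under these assumptions the error falls as one over the square root of the marker count, so quadrupling the panel halves the error. Real panels have linked markers and uncertain parental frequencies, so actual gains will be smaller than this idealization suggests.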

Rienzi conflates the general principle above with a debate on the merits of specific testing companies. Beyond pointing out that DNAprint has no utility for white Americans or Europeans (even if we are to trust DNAprint's own data, the "admixture" of the typical white American is well below the error thresholds DNAprint acknowledge are built into their tests, which if anything underestimate error), I have little interest in such a debate. And it wouldn't matter if there were no other options for testing--useless shit is useless shit. Still, I'll bite.

To get this out of the way: I have never endorsed a testing company, nor do I advocate personal genetic testing, though it's fine when used by knowledgeable people with reasonable goals (e.g. supporting a paternal-line genealogy through Y-STR testing). Question for Rienzi: why do you keep pushing DNAprint products? I see much emotional reactivity and little practical advice in your DNAprint advocacy. Lay out some scenarios outlining exactly what actions you propose people take based on ABD results.

I'm already on record as stating ML individual admixture estimates without accompanying information on confidence intervals are all but useless. So, given that ABD is also useless, deCODEme and DNAprint are presently battling for last place in the admixture analysis arena. That said, under a reasonable set of assumptions deCODEme's analysis is likely already superior:
  • More markers available.
  • The quality of data from the Illumina BeadChips is comparable to or better than that coming off DNAprint's SNP typing platform.
  • One assumes deCODE uses an algorithm similar or identical to the (relatively simple and old) one used by DNAprint.
  • Though unfortunate, I don't see the absence of an Amerindian parental population as that huge an issue at the moment--DNAprint seems to have trouble distinguishing IA and EA admixture anyway.

Unless deCODE screwed up their math or chose to analyze only a tiny fraction of the available SNPs when calculating admixture, their estimates are already more precise than those of DNAprint.
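To make the confidence-interval complaint concrete, here is a small simulation sketch. It assumes a toy two-population model with independent markers and known parental frequencies, finds the ML estimate by grid search, and reports an approximate 95% likelihood-ratio interval (log-likelihood within 1.92 of the maximum). All names and parameters are hypothetical, the interval is an asymptotic approximation that behaves poorly near 0 and 1, and this is not a reconstruction of any company's algorithm.

```python
import math
import random

def log_likelihood(m, genotypes, freqs_a, freqs_b):
    """Log-likelihood of admixture proportion m given 0/1/2 genotypes."""
    ll = 0.0
    for g, pa, pb in zip(genotypes, freqs_a, freqs_b):
        p = m * pa + (1 - m) * pb
        ll += g * math.log(p) + (2 - g) * math.log(1 - p)
    return ll

def ml_with_interval(genotypes, freqs_a, freqs_b, grid=200, drop=1.92):
    """Grid-search ML estimate plus an approximate 95% likelihood-ratio interval."""
    ms = [i / grid for i in range(grid + 1)]
    lls = [log_likelihood(m, genotypes, freqs_a, freqs_b) for m in ms]
    best = max(lls)
    mle = ms[lls.index(best)]
    inside = [m for m, ll in zip(ms, lls) if best - ll <= drop]
    return mle, min(inside), max(inside)

def simulate_individual(true_m, n_markers, rng):
    """Simulate one admixed individual and the (assumed known) parental frequencies."""
    freqs_a = [rng.uniform(0.05, 0.95) for _ in range(n_markers)]
    freqs_b = [rng.uniform(0.05, 0.95) for _ in range(n_markers)]
    genotypes = []
    for pa, pb in zip(freqs_a, freqs_b):
        p = true_m * pa + (1 - true_m) * pb
        genotypes.append(int(rng.random() < p) + int(rng.random() < p))
    return genotypes, freqs_a, freqs_b

if __name__ == "__main__":
    rng = random.Random(0)
    g, fa, fb = simulate_individual(0.9, 1000, rng)
    for n in (50, 1000):
        mle, lo, hi = ml_with_interval(g[:n], fa[:n], fb[:n])
        print(f"{n:>5} markers: estimate {mle:.3f}, interval [{lo:.3f}, {hi:.3f}]")
```

A point estimate like "90% A, 10% B" means very different things depending on whether the interval around it is a few points wide or comfortably includes 100% A, which is why an estimate reported without its interval tells the customer so little.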

More significantly, there's nothing to stop the deCODEme customer from doing his own admixture analyses. He can:
(1) Download his own genotype data.
(2) Download reference data. In addition to HapMap samples, we now have access to the 650,000 SNP data sets for the HGDP samples. More data sets will likely become available in the future.
(3) Run analyses with freely available software (e.g. STRUCTURE, frappe, ADMIXMAP).

It's true that such analyses on large data sets are computationally intensive. I don't consider this a valid excuse for companies, but, regardless, it should not deter the individual. Moreover, processing costs continue to fall and a new approach claims better results with fewer computational resources:
LAMP computes the ancestry structure for overlapping windows of contiguous SNPs and combines the results with a majority vote. Our empirical results show that LAMP is significantly more accurate and more efficient than existing methods for inferring locus-specific ancestries, enabling it to handle large-scale datasets. We further show that LAMP can be used to estimate the individual admixture of each individual. Our experimental evaluation indicates that this extension yields a considerably more accurate estimate of individual admixture than state-of-the-art methods such as STRUCTURE or EIGENSTRAT, which are frequently used for the correction of population stratification in association studies.
[. . .]
We tested LAMP extensively on various datasets of admixed populations generated from the HapMap resource. Our simulations show that LAMP is significantly more accurate than state-of-the-art methods such as SABER and STRUCTURE. In addition, LAMP is highly efficient, with a running time that is about 200 times faster than SABER and about 10^4 times faster than STRUCTURE. The efficiency of LAMP allows us to estimate ancestries across the genome in several hours on a single computer.
[. . .]
A number of recent studies have produced panels of AIMs in admixed populations [33, 34, 35, 36]; AIMs are SNPs that have differing frequencies in the ancestral populations. It is possible that the AIMs might be used to improve the accuracy of individual admixture prediction done by STRUCTURE or other methods, including LAMP. However, the AIMs have disadvantages because there is a risk of overfitting, and the studied population might be somewhat different than the population for which the AIMs were found. As we show here, in an era where the genotyping technology is getting cheaper, it is useful to use the entire set of genotyped SNPs in the analysis of population stratification.
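The "overlapping windows plus majority vote" idea in the quoted passage can be illustrated with a toy sketch. To be clear about what is assumed: the per-window classifier below is a plain likelihood comparison against known parental allele frequencies, each window is assigned a single ancestry, and the window and step sizes are arbitrary. That classifier is a stand-in chosen for brevity, not LAMP's actual procedure; only the windowing and voting structure is taken from the quote.

```python
import math
from collections import Counter

def window_call(genotypes, freqs_a, freqs_b):
    """Label one window 'A' or 'B' by comparing log-likelihoods (stand-in classifier)."""
    ll_a = ll_b = 0.0
    for g, pa, pb in zip(genotypes, freqs_a, freqs_b):
        ll_a += g * math.log(pa) + (2 - g) * math.log(1 - pa)
        ll_b += g * math.log(pb) + (2 - g) * math.log(1 - pb)
    return "A" if ll_a >= ll_b else "B"

def locus_ancestry(genotypes, freqs_a, freqs_b, window=10, step=5):
    """Per-SNP ancestry call: majority vote over all overlapping windows covering the SNP."""
    n = len(genotypes)
    starts = list(range(0, max(n - window, 0) + 1, step))
    if starts[-1] + window < n:
        starts.append(n - window)  # make sure the tail of the chromosome is covered
    votes = [Counter() for _ in range(n)]
    for start in starts:
        end = min(start + window, n)
        label = window_call(genotypes[start:end], freqs_a[start:end], freqs_b[start:end])
        for i in range(start, end):
            votes[i][label] += 1
    return [v.most_common(1)[0][0] for v in votes]

if __name__ == "__main__":
    # Hypothetical chromosome: first half looks like population A, second half like B.
    freqs_a = [0.9] * 40
    freqs_b = [0.1] * 40
    genotypes = [2] * 20 + [0] * 20
    print("".join(locus_ancestry(genotypes, freqs_a, freqs_b)))
```

Averaging the per-SNP calls across the genome gives a crude individual admixture estimate, which is the extension the authors describe; calls near an ancestry breakpoint are the least reliable, since windows straddling it get split evidence.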

The Watson analysis speaks poorly for deCODE's ethics, but due to the data quality confound says little about the accuracy of their admixture estimates. (Watson's genome was sequenced by 454 at 6 times coverage. Which lengths of DNA got sequenced was up to chance, which means some segments weren't sequenced at all, and others were sequenced an inadequate number of times. This is not an issue with the BeadChip platform.)
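A back-of-the-envelope check on the coverage point, assuming reads land uniformly at random so that per-base depth is roughly Poisson with mean 6. This is an idealization (real coverage is lumpier, which only makes matters worse), and the "fewer than 3 reads" cutoff for "inadequate" is my own illustrative threshold, not a figure from the Watson project.

```python
import math

def poisson_cdf(k, mean):
    """P(X <= k) for a Poisson random variable with the given mean."""
    return sum(math.exp(-mean) * mean ** i / math.factorial(i) for i in range(k + 1))

COVERAGE = 6.0  # average sequencing depth quoted for the Watson genome

p_unsequenced = poisson_cdf(0, COVERAGE)  # no reads at all: e^-6
p_below_3 = poisson_cdf(2, COVERAGE)      # fewer than 3 reads (illustrative cutoff)

if __name__ == "__main__":
    print(f"P(base never sequenced)       = {p_unsequenced:.4%}")
    print(f"P(base covered by < 3 reads)  = {p_below_3:.4%}")
```

Even under this optimistic model, about a quarter of a percent of positions get no reads and several percent get too few to call a heterozygote with confidence. That is a meaningful share of the SNPs feeding any admixture estimate.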
