Inferring Human Population Structure: STR or SNP?
S. Xu1, L. Jin1,2
1) Chinese Academy of Sciences and Max Planck Society (CAS-MPG) Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China; 2) Ministry of Education (MOE) Key Laboratory of Contemporary Anthropology and Center for Evolutionary Biology, School of Life Sciences and Institutes of Biomedical Sciences, Fudan University, Shanghai 200433, China.

Yes, Rienzi, if it wasn't already obvious, thousands of SNPs beat 32 STR loci. Why would anyone "agonize" over an inconsequential result?
Both microsatellites (STRs) and single nucleotide polymorphisms (SNPs) have played important roles in inferring population structure. With the availability of genome-wide STR and SNP data for the same collection of world-wide human population samples (HGDP-CEPH panel), we now have the opportunity to compare the usefulness of the two types of data in inferring population structure. We selected the same set of 940 unrelated HGDP individuals for whom both 783 STRs and 650,000 SNPs were genotyped, and performed both classical (phylogenetic and principal component) analysis and STRUCTURE analysis. We found that, in all analyses and with the same number of alleles, SNP data perform better and generate more reasonable results than STR data. Notably, a) SNP data offer superior clustering of individuals and populations; b) the phylogenetic tree reconstructed using SNP data is more consistent with the geographical distribution of populations; c) SNP data reveal fine structure and reasonable admixture patterns in STRUCTURE analysis for both the full sample and subsets of samples, whereas STR data cannot.
SNPs better than STRs for inferring population structure
Another ASHG 2008 abstract: