race/history/evolution notes: ASHG 2012 abstracts (1)

People of the British Isles: An analysis of the genetic contributions of European populations to a UK control population. S. Leslie¹, B. Winney², G. Hellenthal³, S. Myers⁴, P. Donnelly³, W. Bodmer² 1) Statistical Genetics, Murdoch Childrens Research Institute, Melbourne, Australia; 2) Department of Oncology, University of Oxford, UK; 3) The Wellcome Trust Centre for Human Genetics, University of Oxford, UK; 4) Department of Statistics, University of Oxford, UK.

   There is much interest in fine-scale population structure in the UK, as a signature of historical migration events and because of the effect population structure may have on disease association studies. Population structure appears to have a minor impact on the current generation of genome-wide association studies, but will probably be important for the next generation of studies seeking associations to rare variants. Furthermore, there is great interest in understanding where the British people came from. Thus far, genetic studies have been limited to a small number of markers or to samples not collected to specifically address these questions. A natural method for understanding population structure is to control and document carefully the provenance of samples. We describe the collection of a cohort of rural UK samples (The People of the British Isles), aimed at providing a well-characterised UK control population. This will be a resource for the research community as well as providing fine-scale genetic information on the history of the British. Using a novel clustering algorithm, approximately 2000 samples were clustered purely as a function of genetic similarity, without reference to their known sampling locations. When each individual is plotted on a UK map, there is a striking association between inferred clusters and geography, reflecting to a major extent the known history of the British peoples. A similar analysis is performed on samples from different parts of Europe. Using the European samples as ‘source populations’, we apply a novel algorithm to determine the proportion of the genomes within each of the derived British clusters that are most closely related to each of the source populations. Thus we can observe the relative contribution (under our model) of each of these European populations to the genomes of samples in different regions of Britain. 
Our results strikingly reflect much of the known historical and archaeological record while raising some important questions and perhaps answering others. We believe this is the first detailed analysis of very fine-scale genetic structure and its origin in a population of very similar humans. This has been achieved through both a careful sampling strategy and an approach to analysis that accounts for linkage disequilibrium.

Estimating and Interpreting F_ST: the Impact of Rare Variants. G. Bhatia¹,², N. Patterson², S. Sankararaman²,⁵, A. L. Price²,³,⁴ 1) Harvard-Massachusetts Institute of Technology (MIT) Division of Health, Science and Technology, Cambridge, MA; 2) Broad Institute of Harvard and MIT, Cambridge, MA; 3) Department of Epidemiology, Harvard School of Public Health, Boston, MA; 4) Department of Biostatistics, Harvard School of Public Health, Boston, MA; 5) Department of Genetics, Harvard Medical School, Boston, MA.

   F_ST is a widely used tool for studying population structure, but many different definitions, estimation methods and interpretations exist in the literature. Thus, wide variation in published estimates of F_ST is important to understand. For example, the F_ST between European (CEU) and East Asian (CHB) populations is 0.111 when estimated from HapMap3 data, but only 0.052 when estimated from 1000 Genomes data (1kG). While changes in F_ST from sequencing data might be expected from including rare variants, we show that this difference arises largely through bias introduced by the estimation method and not through population genetic factors. We describe a method that is shown to avoid these biases. We consider three specific aspects of estimation: (1) defining F_ST for a single SNP, (2) combining estimates of F_ST across multiple SNPs, and (3) selecting the set of SNPs used in the computation. Correcting for differences in each of these aspects of estimation yields estimates of F_ST that are much more concordant between genotype and sequence data. For example, our estimate of F_ST between CEU and CHB from 1kG is 0.106, only slightly lower than the HapMap3 estimate. This decrease is due to ascertainment bias of SNPs included in the HapMap3 project, not to properties of rare variants. In general, F_ST at rare variants in a population will be sensitive to demographic events affecting that population. When comparing CEU to CHB, for example, we show that rare variants in CEU and CHB have higher F_ST than common variants. This is consistent with the influence of strong bottlenecks on F_ST at rare variants. We note that ascertainment in an out-group, for example Yoruba (YRI), will remove this frequency dependence of F_ST. Finally, we show that single-SNP estimates of F_ST based on a common definition (Weir and Cockerham 1984) can become inflated in a setting of very different sample sizes. This inflation can result in false-positive signals of natural selection. 
Indeed, we show that in a recent study of selection that compared 1,890 African-American and 113 YRI samples (Jin et al. 2011), F_ST estimates at 9 of the 10 reported novel loci are inflated by the disparity in sample size, and, after correction, only 7 of these 10 loci remain nominally significant. This suggests that caution is warranted when using this definition to rank single-SNP estimates of F_ST. Our results indicate that a careful protocol is needed for producing F_ST estimates. We provide such a protocol.
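
Aspects (1) and (2) above can be illustrated with a short sketch. The abstract does not say which single-SNP definition the authors settle on, so the code below assumes a Hudson-style estimator purely for illustration; the function names and the toy allele frequencies are hypothetical. It shows that averaging per-SNP ratios and taking the ratio of averaged numerators and denominators can give noticeably different genome-wide values from the same data.

```python
import numpy as np

def hudson_fst_terms(p1, n1, p2, n2):
    """Per-SNP numerator and denominator of a Hudson-style F_ST estimator.

    p1, p2: sample allele frequencies in populations 1 and 2 (arrays).
    n1, n2: number of sampled chromosomes in each population.
    ASSUMPTION: this estimator is chosen for illustration only.
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    # Squared frequency difference, corrected for finite sample size.
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    # Probability that two alleles, one from each population, differ.
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num, den

def fst_ratio_of_averages(p1, n1, p2, n2):
    """Combine SNPs by summing numerators and denominators separately."""
    num, den = hudson_fst_terms(p1, n1, p2, n2)
    return float(num.sum() / den.sum())

def fst_average_of_ratios(p1, n1, p2, n2):
    """Combine SNPs by averaging the per-SNP ratios (monomorphic SNPs dropped)."""
    num, den = hudson_fst_terms(p1, n1, p2, n2)
    keep = den > 0
    return float((num[keep] / den[keep]).mean())

# Hypothetical toy data: one undifferentiated SNP, one highly differentiated SNP.
p_a = [0.5, 0.1]
p_b = [0.5, 0.9]
roa = fst_ratio_of_averages(p_a, 101, p_b, 101)
aor = fst_average_of_ratios(p_a, 101, p_b, 101)
print(roa, aor)
```

On this toy input the two combination rules give roughly 0.48 and 0.38 respectively, which is the kind of method-driven discrepancy the abstract warns about.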

High Exome Mutational Burden in 58 African Americans with Persistent Extreme Blood Pressure. KD. H. Nguyen¹, A. C. Morrison², A. Li², R. Gibbs³, E. Boerwinkle², A. Chakravarti¹ 1) Center for Complex Disease Genomics, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA; 2) Human Genetics Center, School of Public Health, University of Texas Health Science Center at Houston, Houston, TX, USA; 3) Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.

   High blood pressure (BP) is a major cardiovascular risk factor in African Americans (AA). Despite its modest heritability (35%), ~63 BP loci have been implicated by genome-wide association studies in European and African ancestry samples. We explored exome sequencing in 58 African Americans (AA) at the extremes of the BP distribution across multiple visits in the Atherosclerosis Risk in Communities study (~1st and 99th percentile residuals of the baseline age- and sex-corrected systolic BP) to demonstrate the enrichment of deleterious mutations genome-wide and to identify novel genes. We identified 67,298 high quality coding/splicing variants (≥10X coverage, ≥2 copies of the variant alleles, PHRED-like score ≥30, call rate ≥90%); each variant had a phyloP conservation score (S) and was classified as synonymous, mild missense (exon splice junction, non-NMD nonsense, nonsynonymous) or severe missense (intron splice junction, NMD nonsense). We assumed that the observed exomic mutation profile (kernel density of variants for each S value) from the 58 individuals was a mixture of two profiles, (1 − β) of random subjects (107,727 variants in 61 AA individuals from the 1000G Project) and β of ‘true’ mutations (70,393 Mendelian / disease causing mutations from the Human Genome Mutation Database), and estimated the mutational burden (β^) by least squares. This analysis estimated an overall β^ = 6%, with values of 2%, 12% and 38% for the synonymous, mild missense and severe missense variants, respectively. Importantly, β^ increased with higher conservation scores to ~100%. Across each of the 3 mutation classes, β^ was slightly higher for variants observed exclusively in the top than in the bottom BP group (14%/12%, 27%/25%, 60%/41%, for synonymous, mild and severe missense variants respectively). Conversely, we observed β^ = 0 for variants that were present in both the top and bottom BP classes irrespective of mutation class. 
By considering only variants at class-specific phyloP thresholds, S≥5 and 4.5, for the mild and severe missense variants (β^=100%), we estimate that a minimum of 2,412 variants in 1,881 genes, or an average burden of ~42 mutations at ~32 genes per subject, are involved in BP. Consequently, our results showed that BP extreme subjects have distinct global mutational burden; there is a significant enrichment of deleterious coding mutations at highly conserved sites in these individuals; and the identified genes reveal new BP candidate genes.
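
The two-profile mixture described above has a closed-form least-squares solution, sketched here. This is a minimal illustration under stated assumptions, not the authors' code: the three profiles are assumed to be densities evaluated on a shared grid of S values, the numbers are made up, the function name is hypothetical, and clipping the estimate to [0, 1] is my own choice.

```python
import numpy as np

def estimate_beta(observed, background, disease):
    """Least-squares beta for: observed ~ (1 - beta) * background + beta * disease.

    All three inputs are density values on the same grid of conservation
    scores S. Minimising the squared residual over beta gives
        beta = <observed - background, disease - background> / ||disease - background||^2
    """
    observed = np.asarray(observed, dtype=float)
    background = np.asarray(background, dtype=float)
    disease = np.asarray(disease, dtype=float)
    diff = disease - background
    beta = np.dot(observed - background, diff) / np.dot(diff, diff)
    # ASSUMPTION: constrain to a valid mixing proportion.
    return float(np.clip(beta, 0.0, 1.0))

# Hypothetical profiles on a three-point grid of S values.
background = np.array([0.5, 0.3, 0.2])   # stand-in for the random-subject profile
disease = np.array([0.1, 0.3, 0.6])      # stand-in for the disease-mutation profile
observed = 0.94 * background + 0.06 * disease
beta_hat = estimate_beta(observed, background, disease)
print(beta_hat)
```

Because the toy observed profile is built as an exact 94%/6% mixture, the estimate recovers β close to 0.06; with real kernel densities the fit would carry residual error.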

PRDM9 directs genetic recombination away from functional genomic elements. K. Brick¹, F. Smagulova², P. Khil¹, RD. Camerini-Otero¹, G. Petukhova² 1) Genetics & Biochemistry Branch, NIDDK, National Institutes of Health, Bethesda, MD; 2) Uniformed Services University of Health Sciences, Department of Biochemistry and Molecular Biology, Bethesda, MD, USA.

   Recombination initiates with the formation of programmed DNA double strand breaks (DSBs) at a small subset of genomic loci called hotspots. Elegant recent studies in mouse and human have determined that PRDM9, a meiosis-specific histone H3 methyl-transferase, is involved in DSB hotspot site determination (Parvanov et al., Science 2010; Baudat et al., Science 2010; Myers et al., Science 2010), likely through DNA binding of its zinc-finger domain. We have recently generated the first genome-wide DSB hotspot map in a metazoan genome and have shown that the majority of mouse DSB hotspots are associated with testis-specific H3K4me3 chromatin marks, potentially formed by PRDM9 (Smagulova et al., Nature 2011). Curiously, however, Prdm9 knockout mice remain proficient at initiating recombination. In this work, we describe several straightforward experiments that elucidate the nature and extent of the role of PRDM9 in determining DSB hotspot locations.
    We used a novel ChIP-Seq variant developed by our group to detect ssDNA bound by the meiotic recombinase DMC1 (Khil et al., Genome Res., 2012). Using this method, we precisely mapped the genome-wide distribution of DSB hotspots in seven mouse strains and in their F1 progeny. While hotspots in mice sharing a Prdm9 allele mapped to almost identical loci, hotspots in other mice were dependent on the DNA binding specificity of the Prdm9 allele. Importantly, in Prdm9 knockout mice, hotspots were at completely different locations than in wild-type, definitively illustrating that PRDM9 determines practically all DSB hotspot locations. Intriguingly, DSBs in the pseudoautosomal region - the site of an obligate recombination event in every meiosis - were found to be Prdm9-independent and present in all strains. In Prdm9 knockout mice, DSBs still accumulated in hotspots; however, in the absence of PRDM9, most recombination initiated at H3K4me3 marks at promoters or enhancers. These sites are rarely targeted in wild-type mice, illustrating an important, unexpected role for PRDM9 in sequestering the recombination machinery away from functional genomic elements where the efficient repair of DSBs may be problematic.

A Unified Model of Meiosis Combining Recombination, Non-Disjunction, Interference and Infertility. H. R. Johnston IV, D. J. Cutler Department of Human Genetics, Emory University School of Medicine, Atlanta, GA.

   Human male and female recombination rates and patterns differ greatly across the broad scale of human chromosomes. Rates of infertility and non-disjunction differ widely between males and females. No simple cause is known for these observations. To this end, we have created a unified model of meiosis that combines recombination, non-disjunction, interference and fertility. The model correctly predicts the rate of fertility, trisomy 21 occurrences and the number and, most interestingly, the different patterns of recombination between the sexes. The model we create is based on the observation that chiasmata are the mechanism that enables the normal segregation of chromosomes during meiosis. Non-disjunction is the result of a failed segregation event. In our model, non-disjunction occurs both when no chiasmata are present between pairs of non-sister chromatids as well as when multiple chiasmata are present close together between pairs of non-sister chromatids. Other elements of our model include having no chiasmata occur between sister chromatids as well as concluding male meiosis immediately while arresting female meiosis between birth and the mother’s age at conception. This period of arrest requires that females begin with far more chiasmata than males. It also allows for physical interference to initiate from anywhere on a chromosome arm. In males, this initiation event is always telomeric. These elements combine to generate the unique patterns of recombination in each gender that have, heretofore, not been explained. They also generate the unique patterns of non-disjunction and infertility, helping to explain why these phenomena are seen far more often in eggs relative to sperm. Overall, this model argues that gross differences between male and female patterns of non-disjunction, infertility, and recombination are substantially the result of the period of meiotic arrest during oogenesis.

Human spermatogenic failure purges deleterious mutation load from the autosomes and both sex chromosomes, including the gene DMRT1. D. F. Conrad¹, A. Lopes², K. I. Aston³, F. Carvalho⁴, J. Goncalves⁵, R. Mathiesen², N. Huang⁶, A. Ramu¹, J. Downie⁷, S. Fernandes⁸, A. Amorim²,⁸, A. Barros⁹, M. Hurles⁶, S. Moskovtsev¹⁰, C. Ober¹¹, J. Schiffman⁷, P. N. Schlegel¹², M. De Sousa¹³, D. T. Carrell³,¹⁴ 1) Dept Genetics, Washington Univ School Med, St Louis, MO; 2) IPATIMUP, Institute of Molecular Pathology and Immunology of the University of Porto, R. Dr. Roberto Frias S/N, 4200-465 Porto, Portugal; 3) Andrology and IVF Laboratories, Department of Surgery; 4) Department of Genetics, Faculty of Medicine, University of Porto, Porto, Portugal; 5) Centre for Human Genetics, National Institute of Health Dr. Ricardo Jorge, Lisbon, Portugal; 6) Genome Mutation and Genetic Disease Group, Wellcome Trust Sanger Institute, Cambridge, UK; 7) Department of Oncological Sciences; 8) Faculty of Science, University of Porto, 4099-002 Porto, Portugal; 9) Centre for Reproductive Genetics Alberto Barros, Porto, Portugal; 10) Department of Obstetrics & Gynaecology, University of Toronto; 11) Department of Human Genetics, Department of Obstetrics & Gynecology, The University of Chicago, Chicago, IL 60637, USA; 12) Department of Urology, Weill Cornell Medical College, New York-Presbyterian Hospital, New York, USA; 13) Laboratory of Cell Biology, UMIB, ICBAS, University of Porto, Porto, Portugal; 14) Department of Physiology, Department of Obstetrics and Gynecology University of Utah School of Medicine, Salt Lake City, Utah, 84108, USA.

   Gonadal failure, along with early pregnancy loss and perinatal death, may be an important filter that limits the propagation of harmful mutations in the human population. We hypothesized that men with spermatogenic impairment, a condition with unknown genetic architecture and a common cause of male infertility, are enriched for rare deleterious mutations compared to men with normal spermatogenesis. We assayed genomewide SNPs and CNVs in 327 men with spermatogenic impairment and >1100 controls, and estimated that a rare autosomal deletion multiplicatively changes a man’s risk for this condition by 10% (OR 1.10 [1.05-1.15], p < 4 x 10^-5), a rare X-linked CNV by 29% (OR 1.29 [1.16-1.43], p < 3 x 10^-6) and a rare Y-linked duplication by 64% (OR 1.64 [1.28-2.10], p < 9 x 10^-5). Based on the population frequency of potential risk alleles, extent of homozygosity, and evidence for dosage sensitivity of genes disrupted in men with spermatogenic impairment, we propose that the CNV burden is polygenic and distinct from the burden of large, dominant mutations described for developmental disorders. Our study also identifies focal deletions of the sex-differentiation gene DMRT1 as likely recurrent causes of idiopathic azoospermia, and generates hypotheses for directing future studies on the genetic basis of male infertility and IVF outcomes.
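
To make the "multiplicative" wording concrete, here is a small arithmetic sketch. It assumes each additional rare variant multiplies the odds by the same per-variant odds ratio (a log-additive model, which is my reading of the abstract rather than something it spells out); the baseline odds and the function names are hypothetical.

```python
def odds_after_k_variants(baseline_odds, odds_ratio, k):
    """Odds of the condition for a carrier of k rare variants.

    ASSUMPTION: every variant multiplies the odds by the same odds_ratio.
    """
    return baseline_odds * odds_ratio ** k

def odds_to_probability(odds):
    """Convert odds to a probability."""
    return odds / (1.0 + odds)

baseline = 0.01  # hypothetical baseline odds, not taken from the study
for k in range(4):
    odds = odds_after_k_variants(baseline, 1.10, k)  # OR 1.10 per autosomal deletion
    print(k, round(odds, 5), round(odds_to_probability(odds), 5))
```

Under this reading, two rare autosomal deletions raise the odds by a factor of 1.10 × 1.10 = 1.21 rather than by 20%.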

Genome Wide Association Study of Sexual Orientation in a Large, Web-based Cohort. E. M. Drabant, A. K. Kiefer, N. Eriksson, J. L. Mountain, U. Francke, J. Y. Tung, D. A. Hinds, C. B. Do 23andMe, Mountain View, CA.

   There is considerable variation in human sexual orientation. Heritability studies have differed on the exact scope of genetic contributions for sexual orientation, but it appears that both genetics and environment play a role. Though a few linkage studies have pointed at a possible role for certain genes on the X chromosome, the strength of that evidence is limited due to the conflicting nature of the reports and small sample sizes. We sought to clarify some of the questions surrounding the possible genetic underpinnings of sexual orientation by deploying a web-based survey to the large 23andMe database and conducting the first ever genome-wide association study (GWAS) on sexual orientation.
   We adapted the Klein Sexual Orientation Grid to examine seven elements of sexual orientation. All items were rated on a seven-point scale by participants. Initial analyses focused on the “self identification” item as a continuous variable in response to the question “How do you label, identify or think of yourself?” In a sample of 7,887 men and 5,570 women, 77.2% of men and 74.6% of women identified as heterosexual only, 7.3% of men and 15.3% of women as heterosexual mostly, 1.1% of men and 2.7% of women as heterosexual somewhat more, 1.3% of men and 3.5% of women as bisexual, 0.7% of men and 0.5% of women as homosexual somewhat more, 2.9% of men and 1.6% of women as homosexual mostly, and 9.5% of men and 1.8% of women as homosexual only. In both men and women, sexual identity was most significantly correlated with sexual attraction (men r=0.97, women r=0.90), sexual behavior (men r=0.95, women r=0.83), sexual fantasies (men r=0.96, women r=0.75), and emotional attraction (men r=0.79, women r=0.45), and the least strongly correlated with heterosexual/homosexual lifestyle (men r=0.54, women r=0.37), and social preference (men r=0.15, women r=0.08).
   We carried out GWAS stratified by sex in a cohort of 7,887 unrelated men and 5,570 unrelated women of European ancestry collected in the two months since the initial survey release. No clear genome-wide significant associations have been found thus far, and the current data do not show any direct association for markers within chromosome band Xq28. However, data collection is still ongoing, and increased sample size may help to clarify the roles for currently suggestive associations.

A scalable pipeline for local ancestry inference using thousands of reference individuals. C. B. Do, E. Durand, J. M. Macpherson, B. Naughton, J. L. Mountain 23andMe, Inc, Mountain View, CA.

   Ancestry deconvolution, the task of identifying the ancestral origin of chromosomal segments in admixed individuals, is straightforward when the ancestral populations considered are sufficiently distinct. To date, however, no approaches have been shown to be effective at distinguishing between closely related populations (e.g., within Europe). Moreover, due to their computational complexity, most existing methods for ancestry deconvolution are unsuitable for application in large-scale settings, where the reference panels used contain thousands of individuals.
   We describe Ancestry Painting 2.0, a modular three-stage pipeline for efficiently and accurately identifying the ancestral origin of chromosomal segments in admixed individuals. In the first stage, an out-of-sample extension of the BEAGLE phasing algorithm is used to generate a preliminary phasing for an unphased, genotyped individual. In the second stage, a support vector machine (SVM) using a specialized string kernel assigns tentative ancestry labels to short local phased genomic regions. In the third stage, an autoregressive pair hidden Markov model simultaneously corrects phasing errors and produces reconciled local ancestry estimates and confidence scores based on the SVM labels.
   We compiled a reference panel of over 7,500 individuals of homogeneous ancestry, derived from a combination of several publicly available datasets and over 5,000 individuals reporting four grandparents with the same country-of-origin from the customer database of the personal genetics company, 23andMe, Inc, and excluding outliers identified through principal components analysis (PCA). In cross-validation experiments, Ancestry Painting 2.0 achieves high sensitivity and specificity (in most cases >90%) for labeling chromosomal segments across over 20 different populations worldwide. We also demonstrate the robustness of the algorithm via simulations of individuals of known local admixture, and compare Ancestry Painting 2.0 with existing state-of-the-art tools for multi-population local and global ancestry inference, including LAMP, ALLOY, PCA-ADMIX, and ADMIXTURE.
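
The idea behind the third stage, turning noisy per-window ancestry labels into coherent segments, can be sketched with a much simpler model than the one in the abstract. The code below is NOT the autoregressive pair HMM the authors describe and does not touch phasing: it is a plain HMM decoded with Viterbi, in which the hypothetical parameters `accuracy` (chance a window label is correct) and `switch` (chance of an ancestry change between adjacent windows) are assumptions chosen for illustration.

```python
import numpy as np

def smooth_labels(labels, n_pops, accuracy=0.9, switch=0.01):
    """Viterbi-smooth a sequence of per-window ancestry labels (ints in 0..n_pops-1).

    ASSUMPTIONS: a window's label equals the true ancestry with probability
    `accuracy` (errors uniform over the other populations), and ancestry
    changes between adjacent windows with probability `switch`.
    Requires n_pops >= 2.
    """
    labels = np.asarray(labels)
    n = len(labels)
    # emit[true, observed] and trans[from, to], both as probabilities.
    emit = np.full((n_pops, n_pops), (1 - accuracy) / (n_pops - 1))
    np.fill_diagonal(emit, accuracy)
    trans = np.full((n_pops, n_pops), switch / (n_pops - 1))
    np.fill_diagonal(trans, 1 - switch)
    log_emit, log_trans = np.log(emit), np.log(trans)

    # score[i]: best log-probability of any path ending in state i.
    score = np.log(1.0 / n_pops) + log_emit[:, labels[0]]
    back = np.zeros((n, n_pops), dtype=int)
    for t in range(1, n):
        cand = score[:, None] + log_trans      # cand[i, j]: come from i, land in j
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_emit[:, labels[t]]

    # Trace the best path backwards.
    path = np.empty(n, dtype=int)
    path[-1] = score.argmax()
    for t in range(n - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path.tolist()

# Hypothetical window labels: one isolated mislabel, then a real ancestry switch.
noisy = [0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1]
smoothed = smooth_labels(noisy, n_pops=2)
print(smoothed)  # [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

The isolated label at the fourth window is overridden because a single labelling error is far more likely than two ancestry switches in quick succession, while the sustained run of five windows is kept as a genuine segment.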
