ASHG 2012 abstracts (3): miscellaneous


The Myth of Random Mating: Evidence of ancestry-related assortative mating across 3 generations in Framingham, MA. R. Sebro1,2, G. Peloso3,4, J. Dupuis5,6, N. Risch1,7,8 1) Institute for Human Genetics, University of California, San Francisco, San Francisco, CA; 2) Department of Radiology and Biomedical Imaging, University of California, San Francisco, San Francisco, CA; 3) Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA; 4) Program in Medical and Population Genetics, Broad Institute, Cambridge, MA; 5) Department of Biostatistics, Boston University School of Public Health, Boston MA; 6) The National Heart, Lung and Blood Institute’s Framingham Heart Study, Framingham, MA; 7) Department of Biostatistics and Epidemiology, University of California, San Francisco, San Francisco, CA; 8) Division of Research, Kaiser Permanente, Oakland, CA.

   The factors that influence spouse selection are important to geneticists because the mating pattern determines the genetic structure of a population. There has been evidence of positive assortative mating (PAM) related to several phenotypic traits, such as height. Ancestry-related PAM, in which spouses are more likely to share genes of common ancestry, is necessary for genetic population stratification. Prior studies have shown strong ancestry-related assortative mating among Latino populations. Here, Caucasian spouse pairs from the Framingham Heart Study (FHS) Original and Offspring cohorts (N=885) genotyped on the Affymetrix 500K array were analyzed using principal components (PC) analysis. Data from individuals genotyped in HapMap and the Human Genome Diversity Project (HGDP) were projected onto these PCs to facilitate interpretation. Based on these and other data, the first principal component delineates the prominent northwest-to-southeast European cline. In our data, there was clear clustering on this axis, probably separating individuals of English/Irish/German ancestry from those of Italian ancestry. The second principal component also reveals strong clustering and likely identifies individuals of Ashkenazi Jewish ancestry. In the Original (older) cohort, there is a very strong correlation between spouses for PC1 (r=0.73, P=2e-22) and also for PC2 (r=0.80, P=4e-29). In the Offspring cohort, the spouse correlations were lower but still highly significant: r=0.38, P=3e-28 for PC1 and r=0.45, P=9e-40 for PC2. Scatter plots of spouse pairs in the two generations show both reduced clustering and a lower but still evident correlation in the Offspring cohort. With respect to genetic impact, we observed highly significant Hardy-Weinberg disequilibrium (homozygote excess) for SNPs loading heavily on PC1 and PC2 across 3 generations, as well as highly significant linkage disequilibrium between the same set of SNPs located on different chromosomes. 
These results are consistent with demographic patterns of social homogamy that have existed in Framingham over several generations, and with a general trend of reduced homogamy over time. While Framingham is not representative of the general US population, its historic mating patterns serve as a reminder that assumptions of Hardy-Weinberg and linkage equilibrium should be made with caution when applied to genetic loci that are related to ancestry in any population.
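One way to see why ancestry-structured mating produces a homozygote excess is the Wahlund effect: pooling subpopulations that differ in allele frequency, even when each is internally in Hardy-Weinberg equilibrium, yields fewer heterozygotes than the pooled allele frequency predicts. The sketch below is a simplified illustration, not the authors' analysis; it assumes complete isolation between the groups, and the function name, allele frequencies and mixing weights are hypothetical.

```python
def wahlund(freqs, weights):
    """Heterozygosity in a pooled sample of subpopulations, each assumed to be in HWE.

    freqs: frequency of one allele in each subpopulation (hypothetical values).
    weights: fraction of the pooled sample drawn from each subpopulation (sums to 1).
    Returns (observed heterozygosity, HWE-expected heterozygosity, F), where
    F = 1 - observed/expected measures the homozygote excess.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    p_bar = sum(w * p for w, p in zip(weights, freqs))                 # pooled allele frequency
    h_obs = sum(w * 2 * p * (1 - p) for w, p in zip(weights, freqs))   # mixture of within-group HWE
    h_exp = 2 * p_bar * (1 - p_bar)                                    # HWE at the pooled frequency
    return h_obs, h_exp, 1 - h_obs / h_exp


# Hypothetical example: two equal-sized groups with allele frequencies 0.2 and 0.8.
h_obs, h_exp, f = wahlund([0.2, 0.8], [0.5, 0.5])
print(f"observed het = {h_obs:.2f}, expected het = {h_exp:.2f}, F = {f:.2f}")
# observed het = 0.32, expected het = 0.50, F = 0.36
```

Here F equals the variance of the allele frequency across groups (0.09) divided by p̄(1 − p̄) (0.25); partial intermarriage between groups would weaken the effect rather than remove it.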




A web-based initiative to accelerate research on genetics and disease in African Americans. K. E. Barnholt1, A. K. Kiefer1, H. L. Gates, Jr.2, M. Nelson1, M. Mullins1, E. Baker3, J. Frank1, C. D. Bustamante4, T. W. Love5, R. A. Kittles6, N. Eriksson1, J. L. Mountain1 1) 23andMe, Inc., Mountain View, CA; 2) W.E.B. Du Bois Institute for African and African American Studies, Harvard University, Cambridge, MA; 3) 23andYou.com; 4) Department of Genetics, Stanford University School of Medicine, Stanford, CA; 5) Onyx Pharmaceuticals, Inc., South San Francisco, CA; 6) College of Medicine, University of Illinois at Chicago, Chicago, IL.

   Little is known about the connections between DNA and disease in African Americans, in part because most genetics research has involved only people of European ancestry. Greater understanding of such connections could improve diagnoses and lead to opportunities for more personalized health care. In 2011, 23andMe, Inc., a personal genomics and research company, launched the Roots into the Future initiative, which aims to enroll 10,000 African Americans in an innovative research project. The study seeks to determine whether genetic associations previously identified in Europeans are relevant to African Americans and to discover other genetic markers linked to conditions of particular relevance to the African American community. Currently the 23andMe cohort includes nearly 10,000 African Americans, over 5700 of whom were recruited through the Roots into the Future initiative. Each of these individuals (58% female, 42% male; mean age: 44) has submitted a saliva sample for genotyping on 23andMe’s custom genotyping array, which includes approximately 1 million single nucleotide polymorphisms. Participants are currently contributing information about their health and traits through online surveys. To date, over 6200 participants have completed an average of 10.6 surveys. Using the genetic data, we estimated the percentage of African and European ancestry of each participant. Median estimates were 73% and 23%, respectively (with 4% uncertain). As expected, the higher a person’s proportion of European ancestry, the greater the chance that the person carries variants that are more common among Europeans than among Africans, such as those linked to HIV resistance and alpha-1 antitrypsin deficiency. Furthermore, the higher a person’s proportion of African ancestry, the more likely that person was to report having curly hair, high blood pressure and type 2 diabetes, and the less likely to report having facial wrinkles, rosacea and Parkinson’s disease. 
Based on data for over 8700 individuals likely to self-identify as African American, we replicated over 25 genetic associations reported previously for African Americans, including those for body-mass index, type 2 diabetes, lupus, height, and osteoporosis. For conditions for which we have already accrued at least 500 cases among this cohort, such as asthma, migraines, and uterine fibroids, we anticipate having power either to replicate associations identified through previous studies of Europeans or to find new associations.
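The abstract does not say how the African and European ancestry percentages were estimated. As a hedged illustration only, a simple supervised two-population model treats each marker as independent and picks the admixture fraction q that maximizes the binomial likelihood of an individual's genotypes, given reference allele frequencies for each source population. The function names, the grid search and the example frequencies are assumptions; 23andMe's actual method may differ substantially (for example, by modelling linkage or local ancestry).

```python
import math


def admixture_loglik(q, genotypes, freq_afr, freq_eur, eps=1e-6):
    """Log-likelihood (up to a constant) of allele counts under a two-way admixture model.

    q: assumed fraction of African ancestry (0..1).
    genotypes: count (0, 1 or 2) of the reference allele at each independent marker.
    freq_afr, freq_eur: reference-allele frequencies in the two source populations.
    """
    ll = 0.0
    for g, pa, pe in zip(genotypes, freq_afr, freq_eur):
        p = q * pa + (1 - q) * pe          # expected allele frequency for this individual
        p = min(max(p, eps), 1 - eps)      # guard against log(0)
        ll += g * math.log(p) + (2 - g) * math.log(1 - p)
    return ll


def estimate_african_fraction(genotypes, freq_afr, freq_eur, steps=1000):
    """Grid-search maximum-likelihood estimate of q."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda q: admixture_loglik(q, genotypes, freq_afr, freq_eur))


# Hypothetical markers where the reference allele has frequency 0.9 in Africa and 0.1 in Europe.
fa, fe = [0.9] * 3, [0.1] * 3
print(estimate_african_fraction([1, 1, 1], fa, fe))  # heterozygous everywhere: about 0.5
```

A real analysis would use many thousands of ancestry-informative markers; with only a few markers the estimate is very noisy.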



Hidden heritability and risk prediction based on genome-wide association studies. N. Chatterjee1, B. Wheeler2, J. Sampson1, P. Hartge1, S. Chanock1, J. Park1 1) National Institutes of Health, Rockville, MD, USA; 2) Information management system, Rockville, MD.

   Known discoveries from genome-wide association studies have limited predictive ability for individual traits, but recent estimates of “hidden heritability” suggest that the performance of predictive models could potentially be enhanced in the future by incorporating a large number of SNPs, each with individually small effects. We use a novel theoretical model, discoveries from the largest genome-wide association studies and recent estimates of hidden heritability to project the predictive performance of polygenic models for ten complex traits as a function of the number and distribution of effect sizes for the underlying susceptibility SNPs, the sample size of the training dataset and the balance of true and false positives associated with the SNP selection criterion. We project, for example, that while 45% of the total variance of adult height has been attributed to common variants, a predictive model built from as many as one million people may explain only 33.4% of the variance of the trait in an independent sample. For rare, highly familial conditions, such as type 1 diabetes and Crohn’s disease, risk models including family history and optimal polygenic scores built from current GWAS can identify a large proportion (e.g., 80-90%) of cases by targeting a small group of high-risk individuals (e.g., subjects in the top 20% of risk). In contrast, for more common conditions with modest familial components, such as type 2 diabetes (T2D), coronary artery disease (CAD) and prostate cancer (PrCA), risk models built from GWAS with current or foreseeable sample sizes (e.g., tripled in size) can miss a large proportion (>50%) of cases when targeting a small group of high-risk individuals. For these common diseases, the proportion of the population that can be identified as having 2-fold or higher risk than an average person in the population ranged from 1.1% (CAD) to 7.0% (PrCA) for polygenic models built from current GWAS. 
If the sample size for future studies could be tripled, these proportions could range from 6.1% (CAD) to 18.8% (T2D). Our analyses suggest that the predictive utility of polygenic models depends not only on heritability, but also on achievable sample sizes, the effect-size distribution and information on other risk factors, including family history.
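The "top 20% of risk" and "2-fold or higher risk" quantities can be illustrated with a standard, simplified risk model; this is not necessarily the authors' theoretical model. The sketch assumes that the polygenic log-risk score s is normally distributed with mean 0 and standard deviation sigma in the population, that absolute risk is proportional to exp(s), and that the disease is rare; under these assumptions the score among cases is normal with mean sigma^2 and the same standard deviation. The function names and sigma values are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution from the standard library


def fraction_cases_in_top(sigma, top_fraction):
    """Share of cases whose score lies in the top `top_fraction` of the population.

    Assumes s ~ N(0, sigma^2) in the population, risk proportional to exp(s), and a rare
    disease, so that s ~ N(sigma^2, sigma^2) among cases.
    """
    z = STD_NORMAL.inv_cdf(1 - top_fraction)   # population threshold in SD units
    return 1 - STD_NORMAL.cdf(z - sigma)      # (sigma*z - sigma^2) / sigma = z - sigma


def fraction_population_above_rr(sigma, rr=2.0):
    """Share of the population whose risk is at least `rr` times the population average.

    The population average of exp(s) is exp(sigma^2 / 2), so relative risk >= rr
    means s >= ln(rr) + sigma^2 / 2.
    """
    if sigma <= 0:
        return 1.0 if rr <= 1 else 0.0        # everyone has exactly average risk
    threshold = math.log(rr) + sigma ** 2 / 2
    return 1 - STD_NORMAL.cdf(threshold / sigma)


for sigma in (0.5, 1.0, 1.5):
    print(sigma,
          round(fraction_cases_in_top(sigma, 0.20), 3),
          round(fraction_population_above_rr(sigma), 3))
```

In this framework a more discriminating score (larger sigma) concentrates more cases in the top-risk group, which is the qualitative pattern the abstract describes when contrasting highly familial conditions with T2D, CAD and PrCA.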



GWAS Identifies Biologically Relevant SNP Associations with Sexual Partnering Behavior. J. Gelernter1, 2, H. R. Kranzler3, R. Sherva4, R. Koesterer4, L. Almasy5, H. Zhao1, L. A. Farrer4 1) Yale University School of Medicine, New Haven, CT; 2) VA CT Healthcare Center, West Haven, CT; 3) University of Pennsylvania School of Medicine, Philadelphia, PA; 4) Boston University School of Medicine, Boston, MA; 5) Texas Biomedical Research Institute, San Antonio, TX.

   The specific factors influencing human sexual partnering are poorly understood. Arguably, in the pre-modern era, multiple mating may have been tied to selection for traits related to survival, including resistance to infection and starvation, strength, and certain behaviors. Recently, we completed a GWAS using the Illumina Omni-Quad microarray in ~5800 African- and European-American (AA and EA) participants in genetic studies of alcohol, cocaine, and opioid dependence. Subjects were interviewed using the Semi-Structured Assessment for Drug Dependence and Alcoholism (SSADDA), an instrument that covers all major DSM-IV diagnoses as well as numerous other psychiatric and lifestyle traits. One of these is the response to: “How many sexual partners have you had in your life?” Association of age-adjusted residuals of this variable with more than 3 million SNPs reliably imputed using the 1000 Genomes reference panel was tested in each sex*population subgroup using generalized estimating equations. Results from subgroup analyses were combined by meta-analysis. SNPs with p-values <1E-06 were genotyped in a replication sample of ~2300 subjects. 
Genome-wide significant results were obtained for 13 SNPs, including ones that map to genes coding for proteins involved in reproduction-related functions (e.g., rs74738626 in KCNU1, which encodes a testis-specific K+ channel [p=1.2E-12]; rs78227383 in NME5, a nucleoside diphosphate kinase that may have a specific function in the phosphotransfer network involved in spermatogenesis [p=4.0E-11 in EAs only]; and rs76221611 in CCND2, which encodes cyclin D2, shown to be highly expressed in ovarian and testicular tumors [p=3.3E-11 in AAs only]), immune response (e.g., rs2709778 in GARS, which encodes glycyl-tRNA synthetase, shown to be a target of autoantibodies in human autoimmune diseases [p=1.0E-10 in males only]), and other genes of biological interest (e.g., rs10849971 in ALDH2, an alcohol-metabolizing enzyme that is also an alcohol dependence risk locus [p=9.6E-09 in females only]). These findings have clear implications with respect to normal sexual function and potentially for risk of sexually transmitted disease.
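The abstract states only that subgroup results were combined by meta-analysis. A common choice is a fixed-effect, inverse-variance weighted meta-analysis, sketched below under that assumption; the function name and the per-subgroup effect sizes and standard errors in the example are hypothetical, and the authors may have used a different scheme (for example, sample-size weighted Z-scores).

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted fixed-effect meta-analysis.

    betas: per-subgroup effect estimates (e.g., one per sex*population stratum).
    ses: their standard errors.
    Returns (pooled beta, pooled SE, z, two-sided p).
    """
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value; erfc stays accurate for tiny p
    return beta, se, z, p


# Hypothetical estimates for four subgroups (EA men, EA women, AA men, AA women).
print(fixed_effect_meta([0.12, 0.09, 0.15, 0.10], [0.03, 0.03, 0.05, 0.04]))
```

Using `math.erfc` rather than `1 - cdf` avoids rounding the p-value to zero at the genome-wide significance levels reported above.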




Ubiquitous polygenicity of human complex traits: genome-wide analysis of 49 traits in Koreans. J. Yang1, T. Lee2, J. Kim3, S. Cho4, P. Visscher1,5, H. Kim2,3,4 1) University of Queensland Diamantina Institute, University of Queensland, Brisbane, Queensland, Australia; 2) Department of Agricultural Biotechnology, Seoul National University, Seoul, Korea; 3) Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Korea; 4) C&K Genomics, Seoul, Korea; 5) The Queensland Brain Institute, The University of Queensland, Brisbane, Queensland 4072, Australia.

   Recent studies in populations of European ancestry have shown that 30-50% of heritability for human complex traits such as height (Yang et al. 2010) and body mass index (Yang et al. 2011), and for common diseases such as schizophrenia (Lee et al. 2012) and rheumatoid arthritis (Stahl et al. 2012), can be captured by common SNPs, and that genetic variation can be attributed to chromosomes in proportion to their length. Using genome-wide estimation and partitioning approaches, we analyzed 49 human quantitative traits, many of which are relevant to human diseases, in 7,170 unrelated Korean individuals genotyped on 326,591 SNPs. For 43 of the 49 traits, we estimated a significant (P < 0.05) proportion of variance explained by all SNPs (h2G). On average across the 47 traits for which the estimate of h2G is non-zero, 13.4% (range 3.4% to 31.6%) of phenotypic variance can be explained by all the SNPs analysed, or approximately one-third (range 7.8% to 76.8%) of narrow-sense heritability. In contrast, on average across 25 of the 49 traits, the top associated SNPs at the genome-wide significance level (P < 5e-8) explain 1.5% (range 0.5% to 3.8%) of phenotypic variance. The majority (~92%) of the variation explained by all SNPs is captured by SNPs with p-values < 0.031 in single-SNP association analyses. Longer genomic segments tend to explain more phenotypic variation, with a correlation of 0.78 between the estimated variance explained by individual chromosomes and their physical length. This correlation was stronger (0.81) for intergenic regions. Although most traits have a few SNPs with large effects, these results suggest that polygenicity is ubiquitous for most human complex traits, and that a substantial proportion of heritability is captured by common SNPs.
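Estimates of variance explained by all SNPs (h2G) of this kind are usually built on a genomic relationship matrix (GRM) computed from the SNP genotypes, as in the GCTA software. The sketch below applies the off-diagonal GRM formula described by Yang et al. (2011) to every entry; GCTA uses a slightly different estimator for the diagonal, and real data would need quality control and far more SNPs. The function name and toy genotypes are hypothetical.

```python
def grm(genotypes):
    """Genomic relationship matrix from allele counts.

    genotypes: one list per individual of allele counts (0, 1, 2) at the same M SNPs.
    Entry (j, k) is the average over SNPs of
        (x_ij - 2 p_i)(x_ik - 2 p_i) / (2 p_i (1 - p_i)),
    where p_i is the sample frequency of the counted allele at SNP i.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    a = [[0.0] * n for _ in range(n)]
    used = 0
    for i in range(m):
        column = [g[i] for g in genotypes]
        p = sum(column) / (2 * n)
        if p in (0.0, 1.0):
            continue  # monomorphic SNPs carry no information and would divide by zero
        scale = 2 * p * (1 - p)
        centered = [x - 2 * p for x in column]
        for j in range(n):
            for k in range(n):
                a[j][k] += centered[j] * centered[k] / scale
        used += 1
    if used == 0:
        raise ValueError("no polymorphic SNPs")
    return [[v / used for v in row] for row in a]


# Hypothetical toy data: 3 individuals x 4 SNPs.
toy = [[0, 1, 2, 1], [0, 1, 2, 0], [2, 1, 0, 1]]
for row in grm(toy):
    print([round(v, 2) for v in row])
```

Restricting the SNPs passed to `grm` to one chromosome (or to genic versus intergenic regions) gives the per-partition matrices that underlie the chromosome-length comparison above.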


What is the total SNP-associated heritability for alcohol dependence? N. G. Martin1, G. Zhu1, P. A. Lind1, A. C. Heath2, P. A. F. Madden2, M. L. Pergadia2, G. W. Montgomery1, J. B. Whitfield1 1) Genetic Epidemiology, Queensland Institute of Medical Research, Brisbane, Australia; 2) Department of Psychiatry, Washington University School of Medicine, St Louis, MO, USA.

   Background. Much has been written about the so-called “missing heritability” for complex traits. Nowhere is this more pertinent than for alcohol and nicotine dependence (AD, ND), for which twin studies give heritability estimates of up to 65%, yet few causal variants from GWAS have been replicated despite large sample sizes, suggesting that individual SNP effect sizes must be very small. Recently, new statistical genetic techniques have been developed that allow estimation of the total variance associated with all SNPs on a GWAS chip, but these have yet to be applied to AD and ND. Methods. The current analysis is based on AD and ND symptom count data from over 8000 participants in our population-based twin-family studies who have used either alcohol or cigarettes at some stage of their lives. They were individually genotyped with Illumina 370K or 660K chips, and 7.034M genotypes were imputed from HapMap 3 and 1000 Genomes data. The GCTA program of Yang, Visscher et al. is first used to detect the degree of relatedness between apparently unrelated subjects, based on a set of about 300,000 SNPs pruned for LD. Phenotypic similarity is then regressed on IBS sharing for all possible relative pairs to estimate the total amount of variance due to SNPs on the chip. Results. Based on GCTA analyses of other complex traits, we expect to find SNP-associated variance accounting for about half the heritability estimated from conventional genetic epidemiology designs. However, these estimates are highly sensitive to population stratification, so great care will be taken to remove all traces of population stratification during the analysis. Conclusions. The gap between the SNP-associated variance estimated by GCTA and twin and family estimates of heritability is most likely due to several factors. 
First, the tag SNPs on the chip are not in perfect LD with the causal SNPs; for other traits, simulation has shown that correcting for imperfect LD raises the SNP “heritability” by about 10%. Another major factor is that commercial chips interrogate only common SNPs, so large effects of rare SNPs are simply not captured. Reasonable estimates from simulations suggest that this could account for another 20% of variance. Finally, we recognize that large sections of the genome contain highly repetitive DNA that is very poorly tagged by current chips, and substantial proportions of genetic variance may be hidden there.
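The Methods describe regressing phenotypic similarity on genotype-based relatedness across pairs of individuals. One simple version of that idea is Haseman-Elston (HE) cross-product regression: standardize the phenotype, then regress the product of two individuals' values on their genomic relatedness; the slope estimates the SNP-associated share of variance. GCTA uses REML by default, so this is a swapped-in, simpler estimator for illustration; the function names, relatedness matrix and phenotypes are hypothetical.

```python
from statistics import mean, pstdev


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def he_regression(phenotypes, relatedness):
    """Haseman-Elston cross-product regression estimate of SNP heritability.

    phenotypes: one value per individual (e.g., symptom counts after covariate adjustment).
    relatedness: N x N genomic relationship matrix.
    """
    mu, sd = mean(phenotypes), pstdev(phenotypes)
    z = [(v - mu) / sd for v in phenotypes]   # standardize to mean 0, variance 1
    n = len(z)
    xs, ys = [], []
    for j in range(n):
        for k in range(j + 1, n):             # each unordered pair once
            xs.append(relatedness[j][k])
            ys.append(z[j] * z[k])            # phenotypic similarity of the pair
    return ols_slope(xs, ys)


# Hypothetical 3-person example; with so few pairs the estimate is not meaningful.
A = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(he_regression([1.0, 1.0, -2.0], A))
```

Because close relatives share environments as well as SNPs, analyses of this kind usually drop pairs above a relatedness cutoff, which is one reason GCTA is first used to detect relatedness among apparently unrelated subjects.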




Vascular Stiffness in a Healthy High Risk African American Population is Modified by the Extent of European Admixture. D. Vaidya, R. A. Mathias, L. R. Yanek, L. C. Becker, D. M. Becker Medicine, Johns Hopkins University, Baltimore, MD.

   Background: Compared to European Americans (EA), African Americans (AA) have stiffer peripheral vessels, reflected in a reduced carotid distensibility coefficient (DC). To determine whether this racial difference may be genetically determined, we examined the extent to which the variance in carotid distensibility in AA could be explained by EA admixture at either the global or the local genomic level. Methods: We examined data from 344 AA, 62% women, aged 25-76 years, enrolled in a large study (GeneSTAR) of apparently healthy people with a family history of early-onset coronary artery disease. DC was assessed using 2D ultrasound and calculated as 2*(fractional change in diameter from diastole to systole)/(systolic - diastolic blood pressure). By its calculation, DC is inherently corrected for blood pressure levels. EA admixture was determined using a panel of 50,000 ancestry-informative markers (deCODE Genetics), and local ancestry was calculated from the Illumina Human 1M genome-wide SNP panel using LAMP. Associations of log-transformed DC were tested using mixed-model regressions adjusted for age, sex, sex*age interaction and within-family correlations. LAMP models were adjusted for population stratification PCs derived from the Illumina 1M SNPs (EIGENSTRAT). Results: The median [interquartile range] DC was 0.0017 [0.0012-0.0024] mmHg^-1. Every 10% increment in EA admixture was associated with 5% higher DC (95% CI: 1% to 9%, p=0.005), reflecting more distensibility and less stiffness. In a genome-wide local ancestry analysis adjusted for sex, age, sex*age interaction, population stratification PCs and within-family correlations, of 2756 genome segments in local ancestry LD, the strongest association for local ancestry was found on chromosome 8, positions 8.3M to 10M (Build 37.3), p=0.0012. After adjusting for local ancestry in this region, population stratification PC1, representing global Caucasian ancestry, was no longer significantly associated with DC (p=0.93). 
Conclusions: The racial difference in arterial distensibility between AA and EA is likely to have a basis in genetic admixture. We have identified a candidate region on chromosome 8 that may be responsible for this global admixture association.
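The DC definition in the Methods can be written out directly. The sketch assumes the fractional change in diameter is taken relative to the diastolic diameter; the factor of 2 converts relative diameter change into the approximate relative change in lumen cross-sectional area, since area scales with diameter squared. A second helper converts a slope from a regression on log-transformed DC into a percent change, which is how a statement like "5% higher DC per 10% EA admixture" arises. Function names and input values are hypothetical.

```python
import math


def distensibility_coefficient(d_diastole_mm, d_systole_mm, sbp_mmhg, dbp_mmhg):
    """Carotid distensibility coefficient in mmHg^-1.

    DC = 2 * (fractional change in diameter from diastole to systole) / (SBP - DBP),
    with the fractional change assumed to be relative to the diastolic diameter.
    """
    fractional_change = (d_systole_mm - d_diastole_mm) / d_diastole_mm
    return 2 * fractional_change / (sbp_mmhg - dbp_mmhg)


def percent_change(beta_log, increment):
    """Percent change in DC implied by a log(DC) regression slope for a given increment."""
    return 100 * (math.exp(beta_log * increment) - 1)


# Hypothetical measurements: 6.0 mm -> 6.3 mm at 120/80 mmHg.
print(distensibility_coefficient(6.0, 6.3, 120, 80))   # approximately 0.0025
# A slope of log(1.05)/10 per percentage point of admixture corresponds to +5% per 10 points.
print(percent_change(math.log(1.05) / 10, 10))          # approximately 5.0
```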


A population isolate reveals enriched recessive deleterious variants underlying neurodevelopmental traits. O. Pietilainen1,2,3, J. Suvisaari5, W. Hennah2, V. Leppa2, T. Paunio2,3,4, M. Torniainen5, S. Ripatti1,2, S. Ala-Mello6, K. Rehnstrom1, A. Tuulio-Henriksson5, T. Varilo2, J. Tallila1, K. Kristiansson2, M. Isohanni7, J. Kaprio2, J. Eriksson8, M. Jarvelin9, R. Durbin1, J. Lonnqvist4,5, M. Hurles2, H. Stefansson10, N. Freimer11, M. Daly12, A. Palotie1,2,12 1) The Wellcome Trust Sanger Institute, Cambridge, Cambridge, United Kingdom; 2) Institute for Molecular Medicine Finland FIMM, Helsinki, Finland; 3) National Institute for Health and Welfare, Public Health Genomics Unit, Helsinki, Finland; 4) University of Helsinki and Helsinki University Central Hospital, Department of Psychiatry, Helsinki, Finland; 5) National Institute for Health and Welfare, Department of Mental Health and Substance Abuse Services, Helsinki, Finland; 6) Helsinki University Central Hospital, Department of Clinical Genetics, Helsinki, Finland; 7) Department of Psychiatry, Institute of Clinical Medicine, University of Oulu, Finland; 8) National Institute for Health and Welfare, Chronic Disease Epidemiology and Prevention, Helsinki, 90014, Finland; 9) Institute of Health Sciences, University of Oulu, Oulu, Finland; 10) deCODE genetics, 101 Reykjavik, Iceland; 11) Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, UCLA, Los Angeles, California, USA; 12) The Broad Institute of MIT and Harvard University, Cambridge, Ma, USA.

   Low-frequency variants (MAF <5%) likely contribute to susceptibility to complex traits, but their study is challenging in admixed populations. We hypothesize that population isolates that have experienced bottlenecks would be enriched for specific low-frequency variants, some of which could predispose to complex traits. This enrichment could especially benefit the identification of variants with recessive effects. To test this hypothesis, we studied homozygous deletions in a prospective birth cohort from an isolated Northern Finnish population (N=4,931). Because the role of rare deletions in abnormal neuronal development is clearly established, we constrained our initial analysis to seven presumably relevant phenotypes: schizophrenia, intellectual deficit, learning difficulties, epilepsy, neonatal convulsion, impaired hearing and cerebral palsy/perinatal brain damage. The analysis included 32,487 homozygous deletions in 205 loci, of which 11% included exons of one or more genes. Among the seven traits studied, the strongest association was found between impaired hearing and a deletion on 15q15.3 overlapping STRC, previously associated with deafness (p = 1e-4). The largest homozygous deletion identified was 240 kb on 22q11.22 and was associated with intellectual deficit (p<0.02). The deletion showed significant regional enrichment in an internal north-eastern isolate with a 3-fold risk of schizophrenia compared to elsewhere in the country. Follow-up of the deletion in 265 schizophrenia patients and 5140 controls revealed an allelic association with schizophrenia (p = 0.02, OR = 1.9), which was further replicated in 9,539 cases and 15,677 controls of European origin (p = 0.03, OR = 2.1). After screening 13,106 Finns, we identified four individuals homozygous for the deletion, all diagnosed with schizophrenia and/or intellectual disability. 
The deletion overlaps the gene encoding TOP3B and was found to downregulate its expression by half in heterozygous carriers and to zero in homozygous carriers (p < 1e-10). Our results demonstrate the effect of multiple consecutive population bottlenecks in enriching sizable deletions that contribute to abnormal neuronal development. In addition, the findings highlight the usefulness of population isolates for studying rare and low-frequency variants in complex traits.
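The reasoning that bottleneck-driven enrichment especially helps with recessive effects follows from Hardy-Weinberg proportions: carrier frequency scales roughly linearly with allele frequency, but homozygote frequency scales with its square. The sketch below assumes random mating within each population; the function names and the allele frequencies are hypothetical.

```python
def hwe_genotype_freqs(q):
    """Hardy-Weinberg genotype frequencies for a deletion allele at frequency q.

    Returns (non-carriers, heterozygous carriers, homozygous carriers).
    """
    return (1 - q) ** 2, 2 * q * (1 - q), q ** 2


def enrichment(q_outbred, q_isolate):
    """Fold increase in heterozygotes and homozygotes when q rises from q_outbred to q_isolate."""
    _, het_o, hom_o = hwe_genotype_freqs(q_outbred)
    _, het_i, hom_i = hwe_genotype_freqs(q_isolate)
    return het_i / het_o, hom_i / hom_o


het_fold, hom_fold = enrichment(0.01, 0.03)   # hypothetical 1% vs 3% allele frequency
print(f"carriers x{het_fold:.2f}, homozygotes x{hom_fold:.2f}")
# carriers x2.94, homozygotes x9.00
```

A tripling of allele frequency thus roughly triples carriers but gives about nine times as many homozygotes, so a cohort of a given size contains many more individuals informative for recessive effects.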

Identifying age- and sex-associated gene expression profiles in >7,000 whole-blood samples. M. J. Peters1,2,17, R. Joehanes3,17, T. Esko4,17, K. Heim5,17, H. Völzke6,17, L. Pilling7,17, J. Brody8,17, Y. F. Ramos9,17, B. E. Stranger10,11, M. W. Christiansen8, S. Gharib8, R. Hanson12, A. Hofman2,13, J. Kettunen14, D. Levy3, P. Munson3, C. O’Donnell3, B. Psaty8, F. Rivadeneira1,2,13, A. Suchy-Dicey8, A. G. Uitterlinden1,2,13, H. Westra15, I. Meulenbelt2,9,17, D. Enquobahrie8,17, T. Frayling7,17, A. Teumer16,17, H. Prokisch5,17, A. Metspalu4,17, J. B. J. Van Meurs1,2,17, A. D. Johnson3,17 on behalf of the CHARGE Gene Expression Working Group. 1) Department of Internal Medicine, Erasmus Medical Centre, Rotterdam, the Netherlands; 2) Netherlands Genomics Initiative-Sponsored by the Netherlands Consortium for Healthy Aging, Rotterdam and Leiden, the Netherlands; 3) Framingham Heart Study, National Heart, Lung and Blood Institute, Framingham, USA; 4) Estonian Genome Center and Institute of Molecular and Cell Biology of University of Tartu, Estonia; 5) Institute of Human Genetics, Technische Universität München, Munich, Germany; 6) Institute for Community Medicine, University Medicine Greifswald, Germany; 7) Epidemiology and Public Health, Peninsula College of Medicine and Dentistry, University of Exeter, UK; 8) Cardiovascular Health Research Unit, Departments of Medicine and Epidemiology, University of Washington, Seattle, WA, United States; 9) Department of Molecular Epidemiology, Leiden University Medical Center, Leiden, the Netherlands; 10) Division of Genetics, Department of Medicine, Brigham and Women’s Hospital, Boston, MA, USA; 11) Broad Institute of Harvard and MIT, Cambridge, USA; 12) Phoenix Epidemiology and Clinical Research Branch, NIDDK, National Institutes of Health, Phoenix, AZ, USA; 13) Department of Epidemiology, Erasmus Medical Centre, Rotterdam, the Netherlands; 14) Institute for Molecular Medicine Finland FIMM, University of Helsinki, Finland; 15) Department of Genetics, University of Groningen, University Medical Center Groningen, the Netherlands; 16) Interfaculty Institute for Genetics and Functional Genomics, Ernst-Moritz-Arndt University Greifswald, Germany; 17) Contributed Equally.

   Genome-wide expression profiles (GWEPs) have been assayed in a growing number of cohort studies, but few attempts have been made to meta-analyse and cross-validate expression datasets. Consequently, many expression studies have been underpowered. We therefore established a large-scale multi-cohort GWEP meta-analysis. The aim of this study was to robustly identify novel gene expression signatures associated with age and sex, two major risk factors for many diseases. We analyzed 6,993 European-ancestry PAXgene (whole-blood) samples from 6 cohort studies (RS, FHS, EGCUT, KORA, SHIP, INCHIANTI). GWEPs were quantile-normalized, log2-transformed, probe-centered and sample-z-transformed prior to analysis. In the discovery stage, we meta-analysed age- and sex-associated signals separately for samples hybridized to Illumina and to Affymetrix arrays. All analyses were adjusted for plate ID, RNA quality, fasting and smoking status, and cell counts (when available). The age analysis was additionally adjusted for sex. All significant signals were cross-validated between the Illumina and Affymetrix platforms. We examined the top-associated GWEPs in 3 additional studies: HVH (n=348), GARP (n=134), and NIDDK/PIMA (n=1457). We identified 396 age-associated transcripts with p<1E-5 and the same direction of effect on both platforms. NELL2, a protein kinase C-binding protein, was the most significant result, with gene expression levels decreasing with age (Illumina p=8.2E-81, Affymetrix p=3.2E-64). NELL2 is involved in cell growth regulation and differentiation, and there is evidence for developmental fluctuation in puberty. We identified 347 transcripts differentially expressed between males and females (p<1E-5, same direction on both platforms), of which >200 map to sex chromosomes. The top autosomal gender-differentiated transcript is DACT1, which has higher mRNA levels in females (Illumina p=2.4E-47, Affymetrix p=1.6E-75). 
DACT1 is an antagonist of beta-catenin, and prior work indicates that it is differentially methylated in testes. It is a biomarker for semen, and DACT1 knockout mice showed developmental defects. Both the NELL2 and the DACT1 signals were replicated in all 3 additional cohorts. With the GWEP meta-analysis, we gained power relative to individual cohort analyses and were able to identify novel, replicable, significant age- and sex-associated loci. These loci may have implications for age-related disease biology, gender biology, and sample forensics.
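The preprocessing sequence in the Methods (quantile normalization, log2 transform, probe centering, per-sample z-transformation) can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the consortium's pipeline: rows are probes and columns are samples, intensities are positive, ties in quantile normalization are broken by input order, and the sample z-transform uses the population standard deviation. The function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def quantile_normalize(matrix):
    """Give every sample (column) the same distribution: the mean of the per-rank values."""
    n_probes, n_samples = len(matrix), len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_samples)]
    # For each sample, probe indices ordered from lowest to highest value.
    orders = [sorted(range(n_probes), key=lambda i, c=col: c[i]) for col in columns]
    rank_means = [mean(columns[j][orders[j][r]] for j in range(n_samples))
                  for r in range(n_probes)]
    out = [[0.0] * n_samples for _ in range(n_probes)]
    for j in range(n_samples):
        for r, i in enumerate(orders[j]):
            out[i][j] = rank_means[r]
    return out


def preprocess(matrix):
    """Quantile-normalize, log2-transform, probe-center, then z-transform each sample."""
    logged = [[math.log2(v) for v in row] for row in quantile_normalize(matrix)]
    centered = []
    for row in logged:
        row_mean = mean(row)
        centered.append([v - row_mean for v in row])   # probe-centering (per row)
    result = [row[:] for row in centered]
    for j in range(len(centered[0])):                  # sample z-transform (per column)
        col = [row[j] for row in centered]
        mu, sd = mean(col), pstdev(col)
        for i in range(len(result)):
            result[i][j] = (centered[i][j] - mu) / sd
    return result


# Hypothetical 4 probes x 3 samples.
data = [[5.0, 4.0, 3.0], [2.0, 1.0, 4.0], [3.0, 4.0, 6.0], [4.0, 2.0, 8.0]]
for row in preprocess(data):
    print([round(v, 2) for v in row])
```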


Genetic variants in pigmentation genes, skin color, and risk of skin cancer in Japanese. T. Suzuki1, Y. Abe1, J. Yoshizawa1, Y. Hozumi1, T. Nakamura2, G. Tamiya2 1) Dept Dermatology, Yamagata Univ Sch Med, Yamagata, Japan; 2) Advanced Molecular Epidemiology Research Institute, Yamagata Univ Sch Med, Yamagata, Japan.

   Melanin pigmentation plays an important role in shielding the body from ultraviolet (UV) radiation and may serve as a scavenger for reactive oxygen species. More than 150 genes have been implicated in determining pigmentation in mice; they include transcription factors, membrane and structural proteins, enzymes, and several kinds of receptors and their ligands, most of which have human orthologues. Although many molecular mechanisms involved in melanin pigmentation are being elucidated, relatively little is understood about the genetic component responsible for variation in skin color within or between human populations. First, to reveal their genetic contribution to skin color, we examined the association between variants in pigmentation-related genes and variation in the melanin index in members of the general Japanese population whose skin color was objectively measured by reflectometry. Multiple regression showed that OCA2 A481T rs74653330 (p = 6.18e-8) and OCA2 H615R rs1800414 (p = 5.72e-6) were strongly associated with the mean melanin index in the female population. Three variants (SLC45A2 T500P rs11568737 p = 0.048, OCA2 T387M p = 0.015, TYR D125Y rs13312741 p = 0.022) were also significantly associated with melanin index. However, no significant associations were found between age and melanin index for variants of MC1R. Second, we evaluated the associations between variants in pigmentation-related genes and the risk of skin cancer. The statistical analysis revealed that only OCA2 H615R was associated with the risk of all skin cancers, especially malignant melanoma. We could not find any statistically significant association of other variants, including OCA2 A481T, or of melanin index with the risk of skin cancer. This is the first report on the association between genetic variants in pigmentation genes and the risk of skin cancer in an East Asian population.
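The multiple regression of melanin index on several variants can be illustrated with ordinary least squares on additively coded genotypes (0, 1 or 2 copies of the derived allele). The sketch solves the normal equations by Gauss-Jordan elimination so it needs only the standard library; the additive coding, the function name and the synthetic data are assumptions, and the authors' model may have included other covariates.

```python
def ols(predictors, outcome):
    """Least-squares fit of outcome = b0 + b1*x1 + ... via the normal equations.

    predictors: one row per person, e.g. [dosage_variant1, dosage_variant2].
    outcome: one value per person, e.g. melanin index.
    Returns [b0, b1, ...].
    """
    rows = [[1.0] + [float(x) for x in r] for r in predictors]   # prepend intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, outcome)) for i in range(k)]
    aug = [xtx[i] + [xty[i]] for i in range(k)]                   # augmented matrix [X'X | X'y]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))  # partial pivoting
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


# Synthetic check: outcome built from known coefficients 1, 2 and 3.
X = [[0, 0], [1, 0], [0, 1], [2, 1], [1, 2], [2, 2]]
y = [1 + 2 * a + 3 * b for a, b in X]
print(ols(X, y))  # approximately [1.0, 2.0, 3.0]
```

Fitting both OCA2 variants jointly, rather than one at a time, estimates each variant's effect while holding the other fixed, which matters when nearby variants are in linkage disequilibrium.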

You may contact the first author (during and after the meeting) at tamsuz@med.id.yamagata-u.ac.jp
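To make the multiple-regression step concrete, here is a minimal sketch of fitting a melanin index on variant allele counts by ordinary least squares. It assumes additive 0/1/2 coding of each variant and no covariates; the abstract does not state its exact model, covariates (such as age), or genotype coding, and the genotypes and phenotypes below are toy values, not study data.

```python
import numpy as np


def additive_regression(genotypes, phenotype):
    """Least-squares fit of phenotype ~ intercept + allele counts.

    genotypes: (n_individuals, n_variants) array of 0/1/2 allele counts.
    phenotype: (n_individuals,) array, e.g. a reflectometry melanin index.
    Returns [intercept, beta_1, ..., beta_k]; each beta is the change in
    phenotype per additional copy of the counted allele, holding the
    other variants fixed.
    """
    g = np.asarray(genotypes, dtype=float)
    y = np.asarray(phenotype, dtype=float)
    design = np.column_stack([np.ones(len(y)), g])  # intercept column first
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


# Toy example: two hypothetical variants and a noise-free phenotype.
toy_genotypes = np.array([[0, 0], [1, 0], [2, 1], [0, 2], [1, 1], [2, 2]])
toy_index = 50 + 2.0 * toy_genotypes[:, 0] - 0.5 * toy_genotypes[:, 1]
print(additive_regression(toy_genotypes, toy_index))  # approximately [50, 2, -0.5]
```

Reporting p-values like those above would additionally require standard errors derived from the residual variance, which a statistics package would normally provide.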


Molecular phylogeny of an autosomal region under natural selection. V. A. Canfield1, A. Berg1, S. Peckins1, S. Oppenheimer2, K. C. Cheng1 1) Penn State College of Medicine, Hershey, PA; 2) Oxford University, Oxford, UK.

   The derived (A111T) variant of SLC24A5 is associated with lighter skin pigmentation compared to the ancestral allele. A111T is fixed or nearly fixed in most European, North African and Middle Eastern populations, extending east to Pakistan. In Europeans, a large genomic region of diminished variation on chromosome 15, nearly 150 kb in extent, includes SLC24A5. We analyzed the haplotypes in this region using existing genomic data. Eleven haplotypes, defined on the basis of 16 SNPs that span a 76 kb genomic region in which recombination was rare, account for 95% of the total. A single haplotype (here called C11) carries A111T, suggesting that its origin did not long predate the onset of selection. Haplotype C11 was the product of recombination between haplotypes C3 and C10, followed by the A111T mutation. C3 and C10 are both present in East Asia and the New World but virtually absent in Africa, suggesting that C11 originated outside of Africa, most likely in the Middle East. The current distribution of A111T is consistent with the view that it originated after the divergence between populations that settled Europe and those that settled East Asia.

You may contact the first author (during and after the meeting) at vac3@psu.edu
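The haplotype summary above (a handful of haplotypes covering 95% of chromosomes, and C11 arising by recombination between C3 and C10) can be illustrated with two small helpers. This is a sketch under stated assumptions: haplotypes are written as strings of alleles at the defining SNPs, and the toy haplotypes and crossover position are hypothetical, since the real 16-SNP haplotypes are not given in the abstract.

```python
from collections import Counter


def haplotypes_covering(haplotypes, fraction=0.95):
    """Most common haplotypes, in order, until their combined frequency reaches `fraction`."""
    counts = Counter(haplotypes)
    total = sum(counts.values())
    chosen, cumulative = [], 0
    for hap, n in counts.most_common():
        chosen.append(hap)
        cumulative += n
        if cumulative >= fraction * total:
            break
    return chosen


def single_crossover(left_parent, right_parent, breakpoint):
    """Recombinant haplotype: alleles before `breakpoint` from left_parent, the rest from right_parent."""
    if len(left_parent) != len(right_parent):
        raise ValueError("parental haplotypes must cover the same SNPs")
    return left_parent[:breakpoint] + right_parent[breakpoint:]


# Hypothetical 6-SNP haplotypes (not the real C3/C10 sequences).
sample = ["AACGTT"] * 10 + ["AACGTA"] * 5 + ["GTCGTT"] * 3 + ["GTTAAA"] * 2
print(haplotypes_covering(sample, 0.8))  # ['AACGTT', 'AACGTA', 'GTCGTT']
print(single_crossover("GTCGTT", "AATAAA", 3))  # GTCAAA
```

In the abstract's scenario, a derived mutation (A111T) arising on the recombinant haplotype after the crossover would make that haplotype, and only that one, carry the variant.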



Sharing by descent, phasing, rare variants and population structure. A. Kong deCODE Genet., Reykjavik, Iceland.

Session Descriptions:
Identity by descent (IBD) is fundamental to genetics and has diverse applications. Recently developed statistical methods and genome-wide SNP data have made it possible to detect haplotypes shared identically by descent between individuals with common ancestry up to 25-50 generations ago. With sequence data, shared haplotypes from even more distant ancestry can be detected. Patterns of IBD segment sharing within and between populations reveal important demographic features, including recent effective population size and migration patterns. IBD segment sharing is directly relevant to disease gene mapping and estimation of heritability. Individuals who share a genetic basis for a trait are more likely to share haplotypes IBD than randomly chosen individuals, and this forms the basis for IBD mapping and heritability estimation. Analysis of data from extended pedigrees was extremely difficult with standard linkage approaches, but is now possible using approaches based on detected IBD segments. IBD can also be detected across pedigrees, which enhances power to detect association with the trait. Further, in population samples, detected IBD segments could be used to improve power to detect association when multiple variants within a gene influence the trait. IBD segments can also be used to greatly improve haplotype phase estimates, which is critical to understanding the functional consequences of genetic variation. IBD-based long-range phasing has previously been shown to be effective in isolated populations such as Iceland, but recent advances have extended its application to large outbred populations. In this session, we explore these exciting new developments.
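Why older shared ancestry means shorter segments, and why denser sequence data help, follows from a standard approximation. Two individuals whose common ancestor lived g generations ago are separated by 2g meioses, so, ignoring crossover interference and detection error, a shared segment's length is roughly exponential with mean 100/(2g) cM. The sketch below only evaluates that textbook approximation; it is not a method from the session.

```python
import math


def mean_ibd_length_cm(generations):
    """Approximate mean IBD segment length (cM) for a common ancestor `generations` back."""
    meioses = 2 * generations  # one line of descent down to each of the two individuals
    return 100.0 / meioses


def prob_segment_at_least(length_cm, generations):
    """Approximate P(segment length >= length_cm) under the exponential model."""
    meioses = 2 * generations
    return math.exp(-meioses * length_cm / 100.0)


for g in (5, 25, 50):
    print(g, mean_ibd_length_cm(g), round(prob_segment_at_least(2.0, g), 3))
```

For ancestors 25 and 50 generations back the mean lengths are 2 and 1 cM, so only a minority of such segments exceed a 2 cM threshold; detecting them reliably needs dense markers, which is why sequence data extend the reach.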

ASHG 2012 abstracts (2): physical traits


Chromosome X revisited - Variants in Xq21.1 associate with adult stature in a meta-analysis of 14,700 Finns. T. Tukiainen1, J. Kettunen1,2, A.-P. Sarin1,2, J. G. Eriksson3,4,5,6,7, A. Jula8, V. Salomaa3, O. T. Raitakari9,10, M.-R. Järvelin11,12, S. Ripatti1,2,13 1) Institute for Molecular Medicine Finland FIMM, University of Helsinki, Finland; 2) Unit of Public Health Genomics, National Institute for Health and Welfare, Helsinki, Finland; 3) Department of Chronic Disease Prevention, National Institute for Health and Welfare, Finland; 4) Department of General Practice and Primary Healthcare, University of Helsinki, Finland; 5) Unit of General Practice, Helsinki University Central Hospital, Finland; 6) Folkhälsan Research Center, Helsinki, Finland; 7) Vaasa Central Hospital, Vaasa, Finland; 8) Population Studies Unit, Department of Chronic Disease Prevention, National Institute for Health and Welfare, Turku, Finland; 9) Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku, Finland; 10) Department of Clinical Physiology, Turku University Hospital, Finland; 11) Department of Epidemiology and Biostatistics, Faculty of Medicine, Imperial College London, United Kingdom; 12) Institute of Health Sciences, Biocenter Oulu, University of Oulu, Finland; 13) Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

   Genome-wide association studies (GWAS) provide a powerful tool to assess associations between common marker alleles and complex traits in large numbers of individuals. Typically these studies have tested markers on the 22 autosomes, while the X chromosome has been omitted from the analyses. Chromosome X, however, constitutes approximately 5% of genomic DNA and encodes more than 1000 genes, and thus likely also contains genetic variation contributing to common traits and disorders.
   We set out to test associations between 560,000 genotyped and imputed SNP markers and eight anthropometric (BMI, stature, WHR) and biochemical (CRP, HDL, LDL, TC, TG) traits in 14,710 individuals (7468 males, 7242 females) from five Finnish cohorts.
   A region on chromosome Xq21.1 was associated with adult stature (meta-analysis p-value = 3.32×10^-10). The lead SNP in the locus explained up to 0.55% of the variance in height in 31-year-old women, corresponding to a 1.09 cm difference between minor and major allele homozygotes. The lead variant (MAF = 0.31) is located upstream of ITM2A, a gene encoding a membrane protein that plays a role in osteogenic and chondrogenic differentiation. As this is among the first studies using the X chromosome reference haplotypes from the 1000 Genomes Project, we are currently validating the imputation by direct genotyping.
   The findings highlight the value of including chromosome X in GWAS of complex traits to identify further relevant gene regions that also account for some of the missing heritability. The study illustrates that the 1000 Genomes reference haplotypes allow high-resolution investigation of genetic variants on chromosome X even with a relatively modest sample size compared to current-day GWAS meta-analyses. Our finding suggests that the same analysis strategy is also likely to be useful in meta-analyses of complex traits by large consortia.
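The link between the reported variance explained and the 1.09 cm homozygote difference follows from the usual additive-model formula: a biallelic SNP with minor allele frequency p and per-allele effect β explains 2p(1-p)β² of the phenotypic variance under Hardy-Weinberg equilibrium, and the two homozygotes differ by 2β. The sketch below encodes that formula plus one common convention for coding X-chromosome genotypes in hemizygous males (scored 0 or 2, i.e. assuming full dosage compensation). The abstract does not say which coding the authors used, so the `x_dosage` convention is an assumption.

```python
def variance_explained(maf, beta, phenotypic_variance):
    """Fraction of phenotypic variance explained by one additive biallelic SNP (HWE assumed)."""
    return 2 * maf * (1 - maf) * beta ** 2 / phenotypic_variance


def homozygote_difference(beta):
    """Mean phenotype difference between the two homozygotes under an additive model."""
    return 2 * beta


def x_dosage(allele_count, is_male):
    """Dosage for a non-pseudoautosomal X SNP.

    Assumption: males carry 0 or 1 copy and are scored 0/2 (full dosage
    compensation); females are scored 0/1/2 as on the autosomes.
    """
    if is_male:
        if allele_count not in (0, 1):
            raise ValueError("males carry 0 or 1 copy of a non-PAR X allele")
        return 2 * allele_count
    if allele_count not in (0, 1, 2):
        raise ValueError("females carry 0, 1 or 2 copies")
    return allele_count


# Heterozygosity term 2pq for the lead variant's MAF of 0.31.
print(round(2 * 0.31 * 0.69, 4))  # prints 0.4278
```

Given a cohort's height variance (not stated in the abstract), `variance_explained(0.31, 1.09 / 2, var_height)` would give the implied fraction for a per-allele effect of half the homozygote difference.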



Dissection of polygenic variation for human height into individual variants, specific loci and biological pathways from a GWAS meta-analysis of 250,000 individuals. T. Esko1, A. R. Wood2, S. Vedantam3,4,5, J. Yang6, S. Gustaffsson7, S. I. Berndt8, J. Karjalainen9, H. M. Kang10, A. E. Locke11, A. Scherag12, D. C. Croteau-Chonka13, F. Day14, R. Magi1, T. Ferreira15, J. Randall15, T. W. Winkler16, T. Fall7, Z. Kutalik17, T. Workalemahu18, G. Abecasis10, M. E. Goddard6, L. Franke9, R. J. F. Loos14,19, M. N. Weedon2, E. Ingelsson7, P. M. Visscher6, J. N. Hirschhorn3,4,5, T. M. Frayling2, GIANT Consortium 1) Estonian Genome Center, University of Tartu, Tartu, Tartumaa, Estonia; 2) Genetics of Complex Traits, Peninsula College of Medicine and Dentistry, University of Exeter, Exeter, UK; 3) Divisions of Genetics and Endocrinology and Program in Genomics, Children's Hospital, Boston, Massachusetts 02115, USA; 4) Metabolism Initiative and Program in Medical and Population Genetics, Broad Institute, Cambridge, Massachusetts 02142, USA; 5) Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA; 6) University of Queensland Diamantina Institute, University of Queensland, Princess Alexandra Hospital, Brisbane, Queensland, Australia; 7) Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, 171 77 Stockholm, Sweden; 8) Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Department of Health and Human Services, Bethesda, Maryland 20892, USA; 9) Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands; 10) Department of Biostatistics, Center for Statistical Genetics, University of Michigan, Ann Arbor, Michigan 48109, USA; 11) Center for Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA; 12) Institute for Medical Informatics, Biometry and Epidemiology, University of Duisburg-Essen, 
Germany; 13) Department of Genetics, University of North Carolina, Chapel Hill, North Carolina 27599, USA; 14) MRC Epidemiology Unit, Institute of Metabolic Science, Cambridge, UK; 15) Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, OX3 7BN, UK; 16) Public Health and Gender Studies, Institute of Epidemiology and Preventive Medicine, Regensburg University Medical Center, Regensburg, Germany; 17) Department of Medical Genetics, University of Lausanne, 1005 Lausanne, Switzerland; 18) Department of Nutrition, Harvard School of Public Health, Boston, Massachusetts 02115, USA; 19) Mount Sinai School of Medicine, New York, NY, USA.

   Adult human height is a highly heritable polygenic trait. Previous genome-wide analyses have identified 180 independent loci explaining an estimated 1/8 of the heritable component (heritability ≈ 80%). Our aims were a) to increase the understanding of the role of common genetic variation in a model quantitative trait, and b) to help understand the biology of normal growth and development. Within the GIANT consortium, we performed a GWAS of ~250,000 individuals of European ancestry. We tested for the presence of multiple signals at individual loci using an approximate conditional and joint multiple-SNP regression analysis. We identified 698 independent variants associated with height at p<5×10^-8, which fell in 424 loci (±500 kb from the lead SNP) and together explained 1/4 of the inherited component of adult height. Half of the loci contained multiple signals of association. By applying a novel pathway analysis approach that uses co-expression data from 80,000 samples to predict the biological function of poorly annotated genes, we observed enrichment for novel and biologically relevant pathways in these loci. For example, more than 10% of the loci had a nearby gene with a predicted "regulation of ossification" function (GO:0030278, WMW P < 10^-34), including newly identified genes such as PRRX1 and SNAI1. Other genes and pathways newly highlighted by pathway analysis include WNT (WNT2B, WNT4, WNT7A) and FGF (FGF2, FGF18) signaling and osteoglycin. We also noted an excess of signals across the entire genome, with the median test statistic twice that expected under the null (lambda = 2.0). This result is consistent with a very deep polygenic component to height covering most of the genome, with population stratification contributing partly to the results, or with a combination of the two.
Encouragingly, initial results from family-based analyses and mixed models that correct for distant relatedness across samples indicate that a large proportion of the discovered signals are genuine height-associated variants rather than artifacts of stratification. In conclusion, data from 250,000 individuals show that adult height is highly polygenic, with typically multiple signals of association per locus, together now accounting for ¼ of heritability. Furthermore, these results suggest that increasing GWAS sample sizes can continue to uncover substantial new insights into the aetiological pathways involved in common human phenotypes.
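The genomic inflation factor quoted above (lambda = 2.0) is conventionally the median observed 1-df chi-square statistic divided by its null median (about 0.455). A minimal standard-library sketch, assuming two-sided p-values as input; the abstract does not describe its calculation beyond this median ratio.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
# Median of a 1-df chi-square: the square of the 75th percentile of N(0, 1), about 0.4549.
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def chi2_from_p(p):
    """Convert a two-sided p-value (0 < p <= 1) to the equivalent 1-df chi-square statistic."""
    z = _STD_NORMAL.inv_cdf(p / 2)  # lower tail keeps precision for very small p
    return z * z


def genomic_inflation(p_values):
    """lambda_GC: median observed chi-square over its expected median under the null."""
    return median([chi2_from_p(p) for p in p_values]) / CHI2_1DF_MEDIAN


# Uniform (null-like) p-values give lambda close to 1.
null_p = [(i + 0.5) / 1000 for i in range(1000)]
print(round(genomic_inflation(null_p), 3))
```

Under a highly polygenic model, lambda is also expected to rise with sample size, which is why an inflated lambda on its own cannot separate polygenicity from stratification, as the abstract notes.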


Over 250 novel associations with human morphological traits. N. Eriksson, C. B. Do, J. Y. Tung, A. K. Kiefer, D. A. Hinds, J. L. Mountain, U. Francke 23andMe, Mountain View, CA.

   External morphological features are by definition visible and are typically easy to measure. They are also generally highly heritable. As such, they have played a fundamental role in the development of the field of genetics. As morphological traits have frequently been targets of natural selection, their genetics may also provide clues to our evolutionary history. Many rare diseases include dysmorphic features among their symptoms. However, aside from height and BMI, little is currently known about the genetics of common variation in human morphology. Here we present a series of genome-wide association studies of 18 self-reported morphological traits in a total of over 55,000 people of European ancestry from the 23andMe customer base. The phenotypes studied include hair traits (baldness, unibrow, hair curl, upper and lower back hair, widow’s peak), as well as many soft tissue and skeletal traits (chin dimple, nose shape, dimples, earlobe attachment, nose-wiggling ability, the presence of a gap between the top incisors, joint hypermobility, finger and toe relative lengths, arch height, foot direction, height-normalized shoe size). Across the 18 phenotypes, we find a total of 281 genome-wide significant associations (including 53 for unibrow and 29 each for hair curl and chin dimple). Almost all of these associations are novel; we believe this is the largest set of novel associations ever described in a single report. Many of these SNPs show pleiotropic effects; e.g., a SNP near GDF5 is associated with hypermobility, arch height, relative toe length, shoe size, and foot direction, and another near AUTS is associated with both back hair and baldness. Nearby genes are significantly enriched for transcription factors (p<1e-14) and for genes involved in rare disorders that cause cleft palate, ear, limb, or skull abnormalities (p<1e-7).
A SNP near ZEB2 is associated with both widow’s peak and chin dimple; mutations in ZEB2 cause Mowat-Wilson syndrome, which includes distinctive facial features such as a pronounced chin. Morphology-associated SNPs are also enriched within regions identified as having undergone selection since the divergence from Neanderthals (18 associations in 11 regions, p = 4e-5). The abundance of these SNPs, which include the ZEB2 and GDF5 associations above, suggests that physical traits may have played a significant role in driving the natural selection processes that gave rise to modern humans.
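The abstract reports enrichment of nearby genes for transcription factors without naming the test. A one-sided hypergeometric (over-representation) test is one common choice and is sketched here as an assumption; the gene universe (for example, all genes near any tested SNP) strongly affects the result and is not specified in the abstract, and the counts below are toy numbers.

```python
from math import comb


def hypergeom_enrichment_p(hits_in_set, set_size, hits_total, universe_size):
    """One-sided P(X >= hits_in_set), X ~ Hypergeometric(universe_size, hits_total, set_size).

    universe_size: genes eligible for selection (the background)
    hits_total:    background genes carrying the annotation (e.g. transcription factors)
    set_size:      genes near associated SNPs
    hits_in_set:   annotated genes among those near associated SNPs
    """
    upper = min(set_size, hits_total)
    tail = sum(
        comb(hits_total, k) * comb(universe_size - hits_total, set_size - k)
        for k in range(hits_in_set, upper + 1)
    )
    return tail / comb(universe_size, set_size)


# Toy numbers: 3 of 5 selected genes annotated, versus 3 of 10 in the background.
print(hypergeom_enrichment_p(3, 5, 3, 10))  # 21/252, about 0.083
```

Exact integer arithmetic via `math.comb` avoids overflow for realistic gene counts, though very small p-values are then limited by float division.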



Genome-wide association study of Tanner puberty staging in males and females. D. Cousminer1, N. Timpson2, D. Berry3, W. Ang4, I. Ntalla5, M. Groen-Blokhuis6, M. Guxens7, M. Kähönen8, J. Viikari9, T. Lehtimäki10, K. Panoutsopoulou11, D. Boomsma6, E. Zeggini11, G. Dedoussis5, C. Pennell4, O. Raitakari12, E. Hyppönen3, G. Davey Smith2, M. McCarthy13, E. Widén1, The Early Growth Genetics (EGG) Consortium 1) Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Finland; 2) The Medical Research Council (MRC) Centre for Causal Analyses in Translational Epidemiology, School of Social and Community Medicine, University of Bristol, Bristol, UK; 3) Centre for Paediatric Epidemiology and Biostatistics, MRC Centre for Epidemiology of Child Health, UCL Institute of Child Health, London, UK; 4) University of Western Australia, Perth, Western Australia, Australia; 5) Harokopio University of Athens, Department of Dietetics and Nutrition, Athens; 6) Netherlands Twin Register, Department of Biological Psychology, VU University, Amsterdam, The Netherlands; 7) Center for Research in Environmental Epidemiology (CREAL), Barcelona, Catalonia, Spain; 8) Department of Clinical Physiology, University of Tampere and Tampere University Hospital, Finland; 9) Department of Medicine, University of Turku, Finland; 10) Department of Clinical Chemistry, Fimlab Laboratories, University Hospital and University of Tampere, Finland; 11) Wellcome Trust Sanger Institute, Hinxton, UK; 12) Department of Clinical Physiology and Nuclear Medicine, University of Turku, Finland; 13) Wellcome Trust Centre for Human Genetics, Roosevelt Drive, University of Oxford, Oxford, UK.

   Puberty is a complex trait with large variation in timing and tempo in the population, and extremes in pubertal timing are a common cause for referral to pediatric specialists. Recently, large genome-wide association studies (GWAS) have revealed 42 common-variant loci associated with age at menarche (AAM), and some implicated genes are known from severe single-gene disorders. However, little is known about the genetic architecture underlying normal variation in the onset of puberty, especially in males.
   Tanner staging, a 5-stage scale assessing female breast and male genital development, is a commonly used measure of pubertal development. While AAM is a late event in puberty, Tanner stage in mid-puberty may correlate more closely with the central activation of puberty. Using Tanner scale data at the comparable ages of 11-12 years in girls and 13-14 years in boys, we performed GWAS meta-analyses across 10 cohorts with up to 9,900 samples. The combined male and female analysis showed evidence for association near LIN28B (P=1.95×10^-8), previously implicated in AAM and height growth in both sexes. Our data confirm that this locus is also important for male pubertal development and may be part of the pubertal initiation program upstream of sex-specific mechanisms. A novel signal (P=4.95×10^-8) with a consistent direction of effect across contributing datasets is located on chromosome 1 at an intronic transcription factor binding-site cluster within the gene CAMTA1. Furthermore, the primary analyses revealed suggestive evidence for male-specific loci, e.g. near MKL2 (P=4.68×10^-7), which may be confirmed by follow-up genotyping. MAGENTA gene-set enrichment analysis of the combined-sex GWAS results showed enrichment of genes in pathways expected from the known biology of pubertal activation via the hypothalamic-pituitary-gonadal (HPG) axis. Genes near suggestively associated loci may also pinpoint novel regulatory mechanisms: CAMTA1 is a calmodulin-binding transcriptional activator, while MKL2 is also a transcriptional activator involved in cell differentiation and development. These results suggest the presence of multiple real signals below the genome-wide significance threshold, and further exploration of enriched pathways may reveal new insights into the biology of pubertal development.
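The CAMTA1 signal is described as having a consistent direction of effect across contributing datasets. One simple way to put a number on such consistency, assumed here rather than taken from the abstract, is a one-sided binomial sign test: if the allele had no effect, each independent dataset would point either way with probability 0.5. The number of datasets contributing at that SNP is not given (the meta-analysis had up to 10 cohorts), so the example counts are hypothetical.

```python
from math import comb


def sign_test_p(n_concordant, n_datasets):
    """One-sided P(X >= n_concordant) for X ~ Binomial(n_datasets, 0.5)."""
    if not 0 <= n_concordant <= n_datasets:
        raise ValueError("need 0 <= n_concordant <= n_datasets")
    tail = sum(comb(n_datasets, k) for k in range(n_concordant, n_datasets + 1))
    return tail / 2 ** n_datasets


# Hypothetical: all 10 datasets agree in direction, or 9 of 10 do.
print(sign_test_p(10, 10), sign_test_p(9, 10))
```

Because it ignores effect sizes and standard errors, a sign test is only a coarse consistency check alongside the meta-analysis itself.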


Heritability estimation of height from common genetic variants in a large sample of African Americans. F. Chen1, G. K. Chen1, R. C. Millikan2, E. M. John3,4, C. B. Ambrosone5, L. Berstein6, W. Zheng7, J. J. Hu8, R. G. Ziegler9, S. L. Deming7, E. V. Bandera10, W. J. Blot7,11, S. S. Strom12, S. I. Berndt9, R. A. Kittles13, B. A. Rybicki14, W. Issacs15, S. A. Ingles1, J. L. Stanford16, W. R. Diver17, J. S. Witte18, L. B. Signorello7,11, S. J. Chanock9, L. Le Marchand19, L. N. Kolonel19, B. E. Henderson1, C. A. Haiman1, D. O. Stram1 1) Preventive Medicine, University of Southern California, Los Angeles, CA; 2) Epidemiology, Gillings School of Global Public Health, and Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, NC; 3) Northern California Cancer Center, Fremont, CA; 4) School of Medicine, Stanford University, and Stanford Cancer Center, Stanford, CA; 5) Cancer Prevention and Control, Roswell Park Cancer Institute, Buffalo, NY; 6) Cancer Etiology, Population Science, Beckman Research Institute, City of Hope, CA; 7) Epidemiology, Vanderbilt Epidemiology Center and Vanderbilt-Ingram Cancer Center, Vanderbilt University, Nashville, TN; 8) Sylvester Comprehensive Cancer Center, Department of Epidemiology and Public Health, University of Miami, Miami, FL; 9) Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD; 10) The Cancer Institute of New Jersey, New Brunswick, NJ; 11) International Epidemiology Institute, Rockville, MD; 12) Epidemiology, The University of Texas M.D.
Anderson Cancer Center, Houston, TX; 13) Medicine, University of Illinois at Chicago, Chicago, IL; 14) Biostatistics and Research Epidemiology, Henry Ford Hospital, Detroit, MI; 15) James Buchanan Brady Urological Institute, Johns Hopkins Hospital and Medical Institutions, Baltimore, MD; 16) Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA; 17) Epidemiology Research, American Cancer Society, Atlanta, GA; 18) Institute of Human Genetics, Dept of Epidemiology and Biostatistics, University of California, San Francisco, CA; 19) Epidemiology, Cancer Research Center, University of Hawaii, Honolulu, HI.

   Height has an extremely polygenic pattern of inheritance. Genome-wide association studies (GWAS) have revealed hundreds of common variants associated with human height at genome-wide significance. Each of these common variants has a very modest effect, and only a small fraction of phenotypic variation can be explained by their aggregate. In this large study of African-American men and women, we genotyped and analyzed 975,519 autosomal SNPs across the genome using a variance components approach, and found that 46.4% of phenotypic variation can be explained by these SNPs in a sample of 9,779 apparently unrelated individuals. In two samples of close relatives defined by the probability of identical-by-descent (IBD) allele sharing (Pr(IBD=1)≥0.3 and Pr(IBD=1)≥0.4), the proportion of phenotypic variation explained by the same set of SNPs increased to 75.5% (se: 14.8%) and 70.3% (se: 26.9%), respectively. We conclude that the additive component of genetic variation for height may have been overestimated in earlier studies (~80%), and argue that this proportion also includes variation from epistatic effects. Using simulation, we showed that a model using common SNPs that are only weakly correlated with causal SNPs could still explain a large proportion of heritability. We therefore argue that the heritability estimate from the variance components approach is not necessarily the variation explained by a given set of SNPs, but may also reflect distant relatedness between nominally unrelated participants. Finally, we explored the performance of the variance components approach and concluded that it fails when a large number of independent variables are included in the model, because the structures of the two variance components become similar.
Thus some degree of population stratification seems to be required for the method to perform well with very large numbers of SNPs; however, when modest stratification is present, there is a risk of misattributing the effects of unmeasured (and untagged) variants to measured variants.
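The variance-components argument rests on a genomic relationship matrix (GRM) built from the measured SNPs, whose off-diagonal entries capture relatedness between nominally unrelated people. The abstract does not spell out its GRM; the sketch below assumes the widely used standardised form A_jk = (1/M) Σ_i (x_ij - 2p_i)(x_ik - 2p_i) / (2p_i(1-p_i)), with allele frequencies p_i estimated from the sample, and flags pairs above a relatedness cutoff. The 0.025 default is an assumed, commonly used threshold, not necessarily the one applied in the study, and the genotypes are random toy data.

```python
import numpy as np


def genomic_relationship_matrix(genotypes):
    """GRM from an (n_individuals, n_snps) matrix of 0/1/2 allele counts.

    Each SNP is centred on 2p and scaled by sqrt(2p(1-p)), with p the sample
    allele frequency; the GRM is Z @ Z.T / n_snps. Monomorphic SNPs are dropped
    to avoid dividing by zero.
    """
    x = np.asarray(genotypes, dtype=float)
    p = x.mean(axis=0) / 2.0
    keep = (p > 0) & (p < 1)
    x, p = x[:, keep], p[keep]
    z = (x - 2 * p) / np.sqrt(2 * p * (1 - p))
    return z @ z.T / z.shape[1]


def related_pairs(grm, cutoff=0.025):
    """Index pairs (j, k), j < k, whose genomic relationship exceeds `cutoff`."""
    j, k = np.triu_indices(grm.shape[0], k=1)
    mask = grm[j, k] > cutoff
    return list(zip(j[mask].tolist(), k[mask].tolist()))


# Toy data: 20 people, 200 random SNPs; person 1 is a genetic duplicate of person 0.
rng = np.random.default_rng(0)
geno = rng.integers(0, 3, size=(20, 200))
geno[1] = geno[0]
grm = genomic_relationship_matrix(geno)
print((0, 1) in related_pairs(grm))  # True
```

In the standard formulation, the variance-components model then fits the phenotype covariance as σ²_g A + σ²_e I; when A's off-diagonals mostly reflect shared ancestry rather than causal-variant sharing, σ²_g can absorb that relatedness, which is the concern raised above.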



A multi-SNP locus-association method reveals a substantial fraction of the missing heritability. Z. Kutalik1,2, G. Ehret3,4, D. Lamparter1,2, C. Hoggart5, J. Whittaker6, J. Beckmann1,7, GIANT consortium 1) Med Gen, Univ Lausanne, Lausanne, Switzerland; 2) Swiss Institute of Bioinformatics, Switzerland; 3) Division of Cardiology, Geneva University Hospital, Geneva, Switzerland; 4) McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Baltimore, Maryland, United States of America; 5) Department of Pediatrics, Imperial College London, London, United Kingdom; 6) Quantitative Sciences, GlaxoSmithKline, Stevenage, UK; 7) Service of Medical Genetics, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland.

   There are many known examples of multiple (semi-)independent associations at individual loci, which may arise either from true allelic heterogeneity or from imperfect tagging of an unobserved causal variant. This phenomenon is of great importance in monogenic traits but has not yet been systematically investigated and quantified in complex-trait GWAS. We describe a multi-SNP association method that estimates the effect of loci harbouring multiple association signals using GWAS summary statistics. Applying the method to a large anthropometric GWAS meta-analysis (GIANT), we show that an additional 10%, 9%, and 8% of phenotypic variance can be explained for height, BMI, and waist-hip ratio (WHR), respectively, on top of the previously reported 10%, 1.5%, and 1%. The method also substantially increased the number of loci that replicate in a discovery-validation design. Specifically, we identified a total of 263 loci at which the multi-SNP model explains significantly more variance than the best individual SNP at the locus. A detailed analysis of the multi-SNP models shows that most of the additional variance explained derives from SNPs not in LD with the lead SNP, suggesting a major contribution of allelic heterogeneity to the missing heritability.
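The abstract does not give its estimator, but a standard summary-statistic approximation shows why signals not in LD with the lead SNP add variance. With marginal z-scores z, an LD (correlation) matrix R among the SNPs, and sample size N, a single SNP explains roughly z²/N, and the SNPs jointly explain roughly zᵀR⁻¹z/N (reasonable when each effect is small). This is an assumed textbook approximation, not necessarily the authors' method, and the inputs below are hypothetical.

```python
import numpy as np


def single_snp_r2(z, n):
    """Approximate variance explained by each SNP alone: z^2 / N."""
    return np.asarray(z, dtype=float) ** 2 / n


def multi_snp_r2(z, ld, n):
    """Approximate joint variance explained: z^T R^{-1} z / N, with R the SNP correlation matrix."""
    z = np.asarray(z, dtype=float)
    return float(z @ np.linalg.solve(np.asarray(ld, dtype=float), z) / n)


n = 100_000
# Two signals not in LD: their contributions add, (64 + 36) / N.
print(multi_snp_r2([8.0, 6.0], np.eye(2), n))
# Two SNPs in strong LD where the second merely tags the first (z2 = r * z1):
tagging_ld = np.array([[1.0, 0.95], [0.95, 1.0]])
print(multi_snp_r2([8.0, 7.6], tagging_ld, n))  # about 64 / N, same as the lead SNP alone
```

The contrast mirrors the abstract's conclusion: extra variance comes mainly from SNPs that are not simply tagging the lead signal.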


Hundreds of loci contribute to body fat distribution and central adiposity. A. E. Locke1, D. Shungin2,3,4, T. Ferreira5, T. W. Winkler6, D. C. Croteau-Chonka7, R. Magi5,8, T. Workalemahu9, K. Fischer8, J. Wu10, R. J. Strawbridge11, A. Justice12, F. Day13, N. Heard-Costa14,15, C. S. Fox14, M. C. Zillikens16, E. K. Speliotes17,18, H. Völzke19, L. Qi9, I. Barroso20,21, I. M. Heid6, K. E. North12, P. W. Franks2,4,9, M. I. McCarthy22, J. N. Hirschhorn23, L. A. Cupples10,14, E. Ingelsson24, A. P. Morris5, R. J. F. Loos13,25, C. M. Lindgren5, K. L. Mohlke7, Genetic Investigation of ANthropometric Traits (GIANT) Consortium 1) Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI; 2) Genetic and Molecular Epidemiology Group, Department of Public Health and Clinical Medicine, Umeå University, Umeå, Sweden; 3) Department of Odontology, Umeå University, Umeå, Sweden; 4) Department of Clinical Sciences, Skåne University Hospital, Lund University, Malmö, Sweden; 5) Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK; 6) Regensburg University Medical Center, Department of Epidemiology and Preventive Medicine, Regensburg, Germany; 7) Department of Genetics, University of North Carolina, Chapel Hill, NC; 8) Estonian Genome Center, University of Tartu, Estonia; 9) Department of Nutrition, Harvard School of Public Health, Boston, MA; 10) Department of Biostatistics, School of Public Health, Boston University, Boston, MA; 11) Cardiovascular Genetics and Genomics Group, Karolinska Institutet, Stockholm, Sweden; 12) Department of Epidemiology and Carolina Center for Genome Sciences, University of North Carolina, Chapel Hill, NC; 13) MRC Epidemiology Unit, Institute of Metabolic Science, Addenbrooke's Hospital, Cambridge, UK; 14) National Heart, Lung, and Blood Institute, Framingham, MA; 15) Department of Neurology, Boston University School of Medicine, Boston, MA; 16) Department of Internal Medicine, Erasmus MC
Rotterdam, the Netherlands; 17) Department of Internal Medicine, Division of Gastroenterology, and Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI; 18) Broad Institute, Cambridge, MA; 19) Institute for Community Medicine, Ernst-Moritz-Arndt-University Greifswald, Greifswald, Germany; 20) Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK; 21) University of Cambridge Metabolic Research Labs, Institute of Metabolic Sciences; 22) University of Oxford, Oxford, UK; 23) Department of Genetics, Harvard Medical School, Boston, MA; 24) Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden; 25) Charles R. Bronfman Institute of Personalized Medicine, Child Health and Development Institute, Department of Preventive Medicine, Mount Sinai School of Medicine, New York, NY.

   Central adiposity and body fat distribution are risk factors for type 2 diabetes and cardiovascular disease and can be measured using waist circumference (WC), hip circumference (HIP), and waist-to-hip ratio (WHR). Adjusting for body mass index (BMI) separates effects on fat distribution from those on overall obesity. We performed fixed-effects inverse-variance meta-analysis for these traits with 72,919 individuals from 30 studies in a prior genome-wide association study (GWAS) meta-analysis, 71,139 individuals from 24 additional GWAS, and 67,163 individuals from 28 studies genotyped on the Metabochip by the GIANT consortium. We identified 48 independent genome-wide significant (p<5×10^-8) associations for WHR adjusted for BMI, including all 14 previously published signals. Twelve signals are located near genes for transcription factors, including developmental homeobox-containing proteins. Two of these are in the HOXC gene cluster near HOXC8 and miR-196a2. HOXC8 is expressed in white adipose tissue and is a regulator of brown adipogenesis, while miR-196a inhibits Hoxc8 expression. Signals are also located near PPARG, encoding a transcription factor known to regulate adipocyte differentiation, and near HMGA1 and CEBPA, encoding transcription factors that act downstream of insulin receptor and leptin signaling, respectively. Further novel signals are located near genes involved in angiogenesis (PLXND1, VEGFB, and MEIS1). Among the other five traits, we estimate that a significant proportion of the genetic effects for WC and HIP adjusted for BMI are correlated with height (0.59, p<5×10^-79 and 0.83, p<2×10^-40, respectively). Despite this strong correlation, an appreciable proportion of the genetic contribution to these traits is likely to be independent of height. Association meta-analysis for the five additional traits identified a further 148 independent signals (p<5×10^-8), 32 of which have not previously been reported for an anthropometric trait.
These novel signals suggest that regulation of adipose gene expression (KLF14) and transcriptional control of cell patterning and differentiation in early development (HLX, SOX11, ZNF423, and HMGXB4) affect fat distribution. Meta-analyses for WHR, WC, and HIP, with and without adjustment for BMI, identified a total of 196 independent loci, 66 of them novel, affecting fat deposition and body shape and implicating genes involved in development, adipose gene expression and tissue differentiation, response to metabolic signaling, and angiogenesis.
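The abstract names fixed-effects inverse-variance meta-analysis. Below is a minimal sketch of that standard estimator for a single SNP: each study's effect is weighted by 1/se², the pooled standard error is the square root of one over the summed weights, and the two-sided p-value comes from the normal distribution. The per-study inputs are hypothetical.

```python
import math


def ivw_fixed_effects(betas, ses):
    """Fixed-effects inverse-variance meta-analysis of one SNP across studies.

    betas, ses: per-study effect estimates and their standard errors.
    Returns (pooled_beta, pooled_se, two_sided_p).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("need one standard error per effect estimate")
    weights = [1.0 / se ** 2 for se in ses]  # w_i = 1 / se_i^2
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value, stable for tiny p
    return beta, se, p


# Hypothetical per-study estimates for one SNP from three sample sets.
print(ivw_fixed_effects([0.030, 0.025, 0.035], [0.008, 0.007, 0.009]))
```

Production pipelines typically add steps such as genomic-control correction and heterogeneity statistics, which this sketch omits.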



Prediction of human height with large panels of SNPs - insights into genetic architecture. Y. C. Klimentidis1, A. I. Vazquez1, G. de los Campos2 1) Energetics, University of Alabama at Birmingham, Birmingham, AL; 2) Biostatistics, University of Alabama at Birmingham, Birmingham, AL.

   Prediction of complex traits from genetic information is an area of major clinical and scientific interest. Height is a model trait since it is highly heritable and easily measured. Substantial strides in understanding the genetic basis of height have recently been made through genome-wide association studies (GWAS) and whole-genome prediction (WGP), which fits thousands of SNPs jointly. Here, we attempt to gain insight into the genetic architecture of human height by examining how WGP accuracy is affected by the choice of single-nucleotide polymorphisms (SNPs). Specifically, we compare the prediction accuracy of models using 1) SNPs selected from the ‘top hits’ of the GIANT consortium meta-analysis for height at different p-value thresholds, and 2) SNPs in genomic regions surrounding the most significant ‘top hits’. We use the Framingham Heart Study and GENEVA datasets, imputed to up to 10 million SNPs with 1000 Genomes reference data. The predictive accuracy of each model was evaluated by cross-validation. We find that prediction accuracy increases up to a point with the inclusion of more ‘top hits’ from the GIANT study, that including SNPs from the regions surrounding ‘top hits’ contributes minimally to prediction accuracy, and that prediction accuracy increases with the size of the training dataset. Finally, we find that prediction accuracy is greatest for individuals at the phenotypic extremes of height. Our results suggest that improving genomic prediction models will require information from a large number of selected SNPs, and that these models may be most useful at the phenotypic extremes.
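The abstract does not name its WGP model. As an assumed stand-in, the sketch below uses ridge regression (closely related to genomic BLUP) with k-fold cross-validation, scoring accuracy as the correlation between observed and out-of-fold predicted height. Restricting to SNPs below a GIANT p-value threshold would correspond to passing a subset of genotype columns; the simulated genotypes and effects are illustrative only.

```python
import numpy as np


def ridge_fit(x, y, lam):
    """Ridge regression coefficients with an unpenalised intercept (handled by centring)."""
    x_mean, y_mean = x.mean(axis=0), y.mean()
    xc, yc = x - x_mean, y - y_mean
    beta = np.linalg.solve(xc.T @ xc + lam * np.eye(x.shape[1]), xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return intercept, beta


def cv_prediction_accuracy(x, y, lam=1.0, n_folds=5, seed=0):
    """Pearson correlation between observed and out-of-fold predicted phenotypes."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    predicted = np.empty_like(y)
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False  # fit only on the other folds
        intercept, beta = ridge_fit(x[train], y[train], lam)
        predicted[test_idx] = intercept + x[test_idx] @ beta
    return float(np.corrcoef(y, predicted)[0, 1])


# Simulated data: 300 people, 50 SNPs, roughly half the phenotypic variance genetic.
rng = np.random.default_rng(1)
genotypes = rng.integers(0, 3, size=(300, 50)).astype(float)
effects = rng.normal(0.0, 1.0, size=50)
genetic_value = genotypes @ effects
height = genetic_value + rng.normal(0.0, genetic_value.std(), size=300)
print(round(cv_prediction_accuracy(genotypes, height, lam=10.0), 2))
```

Ridge shrinkage lets all selected SNPs be fitted jointly even when they are numerous or correlated; the penalty `lam` would normally be tuned within the training folds.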




Evidence of Inbreeding Depression on Human Height. J. F. Wilson1, N. Eklund2,3, N. Pirastu4, M. Kuningas5, B. P. McEvoy6, T. Esko7, T. Corre8, G. Davies9, P. d'Adamo4, N. D. Hastie10, U. Gyllensten11, A. F. Wright10, C. M. van Duijn5, M. Dunlop10, I. Rudan1, P. Gasparini4, P. P. Pramstaller12, I. J. Deary9, D. Toniolo8, J. G. Eriksson3, A. Jula3, O. T. Raitakari13, A. Metspalu7, M. Perola2,3,7, M. R. Jarvelin14,15, A. Uitterlinden5, P. M. Visscher6, H. Campbell1, R. McQuillan1, ROHgen 1) Centre for Population Health Sciences, Univ Edinburgh, Edinburgh, United Kingdom; 2) Institute for Molecular Medicine Finland (FIMM), Helsinki, Finland; 3) Department of Chronic Disease Prevention, National Institute for Health and Welfare, Helsinki, Finland; 4) Institute for Maternal and Child Health, IRCCS “Burlo Garofolo”, Trieste, University of Trieste, Italy; 5) Department of Epidemiology, Erasmus University Medical Center, Rotterdam, The Netherlands; 6) Queensland Institute of Medical Research, 300 Herston Road, Brisbane, Queensland 4006, Australia; 7) Estonian Genome Center, University of Tartu, Tartu, Estonia; 8) Division of Genetics and Cell Biology, San Raffaele Research Institute, Milano, Italy; 9) Department of Psychology, The University of Edinburgh, 7 George Square, Edinburgh EH8 9JZ, UK; 10) MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, EH4 2XU, Scotland; 11) Department of Immunology, Genetics and Pathology, SciLifeLab Uppsala, Rudbeck Laboratory, Uppsala University, Uppsala, Sweden; 12) Centre for Biomedicine, European Academy Bozen/Bolzano (EURAC), Bolzano, Italy - Affiliated Institute of the University of Lübeck, Lübeck, Germany; 13) Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku, Turku, Finland; 14) Biocenter Oulu, University of Oulu, Finland; 15) Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London, MRC Health Protection Agency (HPA) Centre for Environment and Health, Imperial College London, London, UK.

   Stature is a classical and highly heritable complex trait, with 80-90% of variation explained by genetic factors. In recent years, genome-wide association studies (GWAS) have successfully identified many common additive variants influencing human height; however, little attention has been given to the potential role of recessive genetic effects. Here, we investigated genome-wide recessive effects by analysing inbreeding depression on adult height in over 35,000 people from 21 different population samples. We found a highly significant inverse association between height and genome-wide homozygosity, equivalent to a height reduction of up to 3 cm in the offspring of first cousins compared with the offspring of unrelated individuals; this effect remained after controlling for socio-economic status, an important confounder. There was, however, a high degree of heterogeneity among populations: whereas the direction of the effect was consistent across most population samples, the effect size differed significantly among populations. This likely reflects true biological heterogeneity: whether or not an effect can be observed depends both on the variance in homozygosity in the population and on the chance inheritance of individual recessive genotypes. These results suggest that multiple, rare, recessive variants influence human height. Although this exploratory work focuses on height alone, the methodology developed is generally applicable to heritable quantitative traits (QT), paving the way for investigating inbreeding effects, and therefore genetic architecture, across a range of QT of biomedical importance.
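The first-cousin figure can be related to a regression slope with a little arithmetic. The sketch below assumes that genome-wide homozygosity is summarised as F_ROH, the fraction of the autosomal genome in runs of homozygosity, and that height changes linearly with it; the abstract specifies neither choice, and the function names and the 3,000 Mb autosome length are hypothetical. The offspring of first cousins have an expected inbreeding coefficient of 1/16, so a 3 cm reduction corresponds to a slope of about −48 cm per unit of F.

```python
def f_roh(roh_lengths_mb, autosome_length_mb):
    """Fraction of the autosomal genome covered by runs of homozygosity (ROH)."""
    return sum(roh_lengths_mb) / autosome_length_mb


def slope_from_effect(height_change_cm, f):
    """Linear slope (cm per unit F) implied by a height change at inbreeding level f."""
    return height_change_cm / f


F_FIRST_COUSINS = 1 / 16  # expected inbreeding coefficient of first-cousin offspring
slope = slope_from_effect(-3.0, F_FIRST_COUSINS)
print(slope)  # -48.0 cm per unit F

# Hypothetical individual: 90 Mb of ROH on an assumed 3,000 Mb autosomal genome.
f = f_roh([40.0, 30.0, 20.0], 3000.0)
print(f, slope * f)  # F_ROH and the predicted height change in cm
```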


Empirical and theoretical studies on genetic variance of rare variants for complex traits using whole genome sequencing in the CHARGE Consortium. C. Zhu1, A. Morrison2, J. Reid3, C. J. O’Donnell4, B. Psaty5, L. A. Cupples4,6, R. Gibbs3, E. Boerwinkle2,3, X. Liu2 1) Department of Agronomy, Kansas State University, Manhattan, KS; 2) Human Genetics Center, School of Public Health, University of Texas Health Science Center at Houston, Houston, TX; 3) Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX; 4) NHLBI Framingham Heart Study, Framingham, MA; 5) Cardiovascular Health Research Unit, University of Washington, Seattle, WA; 6) Department of Biostatistics, Boston University School of Public Health, Boston, MA.

   As the frontier of human genetic studies has shifted from genome-wide association studies (GWAS) towards whole-exome and whole-genome sequencing studies, we have witnessed an explosion of new DNA variants, especially rare variants. An important but as yet unanswered question is the contribution of rare variants to the heritability of complex traits, which determines, in part, the gain in power from rare variants to discover new disease-associated genes. Here we present theoretical and empirical results on this question.
    Our theoretical study was based on the distribution of allele frequencies under mutation, random genetic drift, and possible purifying selection against susceptibility mutations. It shows that in most cases rare variants contribute only a small proportion of the overall genetic variance of a trait, but that under certain conditions they may explain as much as 50% of the additive genetic variance: when susceptibility alleles are under purifying selection and the rate of mutations compensating for the susceptibility alleles (i.e., the repair rate) is high.
    In our empirical study, we used whole-genome sequences (8x coverage) of 962 European Americans from the CHARGE-S study to estimate the proportion of the total phenotypic variance of six complex traits (BMI, height, LDL-C, HDL-C, triglycerides and total cholesterol) explained by the additive genetic variance (σg²) of rare variants. The estimated σg² of rare variants (MAF≤1%) ranged from 2% to 8% across the six traits. However, the standard errors (s.e.) of the variance components estimated from rare variants are relatively large compared to those of common variants. Using HDL-C as an example, the estimated σg² values are 0.08 (s.e. 0.10), 0.05 (s.e. 0.05) and 0.58 (s.e. 0.05) for rare, low-frequency (1%<MAF≤5%) and common (MAF>5%) variants, respectively.
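Why rare variants usually contribute little additive variance follows from the standard single-locus formula: under Hardy-Weinberg equilibrium, a biallelic variant with allele frequency p and additive effect a contributes 2p(1−p)a² to the additive genetic variance. The sketch below applies only that formula; it does not reproduce the authors' mutation-drift-selection model, and the effect sizes and function names are hypothetical. It shows that rare variants carry a large share of the variance only if their effects are correspondingly larger, which is the kind of scenario purifying selection can produce.

```python
import math


def additive_variance(p, a):
    """Additive variance 2p(1-p)a^2 of a biallelic locus under Hardy-Weinberg equilibrium."""
    return 2.0 * p * (1.0 - p) * a * a


def rare_fraction(variants, maf_cutoff=0.01):
    """Share of total additive variance from variants with MAF <= maf_cutoff.

    variants: iterable of (allele_frequency, additive_effect) pairs.
    """
    variants = list(variants)
    total = sum(additive_variance(p, a) for p, a in variants)
    rare = sum(additive_variance(p, a) for p, a in variants
               if min(p, 1.0 - p) <= maf_cutoff)
    return rare / total


# Equal effects: the rare variant (MAF 1%) contributes under 4% of the variance.
equal = [(0.01, 1.0), (0.5, 1.0)]
# Effects scaled so each locus contributes the same variance (hypothetical strong selection).
scaled = [(p, 1.0 / math.sqrt(2.0 * p * (1.0 - p))) for p in (0.01, 0.5)]
print(rare_fraction(equal), rare_fraction(scaled))
```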


Leveraging admixture analysis to resolve missing and cross-population heritability in GWAS. N. Zaitlen1, A. Gusev1, B. Pasaniuc1, G. Bhatia2, S. Pollack1, A. Tandon3, E. Stahl3, R. Do4, B. Vilhjalmsson1, E. Akylbekova5, A. Cupples6, M. Fornage7, L. Kao8, L. Lange9, S. Musani5, G. Papanicolaou10, J. Rotter11, I. Ruczinski12, D. Siscovick13, X. Zhu14, S. McCarroll3, G. Lettre15, J. Hirschhorn16, N. Patterson4, D. Reich3, J. Wilson5, S. Kathiresan4, A. Price1, CAC. CARe Analysis Core5 1) Genetic Epidemiology, Harvard School of Public Health, Boston, MA; 2) Harvard-MIT Division of Health, Science and Technology; 3) Department of Genetics, Harvard Medical School, Boston, MA, USA; 4) Broad Institute of Harvard and Massachusetts Institute of Technology (MIT), Cambridge, Massachusetts, USA; 5) Jackson Heart Study, Jackson State University, Jackson, MS, USA; 6) Boston University, Boston, MA, USA; 7) Institute of Molecular Medicine and Division of Epidemiology School of Public Health, University of Texas Health Sciences Center at Houston, Houston, TX, 77030, USA; 8) Department of Epidemiology, Johns Hopkins University, Baltimore, Maryland, United States of America; 9) University of North Carolina, Chapel Hill, NC, USA; 10) National Heart, Lung, and Blood Institute (NHLBI), Division of Cardiovascular Sciences, NIH, Bethesda, MD 20892, USA; 11) Cedars-Sinai Medical Center, Medical Genetics Institute, Los Angeles, CA, USA; 12) Johns Hopkins University, Baltimore, Maryland, United States of America; 13) University of Washington, Seattle, WA, USA; 14) Department of Epidemiology and Biostatistics, School of Medicine, Case Western Reserve University, Cleveland, USA; 15) Département de Médecine, Université de Montréal, C.P. 6128, succursale Centre-ville, Montréal, Québec, Canada; 16) Divisions of Genetics and Endocrinology and Program in Genomics, Children’s Hospital Boston, Boston, MA, USA.

   Resolving missing heritability, the difference between the phenotypic variance explained by associated SNPs and estimates of narrow-sense heritability (h2), will inform strategies for disease mapping and prediction of complex traits. Possible explanations for missing heritability include rare variants not captured by genotyping arrays, or biased estimates of h2 due to epistatic interactions [Zuk et al. 2012]. Here, we develop a novel approach to estimating h2 based on the sharing of local ancestry segments between pairs of unrelated individuals in an admixed population. Unlike recent approaches for estimating the heritability explained by genotyped markers (h2g) [Yang et al. 2010], our approach captures the total h2, because local ancestry estimated from genotyping array data captures the effects of all variants, not just those on the array. Our approach uses only unrelated individuals and is thus not susceptible to biases caused by epistatic interactions or shared environment that can confound genealogy-based estimates of h2. Theory and simulations show that the variance explained by local ancestry (h2γ) is related to h2, Fst, and the genome-wide ancestry proportion (θ) by h2γ = 2 × Fst × θ × (1 − θ) × h2. Thus, we can estimate h2γ and then infer h2 from it. We apply our method to 5,040 African Americans from the CARe cohort and estimate the autosomal h2 for HDL cholesterol (0.39±0.11), LDL cholesterol (0.18±0.09), and height (0.55±0.13). As expected, these h2 estimates were higher than the estimates of h2g obtained from the same data using standard approaches (0.22±0.07, 0.16±0.07 and 0.31±0.07, respectively, consistent with previous estimates). The difference between h2 and h2g suggests that rare variants contribute substantial missing heritability that can be quantified using local ancestry information. Larger sample sizes will enable h2 estimates with even lower standard errors, so that the possible contribution of epistasis to previous estimates of h2 can be precisely quantified.
We additionally use local ancestry to estimate the fraction of phenotypic variance shared between European and African genomes that is explained by genotyped markers, by estimating h2g in European segments, h2g in African segments, and h2g shared between European and African segments. Given that most GWAS to date have been carried out in individuals of European descent, these estimates shed light on the importance of collecting data from non-European populations for mapping disease in those populations.
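The stated relation between h2γ and h2 can be inverted to recover h2 from the variance explained by local ancestry. A minimal sketch, assuming the formula exactly as given in the abstract; the values of Fst and θ in the example are illustrative placeholders, not estimates from CARe, and the function names are hypothetical.

```python
def local_ancestry_variance(h2, fst, theta):
    """Variance explained by local ancestry: h2_gamma = 2 * Fst * theta * (1 - theta) * h2."""
    return 2.0 * fst * theta * (1.0 - theta) * h2


def h2_from_local_ancestry(h2_gamma, fst, theta):
    """Invert the relation above to infer total narrow-sense heritability h2."""
    scale = 2.0 * fst * theta * (1.0 - theta)
    if scale <= 0:
        raise ValueError("requires Fst > 0 and 0 < theta < 1")
    return h2_gamma / scale


# Illustrative values only: h2 = 0.5, Fst = 0.15, theta = 0.2.
h2_gamma = local_ancestry_variance(0.5, 0.15, 0.2)
print(h2_gamma, h2_from_local_ancestry(h2_gamma, 0.15, 0.2))
```

Because the scale factor 2·Fst·θ(1−θ) is small (0.048 for these values), h2γ is much smaller than h2, and any sampling error in h2γ is magnified by the same factor when h2 is inferred; this is presumably part of why larger samples are needed for precise estimates.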


Genome-wide association meta-analyses in over 210,000 individuals identify 20 sexually dimorphic genetic variants for body fat distribution. T. W. Winkler1, D. C. Croteau-Chonka2, T. Ferreira3, K. Fischer4, A. E. Locke5, R. Mägi3,4, D. Shungin6,7,8, T. Workalemahu9, J. Wu10, F. Day11, A. U. Jackson5, A. Justice12, R. Strawbridge13, H. Völzke14, L. Qi9, M. C. Zillikens15, C. S. Fox16, E. K. Speliotes17,18, I. Barroso19,20, E. Ingelsson21, J. N. Hirschhorn22, M. I. McCarthy23, P. W. Franks6,8,9, A. P. Morris3, L. A. Cupples10,24, K. E. North12, K. L. Mohlke2, R. J. F. Loos11,25, I. M. Heid1, C. M. Lindgren3, GIANT Consortium 1) Public Health and Gender Studies, Institute of Epidemiology and Preventive Medicine, Regensburg University Medical Center, Regensburg, Germany; 2) Department of Genetics, University of North Carolina, Chapel Hill, NC; 3) Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK; 4) Estonian Genome Center, University of Tartu, Tartu, Estonia; 5) Department of Biostatistics, University of Michigan, Ann Arbor, MI; 6) Department of Clinical Sciences, Skåne University Hospital, Lund University, Malmö, Sweden; 7) Department of Odontology, Umeå University, Umeå, Sweden; 8) Genetic and Molecular Epidemiology Group, Department of Public Health and Clinical Medicine, Section for Medicine, Umeå University, Umeå, Sweden; 9) Department of Nutrition, Harvard School of Public Health, Boston, MA; 10) Department of Biostatistics, School of Public Health, Boston University, Boston, MA; 11) MRC Epidemiology Unit, Institute of Metabolic Science, Addenbrooke's Hospital, Cambridge, UK; 12) Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC; 13) Cardiovascular Genetics and Genomics Group, Karolinska Institute, Stockholm, Sweden; 14) Institute for Community Medicine, Ernst-Moritz-Arndt-University Greifswald, Greifswald, Germany; 15) Department of Internal Medicine, Erasmus MC Rotterdam, the Netherlands; 16) National Heart, Lung, and Blood Institute, Framingham, MA; 17) Broad Institute, Cambridge, MA; 18) Department of Internal Medicine, Division of Gastroenterology, and Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI; 19) University of Cambridge Metabolic Research Labs, Institute of Metabolic Science Addenbrooke's Hospital, Cambridge, UK; 20) Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK; 21) Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden; 22) Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA; 23) University of Oxford, Oxford, UK; 24) Framingham Heart Study, Framingham, MA; 25) Charles R. Bronfman Institute of Personalized Medicine, Child Health and Development Institute, Department of Preventive medicine, Mount Sinai School of Medicine, New York, NY 10029, USA.

   It is well known that body fat distribution differs between men and women, possibly because of innate genetic differences between the sexes. Previously, we performed a large-scale meta-analysis of GWAS of waist-to-hip ratio adjusted for BMI (WHR), a measure of body fat distribution independent of overall adiposity, and found that of the 14 loci established in men and women combined, seven showed a significant sex difference. In a subsequent genome-wide analysis specifically tailored to detect sex-differential genetic effects for WHR, we identified two additional loci with a significant sex difference. Despite these findings, the genetic basis of the sexual dimorphism of WHR, as well as the genetic architecture of WHR in general, is still poorly understood. We therefore conducted sex-combined and sex-stratified meta-analyses comprising >210,000 individuals (>116,000 women; >94,000 men) of European ancestry from 57 GWAS and 28 studies genotyped on the MetaboChip within the GIANT consortium. The sex-combined analysis yielded 39 loci with genome-wide significant association (P<5×10⁻⁸), of which 11 showed a significant sex difference (Bonferroni-corrected P<0.05/39). Six of these loci influence WHR in women only, without any effect in men (near COBLL1, LYPLAL1, PPARG, PLXND1, MACROD1, FAM13A); four loci have an effect in women and a less pronounced effect in men (near VEGFA, ADAMTS9, HOXC13, RSPO3); and one locus has an effect only in men (near GDF5). The sex-stratified analyses identified nine additional female-specific loci that had been missed in the sex-combined analysis owing to their lack of effect in men (near MAP3K1, BCL2, TNFAIP8, CMIP, NKX3-1, NMU, SFXN2, HMGA1, KCNJ2). No additional loci were identified in the male-specific analysis. We confirmed all previously established sexually dimorphic variants for WHR.
Of particular interest is the PPARG region, which harbors a well-known target of type 2 diabetes treatments and shows a female-specific association with WHR. The enrichment of female-specific associations (19 of the 20 sexually dimorphic loci) is consistent with the heritability of WHR as estimated in the Framingham Heart Study: WHR is more heritable in women (h2~46%) than in men (h2~19%). Our results highlight the importance of sex-stratified analyses and can help clarify the genetics underlying sex differences in body fat distribution.
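A common way to test whether a SNP's effect differs between the sexes is to compare the sex-specific estimates with a z statistic; because men and women are independent samples, the variances of the two estimates add. The abstract does not state which test was used, so treat this as an assumed, illustrative version; the Bonferroni threshold of 0.05/39 is the one quoted above, while the effect sizes and the function name are hypothetical.

```python
import math


def sex_difference_test(beta_women, se_women, beta_men, se_men):
    """z test for a difference between independent sex-specific effect estimates."""
    z = (beta_women - beta_men) / math.sqrt(se_women ** 2 + se_men ** 2)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return z, p


BONFERRONI_ALPHA = 0.05 / 39  # 39 loci tested in the sex-combined analysis

# Hypothetical locus with an effect in women only.
z, p = sex_difference_test(0.05, 0.01, 0.0, 0.01)
print(f"z = {z:.2f}, p = {p:.2e}, sex-differential: {p < BONFERRONI_ALPHA}")
```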

ASHG 2012 abstracts (1)

People of the British Isles: An analysis of the genetic contributions of European populations to a UK control population. S. Leslie1, B. Winney2, G. Hellenthal3, S. Myers4, P. Donnelly3, W. Bodmer2 1) Statistical Genetics, Murdoch Childrens Research Institute, Melbourne, Australia; 2) Department of Oncology, University of Oxford, UK; 3) The Wellcome Trust Centre for Human Genetics, University of Oxford, UK; 4) Department of Statistics, University of Oxford, UK.

   There is much interest in fine-scale population structure in the UK, both as a signature of historical migration events and because of the effect population structure may have on disease association studies. Population structure appears to have a minor impact on the current generation of genome-wide association studies, but will probably be important for the next generation of studies seeking associations with rare variants. Furthermore, there is great interest in understanding where the British people came from. Thus far, genetic studies have been limited to a small number of markers or to samples not collected specifically to address these questions. A natural way to study population structure is to carefully control and document the provenance of samples. We describe the collection of a cohort of rural UK samples (The People of the British Isles), aimed at providing a well-characterised UK control population. This will be a resource for the research community as well as providing fine-scale genetic information on the history of the British. Using a novel clustering algorithm, approximately 2,000 samples were clustered purely as a function of genetic similarity, without reference to their known sampling locations. When each individual is plotted on a UK map, there is a striking association between inferred clusters and geography, reflecting to a major extent the known history of the British peoples. A similar analysis is performed on samples from different parts of Europe. Using the European samples as ‘source populations’, we apply a novel algorithm to determine the proportion of the genomes within each of the derived British clusters that is most closely related to each of the source populations. Thus we can observe the relative contribution (under our model) of each of these European populations to the genomes of samples in different regions of Britain.
Our results strikingly reflect much of the known historical and archaeological record, while raising some important questions and perhaps answering others. We believe this is the first detailed analysis of very fine-scale genetic structure, and its origins, in a genetically very homogeneous population. This has been achieved through both a careful sampling strategy and an approach to analysis that accounts for linkage disequilibrium.
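The source-population step can be illustrated with a simple mixture fit: a British cluster's genetic profile is modelled as a non-negative combination of source-population profiles, and the fitted weights are rescaled to sum to one. This is a sketch under stated assumptions, not the authors' novel algorithm: the profiles are short hypothetical vectors (for example, how strongly a group's genomes resemble each of several reference groups), non-negative least squares via `scipy.optimize.nnls` is a swapped-in fitting method, and `mixture_proportions` is a hypothetical name.

```python
import numpy as np
from scipy.optimize import nnls


def mixture_proportions(target, sources):
    """Fit target ~ sum_j w_j * sources[:, j] with w_j >= 0, then rescale w to sum to 1.

    target:  1-D profile of the cluster being explained.
    sources: 2-D array with one column per candidate source population.
    """
    weights, _residual = nnls(np.asarray(sources, dtype=float),
                              np.asarray(target, dtype=float))
    total = weights.sum()
    return weights / total if total > 0 else weights


# Hypothetical profiles over three reference groups (rows) for three source populations (columns).
sources = np.array([[0.6, 0.1, 0.3],
                    [0.3, 0.2, 0.5],
                    [0.1, 0.7, 0.2]])
target = 0.7 * sources[:, 0] + 0.3 * sources[:, 1]  # built as a 70/30 mix of sources 1 and 2
print(mixture_proportions(target, sources))
```

The non-negativity constraint is what keeps the output interpretable as proportions; an unconstrained least-squares fit could assign negative contributions to some sources.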



Estimating and Interpreting FST: the Impact of Rare Variants. G. Bhatia1,2, N. Patterson2, S. Sankararaman2,5, A. L. Price2,3,4 1) Harvard-Massachusetts Institute of Technology (MIT) Division of Health, Science and Technology, Cambridge, MA; 2) Broad Institute of Harvard and MIT, Cambridge, MA; 3) Department of Epidemiology, Harvard School of Public Health, Boston, MA; 4) Department of Biostatistics, Harvard School of Public Health, Boston, MA; 5) Department of Genetics, Harvard Medical School, Boston, MA.

   FST is a widely used tool for studying population structure, but many different definitions, estimation methods and interpretations exist in the literature, and the resulting wide variation in published estimates of FST needs to be understood. For example, the FST between European (CEU) and East Asian (CHB) populations is 0.111 when estimated from HapMap3 data, but only 0.052 when estimated from 1000 Genomes (1kG) data. While changes in FST estimated from sequencing data might be expected from the inclusion of rare variants, we show that this difference arises largely from bias introduced by the estimation method and not from population genetic factors. We describe a method that avoids these biases. We consider three specific aspects of estimation: (1) defining FST for a single SNP, (2) combining estimates of FST across multiple SNPs, and (3) selecting the set of SNPs used in the computation. Correcting for differences in each of these aspects yields estimates of FST that are much more concordant between genotype and sequence data. For example, our estimate of FST between CEU and CHB from 1kG is 0.106, only slightly lower than the HapMap3 estimate. This small decrease is due to ascertainment bias of the SNPs included in the HapMap3 project, not to properties of rare variants. In general, FST at rare variants in a population will be sensitive to demographic events affecting that population. When comparing CEU to CHB, for example, we show that rare variants in CEU and CHB have higher FST than common variants, consistent with the influence of strong bottlenecks on FST at rare variants. We note that ascertainment in an outgroup (for example, Yoruba, YRI) will remove this frequency dependence of FST. Finally, we show that single-SNP estimates of FST based on a common definition (Weir and Cockerham 1984) can become inflated when sample sizes are very different. This inflation can result in false-positive signals of natural selection.
Indeed, we show that in a recent study of selection that compared 1,890 African-American and 113 YRI samples (Jin et al. 2011), FST estimates at 9 of the 10 reported novel loci are inflated by the disparity in sample size and, after correction, only 7 of these 10 loci remain nominally significant. This suggests that caution is warranted when using this definition to rank single-SNP estimates of FST. Our results indicate that a careful protocol is needed for producing FST estimates, and we provide such a protocol.
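Aspects (1) and (2) can be made concrete with a short sketch. It assumes Hudson's per-SNP estimator (one of the definitions discussed in this literature; the abstract does not name it), with sample sizes n1 and n2 counted in chromosomes, and contrasts combining SNPs as a ratio of averages with averaging per-SNP ratios. The function names and allele frequencies are hypothetical. Rare SNPs have small denominators, so per-SNP ratios at rare variants can dominate an average of ratios, and the two combination rules can disagree substantially.

```python
import numpy as np


def hudson_terms(p1, p2, n1, n2):
    """Per-SNP numerator and denominator of Hudson's F_ST estimator.

    p1, p2: sample allele frequencies; n1, n2: numbers of sampled chromosomes.
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num, den


def fst_ratio_of_averages(p1, p2, n1, n2):
    """Combine SNPs by summing numerators and denominators before dividing."""
    num, den = hudson_terms(p1, p2, n1, n2)
    return float(num.sum() / den.sum())


def fst_average_of_ratios(p1, p2, n1, n2):
    """Combine SNPs by averaging per-SNP F_ST values."""
    num, den = hudson_terms(p1, p2, n1, n2)
    keep = den > 0  # skip SNPs monomorphic for the same allele in both samples
    return float(np.mean(num[keep] / den[keep]))


# Hypothetical SNPs: one common and undifferentiated, one rare and private to population 1.
p1, p2 = [0.5, 0.05], [0.5, 0.0]
print(fst_ratio_of_averages(p1, p2, 200, 200), fst_average_of_ratios(p1, p2, 200, 200))
```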



High Exome Mutational Burden in 58 African Americans with Persistent Extreme Blood Pressure. KD. H. Nguyen1, A. C. Morrison2, A. Li2, R. Gibbs3, E. Boerwinkle2, A. Chakravarti1 1) Center for Complex Disease Genomics, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA; 2) Human Genetics Center, School of Public Health, University of Texas Health Science Center at Houston, Houston, TX, USA; 3) Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.

   High blood pressure (BP) is a major cardiovascular risk factor in African Americans (AA). Despite its modest heritability (35%), ~63 BP loci have been implicated by genome-wide association studies in European and African ancestry samples. We explored exome sequencing in 58 AA at the extremes of the BP distribution across multiple visits in the Atherosclerosis Risk in Communities study (~1st and 99th percentiles of residuals of baseline age- and sex-corrected systolic BP) to demonstrate the enrichment of deleterious mutations genome-wide and to identify novel genes. We identified 67,298 high-quality coding/splicing variants (≥10X coverage, ≥2 copies of the variant allele, PHRED-like score ≥30, call rate ≥90%); each variant had a phyloP conservation score (S) and was classified as synonymous, mild missense (exon splice junction, non-NMD nonsense, nonsynonymous) or severe missense (intron splice junction, NMD nonsense). We assumed that the observed exomic mutation profile (kernel density of variants for each S value) of the 58 individuals was a mixture of two profiles: a fraction (1−β) from random subjects (107,727 variants in 61 AA individuals from the 1000 Genomes Project) and a fraction β of ‘true’ mutations (70,393 Mendelian/disease-causing mutations from the Human Gene Mutation Database), and estimated the mutational burden (β̂) by least squares. This analysis estimated an overall β̂ = 6%, with values of 2%, 12% and 38% for synonymous, mild missense and severe missense variants, respectively. Importantly, β̂ increased to ~100% at higher conservation scores. Across each of the 3 mutation classes, β̂ was slightly higher for variants observed exclusively in the top BP group than for those observed exclusively in the bottom BP group (14%/12%, 27%/25% and 60%/41% for synonymous, mild and severe missense variants, respectively). Conversely, we observed β̂ = 0 for variants present in both the top and bottom BP groups, irrespective of mutation class.
By considering only variants above class-specific phyloP thresholds (S≥5 and S≥4.5 for the mild and severe missense variants, respectively, where β̂ = 100%), we estimate that a minimum of 2,412 variants in 1,881 genes, or an average burden of ~42 mutations at ~32 genes per subject, are involved in BP. Our results thus show that subjects at the BP extremes have a distinct global mutational burden, that there is a significant enrichment of deleterious coding mutations at highly conserved sites in these individuals, and that the identified genes include new BP candidate genes.
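The mixture step can be sketched as a one-parameter least-squares problem: with the three profiles evaluated on a common grid of phyloP scores, f_obs ≈ (1−β)·f_random + β·f_true is linear in β, so β̂ has a closed form. This is an illustrative sketch, not the authors' code: the Gaussian kernel, bandwidth, grid, simulated scores, and clipping of β̂ to [0, 1] are all assumptions, and the function names are hypothetical.

```python
import numpy as np


def kde(samples, grid, bandwidth):
    """Gaussian kernel density estimate of `samples` evaluated at `grid`."""
    samples = np.asarray(samples, dtype=float)[:, None]
    z = (np.asarray(grid, dtype=float)[None, :] - samples) / bandwidth
    return np.exp(-0.5 * z ** 2).sum(axis=0) / (len(samples) * bandwidth * np.sqrt(2 * np.pi))


def estimate_burden(f_obs, f_random, f_true):
    """Least-squares beta for f_obs ~ (1 - beta) * f_random + beta * f_true, clipped to [0, 1]."""
    f_obs, f_random, f_true = (np.asarray(f, dtype=float) for f in (f_obs, f_random, f_true))
    d = f_true - f_random
    beta = np.dot(f_obs - f_random, d) / np.dot(d, d)
    # The objective is quadratic in beta, so clipping gives the constrained optimum.
    return float(min(max(beta, 0.0), 1.0))


# Hypothetical scores: background variants centred near 0, disease mutations near 5.
rng = np.random.default_rng(0)
grid = np.linspace(-10, 10, 401)
random_scores = rng.normal(0.0, 2.0, 5000)
true_scores = rng.normal(5.0, 1.0, 5000)
case_scores = np.concatenate([rng.normal(0.0, 2.0, 940), rng.normal(5.0, 1.0, 60)])
beta_hat = estimate_burden(kde(case_scores, grid, 0.5),
                           kde(random_scores, grid, 0.5),
                           kde(true_scores, grid, 0.5))
print(f"estimated burden: {beta_hat:.2f}")  # simulated with a 6% true-mutation share
```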




PRDM9 directs genetic recombination away from functional genomic elements. K. Brick1, F. Smagulova2, P. Khil1, RD. Camerini-Otero1, G. Petukhova2 1) Genetics & Biochemistry Branch, NIDDK, National Institutes of Health, Bethesda, MD; 2) Uniformed Services University of Health Sciences, Department of Biochemistry and Molecular Biology, Bethesda, MD, USA.

   Recombination initiates with the formation of programmed DNA double-strand breaks (DSBs) at a small subset of genomic loci called hotspots. Elegant recent studies in mouse and human have determined that PRDM9, a meiosis-specific histone H3 methyltransferase, is involved in determining DSB hotspot sites (Parvanov et al., Science 2010; Baudat et al., Science 2010; Myers et al., Science 2010), likely through DNA binding by its zinc-finger domain. We have recently generated the first genome-wide DSB hotspot map in a metazoan genome and have shown that the majority of mouse DSB hotspots are associated with testis-specific H3K4me3 chromatin marks, potentially formed by PRDM9 (Smagulova et al., Nature 2011). Curiously, however, Prdm9 knockout mice remain proficient at initiating recombination. In this work, we describe several straightforward experiments that elucidate the nature and extent of the role of PRDM9 in determining DSB hotspot locations.
    We used a novel ChIP-Seq variant developed by our group to detect ssDNA bound by the meiotic recombinase DMC1 (Khil et al., Genome Res., 2012). Using this method, we precisely mapped the genome-wide distribution of DSB hotspots in seven mouse strains and in their F1 progeny. While hotspots in mice sharing a Prdm9 allele mapped to almost identical loci, hotspots in other mice depended on the DNA-binding specificity of their Prdm9 allele. Importantly, in Prdm9 knockout mice, hotspots were at completely different locations than in wild-type mice, definitively showing that PRDM9 determines practically all DSB hotspot locations. Intriguingly, DSBs in the pseudoautosomal region, the site of an obligate recombination event in every meiosis, were found to be Prdm9-independent and present in all strains. In Prdm9 knockout mice, DSBs still accumulated in hotspots; however, in the absence of PRDM9, most recombination initiated at H3K4me3 marks at promoters or enhancers. These sites are rarely targeted in wild-type mice, illustrating an important, unexpected role for PRDM9 in sequestering the recombination machinery away from functional genomic elements, where the efficient repair of DSBs may be problematic.


A Unified Model of Meiosis Combining Recombination, Non-Disjunction, Interference and Infertility. H. R. Johnston IV, D. J. Cutler Department of Human Genetics, Emory University School of Medicine, Atlanta, GA.

   Human male and female recombination rates and patterns differ greatly across the broad scale of human chromosomes, and rates of infertility and non-disjunction differ widely between males and females. No simple cause is known for these observations. To address this, we have created a unified model of meiosis that combines recombination, non-disjunction, interference and fertility. The model correctly predicts the rate of fertility, the occurrence of trisomy 21, and the number and, most interestingly, the sex-specific patterns of recombination events. The model is based on the observation that chiasmata are the mechanism that enables normal segregation of chromosomes during meiosis; non-disjunction is the result of a failed segregation event. In our model, non-disjunction occurs both when no chiasmata are present between pairs of non-sister chromatids and when multiple chiasmata occur close together between pairs of non-sister chromatids. Other elements of our model include allowing no chiasmata between sister chromatids, and completing male meiosis immediately while arresting female meiosis between birth and the mother’s age at conception. This period of arrest requires that females begin with far more chiasmata than males. It also allows physical interference to initiate from anywhere on a chromosome arm, whereas in males this initiation event is always telomeric. These elements combine to generate the sex-specific patterns of recombination that have, heretofore, not been explained. They also generate the distinct patterns of non-disjunction and infertility in each sex, helping to explain why these phenomena are seen far more often in eggs than in sperm. Overall, this model argues that gross differences between male and female patterns of non-disjunction, infertility, and recombination are substantially the result of the period of meiotic arrest during oogenesis.
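The model's segregation rule (failure with no chiasmata, or with chiasmata too close together) can be illustrated with a toy simulation. Everything beyond that rule is an assumption for illustration: chiasmata are placed independently (Poisson count, uniform positions on an arm of unit length), so the sketch ignores interference and the female arrest period, and the parameter values and function names are hypothetical.

```python
import numpy as np


def segregates_normally(positions, min_gap):
    """Toy rule: segregation succeeds only with >= 1 chiasma and no two closer than min_gap."""
    if len(positions) == 0:
        return False  # no chiasma: non-disjunction
    gaps = np.diff(np.sort(np.asarray(positions, dtype=float)))
    return bool(np.all(gaps >= min_gap))  # clustered chiasmata: non-disjunction


def nondisjunction_rate(mean_chiasmata, min_gap, trials=20_000, seed=0):
    """Monte Carlo non-disjunction rate for one chromosome arm of unit length."""
    rng = np.random.default_rng(seed)
    failures = 0
    for _ in range(trials):
        k = rng.poisson(mean_chiasmata)
        if not segregates_normally(rng.uniform(0.0, 1.0, size=k), min_gap):
            failures += 1
    return failures / trials


# Hypothetical parameters: few chiasmata risk zero-chiasma failures; many risk clustering.
for lam in (0.5, 1.5, 4.0):
    print(lam, nondisjunction_rate(lam, min_gap=0.2))
```

With `min_gap=0` only zero-chiasma failures remain, so the rate approaches the Poisson probability of no chiasmata, exp(−mean_chiasmata); raising the mean lowers that term but increases the chance of closely spaced chiasmata, which is the trade-off the model exploits.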



Human spermatogenic failure purges deleterious mutation load from the autosomes and both sex chromosomes, including the gene DMRT1. D. F. Conrad1, A. Lopes2, K. I. Aston3, F. Carvalho4, J. Goncalves5, R. Mathiesen2, N. Huang6, A. Ramu1, J. Downie7, S. Fernandes8, A. Amorim2,8, A. Barros9, M. Hurles6, S. Moskovtsev10, C. Ober11, J. Schiffman7, P. N. Schlegel12, M. De Sousa13, D. T. Carrell3,14 1) Dept Genetics, Washington Univ School Med, St Louis, MO; 2) IPATIMUP, Institute of Molecular Pathology and Immunology of the University of Porto, R. Dr. Roberto Frias S/N, 4200-465 Porto, Portugal; 3) Andrology and IVF Laboratories, Department of Surgery; 4) Department of Genetics, Faculty of Medicine, University of Porto, Porto, Portugal; 5) Centre for Human Genetics, National Institute of Health Dr. Ricardo Jorge, Lisbon, Portugal; 6) Genome Mutation and Genetic Disease Group, Wellcome Trust Sanger Institute, Cambridge, UK; 7) Department of Oncological Sciences; 8) Faculty of Science, University of Porto, 4099-002 Porto, Portugal; 9) Centre for Reproductive Genetics Alberto Barros, Porto, Portugal; 10) Department of Obstetrics & Gynaecology, University of Toronto; 11) Department of Human Genetics, Department of Obstetrics & Gynecology, The University of Chicago, Chicago, IL 60637, USA; 12) Department of Urology, Weill Cornell Medical College, New York-Presbyterian Hospital, New York, USA; 13) Laboratory of Cell Biology, UMIB, ICBAS, University of Porto, Porto, Portugal; 14) Department of Physiology, Department of Obstetrics and Gynecology University of Utah School of Medicine, Salt Lake City, Utah, 84108, USA.

   Gonadal failure, along with early pregnancy loss and perinatal death, may be an important filter that limits the propagation of harmful mutations in the human population. We hypothesized that men with spermatogenic impairment, a condition with unknown genetic architecture and a common cause of male infertility, are enriched for rare deleterious mutations compared to men with normal spermatogenesis. We assayed genome-wide SNPs and CNVs in 327 men with spermatogenic impairment and >1,100 controls, and estimated that a rare autosomal deletion multiplicatively changes a man’s risk for this condition by 10% (OR 1.10 [1.05-1.15], p < 4×10⁻⁵), a rare X-linked CNV by 29% (OR 1.29 [1.16-1.43], p < 3×10⁻⁶), and a rare Y-linked duplication by 64% (OR 1.64 [1.28-2.10], p < 9×10⁻⁵). Based on the population frequency of potential risk alleles, the extent of homozygosity, and evidence for dosage sensitivity of genes disrupted in men with spermatogenic impairment, we propose that the CNV burden is polygenic and distinct from the burden of large, dominant mutations described for developmental disorders. Our study also identifies focal deletions of the sex-differentiation gene DMRT1 as likely recurrent causes of idiopathic azoospermia, and generates hypotheses for directing future studies on the genetic basis of male infertility and IVF outcomes.
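Under the multiplicative (log-additive) model described above, odds ratios for separate rare CNVs multiply. A small illustration using the point estimates quoted in the abstract and a hypothetical carrier profile; it ignores the confidence intervals, assumes the CNV effects act independently, and the names are hypothetical.

```python
# Point-estimate odds ratios per rare CNV, as quoted in the abstract.
ODDS_RATIOS = {
    "autosomal_deletion": 1.10,
    "x_linked_cnv": 1.29,
    "y_linked_duplication": 1.64,
}


def combined_odds_ratio(counts):
    """Odds ratio relative to a non-carrier, assuming each rare CNV multiplies the odds."""
    result = 1.0
    for kind, n in counts.items():
        result *= ODDS_RATIOS[kind] ** n
    return result


# Hypothetical man carrying two rare autosomal deletions and one rare X-linked CNV.
print(round(combined_odds_ratio({"autosomal_deletion": 2, "x_linked_cnv": 1}), 4))  # 1.5609
```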




Genome-Wide Association Study of Sexual Orientation in a Large, Web-based Cohort. E. M. Drabant, A. K. Kiefer, N. Eriksson, J. L. Mountain, U. Francke, J. Y. Tung, D. A. Hinds, C. B. Do 23andMe, Mountain View, CA.

   There is considerable variation in human sexual orientation. Heritability studies have differed on the exact size of the genetic contribution to sexual orientation, but it appears that both genetics and environment play a role. Although a few linkage studies have pointed to a possible role for certain genes on the X chromosome, the strength of that evidence is limited by conflicting reports and small sample sizes. We sought to clarify some of the questions surrounding the possible genetic underpinnings of sexual orientation by deploying a web-based survey to the large 23andMe database and conducting the first-ever genome-wide association study (GWAS) of sexual orientation.
   We adapted the Klein Sexual Orientation Grid to examine seven elements of sexual orientation. Participants rated all items on a seven-point scale. Initial analyses treated the "self identification" item, the response to the question "How do you label, identify or think of yourself?", as a continuous variable. In a sample of 7,887 men and 5,570 women, 77.2% of men and 74.6% of women identified as heterosexual only, 7.3% of men and 15.3% of women as heterosexual mostly, 1.1% of men and 2.7% of women as heterosexual somewhat more, 1.3% of men and 3.5% of women as bisexual, 0.7% of men and 0.5% of women as homosexual somewhat more, 2.9% of men and 1.6% of women as homosexual mostly, and 9.5% of men and 1.8% of women as homosexual only. In both men and women, sexual identity was most strongly correlated with sexual attraction (men r=0.97, women r=0.90), sexual behavior (men r=0.95, women r=0.83), sexual fantasies (men r=0.96, women r=0.75), and emotional attraction (men r=0.79, women r=0.45), and least strongly correlated with heterosexual/homosexual lifestyle (men r=0.54, women r=0.37) and social preference (men r=0.15, women r=0.08).
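The correlations above treat each grid item as a numeric score. A minimal sketch follows, assuming the seven self-identification categories are coded 1 (heterosexual only) through 7 (homosexual only) in the order listed; the abstract does not state its coding, and `SELF_ID_SCORE` and `pearson_r` are hypothetical names.

```python
import math

# Assumed coding: the seven categories in the order the abstract lists them.
SELF_ID_SCORE = {
    "heterosexual only": 1,
    "heterosexual mostly": 2,
    "heterosexual somewhat more": 3,
    "bisexual": 4,
    "homosexual somewhat more": 5,
    "homosexual mostly": 6,
    "homosexual only": 7,
}


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)
```

Given a list of self-identification answers and the matching 1 to 7 attraction ratings, `pearson_r([SELF_ID_SCORE[a] for a in answers], attraction)` yields the kind of r reported above, computed separately for men and for women.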
   We carried out sex-stratified GWAS in 7,887 unrelated men and 5,570 unrelated women of European ancestry, whose data were collected in the two months since the initial survey release. No clear genome-wide significant associations have been found thus far, and the current data show no direct association for markers within chromosome band Xq28. However, data collection is ongoing, and a larger sample may help clarify the roles of currently suggestive associations.
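A sex-stratified GWAS of a continuous trait amounts to running a per-SNP association test separately in men and in women. The sketch below assumes the simplest such test, an ordinary least-squares regression of the trait score on allele dosage with no covariates (real analyses typically also adjust for ancestry principal components and other covariates). The record layout, the function names, and the use of the conventional 5 × 10^-8 genome-wide threshold are assumptions, not details from the abstract.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # conventional GWAS threshold; assumed, not from the abstract


def snp_association(dosages, trait):
    """OLS regression of a continuous trait on allele dosage (0, 1, 2) at one SNP.

    Returns (beta, p_value). The two-sided p-value uses a normal approximation
    to the t distribution, adequate for cohorts of thousands.
    """
    n = len(dosages)
    mean_g = sum(dosages) / n
    mean_y = sum(trait) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    if sxx == 0:
        return 0.0, 1.0  # monomorphic in this stratum: nothing to test
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(dosages, trait))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_g
    rss = sum((y - intercept - beta * g) ** 2 for g, y in zip(dosages, trait))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        return beta, 0.0  # perfect fit: degenerate, effectively infinite z
    z = beta / se
    return beta, math.erfc(abs(z) / math.sqrt(2))


def stratified_gwas(samples, snp_ids):
    """Test every SNP separately in men and in women.

    `samples` is a list of dicts with hypothetical keys "sex" ("male"/"female"),
    "trait" (e.g. the 1-7 self-identification score) and "dosage" (SNP id -> 0/1/2).
    """
    results = {}
    for sex in ("male", "female"):
        group = [s for s in samples if s["sex"] == sex]
        trait = [s["trait"] for s in group]
        for snp in snp_ids:
            dosages = [s["dosage"][snp] for s in group]
            results[(sex, snp)] = snp_association(dosages, trait)
    return results


def significant_hits(results, alpha=GENOME_WIDE_ALPHA):
    """Return the (sex, SNP) keys whose p-value falls below the threshold."""
    return [key for key, (_, p) in results.items() if p < alpha]
```

Stratifying keeps the male and female effect estimates separate, which matters if genetic effects on orientation differ by sex, at the cost of smaller samples in each stratum.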


A scalable pipeline for local ancestry inference using thousands of reference individuals. C. B. Do, E. Durand, J. M. Macpherson, B. Naughton, J. L. Mountain 23andMe, Inc, Mountain View, CA.

   Ancestry deconvolution, the task of identifying the ancestral origin of chromosomal segments in admixed individuals, is straightforward when the ancestral populations considered are sufficiently distinct. To date, however, no approach has been shown to distinguish effectively between closely related populations (e.g., within Europe). Moreover, because of their computational complexity, most existing methods for ancestry deconvolution are unsuitable for large-scale settings in which the reference panels contain thousands of individuals.
   We describe Ancestry Painting 2.0, a modular three-stage pipeline for efficiently and accurately identifying the ancestral origin of chromosomal segments in admixed individuals. In the first stage, an out-of-sample extension of the BEAGLE phasing algorithm generates a preliminary phasing for an unphased, genotyped individual. In the second stage, a support vector machine (SVM) with a specialized string kernel assigns tentative ancestry labels to short, phased local genomic regions. In the third stage, an autoregressive pair hidden Markov model uses the SVM labels to simultaneously correct phasing errors and produce reconciled local ancestry estimates with confidence scores.
   We compiled a reference panel of over 7,500 individuals of homogeneous ancestry, drawn from several publicly available datasets and from over 5,000 customers of the personal genetics company 23andMe, Inc., each reporting all four grandparents with the same country of origin; outliers identified through principal components analysis (PCA) were excluded. In cross-validation experiments, Ancestry Painting 2.0 achieves high sensitivity and specificity (in most cases >90%) for labeling chromosomal segments across more than 20 populations worldwide. We also demonstrate the robustness of the algorithm through simulations of individuals with known local admixture, and compare Ancestry Painting 2.0 with existing state-of-the-art tools for multi-population local and global ancestry inference, including LAMP, ALLOY, PCA-ADMIX, and ADMIXTURE.
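The panel-construction rule can be sketched as a two-step filter: keep individuals whose four grandparents share one country of origin, then drop PCA outliers within each country. This is a minimal illustration under stated assumptions. PCA coordinates are taken as precomputed, and the outlier rule (drop anyone farther from their country's median PC position than three times the country's median distance) is a hypothetical stand-in, since the abstract does not say how outliers were defined. Record fields and function names are likewise hypothetical.

```python
import math
import statistics


def four_grandparents_same_country(record):
    """True if all four reported grandparent countries are present and identical."""
    countries = record["grandparent_countries"]
    return len(countries) == 4 and None not in countries and len(set(countries)) == 1


def build_reference_panel(records, max_distance_ratio=3.0):
    """Group eligible individuals by country, dropping PCA outliers per country.

    Each record is a dict with hypothetical keys "id", "grandparent_countries"
    (four country names, None if unknown) and "pcs" (precomputed PC coordinates).
    An individual is dropped if their distance from the country's
    coordinate-wise median exceeds `max_distance_ratio` times the median such
    distance in that country (an assumed outlier rule).
    """
    groups = {}
    for rec in records:
        if four_grandparents_same_country(rec):
            groups.setdefault(rec["grandparent_countries"][0], []).append(rec)

    panel = {}
    for country, members in groups.items():
        dims = len(members[0]["pcs"])
        center = [statistics.median(m["pcs"][d] for m in members) for d in range(dims)]
        distances = [math.dist(m["pcs"], center) for m in members]
        typical = statistics.median(distances)
        # If most members sit exactly at the median, keep everyone.
        panel[country] = [
            m for m, dist in zip(members, distances)
            if typical == 0 or dist <= max_distance_ratio * typical
        ]
    return panel
```

Using medians rather than means keeps a single extreme individual from shifting the country's center or inflating the threshold used to judge everyone else.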