SMBE 2014: more (several dozen) abstracts touching on recent evolution in humans

Detecting patterns of global and local positive selection by examining novel variants in the exomes of 7 world-wide human populations
Laura Botigué 1, Jeff Kidd2, Brenna Henn1
1Stony Brook University, Stony Brook, New York, USA, 2University of Michigan, Ann Arbor, Michigan, USA

Recent efforts to identify adaptive loci in humans relied primarily on single nucleotide polymorphism array data. For many global populations, however, these datasets suffered from ascertainment bias and did not allow for the identification of novel, adaptive variants unique to different populations. In this study we use high coverage exomes and low coverage full genomes from over 50 individuals from 7 geographically divergent human populations from Namibia, Congo, Algeria, Pakistan, Cambodia, Siberia and Mexico to differentiate between local and global adaptation. We additionally apply the same approach to 1000 Genomes data. In order to minimize the effect of demography, we compare the site frequency spectrum of putatively functional variants with the neutral site frequency spectrum as estimated from synonymous sites or intergenic loci. We specifically hypothesize that derived variants with a large predicted functional impact found at high frequencies are not deleterious and potentially beneficial. We further hypothesize that derived variants common across populations are good candidates for adaptive traits common to the human species, whereas variants that are at high frequency but population specific are indicative of local adaptation. When we consider only variants with an extreme functional effect, as predicted by GERP scores, a total of 6% are shared across all populations, and 16% are private to a given population at frequencies higher than 10%. We obtain a subset of candidate genes under selection based on these hypotheses and assess common features among them using gene ontology. Overall, the results may shed light on human adaptation at the species level as well as the local level, and ultimately provide a better understanding of how exposure to new environmental pressures affected early human expansion across the globe.
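To make the site frequency spectrum comparison concrete, here is a minimal Python sketch. It is not the authors' pipeline: the function names, the toy allele counts and the use of the 10% cutoff as a default are all assumptions made for illustration. It tabulates an unfolded SFS for two classes of sites and reports the share of segregating sites above a frequency threshold.

```python
from collections import Counter

def site_frequency_spectrum(derived_counts, n_chromosomes):
    """Return the unfolded SFS as proportions of segregating sites.

    derived_counts: derived-allele count at each site (hypothetical input).
    n_chromosomes:  number of sampled chromosomes.
    Entry i-1 of the result is the share of sites with i derived copies.
    """
    # Keep only segregating sites (1 .. n-1 derived copies).
    seg = [c for c in derived_counts if 0 < c < n_chromosomes]
    if not seg:
        return [0.0] * (n_chromosomes - 1)
    tally = Counter(seg)
    return [tally.get(i, 0) / len(seg) for i in range(1, n_chromosomes)]

def high_frequency_share(sfs, n_chromosomes, threshold=0.10):
    """Share of segregating sites whose derived frequency exceeds threshold."""
    return sum(p for i, p in enumerate(sfs, start=1)
               if i / n_chromosomes > threshold)

# Toy counts, invented for illustration only.
n = 10
neutral = [1, 1, 1, 2, 1, 3, 1, 2, 1, 5]      # e.g. synonymous sites
functional = [1, 1, 1, 1, 1, 1, 2, 1, 1, 8]   # e.g. high-impact sites

neutral_sfs = site_frequency_spectrum(neutral, n)
functional_sfs = site_frequency_spectrum(functional, n)
neutral_high = high_frequency_share(neutral_sfs, n)
functional_high = high_frequency_share(functional_sfs, n)
```

In this toy case the functional class has fewer sites above the threshold than the neutral class, which is the pattern expected under purifying selection; the functional variants that do reach high frequency are the ones the abstract treats as candidates.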

Inference of local ancestry in archaic-modern human admixture and its impact on modern human evolution
Sriram Sankararaman 1 ,2, Swapan Mallick1 ,2, Michael Danneman3, Kay Prufer3, Janet Kelso3, Svante Paabo3, Nick Patterson1 ,2, David Reich1 ,2
1Harvard Medical School, Boston, USA, 2Broad Institute, Cambridge, USA, 3Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
Analysis of archaic genomes has documented several examples of admixture between archaic and modern human groups: for example, these analyses have revealed that Neandertals interbred with the ancestors of all non-Africans and that Denisovans interbred with the ancestors of present-day Melanesians. To understand how these admixture events shaped the evolution of modern humans, we need to build maps of archaic ancestry in modern humans.

As a first step, we have developed a statistical method for inferring segments of Neandertal local ancestry in modern humans and applied this method to construct a map of Neandertal ancestry in modern non-Africans, using data from Phase 1 of the 1000 Genomes Project combined with a high coverage (50×) Neandertal genome. This map reveals the adaptive impact of Neandertal gene flow, as we find enhanced Neandertal ancestry in genes involved in keratin filament formation as well as other biological pathways. We also observe large regions with reduced Neandertal ancestry, consistent with purifying selection against introgressing Neandertal alleles, in part due to these alleles contributing to hybrid male sterility.
To extend this approach to other archaic-modern human introgression events, we generated deep genome sequences of 21 people from populations with substantial Denisovan ancestry: 16 Papua New Guineans, 2 Bougainville Islanders, and 3 aboriginal individuals from Australia. We also extend our method to infer Neandertal and Denisovan local ancestry in these populations. We test whether the same evidence for hybrid male sterility is observed in this introgression event as is observed between Neandertals and modern humans.


Fine Atlas of Natural Selection in Human Genome
Hang Zhou 1, Sile Hu1, Rostislav Matveev2, Kun Tang1
1CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Science, Shanghai 200031, China, 2Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, D-04103 Leipzig, Germany
In the last few years a number of genome-wide scans for signals of recent natural selection have been carried out on the human genome. These studies greatly furthered our understanding of recent human evolution. Nonetheless, some key issues were left largely unsolved. In this study, several coalescent-based likelihood tests were developed to collectively assign all genome fragments to modes of neutrality, negative, balancing or positive selection, and to simultaneously estimate the selection time and coefficient. Simulations revealed that this workflow was powerful in detecting various forms of non-neutral evolution, while remaining highly robust against demographic factors. Here we report a fine atlas of natural selection in the human genome obtained by analyzing the 1000 Genomes data. Several hundred regions that have undergone positive selection, and a number of regions that have undergone negative or balancing selection, were detected. We performed functional annotation of the genes under selection in the various categories and found that they were enriched in certain functional groups. We also found high heterogeneity in the selection times of positively selected genes across functional categories. We further evaluated the selection pressures in ENCODE-predicted regulatory elements. The selection pressure in promoter regions was the highest, whereas introns and repressed or low-activity regions showed a clearly lower influence of selection. The spatial distribution revealed that the selection signals were clearly centered on TSS and CDS. Given the fine resolution of the selection signals, we are in the process of understanding the different selection pressures our ancestors have encountered during the course of recent migration, local adaptation and social transitions.

Inference of selection using extended haplotype homozygosity on polygenic traits
Angeles de Cara, Frederic Austerlitz
Muséum National d'Histoire Naturelle, Paris, France
The fast-growing amount of genome-wide polymorphism data available has led to considerable efforts for developing methods to detect the footprints of natural selection at the molecular level. Finding regions under selection is one of the first steps to understand the process of adaptation and speciation. Our ability to detect selection at the molecular level depends critically on the type of data available and on the robustness of the methods to the underlying assumptions. Several commonly used methods consist in looking for FST outlier loci, which are considered to be under selection. However, it has been shown that these methods fail to clearly identify loci under weak selection. Conversely, some neutral markers can be inferred to be under selection (false positives). We study here the efficiency of a recent method to infer selection, iHS, in simulated data where we perform artificial selection on a polygenic trait under several genetic architectures. This iHS method is based on the idea that positive selection on a given position in the genome will create a region of extended homozygosity around this position. Our results show that this method seems to only work when selection is strong and acts on a single locus, while it fails to identify loci under selection when selection acts simultaneously on many loci.
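The extended homozygosity idea behind iHS can be sketched in a few lines of Python. This is only the underlying EHH quantity (the probability that two randomly chosen carriers of a core allele are identical across a stretch of markers), not the full iHS statistic, which integrates EHH over distance for both alleles and standardizes the log ratio. The function name, the '0'/'1' string encoding and the toy haplotypes are assumptions for illustration.

```python
from collections import Counter
from math import comb

def ehh(haplotypes, core, allele, end):
    """EHH between the core site and site `end` among carriers of `allele`.

    haplotypes: equal-length strings of '0'/'1' (hypothetical phased data).
    core, end:  marker indices; `end` may lie on either side of `core`.
    """
    carriers = [h for h in haplotypes if h[core] == allele]
    n = len(carriers)
    if n < 2:
        return 0.0
    lo, hi = sorted((core, end))
    # Group carriers by their haplotype over the interval lo..hi.
    groups = Counter(h[lo:hi + 1] for h in carriers)
    # Probability that two distinct carriers fall in the same group.
    return sum(comb(k, 2) for k in groups.values()) / comb(n, 2)

# Toy haplotypes, invented for illustration; the core site is index 2.
haps = ["11111", "11111", "11110", "01001", "10010"]

ehh_at_core = ehh(haps, core=2, allele="1", end=2)
ehh_right = ehh(haps, core=2, allele="1", end=4)
ehh_left = ehh(haps, core=2, allele="1", end=0)
ehh_other_allele = ehh(haps, core=2, allele="0", end=4)
```

A single strongly selected allele keeps EHH high over long distances for its carriers. When selection is spread over many loci, each locus shifts in frequency only modestly and no single haplotype is dragged to high frequency, which is consistent with the loss of power the abstract reports.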

Soft shoulders ahead: on the problem of differentiating between hard and soft sweeps
Daniel Schrider 1, Mendes Fabio2, Matthew Hahn2, Andrew Kern1
1Department of Genetics, Rutgers University, Piscataway, NJ, USA, 2Department of Biology, Bloomington, IN, USA
Characterizing the nature of the adaptive process at the genetic level is a central goal for population genetics. In particular, we know little about the sources of adaptive substitution. Historically, population geneticists have focused attention on the hard sweep model of adaptation in which a de novo beneficial mutation arises and rapidly fixes in a population (e.g. Maynard Smith and Haigh 1974). Recently more attention has been given to soft sweep models, in which alleles that were previously neutral, or nearly so, drift until such a time as the environment shifts and their selection coefficient changes to become beneficial (e.g. Hermisson and Pennings 2005). It remains an active and difficult problem, however, to tease apart the tell-tale signatures of hard vs. soft sweeps in genomic polymorphism data. Through extensive simulations of hard and soft sweep models, we show that indeed the two might not be separable through the use of univariate summaries of the site frequency spectrum or a recently introduced class of haplotype-based statistics. In particular, it seems that recombination in regions linked to, but distant from, sites of hard sweeps can create patterns of polymorphism that closely mirror what is expected to be found near soft sweeps. We show that regions flanking hard sweeps also resemble partial sweeps, where an allele has begun sweeping to high frequency but not reached fixation. This problem of “soft shoulders” suggests that we currently have only a very limited ability to differentiate soft vs. hard vs. partial sweep scenarios in molecular population genomics data. We propose an approach that can distinguish these “shoulders” from true targets of selection.

Signatures of selection surrounding large insertions and deletions in coding regions identified in the hominid lineage genome-wide.
Wilfried Guiblet 1 ,2, Kai Zhao3, Daysha Ferrer-Torres1, Christina Ruiz-Rodriguez1 ,3, Alfred Roca3, Steven Massey4, Juan Martinez-Cruzado1, Taras Oleksyk1
1University of Puerto Rico at Mayaguez, Puerto Rico, Puerto Rico, 2IBIOS Graduate Program Option In BioInformatics and Genomics, The Huck Institute of Life Sciences, Pennsylvania State University, University Park, Pennsylvania, USA, 3Department of Animal Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA, 4Department of Biology, University of Puerto Rico at Rio Piedras, San Juan, Puerto Rico, Puerto Rico, 5Caribbean Genome Center, Biology Department, University of Puerto Rico at Mayaguez, Mayaguez, Puerto Rico, Puerto Rico
Genes have highly conserved sequences and usually show very few differences between closely related species, such as human and nonhuman primates. In this study, we focused on >10 bp insertions and deletions (indels) found when comparing modern human, chimpanzee, gorilla, orangutan, and rhesus macaque reference genome sequences, with the purpose of testing indel flanking regions for the signatures of selection. From 36,422 indels identified by comparing reference genomes pairwise, we chose 151 indels within coding regions because of the potentially high impact on protein sequence. Twenty-two of these fragments had been earlier validated in the laboratory by PCR and electrophoresis to distinguish real features from computational artifacts. Ka-Ks values for the genes containing each of these fragments were computed in pairwise comparison across the hominid lineage. We also searched for and identified indels within candidate chromosomal regions showing signals of positive selection, i.e., displaying unusually low multilocus heterozygosity and high divergence (FST) in pairwise comparisons between populations or continental groups from the Human Genome Diversity Panel (HGDP). The comparisons were performed on populations that geographically were placed along the modern human migration routes of the Out of Africa event. Our findings were evaluated against random expectations by a resampling method, where exactly the same procedures and tests were performed with a dataset of randomly positioned indels matched by size, distributed across the human reference genome. The genes examined in our study may have been shaped by selection in the human or other primate lineages, thus adding to our understanding of recent human evolution. Some of these may reflect adaptation to disease, and enable discoveries in future biomedical studies.

Genetic Origins of Lactase Persistence and the Spread of Pastoralism in Africa
Alessia Ranciaro 1, Michael C. Campbell1, Jibril B. Hirbo1, Wen-Ya Ko1, Alain Froment2, Paolo Anagnostou3, Maritha J. Kotze4, Muntaser Ibrahim5, Thomas Nyambo6, Sabah A. Omar7, Sarah A. Tishkoff1 ,8
1University of Pennsylvania, Philadelphia, PA, USA, 2UMR 208, IRD-MNHN, Musée de l’Homme, 75116 Paris, France, 3Dipartimento di Biologia Ambientale, Universita’ La Sapienza, Roma, Italy, 4Division of Anatomical Pathology, Department of Pathology, Faculty of Health Sciences, University of Stellenbosch, Tygerberg, 7505, South Africa, 5Department of Molecular Biology, Institute of Endemic Diseases, University of Khartoum, 15-13 Khartoum, Sudan, 6Department of Biochemistry, Muhimbili University of Health and Allied Sciences, Dar es Salaam, Tanzania, 7Kenya Medical Research Institute, Centre for Biotechnology Research and Development, 54840-00200 Nairobi, Kenya, 8Department of Biology, University of Pennsylvania, Philadelphia, PA, USA
In humans the ability to digest the sugar in milk, lactose, declines after weaning because of decreasing levels of the enzyme lactase-phlorizin hydrolase (LPH), coded for by the LCT gene. However, some individuals maintain the ability to digest lactose into adulthood, a trait known as lactase persistence (LP). It is thought that selection has played a major role in maintaining this genetically determined trait in different human populations who practice pastoralism. In order to identify novel variants associated with the LP trait and study its evolutionary history in Africa, we sequenced introns 9 and 13 of the MCM6 gene, and ~2 kb of the LCT promoter region in 819 individuals from 63 African populations and in 154 non-Africans from 9 populations. We also genotyped 4 microsatellites in an ~198 kb region in a subset of 252 individuals to reconstruct the origin and spread of LP-associated variants in Africa. Additionally, we performed genotype-phenotype association analyses in 513 individuals from 50 eastern African populations. We confirm the association between the LP trait and three common variants in intron 13 (C-14010, G-13907 and G-13915). Furthermore, we identified two additional SNPs in intron 13 and in the promoter region (G-12962 and T-956, respectively) associated with LP. Using a test of long range linkage disequilibrium (LD), we detected strong signatures of recent positive selection in East African populations and in the Fulani from Central Africa. In addition, haplotype analysis supports an East African origin of the C-14010 LP-associated mutation in southern Africa.


The Genetic Architecture Of Skin Pigmentation In Southern Africa
Alicia R Martin 1, Julie M Granka2, Christopher R Gignoux1, Marlo Möller3, Cedric J Werely3, Jeffrey M Kidd4, Marcus W Feldman2, Eileen G Hoal3, Carlos D Bustamante1, Brenna M Henn1 ,5
1Genetics Department, Stanford University, Stanford, CA, USA, 2Department of Biological Sciences, Stanford University, Stanford, CA, USA, 3Division of Molecular Biology and Human Genetics, Stellenbosch University, Tygerberg, South Africa, 4Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA, 5Department of Ecology and Evolution, SUNY Stony Brook, Stony Brook, NY, USA
Skin pigmentation is one of the most recognizably diverse phenotypes in humans across the globe, but its largely genetic basis has mainly been studied in northern European and Asian populations. The Eurasian pigmentation alleles are among the most differentiated variants in the genome, suggesting strong positive selection for light skin pigmentation. Light skin pigmentation is also observed in the far southern latitudes of Africa, among KhoeSan hunter-gatherers of the Kalahari Desert and other populations. The KhoeSan hunter-gatherers are among the oldest human populations, believed to have diverged from other populations 100,000 years ago, and maintain extraordinary levels of genetic diversity. It is unknown whether light skin pigmentation represents convergent evolution or the ancestral human phenotype. We have collected ethnographic information, pigmentation phenotypes, and genotype data from 136 individuals in the ≠Khomani San from the Kalahari. To understand the genetic basis for light skin pigmentation, we have also exome sequenced 84 ≠Khomani San individuals to high coverage, generating one of the largest indigenous African exome datasets sequenced outside of the 1000 Genomes Project. Because linkage disequilibrium decay is rapid in this population and ideal reference panels do not exist, we have assessed parameters influencing phasing and imputation accuracy empirically using sequencing data from two full genomes. We have also pursued multiple genotype/phenotype mapping methods, including a mixed model approach, admixture mapping, and linkage mapping. After controlling for admixture from European and Bantu-speaking populations, we find that globally common variants are not significantly associated with pigmentation. Rather, our results indicate that there is a multitude of rare variants in known pigmentation genes, and suggest that previously unidentified genes acting in canonical pigmentation pathways are involved.
Our results highlight the strength of diverse population studies to explain phenotypic variation in the context of human evolutionary history.

The distribution of effects of mutations
Thomas Lenormand
CEFE - CNRS, Montpellier, France
Mutations and their diversity of effects are the fuel of evolution. Yet, most population genetics models ignore this statement and its consequences. It is extremely frequent in these models to consider a single class of mutations (e.g. deleterious recessive mutations). Of course all types of mutations occur simultaneously, and it is difficult to ignore this reality if we want to make quantitative predictions in evolution. The difficulty is to describe in a general and realistic way how the effect of mutations varies. In addition, the effect of mutations comprises a large array of different problems, among which are some that raised the longest and fiercest controversies in evolutionary biology. The debate over dominance is perhaps emblematic in this respect. There are currently different approaches to predict the effect of mutations (physiological theory, canalization theory, extreme value theory, mutational landscape theory). In this talk, I will focus on mutation models based on a fitness landscape approach. I will present the rationale of this approach and the different predictions that can be made using this framework. I will then survey the current data against which this theory can be tested and finish by presenting how this theory may be extended.


Cultural transmission of reproductive success: a strong evolutionary force that shapes genetic diversity.
Evelyne Heyer1, Jean-Tristan Brandenburg1 ,2, Michela Leonardi1, Patricia Balaresque3, Bruno Toupance1, Tatyana Hegay4, Almaz Aldashev5, Frederic Austerlitz 1
1CNRS/MNHN/P7 UMR7206, Paris, France, 2INRA/CNRS UMR 0320/UMR 8120, Moulon, France, 3CNRS/Univ Toulouse UMR5288, Toulouse, France, 4Academy of Science, Tashkent, Uzbekistan, 5Academy of Science, Bishkek, Kyrgyzstan
One of the specificities of our species, as acknowledged for a long time by anthropologists, is to live in an extremely wide range of social organizations defined mainly by alliance rules, matrimonial systems, residence rules and descent rules. The hint that social organization should be taken into account when studying genetic diversity came mainly from comparisons between mitochondrial DNA (mtDNA) and Y-chromosome genetic diversity. Initially, it was proposed that sex-specific behaviours, and particularly differences in migration rates between men and women due to residence rules, may explain differences in Y-chromosome diversity versus mtDNA diversity. More recently it has been shown that the differences in diversity and differentiation levels between the different genetic systems (X, Y, mtDNA and autosomes) could not be explained only by differences between male and female migration rates, but also by differences between male and female effective population sizes.
We hypothesized that the mechanism by which such a reduction in effective population size can be reached is cultural transmission of reproductive success (CTRS). Building on our previous theoretical work, which showed that CTRS can profoundly reduce effective population size, and on a method that we have designed to detect such transmission from current DNA sequence polymorphism datasets, we formally tested the extent to which CTRS reduces genetic diversity in Central Asia, where we have previously demonstrated the occurrence of a sex-specific reduction in effective population size: male effective size is much smaller than its female counterpart.
We used mtDNA and Y-chromosome genetic data to infer male and female transmission of reproductive success in 19 Turkic and Indo-Iranian populations from Central Asia known for their contrasting social organisations. Both societies are patrilocal and mildly polygynous, but Turkic populations have patrilineal descent, while Indo-Iranian populations have cognatic descent.
Our results show that patrilinearity impacts genetic diversity through cultural transmission of reproductive success. This clearly demonstrates the impact of social organization on human biological evolution. Moreover, notwithstanding the fact that our genetic approach clearly shows that there is a strongly male-biased transmission of reproductive success in patrilineal societies, it also formally demonstrates that cultural transmission of reproductive success could be a major evolutionary force. Indeed, it reduces within-population genetic diversity and increases among-population differentiation, the two key components for the evolution of cooperation.



Linking subsistence strategy, learning practices and demography
Laurel Fogarty, Nicole Creanza, Marcus W. Feldman
Stanford University, California, USA
Human populations vary demographically, with population sizes ranging from small groups of hunter-gatherers with fewer than fifty individuals to vast cities containing many millions. Here we investigate how the cultural transmission of traits affecting survival, fertility, or both can influence the birth rate, age structure, and asymptotic growth rate of a population. We show that, in a simple model with just three age classes, the strong spread of such a trait can lead to a demographic transition, similar to that experienced in Europe in the late 19th and early 20th centuries, without using ecological or economic optimizing models. We also show that population subsistence and learning strategies can be linked using a more realistic model with five age classes, and can explain some demographic data on modern hunter-gatherer and small scale farming populations.

We investigate the roles of vertical, oblique, and horizontal learning of a fitness-altering cultural trait and find that, compared to vertical learning alone, horizontal and oblique learning can accelerate the trait’s spread, lead to faster population growth, and increase its equilibrium frequency.
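For readers unfamiliar with age-structured models, the following Python sketch shows how an asymptotic growth rate falls out of a three-age-class projection (a Leslie matrix). It is not the authors' model: it contains no cultural transmission at all, and the fertility and survival values are invented. It only illustrates how a trait that lowers fertility would change the long-run growth rate.

```python
def leslie_matrix(fertility, survival):
    """Build a Leslie matrix.

    fertility: offspring per individual in each age class.
    survival:  probability of surviving from class i to class i + 1.
    """
    n = len(fertility)
    m = [[0.0] * n for _ in range(n)]
    m[0] = [float(f) for f in fertility]
    for i, s in enumerate(survival):
        m[i + 1][i] = float(s)
    return m

def project(leslie, pop):
    """Advance the age-class vector by one time step."""
    return [sum(a * x for a, x in zip(row, pop)) for row in leslie]

def asymptotic_growth_rate(leslie, steps=200):
    """Estimate the dominant eigenvalue by repeated projection."""
    pop = [1.0 / len(leslie)] * len(leslie)
    rate = 1.0
    for _ in range(steps):
        nxt = project(leslie, pop)
        rate = sum(nxt) / sum(pop)
        total = sum(nxt)
        pop = [x / total for x in nxt]  # renormalise to avoid overflow
    return rate

# Vital rates below are hypothetical, chosen only for illustration.
survival = [0.8, 0.5]
rate_without_trait = asymptotic_growth_rate(
    leslie_matrix([0.0, 1.5, 1.0], survival))
rate_with_trait = asymptotic_growth_rate(
    leslie_matrix([0.0, 1.0, 0.5], survival))
```

With these made-up numbers the population carrying the fertility-reducing trait grows more slowly in the long run; in the abstract's model the interesting part is how the frequency of such a trait changes through vertical, oblique and horizontal learning, which this sketch leaves out.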



Genome-wide analysis of Oceanian ancestry
Ana T. Duggan 1, David Reich2 ,3, Mark Stoneking1
1Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany, 2Harvard Medical School, Boston, USA, 3Broad Institute, Boston, USA
The history of Oceania, as inferred from archaeological, linguistic and genetic evidence, points to two major human expansions through the region. It seems that the first human settlers arrived in New Guinea and Australia, then joined as the continent of Sahul, more than 40 thousand years ago and spread to the Bismarck Archipelago and other nearby islands but did not spread widely through the Solomon Islands. Present-day populations believed to be descendants of these initial settlers speak very diverse languages of apparently great time depth (referred to collectively as Papuan), practice patrilocality and tend to have darker skin pigmentation. The second wave of human expansion arrived with the Austronesians approximately 3.5 thousand years ago and touched almost all of Near Oceania before spreading further into the Pacific and settling Remote Oceania. The Austronesians brought with them a single proto-language which has diversified into a group of closely related languages, possessed a distinctive pottery style, and were likely matrilocal; their descendants have a more Asian phenotype. mtDNA and Y-chromosome data indicate that Papuan-speaking and Austronesian-speaking populations did admix extensively in Near Oceania and that the mixture appears to have been sex-biased. Maternal ancestry of putative Asian origin is high, even within Papuan-speaking populations, and yet Remote Oceanian populations show high levels of Y-chromosomes of Near Oceanian origin. While some studies of genome-wide short tandem repeats or polymorphisms have been conducted, they have been restricted to populations from New Guinea and Polynesia, who likely represent population extremes. Here we analyse genome-wide SNP data, collected on the Affymetrix Human Origins array, from approximately 300 samples from 40 populations across Southeast Asia and Near and Remote Oceania.
We are using these data to attempt to elucidate the genetic structure of Papuan-speaking and Austronesian-speaking groups, including the time and extent of admixture between them, to better understand the dynamics of population contact which led to the distinctive pattern of uniparental inheritance but also maintained two very different language groups and cultures within Oceania.


Genes mirror subsistence in prehistoric Europe
Mattias Jakobsson
Uppsala University, Uppsala, Sweden
The Neolithic transition swept over Europe after the invention of farming some 11,000 years ago in the Near East to reach its northern fringe some 6,000 years ago. Genomic information from ancient human remains is beginning to show its full potential for learning about human demographic history, including the debated agricultural transition. We generate and investigate low- to medium-coverage genomic data (up to 2.2x coverage) from several Stone-Age Scandinavian and Iberian individuals, including 10 Scandinavian 5,000 year old individuals from farming and hunter-gatherer groups, a 7,500 year old Mesolithic individual from the same region as the Scandinavian hunter-gatherers, and 5 Iberian 5,000 year old individuals from a farmer group. The Stone-Age Scandinavian individuals show remarkable population structure corresponding to their material culture association: the farmers are genetically most similar to extant southern Europeans, contrasting sharply with the hunter-gatherers, whose genetic signature is unique but closest to extant northern Europeans. The genomic make-up of present-day Scandinavians is intermediate between the two Neolithic groups, suggesting that extensive admixture - perhaps around the time of the disappearance of the hunter-gatherer lifestyle - eventually shaped the patterns of variation. Similarly, Iberian farmers show affinities to modern-day southern Europeans, especially to Sardinians, in contrast to the published 7,000 year old Iberian hunter-gatherer from La Braña, who is genetically closer to current-day northern Europeans. Notably, these similarities to Sardinians seem to be stronger than to the contemporary population of Spain, which suggests complex changes in the genetic ancestry of Iberians during the last 7,000 years. The pattern of genetic variation in Stone-Age Europe contrasts with current-day patterns that mirror the individuals' geographic sampling locations.
We further estimate genetic diversity within the groups and show that diversity was lower among the hunter-gatherers compared to the farmers, suggesting a smaller population size for the hunter-gatherers, perhaps related to a lower carrying capacity associated with hunting and gathering lifestyles. These findings show that lifestyle, rather than geography as in modern-day Europe, may be the major determinant of genetic similarity and diversity in prehistoric Europe, which illuminates the impact of the agricultural revolution.


Biocultural Analysis of Variation in Blood Pressure among African Americans in the Health Equity Alliance of Tallahassee (HEAT) Heart Health Study
Laurel N. Pearson, Sarah M. Szurek, Clarence C. Gravlee, Connie J. Mulligan
University of Florida, Gainesville, FL, USA
Disparities in health and risk of disease are of significant interest in the United States, where African Americans experience some of the poorest health outcomes and the greatest burden of chronic diseases. Large efforts have been undertaken to identify genetic factors contributing to differential risk and outcomes experienced across American populations; however, population structure created over centuries through immigration, migration, and admixture has added complexity to simple genetic analysis of health disparities. Additionally, the effects of cultural variability and their interaction with underlying genetic factors are poorly understood and only rarely sufficiently considered. Well-designed interdisciplinary research that incorporates genetic and socio-cultural factors and their interactions will be critical to understanding and addressing health disparities, especially for complex phenotypes such as hypertension.
The Health Equity Alliance of Tallahassee (HEAT) Heart Health Study is a community-based participatory research (CBPR) design that engages community members in the planning and collection of research data. A primary aim of HEAT is to investigate the socio-cultural factors that contribute to disparity in health status, especially in regard to cardiovascular phenotypes. Extensive cultural survey data targeted at understanding neighborhood environment, socioeconomic status, exposure to discrimination, and other stressors as well as phenotypic measures of body composition and blood pressure were completed on 165 African American research participants from economically diverse neighborhoods in Tallahassee, Florida. DNA derived from saliva samples was collected and genotyped on a custom Affymetrix Axiom array to assay a large panel of ancestry informative markers for assessment of genomic admixture and to perform genomic admixture mapping (3,600 AIMs). Additionally, this array includes SNPs in previously reported candidate genes for blood pressure, stress, and skin pigmentation (over 25,000 candidate SNPs). This work aims to address the complex interplay of genetic influences, including candidate genes and genetic ancestry, and socio-cultural factors, such as stress caused by perceived discrimination and community support, on blood pressure variation. We have previously shown that genetic contributions to variation in blood pressure phenotypes are modified by the inclusion of socio-cultural data. The more detailed study design made possible by this interdisciplinary CBPR study reveals the complex interplay of the genome and culture in contributing to health disparities in complex phenotypes.


Parallel trajectories of genetic and linguistic admixture in Cape Verdean Kriolu speakers.
Paul Verdu 1, Ethan Jewett2, Trevor Pemberton3, Noah Rosenberg2, Marlyse Baptista4
1CNRS/MNHN/Univ. Paris Diderot/Sorbonne Paris Cite, Paris, France, 2Stanford University, Department of Biology, Stanford, CA, USA, 3University of Manitoba, Department of Biochemistry and Medical Genetics, Winnipeg, MB, Canada, 4University of Michigan, Departments of Linguistics & Afroamerican and African Studies, Ann Arbor, MI, USA
Starting in the 15th Century, European colonization of Africa and the Atlantic Slave Trade brought together populations of European and African origin on the islands of Cape Verde, giving rise to an admixed population. The ways in which the different waves of migration and major sociohistorical events such as the abolition of slavery influenced the admixture process, and their impacts on the resulting genetic and cultural diversity in this population, remain largely unknown. To study the cultural and demographic history of the Cape Verdean population, we investigated patterns of genetic and linguistic diversity among 44 unrelated Cape Verdean individuals. Genetic data consisted of genotypes at ~2.5 million genome-wide SNPs and linguistic data of spontaneous speech in Cape Verdean Creole (Kriolu) provided by each subject. We found that individual speech patterns across Cape Verdean Kriolu speakers was significantly correlated with pairwise levels of allele-sharing dissimilarities, as well as with the birthplaces of individuals and their parents. Individual levels of African genetic admixture were significantly positively correlated with the number of words of putative African origin used by each individual. These results suggest that genetic and linguistic admixture followed parallel evolutionary trajectories in the Cape Verdean archipelago, and they provide a basis for combining genetic and linguistic information to reconstruct the complex admixture processes that have shaped the cultural and biological diversity of Cape Verde. To our knowledge, this work is the first joint analysis of genetic and cultural variation within a single population of individuals sharing a common, mutually intelligible language.

Copy number variation evolution and human disease traits.
James R Lupski 1 ,2
1Baylor College of Medicine, Houston, TX, USA, 2Texas Children's Hospital, Houston, TX, USA
Whereas Watson-Crick DNA base pair changes have long been recognized as a mechanism for mutations, rearrangements of the human genome including deletions, duplications, inversions and complex combinations thereof have been appreciated only more recently as a significant source for human genetic variation.  Diseases that result from DNA rearrangements have been referred to as genomic disorders [Lupski, JR (2009) Genomic disorders ten years on. Genome Medicine 1:42.1-42.11].  Rearrangements associated with genomic disorders can be recurrent, with breakpoint clusters resulting in a common sized deletion/duplication, or nonrecurrent and of different sizes.  Nonallelic homologous recombination (NAHR) is a major mechanism for recurrent rearrangements, whereas nonhomologous end-joining (NHEJ) can be responsible for non-recurrent rearrangements.  Genome architectural features consisting of low-copy repeats (LCRs), also called segmental duplications, can stimulate and mediate NAHR.  There are positional hotspots for the crossovers within the LCRs.  Complex rearrangements can occur by FoSTeS - Fork Stalling and Template Switching.  A newer model, microhomology-mediated break-induced replication or MMBIR, provides further molecular mechanistic details and may be operative in all life forms as a means to process one-ended, double-stranded DNA generated by collapsed forks.  Rearrangements introduce variation into our genome for selection to act upon and as such serve an evolutionary function for our genome analogous to base pair changes for genes.  Genomic rearrangements may result in CNV that range in size from 100s to millions of base pairs and include single exons, whole genes, or genomic segments encompassing many genes or no genes at all!    They can be complex such as DUP-TRI/INV-DEL; the latter stimulated by inverted repeats.  CNV can cause Mendelian diseases and complex traits such as obesity and neurobehavioral phenotypes.  
The mechanisms by which rearrangements convey phenotypes are diverse and include gene dosage, position effects, unmasking of coding region mutations (cSNPs) or other functional SNPs, creating gain-of-function fusion genes at the breakpoints, and perhaps through effects of transvection.  De novo genomic rearrangements cause both chromosomal and Mendelian disease, as well as sporadic traits, but our understanding of the extent to which genomic rearrangements, gene CNV, and/or gene dosage alterations are responsible for common and complex traits remains rudimentary.


The presence of convergent evolution suggests adaptive roles for genetic variants contributing to the human addictions
Christina Barr 1, Carlos Driscoll1, Stephen Lindell1, Kevin Blackistone1, Stephen Suomi2
1NIH/NIAAA, Section of Comparative Behavioral Genomics, Rockville, MD, USA, 2NIH/NICHD, Section of Comparative Ethology, Poolesville, MD, USA
The neurobiological systems that influence addiction vulnerability in humans may do so by acting on reward pathways, behavioral dyscontrol, and vulnerability to stress. In certain instances, genetic variants that are functionally similar or orthologous to those that moderate risk for human psychiatric and addictive disorders are maintained across species, and some of our studies have suggested there to be convergent evolution or allelic variants being under selection across primates. We have also shown that the rhesus macaque (Macaca mulatta) is useful for learning how relatively common genetic variants, which are associated with traits that may be adaptive in certain environmental contexts, can also increase vulnerability to behavioral pathology and alcohol preference. Genomics approaches can be used to home in on convergences in genetic variations that promote species-specific behaviors (fixed differences that have undergone purifying selection) and variable behavioral strategies that appear to be selected in multiple species (often as a result of balancing selection). We wanted to perform whole exome sequencing to identify coding polymorphisms that correlated with individual differences in behavior in the rhesus macaque. While there are no commercially available reagents for performing whole exome sequencing in nonhuman primates, a human whole exome platform is available. Given that there would likely be more purifying selection in coding regions and, therefore, less interspecific variation, we used a human exome capture and sequencing kit for performing exome sequencing for rhesus macaque subjects that differed in their levels of impulsivity and aggression. 
As genetic variation observed in domestic animals may be powerful for looking at genetic factors that enabled domestication as well as the reversal of some of those traits through more recent artificial selection, whole exome sequencing for individuals from several domestic animal species (canids, felids and equids) was performed in parallel. I will describe the types of variation we identified using these approaches and will illustrate how the genes in which we find functionally similar genetic variation overlap with those that predict vulnerability to human psychopathology and the addictions. The presented findings will be discussed in the context of how the high prevalence of addiction risk alleles in some populations of humans may be rooted in the fact that the same variants contribute to potentially adaptive traits in the absence of environmental stressors, recreational drugs and alcohol.


Patterns of ancient selection in modern humans around candidate sites
Fernando Racimo 1 ,2, Martin Kuhlwilm2, Montgomery Slatkin1
1University of California - Berkeley, Berkeley, CA, USA, 2Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany

Though the recent sequencing of the high-coverage Denisovan and Neanderthal genomes has allowed us to find the genetic differences that set modern humans apart from archaic humans, the subset of such changes that rose to fixation due to selection is currently unknown. In this study, we look for patterns of positive selection on the modern human lineage at various classes of putatively functional changes using diversity scaled by divergence, as has been done previously on the human lineage since the split from chimpanzees. We also develop an approximate Bayesian computation (ABC) approach incorporating various statistics aimed at identifying ancient patterns consistent with selection around a candidate site. We fail to find an enrichment for signals of positive selection around nonsynonymous changes relative to synonymous changes. It has been argued that the failure to detect this difference in changes on the human lineage may be due to varying levels of background selection, which occlude the signal of positive selection. Indeed, when we control for the intensity of background selection (BS), we observe a significant difference between nonsynonymous changes in regions of low BS and matching regions of the genome, lending support to this hypothesis. We also identify a slight enrichment for positive selection at splice site changes.


Biased gene conversion skews allele frequencies in human populations, increasing the disease burden of recessive alleles
Joseph Lachance, Sarah Tishkoff
University of Pennsylvania, Philadelphia, PA, USA
Gene conversion results in the non-reciprocal transfer of genetic information between two recombining sequences, and there is evidence that this process is biased towards G and C alleles.  Using high-coverage whole genome sequences of African hunter-gatherers, other human populations, and primate outgroups we quantified the effects of GC-biased gene conversion (gBGC) on population genomic datasets.  We find that genetic distances (Fst and population branch statistics) are modified by gBGC.  In addition, the site frequency spectrum is left-shifted when ancestral alleles are favored by gBGC and right-shifted when derived alleles are favored by gBGC.  Allele frequency shifts due to gBGC mimic the effects of natural selection.  Summary statistics of site frequency spectra (Tajima’s D, Fay and Wu’s H, and mean derived allele frequency) depend strongly on whether alleles are favored by gBGC.  These effects are strongest in high recombination regions of the human genome.  By comparing the site frequency spectra of unbiased and biased sites the strength of gene conversion was estimated to be on the order of Ne*b=0.009.  We also find that derived alleles favored by gBGC are much more likely to be homozygous than derived alleles at unbiased SNPs (+42.2% to 62.8%).  This results in a "curse of the converted", whereby recessive alleles have an increased disease burden.  Taken together, our findings reveal that GC-biased gene conversion has important population genetic and public health implications.

Coalescence Based Models to Detect Different Types of Selection
Hang Zhou1, Sile Hu 1, Rostislav Matveev2, Kun Tang1
1CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Science, Shanghai, China, 2Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
Detecting signals of natural selection is a central problem in population genetics. To date, many mathematical models have been proposed to describe the dynamics of natural selection. Model-based methods have also been proposed to detect signals of selection and estimate the corresponding parameters. Nevertheless, under scenarios of varying population size, it is not easy to identify selection events, because population size changes may result in patterns similar to those produced by natural selection. In addition, powerful methods for detecting negative or balancing selection are lacking. Recently, the pairwise sequentially Markovian coalescent (PSMC) method was proposed to estimate a comprehensive population size trajectory from genome sequencing data; it simultaneously estimates the pairwise time to the most recent common ancestor (TMRCA). We found that the TMRCA estimates could be used to reconstruct the local coalescent trees across the whole genome. Methods can therefore be constructed directly on the coalescent data to detect natural selection and to infer the corresponding parameters. The coalescent trees were first converted to the coalescent time scale by rescaling against a fine population size trajectory estimated by PSMC. The resulting standard coalescent distribution is therefore independent of changes in effective population size. Three coalescent models, H0, H1 and H2, were constructed to describe various evolutionary scenarios. H0 is the null hypothesis of a neutral coalescent. H1 is a one-parameter hypothesis under which the coalescent rate is changed by a constant scale factor; it is therefore intended for cases of negative or balancing selection. H2 is a five-parameter model, which assumes that the coalescent rate changes over three consecutive time intervals. This model tries to capture the different phases of a positive selection event. Based on H2, we developed an algorithm to estimate the selection coefficient and the selection starting time.
Likelihood ratio tests were constructed to assign the proper model to any coalescent tree. A large number of simulations showed that our approach has strong power and high accuracy in estimating the selection starting time and selection coefficient. We applied this approach to whole-genome data from the 1000 Genomes Project and built a fine-scale genome-wide atlas of recent selection signals.
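
As a rough illustration of the likelihood-ratio step, the comparison of H0 against the one-parameter H1 can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes, purely for illustration, that rescaled pairwise coalescence times behave as independent exponential draws with rate 1 under H0 and rate lam under H1, and that the test statistic can be compared with a chi-square distribution with one degree of freedom. The function name lrt_rate_scaling is hypothetical.

```python
import math

def lrt_rate_scaling(times):
    """Likelihood ratio test of H0 (rate 1) against H1 (rate lam).

    `times` are coalescence times already rescaled to coalescent units.
    Assumption for illustration: times are i.i.d. exponential.
    Returns (lam_hat, test_statistic, p_value).
    """
    n = len(times)
    total = sum(times)
    if n == 0 or total <= 0:
        raise ValueError("need at least one positive coalescence time")
    lam_hat = n / total                      # maximum likelihood rate under H1
    loglik_h0 = -total                       # sum of log(exp(-t)) with rate 1
    loglik_h1 = n * math.log(lam_hat) - lam_hat * total
    stat = 2.0 * (loglik_h1 - loglik_h0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return lam_hat, stat, p_value

lam_hat, stat, p_value = lrt_rate_scaling([0.5, 0.5, 0.5, 0.5])
print(lam_hat, stat, p_value)
```

In this toy setting an estimated rate above 1 means faster coalescence (shallower trees) and a rate below 1 means slower coalescence (deeper trees); how those map onto negative or balancing selection in the actual method is not specified beyond what the abstract states.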


Investigating fine-scale population structure between the Nama and Khomani San of South Africa
Caitlin Uren 1, Marlo Moller1, Dean Bobo2, Julie Granka3, Michelle Daya1, Cedric Werely1, Justin Myrick4, Alicia Martin5, Christopher Gignoux5, Brenna Henn2, Eileen Hoal1
1Stellenbosch University, Department of Molecular Biology and Human Genetics, Tygerberg, Cape Town, South Africa, 2Stony Brook University, Department of Ecology and Evolution, Stony Brook, New York, USA, 3Stanford University, Department of Biology, Stanford, California, USA, 4University of California, Los Angeles, California, USA, 5Stanford University, Department of Genetics, Stanford, California, USA

The Cape Coloured population of Cape Town, South Africa (SAC) derives ancestry from multiple global populations, including Europeans and Indonesians. Initial studies also indicated a substantial contribution from the KhoeSan, a diverse group of hunter-gatherers and pastoralists that historically occupied much of southern Africa (Chimusa et al. 2013a, de Wit et al. 2010). The degree of KhoeSan ancestry reflects the role of indigenous KhoeSan in the early establishment of the SAC population (Mountain 2003). Furthermore, we have demonstrated significant evidence of an association between KhoeSan ancestry and tuberculosis (TB) susceptibility that is not confounded by socio-economic status. It was additionally found that the KhoeSan ancestry component in the SAC seems to contribute to the extreme susceptibility to TB in this admixed population. The southern African KhoeSan fall into two genetic groups, roughly corresponding to the northwestern and southeastern Kalahari, which have been shown to have separated within the last 30,000 years (Pickrell et al. 2013). We collected DNA samples from the Nama along the western coast and the Khomani San from the Kalahari Desert (with written informed consent and approval of the Human Research Ethics Committee of Stellenbosch University). SNP genotype data were generated on the Illumina OmniExpress platforms (700k-1M arrays) for 120 Khomani San, 25 Cape Coloureds and 13 Nama. Whole genome sequencing of an additional 106 Nama samples is currently underway in collaboration with the Wellcome Trust Sanger Institute. This is to our knowledge the largest genome-wide dataset collected for the purpose of understanding South African genetic diversity. We use principal component analysis, chromopainter and ADMIXTURE to investigate fine-scale population structure among these South African groups.


The burden of private mutations is greatly affected by recent explosive human population growth
Alon Keinan, Feng Gao, Elodie Gazave, Li Ma, Diana Chang, Andrew Clark
Cornell University, Ithaca, NY, USA
Human populations have experienced dramatic growth since the Neolithic revolution. In this study, we modeled how this growth increases the effective population size and what it entails for the load of individual private mutations. Recent studies that sequenced large numbers of individuals observed an extreme excess of rare variants and provided evidence of recent rapid growth in effective population size, although estimates have varied greatly among studies. These studies were based on protein-coding genes, in which variants have been impacted by natural selection. Hence, we sequenced loci far from genes that meet a stringent set of criteria designed to ensure that mutations therein are likely to be neutral. We used high coverage sequencing and 500 individuals of homogeneous European ancestry to capture very rare variants, and fit an array of recent demographic history models to the site frequency spectrum. The best-fitting model estimates 3-4% growth per generation during the last 3000-4000 years, resulting in an effective population size increase of two orders of magnitude. Our models fit the data well only after observing that estimates are impacted by assumptions of ancient demography, which also explains the discrepancy among previous studies. We next aimed to quantify the effect of growth and purifying selection on the burden of private mutations per individual sample, which also translates to the number of new variants discovered with the sequencing of each new genome. Hence, we introduced a statistic (%HP) that is defined as the proportion of heterozygous sequence variants in an individual that are novel with respect to a sample of sequenced individuals from the same population. We predicted this quantity for demographic models and estimated it for different datasets. We observed a significantly higher empirical %HP compared with models without recent population growth. 
Incorporating growth as estimated above provides a much improved fit, a phenomenon that is more marked as sample size increases, e.g. for a sample of 10,000 individuals, %HP is 0.25% with recent growth, which is 18-fold higher than that without growth. This implies that 1 in 400 heterozygous sites in any 10,001st individual is expected to be private, which amounts to ~6000 variants, or roughly 100 times the number of de novo mutations. Finally, we report an increase in %HP due to purifying selection, e.g. it is ~10-fold higher for nonsense mutations compared to other genic mutations, for which in turn it is higher compared to the above putatively neutral mutations.
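
The %HP statistic defined above, and the arithmetic behind the "1 in 400" figure, can be sketched in a few lines of Python. This is only an illustration: the function name percent_hp and the toy site positions are hypothetical, and the implied total number of heterozygous sites is back-calculated from the two figures quoted in the abstract rather than reported by the authors.

```python
def percent_hp(individual_het_sites, reference_variant_sites):
    """Proportion of an individual's heterozygous variant sites that are
    absent from a reference sample drawn from the same population."""
    het = set(individual_het_sites)
    if not het:
        raise ValueError("individual has no heterozygous sites")
    novel = het - set(reference_variant_sites)
    return len(novel) / len(het)

# Toy example with made-up site positions: one of four sites is novel.
toy_het = {101, 205, 333, 478}
toy_panel = {101, 205, 478, 900}
print(percent_hp(toy_het, toy_panel))

# Arithmetic from the abstract: %HP of 0.25% is 1 in 400, and ~6000 private
# variants at that proportion implies roughly 2.4 million heterozygous sites.
hp = 0.25 / 100
implied_het_sites = 6000 / hp
print(round(1 / hp), round(implied_het_sites))
```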


Evolutionary history of ethnic groups from European Russia and the Sub-Arctic Transuralic region.
Svetlana Limborska 1, Andrey Khrunin1, Denis Khokhrin1, Dmitry Verbenko1, Diana Gerasimova2, Roman Kuchin2, Vladislav Rabinovich3
1Institute of Molecular Genetics, Russian Academy of Sciences, Moscow 123182, Russia, 2Ugra State University, Khanty-Mansiisk 628012, Russia, 3Ugra Research Institute of cellular technology with stem cell bank, Khanty-Mansiisk 628011, Russia
Understanding the genetic structure of populations is important both from a historical perspective and for the appropriate design and interpretation of genetic epidemiological studies. Several studies have examined the fine-scale structure of human genetic variation in Europe. However, populations of the northeastern European area and the Sub-Arctic Transuralic region are less well investigated. These territories are inhabited by different indigenous Finno-Ugric peoples (e.g., Veps, Komi, Khanty and Mansi) and ethnic Russians.

To explore the genetic structure of the region described, we analyzed single nucleotide polymorphisms in the populations mentioned above using different versions of Illumina BeadChips. Principal component analysis, ADMIXTURE clustering and Wright's fixation index (FST) were used to probe genetic variation.

The Mansi were indigenous inhabitants of the northern European area until the 17th century AD. This ethnic group underwent a trans-Uralic migration and nowadays inhabits the Sub-Arctic region of Western Siberia. The Khanty, closely related to the Mansi by linguistic classification, are the indigenous inhabitants of this region. The Mansi and the Khanty have genomic characteristics that are the most distant from all others, owing to the presence of a different ancestry component.

The Komi live in the farthest corner of northeastern Europe. Based on genomic analysis, the Komi form a separate pole of genetic diversity in the northern European gene pool. The Veps, a modern Finno-Ugric minority and one of the oldest peoples of northern Europe, still inhabit some territories of northwest Russia and show genetic similarity to both Finns and Komi.
Russians are the most numerous people in northeastern Europe. Principal component analysis showed significant differences between Russians of the northern European region and Russian populations from the central part of the Russian Plain. The latter Russian populations formed a single cluster on the PC plot. In contrast, Northern Russians showed a close relationship with the Veps population.
In general, our data provide a more complete genetic map of Europe and the adjacent northern area, accounting for the diversity in its most eastern and northeastern populations. Furthermore, these data contribute to a better understanding of the population genetic history of the present-day ethnic groups of the area studied.

A Novel Likelihood Ratio Test for Sex-Biased Demography and the Effect of Cryptic Sex-Bias on the Estimation of Demographic Parameters
Shaila Musharoff 1, Suyash Shringarpure1, Carlos D. Bustamante1, Sohini Ramachandran2
1Stanford University, Stanford, CA, USA, 2Brown University, Providence, RI, USA
Sex-bias is defined as an unequal number of breeding males and females in a population. This can be caused by variance in reproductive success, demographic events involving unequal numbers of males and females, and/or differential selection at sex-linked genomic loci. A commonly used estimator of the proportion of females is based on the test statistic Q where Q is the ratio of neutral genetic diversity estimated from the X chromosome to that estimated from the autosomes. This is problematic if the population changed in size: because X chromosomal diversity recovers from size changes at a different rate than autosomal diversity due to unequal effective population sizes, this estimator of the proportion of females will be biased. To this end we present a novel likelihood ratio test for sex-bias in a single population based on the Poisson random field model. We use the program dadi to estimate demographic parameters jointly from X chromosomal and autosomal data and test first for a persistent sex-bias, and then for a sex-biased demographic event. Our test has more power to detect sex-bias from unlinked or partially linked sites than the commonly used test statistic Q for a range of demographic scenarios.  Encouragingly, our test is well powered for events relevant to human history including recent rapid expansion whereas the test statistic Q is not.
In addition to being of fundamental interest, the presence of sex-bias affects demographic inference. Sex-bias, either in the male or female direction, decreases the effective population size of the X chromosome as well as the autosomes of a population. If this reduction in effective population size is unaccounted for, demographic parameter estimates (e.g., bottleneck times or divergence times) will be inflated. We assess the effect of cryptic sex-bias on the estimation of demographic parameters using simulated data. We propose a correction based on the joint inference of demographic parameters from the X chromosome and the autosomes. These analyses give us a more complete picture of the presence and effect of human sex-biased demography and can be easily applied to other organisms.
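
For orientation, the relationship between the Q statistic and the proportion of breeding females can be written out in a short Python sketch. It rests on the textbook effective-size formulas for a constant-size, neutrally evolving population with equal mutation rates on the X and the autosomes, N_A = 4*Nm*Nf/(Nm+Nf) and N_X = 9*Nm*Nf/(4*Nm+2*Nf), which give Q = 9 / (8 * (2 - p)) for a female fraction p. Those are exactly the assumptions the abstract argues are violated under population size change, so this is the naive estimator, not the authors' likelihood ratio test. The function names are hypothetical.

```python
def q_from_female_fraction(p_female):
    """Expected X-to-autosome diversity ratio Q for a given proportion of
    breeding females, assuming constant size, neutrality and equal
    mutation rates on the X chromosome and the autosomes."""
    if not 0.0 <= p_female <= 1.0:
        raise ValueError("p_female must lie in [0, 1]")
    return 9.0 / (8.0 * (2.0 - p_female))

def female_fraction_from_q(q):
    """Invert the relationship above: naive estimate of the proportion of
    breeding females from an observed Q."""
    if not 9.0 / 16.0 <= q <= 9.0 / 8.0:
        raise ValueError("Q outside the range attainable under this model")
    return 2.0 - 9.0 / (8.0 * q)

print(q_from_female_fraction(0.5))   # equal sex ratio gives Q = 0.75
print(female_fraction_from_q(0.75))  # and Q = 0.75 maps back to 0.5
```

Under this simple model Q can only range from 9/16 (all breeders male in the limit) to 9/8 (all breeders female in the limit), so observed values outside that range already signal that the equilibrium assumptions do not hold.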



 
Statistical Inference of Archaic Introgression In Central African Pygmies
PingHsun Hsieh 1, Jeffrey Wall2, Joseph Lachance3, Sarah Tishkoff3, Ryan Gutenkunst1, Michael Hammer1
1University of Arizona, Tucson, AZ, USA, 2University of California, San Francisco, CA, USA, 3University of Pennsylvania, Philadelphia, PA, USA
Recent evidence from ancient DNA studies suggests that genetic material introgressed from archaic forms of Homo, such as Neanderthals and Denisovans, into the ancestors of contemporary non-African populations. These findings also imply that hybridization may have given rise to some of the adaptive novelties in anatomically modern human (AMH) populations as they expanded from Africa into various ecological niches in Eurasia. Within Africa, fossil evidence suggests that AMH and a variety of archaic forms coexisted for much of the last 200,000 years. Here we present preliminary results leveraging high quality whole-genome data (>60X coverage) for three contemporary sub-Saharan African populations (Biaka, Baka, and Yoruba) from Central and West Africa to test for archaic admixture. With the current lack of African ancient DNA, especially in Central Africa due to its rainforest environment, our statistical inference approach provides an alternative means to understand the complex evolutionary dynamics among groups of the genus Homo.
To identify candidate introgressive loci, we scan the genomes of 16 individuals and calculate S*, a summary statistic that was specifically designed by one of us (JDW) to detect archaic admixture. The significance of each candidate is assessed through extensive whole-genome level simulations using demographic parameters estimated by ∂a∂i to obtain a parametric distribution of S* values under the null hypothesis of no archaic introgression. As a complementary approach, top candidates are also examined by an approximate-likelihood computation method. The admixture time for each individual introgressive variant is inferred by estimating the decay of the genetic length of the diverged haplotype as a function of its underlying recombination rate. A neutrality test that controls for demography is performed for each candidate to test the hypothesis that introgressive variants rose to high frequency due to positive directional selection. The present study represents one of the most comprehensive genomic surveys to date for evidence of archaic introgression to anatomically modern humans in Africa.
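
The idea of dating admixture from haplotype length can be illustrated with a back-of-the-envelope Python sketch. It assumes a single admixture pulse and a uniform local recombination rate, under which the genetic length of an introgressed tract is roughly exponentially distributed with mean 1/t Morgans after t generations. This is a simplified stand-in, not the inference procedure used in the study, and the function name and example values are hypothetical.

```python
def admixture_time_generations(tract_length_bp, recomb_rate_per_bp):
    """Rough point estimate of the time since admixture, in generations,
    from one introgressed tract.

    Assumption for illustration: single-pulse admixture, so the expected
    tract length in Morgans is about 1 / t.
    """
    if tract_length_bp <= 0 or recomb_rate_per_bp <= 0:
        raise ValueError("length and recombination rate must be positive")
    genetic_length_morgans = tract_length_bp * recomb_rate_per_bp
    return 1.0 / genetic_length_morgans

# Hypothetical example: a 50 kb tract in a region recombining at
# 1e-8 per base pair per generation.
print(admixture_time_generations(50_000, 1e-8))
```

A single tract gives a very noisy estimate, since the exponential distribution has a standard deviation equal to its mean; in practice many tracts and a fuller demographic model would be needed.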


Tracing the genetic ancestry of enslaved Africans using ancient DNA
Hannes Schroeder1 ,2, María C. Ávila-Arcos 1 ,4, Pontus Skoglund3, Meredith Carpenter4, Anna Sapfo Malaspinas1, Marcela Sandoval-Velasco1, Jose Víctor Moreno-Mayar1, Morten Rasmussen1 ,4, Jay B. Haviser2, Ludovic Orlando1, Antonio Salas5, Carlos Bustamante4, Mattias Jakobsson3, M. Thomas P Gilbert1
1Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark, 2Faculty of Archaeology, Leiden University, Leiden, The Netherlands, 3Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden, 4Center for Computational, Evolutionary and Human Genomics, Stanford, California, USA, 5Instituto de Ciencias Forenses 'Luís Concheiro', Universidade de Santiago de Compostela, Santiago de Compostela, Spain
Between the 16th and 19th centuries, over 12 million Africans were kidnapped in Africa and transported to the Americas as a result of the transatlantic slave trade. The captives were taken from various parts of mainly West and West Central Africa but their precise origins often remained unknown or were deliberately obscured. In this study, we sequenced enriched DNA libraries from 17th century remains of three enslaved Africans, who had died on the Caribbean island of Saint Martin, in an attempt to trace their ancestral origins in Africa. Our results show that the three captives, who had been buried together, are genetically related to different populations in Africa, including Bantu and non-Bantu speakers. This suggests that they might have originated from different parts of Africa and reflects upon the nature of the transatlantic slave trade and its role in shaping the population history of the Americas.


Accurate estimates of heterozygosity in 135 diverse human populations
Niru Chennagiri 1 ,2, Swapan Mallick1 ,2, Nick Patterson2 ,1, Susanne Nordenfelt1, Arti Tandon1 ,2, Iosif Lazaridis1, Guillermo del Angel2, Gabriel Renaud3, Udo Stenzel3, Brenna Henn4, Antti Sajantila5, Aashish Jha6, Richard Villems15, Michael Hammer8, Andres Ruiz-Linares9, Robert Mahley10, Toomas Kivisild11, Sarah Tishkoff12, Lynn Jorde13, Rem Sukernik14, Mait Metspalu15, Svante Pääbo3, Janet Kelso3, David Reich1 ,2, Simons Genome Diversity Project Consortium16
1Harvard Medical School, Boston MA, USA, 2Broad Institute of Harvard and MIT, Boston MA, USA, 3Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany, 4Stony Brook University, Stony Brook NY, USA, 5University of Helsinki, Helsinki, Finland, 6University of Chicago, Chicago IL, USA, 7Departament de Ciències Experimentals i de la Salut, Barcelona, Spain, 8University of Arizona, Tucson AZ, USA, 9University College, London, UK, 10University of California San Francisco, San Francisco CA, USA, 11University of Cambridge, Cambridge, UK, 12University of Pennsylvania, Philadelphia PA, USA, 13University of Utah, Salt Lake City Utah, USA, 14Russian Academy of Science Siberian Branch, Novosibirsk, Russia, 15Estonian Biocentre, Tartu, Estonia, 16Simons Foundation, New York, USA
Worldwide human variation studies have established that heterozygosity (genetic diversity) decreases as a function of geographic distance from East Africa (Ramachandran et al. PNAS 2005). However, previous estimates of heterozygosity have been based on microsatellites or linkage disequilibrium, resulting in numbers that are biased or limited in their resolution.
We have generated whole genome sequences (>30x average) in 280 individuals from 135 worldwide populations, using an identical protocol at a single facility (Illumina, Ltd.). In addition we have built an informatics pipeline geared towards population genetics that eliminates biases in standard pipelines that might confound population genetics analyses.
We compute a maximum likelihood estimate for the population mutation rate (heterozygosity) in each population using mlrho (Haubold et al. Mol. Ecol. 2010). This provides precise information about how heterozygosity varies across diverse worldwide human populations. These data can be used to test more powerfully the extent to which a serial founder model is sufficient to explain the empirically observed decline in heterozygosity with distance from Africa.


Genomes from late hunter-gatherers and an early farmer from Europe reveal three ancestral populations for modern Europeans
Iosif Lazaridis 1 ,2, Nick Patterson2, Alissa Mittnik3, Gabriel Renaud4, Swapan Mallick1 ,2, Peter H. Sudmant5, Joschua G. Schraiber6, Sergi Castellano4, Karola Kirsanow7, Christos Economou8, Ruth Bollongino7, Mait Metspalu9, Matthias Meyer4, Evan E. Eichler5, Joachim Burger7, Montgomery Slatkin6, Svante Pääbo4, Janet Kelso4, David Reich1 ,2, Johannes Krause3, for the Ancient European Genomes Consortium1 ,3
1Department of Genetics, Harvard Medical School, Boston, MA, USA, 2Broad Institute, Cambridge, MA, USA, 3Institute for Archaeological Sciences, University of Tübingen, Tübingen, Germany, 4Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany, 5Department of Genome Sciences, University of Washington, Seattle, WA, USA, 6Department of Integrative Biology, University of California, Berkeley, CA, USA, 7Johannes Gutenberg University Mainz, Institute of Anthropology, Mainz, Germany, 8Archaeological Research Laboratory, Stockholm University, Stockholm, Sweden, 9Estonian Biocentre, Evolutionary Biology group, Tartu, Estonia
We sequenced two ancient Europeans from around the time of the Neolithic transition: a ~7.5 thousand year old Linear Pottery farmer from Stuttgart, Germany, and an ~8 thousand year old Mesolithic hunter-gatherer from the Loschbour rock shelter, Luxembourg. We also sequenced at lower coverage seven ~8 thousand year old Mesolithic hunter-gatherers from Motala, Sweden. We co-analyzed the data from these ancient Europeans with a published ~7 thousand year old Mesolithic Iberian genome from La Brana-Arintero, Spain, a ~24 thousand year old Paleolithic Siberian from Mal'ta, Russia, other lower quality ancient European genomes, and a large dataset of present-day humans genotyped on the Affymetrix Human Origins Array.

Our main findings are: (i) early European farmers were of mainly Near Eastern ancestry but with substantial European hunter-gatherer ancestry; (ii) European hunter-gatherers fall outside extant European variation in the direction of Near Eastern-European differentiation; (iii) most modern Europeans do not appear to be a simple mixture of the early European farmers and hunter-gatherers, but rather to have ancestry from at least three ancestral populations: (a) EEF: early European farmers (like the Stuttgart individual), (b) WHG: west European hunter-gatherers (like the Loschbour and La Brana individuals), and (c) ANE: ancient North Eurasians (like the Mal'ta individual). Mediterranean populations like Sardinians most closely resemble EEF individuals, while Baltic populations like Lithuanians most closely resemble WHG individuals.

Unexpectedly, all present-day eastern non-African groups (Oceanians, East Asians, Onge from the Indian Ocean, and Native Americans) are genetically closer to Eurasian hunter-gatherer groups than to the Stuttgart individual. We propose a model of Eurasian prehistory in which EEF possessed a fraction of ancestry from a basal Eurasian population that split off from other Eurasians prior to the split between Eurasian hunter-gatherers and eastern non-Africans.

The Scandinavian Motala hunter-gatherers are the only ancient population showing evidence of ANE ancestry, yet such ancestry is pervasive in present-day populations from both Europe and the Near East. This suggests that ANE ancestry spread across much of West Eurasia after the early Neolithic. Additional migrations from the Near East and East Eurasia affected more limited subsets of Europeans from the Mediterranean and Northeastern Europe respectively.

Our results suggest a dynamic history of the emergence of modern Europeans in which the Neolithic-Mesolithic admixture played a major role, but was supplemented by later admixture processes.


Ancient DNA Insights into the Population History of Seafaring Mid-Holocene Hunter-Gatherers on the Gulf of Maine
Alexander Kim 1 ,2, Susanne Nordenfelt1 ,2, Nadin Rohland1 ,2, Nick Patterson1 ,2, Michèle Morgan3, Steven LeBlanc3, David Reich1 ,2
1Department of Genetics, Harvard Medical School, Boston, MA, USA, 2Broad Institute of Harvard and MIT, Cambridge, MA, USA, 3Peabody Museum of Archaeology and Ethnology, Harvard University, Cambridge, MA, USA
The Red Paint People, a remarkable manifestation of the Moorehead Phase (c. 4500-3800 YBP), are an enigmatic pre-Columbian culture of northeastern North America famed for their distinctive technology and elaborate, strikingly characteristic ceremonial practices, including ochre-laden burials, ritual ground-slate bayonets, the hunting of swordfish and other marine megafauna, and what are potentially the oldest known tumuli and toggling harpoons ever discovered on the continent. As one of the earliest maritime cultures on the eastern seaboard of North America, their extraordinary flowering, abrupt archaeological disappearance, and situation in a long-range transportation network of artifacts and raw materials stretching as far as the Great Lakes evoke numerous questions about seaborne dispersal capability, coast-interior connectivity, and the extents of genetic continuity or overturn into and through the Archaic of New England and Atlantic Canada. We report, for the first time, mitochondrial and preliminary genome-wide autosomal data from ancient Moorehead Phase skeletal remains recovered from the Nevin site, a shell midden at Blue Hill Falls, Maine, and situate this locality and its inhabitants in the context of earliest North American settlement, patterns of gene flow at continental and subcontinental scales, and the panorama of social and ecological specialization by forager populations along Holocene North America's Atlantic littoral.


Population history of South America: ancient DNA study of extinct people from Tierra del Fuego
Zuzana Faltyskova 1, Hannes Schroeder2 ,3, Carles Lalueza4, Yolanda Espinoza4, Elena Gigli4, Oscar Ramirez4, Alfredo Prieto5 ,6, Susana Morano5, David Caramelli7, Elena Pilli7, Alessandra Modi7, Giorgio Manzi7, Alessandro Pietrelli8, Ermanno Rizzi8, Aurelio Marangoni9, Guido Barbujani10, Silvia Ghirotto10, Toomas Kivisild1, Maru Mormina1 ,11
1Division of Biological Anthropology, University of Cambridge, Cambridge, UK, 2Centre for GeoGenetics, University of Copenhagen, Copenhagen, Denmark, 3Faculty of Archaeology, Leiden University, Leiden, The Netherlands, 4Institute of Evolutionary Biology, Pompeu Fabra University, Barcelona, Spain, 5Institute of Patagonia, University of Magallanes, Punta Arenas, Chile, 6Autonomous University of Barcelona, Barcelona, Spain, 7Department of Biology, University of Florence, Florence, Italy, 8ITB CNR Institute for Biomedical Technologies, National Research Council, Milan, Italy, 9Department of Environmental Biology, University of Rome La Sapienza, Rome, Italy, 10Department of Life Sciences and Biotechnology, University of Ferrara, Ferrara, Italy, 11Department of Archaeology, University of Winchester, Winchester, UK
The details of the early human settlement of the Americas such as the dispersal time, number of migrations, and migration routes remain subject to debate. With many Native populations now extinct, the Pre-Columbian genetic make-up has been partly lost or blurred by recent admixture. The present study examines the mitochondrial genetic diversity of extinct Fuegian populations in order to illuminate the population history of South America.
The Fuegians lived on the islands of Tierra del Fuego in the Southern Cone of South America in isolation from other Native Americans until their extinction at the beginning of the 20th century, likely maintaining their original genetic signature without recent admixture. Based on the Fuegian robust cranial morphology, a few controversial studies have suggested that Fuegians might be descendants of a putative earlier migration wave preceding the arrival of the other Native Americans.
Using target enrichment and next-generation sequencing, we obtained complete mitochondrial genomes from skeletal remains of 37 Fuegians and 19 individuals from adjacent Patagonia. Comparing them to published sequences of other Native Americans, we estimated the divergence times and past population dynamics in the Southern Cone and we assessed the question of population continuity in the region. The coalescent ages of deep Fuegian-specific clades suggest early human settlement in Tierra del Fuego, probably associated with the initial peopling of the continent. The early arrival of Fuegians to the Southern Cone is consistent with the generally accepted scenario of rapid coastal dispersal throughout the Americas, which is further supported by the presence of Monte Verde, the oldest known South American pre-Clovis archaeological site, in Chilean Patagonia. In this presentation, alternative views on Fuegian origins and their genetic affinities with other Native Americans are considered in the context of the evolutionary history of South American populations.


Hundreds of shared ‘deletions’ in ancient hominins are polymorphic in modern human populations
David Radke 1 ,2, Shamil Sunyaev1 ,2
1Harvard Medical School, Boston, MA, USA, 2Brigham and Women's Hospital, Boston, MA, USA
Deciphering the genetic uniqueness of modern humans in relation to distant hominins and other primates is one of the central goals of human evolutionary genomics. Recently, with the availability of high-coverage sequence data for both Neanderthal and Denisova, it is now possible to more precisely determine the particular loci responsible for modern human uniqueness. While much of the distinguishing variation may be due to single nucleotide variants, genomic structural variants may also play a crucial role. Structural variants can be a potent phenotype-shaping force, particularly for unbalanced events, such as deletions, as they can alter reading frames and remove regulatory component space. We find hundreds of ‘deleted’ regions in Neanderthal and Denisova (including shared deletions), compared to the modern human reference. Because these deletions are polymorphic in modern human populations, they may represent regions of modern human-specific insertion, regions lost in archaic human lineages, or regions influenced by forces such as drift or selection.


What changes matter? A genomic approach to human evolution
Nicolas Rohner 1, Michael Zody2, David Reich1, Steven McCarroll1, Daniel Lieberman3, Clifford Tabin1
1Harvard Medical School, Boston, USA, 2Broad Institute of MIT and Harvard, Cambridge, USA, 3Harvard University, Cambridge, USA
We humans and our closest relatives, the chimpanzees, differ in only 1-2% of our genomes. Despite this genetic similarity we differ in many anatomical and behavioral traits. Upright walking and larger brains are just two prominent examples amongst many others that allowed us to adapt to new environments. Although full genome sequences are now available for humans, chimpanzees and other primates, surprisingly little is known about the genetic basis underlying these traits. One reason is that even within a 1-2% difference lie many genetic changes potentially driving human evolution. Because open-reading-frames of genes tend to be very similar between great apes, it has been argued that the majority of significant evolutionary changes involve cis-regulatory mutations. To identify regulatory changes specific to the human lineage we undertook a whole genome approach by aligning human, chimpanzee, macaque, and mouse genomes and focusing on conserved non-coding regions. We identified 298 human-specific deletions potentially removing cis-regulatory elements. We used a mouse transgenic approach to test if the deletions affect enhancer activity. Indeed, out of 12 tested elements, 4 showed tissue-specific expression at diverse developmental stages. We focused on two human-specific deletions for further study. The first removes an enhancer element near the gene OSR2, and its expression argues for a role in human palate, cranial base and jaw development. The second deletion removes a regulatory element in the gene ACVR2A. Its expression pattern and the phenotype of a full knockout of ACVR2A in mouse point to its role in the human-specific shortening of digits 2-5 and the smaller size of upper incisors in humans. We are currently mimicking the human situation by removing the corresponding piece in each of two different mouse models to test the ability to generate human-like phenotypes.


Reproducibility of ancient DNA methodologies within a single laboratory 
Eadaoin Harney 1 ,2, Susanne Nordenfelt2, Nadin Rohland2, David Reich1 ,2
1Howard Hughes Medical Institute, Boston, MA, USA, 2Harvard Medical School, Boston, MA, USA, 3Broad Institute of Harvard and MIT, Cambridge, MA, USA
Recent advances in DNA extraction and targeted (enrichment) capture methods make it possible to study whole mitochondrial genomes or subsets of the nuclear genomes of ancient samples with degraded and/or low levels of endogenous DNA. However, relatively little is published about the reproducibility of the data collected within, or even between, labs using the same methods. We report on the degree of reproducibility observed in replicate samples processed during ongoing screening of ancient skeletal remains in our own lab. We are focusing on ancient human bones and teeth (the most abundant type of ancient remains) dating from 3000-9000 years ago, which have each undergone multiple bone powder preparations, DNA extractions, and/or library preparations. As part of our screening, we enrich all libraries for the complete mitochondrial genome, and sequence the enriched and un-enriched libraries on a MiSeq Sequencer. We compare relevant metrics such as percent endogenous reads, contamination rate, and mitochondrial coverage at a fixed number of reads to assess the degree of reproducibility for these samples. An important finding of our work to date is that we obtain relatively little variability in terms of library preparation success (for example as measured by a variation in percentage of endogenous DNA of less than 5%) when applying identical protocols to the same bone powder. The findings of this ongoing study will shed light on the degree of reproducibility inherent in our laboratory’s ancient DNA processing, and may help to assess the degree of optimization of these screening methodologies.


Ancient DNA reveals the complex genetic history of the New World Arctic
Maanasa Raghavan 1, Pontus Skoglund2, Michael DeGiorgio6, Anders Albrechtsen4, Ida Moltke5, Helena Malmström2, M. Thomas P. Gilbert1, Mattias Jakobsson2, Rasmus Nielsen3, Eske Willerslev1
1Centre for GeoGenetics, Natural History Museum of Denmark, Copenhagen, Denmark, 2Uppsala University, Uppsala, Sweden, 3University of California - Berkeley, Berkeley, California, USA, 4University of Copenhagen, Copenhagen, Denmark, 5University of Chicago, Chicago, Illinois, USA, 6Pennsylvania State University, University Park, Pennsylvania, USA

The New World Arctic (North America and Greenland) was first occupied by modern humans around 5,000 years ago. The PaleoEskimos constituted the first two cultures to have peopled the region: the Pre-Dorset or Saqqaq culture (ca. 3000-800 BC) and the Dorset culture (ca. 800 BC-1300 AD). The NeoEskimos (Thule culture), who are considered to be ancestral to modern-day Inuit, were the latest migrants into the New World Arctic and spread eastwards from northern Alaska around 1000 AD. However, despite decades of archaeological research having established when the cultural transitions occurred, there is no consensus on how these people were related to one another and whether one or several gene pools were represented in these different Arctic traditions. We present results from an ongoing study comprising the largest genomic dataset generated thus far on ancient human samples from sites in Siberia, Alaska, Canada and Greenland. Our research contributes new perspectives to the debate of cultural versus genetic replacement in the New World Arctic and also evaluates the extent to which the PaleoEskimos and the NeoEskimos have shaped the genetic structure of modern populations in the region.



Bayesian methods for estimating homozygous tracking length distribution (HTLD) from single individuals
Xiaoqian Jiang, Michael Lynch
Indiana University, Bloomington, IN, USA
HTLD refers to the frequency distribution of the lengths of spans separating heterozygous sites, which harbors information on past population history. At high coverage and low error rate, the HTLD could be obtained by simply assessing consensus genotypes at each site. However, with most genome data, uncertainties will exist as to whether sites are homozygous or heterozygous. In this project, a Bayesian method has been developed for estimating the HTLD in an unbiased fashion. Genome-wide estimates of individual heterozygosity obtained from a likelihood method serve as the prior information for the Bayesian method, and an EM algorithm is then used to fill in the missing genome data. Compared to the previous arbitrary way of assigning zygosity to sites with missing data, this method could provide more accurate information on the HTLD. This more accurate HTLD allows further investigation into demographic history. In this project, further mathematical methods will be developed to re-infer the pattern of population history from both simulated data and individual genome sequences. Furthermore, we will compare the relative power of HTLD estimation and the correlation of heterozygosity in inferring information about population linkage-disequilibrium patterns.
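
The simple high-coverage case mentioned above, where consensus genotypes are trusted, can be sketched as follows. This is only the naive baseline, not the Bayesian/EM method of the abstract. The function names are hypothetical, and a span is assumed here to be the number of sites lying strictly between two consecutive heterozygous sites.

```python
from collections import Counter


def homozygous_tract_lengths(het_positions):
    """Lengths of spans between consecutive heterozygous sites.

    het_positions: iterable of integer coordinates of heterozygous sites
    on one chromosome (assumed input format).
    """
    positions = sorted(set(het_positions))
    # Number of sites strictly between each pair of neighbouring het sites.
    return [right - left - 1 for left, right in zip(positions, positions[1:])]


def tract_length_distribution(het_positions):
    """Relative frequency of each observed span length."""
    lengths = homozygous_tract_lengths(het_positions)
    if not lengths:
        return {}
    counts = Counter(lengths)
    total = len(lengths)
    return {length: count / total for length, count in sorted(counts.items())}
```

With uncertain genotype calls, a miscalled heterozygous site splits one long span into two short ones, and a missed one merges two spans, which is why this naive tally becomes biased at lower coverage.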


Predicting the discovery rate of genomic features
Simon Gravel
McGill University, Montreal, Qc, Canada
Successful sequencing experiments require judicious sample selection. However, this selection must often be performed on the basis of limited preliminary data. Predicting the statistical properties of the final sample based on preliminary data can be challenging, because numerous uncertain model assumptions may be involved.  Here, we ask whether we can predict "omics" variation across many samples by sequencing only a fraction of them. In the infinite-genome limit, we find that a pilot study sequencing 5% of a population is sufficient to predict the number of genetic variants in the entire population within 6% of the correct value, using an estimator agnostic to demography, selection, or population structure. To reach similar accuracy in a finite genome with millions of polymorphisms, the pilot study would require about 15% of the population.  We present computationally efficient jackknife and linear programming methods that exhibit substantially less bias than the state of the art when applied to simulated data and sub-sampled 1000 Genomes Project data. Extrapolating based on the NHLBI Exome Sequencing Project data, we predict that 7.2% of sites in the capture region would be variable in a sample of 50,000 African-Americans, and 8.8% in a European sample of equal size. Finally, we show how the linear programming method can also predict discovery rates of various genomic features, such as the number of transcription factor binding sites across different cell types.
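
For orientation, the general jackknife idea can be illustrated with the classic first-order jackknife used for species-richness estimation, applied here to variant counts. This is a textbook estimator chosen for illustration, not necessarily the estimator developed in the abstract. The function name and the input format (one carrier count per discovered variant) are assumptions.

```python
def first_order_jackknife(carrier_counts, n_samples):
    """Classic first-order jackknife estimate of the total number of variants.

    carrier_counts: for each variant site, how many of the n_samples
    sequenced individuals carry the variant (assumed input format).
    The estimate adds to the observed count a correction based on the
    number of variants seen in exactly one individual.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    observed = sum(1 for c in carrier_counts if c > 0)
    singletons = sum(1 for c in carrier_counts if c == 1)
    return observed + singletons * (n_samples - 1) / n_samples
```

The intuition is that variants seen in only one individual indicate how many more are likely to be just out of reach of the current sample; higher-order jackknives use doubletons and beyond in the same spirit.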


Genetic data supporting an East-South African migration associated with pastoralism
Gwenna Breton 1 ,2, Mattias Jakobsson1 ,4, Carina Schlebusch1, Himla Soodyall3
1Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden, 2Master Biosciences, École Normale Supérieure de Lyon, Lyon, France, 3Division of Human Genetics, School of Pathology, Faculty of Health Sciences, University of the Witwatersrand, and National Health Laboratory Service, Johannesburg, South Africa, 4Science for Life Laboratory, Uppsala University, Uppsala, Sweden
The ability to digest milk into adulthood, lactase persistence, is heterogeneously distributed. As an example, pastoralist populations often display higher frequencies of lactase persistence. Lactase persistence is considered adaptive in populations with pastoralist practices. The characterization of lactase persistence in southern Africa is poor. By sequencing a 360 bp region in southern Africans we characterized the lactase-persistence genotype of these groups; in order to confirm the results obtained on alleles' origin, we performed a genome-wide analysis of relationships to other groups.
We sequenced the LCT regulatory region in 267 individuals from 13 populations: 7 Khoe and San groups, the ancestral inhabitants of southern Africa, 3 Bantu-speaking groups and 3 groups with mixed ancestry. Those groups have diverse subsistence patterns. We then searched for signals of past East-South African admixture events using DNA chip data including many Eastern and Southern African populations as well as HapMap reference populations.
We found two previously described lactase persistence alleles in our sample: the European 13910C>T allele in individuals with recent European admixture and the East African 14010G>C allele in the Nama (at a frequency of 35.7% if recently admixed individuals are removed) and in other groups at lower frequency. The Nama are a Khoe group and are pastoralist. To learn about the origin of this variant in southern Africans, we analysed a 54.6 kb window of DNA chip data including the two SNPs. It showed that the 14010C allele in the Nama is on the same haplotype as in the East African Maasai; hence, we concluded that the allele appeared only once, likely in the Eastern Africans (greater frequencies), and was then brought to southern Africa. Thanks to an ADMIXTURE analysis we identified an East African component in several Khoe-San groups; again, the highest percentage of East African ancestry (~13%) is found in the Nama. This admixture event likely took place after the 14010C allele appeared in East Africans, i.e., ~3,000–7,000 years BP.
In a nutshell, we investigated an East-South African migration event combining information on a single trait and genome-wide data. This event explains the presence of an East African allele in Khoe-San groups. The groups with the largest East African component are the pastoralist groups, in which being able to digest milk is advantageous. Our findings provide new elements about ancestral migrations and spreading of pastoralism in Africa and complement conclusions of other fields, like archaeology.


Reducing pervasive false positive identical-by-descent segments detected by large-scale pedigree analysis
Eric Durand, Nicholas Eriksson, Cory McLean
23andMe, Inc., Mountain View, CA, USA
Analysis of genomic segments shared identical-by-descent (IBD) between individuals is fundamental to many genetic applications, from demographic inference to estimating the heritability of diseases. A large number of methods to detect IBD segments have been developed recently. However, IBD detection accuracy in non-simulated data is largely unknown. In principle, it can be evaluated using known pedigrees, as IBD segments are by definition inherited without recombination down a family tree. We extracted 25,432 genotyped European individuals containing 2,952 father-mother-child trios from the 23andMe, Inc. dataset. We then used GERMLINE, a widely used IBD detection method, to detect IBD segments within this cohort. Exploiting known familial relationships, we identified a false positive rate over 67% for 2–4 centiMorgan (cM) segments, in sharp contrast with accuracies reported in simulated data at these sizes. We show that nearly all false positives arise due to allowing switch errors between haplotypes when detecting IBD, a necessity for retrieving long (> 6 cM) segments in the presence of imperfect phasing. We introduce HaploScore, a novel, computationally efficient metric that enables detection and filtering of false positive IBD segments on population-scale datasets. HaploScore scores IBD segments proportional to the number of switch errors they contain. Thus, it enables filtering of spurious segments reported due to GERMLINE being overly permissive to imperfect phasing. We replicate the false IBD findings and demonstrate the generalizability of HaploScore to alternative genotyping arrays using an independent cohort of 555 European individuals from the 1000 Genomes project. HaploScore can be readily adapted to improve the accuracy of segments reported by any IBD detection method, provided that estimates of the genotyping error rate and switch error rate are available.
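
The idea of scoring a candidate segment by the switch errors it implies can be sketched with a small dynamic program over the four possible haplotype pairings of two phased individuals. This is a simplified illustration, not the published HaploScore definition: the function name, the unit penalties, and the normalisation by number of sites are all assumptions.

```python
from itertools import product


def switch_score(ind1, ind2, mismatch_penalty=1.0, switch_penalty=1.0):
    """Minimum penalty per site for a shared haplotype path between two people.

    ind1, ind2: each a pair of equal-length allele sequences (the two phased
    haplotypes of one individual over the candidate segment).
    A path picks one haplotype from each individual at every site; changing
    either pick between adjacent sites costs switch_penalty, and sites where
    the picked alleles differ cost mismatch_penalty.
    """
    n_sites = len(ind1[0])
    if n_sites == 0:
        raise ValueError("segment must contain at least one site")

    states = list(product((0, 1), repeat=2))  # (haplotype of ind1, of ind2)
    cost = {state: 0.0 for state in states}   # best cost ending in each state

    for site in range(n_sites):
        new_cost = {}
        for a, b in states:
            emit = 0.0 if ind1[a][site] == ind2[b][site] else mismatch_penalty
            if site == 0:
                best_prev = 0.0
            else:
                # Cheapest way to arrive, counting one switch per changed pick.
                best_prev = min(
                    cost[(pa, pb)] + switch_penalty * ((pa != a) + (pb != b))
                    for pa, pb in states
                )
            new_cost[(a, b)] = best_prev + emit
        cost = new_cost

    return min(cost.values()) / n_sites
```

Under these assumptions a segment shared cleanly on one haplotype pair scores zero, while a segment that can only be matched by hopping between haplotypes scores higher and could be filtered with a threshold.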


What can we learn from the methylation maps of ancient humans?
David Gokhman 1, Eitan Lavi1, Kay Prüfer2, Mario Fraga3, Jose Riancho4, Janet Kelso2, Svante Pääbo2, Eran Meshorer1, Liran Carmel1
1The Hebrew University of Jerusalem, Jerusalem, Israel, 2Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany, 3University of Oviedo and CNB-CSIC, Oviedo, Spain, 4University of Cantabria, Santander, Spain
We have previously presented the full reconstructed DNA methylation maps of the Neandertal and the Denisovan. Here, we use these maps to reveal trends in recent hominin epigenetic evolution. We show that the methylation patterns of transcription start sites (TSS) are the most conserved, and that the distance from the TSS correlates strongly with variation in methylation.
Additionally, we found several genes that are imprinted in present-day humans but are methylated in archaic humans. This includes H19, a gene that encodes a long non-coding RNA that is maternally imprinted in present-day humans. When imprinting is damaged, methylation of this gene causes the Beckwith-Wiedemann syndrome, whose symptoms include growth dysregulation, increased susceptibility to cancer and facial features such as a prominent lower jaw and midfacial hypoplasia. Unlike present-day humans and the Denisovan, in the Neandertal the promoter of H19, as well as the imprinting-control region (ICR), are both completely methylated. Methylation of the H19 promoter was previously shown to anti-correlate with its expression levels, suggesting that H19 might have had reduced activity in the Neandertal. Interestingly, H19 was also found to be differentially methylated in orangutans. This gene is one of several examples where altered methylation in present-day humans results in abnormal symptoms, whereas in the Neandertal, to our knowledge, the symptoms apparently did not manifest.
Another differentially methylated gene between archaic and modern humans is AUH. Defects in AUH are behind the methylglutaconic aciduria type I syndrome, whose symptoms include speech delay, poor articulation, and forgetfulness. This gene is unmethylated in present-day humans, but is methylated in archaic humans, suggesting differential regulation in both archaic humans. As this gene shows constant methylation levels across 25 human tissues, it is possible that these differences in methylation extend to the brain tissue as well.
Such trends in methylation shed light on the evolutionary constraints that are behind epigenetic regulation in the human lineage and on the mechanisms that lead to disease symptoms in one human group and to a completely healthy individual in another.


A model-based approach for identifying signatures of ancient balancing selection in genetic data
Michael DeGiorgio 1, Kirk Lohmueller2, Rasmus Nielsen3
1Pennsylvania State University, University Park, PA, USA, 2University of California, Los Angeles, Los Angeles, CA, USA, 3University of California, Berkeley, Berkeley, CA, USA
While much effort has focused on detecting positive and negative directional selection in the human genome, relatively little work has been devoted to balancing selection. This lack of attention is likely due to the paucity of sophisticated methods for identifying sites under balancing selection. We designed the first set of likelihood-based methods that explicitly model the genealogical process under balancing selection using a coalescent framework. Simulation results show that our methods for detecting balancing selection vastly outperform previous approaches based on summary statistics and are robust to demography. We apply the new methods to whole-genome sequencing data from humans, and find a number of previously-identified loci with strong evidence of balancing selection, including various HLA genes. Additionally, we find evidence for many novel candidates, the strongest of which is FANK1, an imprinted gene that suppresses apoptosis, is expressed during meiosis in males, and displays marginal signs of segregation distortion. We hypothesize that balancing selection acts on this locus to stabilize the segregation distortion and negative fitness effects of the distorter allele. Not only are our methods for identifying signatures of balancing selection the most powerful developed to date, but they can also be applied to any organism with polymorphism data and an outgroup sequence. As such, we expect that our methods will be widely used by the genomics community to uncover the potentially numerous genomic regions that are under balancing selection in many non-human species.


Ancient DNA and the population history of pre-Columbian Puerto Rico
Maria A Nieves-Colon 1, William J Pestle2, Anne C. Stone1
1Arizona State University, Tempe, AZ, USA, 2University of Miami, Coral Gables, FL, USA
The population history of the Caribbean has been recently studied through the use of large scale genome-wide studies on modern populations. However, there are inherent limitations to the use of modern data for making inferences about past population processes. Ancient DNA may be a useful tool for elucidating the contributions of indigenous pre-Columbian populations to the genomes of contemporary, highly admixed Caribbean islanders, as well as for studying the history of ancient Amerindian peoples in the Caribbean basin.
We present the results of pilot research focused on retrieving ancient DNA from human skeletal remains from Puerto Rico. We performed DNA extraction on 43 individuals from three pre-Columbian Puerto Rican sites dated between 590 and 1280 cal AD. We tested our extracts for the presence of ancient DNA through PCR amplification of an 80 bp fragment of mitochondrial DNA (mtDNA). This preliminary assessment indicates that 42% (n=18) of our samples have amplifiable mtDNA.
However, extensive DNA fragmentation and degradation may affect amplification efficiency in these samples. With the aim of overcoming these issues, we converted 18 of our extracts into sequencing libraries, and enriched them by targeting complete mitochondrial genomes. Preliminary quality assessments with fragment analysis and quantitative PCR methods suggest that we have successfully captured ancient mtDNA in no less than nine of our sequencing libraries.
The recovery of complete mtDNA genomes from these individuals will allow us to begin to characterize the genetic diversity and population history of a pre-Columbian Antillean population. These data may also be used to help estimate the contribution that these ancient groups played in shaping the genetic ancestry of modern Puerto Ricans.


A genomic study of the contribution of DNA methylation to regulatory evolution in primates
Julien Roux1, Irene Hernando-Herraez 2, Nicholas Banovich1, Claudia Chavarria1, Amy Mitrano1, Jonathan Pritchard3 ,4, Tomas Marques-Bonet2, Yoav Gilad1
1University of Chicago, Chicago, USA, 2Institute of Evolutionary Biology (UPF-CSIC), PRBB, Barcelona, Spain, 3Howard Hughes Medical Institute, Stanford University, Stanford, USA, 4Departments of Biology and Genetics, Stanford University, Stanford, USA
A long-standing hypothesis is that changes in gene regulation play an important role in adaptive evolution, notably in primates. Yet, in spite of the evidence accumulated in the past decade that regulatory changes contribute to many species-specific adaptations, we still know remarkably little about the mechanisms of regulatory evolution. In this study we focused on DNA methylation, an epigenetic mechanism whose contribution to the evolution of gene expression remains unclear. To interrogate the methylation status of the vast majority of cytosines in the genome, we performed whole-genome bisulfite conversion followed by high-throughput sequencing across 4 tissues (heart, kidney, liver and lung) in 3 primate species (human, chimpanzee and macaque). Because the 4 tissues are from the same individuals, we are able to monitor methylation differences between individuals, tissues and species. In parallel, we collected gene expression profiles using RNA-seq from the same tissue samples, allowing us to perform a high resolution scan for genes and pathways whose regulation evolved under natural selection. We integrated these datasets to characterize better the genome features whose methylation status leads to expression changes, and we developed a statistical model to quantify the proportion of variation in gene expression levels across tissues and species which can be explained by changes in methylation. Globally, our study leads to a better understanding of the molecular basis for regulatory changes and adaptations in primates.


Inferring African population structure and the dynamics of the Out-of-Africa event
Shyam Gopalakrishnan 1, Paul Grabowski1, Michael Turchin1, Brenna Henn6, Jeff Kidd4, George Perry2, Cynthia Beall3, A Gebremedhin5, Carlos Bustamante7, Anna Di Rienzo1, Yoav Gilad1, Abraham Palmer1, Jonathan Pritchard7
1University of Chicago, Chicago, IL, USA, 2Penn State University, University Park, PA, USA, 3Case Western Reserve University, Cleveland, OH, USA, 4University of Michigan, Ann Arbor, MI, USA, 5Addis Ababa University, Addis Ababa, Ethiopia, 6SUNY Stony Brook, Stony Brook, NY, USA, 7Stanford University, Stanford, CA, USA
Human population history is an intriguing and complex story consisting of many events such as population growth, bottlenecks, time-dependent and non-homogeneous migration, population splits and admixture. Estimating complex demographies with a large number of dependent parameters, such as split times, gene flow rates and changing population sizes, has proven especially challenging. Here we propose a framework for jointly estimating the demography of a large number of populations, especially the gene-flow rates and split times between them. We use coalescent rate estimates obtained from the Pairwise Sequentially Markovian Coalescent (PSMC) as a starting point for our analysis. We obtain the coalescent rates for each pair of sampled populations by applying PSMC to each pair of samples. Using a mathematical model for calculating coalescent probabilities given the demography, we estimate the demography using the parameters that best fit the observed coalescent rates obtained from PSMC.
In this study, we focus on African demography, specifically the population structure in Africa going back in time and the dynamics of the Out-of-Africa event. To address these questions, we assembled a dataset of whole genome sequences from 162 individuals, combining in-house sequencing with publicly available sources such as the 1000 Genomes Project. These samples span twenty-two populations worldwide, including eleven African populations which we use to examine population substructure in Africa. In addition, we have 2 Middle Eastern, 5 European and 4 East/Central Asian populations, which allow us to estimate the timing of the Out-of-Africa event and the European-Asian split.
We find extensive population structure in Africa extending back to before the Out-of-Africa event. The Ethiopian populations show gene flow back from 15kya, with the Maasai and Luhya merging with the east African populations ~40kya. We find evidence for extensive mixing between east and west African populations beginning 50kya. Among the pygmy populations, we see recent gene flow between the Batwa and Mbuti. All the African populations except for the San merge into a single population around 100 kya. The San exchange migrants with the other African populations starting ~120 kya. We estimate the Out-of-Africa event to have occurred ~75kya and the European-Asian split ~25kya. Our findings also suggest a period of sustained gene flow between East African and Middle Eastern populations after the Out-of-Africa event.


Fast, scalable and distributed dimensionality reduction of genome-wide data 
Suyash Shringarpure, Carlos Bustamante
Stanford University, Stanford, CA, USA
The increasing size of genomic datasets, especially for genome-wide association studies (GWAS), presents significant analytical and computational challenges.  Dimensionality reduction methods such as principal components analysis (PCA) and model-based ancestry inference are used to obtain informative summaries of genome-wide data that can be used in GWAS. However, existing methods require considerable computational time to analyze genomic datasets with tens (or hundreds) of thousands of individuals genotyped at hundreds of thousands (or millions) of  single nucleotide polymorphisms (SNPs).

We propose random projections as a fast and scalable way of performing dimensionality reduction of large genome-wide SNP datasets. With a sparse implementation, we show that projections can be computed in time linear in the size of the dataset. Using 20,000 individuals simulated from the HapMap Phase 3 CEU, ASW, CHB and YRI populations at 365,466 SNPs, we show that the projected individuals can be used to (a) perform PCA, (b) accelerate convergence of model-based ancestry inference, and (c) compute identity-by-state distance. These projections have the following properties: (a) by definition, the projection directions are independent of the data and hence are robust to outliers; (b) existing projections do not need to be recomputed if individuals are added to or removed from the dataset; (c) the theoretical upper bound on the number of projections required to summarize a dataset is nearly independent of the number of SNPs in the dataset and is proportional to the logarithm of the number of individuals in the dataset. In addition, for large GWAS, where sequencing/genotyping data may be distributed across multiple physical locations, random projections can be computed and shared instead of sharing the genotype data directly. This can reduce data sharing requirements by one to two orders of magnitude.
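
As a rough illustration of the idea, and not the authors' implementation, the sketch below projects toy genotype vectors with a sparse random sign matrix and checks that pairwise Euclidean distances are approximately preserved. The Achlioptas-style entries (+1 or -1 with probability 1/6 each, 0 otherwise), the toy data, and all function names are assumptions made for this example; the abstract does not specify the projection scheme.

```python
import math
import random


def sparse_sign_columns(n_snps, n_proj, rng):
    """Draw a sparse random projection, stored as one list of
    (snp_index, sign) pairs per projection direction.

    Entries are +1 or -1 with probability 1/6 each and 0 otherwise
    (an Achlioptas-style matrix, assumed here for illustration).
    """
    columns = []
    for _ in range(n_proj):
        column = []
        for j in range(n_snps):
            u = rng.random()
            if u < 1.0 / 6.0:
                column.append((j, 1.0))
            elif u < 1.0 / 3.0:
                column.append((j, -1.0))
        columns.append(column)
    return columns


def project(genotypes, columns):
    """Project each individual's genotype vector (0/1/2 allele counts)."""
    # The sqrt(3 / k) scaling keeps squared distances unbiased for this scheme.
    scale = math.sqrt(3.0 / len(columns))
    return [
        [scale * sum(sign * row[j] for j, sign in column) for column in columns]
        for row in genotypes
    ]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


rng = random.Random(0)
n_individuals, n_snps, n_proj = 6, 1000, 200

# Toy genotypes: two allele draws per SNP at a per-SNP allele frequency.
freqs = [rng.uniform(0.05, 0.95) for _ in range(n_snps)]
genotypes = [
    [(rng.random() < p) + (rng.random() < p) for p in freqs]
    for _ in range(n_individuals)
]

# The projection directions are drawn without looking at the data.
columns = sparse_sign_columns(n_snps, n_proj, rng)
projected = project(genotypes, columns)

# Distances before and after projection should be close to each other.
ratio = euclidean(projected[0], projected[1]) / euclidean(genotypes[0], genotypes[1])
print(f"distance ratio after projection: {ratio:.2f}")
```

Because the directions do not depend on the data, a newly added individual can be projected with the same `columns` without recomputing the existing projections, which is the property the abstract highlights.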



"Genetic Snapshot of "Palaeoamerican Relicts": a characterisation of Fuegan and Pericu populations"
Cristina Valdiosera1 ,2, María C. Ávila-Arcos 1 ,3, Pontus Skoglund5, Andres Moreno-Estrada3, Ricardo Rodríguez4, Helena Malmström5, Josefina Mansilla6, Morten Allentoft1, Maanasa Raghavan1, Andaine Orlando1, Ilán Leboreiro6, José Luis Vera6, Christoph P. E. Zollikofer7, Marcia S. Ponce de Leon7, Colin Smith2, Carlos Bustamante3, Evelyne Heyer8, Mattias Hakobsson5, Eske Willerslev1
1Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark, 2Department of Archaeology, La Trobe University, Melbourne, Australia, 3Center for Computational, Evolutionary and Human Genomics, Stanford University, Stanford, USA, 4Centro de Investigación Sobre la Evolución y Comportamiento Humanos, Universidad Complutense de Madrid, Madrid, Spain, 5Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden, 6Instituto Nacional de Antropología e Historia, Mexico City, Mexico, 7Anthropological Institute, University of Zurich, Zurich, Switzerland, 8Laboratoire Eco-Anthropologie et Ethnobiologie, Muséum National d'Histoire Naturelle, Centre National de la Recherche Scientifique, Université Paris 7, Heyer, France

Although multiple lines of evidence support the notion that all extant Amerindian populations are of Asian origin, the number of migrations and source populations that gave rise to the first inhabitants of the New World is still contentious. Based on the significant craniofacial discontinuity between the Pleistocene (Paleoamerican) and Holocene (Amerindian) populations, it has been suggested that the Americas were populated twice, from different Asian sources. Under this assumption, a first migration wave originating from Southeast Asia gave rise to the Paleoamericans, whereas all modern Amerindian groups would derive from a second wave of migration originating in Northeast Asia. Pericues in Baja California, Mexico, and the southernmost populations of Patagonia and Tierra del Fuego display Paleoamerican craniofacial traits, leading some researchers to suggest that these are a temporal extension of the first colonizers of the Americas. We have shotgun sequenced DNA from skeletal remains of Pericues and Fuegans to assess their genetic affinity to modern populations.


Inferring the effects of genetic variants on gene expression and splicing
Nilah Ioannidis, Alexis Battle, Stephen Montgomery, Weiva Sieh, Alice Whittemore, Carlos Bustamante
Stanford University, Stanford, CA, USA
Whole-genome and whole-exome sequencing technologies are increasingly enabling studies of genetic variation in large numbers of healthy and diseased individuals; however, interpreting the clinical significance of the many genetic variants identified in these studies remains a critical challenge. This task is particularly challenging in the case of rare or novel variants that have no effect on protein structure, such as noncoding, intronic, and synonymous variants. Here we develop a method to interpret such variants according to their predicted regulatory impact on gene expression and splicing, based on the hypothesis that clinically significant variants that do not affect protein structure are likely to affect cellular function via expression regulation. We develop a predictive model for the regulatory effects of genetic variants by training random forest-based learners to recognize cis-expression quantitative trait loci (eQTLs) and splicing quantitative trait loci (sQTLs) discovered by the Geuvadis consortium based on RNA-sequencing of lymphoblastoid cell lines from individuals in the 1000 Genomes Project [Lappalainen et al, Nature 2013]. Our model uses genomic features of each variant including its position relative to the transcription start site and nearby splice sites, conservation, overlapping functional elements from ENCODE and Ensembl, and position within these functional elements. We validate the model on additional eQTL and sQTL datasets and characterize its performance on known pathogenic noncoding, intronic, and synonymous variants, which are expected to be enriched for predicted regulatory effects. We anticipate that this regulatory effects predictive model will be useful in future studies characterizing regulatory variation within the genome and for prioritizing the likely clinical significance of rare and novel genetic variants identified in large-scale clinical sequencing studies.



 
Initial results from over 400 high coverage complete human genome sequences from ca. 130 populations of predominantly Eurasian origin.
Mait Metspalu 1
1Evolutionary biology group, Estonian Biocentre, Tartu, Estonia, 2Department of Evolutionary Biology, Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia, 3Estonian Genome Center, University of Tartu, Tartu, Estonia, 4Department of Bioinformatics, Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia, 5Department of Biological Anthropology, University of Cambridge, Cambridge, UK, 6Department of Integrative Biology, University of California Berkeley, Berkeley, USA, 7Human Genetics Group, Institute of Molecular Biology, National Academy of Sciences, Yerevan, Armenia, 8Institute for Genetic Engineering and Biotechnology, Sarajevo, Bosnia and Herzegovina, 9Institute of Biochemistry and Genetics, Ufa Research Center, Russian Academy of Sciences, Russia, 10Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia, 11Institute of Internal Medicine, Siberian Branch of Russian Academy of Medical Sciences, Novosibirsk, Russia, 12Laboratory of Molecular Biology, North-Eastern Federal University, Yakutsk, Russia, 13Institute of Genetics and Cytology, National Academy of Sciences, Minsk, Belarus, 14Laboratoire d'Anthropologie Moléculaire et Imagerie de Synthèse, Centre National de la Recherche Scientifique, Université de Toulouse, Toulouse, France, 15Research Centre for Medical Genetics, Russian Academy of Medical Sciences, Moscow, Russia, 16Research Department of Genetics, Evolution and Environment, University College London, London, UK, 17Center for GeoGenetics, University of Copenhagen, Copenhagen, Denmark
Complete high coverage individual genome sequences carry the maximum amount of information for reconstructing the evolutionary past of a species in the interplay between random genetic drift and natural selection. Here we present a novel dataset of over 400 human genomes sequenced at 40X on the same platform (Complete Genomics) and processed with uniform bioinformatic pipelines. Based on SNP-chip data we generally chose three samples to represent each population of interest. We cover a wide range of mostly Eurasian populations with additional populations from Oceania, South America and Africa.

We present here initial results from population genetic analyses on the data.
We use recently developed methods based on length distributions of shared genomic segments to estimate the dynamics of past effective population sizes of regional populations, population split times and subsequent admixture events between various Eurasian population pairs.
We map the geographic and temporal variation of Neanderthal and Denisova introgression among different Eurasian populations.
For Y chromosome data we determined the regions of highest mapping quality and applied phylogenetic methods to determine the order and temporal dynamics of branching events in non-African Y chromosome haplogroups. We show that the relatively short branch lengths distinguishing continental non-African populations are consistent with the model of a rapid initial colonization of Eurasia and Oceania.



Anthropometric trait variation among diverse African populations: deviations from drift
Matthew Hansen, Joseph Lachance, Sameer Soi, Laura Scheinfeldt, Alessia Ranciaro, Simon Thompson, Jibril Hirbo, Sarah Tishkoff
University of Pennsylvania, Philadelphia, PA, USA
The African continent contains an immense amount of phenotypic variation, which is commonly attributed to adaptation to a wide range of ecological habitats and lifestyles. There has been intense debate as to the relative extent to which neutral genetic drift and natural selection have shaped the human genome. Although a number of adaptive genes have been identified, relatively little is known about whether particular phenotypic traits are adaptive. Here, we use Pst-Fst comparisons to investigate the degree to which human phenotypic variation differs from that expected by neutral genetic drift. Populations analyzed include hunter-gatherers, pastoralists, and agriculturalists from eastern and western sub-Saharan Africa, representing a wide range of lifestyles and ecological habitats. Sample phenotypes involve multiple health and general lifestyle related traits, including weight, BMI, grip strength, blood pressure, lactase response, and glucose levels. For each population we calculated the amount of phenotypic variance among populations relative to the total amount of phenotypic variance in the trait (Pst). We genotyped nearly 700 study participants from over 40 populations using the Illumina 1M-Duo SNP array and calculated Fst between all pairs of populations. For each trait, at least 17 populations had both phenotype and genotype data, resulting in at least 136 pairwise comparisons per trait. Deviations from expected neutral phenotypic drift were analyzed in a Pst-Fst framework over the set of all population pairs. Adaptive traits result in phenotypic distances between populations that exceed genetic distances between populations (Pst >> Fst), and these traits are good candidates for follow-up selection studies and QTL mapping. These comparisons allowed us to identify adaptive traits on both a population and a continental scale. In addition, a PCA of correlated phenotypes was performed to identify trait combinations with orthogonal variance contributions.
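
The comparison count and the Pst statistic can be made concrete with a small sketch. This is not the authors' pipeline: it assumes the commonly used form Pst = (c/h²)·σ²_between / ((c/h²)·σ²_between + 2·σ²_within), sets the unknown scaling terms `c` and `h2` to 1 as placeholders, estimates the variance components naively from two samples, and uses made-up trait values and a made-up Fst.

```python
import math
import statistics


def pairwise_pst(trait_a, trait_b, c=1.0, h2=1.0):
    """Pst for one trait and one pair of populations.

    Assumed form: (c/h2) * var_between / ((c/h2) * var_between + 2 * var_within).
    c and h2 are unknown in practice; 1.0 is a placeholder assumption.
    """
    # Variance among the two population means.
    var_between = statistics.variance(
        [statistics.mean(trait_a), statistics.mean(trait_b)]
    )
    # Average of the two within-population sample variances.
    var_within = statistics.mean(
        [statistics.variance(trait_a), statistics.variance(trait_b)]
    )
    scaled = (c / h2) * var_between
    return scaled / (scaled + 2.0 * var_within)


# 17 populations give C(17, 2) population pairs per trait.
n_pairs = math.comb(17, 2)
print(n_pairs)  # 136

# Hypothetical trait values (arbitrary units) for two populations.
pop_a = [1.0, 2.0, 3.0]
pop_b = [11.0, 12.0, 13.0]
pst = pairwise_pst(pop_a, pop_b)
fst = 0.05  # hypothetical neutral differentiation for the same pair
print(pst > fst)  # True: phenotypic divergence exceeds the neutral expectation
```

In a real analysis the variance components would come from a proper model of the phenotype data, and the sensitivity of the conclusion to the assumed `c / h2` ratio would need to be checked.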


Integrative Genomic Studies of Evolution and Adaptation in Africa
Sarah Tishkoff
Departments of Genetics and Biology, University of Pennsylvania, Philadelphia, PA, USA

Africa is thought to be the ancestral homeland of all modern human populations.  It is also a region of tremendous cultural, linguistic, climatic, and genetic diversity.   Despite the important role that African populations have played in human history, they remain one of the most underrepresented groups in human genomics studies. A comprehensive knowledge of patterns of variation in African genomes is critical for a deeper understanding of human genomic diversity, the identification of functionally important genetic variation, the genetic basis of adaptation to diverse environments and diets, and the origins of modern humans. Furthermore, a deeper understanding of African genomic variation will provide the necessary foundation for powerful and efficient genome-wide association and systems biology studies to identify coding and regulatory variants that play a role in phenotypic variation including disease susceptibility. We have used whole genome SNP genotyping and high coverage sequencing analyses to characterize patterns of genomic variation, ancestry, and local adaptation across ethnically and geographically diverse African populations.   We have identified candidate loci that play a role in adaptation to infectious disease, diet and high altitude, as well as the short stature trait in African Pygmies.   Additionally, our studies shed light on human evolutionary history and African population history.


Origin and Adaptive Evolution of Allelic Variation at TAS2R16 Associated with Salicin Bitter Taste Sensitivity in Africa
Michael Campbell 1, Alessia Ranciaro1, Daniel Zinshteyn2, Renata Rawlings-Goss1, Jibril Hirbo1 ,3, Simon Thompson1, Dawit Woldemeskel1 ,4, Alain Froment5, Joseph Rucker6, Sabah Omar7, Jean-Marie Bodo8, Thomas Nyambo9, Gurja Belay4, Dennis Drayna10, Paul Breslin11 ,12, Sarah Tishkoff1
1University of Pennsylvania, Philadelphia, PA, USA, 2Cornell University, Ithaca, NY, USA, 3Vanderbilt University, Nashville, TN, USA, 4Addis Ababa University, Addis Ababa, Ethiopia, 5Musee de L’Homme, Paris, France, 6Integral Molecular, Philadelphia, PA, USA, 7Kenya Medical Research Institute, Nairobi, Kenya, 8Ministry of Scientific Research and Innovation, Yaounde, Cameroon, 9Muhimbili University of Health and Allied Sciences, Dar es Salaam, Tanzania, 10National Institute on Deafness and Other Communication Disorders, NIH, Rockville, MD, USA, 11Monell Chemical Senses Center, Philadelphia, PA, USA, 12Rutgers University, New Brunswick, NJ, USA
Bitter taste perception influences human health and nutrition, and the genetic variation underlying this trait is thought to play a role in disease susceptibility. To better understand the genetic architecture and patterns of phenotypic variability of bitter taste perception, we examined genotype and sequence data in the promoter and coding regions of TAS2R16, a bitter taste receptor gene, in ~600 individuals from 74 African populations in West Central, Central and East Africa. We also performed genotype-phenotype association analyses of threshold levels of sensitivity to salicin, a bitter anti-inflammatory compound, in a subset of ~300 individuals from the above populations in Africa. In addition, we characterized TAS2R16 mutants in vitro to investigate the effects of polymorphic loci identified at this locus on receptor function. Here, we report striking signatures of positive selection in the coding region of TAS2R16, including significant Fay and Wu's H statistics predominantly in East Africa, indicating strong local adaptation and greater genetic structure among African populations than expected under neutrality. Furthermore, we observed a "star-like" phylogeny for haplotypes with the derived allele at polymorphic site 516 associated with increased bitter taste perception that is consistent with a model of selection for "high-sensitivity" variation. In contrast, haplotypes carrying the "low-sensitivity" ancestral allele at site 516 showed evidence of strong purifying selection. However, we did not observe signals of selection in the TAS2R16 promoter. We also demonstrated, for the first time, the functional effect of nonsynonymous variation at site 516 on salicin phenotypic variance in vivo in diverse Africans and showed that variability at this site is strongly correlated with cell surface expression of the TAS2R16 receptor in vitro, suggesting a molecular basis for differences in salicin bitter taste recognition.
In contrast, however, we did not detect a significant association between genetic variability in the TAS2R16 promoter and salicin bitter taste perception, indicating that allelic variation in the coding exon mainly influences bitter taste sensitivity. Additionally, we detected geographic differences in levels of bitter taste perception in Africa not previously reported and infer an East African origin for high salicin sensitivity in human populations. Overall, this study correlates genetic variants that are targets of selection with phenotypic variability, demonstrating the connection between functional variation and local adaptation at a medically-relevant locus in humans.


Archaeogenetics: Bone to Biomolecule, a study of an Early Medieval Population in Ynys Gybi, Cymru
Ashley Matchett
Inter American University of Puerto Rico, Bayamon, Puerto Rico
The archaeological excavation of the early medieval site at Towyn-Y-Capel on the island of Anglesey (Ynys Môn) in North Wales, UK, provided the opportunity to study a large population (122 skeletons) at a site that was in use over a period of up to 550 years (650-1200 AD). A multidisciplinary study, ranging from morphology to molecular chemistry and biology, was performed on the skeletal collection to assess and screen samples for later genetic analysis.
Post-sampling, the assessment of skeletal sample condition was used to select material for genetic analysis, and 44% of the skeletal population was selected. The morphology of samples was assessed and 87% of bones and teeth were considered to be in good or fair condition. A novel technique, Qualitative Light Fluorescence, was also used to compare the teeth to modern standards, showing a loss of 21.8% in fluorescence and indicating inorganic degradation. Histological sections taken from non-human bone finds from the site generally varied less than indicated by the gross morphology, showing good to excellent preservation.
Well preserved skeletal samples were selected for detailed investigations into their biological and chemical condition, principally through amino acid racemisation, amino acid composition and heavy metal analysis. All samples tested had a D/L aspartic acid ratio of less than 0.1, although 50% of the samples had a ratio over 0.08, which indicated that the recovery of DNA from these skeletal samples was feasible, although the DNA would be degraded. The element profiles showed no discernible anomalies, either due to diet or diagenesis. To consolidate genographic research, strontium isotope analysis was performed on a small population subset and showed three anomalous ratios, which indicated widespread contact in North Atlantic Europe and unexpected residence patterns.
DNA recovery was more successful in teeth than in bones. Amplification over several rounds using various primers specific for human HV1 & 2 mtDNA was conducted. Of all the samples only 14.8% of the skeletal teeth samples were amplified, although over 90% of the screened samples were amplified and sequenced. DNA spiking trials demonstrated that some of the samples were affected by inhibition and poor template condition as validated by sequencing. Independent confirmation of successful samples was attained by sequencing, although sequences were highly degraded. Haplogroup identification was based on the sequenced HV1 sections and on likelihood. Generally, the site showed a high predominance of haplogroup K (5), followed by H (2) and U (2) haplogroup profiles.


























