People of the British Isles: An analysis of the genetic contributions of European populations to a UK control population. S. Leslie1, B. Winney2, G. Hellenthal3, S. Myers4, P. Donnelly3, W. Bodmer2
1) Statistical Genetics, Murdoch Childrens Research Institute,
Melbourne, Australia; 2) Department of Oncology, University of Oxford,
UK; 3) The Wellcome Trust Centre for Human Genetics, University of
Oxford, UK; 4) Department of Statistics, University of Oxford, UK.
There is much interest in fine scale population structure in the UK,
as a signature of historical migration events and because of the effect
population structure may have on disease association studies. Population
structure appears to have a minor impact on the current generation of
genome-wide association studies, but will probably be important for the
next generation of studies seeking associations to rare variants.
Furthermore there is great interest in understanding where the British
people came from. Thus far genetic studies have been limited to a small
number of markers or to samples not collected to specifically address
these questions. A natural method for understanding population structure
is to control and document carefully the provenance of samples. We
describe the collection of a cohort of rural UK samples (The People of
the British Isles), aimed at providing a well-characterised UK control
population. This will be a resource for the research community and will
also provide fine-scale genetic information on the history of the British.
Using a novel clustering algorithm, approximately 2000 samples were
clustered purely as a function of genetic similarity, without reference
to their known sampling locations. When each individual is plotted on a
UK map, there is a striking association between inferred clusters and
geography, reflecting to a major extent the known history of the British
peoples. A similar analysis is performed on samples from different
parts of Europe. Using the European samples as ‘source populations’ we
apply a novel algorithm to determine the proportion of the genomes
within each of the derived British clusters that are most closely
related to each of the source populations. Thus we can observe the
relative contribution (under our model) of each of these European
populations to the genomes of samples in different regions of Britain.
Our results strikingly reflect much of the known historical and
archaeological record while raising some important questions and perhaps
answering others. We believe this is the first detailed analysis of
very fine-scale genetic structure and its origin in a population of very
similar humans. This has been achieved through both a careful sampling
strategy and an approach to analysis that accounts for linkage
disequilibrium.
Estimating and Interpreting FST: the Impact of Rare Variants. G. Bhatia1,2, N. Patterson2, S. Sankararaman2,5, A. L. Price2,3,4
1) Harvard-Massachusetts Institute of Technology (MIT) Division of
Health, Science and Technology, Cambridge, MA; 2) Broad Institute of
Harvard and MIT, Cambridge, MA; 3) Department of Epidemiology, Harvard
School of Public Health, Boston, MA; 4) Department of Biostatistics,
Harvard School of Public Health, Boston, MA; 5) Department of Genetics,
Harvard Medical School, Boston, MA.
FST is a widely used tool for studying population
structure, but many different definitions, estimation methods and
interpretations exist in the literature. Thus, wide variation in
published estimates of FST is important to understand. For example, the FST
between European (CEU) and East Asian (CHB) populations is 0.111 when
estimated from HapMap3 data, but only 0.052 when estimated from 1000
Genomes data (1kG). While changes in FST from
sequencing data might be expected from including rare variants, we show
that the difference arises largely from bias introduced by the estimation
method rather than from population genetic factors. We describe a method that is shown
to avoid these biases. We consider three specific aspects of estimation:
(1) defining FST for a single SNP, (2) combining estimates of FST
across multiple SNPs, and (3) selecting the set of SNPs used in the
computation. Correcting for differences in each of these aspects of
estimation yields estimates of FST that are much more concordant between genotype and sequence data. For example, our estimate of FST
between CEU and CHB from 1kG is 0.106, only slightly lower than the
HapMap3 estimate. This decrease is due to ascertainment bias of SNPs
included in the HapMap3 project, not to properties of rare variants. In
general, FST at rare variants in a population will be
sensitive to demographic events affecting that population. When
comparing CEU to CHB, for example, we show that rare variants in CEU and
CHB have higher FST than common variants. This is consistent with the influence of strong bottlenecks on FST at rare variants. We note that ascertainment in an out-group—for example, Yoruba (YRI)—will remove this frequency dependence of FST. Finally, we show that single-SNP estimates of FST
based on a common definition (Weir and Cockerham 1984) can become
inflated in a setting of very different sample sizes. This inflation can
result in false-positive signals of natural selection. Indeed, we show
that in a recent study of selection that compared 1,890 African-American
and 113 YRI samples (Jin et al. 2011), FST estimates
at 9 of the 10 reported novel loci are inflated by the disparity in
sample size, and, after correction, only 7 of these 10 loci remain
nominally significant. This suggests that caution is warranted when
using this definition to rank single-SNP estimates of FST. Our results indicate that a careful protocol is needed for producing FST estimates. We provide such a protocol.
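The three aspects of estimation listed above can be illustrated with a minimal sketch. The abstract does not name the estimator used in its protocol; the code below assumes Hudson's estimator purely as an example, and the function names and the `(p1, n1, p2, n2)` input layout are hypothetical. It contrasts combining SNPs as a ratio of averages with an average of per-SNP ratios, which can differ sharply once rare variants are included.

```python
from typing import List, Tuple

# One SNP: (freq in pop 1, sample size pop 1, freq in pop 2, sample size pop 2).
# Sample sizes are taken here as numbers of sampled chromosomes (an assumption).
Snp = Tuple[float, int, float, int]


def hudson_terms(p1: float, n1: int, p2: float, n2: int) -> Tuple[float, float]:
    """Return (numerator, denominator) of Hudson's FST estimator for one SNP."""
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num, den


def fst_ratio_of_averages(snps: List[Snp]) -> float:
    """Combine SNPs by summing numerators and denominators before dividing."""
    terms = [hudson_terms(*s) for s in snps]
    return sum(n for n, _ in terms) / sum(d for _, d in terms)


def fst_average_of_ratios(snps: List[Snp]) -> float:
    """Combine SNPs by averaging per-SNP ratios (skips monomorphic SNPs)."""
    ratios = [n / d for n, d in (hudson_terms(*s) for s in snps) if d > 0]
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    # Made-up frequencies: one common, strongly differentiated SNP and one rare SNP.
    example = [(0.9, 200, 0.1, 200), (0.01, 200, 0.0, 200)]
    print(fst_ratio_of_averages(example))
    print(fst_average_of_ratios(example))
```

In this toy input the rare SNP contributes a near-zero ratio, so the average of ratios falls well below the ratio of averages. This is one way that adding rare variants can shift a genome-wide estimate without any change in the underlying populations.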
High Exome Mutational Burden in 58 African Americans with Persistent Extreme Blood Pressure. KD. H. Nguyen1, A. C. Morrison2, A. Li2, R. Gibbs3, E. Boerwinkle2, A. Chakravarti1
1) Center for Complex Disease Genomics, McKusick-Nathans Institute of
Genetic Medicine, Johns Hopkins University School of Medicine,
Baltimore, MD, USA; 2) Human Genetics Center, School of Public Health,
University of Texas Health Science Center at Houston, Houston, TX, USA;
3) Human Genome Sequencing Center, Baylor College of Medicine, Houston,
TX, USA.
High blood pressure (BP) is a major cardiovascular risk factor in
African Americans (AA). Despite its modest heritability (35%), ~63 BP
loci have been implicated by genome-wide association studies in European
and African ancestry samples. We explored exome sequencing in 58
African Americans (AA) at the extremes of BP distribution across
multiple visits in the Atherosclerosis Risk in Communities study
(~1st and 99th percentile residuals of the baseline age- and sex-corrected
systolic BP) to demonstrate the enrichment of deleterious mutations
genome-wide and to identify novel genes. We identified 67,298 high
quality coding/splicing variants (≥10X coverage, ≥2 copies of the variant alleles, PHRED-like score ≥30, call rate
≥90%); each variant had a phyloP conservation score (S) and was
classified as synonymous, mild missense (exon splice junction, non-NMD
nonsense, nonsynonymous) or severe missense (intron splice junction, NMD
nonsense). We assumed that the observed exomic mutation profile (kernel
density of variants for each S value) from the 58 individuals was a
mixture of two profiles, (1 − β) of random subjects (107,727 variants in 61 AA individuals from the 1000G Project) and β of ‘true’
mutations (70,393 Mendelian / disease causing mutations from the Human
Genome Mutation Database), and estimated the mutational burden (β^) by least squares. This analysis estimated an overall
β^= 6%, with values of 2%, 12% and 38% for the synonymous, mild missense
and severe missense variants, respectively. Importantly, β^ increased with higher conservation scores to ~100%. Across each of the 3 mutation classes,
β^ was slightly higher for variants observed exclusively in the top than
the bottom BP group (14%/12%, 27%/25%, 60%/41%, for synonymous, mild
and severe missense variants respectively). Conversely, we observed β^=0
for variants that were present in both the top and bottom BP classes
irrespective of mutation class. By considering only variants at
class-specific phyloP thresholds, S≥5 and 4.5, for the mild and severe missense variants (β^=100%),
we estimate that a minimum of 2,412 variants in 1,881 genes, or an
average burden of ~42 mutations at ~32 genes per subject, are involved
in BP. Consequently, our results showed that BP extreme subjects have
distinct global mutational burden; there is a significant enrichment of
deleterious coding mutations at highly conserved sites in these
individuals; and the identified genes reveal new BP candidate genes.
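The two-profile mixture described above has a closed-form least-squares solution, sketched below. This is only an illustration under stated assumptions: the three profiles are taken to be densities already evaluated on a shared grid of conservation scores, the function name `estimate_beta` is hypothetical, and clipping the estimate to [0, 1] is a choice made here, not something the abstract states.

```python
from typing import Sequence


def estimate_beta(observed: Sequence[float],
                  random_profile: Sequence[float],
                  disease_profile: Sequence[float]) -> float:
    """Least-squares estimate of beta in
        observed ~ (1 - beta) * random_profile + beta * disease_profile.

    Rearranged, observed - random = beta * (disease - random), so beta is the
    slope of a regression through the origin.
    """
    diff = [d - r for d, r in zip(disease_profile, random_profile)]
    resid = [o - r for o, r in zip(observed, random_profile)]
    denom = sum(x * x for x in diff)
    if denom == 0:
        raise ValueError("random and disease profiles are identical")
    beta = sum(a * b for a, b in zip(resid, diff)) / denom
    # Clip to the valid range of a mixture proportion (an assumption of this sketch).
    return min(1.0, max(0.0, beta))


if __name__ == "__main__":
    # Made-up densities over three score bins, mixed with beta = 0.06.
    rand = [0.5, 0.3, 0.2]
    dis = [0.1, 0.3, 0.6]
    obs = [0.94 * r + 0.06 * d for r, d in zip(rand, dis)]
    print(estimate_beta(obs, rand, dis))
```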
PRDM9 directs genetic recombination away from functional genomic elements. K. Brick1, F. Smagulova2, P. Khil1, RD. Camerini-Otero1, G. Petukhova2
1) Genetics & Biochemistry Branch, NIDDK, National Institutes of
Health, Bethesda, MD; 2) Uniformed Services University of Health
Sciences, Department of Biochemistry and Molecular Biology, Bethesda,
MD, USA.
Recombination initiates with the formation of programmed DNA double
strand breaks (DSBs) at a small subset of genomic loci called hotspots.
Elegant recent studies in mouse and human have determined that PRDM9, a
meiosis-specific histone H3 methyltransferase, is involved in DSB
hotspot site determination (Parvanov et al., Science 2010; Baudat et al.,
Science 2010; Myers et al., Science 2010), likely through DNA binding
of its zinc-finger domain. We have recently generated the first
genome-wide DSB hotspot map in a metazoan genome and have shown that the
majority of mouse DSB hotspots are associated with testis-specific
H3K4me3 chromatin marks, potentially formed by PRDM9 (Smagulova et al.,
Nature 2011). Curiously, however, Prdm9 knockout mice remain proficient
at initiating recombination. In this work, we describe several
straightforward experiments that elucidate the nature and extent of the
role of PRDM9 in determining DSB hotspot locations.
We used a
novel ChIP-Seq variant developed by our group to detect ssDNA bound by
the meiotic recombinase DMC1 (Khil et al., Genome Res., 2012). Using
this method, we precisely mapped the genome-wide distribution of DSB
hotspots in seven mouse strains and in their F1 progeny. While hotspots
in mice sharing a Prdm9 allele mapped to almost identical loci, hotspots in other mice were dependent on the DNA binding specificity of the Prdm9 allele. Importantly, in Prdm9
knockout mice, hotspots were at completely different locations than in
wild-type, definitively illustrating that PRDM9 determines practically
all DSB hotspot locations. Intriguingly, DSBs in the pseudoautosomal
region - the site of an obligate recombination event in every meiosis -
were found to be Prdm9-independent and present in all strains. In Prdm9
knockout mice, DSBs still accumulated in hotspots; however, in the
absence of PRDM9, most recombination initiated at H3K4me3 marks at
promoters or enhancers. These sites are rarely targeted in wild-type
mice, illustrating an important, unexpected role for PRDM9 in
sequestering the recombination machinery away from functional genomic
elements where the efficient repair of DSBs may be problematic.
A Unified Model of Meiosis Combining Recombination, Non-Disjunction, Interference and Infertility. H. R. Johnston IV, D. J. Cutler Department of Human Genetics, Emory University School of Medicine, Atlanta, GA.
Human male and female recombination rates and patterns differ greatly
across the broad scale of human chromosomes. Rates of infertility and
non-disjunction differ widely between males and females. No simple cause
is known for these observations. To this end, we have created a unified
model of meiosis that combines recombination, non-disjunction,
interference and fertility. The model correctly predicts the rate of
fertility, the occurrence of trisomy 21, and the number and, most
interestingly, the different patterns of recombination in the two
sexes. The model we create is based on the observation that chiasmata
are the mechanism that enables the normal segregation of chromosomes
during meiosis. Non-disjunction is the result of a failed segregation
event. In our model, non-disjunction occurs both when no chiasmata are
present between pairs of non-sister chromatids as well as when multiple
chiasmata are present close together between pairs of non-sister
chromatids. Other elements of our model include having no chiasmata
occur between sister chromatids as well as concluding male meiosis
immediately while arresting female meiosis between birth and the mother’s
age at conception. This period of arrest requires that females begin
with far more chiasmata than males. It also allows for physical
interference to initiate from anywhere on a chromosome arm. In males,
this initiation event is always telomeric. These elements combine to
generate the unique patterns of recombination in each gender that have,
heretofore, not been explained. They also generate the unique patterns
of non-disjunction and infertility, helping to explain why these
phenomena are seen far more often in eggs relative to sperm. Overall,
this model argues that gross differences between male and female
patterns of non-disjunction, infertility, and recombination are
substantially the result of the period of meiotic arrest during
oogenesis.
Human spermatogenic failure purges deleterious mutation load from the
autosomes and both sex chromosomes, including the gene DMRT1. D. F. Conrad1, A. Lopes2, K. I. Aston3, F. Carvalho4, J. Goncalves5, R. Mathiesen2, N. Huang6, A. Ramu1, J. Downie7, S. Fernandes8, A. Amorim2,8, A. Barros9, M. Hurles6, S. Moskovtsev10, C. Ober11, J. Schiffman7, P. N. Schlegel12, M. De Sousa13, D. T. Carrell3, 14
1) Dept Genetics, Washington Univ School Med, St Louis, MO; 2)
IPATIMUP, Institute of Molecular Pathology and Immunology of the
University of Porto, R. Dr. Roberto Frias S/N, 4200-465 Porto, Portugal;
3) Andrology and IVF Laboratories, Department of Surgery; 4) Department
of Genetics, Faculty of Medicine, University of Porto, Porto, Portugal;
5) Centre for Human Genetics, National Institute of Health Dr. Ricardo
Jorge, Lisbon, Portugal; 6) Genome Mutation and Genetic Disease Group,
Wellcome Trust Sanger Institute, Cambridge, UK; 7) Department of
Oncological Sciences; 8) Faculty of Science, University of Porto,
4099-002 Porto, Portugal; 9) Centre for Reproductive Genetics Alberto
Barros, Porto, Portugal; 10) Department of Obstetrics & Gynaecology,
University of Toronto; 11) Department of Human Genetics, Department of
Obstetrics & Gynecology, The University of Chicago, Chicago, IL
60637, USA; 12) Department of Urology, Weill Cornell Medical College,
New York-Presbyterian Hospital, New York, USA; 13) Laboratory of Cell
Biology, UMIB, ICBAS, University of Porto, Porto, Portugal; 14)
Department of Physiology, Department of Obstetrics and Gynecology
University of Utah School of Medicine, Salt Lake City, Utah, 84108, USA.
Gonadal failure, along with early pregnancy loss and perinatal death,
may be an important filter that limits the propagation of harmful
mutations in the human population. We hypothesized that men with
spermatogenic impairment, a condition with unknown genetic architecture
and a common cause of male infertility, are enriched for rare
deleterious mutations compared to men with normal spermatogenesis. We
assayed genomewide SNPs and CNVs in 327 men with spermatogenic
impairment and >1100 controls, and estimated that a rare autosomal
deletion multiplicatively changes a man’s risk for this condition by 10% (OR 1.10 [1.05-1.15], p < 4 x 10^-5), a rare X-linked CNV by 29% (OR 1.29 [1.16-1.43], p < 3 x 10^-6) and a rare Y-linked duplication by 64% (OR 1.64 [1.28-2.10], p < 9 x 10^-5).
Based on the population frequency of potential risk alleles, extent of
homozygosity, and evidence for dosage sensitivity of genes disrupted in
men with spermatogenic impairment, we propose that the CNV burden is
polygenic and distinct from the burden of large, dominant mutations
described for developmental disorders. Our study also identifies focal
deletions of the sex-differentiation gene DMRT1 as likely
recurrent causes of idiopathic azoospermia, and generates hypotheses for
directing future studies on the genetic basis of male infertility and
IVF outcomes.
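As a worked illustration of what a multiplicative per-CNV effect implies, the sketch below compounds an odds ratio over several rare events and converts the result into an absolute risk. It assumes a simple logistic burden model in which each additional event multiplies the odds by the same factor. The function names and the baseline risk in the usage example are hypothetical and are not taken from the study.

```python
def combined_odds_ratio(per_event_or: float, n_events: int) -> float:
    """Odds ratio for carrying n_events rare events, each multiplying the odds."""
    return per_event_or ** n_events


def risk_from_odds_ratio(baseline_risk: float, odds_ratio: float) -> float:
    """Convert a baseline risk and an odds ratio into an absolute risk."""
    odds = baseline_risk / (1 - baseline_risk) * odds_ratio
    return odds / (1 + odds)


if __name__ == "__main__":
    # Three rare autosomal deletions at OR 1.10 each (the OR is from the abstract).
    or_three = combined_odds_ratio(1.10, 3)
    print(or_three)
    # Hypothetical 1% baseline risk, chosen only to show the conversion.
    print(risk_from_odds_ratio(0.01, or_three))
```

Note that the odds, not the risk itself, are what multiply; for a rare condition the two are close, which is why the abstract can describe the odds ratio as a percentage change in risk.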
Genome Wide Association Study of Sexual Orientation in a Large, Web-based Cohort. E. M. Drabant, A. K. Kiefer, N. Eriksson, J. L. Mountain, U. Francke, J. Y. Tung, D. A. Hinds, C. B. Do 23andMe, Mountain View, CA.
There is considerable variation in human sexual orientation.
Heritability studies have differed on the exact scope of genetic
contributions for sexual orientation, but it appears that both genetics
and environment play a role. Though a few linkage studies have pointed
at a possible role for certain genes on the X chromosome, the strength
of that evidence is limited due to the conflicting nature of the reports
and small sample sizes. We sought to clarify some of the questions
surrounding the possible genetic underpinnings of sexual orientation by
deploying a web-based survey to the large 23andMe database and
conducting the first ever genome-wide association study (GWAS) on sexual
orientation.
We adapted the Klein Sexual Orientation Grid to
examine seven elements of sexual orientation. All items were rated on a
seven point scale by participants. Initial analyses focused on the “self identification” item as a continuous variable in response to the question “How do you label, identify or think of yourself?”
In a sample of 7,887 men and 5,570 women, 77.2% of men and 74.6% of women
identified as heterosexual only, 7.3% of men and 15.3% of women as
heterosexual mostly, 1.1% of men and 2.7% of women as heterosexual
somewhat more, 1.3% of men and 3.5% of women as bisexual, 0.7% of men
and 0.5% of women as homosexual somewhat more, 2.9% of men and 1.6% of
women as homosexual mostly, and 9.5% of men and 1.8% of women as
homosexual only. In both men and women, sexual identity was most
significantly correlated with sexual attraction (men r=0.97, women
r=0.90), sexual behavior (men r=0.95, women r=0.83), sexual fantasies
(men r=0.96, women r=0.75), and emotional attraction (men r=0.79, women
r=0.45), and least strongly correlated with heterosexual/homosexual
lifestyle (men r=0.54, women r=0.37) and social preference (men r=0.15,
women r=0.08).
We carried out a GWAS stratified by sex in a cohort
of 7,887 unrelated men and 5,570 unrelated women of European ancestry
collected in the two months since the initial survey release. No clear
genome-wide significant associations have been found thus far, and the
current data do not show any direct association for markers within
chromosome band Xq28. However, data collection is still ongoing, and
increased sample size may help to clarify the roles for currently
suggestive associations.
A scalable pipeline for local ancestry inference using thousands of reference individuals. C. B. Do, E. Durand, J. M. Macpherson, B. Naughton, J. L. Mountain 23andMe, Inc, Mountain View, CA.
Ancestry deconvolution, the task of identifying the ancestral origin
of chromosomal segments in admixed individuals, is straightforward when
the ancestral populations considered are sufficiently distinct. To date,
however, no approaches have been shown to be effective at
distinguishing between closely related populations (e.g., within
Europe). Moreover, due to their computational complexity, most existing
methods for ancestry deconvolution are unsuitable for application in
large-scale settings, where the reference panels used contain thousands
of individuals.
We describe Ancestry Painting 2.0, a modular
three-stage pipeline for efficiently and accurately identifying the
ancestral origin of chromosomal segments in admixed individuals. In the
first stage, an out-of-sample extension of the BEAGLE phasing algorithm
is used to generate a preliminary phasing for an unphased, genotyped
individual. In the second stage, a support vector machine (SVM) using a
specialized string kernel assigns tentative ancestry labels to short
local phased genomic regions. In the third stage, an autoregressive pair
hidden Markov model simultaneously corrects phasing errors and produces
reconciled local ancestry estimates and confidence scores based on the
SVM labels.
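The idea behind the third stage, reconciling noisy per-window labels into contiguous ancestry segments, can be sketched with a plain hidden Markov model decoded by the Viterbi algorithm. This is a deliberately simplified stand-in: it is not the autoregressive pair HMM described above, it does not correct phasing errors, and the function name, the `switch_prob` parameter and the use of per-window classifier probabilities as emissions are all assumptions made for illustration.

```python
import math
from typing import Dict, List


def viterbi_smooth(window_probs: List[Dict[str, float]],
                   switch_prob: float = 0.01) -> List[str]:
    """Most likely ancestry label per window under a simple HMM.

    window_probs[i][label] is a per-window probability for each label (for
    example a classifier confidence); switch_prob is the assumed chance that
    ancestry changes between adjacent windows.
    """
    if not window_probs:
        return []
    labels = sorted(window_probs[0])
    k = len(labels)
    stay = math.log(1 - switch_prob)
    move = math.log(switch_prob / (k - 1)) if k > 1 else float("-inf")
    eps = 1e-12  # avoids log(0) for labels given zero probability

    score = {a: math.log(window_probs[0][a] + eps) for a in labels}
    back: List[Dict[str, str]] = []
    for probs in window_probs[1:]:
        new_score: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for a in labels:
            # Best previous label for ending in label a at this window.
            best = max(labels, key=lambda b: score[b] + (stay if b == a else move))
            trans = stay if best == a else move
            new_score[a] = score[best] + trans + math.log(probs[a] + eps)
            pointers[a] = best
        score = new_score
        back.append(pointers)

    # Trace back from the best final label.
    path = [max(labels, key=lambda a: score[a])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    # Made-up windows: the third one is weakly mislabelled and gets smoothed over.
    windows = [{"EUR": 0.9, "AFR": 0.1}] * 2 + [{"EUR": 0.4, "AFR": 0.6}] \
        + [{"EUR": 0.9, "AFR": 0.1}] * 2
    print(viterbi_smooth(windows))
```

A low switch probability makes isolated, weakly supported label changes more costly than keeping the surrounding ancestry, which is the smoothing behaviour a reconciliation stage relies on.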
We compiled a reference panel of over 7,500
individuals of homogeneous ancestry, derived from a combination of
several publicly available datasets and over 5,000 individuals reporting
four grandparents with the same country-of-origin from the customer
database of the personal genetics company, 23andMe, Inc, and excluding
outliers identified through principal components analysis (PCA). In
cross-validation experiments, Ancestry Painting 2.0 achieves high
sensitivity and specificity (in most cases >90%) for labeling
chromosomal segments across over 20 different populations worldwide. We
also demonstrate the robustness of the algorithm via simulations of
individuals of known local admixture, and compare Ancestry Painting 2.0
with existing state-of-the-art tools for multi-population local and
global ancestry inference, including LAMP, ALLOY, PCA-ADMIX, and
ADMIXTURE.