race/history/evolution notes: Ancestry analysis (ESHG 2008 abstracts)

European Human Genetics Conference (abstract database)

The genome-wide patterns of variation confirms significant substructure in a founder population
K. Rehnström1,2, E. Jakkula1,3, T. Varilo1,2, O. Pietiläinen1, T. Paunio1, N. Pedersen4, M. Järvelin5, S. Ripatti1,4, S. Purcell3, M. Daly3, A. Palotie3,6, L. Peltonen1,6;
1National Public Health Institute, Helsinki, Finland, 2University of Helsinki, Helsinki, Finland, 3Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA, United States, 4Karolinska Institute, Stockholm, Sweden, 5Imperial College, London, United Kingdom, 6Wellcome Trust Sanger Institute, Cambridge, United Kingdom.
Presentation Number: P07.051
The genome-wide SNP genotyping platforms enable detailed association studies, but at the same time offer new insight into population genetics. Here we present an example of a founder population by scrutinizing nine geographically distinct Finnish subpopulations representing different eras in the population history to study the effect of bottlenecks and isolation using high-density SNP data. We demonstrate that population substructure and even individual ancestry are detectable at high resolution and support the concept of multiple historical bottlenecks resulting from founder effects.
We performed multidimensional scaling (MDS) of pairwise identity-by-state (IBS) sharing data to delineate population structure. Within Finland, the two primary dimensions of the MDS analysis correspond remarkably well with the east-west and north-south directions, respectively, so that the distribution of individuals closely matches the geographical distribution of their parents’ birthplaces. The youngest subisolates showed higher IBS similarity than the other subgroups and could be separated at an extremely fine resolution. We analyzed linkage disequilibrium (LD) and extended regions of homozygosity (ROHs) to further explore the genomic structure of the subpopulations. The highest LD and the largest number of long (>10 Mb) ROHs were identified in the youngest regional population, and both measures declined gradually in older and more outbred subpopulations.
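As a rough illustration of the pairwise IBS sharing that feeds the MDS step, here is a minimal Python sketch. It is not the authors' pipeline: the 0/1/2 genotype coding, the function names and the toy individuals are all assumptions made for the example, and a real analysis would pass the resulting distance matrix to an MDS routine.

```python
from itertools import combinations


def ibs_similarity(g1, g2):
    """Mean proportion of alleles shared identical-by-state (IBS).

    Genotypes are assumed to be coded as allele counts (0, 1 or 2);
    None marks a missing call, and that SNP is skipped.
    """
    shared = 0
    n_snps = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue
        shared += 2 - abs(a - b)  # 2, 1 or 0 alleles shared at this SNP
        n_snps += 1
    if n_snps == 0:
        raise ValueError("no SNPs genotyped in both individuals")
    return shared / (2 * n_snps)


def ibs_distance_matrix(genotypes):
    """Pairwise distances (1 - IBS similarity), keyed by pairs of names."""
    names = sorted(genotypes)
    dist = {(x, x): 0.0 for x in names}
    for x, y in combinations(names, 2):
        d = 1.0 - ibs_similarity(genotypes[x], genotypes[y])
        dist[(x, y)] = dist[(y, x)] = d
    return dist


# Hypothetical toy data: two similar individuals and one divergent one.
toy = {
    "west_1": [0, 1, 2, 0, 1, 2],
    "west_2": [0, 1, 2, 0, 2, 2],
    "east_1": [2, 1, 0, 2, 0, 0],
}
dist = ibs_distance_matrix(toy)
```

In this toy set the two "west" individuals end up much closer to each other than either is to "east_1", which is the kind of signal MDS then projects onto a few dimensions.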
The study shows the power of GWA data to trace population history and also exemplifies the power to identify stratification even within homogeneous populations. A deeper insight into fine-scale population substructure also emphasizes the importance of adjusting for it in GWA studies that aim to identify smaller and smaller genetic effects, in order to avoid confounding.
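The long ROHs counted in the abstract above can be pictured with a deliberately simplified scan. The abstract does not say how ROHs were called, so the sketch below rests on assumptions: genotypes are coded 0/1/2 on one chromosome, any heterozygous call ends a run, and the function name and toy data are hypothetical. Real tools typically tolerate a few heterozygous or missing calls.

```python
def runs_of_homozygosity(positions, genotypes, min_length_bp=10_000_000):
    """Return (start_bp, end_bp) for each stretch of consecutive
    homozygous calls spanning at least min_length_bp.

    positions: base-pair coordinates, sorted, on a single chromosome.
    genotypes: allele counts per SNP; 0 and 2 are homozygous, 1 is
               heterozygous and ends the current run.
    """
    runs = []
    start = None
    last = None
    for pos, g in zip(positions, genotypes):
        if g == 1:
            # Heterozygous call: close the current run, keep it if long enough.
            if start is not None and last - start >= min_length_bp:
                runs.append((start, last))
            start = None
        else:
            if start is None:
                start = pos
            last = pos
    # Close a run that extends to the final SNP.
    if start is not None and last - start >= min_length_bp:
        runs.append((start, last))
    return runs


# Hypothetical toy chromosome: one SNP per Mb from 1 Mb to 20 Mb.
positions = [1_000_000 * i for i in range(1, 21)]
genotypes = [0, 2, 1] + [0] * 13 + [1] + [2, 2, 0]
long_runs = runs_of_homozygosity(positions, genotypes)
```

With the default 10 Mb threshold, only the 13-SNP homozygous stretch in the middle of the toy chromosome qualifies.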

Comparison of different methods to estimate genetic ancestry and control for stratification in genome-wide association studies
Presentation Time: Tuesday
E. Salvi1,2, G. Guffanti1, A. Orro2, F. Torri1, S. Lupoli3, J. Turner4, D. Keator4, J. Fallon4, S. Potkin4, C. Barlassina1, D. Cusi1, L. Milanesi2, F. Macciardi1;
1Department of Science and Biomedical Technology, University of Milan, Milan, Italy, 2ITB CNR, Segrate, Milan, Italy, 3INSPE, Milan, Italy, 4Department of Psychiatry and Human Behavior, University of California, Irvine, CA, United States.
Presentation Number: C13.6
In case-control association studies, population subdivision or admixture can lead to spurious associations between a phenotype and unlinked candidate loci. Population stratification can occur in case-control association studies when allele frequencies differ between cases and controls because of ancestry.
We evaluated five methods (Fst, Genomic Control, STRUCTURE, PLINK and EIGENSTRAT) using 317K SNPs (Illumina HumanHap300) in a case-control sample of 200 American subjects of different races (Caucasian, African and Asian) in order to identify and correct for stratification. Fst, STRUCTURE and Genomic Control are based on a small number of genetic markers, while PLINK and EIGENSTRAT are computationally tractable on a genome-wide scale. None of the five methods detected significant stratification in our sample. However, PLINK and EIGENSTRAT, which use the much larger amount of information in the whole set of SNPs, graphically suggested the presence of partial stratification due to the African and Asian individuals, although the estimated inflation factor of 1 did not statistically confirm it. This led to the decision to enlarge the sample with hundreds of controls from Caucasian populations. When we enlarged the sample to 650 individuals, we found a high inflation factor, which statistically confirmed the population stratification. The substructure still depends only on the African and Asian subjects, who are separated from the homogeneous Caucasian sample. Sample size is therefore crucial for having enough power to detect possible stratification.
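The inflation factor referred to in this abstract is conventionally computed as the median of the observed 1-df chi-square association statistics divided by the median expected under the null. The sketch below illustrates that general idea only and is not the authors' code; the function names and toy statistics are assumptions.

```python
from statistics import median

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def inflation_factor(chi2_stats):
    """Genomic-control lambda: observed median over the null median."""
    if not chi2_stats:
        raise ValueError("need at least one test statistic")
    return median(chi2_stats) / CHI2_1DF_MEDIAN


def genomic_control(chi2_stats):
    """Deflate the statistics by lambda, with lambda floored at 1 so that
    statistics are never inflated by the correction."""
    lam = max(1.0, inflation_factor(chi2_stats))
    return [x / lam for x in chi2_stats]


# Hypothetical toy statistics: the first set has the null median
# (lambda close to 1), the second is uniformly inflated by 1.5.
null_like = [0.1, 0.3, 0.4549364, 0.9, 2.0]
inflated = [1.5 * x for x in null_like]

lam_null = inflation_factor(null_like)
lam_inflated = inflation_factor(inflated)
corrected = genomic_control(inflated)
```

A lambda near 1, as reported for the 200-subject sample, gives no statistical evidence of stratification; values well above 1 indicate that test statistics are inflated across the genome.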

A computational test for biological relatedness in genetic association studies using probabilistically inferred haplotypes
L. Xumerle, G. Malerba, P. F. Pignatti;
Department Maternal Infantile and of Biology-Genetics. Section of Biology and Genetics, University of Verona, Italy.
Presentation Number: P06.058
An association between a gene and a disease may be incorrectly estimated if allele frequencies differ between cases and controls as a result of inbreeding or unrecognized population stratification.
A program (http://medgen.univr.it/jenoware/) was developed to compute the probability of genetic relatedness in pairs of individuals using a likelihood ratio test.
Using loci that are in linkage disequilibrium (LD) decreases the accuracy of parentage assignments. Groups of SNPs in LD were simulated to verify the effects of linkage on relatedness assignment. The probability of genetic relatedness was computed both using the single SNPs and treating the SNPs as composite markers with different r² threshold values. Haplotypes were probabilistically inferred using the PHASE and Gerbil programs. False positive rate and power were assessed by simulation in unrelated individuals and in pedigrees.
As an example of the results, estimating the support for second-degree relatedness with 80% power and a 5% false positive rate required one of the following: 100 SNPs with no linkage; 275 SNPs having r²=0.4; 20 probabilistically inferred haplotypes (100 SNPs having r²=0.4); or 40 probabilistically inferred haplotypes (200 SNPs having r²=0.8).
In conclusion, if LD blocks are examined, biological relatedness can be computed with a limited number of markers, and test accuracy increases when probabilistically inferred haplotypes are used.
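For readers unfamiliar with the r² values quoted in this abstract, the following is a minimal sketch of the standard r² measure of LD between two biallelic SNPs, computed from phased haplotypes. It is unrelated to the program linked above; the 0/1 allele coding, the function name and the toy haplotypes are assumptions.

```python
def r_squared(haplotypes):
    """r² between two biallelic SNPs.

    haplotypes: list of (a, b) pairs, one per phased chromosome, where
    a and b are the alleles (coded 0 or 1) at the first and second SNP.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    p_a = sum(a for a, _ in haplotypes) / n  # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n  # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r² is undefined when a SNP is monomorphic")
    d = p_ab - p_a * p_b  # LD coefficient D
    return d * d / denom


# Hypothetical toy haplotypes.
perfect_ld = [(0, 0), (1, 1), (0, 0), (1, 1)]   # alleles always co-occur
no_ld = [(0, 0), (0, 1), (1, 0), (1, 1)]        # alleles independent

r2_perfect = r_squared(perfect_ld)
r2_none = r_squared(no_ld)
```

An r² of 1 means one SNP fully predicts the other, so a second SNP adds no independent information; this is why more SNPs are needed for the same power when markers are in LD.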
