NordicDB: a Nordic pool and portal for genome-wide control data

An abstract from European Journal of Human Genetics:
A cost-efficient way to increase power in a genetic association study is to pool controls from different sources. The genotyping effort can then be directed to large case series. The Nordic Control database, NordicDB, has been set up as a unique resource in the Nordic area and the data are available for authorized users through the web portal (http://www.nordicdb.org). The current version of NordicDB pools together high-density genome-wide SNP information from ~5000 controls originating from Finnish, Swedish and Danish studies and shows country-specific allele frequencies for SNP markers. The genetic homogeneity of the samples was investigated using multidimensional scaling (MDS) analysis and pairwise allele frequency differences between the studies. The plot of the first two MDS components showed excellent resemblance to the geographical placement of the samples, with a clear NW–SE gradient. We advise researchers to assess the impact of population structure when incorporating NordicDB controls in association studies. This harmonized Nordic database presents a unique genome-wide resource for future genetic association studies in the Nordic countries.
The first thing that stands out to me in the MDS plot from the NordicDB website is the substantial overlap between the Danish sample and the CEU HapMap sample (Utah whites):
Top axes of genetic variation in the Nordic Control Database (4620 samples) contrasted with the CEU population (108 samples) HapMap and a Finnish reference population (81 samples). The MDS analysis was performed on approximately 45K SNPs that were common between the genotyping platforms. The samples are represented with the color of their country of origin: Finland (red), Sweden (green) and Denmark (yellow).
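The abstract mentions checking homogeneity through pairwise allele frequency differences between the studies. As a rough illustration only (this is not the NordicDB pipeline, and the data layout and function names below are assumptions), a minimal Python sketch of that kind of comparison might look like this, with genotypes coded as 0/1/2 copies of a counted allele:

```python
from itertools import combinations


def allele_freq(genotypes):
    """Frequency of the counted allele; genotypes are 0/1/2, None = missing."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    return sum(called) / (2 * len(called))


def pairwise_freq_diffs(studies):
    """Mean absolute allele frequency difference for each pair of studies.

    `studies` is a hypothetical layout: {study_name: {snp_id: [genotypes]}}.
    Only SNPs present (and called) in both studies of a pair are compared.
    """
    freqs = {
        name: {snp: allele_freq(g) for snp, g in snps.items()}
        for name, snps in studies.items()
    }
    diffs = {}
    for a, b in combinations(sorted(freqs), 2):
        shared = [
            s for s in freqs[a]
            if s in freqs[b]
            and freqs[a][s] is not None
            and freqs[b][s] is not None
        ]
        if shared:
            total = sum(abs(freqs[a][s] - freqs[b][s]) for s in shared)
            diffs[(a, b)] = total / len(shared)
    return diffs
```

A real comparison would also need to make sure both studies count the same allele on the same strand, which this sketch simply assumes.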

7 comments:

Dasein said...

I think much of the CEU/Dane overlap is due to the large number of Finns on the plot (many of whom I presume are from the western part of the country, and have substantial Swedish admixture). There also looks to be substantial overlap with the Swedish sample (though, unsurprisingly, many of them are closer to Finns). This context dependency is one of the problems with PCA/MDS. It would also be nice if there were a way to jitter these plots. It seems that the data points are plotted in the order Swedes, then Danes, then CEU, which makes it difficult to tell how many Swedes are buried in there.
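The overplotting problem described above can be reduced by adding a little random noise to each point and drawing the points in random order, so that no group is systematically buried under another. A minimal sketch, assuming the MDS coordinates are available as `(x, y, label)` tuples (the function name and the `scale` default are made up for illustration):

```python
import random


def jitter_and_shuffle(points, scale=0.002, seed=0):
    """Return jittered copies of (x, y, label) points in random draw order.

    `scale` is the standard deviation of the Gaussian noise, in the same
    units as the MDS axes; it should be small relative to the axis range.
    """
    rng = random.Random(seed)
    jittered = [
        (x + rng.gauss(0, scale), y + rng.gauss(0, scale), label)
        for x, y, label in points
    ]
    # Shuffle so the last-plotted group does not cover the others.
    rng.shuffle(jittered)
    return jittered
```

The result can be passed point by point to any plotting library. Using semi-transparent markers is another common way to reveal buried points.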

I don't know of any other plots that have Danes and CEUs. In Lao's and Novembre's 2008 papers there looks to be good separation of UK and Danish individuals (though Novembre's Danish sample had only 1 individual). I'm not sure that anything has been published about the precise ethnic ancestries of the CEU sample. The references for Jorde's "Northern Europeans", who are also described as Utahns of northern European ancestry, lead back to a 1996 paper by O'Brien et al. They sampled from 8 different counties and recorded precise ancestry information. The average ancestries were:

Danish (21.2%), English (48.6%), Irish (1.8%), German (1.9%), Norwegian (3.1%), Scottish (6.1%), Swedish (6.9%), Swiss (5.5%), and Welsh (5.1%)

There is quite a bit of variability for Danish ancestry, with 2 counties having 46.5% and 34.1%.

I don't know to what extent these averages could be representative of the CEU sample, or even of Jorde's "Northern European" sample, which he's used for a number of papers, most recently this one from Xing et al.

n/a, what do you make of Jorde's "Northern European" sample? The admixture analysis shows it to be quite different from the CEU sample. I've heard it said that he's been using the wrong sample, that they are in fact a French CEPH sample (but I haven't seen any evidence provided).

n/a said...

Dasein,

I agree, that's probably a significant factor. I would prefer to see more populations included.

Here's how the CEU sample compares to the European samples from Lao et al.

I believe the CEU individuals are of predominantly British and secondarily Scandinavian ancestry, based on evidence like this and what I know about Utah. According to some sources I came across on Google just now, they are "mostly of Scandinavian or English origin". What the source for that is, and whether listing Scandinavian first is supposed to indicate that it is the larger component, I don't know.

"I've heard it said that he's been using the wrong sample, that they are in fact a French CEPH sample (but I haven't seen any evidence provided)."

No idea, but something definitely seems off, and that sounds plausible. Where did you hear that?

Dasein said...

Thanks, I missed that Lao figure (I was looking at the one in the supplementary materials).

That comment about the French CEPH sample was from 'Polak' in a post on the Xing paper at Dienekes'.

The only true North Europeans in this study are the CEU (Utah Americans of Northern and Western Euro descent). They carry the least amount of red, followed by the Slovenians.

The set labeled North European by the authors here looks like the HGDP-CEPH French set from Lyon. In the study they made a mistake of identifying it as "Utah American", which is actually the CEU. But clearly it's not North European, but from somewhere with South Euro admix, like France.

Dasein said...

BTW, do you know of any publicly available European genotype data, besides those from HapMap, HGDP, and Jorde's "Northern Europeans"? It seems that most European SNP data is only available via formal application.

Dasein said...

Actually, that comment from Polak might not be the one I was thinking of. I'd come across a few comments on it while looking into this, but can't remember where they were now. In any case, it would be easy enough to check whether it is the French HGDP sample, since the data sets are available. When I get a chance, I'll compare them.
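One simple way to run the check described above is to look for individuals whose genotypes are nearly identical across the two data sets. The sketch below is only an illustration under stated assumptions: each data set is a hypothetical dict of `{sample_id: {snp_id: genotype}}` with genotypes coded 0/1/2 for the same allele, and the 0.99 threshold is an arbitrary choice, not a published standard.

```python
def concordance(g1, g2):
    """Fraction of shared, non-missing SNPs where two samples agree."""
    shared = [
        s for s in g1
        if s in g2 and g1[s] is not None and g2[s] is not None
    ]
    if not shared:
        return None
    return sum(g1[s] == g2[s] for s in shared) / len(shared)


def find_shared_samples(set_a, set_b, threshold=0.99):
    """List (id_a, id_b, concordance) pairs that look like the same person."""
    matches = []
    for id_a, g_a in set_a.items():
        for id_b, g_b in set_b.items():
            c = concordance(g_a, g_b)
            if c is not None and c >= threshold:
                matches.append((id_a, id_b, c))
    return matches
```

With only a handful of overlapping SNPs, unrelated people can match by chance, so a check like this is only meaningful when many markers are shared between the platforms.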

n/a said...

"do you know of any publicly available European genotype data, besides those from HapMap, HGDP, and Jorde"

There's a Tuscan HapMap Phase 3 sample. You could also check the 1000 Genomes Project, but I don't know if it includes any new European samples. Some 23andMe users have made their own data freely available.

Anonymous said...

Will there be an AlpineDB and a MedDB?