Identification of Candidate Parkinson Disease Genes by Integrating Genome-Wide Association Study, Expression, and Epigenetic Data Sets


Key Points

Question

What genes and genomic processes underlie risk of sporadic Parkinson disease?

Findings

This genetic association study integrated Parkinson disease genome-wide association study data and brain-derived gene regulation data using various complementary bioinformatic tools and identified 11 candidate genes with evidence of disease-associated regulatory changes. Coexpression and protein level analyses of these genes demonstrated a significant functional association with known mendelian Parkinson disease genes.

Meaning

This study suggests that gene regulation data may be used to identify candidate genes and pathways involved in sporadic Parkinson disease.

Abstract

Importance

Substantial genome-wide association study (GWAS) work in Parkinson disease (PD) has led to the discovery of an increasing number of loci shown reliably to be associated with increased risk of disease. Improved understanding of the underlying genes and mechanisms at these loci will be key to understanding the pathogenesis of PD.

Objective

To investigate what genes and genomic processes underlie the risk of sporadic PD.

Design and Setting

This genetic association study used the bioinformatic tools Coloc and transcriptome-wide association study (TWAS) to integrate PD case-control GWAS data published in 2017 with expression data (from Braineac, the Genotype-Tissue Expression [GTEx], and CommonMind) and methylation data (derived from UK Parkinson brain samples) to uncover putative gene expression and splicing mechanisms associated with PD GWAS signals. Candidate genes were further characterized using cell-type specificity, weighted gene coexpression networks, and weighted protein-protein interaction networks.

Main Outcomes and Measures

It was hypothesized a priori that some genes underlying PD loci would alter PD risk through changes to expression, splicing, or methylation. Candidate genes are presented whose change in expression, splicing, or methylation is associated with risk of PD, as well as the functional pathways and cell types in which these genes have an important role.

Results

Gene-level analysis of expression revealed 5 genes (WDR6 [OMIM 606031], CD38 [OMIM 107270], GPNMB [OMIM 604368], RAB29 [OMIM 603949], and TMEM163 [OMIM 618978]) that replicated using both Coloc and TWAS analyses in both the GTEx and Braineac expression data sets. A further 6 genes (ZRANB3 [OMIM 615655], PCGF3 [OMIM 617543], NEK1 [OMIM 604588], NUPL2 [NCBI 11097], GALC [OMIM 606890], and CTSB [OMIM 116810]) showed evidence of disease-associated splicing effects. Cell-type specificity analysis revealed that gene expression was overall more prevalent in glial cell types compared with neurons. The weighted gene coexpression network analysis performed on the GTEx data set showed that NUPL2 is a key gene in 3 modules implicated in catabolic processes associated with protein ubiquitination and in the ubiquitin-dependent protein catabolic process in the nucleus accumbens, caudate, and putamen. TMEM163 and ZRANB3 were both important in modules in the frontal cortex and caudate, respectively, indicating regulation of signaling and cell communication. Protein interactor analysis and simulations using random networks demonstrated that the candidate genes interact significantly more with known mendelian PD and parkinsonism proteins than would be expected by chance.

Conclusions and Relevance

Together, these results suggest that several candidate genes and pathways are associated with the findings observed in PD GWAS studies.



Most cited references 35


An RNA-sequencing transcriptome and splicing database of glia, neurons, and vascular cells of the cerebral cortex.

Ye Zhang, Kenian Chen, Steven A Sloan … (2014)

The major cell classes of the brain differ in their developmental processes, metabolism, signaling, and function. To better understand the functions and interactions of the cell types that comprise these classes, we acutely purified representative populations of neurons, astrocytes, oligodendrocyte precursor cells, newly formed oligodendrocytes, myelinating oligodendrocytes, microglia, endothelial cells, and pericytes from mouse cerebral cortex. We generated a transcriptome database for these eight cell types by RNA sequencing and used a sensitive algorithm to detect alternative splicing events in each cell type. Bioinformatic analyses identified thousands of new cell type-enriched genes and splicing isoforms that will provide novel markers for cell identification, tools for genetic manipulation, and insights into the biology of the brain. For example, our data provide clues as to how neurons and astrocytes differ in their ability to dynamically regulate glycolytic flux and lactate generation attributable to unique splicing of PKM2, the gene encoding the glycolytic enzyme pyruvate kinase. This dataset will provide a powerful new resource for understanding the development and function of the brain. To ensure the widespread distribution of these datasets, we have created a user-friendly website (http://web.stanford.edu/group/barres_lab/brain_rnaseq.html) that provides a platform for analyzing and comparing transcription and alternative splicing profiles for various cell classes in the brain.



Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans.

(2015)

Understanding the functional consequences of genetic variation, and how it affects complex human disease and quantitative traits, remains a critical challenge for biomedicine. We present an analysis of RNA sequencing data from 1641 samples across 43 tissues from 175 individuals, generated as part of the pilot phase of the Genotype-Tissue Expression (GTEx) project. We describe the landscape of gene expression across tissues, catalog thousands of tissue-specific and shared regulatory expression quantitative trait loci (eQTL) variants, describe complex network relationships, and identify signals from genome-wide association studies explained by eQTLs. These findings provide a systematic understanding of the cellular and biological consequences of human genetic variation and of the heterogeneity of such effects among a diverse set of human tissues. Copyright © 2015, American Association for the Advancement of Science.



Bayesian Test for Colocalisation between Pairs of Genetic Association Studies Using Summary Statistics

Claudia Giambartolomei, Damjan Vukcevic, Eric E. Schadt … (2014)

Introduction

In the last decade, hundreds of genomic loci affecting complex diseases and disease relevant intermediate phenotypes have been found and robustly replicated using genome-wide association studies (GWAS, [1]). At the same time, gene expression measurements derived from microarray [2] or RNA sequencing [3] studies have been used extensively as an outcome trait for the GWAS design. Such studies are usually referred to as expression quantitative trait locus (eQTL) analysis. While GWAS datasets have provided a steady flow of positive and replicable results, the interpretation of these findings, and in particular the identification of underlying molecular mechanisms, has proven to be challenging. Integrating molecular level data and other disease relevant intermediate phenotypes with GWAS results is the natural step forward in order to understand the biological relevance of these results. This strategy has been explored before and allowed the identification of the genes and regulatory variations that are important for several diseases (reviewed in [4]). In this context, a natural question to ask is whether two independent association signals at the same locus, typically generated by two GWAS studies, are consistent with a shared causal variant. If the answer is positive, we refer to this situation as colocalised traits, and the probability that both traits share a causal mechanism is greatly increased. A typical example involves an eQTL study and a disease association result, which points to the causal gene and the tissue in which the effect is mediated [5]–[7]. In fact, looking for overlaps between complex trait-associated variants and eQTL variants has been successfully used as evidence of a common causal molecular mechanism (e.g., [5], [8]). The same questions can also be considered between pairs of eQTLs [9], [10], or pairs of diseases [11]. However, identifying the traits that share a common association signal is not a trivial statistical task.
Visual comparison of overlaps of association signals with an expression dataset is a step in this direction (using for example Sanger tool Genevar http://www.sanger.ac.uk/resources/software/genevar/), but the abundance of eQTLs in the human genome and across different tissues makes an accidental overlap between these signals very likely [2]. Therefore visual comparison is not enough to make inferences about causality and formal statistical tests must be used to address this question. Nica et al. [5] proposed a methodology to rank the SNPs with an influence on two traits based on the residual association conditional on the most associated SNP. By comparing the GWAS SNP score with all other SNPs in the associated region, this method accounts for the local LD structure. However, this is not a formal test of a null hypothesis for, or against, colocalisation at the locus of interest. A formal test of colocalisation has been developed in a regression framework. This is based on testing a null hypothesis of proportionality of regression coefficients for two traits across any set of SNPs, an assumption which should hold whenever they share causal variant(s) [12], [13]. No assumption is made about the number of causal variants, although the method does assume that in the case of multiple causal variants, all are shared. Both the ranking method and proportionality testing share the drawback of having to specify a subset of SNPs to base the test on, and Wallace [14] shows that this step can generate significant biases. The main sources of bias are overestimation of effect sizes at selected SNPs (termed “Winner's curse”), and the fact that, owing to random fluctuations, the causal variant may not always be the most strongly associated one. These factors lead to rejection of colocalisation in situations where the causal SNP is in fact shared. 
Although this can be overcome in the case of proportionality testing by averaging over the uncertainty associated with the best SNP models [14], perhaps the greatest limitation is the requirement for individual level genotype data, which are rarely available for large scale eQTL datasets. The success of GWAS meta-analyses has shown that there is considerable benefit in being able to derive association tests on the basis of summary statistics. With these advantages in mind, He et al. [7] developed a statistical test to match the pattern of gene expression with a GWAS dataset. This approach, coded in the software Sherlock, can accommodate p-values as input. However, their hypothesis of interest differs from the question of colocalisation, with the focus of the method being on genome-wide convergence of signals, assuming an abundance of trans eQTLs. In particular, SNPs that are not associated with gene expression do not contribute to the test statistic. Such variants can provide strong evidence against colocalisation if they are strongly associated with the GWAS outcome. These limitations motivate the development of novel methodologies to test for colocalisation between pairs of traits. Here, we derive a novel Bayesian statistical test for colocalisation that addresses many of the shortcomings of existing tools. Our analysis focuses on a single genomic region at a time, with a major focus on interpreting the pattern of LD at that locus. Our underlying model is closely related to the approach developed by Flutre et al. [10], which considers the different but related problem of maximising the power to discover eQTLs in expression datasets of multiple tissues. A key feature of our approach is that it only requires single SNP p-values and their minor allele frequencies (MAFs), or estimated allelic effect and standard error, combined with closed form analytical results that enable quick comparisons, even at the genome-wide scale. 
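To illustrate the kind of closed-form computation referred to here, the sketch below evaluates a Wakefield-style approximate Bayes factor for a single SNP, once from an estimated effect and its standard error and once from a p-value and MAF. It is a minimal sketch rather than the published implementation: the function names, the prior standard deviation of 0.15, and the variance approximation 1/(2Nf(1−f)) for a quantitative trait scaled to unit variance are assumptions made for illustration only.

```python
import math
from statistics import NormalDist


def _log_abf(z: float, v: float, w: float) -> float:
    """Log Bayes factor (association vs. no association) for one SNP.

    z : z-score of the estimated effect
    v : sampling variance of the estimated effect
    w : variance of the normal prior on the true effect
    """
    r = w / (w + v)                       # shrinkage factor
    log_one_minus_r = math.log(v / (w + v))
    return 0.5 * (log_one_minus_r + r * z * z)


def log_abf(beta: float, se: float, prior_sd: float = 0.15) -> float:
    """Log ABF from an effect estimate and its standard error.

    prior_sd is an assumed, illustrative prior standard deviation.
    """
    return _log_abf(beta / se, se * se, prior_sd * prior_sd)


def log_abf_from_p(p_value: float, maf: float, n: int,
                   prior_sd: float = 0.15) -> float:
    """Log ABF from a two-sided p-value, MAF and sample size.

    Assumes a quantitative trait with unit variance, so that the
    sampling variance of the effect is roughly 1 / (2 N f (1 - f)).
    """
    v = 1.0 / (2.0 * n * maf * (1.0 - maf))
    z = -NormalDist().inv_cdf(p_value / 2.0)   # |z| for a two-sided p-value
    return _log_abf(z, v, prior_sd * prior_sd)


if __name__ == "__main__":
    print(log_abf(beta=0.10, se=0.02))
    print(log_abf_from_p(p_value=1e-6, maf=0.3, n=1000))
```

Working on the log scale keeps the values finite for the very strong associations that are common at GWAS loci; only the sign of the effect is lost when starting from a p-value, which does not matter here because the Bayes factor depends on the squared z-score.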
Our Bayesian procedure provides intuitive posterior probabilities that can be easily interpreted. A main application of our method is the systematic comparison between a new GWAS dataset and a large catalogue of association studies in order to identify novel shared mechanisms. We demonstrate the value of the method by re-analysing a large scale meta-analysis of blood lipids [15] in combination with a gene expression study in 966 liver samples [16].

Results

Overview of the method

We consider a situation where two traits have been measured in two distinct datasets of unrelated individuals. We assume that samples are drawn from the same ethnic group, i.e. allele frequencies and pattern of linkage disequilibrium (LD) are identical in both populations. For each of the two samples, we consider for each variant a linear trend model between the outcome phenotypes Y and the genotypes X (or a log-odds generalised linear model if one of the two outcome phenotypes Y is binary):

$$Y = \mu + \beta X + \varepsilon$$

We are interested in a situation where single variant association p-values and MAFs, or estimated regression coefficients $\hat{\beta}$ and their estimated precisions $\mathrm{var}(\hat{\beta})$, are available for both datasets at Q variants, typically SNPs but also indels. We make two additional assumptions and discuss later in this paper how these can be relaxed. Firstly, that the causal variant is included in the set of Q variants, either directly typed or well imputed [17]–[19]. Secondly, that at most one association is present for each trait in the genomic region of interest. We are interested in exploring whether the data support a shared causal variant for both traits. While the method is fully applicable to a case-control outcome, we consider two quantitative traits in this initial description. SNP causality in a region of Q variants can be summarised for each trait using a vector of length Q of (0, 1) values, where 1 means that the variant is causally associated with the trait of interest and at most one entry is non-zero.
A schematic illustration of this framework is provided in Figure 1 in a region that contains 8 SNPs. Each possible pair of vectors (for traits 1 and 2, which we refer to as “configuration”) can be assigned to one of five hypotheses:

H0: No association with either trait
H1: Association with trait 1, not with trait 2
H2: Association with trait 2, not with trait 1
H3: Association with trait 1 and trait 2, two independent SNPs
H4: Association with trait 1 and trait 2, one shared SNP

Figure 1 (10.1371/journal.pgen.1004383.g001). Example of one configuration under different hypotheses. A configuration is represented by one binary vector for each trait of (0,1) values of length n = 8, the number of shared variants in a region. The value of 1 means that the variant is causally involved in disease, 0 that it is not. The first plot shows the case where only one dataset shows an association. The second plot shows that the causal SNP is different for the biomarker dataset compared to the expression dataset. The third plot shows the configuration where the single causal variant is the fourth one.

In this framework, the colocalisation problem can be re-formulated as assessing the support for all configurations (i.e. pairs of binary vectors) in hypothesis H4. Our method is Bayesian in the sense that it integrates over all possible configurations. This process requires the definition of prior probabilities, which are defined at the SNP level (Methods). A probability of the data can be computed for each configuration, and these probabilities can be summed over all configurations and combined with the prior to assess the support for each hypothesis. The result of this procedure is five posterior probabilities (PP0, PP1, PP2, PP3 and PP4). A large posterior probability for hypothesis 3, PP3, indicates support for two independent causal SNPs associated with each trait. In contrast, if PP4 is large, the data support a single variant affecting both traits.
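The summation over configurations described above can be sketched in a few lines once a per-SNP log Bayes factor is available for each trait. The code below is an illustrative sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, and the prior values `p1`, `p2` and `p12` (prior probability that a SNP is associated with trait 1 only, trait 2 only, or both) are placeholder defaults chosen for the example.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Return [PP0, PP1, PP2, PP3, PP4] for one genomic region.

    labf1, labf2 : per-SNP log Bayes factors for trait 1 and trait 2,
                   given in the same SNP order.
    p1, p2, p12  : assumed SNP-level priors (illustrative values).
    """
    if not labf1 or len(labf1) != len(labf2):
        raise ValueError("need two non-empty lists of equal length")

    s1 = logsumexp(labf1)                                   # trait 1 only
    s2 = logsumexp(labf2)                                   # trait 2 only
    s12 = logsumexp([a + b for a, b in zip(labf1, labf2)])  # same SNP

    # Two distinct SNPs: all ordered pairs minus the pairs with i == j.
    frac = -math.expm1(s12 - s1 - s2)        # 1 - exp(s12 - s1 - s2)
    cross = s1 + s2 + math.log(frac) if frac > 0 else -math.inf

    log_h = [
        0.0,                                  # H0: no association
        math.log(p1) + s1,                    # H1
        math.log(p2) + s2,                    # H2
        math.log(p1) + math.log(p2) + cross,  # H3
        math.log(p12) + s12,                  # H4
    ]
    total = logsumexp(log_h)
    return [math.exp(h - total) for h in log_h]


if __name__ == "__main__":
    shared = coloc_posteriors([0.0, 20.0, 0.0], [0.0, 20.0, 0.0])
    distinct = coloc_posteriors([20.0, 0.0, 0.0], [0.0, 0.0, 20.0])
    print([round(x, 3) for x in shared])
    print([round(x, 3) for x in distinct])
```

Because each hypothesis reduces to sums of per-SNP Bayes factors, no explicit enumeration of the configurations is needed; the first example places the strong signal on the same SNP for both traits, the second on different SNPs.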
An illustration of the method is shown in Figure 2 for negative (Figure 2A–B, FRK gene and LDL, PP3 >90%) and positive (Figure 2C–D, SDC1 gene and total cholesterol, PP4 >80%) colocalisation results.

Figure 2 (10.1371/journal.pgen.1004383.g002). Illustration of the colocalisation results. Negative (A–B, FRK gene and LDL, PP3 >90%) and positive (C–D, SDC1 gene and total cholesterol, PP4 >80%) colocalisation results. −log10(p) association p-values for biomarker (top, A and C) and −log10(p) association p-values for expression (bottom, B and D) at the FRK (A, B) and SDC1 locus (C, D), 1Mb range.

While the method uses Approximate Bayes Factor computations (ABF, [20], and Methods), no iterative computation scheme (such as Markov Chain Monte Carlo) is required. Therefore, computations are quick and do not require any specific computing infrastructure. Precisely, the computation time behaves as $Q^d$, where Q is the number of variants in the genomic region and d the number of distinct associations (typically d = 2, assuming two traits and at most one causal variant per trait). Importantly, the use of ABF enables the computation of posterior probabilities from single variant association p-values and MAFs, although the estimated single SNP regression coefficients and their variances or standard errors are preferred for imputed data.

Sample size required for colocalisation analysis

Given the well-understood requirements for large sample size for GWAS data, we used simulations to investigate the power of our approach. We generated pairs of eQTL/biomarker datasets assuming a shared causal variant. We varied two parameters: the sample size of the biomarker dataset and the proportion of the biomarker variance explained by the shared genetic variant. We set the proportion of the eQTL variance explained by the shared variant to 10% and we used the original sample size of the liver eQTL dataset described herein [16]. Text S1 contains a description of the simulation procedure.
Results are shown in Figure 3. We find that given a sample size of 2,000 individuals for the biomarker dataset, the causal variant needs to explain close to 2% of the variance of the biomarker to provide reliable evidence in favour of a colocalised signal (lower percentile for PP4 >80%).

Figure 3 (10.1371/journal.pgen.1004383.g003). Simulation analysis with a shared causal variant between two studies. The two datasets used are one eQTL (sample size 966 samples, 10% of the variance explained by the variant) and one biomarker (such as LDL). The variance explained by the biomarker is colour coded and the x-axis shows the sample size of the biomarker study. The y axis shows the median, 10% and 90% quantile of the distribution of PP4 values (which supports a shared common variant).

Consequence of limited variant density and non-additive associations

Until recently the assumption that, for a given GWAS signal, the causal variant in that interval had been genotyped was unrealistic. However, the application of imputation techniques [17]–[19] can provide genotype information about the majority of common genetic variants. Therefore, in situations where a common variant drives the GWAS signal, it is now plausible that, in imputed datasets, genotype information about this variant is available. Nevertheless, limited imputation quality can invalidate this hypothesis. This prompted us to investigate the implication of not including the causal variant in the genotype panel. To address this question, we used Illumina MetaboChip data and imputed the genotyped regions using the Minimac software ([19] and Methods). We then selected only the subset of variants present in the Illumina 660W genotyping array. We simulated data under the assumption of a shared causal variant, with 4,000 individuals in the biomarker dataset. We then computed the PP4 statistic with and without restricting the SNP set to the Illumina 660W Chip SNPs (Figure 4).
We also considered two different scenarios, with the causal SNP included/not included in the Illumina 660W panel (Figures S1 and S2 for more exhaustive simulations). 10.1371/journal.pgen.1004383.g004 Figure 4 Simulation analysis with a shared causal variant between two studies. The two datasets used are one eQTL (sample size 966 samples) and one biomarker (sample size of 4,000 samples). The variance explained by the biomarker and the expression is the same and is colour coded. The x-axis shows the estimated PP4 for 1,000 simulations using data imputed from metaboChip Illumina array. The y-axis uses the same dataset restricted to variants present on the Illumina 660W genotyping array to assess the impact of a lower variant density. A. The causal variant is included in the Illumina 660W panel. B. The causal SNP not included in Illumina 660W panel. Our results show that when the causal variant is directly genotyped by the low density array, the use of imputed data is not essential (Figure 4A). However, in cases where the causal variant is not typed or imputed in the low density panel, the variance of PP4 is much higher (Figure 4B). In this situation, the resulting PP4 statistic tends to decrease even though considerable variability is observed. Inspection of simulation results in Figure 5 (bottom row for tagging SNP, leftmost graph for shared causal variant) shows that while PP4 tends to be lower than for its counterpart with complete genotype data (top row, leftmost graph), PP3 remains low. This indicates that more probability is given to PP0, PP1 and PP2, which can be interpreted as a loss of power rather than misleading inference in favour of distinct variants for both traits. 10.1371/journal.pgen.1004383.g005 Figure 5 Summary of proportional and Bayesian colocalisation analysis of simulated data. 
Each plot shows a different scenario, the total number of causal variants in a region is indicated by number of circles in the plot titles with causal variants affecting both traits, the eQTL trait only, or the biomarker trait only, indicated by full circles, top-shaded circles and bottom-shaded circles respectively. In the top row the causal variant is typed or imputed, whereas only tag variants are typed/imputed in the bottom row. For proportional testing (under the BMA approach), we show the proportion of simulations with posterior predictive p-value 0.9. Error bars show 95% confidence intervals (estimated based on an average of 1,000 simulations per scenario). In all cases, for the eQTL sample size is 1,000; genetic variants explain a total of 10% of eQTL variance; for the biomarker trait, the sample size is 10,000. Statistical power may also be affected by the mode of inheritance of the causal variant. To address this, we simulated cases under a recessive pattern of inheritance. Our results show that if the true model is recessive, but the eQTL signal is nonetheless analysed using the trend test, then we will often also successfully detect a colocalised signal (Figure S9). Comparison with existing colocalisation tests We compared the behaviour of our proposed test with that of proportional colocalisation testing [12], [14] in the specific case of a biomarker dataset with 10,000 samples (Figure 5, and also Figures S3 and S4). Broadly, in the case of either a single common causal variant or two distinct causal variants, our proposed method could infer the simulated hypotheses correctly (PP4 or PP3 >0.9) with good confidence, and PP3 >0.9 slightly more often than the proportional testing p-value median PP3), and either hypotheses H3 or H4 can potentially have strong support (PP4 >0.9 in close to 50% of simulations, and PP3 >0.9 in around 25% of simulations). 
Of course, the ultimate goal should be to extend these tests to cover multiple causal variants, but in the meantime, it can be useful to know that a high PP4 in our proposed Bayesian analysis indicates strong support for “at least one causal variant” and that rejection of the null of proportionality of regression coefficients indicates that the two traits do not share all causal variants, not that they cannot share one.

Dealing with several independent associations for the same trait

We have so far assumed that each trait is associated with at most one causal variant per locus. However, it is not unusual to observe two or more independent associations at a locus for a trait of interest [22]. In the presence of multiple independent associations, the assumption of a single variant per trait prompts the algorithm to consider only the strongest of these distinct association signals. Hence, the presence of additional associations that explain a smaller fraction of the variance of the trait, for example additional and independently associated rare variants, has a negligible impact on our computations. To illustrate this situation, we simulated datasets with two causal variants: one colocalised eQTL/biomarker signal plus a secondary independent “eQTL only” signal (Figure S8). These simulations confirm that the PP4 statistic is only affected in the presence of two independent associations that explain a similar proportion of the variance of the trait (Figure S8). The natural and statistically exact modification of our approach would compute, for each trait, Bayes factors for sets of SNPs rather than single SNPs (up to N SNPs jointly to accommodate for N distinct associations per trait). However, this approach has two drawbacks. Firstly, the interpretation of the resulting posterior probabilities is more challenging in situations where some but not all of the variants are shared across both traits.
More importantly, the typical approach consists of publishing single variant summary statistics, which would prevent the use of standard summary statistics, a key feature of our approach. Owing to the focus of our algorithm on the strongest association signal, an alternative approach to deal with multiple associations consists of using a stepwise regression strategy, which would then reveal the secondary association signals. Our colocalisation test can then be run using the conditional p-values. We find this approach to be the most practical and illustrate below an application for a locus that contains several independent eQTL associations (Figure 6). In situations where only single SNP summary statistics are available, the approximate conditional meta-analysis framework proposed by Visscher et al. [23] can be used to obtain conditional p-values.

Figure 6 (10.1371/journal.pgen.1004383.g006). LDL association and eQTL association plots at the SYPL2 locus. The x-axis shows the physical position on the chromosome (Mb). A: −log10(p) association p-values for LDL. The p-values are from the Teslovich et al. published meta-analysis of >100,000 individuals. B: −log10(p) association p-values for SYPL2 expression in 966 liver samples. C: −log10(p) association p-values for SYPL2 expression conditional on the top eQTL associated SNP at this locus (rs2359653).

Application to a meta-analysis of blood lipids combined with a liver expression dataset

Teslovich et al. [15] reported common variants associated with plasma concentrations of low-density lipoprotein cholesterol (LDL), high-density lipoprotein cholesterol (HDL) and triglyceride (TG) levels in more than 100,000 individuals of European ancestry. They then reported the correlations between the lead SNPs at the loci they found and the expression levels of transcripts in liver. For the lipid dataset we have access only to summary statistics. The liver expression dataset used in this analysis is the same as the one used in [15].
In Teslovich et al., regions are defined within 500 kilobases of the lead SNPs, and the threshold for significance is . At this threshold, they found 38 SNP-to-gene eQTLs in liver (Supplementary Table 8 of [15]). Table S1 shows our results for these 38 previously reported colocalisations. A complete list of all our identified colocalisations (independently of previous reports) is provided in Tables S2, S3, S4, S5 (broken down by lipid traits). Using the coloc web server for this analysis with a PP4 >75, it took 1 minute to complete chromosome 1 and approximately 7 minutes to analyse the entire imputed genome-wide data on a laptop. The majority of our results are consistent with the findings of Teslovich et al., with 26 out of 38 loci having PP4 . To assess the role of the prior, we varied the critical parameter , which codes for the prior probability that a variant is associated with both traits. Here we report the results using the . The complete list of results is provided in Table S1. Table 1 lists the previously reported lipid-eQTL for which we find strong support against the colocalisation hypothesis (PP3 >75%). The LocusZoom association plots for each of these loci can be found in Figure S5. In addition to the loci listed in Table 1, we found strong evidence of distinct signals between HLA-DQ/HLA-DR and TC (Table S1) but these results must be interpreted with caution owing to the extensive polymorphism in the major histocompatibility complex region. 10.1371/journal.pgen.1004383.t001 Table 1 Loci previously reported to colocalise with liver eQTL, but not supported by our analysis. Chr Region Gene Trait Biom pval Biom SNP eQTL pval eQTL SNP Primary signal Secondary signal* Other genes colocalising in region (PP4 >75%) PP3 (%) PP4 (%) PP4 (%) conditional SNP 1 109824678∶110224737 SYPL2 LDL 9.7e–171 rs629301 7.1e–103 rs2359653 >99 99 99 99 99 99 75%. *Secondary signals are reported only when there is a secondary eQTL at a p-value greater than . 
Colocalisation tests are computed using the expression data conditioned on the listed SNP. Other genes in the same region as the gene listed that colocalise using our method are reported.

For only one locus (CEP250), we did not find a significant eQTL signal, pointing to potential differences in bioinformatics processing and/or imputation strategy. In such a situation, both PP3 and PP4 are low and PP0, PP1 and PP2 concentrate most of the posterior distribution. Three loci (TMEM50A, ANGPTL3, PERLD1/PGAP3) do not have enough evidence to strongly support either colocalisation or absence of colocalisation (Table S1) and these should remain marked as doubtful. One of these genes, ANGPTL3, is noteworthy. Examining this locus (Figure S6), it is clear that the pattern of association p-values is consistent between LDL and ANGPTL3 expression. However, the extent of LD is strong, with 98 strongly associated variants. In such a situation, there is uncertainty as to whether the data support a shared causal variant for both traits, or two distinct variants for eQTL/LDL. Because the data are consistent with both scenarios, the choice of prior becomes decisive. Accordingly, PP4 drops from 91% to 49% if one uses instead of . Table 2 lists the 14 colocalised loci (15 genes) that were not reported by Teslovich et al. (or in Global Lipids Genetics Consortium [24] for the gene NYNRIN), but for which our method finds strong support for colocalisation (PP4 >75%). Figure S7 shows the LocusZoom plots for these colocalisation results. Eleven of these 15 genes are strong candidates for involvement in lipid metabolism and/or have been previously suggested as candidate genes: SDC1, TGOLN2, INHBB, UBXN2B, VLDLR, VIM, CYP26A1, OGFOD1, HP, HPR, PPARA. See Text S2 for a brief overview of the function of these genes. Four other genes have a less obvious link: CMTM6, C6orf106, CUX2, ENSG00000259359.
Table 2 Novel loci not previously reported to colocalise with liver eQTL, but colocalising based on our analysis.

| Chr | Region | Gene | Trait | Biom pval | Biom SNP | eQTL pval | eQTL SNP | PP3 (%) | PP4 (%) | Reference |
|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 20201795:20601854 | SDC1 | TC | 1.23E-07 | 2:20368519 | 6.66E-09 | 2:20371380 | 17 | 82 | [41] |
| 2 | 85349026:85749085 | TGOLN2 | HDL | 1.01E-07 | 2:85546192 | 2.83E-80 | 2:85553784 | 17 | 83 | [42] |
| 2 | 120908798:121308857 | INHBB | LDL | 1.43E-06 | 2:121305771 | 4.88E-21 | 2:121306440 | 7 | 77 | [43] |
| 3 | 32322873:32722932 | CMTM6 | TC | 4.66E-06 | 3:32533010 | 2.73E-07 | 3:32523287 | 8 | 77 | |
| 6 | 34355095:34755154 | C6orf106 | TC | 4.68E-11 | 6:34546560 | 4.48E-09 | 6:34616322 | 15 | 85 | |
| 8 | 59158506:59558565 | UBXN2B | LDL | 3.86E-09 | 8:59311697 | 3.46E-10 | 8:59331282 | 13 | 87 | [44] |
| | | | TC | 8.79E-13 | 8:59311697 | 3.46E-10 | 8:59331282 | 15 | 85 | |
| 9 | 2454062:2854121 | VLDLR | LDL | 8.05E-06 | 9:2640759 | 1.36E-07 | 9:2640759 | 1 | 91 | [45] |
| 10 | 17079389:17479448 | VIM | TC | 7.22E-07 | 10:17259642 | 9.84E-09 | 10:17260290 | 5 | 93 | [46] |
| 10 | 94637063:95037122 | CYP26A1 | TG | 2.38E-08 | 10:94839642 | 3.51E-06 | 10:94839724 | 3 | 95 | [47] |
| 12 | 111508189:111908248 | CUX2 | HDL | 4.38E-06 | 12:111904371 | 2.81E-16 | 12:111884608 | 2 | 89 | |
| | | | LDL | 1.73E-09 | 12:111884608 | 2.81E-16 | 12:111884608 | 2 | 98 | |
| | | | TC | 2.36E-11 | 12:111904371 | 2.81E-16 | 12:111884608 | 2 | 98 | |
| 15 | 96517293:96917352 | ENSG00000259359 | HDL | 8.04E-06 | 15:96708291 | 5.50E-13 | 15:96708291 | 2 | 87 | |
| 16 | 56310220:56710279 | OGFOD1 | TC | 3.19E-06 | 16:56490549 | 3.36E-11 | 16:56493573 | 7 | 84 | [48] |
| 16 | 71894416:72310900 | HP | LDL | 1.75E-22 | 16:72108093 | 2.15E-06 | 16:72108093 | 1 | 97 | [49] |
| | | | TC | 3.22E-24 | 16:72108093 | 2.15E-06 | 16:72108093 | 1 | 97 | |
| | | | TG | 5.66E-06 | 16:72108093 | 2.15E-06 | 16:72108093 | 2 | 75 | |
| | | HPR | LDL | 1.75E-22 | 16:72108093 | 4.18E-08 | 16:72108093 | 1 | 99 | [50] |
| | | | TC | 3.22E-24 | 16:72108093 | 4.18E-08 | 16:72108093 | 1 | 99 | |
| | | | TG | 5.66E-06 | 16:72108093 | 4.18E-08 | 16:72108093 | 2 | 89 | |
| 22 | 46433083:46833138 | PPARA | TC | 3.59E-06 | 22:46627603 | 5.96E-08 | 22:46632994 | 10 | 81 | [51] |

Signals previously not reported as having a probable shared variant but supported by our method based on PP4 (posterior probability for a shared signal) >75% for colocalisation between the liver eQTL dataset and the Teslovich et 
al. meta-analysis of LDL, HDL, TG, TC, using the strict prior . For 11 genes with strong candidate status for lipid metabolism, we list a key reference that describes their function (see Text S2 for more details of gene functions).

Three previously reported genes (SYPL2, IFT172, TBKBP1) which, based on our re-analysis, do not colocalise with the lipid traits, have a nearby gene with a high probability of colocalisation (respectively, SORT1, GCKR, KPNB1). This suggests that the nearby genes are the more likely candidates in these regions.

To explore the possibility that secondary signals may colocalise, we applied the stepwise regression strategy described above to deal with several independent associations at a single locus. We performed colocalisation tests using eQTL results conditional on the top eQTL associated variant. Two of the loci (SYPL2 with LDL or TC, and APOC4 with TG) showed evidence of colocalisation with expression after conditional analysis (Table 1). An example of this stepwise procedure for the gene SYPL2 and LDL is provided in Figure 6. We find that the top liver eQTL signal is clearly discordant with LDL association (Table 1 and Figure 6). However, conditioning on the top eQTL signal reveals a second independent association for SYPL2 expression in liver. This secondary SYPL2 eQTL colocalises with the LDL association (PP4 >90%, Figure 6).

Web based resource

We developed a web site designed for integration of GWAS results using only p-values and the sample size of the datasets (http://coloc.cs.ucl.ac.uk/coloc/). The website was developed using RWUI [25]. Results include a list of potentially causal genes with their associated PP4, plots and ABF, and can be viewed either interactively or returned by email. Researchers can request a genome-wide scan of results from a genetic association analysis, and obtain a list of genes with a high probability of mediating the GWAS signals in a particular tissue. 
The tool also allows visualisation of the signals within a genetic region of interest. The database and browser currently include the possibility of investigating colocalisation with liver [15] and brain [26], [27] expression data; however, the resource will soon be extended to include expression in different tissues. This method, as well as alternative approaches for colocalisation testing [12], [14], is also available with additional input options in an R package, coloc, from the Comprehensive R Archive Network (http://cran.r-project.org/web/packages/coloc).

Discussion

We have developed a novel Bayesian statistical procedure to assess whether two association signals are colocalised. Our method is best suited for associations detected by GWAS, which are likely to reflect common, imputable variants with small effects, or rare variants with large effect sizes. Our aim differs from a typical fine-mapping exercise in the sense that we are not interested in knowing which variant is likely to be causal but only whether a shared causal variant is plausible. The strength of this approach lies in its speed and analytical forms, combined with the fact that it can use single variant p-values when only these are available.

Our results show that to provide an accurate answer to the colocalisation problem, high-density genotyping and/or accurate use of imputation techniques are key. The quality of the imputation is another important parameter. Indeed, while the variance of the regression coefficient can be estimated solely on the basis of the minor allele frequency for typed SNPs and sample size (and the case control ratio in the case of a binary outcome) [17], [28], this ignores the uncertainty due to imputation. Filtering out poorly imputed SNPs partially addresses this problem, with the drawback that it may exclude the causal variant(s). Hence, providing estimates of the variance of the MLE, together with the effect estimates, will result in greater accuracy. 
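The approximation described above, in which the variance of the regression coefficient is derived from the minor allele frequency and the sample size, can be sketched as follows for a quantitative trait. This is a minimal illustration, not the authors' implementation: the function names, the unit trait standard deviation (`sd_y`), and the prior standard deviation of the effect size (`prior_sd=0.15`) are assumptions made for the example, and the Bayes factor uses the Wakefield-style approximation computed from a two-sided p-value.

```python
import math
from statistics import NormalDist


def approx_var_beta(maf, n, sd_y=1.0):
    """Approximate Var(beta_hat) for a quantitative trait.

    Uses only the minor allele frequency and the sample size, so it
    ignores any extra uncertainty introduced by imputation.
    sd_y (trait standard deviation) is an assumed input.
    """
    return sd_y ** 2 / (2.0 * n * maf * (1.0 - maf))


def log_abf(p_value, var_beta, prior_sd=0.15):
    """Log approximate Bayes factor for one variant (association vs. null).

    prior_sd is the assumed prior standard deviation of the effect size.
    """
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must be in (0, 1]")
    # |z| recovered from the two-sided p-value via the lower tail
    z = -NormalDist().inv_cdf(p_value / 2.0)
    # shrinkage factor: prior variance over (prior variance + sampling variance)
    r = prior_sd ** 2 / (prior_sd ** 2 + var_beta)
    return 0.5 * (math.log(1.0 - r) + r * z * z)


# Hypothetical variant: MAF 0.5 in 1,000 samples, p-value 1e-8
v = approx_var_beta(maf=0.5, n=1000)
print(v, log_abf(1e-8, v))
```

When a study reports the standard error of each effect estimate, its square can be passed as `var_beta` directly instead of the approximation.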
This additional option is available on the coloc package in R (http://cran.r-project.org/web/packages/coloc). We currently assume that each genetic variant is equally likely a priori to affect gene expression or trait. A straightforward addition to our methodology would consider location specific priors for each variant, which would depend for example on the distance to the gene of interest, or the presence of functional elements in this chromosome region [29]. Our computation of the BF also assumes that, under , the effect sizes of the shared variant on both traits are independent. This could be modified if, for example, one compares eQTLs across different tissue types, or the same trait in two different studies. [30] has proposed a framework to deal with correlated effect sizes, and these ideas could potentially be incorporated in our colocalisation test. Another related issue is the choice of prior probabilities for the various configurations. For the eQTL analysis, we used a prior probability for a cis-eQTL. A more stringent threshold may be better suited for trans-eQTLs where the variants are further away from the gene under genetic control. We also used a prior probability of for the lipid associations. Although our knowledge about this is still lacking, this estimate has been suggested in the literature in the context of GWAS [20], [31], [32]. We assigned a prior probability of for , which encodes the probability that a variant affects both traits. It has been shown that SNPs associated with complex traits are more likely to be eQTLs compared to other SNPs chosen at random from GWAS platforms [33], and a higher weighting for these SNPs has been proposed when performing Bayesian association analyses [34], [35]. Also, eQTLs have been shown to be enriched for disease-associated SNPs when a disease-relevant tissue is used [9], [36]. Our sensitivity analysis for the parameter showed broadly consistent results (Table S1). 
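To make the role of the prior probabilities concrete, the sketch below combines per-variant log Bayes factors for two traits into posterior probabilities for the five configurations (PP0 to PP4). It is a hedged illustration of the general calculation, assuming at most one causal variant per trait in the region. The parameter names `p1`, `p2` and `p12` and all numeric values are assumptions chosen for the example and are not taken from the text.

```python
import math


def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def logdiffexp(a, b):
    """log(exp(a) - exp(b)); requires a > b."""
    return a + math.log1p(-math.exp(b - a))


def coloc_posteriors(labf1, labf2, p1, p2, p12):
    """Return [PP0, PP1, PP2, PP3, PP4] from per-variant log Bayes factors.

    labf1, labf2: log Bayes factors for trait 1 and trait 2, same variants.
    p1, p2: assumed prior that a variant affects only trait 1 / only trait 2.
    p12: assumed prior that a variant affects both traits.
    """
    if len(labf1) != len(labf2) or len(labf1) < 2:
        raise ValueError("need the same variants for both traits, at least two")
    s1 = logsumexp(labf1)
    s2 = logsumexp(labf2)
    s12 = logsumexp([a + b for a, b in zip(labf1, labf2)])
    log_h = [
        0.0,                                                      # no association
        math.log(p1) + s1,                                        # trait 1 only
        math.log(p2) + s2,                                        # trait 2 only
        math.log(p1) + math.log(p2) + logdiffexp(s1 + s2, s12),   # two distinct variants
        math.log(p12) + s12,                                      # one shared variant
    ]
    total = logsumexp(log_h)
    return [math.exp(h - total) for h in log_h]


# Hypothetical region with three variants; the second is strongly associated with both traits
labf_trait1 = [0.0, 8.0, 0.5]
labf_trait2 = [0.2, 7.0, 0.1]
pp_loose = coloc_posteriors(labf_trait1, labf_trait2, 1e-4, 1e-4, 1e-5)
pp_strict = coloc_posteriors(labf_trait1, labf_trait2, 1e-4, 1e-4, 1e-6)
print("PP4 with larger p12:", pp_loose[4])
print("PP4 with smaller p12:", pp_strict[4])
```

Lowering `p12` while holding the data fixed can only lower PP4, which is the kind of prior sensitivity discussed above for loci where the data are compatible with both a shared variant and two distinct variants.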
In cases where GWAS data are available for both traits, [10] show that it is possible to estimate these parameters from the data using a hierarchical model. This addition is a possible extension of our approach. The interpretation of the posterior probabilities requires caution. For example, a low PP4 may not indicate evidence against colocalisation in situations where PP3 is also low. It may simply be the result of limited power, which is evidenced by high values of PP0, PP1 and/or PP2. Moreover, a high PP4 is a measure of correlation, not causality. To illustrate this, one can consider the relatively common situation where a single variant appears to affect the expression of several genes in a chromosome region (as observed, for example, in the region surrounding the SORT1 gene). Several eQTLs will be colocalised, both between them and with the biomarker of interest. In this situation one would typically expect that a single gene is causally involved in the biomarker pathway but the colocalisation test with the biomarker will generate high PP4 values for all genes in the interval. We show that we can use conditional p-values to deal with multiple independent associations with the same trait at one locus. While we found this solution generally effective, Wallace [14] points out that this top SNP selection for the conditional analysis can create biases, although the bias is small in the case of large samples and/or strong effects. For difficult loci with multiple associations for both traits and available genotype data, it may be more appropriate to estimate Bayes factors for sets rather than single variants in order to obtain an exact answer. This extension would avoid the issue of SNP selection for the conditional analysis. 
Importantly, GWAS signals can be explained by eQTLs only when the causal variant affects the phenotype by altering the amount of mRNA produced, but not when the phenotype is affected by changing the type of protein produced, although the former seems to be the more common [33]. Furthermore, since many diseases manifest their phenotype in certain tissues exclusively [2], [21], [37], [38], colocalisation results will be dependent on the expression dataset used. In addition to identifying the causal genes, the identification of tissue specificity for the molecular effects underlying GWAS signals is a key outcome of our method. We anticipate that building a reference set of eQTL studies in multiple tissues will provide a useful check for every new GWAS dataset, pointing directly to potential candidate genes/tissue types where these effects are mediated.

While this report focuses on finding shared signals between a biomarker dataset and a liver expression dataset, we plan to utilise summary results of multiple GWAS and eQTL studies, for a variety of cell types and traits. In fact, our method can utilise summary results from any association studies. Disease/disease, (cis or trans) eQTL/disease or disease/biomarkers comparisons are all of biological interest and use the same statistical framework. We expect that the fact that the test can be based on single SNP summary statistics will be key to overcoming data sharing concerns, hence enabling a large scale implementation of this tool. The increasing availability of RNA-Seq eQTL studies will further increase the opportunity to detect isoform specific eQTLs and their relevance to disease studies. Owing to the increasing availability of GWAS datasets, the systematic application of this approach will potentially provide clues into the molecular mechanisms underlying GWAS signals and the aetiology of the disorders.

Materials and Methods

Ethics statement

This paper re-analyses previously published datasets. 
All samples and patient data were handled in accordance with the policies and procedures of the participating organisations.

Expression dataset

We used in our analysis gene expression and genotype data from 966 human liver samples. The samples were collected post-mortem or during surgical resection from unrelated European-American subjects from two different non-overlapping studies, which have been described in [16]. The cohorts were both genotyped using the Illumina 650Y BeadChip array, and 39,000 expression probes were profiled using Agilent human gene expression arrays. All of the expression data were normalised as one unit even though they were part of different studies, since high concordance between data generated using the same array platforms has been previously reported. Probe sequences were searched against the human reference genome GRCh37 from 1000 Genomes using BLASTN. Multiple probes mapping to one gene were kept in order to examine possible splicing. The probes were kept and annotated to a specific gene if they were entirely included in genes defined by Ensembl ID or by HGNC symbol using the package biomaRt in R [39]. After mapping and annotating the probes, we were left with 40,548 mapped probes covering 24,927 genes.

Imputation of genetic data

Quality control filters were applied both before and after imputation. Before imputation, individuals with more than 10% missing genotypes were removed, and SNPs showing a missing rate greater than 10% or a deviation from HWE at a p-value less than 0.001 were dropped. After imputation, monomorphic SNPs were excluded from analyses. To speed up the imputation process, the genome was broken into small chunks that were phased and imputed separately and then re-assembled. 
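The chunk-and-reassemble idea can be illustrated with a short sketch. It is a simplified illustration under stated assumptions, not the tool used in the study: the function names are hypothetical, SNPs are represented only by their index along a chromosome, and the default chunk size (1000 SNPs) and overlap (200 SNPs on each side) follow the values given in the Methods. Each chunk is extended by the overlap for processing, and only its core is kept when the results are stitched back together.

```python
def chunk_with_overlap(n_snps, chunk_size=1000, overlap=200):
    """Split SNP indices 0..n_snps-1 into chunks with flanking overlap.

    Returns a list of (core_start, core_end, ext_start, ext_end) tuples,
    all half-open ranges. The extended range is what would be phased and
    imputed; the core range is what is kept afterwards.
    """
    chunks = []
    for core_start in range(0, n_snps, chunk_size):
        core_end = min(core_start + chunk_size, n_snps)
        ext_start = max(0, core_start - overlap)
        ext_end = min(n_snps, core_end + overlap)
        chunks.append((core_start, core_end, ext_start, ext_end))
    return chunks


def reassemble(chunks, results_by_chunk):
    """Concatenate per-chunk results, keeping only each chunk's core region."""
    merged = []
    for (core_start, core_end, ext_start, _), values in zip(chunks, results_by_chunk):
        merged.extend(values[core_start - ext_start:core_end - ext_start])
    return merged


# Hypothetical chromosome with 2,500 SNPs; the per-chunk "result" is just the SNP index
chunks = chunk_with_overlap(2500)
results = [list(range(ext_start, ext_end)) for (_, _, ext_start, ext_end) in chunks]
merged = reassemble(chunks, results)
print(chunks)
```

The flanking SNPs give the phasing step context near chunk edges, and discarding them afterwards means every SNP appears exactly once in the merged output. The sketch covers only the splitting and trimming logic, not the phasing or imputation of each chunk.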
This was achieved using the ChunkChromosome tool (http://genome.sph.umich.edu/wiki/ChunkxChromosome), and specifying chunks of 1000 SNPs, with an overlap window of 200 SNPs on each side, which improves accuracy near the edges during the phasing step. Each chunk was phased using the program MACH1 with the number of states set to 300 and the number of rounds of MCMC set to 20 for all chunks. Phased haplotypes were used as a basis for imputation of untyped SNPs using the software Minimac with 1000 Genomes European ancestry reference haplotypes (phase1 version 3, March 2012) to impute SNPs not genotyped on the Illumina array. Variants with a MAF less than 0.001 were also excluded post-imputation. The data were then collated in probability format that can be used by the R Package snpStats [39].

eQTL analysis

eQTL p-values, effect sizes, and standard errors were obtained by fitting a linear trend test regression between the expression of each gene and all variants 200 kilobases upstream and downstream from each probe. After filtering out the variants with MAF […]

[…] 100,000 individuals. B: −log10(p) association p-values for ANGPTL3 expression in 966 liver samples. (TIF)

Figure S7 Regional Manhattan plots corresponding to loci listed in Table 2 of main text. Row and column headers defined as in previous figure. The genomic range may be greater than kilobases to improve visualisation of the signal. (PDF)

Figure S8 Simulation analysis with multiple shared causal variants. The first plot represents cases with only one causal variant in a region, while the following plots illustrate the behaviour of the statistic in the presence of an additional causal variant affecting the variance explained of the eQTL trait. In all scenarios, the first causal variant explains 10% of the variance of the eQTL trait. The second causal variant explains 1%, 5%, or 10% of the variance of the eQTL trait. 
We show the proportion of simulations with the posterior probability (PP3 or PP4) of the indicated hypothesis >0.9. Error bars show 95% confidence intervals (estimated based on an average of 1,000 simulations per scenario). In all cases, the eQTL sample size is 1,000; for the biomarker trait, the sample size is 10,000. (TIF)

Figure S9 Simulation analysis with a recessive shared causal variant. The two datasets used are one eQTL (sample size 966 samples, 10% of the variance explained by the variant) and one biomarker (sample size 10,000). The variance explained by the biomarker is colour coded and the shape of the dots represents the mode of inheritance. The simulation procedure and distribution of the statistic are the same as defined in the previous figure. (TIF)

Table S1 Results using reported loci that colocalise with liver eQTL. Published results of loci correlating with both liver expression and one of the four lipid traits (Teslovich et al. Supplementary Table 8) and posterior probability of different signal (PP3) and common signal (PP4) after applying colocalisation test. Each row lists the results for one probe, and the multiple entries for the same locus and trait represent multiple probes mapping to the same locus. The columns Biom pval and eQTL pval report the lowest p-values found for the association with the trait listed and for the liver expression association respectively, with the corresponding SNP name (Biom SNP and eQTL SNP); the column Best Causal reports the SNP within the region with the highest posterior probability to be the true causal variant. The probabilities have been rounded to 1 significant figure. (PDF)

Table S2 eQTL/LDL colocalisation. 
Positive (PP4 >75%) eQTL/LDL colocalisation results between the liver eQTL dataset and the Teslovich meta-analysis using the most stringent prior for the probability that one SNP is associated with both traits, . The column Signal includes genes that are part of overlapping regions and that colocalise at PP4 >75%; the column Region represents the genomic coordinates for the start and stop of the signal; in the column Tesl, “Y” indicates that this signal with any of the genes included has been reported to be an intermediate for any of the four lipid biomarker associations by Teslovich et al.; the columns Biom pval and eQTL pval report the lowest p-values found for LDL association and for the expression association respectively, with the corresponding SNP name (Biom SNP and eQTL SNP); the column Best Causal reports the SNP within the region with the highest posterior probability to be the true causal variant. The probabilities have been rounded to 1 significant figure. (PDF)

Table S3 eQTL/HDL colocalisation. Positive (PP4 >75%) eQTL/HDL colocalisation results between the liver eQTL dataset and the Teslovich meta-analysis. Column and row headings are the same as in the previous table. (PDF)

Table S4 eQTL/TG colocalisation. Positive (PP4 >75%) eQTL/TG colocalisation results between the liver eQTL dataset and the Teslovich meta-analysis. Column and row headings are the same as in the previous table. (PDF)

Table S5 eQTL/TC colocalisation. Positive (PP4 >75%) eQTL/TC colocalisation results between the liver eQTL dataset and the Teslovich meta-analysis. Column and row headings are the same as in the previous table. (PDF)

Text S1 Supplementary materials. Expanded methods, derivations and analyses. (PDF)
Text S2 Overview of gene function of new colocalisation results associated with blood lipid levels and liver expression. (PDF)

Author and article information

Journal

Journal ID (nlm-ta): JAMA Neurol

Journal ID (iso-abbrev): JAMA Neurol

Journal ID (pmc): JAMA Neurol

Title: JAMA Neurology

Publisher: American Medical Association

ISSN (Print): 2168-6149

ISSN (Electronic): 2168-6157

Publication date (Electronic): 1 February 2021

Publication date (Print): April 2021

Publication date PMC-release: 1 February 2021

Volume: 78

Issue: 4

Pages: 1-10

Affiliations

[1 ]Department of Molecular Neuroscience, University College London Institute of Neurology, Queen Square, London, United Kingdom

[2 ]Department of Pharmacology, School of Pharmacy, University College London, London, United Kingdom

[3 ]Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, University of Cardiff, Cardiff, United Kingdom

[4 ]Departamento de Ingeniería de la Información y las Comunicaciones, Universidad de Murcia, Murcia, Spain

[5 ]Department of Comparative Biomedical Sciences, The Royal Veterinary College, London, United Kingdom

[6 ]School of Pharmacy, University of Reading, Reading, United Kingdom

[7 ]Department of Genetics, King Faisal Specialist Hospital and Research Centre, Riyadh, Saudi Arabia

[8 ]Department of Neurodegenerative Disease and Reta Lila Weston Laboratories, University College London Dementia Research Institute, London, United Kingdom

Author notes

Article Information

Group Information: The complete list of the International Parkinson’s Disease Genomics Consortium (IPDGC) and United Kingdom Brain Expression Consortium (UKBEC) members is listed at the end of this article.

Accepted for Publication: September 11, 2020.

Published Online: February 1, 2021. doi:10.1001/jamaneurol.2020.5257

Corresponding Authors: Nicholas W. Wood, PhD, Institute of Neurology, Department of Molecular Neuroscience, University College London, Queen Square, London WC1N 3BG, United Kingdom (n.wood@ucl.ac.uk); John Hardy, PhD, UCL Queen Square Institute of Neurology, Department of Neurodegenerative Disease and Reta Lila Weston Laboratories, UCL Dementia Research Institute, Wing 1.2, Cruciform Building, Gower Street, London WC1E 6BT, United Kingdom (j.hardy@ucl.ac.uk).

Author Contributions: Messrs Kia and Zhang had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. Messrs Kia and Zhang contributed equally to this work. Drs Hardy and Wood contributed equally to this work.

Concept and design: Kia, Zhang, Lewis, Trabzuni, Hardy, Wood.

Acquisition, analysis, or interpretation of data: Kia, Zhang, Guelfi, Manzoni, Hubbard, Reynolds, Botía, Ryten, Ferrari, Lewis, Williams, Wood.

Drafting of the manuscript: Kia, Zhang, Ryten, Lewis, Williams, Trabzuni, Hardy.

Critical revision of the manuscript for important intellectual content: Kia, Zhang, Guelfi, Manzoni, Hubbard, Reynolds, Botía, Ferrari, Lewis, Williams, Wood.

Statistical analysis: Kia, Zhang, Manzoni, Reynolds, Botía, Ryten, Ferrari, Williams.

Obtained funding: Lewis, Williams, Hardy, Wood.

Administrative, technical, or material support: Kia, Zhang, Guelfi, Manzoni, Ryten.

Supervision: Kia, Lewis, Williams, Hardy, Wood.

Conflict of Interest Disclosures: Dr Manzoni and Dr Lewis reported receiving grants from Medical Research Council (MRC) during the conduct of the study. Dr Williams reported receiving grants from Parkinson’s UK during the conduct of the study. Dr Trabzuni reported receiving personal fees from King Faisal Specialist Hospital during the conduct of the study; and grants from University College London outside the submitted work. No other disclosures were reported.

Funding/Support: This study was supported by the Medical Research Council and Wellcome Trust Disease Centre (grant WT089698/Z/09/Z to Drs Wood and Hardy). Funding for the project was provided by the Wellcome Trust under awards 076113, 085475, and 090355. This study was also supported by Parkinson’s UK (grants 8047 and J-0804) and the MRC (G0700943 and G1100643). This work was supported by the MRC grant MR/N026004/1. The Braineac project was supported by the MRC through the MRC Sudden Death Brain Bank Grant (MR/G0901254) to Dr Hardy. Dr Lewis was supported by grants MR/N026004/1 and MR/L010933/1 from the MRC and the Michael J. Fox Foundation for Parkinson’s Research. Dr Trabzuni was supported by the King Faisal Specialist Hospital and Research Centre and the Michael J. Fox Foundation for Parkinson’s Research. Dr Hardy was supported by the Dolby Foundation. Dr Ferrari is supported by grant 284 from the Alzheimer’s Society. University College London Hospitals and University College London receive support from the Department of Health’s National Institute for Health Research (NIHR) Biomedical Research Centres (BRC). Dr Wood is an NIHR senior investigator and receives support from the JPND-MRC Comprehensive Unbiased Risk Factor Assessment for Genetics and Environment in Parkinson’s disease (COURAGE). Ms Reynolds is supported through the award of a Leonard Wolfson Doctoral Training Fellowship in Neurodegeneration. This work was supported in part by the Intramural Research Programs of the National Institute of Neurological Disorders and Stroke (NINDS), the National Institute on Aging (NIA), and the National Institute of Environmental Health Sciences, both part of the National Institutes of Health, Department of Health and Human Services; project numbers 1ZIA-NS003154, Z01-AG000949-02, and Z01-ES101986. In addition, this work was supported by the Department of Defense (award W81XWH-09-2-0128), and the Michael J. Fox Foundation for Parkinson’s Research. 
This work was supported by National Institutes of Health grants R01NS037167, R01CA141668, and P50NS071674; American Parkinson Disease Association (APDA); Barnes Jewish Hospital Foundation; Greater St Louis Chapter of the APDA. The KORA (Cooperative Research in the Region of Augsburg) research platform was started and financed by the Forschungszentrum für Umwelt und Gesundheit, which is funded by the German Federal Ministry of Education, Science, Research, and Technology and by the State of Bavaria. This study was also funded by the German Federal Ministry of Education and Research (BMBF) under the funding code 031A430A, the EU Joint Programme - Neurodegenerative Diseases Research (JPND) project under the aegis of JPND ( www.jpnd.eu) through Germany, BMBF, funding code 01ED1406 and iMed - the Helmholtz Initiative on Personalized Medicine. This study is funded by the German National Foundation grant (DFG SH599/6-1), Michael J Fox Foundation, and MSA Coalition, USA. The French GWAS work was supported by the French National Agency of Research (ANR-08-MNP-012). This study was also funded by France-Parkinson Association, Fondation de France, the French program “Investissements d’avenir” funding (ANR-10-IAIHU-06) and a grant from Assistance Publique-Hôpitaux de Paris (PHRC, AOR-08010) for the French clinical data. This study was also sponsored by the Landspitali University Hospital Research Fund; Icelandic Research Council; and European Community Framework Programme 7, People Programme, and IAPP on novel genetic and phenotypic markers of Parkinson’s disease and Essential Tremor (MarkMD), contract number PIAP-GA-2008-230596 MarkMD. Institutional research funding IUT20-46 was received from the Estonian Ministry of Education and Research. The McGill study was funded by the Michael J. Fox Foundation and the Canadian Consortium on Neurodegeneration in Aging (CCNA). Sequencing and genotyping done in McGill University was supported by grants from the Michael J. 
Fox Foundation, the Canadian Consortium on Neurodegeneration in Aging (CCNA) and in part thanks to funding from the Canada First Research Excellence Fund (CFREF), awarded to McGill University for the Healthy Brains for Healthy Lives (HBHL) program.

Role of the Funder/Sponsor: The funding sources had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.

Group Information: The International Parkinson’s Disease Genomics Consortium (IPDGC) and United Kingdom Brain Expression Consortium (UKBEC) members are listed in Supplement 10.

Additional Contributions: We thank all the individuals who donated their time and biological samples to be a part of this study. We also thank all members of the International Parkinson’s Disease Genomics Consortium (IPDGC). See http://pdgenetics.org/partners for a complete overview of members, acknowledgments, and funding. This study used the high-performance computational capabilities of the Biowulf Linux cluster at the National Institutes of Health, Bethesda, MD ( https://hpc.nih.gov/), and DNA panels, samples, and clinical data from the National Institute of Neurological Disorders and Stroke Human Genetics Resource Center DNA and Cell Line Repository. People who contributed samples are acknowledged in descriptions of every panel on the repository website. We thank the French Parkinson’s Disease Genetics Study Group and the Drug Interaction With Genes (DIGPD) study group: Y. Agid, M. Anheim, F. Artaud, A.-M. Bonnet, C. Bonnet, F. Bourdain, J.-P. Brandel, C. Brefel-Courbon, M. Borg, A. Brice, E. Broussolle, F. Cormier-Dequaire, J.-C. Corvol, P. Damier, B. Debilly, B. Degos, P. Derkinderen, A. Destée, A. Dürr, F. Durif, A. Elbaz, D. Grabli, A. Hartmann, S. Klebe, P. Krack, J. Kraemmer, S. Leder, S. Lesage, R. Levy, E. Lohmann, L. Lacomblez, G. Mangone, L.-L. Mariani, A.-R. Marques, M. Martinez, V. Mesnage, J. Muellner, F. Ory-Magne, F. Pico, V. Planté-Bordeneuve, P. Pollak, O. Rascol, K. Tahiri, F. Tison, C. Tranchant, E. Roze, M. Tir, M. Vérin, F. Viallet, M. Vidailhet, and A. You. We also thank the members of the French 3C Consortium: A. Alpérovitch, C. Berr, C. Tzourio, and P. Amouyel for allowing us to use part of the 3C cohort, and D. Zelenika for support in generating the genome-wide molecular data. We thank P. Tienari, Molecular Neurology Programme, Biomedicum, University of Helsinki, T. Peuralinna, Department of Neurology, Helsinki University Central Hospital, L. 
Myllykangas, Folkhalsan Institute of Genetics and Department of Pathology, University of Helsinki, and R. Sulkava, Department of Public Health and General Practice Division of Geriatrics, University of Eastern Finland, for the Finnish controls (Vantaa85+ GWAS data). We used genome-wide association data generated by the Wellcome Trust Case-Control Consortium 2 (WTCCC2) from UK patients with Parkinson disease and UK control individuals from the 1958 Birth Cohort and National Blood Service. Genotyping of UK replication cases on ImmunoChip was part of the WTCCC2 project, which was funded by the Wellcome Trust (083948/Z/07/Z). UK population control data were made available through WTCCC1. As with previous IPDGC efforts, this study makes use of data generated by the Wellcome Trust Case-Control Consortium. A full list of the investigators who contributed to the generation of the data is available from www.wtccc.org.uk. We thank Jeffrey Barrett and Jason Downing, Illumina Inc, for assistance with the design of the ImmunoChip and NeuroX arrays. DNA extraction work that was done in the UK was undertaken at University College London Hospitals, University College London, who received a proportion of funding from the Department of Health’s National Institute for Health Research Biomedical Research Centres funding. This study was supported in part by the Wellcome Trust/Medical Research Council Joint Call in Neurodegeneration award (WT089698) to the Parkinson’s Disease Consortium (UKPDC), whose members are from the UCL Institute of Neurology, University of Sheffield, and the Medical Research Council Protein Phosphorylation Unit at the University of Dundee. We thank the Quebec Parkinson Network ( http://rpq-qpn.ca/en/) and its members.

Article

Publisher ID: noi200099

DOI: 10.1001/jamaneurol.2020.5257

PMC ID: 7851759

PubMed ID: 33523105

SO-VID: 2fb5b873-306c-441f-8bb8-cf6202d0d2cb

License:

This is an open access article distributed under the terms of the CC-BY License.
