rMVP: A Memory-efficient, Visualization-enhanced, and Parallel-accelerated Tool for Genome-wide Association Study

Abstract

With the development of high-throughput sequencing technologies, both sample sizes and SNP numbers are increasing rapidly in genome-wide association studies (GWAS), and the associated computation is more challenging than ever. Here, we present a memory-efficient, visualization-enhanced, and parallel-accelerated R package called “rMVP” to address the need for improved GWAS computation. rMVP can 1) effectively process large GWAS data, 2) rapidly evaluate population structure, 3) efficiently estimate variance components by the Efficient Mixed-Model Association eXpedited (EMMAX), Factored Spectrally Transformed Linear Mixed Models (FaST-LMM), and Haseman-Elston (HE) regression algorithms, 4) implement parallel-accelerated association tests of markers using the general linear model (GLM), mixed linear model (MLM), and fixed and random model circulating probability unification (FarmCPU) methods, 5) speed up computation through a globally efficient design across the GWAS process, and 6) generate various visualizations of GWAS-related information. Accelerated by a block matrix multiplication strategy and multithreading, the association test methods embedded in rMVP are significantly faster than PLINK, GEMMA, and FarmCPU_pkg. rMVP is freely available at https://github.com/xiaolei-lab/rMVP.
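
To make the idea of a block-wise, single-marker GLM test more concrete, the following is a minimal sketch in plain Python. It is not rMVP's implementation (rMVP is an R package with its own optimized internals); the function names `marker_blocks` and `glm_scan`, the markers-by-individuals layout, and the intercept-only model y = mu + x*b + e are assumptions made for illustration. The point is only that markers can be scanned in fixed-size blocks so that a single block, rather than the whole genotype matrix, needs to be held in memory at once.

```python
import math
from typing import Iterator, List, Sequence


def marker_blocks(n_markers: int, block_size: int) -> Iterator[range]:
    """Yield consecutive index ranges that cover all markers (hypothetical helper)."""
    for start in range(0, n_markers, block_size):
        yield range(start, min(start + block_size, n_markers))


def glm_scan(
    genotypes: Sequence[Sequence[float]],
    phenotype: Sequence[float],
    block_size: int = 2,
) -> List[float]:
    """Return one t-statistic per marker for the model y = mu + x*b + e.

    genotypes is laid out as markers x individuals (an assumption of this sketch).
    Markers with no variation get NaN.
    """
    n = len(phenotype)
    if n <= 2:
        raise ValueError("need more than two individuals")
    y_mean = sum(phenotype) / n
    yc = [v - y_mean for v in phenotype]   # centred phenotype, computed once
    syy = sum(v * v for v in yc)

    t_stats: List[float] = []
    for block in marker_blocks(len(genotypes), block_size):
        # Only the rows of the current block are touched in this iteration.
        for j in block:
            x = genotypes[j]
            x_mean = sum(x) / n
            xc = [v - x_mean for v in x]
            sxx = sum(v * v for v in xc)
            if sxx == 0.0:                 # monomorphic marker: no test possible
                t_stats.append(float("nan"))
                continue
            sxy = sum(a * b for a, b in zip(xc, yc))
            beta = sxy / sxx               # least-squares marker effect
            rss = max(syy - beta * sxy, 0.0)
            se = math.sqrt(rss / (n - 2) / sxx)
            t_stats.append(beta / se if se > 0 else math.copysign(math.inf, beta))
    return t_stats


# Toy data: 3 markers (rows) genotyped on 6 individuals (columns).
toy_genotypes = [[0, 1, 2, 1, 0, 2], [1, 1, 1, 1, 1, 1], [0, 0, 1, 1, 2, 2]]
toy_phenotype = [1.0, 2.1, 2.9, 2.0, 1.2, 3.1]
toy_t = glm_scan(toy_genotypes, toy_phenotype, block_size=2)
```

The block size changes only how many markers are visited per iteration, not the statistics themselves; in a real tool each block would typically be read from disk and handed to a worker thread.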

Most cited references 29

PLINK: a tool set for whole-genome association and population-based linkage analyses.

Shaun Purcell, Benjamin M. Neale, Kathe Todd-Brown … (2007)

Whole-genome association studies (WGAS) bring new computational, as well as analytic, challenges to researchers. Many existing genetic-analysis tools are not designed to handle such large data sets in a convenient manner and do not necessarily exploit the new opportunities that whole-genome data bring. To address these issues, we developed PLINK, an open-source C/C++ WGAS tool set. With PLINK, large data sets comprising hundreds of thousands of markers genotyped for thousands of individuals can be rapidly manipulated and analyzed in their entirety. As well as providing tools to make the basic analytic steps computationally efficient, PLINK also supports some novel approaches to whole-genome data that take advantage of whole-genome coverage. We introduce PLINK and describe the five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. In particular, we focus on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies. This information can be used to detect and correct for population stratification and to identify extended chromosomal segments that are shared identical by descent between very distantly related individuals. Analysis of the patterns of segmental sharing has the potential to map disease loci that contain multiple rare variants in a population-based linkage analysis.

Inference of Population Structure Using Multilocus Genotype Data

Jonathan Pritchard, Matthew Stephens, Peter Donnelly … (2001)

We describe a model-based clustering method for using multilocus genotype data to infer population structure and assign individuals to populations. We assume a model in which there are K populations (where K may be unknown), each of which is characterized by a set of allele frequencies at each locus. Individuals in the sample are assigned (probabilistically) to populations, or jointly to two or more populations if their genotypes indicate that they are admixed. Our model does not assume a particular mutation process, and it can be applied to most of the commonly used genetic markers, provided that they are not closely linked. Applications of our method include demonstrating the presence of population structure, assigning individuals to populations, studying hybrid zones, and identifying migrants and admixed individuals. We show that the method can produce highly accurate assignments using modest numbers of loci—e.g., seven microsatellite loci in an example using genotype data from an endangered bird species. The software used for this article is available from http://www.stats.ox.ac.uk/~pritch/home.html.

GCTA: a tool for genome-wide complex trait analysis.

Jian Yang, S. Lee, Michael E Goddard … (2011)

For most human complex diseases and traits, SNPs identified by genome-wide association studies (GWAS) explain only a small fraction of the heritability. Here we report a user-friendly software tool called genome-wide complex trait analysis (GCTA), which was developed based on a method we recently developed to address the "missing heritability" problem. GCTA estimates the variance explained by all the SNPs on a chromosome or on the whole genome for a complex trait rather than testing the association of any particular SNP to the trait. We introduce GCTA's five main functions: data management, estimation of the genetic relationships from SNPs, mixed linear model analysis of variance explained by the SNPs, estimation of the linkage disequilibrium structure, and GWAS simulation. We focus on the function of estimating the variance explained by all the SNPs on the X chromosome and testing the hypotheses of dosage compensation. The GCTA software is a versatile tool to estimate and partition complex trait variation with large GWAS data sets.
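
As an illustration of what "estimation of the genetic relationships from SNPs" can look like in practice, here is a minimal plain-Python sketch of a genomic relationship matrix built from standardized allele counts, in which each entry is the average over SNPs of (x_ij - 2p_i)(x_ik - 2p_i) / (2p_i(1 - p_i)). The function name `genomic_relationship_matrix`, the markers-by-individuals layout, and the choice to skip monomorphic SNPs are assumptions of this sketch, not a description of GCTA's or rMVP's code.

```python
import math
from typing import List, Sequence


def genomic_relationship_matrix(genotypes: Sequence[Sequence[int]]) -> List[List[float]]:
    """Build an n x n relationship matrix from 0/1/2 allele counts.

    genotypes is laid out as markers x individuals (an assumption of this sketch).
    Monomorphic SNPs carry no information and are skipped.
    """
    n = len(genotypes[0])
    grm = [[0.0] * n for _ in range(n)]
    used = 0
    for snp in genotypes:
        p = sum(snp) / (2 * n)              # allele frequency in the sample
        het = 2 * p * (1 - p)               # expected genotype variance
        if het == 0.0:
            continue
        z = [(g - 2 * p) / math.sqrt(het) for g in snp]
        for j in range(n):
            for k in range(j, n):           # fill the upper triangle only
                grm[j][k] += z[j] * z[k]
        used += 1
    if used == 0:
        raise ValueError("no polymorphic SNPs available")
    for j in range(n):
        for k in range(j, n):
            grm[j][k] /= used
            grm[k][j] = grm[j][k]           # mirror to the lower triangle
    return grm


# Toy data: 4 SNPs (rows) genotyped on 4 individuals (columns).
toy_snps = [[0, 1, 2, 1], [2, 2, 0, 0], [1, 1, 1, 1], [0, 2, 1, 1]]
toy_grm = genomic_relationship_matrix(toy_snps)
```

Because every SNP is centred on its sample allele frequency, each row of the resulting matrix sums to zero up to rounding, which is a quick sanity check on the output.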

Author and article information

Contributors

Xinyun Li

Xiaolei Liu

Journal

Journal ID (nlm-ta): Genomics Proteomics Bioinformatics

Journal ID (iso-abbrev): Genomics Proteomics Bioinformatics

Title: Genomics, Proteomics & Bioinformatics

Publisher: Elsevier

ISSN (Print): 1672-0229

ISSN (Electronic): 2210-3244

Publication date PMC-release: 02 March 2021

Publication date (Print): August 2021

Publication date (Electronic): 02 March 2021

Volume: 19

Issue: 4

Pages: 619-628

Affiliations

[1] Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education & College of Animal Science and Technology, Huazhong Agricultural University, Wuhan 430070, China

[2] Key Laboratory of Swine Genetics and Breeding, Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China

[3] School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430070, China

[4] Department of Crop and Soil Sciences, Washington State University, Pullman, WA 99164, USA

Author notes

[*] Corresponding authors: xyli@mail.hzau.edu.cn, xiaoleiliu@mail.hzau.edu.cn

[#] Equal contribution.

Article

Publisher Item ID: S1672-0229(21)00050-4

DOI: 10.1016/j.gpb.2020.10.007

PMC ID: 9040015

PubMed ID: 33662620

SO-VID: 0b964e35-d9cb-48e5-9bd6-f9dad5dae2ba

License:

This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

History

Date received: 9 March 2020

Date revision received: 21 August 2020

Date accepted: 1 January 2021
