rapidGSEA: Speeding up gene set enrichment analysis on multi-core CPUs and CUDA-enabled GPUs

Publication type:

Journal article

Metadata:

Authors

Christian Hundt
Andreas Hildebrandt
Bertil Schmidt

Author URL

https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=fis-test-1&SrcAuth=WosAPI&KeyUT=WOS:000383749600001&DestLinkType=FullRecord&DestApp=WOS_CPL

DOI

10.1186/s12859-016-1244-x

External identifiers

Clarivate Analytics Document Solution ID: DW6GY
PubMed Identifier: 27663265

ISSN

1471-2105

Journal

BMC BIOINFORMATICS

Keywords

CUDA
Gene set enrichment analysis
Gene expression data
Resampling statistics

Article number

ARTN 394

Publication date

2016

Status

Published

Title

rapidGSEA: Speeding up gene set enrichment analysis on multi-core CPUs and CUDA-enabled GPUs

Sub types

Article

Journal issue

Data source: Web of Science (Lite)

Other metadata sources:

Abstract

Background: Gene Set Enrichment Analysis (GSEA) is a popular method for revealing significant dependencies between predefined sets of gene symbols and observed phenotypes by evaluating the deviation of gene expression values between cases and controls. An established measure of inter-class deviation, the enrichment score, is usually computed using a weighted running sum statistic over the whole set of gene symbols. Because no analytic expression is available, the significance of an enrichment score is determined by a non-parametric estimate of its null distribution, obtained by permuting the phenotype labels of the probed patients. GSEA is therefore time-consuming, since a large number of permutations is required to estimate the nominal p-value accurately. The problem is even more pronounced in multiple hypothesis testing, because the estimate is lower-bounded by the inverse of the number of samples drawn from permutation space.
Results: We present rapidGSEA, a software suite consisting of two tools for permutation-based GSEA: cudaGSEA and ompGSEA. cudaGSEA is a CUDA-accelerated tool that uses fine-grained parallelization schemes on massively parallel architectures, while ompGSEA is a coarse-grained multi-threaded tool for multi-core CPUs. Nominal p-value estimation of 4,725 gene sets on a data set consisting of 20,639 unique gene symbols and 200 patients (183 cases + 17 controls), each probing one million permutations, takes 19 hours on a Xeon CPU and less than one hour on a GeForce Titan X GPU, while the established GSEA tool from the Broad Institute (broadGSEA) takes roughly 13 days.
Conclusion: cudaGSEA outperforms broadGSEA by around two orders of magnitude on a single Tesla K40c or GeForce Titan X GPU. ompGSEA provides a speedup of around one order of magnitude over broadGSEA on a standard Xeon CPU. The rapidGSEA suite is open-source software and can be downloaded at https://github.com/gravitino/cudaGSEA as a standalone application or as a package for the R framework.
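
The permutation scheme summarized in the abstract can be illustrated with a minimal, single-threaded Python sketch. It is not the rapidGSEA implementation and does not use its API. The function names (`rank_genes`, `enrichment_score`, `nominal_p_value`), the difference of class means as deviation measure, and the simplified one-sided p-value are all assumptions made for illustration.

```python
import random


def rank_genes(expr, labels):
    """Rank genes by mean(cases) - mean(controls), largest first.

    The difference of class means is an assumed deviation measure,
    chosen only to keep the sketch short.
    """
    n_case = sum(labels)
    n_ctrl = len(labels) - n_case
    scores = {}
    for gene, values in expr.items():
        case_sum = sum(v for v, is_case in zip(values, labels) if is_case)
        ctrl_sum = sum(v for v, is_case in zip(values, labels) if not is_case)
        scores[gene] = case_sum / n_case - ctrl_sum / n_ctrl
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked, scores


def enrichment_score(ranked, scores, gene_set, weight=1.0):
    """Weighted running sum: step up on set members, down on all others."""
    hits = [g for g in ranked if g in gene_set]
    n_miss = len(ranked) - len(hits)
    if not hits or n_miss == 0:
        return 0.0
    norm = sum(abs(scores[g]) ** weight for g in hits)
    running = best = 0.0
    for g in ranked:
        if g in gene_set:
            # Fall back to equal steps if every hit has a zero score.
            step = abs(scores[g]) ** weight / norm if norm > 0 else 1.0 / len(hits)
            running += step
        else:
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running  # keep the largest deviation from zero
    return best


def nominal_p_value(expr, labels, gene_set, n_perm=1000, seed=0):
    """Estimate a one-sided nominal p-value by shuffling phenotype labels."""
    rng = random.Random(seed)
    observed = enrichment_score(*rank_genes(expr, labels), gene_set)
    shuffled = list(labels)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute case/control assignment
        es = enrichment_score(*rank_genes(expr, shuffled), gene_set)
        if (observed >= 0 and es >= observed) or (observed < 0 and es <= observed):
            extreme += 1
    return observed, extreme / n_perm


if __name__ == "__main__":
    # Toy data: 4 cases followed by 4 controls; g1 and g2 are raised in cases.
    toy_expr = {
        "g1": [5, 5, 5, 5, 1, 1, 1, 1],
        "g2": [4, 4, 4, 4, 1, 1, 1, 1],
        "g3": [1, 2, 1, 2, 1, 2, 1, 2],
        "g4": [2, 1, 2, 1, 2, 1, 2, 1],
    }
    toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
    print(nominal_p_value(toy_expr, toy_labels, {"g1", "g2"}, n_perm=500))
```

Every estimate returned by this sketch is a multiple of 1/n_perm, so the smallest nonzero value it can report is 1/n_perm. Each permutation also repeats the full ranking and running sum, which is the per-permutation cost that the abstract describes as the reason for parallelizing the computation.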

Authors

Christian Hundt
Andreas Hildebrandt
Bertil Schmidt

DOI

10.1186/s12859-016-1244-x

eISSN

1471-2105

Publication issue

Journal

BMC Bioinformatics

Language

Article number

394

Online publication date

2016

Status

Published online

Publisher

Springer Science and Business Media LLC

Publisher URL

http://dx.doi.org/10.1186/s12859-016-1244-x

Date of data capture

2024

Title

rapidGSEA: Speeding up gene set enrichment analysis on multi-core CPUs and CUDA-enabled GPUs

Journal issue

Data source: Crossref

Addresses

Department of Computer Science, Johannes Gutenberg University, Staudingerweg 9, Mainz, 55128, Germany. hundt@uni-mainz.de.

Authors

Christian Hundt
Andreas Hildebrandt
Bertil Schmidt

DOI

10.1186/s12859-016-1244-x

eISSN

1471-2105

External identifiers

PubMed Identifier: 27663265
PubMed Central ID: PMC5035472

Open access

true

ISSN

1471-2105

Publication issue

Journal

BMC bioinformatics

Language

eng

Medium

Electronic

Online publication date

2016

Open access status

Open Access

Pagination

394

Publication date

2016

Status

Published

Publisher licence

CC BY

Date of data capture

2016

Title

rapidGSEA: Speeding up gene set enrichment analysis on multi-core CPUs and CUDA-enabled GPUs.

Sub types

research-article
Journal Article

Journal issue

Files

https://bmcbioinformatics.biomedcentral.com/track/pdf/10.1186/s12859-016-1244-x
https://europepmc.org/articles/PMC5035472?pdf=render

Data source: Europe PubMed Central

Date of acceptance

2016

Authors

Christian Hundt
Andreas Hildebrandt
Bertil Schmidt

Author URL

https://www.ncbi.nlm.nih.gov/pubmed/27663265

DOI

10.1186/s12859-016-1244-x

eISSN

1471-2105

External identifiers

PubMed Central ID: PMC5035472

Publication issue

Journal

BMC Bioinformatics

Keywords

CUDA
Gene expression data
Gene set enrichment analysis
Resampling statistics

Language

eng

Country

England

Pagination

394

PII

10.1186/s12859-016-1244-x

Publication date

2016

Status

Published online

Title

rapidGSEA: Speeding up gene set enrichment analysis on multi-core CPUs and CUDA-enabled GPUs.

Sub types

Journal Article

Journal issue

Data source: PubMed

Authors

Christian Hundt
Andreas Hildebrandt
Bertil Schmidt

Journal

BMC Bioinform.

Pagination

394 - 394

Publication date

2016

Title

rapidGSEA: Speeding up gene set enrichment analysis on multi-core CPUs and CUDA-enabled GPUs.

Journal issue

Data source: DBLP

Authors

Christian Hundt
Andreas Hildebrandt
Bertil Schmidt

Journal

BMC bioinformatics

Article number

Pagination

394 - 394

Publication date

2016

Publisher

BioMed Central

Date of data capture

2020

Title

rapidGSEA: Speeding up gene set enrichment analysis on multi-core CPUs and CUDA-enabled GPUs

Sub types

article

Journal issue

Data source: Manual

Relationships:

Owned by

rapidGSEA: Speeding up gene set enrichment analysis on multi-core CPUs and CUDA-enabled GPUs