rapidGSEA: Speeding up gene set enrichment analysis on multi-core CPUs and CUDA-enabled GPUs
- Publication type:
- Journal article
- Metadata:
- Authors
- Christian Hundt
- Andreas Hildebrandt
- Bertil Schmidt
- Author URL
- https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=fis-test-1&SrcAuth=WosAPI&KeyUT=WOS:000383749600001&DestLinkType=FullRecord&DestApp=WOS_CPL
- DOI
- 10.1186/s12859-016-1244-x
- External identifiers
- Clarivate Analytics Document Solution ID: DW6GY
- PubMed Identifier: 27663265
- ISSN
- 1471-2105
- Journal
- BMC BIOINFORMATICS
- Keywords
- CUDA
- Gene set enrichment analysis
- Gene expression data
- Resampling statistics
- Article number
- ARTN 394
- Publication date
- 2016
- Status
- Published
- Title
- rapidGSEA: Speeding up gene set enrichment analysis on multi-core CPUs and CUDA-enabled GPUs
- Sub types
- Article
- Volume
- 17
Data source: Web of Science (Lite)
- Other metadata sources:
- Abstract
- Background: Gene Set Enrichment Analysis (GSEA) is a popular method to reveal significant dependencies between predefined sets of gene symbols and observed phenotypes by evaluating the deviation of gene expression values between cases and controls. An established measure of inter-class deviation, the enrichment score, is usually computed using a weighted running sum statistic over the whole set of gene symbols. Due to the lack of analytic expressions the significance of enrichment scores is determined using a non-parametric estimation of their null distribution by permuting the phenotype labels of the probed patients. Accordingly, GSEA is a time-consuming task due to the large number of required permutations to accurately estimate the nominal p-value – a circumstance that is even more pronounced during multiple hypothesis testing since its estimate is lower-bounded by the inverse number of samples in permutation space. Results: We present rapidGSEA – a software suite consisting of two tools for facilitating permutation-based GSEA: cudaGSEA and ompGSEA. cudaGSEA is a CUDA-accelerated tool using fine-grained parallelization schemes on massively parallel architectures while ompGSEA is a coarse-grained multi-threaded tool for multi-core CPUs. Nominal p-value estimation of 4,725 gene sets on a data set consisting of 20,639 unique gene symbols and 200 patients (183 cases + 17 controls) each probing one million permutations takes 19 hours on a Xeon CPU and less than one hour on a GeForce Titan X GPU while the established GSEA tool from the Broad Institute (broadGSEA) takes roughly 13 days. Conclusion: cudaGSEA outperforms broadGSEA by around two orders-of-magnitude on a single Tesla K40c or GeForce Titan X GPU. ompGSEA provides around one order-of-magnitude speedup to broadGSEA on a standard Xeon CPU. The rapidGSEA suite is open-source software and can be downloaded at https://github.com/gravitino/cudaGSEA as standalone application or package for the R framework.
- Authors
- Christian Hundt
- Andreas Hildebrandt
- Bertil Schmidt
- DOI
- 10.1186/s12859-016-1244-x
- eISSN
- 1471-2105
- Issue
- 1
- Journal
- BMC Bioinformatics
- Language
- en
- Article number
- 394
- Online publication date
- 2016
- Status
- Published online
- Publisher
- Springer Science and Business Media LLC
- Publisher URL
- http://dx.doi.org/10.1186/s12859-016-1244-x
- Date of data capture
- 2024
- Title
- rapidGSEA: Speeding up gene set enrichment analysis on multi-core CPUs and CUDA-enabled GPUs
- Volume
- 17
Data source: Crossref
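The abstract describes two steps: a weighted running-sum enrichment score and a permutation-based estimate of its nominal p-value. The following is a minimal Python sketch of that idea, not the rapidGSEA implementation. The function names, the difference-of-class-means ranking metric, the two-sided comparison, and the add-one p-value estimator are all assumptions made for illustration.

```python
import random


def rank_genes(expr, labels):
    """Rank genes by difference of class means (an assumed, simplified metric).

    expr: dict mapping gene symbol -> list of expression values, one per patient.
    labels: list of 0/1 phenotype labels, one per patient.
    """
    n1 = sum(labels)
    n0 = len(labels) - n1
    scores = {}
    for gene, values in expr.items():
        mean1 = sum(v for v, l in zip(values, labels) if l == 1) / n1
        mean0 = sum(v for v, l in zip(values, labels) if l == 0) / n0
        scores[gene] = mean1 - mean0
    genes = sorted(scores, key=scores.get, reverse=True)
    return genes, [scores[g] for g in genes]


def enrichment_score(ranked_genes, weights, gene_set, p=1.0):
    """Weighted running sum over the ranked list; returns the largest deviation from zero."""
    in_set = [g in gene_set for g in ranked_genes]
    hit_norm = sum(abs(w) ** p for w, hit in zip(weights, in_set) if hit)
    n_miss = len(ranked_genes) - sum(in_set)
    if hit_norm == 0 or n_miss == 0:
        return 0.0
    running, best = 0.0, 0.0
    for w, hit in zip(weights, in_set):
        # Step up (weighted) on a gene-set member, step down (uniform) otherwise.
        running += abs(w) ** p / hit_norm if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def permutation_p_value(expr, labels, gene_set, n_perm=1000, seed=0):
    """Estimate a nominal p-value by shuffling the phenotype labels."""
    rng = random.Random(seed)
    observed = enrichment_score(*rank_genes(expr, labels), gene_set)
    shuffled = list(labels)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null_es = enrichment_score(*rank_genes(expr, shuffled), gene_set)
        if abs(null_es) >= abs(observed):
            extreme += 1
    # Add-one estimator (an assumption): the estimate cannot drop below 1 / (n_perm + 1).
    return observed, (extreme + 1) / (n_perm + 1)
```

With this estimator the smallest reportable p-value shrinks only as the number of permutations grows, which illustrates why the abstract calls accurate estimation time-consuming. Each permutation is independent of the others, so the loop in `permutation_p_value` is the natural unit of parallel work.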
- Abstract
- Background: Gene Set Enrichment Analysis (GSEA) is a popular method to reveal significant dependencies between predefined sets of gene symbols and observed phenotypes by evaluating the deviation of gene expression values between cases and controls. An established measure of inter-class deviation, the enrichment score, is usually computed using a weighted running sum statistic over the whole set of gene symbols. Due to the lack of analytic expressions the significance of enrichment scores is determined using a non-parametric estimation of their null distribution by permuting the phenotype labels of the probed patients. Accordingly, GSEA is a time-consuming task due to the large number of required permutations to accurately estimate the nominal p-value - a circumstance that is even more pronounced during multiple hypothesis testing since its estimate is lower-bounded by the inverse number of samples in permutation space. Results: We present rapidGSEA - a software suite consisting of two tools for facilitating permutation-based GSEA: cudaGSEA and ompGSEA. cudaGSEA is a CUDA-accelerated tool using fine-grained parallelization schemes on massively parallel architectures while ompGSEA is a coarse-grained multi-threaded tool for multi-core CPUs. Nominal p-value estimation of 4,725 gene sets on a data set consisting of 20,639 unique gene symbols and 200 patients (183 cases + 17 controls) each probing one million permutations takes 19 hours on a Xeon CPU and less than one hour on a GeForce Titan X GPU while the established GSEA tool from the Broad Institute (broadGSEA) takes roughly 13 days. Conclusion: cudaGSEA outperforms broadGSEA by around two orders-of-magnitude on a single Tesla K40c or GeForce Titan X GPU. ompGSEA provides around one order-of-magnitude speedup to broadGSEA on a standard Xeon CPU. The rapidGSEA suite is open-source software and can be downloaded at https://github.com/gravitino/cudaGSEA as standalone application or package for the R framework.
- Addresses
- Department of Computer Science, Johannes Gutenberg University, Staudingerweg 9, Mainz, 55128, Germany. hundt@uni-mainz.de.
- Authors
- Christian Hundt
- Andreas Hildebrandt
- Bertil Schmidt
- DOI
- 10.1186/s12859-016-1244-x
- eISSN
- 1471-2105
- External identifiers
- PubMed Identifier: 27663265
- PubMed Central ID: PMC5035472
- Open access
- true
- ISSN
- 1471-2105
- Issue
- 1
- Journal
- BMC bioinformatics
- Language
- eng
- Medium
- Electronic
- Online publication date
- 2016
- Open access status
- Open Access
- Pagination
- 394
- Publication date
- 2016
- Status
- Published
- Publisher licence
- CC BY
- Date of data capture
- 2016
- Title
- rapidGSEA: Speeding up gene set enrichment analysis on multi-core CPUs and CUDA-enabled GPUs.
- Sub types
- research-article
- Journal Article
- Volume
- 17
Files
https://bmcbioinformatics.biomedcentral.com/track/pdf/10.1186/s12859-016-1244-x
https://europepmc.org/articles/PMC5035472?pdf=render
Data source: Europe PubMed Central
- Abstract
- BACKGROUND: Gene Set Enrichment Analysis (GSEA) is a popular method to reveal significant dependencies between predefined sets of gene symbols and observed phenotypes by evaluating the deviation of gene expression values between cases and controls. An established measure of inter-class deviation, the enrichment score, is usually computed using a weighted running sum statistic over the whole set of gene symbols. Due to the lack of analytic expressions the significance of enrichment scores is determined using a non-parametric estimation of their null distribution by permuting the phenotype labels of the probed patients. Accordingly, GSEA is a time-consuming task due to the large number of required permutations to accurately estimate the nominal p-value - a circumstance that is even more pronounced during multiple hypothesis testing since its estimate is lower-bounded by the inverse number of samples in permutation space. RESULTS: We present rapidGSEA - a software suite consisting of two tools for facilitating permutation-based GSEA: cudaGSEA and ompGSEA. cudaGSEA is a CUDA-accelerated tool using fine-grained parallelization schemes on massively parallel architectures while ompGSEA is a coarse-grained multi-threaded tool for multi-core CPUs. Nominal p-value estimation of 4,725 gene sets on a data set consisting of 20,639 unique gene symbols and 200 patients (183 cases + 17 controls) each probing one million permutations takes 19 hours on a Xeon CPU and less than one hour on a GeForce Titan X GPU while the established GSEA tool from the Broad Institute (broadGSEA) takes roughly 13 days. CONCLUSION: cudaGSEA outperforms broadGSEA by around two orders-of-magnitude on a single Tesla K40c or GeForce Titan X GPU. ompGSEA provides around one order-of-magnitude speedup to broadGSEA on a standard Xeon CPU. The rapidGSEA suite is open-source software and can be downloaded at https://github.com/gravitino/cudaGSEA as standalone application or package for the R framework.
- Date of acceptance
- 2016
- Authors
- Christian Hundt
- Andreas Hildebrandt
- Bertil Schmidt
- Author URL
- https://www.ncbi.nlm.nih.gov/pubmed/27663265
- DOI
- 10.1186/s12859-016-1244-x
- eISSN
- 1471-2105
- External identifiers
- PubMed Central ID: PMC5035472
- Issue
- 1
- Journal
- BMC Bioinformatics
- Keywords
- CUDA
- Gene expression data
- Gene set enrichment analysis
- Resampling statistics
- Language
- eng
- Country
- England
- Pagination
- 394
- PII
- 10.1186/s12859-016-1244-x
- Publication date
- 2016
- Status
- Published online
- Title
- rapidGSEA: Speeding up gene set enrichment analysis on multi-core CPUs and CUDA-enabled GPUs.
- Sub types
- Journal Article
- Volume
- 17
Data source: PubMed
- Authors
- Christian Hundt
- Andreas Hildebrandt
- Bertil Schmidt
- Journal
- BMC Bioinform.
- Pagination
- 394 - 394
- Publication date
- 2016
- Title
- rapidGSEA: Speeding up gene set enrichment analysis on multi-core CPUs and CUDA-enabled GPUs.
- Volume
- 17
Data source: DBLP
- Authors
- Christian Hundt
- Andreas Hildebrandt
- Bertil Schmidt
- Journal
- BMC bioinformatics
- Article number
- 1
- Pagination
- 394 - 394
- Publication date
- 2016
- Publisher
- BioMed Central
- Date of data capture
- 2020
- Title
- rapidGSEA: Speeding up gene set enrichment analysis on multi-core CPUs and CUDA-enabled GPUs
- Sub types
- article
- Volume
- 17
Data source: Manual
- Relationships:
- Owned by