Parallel and Space-Efficient Construction of Burrows-Wheeler Transform and Suffix Array for Big Genome Data
- Publication type:
- Journal article
- Metadata:
-
- Autoren
- Yongchao Liu
- Thomas Hankeln
- Bertil Schmidt
- Autoren-URL
- https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=fis-test-1&SrcAuth=WosAPI&KeyUT=WOS:000378528100020&DestLinkType=FullRecord&DestApp=WOS_CPL
- DOI
- 10.1109/TCBB.2015.2430314
- eISSN
- 1557-9964
- Externe Identifier
- Clarivate Analytics Document Solution ID: DP5HY
- PubMed Identifier: 27295644
- ISSN
- 1545-5963
- Ausgabe der Veröffentlichung
- 3
- Zeitschrift
- IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS
- Schlüsselwörter
- Burrows-Wheeler transform
- suffix array
- next-generation sequencing
- big data
- Paginierung
- 592 - 598
- Datum der Veröffentlichung
- 2016
- Status
- Published
- Titel
- Parallel and Space-Efficient Construction of Burrows-Wheeler Transform and Suffix Array for Big Genome Data
- Sub types
- Article
- Ausgabe der Zeitschrift
- 13
Data source: Web of Science (Lite)
- Other metadata sources:
-
- Autoren
- Yongchao Liu
- Thomas Hankeln
- Bertil Schmidt
- DOI
- 10.1109/tcbb.2015.2430314
- eISSN
- 1557-9964
- ISSN
- 1545-5963
- Ausgabe der Veröffentlichung
- 3
- Zeitschrift
- IEEE/ACM Transactions on Computational Biology and Bioinformatics
- Paginierung
- 592 - 598
- Datum der Veröffentlichung
- 2016
- Status
- Published
- Herausgeber
- Institute of Electrical and Electronics Engineers (IEEE)
- Herausgeber URL
- http://dx.doi.org/10.1109/tcbb.2015.2430314
- Datum der Datenerfassung
- 2022
- Titel
- Parallel and Space-Efficient Construction of Burrows-Wheeler Transform and Suffix Array for Big Genome Data
- Ausgabe der Zeitschrift
- 13
Data source: Crossref
- Abstract
- Next-generation sequencing technologies have led to the sequencing of more and more genomes, propelling related research into the era of big data. In this paper, we present ParaBWT, a parallelized Burrows-Wheeler transform (BWT) and suffix array construction algorithm for big genome data. In ParaBWT, we have investigated a progressive construction approach to constructing the BWT of single genome sequences in linear space complexity, but with a small constant factor. This approach has been further parallelized using multi-threading based on a master-slave coprocessing model. After gaining the BWT, the suffix array is constructed in a memory-efficient manner. The performance of ParaBWT has been evaluated using two sequences generated from two human genome assemblies: the Ensembl Homo sapiens assembly and the human reference genome. Our performance comparison to FMD-index and Bwt-disk reveals that on 12 CPU cores, ParaBWT runs up to 2.2× faster than FMD-index and up to 99.0× faster than Bwt-disk. BWT construction algorithms for very long genomic sequences are time consuming and (due to their incremental nature) inherently difficult to parallelize. Thus, their parallelization is challenging and even relatively small speedups like the ones of our method over FMD-index are of high importance to research. ParaBWT is written in C++, and is freely available at http://parabwt.sourceforge.net.
- Autoren
- Yongchao Liu
- Thomas Hankeln
- Bertil Schmidt
- DOI
- 10.1109/tcbb.2015.2430314
- eISSN
- 1557-9964
- Externe Identifier
- PubMed Identifier: 27295644
- Funding acknowledgements
- Center for Computational Sciences (SRFN):
- Johannes Gutenberg University Mainz:
- US National Science Foundation: IIS-1416259
- Carl-Zeiss-Foundation:
- Open access
- false
- ISSN
- 1545-5963
- Ausgabe der Veröffentlichung
- 3
- Zeitschrift
- IEEE/ACM transactions on computational biology and bioinformatics
- Schlüsselwörter
- Humans
- Chromosome Mapping
- Sequence Analysis, DNA
- Genomics
- Algorithms
- High-Throughput Nucleotide Sequencing
- Sprache
- eng
- Medium
- Paginierung
- 592 - 598
- Datum der Veröffentlichung
- 2016
- Status
- Published
- Datum der Datenerfassung
- 2016
- Titel
- Parallel and Space-Efficient Construction of Burrows-Wheeler Transform and Suffix Array for Big Genome Data.
- Sub types
- Research Support, Non-U.S. Gov't
- Research Support, U.S. Gov't, Non-P.H.S.
- Journal Article
- Ausgabe der Zeitschrift
- 13
Data source: Europe PubMed Central
- Abstract
- Next-generation sequencing technologies have led to the sequencing of more and more genomes, propelling related research into the era of big data. In this paper, we present ParaBWT, a parallelized Burrows-Wheeler transform (BWT) and suffix array construction algorithm for big genome data. In ParaBWT, we have investigated a progressive construction approach to constructing the BWT of single genome sequences in linear space complexity, but with a small constant factor. This approach has been further parallelized using multi-threading based on a master-slave coprocessing model. After gaining the BWT, the suffix array is constructed in a memory-efficient manner. The performance of ParaBWT has been evaluated using two sequences generated from two human genome assemblies: the Ensembl Homo sapiens assembly and the human reference genome. Our performance comparison to FMD-index and Bwt-disk reveals that on 12 CPU cores, ParaBWT runs up to 2.2× faster than FMD-index and up to 99.0× faster than Bwt-disk. BWT construction algorithms for very long genomic sequences are time consuming and (due to their incremental nature) inherently difficult to parallelize. Thus, their parallelization is challenging and even relatively small speedups like the ones of our method over FMD-index are of high importance to research. ParaBWT is written in C++, and is freely available at http://parabwt.sourceforge.net.
- Autoren
- Yongchao Liu
- Thomas Hankeln
- Bertil Schmidt
- Autoren-URL
- https://www.ncbi.nlm.nih.gov/pubmed/27295644
- DOI
- 10.1109/TCBB.2015.2430314
- eISSN
- 1557-9964
- Ausgabe der Veröffentlichung
- 3
- Zeitschrift
- IEEE/ACM Trans Comput Biol Bioinform
- Schlüsselwörter
- Algorithms
- Chromosome Mapping
- Genomics
- High-Throughput Nucleotide Sequencing
- Humans
- Sequence Analysis, DNA
- Sprache
- eng
- Country
- United States
- Paginierung
- 592 - 598
- Datum der Veröffentlichung
- 2016
- Status
- Published
- Datum, an dem der Datensatz öffentlich gemacht wurde
- 2017
- Titel
- Parallel and Space-Efficient Construction of Burrows-Wheeler Transform and Suffix Array for Big Genome Data.
- Sub types
- Journal Article
- Research Support, Non-U.S. Gov't
- Research Support, U.S. Gov't, Non-P.H.S.
- Ausgabe der Zeitschrift
- 13
Data source: PubMed
- Autoren
- Yongchao Liu
- Thomas Hankeln
- Bertil Schmidt
- Zeitschrift
- IEEE ACM Trans. Comput. Biol. Bioinform.
- Artikelnummer
- 3
- Paginierung
- 592 - 598
- Datum der Veröffentlichung
- 2016
- Titel
- Parallel and Space-Efficient Construction of Burrows-Wheeler Transform and Suffix Array for Big Genome Data.
- Ausgabe der Zeitschrift
- 13
Data source: DBLP
- Beziehungen:
- Property of