A Parallel Algorithm for Error Correction in High-Throughput Short-Read Data on CUDA-Enabled Graphics Hardware
- Publication type:
- Journal article
- Metadata:
- Authors
- Haixiang Shi
- Bertil Schmidt
- Weiguo Liu
- Wolfgang Mueller-Wittig
- Author URL
- https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=fis-test-1&SrcAuth=WosAPI&KeyUT=WOS:000279272000003&DestLinkType=FullRecord&DestApp=WOS_CPL
- DOI
- 10.1089/cmb.2009.0062
- eISSN
- 1557-8666
- External identifiers
- Clarivate Analytics Document Solution ID: 617HQ
- PubMed Identifier: 20426693
- ISSN
- 1066-5277
- Issue
- 4
- Journal
- JOURNAL OF COMPUTATIONAL BIOLOGY
- Keywords
- algorithms
- dynamic programming
- sequence analysis
- suffix trees
- Pagination
- 603 - 615
- Publication date
- 2010
- Status
- Published
- Title
- A Parallel Algorithm for Error Correction in High-Throughput Short-Read Data on CUDA-Enabled Graphics Hardware
- Sub types
- Article
- Volume
- 17
Data source: Web of Science (Lite)
- Other metadata sources:
- Authors
- Haixiang Shi
- Bertil Schmidt
- Weiguo Liu
- Wolfgang Müller-Wittig
- DOI
- 10.1089/cmb.2009.0062
- eISSN
- 1557-8666
- ISSN
- 1066-5277
- Issue
- 4
- Journal
- Journal of Computational Biology
- Language
- en
- Pagination
- 603 - 615
- Publication date
- 2010
- Status
- Published
- Publisher
- Mary Ann Liebert Inc
- Publisher URL
- http://dx.doi.org/10.1089/cmb.2009.0062
- Date of data acquisition
- 2018
- Title
- A Parallel Algorithm for Error Correction in High-Throughput Short-Read Data on CUDA-Enabled Graphics Hardware
- Volume
- 17
Data source: Crossref
- Abstract
- Emerging DNA sequencing technologies open up exciting new opportunities for genome sequencing by generating read data with a massive throughput. However, the reads produced are significantly shorter and more error-prone than those of the traditional Sanger shotgun sequencing method. This poses challenges for de novo DNA fragment assembly algorithms in terms of both accuracy (to deal with short, error-prone reads) and scalability (to deal with very large input data sets). In this article, we present a scalable parallel algorithm for correcting sequencing errors in high-throughput short-read data so that error-free reads can be available before DNA fragment assembly, which is of high importance to many graph-based short-read assembly tools. The algorithm is based on spectral alignment and uses the Compute Unified Device Architecture (CUDA) programming model. To gain efficiency, we take advantage of the CUDA texture memory, using a space-efficient Bloom filter data structure for spectrum membership queries. We have tested the runtime and accuracy of our algorithm using real and simulated Illumina data for different read lengths, error rates, input sizes, and algorithmic parameters. Using a CUDA-enabled mass-produced GPU (available for less than US$400 at any local computer outlet), this results in speedups of 12-84 times for the parallelized error correction, and speedups of 3-63 times for both sequential preprocessing and parallelized error correction compared to the publicly available Euler-SR program. Our implementation is freely available for download from http://cuda-ec.sourceforge.net.
- Addresses
- School of Computer Engineering, Nanyang Technological University, Singapore.
- Authors
- Haixiang Shi
- Bertil Schmidt
- Weiguo Liu
- Wolfgang Müller-Wittig
- DOI
- 10.1089/cmb.2009.0062
- eISSN
- 1557-8666
- External identifiers
- PubMed Identifier: 20426693
- Open access
- false
- ISSN
- 1066-5277
- Issue
- 4
- Journal
- Journal of computational biology : a journal of computational molecular cell biology
- Keywords
- Sequence Alignment
- Sequence Analysis, DNA
- Computational Biology
- Algorithms
- Computer Graphics
- Computers
- Databases, Nucleic Acid
- Language
- eng
- Medium
- Pagination
- 603 - 615
- Publication date
- 2010
- Status
- Published
- Date of data acquisition
- 2010
- Title
- A parallel algorithm for error correction in high-throughput short-read data on CUDA-enabled graphics hardware.
- Sub types
- Research Support, Non-U.S. Gov't
- Journal Article
- Volume
- 17
Data source: Europe PubMed Central
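The abstract above names two ingredients: a k-mer spectrum queried through a Bloom filter, and read correction by spectral alignment. The following is a minimal CPU-only sketch of that idea in Python, not the authors' CUDA implementation. It does not use texture memory, and it replaces the paper's correction procedure with a much simpler search over single-base substitutions. All names (`BloomFilter`, `build_spectrum`, `is_solid`, `correct_read`) and parameter values (`k`, `min_count`, `num_bits`, `num_hashes`) are assumptions chosen for illustration.

```python
import hashlib
from collections import Counter


class BloomFilter:
    """Bit-array set: no false negatives, small tunable false-positive rate."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting a single hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, item):
        return all(
            self.bits[pos >> 3] & (1 << (pos & 7)) for pos in self._positions(item)
        )


def build_spectrum(reads, k, min_count, num_bits=1 << 16, num_hashes=4):
    """Insert every k-mer that occurs at least min_count times ("solid" k-mers)."""
    counts = Counter(
        read[i:i + k] for read in reads for i in range(len(read) - k + 1)
    )
    spectrum = BloomFilter(num_bits, num_hashes)
    for kmer, count in counts.items():
        if count >= min_count:
            spectrum.add(kmer)
    return spectrum


def is_solid(read, k, spectrum):
    """A read is solid when all of its k-mers pass the spectrum membership test."""
    return all(read[i:i + k] in spectrum for i in range(len(read) - k + 1))


def correct_read(read, k, spectrum):
    """Simplified correction: return the first single-base substitution
    that makes the read solid, or the read unchanged if none does."""
    if is_solid(read, k, spectrum):
        return read
    for pos in range(len(read)):
        for base in "ACGT":
            if base != read[pos]:
                candidate = read[:pos] + base + read[pos + 1:]
                if is_solid(candidate, k, spectrum):
                    return candidate
    return read


# Three copies of a clean read plus one copy carrying a single substitution.
reads = ["ACGTACGGTCA"] * 3 + ["ACGTACCGTCA"]
spectrum = build_spectrum(reads, k=5, min_count=2)
corrected = correct_read("ACGTACCGTCA", 5, spectrum)
```

Each read is corrected independently of the others once the spectrum is built, which is what makes this step a natural fit for one-thread-per-read parallelization. The Bloom filter trades a small false-positive rate for a compact bit array, so an occasional erroneous k-mer may be accepted as solid.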
- Authors
- Haixiang Shi
- Bertil Schmidt
- Weiguo Liu
- Wolfgang Müller-Wittig
- Author URL
- https://www.ncbi.nlm.nih.gov/pubmed/20426693
- DOI
- 10.1089/cmb.2009.0062
- eISSN
- 1557-8666
- Issue
- 4
- Journal
- J Comput Biol
- Keywords
- Algorithms
- Computational Biology
- Computer Graphics
- Computers
- Databases, Nucleic Acid
- Sequence Alignment
- Sequence Analysis, DNA
- Language
- eng
- Country
- United States
- Pagination
- 603 - 615
- Publication date
- 2010
- Status
- Published
- Date the record was made public
- 2010
- Title
- A parallel algorithm for error correction in high-throughput short-read data on CUDA-enabled graphics hardware.
- Sub types
- Journal Article
- Research Support, Non-U.S. Gov't
- Volume
- 17
Data source: PubMed
- Authors
- Haixiang Shi
- Bertil Schmidt
- Weiguo Liu
- Wolfgang Müller-Wittig
- Journal
- J. Comput. Biol.
- Article number
- 4
- Pagination
- 603 - 615
- Publication date
- 2010
- Title
- A Parallel Algorithm for Error Correction in High-Throughput Short-Read Data on CUDA-Enabled Graphics Hardware.
- Volume
- 17
Data source: DBLP
- Relationships:
- Owned by