CARE: Context-Aware Sequencing Read Error Correction
- Publication type:
- Journal article
- Metadata:
- Authors
- Felix Kallenborn
- Andreas Hildebrandt
- Bertil Schmidt
- Author URL
- https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=fis-test-1&SrcAuth=WosAPI&KeyUT=WOS:000654708400001&DestLinkType=FullRecord&DestApp=WOS_CPL
- DOI
- 10.1093/bioinformatics/btaa738
- eISSN
- 1460-2059
- External identifiers
- Clarivate Analytics Document Solution ID: SI3EM
- PubMed Identifier: 32818262
- ISSN
- 1367-4803
- Issue
- 7
- Journal
- BIOINFORMATICS
- Pagination
- 889 - 895
- Publication date
- 2021
- Status
- Published
- Title
- CARE: context-aware sequencing read error correction
- Sub types
- Article
- Volume
- 37
Data source: Web of Science (Lite)
- Other metadata sources:
- Abstract
- Motivation: Error correction is a fundamental pre-processing step in many Next-Generation Sequencing (NGS) pipelines, in particular for de novo genome assembly. However, existing error correction methods either suffer from high false-positive rates, because they break reads into independent k-mers, or do not scale efficiently to large amounts of sequencing reads and complex genomes.
- Results: We present CARE, an alignment-based, scalable error correction algorithm for Illumina data that uses the concept of minhashing. Minhashing allows for efficient similarity search within large sequencing read collections, which enables fast computation of high-quality multiple alignments. Sequencing errors are corrected by detailed inspection of the corresponding alignments. Our performance evaluation shows that CARE generates significantly fewer false-positive corrections than state-of-the-art tools (Musket, SGA, BFC, Lighter, Bcool, Karect) while maintaining a competitive number of true positives. When used prior to assembly, it can achieve superior de novo assembly results for a number of real datasets. CARE is also the first multiple sequence alignment-based error corrector that is able to process a human genome Illumina NGS dataset in only 4 h on a single workstation using GPU acceleration.
- Availability and implementation: CARE is open-source software written in C++ (CPU version) and in CUDA/C++ (GPU version). It is licensed under GPLv3 and can be downloaded at https://github.com/fkallen/CARE.
- Supplementary information: Supplementary data are available at Bioinformatics online.
- Authors
- Felix Kallenborn
- Andreas Hildebrandt
- Bertil Schmidt
- DOI
- 10.1093/bioinformatics/btaa738
- Editors
- Inanc Birol
- eISSN
- 1367-4811
- ISSN
- 1367-4803
- Issue
- 7
- Journal
- Bioinformatics
- Language
- en
- Online publication date
- 2020
- Pagination
- 889 - 895
- Publication date
- 2021
- Status
- Published
- Publisher
- Oxford University Press (OUP)
- Publisher URL
- http://dx.doi.org/10.1093/bioinformatics/btaa738
- Date recorded
- 2023
- Title
- CARE: context-aware sequencing read error correction
- Volume
- 37
Data source: Crossref
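The Results section of the abstract outlines a two-stage idea: use minhashing to find reads that probably overlap, then correct each read by inspecting an alignment of those candidates. The following is a minimal Python sketch of that general idea under stated assumptions, not CARE's actual implementation. The k-mer length, number of hash functions, voting thresholds, the seeded BLAKE2b hash, the gap-free (shifted Hamming) overlap, and all function names are hypothetical choices made for illustration. The gap-free overlap is a simplification motivated by Illumina errors being predominantly substitutions.

```python
# Hypothetical illustration of minhash-based candidate search plus
# column-wise majority voting; NOT the CARE implementation.
import hashlib
from collections import Counter, defaultdict

K = 5            # hypothetical k-mer length, kept small for illustration
NUM_HASHES = 16  # hypothetical number of independent minhash functions/tables


def kmers(read, k=K):
    """Return the set of distinct k-mers of a read."""
    return {read[i:i + k] for i in range(len(read) - k + 1)}


def hash_kmer(kmer, seed):
    """Seeded 64-bit hash (assumption: BLAKE2b with a per-function salt)."""
    salt = seed.to_bytes(16, "little")
    digest = hashlib.blake2b(kmer.encode(), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(read):
    """One minimum hash value per hash function over the read's k-mers."""
    ks = kmers(read)
    return [min(hash_kmer(km, s) for km in ks) for s in range(NUM_HASHES)]


def build_tables(reads):
    """One table per hash function, mapping a minhash value to read ids."""
    tables = [defaultdict(list) for _ in range(NUM_HASHES)]
    for rid, read in enumerate(reads):
        for t, value in enumerate(minhash_signature(read)):
            tables[t][value].append(rid)
    return tables


def candidates(rid, reads, tables):
    """Ids of reads sharing at least one minhash value with read `rid`."""
    found = set()
    for t, value in enumerate(minhash_signature(reads[rid])):
        found.update(tables[t].get(value, []))
    found.discard(rid)
    return found


def best_shift(anchor, cand, min_overlap=10, max_mismatch_ratio=0.2):
    """Gap-free overlap with the lowest mismatch ratio, or None.

    The returned shift is the anchor position where cand[0] lands
    (negative if cand starts before the anchor).
    """
    best = None
    for shift in range(min_overlap - len(cand), len(anchor) - min_overlap + 1):
        start, end = max(0, shift), min(len(anchor), shift + len(cand))
        overlap = end - start
        if overlap < min_overlap:
            continue
        mismatches = sum(anchor[i] != cand[i - shift] for i in range(start, end))
        ratio = mismatches / overlap
        if ratio <= max_mismatch_ratio and (best is None or ratio < best[0]):
            best = (ratio, shift)
    return None if best is None else best[1]


def correct_anchor(rid, reads, tables, min_support=3, min_fraction=0.75):
    """Replace a base only when a strong majority of aligned reads disagree."""
    anchor = reads[rid]
    columns = [Counter({base: 1}) for base in anchor]  # anchor votes for itself
    for cid in candidates(rid, reads, tables):
        cand = reads[cid]
        shift = best_shift(anchor, cand)
        if shift is None:
            continue  # candidate does not overlap well enough; ignore it
        for i in range(max(0, shift), min(len(anchor), shift + len(cand))):
            columns[i][cand[i - shift]] += 1
    corrected = []
    for base, column in zip(anchor, columns):
        top, count = column.most_common(1)[0]
        total = sum(column.values())
        if top != base and count >= min_support and count / total >= min_fraction:
            corrected.append(top)
        else:
            corrected.append(base)
    return "".join(corrected)
```

Using several independent hash functions raises the chance that two strongly overlapping reads share at least one minhash value, since for a single hash function the probability of a match equals the Jaccard similarity of the two reads' k-mer sets. The conservative voting thresholds reflect the abstract's emphasis on avoiding false-positive corrections: a base is left untouched unless the aligned reads clearly outvote it.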
- Addresses
- Department of Computer Science, Johannes Gutenberg University, Mainz 55122, Germany.
- Authors
- Felix Kallenborn
- Andreas Hildebrandt
- Bertil Schmidt
- DOI
- 10.1093/bioinformatics/btaa738
- eISSN
- 1367-4811
- External identifiers
- PubMed Identifier: 32818262
- Funding acknowledgements
- Deutsche Forschungsgemeinschaft:
- Open access
- false
- ISSN
- 1367-4803
- Issue
- 7
- Journal
- Bioinformatics (Oxford, England)
- Keywords
- Humans
- Sequence Alignment
- Sequence Analysis, DNA
- Algorithms
- Software
- High-Throughput Nucleotide Sequencing
- Language
- eng
- Medium
- Pagination
- 889 - 895
- Publication date
- 2021
- Status
- Published
- Date recorded
- 2020
- Title
- CARE: context-aware sequencing read error correction.
- Sub types
- Research Support, Non-U.S. Gov't
- Journal Article
- Volume
- 37
Data source: Europe PubMed Central
- Date of acceptance
- 2020
- Authors
- Felix Kallenborn
- Andreas Hildebrandt
- Bertil Schmidt
- Author URL
- https://www.ncbi.nlm.nih.gov/pubmed/32818262
- DOI
- 10.1093/bioinformatics/btaa738
- eISSN
- 1367-4811
- Issue
- 7
- Journal
- Bioinformatics
- Keywords
- Algorithms
- High-Throughput Nucleotide Sequencing
- Humans
- Sequence Alignment
- Sequence Analysis, DNA
- Software
- Language
- eng
- Country
- England
- Pagination
- 889 - 895
- PII
- 5894969
- Publication date
- 2021
- Status
- Published
- Date the record was made public
- 2021
- Title
- CARE: context-aware sequencing read error correction.
- Sub types
- Journal Article
- Research Support, Non-U.S. Gov't
- Volume
- 37
Data source: PubMed
- Authors
- Felix Kallenborn
- Andreas Hildebrandt
- Bertil Schmidt
- Journal
- Bioinform.
- Article number
- 7
- Pagination
- 889 - 895
- Publication date
- 2021
- Title
- CARE: context-aware sequencing read error correction.
- Volume
- 37
Data source: DBLP
- Authors
- Felix Kallenborn
- Andreas Hildebrandt
- Bertil Schmidt
- Journal
- Bioinformatics
- Publication date
- 2020
- Date recorded
- 2020
- Title
- CARE: Context-Aware Sequencing Read Error Correction
- Sub types
- article
Data source: Manual
- Relationships:
- Owned by