CARE: Context-Aware Sequencing Read Error Correction

Publication type:

Journal article

Metadata:

Authors

Felix Kallenborn
Andreas Hildebrandt
Bertil Schmidt

Author URL

https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=fis-test-1&SrcAuth=WosAPI&KeyUT=WOS:000654708400001&DestLinkType=FullRecord&DestApp=WOS_CPL

DOI

10.1093/bioinformatics/btaa738

eISSN

1460-2059

External identifiers

Clarivate Analytics Document Solution ID: SI3EM
PubMed Identifier: 32818262

ISSN

1367-4803

Publication issue

Journal

BIOINFORMATICS

Pages

889 - 895

Publication date

2021

Status

Published

Title

CARE: context-aware sequencing read error correction

Subtypes

Article

Journal issue

Data source: Web of Science (Lite)

Other metadata sources:

Abstract

Motivation: Error correction is a fundamental pre-processing step in many Next-Generation Sequencing (NGS) pipelines, in particular for de novo genome assembly. However, existing error correction methods either suffer from high false-positive rates because they break reads into independent k-mers, or do not scale efficiently to large amounts of sequencing reads and complex genomes.

Results: We present CARE, an alignment-based scalable error correction algorithm for Illumina data using the concept of minhashing. Minhashing allows for efficient similarity search within large sequencing read collections, which enables fast computation of high-quality multiple alignments. Sequencing errors are corrected by detailed inspection of the corresponding alignments. Our performance evaluation shows that CARE generates significantly fewer false-positive corrections than state-of-the-art tools (Musket, SGA, BFC, Lighter, Bcool, Karect) while maintaining a competitive number of true positives. When used prior to assembly, it can achieve superior de novo assembly results for a number of real datasets. CARE is also the first multiple sequence alignment-based error corrector that is able to process a human genome Illumina NGS dataset in only 4 h on a single workstation using GPU acceleration.

Availability and implementation: CARE is open-source software written in C++ (CPU version) and in CUDA/C++ (GPU version). It is licensed under GPLv3 and can be downloaded at https://github.com/fkallen/CARE.

Supplementary information: Supplementary data are available at Bioinformatics online.
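The abstract describes minhashing only at a high level. The following is a minimal Python sketch of generic minhash-based candidate search over reads, written under stated assumptions: the k-mer length, the number of hash functions, the seeded BLAKE2b hashing and the helper names (`kmers`, `hash_kmer`, `minhash_signature`, `build_index`, `candidates`) are hypothetical and chosen for illustration. They are not CARE's actual parameters or API, and the subsequent multiple alignment and correction steps are not shown.

```python
import hashlib
from collections import defaultdict


def kmers(read: str, k: int) -> set[str]:
    """Return the set of all length-k substrings (k-mers) of a read."""
    if len(read) < k:
        raise ValueError("read is shorter than k")
    return {read[i:i + k] for i in range(len(read) - k + 1)}


def hash_kmer(kmer: str, seed: int) -> int:
    """Seeded 64-bit hash of a k-mer; each seed acts as an independent hash function.

    BLAKE2b is an illustrative choice (assumption), not necessarily what CARE uses.
    """
    digest = hashlib.blake2b(f"{seed}:{kmer}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(read: str, k: int, num_hashes: int) -> tuple[int, ...]:
    """For each hash function, keep the minimum hash value over the read's k-mers."""
    kmer_set = kmers(read, k)
    return tuple(min(hash_kmer(km, seed) for km in kmer_set)
                 for seed in range(num_hashes))


def build_index(reads: list[str], k: int, num_hashes: int):
    """Hypothetical index: one table per hash function, mapping a minhash value to read ids."""
    tables = [defaultdict(set) for _ in range(num_hashes)]
    signatures = []
    for rid, read in enumerate(reads):
        sig = minhash_signature(read, k, num_hashes)
        signatures.append(sig)
        for t, value in enumerate(sig):
            tables[t][value].add(rid)
    return tables, signatures


def candidates(rid: int, tables, signatures, min_shared: int = 1) -> set[int]:
    """Ids of reads that share at least `min_shared` minhash values with read `rid`."""
    counts: dict[int, int] = defaultdict(int)
    for t, value in enumerate(signatures[rid]):
        for other in tables[t][value]:
            if other != rid:
                counts[other] += 1
    return {other for other, c in counts.items() if c >= min_shared}


if __name__ == "__main__":
    reads = [
        "ACGTACGTTAGCCGATAGGCTA",
        "ACGTACGTTAGCCGATAGGCTT",  # differs from read 0 only in the last base
        "TTTTGGGGCCCCAAAATTTTGG",  # unrelated read
    ]
    tables, sigs = build_index(reads, k=8, num_hashes=16)
    for rid in range(len(reads)):
        print(rid, sorted(candidates(rid, tables, sigs)))
```

For two reads, the chance that they agree on a single minhash value equals the Jaccard similarity of their k-mer sets. In the toy example, reads 0 and 1 share 14 of the 16 distinct 8-mers in their union, so they are expected to collide on most signature positions and be reported as candidates of each other, while read 2 shares no 8-mer with either and should not be reported. Raising `min_shared` trades sensitivity for fewer candidates to align.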

Authors

Felix Kallenborn
Andreas Hildebrandt
Bertil Schmidt

DOI

10.1093/bioinformatics/btaa738

Editors

Inanc Birol

eISSN

1367-4811

ISSN

1367-4803

Publication issue

Journal

Bioinformatics

Language

Online publication date

2020

Pages

889 - 895

Publication date

2021

Status

Published

Publisher

Oxford University Press (OUP)

Publisher URL

http://dx.doi.org/10.1093/bioinformatics/btaa738

Date recorded

2023

Title

CARE: context-aware sequencing read error correction

Journal issue

Data source: Crossref

Addresses

Department of Computer Science, Johannes Gutenberg University, Mainz 55122, Germany.

Authors

Felix Kallenborn
Andreas Hildebrandt
Bertil Schmidt

DOI

10.1093/bioinformatics/btaa738

eISSN

1367-4811

External identifiers

PubMed Identifier: 32818262

Funding acknowledgements

Deutsche Forschungsgemeinschaft:

Open access

false

ISSN

1367-4803

Publication issue

Journal

Bioinformatics (Oxford, England)

Keywords

Humans
Sequence Alignment
Sequence Analysis, DNA
Algorithms
Software
High-Throughput Nucleotide Sequencing

Language

eng

Medium

Pages

889 - 895

Publication date

2021

Status

Published

Date recorded

2020

Title

CARE: context-aware sequencing read error correction.

Subtypes

Research Support, Non-U.S. Gov't
Journal Article

Journal issue

Data source: Europe PubMed Central

Date of acceptance

2020

Authors

Felix Kallenborn
Andreas Hildebrandt
Bertil Schmidt

Author URL

https://www.ncbi.nlm.nih.gov/pubmed/32818262

DOI

10.1093/bioinformatics/btaa738

eISSN

1367-4811

Publication issue

Journal

Bioinformatics

Keywords

Algorithms
High-Throughput Nucleotide Sequencing
Humans
Sequence Alignment
Sequence Analysis, DNA
Software

Language

eng

Country

England

Pages

889 - 895

PII

5894969

Publication date

2021

Status

Published

Date the record was made public

2021

Title

CARE: context-aware sequencing read error correction.

Subtypes

Journal Article
Research Support, Non-U.S. Gov't

Journal issue

Data source: PubMed

Authors

Felix Kallenborn
Andreas Hildebrandt
Bertil Schmidt

Journal

Bioinform.

Article number

Pages

889 - 895

Publication date

2021

Title

CARE: context-aware sequencing read error correction.

Journal issue

Data source: DBLP

Authors

Felix Kallenborn
Andreas Hildebrandt
Bertil Schmidt

Journal

Bioinformatics

Publication date

2020

Date recorded

2020

Title

CARE: Context-Aware Sequencing Read Error Correction

Subtypes

article

Data source: Manual

Relationships:

Owned by

CARE: Context-Aware Sequencing Read Error Correction
