MetaCache: context-aware classification of metagenomic reads using minhashing

Publication type:

Journal article

Metadata:

Authors

Andre Mueller
Christian Hundt
Andreas Hildebrandt
Thomas Hankeln
Bertil Schmidt

Author URL

https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=fis-test-1&SrcAuth=WosAPI&KeyUT=WOS:000417004100008&DestLinkType=FullRecord&DestApp=WOS_CPL

DOI

10.1093/bioinformatics/btx520

eISSN

1460-2059

External identifiers

Clarivate Analytics Document Solution ID: FO6UQ
PubMed Identifier: 28961782

ISSN

1367-4803

Publication issue

Journal

BIOINFORMATICS

Pagination

3740 - 3748

Publication date

2017

Status

Published

Title

MetaCache: context-aware classification of metagenomic reads using minhashing

Sub types

Article

Journal issue

Data source: Web of Science (Lite)

Other metadata sources:

Abstract

Motivation: Metagenomic shotgun sequencing studies are becoming increasingly popular, with prominent examples including the sequencing of human microbiomes and diverse environments. A fundamental computational problem in this context is read classification, i.e. the assignment of each read to a taxonomic label. Due to the large number of reads produced by modern high-throughput sequencing technologies and the rapidly increasing number of available reference genomes, corresponding software tools suffer from either long runtimes, large memory requirements or low accuracy. Results: We introduce MetaCache, a novel software for read classification using the big data technique minhashing. Our approach performs context-aware classification of reads by computing representative subsamples of k-mers within both probed reads and locally constrained regions of the reference genomes. As a result, MetaCache consumes significantly less memory compared to the state-of-the-art read classifiers Kraken and CLARK while achieving highly competitive sensitivity and precision at comparable speed. For example, using NCBI RefSeq draft and completed genomes with a total length of around 140 billion bases as reference, MetaCache’s database consumes only 62 GB of memory while both Kraken and CLARK fail to construct their respective databases on a workstation with 512 GB RAM. Our experimental results further show that classification accuracy continuously improves when increasing the amount of utilized reference genome data. Availability and implementation: MetaCache is open source software written in C++ and can be downloaded at http://github.com/muellan/metacache. Supplementary information: Supplementary data are available at Bioinformatics online.
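
The idea of subsampling k-mers from reads and from locally constrained reference regions can be illustrated with a minimal sketch. This is not MetaCache's implementation: the hash function, the parameter values (`k`, `s`, `window`) and all function names are assumptions chosen for illustration. Each reference window and each read is reduced to the `s` smallest k-mer hashes, and a read is assigned to the reference whose windows share the most of these hashes.

```python
import hashlib

def kmer_hash(kmer: str) -> int:
    """Deterministic 64-bit hash of a k-mer (the hash function is an arbitrary choice)."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")

def sketch(seq: str, k: int = 4, s: int = 3) -> frozenset:
    """Minhash-style sketch: the s smallest hashes among the distinct k-mers of seq."""
    hashes = {kmer_hash(seq[i:i + k]) for i in range(len(seq) - k + 1)}
    return frozenset(sorted(hashes)[:s])

def build_index(references: dict, window: int = 16, k: int = 4, s: int = 3) -> dict:
    """Map each sketch hash to the (reference name, window number) pairs that contain it."""
    index = {}
    stride = window - k + 1  # consecutive windows overlap by k - 1 bases, so no k-mer is skipped
    for name, genome in references.items():
        starts = range(0, max(len(genome) - k + 1, 1), stride)
        for w, start in enumerate(starts):
            for h in sketch(genome[start:start + window], k, s):
                index.setdefault(h, set()).add((name, w))
    return index

def classify(read: str, index: dict, k: int = 4, s: int = 3):
    """Return the reference with the most sketch-hash hits, or None if nothing matches."""
    hits = {}
    for h in sketch(read, k, s):
        for name, _window in index.get(h, ()):
            hits[name] = hits.get(name, 0) + 1
    return max(hits, key=hits.get) if hits else None

# Toy reference set (hypothetical sequences, one window each).
references = {"A": "ACGTACGGTCAAGCTT", "B": "TTTTTTTTTTTTTTTT"}
index = build_index(references)
print(classify("ACGTACGGTCAAGCTT", index))
```

Because only `s` hashes are stored per window, the index grows with the number of windows rather than with the number of k-mers, which is the kind of memory saving the abstract attributes to minhashing. A real classifier would additionally map hits to taxa and use the window positions to require that hits cluster in a contiguous region.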

Authors

André Müller
Christian Hundt
Andreas Hildebrandt
Thomas Hankeln
Bertil Schmidt

DOI

10.1093/bioinformatics/btx520

Editors

Inanc Birol

eISSN

1367-4811

ISSN

1367-4803

Publication issue

Journal

Bioinformatics

Language

Online publication date

2017

Pagination

3740 - 3748

Publication date

2017

Status

Published

Publisher

Oxford University Press (OUP)

Publisher URL

http://dx.doi.org/10.1093/bioinformatics/btx520

Date of data capture

2023

Title

MetaCache: context-aware classification of metagenomic reads using minhashing

Journal issue

Data source: Crossref

Abstract

Motivation: Metagenomic shotgun sequencing studies are becoming increasingly popular, with prominent examples including the sequencing of human microbiomes and diverse environments. A fundamental computational problem in this context is read classification, i.e. the assignment of each read to a taxonomic label. Due to the large number of reads produced by modern high-throughput sequencing technologies and the rapidly increasing number of available reference genomes, corresponding software tools suffer from either long runtimes, large memory requirements or low accuracy. Results: We introduce MetaCache, a novel software for read classification using the big data technique minhashing. Our approach performs context-aware classification of reads by computing representative subsamples of k-mers within both probed reads and locally constrained regions of the reference genomes. As a result, MetaCache consumes significantly less memory compared to the state-of-the-art read classifiers Kraken and CLARK while achieving highly competitive sensitivity and precision at comparable speed. For example, using NCBI RefSeq draft and completed genomes with a total length of around 140 billion bases as reference, MetaCache's database consumes only 62 GB of memory while both Kraken and CLARK fail to construct their respective databases on a workstation with 512 GB RAM. Our experimental results further show that classification accuracy continuously improves when increasing the amount of utilized reference genome data. Availability and implementation: MetaCache is open source software written in C++ and can be downloaded at http://github.com/muellan/metacache. Contact: bertil.schmidt@uni-mainz.de. Supplementary information: Supplementary data are available at Bioinformatics online.

Addresses

Department of Computer Science.

Authors

André Müller
Christian Hundt
Andreas Hildebrandt
Thomas Hankeln
Bertil Schmidt

DOI

10.1093/bioinformatics/btx520

eISSN

1367-4811

External identifiers

PubMed Identifier: 28961782

Funding acknowledgements

CSM:
Deutsche Forschungsgemeinschaft:
DFG:

Open access

false

ISSN

1367-4803

Publication issue

Journal

Bioinformatics (Oxford, England)

Keywords

Humans
Sequence Analysis, DNA
Algorithms
Software
Metagenomics
High-Throughput Nucleotide Sequencing

Language

eng

Medium

Pagination

3740 - 3748

Publication date

2017

Status

Published

Date of data capture

2017

Title

MetaCache: context-aware classification of metagenomic reads using minhashing.

Sub types

Journal Article

Journal issue

Data source: Europe PubMed Central

Abstract

MOTIVATION: Metagenomic shotgun sequencing studies are becoming increasingly popular, with prominent examples including the sequencing of human microbiomes and diverse environments. A fundamental computational problem in this context is read classification, i.e. the assignment of each read to a taxonomic label. Due to the large number of reads produced by modern high-throughput sequencing technologies and the rapidly increasing number of available reference genomes, corresponding software tools suffer from either long runtimes, large memory requirements or low accuracy. RESULTS: We introduce MetaCache, a novel software for read classification using the big data technique minhashing. Our approach performs context-aware classification of reads by computing representative subsamples of k-mers within both probed reads and locally constrained regions of the reference genomes. As a result, MetaCache consumes significantly less memory compared to the state-of-the-art read classifiers Kraken and CLARK while achieving highly competitive sensitivity and precision at comparable speed. For example, using NCBI RefSeq draft and completed genomes with a total length of around 140 billion bases as reference, MetaCache's database consumes only 62 GB of memory while both Kraken and CLARK fail to construct their respective databases on a workstation with 512 GB RAM. Our experimental results further show that classification accuracy continuously improves when increasing the amount of utilized reference genome data. AVAILABILITY AND IMPLEMENTATION: MetaCache is open source software written in C++ and can be downloaded at http://github.com/muellan/metacache. CONTACT: bertil.schmidt@uni-mainz.de. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Date of acceptance

2017

Authors

André Müller
Christian Hundt
Andreas Hildebrandt
Thomas Hankeln
Bertil Schmidt

Author URL

https://www.ncbi.nlm.nih.gov/pubmed/28961782

DOI

10.1093/bioinformatics/btx520

eISSN

1367-4811

Publication issue

Journal

Bioinformatics

Keywords

Algorithms
High-Throughput Nucleotide Sequencing
Humans
Metagenomics
Sequence Analysis, DNA
Software

Language

eng

Country

England

Pagination

3740 - 3748

PII

4083578

Publication date

2017

Status

Published

Date the record was made public

2018

Title

MetaCache: context-aware classification of metagenomic reads using minhashing.

Sub types

Journal Article

Journal issue

Data source: PubMed

Authors

André Müller
Christian Hundt
Andreas Hildebrandt
Thomas Hankeln
Bertil Schmidt

Journal

Bioinform.

Article number

Pagination

3740 - 3748

Publication date

2017

Title

MetaCache: context-aware classification of metagenomic reads using minhashing.

Journal issue

Data source: DBLP

Authors

André Müller
Christian Hundt
Andreas Hildebrandt
Thomas Hankeln
Bertil Schmidt

Journal

Bioinformatics

Article number

Pagination

3740 - 3748

Publication date

2017

Publisher

Oxford University Press

Date of data capture

2020

Title

MetaCache: context-aware classification of metagenomic reads using minhashing

Sub types

article

Journal issue

Data source: Manual

Relationships:

Owned by

MetaCache: context-aware classification of metagenomic reads using minhashing
