MetaCache: context-aware classification of metagenomic reads using minhashing
- Publication type:
- Journal article
- Metadata:
- Authors
- Andre Mueller
- Christian Hundt
- Andreas Hildebrandt
- Thomas Hankeln
- Bertil Schmidt
- Author URL
- https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=fis-test-1&SrcAuth=WosAPI&KeyUT=WOS:000417004100008&DestLinkType=FullRecord&DestApp=WOS_CPL
- DOI
- 10.1093/bioinformatics/btx520
- eISSN
- 1460-2059
- External identifiers
- Clarivate Analytics Document Solution ID: FO6UQ
- PubMed Identifier: 28961782
- ISSN
- 1367-4803
- Issue
- 23
- Journal
- BIOINFORMATICS
- Pagination
- 3740 - 3748
- Publication date
- 2017
- Status
- Published
- Title
- MetaCache: context-aware classification of metagenomic reads using minhashing
- Sub types
- Article
- Volume
- 33
Data source: Web of Science (Lite)
- Other metadata sources:
- Abstract
- Motivation: Metagenomic shotgun sequencing studies are becoming increasingly popular, with prominent examples including the sequencing of human microbiomes and diverse environments. A fundamental computational problem in this context is read classification, i.e. the assignment of each read to a taxonomic label. Due to the large number of reads produced by modern high-throughput sequencing technologies and the rapidly increasing number of available reference genomes, corresponding software tools suffer from either long runtimes, large memory requirements or low accuracy. Results: We introduce MetaCache, a novel software tool for read classification using the big data technique minhashing. Our approach performs context-aware classification of reads by computing representative subsamples of k-mers within both probed reads and locally constrained regions of the reference genomes. As a result, MetaCache consumes significantly less memory than the state-of-the-art read classifiers Kraken and CLARK while achieving highly competitive sensitivity and precision at comparable speed. For example, using NCBI RefSeq draft and completed genomes with a total length of around 140 billion bases as reference, MetaCache's database consumes only 62 GB of memory, while both Kraken and CLARK fail to construct their respective databases on a workstation with 512 GB RAM. Our experimental results further show that classification accuracy continuously improves when increasing the amount of utilized reference genome data. Availability and implementation: MetaCache is open source software written in C++ and can be downloaded at http://github.com/muellan/metacache. Supplementary information: Supplementary data are available at Bioinformatics online.
- Authors
- André Müller
- Christian Hundt
- Andreas Hildebrandt
- Thomas Hankeln
- Bertil Schmidt
- DOI
- 10.1093/bioinformatics/btx520
- Editors
- Inanc Birol
- eISSN
- 1367-4811
- ISSN
- 1367-4803
- Issue
- 23
- Journal
- Bioinformatics
- Language
- en
- Online publication date
- 2017
- Pagination
- 3740 - 3748
- Publication date
- 2017
- Status
- Published
- Publisher
- Oxford University Press (OUP)
- Publisher URL
- http://dx.doi.org/10.1093/bioinformatics/btx520
- Data capture date
- 2023
- Title
- MetaCache: context-aware classification of metagenomic reads using minhashing
- Volume
- 33
Data source: Crossref
- Abstract
- Motivation: Metagenomic shotgun sequencing studies are becoming increasingly popular, with prominent examples including the sequencing of human microbiomes and diverse environments. A fundamental computational problem in this context is read classification, i.e. the assignment of each read to a taxonomic label. Due to the large number of reads produced by modern high-throughput sequencing technologies and the rapidly increasing number of available reference genomes, corresponding software tools suffer from either long runtimes, large memory requirements or low accuracy. Results: We introduce MetaCache, a novel software tool for read classification using the big data technique minhashing. Our approach performs context-aware classification of reads by computing representative subsamples of k-mers within both probed reads and locally constrained regions of the reference genomes. As a result, MetaCache consumes significantly less memory than the state-of-the-art read classifiers Kraken and CLARK while achieving highly competitive sensitivity and precision at comparable speed. For example, using NCBI RefSeq draft and completed genomes with a total length of around 140 billion bases as reference, MetaCache's database consumes only 62 GB of memory, while both Kraken and CLARK fail to construct their respective databases on a workstation with 512 GB RAM. Our experimental results further show that classification accuracy continuously improves when increasing the amount of utilized reference genome data. Availability and implementation: MetaCache is open source software written in C++ and can be downloaded at http://github.com/muellan/metacache. Contact: bertil.schmidt@uni-mainz.de. Supplementary information: Supplementary data are available at Bioinformatics online.
- Addresses
- Department of Computer Science.
- Authors
- André Müller
- Christian Hundt
- Andreas Hildebrandt
- Thomas Hankeln
- Bertil Schmidt
- DOI
- 10.1093/bioinformatics/btx520
- eISSN
- 1367-4811
- External identifiers
- PubMed Identifier: 28961782
- Funding acknowledgements
- CSM:
- Deutsche Forschungsgemeinschaft:
- DFG:
- Open access
- false
- ISSN
- 1367-4803
- Issue
- 23
- Journal
- Bioinformatics (Oxford, England)
- Keywords
- Humans
- Sequence Analysis, DNA
- Algorithms
- Software
- Metagenomics
- High-Throughput Nucleotide Sequencing
- Language
- eng
- Medium
- Pagination
- 3740 - 3748
- Publication date
- 2017
- Status
- Published
- Data capture date
- 2017
- Title
- MetaCache: context-aware classification of metagenomic reads using minhashing.
- Sub types
- Journal Article
- Volume
- 33
Data source: Europe PubMed Central
- Abstract
- MOTIVATION: Metagenomic shotgun sequencing studies are becoming increasingly popular, with prominent examples including the sequencing of human microbiomes and diverse environments. A fundamental computational problem in this context is read classification, i.e. the assignment of each read to a taxonomic label. Due to the large number of reads produced by modern high-throughput sequencing technologies and the rapidly increasing number of available reference genomes, corresponding software tools suffer from either long runtimes, large memory requirements or low accuracy. RESULTS: We introduce MetaCache, a novel software tool for read classification using the big data technique minhashing. Our approach performs context-aware classification of reads by computing representative subsamples of k-mers within both probed reads and locally constrained regions of the reference genomes. As a result, MetaCache consumes significantly less memory than the state-of-the-art read classifiers Kraken and CLARK while achieving highly competitive sensitivity and precision at comparable speed. For example, using NCBI RefSeq draft and completed genomes with a total length of around 140 billion bases as reference, MetaCache's database consumes only 62 GB of memory, while both Kraken and CLARK fail to construct their respective databases on a workstation with 512 GB RAM. Our experimental results further show that classification accuracy continuously improves when increasing the amount of utilized reference genome data. AVAILABILITY AND IMPLEMENTATION: MetaCache is open source software written in C++ and can be downloaded at http://github.com/muellan/metacache. CONTACT: bertil.schmidt@uni-mainz.de. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
- Date of acceptance
- 2017
- Authors
- André Müller
- Christian Hundt
- Andreas Hildebrandt
- Thomas Hankeln
- Bertil Schmidt
- Author URL
- https://www.ncbi.nlm.nih.gov/pubmed/28961782
- DOI
- 10.1093/bioinformatics/btx520
- eISSN
- 1367-4811
- Issue
- 23
- Journal
- Bioinformatics
- Keywords
- Algorithms
- High-Throughput Nucleotide Sequencing
- Humans
- Metagenomics
- Sequence Analysis, DNA
- Software
- Language
- eng
- Country
- England
- Pagination
- 3740 - 3748
- PII
- 4083578
- Publication date
- 2017
- Status
- Published
- Date the record was made public
- 2018
- Title
- MetaCache: context-aware classification of metagenomic reads using minhashing.
- Sub types
- Journal Article
- Volume
- 33
Data source: PubMed
- Authors
- André Müller
- Christian Hundt
- Andreas Hildebrandt
- Thomas Hankeln
- Bertil Schmidt
- Journal
- Bioinform.
- Article number
- 23
- Pagination
- 3740 - 3748
- Publication date
- 2017
- Title
- MetaCache: context-aware classification of metagenomic reads using minhashing.
- Volume
- 33
Data source: DBLP
- Authors
- André Müller
- Christian Hundt
- Andreas Hildebrandt
- Thomas Hankeln
- Bertil Schmidt
- Journal
- Bioinformatics
- Article number
- 23
- Pagination
- 3740 - 3748
- Publication date
- 2017
- Publisher
- Oxford University Press
- Data capture date
- 2020
- Title
- MetaCache: context-aware classification of metagenomic reads using minhashing
- Sub types
- article
- Volume
- 33
Data source: Manual
- Relationships:
- Owned by