Fast and efficient short read mapping based on a succinct hash index
- Publication type:
- Journal article
- Metadata:
- Authors
- Haowen Zhang
- Yuandong Chan
- Kaichao Fan
- Bertil Schmidt
- Weiguo Liu
- Author URL
- https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=fis-test-1&SrcAuth=WosAPI&KeyUT=WOS:000427155500002&DestLinkType=FullRecord&DestApp=WOS_CPL
- DOI
- 10.1186/s12859-018-2094-5
- External identifiers
- Clarivate Analytics Document Solution ID: FY8ZV
- PubMed Identifier: 29523083
- ISSN
- 1471-2105
- Journal
- BMC BIOINFORMATICS
- Keywords
- Next-generation sequencing
- Read mapping
- Hash index
- Seed selection
- Article number
- ARTN 92
- Publication date
- 2018
- Status
- Published
- Title
- Fast and efficient short read mapping based on a succinct hash index
- Sub types
- Article
- Volume
- 19
Data source: Web of Science (Lite)
- Other metadata sources:
- Authors
- Haowen Zhang
- Yuandong Chan
- Kaichao Fan
- Bertil Schmidt
- Weiguo Liu
- DOI
- 10.1186/s12859-018-2094-5
- eISSN
- 1471-2105
- Issue
- 1
- Journal
- BMC Bioinformatics
- Language
- en
- Article number
- 92
- Online publication date
- 2018
- Publication date
- 2018
- Status
- Published
- Publisher
- Springer Science and Business Media LLC
- Publisher URL
- http://dx.doi.org/10.1186/s12859-018-2094-5
- Data capture date
- 2019
- Title
- Fast and efficient short read mapping based on a succinct hash index
- Volume
- 19
Data source: Crossref
- Abstract
- Background: Various indexing techniques have been applied by next-generation sequencing read mapping tools. The choice of a particular data structure is a trade-off between memory consumption, mapping throughput, and construction time. Results: We present the succinct hash index, a novel data structure for read mapping. It is a variant of the classical q-gram index with a particularly small memory footprint, occupying between 3.5 and 5.3 GB for a human reference genome under typical parameter settings. The succinct hash index features two novel seed selection algorithms (group seeding and variable-length seeding) and an efficient parallel construction algorithm, which we have implemented to design the FEM (Fast and Efficient read Mapper) mapper. FEM can return all read mappings within a given edit distance. Our experimental results show that FEM is scalable and outperforms other state-of-the-art all-mappers in terms of both speed and memory footprint. Compared to Masai, FEM is an order of magnitude faster using a single thread and two orders of magnitude faster when using multiple threads. Furthermore, we observe an up to 2.8-fold speedup compared to BitMapper and an order-of-magnitude speedup compared to BitMapper2 and Hobbes3. Conclusions: The presented succinct index is the first feasible implementation of the q-gram index functionality that occupies around 3.5 GB of memory for a whole human reference genome. FEM is freely available at https://github.com/haowenz/FEM.
- Addresses
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, 30332, GA, USA.
- Authors
- Haowen Zhang
- Yuandong Chan
- Kaichao Fan
- Bertil Schmidt
- Weiguo Liu
- DOI
- 10.1186/s12859-018-2094-5
- eISSN
- 1471-2105
- External identifiers
- PubMed Identifier: 29523083
- PubMed Central ID: PMC5845352
- Open access
- true
- ISSN
- 1471-2105
- Issue
- 1
- Journal
- BMC bioinformatics
- Keywords
- Humans
- Sequence Analysis, DNA
- Base Sequence
- Base Pairing
- Genome, Human
- Algorithms
- Computer Simulation
- Software
- Databases, Genetic
- Language
- eng
- Medium
- Electronic
- Online publication date
- 2018
- Open access status
- Open Access
- Pagination
- 92
- Publication date
- 2018
- Status
- Published
- Publisher licence
- CC BY
- Data capture date
- 2018
- Title
- Fast and efficient short read mapping based on a succinct hash index.
- Sub types
- Research Support, Non-U.S. Gov't
- research-article
- Journal Article
- Volume
- 19
Files
https://europepmc.org/articles/PMC5845352?pdf=render
Data source: Europe PubMed Central
- Date of acceptance
- 2018
- Authors
- Haowen Zhang
- Yuandong Chan
- Kaichao Fan
- Bertil Schmidt
- Weiguo Liu
- Author URL
- https://www.ncbi.nlm.nih.gov/pubmed/29523083
- DOI
- 10.1186/s12859-018-2094-5
- eISSN
- 1471-2105
- External identifiers
- PubMed Central ID: PMC5845352
- Issue
- 1
- Journal
- BMC Bioinformatics
- Keywords
- Hash index
- Next-generation sequencing
- Read mapping
- Seed selection
- Algorithms
- Base Pairing
- Base Sequence
- Computer Simulation
- Databases, Genetic
- Genome, Human
- Humans
- Sequence Analysis, DNA
- Software
- Language
- eng
- Country
- England
- Pagination
- 92
- PII
- 10.1186/s12859-018-2094-5
- Publication date
- 2018
- Status
- Published online
- Date the record was made public
- 2018
- Title
- Fast and efficient short read mapping based on a succinct hash index.
- Sub types
- Journal Article
- Research Support, Non-U.S. Gov't
- Volume
- 19
Data source: PubMed
- Authors
- Haowen Zhang
- Yuandong Chan
- Kaichao Fan
- Bertil Schmidt
- Weiguo Liu
- Journal
- BMC Bioinform.
- Article number
- 1
- Pagination
- 92:1 - 92:1
- Publication date
- 2018
- Title
- Fast and efficient short read mapping based on a succinct hash index.
- Volume
- 19
Data source: DBLP
- Relationships:
- Owned by