RabbitFX: Efficient Framework for FASTA/Q File Parsing on Modern Multi-Core Platforms
- Publication type:
- Journal article
- Metadata:
- Authors
- Hao Zhang
- Honglei Song
- Xiaoming Xu
- Qixin Chang
- Mingkai Wang
- Yanjie Wei
- Zekun Yin
- Bertil Schmidt
- Weiguo Liu
- Author URL
- https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=fis-test-1&SrcAuth=WosAPI&KeyUT=WOS:001006656100064&DestLinkType=FullRecord&DestApp=WOS_CPL
- DOI
- 10.1109/TCBB.2022.3219114
- eISSN
- 1557-9964
- External identifiers
- Clarivate Analytics Document Solution ID: J0NJ1
- PubMed Identifier: 36327193
- ISSN
- 1545-5963
- Issue
- 3
- Journal
- IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS
- Keywords
- Sequential analysis
- Instruction sets
- Task analysis
- Bioinformatics
- Runtime
- Codes
- Arrays
- Next generation sequencing
- FASTQ
- FASTA
- I/O
- file parsing
- multi-core CPUs
- HPC
- Pagination
- 2341 - 2348
- Publication date
- 2023
- Status
- Published
- Title
- RabbitFX: Efficient Framework for FASTA/Q File Parsing on Modern Multi-Core Platforms
- Sub types
- Article
- Volume
- 20
Data source: Web of Science (Lite)
- Other metadata sources:
- Authors
- Hao Zhang
- Honglei Song
- Xiaoming Xu
- Qixin Chang
- Mingkai Wang
- Yanjie Wei
- Zekun Yin
- Bertil Schmidt
- Weiguo Liu
- DOI
- 10.1109/tcbb.2022.3219114
- eISSN
- 1557-9964
- ISSN
- 1545-5963
- Issue
- 3
- Journal
- IEEE/ACM Transactions on Computational Biology and Bioinformatics
- Pagination
- 2341 - 2348
- Publication date
- 2023
- Status
- Published
- Publisher
- Institute of Electrical and Electronics Engineers (IEEE)
- Publisher URL
- http://dx.doi.org/10.1109/tcbb.2022.3219114
- Record created date
- 2024
- Title
- RabbitFX: Efficient Framework for FASTA/Q File Parsing on Modern Multi-Core Platforms
- Volume
- 20
Data source: Crossref
- Abstract
- The continuous growth of generated sequencing data leads to the development of a variety of associated bioinformatics tools. However, many of them are not able to fully exploit the resources of modern multi-core systems since they are bottlenecked by parsing files leading to slow execution times. This motivates the design of an efficient method for parsing sequencing data that can exploit the power of modern hardware, especially for modern CPUs with fast storage devices. We have developed RabbitFX, a fast, efficient, and easy-to-use framework for processing biological sequencing data on modern multi-core platforms. It can efficiently read FASTA and FASTQ files by combining a lightweight parsing method by means of an optimized formatting implementation. Furthermore, we provide user-friendly and modularized C++ APIs that can be easily integrated into applications in order to increase their file parsing speed. As proof-of-concept, we have integrated RabbitFX into three I/O-intensive applications: fastp, Ktrim, and Mash. Our evaluation shows that the inclusion of RabbitFX leads to speedups of at least 11.6 (6.6), 2.4 (2.4), and 3.7 (3.2) compared to the original versions on plain (gzip-compressed) files, respectively. These case studies demonstrate that RabbitFX can be easily integrated into a variety of NGS analysis tools to significantly reduce associated runtimes. It is open source software available at https://github.com/RabbitBio/RabbitFX.
- Authors
- Hao Zhang
- Honglei Song
- Xiaoming Xu
- Qixin Chang
- Mingkai Wang
- Yanjie Wei
- Zekun Yin
- Bertil Schmidt
- Weiguo Liu
- DOI
- 10.1109/tcbb.2022.3219114
- eISSN
- 1557-9964
- External identifiers
- PubMed Identifier: 36327193
- Funding acknowledgements
- Deutsche Forschungsgemeinschaft:
- Key Project of Joint Fund of Shandong Province: ZR2019LZH007
- National Natural Science Foundation of China: 62102231
- Ministry of Education, China:
- PPP project from CSC and DAAD:
- Natural Science Foundation of Shandong Province: ZR2021QF089
- Engineering Research Center of Digital Media Technology:
- National Natural Science Foundation of China: 61972231
- Open access
- false
- ISSN
- 1545-5963
- Issue
- 3
- Journal
- IEEE/ACM transactions on computational biology and bioinformatics
- Keywords
- Computational Biology
- Software
- High-Throughput Nucleotide Sequencing
- Language
- eng
- Medium
- Print-Electronic
- Online publication date
- 2023
- Pagination
- 2341 - 2348
- Publication date
- 2023
- Status
- Published
- Record created date
- 2022
- Title
- RabbitFX: Efficient Framework for FASTA/Q File Parsing on Modern Multi-Core Platforms.
- Sub types
- Research Support, Non-U.S. Gov't
- Journal Article
- Volume
- 20
Data source: Europe PubMed Central
- Authors
- Hao Zhang
- Honglei Song
- Xiaoming Xu
- Qixin Chang
- Mingkai Wang
- Yanjie Wei
- Zekun Yin
- Bertil Schmidt
- Weiguo Liu
- Author URL
- https://www.ncbi.nlm.nih.gov/pubmed/36327193
- DOI
- 10.1109/TCBB.2022.3219114
- eISSN
- 1557-9964
- Issue
- 3
- Journal
- IEEE/ACM Trans Comput Biol Bioinform
- Keywords
- Software
- Computational Biology
- High-Throughput Nucleotide Sequencing
- Language
- eng
- Country
- United States
- Pagination
- 2341 - 2348
- Publication date
- 2023
- Status
- Published
- Date the record was made public
- 2023
- Title
- RabbitFX: Efficient Framework for FASTA/Q File Parsing on Modern Multi-Core Platforms.
- Sub types
- Journal Article
- Research Support, Non-U.S. Gov't
- Volume
- 20
Data source: PubMed
- Authors
- Hao Zhang
- Honglei Song
- Xiaoming Xu
- Qixin Chang
- Mingkai Wang
- Yanjie Wei
- Zekun Yin
- Bertil Schmidt
- Weiguo Liu
- Journal
- IEEE ACM Trans. Comput. Biol. Bioinform.
- Article number
- 3
- Pagination
- 2341 - 2348
- Publication date
- 2023
- Title
- RabbitFX: Efficient Framework for FASTA/Q File Parsing on Modern Multi-Core Platforms.
- Volume
- 20
Data source: DBLP