RabbitQCPlus 2.0: More efficient and versatile quality control for sequencing data
- Publication type:
- Journal article
- Metadata:
- Authors
- Lifeng Yan
- Zekun Yin
- Hao Zhang
- Zhan Zhao
- Mingkai Wang
- André Müller
- Felix Kallenborn
- Alexander Wichmann
- Yanjie Wei
- Beifang Niu
- Bertil Schmidt
- Weiguo Liu
- DOI
- 10.1016/j.ymeth.2023.06.007
- ISSN
- 1046-2023
- Journal
- Methods
- Language
- en
- Pagination
- 39 - 50
- Publication date
- 2023
- Status
- Published
- Publisher
- Elsevier BV
- Publisher URL
- http://dx.doi.org/10.1016/j.ymeth.2023.06.007
- Date of data capture
- 2023
- Title
- RabbitQCPlus 2.0: More efficient and versatile quality control for sequencing data
- Journal volume
- 216
Data source: Crossref
- Other metadata sources:
- Abstract
- Assessing the quality of sequencing data plays a crucial role in downstream data analysis. However, existing tools often achieve sub-optimal efficiency, especially when dealing with compressed files or performing complicated quality control operations such as over-representation analysis and error correction. We present RabbitQCPlus, an ultra-efficient quality control tool for modern multi-core systems. RabbitQCPlus uses vectorization, memory copy reduction, parallel (de)compression, and optimized data structures to achieve substantial performance gains. It is 1.1 to 5.4 times faster when performing basic quality control operations compared to state-of-the-art applications yet requires fewer compute resources. Moreover, RabbitQCPlus is at least 4 times faster than other applications when processing gzip-compressed FASTQ files and 1.3 times faster with the error correction module turned on. Furthermore, it takes less than 4 minutes to process 280 GB of plain FASTQ sequencing data, while other applications take at least 22 minutes on a 48-core server when enabling the per-read over-representation analysis. C++ sources are available at https://github.com/RabbitBio/RabbitQCPlus.
- Addresses
- School of Software, Shandong University, Jinan, China.
- Authors
- Lifeng Yan
- Zekun Yin
- Hao Zhang
- Zhan Zhao
- Mingkai Wang
- André Müller
- Felix Kallenborn
- Alexander Wichmann
- Yanjie Wei
- Beifang Niu
- Bertil Schmidt
- Weiguo Liu
- DOI
- 10.1016/j.ymeth.2023.06.007
- eISSN
- 1095-9130
- External identifiers
- PubMed Identifier: 37330158
- Funding acknowledgements
- Natural Science Foundation of Shandong Province: ZR2021QF089
- Deutsche Forschungsgemeinschaft: 439669440 TRR319 RMaP TP C01
- National Natural Science Foundation of China: 62102231
- National Natural Science Foundation of China: 61972231
- Ministry of Education of the People's Republic of China:
- Open access
- false
- ISSN
- 1046-2023
- Journal
- Methods (San Diego, Calif.)
- Keywords
- Sequence Analysis, DNA
- Algorithms
- Quality Control
- Data Compression
- Software
- High-Throughput Nucleotide Sequencing
- Language
- eng
- Medium
- Print-Electronic
- Online publication date
- 2023
- Pagination
- 39 - 50
- Publication date
- 2023
- Status
- Published
- Date of data capture
- 2023
- Title
- RabbitQCPlus 2.0: More efficient and versatile quality control for sequencing data.
- Sub types
- Research Support, Non-U.S. Gov't
- Journal Article
- Journal volume
- 216
Data source: Europe PubMed Central
- Date of acceptance
- 2023
- Authors
- Lifeng Yan
- Zekun Yin
- Hao Zhang
- Zhan Zhao
- Mingkai Wang
- André Müller
- Felix Kallenborn
- Alexander Wichmann
- Yanjie Wei
- Beifang Niu
- Bertil Schmidt
- Weiguo Liu
- Author URL
- https://www.ncbi.nlm.nih.gov/pubmed/37330158
- DOI
- 10.1016/j.ymeth.2023.06.007
- eISSN
- 1095-9130
- Journal
- Methods
- Keywords
- Error correction
- Gzip-compressed
- HPC
- Over-representation
- Quality control
- Sequencing data
- Vectorization
- Software
- High-Throughput Nucleotide Sequencing
- Data Compression
- Quality Control
- Algorithms
- Sequence Analysis, DNA
- Language
- eng
- Country
- United States
- Pagination
- 39 - 50
- PII
- S1046-2023(23)00099-3
- Publication date
- 2023
- Status
- Published
- Date the record was made public
- 2023
- Title
- RabbitQCPlus 2.0: More efficient and versatile quality control for sequencing data.
- Sub types
- Journal Article
- Research Support, Non-U.S. Gov't
- Journal volume
- 216
Data source: PubMed
- Relationships:
- Property of