Parallelized short read assembly of large genomes using de Bruijn graphs
- Publication type:
- Journal article
- Metadata:
- Authors
- Yongchao Liu
- Bertil Schmidt
- Douglas L Maskell
- Collections
- metadata
- ISSN
- 1471-2105
- Journal
- BMC bioinformatics
- Keywords
- 600 Technology (Applied sciences)
- Language
- eng
- Pagination
- Art. 354
- Publication date
- 2011
- Publisher
- BioMed Central
- Publisher URL
- http://dx.doi.org/10.1186/1471-2105-12-354
- Date of data capture
- 2020
- Date the record was made public
- 2020
- Access
- Public
- Title
- Parallelized short read assembly of large genomes using de Bruijn graphs
- Volume
- 12
Data source: METADATA.UB
- Other metadata sources:
- Authors
- Yongchao Liu
- Bertil Schmidt
- Douglas L Maskell
- Author URL
- https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=fis-test-1&SrcAuth=WosAPI&KeyUT=WOS:000294558500001&DestLinkType=FullRecord&DestApp=WOS_CPL
- DOI
- 10.1186/1471-2105-12-354
- External identifiers
- Clarivate Analytics Document Solution ID: 815VQ
- PubMed Identifier: 21867511
- ISSN
- 1471-2105
- Journal
- BMC BIOINFORMATICS
- Article number
- ARTN 354
- Publication date
- 2011
- Status
- Published
- Title
- Parallelized short read assembly of large genomes using de Bruijn graphs
- Sub types
- Article
- Volume
- 12
Data source: Web of Science (Lite)
- Authors
- Yongchao Liu
- Bertil Schmidt
- Douglas L Maskell
- DOI
- 10.1186/1471-2105-12-354
- eISSN
- 1471-2105
- Issue
- 1
- Journal
- BMC Bioinformatics
- Language
- en
- Article number
- 354
- Online publication date
- 2011
- Publication date
- 2011
- Status
- Published
- Publisher
- Springer Science and Business Media LLC
- Publisher URL
- http://dx.doi.org/10.1186/1471-2105-12-354
- Date of data capture
- 2019
- Title
- Parallelized short read assembly of large genomes using de Bruijn graphs
- Volume
- 12
Data source: Crossref
- Abstract
- Background: Next-generation sequencing technologies have given rise to the explosive increase in DNA sequencing throughput, and have promoted the recent development of de novo short read assemblers. However, existing assemblers require high execution times and a large amount of compute resources to assemble large genomes from quantities of short reads. Results: We present PASHA, a parallelized short read assembler using de Bruijn graphs, which takes advantage of hybrid computing architectures consisting of both shared-memory multi-core CPUs and distributed-memory compute clusters to gain efficiency and scalability. Evaluation using three small-scale real paired-end datasets shows that PASHA is able to produce more contiguous high-quality assemblies in shorter time compared to three leading assemblers: Velvet, ABySS and SOAPdenovo. PASHA's scalability for large genome datasets is demonstrated with human genome assembly. Compared to ABySS, PASHA achieves competitive assembly quality with faster execution speed on the same compute resources, yielding an NG50 contig size of 503 with the longest correct contig size of 18,252, and an NG50 scaffold size of 2,294. Moreover, the human assembly is completed in about 21 hours with only modest compute resources. Conclusions: Developing parallel assemblers for large genomes has been garnering significant research efforts due to the explosive size growth of high-throughput short read datasets. By employing hybrid parallelism consisting of multi-threading on multi-core CPUs and message passing on compute clusters, PASHA is able to assemble the human genome with high quality and in reasonable time using modest compute resources.
- Addresses
- School of Computer Engineering, Nanyang Technological University, Singapore. liuy0039@ntu.edu.sg
- Authors
- Yongchao Liu
- Bertil Schmidt
- Douglas L Maskell
- DOI
- 10.1186/1471-2105-12-354
- eISSN
- 1471-2105
- External identifiers
- PubMed Identifier: 21867511
- PubMed Central ID: PMC3167803
- Open access
- true
- ISSN
- 1471-2105
- Journal
- BMC bioinformatics
- Keywords
- Humans
- Bacteria
- Computational Biology
- Genome
- Genome, Human
- Software
- High-Throughput Nucleotide Sequencing
- Language
- eng
- Medium
- Electronic
- Online publication date
- 2011
- Open access status
- Open Access
- Pagination
- 354
- Publication date
- 2011
- Status
- Published
- Publisher licence
- CC BY
- Date of data capture
- 2011
- Title
- Parallelized short read assembly of large genomes using de Bruijn graphs.
- Sub types
- research-article
- Journal Article
- Volume
- 12
Files
- https://bmcbioinformatics.biomedcentral.com/counter/pdf/10.1186/1471-2105-12-354
- https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/21867511/pdf/?tool=EBI
- https://europepmc.org/articles/PMC3167803?pdf=render
Data source: Europe PubMed Central
- Date of acceptance
- 2011
- Authors
- Yongchao Liu
- Bertil Schmidt
- Douglas L Maskell
- Author URL
- https://www.ncbi.nlm.nih.gov/pubmed/21867511
- DOI
- 10.1186/1471-2105-12-354
- eISSN
- 1471-2105
- External identifiers
- PubMed Central ID: PMC3167803
- Journal
- BMC Bioinformatics
- Keywords
- Bacteria
- Computational Biology
- Genome
- Genome, Human
- High-Throughput Nucleotide Sequencing
- Humans
- Software
- Language
- eng
- Country
- England
- Pagination
- 354
- PII
- 1471-2105-12-354
- Publication date
- 2011
- Status
- Published online
- Date the record was made public
- 2011
- Title
- Parallelized short read assembly of large genomes using de Bruijn graphs.
- Sub types
- Journal Article
- Volume
- 12
Data source: PubMed
- Authors
- Yongchao Liu
- Bertil Schmidt
- Douglas L Maskell
- Journal
- BMC Bioinform.
- Pagination
- 354 - 354
- Publication date
- 2011
- Title
- Parallelized short read assembly of large genomes using de Bruijn graphs.
- Volume
- 12
Data source: DBLP
- Relationships:
- Owned by