Parallelized short read assembly of large genomes using de Bruijn graphs

Publication type:

Journal article

Metadata:

Authors

Yongchao Liu
Bertil Schmidt
Douglas L Maskell

Collections

metadata

ISSN

1471-2105

Journal

BMC bioinformatics

Keywords

600 Technology (Applied sciences)

Language

eng

Pagination

Art. 354

Publication date

2011

Publisher

BioMed Central

Publisher URL

http://dx.doi.org/10.1186/1471-2105-12-354

Date of data entry

2020

Date the record was made public

2020

Access

Public

Title

Parallelized short read assembly of large genomes using de Bruijn graphs

Data source: METADATA.UB

Other metadata sources:

Authors

Yongchao Liu
Bertil Schmidt
Douglas L Maskell

Author URL

https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=fis-test-1&SrcAuth=WosAPI&KeyUT=WOS:000294558500001&DestLinkType=FullRecord&DestApp=WOS_CPL

DOI

10.1186/1471-2105-12-354

External identifiers

Clarivate Analytics Document Solution ID: 815VQ
PubMed Identifier: 21867511

ISSN

1471-2105

Journal

BMC BIOINFORMATICS

Article number

ARTN 354

Publication date

2011

Status

Published

Title

Parallelized short read assembly of large genomes using de Bruijn graphs

Sub types

Article

Data source: Web of Science (Lite)

Authors

Yongchao Liu
Bertil Schmidt
Douglas L Maskell

DOI

10.1186/1471-2105-12-354

eISSN

1471-2105

Journal

BMC Bioinformatics

Article number

354

Online publication date

2011

Publication date

2011

Status

Published

Publisher

Springer Science and Business Media LLC

Publisher URL

http://dx.doi.org/10.1186/1471-2105-12-354

Date of data entry

2019

Title

Parallelized short read assembly of large genomes using de Bruijn graphs

Data source: Crossref

Abstract

Background

Next-generation sequencing technologies have brought about an explosive increase in DNA sequencing throughput and have spurred the recent development of de novo short read assemblers. However, existing assemblers require long execution times and large amounts of compute resources to assemble large genomes from large quantities of short reads.

Results

We present PASHA, a parallelized short read assembler using de Bruijn graphs, which takes advantage of hybrid computing architectures consisting of both shared-memory multi-core CPUs and distributed-memory compute clusters to gain efficiency and scalability. Evaluation using three small-scale real paired-end datasets shows that PASHA produces more contiguous high-quality assemblies in shorter time than three leading assemblers: Velvet, ABySS and SOAPdenovo. PASHA's scalability for large genome datasets is demonstrated by assembling a human genome. Compared to ABySS, PASHA achieves competitive assembly quality with faster execution on the same compute resources, yielding an NG50 contig size of 503 bp, a longest correct contig of 18,252 bp, and an NG50 scaffold size of 2,294 bp. Moreover, the human assembly is completed in about 21 hours with only modest compute resources.

Conclusions

Developing parallel assemblers for large genomes has attracted significant research effort because of the explosive growth in the size of high-throughput short read datasets. By employing hybrid parallelism, combining multi-threading on multi-core CPUs with message passing on compute clusters, PASHA assembles the human genome with high quality, in reasonable time, using modest compute resources.
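
The abstract rests on two concepts that a short example can make concrete: a de Bruijn graph built from the k-mers of short reads, and the NG50 statistic used to report contiguity. The Python sketch that follows is a minimal, single-process illustration written for this record, not PASHA's code. It assumes that k-mers are vertices joined when they overlap by k-1 bases (the convention of ABySS-style assemblers), and it ignores reverse complements and sequencing errors. The function `owner_rank` is a hypothetical hash rule showing how a distributed-memory assembler might decide which cluster process stores a k-mer; the abstract does not say how PASHA partitions its graph.

```python
import zlib


def kmers(read, k):
    """Yield every length-k substring (k-mer) of a read, left to right."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def build_de_bruijn_graph(reads, k):
    """Return {k-mer: set of successor k-mers} for the given reads.

    Every distinct k-mer is a vertex. An edge u -> v exists when both are
    present and u[1:] == v[:-1], i.e. they overlap by k-1 bases.
    Simplifications: reverse complements and sequencing errors are ignored.
    """
    vertices = {km for read in reads for km in kmers(read, k)}
    graph = {}
    for km in vertices:
        suffix = km[1:]
        # Only four extensions are possible, so probe each one instead of
        # comparing against every other vertex.
        graph[km] = {suffix + base for base in "ACGT" if suffix + base in vertices}
    return graph


def owner_rank(kmer, num_procs):
    """Hypothetical rule assigning a k-mer to one of num_procs processes.

    crc32 is used instead of hash() because Python salts str hashes per run,
    and every process must agree on the owner of a given k-mer.
    """
    return zlib.crc32(kmer.encode("ascii")) % num_procs


def ng50(contig_lengths, genome_size):
    """Length L such that contigs of length >= L cover at least half of the
    (estimated) genome size; returns 0 if all contigs together fall short."""
    covered = 0
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if 2 * covered >= genome_size:
            return length
    return 0


if __name__ == "__main__":
    graph = build_de_bruijn_graph(["ACGTAC", "GTACGG"], k=4)
    for kmer in sorted(graph):
        print(kmer, "->", sorted(graph[kmer]), "on rank", owner_rank(kmer, 4))
    print("NG50:", ng50([500, 300, 200, 100], genome_size=1200))
```

Running the script lists each k-mer with its successors and owning rank, then prints `NG50: 300`: taken longest first, the 500-base and 300-base contigs cover 800 bases, which is at least half of a 1,200-base genome, so the 300-base contig sets the NG50. Unlike N50, NG50 is measured against the genome size rather than the total assembly length, which is why it needs the `genome_size` argument.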

Addresses

School of Computer Engineering, Nanyang Technological University, Singapore. liuy0039@ntu.edu.sg

Authors

Yongchao Liu
Bertil Schmidt
Douglas L Maskell

DOI

10.1186/1471-2105-12-354

eISSN

1471-2105

External identifiers

PubMed Identifier: 21867511
PubMed Central ID: PMC3167803

Open access

true

ISSN

1471-2105

Journal

BMC bioinformatics

Keywords

Humans
Bacteria
Computational Biology
Genome
Genome, Human
Software
High-Throughput Nucleotide Sequencing

Language

eng

Medium

Electronic

Online publication date

2011

Open access status

Open Access

Pagination

354

Publication date

2011

Status

Published

Publisher licence

CC BY

Date of data entry

2011

Title

Parallelized short read assembly of large genomes using de Bruijn graphs.

Sub types

research-article
Journal Article

Files

https://bmcbioinformatics.biomedcentral.com/counter/pdf/10.1186/1471-2105-12-354
https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/21867511/pdf/?tool=EBI
https://europepmc.org/articles/PMC3167803?pdf=render

Data source: Europe PubMed Central

Date of acceptance

2011

Authors

Yongchao Liu
Bertil Schmidt
Douglas L Maskell

Author URL

https://www.ncbi.nlm.nih.gov/pubmed/21867511

DOI

10.1186/1471-2105-12-354

eISSN

1471-2105

External identifiers

PubMed Central ID: PMC3167803

Journal

BMC Bioinformatics

Keywords

Bacteria
Computational Biology
Genome
Genome, Human
High-Throughput Nucleotide Sequencing
Humans
Software

Language

eng

Country

England

Pagination

354

PII

1471-2105-12-354

Publication date

2011

Status

Published online

Date the record was made public

2011

Title

Parallelized short read assembly of large genomes using de Bruijn graphs.

Sub types

Journal Article

Data source: PubMed

Authors

Yongchao Liu
Bertil Schmidt
Douglas L Maskell

Journal

BMC Bioinform.

Pagination

354 - 354

Publication date

2011

Title

Parallelized short read assembly of large genomes using de Bruijn graphs.

Data source: DBLP

Relationships:

Owned by

High Performance Computing
