Computational space reduction and parallelization of a new clustering approach for large groups of sequences

Publication type:

Journal article

Metadata:

Authors

O Trelles
MA Andrade
A Valencia
EL Zapata
JM Carazo

Author URL

https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=fis-test-1&SrcAuth=WosAPI&KeyUT=WOS:000075133400008&DestLinkType=FullRecord&DestApp=WOS_CPL

DOI

10.1093/bioinformatics/14.5.439

External identifiers

Clarivate Analytics Document Solution ID: 106MM
PubMed Identifier: 9682057

ISSN

1367-4803

Publication issue

Journal

BIOINFORMATICS

Pagination

439 - 451

Publication date

1998

Status

Published

Title

Computational space reduction and parallelization of a new clustering approach for large groups of sequences

Sub types

Article

Journal issue

Data source: Web of Science (Lite)

Other metadata sources:

Abstract

MOTIVATION: The explosive growth of biological sequence databases, stimulated by genome projects, has modified the framework of several applications in biological sequence analysis. In most cases, this new scenario is characterized by studies on large sets of sequences, suggesting the need for effective and automatic methods for clustering them. A more effective clustering of the database could be followed by the application of common family analysis schemes to the groups so formed. RESULTS: In this work, we present a new strategy to reduce the computational cost associated with clustering large sets of sequences that are expected to contain several families. The strategy groups the sequences into families by applying a dynamic threshold to a pairwise sequence similarity criterion. Routine clustering of large data sets can now be done very efficiently. The method achieves a computational space reduction of about an order of magnitude over more traditional all-versus-all comparisons. The approach produces family groupings that closely reproduce already accepted biological results. Our work includes a parallel implementation for distributed-memory multiprocessors with a dynamic scheduling strategy for performance optimization. AVAILABILITY: By anonymous ftp at ftp.ac.uma.es (/pub/ots/pCluster directory), or from our Web site http://www.cnb.uam.es/www/software/software_index.html CONTACT: ots@ac.uma.es
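The abstract does not spell out the algorithm, so the following is only a rough illustration of why threshold-based grouping can need fewer comparisons than an all-versus-all scheme. Everything in it is an assumption made for illustration: the function names, the k-mer Jaccard similarity, and above all the fixed threshold, which merely stands in for the dynamic threshold of the paper (whose rule is not given in this record). It is not the pCluster implementation.

```python
from typing import Callable, List, Set, Tuple


def kmer_set(seq: str, k: int = 3) -> Set[str]:
    """Return the set of overlapping k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: str, b: str, k: int = 3) -> float:
    """Toy pairwise similarity: Jaccard index of the two k-mer sets.

    Hypothetical stand-in for a real alignment-based similarity score.
    """
    sa, sb = kmer_set(a, k), kmer_set(b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def greedy_cluster(
    seqs: List[str],
    threshold: float = 0.5,
    similarity: Callable[[str, str], float] = jaccard,
) -> Tuple[List[List[int]], int]:
    """Group sequences by comparing each one only to cluster representatives.

    Each cluster is a list of indices into `seqs`; its first index is the
    representative. A sequence joins the most similar cluster whose
    representative scores at least `threshold` (fixed here, an assumption),
    otherwise it founds a new cluster. Returns the clusters and the number
    of pairwise comparisons performed.
    """
    clusters: List[List[int]] = []
    comparisons = 0
    for i, s in enumerate(seqs):
        best, best_score = None, threshold
        for cluster in clusters:
            comparisons += 1
            score = similarity(s, seqs[cluster[0]])
            if score >= best_score:
                best, best_score = cluster, score
        if best is None:
            clusters.append([i])
        else:
            best.append(i)
    return clusters, comparisons


if __name__ == "__main__":
    demo = ["ACGTACGTAC", "ACGTACGTAA", "TTTTGGGGCC", "TTTTGGGGCA", "ACGTACGTAG"]
    groups, used = greedy_cluster(demo)
    all_pairs = len(demo) * (len(demo) - 1) // 2
    print(groups, used, all_pairs)
```

In this toy setting the number of comparisons grows with the number of families rather than with the number of sequence pairs, which is the general kind of saving the abstract refers to; the order-of-magnitude figure reported there comes from the paper, not from this sketch.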

Authors

O Trelles
MA Andrade
A Valencia
EL Zapata
JM Carazo

DOI

10.1093/bioinformatics/14.5.439

eISSN

1367-4811

ISSN

1367-4803

Publication issue

Journal

Bioinformatics

Language

Online publication date

1998

Pagination

439 - 451

Publication date

1998

Status

Published

Publisher

Oxford University Press (OUP)

Publisher URL

http://dx.doi.org/10.1093/bioinformatics/14.5.439

Data capture date

2023

Title

Computational space reduction and parallelization of a new clustering approach for large groups of sequences.

Journal issue

Data source: Crossref

Addresses

Computer Architecture Department, University of Malaga, 29017 Malaga, Spain. ots@ac.uma.es

Authors

O Trelles
MA Andrade
A Valencia
EL Zapata
JM Carazo

DOI

10.1093/bioinformatics/14.5.439

eISSN

1367-4811

External identifiers

PubMed Identifier: 9682057

Open access

false

ISSN

1367-4803

Publication issue

Journal

Bioinformatics (Oxford, England)

Keywords

Proteins
Cluster Analysis
Sequence Alignment
Computational Biology
Genome
Algorithms
Databases, Factual
Evaluation Studies as Topic

Language

eng

Medium

Pagination

439 - 451

Publication date

1998

Status

Published

Data capture date

1998

Title

Computational space reduction and parallelization of a new clustering approach for large groups of sequences.

Sub types

Comparative Study
Research Support, Non-U.S. Gov't
Journal Article

Journal issue

Data source: Europe PubMed Central

Authors

O Trelles
MA Andrade
A Valencia
EL Zapata
JM Carazo

Author URL

https://www.ncbi.nlm.nih.gov/pubmed/9682057

DOI

10.1093/bioinformatics/14.5.439

ISSN

1367-4803

Publication issue

Journal

Bioinformatics

Keywords

Algorithms
Cluster Analysis
Computational Biology
Databases, Factual
Evaluation Studies as Topic
Genome
Proteins
Sequence Alignment

Language

eng

Country

England

Pagination

439 - 451

PII

btb050

Publication date

1998

Status

Published

Date the record was made public

1998

Title

Computational space reduction and parallelization of a new clustering approach for large groups of sequences.

Sub types

Comparative Study
Journal Article
Research Support, Non-U.S. Gov't

Journal issue

Data source: PubMed

Relationships:

Owned by

Bioinformatik
