Computational space reduction and parallelization of a new clustering approach for large groups of sequences
- Publication type:
- Journal article
- Metadata:
- Authors
- O Trelles
- MA Andrade
- A Valencia
- EL Zapata
- JM Carazo
- Author URL
- https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=fis-test-1&SrcAuth=WosAPI&KeyUT=WOS:000075133400008&DestLinkType=FullRecord&DestApp=WOS_CPL
- DOI
- 10.1093/bioinformatics/14.5.439
- External identifiers
- Clarivate Analytics Document Solution ID: 106MM
- PubMed Identifier: 9682057
- ISSN
- 1367-4803
- Issue
- 5
- Journal
- BIOINFORMATICS
- Pagination
- 439 - 451
- Publication date
- 1998
- Status
- Published
- Title
- Computational space reduction and parallelization of a new clustering approach for large groups of sequences
- Sub types
- Article
- Volume
- 14
Data source: Web of Science (Lite)
- Other metadata sources:
- Abstract
- MOTIVATION: The explosive growth of the biological sequences databases stimulated by genome projects has modified the framework of several applications in the biological sequence analysis area. In most cases, this new scenario is characterized by studies on large sets of sequences, suggesting the need for effective and automatic methods for their clustering. A more effective clustering of the database could be followed by the application of common family analysis schemes to the groups so formed. RESULTS: In this work, we present a new strategy to reduce the computational cost associated with the clustering of large sets of sequences which are expected to contain several families. The strategy is based on the grouping of the sequences into families by using a dynamic threshold on a pairwise sequence similarity criterion. Routine clustering of large data sets can now be done very efficiently. The method developed here achieves a computational space reduction of about an order of magnitude over more traditional ones of all-versus-all comparisons. The outcome of this approach produces family groupings that reproduce closely already accepted biological results. Our work includes a parallel implementation for distributed memory multiprocessors with a dynamic scheduling strategy for performance optimization. AVAILABILITY: By anonymous ftp at ftp.ac.uma.es (/pub/ots/pCluster directory), or from our Web site http://www.cnb.uam.es/www/software/software_index.html CONTACT: ots@ac.uma.es
- Authors
- O Trelles
- MA Andrade
- A Valencia
- EL Zapata
- JM Carazo
- DOI
- 10.1093/bioinformatics/14.5.439
- eISSN
- 1367-4811
- ISSN
- 1367-4803
- Issue
- 5
- Journal
- Bioinformatics
- Language
- en
- Online publication date
- 1998
- Paginierung
- 439 - 451
- Publication date
- 1998
- Status
- Published
- Publisher
- Oxford University Press (OUP)
- Publisher URL
- http://dx.doi.org/10.1093/bioinformatics/14.5.439
- Data capture date
- 2023
- Title
- Computational space reduction and parallelization of a new clustering approach for large groups of sequences.
- Volume
- 14
Data source: Crossref
- Addresses
- Computer Architecture Department, University of Malaga, 29017 Malaga, Spain. ots@ac.uma.es
- Authors
- O Trelles
- MA Andrade
- A Valencia
- EL Zapata
- JM Carazo
- DOI
- 10.1093/bioinformatics/14.5.439
- eISSN
- 1367-4811
- External identifiers
- PubMed Identifier: 9682057
- Open access
- false
- ISSN
- 1367-4803
- Issue
- 5
- Journal
- Bioinformatics (Oxford, England)
- Keywords
- Proteins
- Cluster Analysis
- Sequence Alignment
- Computational Biology
- Genome
- Algorithms
- Databases, Factual
- Evaluation Studies as Topic
- Language
- eng
- Medium
- Pagination
- 439 - 451
- Publication date
- 1998
- Status
- Published
- Data capture date
- 1998
- Title
- Computational space reduction and parallelization of a new clustering approach for large groups of sequences.
- Sub types
- Comparative Study
- Research Support, Non-U.S. Gov't
- Journal Article
- Volume
- 14
Data source: Europe PubMed Central
- Authors
- O Trelles
- MA Andrade
- A Valencia
- EL Zapata
- JM Carazo
- Author URL
- https://www.ncbi.nlm.nih.gov/pubmed/9682057
- DOI
- 10.1093/bioinformatics/14.5.439
- ISSN
- 1367-4803
- Issue
- 5
- Journal
- Bioinformatics
- Keywords
- Algorithms
- Cluster Analysis
- Computational Biology
- Databases, Factual
- Evaluation Studies as Topic
- Genome
- Proteins
- Sequence Alignment
- Language
- eng
- Country
- England
- Pagination
- 439 - 451
- PII
- btb050
- Publication date
- 1998
- Status
- Published
- Date the record was made public
- 1998
- Title
- Computational space reduction and parallelization of a new clustering approach for large groups of sequences.
- Sub types
- Comparative Study
- Journal Article
- Research Support, Non-U.S. Gov't
- Volume
- 14
Data source: PubMed
- Relationships:
- Owned by