Automated genome sequence analysis and annotation

Publication type:

Journal article

Metadata:

Authors

MA Andrade
NP Brown
C Leroy
S Hoersch
A de Daruvar
C Reich
A Franchini
J Tamames
A Valencia
C Ouzounis
C Sander

Author URL

https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=fis-test-1&SrcAuth=WosAPI&KeyUT=WOS:000081230200008&DestLinkType=FullRecord&DestApp=WOS_CPL

DOI

10.1093/bioinformatics/15.5.391

eISSN

1460-2059

External identifiers

Clarivate Analytics Document Solution ID: 212RN
PubMed Identifier: 10366660

ISSN

1367-4803

Publication issue

Journal

BIOINFORMATICS

Pagination

391 - 412

Publication date

1999

Status

Published

Title

Automated genome sequence analysis and annotation

Sub types

Article

Journal issue

Data source: Web of Science (Lite)

Other metadata sources:

Abstract

MOTIVATION: Large-scale genome projects generate a rapidly increasing number of sequences, most of them biochemically uncharacterized. Research in bioinformatics contributes to the development of methods for the computational characterization of these sequences. However, the installation and application of these methods require experience and are time consuming.
RESULTS: We present here an automatic system for preliminary functional annotation of protein sequences that has been applied to the analysis of sets of sequences from complete genomes, both to refine overall performance and to make new discoveries comparable to those made by human experts. The GeneQuiz system includes a Web-based browser that allows examination of the evidence leading to an automatic annotation and offers additional information, views of the results, and links to biological databases that complement the automatic analysis. System structure and operating principles concerning the use of multiple sequence databases, underlying sequence analysis tools, lexical analyses of database annotations and decision criteria for functional assignments are detailed. The system makes automatic quality assessments of results based on prior experience with the underlying sequence analysis tools; overall error rates in functional assignment are estimated at 2.5-5% for cases annotated with highest reliability ('clear' cases). Sources of over-interpretation of results are discussed with proposals for improvement. A conservative definition for reporting 'new findings' that takes account of database maturity is presented along with examples of possible kinds of discoveries (new function, family and superfamily) made by the system. System performance in relation to sequence database coverage, database dynamics and database search methods is analysed, demonstrating the inherent advantages of an integrated automatic approach using multiple databases and search methods applied in an objective and repeatable manner.
AVAILABILITY: The GeneQuiz system is publicly available for analysis of protein sequences through a Web server at http://www.sander.ebi.ac.uk/gqsrv/submit

Authors

MA Andrade
NP Brown
C Leroy
S Hoersch
A de Daruvar
C Reich
A Franchini
J Tamames
A Valencia
C Ouzounis
C Sander

DOI

10.1093/bioinformatics/15.5.391

eISSN

1367-4811

ISSN

1367-4803

Publication issue

Journal

Bioinformatics

Language

Online publication date

1999

Pagination

391 - 412

Publication date

1999

Status

Published

Publisher

Oxford University Press (OUP)

Publisher URL

http://dx.doi.org/10.1093/bioinformatics/15.5.391

Date of data capture

2023

Title

Automated genome sequence analysis and annotation.

Journal issue

Data source: Crossref

Addresses

European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Cambridge CB10 1SD, UK.

Authors

MA Andrade
NP Brown
C Leroy
S Hoersch
A de Daruvar
C Reich
A Franchini
J Tamames
A Valencia
C Ouzounis
C Sander

DOI

10.1093/bioinformatics/15.5.391

eISSN

1367-4811

External identifiers

PubMed Identifier: 10366660

Open access

false

ISSN

1367-4803

Publication issue

Journal

Bioinformatics (Oxford, England)

Keywords

Humans
Proteins
Sequence Analysis
Amino Acid Sequence
Automation
Computer Systems
Software
Molecular Sequence Data
Databases, Factual

Language

eng

Medium

Pagination

391 - 412

Publication date

1999

Status

Published

Date of data capture

1999

Title

Automated genome sequence analysis and annotation.

Sub types

Journal Article

Journal issue

Data source: Europe PubMed Central

Authors

MA Andrade
NP Brown
C Leroy
S Hoersch
A de Daruvar
C Reich
A Franchini
J Tamames
A Valencia
C Ouzounis
C Sander

Author URL

https://www.ncbi.nlm.nih.gov/pubmed/10366660

DOI

10.1093/bioinformatics/15.5.391

ISSN

1367-4803

Publication issue

Journal

Bioinformatics

Keywords

Amino Acid Sequence
Automation
Computer Systems
Databases, Factual
Humans
Molecular Sequence Data
Proteins
Sequence Analysis
Software

Language

eng

Country

England

Pagination

391 - 412

PII

btc053

Publication date

1999

Status

Published

Date the record was made public

1999

Title

Automated genome sequence analysis and annotation.

Sub types

Journal Article

Journal issue

Data source: PubMed

Relationships:

Property of

Bioinformatik

Automated genome sequence analysis and annotation
