Title: Using traveling salesman problem algorithms for evolutionary tree construction Author:Korostensky, Chantal Contributor:Gonnet, Gaston H Description:
Motivation: The construction of evolutionary trees is one of the major problems in computational biology, mainly due to its complexity. Results: We present a new tree construction method that constructs a tree with minimum score for a given set of sequences, where the score is the amount of evolution measured in PAM distances. To do this, the problem of tree construction is reduced to the Traveling Salesman Problem (TSP). The input for the TSP algorithm are the pairwise distances of the sequences and the output is a circular tour through the optimal, unknown tree plus the minimum score of the tree. The circular order and the score can be used to construct the topology of the optimal tree. Our method can be used for any scoring function that correlates to the amount of changes along the branches of an evolutionary tree, for instance it could also be used for parsimony scores, but it cannot be used for least squares fit of distances. A TSP solution reduces the space of all possible trees to \batchmode \documentclass[fleqn,10pt,legalpaper]{article} \usepackage{amssymb} \usepackage{amsfonts} \usepackage{amsmath} \pagestyle{empty} \begin{document} \(2^{n}\) \end{document}. Using this order, we can guarantee that we reconstruct a correct evolutionary tree if the absolute value of the error for each distance measurement is smaller than \batchmode \documentclass[fleqn,10pt,legalpaper]{article} \usepackage{amssymb} \usepackage{amsfonts} \usepackage{amsmath} \pagestyle{empty} \begin{document} \(\frac{x}{2}\) \end{document}, where \batchmode \documentclass[fleqn,10pt,legalpaper]{article} \usepackage{amssymb} \usepackage{amsfonts} \usepackage{amsmath} \pagestyle{empty} \begin{document} \(x\) \end{document}is the length of the shortest edge in the tree. For data sets with large errors, a dynamic programming approach is used to reconstruct the tree. Finally simulations and experiments with real data are shown. Availability: The software may be used via our cbrg server at http://cbrg.inf.ethz.ch/MultAlign. Contact: chantal.roth@nobilitas.com Supplementary information: An html and postscript version of this paper is available at http://chantal.nobilitas.com/(Papers section). * To whom correspondence should be addressed
Linked entry:
Bioinformatics. - 2000/16/7/619-627
Host document:Bioinformatics Identifier:
10.1093/bioinformatics/16.7.619 (DOI)
Title: OMA Browser—Exploring orthologous relations across 352 complete genomes Author:Schneider, Adrian Contributor:Dessimoz, Christophe; Gonnet, Gaston H Subject:APPLICATIONS NOTES Description:
Motivation: Inference of the evolutionary relation between proteins, in particular the identification of orthologs, is a central problem in comparative genomics. Several large-scale efforts with various methodologies and scope tackle this problem, including OMA (the Orthologous MAtrix project). Results: Based on the results of the OMA project, we introduce here the OMA Browser, a web-based tool allowing the exploration of orthologous relations over 352 complete genomes. Orthologs can be viewed as groups across species, but also at the level of sequence pairs, allowing the distinction among one-to-one, one-to-many and many-to-many orthologs. Availability: http://omabrowser.org Contact: schneadr@inf.ethz.ch
Linked entry:
Bioinformatics. - 2007/23/16/2180-2182
Host document:Bioinformatics Identifier:
10.1093/bioinformatics/btm295 (DOI)
Title: OMA 2011: orthology inference among 1000 complete genomes Author:Altenhoff, Adrian M Contributor:Schneider, Adrian; Gonnet, Gaston H; Dessimoz, Christophe Subject:Articles Description:
OMA (Orthologous MAtrix) is a database that identifies orthologs among publicly available, complete genomes. Initiated in 2004, the project is at its 11th release. It now includes 1000 genomes, making it one of the largest resources of its kind. Here, we describe recent developments in terms of species covered; the algorithmic pipeline—in particular regarding the treatment of alternative splicing, and new features of the web (OMA Browser) and programming interface (SOAP API). In the second part, we review the various representations provided by OMA and their typical applications. The database is publicly accessible at http://omabrowser.org
Linked entry:
Nucleic Acids Research. - 2011/39//D289-D294
Host document:Nucleic Acids Research Identifier:
10.1093/nar/gkq1238 (DOI)
Title: A Repetition Test for Pseudo-Random Number Generators Author:Gil, Manuel Contributor:Gonnet, Gaston H; Petersen, Wesley P Description:
A new statistical test for uniform pseudo-random number generators (PRNGs) is presented. The idea is that a sequence of pseudo-random numbers should have numbers reappear with a certain probability. The expectation time that a repetition occurs provides the metric for the test. For linear congruential generators (LCGs) failure can be shown theoretically. Empirical test results for a number of commonly used PRNGs are reported, showing that some PRNGs considered to have good statistical properties fail. A sample implementation of the test is provided over the Internet
Linked entry:
Monte Carlo Methods and Applications. - 2006/12/5/385-393
Host document:Monte Carlo Methods and Applications Identifier:
10.1515/156939606779329017 (DOI)
Several versions
OMA 2011: orthology inference among 1000 complete genomes
Altenhoff, Adrian M, Schneider, Adrian, Gonnet, Gaston H, Dessimoz, Christophe
Nucleic acids research, January 2011, Vol.39(Database issue), pp.D289-94
[Peer Reviewed Journal]
Title: Surprising results on phylogenetic tree building methods based on molecular sequences Author:Gonnet, Gastón H Subject:Phylogenetic Trees ; Tree Building Methods ; Maximum Likelihood ; Distance Measures ; Multiple Sequence Alignments ; Substitution Matrices ; Molecular Sequences ; Biology Description:
Background We analyze phylogenetic tree building methods from molecular sequences (PTMS). These are methods which base their construction solely on sequences, coding DNA or amino acids. Results Our first result is a statistically significant evaluation of 176 PTMSs done by comparing trees derived from 193138 orthologous groups of proteins using a new measure of quality between trees. This new measure, called the Intra measure, is very consistent between different groups of species and strong in the sense that it separates the methods with high confidence. The second result is the comparison of the trees against trees derived from accepted taxonomies, the Taxon measure. We consider the NCBI taxonomic classification and their derived topologies as the most accepted biological consensus on phylogenies, which are also available in electronic form. The correlation between the two measures is remarkably high, which supports both measures simultaneously. Conclusions The big surprise of the evaluation is that the maximum likelihood methods do not score well, minimal evolution distance methods over MSA-induced alignments score consistently better. This comparison also allows us to rank different components of the tree building methods, like MSAs, substitution matrices, ML tree builders, distance methods, etc. It is also clear that there is a difference between Metazoa and the rest, which points out to evolution leaving different molecular traces. We also think that these measures of quality of trees will motivate the design of new PTMSs as it is now easier to evaluate them with certainty.
Is part of:
BMC Bioinformatics, June 2012
Identifier:
1471-2105 (ISSN); 1471-2105 (E-ISSN); 10.3929/ethz-b-000055626 (DOI)
Several versions
Surprising results on phylogenetic tree building methods based on molecular sequences
Gonnet Gaston H
BMC bioinformatics, 01 June 2012, Vol.13(1), p.148
[Peer Reviewed Journal]
Title: Surprising results on phylogenetic tree building methods based on molecular sequences Author:Gonnet, Gaston H Subject:Phylogeny ; Sequence Analysis, DNA ; Sequence Analysis, Protein Description:
We analyze phylogenetic tree building methods from molecular sequences (PTMS). These are methods which base their construction solely on sequences, coding DNA or amino acids. Our first result is a statistically significant evaluation of 176 PTMSs done by comparing trees derived from 193138 orthologous groups of proteins using a new measure of quality between trees. This new measure, called the Intra measure, is very consistent between different groups of species and strong in the sense that it separates the methods with high confidence. The second result is the comparison of the trees against trees derived from accepted taxonomies, the Taxon measure. We consider the NCBI taxonomic classification and their derived topologies as the most accepted biological consensus on phylogenies, which are also available in electronic form. The correlation between the two measures is remarkably high, which supports both measures simultaneously. The big surprise of the evaluation is that the maximum likelihood methods do not score well, minimal evolution distance methods over MSA-induced alignments score consistently better. This comparison also allows us to rank different components of the tree building methods, like MSAs, substitution matrices, ML tree builders, distance methods, etc. It is also clear that there is a difference between Metazoa and the rest, which points out to evolution leaving different molecular traces. We also think that these measures of quality of trees will motivate the design of new PTMSs as it is now easier to evaluate them with certainty.
Is part of:
BMC bioinformatics, 27 June 2012, Vol.13, pp.148
Identifier:
1471-2105 (E-ISSN); 22738078 Version (PMID)
Several versions
Inferring Hierarchical Orthologous Groups from Orthologous Gene Pairs
Altenhoff, Adrian M, Gil, Manuel, Gonnet, Gaston H, Dessimoz, Christophe
PLoS ONE, 2013, Vol.8(1)
[Peer Reviewed Journal]