- distance
- maximum likelihood
Popular programs:
PHYLIP (phylogenetic inference package – J Felsenstein)
PAUP (phylogenetic analysis using parsimony – Sinauer Assoc.)
[Figure: unrooted tree with four leaves, sequence A, sequence B, sequence C and sequence D]
Because all possible trees are examined, the method is best suited to sequences
that are quite similar and to a small number of sequences.
It is guaranteed to find the best tree.
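Why an exhaustive search only works for a small number of sequences can be illustrated by counting the candidate trees. The sketch below uses the standard count of unrooted, fully bifurcating trees for n labeled leaves, (2n - 5)!!; the function name is chosen here for illustration and does not come from the lecture.

```python
def num_unrooted_trees(n_taxa: int) -> int:
    """Number of distinct unrooted, fully bifurcating trees for n_taxa labeled leaves."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    # (2n - 5)!! = 1 * 3 * 5 * ... * (2n - 5)
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


for n in (4, 5, 10):
    print(n, num_unrooted_trees(n))
```

Four sequences give 3 trees, five give 15, and ten already give 2,027,025, so the search space grows very quickly with each added sequence.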
[Figure: the possible unrooted trees for the four sequences Seq1 to Seq4, annotated with the nucleotides at an alignment site]
Informative sites:
(1) A site must favor one tree over another (site 5 is informative, but
sites 1, 6, and 8 are not).
(2) To be informative, a site must also have the same sequence character in at
least two of the sequences (only sites 5, 7, and 9 are informative according to this rule).
Combining sites 5, 7, and 9, the left tree is the best tree for these 4 sequences.
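A minimal sketch of how informative sites could be picked out of an alignment. It assumes the common textbook criterion (a column is parsimony-informative if at least two different characters each occur in at least two sequences), which for four sequences covers both rules above. The alignment `toy` is made up for illustration and is not the alignment from the slide; gaps and ambiguity codes are not handled.

```python
from collections import Counter


def is_parsimony_informative(column: str) -> bool:
    """True if at least two different characters each occur in two or more sequences."""
    counts = Counter(column)
    shared = [char for char, n in counts.items() if n >= 2]
    return len(shared) >= 2


def informative_sites(alignment: list[str]) -> list[int]:
    """Return the 1-based positions of parsimony-informative columns."""
    columns = zip(*alignment)  # transpose rows (sequences) into columns (sites)
    return [
        pos
        for pos, col in enumerate(columns, start=1)
        if is_parsimony_informative("".join(col))
    ]


# Hypothetical four-sequence alignment (not the one shown in the lecture).
toy = ["AAGAG", "AGCCG", "AGATA", "AGAGA"]
print(informative_sites(toy))
```

In this toy alignment only the last column splits the sequences two against two, so it is the only one that can favor one of the three possible trees.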
8. Lecture WS 2003/04 Bioinformatics III 7
Where maximum parsimony fails
Parsimony can give misleading information when rates of sequence change vary
in the different branches of a tree that are represented by the sequence data.
[Figure: two tree topologies for Seq1 to Seq4 with nucleotides G and A at the leaves]
Goal of distance methods: identify the tree that correctly positions neighbors and
also has branch lengths that reproduce the original data as closely as possible.
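What "reproducing the original data" means can be sketched for four sequences. The code below assumes the unrooted topology ((A,B),(C,D)), computes the leaf-to-leaf path lengths implied by a set of branch lengths, and scores them against observed pairwise distances with a sum of squared errors. The function names, the least-squares criterion, and all numbers are illustrative assumptions, not taken from the lecture.

```python
from itertools import combinations


def quartet_path_lengths(a, b, c, d, inner):
    """Leaf-to-leaf path lengths on the unrooted tree ((A,B),(C,D)).

    a, b, c, d are the terminal branch lengths; inner is the central branch.
    """
    leaf = {"A": a, "B": b, "C": c, "D": d}
    side = {"A": 0, "B": 0, "C": 1, "D": 1}  # which end of the inner branch
    dist = {}
    for x, y in combinations("ABCD", 2):
        crosses_inner = side[x] != side[y]
        dist[(x, y)] = leaf[x] + leaf[y] + (inner if crosses_inner else 0)
    return dist


def squared_error(observed, predicted):
    """Sum of squared differences between observed and tree-implied distances."""
    return sum((observed[pair] - predicted[pair]) ** 2 for pair in observed)


# Hypothetical observed distances (e.g. numbers of differences between sequences).
observed = {
    ("A", "B"): 3, ("A", "C"): 6, ("A", "D"): 5,
    ("B", "C"): 5, ("B", "D"): 4, ("C", "D"): 5,
}
print(squared_error(observed, quartet_path_lengths(2, 1, 3, 2, 1)))
print(squared_error(observed, quartet_path_lengths(1, 1, 1, 1, 1)))
```

The first set of branch lengths reproduces the observed distances exactly (error 0), the second does not; a distance method searches for the topology and branch lengths with the smallest such discrepancy.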
Maximum likelihood approach
The method uses probability calculations to find the tree that best accounts for the
variation in a set of sequences.
It is similar to the maximum parsimony method in that the analysis is performed on each
column of a multiple sequence alignment. All trees are considered.
Because the rate of appearance of new mutations is very small, the more
mutations are needed to fit a tree to the data, the less likely that tree is.
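The column-by-column probability calculation can be sketched for the simplest possible case: two aligned sequences joined by one branch. The sketch assumes the Jukes-Cantor substitution model with equal base frequencies, which the slides do not name; a real ML program additionally sums over unobserved ancestral states at internal nodes and searches over topologies. Function names are hypothetical.

```python
import math


def jc69_prob(x: str, y: str, t: float) -> float:
    """Probability that base x is replaced by base y along a branch of length t
    (expected substitutions per site) under the Jukes-Cantor model."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if x == y else 0.25 - 0.25 * decay


def pair_log_likelihood(seq1: str, seq2: str, t: float) -> float:
    """Log-likelihood of two aligned sequences separated by branch length t.

    Columns are treated as independent, so per-column log-probabilities add up.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    total = 0.0
    for x, y in zip(seq1, seq2):
        total += math.log(0.25 * jc69_prob(x, y, t))  # 0.25 = base frequency
    return total


print(pair_log_likelihood("ACGT", "ACGT", 0.05))
print(pair_log_likelihood("ACGT", "ACGA", 0.05))
```

For a short branch, each additional mismatch lowers the likelihood, which matches the statement that a tree requiring more mutations is less likely.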
3 types of analysis:
- maximum likelihood (ML) analysis of nucleotide data
- maximum parsimony (MP) analysis of nucleotide data
- MP of the amino acid data
Rokas et al. Nature 425, 798 (2003)
E.g. the validity of the branch arrangement in a predicted phylogenetic tree can
be tested by resampling columns in a multiple sequence alignment to create
many new alignments.
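The column resampling described above can be sketched as follows. The alignment is represented as a list of equal-length strings, and columns are drawn with replacement until the replicate has the original length; the function names, the seed handling, and the toy alignment are illustrative assumptions. Building a tree from each replicate and counting how often a branch recurs is left out.

```python
import random


def bootstrap_alignment(alignment: list[str], rng: random.Random) -> list[str]:
    """Build one pseudo-replicate by sampling alignment columns with replacement."""
    n_cols = len(alignment[0])
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    # The same column picks are applied to every sequence, so columns stay intact.
    return ["".join(seq[i] for i in picks) for seq in alignment]


def bootstrap_replicates(alignment: list[str], n_replicates: int, seed: int = 0):
    """Return n_replicates resampled alignments (seeded for reproducibility)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


# Hypothetical toy alignment.
toy_alignment = ["ACGTAC", "ACGTTC", "ATGTAC", "ATGAAC"]
replicates = bootstrap_replicates(toy_alignment, n_replicates=5)
for rep in replicates:
    print(rep)
```

The bootstrap value of a branch is then the percentage of replicate trees in which that branch appears.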
Results from the commonly used genes actin (g), hsp70 (h), β-tubulin (i), RNA
polymerase II (j), elongation factor 1-α (k) and 18S rDNA (l). Numbers above
branches indicate bootstrap values (ML on nucleotides/MP on nucleotides).
→ Same problem of alternative topologies as before.
The distribution of bootstrap values for the eight prevalent branches recovered
from 106 single-gene analyses highlights the pervasive conflict among single-
gene analyses. a, Majority-rule consensus tree of the 106 ML trees derived from
single-gene analyses. Across all analyses, there were eight commonly observed
branches; the five branches in the consensus tree (numbers 1–5; a) and the three
branches (numbers 6–8) shown in b.
c, For each of the eight branches, the ranked distribution of per cent bootstrap values recovered from
the three analyses of 106 genes is shown. Results from ML (blue) and MP (red) analyses of
nucleotide data sets, and MP analyses of amino acid data sets (black), are shown. For each branch,
the mean bootstrap value and 95% confidence intervals from the ML analyses and the percentage of
ML trees supporting this branch (in parentheses) are indicated below each graph. Although the
ranked distributions of bootstrap values from the three analyses are remarkably similar for most
branches, on a gene-by-gene basis there is no tight correspondence between bootstrap values from
ML and MP analyses.
Rokas et al. Nature 425, 798 (2003)
How different are the trees?
The degree of conflict among the trees could be relatively minor.
Determine how many taxa (genes) would need to be removed to make two
trees congruent.
Many factors that could lead to incongruence between single-gene phylogenies
were checked:
- outgroup choice (all analyses were repeated without C. albicans)
- number of variable sites
- number of parsimony-informative sites
- gene size
  (the three factors above were significantly correlated with bootstrap values
  for some branches)
- rate of evolution
- nucleotide composition
- base compositional bias
- genome location
- gene ontology
Can single gene trees be concatenated into one large data set?
→ At what size did the data set arrive at the species tree?
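Concatenation itself is mechanically simple, as the sketch below suggests: the per-gene alignments are joined end to end for each taxon, and the combined alignment is then analyzed like a single large gene. The data layout (one dictionary per gene, mapping taxon name to aligned sequence), the function name, and the taxon names are assumptions made for illustration.

```python
def concatenate_genes(gene_alignments: list[dict[str, str]]) -> dict[str, str]:
    """Join per-gene alignments end to end, one combined sequence per taxon."""
    taxa = list(gene_alignments[0])
    for alignment in gene_alignments:
        if set(alignment) != set(taxa):
            raise ValueError("every gene must cover the same set of taxa")
    return {
        taxon: "".join(alignment[taxon] for alignment in gene_alignments)
        for taxon in taxa
    }


# Hypothetical alignments of two genes for three taxa.
gene1 = {"sp1": "ACGT", "sp2": "ACGA", "sp3": "ATGA"}
gene2 = {"sp1": "GGC", "sp2": "GGC", "sp3": "GAC"}
combined = concatenate_genes([gene1, gene2])
print(combined)
```

Repeating the tree analysis on concatenations of increasing numbers of genes is one way to ask at what data set size the result stabilizes.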
[Figure: panels for branch 3 and branch 5]
This indicates that nucleotides in genes have not evolved independently (because,
when complete genes are used, more than 20 genes are necessary to generate a single
tree).
This lecture rounds up the first block of the Bioinformatics III course on
genome structure, rearrangements etc.
Next block until Christmas: gene finding, SNPs, functional genomics