Approach 2: Phylogenetic trees and their like
Spaces of phylogenetic trees
3Billera, Holmes, Vogtmann: Geometry of the space of Phylogenetic trees, Adv. in Appl. Math, 2001.
23/50
Figure: Figure borrowed from 3
I Billera et al. study the metric geometry of spaces of phylogenetic trees3, which describe genetic development of species.
I Rooted trees with labeled leaves (so ordered trees) and length attributes on all edges.
I Metric geometry existence and uniqueness of geodesics and dataset
centroids, computation of centroids of a set of phylogenetic trees.
I Centroid computation is NP hard
I Model applies directly to leaf-labeled trees with constant labels sets and edge length attributes
Approach 2: Phylogenetic trees and their like
Spaces of phylogenetic trees
3Billera, Holmes, Vogtmann: Geometry of the space of Phylogenetic trees, Adv. in Appl. Math, 2001.
Figure: Figure borrowed from 3
I Billera et al. study the metric geometry of spaces of phylogenetic trees3, which describe genetic development of species.
I Rooted trees with labeled leaves (so ordered trees) and length attributes on all edges.
centroids, computation of centroids of a set of phylogenetic trees.
I Centroid computation is NP hard
I Model applies directly to leaf-labeled trees with constant labels sets and edge length attributes
Approach 2: Phylogenetic trees and their like
Spaces of phylogenetic trees
3Billera, Holmes, Vogtmann: Geometry of the space of Phylogenetic trees, Adv. in Appl. Math, 2001.
23/50
Figure: Figure borrowed from 3
I Billera et al. study the metric geometry of spaces of phylogenetic trees3, which describe genetic development of species.
I Rooted trees with labeled leaves (so ordered trees) and length attributes on all edges.
I Metric geometry existence and uniqueness of geodesics and dataset
centroids, computation of centroids of a set of phylogenetic trees.
I Centroid computation is NP hard
I Model applies directly to leaf-labeled trees with constant labels sets and edge length attributes
Approach 2: Phylogenetic trees and their like
Spaces of phylogenetic trees
3Billera, Holmes, Vogtmann: Geometry of the space of Phylogenetic trees, Adv. in Appl. Math, 2001.
Figure: Figure borrowed from 3
I Billera et al. study the metric geometry of spaces of phylogenetic trees3, which describe genetic development of species.
I Rooted trees with labeled leaves (so ordered trees) and length attributes on all edges.
I Metric geometry existence and uniqueness of geodesics and dataset
centroids, computation of centroids of a set of phylogenetic trees.
I Centroid computation is NP hard
Approach 2: Phylogenetic trees and their like
Spaces of phylogenetic trees
3Billera, Holmes, Vogtmann: Geometry of the space of Phylogenetic trees, Adv. in Appl. Math, 2001.
23/50
Figure: Figure borrowed from 3
I Billera et al. study the metric geometry of spaces of phylogenetic trees3, which describe genetic development of species.
I Rooted trees with labeled leaves (so ordered trees) and length attributes on all edges.
I Metric geometry existence and uniqueness of geodesics and dataset
centroids, computation of centroids of a set of phylogenetic trees.
I Centroid computation is NP hard
I Model applies directly to leaf-labeled trees with constant labels sets and edge length attributes
Approach 2: Phylogenetic trees and their like
Modeling phylogenetic trees
I Fix a set of n leaf labels, e.g. {human, gorilla, orangutan, computer scientist}, or {1,2,3,4}.
evolutionary length
specified.
1 2 3 4
0
1 2 4 3
0
2 4 3 1
0
FIG. 6. Three pictures of the same tree.
Approach 2: Phylogenetic trees and their like
Modeling phylogenetic trees
I Fix a set of n leaf labels, e.g. {human, gorilla, orangutan, computer scientist}, or {1,2,3,4}.
I Build the binary tree with the corresponding leaves
I Attach lenghts∈R+= [0,∞[to all branches, representing evolutionary length
specified.
1 2 3 4
0
1 2 4 3
0
2 4 3 1
0
FIG. 6. Three pictures of the same tree.
Figure: Figure borrowed from Billera et al 24/50
Modeling phylogenetic trees
I Fix a set of n leaf labels, e.g. {human, gorilla, orangutan, computer scientist}, or {1,2,3,4}.
I Build the binary tree with the corresponding leaves
I Attach lenghts∈R+= [0,∞[to all branches, representing evolutionary length
specified.
1 2 3 4
0
1 2 4 3
0
2 4 3 1
0
FIG. 6. Three pictures of the same tree.
Approach 2: Phylogenetic trees and their like
Modeling phylogenetic trees
24/50
This gives a space of phylogenetic trees:
I For each type of binary tree with the given labels, we form a quadrant RN+⊂RN
(N =] branches in binary tree with n leaves)
I Now for a point x ∈RN+ the
coordinate xi ≥0 is the length of the ith branch
I Glue the quadrants together along the natural branch collapses
human gorilla orangutan computer scientist
Approach 2: Phylogenetic trees and their like
Modeling phylogenetic trees
This gives a space of phylogenetic trees:
I For each type of binary tree with the given labels, we form a quadrant RN+⊂RN
(N =] branches in binary tree with n leaves)
I Now for a point x ∈RN+ the
coordinate xi ≥0 is the length of the ith branch
human gorilla orangutan computer scientist
Approach 2: Phylogenetic trees and their like
Modeling phylogenetic trees
24/50
This gives a space of phylogenetic trees:
I For each type of binary tree with the given labels, we form a quadrant RN+⊂RN
(N =] branches in binary tree with n leaves)
I Now for a point x ∈RN+ the
coordinate xi ≥0 is the length of the ith branch
I Glue the quadrants together along the natural branch collapses
human gorilla orangutan computer scientist
The space of phylogenetic trees
geometry of tr ee space 741
1 2 3 4 1 2 3 4 1 2 4 3
43 2 1
g
a
a b b
c
c
d
d
e e
f f
1324
4 3 1 2
4 2 1 3 1432
3 2 1 4 4 1 3 2
24313241 13244132
1 3 2 4
g h
h i
j k
l
k
l i
j
FIG. 4. Cubical tiling of M0 5, where the arrows indicate oriented identifications.
4 bil l er a, hol mes, and vogtmann
= (0,1) (1,1) =
= (0,0) (1,0) =
12 34 1
12 34
12 34 1 1 2 34
1 1
12 34 0.6 0.3 (.3,.6) = 0
0
0
0 0
FIG. 8. The 2-dimensional quadrant corresponding to a metric 4-tree.
Figure: Figures shamelessly copied from Billera, Holmes, Vogtmann:
Geometry of the space of Phylogenetic Trees
Approach 2: Phylogenetic trees and their like
The really cool thing about the space of phylogenetic trees!
Theorem (Billera, Holmes, Vogtmann)
The space of phylogenetic trees is a CAT(0) space
What does that mean?
26/50
The really cool thing about the space of phylogenetic trees!
Theorem (Billera, Holmes, Vogtmann)
The space of phylogenetic trees is a CAT(0) space
What does that mean?
Timeout: CAT(0)-spaces, our new favorite statistical playground?
Timeout: CAT ( 0 ) -spaces,
our new favorite statistical playground?
27/50
Timeout: CAT(0)-spaces, our new favorite statistical playground?
Statistics in metric spaces?
Recall that a metric space is a space X of points with a distance measure d such that
I d(x,y) =d(y,x)
I d(x,y) =0 if and only if x=y
I d(x,y) +d(y,z)≥d(x,z) (triangle inequality)
Timeout: CAT(0)-spaces, our new favorite statistical playground?
Statistics in metric spaces?
Recall that a metric space is a space X of points with a distance measure d such that
I d(x,y) =d(y,x)
I d(x,y) =0 if and only if x=y
I d(x,y) +d(y,z)≥d(x,z) (triangle inequality)
I In order to formulate statistics, we want to have geodesics.
What does the word geodesic even mean in a metric space?
28/50
Timeout: CAT(0)-spaces, our new favorite statistical playground?
Geodesics in metric spaces
I Let (X,d)be a metric space. The length of a curve c: [a,b]→X is
l(c) =supa=t0≤t1≤...≤tn=b n−1
X
i=0
d(c(ti,ti+1)).
I (X,d) is a geodesic space if all pairs x,y can be joined by a geodesic.
Timeout: CAT(0)-spaces, our new favorite statistical playground?
Geodesics in metric spaces
I Let (X,d)be a metric space. The length of a curve c: [a,b]→X is
l(c) =supa=t0≤t1≤...≤tn=b n−1
X
i=0
d(c(ti,ti+1)).
I A geodesic from x to y in X is a path c: [a,b]→X such that c(a) =x,c(b) =y and l(c) =d(x,y).
I (X,d) is a geodesic space if all pairs x,y can be joined by a geodesic.
29/50
Geodesics in metric spaces
I Let (X,d)be a metric space. The length of a curve c: [a,b]→X is
l(c) =supa=t0≤t1≤...≤tn=b n−1
X
i=0
d(c(ti,ti+1)).
I A geodesic from x to y in X is a path c: [a,b]→X such that c(a) =x,c(b) =y and l(c) =d(x,y).
I (X,d) is a geodesic space if all pairs x,y can be joined by a geodesic.
Timeout: CAT(0)-spaces, our new favorite statistical playground?
Curvature in metric spaces
I A CAT(0) space is a metric space in which geodesic triangles are thinner than for their comparison triangles in the plane;
that is, d(x,a)≤d(¯x,¯a).
I A space has non-positive curvature if it is locally CAT(0).
30/50
Curvature in metric spaces
I A CAT(0) space is a metric space in which geodesic triangles are thinner than for their comparison triangles in the plane;
that is, d(x,a)≤d(¯x,¯a).
I A space has non-positive curvature if it is locally CAT(0).
Timeout: CAT(0)-spaces, our new favorite statistical playground?
Curvature in metric spaces
Example geometry of tr ee space 741
12 3 4 12 3 4 12 4 3 43 2 1 g
a
a b b
c
c
d
d
e e
f f
1324
43 1 2 4 2 1 3 1432
3 2 1 4 4 1 3 2
24313241 13244132
1 3 2 4
g h
h i
j k
l
k
l i
j FIG. 4. Cubical tiling of M0 5, where the arrows indicate oriented identifications.
Figure: CAT(0)spaces.
Theorem (see e.g. Bridson-Haeiger)
Let(X,d) be a CAT(0)space; then all pairs of points have a unique geodesic joining them.
30/50
Curvature in metric spaces
Example geometry of tr ee space 741
12 3 4 12 3 4 12 4 3 43 2 1 g
a
a b b
c
c
d
d
e e
f f
1324
43 1 2 4 2 1 3 1432
3 2 1 4 4 1 3 2
24313241 13244132
1 3 2 4
g h
h i
j k
l
k
l i
j FIG. 4. Cubical tiling of M0 5, where the arrows indicate oriented identifications.
Figure: CAT(0)spaces.
Theorem (see e.g. Bridson-Haeiger)
Let(X,d) be a CAT(0) space; then all pairs of points have a unique geodesic joining them.
Timeout: CAT(0)-spaces, our new favorite statistical playground?
Curvature in metric spaces
Subsets{x1, . . . ,xn} in CAT(0)-spaces Theorem
4 ...have unique means, dened as argminPd(x,xi)2.
4Feragen, Hauberg, Nielsen, Lauze, Means in spaces of treelike shapes, ICCV 2011
30/50
Curvature in metric spaces
Subsets{x1, . . . ,xn} in CAT(0)-spaces Theorem (Bridson, Haeiger)
...have unique circumcenters, dened as the center of the smallest sphere containing all the{xi}si=1.
Timeout: CAT(0)-spaces, our new favorite statistical playground?
Curvature in metric spaces
Subsets{x1, . . . ,xn} in CAT(0)-spaces Theorem (Billera, Vogtmann, Holmes)
...have unique centroids, dened by induction on|S|=n:
I If|S|=2, then c(S) is the midpoint of the geodesic between the two elements of S.
I If|S|=n>2 and we have dened c(S0) for all S0 with
|S0|<n, then denote by c1(S) the set {c(S0)|S0⊂S,|S0|=n−1} and denote by ck(S) =c1(ck−1(S))when k >1.
I If ck(S)→p for some p ∈X as k¯ → ∞, then c(S) =p is the centroid of S.
30/50
Timeout over: Back to the phylogenetic trees
Timeout over: Back to the phylogenetic trees
What does this mean for the phylogenetic trees?
I We can compute average phylogenetic trees!
I Possible problem: Based on Billera, Holmes, Vogtmann, centroid phylogenetic trees have exponential computation time
I Moreover, geodesics between phylogenetic trees do not have obvious polynomial computation algorithms, either.
32/50
Computability?
Using the CAT(0) properties, it is possible to prove:
Theorem
4 There is a polynomial time algorithm for computing the geodesic between two phylogenetic trees.
4Owen, Provan: A Fast Algorithm for Computing Geodesic Distances in Tree Space, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2011
Timeout over: Back to the phylogenetic trees
Summary
Pros:
I A nice mathematical theory
I Computability
I Excellent modeling properties for phylogenetic trees
I CAT(0) property gives potential for more statistical measurements
Cons:
I Does not carry directly over to trees with more geometric branch descriptors
I Fixed branch label set
I Ordered trees (⇔leaf labels)
I No noise tolerance
34/50
Summary
Pros:
I A nice mathematical theory
I Computability
I Excellent modeling properties for phylogenetic trees
I CAT(0) property gives potential for more statistical measurements
Cons:
I Does not carry directly over to trees with more geometric branch descriptors
I Fixed branch label set
I Ordered trees (⇔ leaf labels)
I No noise tolerance