• Ingen resultater fundet

Phylogenetic trees and their like

Approach 2: Phylogenetic trees and their like

Spaces of phylogenetic trees

3Billera, Holmes, Vogtmann: Geometry of the space of Phylogenetic trees, Adv. in Appl. Math, 2001.

23/50

Figure: Figure borrowed from 3

I Billera et al. study the metric geometry of spaces of phylogenetic trees3, which describe genetic development of species.

I Rooted trees with labeled leaves (so ordered trees) and length attributes on all edges.

I Metric geometry existence and uniqueness of geodesics and dataset

centroids, computation of centroids of a set of phylogenetic trees.

I Centroid computation is NP hard

I Model applies directly to leaf-labeled trees with constant labels sets and edge length attributes

Approach 2: Phylogenetic trees and their like

Spaces of phylogenetic trees

3Billera, Holmes, Vogtmann: Geometry of the space of Phylogenetic trees, Adv. in Appl. Math, 2001.

Figure: Figure borrowed from 3

I Billera et al. study the metric geometry of spaces of phylogenetic trees3, which describe genetic development of species.

I Rooted trees with labeled leaves (so ordered trees) and length attributes on all edges.

centroids, computation of centroids of a set of phylogenetic trees.

I Centroid computation is NP hard

I Model applies directly to leaf-labeled trees with constant labels sets and edge length attributes

Approach 2: Phylogenetic trees and their like

Spaces of phylogenetic trees

3Billera, Holmes, Vogtmann: Geometry of the space of Phylogenetic trees, Adv. in Appl. Math, 2001.

23/50

Figure: Figure borrowed from 3

I Billera et al. study the metric geometry of spaces of phylogenetic trees3, which describe genetic development of species.

I Rooted trees with labeled leaves (so ordered trees) and length attributes on all edges.

I Metric geometry existence and uniqueness of geodesics and dataset

centroids, computation of centroids of a set of phylogenetic trees.

I Centroid computation is NP hard

I Model applies directly to leaf-labeled trees with constant labels sets and edge length attributes

Approach 2: Phylogenetic trees and their like

Spaces of phylogenetic trees

3Billera, Holmes, Vogtmann: Geometry of the space of Phylogenetic trees, Adv. in Appl. Math, 2001.

Figure: Figure borrowed from 3

I Billera et al. study the metric geometry of spaces of phylogenetic trees3, which describe genetic development of species.

I Rooted trees with labeled leaves (so ordered trees) and length attributes on all edges.

I Metric geometry existence and uniqueness of geodesics and dataset

centroids, computation of centroids of a set of phylogenetic trees.

I Centroid computation is NP hard

Approach 2: Phylogenetic trees and their like

Spaces of phylogenetic trees

3Billera, Holmes, Vogtmann: Geometry of the space of Phylogenetic trees, Adv. in Appl. Math, 2001.

23/50

Figure: Figure borrowed from 3

I Billera et al. study the metric geometry of spaces of phylogenetic trees3, which describe genetic development of species.

I Rooted trees with labeled leaves (so ordered trees) and length attributes on all edges.

I Metric geometry existence and uniqueness of geodesics and dataset

centroids, computation of centroids of a set of phylogenetic trees.

I Centroid computation is NP hard

I Model applies directly to leaf-labeled trees with constant labels sets and edge length attributes

Approach 2: Phylogenetic trees and their like

Modeling phylogenetic trees

I Fix a set of n leaf labels, e.g. {human, gorilla, orangutan, computer scientist}, or {1,2,3,4}.

evolutionary length

specified.

1 2 3 4

0

1 2 4 3

0

2 4 3 1

0

FIG. 6. Three pictures of the same tree.

Approach 2: Phylogenetic trees and their like

Modeling phylogenetic trees

I Fix a set of n leaf labels, e.g. {human, gorilla, orangutan, computer scientist}, or {1,2,3,4}.

I Build the binary tree with the corresponding leaves

I Attach lenghts∈R+= [0,∞[to all branches, representing evolutionary length

specified.

1 2 3 4

0

1 2 4 3

0

2 4 3 1

0

FIG. 6. Three pictures of the same tree.

Figure: Figure borrowed from Billera et al 24/50

Modeling phylogenetic trees

I Fix a set of n leaf labels, e.g. {human, gorilla, orangutan, computer scientist}, or {1,2,3,4}.

I Build the binary tree with the corresponding leaves

I Attach lenghts∈R+= [0,∞[to all branches, representing evolutionary length

specified.

1 2 3 4

0

1 2 4 3

0

2 4 3 1

0

FIG. 6. Three pictures of the same tree.

Approach 2: Phylogenetic trees and their like

Modeling phylogenetic trees

24/50

This gives a space of phylogenetic trees:

I For each type of binary tree with the given labels, we form a quadrant RN+⊂RN

(N =] branches in binary tree with n leaves)

I Now for a point x ∈RN+ the

coordinate xi ≥0 is the length of the ith branch

I Glue the quadrants together along the natural branch collapses

human gorilla orangutan computer scientist

Approach 2: Phylogenetic trees and their like

Modeling phylogenetic trees

This gives a space of phylogenetic trees:

I For each type of binary tree with the given labels, we form a quadrant RN+⊂RN

(N =] branches in binary tree with n leaves)

I Now for a point x ∈RN+ the

coordinate xi ≥0 is the length of the ith branch

human gorilla orangutan computer scientist

Approach 2: Phylogenetic trees and their like

Modeling phylogenetic trees

24/50

This gives a space of phylogenetic trees:

I For each type of binary tree with the given labels, we form a quadrant RN+⊂RN

(N =] branches in binary tree with n leaves)

I Now for a point x ∈RN+ the

coordinate xi ≥0 is the length of the ith branch

I Glue the quadrants together along the natural branch collapses

human gorilla orangutan computer scientist

The space of phylogenetic trees

geometry of tr ee space 741

1 2 3 4 1 2 3 4 1 2 4 3

43 2 1

g

a

a b b

c

c

d

d

e e

f f

1324

4 3 1 2

4 2 1 3 1432

3 2 1 4 4 1 3 2

24313241 13244132

1 3 2 4

g h

h i

j k

l

k

l i

j

FIG. 4. Cubical tiling of M0 5, where the arrows indicate oriented identifications.

4 bil l er a, hol mes, and vogtmann

= (0,1) (1,1) =

= (0,0) (1,0) =

12 34 1

12 34

12 34 1 1 2 34

1 1

12 34 0.6 0.3 (.3,.6) = 0

0

0

0 0

FIG. 8. The 2-dimensional quadrant corresponding to a metric 4-tree.

Figure: Figures shamelessly copied from Billera, Holmes, Vogtmann:

Geometry of the space of Phylogenetic Trees

Approach 2: Phylogenetic trees and their like

The really cool thing about the space of phylogenetic trees!

Theorem (Billera, Holmes, Vogtmann)

The space of phylogenetic trees is a CAT(0) space

What does that mean?

26/50

The really cool thing about the space of phylogenetic trees!

Theorem (Billera, Holmes, Vogtmann)

The space of phylogenetic trees is a CAT(0) space

What does that mean?

Timeout: CAT(0)-spaces, our new favorite statistical playground?

Timeout: CAT ( 0 ) -spaces,

our new favorite statistical playground?

27/50

Timeout: CAT(0)-spaces, our new favorite statistical playground?

Statistics in metric spaces?

Recall that a metric space is a space X of points with a distance measure d such that

I d(x,y) =d(y,x)

I d(x,y) =0 if and only if x=y

I d(x,y) +d(y,z)≥d(x,z) (triangle inequality)

Timeout: CAT(0)-spaces, our new favorite statistical playground?

Statistics in metric spaces?

Recall that a metric space is a space X of points with a distance measure d such that

I d(x,y) =d(y,x)

I d(x,y) =0 if and only if x=y

I d(x,y) +d(y,z)≥d(x,z) (triangle inequality)

I In order to formulate statistics, we want to have geodesics.

What does the word geodesic even mean in a metric space?

28/50

Timeout: CAT(0)-spaces, our new favorite statistical playground?

Geodesics in metric spaces

I Let (X,d)be a metric space. The length of a curve c: [a,b]→X is

l(c) =supa=t0t1≤...≤tn=b n1

X

i=0

d(c(ti,ti+1)).

I (X,d) is a geodesic space if all pairs x,y can be joined by a geodesic.

Timeout: CAT(0)-spaces, our new favorite statistical playground?

Geodesics in metric spaces

I Let (X,d)be a metric space. The length of a curve c: [a,b]→X is

l(c) =supa=t0t1≤...≤tn=b n1

X

i=0

d(c(ti,ti+1)).

I A geodesic from x to y in X is a path c: [a,b]→X such that c(a) =x,c(b) =y and l(c) =d(x,y).

I (X,d) is a geodesic space if all pairs x,y can be joined by a geodesic.

29/50

Geodesics in metric spaces

I Let (X,d)be a metric space. The length of a curve c: [a,b]→X is

l(c) =supa=t0t1≤...≤tn=b n1

X

i=0

d(c(ti,ti+1)).

I A geodesic from x to y in X is a path c: [a,b]→X such that c(a) =x,c(b) =y and l(c) =d(x,y).

I (X,d) is a geodesic space if all pairs x,y can be joined by a geodesic.

Timeout: CAT(0)-spaces, our new favorite statistical playground?

Curvature in metric spaces

I A CAT(0) space is a metric space in which geodesic triangles are thinner than for their comparison triangles in the plane;

that is, d(x,a)≤d(¯x,¯a).

I A space has non-positive curvature if it is locally CAT(0).

30/50

Curvature in metric spaces

I A CAT(0) space is a metric space in which geodesic triangles are thinner than for their comparison triangles in the plane;

that is, d(x,a)≤d(¯x,¯a).

I A space has non-positive curvature if it is locally CAT(0).

Timeout: CAT(0)-spaces, our new favorite statistical playground?

Curvature in metric spaces

Example geometry of tr ee space 741

12 3 4 12 3 4 12 4 3 43 2 1 g

a

a b b

c

c

d

d

e e

f f

1324

43 1 2 4 2 1 3 1432

3 2 1 4 4 1 3 2

24313241 13244132

1 3 2 4

g h

h i

j k

l

k

l i

j FIG. 4. Cubical tiling of M0 5, where the arrows indicate oriented identifications.

Figure: CAT(0)spaces.

Theorem (see e.g. Bridson-Haeiger)

Let(X,d) be a CAT(0)space; then all pairs of points have a unique geodesic joining them.

30/50

Curvature in metric spaces

Example geometry of tr ee space 741

12 3 4 12 3 4 12 4 3 43 2 1 g

a

a b b

c

c

d

d

e e

f f

1324

43 1 2 4 2 1 3 1432

3 2 1 4 4 1 3 2

24313241 13244132

1 3 2 4

g h

h i

j k

l

k

l i

j FIG. 4. Cubical tiling of M0 5, where the arrows indicate oriented identifications.

Figure: CAT(0)spaces.

Theorem (see e.g. Bridson-Haeiger)

Let(X,d) be a CAT(0) space; then all pairs of points have a unique geodesic joining them.

Timeout: CAT(0)-spaces, our new favorite statistical playground?

Curvature in metric spaces

Subsets{x1, . . . ,xn} in CAT(0)-spaces Theorem

4 ...have unique means, dened as argminPd(x,xi)2.

4Feragen, Hauberg, Nielsen, Lauze, Means in spaces of treelike shapes, ICCV 2011

30/50

Curvature in metric spaces

Subsets{x1, . . . ,xn} in CAT(0)-spaces Theorem (Bridson, Haeiger)

...have unique circumcenters, dened as the center of the smallest sphere containing all the{xi}si=1.

Timeout: CAT(0)-spaces, our new favorite statistical playground?

Curvature in metric spaces

Subsets{x1, . . . ,xn} in CAT(0)-spaces Theorem (Billera, Vogtmann, Holmes)

...have unique centroids, dened by induction on|S|=n:

I If|S|=2, then c(S) is the midpoint of the geodesic between the two elements of S.

I If|S|=n>2 and we have dened c(S0) for all S0 with

|S0|<n, then denote by c1(S) the set {c(S0)|S0⊂S,|S0|=n−1} and denote by ck(S) =c1(ck1(S))when k >1.

I If ck(S)→p for some p ∈X as k¯ → ∞, then c(S) =p is the centroid of S.

30/50

Timeout over: Back to the phylogenetic trees

Timeout over: Back to the phylogenetic trees

What does this mean for the phylogenetic trees?

I We can compute average phylogenetic trees!

I Possible problem: Based on Billera, Holmes, Vogtmann, centroid phylogenetic trees have exponential computation time

I Moreover, geodesics between phylogenetic trees do not have obvious polynomial computation algorithms, either.

32/50

Computability?

Using the CAT(0) properties, it is possible to prove:

Theorem

4 There is a polynomial time algorithm for computing the geodesic between two phylogenetic trees.

4Owen, Provan: A Fast Algorithm for Computing Geodesic Distances in Tree Space, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2011

Timeout over: Back to the phylogenetic trees

Summary

Pros:

I A nice mathematical theory

I Computability

I Excellent modeling properties for phylogenetic trees

I CAT(0) property gives potential for more statistical measurements

Cons:

I Does not carry directly over to trees with more geometric branch descriptors

I Fixed branch label set

I Ordered trees (⇔leaf labels)

I No noise tolerance

34/50

Summary

Pros:

I A nice mathematical theory

I Computability

I Excellent modeling properties for phylogenetic trees

I CAT(0) property gives potential for more statistical measurements

Cons:

I Does not carry directly over to trees with more geometric branch descriptors

I Fixed branch label set

I Ordered trees (⇔ leaf labels)

I No noise tolerance