Discrete phase-type distributions - On the relation between matrix-geometric and discrete phase

Let {Xn}_n∈_N_≥0 be a discrete-time Markov chain on the state space E ={1,2, . . . , m, m+ 1}. We let {1,2, . . . , m} be the transient states of the

Markov chain and m+ 1 be the absorbing state. The transition probability matrix of this Markov chain is given by

P =

T t 0 1

HereT is them×msub-transition probability matrix for the transient states, andtis the exit vector which gives the probability of absorption into statem+ 1 from any transient state. SinceP is a transition probability matrix, each row ofP must sum to 1, hence

Te+t=e.

The probability of initiating the Markov chain in state i is denoted by αi = P(X₀=i).The initial probability vector of the Markov chain is then given by (α, αm+1) = (α1, α2, . . . , αm, αm+1) and we have Pm+1

i=1 αi= 1.

Definition 2.1 (Discrete phase-type distribution) A random variable τ has a discrete phase-type distribution if τ is the time until absorption in a discrete-time Markov chain,

τ:= min{n∈N≥0:X_n=m+ 1}.

The name phase-type distribution refers to the states (phases) of the underlying Markov chain. IfTis anm×mmatrix we say that the density is of orderm.The order of the discrete-phase type distribution is the minimal size of the matrix T needed to represent the density.

2.1.1 Density and distribution function

In order to find the density of τ we look at the probability that the Markov chain is in one of the transient statesi∈ {1,2, . . . , m} afternsteps,

p⁽ⁿ⁾_i =P(Xn=i) =

k=1

αk(Tⁿ)_ki.

We can collect these probabilities in a vector and getρ⁽ⁿ⁾= (p⁽ⁿ⁾₁ , p⁽ⁿ⁾₂ ,· · ·, p⁽ⁿ⁾m ).

Note thatρ⁽⁰⁾=α.

Lemma 2.2 The density of a discrete phase-type random variableτ is given by

fτ(n) =αTⁿ⁻¹t, n∈N (2.1)

andfτ(0) =αm+1.

2.1 Discrete phase-type distributions 7

Proof. The probability of absorption of the Markov chain at timen is given by the sum over the probabilities of the Markov chain being in one of the states {1,2, . . . , m}at timen−1 multiplied by the probability that absorption takes place from that state. The state of the Markov chain at timen−1 depends on the initial state of the Markov chain and the (n−1)-step transition probability matrixTⁿ⁻¹.Hence we get

fτ(n) =P(τ=n) =

i=1

p⁽ⁿ⁻¹⁾_i ti=ρ⁽ⁿ⁻¹⁾t=αTⁿ⁻¹t, n∈N.

Note that fτ(0) is the probability of absorption of the Markov chain in zero steps, which is given by αm+1, the probability of initiating in the absorbing

state.

The density of τ is completely defined by the initial probability vector α and the sub-transition probability matrix T, sincet= (I−T)e. We write

τ ∼DPH (α, T)

to denote thatτis of discrete phase type with parametersαandT. The density of a discrete phase-type distribution is said to have an atom in zero of sizeα_m+1 iff_τ(0) =α_m+1.

A representation (α, T) for a phase-type distribution is calledirreducibleif every state of the Markov chain can be reached with positive probability if the initial distribution is given byα. We can always find such a representation by simply leaving out the states that cannot be reached.

From now on we will omit the subscriptτand simply writef(n) for the density of a discrete phase-type random variable. We have to check thatf(n) =αTⁿ⁻¹t is a well-defined density on the non-negative integers. Since α, T andt only have non-negative entries (as their entries are probabilities) we know thatf(n) is non-negative for alln.The infinite sum off(n) is given by:

∞

n=0

f(n) = f(0) +

∞

n=1

αTⁿ⁻¹t

= α_m+1+α

∞

n=0

Tⁿ

! t

= αm+1+α(I−T)⁻¹t

= αm+1+α(I−T)⁻¹(I−T)e

= αm+1+αe= 1.

As T is a sub-stochastic matrix, all its eigenvalues are less than one (see e.g.

[2] Proposition I.6.3). Therefore we have in the above that the series P∞ n=0Tⁿ converges to the matrix (I−T)⁻¹ (see e.g. [16] Lemma B.1.).

We denote byF(n) =P(τ≤n) the distribution function ofτ. The distribution function can be deduced by the following probabilistic argument.

Lemma 2.3 The distribution function of a discrete phase-type variable is given by

F(n) = 1−αTⁿe. (2.2)

Proof. We look at the probability that absorption has not yet taken place and hence the Markov chain is in one of the transient states. We get

1−F(n) = P(τ > n)

i=1

p⁽ⁿ⁾_i

= p⁽ⁿ⁾e

= αTⁿe.

2.1.2 Probability generating function and moments

For a discrete random variable X defined on the non-negative integers and densityp_n=P(X=n) the probability generating function (pgf) is given by

H(z) =p0+zp1+z²p2+. . .=

∞

k=0

z^kpk, |z| ≤1.

HenceH(z) =E z^X

. Denote byH⁽ⁱ⁾(z) the i-th derivative ofH(z). We have H(1) =

∞

k=0

pk= 1, H⁽¹⁾(1) =

∞

k=0

kpk=E(X) and in general we get thek-th factorial moment, if it exists, by

H^(k)(1) =E(X(X−1). . .(X−(k−1))). For a probability generating function, the following properties hold:

2.1 Discrete phase-type distributions 9

1. The probability generating function uniquely determines the probability density ofX, since

H^(k)(0)/k! =pk for k= 1,2, . . . .

It follows that if HX(z) = HY(z) for all zwith|z| ≤ 1 then fX(n) = fY(n) for alln∈N.

2. IfX, Y are two independent random variables, then the pgf of the random variableZ=X+Y is given by

H_Z(z) =E z^X+Y

=E z^Xz^Y

=E z^X E z^Y

=H_X(z)H_Y(z).

3. IfZ is with probability pequal to X and with probability (1−p) equal toY then

HZ(z) =pHX(z) + (1−p)HY(z).

Lemma 2.4 The probability generating function of a discrete phase-type ran-dom variable is given by

H(z) =αm+1+αz(I−zT)⁻¹t. (2.3)

Proof.

H(z) =E(zⁿ) =

∞

n=0

zⁿf(n)

= f(0) +

∞

n=1

zⁿαTⁿ⁻¹t

= α_m+1+αz

∞

n=1

(zT)ⁿ⁻¹t

= αm+1+αz

∞

n=0

(zT)ⁿt

= αm+1+αz(I−zT)⁻¹t.

Note thatH(z) is a rational expression (a ratio between two polynomials) inz.

This because all elements of (I−zT) are polynomials inzand taking the inverse leads to ratios of those polynomials. The poles of a rational function are given by the roots of the denominator. Hence the poles ofH(z) are the solutions to the equation det(I−zT) = 0. As the eigenvalues of the matrix T are found by solving det(T−zI) = 0, we see that the poles of the probability generating

functions are reciprocals of eigenvalues of T (there might be less poles than eigenvalues due to cancellation).

From the probability generating function we can obtain the expectation of a discrete phase-type random variable by differentiating H(z) and takingz = 1.

We first need the following lemma:

Lemma 2.5 For|z| ≤1andIandT the identity and sub-transition probability matrix it holds that

∞

from which the desired result follows.

Using Lemma 2.5 we can now calculate the first derivative ofH(z) and find the expectation by substituting z= 1:

H⁽¹⁾(z) = d

2.1 Discrete phase-type distributions 11

Corollary 2.6 The expectation of a discrete phase-type random variable is given by

E(X) =α(I−T)⁻¹e. (2.5)

The factorial moments for a discrete phase-type random variable can be obtained by successive differentiation of the probability generating function and are given by:

E(X(X−1). . .(X−(k−1))) =H^(k)(1) =k!α(I−T)^−ke.

The formulation in (2.5) of the mean of a discrete phase-type distribution sug-gests that the matrix (I−T)⁻¹ denotes the mean number of visits to each stat of the Markov chain, prior to absorption.

Lemma 2.7 Let U = (I−T)⁻¹. Then uij gives the mean number of visits to statej, given that the Markov chain initiated in statei, prior to absorption.

Proof. Let Bj denote the number of visits to state j prior to absorption.

Further, let Ei andPi denote the expectation respectively the probability con-ditioned on the eventX0=i. Then

Ei(Bj) = Ei τ−1

n=0

1_{X_n_=j}

∞

n=0

Pi(X_n =j, τ −1≥n)

∞

n=0

Pi(Xn =j, τ > n)

∞

n=0

(Tⁿ)_ij. Now we can writeU = (Ei(Bj))_ij to get

U =

∞

n=0

Tⁿ = (I−T)⁻¹.

2.1.3 Geometric and negative binomial distribution

As a first example of a discrete phase-type distribution we have a look at the phase-type representation for the geometric distribution. Let the random

vari-ableX be of discrete phase-type with parameters α = 1 andT = (1−p) for 0< p <1. Then it is readily seen that the exit vector is given byt=pand we can compute the density ofX by

f(n) =αTⁿ⁻¹t= (1−p)ⁿ⁻¹p.

HenceXis geometrically distributed with success probabilityp. The probability generating function is given by

H(z) =αm+1+αz(I−zT)⁻¹t= 1·z(1−z(1−p))⁻¹p= zp 1−(1−p)z. We can generalize the previous example by considering the (m+ 1)×(m+ 1)-transition probability matrix

If we take the initial probability vector (1,0, . . . ,0) we have a Markov chain that runs through all the states, and makes a transition to the following state accord-ing to a geom(p) distribution. The discrete phase-type distribution represented by the firstmrows and columns of P is the sum ofmgeometric distributions, which is a negative binomial distribution. The densityg(n) of a negative bino-mial distribution with success probabilitypandmsuccesses is given by

g(n) =

n+m−1 m−1

(1−p)ⁿp^m

and has support on the non-negative integers. This density counts the number of failuresnuntilmsuccesses have occurred. The phase-type expression in (2.6) however counts all trials. Hence the density corresponding to this representation is given by: generalization is possible, by taking the sum ofm geometric distributions that each have a different success probabilitypk for 1≤k≤m. The resulting distri-bution is called generalized negative binomial distridistri-bution. The corresponding

2.1 Discrete phase-type distributions 13

transition probability matrix is given by

P =

2.1.4 Non-uniqueness of representation

The representation of a discrete phase-type distribution is not unique. Consider the phase-type density f(n) with parameters α and T, where we take T to be irreducible. Since T is a sub-stochastic matrix, and the matrix (I−T) is invertible, we know that T has at least one eigenvalue λ with 0 < λ < 1 and corresponding eigenvectorν. We can normalizeν such thatνe= 1. If we now take α=ν we get

f(n) = νTⁿ⁻¹t

= λⁿ⁻¹νt=λⁿ⁻¹ν(I−T)e

= λⁿ⁻¹(νIe−νTe) =λⁿ⁻¹(1−λ).

It follows that the geometric density with success probability (1−λ) can be represented as a discrete phase-type density both with parameters (ν, T) of dimensionm, say, and with parameters (1, λ) of dimension 1.

2.1.5 Properties of discrete phase-type distributions

In this section we will look at some properties for the set of discrete phase-type distributions. We will give only probabilistic proofs for these properties. The analytic proofs can be found in Section 4.4 on properties for matrix-geometric distributions.

Property 2.1 Any probability density on a finite number of positive integers is of discrete phase-type.

Proof. Let{g(i)}_i∈I be a probability density on the positive integers of finite support, hence I ( N. Then we can make a Markov chain {Xn}_n∈_N_≥0 on a

state space of size I+ 1 such that the probability of absorption after i steps equals exactlyg(i). The discrete phase-type distribution with this underlying Markov chain and initiating in phase 1 with probability 1 will now give the

desired density.

Example 2.1 As an example of this property, we have a look at the density g(n) with support on the set{1,2,3,4}given by

We want to find a discrete phase-type representation (β, S) for this density.

The underlying Markov chain {Xn}n∈N≥0 will consist of five states (labelled 1,2,3,4,5 and visited in that order), where the fifth state is the absorbing state, and henceS will be of order four. We takeβ= (1,0,0,0) and denote bysthe exit vector of the phase-type representation, which is given by (I−S)e.

The Markov chain initiates in state one with probability 1, P(X₀ = 1) = 1, and since the probability of absorption in one step should equal g(1) we get P(X₁ = 5) = g(1) = ¹₃, and hence P(X₁ = 2) = ²₃. This gives s₁ = ¹₃ and S_1,2=²₃.From state 2 the probability of absorption must beg(2) =¹₆. We find P(X1 = 2)P(X2 = 5) = ²₃s2 = ¹₆ which leads to s2 = ¹₄ and hence S2,3 = ³₄. Continuing this way, we find that the transition probability matrix

P =

describes the underlying Markov process of the discrete phase-type densityg(n), and leads to the representation

β= (1,0,0,0) andS=

Property 2.1 is of limited use in practise, unless the setI is small. As a conse-quence of this first property, the discrete phase-type distributions are dense in the set of distributions on the non-negative integers. This means that for any probability distributionF onN≥0 there is a sequenceFnof discrete phase-type distributions that converges in distribution toF (this convergence is called weak convergence).

Property 2.2 The convolution of a finite number of discrete phase-type den-sities is itself of phase-type.

2.1 Discrete phase-type distributions 15

Proof. We will prove this property for two phase-type distributions. Let X and Y be two independently distributed phase-type random variables with distributions represented by (α, T) and (β, S) respectively. We letT be of order mand S of orderkand denote the exit vectors of the two densities bytands respectively. Consider the transition probability matrix

P = de-scribes a discrete-time Markov chain with m+k+ 1 phases, where the firstm transient states are visited according to initial probability vectorα and transi-tion probability matrix T and they are left with the probabilities given in the exit vectort. The next kstates are entered with the initial probabilities given in β and visited according to the matrixS. Absorption from those states into statem+k+ 1 takes place with the exit probabilities given ins.

With probabilityαm+1the chain is entered in statem+ 1 and with probability αm+1βk+1 the chain is entered immediately in the absorbing statem+k+ 1.If

we see that (γ, V) is a representation for the discrete phase-type random variable

Z =X+Y.

Property 2.3 Any finite mixture of probability densities of phase type is itself of phase type.

Proof. Let the vector (p1, p2, . . . , pk) denote the mixing density. Let the densitiesfi(n) be represented by the matricesTiand initial probability vectors α_i. Then the mixture can be represented by the transition probability matrix

P =

For two matricesA andB of sizek×m andp×qrespectively, the Kronecker

Hence A⊗B has sizekp×mq. We can use the Kronecker product to express the maximum and minimum of two discrete phase-type variables.

Property 2.4 LetX andY be discrete phase-type distributed with parameters (α, T) and (β, S) respectively. Then U = min(X, Y) is discrete phase-type distributed with parameters (α⊗β, T ⊗S) and V = max(X, Y) is discrete phase-type distributed with parameters

γ = (α⊗β, β_k+1α, α_m+1β),

Proof. In order to prove that the minimumU is of discrete phase-type we have to find a Markov chain where the time until absorption has the same distribution as the minimum ofX andY. This minimum is the time until the first of the two underlying Markov chains ofX andY gets absorbed. We make a new Markov chain that keeps track of the state of both underlying Markov chains ofXandY. The state space of this new Markov chain consists of all combinations of states of XandY. Hence, ifXhas ordermandY has orderkwe have the new state space {(1,1),(1,2), . . . ,(1, k),(2,1),(2,2), . . . ,(2, k), . . . ,(m,1),(m,2), . . . ,(m, k)}.

The mk×mk order transition probability matrix corresponding to this new Markov chain is given by T ⊗S since each entrytijS gives for a transition in the Markov chain corresponding to (α, T) all possible transitions in the Markov chain corresponding to (β, S). The corresponding initial probability vector is given byα⊗β.

The maximumV is the time until both underlying Markov chains ofX andY get absorbed. This means that after one of the Markov chains gets absorbed we need to keep track of the second Markov chain until it too gets absorbed. Hence we need a new Markov chain withmk+m+kstates, where the firstmk states are the same as in the case of the minimum above. Then with the probabilities in the exit vectorsthe Markov chain ofY gets absorbed, and our new Markov chain continues in the states ofX.And with the probabilities in the exit vector tthe Markov chain ofX gets absorbed and our new Markov chain continues in the states ofY.Hence the matrixLdescribes the transition probabilities in the new Markov chain. If one of the Markov chains initiates in the absorbing state

2.1 Discrete phase-type distributions 17

we only have to keep track of the remaining Markov chain. Hence the initial probability vector corresponding to the matrixLis given byγ.

The last property is about theN-fold convolution of a discrete phase-type distri-bution, whereNis also discrete phase-type distributed. For ease of computation we will assume in the following that the probability of initiating in the absorbing state is zero.

Property 2.5 Let X_i ∼ DPH(α, T) i.i.d. and N ∼ DPH(β, S). Then the N-fold convolution of theX_i,

i=1

is again discrete phase-type distributed, with representation (γ, V) where γ= (α⊗β)

andV is the upper left part of the transition probability matrix P =

T⊗I+ (tα)⊗S t⊗s

0 1

where the identity matrix has the same dimension as the matrixS.

Proof. The interpretation of the matrixV = (T ⊗I+ (tα)⊗S) is that first the transient states of the Markov chain corresponding to the Xi are visited according to the transition probabilities in T. When this Markov chain gets absorbed with the probabilities given in t, the Markov chain underlying N is visited according toS. If absorption in this second Markov chain does not take place, the first Markov chain will be revisited with initial probabilities given in α.The new system corresponding toV is absorbed when both the first and the

second Markov chain get absorbed.

We will make this probabilistic proof more clear with the following example.

Example 2.2 LetXi∼DPH(α, T) be represented by the following vector and matrix:

α= (1,0), T = 1

2 1 1 3 3

1 3

and let N ∼DPH(β, S) be geometrically distributed with success probability p∈(0,1), hence

β= 1, S = (1−p).

This leads to the exit vectors t = 1

61 3

and s = p. The underlying Markov chains for theX_i and forN are shown in Figure 2.1:

1

¹₃

2 3

1 2

1 6

1 3 1 3 1

1

(I) Underlying Markov chain ofXi

a b

p

1 − p 1

(II) Underlying Markov chain ofN

Figure 2.1: The Markov chains corresponding toXi andN.

Usingγ= (α⊗β) andV = (T⊗I+ (tα)⊗S) we get for the representation of Z:

γ= (1,0), V = 1

2+¹₆·(1−p) ¹₃+ 0·(1−p)

3+¹₃·(1−p) ¹₃+ 0·(1−p)

,v= p

6p 3

The interpretation of thisN-fold convolution of theXiis that first states 1 and 2 of (I) are visited, and when the absorbing state 3 is reached the Markov chain given in (II) is started. With probability (1−p) this Markov chain chooses to go for another round in (I), and with probabilitypthis Markov chain gets absorbed in state b and then the total system is absorbed. We can think of the Markov chain underlying Z as a combination of (I) and (II), where the states 3 anda are left out, as the probabilities of going from state 1 or 2 through state 3 and stateaback to state 1 are equal to the new probabilities given in the matrixV of going directly from state 1 or 2 to state 1. Figure 2.2 depicts this Markov chain underlying the random variableZ:

In document On the relation between matrix-geometric and discrete phase-type distributions (Sider 15-29)