**C.3 Implementation Issues for Hidden Markov Models**

**C.3.1 Scaling**

When calculating the forward and backward probabilities for long observation sequences, we often run into problems representing these probabilities within the normal range of double-precision values in the computer.

To cope with this, we now introduce a way of scaling the forward and backward probabilities, such that we are able to keep the calculations within the normal range of double-precision values.

**Scaling the Forward Probabilities**

The new scaled version of the Forward-Backward algorithm uses a scaling coefficient, denoted $c_t$, for every time step $t$ or observation symbol $O_t$. The first scaling coefficient is defined by:

$$c_1 = \frac{1}{\sum_{i=1}^{N} \alpha_1(i)}, \qquad \text{(C.49)}$$

whereas the rest of the scaling coefficients are defined as:

$$c_t = \frac{1}{\sum_{i=1}^{N} \hat{\hat{\alpha}}_t(i)}. \qquad \text{(C.50)}$$

The notation $\hat{\hat{\alpha}}_t(i)$ denotes a forward probability that is not yet scaled for time $t$, but is scaled for every $t-1, t-2, \ldots, 1$, whereas $\hat{\alpha}_t(i)$ denotes a probability that is scaled for every time step up to and including $t$. Non-scaled forward probabilities will just be denoted $\alpha_t(i)$ as before.

With the new notation in hand, we are now able to define the scaled Forward-Backward algorithm:
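The scaled forward pass can be sketched in a few lines of Python. This is a minimal illustration, not the original implementation; the function and variable names are my own, and the column ordering of `B` is an assumption chosen to match the symbol indices $b_j(\cdot)$ used in the worked example that follows.

```python
# A minimal sketch of the scaled forward pass, assuming the usual HMM
# parameterisation: initial probabilities pi, transition matrix A and
# emission matrix B. All names are illustrative only.

def scaled_forward(pi, A, B, obs):
    """Return the scaled forward variables alpha_hat and the coefficients c_t."""
    N = len(pi)
    alpha_hat, c = [], []
    for t, o in enumerate(obs):
        if t == 0:
            # Initialisation: alpha_1(i) = pi_i * b_i(O_1)
            a = [pi[i] * B[i][o] for i in range(N)]
        else:
            # Induction on the already-scaled values of the previous step
            a = [sum(alpha_hat[-1][i] * A[i][j] for i in range(N)) * B[j][o]
                 for j in range(N)]
        ct = 1.0 / sum(a)                      # c_t = 1 / sum of unscaled values
        c.append(ct)
        alpha_hat.append([ct * x for x in a])  # scaled values now sum to 1
    return alpha_hat, c

# Three-state example HMM of section C.1.3: uniform pi and A; the columns of
# B follow the symbol indices used in the b_j(.) calculations of this section.
pi = [1/3, 1/3, 1/3]
A  = [[1/3] * 3 for _ in range(3)]
B  = [[2/5, 2/5, 1/5],
      [4/6, 2/6, 0/6],
      [1/6, 2/6, 3/6]]

alpha_hat, c = scaled_forward(pi, A, B, obs=[0, 2, 1])
print(c)   # scaling coefficients 90/37, 30/7, 45/16
```

Note that the induction step uses the already-scaled $\hat{\alpha}_{t-1}(i)$ values, exactly as in the worked example below.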

To illustrate how the algorithm works, we take an example using the HMM presented in section C.1.3 on page 110. First we calculate $\alpha_1(i)$ as normal by the initialisation step given in equation (C.11) on page 113:

$$\begin{aligned}
\alpha_1(1) &= \pi_1 b_1(1) = 1/3 \times 2/5 = 2/15\\
\alpha_1(2) &= \pi_2 b_2(1) = 1/3 \times 4/6 = 4/18\\
\alpha_1(3) &= \pi_3 b_3(1) = 1/3 \times 1/6 = 1/18,
\end{aligned}$$

then we calculate the scaling coefficient from the sum of all $\alpha_1(i)$ over every state $i$:

$$c_1 = \frac{1}{\alpha_1(1)+\alpha_1(2)+\alpha_1(3)} = \frac{1}{2/15+4/18+1/18} = \frac{1}{37/90} = 90/37,$$

and finally we scale all the forward values using (C.52):

$$\begin{aligned}
\hat{\alpha}_1(1) &= c_1\,\alpha_1(1) = 90/37 \times 2/15 = 12/37\\
\hat{\alpha}_1(2) &= c_1\,\alpha_1(2) = 90/37 \times 4/18 = 20/37\\
\hat{\alpha}_1(3) &= c_1\,\alpha_1(3) = 90/37 \times 1/18 = 5/37.
\end{aligned}$$

For the next time step we first calculate the not-yet-scaled values $\hat{\hat{\alpha}}_2(j)$ by the usual induction step, but using the scaled values $\hat{\alpha}_1(i)$:

$$\begin{aligned}
\hat{\hat{\alpha}}_2(1) &= \Big[\textstyle\sum_{i=1}^{3}\hat{\alpha}_1(i)\,a_{i1}\Big] b_1(3) = 1/3 \times 1/5 = 1/15\\
\hat{\hat{\alpha}}_2(2) &= \Big[\textstyle\sum_{i=1}^{3}\hat{\alpha}_1(i)\,a_{i2}\Big] b_2(3) = 1/3 \times 0/6 = 0\\
\hat{\hat{\alpha}}_2(3) &= \Big[\textstyle\sum_{i=1}^{3}\hat{\alpha}_1(i)\,a_{i3}\Big] b_3(3) = 1/3 \times 3/6 = 1/6,
\end{aligned}$$
then we calculate the scaling value $c_2$:

$$c_2 = \frac{1}{\hat{\hat{\alpha}}_2(1)+\hat{\hat{\alpha}}_2(2)+\hat{\hat{\alpha}}_2(3)} = \frac{1}{1/15+0+1/6} = \frac{1}{7/30} = 30/7,$$
and finally use equation (C.55) to find the scaled forward values:

$$\begin{aligned}
\hat{\alpha}_2(1) &= c_2\,\hat{\hat{\alpha}}_2(1) = 30/7 \times 1/15 = 2/7\\
\hat{\alpha}_2(2) &= c_2\,\hat{\hat{\alpha}}_2(2) = 30/7 \times 0 = 0\\
\hat{\alpha}_2(3) &= c_2\,\hat{\hat{\alpha}}_2(3) = 30/7 \times 1/6 = 5/7.
\end{aligned}$$

To find the values of $\hat{\alpha}_3(i)$ we do the same calculations once more:

$$\begin{aligned}
\hat{\hat{\alpha}}_3(1) &= \Big[\textstyle\sum_{i=1}^{3}\hat{\alpha}_2(i)\,a_{i1}\Big] b_1(2) = 1/3 \times 2/5 = 2/15\\
\hat{\hat{\alpha}}_3(2) &= \Big[\textstyle\sum_{i=1}^{3}\hat{\alpha}_2(i)\,a_{i2}\Big] b_2(2) = 1/3 \times 2/6 = 1/9\\
\hat{\hat{\alpha}}_3(3) &= \Big[\textstyle\sum_{i=1}^{3}\hat{\alpha}_2(i)\,a_{i3}\Big] b_3(2) = 1/3 \times 2/6 = 1/9,
\end{aligned}$$

$$c_3 = \frac{1}{2/15+1/9+1/9} = \frac{1}{16/45} = 45/16,$$

$$\begin{aligned}
\hat{\alpha}_3(1) &= c_3\,\hat{\hat{\alpha}}_3(1) = 45/16 \times 2/15 = 3/8\\
\hat{\alpha}_3(2) &= c_3\,\hat{\hat{\alpha}}_3(2) = 45/16 \times 1/9 = 5/16\\
\hat{\alpha}_3(3) &= c_3\,\hat{\hat{\alpha}}_3(3) = 45/16 \times 1/9 = 5/16.
\end{aligned}$$

If we compare the above calculations to those done for the normal Forward-Backward algorithm without scaling in section C.2.2 on page 115, we can see that the scaled probabilities are now normalised such that

$$\sum_{i=1}^{N} \hat{\alpha}_t(i) = 1, \qquad \text{(C.56)}$$

enabling us to represent the forward probabilities within reasonable bounds even for large $t$.

If we look at the scaled version of the forward probabilities $\hat{\alpha}_t(i)$, we can express it in terms of the non-scaled probabilities $\alpha_t(i)$, as given by equations (C.11) and (C.12) on page 113, and the scaling coefficients. Furthermore, it can be proven by induction that:

$$\hat{\alpha}_t(i) = \left(\prod_{\tau=1}^{t} c_\tau\right) \alpha_t(i) = C_t\,\alpha_t(i). \qquad \text{(C.57)}$$

To see why this is so, we take a look at the calculations of $\hat{\alpha}_2(j)$ in a HMM with three states, and observe how the scaling coefficients can be extracted from the summations:

$$\hat{\alpha}_2(j) = c_2 \sum_{i=1}^{3} \hat{\alpha}_1(i)\,a_{ij}\,b_j(O_2) = c_2 \sum_{i=1}^{3} c_1\,\alpha_1(i)\,a_{ij}\,b_j(O_2) = c_2\,c_1 \sum_{i=1}^{3} \alpha_1(i)\,a_{ij}\,b_j(O_2) = c_2\,c_1\,\alpha_2(j).$$

With the definition of $\hat{\alpha}_t(j)$ given in equation (C.57) and the observation made in equation (C.56), we can now see that:

$$\hat{\alpha}_t(i) = \frac{\alpha_t(i)}{\sum_{j=1}^{N} \alpha_t(j)}.$$

So, with the scaling applied to the Forward-Backward algorithm, we are now guaranteed that all forward values are normalised within $[0,1]$, and that $\sum_{i=1}^{N} \hat{\alpha}_t(i) = 1$.

Now that we have scaled the forward values, we are no longer able to compute the value $P(O|\lambda)$ by equation (C.13) given on page 114. But if we look at the property $\sum_{i=1}^{N} \hat{\alpha}_T(i) = 1$ of the scaled forward values for $t = T$, we can instead use equation (C.57) to write:

$$\sum_{i=1}^{N} \hat{\alpha}_T(i) = C_T \sum_{i=1}^{N} \alpha_T(i) = \left(\prod_{t=1}^{T} c_t\right) P(O|\lambda) = 1,$$

and thereby:

$$\log[P(O|\lambda)] = -\sum_{t=1}^{T} \log c_t.$$

Here we compute the log likelihood $\log[P(O|\lambda)]$ instead of $P(O|\lambda)$, because $P(O|\lambda)$ will probably be very small, and not within the range of double-precision values in the computer.
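In code, the log likelihood falls straight out of the scaling coefficients. A minimal sketch (the function name is my own; the coefficient values are those found for the example sequence of this section):

```python
import math

def log_likelihood(c):
    """log P(O|lambda) = -sum_t log(c_t), from the scaling coefficients."""
    return -sum(math.log(ct) for ct in c)

# Coefficients from the scaled forward pass of the worked example.
c = [90/37, 30/7, 45/16]
log_p = log_likelihood(c)
print(log_p)   # same value as math.log(37/90 * 7/30 * 16/45)
```

For long sequences the product of the $c_t$ would underflow, but the sum of their logarithms stays well within double-precision range.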

**Scaling the Backward Probabilities**

We now turn to the scaling of the backward probabilities. This is done in much the same way as for the forward probabilities, but instead of calculating new scaling coefficients for the backward probabilities, we re-use the coefficients already found by the scaled version of the Forward-Backward algorithm, keeping the calculations to a minimum.

The algorithm is initialised by $\hat{\beta}_T(i) = c_T\,\beta_T(i) = c_T$, since $\beta_T(i) = 1$ for every state $i$. To illustrate how the algorithm works, we take the example HMM presented in section C.1.3 on page 110 and calculate the $\hat{\beta}$ values for the observation sequence $O = RBG$, using the scaling coefficients found by the scaled Forward-Backward algorithm.

$$\begin{aligned}
\hat{\beta}_3(1) &= c_3 = \frac{1}{16/45} = 45/16\\
\hat{\beta}_3(2) &= c_3 = 45/16\\
\hat{\beta}_3(3) &= c_3 = 45/16\\[6pt]
\hat{\beta}_2(1) &= c_2\left[a_{11}b_1(2)\hat{\beta}_3(1) + a_{12}b_2(2)\hat{\beta}_3(2) + a_{13}b_3(2)\hat{\beta}_3(3)\right]\\
&= \frac{1/3 \times 2/5 \times 45/16 + 1/3 \times 2/6 \times 45/16 + 1/3 \times 2/6 \times 45/16}{7/30}\\
&= \frac{3/8 + 5/16 + 5/16}{7/30} = 30/7\\[6pt]
\hat{\beta}_2(2) &= c_2\left[a_{21}b_1(2)\hat{\beta}_3(1) + a_{22}b_2(2)\hat{\beta}_3(2) + a_{23}b_3(2)\hat{\beta}_3(3)\right] = 30/7\\
\hat{\beta}_2(3) &= c_2\left[a_{31}b_1(2)\hat{\beta}_3(1) + a_{32}b_2(2)\hat{\beta}_3(2) + a_{33}b_3(2)\hat{\beta}_3(3)\right] = 30/7\\[6pt]
\hat{\beta}_1(1) &= c_1\left[a_{11}b_1(3)\hat{\beta}_2(1) + a_{12}b_2(3)\hat{\beta}_2(2) + a_{13}b_3(3)\hat{\beta}_2(3)\right]\\
&= \frac{1/3 \times 1/5 \times 30/7 + 1/3 \times 0/6 \times 30/7 + 1/3 \times 3/6 \times 30/7}{37/90}\\
&= \frac{2/7 + 0 + 5/7}{37/90} = 90/37\\[6pt]
\hat{\beta}_1(2) &= c_1\left[a_{21}b_1(3)\hat{\beta}_2(1) + a_{22}b_2(3)\hat{\beta}_2(2) + a_{23}b_3(3)\hat{\beta}_2(3)\right] = 90/37\\
\hat{\beta}_1(3) &= c_1\left[a_{31}b_1(3)\hat{\beta}_2(1) + a_{32}b_2(3)\hat{\beta}_2(2) + a_{33}b_3(3)\hat{\beta}_2(3)\right] = 90/37
\end{aligned}$$

When using the scaling coefficients found from the forward values to also scale the backward values, we are forced to always calculate the scaled forward values before we can calculate the scaled backward values. This should be kept in mind when implementing the scaled versions of the Forward-Backward and Backward-Forward algorithms.
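The ordering constraint above can be seen directly in a sketch of the scaled backward pass: the coefficients `c` are an input, so they must come from a prior forward pass. Names are illustrative; the model values are those of the example HMM, with `B`'s columns ordered by the symbol indices used in this section.

```python
# A minimal sketch of the scaled backward pass. It re-uses the coefficients
# c_t produced by the scaled forward pass, so that pass must be run first.

def scaled_backward(A, B, obs, c):
    """Scaled backward variables, using the forward scaling coefficients c."""
    N, T = len(A), len(obs)
    beta_hat = [[0.0] * N for _ in range(T)]
    beta_hat[T - 1] = [c[T - 1]] * N      # initialisation: beta_T(i) = 1, scaled by c_T
    for t in range(T - 2, -1, -1):
        for i in range(N):
            s = sum(A[i][j] * B[j][obs[t + 1]] * beta_hat[t + 1][j]
                    for j in range(N))
            beta_hat[t][i] = c[t] * s     # scale with the same coefficient c_t
    return beta_hat

# Example HMM of section C.1.3 and the coefficients from the forward pass.
A = [[1/3] * 3 for _ in range(3)]
B = [[2/5, 2/5, 1/5],
     [4/6, 2/6, 0/6],
     [1/6, 2/6, 3/6]]
c = [90/37, 30/7, 45/16]

beta_hat = scaled_backward(A, B, obs=[0, 2, 1], c=c)
print(beta_hat[0])   # all entries 90/37, matching the beta_hat_1(i) values above
```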

As with the scaled forward variables, we can express the scaled backward variables in terms of the non-scaled backward variables and the product of the scaling coefficients:

$$\hat{\beta}_t(i) = \left(\prod_{\tau=t}^{T} c_\tau\right) \beta_t(i) = D_t\,\beta_t(i). \qquad \text{(C.66)}$$

**Re-Estimating Using Scaled αs and βs**

We now turn to the re-estimation of the Hidden Markov Model parameters using the scaled forward and backward probabilities.

When re-estimating the initial state probabilities $\pi$, we need to calculate $\gamma_1(i)$; but how do we find the $\gamma_t(i)$ values from the scaled $\alpha$s and $\beta$s? As seen in section C.2.4, equation (C.22) on page 120 defined $\gamma_t(i)$ as:

$$\gamma_t(i) = \frac{\alpha_t(i)\,\beta_t(i)}{P(O|\lambda)}.$$

Furthermore, the product of the terms $C_t$ and $D_t$ can be expressed as:

$$C_t D_t = \prod_{\tau=1}^{t} c_\tau \prod_{\tau=t}^{T} c_\tau = c_t \prod_{\tau=1}^{T} c_\tau = c_t\,C_T,$$

enabling us to express $\gamma_t(i)$ in terms of the scaled forward and backward probabilities and the scaling coefficient. Using $P(O|\lambda) = 1/C_T$:

$$\gamma_t(i) = \frac{\alpha_t(i)\,\beta_t(i)}{P(O|\lambda)} = \frac{\hat{\alpha}_t(i)\,\hat{\beta}_t(i)}{C_t D_t}\,C_T = \frac{\hat{\alpha}_t(i)\,\hat{\beta}_t(i)}{c_t}.$$

The re-estimation of the initial state probability distribution $\pi$ is therefore given by:

$$\begin{aligned}
\pi_i &= \gamma_1(i) &\text{(C.75)}\\
&= \frac{\hat{\alpha}_1(i)\,\hat{\beta}_1(i)}{c_1} &\text{(C.76)}
\end{aligned}$$
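A short sketch of this computation, using the scaled values obtained for the worked example of this section (the names and layout are mine):

```python
# gamma_t(i) from the scaled variables: since C_t * D_t = c_t * C_T, dividing
# the product alpha_hat * beta_hat by c_t recovers gamma exactly.
# Data: the scaled values for the example sequence of this section.
c         = [90/37, 30/7, 45/16]
alpha_hat = [[12/37, 20/37, 5/37],
             [2/7,   0.0,   5/7],
             [3/8,   5/16,  5/16]]
beta_hat  = [[90/37] * 3, [30/7] * 3, [45/16] * 3]

def gamma(t, i):
    return alpha_hat[t][i] * beta_hat[t][i] / c[t]

# pi re-estimation: pi_i = gamma_1(i)  (equations C.75, C.76)
pi_new = [gamma(0, i) for i in range(3)]
print(pi_new)   # [12/37, 20/37, 5/37]; beta_hat_1(i) = c_1 here, so gamma_1 = alpha_hat_1
```

A useful sanity check in an implementation is that $\sum_i \gamma_t(i) = 1$ for every $t$.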
When re-estimating the state transition probabilities $a_{ij}$ using the scaled forward and backward values, we simply use the following equation:

$$a_{ij} = \frac{\sum_{t=1}^{T-1} \hat{\alpha}_t(i)\,a_{ij}\,b_j(O_{t+1})\,\hat{\beta}_{t+1}(j)}{\sum_{t=1}^{T-1} \hat{\alpha}_t(i)\,\hat{\beta}_t(i)/c_t}.$$
To see why this gives the correct re-estimation, we re-write the equation in terms of $\alpha$s and $\beta$s as defined by equations (C.57) and (C.66):

$$a_{ij} = \frac{\sum_{t=1}^{T-1} C_t\,\alpha_t(i)\,a_{ij}\,b_j(O_{t+1})\,D_{t+1}\,\beta_{t+1}(j)}{\sum_{t=1}^{T-1} C_t\,\alpha_t(i)\,D_t\,\beta_t(i)/c_t}.$$
The product of the two terms $C_t$ and $D_{t+1}$ can be expressed by:

$$C_t D_{t+1} = \prod_{\tau=1}^{t} c_\tau \prod_{\tau=t+1}^{T} c_\tau = \prod_{\tau=1}^{T} c_\tau = C_T,$$

which is independent of $t$; since $C_t D_t / c_t$ equals the same constant $C_T$, the factor can be cancelled out in both the numerator and the denominator, resulting in the original re-estimation of $a_{ij}$ as presented in equation (C.48).
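The cancellation means an implementation can work with the scaled variables directly. A sketch, again using the scaled values of the worked example (all names are mine):

```python
# Re-estimating a_ij directly from the scaled variables: the common factor
# C_T cancels between numerator and denominator, so no rescaling is needed.
# Data: example HMM and the scaled values for the example sequence.
A   = [[1/3] * 3 for _ in range(3)]
B   = [[2/5, 2/5, 1/5],
       [4/6, 2/6, 0/6],
       [1/6, 2/6, 3/6]]
obs = [0, 2, 1]
c         = [90/37, 30/7, 45/16]
alpha_hat = [[12/37, 20/37, 5/37], [2/7, 0.0, 5/7], [3/8, 5/16, 5/16]]
beta_hat  = [[90/37] * 3, [30/7] * 3, [45/16] * 3]

N, T = 3, len(obs)
new_A = [[0.0] * N for _ in range(N)]
for i in range(N):
    # Denominator: sum over t = 1..T-1 of alpha_hat_t(i) * beta_hat_t(i) / c_t
    denom = sum(alpha_hat[t][i] * beta_hat[t][i] / c[t] for t in range(T - 1))
    for j in range(N):
        num = sum(alpha_hat[t][i] * A[i][j] * B[j][obs[t + 1]] * beta_hat[t + 1][j]
                  for t in range(T - 1))
        new_A[i][j] = num / denom

print([sum(row) for row in new_A])   # each row sums to 1
```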

The re-estimation of the observation probabilities $b_j(k)$ is quite simple, because it is expressed in terms of the $\gamma_t(i)$ values, and we have already shown how to compute the $\gamma_t(i)$ values using the scaled forward and backward values.