The Viterbi Algorithm made its first appearance in the coding literature in a paper written by A. J. Viterbi in 1967. Since then, due to its easiness in implementation, it has been applied to many different areas related to decoding problems.
In general, the decoding problem is as follows: Given a model and a sequence of observations, what is the most likely state sequence in the model that produces the observations? Suppose we have an observation sequence Z=(z0,z1,...,zn) and we want to check if that sequence is a correct one. The idea is to find the sequence C=(c0,c1,...,cn) which is the closest to sequence Z. Each ci in C, fori=0,1,...,n, can take any value of the states given in the model. Let M be the number of states in the model. Fig. 1 summarizes the decoding problem.
Fig. 1 The decoding problem
Now, the question is, how can we know which sequence C is the closest to sequence Z? Different types of techniques have been developed, the most attractives ones are based on compound decision theory, where probability plays an important role. One of those techniques is the Viterbi algorithm, which gives an optimal solution to the decoding problem if the underlying process is assumed to be a Markov process.
2. Decoding problem:
Decoding problem: Given the HMM M=(A, B, p) and the observation sequence O=o1 o2 ... oK , calculate the most likely sequence of hidden states si that produced this observation sequence.
We want to find the state sequence Q= q1…qKwhich maximizesP(Q | o1 o2 ... oK ) , or equivalently P(Q , o1 o2 ... oK ) .
Brute force consideration of all paths takes exponential time. Use efficient Viterbialgorithm instead.
Define variable dk(i) as the maximum probability of producing observation sequence o1 o2 ... okwhen moving along any hidden state sequence q1… qk-1 and getting into qk= si .
dk(i)= max P(q1… qk-1,qk= si, o1 o2 ... ok)
where max is taken over all possible paths q1… qk-1 .
3. Viterbi Algorithm:
General Idea: if best path ending in qk= sjgoes through qk-1= sithen itshould coincide with best path ending in qk-1= si.
dk(i) = max P(q1…qk-1,qk= sj, o1 o2…ok)= maxi[aijbj (ok ) max P(q1… qk-1= si,o1 o2 … ok-1)]
To backtrack best path keep info that predecessor of sj was si.
Initialization: d1(i)= max P(q1= si,o1)= pibi(o1), 1<=i<=N.
Forward recursion:dk(j) = max P(q1… qk-1,qk= sj,o1 o2...ok)=maxi[aijbj(ok) max P(q1… qk-1= si,o1 o2...ok-1)]=maxi[aijbj(ok) dk-1(i)],1<=j<=N, 2<=k<=K.
Termination: Choose best path ending at time Kmaxi[dK(i)]
Backtrack best path.
This algorithm is similar to the forward recursion of evaluation problem, with S replaced by max and additional backtracking.