The Smith–Waterman algorithm is a well-known algorithm for performing local sequence alignment; that is, for determining similar regions between two nucleotide or protein sequences. Instead of looking at the total sequence, the Smith–Waterman algorithm compares segments of all possible lengths and optimizes the similarity measure.
Sensitive to detect similarity in highly diverged sequences.
Algorithm: similar to global alignment with modiﬁed boundary conditions and recurrence rules.
The top row and left column are now ﬁlled with 0.
If the (sub-)alignment score becomes negative, restart the search:
Traceback: from the maximum of F (i, j) in the whole matrix to the ﬁrst 0.
Example: the optimal local alignment between HEAGAWGHEE and PAWHEAE is AWGHE::AW-HE.
Issue: In gapped alignments, the expected score for a random match may be positive.
Consider The Simple Example
N x N integer matrix
N is sequence length (both s and t)
Compute M[i ][j ] based on Score Matrix and optimum score compute so far (DP)
Figure: Computation Matrix alginment, M
t : − − − − − − − −
s : c c c t a g g t
Figure: Aligning s to gaps
t : c g g g t a t ...
s : − − − − − − − ...
Figure: Aligning t to gaps
How to compute cell score?
How to find M[i][j]? Three ways to finish the alignment of s0..i and t0..j