Thursday, 2017-11-23, 1:01 AM

# BioInfoPakistan.Ucoz

Statistics
Total online: 1
Guests: 1
Users: 0

Is Online User
Site Search
Main » 2011 » August » 12 » Needleman–Wunsch Algorithm
10:31 PM
Needleman–Wunsch Algorithm

 Needleman–Wunsch Algorithm1. Description:   The Needleman–Wunsch algorithm performs a global alignment on two sequences (called A and B here). It is commonly used in bioinformatics to align protein or nucleotide sequences. The algorithm was published in 1970 by Saul B. Needleman and Christian D. Wunsch.It is an example of dynamic programming, and was the first application of dynamic programming to biological sequence comparison.2. Explanation:We consider all possible pairs of residue from two sequences (this gives rise to a 2D matrix representation).We will have two matrices: the score matrix and traceback matrix.The Needleman-Wunsch algorithm consists of three steps:Initialisation of the score matrixCalculation of scores and filling the traceback matrixDeducing the alignment from the traceback matrixConsider The Simple ExampleThe alignment of two sequences SEND and AND with the BLOSUM62 substitution matrix and gap opening penalty of 10 (no gap extension):SEND-AND score: +1A-ND score: +3   the bestAN-D score: -3AND- score: -8The Score And Traceback MatricesThe cells of the score matrix are labelled C(i; j) where i = 1; 2; :::; N and j = 1; 2; :::; MScoringThe score matrix cells are filled by row starting from the cell C(2; 2)The score of any cell C(i; j) is the maximum of:qdiag = C(i  1; j  1) + S(i; j)qup = C(i  1; j) + gqleft = C(i; j  1) + gwhere S(i; j) is the substitution score for letters i and j, and g is the gap penalty.Scoring A Pictorial RepresentationThe value of the cell C(i; j) depends only on the values of the immediately adjacent northwest diagonal, up, and left cells:The Needleman-Wunsch Progression1.  The first step is to calculate the value of C(2; 2):2.  The calculation for the cell C(2; 2):qdiag = C(1; 1) + S(S; A) = 0 + 1 = 1qup = C(1; 2) + g = 10 + (10) = 20qleft = C(2; 1) + g = 10 + (10) = 20Where C(1; 1), C(1; 2), and C(2; 1) are read from the score matrix, and S(S; A) is the score for the S <-> A taken from the BLOSUM62 matrix.3.  For the score matrix C(2; 2) =qdiag which is 1. The corresponding cell of the traceback matrix is "diag":4.  The next step is to calculate C(2; 3):5.  The calculation for the cell C(2; 3)qdiag = C(1; 2) + S(E; A) = 10 + 1 = 11qup = C(1; 3) + g = 20 + (10) = 30qleft = C(2; 2) + g = 1 + (10) = 96.  Thus C(2; 3) = 9 and the corresponding cell of the traceback matrix is "left".7.  After all cells are lled, the score and traceback matrices are:Page 2
Category: Algorithms | Views: 1890 | Added by: Ansari | Rating: 0.0/0