Friday, 2017-11-24, 11:30 PM
Welcome Guest | RSS

Login form

Statistics
Total online: 1
Guests: 1
Users: 0

Is Online User
Site Search
Main » 2011 » August » 23 » Markov Chains
3:23 PM
Markov Chains


Markov Chains

1. Description:
   A Markov chain, named for Andrey Markov, is a mathematical system that undergoes transitions from one state to another (from a finite or countable number of possible states) in a chainlike manner. It is a random process characterized as memoryless: the next state depends only on the current state and not on the entire past. This specific kind of "memorylessness" is called the Markov property. Markov chains have many applications as statistical models of real-world processes.

2. Key Points:
  1. If we can predict all of the properties of a sequence knowing only the conditional dinucleotide probabilities, then that sequence is an example of a Markov chain
  2. A Markov chain is defined as a sequence of states in which each state depends only on the previous state
3. Formalism For Markov Chains:
  1. M=(Q,.,P) is a Markov chain, where 
  2. Q = vector (1,..,n) is the list of states 
    • Q(1)=A, Q(2)=C, Q(3)=G, Q(4)=T for DNA 
  3. p = vector (p1,..,pn) is the initial probability of each state 
    • p(i)=pQ(i)  (e,g., p(1)=pA for DNA) 
  4. P= n x n matrix where the entry in row i and column j is the probability of observing state j if the previous state is i and the sum of entries in each row is 1 ( dinucleotide probabilities)  
    • P(i,j)=p*Q(i)Q(i) (e.g., P(1,2)=p*AC for DNA) 
4. Generating Markov Chains:
  1. Given Q,.,P (and a random number generator), we can generate sequences that are members of the Markov chain M
  2. If .,P are derived from a single sequence, the family of sequences generated by M will include that sequence as well as many others
  3. If .,P are derived from a sampled set of sequences, the family of sequences generated by M will be the population from which that set has been sampled

5. Matlab Code For Generating Markov Chains:
chars = ['a' 'c' 'g' 't'];
 
% the dinucs array shows the frequency of observing the character in the 
% row followed by the character in the column
% these values show strong preference for c-c
dinucs = [2, 1, 2, 0; 0, 8, 0, 1; 2, 0, 2, 0; 1, 0, 0, 1];
% these values restrict transitions more
%dinucs = [2, 0, 2, 0; 0, 8, 0, 0; 2, 0, 2, 0; 1, 1, 0, 1];
 
% calculate mononucleotide frequencies only as the probability of
% starting with each nucleotide
monocounts = sum(dinucs,2);
monofreqs = monocounts/sum(monocounts);
cmonofreqs = cumsum(monofreqs);
% calculate dinucleotide frequencies and cumulative dinuc freqs
freqs = dinucs./repmat(monocounts,1,4);
cfreqs = cumsum(freqs,2);
 disp('Dinucleotide frequencies (transition probabilities)');
fprintf('     %c        %c        %c        %c\n',chars)
for i=1:4
    fprintf('%c %f %f %f %f\n',chars(i),freqs(i,:))
end
nseq = 10;
for ntries=1:20
    rnums = rand(nseq,1);
    % start sequence using mononucleotide frequencies
    seq(1) = min(find(cmonofreqs>=rnums(1)));
    for i=2:nseq
        % extend it using the appropriate row from the dinuc freqs
        seq(i) = min(find(cfreqs(seq(i-1),:)>=rnums(i)));
    end
 
    output=chars(seq);
    disp(strvcat(output));
end

6. Discriminating Between Two States With Markov Chains:
  1. To determine which of two states a sequence is more likely to have resulted from, we calculate
7. State probablities for + and - models
  1. Given examples sequences that are from either + model (CpG island) or - model (not CpG island), can calculate the probability that each nucleotide will occur for each model (the a values for each model)
+   A     C     G     T    -   A     C     G     T
A 0.180 0.274 0.426 0.120  A 0.300 0.205 0.285 0.210
C 0.171 0.368 0.274 0.188  C 0.322 0.298 0.078 0.302
G 0.161 0.339 0.375 0.125  G 0.248 0.246 0.298 0.208
T 0.079 0.355 0.384 0.182  T 0.177 0.239 0.292 0.292

8. Transition Probabilities Converted To Log Likelihood Ratios:

ß               A          C                 G                 T
A      -0.740     0.419         0.580       -0.803
C      -0.913     0.302         1.812       -0.685
G      -0.624     0.461         0.331       -0.730
T       -1.169     0.573         0.393       -0.679

  
9. Example:
  1. What is relative probability of C+G+C+ compared with C-G-C-?
  2. First calculate log-odds ratio: S(CGC)= ß(CG) +ß(GC)=1.812+0.461=2.273
  3. Convert to relative probability: 22.273=4.833
  4. Relative probability is ratio of (+)to(-) P(+)=4.833 P(-)
  5. Convert to percentage
    • P(+) + P(-) = 1 4.833
    • P(-) + P(-) = 1
    • P(-) = 1/5.833 = 17%
  6. Conclusion
    • P(+)=83% P(-)=17%


Category: Algorithms | Views: 601 | Added by: Ansari | Rating: 0.0/0
Total comments: 0
Name *:
Email *:
Code *:
Tag Board
Your Ads
Site friends

Copyright MyCorp © 2017
Free website builderuCoz