Thursday, 2024-11-07
BioInfo Pakistan
Site menu
Section categories
Related Subjects [38]
This category includes brief overview of all related subjects.
Defining BioInformatics [7]
In this section we tried to briefly explain what bioinformatics is ?
Unviersities [30]
This contains information about universities that are offering bioinformatics degree programs.
Resources [24]
Contains information about bioinformatics resources including databases, tools and techniques.
Algorithms [31]
This category includes some of the basic algorithms that are usually used by bioinformaticians.
Our poll
Pakistani Students Should Join Bio-Informatics
Total of answers: 36
Chat Box
Statistics

Total online: 25
Guests: 25
Users: 0
Home » 2011 » August » 23 » Markov Chains
3:23 PM
Markov Chains


Markov Chains

1. Description:
   A Markov chain, named for Andrey Markov, is a mathematical system that undergoes transitions from one state to another (from a finite or countable number of possible states) in a chainlike manner. It is a random process characterized as memoryless: the next state depends only on the current state and not on the entire past. This specific kind of "memorylessness" is called the Markov property. Markov chains have many applications as statistical models of real-world processes.

2. Key Points:
  1. If we can predict all of the properties of a sequence knowing only the conditional dinucleotide probabilities, then that sequence is an example of a Markov chain
  2. A Markov chain is defined as a sequence of states in which each state depends only on the previous state
3. Formalism For Markov Chains:
  1. M=(Q,.,P) is a Markov chain, where 
  2. Q = vector (1,..,n) is the list of states 
    • Q(1)=A, Q(2)=C, Q(3)=G, Q(4)=T for DNA 
  3. p = vector (p1,..,pn) is the initial probability of each state 
    • p(i)=pQ(i)  (e,g., p(1)=pA for DNA) 
  4. P= n x n matrix where the entry in row i and column j is the probability of observing state j if the previous state is i and the sum of entries in each row is 1 ( dinucleotide probabilities)  
    • P(i,j)=p*Q(i)Q(i) (e.g., P(1,2)=p*AC for DNA) 
4. Generating Markov Chains:
  1. Given Q,.,P (and a random number generator), we can generate sequences that are members of the Markov chain M
  2. If .,P are derived from a single sequence, the family of sequences generated by M will include that sequence as well as many others
  3. If .,P are derived from a sampled set of sequences, the family of sequences generated by M will be the population from which that set has been sampled

5. Matlab Code For Generating Markov Chains:
chars = ['a' 'c' 'g' 't'];
 
% the dinucs array shows the frequency of observing the character in the 
% row followed by the character in the column
% these values show strong preference for c-c
dinucs = [2, 1, 2, 0; 0, 8, 0, 1; 2, 0, 2, 0; 1, 0, 0, 1];
% these values restrict transitions more
%dinucs = [2, 0, 2, 0; 0, 8, 0, 0; 2, 0, 2, 0; 1, 1, 0, 1];
 
% calculate mononucleotide frequencies only as the probability of
% starting with each nucleotide
monocounts = sum(dinucs,2);
monofreqs = monocounts/sum(monocounts);
cmonofreqs = cumsum(monofreqs);
% calculate dinucleotide frequencies and cumulative dinuc freqs
freqs = dinucs./repmat(monocounts,1,4);
cfreqs = cumsum(freqs,2);
 disp('Dinucleotide frequencies (transition probabilities)');
fprintf('     %c        %c        %c        %c\n',chars)
for i=1:4
    fprintf('%c %f %f %f %f\n',chars(i),freqs(i,:))
end
nseq = 10;
for ntries=1:20
    rnums = rand(nseq,1);
    % start sequence using mononucleotide frequencies
    seq(1) = min(find(cmonofreqs>=rnums(1)));
    for i=2:nseq
        % extend it using the appropriate row from the dinuc freqs
        seq(i) = min(find(cfreqs(seq(i-1),:)>=rnums(i)));
    end
 
    output=chars(seq);
    disp(strvcat(output));
end

6. Discriminating Between Two States With Markov Chains:
  1. To determine which of two states a sequence is more likely to have resulted from, we calculate
7. State probablities for + and - models
  1. Given examples sequences that are from either + model (CpG island) or - model (not CpG island), can calculate the probability that each nucleotide will occur for each model (the a values for each model)
+   A     C     G     T    -   A     C     G     T
A 0.180 0.274 0.426 0.120  A 0.300 0.205 0.285 0.210
C 0.171 0.368 0.274 0.188  C 0.322 0.298 0.078 0.302
G 0.161 0.339 0.375 0.125  G 0.248 0.246 0.298 0.208
T 0.079 0.355 0.384 0.182  T 0.177 0.239 0.292 0.292

8. Transition Probabilities Converted To Log Likelihood Ratios:

ß               A          C                 G                 T
A      -0.740     0.419         0.580       -0.803
C      -0.913     0.302         1.812       -0.685
G      -0.624     0.461         0.331       -0.730
T       -1.169     0.573         0.393       -0.679

  
9. Example:
  1. What is relative probability of C+G+C+ compared with C-G-C-?
  2. First calculate log-odds ratio: S(CGC)= ß(CG) +ß(GC)=1.812+0.461=2.273
  3. Convert to relative probability: 22.273=4.833
  4. Relative probability is ratio of (+)to(-) P(+)=4.833 P(-)
  5. Convert to percentage
    • P(+) + P(-) = 1 4.833
    • P(-) + P(-) = 1
    • P(-) = 1/5.833 = 17%
  6. Conclusion
    • P(+)=83% P(-)=17%


Category: Algorithms | Views: 1517 | Added by: Ansari | Rating: 0.0/0
Total comments: 0
Name *:
Email *:
Code *:
Log In

Search
Calendar
«  August 2011  »
SuMoTuWeThFrSa
 123456
78910111213
14151617181920
21222324252627
28293031
Entries archive
Site friends
Copyright MyCorp © 2024
Free website builderuCoz