Monday, 2018-02-19 BioInfo Pakistan
Section categories
 Related Subjects [38]This category includes brief overview of all related subjects. Defining BioInformatics [7]In this section we tried to briefly explain what bioinformatics is ? Unviersities [30]This contains information about universities that are offering bioinformatics degree programs. Resources [24]Contains information about bioinformatics resources including databases, tools and techniques. Algorithms [31]This category includes some of the basic algorithms that are usually used by bioinformaticians.
 Our poll Pakistani Students Should Join Bio-Informatics Yes No Total of answers: 35
 Chat Box Only authorized users can post messages
 Statistics Total online: 1 Guests: 1 Users: 0
Home » 2011 » August » 23 » Markov Chains
3:23 PM
Markov Chains

 Markov Chains1. Description:   A Markov chain, named for Andrey Markov, is a mathematical system that undergoes transitions from one state to another (from a finite or countable number of possible states) in a chainlike manner. It is a random process characterized as memoryless: the next state depends only on the current state and not on the entire past. This specific kind of "memorylessness" is called the Markov property. Markov chains have many applications as statistical models of real-world processes.2. Key Points:If we can predict all of the properties of a sequence knowing only the conditional dinucleotide probabilities, then that sequence is an example of a Markov chainA Markov chain is defined as a sequence of states in which each state depends only on the previous state3. Formalism For Markov Chains:M=(Q,.,P) is a Markov chain, where Q = vector (1,..,n) is the list of states Q(1)=A, Q(2)=C, Q(3)=G, Q(4)=T for DNA p = vector (p1,..,pn) is the initial probability of each state p(i)=pQ(i)  (e,g., p(1)=pA for DNA) P= n x n matrix where the entry in row i and column j is the probability of observing state j if the previous state is i and the sum of entries in each row is 1 ( dinucleotide probabilities)  P(i,j)=p*Q(i)Q(i) (e.g., P(1,2)=p*AC for DNA) 4. Generating Markov Chains:Given Q,.,P (and a random number generator), we can generate sequences that are members of the Markov chain MIf .,P are derived from a single sequence, the family of sequences generated by M will include that sequence as well as many othersIf .,P are derived from a sampled set of sequences, the family of sequences generated by M will be the population from which that set has been sampled5. Matlab Code For Generating Markov Chains:chars = ['a' 'c' 'g' 't']; % the dinucs array shows the frequency of observing the character in the % row followed by the character in the column% these values show strong preference for c-cdinucs = [2, 1, 2, 0; 0, 8, 0, 1; 2, 0, 2, 0; 1, 0, 0, 1];% these values restrict transitions more%dinucs = [2, 0, 2, 0; 0, 8, 0, 0; 2, 0, 2, 0; 1, 1, 0, 1]; % calculate mononucleotide frequencies only as the probability of% starting with each nucleotidemonocounts = sum(dinucs,2);monofreqs = monocounts/sum(monocounts);cmonofreqs = cumsum(monofreqs);% calculate dinucleotide frequencies and cumulative dinuc freqsfreqs = dinucs./repmat(monocounts,1,4);cfreqs = cumsum(freqs,2); disp('Dinucleotide frequencies (transition probabilities)');fprintf('     %c        %c        %c        %c\n',chars)for i=1:4    fprintf('%c %f %f %f %f\n',chars(i),freqs(i,:))endnseq = 10;for ntries=1:20    rnums = rand(nseq,1);    % start sequence using mononucleotide frequencies    seq(1) = min(find(cmonofreqs>=rnums(1)));    for i=2:nseq        % extend it using the appropriate row from the dinuc freqs        seq(i) = min(find(cfreqs(seq(i-1),:)>=rnums(i)));    end     output=chars(seq);    disp(strvcat(output));end6. Discriminating Between Two States With Markov Chains:To determine which of two states a sequence is more likely to have resulted from, we calculate7. State probablities for + and - modelsGiven examples sequences that are from either + model (CpG island) or - model (not CpG island), can calculate the probability that each nucleotide will occur for each model (the a values for each model) +   A     C     G     T    -   A     C     G     T A 0.180 0.274 0.426 0.120  A 0.300 0.205 0.285 0.210 C 0.171 0.368 0.274 0.188  C 0.322 0.298 0.078 0.302 G 0.161 0.339 0.375 0.125  G 0.248 0.246 0.298 0.208 T 0.079 0.355 0.384 0.182  T 0.177 0.239 0.292 0.292 8. Transition Probabilities Converted To Log Likelihood Ratios: ß               A          C                 G                 T A      -0.740     0.419         0.580       -0.803 C      -0.913     0.302         1.812       -0.685 G      -0.624     0.461         0.331       -0.730 T       -1.169     0.573         0.393       -0.679  9. Example:What is relative probability of C+G+C+ compared with C-G-C-?First calculate log-odds ratio: S(CGC)= ß(CG) +ß(GC)=1.812+0.461=2.273Convert to relative probability: 22.273=4.833Relative probability is ratio of (+)to(-) P(+)=4.833 P(-)Convert to percentageP(+) + P(-) = 1 4.833P(-) + P(-) = 1P(-) = 1/5.833 = 17%ConclusionP(+)=83% P(-)=17%
Category: Algorithms | Views: 632 | Added by: Ansari | Rating: 0.0/0