Bioinformatics Online

Chapter 5 Problems

4. In question 3, we assumed that we already had a global alignment of a set of sequences from which a scoring matrix could be made. Even when we know that a set of sequences shares the same function, and thus should align, the sequences may vary so much that they are difficult to align globally. In that case, we must resort to a statistical analysis to find conserved patterns. The following problem goes through the first few steps of finding the best alignment by a statistical method. Students should first study the example of the expectation-maximization (EM) algorithm in the text.

Analyze the following ten DNA sequences with the expectation-maximization algorithm. Assume that the background frequency of each base is 0.25 and that the middle three positions form the motif. The motif size is a guess based on a molecular model, and the alignment of the sequences is also a guess.

seq1   C CAG A
seq2   G TTA A
seq3   G TAC C
seq4   T TAT T
seq5   C AGA T
seq6   T TTT G
seq7   A TAC T
seq8   C TAT G
seq9   A GCT C
seq10  G TAG A
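The starting table for steps 1 and 2 can be checked with a short script. The Python sketch below is illustrative only: it writes each sequence without the spaces that set off the motif, assumes the motif occupies 0-based positions 1 to 3 of each five-base sequence, and uses hypothetical helper names (`count_table`, `frequency_table`). No pseudocounts are added, since the problem asks for observed frequencies.

```python
from collections import Counter

# The ten sequences from the problem, written without the spaces that
# mark off the middle three (motif) positions.
SEQUENCES = [
    "CCAGA", "GTTAA", "GTACC", "TTATT", "CAGAT",
    "TTTTG", "ATACT", "CTATG", "AGCTC", "GTAGA",
]
BASES = "ACGT"
MOTIF_START, MOTIF_WIDTH = 1, 3  # assumed 0-based start and width of the motif


def count_table(seqs, start, width):
    """Return counts[base][column] for the aligned motif columns."""
    motifs = [s[start:start + width] for s in seqs]
    table = {b: [0] * width for b in BASES}
    for col in range(width):
        column_counts = Counter(m[col] for m in motifs)
        for b in BASES:
            table[b][col] = column_counts[b]  # Counter returns 0 for absent bases
    return table


def frequency_table(counts):
    """Convert counts to observed frequencies (no pseudocounts)."""
    width = len(counts[BASES[0]])
    totals = [sum(counts[b][col] for b in BASES) for col in range(width)]
    return {b: [counts[b][col] / totals[col] for col in range(width)]
            for b in BASES}


counts = count_table(SEQUENCES, MOTIF_START, MOTIF_WIDTH)
freqs = frequency_table(counts)
for b in BASES:
    print(b, counts[b], [round(f, 2) for f in freqs[b]])
# A [1, 6, 2] [0.1, 0.6, 0.2]
# C [1, 1, 2] [0.1, 0.1, 0.2]
# G [1, 1, 2] [0.1, 0.1, 0.2]
# T [7, 2, 4] [0.7, 0.2, 0.4]
```

Each column holds ten counts, one per sequence, so the frequencies in each column sum to 1.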
  1. To start the PSSM (position-specific scoring matrix), make a table with three columns (one per motif position) and four rows (one for each base).
  2. Calculate the observed frequency of each base at each of the three middle positions in the alignment.
  3. Using the column frequencies from this table and the background frequencies, calculate the odds likelihood of finding the motif at each of the three possible locations in sequence 5.
  4. Calculate the probability of finding the motif at each of these locations in sequence 5.
  5. Calculate the change to the base count in each column of the motif table that results from matching the motif to the first position in sequence 5. This change is usually a fraction of one base.
  6. What other steps are taken to update or maximize the table values?
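Steps 3 through 6 can be sketched in Python as follows. This is a minimal sketch, not the text's worked solution: it assumes a uniform background of 0.25 that is never re-estimated, no pseudocounts by default, exactly one motif occurrence per sequence, and that sequence 5 stays in the table used to score it. The function names (`odds_at`, `placement_probs`, `expected_counts`, `maximize`, `run_em`) are hypothetical. Because positions outside the motif are scored with the background in both the motif model and the background model, they cancel, and the odds for a placement reduce to the product of (column frequency / 0.25) over the three motif positions.

```python
BASES = "ACGT"
BACKGROUND = 0.25  # uniform background frequency given in the problem
WIDTH = 3          # guessed motif width

# The ten sequences, written without the spaces around the motif.
SEQUENCES = [
    "CCAGA", "GTTAA", "GTACC", "TTATT", "CAGAT",
    "TTTTG", "ATACT", "CTATG", "AGCTC", "GTAGA",
]

# Observed column frequencies from the initial guessed alignment (steps 1-2).
INITIAL_FREQS = {
    "A": [0.1, 0.6, 0.2],
    "C": [0.1, 0.1, 0.2],
    "G": [0.1, 0.1, 0.2],
    "T": [0.7, 0.2, 0.4],
}


def odds_at(seq, k, freqs):
    """Step 3: odds of the motif starting at 0-based offset k vs. background.

    Non-motif positions are scored with the background in both models,
    so they cancel and only the three motif columns contribute.
    """
    ratio = 1.0
    for j in range(WIDTH):
        ratio *= freqs[seq[k + j]][j] / BACKGROUND
    return ratio


def placement_probs(seq, freqs):
    """Step 4: normalize the odds over every possible offset in seq."""
    odds = [odds_at(seq, k, freqs) for k in range(len(seq) - WIDTH + 1)]
    total = sum(odds)
    return [o / total for o in odds]


def expected_counts(seqs, freqs):
    """Step 5 applied to every offset of every sequence: each placement
    adds its probability as a fractional count to the bases it covers."""
    counts = {b: [0.0] * WIDTH for b in BASES}
    for seq in seqs:
        for k, p in enumerate(placement_probs(seq, freqs)):
            for j in range(WIDTH):
                counts[seq[k + j]][j] += p
    return counts


def maximize(counts, pseudocount=0.0):
    """Convert (fractional) counts back into column frequencies."""
    freqs = {b: [0.0] * WIDTH for b in BASES}
    for j in range(WIDTH):
        total = sum(counts[b][j] + pseudocount for b in BASES)
        for b in BASES:
            freqs[b][j] = (counts[b][j] + pseudocount) / total
    return freqs


def run_em(seqs, freqs, pseudocount=0.0, max_iter=100, tol=1e-6):
    """Step 6: alternate expectation and maximization until the table settles."""
    for _ in range(max_iter):
        updated = maximize(expected_counts(seqs, freqs), pseudocount)
        change = max(abs(updated[b][j] - freqs[b][j])
                     for b in BASES for j in range(WIDTH))
        freqs = updated
        if change < tol:
            break
    return freqs


seq5 = SEQUENCES[4]  # "CAGAT": offsets 0, 1, 2 place CAG, AGA, GAT
print([round(odds_at(seq5, k, INITIAL_FREQS), 3) for k in range(3)])
# [0.768, 0.128, 1.536]
probs = placement_probs(seq5, INITIAL_FREQS)
print([round(p, 3) for p in probs])
# [0.316, 0.053, 0.632]

final = run_em(SEQUENCES, INITIAL_FREQS)
for b in BASES:
    print(b, [round(f, 2) for f in final[b]])
```

Under these assumptions, the three placements in sequence 5 (CAG, AGA, GAT) have odds of 0.768, 0.128, and 1.536, which normalize to probabilities of 6/19, 1/19, and 12/19. Matching the motif to the first position (CAG) therefore adds about 0.316 of a count to C in column 1, A in column 2, and G in column 3. `run_em` stands in for step 6: it repeats this fractional counting over every position of all ten sequences, renormalizes each column, and iterates until the table stops changing. Passing a nonzero `pseudocount` keeps a base whose frequency reaches zero from being ruled out permanently.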


© 2004 by Cold Spring Harbor Laboratory Press. All rights reserved.
No part of these pages, either text or image, may be used for any purpose other than personal use. Therefore, reproduction, modification, storage in a retrieval system, or retransmission, in any form or by any means, electronic, mechanical, or otherwise, for reasons other than personal use, is strictly prohibited without prior written permission.

