CHAPTER 3 PROBLEMS

PART II. ALIGNMENT OF TWO SEQUENCES BY THE DYNAMIC PROGRAMMING ALGORITHM

In this section, protein sequence pairs will be aligned using Internet servers. Using one of the Web sites listed below and the default conditions provided by the Web site, align the protein sequences for the phage λ and phage P22 repressors. Cut and paste the FASTA files of the sequences already available into these sites. Record the resulting percent identity and similarity and briefly describe what each represents.

Internet Sites for Sequence Alignment

The following Web sites perform alignment of two sequences by the dynamic programming algorithm.

LALIGN (http://fasta.bioch.virginia.edu/) at the University of Virginia. This program is also available for Mac and PC computers but without a windows/mouse interface. The program finds not just one but n nonoverlapping alignments of two sequences according to the SIM algorithm discussed in the text. In these alignments, the same two residues will never be found together more than once.

SIM (http://us.expasy.org/tools/sim-prot.html) uses the same algorithm as the above site.

BCM (http://searchlauncher.bcm.tmc.edu/). The Baylor College of Medicine Web site offers a variety of methods of sequence alignment. Read the "h" option to see how these programs work. Not all of these programs use dynamic programming as the sole method. LFASTA and BLAST2 search for common words and then align on the basis of these words. The program align is a global alignment program based on the Needleman–Wunsch alignment algorithm instead of the Smith–Waterman local alignment algorithm. Unless the two sequences are strongly similar, of the same length, and alike along their entire lengths, a global alignment will not be useful.
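When recording the percent identity and similarity reported by a server, it can help to see how such figures might be derived from the two rows of an alignment. The Python sketch below is an illustration only, not the method used by any of the sites above. The function name, the choice of denominator (all alignment columns, gaps included), and the rule that a column counts as "similar" when its substitution score is positive are assumptions, and programs differ on each of them. The scorer `toy_score` is a placeholder that knows a single pair (D/E, which scores 2 in BLOSUM62) and treats every other mismatch as dissimilar.

```python
def percent_identity_and_similarity(row_a, row_b, score):
    """Return (percent identity, percent similarity) for two aligned rows.

    Assumptions (conventions differ between programs):
      * both rows have equal length and use '-' for gap positions;
      * percentages are taken over all alignment columns, gaps included;
      * a column is "similar" if the residues are identical or their
        substitution score is positive.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    if not row_a:
        raise ValueError("alignment is empty")
    identical = 0
    similar = 0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            continue  # gap columns count toward the length only
        if x == y:
            identical += 1
            similar += 1
        elif score(x, y) > 0:
            similar += 1
    n = len(row_a)
    return 100.0 * identical / n, 100.0 * similar / n


def toy_score(x, y):
    """Placeholder scorer: D/E is positive, every other mismatch negative."""
    return 2 if {x, y} == {"D", "E"} else -1


# Hypothetical six-column alignment, not taken from the phage repressors.
ident, simil = percent_identity_and_similarity("MAD-LE", "MGELLD", toy_score)
print(f"identity {ident:.1f}%, similarity {simil:.1f}%")
```

In practice the scorer would be a full substitution matrix such as the one the server used, and the percentages should be compared against the server's own definition before drawing conclusions.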

PART III. CALCULATION OF SEQUENCE ALIGNMENT SCORES

Calculation of Log Odds and Odds Scores by the BLOSUM Method

In one column of an alignment of a set of related, similar sequences, amino acid D changes to amino acid E at a frequency of 0.10, and the number of times this change is expected based on the number of occurrences of D and E in the column is 0.05.

What is the odds score of finding a D-to-E substitution in an alignment?

What is the log odds score for the D-to-E substitution in bits? (Note: log to base 2 = natural log / 0.693.)

What would be the entry in the BLOSUM amino acid scoring matrix for this substitution? Compare your result to the actual entry in the BLOSUM62 matrix.

In the same column, D does not change at all at a frequency of 0.80, and the expected frequency of D not changing is 0.10. Calculate the corresponding log odds score and the BLOSUM62 entry for D not changing.

Log Odds and Odds Score of a Short Alignment

Using the above values, what is the log odds score of the following alignment in bits? (Note that these two short sequences have very low sequence complexity because they contain only two of the 20 available amino acids. These sequences were chosen to simplify the calculations. Alignments of low-complexity sequences can give quite high scores that overstate the sequence similarity, as discussed in Chapter 6.)

DEDEDEDE
DDDDDDDD

What is the odds score of the above alignment?
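The calculations in this part follow one pattern: divide the observed frequency by the expected frequency to get the odds score, take the base-2 logarithm to get bits, and add the per-column log odds to score an alignment. The Python sketch below shows that pattern under stated assumptions. The function names are invented for illustration, the matrix entry is assumed to be in half-bit units (twice the log odds in bits, rounded to the nearest integer, which is the scaling used for BLOSUM62), and the example frequencies 0.25 and 0.0625 are hypothetical values chosen so the arithmetic is exact. They are not the values given in the problem.

```python
import math


def odds(observed, expected):
    """Odds score: observed frequency divided by expected frequency."""
    return observed / expected


def log_odds_bits(observed, expected):
    """Log odds score in bits (base-2 logarithm of the odds score)."""
    return math.log2(odds(observed, expected))


def half_bit_entry(observed, expected):
    """Matrix entry assuming half-bit units: 2 * log2(odds), rounded."""
    return round(2 * log_odds_bits(observed, expected))


def alignment_log_odds_bits(row_a, row_b, pair_bits):
    """Sum the per-column log odds (in bits) of an ungapped alignment.

    pair_bits maps a sorted residue pair such as ("D", "E") to its log
    odds score in bits.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    return sum(pair_bits[tuple(sorted((x, y)))] for x, y in zip(row_a, row_b))


# Hypothetical frequencies, not the ones from the problem.
print(odds(0.25, 0.0625))            # odds score
print(log_odds_bits(0.25, 0.0625))   # log odds in bits
print(half_bit_entry(0.25, 0.0625))  # rounded half-bit matrix entry

# Log odds add across columns; the odds score of the whole alignment is
# 2 raised to the total number of bits.
toy_bits = {("A", "A"): 2.0, ("A", "G"): 0.5}
total_bits = alignment_log_odds_bits("AAGA", "AAAA", toy_bits)
print(total_bits, 2 ** total_bits)
```

Substituting the frequencies from the problem into the same functions gives a way to check a hand calculation.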

PART IV. COMPARING ALIGNMENT SCORES WITH SMALL AND LARGE GAP PENALTIES

For this question, use the program LALIGN on the University of Virginia FASTA server (http://fasta.bioch.virginia.edu/). This program aligns sequences by a local dynamic programming algorithm and includes end gap penalties. It produces as many different alignments as specified, with no two alignments including a match of the same two sequence positions.

Obtain the following two sequences from GenBank in FASTA format: recA.pro (P03017) from the bacterium E. coli and rad51.pro (P25454) from budding yeast (Saccharomyces cerevisiae). These proteins have the same function, i.e., promoting the pairing of homologous single-stranded DNAs. They almost certainly have the same three-dimensional structure but have diverged enough that they are difficult to align.

Use LALIGN to align the above two sequences with gap penalties of –12 and –2. Note the length of the alignment, the percent identity, and the score of the alignment.

Repeat the alignment with gap penalties of –5 and –1 and note the features of the alignment.

Describe what happened when the gap penalties were reduced. Which of these alignments looks like a local alignment and which looks like a global alignment?
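To get a feel for how much the two penalty settings differ before running the server, the cost of a single gap can be tabulated as a function of its length. The Python sketch below is a rough illustration under an assumed convention: the first gap position costs the opening value and each further position costs the extension value (the convention stated for BLASTP in Part V). Some programs instead charge the opening value plus the extension value for the first position, so the LALIGN documentation should be consulted for the exact totals. The function name `gap_cost` is hypothetical.

```python
def gap_cost(length, first, extend):
    """Total penalty for one gap of `length` positions (as a positive cost).

    Assumed convention: the first gap position costs `first`, and every
    additional position costs `extend`.
    """
    if length < 1:
        raise ValueError("gap length must be at least 1")
    return first + extend * (length - 1)


# Compare the two settings used in this part for a few gap lengths.
print("length  cost(12, 2)  cost(5, 1)")
for n in (1, 5, 20):
    print(f"{n:6d}  {gap_cost(n, 12, 2):11d}  {gap_cost(n, 5, 1):10d}")
```

The table shows how quickly the cost of a long gap grows under each setting, which is useful background when comparing the lengths of the two LALIGN alignments.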

PART V. USING THE DYNAMIC PROGRAMMING METHOD TO CALCULATE THE LOCAL ALIGNMENT OF TWO SHORT SEQUENCES BY HAND

The BLASTP algorithm performs a local alignment between a query sequence and a matching database sequence using the dynamic programming algorithm with the BLOSUM62 scoring matrix, a gap opening penalty of –11, and a gap extension penalty of –1 (i.e., a gap of length 1 has a penalty of –11, one of length 2, –12, etc.).

Align the sequences MDPW and MEDPW using the Smith–Waterman algorithm described in the dynamic programming notes. Follow the layout of the global alignment example given in the notes, but apply the Smith–Waterman rules. Make one matrix for keeping track of the best scores and a second matrix for keeping track of the moves that give the best scores. (Hint: Aligning the M's, P's, and W's gives high scores, so the problem boils down to how to align D with ED and is actually quite a trivial problem.) Use the BLOSUM62 matrix and the BLASTP gap penalties of –11 and –1.

What is the optimal alignment and score between these two sequences?
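A hand-built matrix can be checked against a short program. The Python sketch below is one possible implementation of Smith–Waterman local alignment with the gap scheme described above (first gap position 11, each further position 1), written for illustration and not taken from the notes. All names (`smith_waterman`, `gap_first`, `gap_extend`, the matrices `H`, `E`, `F`, `move`) are assumptions. `H` and `move` correspond to the score matrix and move matrix requested in the problem; `E` and `F` are extra bookkeeping matrices that track alignments ending in a gap, which the length-dependent gap penalty requires. Only the BLOSUM62 entries for the residues M, E, D, P, and W are included.

```python
# BLOSUM62 entries for the five residues that occur in MDPW and MEDPW.
_B62 = {
    ("M", "M"): 5, ("E", "E"): 5, ("D", "D"): 6, ("P", "P"): 7, ("W", "W"): 11,
    ("M", "E"): -2, ("M", "D"): -3, ("M", "P"): -2, ("M", "W"): -1,
    ("E", "D"): 2, ("E", "P"): -1, ("E", "W"): -3,
    ("D", "P"): -1, ("D", "W"): -4, ("P", "W"): -4,
}


def blosum62(x, y):
    """Look up a score in the partial table, in either residue order."""
    return _B62[(x, y)] if (x, y) in _B62 else _B62[(y, x)]


def smith_waterman(a, b, score=blosum62, gap_first=11, gap_extend=1):
    """Local alignment; returns (best score, aligned a, aligned b, H, move)."""
    neg = float("-inf")
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]       # best score ending at (i, j)
    E = [[neg] * cols for _ in range(rows)]     # ... ending with a gap in a
    F = [[neg] * cols for _ in range(rows)]     # ... ending with a gap in b
    move = [[None] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            E[i][j] = max(H[i][j - 1] - gap_first, E[i][j - 1] - gap_extend)
            F[i][j] = max(H[i - 1][j] - gap_first, F[i - 1][j] - gap_extend)
            diag = H[i - 1][j - 1] + score(a[i - 1], b[j - 1])
            # On ties the earliest candidate wins, so "stop" beats the rest.
            options = [(0, "stop"), (diag, "diag"), (E[i][j], "left"), (F[i][j], "up")]
            H[i][j], move[i][j] = max(options, key=lambda c: c[0])
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a "stop" move or the border.
    i, j = best_pos
    state, top, bottom = "H", [], []
    while True:
        if state == "H":
            m = move[i][j]
            if m is None or m == "stop":
                break
            if m == "diag":
                top.append(a[i - 1])
                bottom.append(b[j - 1])
                i, j = i - 1, j - 1
            else:
                state = "E" if m == "left" else "F"
        elif state == "E":                      # gap in a, consume b[j-1]
            top.append("-")
            bottom.append(b[j - 1])
            if E[i][j] == H[i][j - 1] - gap_first:
                state = "H"                     # this position opened the gap
            j -= 1
        else:                                   # gap in b, consume a[i-1]
            top.append(a[i - 1])
            bottom.append("-")
            if F[i][j] == H[i - 1][j] - gap_first:
                state = "H"
            i -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom)), H, move


best_score, aligned_a, aligned_b, H, move = smith_waterman("MDPW", "MEDPW")
for row in H:
    print(row)
print(best_score)
print(aligned_a)
print(aligned_b)
```

Printing `H` and `move` row by row gives matrices in the same layout as the hand calculation, with MDPW down the side and MEDPW across the top, so individual cells can be compared directly.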

 © 2004 by Cold Spring Harbor Laboratory Press. All rights reserved. No part of these pages, either text or image, may be used for any purpose other than personal use. Therefore, reproduction, modification, storage in a retrieval system, or retransmission, in any form or by any means, electronic, mechanical, or otherwise, for reasons other than personal use, is strictly prohibited without prior written permission.