Bioinformatics Online

Chapter 6: Problems

3. For protein database searches, the BLASTP algorithm first makes a list of the three-letter words in the query sequence and then scores each word for matches with itself and with all other possible three-letter words using the BLOSUM62 scoring matrix. Approximately the 50 highest-scoring matches for each word are kept. Database sequences are then scanned for matches to these high-scoring words, and if any are found, a local alignment is made with the query sequence by dynamic programming. Use the BLOSUM62 scoring matrix in Figure 3.16, page 105. Note that the matrix values are in half-bit units.
  1. Suppose that the three-letter word HFA is in the query sequence. What is the log-odds score of a match of HFA with itself?
  2. Scan through the table and find the highest-scoring match of H with an amino acid other than H itself (call it X; X is a placeholder here, not a one-letter code). What would be the score for HFA in the query sequence matching XFA in the database sequence?
  3. Scan again and find a lowest-scoring match with H (call that amino acid Y; again a placeholder, not tyrosine). What is the score for a match of HFA with YFA?
  4. Repeat the last two questions for the second and third letters in HFA.
  5. How many possible matches are there with HFA? (BLASTP uses approximately the best 50.)
  6. How many words will be searched for, starting with a query sequence that is 300 amino acids long?
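
The scoring and counting steps in the questions above can be sketched in Python. This is a minimal sketch under stated assumptions, not BLASTP itself: the function names (word_score, best_neighbors, query_words) are hypothetical, and only the H, F, and A rows of BLOSUM62 are included. Those rows are transcribed from the commonly distributed version of the matrix, so check them against Figure 3.16 before relying on them.

```python
from itertools import product

# Column order used by the commonly distributed BLOSUM62 table.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# Only the rows needed for the query word HFA (values in half-bit units).
# Assumption: transcribed from the standard BLOSUM62 table; verify against
# Figure 3.16 of the text.
_ROWS = {
    "A": [4, -1, -2, -2, 0, -1, -1, 0, -2, -1, -1, -1, -1, -2, -1, 1, 0, -3, -2, 0],
    "H": [-2, 0, 1, -1, -3, 0, 0, -2, 8, -3, -3, -1, -2, -1, -2, -1, -2, -2, 2, -3],
    "F": [-2, -3, -3, -3, -2, -3, -3, -3, -1, 0, 0, -3, 0, 6, -4, -2, -2, 1, 3, -1],
}
SCORES = {q: dict(zip(AMINO_ACIDS, row)) for q, row in _ROWS.items()}


def word_score(query_word, db_word):
    """Sum the per-position substitution scores (half-bits) of two words."""
    return sum(SCORES[q][d] for q, d in zip(query_word, db_word))


def best_neighbors(query_word, keep=50):
    """Score the query word against every possible word of the same length
    and return the `keep` highest-scoring (score, word) pairs."""
    candidates = ("".join(p) for p in product(AMINO_ACIDS, repeat=len(query_word)))
    scored = [(word_score(query_word, w), w) for w in candidates]
    # Highest score first; ties broken alphabetically for reproducibility.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:keep]


def query_words(sequence, size=3):
    """Return the overlapping words of length `size` in a query sequence."""
    return [sequence[i:i + size] for i in range(len(sequence) - size + 1)]


if __name__ == "__main__":
    print(word_score("HFA", "HFA"))          # self-match score
    print(best_neighbors("HFA", keep=5))     # a few top-scoring words
    print(len(query_words("A" * 300)))       # words in a 300-residue query
```

Calling word_score with a substituted first letter gives the scores asked for in parts 2 and 3, and the same call with the second or third letter changed covers part 4. Extending the sketch to arbitrary query words would require the full 20 x 20 matrix rather than the three rows shown here.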


© 2004 by Cold Spring Harbor Laboratory Press. All rights reserved.
No part of these pages, either text or image, may be used for any purpose other than personal use. Therefore, reproduction, modification, storage in a retrieval system, or retransmission, in any form or by any means, electronic, mechanical, or otherwise, for reasons other than personal use, is strictly prohibited without prior written permission.

