3. For protein database searches, the BLASTP algorithm first makes a list of the three-letter words in the query sequence and then scores each word for matches with itself and with all other possible three-letter words using the BLOSUM62 scoring matrix. The 50 highest-scoring matches are kept. Database sequences are then scanned for matches to these high-scoring words, and wherever a match is found, a local alignment with the query sequence is made by dynamic programming. Use the BLOSUM62 scoring matrix in Figure 3.16, page 105. Note that the matrix values are in half-bit units.
  1. Suppose that the three-letter word HFA is in the query sequence. What is the log odds score of a match of HFA with itself?
  2. Scan through the table and find the highest-scoring match with H (say amino acid X). What would be the score for HFA in our query sequence matching XFA in the database sequence?
  3. Scan again and find any worst-scoring match with H (say amino acid Y). What is the score for a match of HFA with YFA?
  4. Repeat the last two questions for the second and third letters in HFA.
  5. How many possible matches are there with HFA? (BLASTP uses approximately the best 50.)
  6. How many words will be searched for, starting with a query sequence that is 300 amino acids long?
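
The word scoring and counting in these questions can be checked with a short script. The following is a minimal sketch, not BLASTP itself: the function names are hypothetical, and the handful of matrix entries are assumed to match the standard BLOSUM62 table, so verify them against Figure 3.16 before relying on the output. It assumes that a word score is the sum of the position-by-position matrix values (in half-bit units) and that the query is cut into overlapping words.

```python
# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# ASSUMPTION: a few entries copied from the standard BLOSUM62 matrix
# (half-bit units). Check each value against Figure 3.16 and add the
# remaining pairs needed for the other questions.
PARTIAL_BLOSUM62 = {
    ("H", "H"): 8,
    ("F", "F"): 6,
    ("A", "A"): 4,
    ("H", "Y"): 2,
}


def pair_score(a, b, matrix):
    """Look up a symmetric substitution score for residues a and b."""
    if (a, b) in matrix:
        return matrix[(a, b)]
    return matrix[(b, a)]  # raises KeyError if the pair is missing


def word_score(query_word, db_word, matrix):
    """Sum the per-position scores of two equal-length words."""
    if len(query_word) != len(db_word):
        raise ValueError("words must have the same length")
    return sum(pair_score(a, b, matrix) for a, b in zip(query_word, db_word))


def count_possible_words(word_length=3):
    """Number of distinct words of the given length over 20 amino acids."""
    return len(AMINO_ACIDS) ** word_length


def query_words(sequence, word_length=3):
    """All overlapping words of the given length in the query sequence."""
    last_start = len(sequence) - word_length
    return [sequence[i:i + word_length] for i in range(last_start + 1)]


if __name__ == "__main__":
    half_bits = word_score("HFA", "HFA", PARTIAL_BLOSUM62)
    print("HFA vs HFA:", half_bits, "half-bits =", half_bits / 2, "bits")
    print("possible three-letter words:", count_possible_words())
    print("words in a 300-residue query:", len(query_words("A" * 300)))
```

Multiplying the number of query words by the roughly 50 high-scoring matches kept per word gives an estimate of the total word list searched for; the true figure is smaller wherever the lists for different query words overlap.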


