3. For protein database searches, the BLASTP algorithm first makes a list of three-letter words in the query sequence and then scores these words for matches with themselves and with all other possible words using the BLOSUM62 scoring matrix. The 50 highest-scoring matches are kept. Database sequences are then scanned for matches to these high-scoring words, and if such are found, a local alignment is made with the query sequence by dynamic programming. Use the BLOSUM62 scoring matrix in Figure 3.16, page 105. Note that the matrix values are in half-bit units.
- Suppose that the three-letter word HFA is in the query sequence. What is the log odds score of a match of HFA with itself?
- Scan through the table and find the highest-scoring match with H (say amino acid X). What would be the score for HFA in our query sequence matching XFA in the database sequence?
- Scan again and find any worst-scoring match with H (say amino acid Y). What is the score for a match of HFA with YFA?
- Repeat the last two questions for the second and third letters in HFA.
- How many possible matches are there with HFA? (BLASTP uses approximately the best 50.)
- How many words will be searched for, starting with a query sequence that is 300 amino acids long?
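The word-scoring step described in this problem can be sketched in Python as a way to check hand calculations. This is only an illustration under stated assumptions, not BLASTP's actual implementation: the three matrix rows below are transcribed from the commonly distributed BLOSUM62 table and should be checked against Figure 3.16, and the function names (`word_score`, `ranked_neighbors`, `count_query_words`) are hypothetical.

```python
from itertools import product

# Column order of the commonly distributed BLOSUM62 table (assumption:
# verify the values below against Figure 3.16 before relying on them).
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# Only the rows needed for the query word HFA, in half-bit units.
ROWS = {
    "H": [-2, 0, 1, -1, -3, 0, 0, -2, 8, -3, -3, -1, -2, -1, -2, -1, -2, -2, 2, -3],
    "F": [-2, -3, -3, -3, -2, -3, -3, -3, -1, 0, 0, -3, 0, 6, -4, -2, -2, 1, 3, -1],
    "A": [4, -1, -2, -2, 0, -1, -1, 0, -2, -1, -1, -1, -1, -2, -1, 1, 0, -3, -2, 0],
}
SCORES = {q: dict(zip(AMINO_ACIDS, row)) for q, row in ROWS.items()}


def word_score(query_word, db_word):
    """Sum the per-position substitution scores of two equal-length words."""
    return sum(SCORES[q][d] for q, d in zip(query_word, db_word))


def ranked_neighbors(query_word):
    """Score every possible word against query_word, highest score first."""
    words = ("".join(p) for p in product(AMINO_ACIDS, repeat=len(query_word)))
    scored = [(word_score(query_word, w), w) for w in words]
    return sorted(scored, key=lambda pair: -pair[0])


def count_query_words(query_length, word_length=3):
    """Number of overlapping words of word_length in a query sequence."""
    return max(query_length - word_length + 1, 0)


if __name__ == "__main__":
    print("HFA vs HFA:", word_score("HFA", "HFA"))
    neighbors = ranked_neighbors("HFA")
    print("possible words:", len(neighbors))
    print("top 5:", neighbors[:5])
    print("words in a 300-residue query:", count_query_words(300))
```

Calling `word_score("HFA", w)` for a candidate word `w` gives the half-bit score asked for in the first parts of the problem, and slicing `ranked_neighbors("HFA")[:50]` approximates the list of roughly 50 best words that the problem says is kept.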
© 2004 by Cold Spring Harbor Laboratory Press. All rights reserved.