Bioinformatics Online

Chapter 6 Problems

5. PSI-BLAST (Position-Specific Iterated BLAST) is a version of the BLAST algorithm that uses the results of an initial search for similar protein sequences to construct a position-specific scoring matrix, which is then used for additional rounds of searching, called iterations. The variability found in each column of the scoring matrix allows the search to find additional sequences that have different combinations of amino acids at the corresponding sequence positions. The algorithm provides a rapid but less precise search than other methods because the scoring matrix it produces is only approximate and includes most of the original query sequence. A note of caution: The iterations can add sequences that do not share a region with the original query sequence but instead share a totally different region with some of the previously added sequences; i.e., these new sequences are not true family members but alien sequences. The process stops when no new sequences are found. The user can control the number of sequences to be included at each iteration or else use the score cutoff recommended by the program. The method is often used for a rapid, preliminary search for members of a sequence family. The sequences found can then be multiply aligned by other, better-defined methods.
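The iteration described above can be illustrated with a toy sketch in Python. This is not the PSI-BLAST implementation: it assumes ungapped sequences of equal length, a uniform background frequency for the 20 amino acids, a simple pseudocount, and an arbitrary score cutoff. The function names, the example sequences, and the cutoff value are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background frequencies


def build_pssm(aligned, pseudocount=1.0):
    """Build a log-odds scoring matrix (one column per position) from
    equal-length, ungapped sequences."""
    total = len(aligned) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for pos in range(len(aligned[0])):
        counts = Counter(seq[pos] for seq in aligned)
        column = {
            aa: math.log2(((counts[aa] + pseudocount) / total) / BACKGROUND)
            for aa in AMINO_ACIDS
        }
        pssm.append(column)
    return pssm


def score(pssm, seq):
    """Sum the column scores for the residues of an ungapped sequence."""
    return sum(column[aa] for column, aa in zip(pssm, seq))


def iterate_search(query, database, cutoff, max_iterations=5):
    """Rebuild the matrix each round from all included sequences and stop
    when a round adds nothing new. Returns the included sequences and the
    list of new hits found in each round."""
    included = [query]
    rounds = []
    for _ in range(max_iterations):
        pssm = build_pssm(included)
        new_hits = [
            s for s in database
            if s not in included and score(pssm, s) >= cutoff
        ]
        if not new_hits:
            break  # converged: no more sequences found
        included.extend(new_hits)
        rounds.append(new_hits)
    return included, rounds


# Hypothetical toy data, not real protein sequences.
included, rounds = iterate_search(
    "MKVLAT", ["MKVLAS", "MRVLGS", "WWWWWW"], cutoff=3.0
)
for number, hits in enumerate(rounds, start=1):
    print(f"iteration {number}: added {hits}")
```

In this toy run the first round adds only the sequence that differs from the query at one position. Once that sequence contributes to the matrix, the last column tolerates a second residue and the conserved columns score more strongly, so the more divergent sequence passes the cutoff in the second round. The same mechanism is what can let unrelated sequences drift in when the cutoff is too permissive.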

Perform the following analysis and answer these questions:

We provide the protein sequence of a DNA polymerase called iota, which replicates past sites of DNA damage and introduces mutations. This is a mouse homolog (Entrez search for gi|6755274) of a yeast gene called RAD30. Submit the sequence to PSI-BLAST, searching the nr (nonredundant) Genpro database. Use the program's default options. Then repeat the search for one additional iteration using the cutoff scores recommended by the program.

  1. How many matches were found above the cutoff score after the initial search?
  2. Using the Web links provided, identify some of the highest-scoring sequences. What classes of organisms do the matched genes originate from? Is this sequence representative of a protein family found in just a few or many organisms?
  3. How many additional matches were found after the first iteration, and do most appear to have the same type of function, e.g., DNA repair or replication?
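For the third question, one way to count the additional matches is to compare the lists of sequence identifiers reported above the cutoff in each round. The following is a minimal sketch, assuming the identifiers have already been copied from the results pages into Python lists; the function name and the identifiers shown are hypothetical placeholders, not actual search results.

```python
def new_hits_by_iteration(hits_per_iteration):
    """For each iteration, return the identifiers that did not appear
    in any earlier iteration, preserving their reported order."""
    seen = set()
    newly_found = []
    for hits in hits_per_iteration:
        fresh = [h for h in hits if h not in seen]
        newly_found.append(fresh)
        seen.update(hits)
    return newly_found


# Hypothetical placeholder identifiers; substitute those from your own search.
initial_search = ["seqA", "seqB", "seqC"]
first_iteration = ["seqA", "seqB", "seqC", "seqD", "seqE"]

added = new_hits_by_iteration([initial_search, first_iteration])
print(f"{len(added[1])} additional matches after the first iteration: {added[1]}")
```

The newly added identifiers are the ones to inspect when judging whether the extra matches share the expected function.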


© 2004 by Cold Spring Harbor Laboratory Press. All rights reserved.
No part of these pages, either text or image, may be used for any purpose other than personal use. Therefore, reproduction, modification, storage in a retrieval system, or retransmission, in any form or by any means, electronic, mechanical, or otherwise, for reasons other than personal use, is strictly prohibited without prior written permission.

