Bioinformatics Online


6. MAST search with PSSMs obtained from MEME and BLOCKS alignments. We will use two Web sites that search for common patterns in a submitted group of protein sequences: the BLOCKS server at the Fred Hutchinson Cancer Research Center in Seattle, Washington, and the MEME server at the San Diego Supercomputer Center, University of California, San Diego (UCSD). These sites provide examples of well-defined pattern analyses, and a family of related sequences found in a PSI-BLAST search should usually be analyzed further by these methods. Each search produces a log-odds scoring matrix (position-specific scoring matrix or PSSM; see Chapter 5) that may then be used to search other sequences for the same pattern. The patterns are ungapped: the matrix makes no provision for gaps. The MAST program, also at UCSD, searches every sequence in a protein sequence database for high-scoring matches to the patterns.

The BLOCKS server offers a number of very useful programs for sequence analysis and maintains the BLOCKS database, a collection of aligned sequence patterns from related sequences. Each block defines a region of similarity that is a signature of a particular protein family, and a family may be defined by one or more blocks. A single sequence may be aligned with all of the existing blocks in the database to determine whether it carries any of the patterns represented there. To find patterns in a submitted group of sequences, the BLOCKS server searches sequentially through the sequences for common patterns (the MOTIF program) and also uses the Gibbs sampler. MEME uses the expectation maximization algorithm to locate patterns.
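To make the idea of a gap-free log-odds matrix concrete, the following Python sketch builds a PSSM from a small block of aligned segments and slides it along a sequence to find the best ungapped match. It is only an illustration under stated assumptions: the block is made-up toy data, the background frequencies are uniform, and a simple add-one pseudocount is used. The servers themselves use database-derived background frequencies and more elaborate weighting, and the function names here are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(block, pseudocount=1.0):
    """Build a log-odds PSSM (one dict per column) from equal-length segments.

    Assumption: uniform background frequencies (1/20 per amino acid).
    """
    width = len(block[0])
    if any(len(segment) != width for segment in block):
        raise ValueError("all segments in a block must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(block) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(width):
        counts = Counter(segment[col] for segment in block)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts.get(aa, 0) + pseudocount) / total
            # Log-odds score in bits: observed frequency versus background.
            column[aa] = math.log2(freq / background)
        pssm.append(column)
    return pssm


def score_window(pssm, window):
    """Sum the column scores for one ungapped window.

    Assumption: residues outside the 20 standard amino acids score 0.
    """
    return sum(column.get(aa, 0.0) for column, aa in zip(pssm, window))


def best_match(pssm, sequence):
    """Return (start, score) of the highest-scoring ungapped window, or None."""
    width = len(pssm)
    if len(sequence) < width:
        return None
    starts = range(len(sequence) - width + 1)
    best = max(starts, key=lambda i: score_window(pssm, sequence[i:i + width]))
    return best, score_window(pssm, sequence[best:best + width])


# Toy block of aligned segments (illustrative only, not server output).
toy_block = ["GEFRTGKT", "GEFRSGKT", "GEFGTGKS", "GESRTGKT"]
toy_pssm = build_pssm(toy_block)
start, score = best_match(toy_pssm, "MAAAGEFRTGKTQLL")
print(f"best window starts at index {start} with score {score:.2f} bits")
```

Because no gaps are allowed, the score of a match is simply the sum of one matrix entry per column, which is the quantity to keep in mind when examining the PSSM-to-sequence alignments in the MAST output.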

These servers produce large volumes of output, and MEME e-mails its results in Web page (HTML) format. Five related repair proteins in the RecA-Rad51 family were analyzed for common patterns (search for gi|54866, gi|118683, gi|132224, gi|3914552, and gi|1350566 in Entrez). These proteins bind to single-stranded and double-stranded DNA and promote base-pairing between the molecules, which can lead to genetic recombination. Retrieve the sequences in FASTA format and paste them together in series in the FASTA msa format (see Chapter 2, p. 53) using a simple text editor.
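A text editor is all that is needed for this step, but the same concatenation can be sketched in Python, which also checks that each record has a header and a non-empty sequence. The function names and the placeholder records below are assumptions made for illustration; the placeholders stand in for the five sequences retrieved from Entrez and are not real data.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def combine_fasta(texts, line_width=60):
    """Join several FASTA texts into one multi-sequence FASTA string."""
    records = []
    for text in texts:
        records.extend(parse_fasta(text))
    headers = [header for header, _ in records]
    if len(set(headers)) != len(headers):
        raise ValueError("duplicate FASTA headers")
    lines = []
    for header, sequence in records:
        if not sequence:
            raise ValueError(f"record {header!r} has no sequence")
        lines.append(">" + header)
        # Re-wrap each sequence to a fixed line width.
        lines.extend(sequence[i:i + line_width]
                     for i in range(0, len(sequence), line_width))
    return "\n".join(lines) + "\n"


# Placeholder records (hypothetical; substitute the sequences from Entrez).
placeholder_a = ">seqA placeholder\nMKT\nAAA\n"
placeholder_b = ">seqB placeholder\nGGG"
combined = combine_fasta([placeholder_a, placeholder_b], line_width=4)
print(combined, end="")
```

The combined text can then be pasted into the BLOCKS and MEME submission forms in place of the hand-edited file.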

  1. BLOCKS search: Perform a BLOCKS search of these protein sequences on the BLOCKS Web site and answer these questions:
    1. How many blocks were found by the MOTIF program and by the Gibbs sampler, and approximately how long were they?
    2. Were any of the patterns found by the MOTIF program the same as those found by the Gibbs sampler?
    3. Are the patterns convincing; i.e., do at least some of the columns have a majority of one amino acid or is there a lot of variation?
    4. How do the relative positions of each pattern in the five original sequences compare?
  2. MEME search: Submit the same five sequences to the MEME Web site, requesting a search for three patterns, each of which may occur at most once per sequence and need not be present in every sequence (zero or one occurrence per sequence). Otherwise, use the default options of MEME. Examine the results of the MEME analysis and answer the following questions. (Note that MEME sends two files: the first shows the patterns found, and the second is a map of the sequences showing the relative positions of the patterns.)
    1. How many patterns were found and approximately how long were they?
    2. How does the relative position of each pattern in the five original sequences compare?
  3. MAST search: Use the first MEME output file to search the SwissProt database to find additional family members that share the same patterns. A very large output file will be produced. Scan the file, noting the expect values for the aligned regions, and answer the following questions:
    1. Can additional members of this family be identified by this approach? Give three examples of different types of organisms that are in the matched list.
    2. How does the relative order of the patterns in the matched sequences compare with those in the query sequences? Would you expect these sequences to align well?
    3. In the PSSM-to-sequence alignments shown, how was the alignment score determined?
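When judging whether a pattern is convincing (BLOCKS question 3), it can help to quantify column conservation rather than inspect the alignment by eye. The sketch below reports, for each column of a block, the most common residue and the fraction of sequences that carry it, and lists the columns in which one amino acid holds a majority. The block shown is made-up toy data, and the function names and the 0.5 threshold are assumptions chosen for illustration.

```python
from collections import Counter


def column_majorities(block):
    """Return (most_common_residue, fraction) for each column of a block."""
    width = len(block[0])
    if any(len(segment) != width for segment in block):
        raise ValueError("all segments in a block must have the same length")
    result = []
    for col in range(width):
        counts = Counter(segment[col] for segment in block)
        residue, count = counts.most_common(1)[0]
        result.append((residue, count / len(block)))
    return result


def conserved_columns(block, threshold=0.5):
    """Return indices of columns where one residue exceeds the threshold."""
    return [index
            for index, (_, fraction) in enumerate(column_majorities(block))
            if fraction > threshold]


# Toy block of five aligned segments (illustrative only, not server output).
toy_segments = ["GKT", "GRS", "GQA", "AEC", "GDT"]
for index, (residue, fraction) in enumerate(column_majorities(toy_segments)):
    print(f"column {index}: {residue} in {fraction:.0%} of sequences")
print("majority columns:", conserved_columns(toy_segments))
```

A block in which most columns fall below the threshold shows a lot of variation and is less convincing as a family signature than one with several strongly conserved columns.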


© 2004 by Cold Spring Harbor Laboratory Press. All rights reserved.
No part of these pages, either text or image, may be used for any purpose other than personal use. Therefore, reproduction, modification, storage in a retrieval system, or retransmission, in any form or by any means, electronic, mechanical, or otherwise, for reasons other than personal use, is strictly prohibited without prior written permission.

