BioinformaticsOnline.org

CHAPTER 6 PROBLEMS



4. Run the E. coli RecA protein against the yeast genome on the BLAST server. Choose the BLASTP program and carefully review the various option windows on the page that comes up. Choose yeast as the genome database to be searched. Enter the RecA sequence in FASTA format or the PIR identifier into the input data window and indicate which choice was made in the small option window just above the input data window. Otherwise, use the default parameters provided by the program. You must wait in a queue for the results, then click on the format results window.
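For readers unsure what FASTA format looks like, here is a minimal sketch of building a FASTA record: a header line beginning with ">" followed by the sequence on wrapped lines. The function name `to_fasta`, the identifier, and the residues shown are hypothetical placeholders for illustration. They are not the actual RecA sequence, which should be retrieved from a sequence database.

```python
def to_fasta(identifier: str, sequence: str, width: int = 60) -> str:
    """Return a FASTA record: a '>' header line, then the sequence in wrapped lines."""
    # Remove spaces and line breaks, and use the conventional uppercase letters.
    cleaned = "".join(sequence.split()).upper()
    if not cleaned.isalpha():
        raise ValueError("sequence must be non-empty and contain only letters")
    # Break the sequence into fixed-width lines.
    lines = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
    return ">" + identifier + "\n" + "\n".join(lines)


# Placeholder residues only; this is NOT the real RecA sequence.
print(to_fasta("query_placeholder", "mkt ayi akq rqi sfv"))
```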

Answer the following questions:

1. In the diagram that comes up, click the mouse on the yeast sequence that best matches the RecA query sequence. Identify the name and gi (GenInfo Identifier) number of the highest-scoring sequence and the score in bits.
2. What scoring matrix and gap penalties were used?
3. What values of K and λ were used for calculating the expect values (E) for the gapped alignment (note that there are two sets of these parameters: one for ungapped and one for gapped alignments)? Where do these values come from?
4. The score shown in the program output is in units of "normalized bits" = [(λ × raw score) - ln K] / ln 2. The raw score is shown in parentheses. What are the units of the raw score (those of the BLOSUM62 matrix)? Calculate the raw score in bits from the "normalized bits."
5. How many database sequences were searched?
6. Calculate the expect value E for a search of this many sequences achieving a score as high as that found in part 1. In the formula, be sure to use the effective lengths of the sequences given in the program output.
7. By looking at the scores and E values from this search, what is the approximate value of the alignment score in normalized bits that corresponds to an expect value E of 0.06 (close to an approximate cutoff of 0.02–0.05 for significance)? How many sequences reached this high a score?
8. Is the alignment of the highest-scoring sequence with RecA protein significant, and why? What biological information (protein structure and function) does this match suggest about the bacterial RecA protein and the yeast protein?
9. What was the lowest reported score in this search, and is this score significant?
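
The score and expect-value calculations asked for above can be checked with a short script. This is a minimal sketch, assuming the usual Karlin-Altschul relations: the normalized bit score is S' = (λS - ln K) / ln 2, and E = m'n' × 2^(-S'), where m' and n' are the effective query and database lengths. The function names are hypothetical, and every number in the usage lines is a placeholder for illustration. Take λ, K, the raw score, and the effective lengths from your own BLAST output.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalized bit score: (lambda * raw score - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def raw_from_bits(bits: float, lam: float, k: float) -> float:
    """Invert bit_score to recover the raw score from normalized bits."""
    return (bits * math.log(2) + math.log(k)) / lam


def expect_value(bits: float, eff_query_len: float, eff_db_len: float) -> float:
    """Expect value E = m' * n' * 2^(-S') for effective lengths m' and n'."""
    return eff_query_len * eff_db_len * 2.0 ** (-bits)


def bits_for_expect(e_value: float, eff_query_len: float, eff_db_len: float) -> float:
    """Bit score at which the expect value equals e_value."""
    return math.log2(eff_query_len * eff_db_len / e_value)


# Placeholder inputs only; substitute the values reported in your BLAST output.
lam, k = 0.267, 0.041            # assumed example values for a gapped search
m_eff, n_eff = 300, 5_000_000    # hypothetical effective lengths
bits = bit_score(100, lam, k)    # hypothetical raw score of 100
print(bits, expect_value(bits, m_eff, n_eff))
print(bits_for_expect(0.06, m_eff, n_eff))
```

Because E depends on the product of the effective lengths, the same bit score gives a different E in a larger or smaller database, which is why the effective lengths rather than the actual lengths belong in the formula.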

No part of these pages, either text or image, may be used for any purpose other than personal use. Therefore, reproduction, modification, storage in a retrieval system, or retransmission, in any form or by any means, electronic, mechanical, or otherwise, for reasons other than personal use, is strictly prohibited without prior written permission.