 2. Statistical evaluation of sequence alignment scores. This question is a continuation of Part IV in Chapter 3 (p. 119). We will calculate the significance of an alignment between two sequences by scrambling one sequence many times and recalculating the alignment score each time to see how the scores compare. The program PRSS on the Pearson FASTA Web site http://fasta.bioch.virginia.edu/ will scramble the second sequence and calculate many alignment scores. Scrambling can be done at the level of individual amino acids or within a window of amino acids to keep repetitive sequences intact. A plot of the scores of the scrambled sequence alignments is shown on the Web page, and these scores are compared to the original alignment score between the sequences. The scores are fitted to an extreme value distribution curve, and K and λ are calculated. Note that when many such comparisons are made, e.g., when the first sequence is compared to 100 scrambled second sequences, the expected number of alignments that achieve the original score has to be calculated. If the probability that the score of one alignment with a scrambled sequence achieves the original score is 1/10,000 and 100 scrambled sequences were tested, then the expected value for 100 sequences is 1/10,000 × 100 = 1/100. Obtain the same two sequences from GenBank in FASTA format as done previously. Use PRSS to align the reca.pro (P03017 from the bacterium E. coli) and rad51.pro (P25454 from budding yeast S. cerevisiae) sequences downloaded in the Chapter 3 problems with gap penalties of –12 and –2, and perform 1000 scrambled alignments. Note the expect value for the alignment score found between these proteins. Repeat the analysis with gap penalties of –5 and –1 and note the expect value again. Describe what happened when the gap penalties were reduced.
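The scramble-and-rescore procedure and the expect-value arithmetic described in this problem can be illustrated with a short Python sketch. It is not how PRSS is implemented. Several pieces are assumptions made only for illustration: the toy match/mismatch scoring with a single linear gap penalty (PRSS uses a scoring matrix and separate gap opening and extension penalties), the method-of-moments fit of the extreme value distribution, the residue-level (not windowed) scrambling, all function names, and the two placeholder sequences, which are not the RecA and Rad51 proteins.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def local_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score (Smith-Waterman) under a toy scoring scheme.

    Assumption: simple match/mismatch values and one linear gap penalty.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            s = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            cur.append(s)
            best = max(best, s)
        prev = cur
    return best


def scramble(seq, rng):
    """Shuffle individual residues, keeping length and composition."""
    letters = list(seq)
    rng.shuffle(letters)
    return "".join(letters)


def shuffled_scores(a, b, n, rng):
    """Scores of a against n scrambled copies of b."""
    return [local_score(a, scramble(b, rng)) for _ in range(n)]


def fit_extreme_value(scores):
    """Method-of-moments estimate of lambda and location u.

    Assumption: this simple estimator stands in for whatever fit PRSS uses.
    """
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / (n - 1)
    lam = math.pi / math.sqrt(6.0 * var)
    u = mean - EULER_GAMMA / lam
    return lam, u


def p_value(score, lam, u):
    """P(S >= score) = 1 - exp(-exp(-lambda * (score - u)))."""
    return -math.expm1(-math.exp(-lam * (score - u)))


def expect_value(p, n_comparisons):
    """Expected number of comparisons reaching the score: E = P x N."""
    return p * n_comparisons


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    seq_a = "MKVLAAGIVGLLLAQWERTYHKDE"  # hypothetical placeholder sequence
    seq_b = "MKVIAAGLVGALLAQFERSYHRDE"  # hypothetical placeholder sequence
    observed = local_score(seq_a, seq_b)
    scores = shuffled_scores(seq_a, seq_b, 1000, rng)
    lam, u = fit_extreme_value(scores)
    p = p_value(observed, lam, u)
    print("observed score:", observed)
    print("lambda:", lam, "u:", u)
    print("P:", p, "E for 1000 shuffles:", expect_value(p, len(scores)))
```

In this parameterization the location u is related to the K and λ mentioned in the problem by u = ln(Kmn)/λ, where m and n are the two sequence lengths. Calling `expect_value(1/10000, 100)` reproduces the 1/100 worked out in the text.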

