Bioinformatics Online
 CHAPTER 9 PROBLEMS
   

PROBLEM 2. PRODUCE A GENE MODEL BY PREDICTING THE SPLICE SITES IN THE ARABIDOPSIS DNA REPAIR ENDONUCLEASE GENE (RAD1) BY SEQUENCE ALIGNMENT OF THE cDNA AND GENOMIC SEQUENCES, BY USING A GENE PREDICTION WEB SITE, AND BY USING A WEB SITE WITH A PROGRAM DESIGNED FOR THIS PURPOSE.
  1. Align the cDNA and genomic sequences with the lalign Web server at http://fasta.bioch.virginia.edu/fasta/lalign.htm, with another lalign server located by a Web search, or with a local copy of lalign.
    Sequences: Retrieve the following sequences in FASTA format. Either paste them into the sequence window of an alignment program's Web site, or save them to a local file with a simple text editor for analysis with a local program.
    1. Arabidopsis ATRAD1 cDNA sequence, GenBank accession no. AF160500.
    2. Retrieving the genomic sequence of a gene is less straightforward, because GenBank usually does not store it as a separate entry; instead, the gene lies within the chromosomal sequence of a large sequence fragment (contig) and must be extracted from it. The Arabidopsis RAD1 genomic DNA sequence lies on the complementary strand of entry no. AB010072, reading backward from position 66706 to 62831. The best way to retrieve this sequence is to open GenBank nucleotide entry no. AB010072, find this coding sequence, and click on CDS. A new window will appear with the coding sequence.
    3. Then open the reverse-complement utility at http://searchlauncher.bcm.tmc.edu/seq-util/Options/revcomp.html in a new browser window and paste in the sequence.
    4. Run the program and copy the reverse-complemented sequence from the window that appears. Use this sequence as the genomic input to the alignment program in a third browser window.
    5. Note: These retrieval steps are easier to perform with a simple Perl script (a "Perl wrapper") run on a local machine that automatically retrieves the cDNA and genome sequences, extracts the desired genome region, makes its reverse complement, and then performs the local alignment with the cDNA sequence. The methodology is described in Chapter 12. You still need to know the accession numbers of the GenBank entries that contain the sequences of interest.

    Alternatively, The Arabidopsis Information Resource (TAIR) provides information on the chromosomal locus, mRNA, and genomic sequences of this gene (called UVH1 or ATRAD1 on this site), along with a gene model, at http://www.arabidopsis.org/servlets/TairObject?id=133166&type=locus. Try the following:

    1. The location of the ATUVH1 gene can be found with SeqViewer. Go to http://www.arabidopsis.org and choose SeqViewer; the first view shows all five Arabidopsis chromosomes.
    2. Choose only the gene models box, and then search for the ATRAD1 gene. A new view will appear with the gene location shown on chromosome 5.
    3. Choose an 80-kb viewing range, click on the gene location mark, and then look in the expanded view for locus AT5G41150.
    4. Once the gene has been located, clicking on it opens a new window with links to the cDNA (CDS, for coding sequence) and genomic sequences (TAIR accession no. AT5G41150.1).
    5. Note that the locus page also includes information on the predicted gene structure that we are trying to find by sequence alignment of the cDNA and genome sequences.

    You can also check the accuracy of your findings below against the gene information page in GenBank or TAIR.

    1. What do the gaps between the aligned genomic and cDNA sequences represent?
    2. Look for 5′ and 3′ splice junctions near the ends of the gapped regions.
      1. Use the Arabidopsis table of consensus splice sites at http://www.nal.usda.gov/pgdic/Probe/v2n3/codon.html to figure out where they are on the genomic sequence and use this information to find the ends of the exons.
      2. Predict and indicate the positions of all exons in the genomic DNA sequence.
      3. Note: The alignment program places each gap only to maximize the alignment score and has no knowledge of splice signals, so the ends of a gap may not fall where the expected splice sites predict; some adjustment of the gap may be necessary.
  2. Submit the genomic sequence used above to the GenScan server at http://genes.mit.edu/GENSCAN.html and compare its predictions with yours.
    1. How accurate is the GenScan server?
    2. What differences (if any) exist between the GenScan output and the gene locations you predicted using lalign?
  3. Several sites (see Table 3.1) specialize in aligning cDNA or EST sequences with the genome. This analysis can locate the gene in the genome and reveal its structure, including the locations of exons and introns. One such site is GeneSeqer at http://www.bioinformatics.iastate.edu/bioinformatics2go/. Use this site to predict the ATRAD1 mRNA sequence with the Arabidopsis gene structure model it provides, and also align the ATRAD1 genomic and cDNA sequences retrieved in part 1 above. Compare the result with that found in part 2.
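The retrieval steps in part 1 (extract the AB010072 region and reverse-complement it) are what the note there suggests automating with a script. A minimal sketch of that idea, written in Python rather than Perl, follows. It assumes AB010072 has already been saved locally as a single-record FASTA file (the filename `AB010072.fasta` is hypothetical), and it covers only extraction and reverse complementation, not downloading or alignment.

```python
# Sketch of the "extract and reverse-complement" steps for a minus-strand gene.
# Assumption: the contig was saved locally as single-record FASTA; fetching it
# from GenBank is not shown.

# Translation table for complementing DNA (upper and lower case; N stays N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(text: str) -> str:
    """Join the sequence lines of a single-record FASTA string."""
    lines = (line.strip() for line in text.splitlines())
    return "".join(line for line in lines if line and not line.startswith(">"))


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the order (5'->3' on the other strand)."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_minus_strand(seq: str, start: int, end: int) -> str:
    """Return the region written in GenBank as complement(start..end).

    Coordinates are 1-based and inclusive with start <= end, as in a GenBank
    feature table; the result reads 5'->3' along the gene.
    """
    if not 1 <= start <= end <= len(seq):
        raise ValueError(f"region {start}..{end} is outside 1..{len(seq)}")
    return reverse_complement(seq[start - 1:end])
```

For example, `extract_minus_strand(read_fasta(open("AB010072.fasta").read()), 62831, 66706)` would return the 3,876-bp region in the gene's own 5′→3′ orientation, ready to use as the genomic input to lalign.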
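The note in part 1 about adjusting gaps can be made concrete. When the bases at the two ends of a gap are identical, the gap can slide along the genomic sequence without changing the aligned exon sequence, so the aligner may place it away from the true splice sites. A minimal sketch, assuming the canonical GT...AG rule (most plant introns begin with GT and end with AG; the consensus table linked above gives fuller preferences), searches such equally good placements for one that starts with GT and ends with AG. The function name and the `max_shift` limit are illustrative choices; coordinates are 1-based and inclusive on the gene's own strand.

```python
# Sketch: slide an alignment gap (a putative intron) to an equally good
# position that follows the canonical GT...AG splice-site rule.


def slide_intron_to_gt_ag(genomic: str, start: int, end: int, max_shift: int = 10):
    """Return (start, end) of an equivalent intron placement that begins with GT
    and ends with AG, or None if none exists within max_shift bases.

    start/end are 1-based, inclusive positions on the gene's own strand, as read
    from the gapped region of the cDNA-genomic alignment.
    """
    g = genomic.upper()
    s0, e0 = start - 1, end  # convert to 0-based, half-open slice bounds
    # Try the smallest shifts first: 0, -1, +1, -2, +2, ...
    for d in sorted(range(-max_shift, max_shift + 1), key=abs):
        s, e = s0 + d, e0 + d
        if s < 0 or e > len(g):
            continue
        # The spliced exon sequence stays the same only if the bases moved
        # across the gap are identical at both ends of the intron.
        if d > 0 and g[s0:s] != g[e0:e]:
            continue
        if d < 0 and g[s:s0] != g[e:e0]:
            continue
        if g[s:s + 2] == "GT" and g[e - 2:e] == "AG":
            return s + 1, e  # back to 1-based, inclusive
    return None
```

For example, in the toy sequence `CCCAGGTATTTAGGTCC` (exon `CCCAG`, intron `GTATTTAG`, exon `GTCC`), a gap reported at positions 7..14 is moved back to 6..13. A result of None means no equally scoring placement fits GT...AG, which may point to a non-canonical intron or an alignment error worth inspecting by hand.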
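One way to answer "How accurate is the GenScan server?" in part 2 is to count the exon bases that two gene models share. A minimal sketch, assuming both exon lists use the same coordinates (for example, positions in the reverse-complemented genomic sequence submitted to both programs), computes nucleotide-level sensitivity (shared bases / bases in the reference model) and specificity (shared bases / bases in the predicted model), as these terms are commonly defined for gene prediction, plus the number of exons whose ends match exactly. Treat whichever model you trust more, for example the TAIR gene model, as `reference`; the function names are hypothetical.

```python
# Sketch: compare two gene models given as lists of (begin, end) exon pairs.


def _normalize(exons):
    """Return exons as (low, high) pairs; begin and end may come in either order."""
    return {(min(a, b), max(a, b)) for a, b in exons}


def exon_bases(exons):
    """Set of positions covered by the exons (1-based, inclusive)."""
    covered = set()
    for start, end in _normalize(exons):
        covered.update(range(start, end + 1))
    return covered


def compare_gene_models(reference, predicted):
    """Nucleotide-level sensitivity/specificity and exact exon matches."""
    ref_bases, pred_bases = exon_bases(reference), exon_bases(predicted)
    shared = len(ref_bases & pred_bases)
    return {
        "sensitivity": shared / len(ref_bases) if ref_bases else 0.0,
        "specificity": shared / len(pred_bases) if pred_bases else 0.0,
        "exact_exons": len(_normalize(reference) & _normalize(predicted)),
    }
```

Reference exons that the prediction misses lower sensitivity; predicted exon bases absent from the reference lower specificity. Counting exact exon matches separately shows whether the differences lie mainly in the splice-site positions.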
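For part 3, a predicted mRNA can be checked against the ATRAD1 cDNA by joining the predicted exons and comparing the result with the cDNA. A minimal sketch, assuming 1-based, inclusive exon coordinates on a genomic sequence already in the gene's 5′→3′ orientation (reverse-complemented as in part 1); the helper names are hypothetical. The containment test is deliberately strict: a single base difference or a missed or extra exon will make it fail, and then an alignment with lalign is the better comparison.

```python
# Sketch: build a predicted mRNA from exon coordinates and compare it with a cDNA.


def splice_exons(genomic: str, exons) -> str:
    """Join exon segments (1-based, inclusive) in order along the gene."""
    return "".join(genomic[start - 1:end] for start, end in sorted(exons))


def matches_cdna(mrna: str, cdna: str) -> bool:
    """True if one sequence is contained in the other, ignoring case.

    Containment (rather than equality) allows for a prediction that omits
    untranslated ends, or a cDNA clone that is shorter than the full transcript.
    """
    m, c = mrna.upper(), cdna.upper()
    return m in c or c in m
```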




 

© 2004 by Cold Spring Harbor Laboratory Press. All rights reserved.
No part of these pages, either text or image, may be used for any purpose other than personal use. Therefore, reproduction, modification, storage in a retrieval system, or retransmission, in any form or by any means, electronic, mechanical, or otherwise, for reasons other than personal use, is strictly prohibited without prior written permission.

 

 