Table 6.8. Programs and Web sites for database similarity searches with a regular expression, motif, block, or profile

Program | Database searched | Source or location of analysis
1. Regular expression and motifs (a)
EMOTIF Scan | SwissProt and Genpept | http://dna.stanford.edu/emotif/emotif-scan.html
Prosite patterns | SwissProt and TrEMBL | http://au.expasy.org/tools/scanprosite/
ISREC pattern-finding service | SwissProt and non-redundant EMBL database | http://hits.isb-sib.ch/cgi-bin/hits_patsearch/
fpat | PDB, SwissProt, Genpept | http://www.ibc.wustl.edu/fpat/ (Web site not currently active)
PHI-BLAST | BLAST databases | http://www.ncbi.nlm.nih.gov/
MOTIF | SwissProt, PDB, PIR, PRF, Genes | http://motif.genome.jp/
2. Blocks
BLOCKS (b) | most databases | http://blocks.fhcrc.org/blocks/make_blocks.html
MAST (c) | most databases | http://meme.sdsc.edu/meme/website/
BLIMPS (d) | locally available databases | anonymous FTP ftp.ncbi.nih.gov/repository/blocks/unix/blimps
Probe (e) | BLAST databases | anonymous FTP ftp.ncbi.nih.gov/pub/neuwald/probe1.0
Genefind (f) | PIR | http://pir.georgetown.edu/gfserver
3. Profiles
Profilesearch (g) | locally available databases | anonymous FTP ftp.sdsc.edu/pub/sdsc/biology/profile_programs
Profile-SS (h) | most databases | http://www.psc.edu/general/software/packages/profiless/profiless.html

These resources search for similarity to a sequence pattern. Resources for producing patterns from aligned or unaligned sequences are described in Chapter 4. An individual sequence may also be searched for matches to a motif database, and this procedure is discussed in Chapter 9. Additional resources for database searching are listed in Bork and Gibson (1996).
A statistical estimate of the probability of finding the site by random chance in a sequence is sometimes, but not always, given. Reading how these estimates are derived by the individual programs is strongly recommended. The statistical theory for sequence alignments described in Chapter 3 can be used in these types of analyses (Bailey and Gribskov 1998) but may not always be implemented.
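As a rough illustration of what such a chance estimate involves, the sketch below computes the expected number of random matches to a short pattern in a database of a given size. It assumes that residues are independent and that every amino acid has a background frequency of 1/20; both are simplifying assumptions. The function names, the example pattern, and the database dimensions are hypothetical and are not taken from any program in the table.

```python
# Expected number of chance matches to a pattern under an
# independent-residue model.
# Assumption: uniform background frequency of 1/20 per amino acid.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def position_probability(allowed, background=BACKGROUND):
    """Probability that one random residue falls in the allowed set."""
    return sum(background[aa] for aa in allowed)


def expected_chance_matches(pattern, n_sequences, avg_length,
                            background=BACKGROUND):
    """Expected random matches in a database.

    pattern     -- list of strings, one per position, each listing
                   the residues allowed at that position
    n_sequences -- number of sequences in the (hypothetical) database
    avg_length  -- average sequence length
    """
    p_match = 1.0
    for allowed in pattern:
        p_match *= position_probability(allowed, background)
    # Number of possible start positions in one average-length sequence.
    starts = max(avg_length - len(pattern) + 1, 0)
    return p_match * starts * n_sequences


# Hypothetical pattern: C, any residue, any residue, C, then S or T.
pattern = ["C", AMINO_ACIDS, AMINO_ACIDS, "C", "ST"]
expected = expected_chance_matches(pattern, n_sequences=100_000,
                                   avg_length=350)
print(f"Expected chance matches: {expected:.1f}")
```

Under these assumptions a short, permissive pattern is expected to match thousands of times by chance alone, which is why the estimates reported by the individual programs deserve close reading.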
a The Scan Web page shows how to write a regular expression. Mismatches with the expression are allowed. The Prosite form of a regular expression is at http://www.expasy.ch/tools/scnpsit3.html. PHI-BLAST is a BLAST derivative that searches a given sequence for a regular expression and then searches iteratively for other sequences matching the pattern found; at each iteration, the newly found sequences are included to expand the search.
b The BLOCKS server will send a new block analysis to the MAST server.
c MAST is the Motif Alignment and Search Tool (Bailey and Gribskov 1998). Available protein databases are similar to those on the BLAST server. It is also possible to search translated nucleotide sequence databases.
d BLIMPS will prepare a PSSM from a motif and perform a database search with the PSSM (see README file on FTP site).
e PROBE (Neuwald et al. 1997) is described in the text.
f The GENEFIND site has the program MOTIFIND for Motif Identification by Neural Design (Wu et al. 1996). This motif finder uses a neural network design to generate motifs and a search strategy for those motifs. The method compared favorably in sensitivity and selectivity with other methods such as BLIMPS and Profilesearch and is also very fast. Neural networks are described in Chapters 8 and 9.
g Profilesearch is one of a set of programs in the GCG suite (see text). It is important to review the parameters of the program, which, if used inappropriately, can lead to incomplete or low-efficiency searches (Bork and Gibson 1996).
h A version of Profilesearch running at the University of Pittsburgh Supercomputing Center.
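Footnote a refers to regular expressions in Prosite form. As a minimal sketch of how such a pattern can be applied to a sequence, the code below translates the core Prosite syntax (`x` for any residue, `[..]` for alternatives, `{..}` for exclusions, `(n)` or `(n,m)` for repeats, `<` and `>` for the sequence termini) into a Python regular expression and reports every match. The function names and the test sequence are hypothetical, and the example pattern is written in the style of the Prosite N-glycosylation site pattern. Unlike some of the services in the table, this sketch finds exact matches only and does not allow mismatches.

```python
import re


def prosite_to_regex(pattern):
    """Translate a Prosite-style pattern into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # '<' anchors to the N-terminus, '>' to the C-terminus.
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]
        if element.endswith(">"):
            suffix, element = "$", element[:-1]
        # Optional repeat count such as x(2) or x(2,4).
        repeat = ""
        m = re.fullmatch(r"(.+?)\((\d+)(?:,(\d+))?\)", element)
        if m:
            element = m.group(1)
            upper = "," + m.group(3) if m.group(3) else ""
            repeat = "{" + m.group(2) + upper + "}"
        if element == "x":
            core = "."                          # any residue
        elif element.startswith("{") and element.endswith("}"):
            core = "[^" + element[1:-1] + "]"   # excluded residues
        else:
            core = element                      # single residue or [ABC]
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def scan(sequence, pattern):
    """Return (1-based start, matched text) for all matches, overlaps included."""
    # The lookahead lets overlapping matches be reported separately.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


# Hypothetical test sequence; pattern in the style of N-{P}-[ST]-{P}.
hits = scan("MKNGSAANPTLLNVTQ", "N-{P}-[ST]-{P}")
for start, text in hits:
    print(start, text)
```

The lookahead form is a design choice: a plain `finditer` would skip matches that overlap an earlier one, which matters for short patterns in repetitive sequences.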
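Footnote d mentions preparing a PSSM from a motif and searching a database with it. The sketch below shows the general idea in simplified form: it builds log-odds scores from an ungapped block of aligned sequences, then slides the resulting matrix along a target sequence. It is not the BLIMPS implementation. The pseudocount of 1, the uniform background frequencies, the example block, and all function names are assumptions made for illustration.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(block, pseudocount=1.0):
    """Return one {residue: log-odds score} dict per block column.

    Assumptions: equal sequence weights, a flat pseudocount, and a
    uniform background frequency of 1/20 per residue.
    """
    background = 1.0 / len(AMINO_ACIDS)
    total = len(block) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(len(block[0])):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in block:
            counts[seq[col]] += 1
        pssm.append({aa: math.log2((counts[aa] / total) / background)
                     for aa in AMINO_ACIDS})
    return pssm


def scan_sequence(pssm, sequence):
    """Score every window; return (1-based start, score), best first.

    Only the 20 standard residues are handled in this sketch.
    """
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pssm[i][aa] for i, aa in enumerate(window))
        hits.append((start + 1, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical block of four aligned, ungapped motif instances.
block = ["GKSTL", "GKTTL", "GKSSL", "GRSTL"]
pssm = build_pssm(block)
best_start, best_score = scan_sequence(pssm, "MAVPGKSTLLAE")[0]
print(best_start, round(best_score, 2))
```

A database search repeats the same scan over every sequence and ranks the best window scores; the programs in the table differ mainly in how they weight sequences, choose pseudocounts, and assess significance.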