Table 6.8. Programs and Web sites for database similarity searches with a regular expression, motif, block, or profile
Program                          Database searched                            Source or location of analysis
1. Regular expression and motifs (a)
EMOTIF Scan                      SwissProt and Genpept
Prosite patterns                 SwissProt and TrEMBL
ISREC pattern-finding service    SwissProt and non-redundant EMBL database
fpat                             PDB, SwissProt, Genpept                      (Web site not currently active)
MOTIF                            SwissProt, PDB, PIR, PRF, Genes
2. Blocks
BLOCKS (b)                       most databases
MAST (c)                         most databases
BLIMPS (d)                       locally available databases                  anonymous FTP
Probe (e)                        BLAST databases                              anonymous FTP
Genefind (f)                     PIR
3. Profiles
Profilesearch (g)                locally available databases                  anonymous FTP
Profile-SS (h)                   most databases
    These resources search for similarity to a sequence pattern. Resources for producing patterns from aligned or unaligned sequences are described in Chapter 4. An individual sequence may also be searched for matches to a motif database, and this procedure is discussed in Chapter 9. Additional resources for database searching are listed in Bork and Gibson (1996).
    A statistical estimate of the probability of finding the site by random chance in a sequence is sometimes, but not always, given. Reading how these estimates are derived by the individual programs is strongly recommended. The statistical theory for sequence alignments described in Chapter 3 can be used in these types of analyses (Bailey and Gribskov 1998), but it may not always be implemented.
    a The Scan Web page shows how to compile a regular expression. Mismatches with the expression are allowed. The Prosite form of a regular expression is at [URL missing]. PHI-BLAST is a BLAST derivative that searches a given sequence for a regular expression and then searches iteratively for other sequences matching the pattern found, including the newly found sequences at each iteration to expand the search.
    b The BLOCKS server will send a new block analysis to the MAST server.
    c MAST is the Motif Alignment and Search Tool (Bailey and Gribskov 1998). Available protein databases are similar to those on the BLAST server. It is also possible to search translated nucleotide sequence databases.
    d BLIMPS will prepare a PSSM from a motif and perform a database search with the PSSM (see README file on FTP site).
    e PROBE (Neuwald et al. 1997) is described in the text.
    f The GENEFIND site has the program MOTIFIND, for Motif Identification by Neural Design (Wu et al. 1996). This motif finder uses a neural network design to generate motifs and a search strategy for those motifs. The method compared favorably in sensitivity and selectivity with other methods such as BLIMPS and Profilesearch and is, in addition, very fast. Neural networks are described in Chapters 8 and 9.
    g Profilesearch is one of a set of programs in the GCG suite (see text). It is important to review the parameters of the program, which, if used inappropriately, can lead to incomplete or low-efficiency searches (Bork and Gibson 1996).
    h A version of Profilesearch running at the University of Pittsburgh Supercomputing Center.
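
    As an illustration of the regular-expression searches in part 1 of the table (note a), the following is a minimal sketch in Python, not the code used by any of the listed servers. It assumes the standard Prosite pattern conventions (elements separated by "-", "x" for any residue, "[...]" for allowed residues, "{...}" for excluded residues, and "(n)" or "(n,m)" for repeats). The function names prosite_to_regex and scan and the test sequence are hypothetical, and the mismatch tolerance offered by the Scan page is not implemented.

```python
import re

def prosite_to_regex(pattern):
    """Translate a Prosite-style pattern into a Python regular expression."""
    pattern = pattern.rstrip(".")          # Prosite patterns end with a period
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):        # anchored at the N terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):          # anchored at the C terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as (2) or (2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":                    # any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"  # excluded residues
        repeat = ""
        if lo:
            repeat = "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)

def scan(pattern, sequence):
    """Return (1-based start, matched text) for non-overlapping matches."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]

if __name__ == "__main__":
    # N-glycosylation-style pattern scanned against a made-up sequence.
    print(prosite_to_regex("N-{P}-[ST]-{P}."))
    print(scan("N-{P}-[ST]-{P}.", "MKNGTAANPSA"))
```

    Searching a database with such a pattern amounts to applying scan to each sequence in turn. Because finditer reports only non-overlapping matches, overlapping sites would need a lookahead-based variant.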
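
    The block-based searches in part 2 of the table (note d) rest on scoring sequence windows with a position-specific scoring matrix (PSSM). The sketch below shows the general idea only and is not the BLIMPS algorithm: it assumes a uniform background frequency for the 20 amino acids and a simple pseudocount of 1 per residue, and the names build_pssm and best_hit, the aligned sites, and the test sequence are all hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(aligned_sites, pseudocount=1.0):
    """Build a log-odds PSSM (one dict per column) from equal-length sites."""
    width = len(aligned_sites[0])
    background = 1.0 / len(AMINO_ACIDS)    # assumption: uniform background
    pssm = []
    for col in range(width):
        column = [site[col] for site in aligned_sites]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        scores = {}
        for aa in AMINO_ACIDS:
            freq = (column.count(aa) + pseudocount) / total
            scores[aa] = math.log2(freq / background)
        pssm.append(scores)
    return pssm

def best_hit(pssm, sequence):
    """Slide the PSSM along the sequence; return (score, 1-based start, window)."""
    width = len(pssm)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Residues outside the 20 standard amino acids score 0.
        score = sum(pssm[i].get(aa, 0.0) for i, aa in enumerate(window))
        if best is None or score > best[0]:
            best = (score, start + 1, window)
    return best

if __name__ == "__main__":
    sites = ["GKST", "GKSS", "GKTT", "AKST"]   # made-up aligned motif
    pssm = build_pssm(sites)
    print(best_hit(pssm, "MAAGKSTLLV"))
```

    A database search repeats best_hit over every sequence and ranks the results by score. Real programs differ in how they weight sequences, choose pseudocounts, and estimate significance, so the documentation for each program should be consulted.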


