Table 10.4. Databases of patterns and sequences of protein families |
Name |
Web address |
Description |
Reference |
3D-Ali |
http://139.91.72.10/def2/def2.html |
aligned protein structures and related sequences using only secondary structures assigned by the authors of the structures |
Pascarella and Argos (1992) |
3DCA |
http://www.rlandgraf.med.ucla.edu/3DCA.html |
Cluster Analysis integrating structural and sequence information to obtain predictions about functionally relevant clusters of residues |
see Web site |
3D-PSSM |
http://www.sbg.bio.ic.ac.uk/3dpssm |
uses a library of scoring matrices based on structural similarity given in the SCOP classification scheme (p. 433) for alignment with matrices based on sequence similarity |
Kelley et al. (2000) |
BIND (Biomolecular Interaction Network Database) |
http://www.blueprint.org/bind/bind.php |
includes information on protein–protein interactions, molecular complexes, and pathways |
see Web site |
BLOCKS |
http://blocks.fhcrc.org/ |
ungapped blocks in families defined by the Prosite catalog |
Henikoff and Henikoff (1996); Henikoff et al. (1998) |
cluSTr |
http://www.ebi.ac.uk/clustr/ |
clustering of all proteins in PIR, TrEMBL, and SwissProt based on pair-wise similarity |
Kriventseva et al. (2003) |
COGS (Clusters of Orthologous Groups database and search site) |
http://www.ncbi.nlm.nih.gov/COG |
clusters of similar proteins in at least three species collected from available genomic sequences |
Tatusov et al. (1997) |
CONSURF (Protein Surface Contact Analysis) |
http://consurf.tau.ac.il/ |
mapping of functional regions on surface of proteins using conserved amino acid patterns |
Glaser et al. (2003) |
DiffTool |
http://bioweb.pasteur.fr/seqanal/difftool/ |
clustering of proteins based on similarity |
Chetouani et al. (2002) |
DIP (Database of Interacting Proteins) |
http://dip.doe-mbi.ucla.edu |
database of interacting proteins |
Xenarios et al. (2000) |
eMOTIF |
http://dna.Stanford.EDU/emotif/ |
common and rare amino acid motifs in the BLOCKS and HSSP databases |
Nevill-Manning et al. (1998) |
HOMSTRAD |
http://www-cryst.bioc.cam.ac.uk/homstrad/ |
structure-based alignments organized at the level of homologous families (footnote a) |
Mizuguchi et al. (1998a) |
HSSP |
http://swift.embl-heidelberg.de/hssp/ |
sequences similar to proteins of known structure |
Dodge et al. (1998) |
INTERPRO resource of protein domains and functional sites (footnote b) |
http://www.ebi.ac.uk/interpro |
combination of Pfam, PRINTS, Prosite, and current SwissProt/TrEMBL sequences |
see Web site |
LPFC |
http://smi-web.stanford.edu/projects/helix/LPFC/ |
a library of protein family cores based on msa of protein cores using amino acid substitution matrices based on structure (see Chapter 3) |
see Web page |
NCBI |
http://www.ncbi.nlm.nih.gov |
search the conserved domain database (rpsblast) or search for domain architecture (cdart) |
see Web page |
NetOGlyc 2.0 server |
http://www.cbs.dtu.dk/services/NetOGlyc/ |
predicts glycosylation sites in mammalian proteins by NN analysis |
Hansen et al. (1997) |
NNPSL |
http://predict.sanger.ac.uk/nnpsl/ |
predicts subcellular location of proteins by NN |
see Web site |
Pfam |
http://www.sanger.ac.uk/Pfam |
profiles derived from alignment of protein families, each one composed of similar sequences and analyzed by HMMs |
Sonnhammer et al. (1998) |
PIR |
http://www-nbrf.georgetown.edu/ |
family and superfamily classification based on sequence alignment |
Barker et al. (1996) |
PRINTS |
http://www.bioinf.man.ac.uk/dbbrowser/PRINTS/ |
protein fingerprints or sets of unweighted sequence motifs from aligned sequence families |
Attwood et al. (1999) |
PROCLASS |
http://www-nbrf.georgetown.edu/gfserver/proclass.html |
database organized by Prosite patterns and PIR superfamilies; NN system for protein classification into superfamily |
Wu (1996); Wu et al. (1996) |
PRODOM |
http://protein.toulouse.inra.fr/prodom.html |
groups of sequence segments or domains from similar sequences found in SwissProt database by BLASTP algorithm; aligned by msa |
Corpet et al. (1998) |
ProtoNet |
http://www.protonet.cs.huji.ac.il/ |
automatic hierarchical clustering of SwissProt |
Sasson et al. (2003) |
Prosite |
http://www.expasy.ch/prosite |
groups of proteins of similar biochemical function on basis of amino acid patterns |
Bairoch (1991); Hofmann et al. (1999) |
ProtoMap |
http://protomap.cornell.edu |
classification of SwissProt and TrEMBL proteins into clusters |
Yona et al. (1999) |
PSORT |
http://psort.nibb.ac.jp |
predicts presence of protein localization signals in proteins |
see Web site |
SignalP Web server |
http://www.cbs.dtu.dk/services/SignalP-2.0/ |
predicts presence and location of signal peptide cleavage sites in proteins of different organisms by NN analysis |
Nielsen et al. (1997) |
SMART |
http://smart.embl-heidelberg.de |
database of signaling domain sequences with accurate alignments |
Schultz et al. (1998) |
SYSTERS |
http://systers.molgen.mpg.de/ |
classification of all sequences in the SwissProt database into clusters based on sequence similarity |
Krause et al. (2000) |
TargetDB |
http://targetdb.pdb.org/ |
database of peptides that target proteins to cellular locations |
see Web site |
Uniprot |
http://www.pir.uniprot.org/ |
combined protein sequence database of PIR, SwissProt, and TrEMBL |
see Web site |
|
A list of Web sites with protein sequence/structure databases is maintained at http://www.imb-jena.de/ImgLibDoc/help/db/. Many protein family databases are accessible through the European Bioinformatics Institute (http://srs.ebi.ac.uk/). A list of protein–protein interaction databases is maintained at http://www.hgmp.mrc.ac.uk/GenomeWeb/prot-interaction.html.
a Sequence alignments of each family are shown with residues labeled by solvent accessibility, secondary structure, H bonds to main-chain amide or carbonyl group, disulfide bond, and positive Φ angle.
b A combination of Pfam 5.0, PRINTS 25.0, Prosite 16, and current SwissProt and TrEMBL data. Additional merges with other protein pattern databases are planned.