Protein-Protein Interaction Prediction
PiNat is a platform for assessing protein-protein interaction networks. The platform integrates information about protein function and sub-cellular localization and reports the reliable interactions involving the query proteins. The interactions are rendered as an image in the cellular context and can help elucidate biological pathways and processes. We have used the system to analyze proteins implicated in Alzheimer's disease and shown how the integrated view corroborates previous observations and helps formulate new hypotheses regarding the molecular underpinnings of the disease. (server, paper, download)
LOCtree is a novel system of support vector machines (SVMs) that predicts the subcellular localization of proteins, and the DNA-binding propensity of nuclear proteins, by incorporating a hierarchical ontology of localization classes modeled on biological processing pathways. Biological similarities are incorporated from the description of cellular components provided by the Gene Ontology (GO) consortium. GO definitions have been simplified and tailored to the problem of protein sorting. Technically, the ontology has been implemented as a decision tree with SVMs as the nodes. LOCtree was highly successful at learning evolutionary similarities among subcellular localization classes and was significantly more accurate than other traditional networks at predicting subcellular localization. Whenever available, LOCtree also reports predictions based on the following: 1) nuclear localization signals found by PredictNLS, 2) localization inferred using Prosite motifs and Pfam domains found in the protein, and 3) SWISS-PROT keywords associated with a protein. Localization is inferred in the last two cases using the entropy-based LOCkey algorithm. (server, paper, download)
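The idea of a decision tree whose internal nodes are classifiers can be sketched as follows. This is a minimal illustration only: the node deciders are plain threshold functions standing in for trained SVMs, and the tree layout, feature names, and cutoffs are hypothetical rather than taken from LOCtree.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union

Features = Dict[str, float]


@dataclass
class Node:
    """Internal node: a binary decider routes the protein to one child."""
    decide: Callable[[Features], bool]  # stand-in for a trained SVM
    if_true: Union["Node", str]         # child node or leaf label
    if_false: Union["Node", str]


def classify(node: Union[Node, str], x: Features) -> str:
    """Walk from the root until a leaf (a localization label) is reached."""
    while isinstance(node, Node):
        node = node.if_true if node.decide(x) else node.if_false
    return node


# Hypothetical toy hierarchy with made-up feature names and thresholds.
tree = Node(
    decide=lambda x: x["signal_peptide_score"] > 0.5,
    if_true="secretory pathway",
    if_false=Node(
        decide=lambda x: x["nls_score"] > 0.5,
        if_true="nucleus",
        if_false="cytoplasm",
    ),
)
```

Each decision only has to separate the classes below its own node, which is how a hierarchy of this kind can mirror a sorting pathway rather than treating all compartments as unrelated labels.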
PredictNLS is an automated tool for the analysis and in silico determination of nuclear localization signals (NLSs). In NLS discovery mode, PredictNLS searches a query protein for known and potential NLSs in NLSdb to determine whether the protein is likely to be targeted to the nucleus. If the protein is determined to be nuclear, the program also reports whether a known DNA-binding motif is found. In motif detection mode, the program can help you decide whether a sequence motif is likely to act as a nuclear localization signal. The PredictNLS website also documents the largest collection of experimentally determined NLSs.
(server, paper, download)
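Searching a query sequence for signal motifs can be sketched with regular expressions. The single pattern below is a simplified, illustrative stand-in for a basic-residue cluster; it is not taken from NLSdb, and the function and dictionary names are hypothetical.

```python
import re
from typing import List, Tuple

# Illustrative pattern only; NOT an entry from NLSdb.
MOTIFS = {
    "basic_cluster": re.compile(r"K[KR].[KR]"),
}


def find_motifs(sequence: str) -> List[Tuple[str, int, str]]:
    """Return (motif name, 0-based start, matched text) for each hit.

    finditer reports non-overlapping matches, scanning left to right.
    """
    hits = []
    for name, pattern in MOTIFS.items():
        for match in pattern.finditer(sequence):
            hits.append((name, match.start(), match.group()))
    return hits
```

For example, `find_motifs("PKKKRKV")` scans the well-known SV40 large T antigen signal and reports one hit starting at position 1. A real motif library would hold many curated patterns and would need a check on how specific each one is to nuclear proteins.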
LOCkey is a database of subcellular localization of eukaryotic proteins inferred using SWISS-PROT keywords. LOCkey was the first fully automated algorithm for inferring subcellular localization from database annotations. LOCkey outperformed semi-automated methods relying on expert annotators in benchmark tests.
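One way an entropy measure can be used with keywords is to ask how concentrated the known localizations are among proteins that share a keyword. The sketch below computes the Shannon entropy of that distribution; it is an assumption about the general approach, not the LOCkey procedure itself, and the function name is hypothetical.

```python
import math
from collections import Counter
from typing import Iterable


def localization_entropy(localizations: Iterable[str]) -> float:
    """Shannon entropy (bits) of the localization labels seen with a keyword.

    Low entropy means the keyword points almost always to one compartment.
    """
    counts = Counter(localizations)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A keyword whose annotated proteins are all nuclear gives an entropy of 0 bits, while one split evenly between two compartments gives 1 bit, so lower values mark more informative keywords.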
NLSdb
NLSdb is a database of nuclear localization signals (NLSs) and of nuclear proteins targeted to the nucleus by NLS motifs. NLSdb contains over 12,500 predicted nuclear proteins and over 1,500 DNA-binding proteins from six entirely sequenced eukaryotic proteomes (human, mouse, fly, worm, grass and yeast).
ER/Golgi Localization: Analysis of experimentally characterized endoplasmic reticulum (ER) and Golgi apparatus retrieval motifs, with estimates of their specificity, to classify subcellular localization for the ER and Golgi. We also investigate inferring ER and Golgi localization by homology transfer from proteins known to localize to the ER and Golgi. (server, paper, download)
Prediction of Protein Disorder
NORSnet is a neural-network based method that focuses on the identification of unstructured loops. NORSnet was trained to distinguish between very long contiguous segments with non-regular secondary structure (NORS regions) and well-folded proteins. NORSnet was trained on predicted information rather than on experimental data. It could therefore be optimized on a large data set that is not biased by today's experimental means of capturing disorder. As a result, NORSnet reaches into regions of sequence space that are not covered by the specialized disorder predictors. One disadvantage of this approach is that it is not optimal for the identification of the "average" disordered region. (server, paper, download)
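The notion of a long contiguous segment with non-regular secondary structure can be illustrated with a sliding window over a predicted secondary-structure string. This sketch shows only the region definition, not the NORSnet network; the state alphabet (H = helix, E = strand, anything else = non-regular) and the default window length and cutoff are assumptions chosen for illustration.

```python
from typing import List, Tuple


def find_nors_like(ss: str, window: int = 70,
                   max_regular: float = 0.12) -> List[Tuple[int, int]]:
    """Return merged [start, end) spans covered by windows with few H/E states."""
    regions: List[Tuple[int, int]] = []
    if len(ss) < window:
        return regions
    regular = [1 if c in "HE" else 0 for c in ss]
    count = sum(regular[:window])  # H/E count in the first window
    for start in range(len(ss) - window + 1):
        if start > 0:
            # Slide by one: add the entering residue, drop the leaving one.
            count += regular[start + window - 1] - regular[start - 1]
        if count / window <= max_regular:
            end = start + window
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)  # extend the open span
            else:
                regions.append((start, end))
    return regions
```

For instance, `find_nors_like("HHHHH" + "L" * 12 + "EEEEE", window=10, max_regular=0.0)` returns `[(5, 17)]`, the single stretch that contains no helix or strand states.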
Ucon (prediction of natively unstructured regions through contacts) is a method that combines protein-specific internal contacts with generic pairwise energy potentials to accurately predict long and functional unstructured regions. One advantage of Ucon over statistical-potential based methods is that it incorporates the contribution of the specific order of the amino-acids rather than the amino acid composition alone.
(server, paper, download)
MD (Meta-Disorder predictor) is a neural-network based meta-predictor that uses different sources of information predominantly obtained from orthogonal approaches. MD significantly outperformed its constituents and compared favorably to other top prediction methods. MD is capable of predicting disordered regions of all "flavors" and of identifying new ones that are not captured by other predictors. (server, paper, download)
PROFbval is a neural-network method that aims to predict flexible and rigid residues in proteins from sequence alone. PROFbval was trained on B-factor data from PDB X-ray structures and, to an extent, can capture disordered residues. Additionally, surface residues that are predicted to be rigid by PROFbval are correlated with the location of enzyme active sites. (server, paper, download)
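Turning raw B-factors into flexible/rigid labels for training can be sketched as a per-chain normalization followed by a cutoff. This is a minimal sketch under stated assumptions: the z-score normalization, the default cutoff of 0, and the function name are illustrative choices, not the published PROFbval protocol.

```python
from statistics import mean, pstdev
from typing import List


def label_flexibility(b_factors: List[float],
                      z_cutoff: float = 0.0) -> List[str]:
    """Label residues 'flexible' when their normalized B-factor exceeds z_cutoff."""
    if not b_factors:
        return []
    mu = mean(b_factors)
    sigma = pstdev(b_factors)
    if sigma == 0:
        # No variation within the chain: nothing stands out as flexible.
        return ["rigid"] * len(b_factors)
    return ["flexible" if (b - mu) / sigma > z_cutoff else "rigid"
            for b in b_factors]
```

Normalizing within each chain matters because absolute B-factor values depend on resolution and refinement, so only relative values are comparable across structures.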
Protein Function Prediction
Cell Cycle Protein Identification: Identification of cell cycle control proteins through homology transfer and machine learning techniques. We use database mining, literature searches and evolutionary conservation estimates to provide genome-wide annotations for cell cycle control proteins. We have also developed an SVM method to complement homology transfer in the identification of cell cycle kinases from sequence alone. (server, paper, download)
Cell Cycle Kinase Identification: Using information from highly conserved and semi-exposed protein residues from cell cycle kinases we are able to classify kinases involved in this specific biological pathway. We show the ability to correctly predict kinases involved in the cell cycle from all kinases by using a superset of highly conserved and semi-exposed residues. These residues, many of which reside in the nucleotide binding site of the enzymes, represent a majority of the functionally significant residues of the kinases and lead us towards their specific cell cycle functional classification.
Structural Genomics
PSI-BIG4: The PSI-BIG4 website assesses the progress of the large-scale structural genomics initiative (PSI) funded by NIGMS. Progress is assessed by reporting monthly statistics on the number of novel structures, progress with BIG and MEGA families, and novel leverage generated by the structures, as specified by the milestones document.
Subcellular Localization
NYCOMPS: NYCOMPS targets are subdivided into three main categories: Pipeline, Nominated and Biological Theme targets. Pipeline targets are selected by a protocol that currently begins with E. coli proteins. These E. coli seeds are predicted membrane proteins that have been expressed successfully in a previous large-scale experiment on membrane proteins carried out by the lab of Gunnar von Heijne (Daley et al. Science. 2005 308:1321-3). We expand these seeds into 92 prokaryotic genomes (reagent genomes) from which we clone. Nominated targets are seeds selected by individual NYCOMPS experimental groups and expanded by the bioinformatics unit into the 92 reagent genomes. Biological Theme targets are proteins of exceptional biological interest that are cloned by the individual NYCOMPS experimental groups.
Data from experimental trials (cloning, expression, purification, crystallization) are processed and analyzed with the aim of iteratively improving our target selection strategy. (website)