Metastudent predicts Gene Ontology (GO) terms for protein sequences through homology. It first runs a BLAST query for a given target against a custom BLAST database containing the sequences and GO terms of previously annotated proteins. If a similar sequence is found, three different base classifiers use the BLAST output to calculate three separate sets of GO terms. Finally, a meta classifier merges them into one set. If the BLAST database contains no similar sequence, no prediction is made.
What is predicted?
The output of metastudent is a list of Gene Ontology (GO) terms. They describe the functions of a protein and are part of either the Molecular Function Ontology (MFO) or the Biological Process Ontology (BPO). An ontology is a rooted graph in which each node represents a GO term and each link a functional relationship. Thus, the prediction of metastudent can be seen as two subgraphs of the full ontologies. These two subgraphs are displayed below the tabular result. Often, the tabular result contains only very specific functional terms, and not the more general descriptions that can be inferred by following the links toward the root of the ontology. The graphical results do show such terms (predicted: yellow boxes, inferred: white boxes).
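The inference of more general terms can be illustrated with a minimal sketch. It assumes a toy three-term fragment of the MFO stored as a child-to-parents mapping; the function name `with_inferred_ancestors` and the data layout are hypothetical and are not taken from metastudent itself.

```python
# Toy fragment of the Molecular Function Ontology (illustrative only):
# each term maps to the list of its direct parents.
PARENTS = {
    "GO:0003674": [],              # molecular_function (root)
    "GO:0005488": ["GO:0003674"],  # binding
    "GO:0005515": ["GO:0005488"],  # protein binding
}


def with_inferred_ancestors(predicted, parents):
    """Return (predicted, inferred): inferred holds every ancestor
    reachable from a predicted term that was not predicted itself."""
    predicted = set(predicted)
    seen = set()
    stack = list(predicted)
    while stack:
        term = stack.pop()
        for parent in parents.get(term, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return predicted, seen - predicted


predicted, inferred = with_inferred_ancestors(["GO:0005515"], PARENTS)
print("predicted:", sorted(predicted))  # would be drawn as yellow boxes
print("inferred: ", sorted(inferred))   # would be drawn as white boxes
```

In this toy case the single predicted term "protein binding" implies the two more general terms "binding" and "molecular_function".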
What can you expect from GO term predictions?
Since the Gene Ontology is very large (> 50,000 terms) and sequence-based protein function prediction is a very hard problem, the predictions usually have quite a low reliability. Reliability measures the chance that the target protein actually has the predicted function. A recent independent assessment showed that this is not a problem of metastudent in particular, but of any method in the field. Nevertheless, the output of metastudent can still be very useful: some terms are predicted with very high reliability, and all predicted terms for a protein together contain about 60% (BPO) to 80% (MFO) of all true functions of that protein. Hence, to find most of the functions of a protein experimentally, you only have to test the predicted functions, not the entire GO.
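One way to read the 60% to 80% figure is as recall: the fraction of a protein's true GO terms that appear among the predicted terms. The sketch below assumes that reading; how the figure was actually computed is not stated here, and the term names are placeholders rather than real GO identifiers.

```python
def recall(predicted, true_terms):
    """Fraction of the true terms that are also among the predicted terms."""
    true_terms = set(true_terms)
    if not true_terms:
        raise ValueError("recall is undefined without any true terms")
    return len(set(predicted) & true_terms) / len(true_terms)


# Placeholder term names, for illustration only.
true_terms = {"term_a", "term_b", "term_c", "term_d", "term_e"}
predicted = {"term_a", "term_b", "term_c", "term_d", "term_x", "term_y"}

print(recall(predicted, true_terms))  # 4 of the 5 true terms are covered
```

Note that recall says nothing about the wrongly predicted terms (`term_x`, `term_y`), which is why a high coverage can coexist with a low reliability per term.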
Metastudent has four parts: three base classifiers and one meta classifier. First, a PSI-BLAST query is executed against a database of all proteins with known functions. The result is forwarded to each base classifier, which compiles the GO terms of the BLAST hits, their E-values, sequence identities, etc. into a list of predicted GO terms. All three base classifiers solve this problem, but in different ways. Finally, the meta classifier takes the predictions of all base classifiers and combines them in an optimal way. Metastudent achieves its state-of-the-art accuracy through highly optimized free parameters (e.g. PSI-BLAST E-value cutoff, number of iterations, scoring scheme parameters) and by focusing only on the latest experimentally resolved GO term annotations.
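The data flow of this architecture can be sketched as follows. The three base classifiers and the averaging meta classifier below are deliberately simple stand-ins invented for illustration; they are not the algorithms metastudent uses, and the `Hit` record and all function names are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One hit from the homology search (hypothetical record layout)."""
    go_terms: frozenset
    evalue: float
    identity: float  # sequence identity as a fraction in [0, 1]


def best_hit_classifier(hits):
    # Stand-in 1: copy the terms of the hit with the lowest E-value.
    best = min(hits, key=lambda h: h.evalue)
    return {term: 1.0 for term in best.go_terms}


def max_identity_classifier(hits):
    # Stand-in 2: score a term by the highest identity among hits carrying it.
    scores = defaultdict(float)
    for hit in hits:
        for term in hit.go_terms:
            scores[term] = max(scores[term], hit.identity)
    return dict(scores)


def frequency_classifier(hits):
    # Stand-in 3: score a term by the fraction of hits annotated with it.
    counts = defaultdict(int)
    for hit in hits:
        for term in hit.go_terms:
            counts[term] += 1
    return {term: n / len(hits) for term, n in counts.items()}


def meta_classifier(predictions):
    # Stand-in meta step: average the scores; a missing term counts as 0.
    totals = defaultdict(float)
    for prediction in predictions:
        for term, score in prediction.items():
            totals[term] += score
    return {term: total / len(predictions) for term, total in totals.items()}


def predict(hits):
    if not hits:
        return {}  # no similar sequence found, so no prediction is made
    base = (best_hit_classifier, max_identity_classifier, frequency_classifier)
    return meta_classifier([classifier(hits) for classifier in base])


hits = [
    Hit(frozenset({"GO:0005515"}), evalue=1e-50, identity=0.9),
    Hit(frozenset({"GO:0005515", "GO:0005488"}), evalue=1e-10, identity=0.5),
]
scores = predict(hits)
for term, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(term, round(score, 3))
```

Each base classifier sees the same hits but weighs them differently, so the meta step can favor terms on which the classifiers agree. In a real system the combination weights and cutoffs would be the free parameters tuned for accuracy.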
- The program can be accessed online via the PredictProtein service.
- A standalone version can be downloaded as a Debian package here.
- Source packages can be downloaded via FTP.
For questions, please contact firstname.lastname@example.org