From Rost Lab Open

What is predicted?

SomeNA predicts the binding of proteins to polynucleotides on several hierarchical levels. At the top level, it decides whether a protein binds any polynucleotide at all, and then whether the bound polynucleotide is DNA or RNA. If a protein is predicted to bind, SomeNA identifies the residues most likely to take part in the interaction.

How is this interaction defined?

An interaction between a polynucleotide and the protein exists if the Euclidean distance between any atom of a residue and any atom of the polynucleotide is less than 5 Ångström. Additional criteria are applied as well: for example, an interaction can be shielded by another atom lying between the residue and the nucleotide, and such cases are taken into account.
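The distance criterion can be illustrated with a minimal sketch. It assumes atoms are given as plain (x, y, z) coordinate tuples in Ångström; the function names are hypothetical and not part of SomeNA, and the additional shielding criteria are deliberately not modelled here.

```python
import math

# Distance threshold stated in the text (in Angstrom).
CUTOFF_ANGSTROM = 5.0


def min_distance(atoms_a, atoms_b):
    """Smallest Euclidean distance between two collections of (x, y, z) atoms."""
    return min(math.dist(a, b) for a in atoms_a for b in atoms_b)


def is_in_contact(residue_atoms, nucleotide_atoms, cutoff=CUTOFF_ANGSTROM):
    """True if any residue atom lies closer than `cutoff` to any nucleotide atom.

    Only the plain distance criterion is checked; shielding by
    intervening atoms is ignored in this sketch.
    """
    return min_distance(residue_atoms, nucleotide_atoms) < cutoff


if __name__ == "__main__":
    residue = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
    near_nucleotide = [(6.0, 0.0, 0.0), (9.0, 0.0, 0.0)]  # closest pair: 4.5
    far_nucleotide = [(7.0, 0.0, 0.0)]                    # closest pair: 5.5
    print(is_in_contact(residue, near_nucleotide))
    print(is_in_contact(residue, far_nucleotide))
```

Because the comparison is strict, a pair of atoms exactly 5 Ångström apart does not count as a contact, which matches the "less than" wording of the definition.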

What is the technical background of the predictor?

SomeNA is based on a set of artificial neural networks (ANNs), each trained for a specific purpose in the system. To discriminate between residues of proteins taking part in polynucleotide binding and residues of non-binding proteins, an ANN is trained on data consisting of binding and non-binding proteins. The global prediction is then obtained by scoring clusters of these predicted residues. For the specific prediction of binding residues, on the other hand, the non-binding proteins were left out of training to achieve more accurate results for this task. The features used in the training and prediction process include, but are not limited to, homology profiles, physicochemical properties of residues, and other protein- or residue-specific characteristics such as a residue's position or the global amino acid composition.
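Two of the features named above, the residue's position and the global amino acid composition, are simple enough to sketch. The encoding below is an assumption made for illustration only (relative position scaled to the range 0 to 1, composition as fractions over the 20 standard amino acids); the actual SomeNA feature encoding is not described here, and all function names are hypothetical.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Fraction of each standard amino acid in the whole sequence."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    counts = Counter(sequence)
    return [counts[aa] / len(sequence) for aa in AMINO_ACIDS]


def relative_position(index, length):
    """Position of a residue scaled to [0, 1] (0 = first, 1 = last residue)."""
    return index / (length - 1) if length > 1 else 0.0


def residue_features(sequence, index):
    """Illustrative feature vector: relative position plus global composition."""
    return [relative_position(index, len(sequence))] + amino_acid_composition(sequence)


if __name__ == "__main__":
    features = residue_features("MKRKAAL", 2)
    print(len(features))  # 1 position value + 20 composition values
```

In a real predictor such values would be combined with homology profiles and physicochemical properties before being passed to the network; this sketch covers only the two features that can be computed from the sequence alone.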

How well does it work?

The first step performs best: 77% of the proteins are correctly predicted to bind polynucleotides (DNA or RNA). Distinguishing the type of polynucleotide is slightly more difficult: 74% of the proteins predicted to bind DNA and 72% of the proteins predicted to bind RNA were correct. Slightly over 53% of the residues binding DNA and/or RNA were correctly predicted. These levels of performance are about 3-fold higher than random. With little or no homology information, performance will drop. However, homology profiles are only one of several features, so performance will still be significantly better than random.