See the tool's GitHub repository for the most up-to-date information.
How do we predict functional effects?
Functional effects of mutations are predicted with SNAP2. SNAP2 is a trained classifier based on a machine-learning method called a neural network. It distinguishes between effect and neutral variants/non-synonymous SNPs by taking a variety of sequence and variant features into account. The most important input signal for the prediction is the evolutionary information taken from an automatically generated multiple sequence alignment. Structural features such as predicted secondary structure and solvent accessibility are also considered. If available, annotations (i.e. known functional residues, patterns, regions) of the sequence or of close homologs are also pulled in. In a cross-validation over 100,000 experimentally annotated variants, SNAP2 reached a sustained two-state accuracy (effect/neutral) of 82% (at an AUC of 0.9). In our hands this constitutes an important and significant improvement over other methods.
What is predicted?
SNAP2 predicts the impact (effect) of single amino acid substitutions on protein function. For a given substitution, e.g. arginine (R) at position 152 replaced by asparagine (N), typically abbreviated R152N, we predict a score that ranges from -100 (strong neutral prediction) to +100 (strong effect prediction) and reflects the likelihood that this specific mutation alters the native protein function. Moreover, our analysis suggests that the prediction score correlates to some extent with the severity of the effect, as shown in Fig. 1.
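To make the R152N notation concrete, here is a minimal Python sketch that splits such a label into wildtype residue, position and variant residue and checks it against a sequence. The names `Substitution`, `parse_substitution` and `check_against_sequence` are hypothetical helpers for illustration; they are not part of SNAP2, and the sketch assumes one-letter codes for the 20 standard amino acids and 1-based positions.

```python
import re
from typing import NamedTuple

# One-letter codes of the 20 standard amino acids (assumption: no ambiguity codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


class Substitution(NamedTuple):
    wildtype: str   # residue in the native sequence
    position: int   # 1-based position in the sequence
    variant: str    # residue after the substitution


_PATTERN = re.compile(rf"^([{AMINO_ACIDS}])([1-9]\d*)([{AMINO_ACIDS}])$")


def parse_substitution(label: str) -> Substitution:
    """Parse a label such as 'R152N' into its three components."""
    match = _PATTERN.match(label.strip().upper())
    if match is None:
        raise ValueError(f"not a valid substitution label: {label!r}")
    wildtype, position, variant = match.groups()
    return Substitution(wildtype, int(position), variant)


def check_against_sequence(sub: Substitution, sequence: str) -> bool:
    """Return True if the wildtype residue matches the sequence at that position."""
    if sub.position > len(sequence):
        return False
    return sequence[sub.position - 1] == sub.wildtype
```

For example, `parse_substitution("R152N")` yields wildtype `R`, position `152` and variant `N`; the sequence check guards against labels that refer to a different isoform or numbering.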
We predict every possible substitution at each position of a protein (each substitution independently) and show the results in a heatmap representation. Dark red indicates a high score (score>50, strong signal for effect), white indicates weak signals (-50<score<50), and green indicates a low score (score<-50, strong signal for neutral/no effect). Please note that the new webserver uses blue instead of green. Black marks the corresponding wildtype residues. Fig. 2 shows an example of such a heatmap.
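The score thresholds of the heatmap can be written down as a small Python sketch. The function name `heatmap_category` and the returned labels are hypothetical, and the text does not say how scores of exactly 50 or -50 are drawn; this sketch assumes they fall into the weak-signal band.

```python
def heatmap_category(score: float) -> str:
    """Map a prediction score in [-100, 100] to a heatmap category.

    'effect'  -> score > 50   (dark red)
    'weak'    -> -50 .. 50    (white; boundary handling is an assumption)
    'neutral' -> score < -50  (green, or blue on the new webserver)
    """
    if not -100 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    if score > 50:
        return "effect"
    if score < -50:
        return "neutral"
    return "weak"
```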
SNAP2 is a neural-network-based classifier. The feed-forward multilayer perceptron consists of 848 nodes in the input layer, 25 nodes in the hidden layer and two nodes in the output layer (one each for neutral and effect). In each training step, all samples are presented to the network and the connection weights are adjusted through a backpropagation algorithm. The final method consists of ten different models (created during 10-fold cross-validation using different subsets for training and optimization). Each model outputs one score for each output class (neutral/effect). The scores of the ten models are averaged in a jury decision. The final score is calculated as the difference between the average score for effect and the average score for neutral.
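The jury decision described above can be sketched in a few lines of Python. This is an illustration, not the SNAP2 source code: the function name `jury_score` is hypothetical, each model is assumed to report a (neutral, effect) pair of output values, and the `scale` factor of 100 is an assumption made here to map a difference of outputs in [0, 1] onto the reported -100 to +100 range.

```python
from statistics import fmean
from typing import Sequence, Tuple


def jury_score(
    model_outputs: Sequence[Tuple[float, float]],
    scale: float = 100.0,  # assumed scaling, not taken from the text
) -> float:
    """Combine per-model (neutral, effect) outputs into one final score.

    The outputs are averaged per class over all models, and the final
    score is the mean effect output minus the mean neutral output.
    """
    if not model_outputs:
        raise ValueError("at least one model output is required")
    mean_neutral = fmean(neutral for neutral, _ in model_outputs)
    mean_effect = fmean(effect for _, effect in model_outputs)
    return scale * (mean_effect - mean_neutral)
```

With ten models that all output neutral 0.2 and effect 0.8, the sketch returns a score of about 60, i.e. a prediction on the effect side of the scale.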
All necessary information (e.g. secondary structure, solvent accessibility, disorder, alignments of related sequences, etc.) is produced by the PredictProtein pipeline. Feature calculation algorithms are written in Perl and transform the predictions into normalized numeric input values. The neural network implementation is provided through the Fast Artificial Neural Network Library.
Prediction reliability score
The reliability score (Reliability Index: RI) is calculated from the final prediction score. Fig. 3 shows that stronger predictions are more reliable. This measure is meant to simplify assessing prediction strength and to immediately convey the reliability of a prediction. The score ranges from 0 (very low reliability) to 9 (very high reliability).
Accuracy of functional effect prediction
We evaluated the performance of SNAP2 in a 10-fold cross-validation and on two independent datasets. The overall two-state accuracy was above 82% in the cross-validation, with an area under the ROC curve of 0.9. We estimated a standard error of 0.013 in a bootstrapping scenario of 1000 sets, each consisting of 50,000 samples drawn randomly without replacement.
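The resampling procedure can be illustrated with a short Python sketch. It is not the evaluation code used for SNAP2: the function name `subsample_standard_error` is hypothetical, the input is assumed to be a list with 1 for each correctly predicted variant and 0 otherwise, and the standard error is assumed to be the standard deviation of the per-set accuracies.

```python
import random
from statistics import fmean, stdev
from typing import Sequence


def subsample_standard_error(
    correct: Sequence[int],
    n_sets: int = 1000,
    set_size: int = 50_000,
    seed: int = 0,
) -> float:
    """Estimate the standard error of the accuracy by repeated subsampling.

    Each of the n_sets subsets is drawn without replacement from `correct`;
    the spread of the per-subset accuracies is returned.
    """
    if set_size > len(correct):
        raise ValueError("set_size must not exceed the number of samples")
    if n_sets < 2:
        raise ValueError("at least two sets are needed to estimate a spread")
    rng = random.Random(seed)
    accuracies = []
    for _ in range(n_sets):
        subset = rng.sample(correct, set_size)  # without replacement
        accuracies.append(fmean(subset))
    return stdev(accuracies)
```

The defaults mirror the numbers given in the text (1000 sets of 50,000 samples); for a quick check on a small list, pass smaller values for `n_sets` and `set_size`.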
A web server is available here.
For questions and/or comments please contact email@example.com