PROFphd - Secondary Structure, Solvent Accessibility and Transmembrane Helices Prediction
Secondary structure is predicted by a system of neural networks with an expected average accuracy > 72% for the three states helix, strand and loop (Rost & Sander, PNAS, 1993, 90, 7558-7562; Rost & Sander, JMB, 1993, 232, 584-599; and Rost & Sander, Proteins, 1994, 19, 55-72; evaluation of accuracy). Evaluated on the same data set, PROFsec is rated ten percentage points higher in three-state accuracy than methods using only single-sequence information, and more than six percentage points higher than, e.g., a method using alignment information based on statistics (Levin, Pascarella, Argos & Garnier, Prot. Engng., 1993, 6, 849-54). PHDsec predictions have three main features:
- improved accuracy through evolutionary information from multiple sequence alignments
- improved beta-strand prediction through a balanced training procedure
- more accurate prediction of secondary structure segments by using a multi-level system
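Three-state accuracy is commonly reported as the percentage of residues whose predicted state matches the observed one. The following is a minimal sketch under that assumption; the one-letter codes `H` (helix), `E` (strand) and `L` (loop), the function name `three_state_accuracy` and the example strings are illustrative and are not taken from PROFphd itself.

```python
def three_state_accuracy(observed: str, predicted: str) -> float:
    """Percentage of residues whose predicted state equals the observed state.

    Assumes one character per residue, drawn from H (helix), E (strand), L (loop).
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("at least one residue is required")
    allowed = set("HEL")
    if not (set(observed) <= allowed and set(predicted) <= allowed):
        raise ValueError("states must be one of H, E, L")
    # Count positions where the prediction agrees with the observation.
    matches = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * matches / len(observed)


# Hypothetical 15-residue example (not real PROFphd output).
observed = "LLHHHHHLLEEEELL"
predicted = "LLHHHHLLLEEELLL"
print(f"three-state accuracy: {three_state_accuracy(observed, predicted):.1f}%")
```

A per-residue score of this kind says nothing about whether whole segments are placed correctly, which is why segment-level quality is listed as a separate feature above.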
Solvent accessibility is predicted by a neural network method rating at a correlation coefficient (correlation between experimentally observed and predicted relative solvent accessibility) of 0.54 cross-validated on a set of 238 globular proteins (Rost & Sander, Proteins, 1994, 20, 216-226; evaluation of accuracy). The output of the neural network codes for 10 states of relative accessibility. Expressed in units of the difference between prediction by homology modelling (best method) and prediction at random (worst method), PROFacc is some 26 percentage points superior to a comparable neural network using three output states (buried, intermediate, exposed) and using no information from multiple alignments.
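The correlation coefficient quoted above compares observed and predicted relative solvent accessibility residue by residue. As a sketch, assuming the Pearson correlation is meant, it could be computed as follows; the function name `pearson_correlation` and the sample values are hypothetical.

```python
import math


def pearson_correlation(observed, predicted):
    """Pearson correlation between two equally long sequences of numbers."""
    n = len(observed)
    if n != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if n < 2:
        raise ValueError("at least two residues are required")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Sums of products of deviations from the means.
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_o * var_p)


# Hypothetical relative accessibilities in percent (not real data).
observed_acc = [5, 40, 80, 10, 60]
predicted_acc = [10, 35, 70, 20, 50]
print(f"correlation: {pearson_correlation(observed_acc, predicted_acc):.2f}")
```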
Transmembrane helices in integral membrane proteins are predicted by a system of neural networks. A shortcoming of the network system is that it often predicts helices that are too long; these are cut by an empirical filter. The final prediction (Rost et al., Protein Science, 1995, 4, 521-533; evaluation of accuracy) has an expected per-residue accuracy of about 95%. The number of false positives, i.e., transmembrane helices predicted in globular proteins, is about 2%. The neural network prediction of transmembrane helices (PHDhtm) is refined by a dynamic programming-like algorithm. This method resulted in correct predictions of all transmembrane helices for 89% of the 131 proteins used in a cross-validation test; more than 98% of the transmembrane helices were correctly predicted. The output of this method is used to predict topology, i.e., the orientation of the N-terminus with respect to the membrane. The expected accuracy of the topology prediction is > 86%. Prediction accuracy is higher than average for eukaryotic proteins and lower than average for prokaryotes. PHDtopology is more accurate than all other methods tested on identical data sets.
The output is written to /path/to/input/file/INPUTFILENAME.rdbProf, where 'rdbProf' replaces the extension of the input file. If the input file name has no extension, '.rdbProf' is appended to it. The directory /path/to/input/file/ must be writable.
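The naming rule can be sketched in a few lines. This is an illustration of the rule as described here, not PROFphd code: the helper name `rdbprof_output_path` and the example paths are hypothetical, and it assumes that only the last extension is replaced (how PROFphd treats names containing several dots is not stated).

```python
from pathlib import PurePosixPath


def rdbprof_output_path(input_path: str) -> str:
    """Derive the output file name from the input file name.

    The extension is replaced by '.rdbProf'; if there is no extension,
    '.rdbProf' is appended. Assumption: only the last extension counts.
    """
    path = PurePosixPath(input_path)
    if path.suffix:
        return str(path.with_suffix(".rdbProf"))
    return str(path.with_name(path.name + ".rdbProf"))


# Hypothetical input files.
print(rdbprof_output_path("/data/run1/query.hssp"))
print(rdbprof_output_path("/data/run1/query"))
```

Because the output lands next to the input, it may be worth checking that the input directory is writable (for example with `os.access(directory, os.W_OK)`) before starting a run.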
- Rost B, Sander C. (1994). Combining evolutionary information and neural networks to predict protein secondary structure. Proteins, 19, 55-72.
- Rost B, Sander C. (1993). Prediction of protein secondary structure at better than 70% accuracy. J. Mol. Biol., 232, 584-599.
- Rost B, Sander C. (1993). Improved prediction of protein secondary structure by use of sequence profiles and neural networks. Proc. Natl. Acad. Sci. U.S.A., 90, 7558-7562.
This program can be run through the PredictProtein service.
PROFphd licenses: PROFphd is licensed to academic users under the GPL. Commercial licenses can be obtained by writing to Biosof LLC.
If you download PROFphd, you are presumed to have read and accepted the PROFphd Licensing Conditions for either:
- a free, perpetual Academic License, available to any academic organization wanting to use PROFphd entirely for academic (teaching and/or research) purposes.
- an annual Commercial License, which is available for organizations wishing to use PROFphd internally or to exploit it commercially.
Installation with aptitude (Debian, Ubuntu, etc.)
PROFphd is now available in Debian. Please see the Debian PROFphd page for details on download and installation instructions.
Installation from the Rostlab Repository [OBSOLETE]
- If you have not already done so, add the rostlab repository to the list of your synaptic package manager. This is how it's done: Debian_repository#sources.list.d
- aptitude update
- aptitude (search for the rostlab keyring package, mark it with a '+' and hit 'g' twice to install)
- aptitude update (to determine all rostlab packages to install)
- aptitude install profphd. Here's a step-by-step guide: Debian_repository#Installing_a_package_step_by_step
Generating Input Files for PROFphd
PROFphd can take an alignment profile file as an input:
- filtered_hssp_file - PSI-BLAST alignment profile file.
Generating an HSSP Input File
Here is a protocol for converting an alignment to an HSSP profile.
Please see the PROFphd man page:
Downloading PROFphd source?
PROFphd is available under the GPL. The source can be fetched through the Debian repository:
apt-get source profphd
or by emailing the developers.
MAKE SURE YOU READ THE SECTION BELOW ABOUT OBTAINING DEPENDENCIES AND COMPILING PROFphd
How to compile
This section applies ONLY if you have the source (as a tar.gz) for PROFphd.
Dear Computational Biologist, dear Bioinformatician,
Thank you for using our profphd package. Precisely in order to solve this sort of problem ('maintainers of this software *should* ... provide a neutral distribution vector (e.g. sources) for their software'), the deb source package is already made available together with the tar.gz source.
The deb packages you are going to need the source for are: profnet, profphd-utils. Please compile these packages from the deb source and install them as usual. Your system administrators know how to compile a deb package from its source.
In case you need the tar.gz sources (which the deb source package is based on), please use this table to find and retrieve them: https://rostlab.org/owiki/index.php/Packages#Package_overview You are going to need profnet and profphd-utils. The other components of profphd are architecture independent and need no compilation.
We strongly recommend that you use a deb package based Linux distribution in order to minimize maintenance difficulty and effort.
Please let us know if your system administrators have problems compiling the above packages. Their comments on the process are vital to the success of our effort to provide robust packages of our software.
Best regards,
Laszlo Kajan, Rost Lab