UniqueProt is intended for researchers who want to analyze a sequence set containing proteins of a certain functional class or cellular location. It removes the bias introduced by sequence-redundant proteins, with the aim that the resulting unique subset approximates the protein universe more accurately.
UniqueProt takes a FASTA file of protein sequences and calculates a set of sequence-unique proteins. It first compares the sequences with BLAST and then uses a greedy algorithm to derive a representative set with maximum coverage and minimum redundancy. The number of output sequences/clusters depends on the HSSP-value, which the program requires as a cutoff parameter. Please refer to the UniqueProt man page for further information.
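To make the two steps more concrete, the following Python sketch shows one way a greedy reduction over pairwise alignment hits could look. It is an illustration only and not the UniqueProt2 code: the function names (`hssp_curve`, `hssp_value`, `greedy_representatives`), the input format of the hits, the use of `>=` at the cutoff, and the "pick the sequence with the most remaining neighbours" rule are all assumptions. The curve in `hssp_curve` follows the length-dependent form commonly given for the HSSP-value; consult the man page and the reference below for the definition the program actually uses.

```python
import math


def hssp_curve(length):
    """Length-dependent identity threshold (assumed form of the HSSP curve)."""
    if length <= 11:
        return 100.0
    if length > 450:
        return 19.5
    return 480.0 * length ** (-0.32 * (1.0 + math.exp(-length / 1000.0)))


def hssp_value(pide, length):
    """Distance of an alignment from the curve, in percentage points.

    pide   -- percentage of identical residues in the alignment
    length -- number of aligned residues
    """
    return pide - hssp_curve(length)


def greedy_representatives(ids, hits, cutoff):
    """Pick a non-redundant subset of ids (hypothetical selection rule).

    hits   -- iterable of (id_a, id_b, pide, aligned_length), e.g. parsed
              from BLAST output by some other step
    cutoff -- HSSP-value at or above which two sequences count as redundant
    """
    neighbours = {i: set() for i in ids}
    for a, b, pide, length in hits:
        if a == b or a not in neighbours or b not in neighbours:
            continue
        if hssp_value(pide, length) >= cutoff:
            neighbours[a].add(b)
            neighbours[b].add(a)

    remaining = set(ids)
    representatives = []
    while remaining:
        # Choose the sequence that covers the most still-unassigned
        # sequences; ties are broken alphabetically for reproducibility.
        best = max(sorted(remaining),
                   key=lambda i: len(neighbours[i] & remaining))
        representatives.append(best)
        remaining -= neighbours[best] | {best}
    return representatives


if __name__ == "__main__":
    example_hits = [("A", "B", 90.0, 200), ("B", "C", 85.0, 200),
                    ("C", "D", 20.0, 200)]
    print(greedy_representatives(["A", "B", "C", "D"], example_hits, 0.0))
```

In this sketch a higher cutoff marks fewer pairs as redundant, so more sequences survive; a lower cutoff yields a smaller, more strictly unique set.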
Author and References
The original UniqueProt has been thoroughly revised and ported to Python by T. Hamp. This new implementation is called UniqueProt2. It is available via the Rostlab repository and has not been published as a paper. Therefore, please cite the original implementation when using UniqueProt2:
- S. Mika and B. Rost: UniqueProt. Creating representative protein-sequence sets. Nucleic Acids Res. 2003;31(13):3789-3791.
For questions contact: hampt - at - rostlab.org
Availability and download
This method is available through the Rostlab repository under the GPL.
Source packages are available via FTP.
Commercial licenses can be obtained through Biosof LLC.