ConSequenceS

From Rost Lab Open

Intro

This is a method for searching and aligning databases of consensus sequences. It was developed by Dariusz Przybylski in the Rost Group at Columbia University (www.columbia.edu), New York.

Publication abstract: Sequence alignments may be the most fundamental computational resource for molecular biology. The best methods that identify relatedness through profile-profile comparisons are much slower and more complex than sequence-sequence and sequence-profile comparisons such as, respectively, BLAST and PSI-BLAST. Families of related genes and gene products (proteins) can be represented by consensus sequences that list the nucleic/amino acid most frequent at each sequence position in that family. Here we proposed a novel approach for consensus sequence-based comparisons. This approach improved searches and alignments as a standard add-on to PSI-BLAST without any changes of code. Improvements were particularly significant for more difficult tasks such as the identification and alignment of distant structural relations between proteins. Despite the fact that the improvements were higher for more divergent relations, they were consistent even at high accuracy/low error rates for non-trivially related proteins. The improvements were very easy to achieve: No parameter used by PSI-BLAST was altered and no single line of code changed. On top the consensus sequence add-on required relatively little additional CPU time. Thus, advanced users of PSI-BLAST can immediately benefit from using consensus sequences on their local computers. In addition we made the method available through the Internet.
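
As an illustration of the definition in the abstract (a consensus sequence lists the most frequent residue at each position of a family), here is a minimal Python sketch. It assumes the family is already given as equal-length aligned sequences with "-" for gaps; the function name, the tie-breaking rule, and the toy family are hypothetical. It is a plain majority consensus, not the construction used by ConSequenceS.

```python
from collections import Counter

def majority_consensus(aligned, gap="-"):
    """Return the most frequent non-gap residue of each alignment column.

    Illustrative only: a plain majority vote, not the ConSequenceS construction.
    """
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    consensus = []
    for column in zip(*aligned):
        counts = Counter(res for res in column if res != gap)
        if not counts:
            continue  # column contains only gaps: contributes no residue
        # Highest count wins; ties are broken alphabetically (an assumption).
        best = min(counts, key=lambda res: (-counts[res], res))
        consensus.append(best)
    return "".join(consensus)

# Hypothetical toy family of four aligned sequences.
family = ["MKVLA", "MRVLA", "MKILG", "MK-LA"]
print(majority_consensus(family))  # prints MKVLA
```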

PLEASE NOTE: In general, the amino acid composition of consensus sequences differs from that of the original sequences. Consequently, the statistical significance of alignment scores reported by PSI-BLAST may be incorrect, even though the relative ordering of scores seems to be very good. Therefore, the consensus sequences presented here have, on average, a SIMILAR composition to the original sequences; their construction differs from the one described in our first paper.
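
To make the composition issue concrete, the following Python sketch compares the residue composition of two sequences. The function names and the choice of an L1 distance are assumptions made for illustration; the page does not state how the authors measured compositional similarity. The two example strings are the first "Sbjct:" segments (raw and consensus) from the sample output on this page.

```python
from collections import Counter

def composition(seq):
    """Relative frequency of each residue in seq (gap characters '-' are ignored)."""
    residues = [res for res in seq.upper() if res != "-"]
    if not residues:
        return {}
    counts = Counter(residues)
    return {res: n / len(residues) for res, n in counts.items()}

def composition_distance(seq_a, seq_b):
    """L1 distance between two compositions: 0 if identical, 2 if disjoint.

    The L1 distance is an illustrative choice, not the authors' measure.
    """
    comp_a, comp_b = composition(seq_a), composition(seq_b)
    residues = set(comp_a) | set(comp_b)
    return sum(abs(comp_a.get(r, 0.0) - comp_b.get(r, 0.0)) for r in residues)

raw = "IRAFNVLNEPETLVVKKGDAVKVVVENKSPISEGFSIDAFGVQEVIKAGETKTISFTADK"
consensus = "MNQFRLLEVDNRLVVPMGDPVRWVLTNSDDVWHGWWIPSHGIKMDACHGMTWTYWFTFNR"
print(round(composition_distance(raw, consensus), 3))
```

A value near 0 would indicate similar compositions; larger values indicate that the score statistics estimated for one composition may not transfer to the other.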

Author and References

If you find this method useful for your research, please cite:

  • Przybylski D & Rost B (2007) Consensus sequences improve PSI-BLAST through mimicking profile-profile alignments. Nucleic Acids Res. 35(7):2238-46.
  • Przybylski D & Rost B (2008) Powerful fusion: PSI-BLAST and consensus sequences. Bioinformatics. 2008 Aug 4 [Epub ahead of print].

Availability and download

The ConSequenceS method is currently NOT available for download. Please write to assistant at rostlab dot org.

Commercial licenses can be obtained through Biosof LLC.

Help

An explanation of the fields and options in the submission page:

  • Input sequence format - a simple (plain) sequence or a FASTA-formatted sequence.
  • Output format - standard TXT (ASCII) format. The output is almost the same as standard BLAST output. The only difference is in the fields that indicate the type of match (identity or similarity): they refer to matches with the consensus sequence rather than the native one.
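
The two accepted input formats can be normalized before submission. The Python sketch below is one possible way to do this; the function name is hypothetical, and the restriction to a single sequence per submission is an assumption that the page does not state.

```python
def read_query(text):
    """Return (header, sequence) from a plain sequence or a single FASTA record.

    Assumes one sequence per submission (not confirmed by the page).
    """
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty input")
    header = ""
    if lines[0].startswith(">"):
        header = lines[0][1:].strip()  # FASTA description line without '>'
        lines = lines[1:]
    if any(ln.startswith(">") for ln in lines):
        raise ValueError("only a single sequence is expected")
    # Join wrapped lines and drop any whitespace inside them.
    sequence = "".join("".join(ln.split()) for ln in lines).upper()
    if not sequence.isalpha():
        raise ValueError("sequence is empty or contains non-letter characters")
    return header, sequence

print(read_query(">1cyx Cyoa\nIYPEQ\nGIATV\n"))  # ('1cyx Cyoa', 'IYPEQGIATV')
print(read_query("iypeq giatv"))                 # ('', 'IYPEQGIATV')
```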

Sample Output

This file contains alignments of the query sequence with consensus sequences.
On the left side:  consensus sequences are translated back into raw (real) sequences.
On the right side: consensus sequences are not translated.
                     *******

-------- Raw sequences  --------                                                   | -------- Consensus sequences (in the "Sbjct:" fields) --------
                                                                                   |      
BLASTP 2.2.9 [May-01-2004]                                                         | BLASTP 2.2.9 [May-01-2004]
                                                                                   | 
                                                                                   | 
Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,          | Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, 
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),               | Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), 
"Gapped BLAST and PSI-BLAST: a new generation of protein database search           | "Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs",  Nucleic Acids Res. 25:3389-3402.                                       | programs",  Nucleic Acids Res. 25:3389-3402.
                                                                                   | 
Query= 1cyx   mol:protein length:205     Cyoa                                      | Query= 1cyx   mol:protein length:205     Cyoa
         (205 letters)                                                             |          (205 letters)
                                                                                   | 
Database: pdbcons_100                                                              | Database: pdbcons_100 
           22,314 sequences; 5,192,384 total letters                               |            22,314 sequences; 5,192,384 total letters
                                                                                   | 
Searching.............................................done                         | Searching.............................................done
                                                                                   | 
                                                                                   | 
                                                                                   | 
                                                                 Score    E        |                                                                  Score    E
Sequences producing significant alignments:                      (bits) Value      | Sequences producing significant alignments:                      (bits) Value
                                                                                   | 
1iby_A                                                                100   2e-22  | 1iby_A                                                                100   2e-22
                                                                                   | 
                                                                                   | 
>1iby_A                                                                            | >1iby_A
          Length = 112                                                             |           Length = 112
                                                                                   | 
 Score =  100 bits (249), Expect = 2e-22                                           |  Score =  100 bits (249), Expect = 2e-22
 Identities = 24/87 (27%), Positives = 36/87 (41%), Gaps = 3/87 (3%)               |  Identities = 24/87 (27%), Positives = 36/87 (41%), Gaps = 3/87 (3%)
                                                                                   | 
Query: 31  IYPEQGIATVNEIAFPANTPVYFKVT-SNSVMHSFFIPRLGSQIYAMAGMQTRLHLIANE 89         | Query: 31  IYPEQGIATVNEIAFPANTPVYFKVT-SNSVMHSFFIPRLGSQIYAMAGMQTRLHLIANE 89
           +   + +   N +  P   PV + +T S+ V H ++IP  G ++ A  GM        N             |            +   + +   N +  P   PV + +T S+ V H ++IP  G ++ A  GM        N 
Sbjct: 28  IRAFNVLNEPETLVVKKGDAVKVVVENKSPISEGFSIDAFGVQEVIKAGETKTISFTADK 87         | Sbjct: 28  MNQFRLLEVDNRLVVPMGDPVRWVLTNSDDVWHGWWIPSHGIKMDACHGMTWTYWFTFNR 87
                                                                                   | 
Query: 90  PGTYDGICAEICGPGHSGMKFKAIATP 116                                         | Query: 90  PGTYDGICAEICGPGHSGMKFKAIATP 116
           P  Y G C+E CG  H  M                                                     |            P  Y G C+E CG  H  M        
Sbjct: 88  AGAFTIWCQLHPKNIH--LPGTLNVVE 112                                         | Sbjct: 88  PWMYYGQCSEYCGANH--MPGVVEVVE 112
                                                                                   | 
                                                                                   | 
                                                                                   | 
                                                                                   | 
  Database: pdbcons_100                                                            |   Database: pdbcons_100
    Posted date:  Sep 25, 2006  3:27 PM                                            |     Posted date:  Sep 25, 2006  3:27 PM
  Number of letters in database: 5,192,384                                         |   Number of letters in database: 5,192,384
  Number of sequences in database:  22,314                                         |   Number of sequences in database:  22,314
                                                                                   |   
Lambda     K      H                                                                | Lambda     K      H
   0.322    0.150    0.450                                                         |    0.322    0.150    0.450 
                                                                                   | 
Lambda     K      H                                                                | Lambda     K      H
   0.267   0.0460    0.140                                                         |    0.267   0.0460    0.140 
                                                                                   | 
                                                                                   | 
Matrix: BLOSUM62                                                                   | Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1                                         | Gap Penalties: Existence: 11, Extension: 1
Number of Hits to DB: 4,437,015                                                    | Number of Hits to DB: 4,437,015
Number of Sequences: 22314                                                         | Number of Sequences: 22314
Number of extensions: 359971                                                       | Number of extensions: 359971
Number of successful extensions: 2292                                              | Number of successful extensions: 2292
Number of sequences better than 10.0: 290                                          | Number of sequences better than 10.0: 290
Number of HSP's better than 10.0 without gapping: 212                              | Number of HSP's better than 10.0 without gapping: 212
Number of HSP's successfully gapped in prelim test: 78                             | Number of HSP's successfully gapped in prelim test: 78
Number of HSP's that attempted gapping in prelim test: 1953                        | Number of HSP's that attempted gapping in prelim test: 1953
Number of HSP's gapped (non-prelim): 394                                           | Number of HSP's gapped (non-prelim): 394
length of query: 205                                                               | length of query: 205
length of database: 5,192,384                                                      | length of database: 5,192,384
effective HSP length: 87                                                           | effective HSP length: 87
effective length of query: 118                                                     | effective length of query: 118
effective length of database: 3,251,066                                            | effective length of database: 3,251,066
effective search space: 383625788                                                  | effective search space: 383625788
effective search space used: 383625788                                             | effective search space used: 383625788
T: 11                                                                              | T: 11
A: 40                                                                              | A: 40
X1: 16 ( 7.4 bits)                                                                 | X1: 16 ( 7.4 bits)
X2: 38 (14.6 bits)                                                                 | X2: 38 (14.6 bits)
X3: 64 (24.7 bits)                                                                 | X3: 64 (24.7 bits)
S1: 41 (21.8 bits)                                                                 | S1: 41 (21.8 bits)
S2: 54 (25.2 bits)                                                                 | S2: 54 (25.2 bits)
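
The side-by-side layout above can be processed with a few lines of Python. The sketch below assumes that the "|" character occurs only as the column separator, which holds for this sample but is not guaranteed in general; all function names are hypothetical. Note that in the right-hand column the Identities and Positives counts refer to matches with the consensus sequence.

```python
import re

def split_columns(line, sep="|"):
    """Split one line into (raw-sequence column, consensus-sequence column)."""
    left, _, right = line.partition(sep)
    if right.startswith(" "):
        right = right[1:]  # drop the single padding space after the separator
    return left.rstrip(), right.rstrip()

def sbjct_segment(text):
    """Return (start, aligned_sequence, end) from an 'Sbjct:' line, or None."""
    match = re.match(r"Sbjct:\s+(\d+)\s+(\S+)\s+(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), match.group(2), int(match.group(3))

def match_counts(text):
    """Return {'Identities': (n, total), ...} from a BLAST statistics line."""
    pairs = re.findall(r"(Identities|Positives|Gaps) = (\d+)/(\d+)", text)
    return {name: (int(n), int(total)) for name, n, total in pairs}

# Two lines taken from the sample output (column padding shortened).
line = ("Sbjct: 88  AGAFTIWCQLHPKNIH--LPGTLNVVE 112      "
        "| Sbjct: 88  PWMYYGQCSEYCGANH--MPGVVEVVE 112")
raw_col, cons_col = split_columns(line)
print(sbjct_segment(raw_col))   # (88, 'AGAFTIWCQLHPKNIH--LPGTLNVVE', 112)
print(sbjct_segment(cons_col))  # (88, 'PWMYYGQCSEYCGANH--MPGVVEVVE', 112)

stats = " Identities = 24/87 (27%), Positives = 36/87 (41%), Gaps = 3/87 (3%)"
print(match_counts(stats))
```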
