CHOPPER

Intro

CHOP (Liu J & Rost B 2004 Proteins 55(3):678-688) is a method for dissecting proteins into domain-like fragments based on sequence homology. It was developed by Jinfeng Liu. It searches the query protein for homology to PDB domains (SCOP, CATH, and PrISM domains), Pfam domains, and SWISS-PROT proteins.

I have developed a related method, CHOPnet, which is one of the first de novo domain boundary prediction methods based on an artificial neural network (Liu J & Rost B 2004 Nucleic Acids Res 32: 3522-3530).

I've tried to combine these two methods into a single package 'profchop'.

References

  • Liu J & Rost B (2004) CHOP proteins into structural domain-like fragments. Proteins, 55(3):678-688.
  • Liu J & Rost B (2004) CHOP: Domain Dissection Based on Homology. Nucleic Acids Research, submitted.

Installation with aptitude (Debian, Ubuntu, etc.)

Software Installation

  1. If you have not already done so, add the rostlab repository to your package manager's sources list, as described in Debian_repository#sources.list.d.
  2. aptitude update
  3. aptitude (search for the rostlab keyring package, mark it with '+', and press 'g' twice to install it)
  4. aptitude update (to refresh the list of available rostlab packages)
  5. aptitude install profchop. A step-by-step guide is available at Debian_repository#Installing_a_package_step_by_step.

Running CHOPPER

Please see the CHOPPER man page:

 man profchop

Availability/Web server

This program is currently available only as a standalone package from our Debian repository.

Help

Here is an excerpt from the README file in the package.

CHOPPER is a prediction method for protein domain boundaries. It has two components:
a homology-based method (CHOP) and a neural network approach (CHOPnet). CHOPPER
takes a FASTA sequence as input and generates XML output of the domain prediction.
With the -of option it can also output an ASCII table, an HTML table, or the CASP DP format.
The user has the option to turn off either CHOP ( -nochop ) or CHOPnet ( -nochopnet ).
Using ' -h ' on the command line prints the detailed options shown below:


chopper.pl: running CHOP and CHOPnet for domain prediction
Usage: chopper.pl [options] -i in_file -o out_file
  Opt:  -h            print this help
        -i <file>     input file (REQUIRED)
        -o <file>     output file (REQUIRED)
        -of <string>  format of the output (xml|casp|txt|html), default=xml
        -keepxml      always keep XML output (default=TRUE)
        -id <string>  identifier of input protein (default: taken from input fasta)
        -(no)chop     run CHOP prediction (default=TRUE)
        -(no)chopnet  run CHOPnet prediction (default=TRUE)
        -(no)debug    print debug info(default=nodebug)
        -printconf    print all current options and exit
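
As a rough illustration of how the options above combine, the following Python sketch assembles a command line for one run. It uses only flags that appear in the help text. The helper name build_chopper_command and the file names are made up for this example, and the executable name chopper.pl is taken from the usage line; the Debian package may install it under a different name (for example profchop), so treat that default as an assumption.

```python
from typing import List, Optional

# Output formats listed for the -of option in the help text.
OUTPUT_FORMATS = ("xml", "casp", "txt", "html")


def build_chopper_command(
    in_file: str,
    out_file: str,
    out_format: str = "xml",
    run_chop: bool = True,
    run_chopnet: bool = True,
    protein_id: Optional[str] = None,
    executable: str = "chopper.pl",  # assumption: name copied from the usage line
) -> List[str]:
    """Return the argument list for one CHOPPER run (hypothetical helper)."""
    if out_format not in OUTPUT_FORMATS:
        raise ValueError(f"unknown output format: {out_format!r}")
    command = [executable, "-i", in_file, "-o", out_file, "-of", out_format]
    if not run_chop:
        command.append("-nochop")      # skip the homology-based CHOP step
    if not run_chopnet:
        command.append("-nochopnet")   # skip the neural network CHOPnet step
    if protein_id is not None:
        command += ["-id", protein_id]
    return command


if __name__ == "__main__":
    # Homology-only run that writes the plain-text table.
    print(" ".join(build_chopper_command("query.fasta", "query.txt",
                                         out_format="txt", run_chopnet=False)))
```

The returned list can be passed to subprocess.run on a machine where the package is installed; building the list separately keeps the option handling easy to check without running the tool.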


Input sequence

>YOL113W SKM1, Chr XV from 104325-106292
MKGVKKEGWISYKVDGLFSFLWQKRYLVLNDSYLAFYKSDKCNEEPVLSVPLTSITNVSR
IQLKQNCFEILRATDQKENISPINSYFYESNSKRSIFISTRTERDLHGWLDAIFAKCPLL
SGVSSPTNFTHKVHVGFDPKVGNFVGVPDSWAKLLQTSEITYDDWNRNSKAVIKALQFYE
DYNGLDTMQFNDHLNTSLDLKPLKSPTRYIINKRTNSIKRSVSRTLRKGKTDSILPVYQS
ELKPFPRPSDDDYKFTNIEDNKVREEGRVHVSKESTADSQTKQLGKKEQKVIQSHLRRHD
NNSTFRPHRLAPSAPATKNHDSKTKWHKEDLLELKNNDDSNEIIMKMKTVAIDVNPRPYF
QLVEKAGQGASGAVYLSKRIKLPQENDPRFLKSHCHRVVGERVAIKQIRLSEQPKKQLIM
NELLVMNDSRQENIVNFLEAYIIDDEELWVIMEYMEGGCLTDILDAVARSNTGEHSSPLN
ENQMAYIVKETCQGLKFLHNKKIIHRDIKSDNILLNSQGLVKITDFGFCVELTEKRSKRA
TMVGTPYWMAPEIVNQKGYDEKVDVWSLGIMLIEMIEGEPPYLNEDPLKALYLIANNGSP
KLRHPESVSKQTKQFLDACLQVNVESRASVRKLLTFEFLSMACSPEQLKVSLKWH
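
Before running a prediction it can be useful to check the input file. The sketch below reads FASTA text and reports each record's first header word and sequence length, which can then be compared with the "# Query" and "# Length" lines of the text output. The function names are hypothetical, and whether CHOPPER derives its default identifier from the first header word in exactly this way is an assumption.

```python
from typing import List, Tuple


def read_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs."""
    records: List[Tuple[str, List[str]]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; keep the header without the '>' marker.
            records.append((line[1:], []))
        elif records:
            # Sequence lines are concatenated until the next header.
            records[-1][1].append(line)
    return [(header, "".join(chunks)) for header, chunks in records]


def first_token(header: str) -> str:
    """First whitespace-delimited word of a header line."""
    words = header.split()
    return words[0] if words else ""


if __name__ == "__main__":
    example = ">demo toy record\nMKGVKKEGWI\nSYKVDGLFSF\n"
    for header, sequence in read_fasta(example):
        print(first_token(header), len(sequence))
```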

TEXT output

Result of CHOP prediction (Jinfeng Liu & Burkhard Rost)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jinfeng Liu & Burkhard Rost
Proteins. (2004) in press
________________________________________________________
# Query  : yol113w
# Length : 655
Fragments    Homologue(region)      E_value      Method
----------   --------------------   ----------   --------------------
4-118        PF00169(1-92)          7.5e-19      HMMER/Pfam_ls
123-161      1eesB1(8-46)           7e-09        BLAST/prism_trim
162-363      NULL                   NULL         NULL
364-453      1f3mC1(5-76)           2e-15        BLAST/prism_trim
479-523      1erk_2(12-56)          4e-04        BLAST/prism_trim
548-633      1bygA3(10-94)          3e-04        BLAST/prism_trim
//
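
For downstream scripts, the text table can be read back with a few lines of Python. This is a minimal sketch that assumes the column layout shown above (fragment range, homologue, E-value, method, with NULL marking fragments that have no homologue); the Fragment type and parse_chop_text function are names invented for this example, not part of the package.

```python
import re
from typing import List, NamedTuple, Optional


class Fragment(NamedTuple):
    start: int
    end: int
    homologue: Optional[str]
    evalue: Optional[float]
    method: Optional[str]


# A data row looks like: "4-118   PF00169(1-92)   7.5e-19   HMMER/Pfam_ls"
ROW = re.compile(r"^(\d+)-(\d+)\s+(\S+)\s+(\S+)\s+(\S+)$")

SAMPLE = """\
# Query  : yol113w
# Length : 655
Fragments    Homologue(region)      E_value      Method
----------   --------------------   ----------   --------------------
4-118        PF00169(1-92)          7.5e-19      HMMER/Pfam_ls
123-161      1eesB1(8-46)           7e-09        BLAST/prism_trim
162-363      NULL                   NULL         NULL
364-453      1f3mC1(5-76)           2e-15        BLAST/prism_trim
479-523      1erk_2(12-56)          4e-04        BLAST/prism_trim
548-633      1bygA3(10-94)          3e-04        BLAST/prism_trim
//
"""


def parse_chop_text(report: str) -> List[Fragment]:
    """Extract fragment rows from the text report, skipping all other lines."""
    fragments = []
    for line in report.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            continue  # header, separator, comment, or terminator line
        start, end, homologue, evalue, method = match.groups()
        fragments.append(Fragment(
            int(start),
            int(end),
            None if homologue == "NULL" else homologue,
            None if evalue == "NULL" else float(evalue),
            None if method == "NULL" else method,
        ))
    return fragments


if __name__ == "__main__":
    for fragment in parse_chop_text(SAMPLE):
        print(fragment.start, fragment.end, fragment.homologue)
```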


HTML output

Fragments    Homologue(region)    E value    Method    Database
4-118        PF00169(1-92)        7.5e-19    HMMER     Pfam_ls
123-161      1eesB1(8-46)         7e-09      BLAST     prism_trim
162-363
364-453      1f3mC1(5-76)         2e-15      BLAST     prism_trim
479-523      1erk_2(12-56)        4e-04      BLAST     prism_trim
548-633      1bygA3(10-94)        3e-04      BLAST     prism_trim

Information

CHOP has been applied to more than 60 completely sequenced proteomes. Here are some statistics.

Organism                              Proteins  Fragments  Chopped proteins  Single-domain chopped proteins
Aeropyrum pernix K1                   2692      3931       979(36%)          270(27%)
Archaeoglobus fulgidus                2394      4240       1583(66%)         561(35%)
Halobacterium sp. (strain NRC-1)      2058      3757       1295(62%)         383(29%)
Methanosarcina acetivorans            4532      8592       2611(57%)         707(27%)
Methanococcus jannaschii              1762      3142       1196(67%)         430(35%)
Methanopyrus kandleri                 1683      2853       1019(60%)         348(34%)
Methanobacterium thermoautotrophicum  1862      3388       1249(67%)         427(34%)
Pyrococcus abyssi                     1763      3273       1322(74%)         451(34%)
Pyrococcus furiosus                   2061      3647       1401(67%)         468(33%)
Pyrococcus horikoshii                 2063      3431       1163(56%)         371(31%)
Sulfolobus solfataricus               943       1568       571(60%)          182(31%)
Sulfolobus tokodaii                   2826      4497       1520(53%)         514(33%)
Thermoplasma acidophilum              1475      2633       1003(68%)         332(33%)
Thermoplasma volcanium                1525      2686       1001(65%)         316(31%)
Aquifex aeolicus                      1522      3039       1202(78%)         397(33%)
Bacillus subtilis                     4093      7611       2632(64%)         791(30%)
Bifidobacterium longum                1728      3676       1260(72%)         259(20%)
Borrelia burgdorferi                  850       1671       572(67%)          168(29%)
Brucella melitensis                   2059      3980       1394(67%)         363(26%)
Campylobacter jejuni                  1633      3001       1137(69%)         382(33%)
Caulobacter crescentus                3737      7117       2466(65%)         653(26%)
Chlamydia pneumoniae                  1052      1938       632(60%)          197(31%)
Chlorobium tepidum                    2251      4182       1407(62%)         411(29%)
Chlamydia trachomatis                 894       1765       610(68%)          183(30%)
Clostridium acetobutylicum            3843      7350       2515(65%)         669(26%)
Clostridium perfringens               2723      5241       1869(68%)         532(28%)
Deinococcus radiodurans               3102      5883       1996(64%)         488(24%)
Escherichia coli                      4280      8225       3044(71%)         903(29%)
Fusobacterium nucleatum               2058      3742       1320(64%)         407(30%)
Haemophilus influenzae                1708      3320       1312(76%)         463(35%)
Helicobacter pylori                   1549      2803       1033(66%)         341(33%)
Lactococcus lactis (subsp. lactis)    2266      4153       1539(67%)         484(31%)
Leptospira interrogans                4726      7569       1978(41%)         501(25%)
Listeria innocua                      2968      5454       2012(67%)         638(31%)
Listeria monocytogenes                2845      5402       2065(72%)         669(32%)
Mycoplasma genitalium                 470       991        367(78%)          107(29%)
Mycobacterium leprae                  1605      3346       1109(69%)         254(22%)
Mycoplasma pneumoniae                 688       1386       484(70%)          128(26%)
Mycobacterium tuberculosis            4186      8233       2566(61%)         568(22%)
Neisseria meningitidis                2061      3688       1267(61%)         386(30%)
Oceanobacillus iheyensis              3491      6540       2407(68%)         748(31%)
Pasteurella multocida                 2014      4057       1633(81%)         556(34%)
Pseudomonas aeruginosa                5562      11058      4032(72%)         1107(27%)
Rickettsia conorii                    1374      2283       702(51%)          241(34%)
Rickettsia prowazekii                 834       1678       642(76%)          223(34%)
Staphylococcus aureus                 2622      4785       1752(66%)         571(32%)
Streptomyces coelicolor               7889      15121      4888(61%)         980(20%)
Streptococcus pyogenes                1845      3395       1230(66%)         387(31%)
Synechococcus elongatus               2473      4928       1688(68%)         458(27%)
Synechocystis PCC6803                 3166      6356       2123(67%)         582(27%)
Thermotoga maritima                   1844      3608       1391(75%)         440(31%)
Treponema pallidum                    1031      2104       693(67%)          173(24%)
Ureaplasma urealyticum                611       1129       380(62%)          104(27%)
Vibrio cholerae                       2735      5521       1966(71%)         561(28%)
Xanthomonas campestris (pv. citri)    4312      8271       2819(65%)         740(26%)
Xylella fastidiosa                    2766      4738       1389(50%)         355(25%)
Arabidopsis thaliana                  25528     61241      16619(65%)        1716(10%)
Caenorhabditis elegans                20244     45427      11736(57%)        1530(13%)
Drosophila melanogaster               14314     33601      8310(58%)         850(10%)
Saccharomyces cerevisiae              6349      13334      3474(54%)         466(13%)
Homo sapiens                          36750     93619      22733(61%)        3098(13%)
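
The percentages in the table can be reproduced from the raw counts. Checking a few rows suggests that the chopped percentage is taken relative to all proteins, that the single-domain percentage is taken relative to the chopped proteins, and that both are truncated rather than rounded. This is an inference from the numbers, not something stated on this page, and the function names below are made up for the sketch.

```python
from typing import Dict


def truncated_percent(part: int, whole: int) -> int:
    """Integer percentage of part in whole, truncated (not rounded)."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part * 100 // whole


def summarize(proteins: int, chopped: int, single_domain: int) -> Dict[str, int]:
    """Recompute the two percentage columns for one organism."""
    return {
        # Assumption: chopped proteins are counted against all proteins.
        "chopped_pct": truncated_percent(chopped, proteins),
        # Assumption: single-domain proteins are counted against chopped proteins.
        "single_domain_pct": truncated_percent(single_domain, chopped),
    }


if __name__ == "__main__":
    # Counts for Aeropyrum pernix K1, copied from the table.
    print(summarize(2692, 979, 270))
    # → {'chopped_pct': 36, 'single_domain_pct': 27}
```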

Questions

Please see the FAQ section or contact the maintainer.
