| Title: | Supporting online material |
| Author: | Jinfeng Liu & Burkhard Rost |
| Quote: | Proteins, 2004, vol, pages |
Supporting online material
for:
CHOP proteins into structural domain-like fragments
| 1 | CUBIC, Dept. of Biochemistry and Molecular Biophysics, Columbia University, 650 West 168th Street BB217, New York, NY 10032, USA |
| 2 | Columbia University Center for Computational Biology and Bioinformatics (C2B2), Russ Berrie Pavilion, 1150 St. Nicholas Avenue, New York, NY 10032, USA |
| 3 | North East Structural Genomics Consortium (NESG), Department of Biochemistry and Molecular Biophysics, Columbia University, 650 West 168th Street BB217, New York, NY 10032, USA |
| 4 | Dept. of Pharmacology, Columbia Univ., 630 West 168th Street, New York, NY 10032, USA |
| * | Corresponding authors: cubic@cubic.bioc.columbia.edu URL http://cubic.bioc.columbia.edu/ Tel: +1-212-305-4018, fax: +1-212-305-7932 |
Fig. S1: CHOP was robust with respect to parameter choices. The two major free parameters of CHOP are (1) the required coverage of a domain annotated by PrISM or Pfam-A (values from 0.7 to 0.9, i.e. 70-90% of the annotated domain has to be sequence-similar), and (2) the threshold for sequence similarity (shown: BLAST E-values from 0.1 to 0.001, i.e. matches are accepted if they are more similar than this threshold). The data shown were compiled for the yeast proteome. The maximum number (14,047) was obtained at a coverage of 80% and an E-value of 0.1 (right dashed line); the minimum was 13,915 (left dashed line). For the parameters we used (coverage of 80% and E-value of 0.01, central solid line), the number was 13,988. Thus, the results for CHOP varied by less than 1% throughout this extensive parameter range.
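The two thresholds described for Fig. S1 can be read as a simple acceptance rule. The following is a minimal Python sketch, not the CHOP implementation: the function name `accept_match`, its arguments, and the use of inclusive comparisons are assumptions made for illustration. The second part reproduces the arithmetic behind the "less than 1%" statement from the three counts quoted in the caption.

```python
def accept_match(aligned_length, domain_length, e_value,
                 min_coverage=0.8, max_e_value=0.01):
    """Return True if a hit covers enough of the annotated domain
    and is similar enough.

    Hypothetical helper: the names and the inclusive comparisons
    are assumptions, not taken from the CHOP source.
    """
    if domain_length <= 0:
        raise ValueError("domain_length must be positive")
    coverage = aligned_length / domain_length
    return coverage >= min_coverage and e_value <= max_e_value


# Counts quoted in the caption of Fig. S1 (yeast proteome).
count_max = 14047   # coverage 80%, E-value 0.1
count_min = 13915   # minimum over the tested range
count_used = 13988  # coverage 80%, E-value 0.01 (parameters used)

# Spread of the counts relative to the count at the chosen parameters.
relative_spread = (count_max - count_min) / count_used

print(accept_match(aligned_length=85, domain_length=100, e_value=0.005))
print(f"relative spread: {relative_spread:.2%}")
```

The spread is measured here against the count at the chosen parameters; measuring the deviation of each extreme from that count separately gives smaller values still.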
Fig. S2: PrISM and Pfam-A predicted similar numbers of domains for the same proteins. 6,349 yeast proteins were chopped according to PrISM alone or Pfam-A alone and compared for the number of predicted domains. 1,521 of them could be dissected by either method. In 40% of the cases, the predictions from PrISM and Pfam-A were the same. Cumulatively, the difference was smaller than three domains for 90% of the proteins.
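The comparison summarized for Fig. S2 amounts to tallying, per protein, the difference between the two domain counts. A minimal sketch in Python follows; the function `compare_domain_counts` and the toy input are hypothetical and do not reproduce the yeast data.

```python
from collections import Counter


def compare_domain_counts(prism_counts, pfam_counts):
    """Summarize per-protein differences in predicted domain number.

    Both arguments map protein identifiers to a number of predicted
    domains. Only proteins present in both mappings are compared.
    Returns (fraction with identical counts,
             fraction with a difference smaller than three).
    """
    shared = sorted(set(prism_counts) & set(pfam_counts))
    if not shared:
        raise ValueError("no proteins in common")
    diffs = Counter(abs(prism_counts[p] - pfam_counts[p]) for p in shared)
    total = len(shared)
    identical = diffs[0] / total
    below_three = sum(n for d, n in diffs.items() if d < 3) / total
    return identical, below_three


# Toy input (made-up identifiers and counts, for illustration only).
prism = {"P1": 2, "P2": 1, "P3": 4, "P4": 3, "P5": 6}
pfam = {"P1": 2, "P2": 2, "P3": 1, "P4": 3, "P5": 5}

identical, below_three = compare_domain_counts(prism, pfam)
print(f"identical: {identical:.0%}, difference < 3: {below_three:.0%}")
```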
Fig. S3: Percentage of single-domain proteins in proteomes. As expected, eukaryotic proteomes (blue) have more multi-domain proteins than do prokaryotes (green) and archaea (red).
Fig. S4: CLUP versus other family-based databases. The distribution of CLUP lengths was most similar to that of the semi-automatic domain parsing method SBase: both slightly over-represented regions of ≤50 residues and over-represented (with respect to SCOP) fragments of more than 360 residues. Both also came closest to Pfam-A regions. For all public methods, the results were taken from the respective databases.
| Organism | Number of proteins | Number of CHOP fragments |
| Archaea | | |
| Aeropyrum pernix K1 | 1013 | 3931 |
| Archaeoglobus fulgidus | 2019 | 4240 |
| Halobacterium sp. (strain NRC-1) | 1338 | 3757 |
| Methanosarcina acetivorans | 2744 | 8592 |
| Methanococcus jannaschii | 1707 | 3142 |
| Methanopyrus kandleri | 1091 | 2853 |
| Methanobacterium thermoautotrophicum | 1335 | 3388 |
| Pyrococcus abyssi | 1379 | 3273 |
| Pyrococcus furiosus | 1469 | 3647 |
| Pyrococcus horikoshii | 1233 | 3431 |
| Sulfolobus solfataricus | 586 | 1568 |
| Sulfolobus tokodaii | 1590 | 4497 |
| Thermoplasma acidophilum | 1053 | 2633 |
| Thermoplasma volcanium | 1039 | 2686 |
| Prokaryotes | | |
| Aquifex aeolicus | 1390 | 3039 |
| Bacillus subtilis | 3266 | 7611 |
| Bifidobacterium longum | 1291 | 3676 |
| Borrelia burgdorferi | 646 | 1671 |
| Brucella melitensis | 1471 | 3980 |
| Campylobacter jejuni | 1198 | 3001 |
| Caulobacter crescentus | 2587 | 7117 |
| Chlamydia pneumoniae | 692 | 1938 |
| Chlorobium tepidum | 1465 | 4182 |
| Chlamydia trachomatis | 661 | 1765 |
| Clostridium acetobutylicum | 2655 | 7350 |
| Clostridium perfringens | 1987 | 5241 |
| Deinococcus radiodurans | 2058 | 5883 |
| Escherichia coli | 4089 | 8225 |
| Fusobacterium nucleatum | 1404 | 3742 |
| Haemophilus influenzae | 1694 | 3320 |
| Helicobacter pylori | 1088 | 2803 |
| Lactococcus lactis (subsp. lactis) | 1621 | 4153 |
| Leptospira interrogans | 2043 | 7569 |
| Listeria innocua | 2218 | 5454 |
| Listeria monocytogenes | 2211 | 5402 |
| Mycoplasma genitalium | 468 | 991 |
| Mycobacterium leprae | 1220 | 3346 |
| Mycoplasma pneumoniae | 683 | 1386 |
| Mycobacterium tuberculosis | 2865 | 8233 |
| Neisseria meningitidis | 1385 | 3688 |
| Oceanobacillus iheyensis | 2658 | 6540 |
| Pasteurella multocida | 1815 | 4057 |
| Pseudomonas aeruginosa | 4337 | 11058 |
| Rickettsia conorii | 806 | 2283 |
| Rickettsia prowazekii | 781 | 1678 |
| Staphylococcus aureus | 1903 | 4785 |
| Streptomyces coelicolor | 5102 | 15121 |
| Streptococcus pyogenes | 1301 | 3395 |
| Synechococcus elongatus | 1767 | 4928 |
| Synechocystis PCC6803 | 2216 | 6356 |
| Thermotoga maritima | 1446 | 3608 |
| Treponema pallidum | 955 | 2104 |
| Ureaplasma urealyticum | 454 | 1129 |
| Vibrio cholerae | 2135 | 5521 |
| Xanthomonas campestris (pv. citri) | 2988 | 8271 |
| Xylella fastidiosa | 1513 | 4738 |
| Eukaryotes | | |
| Arabidopsis thaliana | 16992 | 61241 |
| Caenorhabditis elegans | 12519 | 45427 |
| Drosophila melanogaster | 8762 | 33601 |
| Saccharomyces cerevisiae | 5447 | 13334 |
| Homo sapiens | 24383 | 93619 |
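One quantity that can be read off the table above is the average number of CHOP fragments per protein. The sketch below, in Python, computes it for a few rows copied from the table; the variable names are illustrative, and the ratio is a plain quotient of the two columns rather than a statistic reported by the authors.

```python
# (organism, number of proteins, number of CHOP fragments),
# copied from the table above.
rows = [
    ("Mycoplasma genitalium", 468, 991),
    ("Escherichia coli", 4089, 8225),
    ("Saccharomyces cerevisiae", 5447, 13334),
    ("Homo sapiens", 24383, 93619),
]

# Quotient of the two table columns for each organism.
fragments_per_protein = {
    organism: fragments / proteins
    for organism, proteins, fragments in rows
}

for organism, ratio in fragments_per_protein.items():
    print(f"{organism}: {ratio:.2f} fragments per protein")
```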
| Contact: rost@columbia.edu | Version: Dec 2, 2003 |