ProteinStructureFunction
This practical is designed to help you understand the function and structure of proteins and to practise methods for predicting both. More details can be found at the Rostlab website.
This page collects the presentation slides on the underlying theory, the protein assignments, and the reports on the results.
Contents
Theoretical background
Marco's introduction...
Paper presentations
- Rost B. Twilight zone of protein sequence alignments. Protein Engineering Design and Selection. 1999;12(2):85-94. Available at: http://peds.oxfordjournals.org/cgi/content/abstract/12/2/85. (covered in Marco's introduction)
- Kosloff M, Kolodny R. Sequence-similar, structure-dissimilar protein pairs in the PDB. Proteins. 2008;71(2):891-902. Available at: http://www.ncbi.nlm.nih.gov/pubmed/18004789. Presented on 27 April
- O’Sullivan O, Suhre K, et al. 3DCoffee: combining protein sequences and structures within multiple sequence alignments. Journal of Molecular Biology. 2004;340(2):385-95. Available at: http://www.ncbi.nlm.nih.gov/pubmed/15201059. Presented on 27 April
- Sali A, Blundell TL. Comparative protein modelling by satisfaction of spatial restraints. Journal of Molecular Biology. 1993;234:779-815. Available at: http://www.ncbi.nlm.nih.gov/pubmed/8254673. Presented on 11 May: File:MODELLER wellmann.pdf
- Krieger E, Joo K, Lee J, et al. Improving physical realism, stereochemistry, and side-chain accuracy in homology modeling: Four approaches that performed well in CASP8. Proteins. 2009;77 Suppl 9(S9):114-22. Available at: http://www3.interscience.wiley.com/journal/122540778/abstract. Presented on 11 May: File:Krieger2009 phyiscal realism thomas hopf.pdf
- Venclovas C, Margelevicius M. The use of automatic tools and human expertise in template-based modeling of CASP8 target proteins. Proteins. 2009;77 Suppl 9(S9):81-8. Available at: http://www3.interscience.wiley.com/journal/122462831/abstract. Presented on 18 May: File:Thomma template-based modeling hhpred.pdf
- Holm L, Sander C. Protein structure comparison by alignment of distance matrices. Journal of Molecular Biology. 1993;233(1):123-38. Available at: http://www.ncbi.nlm.nih.gov/pubmed/8377180. Presented on 18 May
- Hermann JC, Marti-Arbona R, Fedorov AA, et al. Structure-based activity prediction for an enzyme of unknown function. Nature. 2007;448(7155):775-9. Available at: http://www.ncbi.nlm.nih.gov/pubmed/17603473. Presented on 27 April: File:Paper presentation Structured-based activity prediction.pdf
- Pellegrini M, Marcotte EM, Thompson MJ, Eisenberg D, Yeates TO. Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. Proceedings of the National Academy of Sciences of the United States of America. 1999;96(8):4285-8. Available at: http://www.ncbi.nlm.nih.gov/pubmed/10200254. Presented on 1 June: File:ProteinPhylogeneticProfiles.pdf
- Todd A, Orengo C, Thornton J. Evolution of function in protein superfamilies, from a structural perspective. Journal of Molecular Biology. 2001;307(4):1113–1143. Available at: http://linkinghub.elsevier.com/retrieve/pii/S0022283601945139. Presented on 1 June
- Schlessinger A, Punta M, Yachdav G, Kajan L, Rost B. Improved disorder prediction by combination of orthogonal approaches. PLoS ONE. 2009;4(2):e4433. Available at: http://www.ncbi.nlm.nih.gov/pubmed/19209228. Presented on 1 June
Other literature
Andrea maintains a list of papers relevant for this practical in Mendeley. You do not have to read all of them, but the list offers starting points if you would like to dig deeper into a particular field.
Resources
Here we will collect resources you can use for the tasks in this practical.
Function prediction
Alignments
- STRAP structural alignments online server
- T-Coffee (also installed on the machines in the student cluster; see the T-Coffee tutorial for help on how to run it)
Structure prediction
List of protein structure prediction software
- Modeller is installed on the student cluster machines under '/usr/lib/modeller9v8/'.
- Example scripts are in the examples directory of that installation. You cannot run them in place because you do not have write permission there, so copy them to your home directory first.
- When you run Modeller, use the Python installation provided by the Linux system, not the one provided by Modeller. (You do this by using 'python scriptname' instead of 'mod... scriptname'.)
- The databases used by Modeller are in /mnt/opt/ModellerDatabases/. You can create links to those databases in your working directory.
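The setup described above (copy the examples, link the databases) can be sketched in Python. This is a minimal sketch, not an official setup script: the two default paths are the ones quoted above and only exist on the student cluster, while the function name `prepare_workdir` and its parameters are made up for illustration.

```python
import os
import shutil

# Paths quoted from the text above; they only exist on the student cluster.
MODELLER_EXAMPLES = "/usr/lib/modeller9v8/examples"
MODELLER_DATABASES = "/mnt/opt/ModellerDatabases"


def prepare_workdir(workdir, examples_dir=MODELLER_EXAMPLES,
                    databases_dir=MODELLER_DATABASES):
    """Copy the example scripts and link the databases into workdir.

    Hypothetical helper: returns the list of symbolic links it created.
    """
    os.makedirs(workdir, exist_ok=True)

    # Copy the examples so that Modeller can write its output files
    # next to the scripts (the original directory is not writable).
    target = os.path.join(workdir, "examples")
    if not os.path.exists(target):
        shutil.copytree(examples_dir, target)

    # Link each database entry instead of copying it.
    links = []
    for name in sorted(os.listdir(databases_dir)):
        link = os.path.join(workdir, name)
        if not os.path.lexists(link):
            os.symlink(os.path.join(databases_dir, name), link)
            links.append(link)
    return links
```

After calling `prepare_workdir` with a directory in your home, change into the copied examples directory and start a script with 'python scriptname'. Calling the function a second time leaves existing copies and links untouched.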
Practical tasks
Here we will also collect information about the proteins we will use for function and structure predictions, "homework" we give you and reports about the methods that have been applied.
Function prediction
Homework for the Bioinformatics Praktikum “Protein Structure and Function Analysis”. Note: in your answers, detail the procedures and protocols you used to reach your conclusions. We suggest MarkUs as a starting point, but we also encourage you to use other prediction methods, servers and databases. Sometimes reading the original publications, e.g. for a protein structure, is the only way to find out whether your hypothesis makes sense.
Prologue
Note: getting the results from MarkUs may take 1-2 days, so submit your protein structure to the server as early as possible.
- First READ the MarkUs tutorial
- Then, go to the Protein Data Bank and query it using the following TARGET sequence (suggestion: go to Advanced Search and look into Sequence Features when selecting your query type):
MAYWLMKSEPDEFSISDLQRLGKARWDGVRNYQARNFLRTMAEGDEFFFYHSSCPEPGIAGIGKIVKTAYPDPTALDPDSHYHDAKATTEKNPWSALDIGFVDIFKNVLGLGYLKQQSQLEQLPLVQKGSRLSVMPVTAEQWAAILALR
- Download the structure file (PDB format, plain text) of the TARGET sequence.
- Go to the MarkUs submission page
- Upload the PDB file of the TARGET sequence and, after the upload, check the Dali box under Structure Analysis. Run MarkUs (again, this may take 1-2 days).
- Once you get the full results from MarkUs you can start addressing the following:
Tasks
- Look at the sequence neighbors: is there any functionally annotated sequence neighbor? Is there any sequence signature (pattern of conserved residues) that you can identify in the TARGET sequence family? Can you make any hypothesis about the TARGET function based on sequence information?
- Now work on the structure: are there interesting cavities? Are there interesting ligands co-crystallized with the target protein? Where do they bind? Are they likely to be functional ligands?
- Look at the structurally similar proteins: are some of them functionally annotated? Which ones? What function(s) do they have? Are annotated functional residues found in these proteins also conserved in the target sequence and within its sequence family? Are there functional ligands that could fit into the target cavities?
- Consider all structurally similar proteins: how do they differ and how do they cluster together (look at topology, cavities, overall 3-D structure, conserved residues, functional information, etc.)? Try to cluster them into different ‘groups’ and explain how each group is defined with respect to the others.
- Summary: what is your prediction for the target protein function? Please discuss all the evidence you collected and how you reached your conclusions.
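For the first task, one simple way to look for a sequence signature is to list the columns of a multiple sequence alignment in which a single residue dominates. The sketch below assumes you already have the aligned sequences as equal-length strings with '-' for gaps (for example exported from your alignment tool). The function name, the threshold parameter and the toy alignment are invented for illustration and are not part of MarkUs or of the TARGET family.

```python
from collections import Counter

GAP = "-"


def conserved_columns(alignment, min_fraction=1.0):
    """Return (column, residue, fraction) for conserved alignment columns.

    alignment: list of equal-length aligned sequences, gaps written as '-'.
    Columns are numbered from 1. A column is reported when its most common
    residue occurs in at least min_fraction of all sequences; gaps are
    never reported as conserved residues.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    hits = []
    for i in range(length):
        column = [seq[i].upper() for seq in alignment]
        counts = Counter(res for res in column if res != GAP)
        if not counts:
            continue  # column consists of gaps only
        residue, count = counts.most_common(1)[0]
        fraction = count / len(column)
        if fraction >= min_fraction:
            hits.append((i + 1, residue, fraction))
    return hits


# Toy alignment, made up for illustration only.
toy = ["MKHD-G", "MRHDAG", "MKHE-G"]
print(conserved_columns(toy))
```

Lowering `min_fraction` (for example to 0.8) also reports columns that are mostly, but not strictly, conserved. Fully conserved columns are only candidates for functional residues; they still need to be checked against the structure and the annotations of related proteins.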
Using Modeller
- Run (at least the first two steps of) the Modeller tutorial. Run the scripts using "python scriptname" instead of "modnvn". You can find example scripts in /usr/lib/modeller9v8/examples; copy them to your home directory first, since you do not have write permission in that directory.
- The databases used by Modeller are in /mnt/opt/ModellerDatabases/. You can create links to those databases in your working directory.
- Build models of thrombin (THRB_HUMAN):
- based on individual templates of different sequence similarities (ca. 90%, ca. 60%, ca. 30%),
- based on multiple templates having the same sequence similarity.
- Compare the models to the known thrombin structures.
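To pick templates near 90%, 60% and 30%, it helps to compute a number from each target-template alignment. The sketch below uses percent identity over the aligned (non-gap) columns as a simple stand-in for "sequence similarity"; this choice is an assumption, and other definitions (for example dividing by the length of the shorter sequence) give somewhat different values. The function names and the template labels are hypothetical.

```python
def percent_identity(aligned_target, aligned_template):
    """Percent identity over the columns where neither sequence has a gap."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    aligned = 0
    identical = 0
    for a, b in zip(aligned_target, aligned_template):
        if a == "-" or b == "-":
            continue  # skip columns with a gap in either sequence
        aligned += 1
        if a.upper() == b.upper():
            identical += 1
    if aligned == 0:
        return 0.0
    return 100.0 * identical / aligned


def closest_template(identities, wanted):
    """Return the template whose identity is nearest to the wanted percentage.

    identities: dict mapping a template label to its percent identity.
    """
    return min(identities, key=lambda name: abs(identities[name] - wanted))


# Toy pairwise alignment, made up for illustration only.
print(percent_identity("ACDE-F", "ACDQGF"))
```

Given a dictionary of identities for your candidate templates, `closest_template(identities, 60)` returns the label of the template to use for the "ca. 60%" model.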
CASP
- Register for CASP as an individual and as a member of the group 'TUM_Bioinf_Praktikum'.
- Find out as much as you can about one of the targets in CASP:
- run BLAST/PSI-BLAST against UniProt and PDB
- run the Modeller workflow
- examine the alignment
- compare with some automatic models