ProteinStructureFunction

From Rost Lab Open

This practical is designed to help you understand the function and structure of a protein and to practice methods for predicting both. More details can be found at the Rostlab website.

This website will collect the slides from the presentations on the underlying theory, the protein assignments, and the reports on the results.

Theoretical background

Marco's introduction...

Paper presentations

  • O’Sullivan O, Suhre K, et al. 3DCoffee: combining protein sequences and structures within multiple sequence alignments. Journal of Molecular Biology. 2004;340(2):385-95. Available at: http://www.ncbi.nlm.nih.gov/pubmed/15201059. Presented on 27 April
  • Schlessinger A, Punta M, Yachdav G, Kajan L, Rost B. Improved disorder prediction by combination of orthogonal approaches. PLoS ONE. 2009;4(2):e4433. Available at: http://www.ncbi.nlm.nih.gov/pubmed/19209228. Presented on 1 June

Other literature

Andrea maintains a list of papers relevant to this practical in Mendeley. You do not have to read all of them, but they offer starting points if you would like to dig deeper into a particular field.


Resources

Here we will collect resources you can use for the tasks in this practical.

Function prediction

Alignments

  • STRAP structural alignments online server
  • T-Coffee (also installed on the machines in the student cluster; see the T-Coffee tutorial for help on how to run it)

Structure prediction

List of protein structure prediction software

  • Modeller is installed on the student cluster machines under '/usr/lib/modeller9v8/'.
    • Example scripts are in the examples subdirectory. You do not have write permission there, so you cannot run them in place; copy them to your home directory first.
    • When you run Modeller, use the Python installation provided by the Linux installation, not the one provided by Modeller. (You do this by using 'python scriptname' instead of 'mod... scriptname'.)
    • The databases used by Modeller are in /mnt/opt/ModellerDatabases/. You can create links to those databases in your working directory.
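
The setup described above (copy the examples, link the databases) could be scripted roughly as follows. This is only a sketch: the two default paths are the ones quoted on this page, while the function name `prepare_workdir` and the layout of the working directory are assumptions made for illustration.

```python
import os
import shutil

# Paths as quoted on this page; adjust them if your machine differs.
MODELLER_EXAMPLES = "/usr/lib/modeller9v8/examples"
MODELLER_DATABASES = "/mnt/opt/ModellerDatabases"


def prepare_workdir(workdir, examples_dir=MODELLER_EXAMPLES,
                    db_dir=MODELLER_DATABASES):
    """Copy the example scripts into workdir and symlink each database file.

    Hypothetical helper: returns the list of link paths found or created.
    """
    os.makedirs(workdir, exist_ok=True)

    # The examples directory is read-only for students, so work on a copy.
    examples_copy = os.path.join(workdir, "examples")
    if not os.path.exists(examples_copy):
        shutil.copytree(examples_dir, examples_copy)

    # Databases can be large, so link them instead of copying them.
    links = []
    for name in sorted(os.listdir(db_dir)):
        link = os.path.join(workdir, name)
        if not os.path.lexists(link):
            os.symlink(os.path.join(db_dir, name), link)
        links.append(link)
    return links
```

Calling `prepare_workdir(os.path.expanduser("~/modeller_work"))` (a directory name chosen here only as an example) would leave a writable copy of the examples and one symbolic link per database entry in that directory. Calling it again is harmless because existing copies and links are skipped.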

Practical tasks

Here we will also collect information about the proteins we will use for function and structure predictions, the "homework" we give you, and reports on the methods that have been applied.

Function prediction

Homework for the Bioinformatics Praktikum “Protein Structure and Function Analysis”.

Note: in your answers, detail the procedures and protocols you used to reach your conclusions. We suggest that you use MarkUs as a starting point, but we also encourage you to use other prediction methods, servers, and databases. Remember that sometimes reading the original publications, e.g. for a protein structure, is the only way to find out whether your hypothesis makes sense.

Prologue

Note: getting the results from MarkUs may take 1-2 days, so the sooner you submit your protein structure to the server, the better.

  • First READ the MarkUs tutorial
  • Then, go to the Protein Data Bank and query it using the following TARGET sequence (suggestion: go to Advanced Search and look into Sequence Features when selecting your query type):

MAYWLMKSEPDEFSISDLQRLGKARWDGVRNYQARNFLRTMAEGDEFFFYHSSCPEPGIAGIGKIVKTAYPDPTALDPDSHYHDAKATTEKNPWSALDIGFVDIFKNVLGLGYLKQQSQLEQLPLVQKGSRLSVMPVTAEQWAAILALR

  • Download the structure file (PDB format, text) of the TARGET sequence.
  • Go to the MarkUs submission page
  • Upload the PDB file of the TARGET sequence and, after the upload, check the Dali box in Structure Analysis. Run MarkUs (again, this may take 1-2 days).
  • Once you get the full results from MarkUs you can start addressing the following:

Tasks

  1. Look at the sequence neighbors: is there any functionally annotated sequence neighbor? Is there any sequence signature (pattern of conserved residues) that you can identify in the TARGET sequence family? Can you make any hypothesis on the TARGET function based on sequence information?
  2. Now work on the structure: are there interesting cavities? Are there interesting ligands co-crystallized with the target protein? Where do they bind? Are they likely to be functional ligands?
  3. Look at the structurally similar proteins: are some of them functionally annotated? Which ones? What function(s) do they have? Are annotated functional residues found in these proteins also conserved in the target sequence and within its sequence family? Are there functional ligands that could fit into the target cavities?
  4. Consider all structurally similar proteins: how do they differ and how do they cluster together (look at topology, cavities, overall 3-D structure, conserved residues, functional information, etc.)? Try to cluster them into different ‘groups’ and explain how each group is defined with respect to the others.
  5. Summary: what is your prediction for the target protein function? Please discuss all the evidence you collected and how you reached your conclusions.
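
For task 1, a first look at possible sequence signatures can be taken by scanning a multiple sequence alignment for highly conserved columns. The following is a minimal sketch, not part of MarkUs or T-Coffee: the function name, the 0.9 threshold, and the toy alignment are assumptions chosen for illustration, and gaps are counted against conservation.

```python
from collections import Counter


def conserved_columns(aligned_seqs, threshold=0.9):
    """Return (column, residue, fraction) for strongly conserved columns.

    aligned_seqs: equal-length strings from one alignment, '-' for gaps.
    Columns are numbered from 1. The fraction is taken over all sequences,
    so a gap in a column lowers its conservation.
    """
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")

    hits = []
    for col in range(length):
        counts = Counter(seq[col].upper() for seq in aligned_seqs
                         if seq[col] != "-")
        if not counts:
            continue  # column contains only gaps
        residue, count = counts.most_common(1)[0]
        fraction = count / len(aligned_seqs)
        if fraction >= threshold:
            hits.append((col + 1, residue, fraction))
    return hits


# Toy alignment, invented for illustration only.
toy = ["MKSEPD", "MKSDPD", "MRSEPD", "MK-EPD"]
print(conserved_columns(toy))  # [(1, 'M', 1.0), (5, 'P', 1.0), (6, 'D', 1.0)]
```

Column numbers refer to the alignment, not to the TARGET sequence, so they need to be mapped back to TARGET residue numbers before comparing them with cavities or annotated functional residues.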

Using Modeller

  • Run at least the first two steps of the Modeller tutorial. Run the scripts using "python scriptname" instead of "modnvn". You can find example scripts in /usr/lib/modeller9v8/examples; copy them to your home directory first, since you do not have write permission in that directory.
  • The databases used by Modeller are in /mnt/opt/ModellerDatabases/. You can create links to those databases in your working directory.
  • Build models of thrombin (THRB_HUMAN):
    • based on individual templates of different sequence similarities (ca. 90%, ca. 60%, ca. 30%)
    • based on multiple templates having the same sequence similarity
  • Compare the models to the thrombin structures
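
To sort candidate templates into the ca. 90%, ca. 60%, and ca. 30% groups, one simple proxy is the percent identity of each template's pairwise alignment with the target. The sketch below uses identity rather than a similarity score, and it divides by the number of columns in which both sequences have a residue; other definitions exist and give somewhat different numbers. Both function names are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over columns where both sequences have a residue.

    aligned_a, aligned_b: the two rows of a pairwise alignment, '-' for gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    aligned = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns with a gap in either sequence
        aligned += 1
        if a == b:
            matches += 1
    if aligned == 0:
        return 0.0
    return 100.0 * matches / aligned


def nearest_bin(identity, bins=(90, 60, 30)):
    """Return the bin (in percent) closest to the given identity."""
    return min(bins, key=lambda b: abs(b - identity))


# Toy alignment rows, invented for illustration only.
print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))  # 90.0
print(nearest_bin(42.0))                             # 30
```

The aligned rows would come from whatever alignment you produce for the target and a template, for example with T-Coffee or in the Modeller workflow.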

CASP

  • Find out as much as you can about one of the targets in CASP:
    • BLAST/PSI-BLAST against UniProt and the PDB
    • run the Modeller workflow
    • examine the alignment
    • compare with some automatic models
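
When comparing your model with automatic models or with an experimental structure, a rough first number is the RMSD over matching C-alpha atoms. The sketch below reads ATOM records from PDB-format text using the standard fixed columns. It rests on two strong assumptions: both structures are already superposed in the same coordinate frame, and both use the same chain IDs and residue numbering. It does not perform a superposition itself, and the function names are hypothetical.

```python
import math


def ca_coordinates(pdb_text):
    """Map (chain, residue number, insertion code) -> (x, y, z) for CA atoms.

    Only ATOM records of the first model are read; if a residue has several
    alternate locations, the first one listed is kept.
    """
    coords = {}
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break  # stop after the first model
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            key = (line[21], int(line[22:26]), line[26])
            if key not in coords:
                coords[key] = (float(line[30:38]),
                               float(line[38:46]),
                               float(line[46:54]))
    return coords


def ca_rmsd(coords_a, coords_b):
    """Return (RMSD, number of shared CA atoms) without any superposition."""
    shared = sorted(set(coords_a) & set(coords_b))
    if not shared:
        raise ValueError("no CA atoms in common")
    total = 0.0
    for key in shared:
        total += sum((p - q) ** 2
                     for p, q in zip(coords_a[key], coords_b[key]))
    return math.sqrt(total / len(shared)), len(shared)
```

Given two files read into strings, `ca_rmsd(ca_coordinates(model_text), ca_coordinates(reference_text))` returns the RMSD together with the number of residues it was computed over. Report both, since an RMSD over few shared residues says little about the whole model.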