Proseminar 'Proteins and Disease' WS 2012/2013

Type:            Seminar (2 SWS)
ECTS:             3.0 (the module description is authoritative)
Lecturer:      Burkhard Rost
Time:           Monday, 13:30 - 15:00
Room:          MI 01.09.034
Language:    English

Application / Registration

Application is organised centrally for all bioinformatics seminars. After you have been assigned to our seminar, we will distribute the topics.


 Topics relate to the research interests of the group: protein sequence analysis, sequence-based predictions,
 protein structure prediction and analysis, interaction networks, and text mining.



The rules and hints for preparation of the seminar that were given in the pre-meeting are also summarised in our Checklist and on these slides.



Oct 22    Dominik Schönhofer: Biological Databases
              Advisor: Esmeralda Vicedo

Oct 29     Diana Iacob: Sequence alignment: local and global
              Advisor: Andrea Schafferhans

Nov 5      Lilian Levinh: Sequence alignment and searches: heuristic methods
              Advisor: Laszlo Kajan

Nov 12    Lars Kalusche: Sequence searches using profiles (PSI-Blast et al.)
              Advisor: Maximilian Hecht

Nov 19    canceled!

Nov 26    Rene Schoeffel: Multiple sequence alignment
              Advisor: Maximilian Hecht

Dec 3      Johannes Rest: Short Sequence Motifs
              Advisor: Tobias Hamp

Dec 10    Wilhelm Gottschall: Biological Networks
              Advisor: Tobias Hamp

Dec 17    Martina Weigl: Phylogenetic Prediction
              Advisor: Esmeralda Vicedo

Jan 14    Julian Gabrysch: Monte Carlo Methods
              Advisor: Edda Kloppmann

Jan 21     Vanessa Zwing: Integral membrane protein structures and their classification
              Advisor: Edda Kloppmann

Jan 28    canceled!
(Pascal Lichtenstern: Subcellular localization: genome-wide experimental vs. computational methods
              Advisor: Tanya Goldberg)

Feb 4     Carsten Uhlig: Conditional Random Fields for Named-Entity Recognition
              Advisor: Juan Miguel Cejuela





Integral membrane protein structures and their classification

Dr. Edda Kloppmann

Integral membrane proteins (IMPs) are an important class of proteins that link the cell to its environment, or different cell compartments to each other; they are involved in, for example, ion transport, signaling and cell adhesion. Structures of these proteins are particularly difficult to solve. Nevertheless, a significant number of structures are known today. This talk shall give an introduction to IMP structures and to their orientation in the membrane, which has to be calculated.


  • Arthur M Lesk. Introduction to bioinformatics.
  • Alberts et al. Molecular biology of the cell.
  • Lomize et al. Positioning of proteins in membranes: A computational approach. Protein Science (2006) 15: 1318-1333. (A good introduction can also be found on the website:
  • Tusnády et al. TMDET: web server for detecting transmembrane domains by using 3D structure of proteins. Bioinformatics (2005) 21: 1276-1277.

Monte Carlo methods

Dr. Edda Kloppmann

Monte Carlo methods use random sampling for computation. These methods were first used in the 1940s and are still widely applied to obtain predictions for biological systems. This talk shall introduce the history of the Monte Carlo method, the Metropolis-Hastings algorithm and its applications.


  • Herbert L Anderson (1986) Metropolis, Monte Carlo and the MANIAC. [pdf]
  • Nicholas C Metropolis (1987) The beginning of the Monte Carlo method. [pdf]
  • ...
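
As a starting point for the talk, the acceptance rule at the heart of the algorithm can be sketched in a few lines. This is a minimal illustration only, assuming a one-dimensional target and a symmetric Gaussian random-walk proposal (the Metropolis special case of Metropolis-Hastings); the function names, the step size and the standard normal target are choices made for this example and are not taken from the literature above.

```python
import math
import random


def metropolis_hastings(log_density, start, n_steps, step_size=1.0, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional target density.

    log_density returns the log of the (unnormalised) target density.
    """
    rng = random.Random(seed)
    x = start
    samples = []
    for _ in range(n_steps):
        # Symmetric proposal: the proposal densities cancel in the ratio.
        proposal = x + rng.gauss(0.0, step_size)
        log_ratio = log_density(proposal) - log_density(x)
        # Accept with probability min(1, p(proposal) / p(x)).
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = proposal
        # A rejected move repeats the current state in the chain.
        samples.append(x)
    return samples


def log_standard_normal(x):
    """Log density of a standard normal, up to an additive constant."""
    return -0.5 * x * x


samples = metropolis_hastings(log_standard_normal, start=0.0, n_steps=20000)
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"mean = {mean:.2f}, variance = {variance:.2f}")
```

For a long enough chain the sample mean and variance should approach 0 and 1. Only the ratio of densities enters the acceptance step, so the normalisation constant of the target is never needed.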

Biological Databases

Dipl. Biol. Esmeralda Vicedo

Huge volumes of primary data are archived in numerous open-access databases, and the amount keeps growing as new-generation technologies become more common in laboratories. This seminar shall give an overview of different databases, how to access them, and the problems associated with them.

  • Arthur M. Lesk. Introduction to bioinformatics (Third Edition) Oxford University Press

Sequence alignment: local and global

Dr. Andrea Schafferhans

Finding an alignment of two protein sequences is the basis of all techniques to infer knowledge by homology. This talk shall review well-known local and global alignment methods (Smith-Waterman, Needleman-Wunsch).
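
Both methods fill the same dynamic programming matrix and differ only in the boundary conditions and in where the final score is read. The sketch below computes the optimal score only (no traceback) and assumes a simple match/mismatch scheme with a linear gap penalty; the function name and the default scores are illustrative choices, whereas real protein alignments would use a substitution matrix and usually affine gap costs.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-1, local=False):
    """Optimal pairwise alignment score by dynamic programming.

    local=False gives the global (Needleman-Wunsch) score,
    local=True gives the local (Smith-Waterman) score.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment: leading gaps are penalised.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            cell = max(diagonal, up, left)
            if local:
                # Local alignment: a negative prefix is dropped by restarting at 0.
                cell = max(cell, 0)
            score[i][j] = cell
            best = max(best, cell)
    # Global score sits in the bottom-right cell, local score is the matrix maximum.
    return best if local else score[-1][-1]


print(alignment_score("ACGT", "AGT"))                          # global
print(alignment_score("TTTACGTTT", "GGACGGG", local=True))     # local
```

Both variants need time and memory proportional to the product of the two sequence lengths.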



Sequence alignment and searches: heuristic methods

Laszlo Kajan

This talk shall explain the heuristic approximations made to speed up sequence alignment and sequence searches (BLAST, FASTA).
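
The central shortcut shared by these tools is to look for short word matches first and to extend only those, instead of filling a full dynamic programming matrix. The sketch below shows that seed-and-extend idea in a strongly simplified form: it uses exact k-mer matches and extends without gaps while residues are identical. This is an illustration under those assumptions, not how BLAST or FASTA are implemented; the real programs score neighbouring words and extensions with substitution matrices and drop-off thresholds. All function names are made up for this example.

```python
from collections import defaultdict


def build_kmer_index(sequence, k):
    """Map every k-mer of the sequence to the list of its start positions."""
    index = defaultdict(list)
    for pos in range(len(sequence) - k + 1):
        index[sequence[pos:pos + k]].append(pos)
    return index


def find_seeds(query, subject, k=3):
    """Return (query_pos, subject_pos) pairs of exact k-mer matches."""
    index = build_kmer_index(subject, k)
    seeds = []
    for qpos in range(len(query) - k + 1):
        for spos in index.get(query[qpos:qpos + k], []):
            seeds.append((qpos, spos))
    return seeds


def extend_seed(query, subject, qpos, spos, k):
    """Extend a seed without gaps in both directions while residues are identical.

    Returns (query_start, subject_start, length) of the extended match.
    """
    left = 0
    while (qpos - left - 1 >= 0 and spos - left - 1 >= 0
           and query[qpos - left - 1] == subject[spos - left - 1]):
        left += 1
    right = 0
    while (qpos + k + right < len(query) and spos + k + right < len(subject)
           and query[qpos + k + right] == subject[spos + k + right]):
        right += 1
    return qpos - left, spos - left, k + left + right


seeds = find_seeds("ACGT", "TTACGTTT", k=3)
print(seeds)
print(extend_seed("ACGT", "TTACGTTT", *seeds[0], 3))
```

The index is built once per database sequence, so each query word costs a dictionary lookup rather than a scan. The price of the speed-up is that similarities without any shared word are missed.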


Recommended by Benjamin Wellmann:

  • Lecture slides from Tübingen
  • Should be available from the library:

    • Introduction to Bioinformatics

      • Newest book (2008), contains a whole chapter dedicated to alignments.
      • Maybe you should start from here or look in the references of one of the lecture slide/notes.
    • Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids

      • Old school sequence analysis book (don't expect any new methods in here, pub date 1998)
      • They also introduce HMMs, if I remember it correctly.
    • Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology

      • Comprehensive book on algorithms (maybe too detailed), but also old (1997).
      • Very algorithmic as the title says.

Others have found these useful:

  • Understanding bioinformatics. Zvelebil, Marketa, Baum, Jeremy O.
  • Chapter 4: Database Similarity Searching. Chapter 5: Multiple Sequence Alignment. Essential bioinformatics. Author: Xiong, Jin
  • Algorithms in computational molecular biology: techniques, approaches and applications. Ed.: Elloumi, Mourad

Recommended by Peter Hönigshmid:



Sequence searches using profiles (PSI-Blast et al.)

Maximilian Hecht

This talk shall explain why and how profiles help in searching sequence databases and how the profile searches work technically.
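
To make the notion of a profile concrete, the sketch below builds a position-specific scoring matrix (PSSM) from a few ungapped, aligned sequences and scores a candidate against it. It is a simplified illustration: it assumes a uniform background distribution, a fixed pseudocount and equally weighted sequences, none of which matches the more elaborate weighting used by PSI-BLAST. The function names and the toy sequences are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned_seqs, alphabet=AMINO_ACIDS, pseudocount=1.0):
    """Log-odds PSSM from ungapped aligned sequences of equal length.

    Assumes a uniform background frequency for every residue type.
    """
    length = len(aligned_seqs[0])
    background = 1.0 / len(alphabet)
    total = len(aligned_seqs) + pseudocount * len(alphabet)
    pssm = []
    for col in range(length):
        counts = Counter(seq[col] for seq in aligned_seqs)
        column = {}
        for aa in alphabet:
            # The pseudocount keeps unseen residues from getting a score of -infinity.
            freq = (counts[aa] + pseudocount) / total
            column[aa] = math.log2(freq / background)
        pssm.append(column)
    return pssm


def score_sequence(pssm, seq):
    """Sum the position-specific scores of a sequence of the same length."""
    return sum(column[aa] for column, aa in zip(pssm, seq))


pssm = build_pssm(["ACD", "ACE", "ACD"])
print(round(score_sequence(pssm, "ACD"), 2))
print(round(score_sequence(pssm, "WWW"), 2))
```

Unlike a general substitution matrix, the score of a residue depends on the alignment column, so conserved positions weigh more heavily than variable ones.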


Multiple sequence alignment

Maximilian Hecht

This talk shall explain the methods used to generate multiple sequence alignments, the complexity of the problem and the approximations made.


  • A. M. Lesk. Bioinformatik: Eine Einführung. Spektrum Akademischer Verlag, 2002.
  • Thompson JD, Higgins DG, Gibson TJ (1994). CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22: 4673–4680.
  • Edgar RC (2006). Multiple Sequence Alignment. Curr Opin Struct Biol 16: 368-373.


HHblits: HMM-HMM-based lightning-fast iterative sequence search

Tatyana Goldberg

HHblits is a method to build high-quality multiple sequence alignments for remote homology detection. The method represents both query and database sequences by profile hidden Markov models (HMMs). Compared to the most popular iterative sequence search methods, HHblits is faster, has higher sensitivity and generates more accurate alignments.


  • Remmert M, Biegert A, Hauser A, Söding J (2011). "HHblits: Lightning-fast iterative protein sequence searching by HMM-HMM alignment.". Nat. Methods 9 (2): 173–175.


Subcellular localization: genome-wide experimental vs. computational methods

Tatyana Goldberg

The identification of a protein’s subcellular localization is an important step for many analyses, as subcellular localization provides hints about the protein’s function. Recently, a number of laboratory and computational methods have been developed for prokaryotic genome-wide localization analyses. This talk shall give an introduction to one of the most popular localization prediction methods, PSORTb, and evaluate it against laboratory methods.


  • J.L. Gardy, M.R. Laird, F. Chen, S. Rey, C.J. Walsh, M. Ester, and F.S.L. Brinkman (2005). PSORTb v.2.0: expanded prediction of bacterial protein subcellular localization and insights gained from comparative proteome analysis. Bioinformatics 21(5):617-623
  • Rey S, Gardy JL, Brinkman FS (2005). Assessing the precision of high-throughput computational and laboratory approaches for the genome-wide identification of protein subcellular localization in bacteria. BMC Genomics Nov 17;6:162.


Short Sequence Motifs

Tobias Hamp

Short sequence motifs are recurring patterns that are functionally important. This talk shall give an introduction to where and how they play a role, how we can find them and what we know so far.


  • Patrik D'haeseleer: What are DNA sequence motifs?, Nature Biotechnology, 24(4): 423-425 (2006)
  • Bailey TL, Elkan C.: Fitting a mixture model by expectation maximization to discover motifs in biopolymers, Proc Int Conf Intell Syst Mol Biol, 2:28-36. (1994)
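
One simple way to search for a known motif is to write it as a pattern and scan the sequence. The sketch below translates a small subset of the PROSITE pattern syntax into a Python regular expression and reports all, possibly overlapping, matches. The helper names and the toy sequence are made up for this example, and only single residues, `x`, `[..]`, `{..}` and `(n)` or `(n,m)` repeats are handled. Statistical motif discovery as in the Bailey and Elkan paper is a different and harder task that this does not cover.

```python
import re

# One pattern element: a residue, x, [allowed] or {forbidden}, with an optional repeat.
ELEMENT = re.compile(r"([A-Zx]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            regex = "."                       # any residue
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"   # any residue except these
        else:
            regex = core                      # single residue or [allowed set]
        if low:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)
    return "".join(parts)


def find_motif(pattern, sequence):
    """Return (start, matched_text) for every match, including overlapping ones."""
    # The lookahead lets the scan report matches that overlap each other.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]


# N-glycosylation site pattern from PROSITE: N-{P}-[ST]-{P}
print(prosite_to_regex("N-{P}-[ST]-{P}"))
print(find_motif("N-{P}-[ST]-{P}", "MKNGSAANPTL"))
```

Such short patterns match by chance fairly often, so a hit is a hint that needs further evidence rather than a proof of function.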



Homology modelling

Dr. Andrea Schafferhans

If the structure of a protein has not been resolved experimentally, one can often model it by homology to related proteins whose structures are known. This seminar shall give an overview of homology modelling techniques.



Biological Networks

Tobias Hamp

In biochemistry, both experimental and predicted data are often represented in the form of networks. This seminar shall give an overview of the various network types in terms of their differences, commonalities and applications.


  • Marc Vidal, Michael E. Cusick, Albert-László Barabási, Interactome Networks and Human Disease, Cell, 144(6): 986-998 (2011)
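
As a minimal illustration of the network representation, the sketch below stores an undirected interaction network as an adjacency list and derives two basic properties: the degree of each node and the connected components. The interaction list is a hypothetical toy example with placeholder node names, not real data, and the function names are chosen for this example only.

```python
from collections import defaultdict


def build_network(interactions):
    """Build an undirected adjacency list from (node_a, node_b) pairs."""
    network = defaultdict(set)
    for a, b in interactions:
        network[a].add(b)
        network[b].add(a)
    return network


def degrees(network):
    """Number of interaction partners per node."""
    return {node: len(neighbours) for node, neighbours in network.items()}


def connected_components(network):
    """Group nodes that are reachable from each other (depth-first search)."""
    seen = set()
    components = []
    for start in network:
        if start in seen:
            continue
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(network[node] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical toy interactions; the names are placeholders.
toy = [("A", "B"), ("A", "C"), ("A", "D"), ("E", "F")]
network = build_network(toy)
print(degrees(network))
print(connected_components(network))
```

In this toy network node A has the highest degree and would be called a hub. Directed or weighted networks, for example regulatory or metabolic ones, need a richer data structure than the plain sets used here.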


Phylogenetic Prediction

Dipl. Biol. Esmeralda Vicedo

Variations within a family of related nucleic acid or protein sequences provide an invaluable source of information for evolutionary biology. This talk shall give an overview of the concepts, methods and procedures of phylogenetic analysis.


  • David Mount, Bioinformatics - Sequence and Genome Analysis (Second Edition), Cold Spring Harbor Laboratory Press
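
Distance-based tree building starts from pairwise distances between aligned sequences. The sketch below computes the fraction of differing sites (the p-distance) and applies the Jukes-Cantor correction for nucleotide sequences, d = -3/4 ln(1 - 4p/3), which accounts for multiple substitutions at the same site. The function names are illustrative, alignment columns with a gap are simply skipped, and the choice of Jukes-Cantor over other substitution models is an assumption made for this example.

```python
import math


def p_distance(a, b):
    """Fraction of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for nucleotide sequences."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


p = p_distance("ACGT", "ACGA")
print(p, round(jukes_cantor(p), 4))
```

The corrected distance is always at least as large as the p-distance, and the gap between the two grows as the sequences diverge. A matrix of such distances is the input for clustering methods such as UPGMA or neighbour joining.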

Conditional Random Fields for Named-Entity Recognition

Juan Miguel Cejuela

Conditional random fields (CRFs) are popular methods in named-entity recognition (NER) and, more generally, in sequential labeling tasks. This talk shall present the CRF models and their advantages in comparison to other popular models such as hidden Markov models (HMMs). An example case will focus on the recognition of protein names.

  • Charles Sutton and Andrew McCallum. An Introduction to Conditional Random Fields for Relational Learning. In Lise Getoor and Ben Taskar, editors, Introduction to Statistical Relational Learning (Adaptive Computation and Machine Learning), chapter 4
  • John D. Lafferty, Andrew McCallum, and Fernando C. N. Pereira. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. In Proceedings of the Eighteenth International Conference on Machine Learning, ICML ’01,
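
Once a linear-chain CRF is trained, labeling a sentence means finding the label sequence with the highest total score, which is done with the Viterbi algorithm. The sketch below shows that decoding step only. In a real CRF the per-token and transition scores are weighted sums of feature functions with weights learned from annotated text; here they are hand-set, hypothetical numbers, and the labels O (outside) and PROT (protein name) are chosen for this example.

```python
def viterbi(emission_scores, transition_scores, labels):
    """Highest-scoring label sequence for a linear-chain model.

    emission_scores: one dict {label: score} per token.
    transition_scores: dict {(previous_label, label): score}.
    """
    best = [{label: emission_scores[0][label] for label in labels}]
    back = []
    for t in range(1, len(emission_scores)):
        scores, pointers = {}, {}
        for cur in labels:
            # Best predecessor for label `cur` at position t.
            prev = max(labels, key=lambda p: best[-1][p] + transition_scores[(p, cur)])
            scores[cur] = best[-1][prev] + transition_scores[(prev, cur)] + emission_scores[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final label.
    path = [max(labels, key=lambda label: best[-1][label])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


labels = ["O", "PROT"]
tokens = ["the", "p53", "protein"]
# Hypothetical hand-set scores standing in for learned feature weights.
emissions = [
    {"O": 2.0, "PROT": 0.0},
    {"O": 0.0, "PROT": 2.0},
    {"O": 1.5, "PROT": 0.5},
]
transitions = {
    ("O", "O"): 0.5, ("O", "PROT"): 0.0,
    ("PROT", "O"): 0.0, ("PROT", "PROT"): 0.5,
}
print(list(zip(tokens, viterbi(emissions, transitions, labels))))
```

The decoding step looks the same for an HMM. The difference lies in how the scores arise: a CRF models the labels conditioned on the whole observed sentence, so its features may use arbitrary, overlapping properties of the tokens.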