Predicting protein function through gene ontology
Master thesis
Student: Vivien Klose
Supervisor: Burkhard Rost, Christian Schaefer
Extracting binding residues from the Protein Data Bank
Bachelor thesis
Student: Shen Wei
Supervisor: Burkhard Rost, Christian Schaefer
Transmembrane protein 3D structure prediction from evolutionary sequence variation
Alpha-helical transmembrane proteins are an abundant class of proteins involved in a variety of important biological processes such as signaling or transport. Yet, due to the difficulty of solving membrane protein structures experimentally, many protein families remain without structural information inferrable by homology. In this master thesis, we aim to establish a de novo 3D structure prediction method for alpha-helical transmembrane proteins which is exclusively based on sequence information, without the use of homology modeling, threading or sequence fragments.
Master thesis
Student: Thomas Hopf
Supervisor: Burkhard Rost, Chris Sander, Debora Marks
Evaluation of sequence-to-structure alignments
The sequence and structure visualisation tool SRS3D uses a pre-calculated database of sequence-to-structure alignments that is derived from an enhanced version of HSSP. Likewise, many sequence-based prediction methods use sequence-to-structure alignments as their input. Some of these are based on HSSP, others use PSI-BLAST results. The purpose of this project is to evaluate different sequence-to-structure alignment methods for their alignment quality and computational overhead, in order to develop guidelines for the efficient use of appropriate methods in the respective context. The standard of truth will be comparison to structural alignments as well as the quality of prediction results (e.g. homology models, SNP effects) based on the alignments.
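As an illustration of comparing an alignment against a structural reference, the following minimal sketch computes the fraction of reference residue pairs that a test alignment reproduces. The function names and the toy alignments are hypothetical, and this particular score is an assumption chosen for illustration, not necessarily the measure used in the thesis.

```python
def aligned_pairs(row_a, row_b, gap="-"):
    """Return the set of (i, j) residue index pairs aligned in a pairwise alignment."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pairs = set()
    i = j = 0  # running residue indices in the two ungapped sequences
    for a, b in zip(row_a, row_b):
        if a != gap and b != gap:
            pairs.add((i, j))
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs


def fraction_correct(test, reference):
    """Fraction of reference-aligned residue pairs that the test alignment reproduces."""
    ref_pairs = aligned_pairs(*reference)
    if not ref_pairs:
        return 0.0
    return len(aligned_pairs(*test) & ref_pairs) / len(ref_pairs)


# Toy data (hypothetical): a structural reference and a sequence-based test alignment.
reference = ("ACD-EF", "ACDGEF")
test = ("ACDEF-", "ACDGEF")
score = fraction_correct(test, reference)
```

In this toy case the test alignment misplaces the gap, so only the pairs before the gap agree with the reference.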
Master thesis
Student: Benjamin Wellmann
Supervisor: Andrea Schafferhans
Automatic protein name recognition
The primary goal of this thesis is the development of a text mining tool for the automatic recognition of protein names in article abstracts. The tool’s design is guided by its final purpose: building a bioinformatics database with a comprehensive mapping between articles and amino acid sequences, each sequence in turn mapped to its (possibly multiple) names. Implemented as a web service, this system would be the first of its kind and would boost research in the field by providing new facilities including, but not limited to, searching articles by sequence and thus avoiding name ambiguity, directly finding papers on proteins that have a similar sequence, or notifying users upon publication of new experiments without the need to specify search keywords. The accuracy and coverage of current state-of-the-art protein taggers, particularly in combination with protein normalizers, are still insufficient to make the proposed service realizable; consequently, the efforts of this thesis will mainly be directed at solving this problem.
Master thesis
Student: Juan Miguel Cejuela
Supervisor: Burkhard Rost
Feature construction and selection for predicting structural change upon point mutation in proteins
In this bachelor thesis we investigate basic amino acid propensities with respect to their ability to improve the prediction of local structural change upon point mutation within protein sequences.
Bachelor thesis
Student: Yannick Mahlich
Supervisor: Burkhard Rost, Christian Schaefer
Improving predictions of functional effect of non-synonymous SNP in human
In the near future, personal genome sequencing and analysis will become increasingly affordable to private individuals, which will increase public interest in characterizing the effect of single nucleotide polymorphisms (SNPs) in our own genomes. This master thesis aims to improve both the speed and the accuracy of predictions of the functional effect of SNPs in human. We will investigate how to limit the search space of homologous proteins to those of a few organisms that best reflect the spectrum of human proteins, thus reducing the computational time needed for every prediction. Additionally, a machine learning method will be trained and optimized for the prediction of SNP effects in human by using feature selection techniques.
Master thesis
Student: Maximilian Hecht
Supervisor: Burkhard Rost
Predict subcellular localization for proteins in all kingdoms
An automatic approach for predicting the subcellular localization of proteins, which is an important step towards understanding their function, is developed in this master's thesis project. The sequence-based approach utilizes a number of Support Vector Machines for mimicking the cascading mechanism of cellular sorting. The approach is applicable to soluble and membrane proteins in all taxonomic kingdoms.
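The cascading idea can be sketched as a small decision tree in which every inner node is a binary classifier and every leaf is a compartment. In the sketch below the two scoring functions are hypothetical toy rules standing in for trained Support Vector Machines, and the compartments, thresholds and tree layout are assumptions made purely for illustration.

```python
class Decision:
    """Inner node of the cascade: a binary classifier with two outcomes."""

    def __init__(self, classifier, if_positive, if_negative):
        self.classifier = classifier      # callable: sequence -> signed score
        self.if_positive = if_positive    # next node or final label if score > 0
        self.if_negative = if_negative    # next node or final label otherwise


def predict_localization(node, sequence):
    """Walk the cascade from the root until a leaf (a compartment label) is reached."""
    while isinstance(node, Decision):
        score = node.classifier(sequence)
        node = node.if_positive if score > 0 else node.if_negative
    return node


# Toy stand-ins for trained SVM decision functions (hypothetical rules).
def hydrophobic_n_terminus(sequence):
    head = sequence[:20]
    fraction = sum(aa in "AILMFVW" for aa in head) / max(1, len(head))
    return fraction - 0.5


def rich_in_basic_residues(sequence):
    fraction = sum(aa in "KR" for aa in sequence) / max(1, len(sequence))
    return fraction - 0.2


# Hypothetical two-level cascade: secreted vs. rest, then nucleus vs. cytoplasm.
cascade = Decision(
    hydrophobic_n_terminus,
    "secreted",
    Decision(rich_in_basic_residues, "nucleus", "cytoplasm"),
)

label = predict_localization(cascade, "KRKRKRDE")
```

Replacing each toy rule with the decision function of a trained classifier would leave `predict_localization` unchanged, which is the main appeal of structuring the predictor as a cascade.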
Master thesis
Student: Tatyana Goldberg
Supervisor: Burkhard Rost, Tobias Hamp
Compare effects of nsSNPs from human variations with disease related SNPs
Since the sequencing of the human genome was completed in 2000, the analysis of SNPs has gained increasing attention. SNPs make up about 90% of all human genetic variation and are known to cause several diseases such as cancer. The analysis of their effects on protein function is therefore of major interest. In this master thesis, a comprehensive analysis of the functional effects of SNPs identified by the 1000 Genomes Project was performed using the SNAP prediction method. The number of effect SNPs in the 1000 Genomes data was found to be as high as the number of neutral SNPs, and the cumulative distribution of SNAP scores resembled the distribution for completely randomly generated SNPs. These findings led to the conclusion that naturally occurring human variation contains more SNPs with functional effects than had been assumed in the past. Furthermore, the 15 proteins that accumulated the most function-changing SNPs were identified. With these candidate proteins and their direct neighbors, a protein-protein interaction network was built and a GO enrichment analysis was performed. As a result, most of the genes were predicted to belong to the ‘intracellular signaling’ and ‘protein binding’ GO terms.
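A GO enrichment analysis asks whether a term is annotated to more genes in a selected set than expected by chance. The abstract does not state which test was used; the sketch below assumes the commonly used one-sided hypergeometric test, and the function name and the numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """P(X >= hits) when `sample` genes are drawn without replacement from
    `population` genes, of which `annotated` carry the GO term."""
    upper = min(annotated, sample)
    favorable = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return favorable / comb(population, sample)


# Toy numbers (hypothetical): 10 genes in total, 5 annotated with the term,
# 2 genes in the network, both of them annotated.
p_value = hypergeom_enrichment_p(population=10, annotated=5, sample=2, hits=2)
```

In a real analysis many GO terms are tested at once, so the resulting p-values would additionally need a multiple-testing correction.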
Master thesis
Student: Dominik Achten
Supervisor: Burkhard Rost, Shaila Roessle
Evaluation of methods to predict transmembrane alpha-helices in proteins
Proteins containing transmembrane alpha-helices are assumed to constitute about 25% of all proteins in an organism. Due to their lipophilic transmembrane region, high-resolution structures of these proteins are even more challenging to obtain than those of soluble proteins. Therefore, methods to predict the position of transmembrane helices and their topology from sequence are of great interest, for example in whole-genome analyses. Today, a large number of transmembrane alpha-helix prediction methods exist. The objective of this thesis is an evaluation of several well-known prediction methods.
Bachelor thesis
Student: Jonas Reeb
Supervisor: Burkhard Rost, Edda Kloppmann
Identification of DNA-binding residues from amino acid sequence data
Motivation - DNA-protein interactions are essential for many biological processes, e.g. DNA packaging, DNA replication, DNA recombination and DNA repair. There is currently no established high-throughput technology available to screen DNA-protein binding experimentally. At the same time, the number of known proteins is growing rapidly. We therefore need an in silico method to predict DNA-protein interactions.
Prediction level - There are methods that predict whether a protein interacts with DNA at all. Given such a DNA-interacting protein, our novel method focuses on predicting, for each residue, whether it interacts with DNA.
Input types - Some methods use the tertiary structure (3D), which promises to predict the DNA-binding residues with high accuracy. A weakness of tertiary-structure-based methods is that they depend on experimental data, which are mostly not available. Therefore, our method uses exclusively the raw amino acid sequence (1D). We generate all necessary features from the raw amino acid sequence, e.g. with PredictProtein. As a result, our method is applicable to all known proteins.
Novel dataset - For training, we used a novel dataset containing transcription factors, enzymes and structural / DNA-binding proteins e.g. histone-like proteins. Our purpose is to build a robust classifier for these three protein classes.
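Per-residue prediction from sequence alone is commonly set up by describing each residue through a window of its sequence neighbors. The sketch below shows such a window encoding; the window size, the padding symbol and the one-hot encoding are assumptions for illustration and stand in for the richer features (e.g. from PredictProtein) mentioned above.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def residue_windows(sequence, half_width=2, pad="X"):
    """Return one window per residue, centred on it and padded at the termini."""
    padded = pad * half_width + sequence + pad * half_width
    size = 2 * half_width + 1
    return [padded[i:i + size] for i in range(len(sequence))]


def one_hot(window, alphabet=AMINO_ACIDS):
    """Encode a window as a flat 0/1 vector; padding positions stay all zero."""
    vector = []
    for aa in window:
        vector.extend(1 if aa == letter else 0 for letter in alphabet)
    return vector


# Each residue of a toy sequence becomes one feature vector for a classifier.
windows = residue_windows("MKR", half_width=1)
features = [one_hot(w) for w in windows]
```

Each feature vector would then be paired with a binding or non-binding label for the central residue when training a classifier.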
Diploma thesis
Student: Michael Menden
Supervisor: Burkhard Rost, Shaila Roessle