Completed Theses 2014

Effect scores of amino acid substitution as measurement for species separation

This Master's thesis addresses the question of whether two species can be distinguished by comparing the severity of the amino acid substitutions observed between sets of their orthologous proteins. It presents an approach that uses bioinformatics tools, designed to determine the effect of a single nucleotide polymorphism (SNP) on a specified protein sequence, to calculate an effect score for each amino acid substitution observed between two species and a third reference species. Statistical evaluation of the resulting score sets clearly indicates that the two species can be separated, given a well-defined reference species. Additionally, the results suggest that the effect score sets contain information about the evolutionary distance between the species, which can be exploited to a certain degree. In general, a greater evolutionary distance between two species results in a set of amino acid substitutions with an overall less severe effect than a set of SNPs observed within one of the species.
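The abstract does not name the statistical test that was used. As a minimal sketch of how two effect score sets could be compared, the following assumes a two-sided permutation test on the difference of means. The function name `permutation_test` and the score values are hypothetical toy inputs, not data or results from the thesis.

```python
import random
from statistics import mean


def permutation_test(scores_a, scores_b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference of mean effect scores.

    Assumption: this test is chosen for illustration only; the thesis
    abstract does not say which statistical evaluation was performed.
    """
    rng = random.Random(seed)
    observed = mean(scores_a) - mean(scores_b)
    pooled = list(scores_a) + list(scores_b)
    n_a = len(scores_a)
    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign the pooled scores to the two groups.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Toy effect scores (hypothetical), one list per species,
# each scored against the same reference species.
species_a_scores = [1, 2, 3, 4, 5, 6]
species_b_scores = [10, 11, 12, 13, 14, 15]

observed_diff, p_value = permutation_test(species_a_scores, species_b_scores)
print(f"difference of means: {observed_diff}, p-value: {p_value:.4f}")
```

A small p-value would indicate that the two score sets differ by more than random group assignment explains, which is one way to read "separation" of the two species.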

Master thesis
Student: Yannick Malich
Supervisor: Burkhard Rost, Maximilian Hecht


Detection and analysis of protein sub-cellular localization signals

Master thesis
Student: Robert Wagner
Supervisor: Burkhard Rost, Tatyana Goldberg


Mutation analysis of Abl kinases

Abl kinases are tyrosine kinases with key roles in toxic cell response and actin cytoskeleton regulation. They are autoinhibited, and loss of this autoinhibition has been linked to various forms of cancer. Selective Abl inhibitors such as imatinib have had a significant impact on targeted cancer therapy. This project aims to analyze the mutability landscape of Abl, that is, to determine for every amino acid in Abl how mutating that one residue affects the function of the protein. Knowing which residues play a vital role in protein function is a first step towards developing effective inhibitors. Such an analysis may be performed with SNAP. To verify and extend the results, the same test could be performed on all members of the tyrosine kinase family in order to find evolutionarily conserved sites. As function is highly conserved in evolution, this will narrow the focus to the most critical amino acids. In addition, evolutionary couplings within the sequence should be analyzed, based on the sequences in the protein family. Strongly constrained interactions allow further understanding of the critical sites within the protein.
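The mutability landscape described above can be sketched as an enumeration of every single amino acid substitution, each of which is then passed to an effect predictor. In this minimal sketch, `single_substitutions`, `mutability_landscape`, and `placeholder_score` are hypothetical names, the sequence is a toy example rather than Abl, and the scoring function is a stand-in. It does not call SNAP or reproduce its interface.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def single_substitutions(sequence):
    """Yield (label, mutant_sequence) for every single substitution.

    Labels use the common notation <wild type><1-based position><mutant>,
    for example "M1A".
    """
    for index, wild_type in enumerate(sequence):
        for mutant in AMINO_ACIDS:
            if mutant == wild_type:
                continue
            label = f"{wild_type}{index + 1}{mutant}"
            yield label, sequence[:index] + mutant + sequence[index + 1:]


def mutability_landscape(sequence, score_fn):
    """Return {position: {mutant_residue: score}} for all substitutions."""
    landscape = {}
    for label, mutant_sequence in single_substitutions(sequence):
        position = int(label[1:-1])
        landscape.setdefault(position, {})[label[-1]] = score_fn(label, mutant_sequence)
    return landscape


def placeholder_score(label, mutant_sequence):
    # Hypothetical stand-in: a real analysis would obtain the predicted
    # effect from a tool such as SNAP instead of returning a constant.
    return 0.0


toy_sequence = "MKTA"  # toy sequence for illustration, not Abl
landscape = mutability_landscape(toy_sequence, placeholder_score)
print(len(landscape), "positions,", len(landscape[1]), "substitutions each")
```

A protein of length n yields 19 × n substitutions, so the resulting table can be inspected per position to find residues at which most substitutions are predicted to be damaging.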

Bachelor thesis
Student: Anton Smirnov
Supervisor: Burkhard Rost, Andrea Schafferhans


Analysis of cancer-associated SNP data on the example of TCGA

Master thesis
Student: Jonathan Boidol
Supervisor: Burkhard Rost, Andrea Schafferhans, Lothar Richter


Transmembrane helix prediction in proteins

Transmembrane (TM) proteins have various major functions in human cells. They are involved in signaling, regulation, and transport processes. However, only very few experimentally determined structures of TM proteins are available. Methods that can reliably predict TM proteins and their TM segments are therefore valuable, as they help to understand the function of these proteins.
In this thesis, a new method (TMSEG) to predict TM proteins and the locations of their TM helices was developed and evaluated. The method is based on multiple machine-learning steps that use either a random forest or a neural network. Furthermore, TMSEG incorporates evolutionary information to increase its accuracy. Unlike most other methods, it can also be applied after another prediction tool to potentially improve the accuracy of that tool. This is possible due to its unusual post-processing procedure, which is performed by a neural network that uses a segment-based approach rather than the usual window-based one.
TMSEG was compared to established methods, such as PolyPhobius. It showed similar performance when predicting the overall topology of the proteins, but a higher performance when predicting the TM helices themselves. TMSEG also gave more reliable results for proteins with six or more TM helices, which are usually harder to predict than proteins with fewer TM helices.
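To illustrate the difference between a window-based and a segment-based view of a per-residue prediction, the sketch below builds both from the same label string. It is not the TMSEG post-processing, which the abstract describes as a neural network. The function names, the label alphabet (`H` for helix, `-` for everything else), and the minimum-length filter are assumptions made for illustration.

```python
from itertools import groupby


def extract_segments(labels, tm_label="H"):
    """Return (start, end) pairs for each run of tm_label.

    Indices are 0-based and the end index is exclusive.
    """
    segments = []
    position = 0
    for label, run in groupby(labels):
        length = len(list(run))
        if label == tm_label:
            segments.append((position, position + length))
        position += length
    return segments


def sliding_windows(labels, width):
    """Return all fixed-width windows, the usual window-based view."""
    return [labels[i:i + width] for i in range(len(labels) - width + 1)]


# Toy per-residue prediction (hypothetical): 'H' = TM helix, '-' = other.
predicted = "--HHHH---HH-"

segments = extract_segments(predicted)
windows = sliding_windows(predicted, width=5)

# A segment-level decision treats each predicted helix as one unit.
# Here a hypothetical minimum length of 3 stands in for a learned model.
kept = [(start, end) for start, end in segments if end - start >= 3]
print(segments, kept, len(windows))
```

Because segments can be extracted from any per-residue labeling, a step of this kind can operate on the output of a different predictor, which matches the abstract's remark that TMSEG can be applied after another tool.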

Master thesis
Student: Michael Bernhofer
Supervisor: Burkhard Rost, Edda Kloppmann


Analysis of alpha-transmembrane proteins: A comparison between alpha-transmembrane proteins with and without known structure with focus on model organisms

Bachelor thesis
Student: Susanne Bakenecker
Supervisor: Burkhard Rost, Edda Kloppmann

Transmembrane protein feature analysis: towards prediction and improvement of expression

Proteins containing transmembrane segments constitute 20 to 30% of a typical proteome and execute a wide array of functions. A large class of transmembrane proteins, the G protein-coupled receptors, accounts for 30% of current drug targets. Therefore, these proteins are highly interesting subjects of experimental studies. However, due to their lipophilic nature, high-resolution structures of transmembrane proteins are even more challenging to obtain than those of soluble proteins.
Over the last decade, structural genomics centers that specifically target transmembrane proteins have been established. As part of the Protein Structure Initiative, the New York Consortium on Membrane Protein Structure has investigated more than 10,000 targets in its medium-throughput pipeline. The attrition rate at every experimental stage is high, and more than 70% of the targets fail to express early on in the pipeline.
Here, we present an analysis of this unique dataset and two classifiers for the prediction of expression success in the structural genomics pipeline. To this end, a set of 55 features was compiled by literature search. While no feature stands out on its own, a random forest trained on this feature set performs significantly better than random and can be employed to prioritize future targets. A second, profile-kernel-based classifier, which was trained only on the evolutionary profiles of the target sequences, performs slightly worse.
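The abstract does not list the 55 features. As a hedged illustration of what sequence-derived features for such a classifier might look like, the sketch below computes three simple ones: sequence length, mean Kyte-Doolittle hydropathy (GRAVY), and the fraction of charged residues. The choice of these features and the name `sequence_features` are assumptions and are not taken from the thesis.

```python
# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sequence_features(sequence):
    """Compute a small, hypothetical feature vector for one target.

    Raises ValueError for an empty sequence and KeyError for residues
    outside the 20 standard amino acids.
    """
    length = len(sequence)
    if length == 0:
        raise ValueError("sequence must not be empty")
    gravy = sum(KYTE_DOOLITTLE[residue] for residue in sequence) / length
    charged = sum(sequence.count(residue) for residue in "DEKR") / length
    return {"length": length, "gravy": gravy, "charged_fraction": charged}


# Toy target sequence (hypothetical, not from the dataset).
features = sequence_features("MKTLLVAG")
print(features)
```

Feature dictionaries of this kind, one per target, could be stacked into a matrix and passed to a random forest implementation of choice, with the predicted probability of expression used to rank future targets.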

Master thesis
Student: Jonas Reeb
Supervisor: Burkhard Rost, Edda Kloppmann