Completed Theses 2015

Aquaria PredictProtein integration

The Aquaria project makes it possible to render visualizations of proteins through the comparison of known sequences. The data on these sequences is acquired from various sources such as the Protein Data Bank (PDB), the UniProt database and the PSSH2 database of sequence-structure alignments. Within Aquaria, there are mechanisms to access the various databases and construct meaningful objects for the functions of Aquaria to process. Here, we will identify interfaces in the code that can be extracted into Application Programming Interface (API) mechanisms. Through these APIs it will then be possible to feed external data, generated on demand, into Aquaria. The APIs shall work with JavaScript Object Notation (JSON) files, e.g. to transmit data on alignments. We will use the preexisting APIs to feed in annotations generated by PredictProtein. To this end, PredictProtein results will be parsed and transformed into a JSON format Aquaria can use. Finally, a user interface will be added to submit sequences on demand for the structure mapping and for calculating PredictProtein annotations. The submitted sequences and the calculation results will be integrated into the existing databases.

Interdisciplinary Project
Students: Yichun Lin and Christian Dallago
Supervisors: Andrea Schafferhans, Burkhard Rost


Ensemble Learning in Data Streams

Huge amounts of data are generated nowadays in different application domains, e.g. social networks, telecommunications and the WWW. These are known as stream data. Traditional machine learning algorithms generally feature a single model or classifier, such as Naïve Bayes or an MLP, learned from the entire training set. In a stream environment, the volume and the nature of the data make learning from the complete dataset impossible. Also, it is fair to assume that these data might have several possible generalization choices. Thus, choosing a single classifier, or the best among several classifiers, is not always optimal. A better alternative would be to build a classifier ensemble. The goal of this thesis is to investigate current ensemble learning algorithms for streams, and to develop an efficient ensemble model that facilitates classification of streams depending on contents over different time granularities (e.g. short-term, long-term).
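
The ensemble idea can be illustrated with a minimal sketch. It is not the model developed in the thesis: the class names `CentroidClassifier` and `ChunkEnsemble`, the nearest-centroid base learner and the `max_members` parameter are all assumptions chosen for illustration. One small classifier is trained per incoming chunk of the stream, only the most recent ones are kept, and predictions are made by majority vote.

```python
from collections import Counter, deque


class CentroidClassifier:
    """Toy base learner (illustrative assumption): nearest class centroid."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for features, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(features))
            for i, value in enumerate(features):
                acc[i] += value
        # Mean feature vector per class.
        self.centroids = {
            label: [v / counts[label] for v in acc] for label, acc in sums.items()
        }
        return self

    def predict(self, features):
        def sq_dist(centroid):
            return sum((a - b) ** 2 for a, b in zip(features, centroid))

        return min(self.centroids, key=lambda label: sq_dist(self.centroids[label]))


class ChunkEnsemble:
    """Keeps one model per recent chunk and predicts by majority vote."""

    def __init__(self, max_members=3):
        # The deque silently drops the oldest model once the limit is reached.
        self.members = deque(maxlen=max_members)

    def update(self, X_chunk, y_chunk):
        # Each chunk is seen once; the full stream is never stored.
        self.members.append(CentroidClassifier().fit(X_chunk, y_chunk))

    def predict(self, features):
        votes = Counter(member.predict(features) for member in self.members)
        return votes.most_common(1)[0][0]
```

In this sketch the window size `max_members` acts as a crude time granularity: a small window follows short-term content, a large one reflects long-term content.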

Master thesis
Student: Hossain Mahmud
Supervisor: Burkhard Rost
Advisors: Lothar Richter, Eirini Ntoutsi (Research Associate, Lehr- und Forschungseinheit für Datenbanksysteme, LMU)


Analysis of Nuclear Transport Signals

Most of the eukaryotic proteome is encoded by genes that are transcribed in the cell's nucleus; the transcripts are exported into the cytoplasm and translated into functional proteins there. Because some proteins are functional in the nucleus, they need to be imported into the nucleus again. One of the nuclear import mechanisms depends on specific nuclear localization signals that are recognized by the so-called karyopherins, which import the proteins. Some proteins also need to be exported from the nucleus for further functions in the cell's interior. For this purpose their sequences carry nuclear export signals. The goal of this Bachelor thesis is to extract and collect these signals from the published data and analyze the proteins together with their signal properties. The assembled knowledge will lead to a more comprehensive understanding of one of the cell's basic mechanisms, which affects a cell's activity in many ways.

Bachelor thesis
Student: Silvana Wolf
Supervisors: Burkhard Rost, Mikael Boden, Tatyana Goldberg


Complete annotation of the human gut microbiome

The gut microbiome has a central role in human health and disease. However, in order to determine specific microbial effects in the human body, it is necessary to understand the functional capacity of the microbiome. Using prediction tools for functional features, such as LocTree3 (subcellular localization), SNAP2 (functional effects of nsSNPs), metastudent (GO terms) and ISIS2 (protein-protein interaction), as well as tools for different types of protein disorder (UCON, NORSnet and PROFbval), we try to provide a comprehensive protein functional feature annotation of the human gut microbiome. We then provide an enrichment analysis and attempt to identify the role of proteins in important pathways. Finally, the annotation summary and analysis are presented as a set of interactive charts and data visualization components on the web.

Bachelor thesis
Student: Diana Iacob
Supervisors: Burkhard Rost, Guy Yachdav


Protein localization relation extraction from biomedical text

The primary goal of this thesis is to develop a text mining tool for extracting relations between a protein and its subcellular localization. The problem of relation extraction in biomedical texts has been studied to some extent previously. However, there has been very little focus on extracting relations that span different sentences. The aim is to develop a method that performs well on intra-sentence as well as inter-sentence relations.

Master thesis
Student: Shrikant Vinchurkar
Supervisors: Burkhard Rost, Juan Miguel Cejuela, Tatyana Goldberg


Sorting the nuclear proteome using Machine Learning

The primary goal of this thesis is to create a prediction method for subnuclear protein localization. Proteins localized in subnuclear structures are of great interest for understanding the inner workings of the nucleus, as they are associated with various nuclear processes. Even though predictions of subcellular localization reach high levels of accuracy and coverage, most of these methods do not cover the distinct substructures inside the nucleus. LocNuclei, the prediction method to be developed, tackles this problem and is intended to fill this annotation gap. Relying only on sequence data, this thesis will create a convenient method to assess the subnuclear localization of newly discovered proteins.

Master thesis
Student: Sebastian Seitz
Supervisors: Burkhard Rost, Tatyana Goldberg


Protein mutability landscape contains protein function?

SNAP2 predicts the effect of any nsSNP on protein function. By predicting the effect of all possible nsSNPs, we can create a mutation landscape. In this thesis, I investigate whether it is possible to extract the protein function from these mutation landscapes.
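
As a rough illustration of what such a landscape could look like as a data structure, the sketch below scores every single amino acid substitution at every position of a sequence. It rests on assumptions: `mutation_landscape` and `toy_score` are hypothetical names, the landscape is taken to cover all 19 substitutions per position, and `toy_score` is a placeholder that has nothing to do with SNAP2 or its interface.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def mutation_landscape(sequence, score_substitution):
    """Return one dict per position, mapping each mutant residue to a score.

    score_substitution(sequence, position, mutant) is any callable that
    returns an effect score; a real predictor would be plugged in here.
    """
    landscape = []
    for position, wild_type in enumerate(sequence):
        row = {}
        for mutant in AMINO_ACIDS:
            if mutant != wild_type:  # skip the unchanged residue
                row[mutant] = score_substitution(sequence, position, mutant)
        landscape.append(row)
    return landscape


def toy_score(sequence, position, mutant):
    # Placeholder score (NOT SNAP2): distance between residue letters.
    return abs(ord(mutant) - ord(sequence[position]))


landscape = mutation_landscape("MKV", toy_score)
```

Each position then carries a profile of 19 scores, and the question raised in the thesis is whether such profiles carry information about protein function.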

Bachelor thesis
Student: Yann Spoeri
Supervisors: Burkhard Rost, Maximilian Hecht