Students are expected to have:
- Basic knowledge of relational and NoSQL databases
- Interest in working with big data
- Interest in the Game of Thrones show
- Interest in challenging themselves to do something totally cool
- Participation in all meetings throughout the presentation week is mandatory. We will accept at most one absence, and only if it is justified, communicated, and approved well in advance.
New This Semester: The Game of Thrones Edition
For instance, consider the following toy project: I wanted to analyze the balance of power in Westeros around the period covered in the book "A Clash of Kings". To do this, I pulled data from A Wiki of Ice and Fire (AWOIAF), the most comprehensive source of information about the world of Ice and Fire. With this data I built a visualization of the network of allegiances between the great houses, in which houses appear as nodes and allegiances as edges. Here is the visualization: https://rostlab.org/~gyachdav/awoiaf/#/. Click on a node to see detailed information, and use the mouse wheel to zoom and navigate the network.
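An allegiance network like this one can be stored as a simple node-link structure: one node per house and one edge per allegiance. The sketch below is a hypothetical illustration, not the code behind the linked visualization. The `overlord` field, the helper `build_allegiance_graph` and the sample houses are assumptions that stand in for data extracted from the wiki.

```python
import json

# Hypothetical sample records; real wiki infoboxes use their own field names.
houses = [
    {"name": "House Stark", "overlord": None},
    {"name": "House Bolton", "overlord": "House Stark"},
    {"name": "House Karstark", "overlord": "House Stark"},
    {"name": "House Lannister", "overlord": None},
    {"name": "House Clegane", "overlord": "House Lannister"},
]

def build_allegiance_graph(records):
    """Return a node-link dict: one node per house, one link per allegiance."""
    nodes = [{"id": r["name"]} for r in records]
    known = {r["name"] for r in records}
    links = [
        {"source": r["name"], "target": r["overlord"]}
        for r in records
        if r["overlord"] in known  # skip houses with no (or an unknown) overlord
    ]
    return {"nodes": nodes, "links": links}

graph = build_allegiance_graph(houses)
print(json.dumps(graph, indent=2))
```

Many force-directed network renderers accept a nodes/links shape like this one, so the same JSON can be passed to whichever visualization library you pick.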
Checklist to pass the seminar
- Register on TUM Online for this seminar
- Use only the Google group to communicate with the tutors (otherwise, expect long delays in responses to emails sent to the tutors' private addresses). The tutors will also use this group for general announcements.
- Regularly check the mailbox of the email address you used to sign up for the Google group!
- Once you are accepted to the Google group, send a notification with the number of the group you would like to join. The tutors will then add your name to the ‘groups assignment’ table below.
- Each group will be assigned one topic and one project to present in the week from March 14th to March 18th. Please see the guidelines for topic and project presentations below.
- The slides for your topic presentation and the preliminary visualization of your project results are due for comments 1 week before the presentation date. Send drafts of your presentations to firstname.lastname@example.org.
- Make sure to read these Hints and Rules for great presentations
- Submit a 5-page report (one per group) covering your topic (4 pages) and your project (1 page). Due: 1 week after the seminar.
We prepared 6 different projects as hands-on exercises.
Project A will be assigned to groups 2 and 3. The students of both groups will need to work together to complete the project. The work can be divided within the groups as the students wish.
Each of the projects B, C and D will also be assigned to two groups (e.g. groups 6 and 7 will work on Project B). The groups will work independently of each other (i.e. group 6 will work independently of group 7). Thus, there will be two different solutions to the same project.
Project E will be assigned to groups 1 and 11. The students of both groups will need to work together to complete the project. The work can be divided within the groups as the students wish.
Project F will be assigned to groups 8 and 9. The students of both groups will need to work together to complete the project. The work can be divided within the groups as the students wish.
|Date||Activity|
|March 25||Submission of the report|
|March 14-18||Students present their topics and project results: 30 min topic presentation, 10 min project presentation, 15 min discussion|
|March 7-11||Students send their slides and preliminary project results to tutors for review|
|February 22||Students are assigned to groups, set up a GitHub repository (one per assigned project) to make their code available, and start working on their topics and projects|
|February 19-21||Use the Google group to choose the group number to join|
|February 18||Acceptance notification on TUM Online|
|Before February 18||Sign up for the seminar on TUM Online|
All groups should take a look at and clone this repository: https://github.com/gyachdav/awoiaf. The repo contains Python scripts that pull data from the AWOIAF wiki and:
- structure it in JSON format
- organize it as lists of houses and characters
- pull the complete page for each character, strip away the HTML and convert it to plain text.
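The HTML-cleaning step can be sketched with Python's standard `html.parser` module. This is an illustrative sketch, not the code in the awoiaf repository. The helper name `html_to_text` and the choice to drop the contents of `<script>` and `<style>` elements are assumptions.

```python
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collect text content while skipping <script> and <style> bodies."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)

def html_to_text(html):
    """Strip tags from an HTML page and return whitespace-normalized plain text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    # Join with spaces so text from adjacent block elements does not fuse,
    # then collapse runs of whitespace left behind by the removed markup.
    return " ".join(" ".join(parser.parts).split())

page = ("<html><head><style>p{color:red}</style></head><body>"
        "<p>Arya <b>Stark</b> is the daughter of Eddard Stark.</p>"
        "<script>x=1</script></body></html>")
print(html_to_text(page))
```

A wiki page also has navigation menus and footers. In practice you would probably feed only the article's content element to this function, rather than the whole page.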
Don’t wait for the teams working on Project A to finish setting up the database. Instead, start working on your project with the data provided in the data folder (https://github.com/gyachdav/awoiaf/blob/master/Data/). Whenever you are missing some data, don’t worry: create your own dummy data (e.g. by creating random associations between characters and places) and let the Project A teams know that you are waiting for it.
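One way to generate such placeholder data is sketched below. It assumes you have simple lists of character and place names. The helper `make_dummy_associations` and its row format are hypothetical and are not part of the repository. A fixed random seed keeps the dummy data reproducible across runs and teammates.

```python
import random

def make_dummy_associations(characters, places, per_character=2, seed=42):
    """Randomly link each character to a few distinct places (placeholder data only)."""
    rng = random.Random(seed)  # fixed seed -> same dummy data on every run
    k = min(per_character, len(places))
    return [
        {"character": name, "place": place}
        for name in characters
        for place in rng.sample(places, k)  # sample() never repeats a place
    ]

characters = ["Arya Stark", "Jon Snow", "Tyrion Lannister"]
places = ["Winterfell", "King's Landing", "Castle Black", "Braavos"]
for row in make_dummy_associations(characters, places):
    print(row)
```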
Title: Database setup, data integration and creation of database APIs
Description: In this project we will lay the foundations for our system by integrating data from multiple sources into a central database. The database will serve the apps and the visualization tool that will be developed in other projects.
Due: Since the database will serve as the data source for other projects, the deadline for this project is March 7th. Please send your preliminary results to email@example.com for review one week earlier, on February 29.
Assigned to groups: 1 and 2
Tools: NoSQL database (e.g. MongoDB)
Hint: look at Guy’s GitHub repo (https://github.com/gyachdav/awoiaf). It contains Python scripts Guy wrote to pull the data out of the wiki and populate his own MongoDB database.
- Set up a document oriented database (DB)
- Design and develop a set of data extraction and parsing tools that pull data from the unstructured and semi-structured text on the AWOIAF site. On AWOIAF, semi-structured text is found only in the infobox portion of a wiki page.
- Focus on extracting data from all of the following portals of the wiki:
- Characters: http://awoiaf.westeros.org/index.php/Portal:Characters
- History: http://awoiaf.westeros.org/index.php/Timeline_of_major_events
- Culture: http://awoiaf.westeros.org/index.php/Portal:Culture
- Geography: http://awoiaf.westeros.org/index.php/Portal:Geography
- Houses: http://awoiaf.westeros.org/index.php/Houses_of_Westeros
- TV episodes: http://awoiaf.westeros.org/index.php/Game_of_Thrones, especially dates of TV episodes.
- Anything else?
Hint: To link a character to a geographic location, take the character's page in the wiki and search it for mentions of place names. For instance, to check whether Arya Stark is connected to Duskendale, pull her page (http://awoiaf.westeros.org/index.php/Arya_Stark) and search it for mentions of Duskendale. Populate the database with the data extracted from the AWOIAF site automatically and periodically (once a month is fine).
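The mention search from the hint can be sketched as a whole-word regular-expression match over the cleaned, plain-text character page. The function name `find_place_mentions` and the sample text are hypothetical. Counting mentions is an assumption too: it gives a rough strength for each character-place link.

```python
import re

def find_place_mentions(page_text, place_names):
    """Return (place, count) pairs for places mentioned as whole words in the text."""
    mentions = []
    for place in place_names:
        # \b on both sides stops a place name from matching inside a longer word
        pattern = r"\b" + re.escape(place) + r"\b"
        count = len(re.findall(pattern, page_text))
        if count:
            mentions.append((place, count))
    return mentions

# Toy text standing in for the plain-text version of a character page
text = ("Arya Stark was born at Winterfell. She later sailed to Braavos, "
        "and thought of Winterfell often.")
places = ["Winterfell", "Braavos", "Duskendale"]
print(find_place_mentions(text, places))
```

Plain string matching is only a heuristic. Place names that are also common words or that appear in other contexts will need extra filtering.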
- Visualize the schema of the database
- Provide an API so that other projects can query your database easily and flexibly. It should be possible to write queries just by looking at the visual schema of the DB and the structure of the DB API.
- Provide an API that will help us mine tweets and combine them with the data in our database. For instance, query the list of characters that belong to “House Greyjoy” and then use that list of names to mine tweets that mention each name.
- Visualize statistics about the data: how many data points are there, and what kinds?
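The House Greyjoy example can be sketched against a hypothetical in-memory collection. In the real project the lookup would run against the Project A database, and the field name `allegiance` is an assumption about its schema. The sketch only prepares search phrases; it does not call any Twitter API. It wraps each name in quotes, a common convention for exact-phrase search.

```python
from typing import Dict, List

# Hypothetical character documents; the real schema will come from Project A.
CHARACTERS: List[Dict] = [
    {"name": "Theon Greyjoy", "allegiance": ["House Greyjoy"]},
    {"name": "Asha Greyjoy", "allegiance": ["House Greyjoy"]},
    {"name": "Arya Stark", "allegiance": ["House Stark"]},
]

def characters_of_house(house: str) -> List[str]:
    """Return the names of characters sworn to `house` (case-insensitive match)."""
    wanted = house.casefold()
    return [
        doc["name"]
        for doc in CHARACTERS
        if any(a.casefold() == wanted for a in doc.get("allegiance", []))
    ]

def tweet_search_terms(house: str) -> Dict[str, str]:
    """Map each character name to a quoted phrase for an exact-phrase tweet search."""
    return {name: f'"{name}"' for name in characters_of_house(house)}

print(tweet_search_terms("house greyjoy"))
```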
Title: Which character is in most pressing need for life insurance?
Description: Game of Thrones characters are always in danger of being eliminated. The challenge in this assignment is to estimate how likely each character who is still alive is to be eliminated. The goal of this project is to rank characters by their Percentage Likelihood of Death (PLOD), which you will assign using machine learning approaches.
Assigned to groups: 3 and 4, 9 and 10
Data: The database developed in Project A (available from March 7th). Until the database is available, please use the data described in the getting-started section above.
Tools: Machine learning tools (e.g. https://www.npmjs.com/browse/keyword/machine%20learnin) and the database from Project A
- Select features relevant to predicting whether a character will die next (e.g. killed, died of old age, died of disease)
- Split the data set using stratified cross-validation: split it into e.g. 3 sets, so that each set contains the same number of dead characters and the same number of alive characters. Possibly also stratify by rank (e.g. nobleman, warrior, peasant) and by sex.
- Visualize how the features contribute to the prediction (binary separation)
- Assign each character a PLOD (percentage likelihood of death)
- Visualize the PLODs
|No.||Date||Time||Topic|
|1||March 14||10:00||Language basics: grammar, variables, data structures, control structures, conditionals, functions etc.|
|3||March 14||12:00||The module pattern and AMD|
|4||March 15||10:00||The event handling system using anonymous functions, callbacks, promises etc.|
|5||March 15||11:00||Functional reactive programming frameworks|
|7||March 16||11:00||The MEAN stack|
|8||March 17||10:00||Web development basics: DOM, DOM manipulation, styles|
|9||March 17||11:00||Web development frameworks (Angular, Backbone, React)|
|10||March 17||12:00||Data visualization using SVG, Canvas and framework libraries|
|11||March 18||10:00||Build tools, continuous integration and distribution|
|12||March 18||11:00||ECMAScript 6 (ES6) language features|
- Recommended video: http://www.paulirish.com/2010/10-things-i-learned-from-the-jquery-source/