Note: Please register for the seminar only if you will be able to attend all sessions on March 14-18, 2016, and will complete the assignments set for your team before the seminar takes place.
- 1 Description
- 2 Prerequisites
- 3 New This Semester: The Game of Thrones Edition
- 4 Preparation
- 5 Schedule
- 6 Projects
- 7 Presentation Schedule
- 8 Recommended literature
Students are expected to have:
- Basic knowledge of relational databases and NoSQL databases
- An interest in working with big data
- An interest in the Game of Thrones show
- An interest in challenging themselves to do something totally cool
- Availability for all meetings throughout the presentation week: participation is mandatory. We will accept at most one absence, and only if it is justified, communicated, and approved well in advance.
New This Semester: The Game of Thrones Edition
For instance, consider the following toy project. I wanted to analyze the balance of power in Westeros around the period covered in the book "A Clash of Kings". To do this, I had to pull data from A Wiki of Ice and Fire (AWOIAF), the most comprehensive source of information about the world of Ice and Fire. The data from the wiki helped me construct a visualization that shows a network of allegiances (the edges of the network) between the great houses (the nodes of the network). Here is the visualization: https://rostlab.org/~gyachdav/awoiaf/#/. Clicking on a node shows detailed information, and you can use the mouse wheel to zoom and navigate the network.
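An allegiance network like the one in this toy project can be held as a plain node set plus a directed edge list. The sketch below is a minimal illustration, not the code behind the linked visualization: `build_network`, `sworn_house_counts` and the hand-picked sample pairs are hypothetical, and counting the houses directly sworn to a liege (its in-degree) is only one simple proxy for a house's power.

```python
from collections import Counter

# Hand-picked sample of (sworn house, liege house) pairs for illustration only;
# real allegiance data would be extracted from the AWOIAF wiki.
SAMPLE_ALLEGIANCES = [
    ("House Karstark", "House Stark"),
    ("House Umber", "House Stark"),
    ("House Mormont", "House Stark"),
    ("House Frey", "House Tully"),
    ("House Blackwood", "House Tully"),
]


def build_network(allegiances):
    """Turn (sworn house, liege) pairs into a node set and a directed edge list."""
    nodes = set()
    edges = []
    for vassal, liege in allegiances:
        nodes.update((vassal, liege))
        edges.append((vassal, liege))  # edge points from the sworn house to its liege
    return nodes, edges


def sworn_house_counts(edges):
    """Count how many houses are directly sworn to each liege (its in-degree)."""
    return Counter(liege for _, liege in edges)


nodes, edges = build_network(SAMPLE_ALLEGIANCES)
counts = sworn_house_counts(edges)
print(len(nodes), counts["House Stark"])  # prints: 7 3
```

A graph library or the visualization tool can consume the same node and edge lists directly.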
Checklist to pass the seminar
- Register on TUM Online for this seminar
- Use only the Google group to communicate with the tutors (emails sent to the tutors’ private addresses may take a very long time to be answered). The tutors will also use this group for general announcements.
- Check the mailbox of the email address you used to sign up for the Google group regularly!
- Once you have been accepted into the Google group, post the number of the group you would like to join. The tutors will then add your name to the ‘groups assignment’ table below.
- Each group will be assigned one topic and one project to present in the week from March 14th to March 18th. Please see the guidelines for topic and project presentations below.
- The slides for your topic presentation and the preliminary visualization of your project results are due for comments 1 week before the presentation date. Send your drafts to firstname.lastname@example.org.
- Make sure to read these Hints and Rules for great presentations.
- Submit a 5-page report (one per group) describing your topic (4 pages) and your project solution (1 page). Due: 1 week after the seminar.
We have prepared six projects (A-F) as hands-on exercises.
Project A will be assigned to groups 2 and 3. The students of both groups will need to work together to complete the project and can divide the work among themselves as they wish.
Each of the projects B, C and D will also be assigned to two groups (e.g. groups 6 and 7 will work on Project B). These groups will work independently of each other (i.e. group 6 will work independently of group 7), so there will be two different solutions to the same project.
Project E will be assigned to groups 1 and 11. The students of both groups will need to work together to complete the project and can divide the work among themselves as they wish.
Project F will be assigned to groups 8 and 9. The students of both groups will need to work together to complete the project and can divide the work among themselves as they wish.
|March 25||Submission of the report|
|March 14-18||Students present their topics and project results (30 min topic presentation, 10 min project presentation, 15 min discussion)|
|March 7-11||Students send their slides and preliminary project results to the tutors for review|
|February 22||Students are assigned to groups, start a GitHub repository (one per assigned project) to make their code available, and start working on their topics and projects|
|February 19-21||Use the Google group to choose the group number to join|
|February 18||Acceptance notification on TUM Online|
|Before February 18||Sign up for the seminar on TUM Online|
All groups should take a look at and clone this repository: https://github.com/gyachdav/awoiaf. The repository contains Python scripts that pull data from the AWOIAF wiki and:
- structure it in JSON format
- organize it as lists of houses and characters
- pull complete character pages, strip away the HTML and convert them to plain text.
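The last step above, turning a wiki page into plain text, can be approximated with Python's standard-library `html.parser`. This is a minimal sketch, not the code in the repository: `TextExtractor` and `html_to_text` are hypothetical names, and inserting a space between text chunks is a simplification that can split a word whose letters sit in different tags.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the visible text of an HTML page, skipping <script> and <style>."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; arrives as "&"
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Join chunks with spaces, then collapse all whitespace runs into one space.
        # Simplification: a word split across tags (e.g. "St<b>ark</b>") becomes "St ark".
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


page = "<html><head><style>p {color: red}</style></head><body><p>Arya <b>Stark</b></p></body></html>"
print(html_to_text(page))  # prints: Arya Stark
```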
Don’t wait for the teams working on Project A to finish setting up the database. Instead, start working on your project with the data provided in the data folder (https://github.com/gyachdav/awoiaf/blob/master/Data/). Whenever you are missing some data, don’t worry: create your own dummy data (e.g. by creating random associations between characters and places) and let the Project A teams know that you are waiting for that data.
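For dummy data, a seeded random generator keeps every member of a team working with the same fake associations. A minimal sketch, assuming a simple list-of-dicts layout: `make_dummy_associations` and the `character`/`places` field names are hypothetical and not the schema used in the data folder.

```python
import json
import random


def make_dummy_associations(characters, places, per_character=2, seed=42):
    """Randomly associate each character with a few distinct places (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the whole team gets identical dummy data
    k = min(per_character, len(places))
    return [
        {"character": name, "places": rng.sample(places, k)}  # sample() never repeats a place
        for name in characters
    ]


characters = ["Arya Stark", "Brienne of Tarth", "Sandor Clegane"]
places = ["Winterfell", "King's Landing", "Harrenhal", "Riverrun"]
print(json.dumps(make_dummy_associations(characters, places), indent=2))
```

Once Project A delivers real associations, only the data source changes; code that reads this layout can stay as it is if the field names are agreed on with the Project A teams.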
Title: Database setup, data integration and creation of database APIs
Description: In this project we will lay the foundations for our system by integrating data from multiple sources into a central database. The database will serve the apps and the visualization tool that will be developed in other projects.
Due: Since the database will serve as the data source for other projects, the deadline for this project is March 7th. Please send your preliminary results to email@example.com for review one week earlier, on February 29.
Assigned to groups: 2 and 3
Tools: NoSQL database (e.g. MongoDB)
Hint: look at Guy’s GitHub repository: https://github.com/gyachdav/awoiaf. It contains Python scripts Guy wrote to pull the data out of the wiki and populate his own MongoDB database.
- Set up a document-oriented database (DB)
- Design and develop a set of data extraction and parsing tools that will pull data from the unstructured and semi-structured text on the AWOIAF site. On AWOIAF, semi-structured text is found only in the infobox portion of a wiki page.
- Focus on extracting data from all of the following portals of the wiki site:
- Characters: http://awoiaf.westeros.org/index.php/Portal:Characters
- History: http://awoiaf.westeros.org/index.php/Timeline_of_major_events
- Culture: http://awoiaf.westeros.org/index.php/Portal:Culture
- Geography: http://awoiaf.westeros.org/index.php/Portal:Geography
- Houses: http://awoiaf.westeros.org/index.php/Houses_of_Westeros
- TV episodes: http://awoiaf.westeros.org/index.php/Game_of_Thrones, especially the air dates of the episodes.
- Anything else?
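Since the semi-structured part of a page is the infobox, one way to start is to read its rows as label/value pairs. This is a minimal sketch built on an assumption that must be checked against the real AWOIAF markup: the infobox is taken to be a `<table>` whose class list contains `infobox`, with rows of the form `<tr><th>Label</th><td>Value</td></tr>`. `InfoboxParser` and `parse_infobox` are hypothetical names.

```python
from html.parser import HTMLParser


class InfoboxParser(HTMLParser):
    """Collect label/value pairs from the rows of an infobox table.

    ASSUMPTION: the infobox is a <table> whose class list contains "infobox"
    and each row looks like <tr><th>Label</th><td>Value</td></tr>.
    Tables nested inside the infobox are treated as part of it.
    """

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._depth = 0      # > 0 while inside the infobox table
        self._cell = None    # "th" or "td" while inside a cell
        self._label = ""
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            classes = (dict(attrs).get("class") or "").split()
            if self._depth or "infobox" in classes:
                self._depth += 1
        elif self._depth and tag in ("th", "td"):
            self._cell, self._buffer = tag, []

    def handle_endtag(self, tag):
        if tag == "table" and self._depth:
            self._depth -= 1
        elif self._depth and tag == self._cell:
            text = " ".join("".join(self._buffer).split())  # normalise whitespace
            if tag == "th":
                self._label = text
            elif self._label:
                self.fields[self._label] = text
                self._label = ""
            self._cell = None

    def handle_data(self, data):
        if self._depth and self._cell:
            self._buffer.append(data)


def parse_infobox(html):
    parser = InfoboxParser()
    parser.feed(html)
    parser.close()
    return parser.fields


sample = ('<table class="infobox"><tr><th>Allegiance</th><td>House <a href="#">Stark</a></td></tr>'
          '<tr><th>Culture</th><td>Northmen</td></tr></table>')
print(parse_infobox(sample))  # {'Allegiance': 'House Stark', 'Culture': 'Northmen'}
```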
Hint: To link a character to a geographic location, pick a character in the wiki and search the character’s page for the names of places. For instance, to see whether Arya Stark is connected to Duskendale, pull this page (http://awoiaf.westeros.org/index.php/Arya_Stark) and search it for mentions of Duskendale. Populate the database with the data extracted from the AWOIAF site automatically and periodically (once a month is fine).
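The mention search in the hint can be sketched with whole-word regular-expression matching over a page's plain text. A minimal sketch, assuming the page has already been converted to plain text and a list of place names is available; `places_mentioned` is a hypothetical helper and the sample sentence is made up.

```python
import re


def places_mentioned(page_text, place_names):
    """Return the place names that occur as whole words in a character's page text.

    Matching is case-sensitive and uses word boundaries, so "Harren" does not
    match inside "Harrenhal".
    """
    found = []
    for place in place_names:
        pattern = r"\b" + re.escape(place) + r"\b"  # escape apostrophes, dots, etc.
        if re.search(pattern, page_text):
            found.append(place)
    return found


text = "Arya travelled from Winterfell to King's Landing and later to Harrenhal."
print(places_mentioned(text, ["Winterfell", "King's Landing", "Duskendale", "Harren"]))
# ['Winterfell', "King's Landing"]
```

A plain mention is only weak evidence of a real connection, so it may be worth storing the matching sentence alongside each association.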
- Visualize the schema of the database
- Provide an API so that other projects can query your database easily and flexibly. It should be possible to write queries just by looking at the visual schema of the DB and the structure of the DB API.
- Provide an API that will help us mine tweets and combine them with the data in our database. For instance, query the list of characters that belong to House Greyjoy and then use that list of names to mine tweets that mention each name.
- Visualize the database statistics: how many data points are there, and of what kinds?
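The House Greyjoy example can be sketched as two small API functions. A minimal sketch, assuming an in-memory list of dicts stands in for the database: the record layout (`name`, `allegiances`) and the functions `characters_in_house` and `tweet_queries` are hypothetical, and `tweet_queries` only builds exact-phrase search strings without calling Twitter.

```python
# Hypothetical records standing in for the Project A database.
SAMPLE_CHARACTERS = [
    {"name": "Theon Greyjoy", "allegiances": ["House Greyjoy"]},
    {"name": "Arya Stark", "allegiances": ["House Stark"]},
    {"name": "Balon Greyjoy", "allegiances": ["House Greyjoy"]},
]


def characters_in_house(characters, house):
    """Names of characters whose allegiances include `house` (case-insensitive match)."""
    wanted = house.casefold()
    return [c["name"] for c in characters
            if any(a.casefold() == wanted for a in c.get("allegiances", []))]


def tweet_queries(names):
    """One exact-phrase search string per name (quotes keep first and last name together)."""
    return ['"%s"' % name for name in names]


greyjoys = characters_in_house(SAMPLE_CHARACTERS, "house greyjoy")
print(tweet_queries(greyjoys))  # ['"Theon Greyjoy"', '"Balon Greyjoy"']
```

Keeping the database query and the query-string builder separate lets Project D reuse the second function with names from any source.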
Title: Which character is in the most pressing need of life insurance?
Description: Game of Thrones characters are always in danger of being eliminated. The challenge in this assignment is to estimate how likely each character who is still alive is to be eliminated. The goal of this project is to rank characters by their Percentage Likelihood of Death (PLOD), which you will assign using machine learning approaches.
Assigned to groups: 3 and 4, 9 and 10
Data: Database developed in Project A (will be made available on March 7th). Until then, please use the data from here (see "getting started" above).
Tools: Machine learning tools, e.g. https://www.npmjs.com/browse/keyword/machine%20learnin, and the database from Project A
- Select features relevant to whether a character is going to die next (e.g. killed, died of old age, died of disease)
- Split the data set for stratified cross-validation: split it into e.g. 3 folds, each with the same proportion of dead and alive characters. Optionally, also stratify by rank or group (e.g. nobleman, warrior, peasant; males and females, etc.)
- visualization of how features contribute to the prediction (binary separation)
- assign each character a PLOD (percent likelihood of death)
- visualize the PLODs
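The stratified split described above can be sketched without any library: group the characters by label, shuffle each group, and deal the indices round-robin across the folds. This is a minimal sketch; `stratified_folds` and `cross_validation_splits` are hypothetical names. If your group works in Python, scikit-learn's `StratifiedKFold` provides the same kind of split.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=3, seed=0):
    """Split the indices 0..len(labels)-1 into k stratified folds.

    Indices are grouped by label, shuffled within each group and then dealt
    round-robin across the folds, so every fold ends up with (almost) the same
    share of each label. A label can be a plain "dead"/"alive" flag or a tuple
    such as ("dead", "nobleman") for finer stratification.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    folds = [[] for _ in range(k)]
    position = 0  # continue dealing where the previous label stopped, keeping fold sizes balanced
    for indices in by_label.values():
        rng.shuffle(indices)
        for idx in indices:
            folds[position % k].append(idx)
            position += 1
    return folds


def cross_validation_splits(folds):
    """Yield (train, test) index lists, holding out each fold once."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


labels = ["dead"] * 6 + ["alive"] * 9
for train, test in cross_validation_splits(stratified_folds(labels)):
    dead = sum(labels[i] == "dead" for i in test)
    print(len(train), len(test), dead)  # each line: 10 5 2
```

The model is trained on the two training folds and evaluated on the held-out fold; the three held-out scores together indicate how well the PLOD predictions generalize.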
Description: The known GoT world is vast and stretches over the three continents of Westeros, Essos and Sothoryos. Readers of the Ice and Fire books are transported from King's Landing to the borders of the Seven Kingdoms and further on across the Narrow Sea. Over two thousand characters mentioned in the books have been associated with multiple landmarks in the GoT world. Your mission is to find character-place associations and put them on an interactive GoT map. Such a tool will help us figure out where Sandor “the Hound” Clegane went on his travels and how his travels coincide with those of Brienne of Tarth (hint: they never crossed paths in the books, but they fought a brutal duel in the show).
Assigned to groups: 5 and 6, 11 and 12
- database from Project A (geographical location; will be made available on March 7th).
- Before the database is created, please use the data from here (please see above: getting started).
- Map (e.g. from here)
- For a list of cities, start here: http://awoiaf.westeros.org/index.php/Category:Cities
- Visualization of locations mentioned for a specific character in the wiki
- The map should be in color and of acceptable resolution.
- You will need to create a coordinate system and log the coordinates of each location in the database.
- For a given character name, drop pins on the locations the character is associated with.
- Provide an easy-to-use interface for adding more character-location associations. Use autocomplete when a new character name is entered, and then recreate that character's locations on the map.
- Associations of different characters added to the map should be drawn in different colors for easier recognition.
- Hovering over a location name should load and display information from the awoiaf wiki.
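Several of the tasks above (a coordinate system, pins per character, distinct colors and autocomplete) can be sketched in a few small functions. A minimal sketch under stated assumptions: the map size `MAP_WIDTH` x `MAP_HEIGHT` and the pixel positions in `LOCATIONS_PX` are made up, and `to_normalized`, `distinct_colors`, `autocomplete` and `pins_for` are hypothetical names. Storing normalized (0..1) coordinates means the map image can later be replaced by one of a different resolution without re-logging every location.

```python
import bisect
import colorsys

# ASSUMPTION: made-up map size and pixel positions, for illustration only.
MAP_WIDTH, MAP_HEIGHT = 2000, 3000
LOCATIONS_PX = {"Winterfell": (1000, 600), "Harrenhal": (1100, 1800)}


def to_normalized(x_px, y_px):
    """Map pixel coordinates to (0..1, 0..1) so the map image can change resolution."""
    return x_px / MAP_WIDTH, y_px / MAP_HEIGHT


def distinct_colors(n):
    """Return n hex colours spaced evenly around the hue circle."""
    colors = []
    for i in range(n):
        r, g, b = colorsys.hsv_to_rgb(i / n, 0.8, 0.9)
        colors.append("#%02x%02x%02x" % (round(r * 255), round(g * 255), round(b * 255)))
    return colors


def autocomplete(sorted_names, prefix):
    """Return the names that start with prefix (case-sensitive); sorted_names must be sorted."""
    start = bisect.bisect_left(sorted_names, prefix)  # first name >= prefix
    matches = []
    for name in sorted_names[start:]:
        if not name.startswith(prefix):
            break  # names sharing the prefix are contiguous in sorted order
        matches.append(name)
    return matches


def pins_for(characters, associations, locations_px=LOCATIONS_PX):
    """One pin per (character, place); all pins of a character share one colour."""
    pins = []
    for character, color in zip(characters, distinct_colors(len(characters))):
        for place in associations.get(character, []):
            x, y = to_normalized(*locations_px[place])
            pins.append({"character": character, "place": place, "x": x, "y": y, "color": color})
    return pins


names = sorted(["Brienne of Tarth", "Arya Stark", "Arys Oakheart"])
print(autocomplete(names, "Ary"))  # ['Arya Stark', 'Arys Oakheart']
pins = pins_for(["Arya Stark", "Brienne of Tarth"],
                {"Arya Stark": ["Winterfell", "Harrenhal"], "Brienne of Tarth": ["Harrenhal"]})
print(len(pins))  # 3
```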
Description: Joffrey Baratheon is one of the most loathed characters in TV history. In fact, people celebrated his TV death on Twitter. We are interested in learning more about how people feel about different characters by analyzing tweets that mention GoT characters. In this project you will analyze Twitter feeds across a timeline: you will look for the names of GoT characters in the feed and try to identify whether each tweet is positive or negative. You can then compute a metric of the accumulated sentiment expressed on Twitter about a given character at a given point in time, and of its trend (positive or negative). It will be interesting to relate the sentiment toward characters to the airing of particular episodes (you can easily get the air date of an episode from the database constructed in Project A).
Assigned to groups: 6 & 7
- Database from Project A (geographical location; will be made available on March 7th).
- Before the database is created, please use this dummy data.
Hint: a very helpful example http://www.sitepoint.com/creating-sentiment-analysis-application-using-node-js/
- Develop an API to interact with Twitter’s API
- Get the characters’ names
- Retrieve the relevant Twitter posts
- Lay them out on a timeline
- Show a popularity chart (how popularity increased or decreased after each episode)
Questions to answer:
- Whom did people like best in the last episode?
- Analysis of characters’ popularity over time
- Analysis of the sentiment toward characters on Twitter over time
- A graph showing a character's sentiment trend over time laid out against episodes’ airing date.
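The accumulated-sentiment metric and its trend around an episode's air date can be sketched as follows. A minimal sketch under loud assumptions: the tiny word list `LEXICON` stands in for a real sentiment lexicon or classifier (such as the one in the linked Node.js example), a tweet counts as mentioning a character if it contains the name as a substring, and `score`, `cumulative_sentiment` and `trend_around` are hypothetical names. The tweets and the air date in the usage are invented.

```python
from datetime import date

# ASSUMPTION: a toy word list standing in for a real sentiment lexicon or classifier.
LEXICON = {"love": 1, "great": 1, "hero": 1, "hate": -1, "cruel": -1, "awful": -1}


def score(text):
    """Naive tweet score: sum of the lexicon values of its words."""
    return sum(LEXICON.get(word.strip(".,!?").lower(), 0) for word in text.split())


def mentions(text, name):
    """Naive mention test: case-insensitive substring match."""
    return name.lower() in text.lower()


def cumulative_sentiment(tweets, name, until):
    """Accumulated sentiment of tweets about `name` posted on or before `until`."""
    return sum(score(t["text"]) for t in tweets
               if t["date"] <= until and mentions(t["text"], name))


def trend_around(tweets, name, air_date):
    """Mean score after the air date minus mean score before it (None if a side is empty)."""
    relevant = [t for t in tweets if mentions(t["text"], name)]
    before = [score(t["text"]) for t in relevant if t["date"] < air_date]
    after = [score(t["text"]) for t in relevant if t["date"] >= air_date]
    if not before or not after:
        return None
    return sum(after) / len(after) - sum(before) / len(before)


# Invented tweets and an example air date, for illustration only.
tweets = [
    {"date": date(2014, 4, 10), "text": "I hate Joffrey, so cruel"},
    {"date": date(2014, 4, 12), "text": "Joffrey is awful"},
    {"date": date(2014, 4, 14), "text": "Joffrey is gone, great episode!"},
    {"date": date(2014, 4, 14), "text": "Arya is a hero"},
]
print(cumulative_sentiment(tweets, "Joffrey", date(2014, 4, 12)))  # -3
print(trend_around(tweets, "Joffrey", date(2014, 4, 13)))  # 2.5
```

A positive trend value means the mood about the character improved after the episode; evaluating `cumulative_sentiment` at each air date gives the points for the requested trend graph.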
Description: In this project we will integrate all the apps developed in Projects B, C and D into the website developed in Project F. You will pull the code from each project repository, build it with its dependencies and package the apps so that they can easily be called from the website developed in Project F.
Assigned to group: 8
- Configuration information for each application created in Projects B, C and D.
- Information on the web framework used in project F and how to include applications/modules into that web app.
- Create unit tests for the applications to make sure you are only integrating packages that were successfully built
- Package applications including their dependencies
- Convert the packages into modules that fit into the web app
- Make the process continuous, so that each new release of an app (Projects B-D) is tested and integrated into the website.
- Continuous integration of apps into the website
- A page reporting on the latest builds (i.e. build history): where the source code was pulled from, and alerts if errors occurred
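The gating idea above (only integrate apps whose tests pass, and keep a build history for the report page) can be sketched as a small loop. A minimal sketch, not a real CI setup: `App`, `BuildRecord` and `integrate` are hypothetical names, and `run_tests` is a stand-in for pulling, building and running the unit tests of one app from Projects B-D; in practice a CI service would run these steps on every new release.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List


@dataclass
class App:
    """One app from Projects B-D; run_tests returns True when its unit tests pass."""
    name: str
    repo_url: str
    commit: str
    run_tests: Callable[[], bool]


@dataclass
class BuildRecord:
    """One row of the build-history page."""
    app: str
    repo_url: str
    commit: str
    passed: bool
    error: str = ""
    finished_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def integrate(apps: List[App], history: List[BuildRecord]) -> List[str]:
    """Test every app, log a BuildRecord for each, and return the names safe to package."""
    accepted = []
    for app in apps:
        try:
            passed, error = bool(app.run_tests()), ""
        except Exception as exc:  # a crashing build counts as a failure
            passed, error = False, repr(exc)
        if not passed and not error:
            error = "unit tests failed"
        history.append(BuildRecord(app.name, app.repo_url, app.commit, passed, error))
        if passed:
            accepted.append(app.name)
    return accepted


# Hypothetical apps and repository URLs, for illustration only.
history: List[BuildRecord] = []
apps = [
    App("plod", "https://github.com/example/plod", "abc123", lambda: True),
    App("map", "https://github.com/example/map", "def456", lambda: False),
]
print(integrate(apps, history))  # ['plod']
```

Each `BuildRecord` carries the repository, commit and error needed for the latest-build report page.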
Title: Putting it all together on the web
Assigned to group: 9
Description: In this project we will build a web portal for our GoT data analysis and visualization system. The website will integrate all the apps created in Projects B-D with the help of the integration team assigned to Project E.
Due: March 31st
- Create the shell of a Model-View-Controller (MVC) website
- Create the menus that will load the apps created in Projects B-D
- Create the user interaction controls for the integrated apps and enable them with the help of the integrators from Project E
- Create an absolutely awesome UX/UI for the website :)
- Make sure that the home page displays some cool statistics that you pulled from the apps
- Have a gallery showing what the site has to offer
- Arrange to host the website on a service that can scale with traffic demands (preferably at no cost; check Heroku, AWS, etc.).
|1||March 14||10:00||Language basics -- grammar, variables, data structures, control structures, conditionals, functions etc.|
|3||March 14||12:00||The module pattern and AMD|
|4||March 15||10:00||The event handling system using anonymous functions, callbacks, promises etc.|
|5||March 15||11:00||Functional reactive programming frameworks|
|7||March 16||11:00||The MEAN stack|
|8||March 17||10:00||Web development basics: DOM, DOM manipulation, styles|
|9||March 17||11:00||Web development frameworks (Angular, Backbone, React)|
|10||March 17||12:00||Data visualization using SVG, Canvas and framework libraries|
|11||March 18||10:00||Build tools, continuous integration and distribution|
|12||March 18||11:00||ECMAScript 6 (ES6) language features|
- Recommended video: http://www.paulirish.com/2010/10-things-i-learned-from-the-jquery-source/