Revision as of 17:46, 11 September 2016
Date/Time: Sep 19-23, 2016/ 10am - 12pm (1pm)
MI 01.10.011 MI 00.13.09A
Students are expected to have:
- Basic knowledge of relational databases and NoSQL databases
- Interest in working with big data
- Interest in the worlds of Pokemon and Pokemon GO
- Interest in challenging themselves to do something totally cool
- Participation in all meetings throughout the presentation week is mandatory. We will consider only one absence, and only if it is justified, communicated, and approved well in advance.
New This Semester: The Pokemon Go Edition
Checklist to pass the seminar
- Join the seminar’s Google group
- use only the Google group for communication with tutors (expect huge delays in responses to emails sent to tutors’ private addresses otherwise). The tutors will use this group also for general announcements.
- students are encouraged to answer questions of their fellow students posted in the Google group
- check the mailbox of the email address you used to sign up to the Google group regularly!!!
- Upon acceptance to the Google group, send a notification with the group number you would like to join. The tutors will then update the ‘groups assignment’ table below with your name.
- Each group will be assigned one topic and one project to present during presentation week (see schedule). Please see the guidelines for topic and project presentations below.
- The slides for your topic presentation and the preliminary visualization of your project results are due for comments 1 week before the presentation date. Send your presentation drafts to email@example.com.
- Make sure to read these Hints and Rules for great presentations
- Submit a 5-page report (one per group) describing solutions to your topic (4 pages) and the project (1 page). Due: 2 weeks after the seminar.
We prepared 5 different projects as hands-on exercises.
Project A will be assigned to groups 1 and 2. The students of both groups will need to work together to complete the project. The work can be divided within the groups as the students wish.
Each of the projects B and D will also be assigned to two groups (e.g. groups 7 and 8 will work on Project D). The groups will work independently from each other (i.e. group 7 will work independently from group 8). Thus, there will be two different solutions to the same project.
Project C will be assigned only to one team - group 5.
Project E will be assigned to groups 6, 9 and 10. The students of these groups will need to work together to complete the project. The work can be divided within the groups as the students wish.
Project A (6 students)
Description: In this project you will scrape as much data as you can get about the *actual* sightings of Pokemons. As it turns out, players all around the world started reporting sightings of Pokemons and are logging them into a central repository (i.e. a database). We want to get this data so we can train our machine learning models.
You will of course need to come up with other data sources, not only for sightings but also for other relevant details that can later be used as features for our machine learning algorithm (see Project B). Additional features could be the air temperature at the time of the sighting, or whether the location is close to water, buildings or parks. Consult a Pokemon Go expert if you have one around you, and come up with as many features as possible that describe the place, time and name of a sighted Pokemon.
Another feature that you will implement is a Twitter listener: you will use the Twitter streaming API (https://dev.twitter.com/streaming/public) to listen on a specific topic (for example, the #foundPokemon hashtag). When a new tweet with that hashtag is written, an event will be fired in your application that checks the details of the tweet, e.g. location, user, time stamp. Additionally, you will try to parse formatted text from the tweets to construct a new “seen” record that will then be added to the database. Some of the attributes of the record will be the Pokemon's name, location and the time stamp.
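To make the "parse formatted text" step more concrete, here is a minimal sketch in Python. The tweet format (`#foundPokemon <name> <latitude>,<longitude>`), the function name `parse_sighting` and the field names of the record are all assumptions made for illustration; the seminar does not prescribe a format, so agree on one within your group first.

```python
import re
from datetime import datetime, timezone

# Hypothetical tweet format (an assumption, not defined by the seminar):
#   "#foundPokemon <name> <latitude>,<longitude>"
SIGHTING_PATTERN = re.compile(
    r"#foundPokemon\s+(?P<name>[A-Za-z'.\- ]+?)\s+"
    r"(?P<lat>-?\d+(?:\.\d+)?)\s*,\s*(?P<lng>-?\d+(?:\.\d+)?)",
    re.IGNORECASE,
)


def parse_sighting(text, user, created_at):
    """Return a "seen" record as a dict, or None if the tweet does not match."""
    match = SIGHTING_PATTERN.search(text)
    if match is None:
        return None
    lat, lng = float(match.group("lat")), float(match.group("lng"))
    if not (-90 <= lat <= 90 and -180 <= lng <= 180):
        return None  # coordinates out of range, ignore the tweet
    return {
        "pokemon": match.group("name").strip(),
        "location": {"lat": lat, "lng": lng},
        "seen_at": created_at.isoformat(),
        "reported_by": user,
    }


if __name__ == "__main__":
    tweet_time = datetime(2016, 9, 19, 10, 0, tzinfo=timezone.utc)
    print(parse_sighting("#foundPokemon Pikachu 48.2625,11.6680", "ash", tweet_time))
```

Tweets that do not match the pattern, or that carry impossible coordinates, yield `None`, so the listener can simply skip them instead of writing partial records to the database.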
Additional data sources (here is one: https://pkmngowiki.com/wiki/Pok%C3%A9mon) will also need to be integrated to give us more information about Pokemons e.g. what they are, what’s their relationship, what they can transform into, which attacks they can perform etc.
Data: Here is one end point we already found for you:
Params: minLatitude, maxLatitude, minLongitude, maxLongitude
Tip: You will need to form a strategy on how to carefully pull data from this source. As often happens with online data sources, this one is not very reliable and you may even need to access it from multiple IPs to avoid being blocked or delayed.
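One possible pulling strategy is to split the area of interest into small tiles and to query them one by one with a pause in between. The sketch below only uses the four parameter names listed above; the tile size, the helper names `tile_bounding_box` and `crawl`, and the `fetch` callable (which you would implement yourself against the end point) are assumptions for illustration.

```python
import math
import time


def tile_bounding_box(min_lat, max_lat, min_lng, max_lng, step):
    """Yield one parameter dict per tile of at most step x step degrees."""
    if step <= 0:
        raise ValueError("step must be positive")
    rows = max(1, math.ceil((max_lat - min_lat) / step))
    cols = max(1, math.ceil((max_lng - min_lng) / step))
    for i in range(rows):
        for j in range(cols):
            yield {
                "minLatitude": min_lat + i * step,
                "maxLatitude": min(min_lat + (i + 1) * step, max_lat),
                "minLongitude": min_lng + j * step,
                "maxLongitude": min(min_lng + (j + 1) * step, max_lng),
            }


def crawl(tiles, fetch, delay_seconds=1.0, max_retries=3):
    """Call fetch(params) for every tile, politely and with retries.

    fetch is a hypothetical callable supplied by you; it should raise
    OSError on a failed request. Tiles that keep failing are skipped.
    """
    results = []
    for params in tiles:
        for attempt in range(max_retries):
            try:
                results.append(fetch(params))
                break
            except OSError:
                # Back off exponentially before retrying the same tile.
                time.sleep(delay_seconds * 2 ** attempt)
        # Pause between tiles so the source is not hammered.
        time.sleep(delay_seconds)
    return results
```

Small tiles keep each response small and make it easy to resume after a failure, because you only need to remember which tiles are already done.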
Twitter streaming API (https://dev.twitter.com/streaming/public)
Additional data sources: https://pkmngowiki.com/wiki/Pok%C3%A9mon
Outcome: Once you have established those data sources, you will need to set up a document database (e.g. Mongo) to log all the sighting information you captured. Finally, you will need to set up an API server that exposes the data to all the downstream apps that will consume it. Use this as an example: api.got.show
Milestones:
- Set up a document-oriented database (DB)
- Design and develop a set of data extraction and parsing tools that will pull data about “sightings” of Pokemons from various sources
- Integrate third-party sources with information about Pokemons (you can limit this to the 152 Pokemons in the game)
- Come up with as many features as possible describing the sighting of a Pokemon and the Pokemon itself
Project B (2 groups x 3 students)
Description: In this project we will apply machine learning to establish the TLN (Time, Location and Name - that is, where Pokemons will appear, at what date and time, and which Pokemon it will be) prediction in Pokemon Go.
Data: You will use the data collected by Project A for your predictor. Until that data is provided to you, you can use the dummy data set that we created for you.
Dummy data can be found in Guy’s repo: https://github.com/gyachdav/pokemongo
- Select features (i.e. properties) that best contribute to the prediction of TLN
- One of the features will be the timestamp. The challenge here is to find out what is the time interval we need to use - a day, a week, a month - that would lead to the best performance of our machine learning tool
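To experiment with different time intervals, you could map every sighting timestamp to a bucket index and count sightings per bucket. The following is a minimal sketch, assuming Unix timestamps in seconds and records with hypothetical `name` and `timestamp` fields; a month is left out because it has no fixed length and would need calendar arithmetic.

```python
from collections import Counter

# Interval lengths in seconds (a month is omitted: it has no fixed length).
BUCKET_SECONDS = {"hour": 3600, "day": 86400, "week": 7 * 86400}


def time_bucket(timestamp, interval):
    """Map a Unix timestamp (seconds) to the index of its interval."""
    return int(timestamp // BUCKET_SECONDS[interval])


def sightings_per_bucket(sightings, interval):
    """Count sightings per (Pokemon name, time bucket) pair."""
    return Counter(
        (s["name"], time_bucket(s["timestamp"], interval)) for s in sightings
    )
```

Running the same learning setup once per interval and comparing the results is a simple way to find out which granularity works best for your data.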
Outcome: Given the data set of previously sighted Pokemons over a certain period of time, the algorithm needs to make a prediction when, where and what kind of Pokemon will appear in the future.
Intro to Machine Learning - August 16:
- Video Intro to ML - part 1
- Video Intro to ML - part 2
- File:Slides machine learning JST WS16.pdf
Interesting links to check out
How to apply machine learning (also explained in both video files):
- First, get the weka software and the weka manual
- To start weka, choose the 'Explorer' application and using the 'Open file' button you can upload your data in the ARFF format (e.g. Sample_dataset1_aa_composition.arff from above). Please refer to the manual to understand the ARFF format.
- In the ARFF file, each character will be represented by a string of attributes and the class label '+1' or '-1' - as explained in the video we want to predict if a sample in the data set has a signal peptide or not.
- After uploading the ARFF file, you will see in the GUI window some simple statistics about the distribution of features in your data set. Does the distribution look right?
- If yes, you are ready to run the classification! Choose the 'Classify' tab in the GUI
- Select "Use training set" in Test Options to understand how the classifier performs on the features you selected on your training set (i.e. training and testing will be done on the same dataset!!)
- Select the classifier: Choose -> functions -> SMO (this will be a Support Vector Machine; SVM)
- Leave the cross-validation option at 10
- Click on Start!
- What is the performance of your classifier (i.e. Precision, Recall, F-measure, ROC area)? All measures are summarized at the bottom of the output of your predictor.
- Now perform some parameter optimization
- for an SVM, you can select a different kernel (and optimize its parameters as well), parameter c and the error penalty parameter chsi
- you can inspect your features by eye. Select the 'visualize' tab -> increase the point size and jitter -> click update. Does your data separate the classes well?
- you can also inspect your features using a statistical approach. Select 'select attributes' tab -> select 'ReliefAttributeEval' as an attribute evaluator and 'Ranker' as a search method -> click on 'Start'. Are there features with a negative contribution? Are there features with a minimal contribution? If yes, remove them.
- Run your classifier again with the new feature set. Do your results improve?
- Once you have found optimal classifier parameters and optimal features describing your data set, try a new classifier! There is no need to optimize the features anymore, but the classifier parameters will need to be optimized again!
- Does the prediction performance improve?
- What happens if you try another classifier (or a couple more of them)?
- Once you have found a perfect combination (i.e. a classifier, its parameters and features describing your data), you are ready to test your model on a previously unseen test data set. To do so, now select the 'cross-validation' option in test options and leave the number of folds at 10.
- What is the performance of your classifier now?
- Is it better than random?
- Report all your steps and all your results in GitHub issues, so that we can see what you do and can give you advice!
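If you want to understand what the numbers in the output mean, the sketch below recomputes precision, recall and F-measure from lists of actual and predicted labels, and shows how a data set can be split into 10 folds. It is a plain-Python illustration with made-up function names, not Weka's implementation (Weka, for instance, stratifies its folds by default).

```python
def precision_recall_f1(actual, predicted, positive="+1"):
    """Compute precision, recall and F-measure for the positive class."""
    tp = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # correctly predicted positive
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif a == positive:
            fn += 1  # missed positive
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists; every sample is tested exactly once."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

Training and testing on the same data usually gives overly optimistic numbers, which is why the cross-validated result is the one to compare against a random predictor.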
Project C: #PokemonGo (4 students)
Description: In this project you will analyze the tweets gleaned from the Twitter STREAMING API. The Twitter STREAMING API provides access to live tweets and lets us “listen” in on events that happen in real time. You will develop two applications in this project, building on work done last semester: https://github.com/Rostlab/JS16_ProjectD_Group5.
Application #1 (low priority) - Live sentiment analysis on Pokemon within an x km radius. We want to know what people think about a Pokemon that appeared nearby. The user of the app should be able to visualize a live sentiment feed around his/her area (that is, given a lat/lng and a specific radius), and be able to see whether people around him/her think positively or negatively about that Pokemon.
Application #2 - You will identify PokeMobs. PokeMobs are events where a large group of people forms in a certain location due to the appearance of a rare Pokemon. Here is an example of a PokeMob: https://www.youtube.com/watch?v=MLdWbwQJWI0. In this project you will listen in on the Twitter streaming API and try to detect a spike in the frequency of tweets about a certain Pokemon as well as a spike in the density of those tweets in a certain geolocation. Once such a spike has been detected you can register it as an alert that a PokeMob is forming. Maybe some sharp mind will be quick to scope this event to the world.
Application #3 - We would like to collect sentiment analysis information about each Pokemon from past tweets, from the time the PokemonGo game came out up until today. We are interested to know what people said about each of the Pokemon and where they did so (location).
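The spike detection for PokeMobs could start from something as simple as the sketch below: count the tweets per time slot and flag a slot whose count is far above the recent average. The window size, the threshold and the minimum count are arbitrary assumptions that you will need to tune; to capture density as well, you could keep one such series per Pokemon and map grid cell.

```python
from statistics import mean, pstdev


def detect_spikes(counts, window=6, threshold=3.0, min_count=5):
    """Return the indices of time slots that look like a spike.

    counts holds the number of tweets per time slot (e.g. per minute).
    A slot is a spike if its count is at least min_count and exceeds the
    mean of the preceding window by threshold standard deviations.
    """
    spikes = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        # Use a floor of 1.0 so a perfectly flat history does not make
        # every tiny increase look like a spike.
        spread = max(pstdev(history), 1.0)
        limit = mean(history) + threshold * spread
        if counts[i] >= min_count and counts[i] > limit:
            spikes.append(i)
    return spikes
```

For example, `detect_spikes([2, 3, 2, 3, 2, 3, 2, 30, 3])` flags only the slot with 30 tweets.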
Tips: Since you will become the twitter experts, you will join forces with project A (Poke Data) to realize the live-tweet miner.
Data: You will need the names of Pokémon. These can be obtained from project A or compiled by hand beforehand (in order to match the tweets).
- http://socket.io/ or https://github.com/sockjs/sockjs-node to implement direct communication between server and client (a stateful connection, as opposed to the stateless one you usually implement with APIs). Also look at the example on their homepage (socket.io) :)
- https://developers.google.com/web/updates/2015/03/push-notifications-on-the-open-web?hl=en AND https://notifications.spec.whatwg.org/ AND http://w3c.github.io/push-api/ BUT MOST OF ALL https://documentation.onesignal.com/docs/getting-started
Milestones: 1. Collaborate with the map projects for these milestones:
- (low priority) Use Twitter's STREAMING API to listen to live tweets about Pokémon and perform live sentiment analysis (classify each tweet as positive or negative), and store the tweet's location (if available) and any additional data (seen, failed to catch, disappeared, ...). This can be implemented using Natural Language Processing; ask Juan Miguel Cejuela for more info when you get to this stage. Don't save these tweets in a database, but only in the current session opened by the visiting user.
- (low priority) Make it possible to filter the live stream to a specific location + radius
- Detect spikes in frequency/density of tweets mentioning a certain Pokemon.
- Collaborate with the mapping projects to visualise the place where the PokeMob is forming.
- (low priority) Also visualize on the map what is the trending sentiment for the nearby pokemon.
2. Collaborate with project A for these milestones:
- Use Twitter's traditional API to collect historic tweets about Pokémon (done by project A, but you will use *all* RAW tweets, as project A will only consider tweets that have geolocation and are formatted in a certain manner). You will classify them as positive or negative and extract the tweet's location (if available) and any additional data (seen, failed to catch, disappeared, ...). This can be implemented using Natural Language Processing; ask Juan Miguel Cejuela for more info when you get to this stage.
- A graph showing sentiment trend over time for a Pokemon that is nearby.
- A graph showing the sentiment trend from the moment the page is loaded until it is closed, for every Pokemon or a specific one, everywhere or in a specific area.
- An alert about a PokeMob
- Integrating map projects and tweets
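For the location + radius filter mentioned in the milestones, the great-circle distance between the user and a tweet can be computed with the haversine formula. This is a minimal sketch; the tweet fields `lat` and `lng` and the function names are assumptions for illustration, and tweets without a location are simply dropped.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(tweets, center_lat, center_lng, radius_km):
    """Keep only the tweets located within radius_km of the center."""
    return [
        t for t in tweets
        if t.get("lat") is not None and t.get("lng") is not None
        and haversine_km(center_lat, center_lng, t["lat"], t["lng"]) <= radius_km
    ]
```

The same distance function can be reused wherever the distance between the user and a Pokemon is needed.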
Project D: PokeMap (2 groups X 3 students)
Description: The world of Pokemon GO is as big as our planet. Pokemons have been sighted on top of cliffs perched over oceans as well as in your next-door coffee shop. We would like to create a world-wide interactive map that shows where Pokemons are predicted to appear. Each Pokemon prediction you add to the map should have all relevant information, including the name, the time the Pokemon is predicted to appear, the prediction confidence rate etc. The map should be filterable by time range (e.g. predicted to appear in the next day) as well as by Pokemon name and Pokemon species.
Data Sources: You will be using the data coming from the API built by Project A.
- Implement a map with leaflet using Open Cycle Map/Open Street Map as tile layer
- Visualize Pokemon’s location in the last hour on the map
- Visualize data about Pokemon (How it evolves, type, relationships to other Pokemons)
- GeoLocation of the user via web and calculation of most likely pokemon next to user
Outcome: An interactive map that visualizes the TLN predictions on a global scale.
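The filtering logic behind the map can be prototyped independently of the map itself. The sketch below filters a list of predictions by time range, name and confidence; the record fields (`name`, `appears_at`, `confidence`) are assumptions, since the actual format will depend on what the API delivers.

```python
from datetime import datetime, timedelta


def filter_predictions(predictions, now, horizon, name=None, min_confidence=0.0):
    """Keep predictions that appear within [now, now + horizon].

    Optionally restrict the result to one Pokemon name (case-insensitive)
    and to predictions with at least min_confidence.
    """
    end = now + horizon
    return [
        p for p in predictions
        if now <= p["appears_at"] <= end
        and (name is None or p["name"].lower() == name.lower())
        and p["confidence"] >= min_confidence
    ]


if __name__ == "__main__":
    now = datetime(2016, 9, 19, 10, 0)
    predictions = [
        {"name": "Pikachu", "appears_at": now + timedelta(hours=2), "confidence": 0.8},
        {"name": "Eevee", "appears_at": now + timedelta(days=2), "confidence": 0.9},
    ]
    # Only Pikachu is predicted to appear within the next day.
    print(filter_predictions(predictions, now, timedelta(days=1)))
```

Keeping the filter a pure function of the data makes it easy to test without a browser and to reuse when the apps are integrated into the website.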
Project E: Catch ‘em all!! (8 students)
Description: Now that we have tons of data about Pokemon (what they are, where they are, what’s their relationship, what they can transform into, which attacks they can perform, aso) we want to integrate it all into a comprehensive website.
This website should contain sections about each Pokemon and its details. Additionally, the website should register the user’s location and tell the user how close the predicted Pokemon is to him/her.
Additionally, you will incorporate the apps created by projects B, C and D into the website. Your group will need to create automated builds and tests for these apps and use continuous integration to pull in new changes from the code repositories. Apps from projects B-D should be packaged and made available on NPM. Ideally, once you have completed these tasks, the webapp component will integrate the apps by “requiring” them.
Here is a possible user story: when a user opens the website or the app, the current location of the user will be shown. Additionally, the website/app will automatically show where the currently active Pokemons are and where the Pokemons that we predict to be active in the near future (i.e. within half a day) will be located (all of this will be available from the app developed in project D). Hopefully, the website will be somewhat crowded by that data. Then, there needs to be a menu bar or something similar (e.g. above the map or on the right side of it) that lists the currently active or predicted Pokemons. Clicking on one of them will make all other Pokemons on the map disappear, except for the clicked one.
Separate web pages would allow the search and presentation of individual Pokemons and the information we gathered about them, including third party data (project A) and twitter analysis (project C)
Milestones: First you will need to put all the apps developed in Projects B, C and D into the website that is developed in Project E. In this project you will pull the code from each project repository, compile it with the set of dependencies and package the apps, so that they can be easily called from the web site developed in project E. Then you will create the website that will publish all apps and enable the user interaction described above as well as possible interapp interactions.
- Create the shell of a Model-View-Controller website
- Create the menus that will load the apps created in projects B-D
- Create the user interaction controls for the apps integrated and enable them
- Create an absolutely awesome UX/UI for the website with a really useful homepage :)
- Make arrangements to host the web site on a host that can scale with traffic demands (preferably there should be no cost associated with this step; check Heroku, AWS, etc.). We (mentors) can also help with university resources.
Seminar pre-meeting: June 30th at 1pm (Room: 00.08.038)
Seminar Dates (presentation week): September 19-23
|1||Sep 19||10:00||Language basics -- grammar, variables, data structures, control structures, conditionals, functions etc.||A||Samit Vaidya, Jonas Heintzenberg, Annette Köhler|
|3||Sep 20||10:00||The module pattern and AMD||B||Marcel Wagenländer, Matthias Baur, Timur Khodzhaev|
|4||Sep 20||11:00||The event handling system using anonymous functions, callbacks, promises etc.||B||Benjamin Strobel, Siamion Karcheuski, Aurel Roci|
|5||Sep 21||10:00||Functional reactive programming frameworks||C||Karen Reyna, Hannes Dorfmann, Amr Abdelraouf, Philipp Dowling|
|7||Sep 22||10:00||The MEAN stack||D||Gani Qinami, Paul Gualotuna, Oleksandr Fedotov|
|8||Sep 22||11:00||Web development basics: DOM, DOM manipulation, styles||D||Elma Gazetic, Faris Cakaric, Timo Ludwig|
|9||Sep 23||10:00||Web development frameworks (Angular, Backbone, React)||E||Wolfgang Hobmaier, Georgi Aylov, Jochen Hartl|
|10||Sep 23||11:00||Data visualization using SVG, Canvas and framework libraries||E||Mustafa Kaptan, Gilles Tanson|
- RECOMMENDED VIDEO http://www.paulirish.com/2010/10-things-i-learned-from-the-jquery-source/