Javascript technology 2017

From Rost Lab Open
Revision as of 22:00, 19 February 2017 by Gyachdav (Talk | contribs)

ANNOUNCEMENTS

  • 08.02.2017 The dates for the presentation week are set to March 22-24, 2017.
  • 03.02.2017 The dates for the seminar were moved by one week after meeting the students at the pre-meeting. This way students can fully concentrate on the seminar after completing their exams. New dates are announced in the Description section.
  • 03.02.2017 The date for the kick-off meeting was also moved, by one day - to February 16, 2017.
  • February 15th, 2017 9:00: Due to some technical issues with the TUM matching system, publishing of matching results will be delayed. We understand that this may affect your ability to decide whether you will need to participate in the kick-off event on the 16th and we are looking into that. Please stay tuned by checking back on this page for additional information in a few hours.
  • February 15th, 2017 15:40: Students preliminarily matched to the seminar should have received an email at their @tum.de or @mytum.de address from the seminar tutors informing them about registration for the seminar and inviting them to the kick-off event on Feb 16th!
  • 17.02.2017: Find the code examples and PDF for the short JavaScript talk here: [1]


Plea: Please register for the seminar only if you will be able to attend all sessions taking place on March 22-24, 2017 and will complete the assignments set for your team before the seminar takes place.

IMPORTANT: Prior to registration to the seminar, please send us an email to jstech@rostlab.org containing the following information:

  1. Your name
  2. A short description of a project you have worked on and that you have enjoyed the most. We are also interested in your role in that project and which technologies you have used.
  3. Contact data of an official who can vouch for your coding experience
  4. Your level of experience with JavaScript
  5. Your CV


Description

JavaScript Technology - participating students get hands-on experience with designing and building modern JavaScript applications. The students will research the literature for design concepts and available technologies including the use of common JavaScript libraries. The students will prepare presentations and introduce the concepts they chose to use. Each talk is summarized by the students in a seminar report.

This is a completely hands-on seminar, which means that you will build your own app and prepare a presentation that explains what you did and describes the JavaScript concepts you used. Finally, the entire work will be summarized in a seminar report at the end of the term.

The students will be working in a highly agile environment, meaning that collaborative work (communication!!) among all students will be essential for the successful completion of the project. Any results produced during the coding period of the seminar must therefore be communicated and made available to peer coders asap!


Tutors: Dr. Guy Yachdav, Dr. Tatyana Goldberg, Christian Dallago, Kordian Bruck and Philipp Fent

Registration: Prior to registration on the TUM Matching System, we would like to get to know you! Please send us an email to jstech@rostlab.org with the information listed at the top of this page. Without this information we won't be able to secure a spot for you at the seminar.

Kick-off event: Thursday, Feb 16th at 6pm

Location: Arnulfstr. 62 (an der Hackerbrücke)

We will provide an overview of the key JavaScript technologies that will be used during this semester’s project. Of course, this will be accompanied by great beers and wine, as well as food. The event will be a perfect opportunity to meet your tutors as well as fellow coders. Preferably, we would form project groups at the event. There will also be a live concert that we will organize just for you!

We cordially invite all students of summer semester’s seminar to join the event. All other interested students are invited as well. IMPORTANT: please register here if you will attend the event.

Thanks to all who participated in the kick-off event on Feb 16th! Here are a few impressions:

Find the code examples and PDF for the short JavaScript talk here: [2]

Coding period begins: Feb 27th

Feature freeze: April 9th

Beta release: April 16th

Presentation week (in class, participation mandatory): March 22-24, 2017

Room: 00.13.009A

Prerequisites

Students are expected to have:

  • Basic familiarity with JavaScript
  • Knowledge of at least one functional or object-oriented programming language
  • Basic knowledge of relational databases and NoSQL databases
  • Interest in working with big data
  • Interest in music
  • Interest in challenging themselves to do something totally cool
  • Participation in all meetings throughout the presentation week is mandatory. We would only consider one absence that is justified, documented and approved well in advance.

New This Semester: Social network of classical music data

We want to create the Facebook of classical music. In this social network, friendships can be viewed as either living in the same period of time or being taught by someone. Two authors writing for the same genre can be viewed as two people liking the same sport.

At the end of the seminar, we will have a tool that allows us to group composers, music pieces and musicians based on their likes or their friends.

The tool will be incorporated in a popular online resource that is accessed by over 100k people every day!

Preparation

Checklist to pass the seminar

  1. Send an email to jstech@rostlab.org with the information about yourself listed on top of this page
  2. Register on TUM Matching System for this seminar
  3. Each group will be assigned one topic and one project to present in the week from March 22nd to March 24th. Please see the guidelines for topic and project presentations below.
  4. The slides for your topic presentation and the preliminary visualization of your project results are due for comments 1 week before the presentation date. Send your draft presentations to jstech@rostlab.org.
  5. Make sure to read these Hints and Rules for great presentations
  6. Submit a 5-page report (one per group) describing solutions to your topic (4 pages) and the project (1 page). Due: 2 weeks after the seminar.

Topic presentation

We prepared 6-10 different topics about JavaScript technology for this seminar. These will be assigned to groups of 3-4 people. The students are welcome to divide the work within their team as they wish.

Project presentation

We prepared 3 different projects as hands-on exercises.

Projects

Project A

Title: Data aggregation - filtering the web for entities

Description: In this project we will be scanning structured online resources such as DBPedia, Worldcat, MusicBrainz, IMSLP and other databases, as well as unstructured sources such as the 72TB Common Crawl data set that is hosted on AWS. Common Crawl holds the largest current snapshot of the web.

We will be using the data sets to extract the following entities:

  1. Composers
  2. Music works
  3. Musicians and groups of musicians (people, orchestras, choirs, ensembles, etc)

These entities are like the pages on Facebook: people, shops, places, sports or artists. Like them, they have attributes such as when they were founded, by whom, etc. More entities can be suggested by the students participating in the group or upon request by the other groups. You can look at https://musicbrainz.org/doc/Style/Artist (the section at the bottom called “Entities”) to get some inspiration.

The entities, together with their sources, will be stored in a database (with sources as a list, e.g.: Wolfgang Amadeus Mozart → [ https://musicbrainz.org/artist/b972f589-fb0e-474e-b64a-803b0364fa75, source2, …] )
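
The entity-to-sources mapping described above could be prototyped in memory before a database is chosen. The following is only a minimal sketch: the `entities` map, the `addEntitySource` helper, the `type` field and the `https://example.org/source2` URL are assumptions made for illustration, not part of the project specification.

```javascript
// Hypothetical in-memory entity store; a real setup would use the project database.
const entities = new Map();

// Record that `source` mentions the entity `name` of the given `type`.
function addEntitySource(name, type, source) {
  if (!entities.has(name)) {
    entities.set(name, { name, type, sources: [] });
  }
  const entity = entities.get(name);
  // Keep each source only once, even if it is reported several times.
  if (!entity.sources.includes(source)) {
    entity.sources.push(source);
  }
  return entity;
}

addEntitySource(
  'Wolfgang Amadeus Mozart',
  'composer',
  'https://musicbrainz.org/artist/b972f589-fb0e-474e-b64a-803b0364fa75'
);
// Placeholder URL standing in for "source2" from the example above.
addEntitySource('Wolfgang Amadeus Mozart', 'composer', 'https://example.org/source2');
```

Keeping the sources as a de-duplicated list per entity makes it straightforward to count how many independent sources support an entity later on.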

Important:

  • Group 2 requires the vocabulary defined by Group 1 in order to find occurrences of entities in unstructured text
  • Group 2 will not wait for results from Group 1 to get started with the project. Define your own sample data set and start coding right away.
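
As a starting point for Group 2, finding occurrences of vocabulary entries in free text can be tried on a small hand-made sample. This is a sketch under assumptions: `sampleVocabulary` stands in for the vocabulary that Group 1 will deliver, and `findEntities` is a hypothetical helper that does plain case-insensitive whole-name matching, which is far simpler than a real NLP tool.

```javascript
// Hypothetical sample vocabulary, standing in for the output of Group 1.
const sampleVocabulary = ['Wolfgang Amadeus Mozart', 'Joseph Haydn', 'Clara Schumann'];

// Escape characters that have a special meaning in regular expressions.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Return the vocabulary entries that occur in `text` as whole names (case-insensitive).
function findEntities(text, vocabulary) {
  return vocabulary.filter((name) => {
    // The name must not be directly preceded or followed by another letter.
    const pattern = new RegExp(
      '(^|[^\\p{L}])' + escapeRegExp(name) + '($|[^\\p{L}])',
      'iu'
    );
    return pattern.test(text);
  });
}

const found = findEntities(
  'A concert programme listing Joseph Haydn and wolfgang amadeus mozart.',
  sampleVocabulary
);
// found contains 'Wolfgang Amadeus Mozart' and 'Joseph Haydn' (in vocabulary order)
```

Exact matching like this misses spelling variants and abbreviations, so it is only useful for getting the pipeline running on sample data.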

Assigned to groups: Group 1 (working on structured sources), Group 2 (working on unstructured sources).

Data sources:

Structured data sources:

  • DBPedia
  • Worldcat
  • MusicBrainz
  • IMSLP
  • Are you aware of other sources?

Unstructured data source:

  • Common Crawl data set

Tools:

  • No/SQL database, depending on final size either hosted on Azure, RostLab or the Music Connection Machine project.
  • Ready-to-use Python scripts for mining some of the structured databases (to mine other sources you will need to write your own JavaScript scrapers!)
  • Online available NLP tools for scraping unstructured data for entities
  • http://orange.biolab.si can be used to explore the data sources, e.g. to extract entities from unstructured data

Hints:

Milestones:

  • Define a vocabulary of entities based on data from structured sources (Group 1)
  • Try to use as many structured sources as possible (Group 1)
  • Scan the unstructured Common Crawl dataset and extract the web pages containing mentions of entities mined in the previous milestone. Store the web pages in the Common Crawl format as files on disk. Provide a database-based index for the files containing metadata such as the domain names, URLs, page sizes, and other extracted parameters. (Group 2)
  • Extend the vocabulary by mining unstructured data. For example, the co-occurrence of words “composer” and “XYZ” will imply that XYZ is a composer (Group 2)
  • Set up and populate a database for storing entities and their sources (Group 1, possibly using CouchDB so that it has a native API querying mechanism, or using Neo4J for out-of-the-box graph visualization capabilities).
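
The co-occurrence idea from the milestones (the word “composer” next to a name suggests that the name is a composer) can be illustrated with a deliberately crude heuristic. In this sketch, `extractComposerCandidates` is a hypothetical helper that only looks at capitalized words directly following “composer”. It will miss names with lowercase particles and produce false positives, so treat the result as candidates to be verified, not as facts.

```javascript
// Collect capitalized word sequences that directly follow the word "composer".
function extractComposerCandidates(text) {
  // \p{Lu} = uppercase letter, \p{L} = any letter (Unicode-aware, needs the "u" flag).
  const pattern = /\bcomposer\s+(\p{Lu}\p{L}+(?:\s+\p{Lu}\p{L}+)*)/gu;
  const candidates = new Set();
  let match;
  while ((match = pattern.exec(text)) !== null) {
    candidates.add(match[1]);
  }
  return [...candidates];
}

const candidates = extractComposerCandidates(
  'The composer Joseph Haydn lived in Vienna. The composer Clara Schumann was also a pianist.'
);
// candidates is ['Joseph Haydn', 'Clara Schumann']
```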

Project B

Title: Defining relationships among entities

Description: The mentors and students will brainstorm for a number of words that define a relationship between entities extracted from project A. Some examples of relationships are:

  • Composer X was taught by Composer Y
  • Music piece A was written by Composer B for Composer C
  • Orchestra O played music piece P

Other ideas of relationships can be found here https://musicbrainz.org/doc/Style/Relationships although these relationships are targeted rather to contemporary (and not classical) music.

The relationships are like being friends on Facebook, or liking the same band (which forms a relationship).

The students need to extract triplets of “object1 relationship object2” and a list of sources for each such triplet, and add this data to the database created by the groups of Project A.
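
One possible shape for such triplets is sketched below. The field names (`subject`, `relationship`, `object`, `sources`), the `addTriplet` helper and the example URLs are assumptions for illustration; the actual schema is up to the groups.

```javascript
// Hypothetical in-memory triplet store keyed by "subject|relationship|object".
const triplets = new Map();

// Add a supporting source to a triplet, creating the triplet on first use.
function addTriplet(subject, relationship, object, source) {
  const key = [subject, relationship, object].join('|');
  if (!triplets.has(key)) {
    triplets.set(key, { subject, relationship, object, sources: [] });
  }
  const triplet = triplets.get(key);
  if (!triplet.sources.includes(source)) {
    triplet.sources.push(source);
  }
  return triplet;
}

// Placeholder names and URLs, following the examples in the description above.
addTriplet('Orchestra O', 'played', 'Music piece P', 'https://example.org/page-1');
addTriplet('Orchestra O', 'played', 'Music piece P', 'https://example.org/page-2');
addTriplet('Composer X', 'was taught by', 'Composer Y', 'https://example.org/page-1');
```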

If many pages provide support to a certain relationship, we can provide a score for how trustworthy a relationship is. We can implement this score as a pagerank, similar to how Google certifies that a search result is reliable or not.
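
A very simple stand-in for such a score is a weighted count of supporting sources. Note that this is not the PageRank algorithm; it is only a sketch of the "more and better sources means more trust" idea. The source types and weights in `SOURCE_WEIGHTS` are invented for illustration.

```javascript
// Hypothetical weights per source type; the values are illustrative only.
const SOURCE_WEIGHTS = { 'scientific-article': 3, encyclopedia: 2, forum: 1 };

// Sum the weights of all supporting sources; unknown types count as 1.
function reliabilityScore(sources) {
  return sources.reduce(
    (sum, source) => sum + (SOURCE_WEIGHTS[source.type] || 1),
    0
  );
}

// Scale scores to the range 0..1 relative to the best-supported relationship.
function normalizeScores(scores) {
  const max = Math.max(...scores, 0);
  return scores.map((score) => (max === 0 ? 0 : score / max));
}

const wellSupported = reliabilityScore([{ type: 'scientific-article' }, { type: 'forum' }]);
const weaklySupported = reliabilityScore([{ type: 'forum' }, { type: 'forum' }]);
const normalized = normalizeScores([wellSupported, weaklySupported]);
// wellSupported is 4, weaklySupported is 2, normalized is [1, 0.5]
```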

Assigned to groups: Group 3, Group 4

Both groups will work on the same task in parallel. The solution of one of the two groups will be used for the final tool.

Data sources: Database created in Project A. Do not wait for the groups working on project A to complete the project. You can start working on your project using some sample data.

Tools:

  • Online available NLP tools for scraping unstructured data for entities
  • http://orange.biolab.si can be used to explore the data sources, e.g. to extract relationships from unstructured data
  • D3 library for the visualization of statistics

Hint: https://musicbrainz.org/doc/Style/Relationships gives some examples of relationships (but in contemporary music!! Classical music is different ;) ).

Milestones:

  • Define a list of terms for relationships between the following entities:
    • Composer - Composer
    • Composer - Music Piece
    • Composer - Musician
    • Music Piece - Music Piece
    • Music Piece - Musician
    • Musician - Musician
  • Mine the sources provided in Project A for the relationships and assign each relationship a reliability score (aka pagerank). For example, relationships mined from scientific articles will score higher than those from user forums.
  • Add relationships for the entities from Project A and their scores into the database of Project A
  • Provide statistics on data sources from Project A and their reliabilities. For example, how many sources are scientific articles, how many relationships we have in general, etc. Provide a nice visualization for the statistics.
  • Provide a documented REST API for Groups 5 and 6 for accessing the mined data.
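
To make the REST API milestone more concrete, here is a minimal routing sketch. The `/relationships?entity=NAME` route, the response shape and the sample data are assumptions chosen for this example; the real API should be agreed upon with Groups 5 and 6. The routing logic is kept separate from the server so that it can be tested without opening a port.

```javascript
// Hypothetical sample data standing in for the shared project database.
const relationships = [
  { subject: 'Composer X', relationship: 'was taught by', object: 'Composer Y', score: 0.8 },
  { subject: 'Orchestra O', relationship: 'played', object: 'Music piece P', score: 0.5 },
];

// Map a request (method + URL path) to a status code and a JSON-serializable body.
function route(method, requestUrl) {
  // The base URL is only needed so that relative paths can be parsed.
  const url = new URL(requestUrl, 'http://localhost');
  if (method !== 'GET') {
    return { status: 405, body: { error: 'Method not allowed' } };
  }
  if (url.pathname !== '/relationships') {
    return { status: 404, body: { error: 'Not found' } };
  }
  const entity = url.searchParams.get('entity');
  const items = entity
    ? relationships.filter((r) => r.subject === entity || r.object === entity)
    : relationships;
  return { status: 200, body: { items } };
}

// Possible wiring into Node's built-in http module (not started in this sketch):
// require('http').createServer((req, res) => {
//   const { status, body } = route(req.method, req.url);
//   res.writeHead(status, { 'Content-Type': 'application/json' });
//   res.end(JSON.stringify(body));
// }).listen(3000);
```

With this sketch, a request such as `GET /relationships?entity=Composer%20X` would return only the relationships in which "Composer X" appears as subject or object.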

Project C

Title: Visualization - creation of social media page and integration with the Petrucci Music Library

Description: In this task Groups 5 and 6 will develop a Google-like interface that allows users to query our result database for entities (composers, music pieces, musicians). After entering a term, the user will be forwarded to a page that presents all relevant information stored by the groups of Project A and Project B. Results will be presented in tables and also as a navigable graph. There shall be three different views showing relationships to i) composers, ii) music pieces and iii) musicians. Each of these three views will provide a list of relationships (edges) and the entities (nodes) they lead to. For an example, see here, where a graph-like visualization appears on top and a list view appears when scrolling down.

The website shall also provide an interface for users to submit their comments (using some star or +1 system, and optionally free-form text input for providing valuable insights going beyond what can be expressed with a binary or numeric value).

Assigned to groups: Group 5, Group 6

Both groups will work on the same task in parallel. The solution of one of the two groups will be used for the final tool.

Data sources: Database created in Project A and Project B. Do not wait for the groups working on those projects to complete them. You can start working on your project using some sample data.

Tools

  • Elasticsearch can be used for the omni-search page (and it integrates with Neo4J and CouchDB :) )
  • D3 and/or Cytoscape.js for the graph visualization

Hint: You will have to define a comprehensive search mechanism: typing an author and a music piece should be a valid search and result in the music piece. Typing in the music piece alone should also produce the right result.
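
The search behaviour described in the hint can be prototyped with a simple token match before Elasticsearch is set up. This sketch assumes a tiny hand-made `pieces` list and a hypothetical `searchPieces` helper: a piece matches when every query word appears in its title or composer name.

```javascript
// Hypothetical sample records standing in for the Project A database.
const pieces = [
  { title: 'Eine kleine Nachtmusik', composer: 'Wolfgang Amadeus Mozart' },
  { title: 'The Creation', composer: 'Joseph Haydn' },
  { title: 'Requiem', composer: 'Wolfgang Amadeus Mozart' },
];

// Split text into lowercase words.
function tokenize(text) {
  return text.toLowerCase().split(/\s+/).filter(Boolean);
}

// A piece matches when every query token appears in its title or composer name.
function searchPieces(query) {
  const tokens = tokenize(query);
  if (tokens.length === 0) {
    return [];
  }
  return pieces.filter((piece) => {
    const haystack = tokenize(piece.title + ' ' + piece.composer);
    return tokens.every((token) => haystack.includes(token));
  });
}

const byAuthorAndPiece = searchPieces('Mozart Requiem'); // one result: Requiem
const byPieceAlone = searchPieces('requiem');            // the same single result
```

A query with only the author (for example 'mozart') returns all pieces by that author in this sketch, so ranking and typo tolerance would still need to come from a proper search engine.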

Milestones:

  • Develop an interface for google-like searches
  • For each search entity extract data from the database of Project A and Project B
  • Visualize the relationships of an entity using a graph. The graph shall be viewed in three modes, showing relationships to i) composers, ii) music pieces, iii) musicians
  • Each view mode of the graph shall be accompanied by a text summarizing the relationships and entities
  • The graph edges representing relationships should be linked with the underlying supporting data mined from the Internet and the reliability (aka pagerank, see Project B). Users should be able to judge for themselves whether a connection provided by the system has merit or represents a mistake of the algorithm or an unfounded opinion found online.
  • Provide a visualization for the output of the document clustering task from Project A.
  • Provide an interface for users to submit their feedback
  • Package the visualization functionality as a web widget that can be loaded on a third-party website, e.g. by the Petrucci Music Library (IMSLP). The users of the IMSLP should be able to access the data straight from the music piece or composer pages.
  • Depending on the size of the filtered dataset, cluster all, or only a random subsample of the collected webpages using a generic document clustering algorithm, e.g. using gensim. This will be a first-of-its-kind analysis of the distribution of information about classical music online. Multiple applications can benefit from this analysis. We can also use these results easily: for example, we could restrict our graphs only to information mined from certain clusters of documents and not others (for quality or exploratory reasons) (Group 2)
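
The three view modes of the graph mentioned in the milestones boil down to filtering an entity's edges by the type of the node at the other end. The sketch below assumes a plain `nodes`/`edges` structure with invented ids and labels; it is independent of D3 or Cytoscape.js, whose own data formats would need a conversion step.

```javascript
// Hypothetical sample graph: nodes carry a type, edges carry a relationship label.
const nodes = [
  { id: 'composer-b', type: 'composer' },
  { id: 'composer-c', type: 'composer' },
  { id: 'piece-a', type: 'piece' },
  { id: 'orchestra-o', type: 'musician' },
];
const edges = [
  { source: 'piece-a', target: 'composer-b', label: 'written by' },
  { source: 'piece-a', target: 'composer-c', label: 'written for' },
  { source: 'piece-a', target: 'orchestra-o', label: 'played by' },
];

// Return the edges of `entityId` whose other end has the requested type.
function viewFor(entityId, targetType) {
  const typeOf = new Map(nodes.map((node) => [node.id, node.type]));
  return edges.filter((edge) => {
    if (edge.source !== entityId && edge.target !== entityId) {
      return false;
    }
    const other = edge.source === entityId ? edge.target : edge.source;
    return typeOf.get(other) === targetType;
  });
}

const composerView = viewFor('piece-a', 'composer'); // two edges
const musicianView = viewFor('piece-a', 'musician'); // one edge
const pieceView = viewFor('piece-a', 'piece');       // no edges
```

The same filtered edge list can feed both the graph and the accompanying text summary, so the two views stay consistent.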

Presentation Themes

Theme 1

Title: Continuous Integration and Testing with JavaScript

Description: This topic is all about the software development cycle with JavaScript. Starting with behaviour driven development and unit testing (e.g. with Mocha), we then take a look at larger-scale integration testing and especially continuous integration strategies (e.g. with Travis). To finish the testing cycle, we take a look at system and deployment testing (e.g. with Docker).

Theme 2

Title: JavaScript on the server. A Node.js crash course

Description: JavaScript’s not only a browser language. In this talk, we give a brief intro to the server side application development with Node.js. We’ll talk about the Node eco- and buildsystem (e.g. npm, gulp, webpack) and some commonly used frameworks for Node.js web development (e.g. Express.js, Mojito, Hapi, etc.). We’ll also give an overview of the event model and the event loop Node uses to serve many concurrent users. (Check topic 2 to not have any overlaps)

Theme 3

Title: Data Driven Documents: Data visualization using D3.js

Description: In this topic, we take a look at the most popular data visualization library for JavaScript: D3.js. We’ll first take a look at what Data Driven Documents are, why and how to use them. This is a very visual topic, so apart from the concept, there will be a focus on the look & feel of D3. Also present a few alternatives that are better or worse than D3 to see what to use in your own projects in the future.

Theme 4

Title: RESTful APIs with the JavaScript Object Notation

Description: JavaScript also introduces one of the most used data exchange formats: JSON. JSON is commonly used to communicate between client and server, utilising the Facade Pattern. We’ll explain what a RESTful API is, why it is useful and where it is used. Then we’ll take a look at the various JavaScript frameworks to create, document and maintain those APIs (e.g. LoopBack, restify, Swagger etc.). We will try to present the various pitfalls when it comes to versioning, maintaining and acceptance of RESTful APIs.

Theme 5

Title: The Storage Question: To SQL or NoSQL?

Description: In a data driven application, one key question is where to persist the data. In the JavaScript world, there are a lot of possible storage concepts: relational, filesystem based, document stores, key-value stores, graph databases and others. Not every type of data fits each database. In this topic, we’ll take a look into the differences and when to use which. In particular, we’ll cover the differences between session and local storage and when to use cookies. In addition we will see how WebSQL and Sequelize impact the current development of JavaScript applications. Finally, we will see if and how we can handle big data with JavaScript tools in the frontend as well as in the backend.

Theme 6

Title: People are reactive, why shouldn’t our apps be

Description: Reactive programming is a new paradigm on top of “classical” promises. Each user interaction, network communication and elemental influence can be seen as a stream of events. Smart minds are suggesting that reactive programming might be the solution to problems in UI programming. We will see if reactive programming and the various Rx* frameworks actually solve problems or introduce too many complexities to the already complicated JavaScript world.
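
To give a feeling for the "stream of events" idea, here is a tiny push-based stream with `map` and `filter`. It is a teaching sketch only: `createStream` and its methods are invented for this example and are not the API of RxJS or any other Rx* library, and the click events are plain objects rather than real DOM events.

```javascript
// Minimal push-based stream; a teaching sketch, not the API of any Rx* library.
function createStream() {
  const listeners = [];
  return {
    // Register a callback that receives every emitted value.
    subscribe(listener) {
      listeners.push(listener);
    },
    // Push a value to all subscribers.
    emit(value) {
      listeners.forEach((listener) => listener(value));
    },
    // Derive a new stream whose values are transformed by `transform`.
    map(transform) {
      const mapped = createStream();
      this.subscribe((value) => mapped.emit(transform(value)));
      return mapped;
    },
    // Derive a new stream that only passes values accepted by `predicate`.
    filter(predicate) {
      const filtered = createStream();
      this.subscribe((value) => {
        if (predicate(value)) {
          filtered.emit(value);
        }
      });
      return filtered;
    },
  };
}

// Simulated click events: keep left clicks only and extract their x coordinate.
const clicks = createStream();
const received = [];
clicks
  .filter((event) => event.button === 'left')
  .map((event) => event.x)
  .subscribe((x) => received.push(x));

clicks.emit({ button: 'left', x: 10 });
clicks.emit({ button: 'right', x: 20 });
clicks.emit({ button: 'left', x: 30 });
// received is now [10, 30]
```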

Presentation Schedule

Group | Date | Time | Theme | Project | Assigned to | GitHub | PDF
1 | March 22 | 10:00 | Topic: TBA | A: Data aggregation - filtering the web for entities | Student 1, Student 2, Student 3, Student 4 | |
2 | March 22 | 12:30 | Topic: TBA | A: Data aggregation - filtering the web for entities | Student 1, Student 2, Student 3, Student 4 | |
3 | March 23 | 10:00 | Topic: TBA | B: Defining relationships among entities | Student 1, Student 2, Student 3, Student 4 | |
4 | March 23 | 12:30 | Topic: TBA | B: Defining relationships among entities | Student 1, Student 2, Student 3, Student 4 | |
5 | March 24 | 10:00 | Topic: TBA | C: Visualization - creation of social media page | Student 1, Student 2, Student 3, Student 4 | |
6 | March 24 | 12:30 | Topic: TBA | C: Visualization - creation of social media page | Student 1, Student 2, Student 3, Student 4 | |

Recommended literature

  1. JavaScript: The Definitive Guide, 6th Edition http://shop.oreilly.com/product/9780596805531.do
  2. (Highly recommended:) JavaScript: The Good Parts http://shop.oreilly.com/product/9780596517748.do
  3. http://www.htmlgoodies.com/beyond/javascript/some-javascript-object-prototyping-patterns.html
  4. http://www.adequatelygood.com/JavaScript-Module-Pattern-In-Depth.html
  1. http://jquery.com
  2. http://d3js.org
  3. http://raphaeljs.com
  4. http://nodejs.org
  5. http://jqueryui.com
  6. http://www.jslint.com/lint.html
  7. http://jsfiddle.net
  8. http://www.crockford.com
  9. http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf
  10. http://www.sitepoint.com/creating-sentiment-analysis-application-using-node-js/
  11. Advanced Reading JavaScript Garden - the most quirky parts of the JavaScript programming language https://github.com/BonsaiDen/JavaScript-Garden/tree/master/doc/en
  12. RECOMMENDED VIDEO http://www.paulirish.com/2010/10-things-i-learned-from-the-jquery-source/