The Bioinformatics Lab SS 2013

AKA Bioinformatician Lab

This practical is a hands-on training that will make you successful in a bioinformatics lab! This term we focus on virtualization and packaging software for Debian/Ubuntu.

Material from previous courses is available at The bioinformatics lab SS 2012 and The bioinformatics lab SS 2011. There you can find many hints and tips, as well as the protocols and presentation slides from last year. You may use them, but you have to prepare your own protocols and presentations with your own content and in your own style!

Type Practical (10 SWS)
ECTS 8.0
Level Bachelor and Master Bioinformatics
Lecturer Dr. Laszlo Kajan, Dr. Lothar Richter, Matúš Kalaš, Timothy Karl
Place Practical room MI 01.08.021, Garching
Time
Tuesdays 15:00 - 18:00
16,23,30/04; 7,14,21,28/05, 4,11,18,25/06, 2,9,16/07
Invited speaker Matúš Kalaš (BioXSD) 7,14/05
Visit to the LRZ 09/07 15:00-16:00 Main entrance
Language English

Description

Get hands-on training that will make you successful in a bioinformatics lab. This term we are also going to delve into virtual systems and high availability.

  • Debian/Ubuntu
  • E-mail
  • User accounts (LDAP)
  • (Network) file systems
  • Mediawiki (content management)
  • Version control
  • Compute cluster
  • Virtualization (preparation for cloud computing)
  • Packaging applications as a way to provide reusable software components

All in the context of everyday work in a bioinformatics lab like the Rost Lab.

Organization

Requirements and features of a new service will be discussed with the tutors every week. After this the students have to set up the service. Students will build up their own servers running services important for Bioinformatics labs.

The goal of the presence time (Tue 15:00 - 18:00) is to raise awareness of issues pertaining to the topics at hand. To achieve a productive discussion, students are required to prepare by acquiring the critical background knowledge. Therefore, in addition to the programming challenges, some theoretical preparation is required. Links to suitable literature are provided in the list of topics.

Note: please bring your laptops to the sessions - it is much more convenient to work this way.

Criteria to Pass

Students have to collect points during the course. Tests or reviewed programming challenges will be announced in advance.

For example, every student has to write at least one protocol and to give one experience presentation at the beginning of a session. Furthermore, there will be two Linux command line tests.

Topics

A preliminary list of topics is available on this page, under the links to previous courses.

Application

The number of participants is limited to 12 students who need the credits of the course, plus up to 2 'stand-by' students. Please register via email to Laszlo Kajan as soon as possible: first come, first served. Use a descriptive subject, as Laszlo's email address is spammed very heavily, and filtering works better with meaningful 'subject' lines.

Host names

  • Course computer:
    i12r-tbl.informatik.tu-muenchen.de IN A 131.159.28.37
    Note: ssh server from WAN is on port equal to first 4 digits of Garching ZIP code
Domain                              tbl
uidNumber/gidNumber Base            1000
L0 Port Base                        10000
MAC (Ethernet) Address Base         DE:AD:00:00:00:00
L2 MAC Address Base                 DE:AD:00:00:00:30
IP Address Base                     192.168.16.0
L2 Virtual Machine IP Address Base  192.168.16.30

User        uid/Host Name   Address Offset
Laszlo      lkajan          2
Alexander   alex            3
Jonas       jonas           4
Julia       julia           5
Katharina   kath            6
Manuel      manuel          7
Matthias    matthias        8
Michael     michael         9
Nikolaos    niko            10
Robert      robert          11
Sebastian   seba            12
Shen        shen            13
Sonja       sonja           14
Tikira      tikira          15
Uwe         uwe             16
Verena      verena          17
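
The base values and per-user offsets above can be combined mechanically. The following is a minimal Python sketch, assuming that every per-user value is simply base + offset and that the offset is written in hexadecimal inside the MAC address (the page only gives the form DE:AD:00:00:00:<offset>, so this reading is an assumption). The function names are made up for illustration.

```python
import ipaddress

# Base values copied from the table above.
UID_BASE = 1000
L0_PORT_BASE = 10000
MAC_BASE = "DE:AD:00:00:00:00"
L2_MAC_BASE = "DE:AD:00:00:00:30"
IP_BASE = "192.168.16.0"
L2_IP_BASE = "192.168.16.30"


def add_to_mac(mac, offset):
    """Add an integer offset to a MAC address written as six hex octets."""
    value = int(mac.replace(":", ""), 16) + offset
    octets = value.to_bytes(6, "big")
    return ":".join("%02X" % octet for octet in octets)


def settings_for(offset):
    """Assumption: each per-user setting is the table's base plus the offset."""
    return {
        "uid": UID_BASE + offset,
        "l0_port": L0_PORT_BASE + offset,
        "mac": add_to_mac(MAC_BASE, offset),
        "l2_mac": add_to_mac(L2_MAC_BASE, offset),
        "ip": str(ipaddress.IPv4Address(IP_BASE) + offset),
        "l2_ip": str(ipaddress.IPv4Address(L2_IP_BASE) + offset),
    }


if __name__ == "__main__":
    # Offset 10 belongs to Nikolaos (niko) in the table above.
    for key, value in settings_for(10).items():
        print(key, value)
```

If the tutors intend a different mapping (for example a decimal offset in the last MAC octet), only add_to_mac needs to change.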

Student Lab Room 01.08.021 Network For Student Notebooks

  • Throughout the room, there are free network cables labelled with either NB-* or TBL*. These cables are connected to the FMI student network.
  • There is also a wireless network with SSID: FMI available.
  • USE DHCP FOR NETWORK SETTING ASSIGNMENT! DO NOT USE ANY STATIC SETTINGS!
  • All connected users have access to the university network without any modifications.
  • You need SSH access, so you must use the LRZ VPN CLIENT.

Semester Challenge

Semester challenge

Course Introduction

  • Date: 2013 / 04 / 16
  • Topics: who you are, who we are, goals of course, assignment of presentations, setting of passwords

Questions

  • Do you have a laptop? Bring it if you like. Especially if it has a terminal emulator, X server and VNC server on it. Any recent Linux will do. Some proprietary operating systems are known to work as well.
  • Do you have experience with free software? With anything free for bioinformatics?
  • Have you ever written a program others (also) wanted to use?

Links for Preparation

Presentation

The Bioinformatics Lab 2013 - Introduction File:TBL2013 Session1 introduction.pdf

Free Operating Systems, Debian, Stable Release

  • Date: 2013 / 04 / 23
  • Topics: Linux distributions, rpm vs. deb, Debian vs. Ubuntu, stable vs. bleeding edge; decisions to make at the time of system installation; disk partitioning, LVM, RAID, iSCSI; choice of file system; using Debian packages; virtualization basics.

Questions

  • What is Linux?
  • What are the leading Linux distributions?
  • What is free software?
  • How is Debian related to, and how does it differ from Ubuntu, a derivative? [DER][DEB]
  • What are the Debian releases?
  • What is in a (server) computer in terms of hardware?
  • Where can you boot a (server) computer from for installing an operating system?
  • What happens when you install an operating system?
  • How to distribute 24 disks between 6 independent projects?

Free Discussion

  • How do you define specifications for new hardware? - server roles and vendor lock-in.

Links for Preparation

Presentation

Programming Challenge

  • Install and configure a Debian stable base system in a virtual machine.
    1. Create a maximum 8G image for the virtual machine.
    2. The MAC address for the virtual machine is DE:AD:00:00:00:<offset>.
    3. Use 'kvm' for virtualization with no more than 1.2GB of RAM.
    4. Connect to the VDE virtual network with these kvm arguments:
      -net vde,sock=/var/run/kvm0.ctl -net nic,macaddr=$MAC
    5. Install Debian Stable on the virtual machine from the installation CD image at /home/tbl/debian-6.0.7-amd64-netinst.iso. For faster installation, do not select to install a 'Desktop Environment' but do select the 'SSH Server'.
    6. Configure the virtual machine with a static IP address and host name according to the table above. Netmask: 255.255.255.0 , gateway: 192.168.16.1, name server: 192.168.16.1, domain: tbl, search: tbl.
    7. Change the user and group ID of yourself (in your virtual machine) to the numbers in the table above.
    8. Start your virtual machine in the background and connect/reconnect to it via VNC and SSH.
  • Get familiar with vim through the 'vimtutor' command.

Hints and tips

  1. Choose English when installing Debian so Laszlo can help.
  2. You will need X forwarding (ssh -X) when running kvm unless you use the vnc solution to provide a virtual console to your virtual machine (VM).
  3. kvm-img is convenient for creating the image - virtual disk of your virtual machine. I recommend the 'raw' format.
  4. A MAC address is of the form '%02X:%02X:%02X:%02X:%02X:%02X'. Everyone needs a unique address on our virtual network.
  5. man kvm - read this man page. kvm starts the hypervisor, a process that runs your virtual machine. kvm provides the virtual hardware your virtual machine runs on. The parameters on the kvm command line describe what the virtual hardware should be like. More about virtualization later (session 3).
  6. The 'virtio' interface for disks and network interfaces is recommended.
  7. 'writeback' caching is recommended for virtual disks.
  8. We use Virtual Distributed Ethernet (VDE) for our virtual network. Use these kvm parameters (with your MAC address) to configure a network interface attached to this network:
    -net vde,sock=/var/run/kvm0.ctl -net nic,macaddr=de:ad:00:00:00:??,model=virtio
  9. Configure a USB tablet device for the virtual machine so that you can use the mouse pointer easily.
  10. In case you have a Mac, you may encounter key mapping issues. Thanks to Christian, the solution is to use the -k kvm argument with de if appropriate. The way to define a libvirt domain with this parameter is to have this in the XML of the domain: <qemu:commandline><qemu:arg value='-k'/><qemu:arg value='de'/></qemu:commandline>.
  11. When running your virtual machine in the background, use a VNC display to see the VGA display of your VM. VNC is also much more suitable than an X window for not-very-fast network connections.
  12. You may want to give the VNC session a password: (-vnc :0,password - password literally, replace 0 with your number (last two digits) in the table above). You will have to set the password in the hypervisor monitor. In order to access the hypervisor monitor, redirect it to a UNIX socket: -chardev socket,id=monitor,path=$HOME/debian-stable.monitor,server,nowait -mon chardev=monitor,mode=readline. The hypervisor monitor allows you to change parameters of kvm while it is active. Connect to the UNIX socket of the hypervisor monitor with: nc -U $HOME/debian-stable.monitor. Now you can use change vnc password in the hypervisor monitor to set the VNC password (just type it into that nc command you started).
  13. In order to connect to the VNC virtual display, you have to forward a port of your laptop to i12r-tbl, for example like this: ssh -L 5902:127.0.0.1:5902 i12r-tbl. Then use a VNC client such as 'xvnc4viewer' to connect, like this: vncviewer :2.
  14. Create a shell script with all the kvm parameters in it, so that you can start your virtual machine easily.
  15. Think about what would happen if you started two instances of your virtual machine (do not actually try this!).
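
Several of the hints above can be collected into one place. The sketch below is not the course's reference solution: it only assembles and prints a kvm command line in Python and does not start anything. It assumes that the VNC display number equals your address offset, that the offset is written in hexadecimal in the MAC address, and that your kvm version accepts the flags shown (check man kvm). The function names, the image file name and the 1024 MB RAM value are illustrative choices.

```python
import shlex

VDE_SOCKET = "/var/run/kvm0.ctl"  # socket path given in the challenge text


def vnc_port(display):
    """A VNC display :N listens on TCP port 5900 + N."""
    return 5900 + display


def kvm_command(offset, image, iso=None, ram_mb=1024):
    """Return the kvm argument list for one VM; nothing is executed here."""
    mac = "de:ad:00:00:00:%02x" % offset  # assumption: offset in hex
    # virtio disk with writeback caching (hints 6 and 7)
    drive = "file=%s,if=virtio,cache=writeback,format=raw" % image
    if iso is None:
        # Without the install CD the disk itself has to be marked bootable.
        drive += ",boot=on"
    args = [
        "kvm",
        "-m", str(ram_mb),
        "-drive", drive,
        # VDE virtual network with a virtio network card (hint 8)
        "-net", "vde,sock=%s" % VDE_SOCKET,
        "-net", "nic,macaddr=%s,model=virtio" % mac,
        # USB tablet so the mouse pointer tracks properly (hint 9)
        "-usb", "-usbdevice", "tablet",
        # VNC display for running in the background (hints 11 to 13)
        "-vnc", ":%d" % offset,
        "-daemonize",
    ]
    if iso is not None:
        # Attach the installation CD image and boot from it.
        args += ["-cdrom", iso, "-boot", "d"]
    return args


if __name__ == "__main__":
    cmd = kvm_command(2, "debian-stable.img",
                      iso="/home/tbl/debian-6.0.7-amd64-netinst.iso")
    print(" ".join(shlex.quote(arg) for arg in cmd))
    print("forward local port", vnc_port(2), "to reach the VNC display")
```

Keeping the arguments in a list rather than one long string makes it easy to drop the CD-ROM arguments once the installation is complete.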

When the initial installation is complete:

  1. Edit the kvm command line and remove the arguments that 'attach' the CD-rom install image to the virtual machine - you are not going to need this any more.
  2. You have to indicate that your file system image file (the virtual disk image you created) is bootable. If you use -drive to specify the virtual disk (instead of, say, -hda), add boot=on to the parameters of the -drive argument, or you get 'No bootable device.'
  3. In case you see name resolution problems (e.g. Could not resolve 'ftp.de.debian.org'), you have to edit /etc/resolv.conf and set the name server, plus optionally the domain and search into it (as given above): use man resolv.conf to learn how to put these into that file.
  4. Choose English as the default language of your system - use 'dpkg-reconfigure locales', add 'en_US.UTF-8' to the list of locales and make it the default.
  5. You can change keyboard layout in the terminal with 'loadkeys'.
  6. Install the 'vim-nox' package, call the 'vimtutor' command and start learning vim UNLESS you are proficient with emacs.

Advanced Challenge

Get a desktop (graphical) environment installed in your virtual machine. Watch the disk space: your image may not be large enough for installing all the recommended packages and games!

Challenge Presentation

Linux Proficiency, Version Control Systems, 'tar.gz' Packages

  • Date: 2013 / 04 / 30
  • Topics: Linux proficiency, editing text efficiently in a text terminal (vim), shell scripts, version control systems (svn,git), preparing tar.gz distributable archives

Questions

  • What is (shell) input output redirection, and what is it good for?
    • What are shell pipes, and what are they good for?
  • What is the purpose of version control systems?
    • What are the most common operations in a version control system and what are they for? (status, add, remove, commit, log, diff, merge, revert)
  • What is a 'tarball/tar.gz' package?
  • What does 'make(1)' do?
    • Can 'make' make use of multiple cores/CPUs for parallel builds?
  • What are standard 'make' targets? What do they do? (info make)
  • What is a 'staged installation'? (info automake)
  • What does 'architecture dependent/independent' mean? (give examples)
  • How to prepare software for distribution:
    • Compiled code (C/C++)?
    • Perl and Python?
    • Java?
  • What is Plain Old Documentation (POD) format? (man perlpod)

Free Discussion

  • What makes you proficient on the command line? (common commands, input output redirection, pipes, shell syntax)
  • What useful features should a powerful text editor have?

Links for Preparation

Presentation

Programming Challenge

Write a short program that reads text from a file, removes all spaces and writes the result back into a file. Document the modules and executables with man/info pages (POD format recommended). Create a distributable tar archive of your solution (code, examples and documentation).

  • Use either autotools to package the compiled version or
  • the appropriate mechanism for Perl or Python for the script version.
    • For Perl packaging Module::Build or ExtUtils::MakeMaker are appropriate.
    • If you solve the challenge with Python or Java, be ready to present your solution to the group.
  • Make sure that:
    • If you use autotools, your package complies with the GNU standard: do not use the 'foreign' option.
    • The 'make distcheck' (checking staged installation) command succeeds.
  • Staged installation works (in both cases).
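
The program itself is deliberately small, since the challenge is about documentation and packaging. As a rough illustration of the script variant, here is a minimal Python sketch. It assumes that "spaces" means the plain space character only (tabs and newlines are kept) and that files are UTF-8 text. The function names are made up, and the packaging, man page and staged installation are left out.

```python
import os
import tempfile


def strip_spaces(text):
    """Return text with every space character (U+0020) removed."""
    return text.replace(" ", "")


def convert_file(src_path, dst_path):
    """Read src_path, remove all spaces, write the result to dst_path."""
    with open(src_path, encoding="utf-8") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(strip_spaces(text))


if __name__ == "__main__":
    # Small self-contained demonstration on a temporary file.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "in.txt")
        dst = os.path.join(tmp, "out.txt")
        with open(src, "w", encoding="utf-8") as handle:
            handle.write("M K T A Y I A K\n")
        convert_file(src, dst)
        with open(dst, encoding="utf-8") as handle:
            print(handle.read(), end="")
```

Keeping the text transformation in its own function makes it easy to move into a module and to document separately from the executable.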

make Challenge

This challenge is to be solved by creating a file Makefile that is processed by the make command. There is no need for autotools here.

  • Have your full name in file full.name.

Create a Makefile that:

  • Is executable and is processed by make like make -f Makefile [ARGS]. (hint: have a #! line on the top and use the -f make argument).
    E.g. './Makefile' should invoke the 'all' target, './Makefile clean' should invoke the 'clean' target.
  • Has rule(s) to create files first.name and last.name from the full name (hint: man cut).
    E.g. './Makefile first.name' should produce the expected name in file first.name.
  • Has rule(s) to create files *.xxd with the hex dump of all *.name files (hint: man xxd).
    E.g. './Makefile last.xxd' should produce the hex dump of last.name in file last.xxd.
  • Has rule(s) to create checksum files *.chk for all *.name files (hint: man sha1sum).
    E.g. './Makefile full.chk' should produce the checksum of full.name in file full.chk.
  • Has a clean target to remove all generated files.
  • Has a default target named all that prints out the filename and respective contents of all *.xxd and *.chk files.
  • Has a help target that lists the available targets upon invocation (hint: help should be made a prerequisite of the .PHONY target).
  • Take advantage of 'pattern rules' and variables, assigned with 'substitution references' when possible.

For the curious among you: if you run make with -j4, insert some sleep statements and run top, ps or pstree, you can observe the parallel make processes that are spawned.

When ready with the challenges, send Laszlo the completed tar.gz packages (the result of make dist/make distcheck or equivalent) and the solution of the make challenge (the Makefile).

Get familiar with terminal-based text editors. We recommend you implement this programming challenge using vim.

Hints and tips

Makefile
autotools
  1. Edit your package sources list (/etc/apt/sources.list) and enable the 'contrib' and 'non-free' sections of the repository: add contrib and non-free after 'main' on each deb and deb-src line. Refresh the package cache.
  2. Install the 'make', 'make-doc', 'automake' and 'autoconf-doc' packages: these provide automake, autoconf and the info documentation.
  3. Learn to navigate the info browser (do info automake, press '?' and read).
  4. Read section 1 Introduction and 2 Autotools Introduction up to and including 2.2.4 Standard Configuration Variables.
  5. Follow the examples (e.g. 'zardoz') in the automake info to create your Makefile.am and configure.ac. You will want to have at least these macros in your configure.ac:
    AC_INIT
    AM_INIT_AUTOMAKE
    AC_CONFIG_FILES
    AC_OUTPUT
    Use the documentation to find out more about these.
  6. You can use the --prefix ./configure option to test the install target at a custom location (e.g. --prefix=/tmp/test) - very useful for non-root testing of the installation (you would normally not test as root!).
  7. I recommend you use the pod syntax to create the man page. Install the 'perl-doc' package to gain access to the 'perlpod' manpage. Read: man perlpod; man pod2man.
  8. Create rules in Makefile.am to have make generate the manpage for your script from a .pod source.
  9. If your program also contains scripts, use the SCRIPTS primary in addition to PROGRAMS.
  10. Use the DATA primary to distribute the .pod source and the MANS primary to install the man page.
  11. Make sure the .pod source is not installed but the generated man page is (use the automake 'dist' and 'noinst' prefixes as appropriate).
  12. Make sure your package passes the make distcheck test.
Perl

While it is possible to package Perl with autotools, it is not convenient, especially when it comes to setting module installation paths. Use one of Perl's ways to package Perl code:

  • Prefer Module::Build over ExtUtils::MakeMaker. man Module::Build.
  • You can use '--install_base' (with Module::Build) to change the base/root directory of the installation - useful for non-root installations and testing. If you use ExtUtils::MakeMaker, use PREFIX.

Note for the future: it is not difficult to have a Module::Build/ExtUtils::MakeMaker solution inside a bigger autotools package. This way you can take advantage of the strengths of both build systems.

Advanced Challenge

Package both a compiled (e.g. C, C++, Fortran) solution (with autotools) and a script (e.g. Perl, Python) solution of the challenge.

Advanced+ Challenge

Express functionality of the compiled and script solutions in a library (use libtool) or module. Use Doxygen to generate documentation for your C/C++ work.

Challenge Presentation

Challenge Evaluation (0-5 points)

  • tar.gz:
    • Faulty (invalid syntax) Makefile.am (causes make distcheck to fail): -1.5 (once)
    • Sources are removed during 'make clean' or equivalent: -1 (once)
    • Executable ./setup.py or equivalent in archive with no '#!' line: -1 (once)
    • No documentation (e.g. man page): -1 (once) waived
  • make:
    • No .PHONY for obviously .PHONY type targets: -0.5 (once)

Hypervisors, Virtualization API, Cloud Computing Platforms

  • Date: 2013 / 05 / 07
  • Topics: x86 hardware virtualization, QEMU/KVM, virtualization API libvirt, cloud computing platforms

Questions

  • What is emulation, what is virtualization?
  • What is QEMU, what is KVM?
  • What hardware can KVM virtualize? (CPU, hard drive, graphics card, network card, etc.)
  • What is level 0, level 1 and level 2 in nested virtualization?
  • What is the role of the 'virtio' disk interface, 'virtio' network interface model type and 'vmware' VGA card type?
  • What disk image types and formats are usable with KVM? What are their advantages and disadvantages?
  • What is libvirt? What does domain mean in the context of libvirt?
  • What does it mean to migrate a virtual machine?
  • What are the requirements for migrating a virtual machine?
  • What are cloud computing platforms? (OpenStack, Eucalyptus, OpenNebula)

Links for preparation

Presentation

Programming Challenge

Familiarize yourself with libvirt. Unfortunately the libvirt documentation is quite scary. Fortunately they do have very useful examples hidden among the scary bits - use them. Also there is 'virt-manager', a GUI front end, that hides the scary bits! You can perform the libvirt part of this challenge entirely using 'virt-manager' if you have a graphical interface installed, saving you from all the scary bits.

  • Define a domain [3][4] with bridge-to-LAN [5] networking and
  • install a (small) system into it that you can ssh into.
  • Use the 'L2 virtual machine IP address' column in the table to assign a static IP address for this virtual machine.
  • Send me the domain definition of your L2 virtual machine and keep it up long enough for me to be able to ping it.
  • If you plan to do the advanced challenge, read that challenge before starting to solve this challenge.

Hints and Tips

  • Start with creating a bridge interface (say br0) in your (level 1, L1) virtual machine:
    • Don't do this in an ssh session but rather in a VNC console as a mistake may cut your connection - you will be reconfiguring the network connection.
    • Install the bridge-utils package, then man bridge-utils-interfaces and follow the first example EXCEPT for bridge_ports: instead of all, give the name of your network interface (most likely eth0, you can find out by looking for the interface name in your /etc/network/interfaces).
    • Use the address, netmask, etc. from the ethX interface to configure the bridge and disable the automatic configuration of the ethX interface (comment out the auto ethX line - it will become a port of the bridge). Make sure br0 is up (ifup br0) before you continue.
  • I recommend you solve this challenge using virt-manager, the GUI tool. For this you will need a running graphical interface. You can follow Julia's slides (2012), or here is how I got mine going:
    • Add kvm arguments -vga vmware to the call if you do not have it yet and restart the (L1) virtual machine.
    • Nested virtualization works reliably now. You may need '-cpu Opteron_G4' on the level 0 kvm command line to enable this.
    • Install packages gnome-core and xorg. Then running a simple startx in a VNC session should get you into the graphical interface.
  • Install packages qemu and qemu-kvm (before the next step).
  • Install packages libvirt-bin and virt-manager.
  • Download a Debian netinst image into your L1 virtual machine.
  • Put yourself into the kvm, libvirt and vde2-net groups (use usermod or edit /etc/group and /etc/gshadow).
  • Start virt-manager (in the graphical interface) and click the icon Create new virtual machine and follow the steps:
    • The name should be <yourname>-l2, e.g. lkajan-l2.
    • Choose Local install media and use the netinst image you downloaded.
    • OS type and Version do not really matter.
    • 512M RAM is more than enough.
    • Choose to create a disk image on the computer's hard drive, but make it small: 1GB should be enough.
    • In Advanced options choose to Specify shared device name and give your bridge device: br0.
    • Set a new and unique MAC address, choose architecture x86_64.
    • Click Finish: the Debian installer is booted and you can install a system as before, just much more slowly - fortunately you can leave your VNC session and reconnect to it. Install an SSH server (and standard tools) into your L2 virtual machine, but no more.
  • virt-manager has defined the domain for you. You can look at it like this (in your L1 virtual machine): virsh dumpxml <yourname>-l2. Also check out View->Details in the virt-manager menu: you can add and remove the hardware of your virtual machine, you can connect and disconnect a CD ROM image.
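
Before sending in the domain definition, it can help to check that it really uses your bridge and your MAC address. The sketch below parses a domain XML with Python's standard library. The embedded XML is a hypothetical, heavily shortened stand-in for the output of virsh dumpxml (a real dump contains many more elements), and the function name is made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened domain definition; a real 'virsh dumpxml' is longer.
SAMPLE_DOMAIN_XML = """
<domain type='kvm'>
  <name>lkajan-l2</name>
  <memory>524288</memory>
  <devices>
    <interface type='bridge'>
      <mac address='de:ad:00:00:00:32'/>
      <source bridge='br0'/>
      <model type='virtio'/>
    </interface>
  </devices>
</domain>
"""


def bridge_interfaces(domain_xml):
    """Return the domain name and a list of (bridge, mac) pairs."""
    root = ET.fromstring(domain_xml)
    found = []
    for iface in root.findall("./devices/interface[@type='bridge']"):
        source = iface.find("source")
        mac = iface.find("mac")
        found.append((
            source.get("bridge") if source is not None else None,
            mac.get("address") if mac is not None else None,
        ))
    return root.findtext("name"), found


if __name__ == "__main__":
    name, interfaces = bridge_interfaces(SAMPLE_DOMAIN_XML)
    print(name, interfaces)
```

To check your own machine, save the output of virsh dumpxml to a file and pass its contents to bridge_interfaces instead of the sample string.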

Advanced Challenge

Now this will be fun but complex:

  • Successfully migrate an L2 virtual machine from your L1 virtual machine (acting as host) to a friend's L1 and back.

See hints and tips below:

  • Think about how to provide the same environment for the L2 virtual machine on both your and your friend's host:
    • Use iSCSI to provide the volume for disk device of the L2 virtual machine:
      • Set up an iSCSI target and initiator in your L1 VM and have an initiator also in your friend's (connected to the same iSCSI target). 1GB should be enough for this volume, but I would not use a loopback device in the L1 for this because it may be very slow. Instead create another (fast, so raw) 1G device in the L0 (so the tbl) machine and configure this as a second disk to your L1 virtual machine, then make this second disk the iSCSI target volume.
    • Packages for iSCSI: iscsitarget and iscsitarget-dkms for the target, open-iscsi for the initiator.
    • Configure the same bridge interface on both L1 hosts, say br0.
    • Enable the libvirt daemon to listen on tcp for both L1 hosts, check:
      • /etc/libvirt/libvirtd.conf: listen_tcp, set auth_tcp = "none" for simplicity (we trust everyone on our LAN)
      • /etc/default/libvirt-bin: libvirtd_opts
  • Proceed to install the L2 virtual machine into the iSCSI volume either with virt-manager or manually. You may find debootstrap (from debootstrap package) useful in the latter case, but do not forget a kernel and boot loader.
  • When your L2 VM is up and your friend's L1 is configured (network, storage) to receive the virtual machine, attempt the migration. This is truly spectacular when performed from a virt-manager that is connected to both L1 hosts: use qemu+tcp://<friend>.tbl/system to access his/her machine (or qemu+ssh, but then you will need root access to the other machine). Good luck with this!

Challenge Presentation

Example Linux Proficiency Questions

What command would you use to:

  • remove an empty directory
  • remove a potentially filled directory
  • remove all files with '.pl~' extension in a directory tree
  • turn on the group write permission for all files that match the '*.pl' extension in a directory tree
  • list a directory with files sorted on modification time in reverse (newest on bottom)
  • copy a directory tree to another location in an 'archiving' way
  • copy a directory tree to another computer in an efficient way, supposing some of the files are already present on the remote system
  • create the directory /tmp/test/src/linux with one command when only /tmp exists
  • open a man page file in your present directory that is not within the regular man path
  • print your PATH environment variable; what is the function of the PATH environment variable?
  • add your present working directory to your shell search path
  • look at the contents of a text file (name at least two tools)
  • compare two text files
  • list your environment
  • list variables in your environment that are exported
  • kill a process
  • list all 'bash' processes running on your system in user-oriented format
  • temporarily suspend a process
  • resume a temporarily suspended process
  • background a suspended process
  • look at the top processes with respect to memory usage or CPU usage
  • list all ext4 type mounted file systems
  • temporarily mount a fat file system from device sdb1 to a temporary mount point
  • bind-mount /srv/raidarray/project to /srv/nfs4/project
  • eject a cd-rom
  • power off your computer
  • reboot your computer
  • examine the exit status of the last foreground command you executed
  • suspend and resume a process not attached to your terminal


XML, XSD, BioXSD and Ontologies for Interoperability in Bioinformatics

  • Date: 2013 / 05 / 14
  • Topics: XML, XML Schema, semantic annotation using ontologies, Web services (and Web applications), Web Services Description Language (WSDL), Representational State Transfer (REST), Simple Object Access Protocol (SOAP), interoperability
    • Note: we will have a guest speaker, Matúš Kalaš, the author of BioXSD, for this session. This session is being shaped by László and Matúš, and is not yet final.

Questions

  • What is a simple XML document like?
  • What are the practical implications of using XML for input and output?
    (Explain why there is no need to write an input parser. How can you get your input and output conveniently represented in your program? - see Hints and Tips below for ideas.)
  • What does an XML Schema look like (e.g. BioXSD)? What is an XML Schema good for? What does it mean that an XML is a valid instance of an XML Schema?
    (Explain how the XML schema describes a class of XML documents, and can be used to validate instances of that class.)
  • How do you use semantic annotation in an XML schema using an ontology? What is this good for?
    (Show an example of semantic annotation on the BioXSD schema.)
  • What is a Web service, and a Web service client?
  • How do you describe a Representational State Transfer (REST) style Web service with WSDL 2.0?
    Explain (not very deeply) freecontact_rest.wsdl.
  • How do BioXSD and EDAM help interoperability in bioinformatics?
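
As a small illustration of why XML input needs no hand-written parser, the Python sketch below reads a tiny document with the standard library. The element and attribute names are invented for this example and are not taken from BioXSD. The standard library only checks well-formedness here; validating against an XML Schema needs an additional tool.

```python
import xml.etree.ElementTree as ET

# Invented example document; the names are NOT BioXSD.
SAMPLE = """
<sequenceRecord id="example1">
  <name>hypothetical protein</name>
  <sequence>MKTAYIAKQR</sequence>
</sequenceRecord>
"""


def read_record(xml_text):
    """Turn the XML text into a plain dictionary using a generic parser."""
    root = ET.fromstring(xml_text)
    return {
        "id": root.get("id"),
        "name": root.findtext("name"),
        "sequence": root.findtext("sequence"),
    }


def is_well_formed(xml_text):
    """Report whether the text is well-formed XML (not schema-valid!)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(read_record(SAMPLE))
    print(is_well_formed("<a><b></a>"))
```

Data binding tools go one step further and generate classes from the schema, so the program works with typed objects instead of generic tree nodes.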

Links for preparation

Presentation

Programming Challenge

  1. Implement output, and optionally input converters for BioXSD XML IO for the PredictProtein component(s) assigned to you.
    • I recommend you make one executable dedicated to output conversion, that reads the input file on standard input, and writes the XML on standard output.
    • Place your XML schema (XSD) into http://i12r-tbl.informatik.tu-muenchen.de/~YOURLOGIN/ and use this URL in xsi:schemaLocation.
  2. Package your format converter for Debian.
  3. Extend the debian/upstream file (for each package) with 'Software' meta data for the PredictProtein component(s) assigned to you, considering the predictor and the converter as one unit.

You will find a version of your PredictProtein components for Wheezy (stable) either in the Debian repository you have already configured, or in the Rost Lab repository. Follow the instructions behind the Rost Lab repository link to add the repository to your package manager.

Evaluation Criteria

  1. XML document created by output converter has to be valid with 'StdInParse -n -s -f -v=always < your.xml'.
  2. XML conversion solution has to be packaged for Debian with no 'lintian' errors or warnings. (But look out for false lintian warnings: these are not uncommon. Document them in debian/README.source.)
    • Run lintian like this: lintian --color=always --display-experimental --display-info --pedantic --show-overrides
  3. The debian/upstream file has to be extended with upstream 'Software' metadata. This file is in YAML format. Use this validator to get the syntax right. Consider the PredictProtein component and the XML converter as one unit.
  4. The Debian source package of the XML converter solution has to be sent to Laszlo for evaluation.
    • The source package has to build (with e.g. debuild) to the binary package of the expected function.
    • Send the following files:
    1. package_*.orig.tar.?z
    2. package_*.debian.tar.gz # for a debian/source/format = '3.0 (quilt)' package - you all should use this format
    3. package_*.dsc
    4. package_*.changes
  5. All group members should contribute equally.

Groups

#   Members                        Packages                                            Status
1   Manuel, Shen, Michael          profbval A2M Representing A2M format with BioXSD    DONE
2   Uwe, Robert, Jonas             ncbi-seg profisis                                   DONE
3   Matthias, Julia, Sebastian     norsp predictnls                                    DONE
4   Katharina, Alexander, Nikos    norsnet disulfinder                                 DONE
5   Sonja, Tikira, Verena          ncoils proftmb                                      DONE
    Laszlo                         profphd

Hints and Tips

  • Install the packages assigned to you into your virtual machine.
  • If you can not install the software assigned to you, configure the Rost Lab package repository for 'apt', and install from that repository.
  • Use a version control system to coordinate work within the group. Subversion (svn) or git are good choices.

Use freecontact(1) as an example, check out the files below.

XML editing and viewing tools

  • XML Copy Editor (free)
  • XSD Diagram (free XSD navigator)
  • oXygen XML Editor (free trial)
  • XMLSpy (free trial; best but Windows only)
  • vim or emacs
  • eclipse

XML data representation (aka Data Binding)

C++

Laszlo has experience with xsdcxx, and it is very good. (But you probably want a script solution.)

Perl

Disclaimer: Laszlo has no experience with these (ok, a little bit with LibXML), but would start looking along these links if he needed to.

Perl-XML FAQ

Java, C#, .NET, Mono

These have excellent support for XML and XSD-based data binding.

In Java for example: JAXB (maybe the best?), Axis2 Data Binding (ADB, lightweight but limited), or xmlbeans (robust but very heavy-weight).

Other languages

Other main programming languages have reasonably good support for XML and data binding, too. Certainly PHP, Python, ...

BioXSD quick overview guide

BioXSD quick reference

Software Meta Data in debian/upstream

Currently used meta data format: http://wiki.debian.org/UpstreamMetadata .

Fields for 'Software' meta data:

  • Software
    • Name
      Executable name.
    • Topic
      List of one or more <topic term> or { <topic term>: URI } definitions.
    • Function
      List of one or more <function term> or { <function term>: URI } definitions.
    • Input
      • Data
        List of one or more <input data term> or { <input data term>: URI } definitions.
      • Format
        List of one or more <input format term> or { <input format term>: URI } definitions.
    • Output
      Like input.
    • Interface
      List of one or more <interface term> or { <interface term>: URI } definitions.

Example: http://i12r-tbl.informatik.tu-muenchen.de/~lkajan/freecontact/debian/upstream
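
The field layout above can be checked mechanically once the YAML file has been parsed into nested dictionaries and lists. The sketch below is only an illustration: it assumes every listed field is required (the source does not say so), and the function names and example values are hypothetical.

```python
def check_terms(value, path, errors):
    """A term list holds plain strings or single-key {term: URI} mappings."""
    if not isinstance(value, list) or not value:
        errors.append(f"{path}: expected a non-empty list")
        return
    for item in value:
        is_term = isinstance(item, str)
        is_term_with_uri = isinstance(item, dict) and len(item) == 1
        if not (is_term or is_term_with_uri):
            errors.append(f"{path}: bad term {item!r}")


def check_software(software):
    """Return a list of problems found in one parsed 'Software' entry."""
    errors = []
    if not isinstance(software.get("Name"), str):
        errors.append("Name: expected the executable name as a string")
    for key in ("Topic", "Function", "Interface"):
        check_terms(software.get(key), key, errors)
    for direction in ("Input", "Output"):
        section = software.get(direction)
        if not isinstance(section, dict):
            section = {}
        for key in ("Data", "Format"):
            check_terms(section.get(key), f"{direction}/{key}", errors)
    return errors


# Hypothetical entry; the terms and the URI are placeholders, not real ontology terms.
example = {
    "Name": "mytool",
    "Topic": ["Some topic"],
    "Function": [{"Some function": "http://example.org/function"}],
    "Input": {"Data": ["Some data"], "Format": ["Some format"]},
    "Output": {"Data": ["Some data"], "Format": ["Some format"]},
    "Interface": ["Command line"],
}
```

An empty result from `check_software(example)` means the entry follows the layout; it says nothing about whether the terms themselves are sensible.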

Advanced Challenge

Create Web service(s) for the PredictProtein component(s) assigned to you.

Options:

  • RESTful Web service described with WSDL 2.0 (REST uses the HTTP protocol directly, but WSDL 2.0 support is somewhat limited. WSDL 1.1 unfortunately cannot be satisfactorily used for describing REST services.)
  • SOAP Web service described with WSDL 1.1 (SOAP protocol is piggy-backing on top of HTTP. WSDL 1.1 and the standardised *document-literal-wrapped* SOAP binding style are very well supported by libraries.)

Hints and Tips

The best practice towards achieving good interoperability is using the WSDL-first approach.

1. Write your WSDL document, which defines a clean programmatic interface (API) to your tool. With a good interface and a good WSDL, your Web service will be usable from client programs/scripts/workflows in all main programming languages.

2. Generate server classes for your favourite programming language, using a WSDL-first-enabled Web service library/framework.

3. Implement the business logic of your server, that is the invocation of your backend tool. (Forget about a scheduling queue and batch execution engine for the sake of this exercise. Those would however be necessary for most production services.)

4. Generate client classes in a set of different languages and test if you can use your service. (Test other students' services from your favourite programming languages. Iterate until success ;-) )

WSDL Examples

Testing

  • SoapUI is a nice tool for testing your WSDLs, servers, and clients.
    Unfortunately it doesn't yet support WSDL 2.0.

FreeContact REST WSDL 2.0 Web Service

Web service libraries/frameworks

WSDL 1.1 and SOAP

WSDL 1.1 and document-literal-wrapped SOAP binding have good support in all main programming languages.

You will see however, that some of the frameworks do not satisfactorily support the WSDL-first approach.

  • For Java there are JAX, CXF, Axis2 ...
  • For C++ gSoap, Axis2C, ...
  • For Python there is ZSI (The SUDS framework is lovely, but works only for clients not servers. Soaplib looks promising but may not be fully developed yet.)
  • For Perl SOAP::Lite
  • And other libraries for PHP, C#/.NET/Mono, ...

Note that most WSDL/SOAP frameworks are conveniently integrated with one or more XML/XSD data binding frameworks.

WSDL 2.0 and REST

Axis2 for Java and C++ supports WSDL 2.0 and REST, and there are (hopefully) more libraries also for other languages.

Challenge Presentation

This challenge - the semester challenge - is to be presented in the last session.

User management / directory services

  • Date: 2013-05-28
  • Topics: Lightweight Directory Access Protocol (LDAP), OpenLDAP server, user authentication and (user) name services with an LDAP database

Questions

  • What are the most important system databases (for our purposes): '/etc/passwd', '/etc/shadow', '/etc/group' and '/etc/hosts'? Explain the information in these.
  • What are name services, such as NIS or DNS (briefly)?
  • What are Pluggable Authentication Modules (PAM) for Linux (briefly)?
  • What is LDAP (briefly)? What is the LDAP directory structure, what is the LDIF format?
  • Can you use LDAP to provide name services? - can you use PAM in conjunction with LDAP to authenticate users (man pam_ldap)?
    • Why is LDAP good for providing name services?
    • What is the directory structure like, what is in the directory when LDAP is used to provide name services and authentication via PAM?
      Show the common classes and attributes used for this.
  • Can you configure the LDAP server daemon 'slapd' at runtime? (yes, man slapd-config)
  • How do you control access to your LDAP database? Explain a few example access rules:
    olcAccess: {0}to * by dn.exact=gidNumber=0+uidNumber=0,cn=peercred,cn=external,cn=auth manage by * break
    olcAccess: {1}to attrs=userPassword,shadowLastChange by self write by anonymous auth by dn="cn=admin,dc=tbl" write by * none
    olcAccess: {2}to dn.base="" by * read
    olcAccess: {3}to * by self write by dn="cn=admin,dc=tbl" write by * read
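
To read rules like these, it helps to split each value into its target ("to <what>") and its ordered "by <who> <access> <control>" clauses. The Python sketch below is a simplified reading aid, not a full implementation of the slapd.access(5) grammar: it assumes that no DN contains spaces or the word "by", and the function name is hypothetical. The default control 'stop' follows the man page.

```python
import re

ACCESS_LEVELS = {"none", "disclose", "auth", "compare",
                 "search", "read", "write", "manage"}
CONTROLS = {"stop", "continue", "break"}


def parse_olc_access(value):
    """Split one olcAccess value into (what, [(who, access, control), ...])."""
    value = re.sub(r"^\{\d+\}", "", value.strip())   # drop the {n} ordering prefix
    head, *clauses = re.split(r"\s+by\s+", value)
    if not head.startswith("to "):
        raise ValueError("olcAccess value must start with 'to <what>'")
    rules = []
    for clause in clauses:
        who, access, control = [], None, "stop"      # 'stop' is the default control
        for token in clause.split():
            if who and token in ACCESS_LEVELS:
                access = token
            elif who and token in CONTROLS:
                control = token
            else:
                who.append(token)
        rules.append((" ".join(who), access, control))
    return head[3:].strip(), rules
```

For rule {1} this yields the target `attrs=userPassword,shadowLastChange` and four clauses, evaluated in order: the first matching "who" decides the access level. In rule {0} the final clause `by * break` carries only a control, so everyone else falls through to the next rule.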

Links for preparation

Presentation

Programming Challenge

  1. Set up a directory service (LDAP) for the practical and define a fitting directory structure: install 'slapd' and 'ldapvi'. Use dc=tbl as the distinguished name of the search base.
  2. Check the functioning of your installed directory service:
    • ldapvi -h ldapi:/// -b cn=config -Y EXTERNAL - this command displays (in an editor) the current configuration of the slapd (LDAP) daemon.
    • ldapvi -h ldapi:/// --discover -D cn=admin,dc=tbl - this command displays the contents of the LDAP database you have.
  3. This is not needed, since olcDatabase={-1}frontend,cn=config has this set already:
    • Allow access to your LDAP database olcDatabase={1}hdb,cn=config via external authentication for gidNumber=0+uidNumber=0 by copying the corresponding olcAccess line from the configuration of the cn=config database. Root should now be able to edit the database without a password (ldapvi -h ldapi:/// --discover -Y EXTERNAL).
  4. Connect the user management of your debian installation to your LDAP:
    1. Move your user and group entries from the files 'passwd', 'shadow', 'group' and 'gshadow' to the LDAP database. Use the organizationalUnit(OU) class to structure the data (user records, group records) in your database. The OU for storing user records is usually called 'people', the OU for groups is 'group'. Use the 'uid' attribute in the distinguished name(DN) of a user record (e.g. uid=lkajan); use the 'cn' attribute in the DN on a group record (e.g. cn=lkajan). Use ldapvi -h ldapi:/// -b cn=config -Y EXTERNAL (as above) to access the configuration of your database: use the class definitions of organizationalUnit, posixAccount, shadowAccount and posixGroup to know what attributes are available.
    2. Install 'libnss-ldap' (and recommended packages). When asked for the LDAP account for root, look up and give the DN of the 'admin' account in your LDAP database. Reboot the machine, so these libraries are loaded (w/o a reboot, you may get weird user and group behaviour).
    3. Modify /etc/nsswitch.conf to use the LDAP database. Use getent to verify the functioning of the name service.
    4. Use 'pam-auth-update' or 'dpkg-reconfigure libpam-runtime' to enable LDAP authentication.
  5. Configure your LDAP server as a replication provider for database {1}. You will need the 'syncprov' overlay (man slapd-config, search for OVERLAYS, man slapo-syncprov) from the 'syncprov' module (to be loaded with 'olcModuleLoad'). You cannot change the list of modules with ldapvi; you have to edit the configuration in /etc/ldap/slapd.d/cn=config/cn=module{0}.ldif. Use class 'olcSyncProvConfig' for the configuration of the overlay. Add the olcServerID attribute to cn=config, set it to your offset in the host names table.
  6. Learn to understand LDAP access control statements (olcAccess, man slapd.access(5)). Interpret the olcAccess statements that come with the default configuration.
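
Step 4.3 can be double-checked with a few lines of Python before running getent. This is a minimal sketch, assuming the usual "database: source source ..." layout of /etc/nsswitch.conf; the function name `ldap_enabled` is hypothetical.

```python
def ldap_enabled(nsswitch_text, databases=("passwd", "group", "shadow")):
    """Map each database name to True if 'ldap' is among its sources."""
    sources = {}
    for line in nsswitch_text.splitlines():
        line = line.split("#", 1)[0].strip()      # ignore comments
        if ":" in line:
            name, _, rest = line.partition(":")
            sources[name.strip()] = rest.split()
    return {db: "ldap" in sources.get(db, []) for db in databases}
```

Calling it as `ldap_enabled(open("/etc/nsswitch.conf").read())` should report True for all three databases once the file has been edited.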

Hints and Tips

  1. Make sure the host name of your machine ($ hostname, set in /etc/hostname) is fully qualified, i.e. it includes the domain '.tbl'. You probably should reboot after changing the host name.
  2. 'ldapvi' may invoke 'nano' as its editor. You can set your preferred editor by exporting VISUAL and EDITOR with the editor command like this:
    export EDITOR=vim; export VISUAL=$EDITOR
    You may want to put these exports into your ~/.bashrc.
  3. Edit /etc/hosts, have your IP address associated with your host name (see above table), like: 192.168.16.2 lkajan.tbl.
  4. Add two organizational units ou=people,dc=tbl and ou=group,dc=tbl.
  5. Add your group as an 'objectClass: posixGroup': cn=<usr>,ou=group,dc=tbl.
  6. Add your user as an 'objectClass: posixAccount; objectClass: shadowAccount; objectClass: inetOrgPerson': uid=<usr>,ou=people,dc=tbl.
  7. Use 'slappasswd' to generate the encrypted form of your password OR use the {CRYPT} qualifier and just copy the encrypted password from /etc/shadow.
  8. It is not usually necessary, but it may be a good idea to invalidate the name service cache kept by the 'nscd' daemon after modifying user and group attributes: nscd -i passwd; nscd -i group. During testing you can altogether stop the nscd daemon.
  9. Edit /etc/nsswitch.conf, append 'ldap' to the passwd, group and shadow databases' lines.
  10. Use 'pam-auth-update' to enable LDAP for PAM (authentication, etc.).
  11. Use 'getent passwd' and 'getent group' to verify that your LDAP connection to name services works. If you still have your user and group defined in /etc/passwd and /etc/group (and their shadow) files, you should see your user and group entry listed twice by the getent commands. The second is the one that comes from your LDAP server since 'ldap' appears second (to 'files' or 'compat') in /etc/nsswitch.conf.
  12. If you see that getent returns the right records for yourself and your group, remove your user and group entry from /etc/{passwd,shadow,group,gshadow}. Try to log into your virtual machine as yourself to test that it works.
  13. Configure your LDAP server as a replication provider with the syncprov overlay: man slapo-syncprov.
    1. Make slapd load the 'syncprov' module. Stop the LDAP server (it does not seem to allow dynamic configuration of the list of modules), edit cn=config/cn=module{0}.ldif, add another olcModuleLoad attribute for syncprov and restart the server.
    2. Add a new olcSyncProvConfig entry for the 'syncprov' overlay, making it a child of the database entry 'olcDatabase={1}hdb,cn=config'. Set the olcSpReloadHint attribute to TRUE as suggested on the man page slapo-syncprov.
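
The entries from hints 4 to 6 can be written as LDIF and pasted into ldapvi or loaded with ldapadd. The Python sketch below generates such LDIF text. It is an illustration under assumptions: the function name, the home directory /home/<uid>, the login shell and the numeric IDs are hypothetical choices, values are assumed to be plain ASCII, and userPassword is left out on purpose (see hint 7).

```python
def ldif_entries(uid, uid_number, gid_number, full_name, surname, base="dc=tbl"):
    """Return LDIF text for the two OUs, the user's group and the user record."""
    entries = [
        [("dn", f"ou=people,{base}"),
         ("objectClass", "organizationalUnit"), ("ou", "people")],
        [("dn", f"ou=group,{base}"),
         ("objectClass", "organizationalUnit"), ("ou", "group")],
        [("dn", f"cn={uid},ou=group,{base}"),
         ("objectClass", "posixGroup"),
         ("cn", uid), ("gidNumber", str(gid_number))],
        [("dn", f"uid={uid},ou=people,{base}"),
         ("objectClass", "inetOrgPerson"),      # structural class, needs cn and sn
         ("objectClass", "posixAccount"),       # needs uid, uidNumber, gidNumber, homeDirectory
         ("objectClass", "shadowAccount"),
         ("uid", uid), ("cn", full_name), ("sn", surname),
         ("uidNumber", str(uid_number)), ("gidNumber", str(gid_number)),
         ("homeDirectory", f"/home/{uid}"),     # assumed location
         ("loginShell", "/bin/bash")],          # assumed shell
    ]
    blocks = ["\n".join(f"{key}: {value}" for key, value in entry) for entry in entries]
    return "\n\n".join(blocks) + "\n"
```

For example, `print(ldif_entries("lkajan", 1000, 1000, "Laszlo Kajan", "Kajan"))` prints four entries separated by blank lines, in an order that lets parents be created before their children.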

Markus' hints and tips

  • You will need these packages: slapd ldap-utils libpam-ldap libnss-ldap nscd.
  • Use ldapsearch to test your ldap server from the command line.

Advanced Challenge

  • Secure the connection to the LDAP server with TLS or SSL and a server certificate (we recommend 'tinyca2' (from 'tinyca' package) for certificate management).
  • Configure your L2 virtual machine as an LDAP replication slave, with your L1 acting as master. Establish a secure connection with ldap:// + TLS (or ldaps, if ldap+TLS does not work).

Challenge Presentation

Web server

  • Date: 2013-06-04
  • Topics: Apache web server, common gateway interface (CGI), PHP

Questions

  • What is the HTTP protocol? Describe briefly (at least) the most common HTTP methods: GET, POST.
  • What does it mean that HTTP is a stateless protocol?
  • What are popular HTTP server packages (e.g. Apache)?
  • Mention ways to make web pages dynamic on the server side.
  • How is the common gateway interface (CGI) used to generate dynamic web pages?
  • How is PHP used to generate dynamic web pages?

Links for Preparation

Presentation

Sebastian Hollizeck File:Sebastian Hollizeck-BiolabExpertTalk.pdf

Programming Challenge

  1. Install Apache.
  2. Create a simple web page with something like 'Hello world!' on it at http://yourname.tbl/ and make it reachable by other course members (you can test it from the L0 or your L2 with say w3m or lynx or wget or curl or Iceweasel).
  3. Set up PHP for use with Apache and create a page (simple web application! :) that:
    • Takes a parameter 'name' and prints 'Hello <name>!'. You can use a form for setting the name if you want.
    • Calls the phpinfo() function.
    • Is available at http://yourname.tbl/test.php .
  4. Create a CGI program at http://yourname.tbl/test.cgi:
    • Take a parameter 'name' and print 'Hello <name>!'.
    • Print out the environment of the CGI program.
    • Print out the received parameters and their values.
  5. Install phpldapadmin and connect it to your LDAP server.
  6. Install ldap-account-manager (LAM) and connect it to your LDAP server.
  7. Configure per-user web-accessible directories with mod_userdir.
  8. Set up HTTPS for the website:
    1. Create a server certificate with tinyca2 (preferably as root) from the tinyca package.
    2. Name your certificate authority (CA) "<uid>", e.g. "lkajan", when tinyca2 asks you.
    3. Copy your new CA certificate (/root/.TinyCA/<uid>/cacert.pem) into /usr/share/ca-certificates, name it <uid>.crt and make it readable by all. Execute dpkg-reconfigure ca-certificates and tick in your certificate to have it installed for your system. Also expose this CA certificate in web space at http://<uid>.tbl/<uid>.crt (a symlink in /var/www is enough).
    4. Set subject alternative names for the server certificate: 'IP:<your_IP_192.168.16.X>', 'DNS:<uid>.tbl' and 'DNS:www.<uid>.tbl' so that clients can recognize your server both by IP address and names. If you want to serve additional names or additional secondary IP addresses, also set these into the subject alternative name. In order to do this, use the menu 'Preferences/OpenSSL Configuration/Server Certificate Settings' and set 'Ask User' into the 'Subject alternative name (subjectAltName)' field (the top one), then click 'raw' below. Then, when you create the new server certificate (5th icon from the right, sorry, apparently no tooltips in this version :| ), set 'IP:<your_IP_192.168.16.X>,DNS:<uid>.tbl,...' into the Subject alternative name (at the request signing step). Do not add the eMail address to Subject DN.
    5. Export your web server key and certificate in one file in PEM format without passphrase into a file in /etc/apache2. You will have to be on the 'Keys' tab to do this. Make sure this file is only readable by root! Set this file into 'SSLCertificateFile' in the configuration of the secure site (default-ssl). Comment 'SSLCertificateKeyFile' out.
    6. Enable Apache site 'default-ssl' and module 'ssl'.
    7. Point your browser at http://<uid>.tbl/<uid>.crt , import and trust your CA certificate.
    8. Make sure the php and CGI pages open properly both with http:// and with https:// protocols.
    9. Observe how the https:// connection to your site is now trusted and verified by your own certificate. Point your browser at http://<uid>.tbl/<uid>.crt of other course members, install and trust their certificate and then visit their sites securely (with https). Your browser should not complain about untrusted connections after you install their CA certificates. Also visit their sites by IP address. Your browser should accept this as secure as well, due to the IP address in the subjectaltname of the certificate.
  9. Create a secure section of the web site that requires authentication: https://<uid>.tbl/secure/index.html . Make this area accessible only via HTTPS. Authenticate against your LDAP database. Implement this in /var/www/secure/.htaccess.
    1. You will have to set the appropriate AllowOverride level for this to work.
    2. You have to enable (a2enmod) the authnz_ldap module for ldap authentication.
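
The subject alternative name string from step 8.4 is easy to mistype. A minimal Python sketch that assembles it from the user id and IP address follows; the function name and the `extra_names` parameter are hypothetical, and the format is the 'IP:...,DNS:...' form described above.

```python
import ipaddress


def subject_alt_name(uid, ip, domain="tbl", extra_names=()):
    """Build the subjectAltName value: one IP entry followed by the DNS entries."""
    ipaddress.ip_address(ip)          # raises ValueError on a malformed address
    names = [f"{uid}.{domain}", f"www.{uid}.{domain}", *extra_names]
    return ",".join([f"IP:{ip}"] + [f"DNS:{name}" for name in names])
```

With `subject_alt_name("lkajan", "192.168.16.2")` the result can be pasted into the 'Subject alternative name' field at the request signing step.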

Hints and Tips

  • /var/log/apache2/error.log is your friend. Also use the Apache documentation. I find this page very useful.
  • The package for PHP is called libapache2-mod-php5. Installing this will pull in apache2 as well.
  • You can use bluefish to create and edit a web page in a GUI.
  • You can use the Perl CGI module for your CGI program. Alternatively you can use C/C++ to solve this challenge for fun... hmm.
  • One way to make the CGI script accessible at http://lkajan.tbl/test.cgi is to allow the execution of CGI scripts in the document root (with the ExecCGI option) and add/set the 'cgi-script' handler[7] for files with '.cgi' extension.
  • Start tinyca2 as root. When it is run for the first time, it asks you to fill in data to create a new certificate authority (CA). You will use this CA to issue a certificate for your web server. Fill in the fields you understand (e.g. State: "Bayern"), examine the other fields.
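
The hints mention Perl and C/C++ for the CGI program; the same idea is sketched here in Python instead. A CGI program receives the request through environment variables (for GET, the parameters are in QUERY_STRING) and writes headers, a blank line and the body to standard output. This sketch handles only GET parameters, and the fallback name 'world' is an assumption.

```python
#!/usr/bin/env python3
import html
import os
from urllib.parse import parse_qs


def render(environ):
    """Build the full CGI response (headers, blank line, body) from the environment."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    name = params.get("name", ["world"])[0]
    lines = [
        "Content-Type: text/html; charset=utf-8",
        "",                                        # blank line ends the headers
        f"<h1>Hello {html.escape(name)}!</h1>",    # escape user input before echoing it
        "<h2>Parameters</h2><pre>",
    ]
    for key, values in sorted(params.items()):
        lines.append(html.escape(f"{key} = {values}"))
    lines.append("</pre><h2>Environment</h2><pre>")
    for key, value in sorted(environ.items()):
        lines.append(html.escape(f"{key} = {value}"))
    lines.append("</pre>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render(dict(os.environ)), end="")
```

Saved as test.cgi and made executable, it can also be tried without Apache: `QUERY_STRING='name=Ada' ./test.cgi`.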

Challenge Presentation

Katharina Hembach File:WebServerPresentation.pdf

Mail, DNS

  • Date: 2013-06-04
  • Topics: mail transport agents (MTA); procmail; Maildir and mbox formats; Internet message access protocol (IMAP); domain name server (DNS)

Questions

  • What is a mail transport agent (MTA)? - explain the role of MTAs on an example of sending an email.
  • What are the most popular MTAs (e.g. postfix, exim, sendmail)? Mention their strengths and weaknesses.
  • What is the internet message access protocol (IMAP) for?
  • What is procmail for? Show and explain examples of procmail 'recipes'.
  • What do domain name servers (DNS) do (briefly)?
  • What is a DNS zone? How do you add zones to your DNS server's configuration?
  • Explain a simple DNS zone file. Show examples of forward (A) and reverse (PTR) address resolution resource records, show an example (and explain) a mail exchange (MX) record.

Links for Preparation

Presentation

Katharina Hembach File:Mail DNS.pdf

Programming Challenge

  • Recommended to install: graphical desktop (gnome-core); iceweasel, icedove.
  • Set up, configure and use a bind9 DNS server.
  • Set up and configure a mail server (postfix recommended).
  • Set up an IMAP server (dovecot recommended).
  • Use Thunderbird / Icedove to send mail to another course member. Configure the address book in Icedove/Thunderbird to connect to your LDAP server so that you can have your address book stored in LDAP.

Hints and Tips

  • Ariane's Hints: nice Procmail Tutorial
  • Packages to install: bind9, dnsutils; postfix, postfix-doc, bsd-mailx; dovecot-imapd; icedove; ca-certificates; procmail; tinyca2 (for advanced challenge);
  • Postfix configuration: choose 'Internet site'.
Name server
  • Edit /etc/bind/named.conf.local, add:
zone "tbl" {
       type master;
       file "/etc/bind/db.tbl";
};

zone "16.168.192.in-addr.arpa" {
       type master;
       file "/etc/bind/db.192.168.16";
};
  • Edit /etc/bind/db.tbl and /etc/bind/db.192.168.16, have:

/etc/bind/db.tbl:

;
; BIND data file for tbl zone
;
$TTL    86400
@       IN      SOA     lkajan.tbl. root.lkajan.tbl. (
                       12051501         ; Serial
                         604800         ; Refresh
                          86400         ; Retry
                        2419200         ; Expire
                          86400 )       ; Negative Cache TTL
;
@       IN      NS      lkajan.tbl.

lkajan          A       192.168.16.2
<other course members>

/etc/bind/db.192.168.16:

;
; BIND reverse data file for tbl zone
;
$TTL    86400
@       IN      SOA     lkajan.tbl. root.lkajan.tbl. (
                       12051201         ; Serial
                         604800         ; Refresh
                          86400         ; Retry
                        2419200         ; Expire
                          86400 )       ; Negative Cache TTL
;
@       IN      NS      lkajan.tbl.

2       PTR     lkajan.tbl.
<other course members>
  • Replace lkajan and 192.168.16.2 with your host name and IP in the SOA and NS records.
  • Test-load your named configuration: named-checkconf -z.
  • Restart the name server.
  • Update your /etc/resolv.conf with your own name server:
search tbl
nameserver 127.0.0.1
...
  • Test the name server with: host <name>.tbl; dig <name>.tbl; ping <name>.tbl.
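
The forward (A) and reverse (PTR) records in the two zone files above describe the same host table, so they are easy to get out of sync when entries for other course members are added by hand. The Python sketch below derives both from one table. It assumes a /24 network, so that the last octet alone is the PTR owner name; the function name and parameters are hypothetical.

```python
import ipaddress


def zone_records(hosts, zone="tbl", network="192.168.16.0/24"):
    """Return (forward, reverse) record lines for a {name: ip} host table."""
    net = ipaddress.ip_network(network)
    forward, reverse = [], []
    by_address = sorted(hosts.items(), key=lambda item: ipaddress.ip_address(item[1]))
    for name, ip in by_address:
        if ipaddress.ip_address(ip) not in net:
            raise ValueError(f"{ip} is outside {network}")
        forward.append(f"{name}\tA\t{ip}")                  # goes into db.tbl
        last_octet = ip.rsplit(".", 1)[1]
        reverse.append(f"{last_octet}\tPTR\t{name}.{zone}.")  # goes into db.192.168.16
    return forward, reverse
```

For `{"lkajan": "192.168.16.2"}` this produces one A line and one PTR line matching the records shown in the zone files. Remember to increase the serial in both SOA records whenever the generated lines change.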
Mail server
  • Basic configuration can be done simply with dpkg-reconfigure postfix. Config files of interest for manual configuration: /etc/postfix/main.cf, /etc/postfix/master.cf, /etc/aliases.
  • Edit your ~/.procmailrc and configure Maildir / mbox delivery as you prefer:
# Maildir:
DEFAULT="$HOME/Maildir/"
  • Send a mail to yourself as root.
  • Examine the mail log and check if the mail was delivered well.
Dovecot (IMAP server)
  • Edit /etc/dovecot/dovecot.conf: do not change anything but look at the protocols and the authentication: PAM does the work for us; ssl_cert_file and ssl_key_file: this is where you can secure communication to the server.
Thunderbird / Icedove
  • Start Icedove and configure a new mail server:
    • Email address: <username>@<hostname>.tbl.
    • Type: IMAP
    • Incoming server: <hostname>.tbl or 127.0.0.1
    • Outgoing server: <hostname>.tbl or 127.0.0.1
    • Configure LDAP: Preferences -> Composition -> Addressing -> Directory server -> Edit directories -> Add:
Hostname: localhost
Base DN: dc=tbl
Port n: 389
Bind DN: uid=<username>,ou=people,dc=tbl
    • Make sure your LDAP server serves connections to ldap://localhost/ (check in /etc/default/slapd).
    • Try sending a mail to another course member, e.g. Laszlo Kajan <lkajan@lkajan.tbl>.

Advanced Challenge

  • Set up procmail recipes that automatically:
    • Reply to the sender that you are busy preparing for an exam if the mail subject contains the word 'work'.
    • Reply to the sender that you are busy with your work when the subject contains the word 'exam'.
    • Reply to the sender that you are ill when the subject contains both 'exam' and 'work'.
  • Create a postfix regular expression table for aliases and use this table to deliver all mail matching the pattern '/^sink/' to /dev/null.
  • Configure spamassassin for your MTA (postfix) or in .procmailrc.
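
The three auto-reply rules overlap: a subject containing both words also matches each single-word rule, so the combined rule has to be tested first. The Python sketch below only models that decision order; it is not a procmail recipe. Matching whole words and ignoring case are assumptions, and the reply texts are placeholders.

```python
import re


def auto_reply(subject):
    """Return the auto-reply text for a subject line, or None if no rule matches."""
    has_work = re.search(r"\bwork\b", subject, re.IGNORECASE) is not None
    has_exam = re.search(r"\bexam\b", subject, re.IGNORECASE) is not None
    if has_work and has_exam:       # most specific rule first
        return "I am ill."
    if has_work:
        return "I am busy preparing for an exam."
    if has_exam:
        return "I am busy with my work."
    return None
```

In ~/.procmailrc the same ordering applies: put the recipe with both conditions before the two single-condition recipes.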

Challenge Presentation

Nikolaos Papadopoulos File:Mail server.pdf

Databases and SQL

  • Date: 2013-06-04
  • Topics: MySQL server setup, important server parameters, user management and access control

Questions

  • What are the differences between Excel and a DBMS?
  • What are famous DBMSs?
  • What advantages does data storage in a DB have over simply putting data into a flat file?
  • What ways exist to access a MySQL database?
  • What ways exist to backup a MySQL database?

Links for Preparation

Presentation

Tikira Temu File:Tikira Temu-sql 2013-06-04.pdf

Programming Challenge

  1. Install and configure a MySQL Database server
  2. Make yourself familiar with basic user management
    1. Create a user u1 and a database db1
      1. Grant u1 full access to db1
    2. Create a second user u2 and give her only read access
    3. How can you retrieve all rights of a given user?
    4. How can you take privileges away from a user?
  3. Create a table in your database db1 with three columns
    1. The table should contain a primary key that spans two columns
    2. For example use PHP or Perl or Python.
  4. Create a backup from your database.
  5. Read out the basic server status
    1. How many concurrent client connections are maximally allowed?
    2. How can you find out about the actual established connections?
  6. Write a script that establishes n concurrent connections and monitor the server status.
    1. Raise the maximally allowed connections to the server when you reach the limit.
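
A primary key that spans two columns (step 3.1) means that the pair of values must be unique, while each column on its own may repeat. The sketch below demonstrates this with Python's built-in sqlite3 module as a stand-in, so that it runs without a server; the `PRIMARY KEY (a, b)` clause has the same form in MySQL. The table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE measurements (
        sample_id INTEGER NOT NULL,
        position  INTEGER NOT NULL,
        value     REAL,
        PRIMARY KEY (sample_id, position)   -- the key spans two columns
    )
""")
conn.execute("INSERT INTO measurements VALUES (1, 1, 0.5)")
conn.execute("INSERT INTO measurements VALUES (1, 2, 0.7)")  # same sample_id, new position: allowed


def insert_duplicate(connection):
    """Try to insert an existing (sample_id, position) pair; return True if accepted."""
    try:
        connection.execute("INSERT INTO measurements VALUES (1, 1, 0.9)")
    except sqlite3.IntegrityError:
        return False
    return True
```

`insert_duplicate(conn)` returns False because the pair (1, 1) already exists. Against the MySQL server from this challenge you would use a MySQL client library from PHP, Perl or Python instead of sqlite3.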

Hints and Tips

  • Install package: mysql-server
  • Which additional packages will be installed?
  • Which client programs?
  • Familiarize yourself with the default configuration file /etc/mysql/my.cnf
  • Which TCP/IP port does the MySQL server listen on, and which port do client applications connect to?
  • How many concurrent sessions does the MySQL server allow?
  • What is the size of the query cache used to cache SELECT results?

Advanced Challenge

  1. Fill your table with several million entries (mind the primary key!)
  2. Write a script that establishes several concurrent connections.
    1. During each connection, the script should conduct several advanced SELECT-queries.
    2. Monitor the CPU- and memory usage of your MySQL server during script-lifetime.
    3. Could you think of server variables whose optimization could lead to a performance gain?
  3. Backup the whole database using
    1. mysqldump and
    2. the mysql command
  4. What are storage engines? Which are there and what are their differences?

Challenge Presentation

Matthias Danner File:Presentation Databases Matthias Danner.pdf

Web Content Management Systems

  • Date: 2013-06-11
  • Topics: web content management systems: MediaWiki, Drupal; bug tracking/software development management.

Questions

  • What is a web content management system?
  • How does a web content management system help you maintain a website?
  • Highlight the differences between Drupal and MediaWiki: what for and when would you use which?
  • What is Bugzilla? Can Bugzilla be used as a general request tracker?

Links for Preparation

Presentation

Sonja Ansorge File:WCMS.pdf

Programming Challenge

  • Install MediaWiki.
  • Install the CMS of your choice (Drupal recommended).
  • Install Bugzilla.
  • Connect the user management of the CMS, wiki and Bugzilla to your LDAP.
  • Create a simple web page with your CMS for the practical.
  • Create a wiki page that is editable after login by users of your machine, but not by the world.
  • Create products and components in Bugzilla for your wiki and CMS. Allow everybody to file bugs.

Hints and Tips

Wiki

There are many different wiki engines:

We are going to use MediaWiki (http://www.mediawiki.org), one of the most popular wiki engines available.

1. Install a good and stable debian package: mediawiki

2. Adjust the MediaWiki configuration file to the system environment

  • add to your default virtual host:
Alias /mediawiki /var/lib/mediawiki
  • do not forget to reload Apache
    which domain do you now use to access the mediawiki?
    the alias can be replaced with any other alias you want

3. Complete the installation settings in the web browser (http://localhost/mediawiki/)

  • Save LocalSettings.php, and then copy (as root) to /etc/mediawiki/. Change the ownership of this file to root:root.
    discuss with your neighbour a suitable configuration

4. Review the settings in the default and the local configuration files

  • the default configuration file should not be edited
    what permissions do you set for the '/etc/mediawiki/LocalSettings.php' file?

5. Modify the main page to make it a little more personal and at least add a logo. Allow registered users to change the content ($wgGroupPermissions).

Advanced Challenge

  • Enable LDAP Authentication for your MediaWiki (debian package: mediawiki-extensions-ldapauth)

Challenge Presentation

Julia Rackerseder File:CMS Julia Rackerseder Homework.pdf

Rapid Web Development with MongoDB and Node.js

  • Date: 2013-06-11
  • Topics: MongoDB, Node.js
  • Tutor: Manuel Kroiss

Questions

  • What is MongoDB and what is JSON? Why would you use it instead of SQL?
  • What is thread blocking?
  • What is NodeJS and how is it different from PHP?

Links for Preparation

http://www.mongodb.org/ (intro + try it out)

http://visionmedia.github.io/masteringnode/book.html

Presentation

Manuel Kroiss File:Rapid web development.pdf

Links for the Challenge

[1] http://docs.mongodb.org/manual/tutorial/getting-started/

[2] https://github.com/joyent/node/wiki/Installing-Node.js-via-package-manager

[3] http://expressjs.com/api.html

[4] http://mongoosejs.com/

[5] http://theholmesoffice.com/mongoose-and-node-js-tutorial/

Programming Challenge

  • Install MongoDB and test the shell [1]
  • Add the following two documents to the collection 'students':
    { name: 'manuel', age: 24, courses: ['programming', 'math'] }
    { name: 'tikira', age: 25, courses: ['informatics', 'neuroscience'] }
  • Install NodeJS [2]
  • Install the modules express and mongoose with npm (node package manager)
  • Create an HTTP server with express [3]
  • Define a mongoose Model 'Student' for the data we already inserted above [4][5]
    Note: In mongoose the model name 'Student' will use the MongoDB collection 'students' (plural s is always added)
  • Create event bindings with express for the urls:
    • GET /list
      list all students in the database, and link the name to /:id (see next GET)
      example: <a href="/da8fas8f4afsadf8sdf8sf">manuel</a>...
    • GET /:id
      print json object of a student by ObjectID (automatically assigned)
    • GET /add?name=Max&age=19&courses=first&courses=second
      add student by url
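
The /add URL repeats the 'courses' parameter, which is how a list is usually passed in a query string. The sketch below shows the resulting data shape in Python rather than JavaScript, purely as an illustration of the mapping from URL to document; the function name is hypothetical, and in the challenge itself express does this parsing for you.

```python
from urllib.parse import parse_qs, urlsplit


def student_from_url(url):
    """Turn the query string of an /add URL into a student document."""
    query = parse_qs(urlsplit(url).query)   # repeated keys become lists
    return {
        "name": query["name"][0],
        "age": int(query["age"][0]),        # query values arrive as strings
        "courses": query.get("courses", []),
    }
```

Note that 'age' has to be converted to a number explicitly; otherwise the new document would store it as a string, unlike the two documents inserted by hand.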

Advanced Challenge

  • Add a field 'visitors' in the Student model and count each time the /:id page of a student is requested
  • Create additional bindings for:
    • /:id/xml
      xml output of the student object
    • /:id/edit
      edit by html form
    • /db
      json output of a mysql table

Challenge Presentation

Tikira Temu File:Homework mongoDB and NodeJS Tikira Temu 2013-06-18.pdf

Network Filesystems and Grid Computing

  • Date: 2013-06-18
  • Topics: filesystem sharing (NFS, SMB/CIFS and sshfs), batch-queueing/high throughput computing (Grid Engine)

Questions

  • List use cases for file system sharing - what is this good for?
  • How would you make the same file system available to Linux, Windows and OS X clients?
    • Can you make your home file system available to a Windows virtual machine guest running on your Linux host?
  • How would you make a file system available to Linux clients where high performance is important? (hint: NFS, but also think of OCFS2 [8])
  • Can you browse a remote file system as if it were local in case it is on a host with SSH access?
  • Is it possible to combine multiple hosts to serve out a single file system (c.f. OCFS2, GlusterFS)?
  • What are the most popular batch-queueing/grid computing engines?
  • What is a batch system like the Open Grid Scheduler good for?
    • What happens when more jobs are submitted than the number of available cores?
    • Is it possible to prioritize users?
  • Can the Open Grid Scheduler/Grid Engine handle non-uniform (say some have 32G memory, others 64G) execution hosts?
    • Is it possible to match execution hosts to job requirements (e.g. memory, number of cores, installed software)? Show examples.
  • When the (Sun) Grid Engine is installed on a cluster, it is common that all execution hosts mount at least one shared file system, say /mnt/home. Why do you think this is?

Links for Preparation

Presentation

Matthias Danner File:Presentation NFS Grid Computing Matthias Danner.pdf

Cluster Filesystems

Object Store

Programming Challenge

  • Install the Linux kernel NFS server (nfs-kernel-server package, make sure you have the nfs-common package installed as well).
    • Export (man exports) - with NFS (up to version 4) - the root of your home directory (/home) read-only, 'root_squash' and 'all_squash' to the world ('*').
    • Export the root of your home directory (/home) read-write, no squash (not even root) to your L1 and L2 virtual machine, as well as to 127.0.0.1 (localhost).
    • Mount your exported home directory to /mnt/<uid>-home/, either in your L1 or L2: man mount. Use the 'nfs4' protocol. Enter the mounted file system into /etc/fstab, but with the 'noauto' option.
  • Install the SMB/CIFS server (samba package). I recommend you also install the SMB client (smbclient package) and the samba-doc package. Workgroup/Domain name: 'TBL'.
    • Unfortunately the UNIX passwords are not usable for Samba. UNIX and Samba encrypted passwords have to be kept separately. Use 'smbpasswd -a <user>' to set the Samba password for your user - but look out: this sets your UNIX password as well (with the default smb.conf file).
    • Create a new share 'clipboard' (man smb.conf) that exports /srv/samba/clipboard read-write to valid users. Make sure /srv/samba/clipboard is writable to everyone. Make the share browseable and force files and directories created to be modifiable by all (force create and directory mode 0777).
    • Start a file manager in the GUI (X) of your L1 virtual machine and browse the Windows network: find your clipboard share and log in to it. Create/copy a file in a new directory.
  • Install the SSH filesystem client (sshfs package) in your L1 virtual machine. Put yourself into the 'fuse' group.
    • Use sshfs (man sshfs) to mount your home directory on i12r-tbl (tbl:/home/tbl2012/<yourname>) to ~/L0home on the L1. You will probably have to use id mapping: '-o idmap=user'. Create/copy a file in a new directory in ~/L0home.
    • Use 'fusermount -u <mount_point>' to unmount the sshfs file system.
  • Install the (Sun) Grid Engine (packages gridengine-{client,exec,master,qmon}; you will also need xfonts-100dpi and xfonts-75dpi (thanks Daniel), but you probably have these already). Let debconf configure SGE automatically. SGE cell name: <uid> (e.g. lkajan) from the table above. Master host: <host name> (e.g. lkajan.tbl) from the table above.
    1. Start (as root) the qmon graphical management interface (forward X or in a VNC session).
    2. Add '<yourhost>.tbl' as a submit host (under the Host Configuration button).
    3. Create a new queue (from Queue Control) 'default' with shell '/bin/sh' instead of '/bin/csh'. Add '<yourhost>.tbl' to its Hostlist.
    4. Add yourself to the list of users (User Configuration, User tab). Set the total number of Share Tree tickets (Policy Configuration) to 10,000 (10k) and give yourself 1000 tickets in the Share Tree Policy (Share Tree Policy -> Add Leaf (to Root node - add root first)). Configure the total number of tickets to be distributed among unspecified users to 1000 (Add Leaf to root, give Name = 'default').
    5. Submit the binary job /bin/date with 'qsub' (man qsub). Join the standard output and error of the job. Where do you get the output of this job?
    6. Write a simple job script that calls /bin/date, but have this job script define the necessary arguments for joining standard out and standard error, so that there is no need to give these on the command line (hint: man qsub, search for '#$'). Submit this job script.
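
Step 6 can be sketched as follows. This is a minimal sketch under stated assumptions, not a reference solution: the scratch directory, the file name date_job.sh and the job name date_job are hypothetical, and the script relies on '#$' being the directive prefix that qsub looks for by default. Because '#$' lines are ordinary comments to the shell, the job script also runs on a machine without Grid Engine, which makes it easy to try before submitting.

```shell
#!/bin/sh
# Sketch: create a job script whose '#$' lines carry the qsub options.
set -eu
job_dir=$(mktemp -d)                 # hypothetical scratch location
job_script="$job_dir/date_job.sh"    # hypothetical file name

cat > "$job_script" <<'EOF'
#!/bin/sh
#$ -S /bin/sh
#$ -N date_job
#$ -cwd
#$ -j y
/bin/date
EOF
chmod +x "$job_script"

# The '#$' lines are plain comments to the shell, so a local test run works.
job_output=$(sh "$job_script")
echo "local run: $job_output"

# Submit only where a Grid Engine client is installed.
if command -v qsub >/dev/null 2>&1; then
    (cd "$job_dir" && qsub "$job_script") || echo "qsub failed; is the master host reachable?"
else
    echo "qsub not found; would run: qsub $job_script"
fi
```

Keeping the options inside the script means that every submission uses the same settings; options given on the qsub command line can still be added when needed.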

Hints and Tips

  • Use a bind mount (man mount) to make /home available in /srv/nfs4/home.
  • Use 'exportfs -v' to check what is exported, and how.
  • Restart the nfs-common service after making changes to /etc/exports or /etc/fstab. Setting 'NEED_IDMAPD=yes' in '/etc/default/nfs-common' may also be helpful to avoid confusion. If you do not get proper user/group mapping for your NFSv4 mount, the lack of a running rpc.idmapd daemon may be the cause.
  • Use 'smbclient -U <user> -L //<yourhost>.tbl/' to check what is exported by the Samba server for a particular user.
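
The bind mount and export hints above might translate into lines like the ones drafted below. This is a sketch under assumptions, not a tested configuration: L1_ADDRESS and L2_ADDRESS are placeholders for your virtual machines, the use of /srv/nfs4 as an NFSv4 pseudo-root with 'fsid=0' follows a common Debian layout rather than anything prescribed here, and 'no_subtree_check' is an added option. The script writes only to a scratch directory, so it is safe to run as a normal user; copy the lines into /etc/fstab and /etc/exports yourself after reviewing them.

```shell
#!/bin/sh
# Sketch: draft fstab and exports lines in a scratch directory for review.
# Nothing under /etc is modified by this script.
set -eu
draft_dir=$(mktemp -d)

# Placeholder client addresses (assumption): replace with your L1 and L2.
l1_host="L1_ADDRESS"
l2_host="L2_ADDRESS"
rw_opts="rw,no_root_squash,no_subtree_check"

# Bind mount so that /home appears below the pseudo-root /srv/nfs4.
cat > "$draft_dir/fstab.add" <<'EOF'
/home  /srv/nfs4/home  none  bind  0  0
EOF

# World: read-only with all users squashed. Selected hosts: read-write, no squash.
cat > "$draft_dir/exports.add" <<EOF
/srv/nfs4       *(ro,fsid=0,all_squash,no_subtree_check)
/srv/nfs4/home  *(ro,root_squash,all_squash,no_subtree_check) $l1_host($rw_opts) $l2_host($rw_opts) 127.0.0.1($rw_opts)
EOF

cat "$draft_dir/fstab.add" "$draft_dir/exports.add"

# After copying the reviewed lines into place (as root):
#   mount /srv/nfs4/home     # activate the bind mount from /etc/fstab
#   exportfs -ra             # re-read /etc/exports
#   exportfs -v              # check what is exported, and how
```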

Advanced Challenge

  • Configure the ldapsam authentication backend for the Samba server.
  • Mount (-t cifs) your Samba clipboard share to your L2 virtual machine (cifs-utils package, man mount.cifs).
  • Turn your L2 virtual machine into an SGE execution host. Share your home file system with the L2 so that your jobs can access this file system easily.

Challenge Presentation

Jonas Reeb File:Nfs and gridcomputing.pdf

Packages Used by Major Linux Distributions and Creating Packages for Debian and Derivatives

  • Date: 2012-06-25
  • Topics: RPM and Debian packages overview, Debian Social Contract, Debian Policy, Debian Med Pure Blend, creating Debian packages, Debian package quality control, contributing packages to Debian

Questions

  • What is the advantage of an rpm or deb package over a tar.gz package?
  • What type of package do popular commercial (enterprise) Linux distributions use (e.g. RedHat Enterprise Linux, SUSE Linux Enterprise)?
  • What is the Debian Social Contract and the Debian Free Software Guidelines?
  • What is the Debian Policy?
  • What are the archive areas in Debian?
  • What are the major steps of 'Debianizing' a piece of software?
  • What is the tool 'debhelper' good for?
  • What is the role of the tool 'lintian' in packaging for Debian?
  • What is the role of the 'quilt' tool in packaging for Debian?
  • Who are ftpmasters?

Links for Preparation

Presentation

Nikolaos Papadopoulos Prezi File:Debian.pdf

Michael Kluge File:Packaging.pdf

Programming Challenge

  • Package your output file XML format converter for Debian.

Hints and Tips

  • Packages you need: dpkg-dev, packaging-dev
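
To give an idea of what 'Debianizing' produces, the sketch below lays out a minimal source package in a scratch directory. Everything specific in it is an assumption made for illustration: the package name xml-converter, the placeholder script standing in for your converter, the maintainer details, the 'science' section, the debhelper compatibility level and the Standards-Version. It uses the native source format only to avoid needing a separate upstream tarball.

```shell
#!/bin/sh
# Sketch: lay out a minimal native Debian source package in a scratch directory.
set -eu
pkg=xml-converter                  # hypothetical package name
work=$(mktemp -d)
src="$work/$pkg-0.1"
mkdir -p "$src/debian/source"

# Stand-in for the real converter program.
printf '#!/bin/sh\necho "converter placeholder"\n' > "$src/$pkg"
chmod +x "$src/$pkg"

cat > "$src/debian/control" <<EOF
Source: $pkg
Section: science
Priority: optional
Maintainer: Your Name <you@example.org>
Build-Depends: debhelper-compat (= 13)
Standards-Version: 4.6.2

Package: $pkg
Architecture: all
Depends: \${misc:Depends}
Description: hypothetical XML output converter
 Placeholder long description.
EOF

cat > "$src/debian/changelog" <<EOF
$pkg (0.1) UNRELEASED; urgency=medium

  * Initial packaging.

 -- Your Name <you@example.org>  $(date -R)
EOF

# debian/rules is a makefile, so the recipe line must start with a tab.
printf '#!/usr/bin/make -f\n%%:\n\tdh $@\n' > "$src/debian/rules"
chmod +x "$src/debian/rules"

# Install the program into /usr/bin of the binary package.
echo "$pkg usr/bin" > "$src/debian/install"
echo "3.0 (native)" > "$src/debian/source/format"

find "$src" -type f | sort
```

From inside the source directory, 'dpkg-buildpackage -us -uc' would then attempt an unsigned build, and lintian can be run on the resulting .changes file. The sketch leaves out debian/copyright, which lintian will flag, so expect to add it before the package is clean.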

Advanced Challenge

  • Prepare the RPM package of one of your assigned packages. A CentOS virtual machine or chroot environment is useful here.

Challenge Presentation

Sebastian Hollizeck File:Sebastian-rpm-master.pdf

Manuel Kroiss File:Manuel-debian-packaging.pdf