MLDemos

by Basilio Noris

Learning Algorithms and Systems Laboratory
Ecole Polytechnique Fédérale de Lausanne

Introduction

During my PhD I came across a number of machine learning algorithms for classification, regression and clustering. While there are a great many libraries, source code packages and binaries for different algorithms, it is always difficult to get a good grasp of what they do. Moreover, one ends up spending a great amount of time just getting the algorithm to display its results in an understandable way. Change the algorithm and you have to do the work all over again. Some people have tried, and succeeded, to combine several algorithms into a single multi-purpose library, making their libraries extremely useful (you will find many of their names in the acknowledgements below), but they still didn't solve the problem of visualization and ease of use.
Matlab is an easy answer to that, but while it is extremely easy to use for generating and displaying single instances of data processing (data, results, models), it is annoyingly slow and cumbersome when it comes to creating an interactive GUI. While preparing the exercise sessions for the Applied Machine Learning class at EPFL, I decided to combine the code snippets, example programs and libraries I had at hand into a seamless application where the different algorithms could be compared and studied easily.

Downloads

Most likely this is what you're here for, so...

Binaries:


windows
MLDemos 0.4.6b for Windows
minimum requirements: XP SP3
mac
MLDemos 0.4.6b for Mac
minimum requirements: Snow Leopard
linux
MLDemos 0.3.2_CDE
minimum requirements: kernel 2.6.X
thanks to Philip Guo!

The Linux binaries have been packaged with CDE, a packaging tool that makes it easy to create self-contained software and scripts on x86 Linux machines. To run it:
1.) Download package
2.) tar -jxvf MLDemos-0.3.2-cde.tar.bz2
3.) MLDemos-0.3.2-cde/mldemos
In its current form, the CDE package does not allow loading and saving of external files, but all other functionality works. A big thanks to Philip Guo for making this tool possible!

Sources:

GIT repository (current devel release 0.4.6)
source_backup (0.3.0)
The code was originally written in Visual Studio, and therefore generates a ludicrous number of warnings with GCC...
The software was compiled and tested on Windows XP/7, Ubuntu and Kubuntu 10.04, Gentoo and Mac OSX Snow Leopard, using QtCreator 1.3 and 2.1.

Requirements
The code requires Qt (4.6) and (in part) OpenCV (2.2) and Boost (1.47). Previous versions of these libraries might work as well, but you might as well use the newer versions. Be sure to adjust your include and lib paths to point to the correct directories.

- Debian
Prof. Barak A. Pearlmutter has created a Debian package, which will be available soon. In the meantime you can build it by following the instructions below:
 git clone git://github.com/barak/mldemos.git
 cd mldemos
 git checkout debian
 dpkg-checkbuilddeps
 fakeroot debian/rules binary
 sudo dpkg --install ../mldemos_*.deb

Note: OpenCV 2.2 is not available directly (only 2.1 is), so you will need to build OpenCV 2.2 yourself. This is only necessary to use MLP and Boosting. These are two important algorithms, so you might as well make the effort:
 git clone git://github.com/barak/opencv.git
 cd opencv
 git checkout master
 dpkg-checkbuilddeps
 fakeroot debian/rules binary
 sudo dpkg --install ../*opencv*.deb


Again, a huge thanks to Barak!

Known Bugs
Approximate KNN classification creates weird blank spaces on some machines and with some metrics.
Saving does not work on the Linux CDE package.
Resizing the canvas when a reward map has been drawn does not update the underlying data (avoid doing it).
There seem to be some rare hangups when clearing the canvas; I will have to track those down.
 
What's New
0.4.3
- Fixed a number of nasty bugs, split kernel methods into SVM- and GPR-based methods, and merged k-means algorithms together. Mac binaries are available once again!
0.4.2
- Added the possibility to choose which dimensions are used for regression or classification (when more than 2 dimensions are available)
0.4.1
- Changed the way multivariate regression errors are displayed
- Fixed the way CSVImport imports files (added the possibility to ignore the class label, or to use the first row as header for columns)
- Plugged some memory leaks
- The Statistics panel now shows classification results for the last classification
- Fixed the way ROC values are computed

Changelog


Legalities

The package contains binary versions of a number of open-source libraries. I am including them here with the knowledge that this might not be entirely compatible with the distribution policies of each respective library. I will try to contact the related parties and get the necessary permissions, to the extent to which this is possible. In the meantime, I distribute this software in good faith: my goal is for people to be able to study and work with the different methods implemented here. See the acknowledgements section below for a list of the people who contributed.
You are free to use this software for personal and educational purposes, but you are not allowed to use it for commercial purposes. You can redistribute the software as long as you provide a link to this page. Then again, this page will always link to the latest version of the software, so you may be better off taking the version here anyway.

Features

The user-friendly interface gives you easy access to everything you need
The user interface with option panels

Compare different methods for classification, regression, dynamical systems or clustering on data you can quickly draw in a 2D space
Compare different methods
(SVM+RBF classification on a checkerboard problem)

Compare different methods
(Sparse Optimized Gaussian Process regression)

Compare different methods
(SEDS + DS Avoid on some drawn trajectories)

Compare different methods
(Maximization with Genetic Algorithms)

Play with the algorithm parameters to learn and understand how they influence the results
Compare different parameters
(GPR with RBF kernel and different kernel widths)

Display different types of information, from the basic output of the algorithm to model-specific information and confidence/likelihood maps
Different display modes
(GMM+GMR regression, different information display)

Implemented Methods


Classification:
- Support Vector Machine (SVM) (C, nu, Pegasos)
- Relevance Vector Machine (RVM)
- Gaussian Mixture Models (GMM)
- Multi-Layer Perceptron + BackPropagation
- Gentle AdaBoost + Naive Bayes
- Approximate K-Nearest Neighbors (KNN)

Regression:
- Support Vector Regression (SVR)
- Relevance Vector Regression (RVR)
- Gaussian Mixture Regression (GMR)
- MLP + BackProp
- Approximate KNN
- Gaussian Process Regression (GPR)
- Sparse Optimized Gaussian Processes (SOGP)
- Locally Weighted Projection Regression (LWPR)

Dynamical Systems:
- GMM+GMR
- LWPR
- SVR
- SEDS
- SOGP (Slow!)
- MLP
- KNN

Clustering:
- K-Means
- Soft K-Means
- Kernel K-Means
- GMM
- One Class SVM

Projections:
- Principal Component Analysis (PCA)
- Kernel PCA
- Independent Component Analysis (ICA)
- Linear Discriminant Analysis (LDA)
- Fisher Linear Discriminant
- EigenFaces to 2D (using PCA)

Maximization / Reinforcement Learning:
- Random Search
- Random Walk
- PoWER
- Genetic Algorithms (GA)
- Particle Swarm Optimization
- Particle Filters
- Donut
- Gradient-Free Methods (nlopt)

Needless to say, I will add methods as I go. This first implementation was meant to cover what our own students would see in class (plus some other methods where the coding-effort-to-result ratio was fairly low). The model information display for some of the methods is not very informative yet (if at all), and the same goes for the confidence maps (for the methods that have one). If you have suggestions, requirements or ideas, please contact me (info below).

Extremely Short User Guide

- Launch the software
- Select a tool from the drawing toolbar
    right-click on the tool to open the tool options (for some tools only)
- Draw samples by clicking either the left or right mouse button.
    left-click generates samples of class 0
    right-click generates samples of the class selected in the toolbar (default: 1)
- Select the Display Options icon
    this will allow you to display model information, confidence/likelihood maps and to hide the original samples
    the mouse wheel will allow you to zoom in and out
    alt+dragging will allow you to pan around the space
- Select one of the algorithm icons to open their respective option panels
    click the train button to run the algorithm on the current data

Importing data

Generating data in MLDemos is done in three different ways: by manually drawing samples, by projecting image data through PCA (via the Projection panel), or by loading external data.
The data format used by the software is ASCII-based. The first line contains the number of samples followed by the number of dimensions (only the first 2 dimensions are used in the software). The file then contains one line per sample, with the coordinates of the sample followed by a class number (0 ... 255) and a flag (0-3) that indicates whether the sample is unused or belongs to the training, validation or test set.
A concrete example would be
4 2
0.10 0.11 0 0
0.14 0.91 0 0
0.43 0.74 1 0
0.28 0.34 1 0
which presents 4 two-dimensional samples, two from class 0 and two from class 1.
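As an illustration of the layout described above, here is a minimal Python sketch that reads such a file. It is not part of MLDemos: the names Sample and parse_samples are hypothetical, and the sketch assumes that fields are separated by whitespace. The flag is kept as a raw integer, since the exact meaning of each value is not spelled out here.

```python
from typing import List, NamedTuple


class Sample(NamedTuple):
    coords: List[float]  # one value per dimension
    label: int           # class number, 0..255
    flag: int            # 0..3, kept as the raw integer


def parse_samples(text: str) -> List[Sample]:
    """Parse the header line and the sample lines that follow it."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    count, dims = (int(tok) for tok in lines[0].split())
    samples = []
    # Read exactly `count` lines; anything after them (for example
    # saved algorithm parameters) is ignored.
    for line in lines[1:1 + count]:
        tokens = line.split()
        if len(tokens) != dims + 2:
            raise ValueError(
                f"expected {dims + 2} fields, got {len(tokens)}: {line!r}")
        coords = [float(tok) for tok in tokens[:dims]]
        samples.append(
            Sample(coords, int(tokens[dims]), int(tokens[dims + 1])))
    if len(samples) != count:
        raise ValueError(
            f"header promises {count} samples, found {len(samples)}")
    return samples


EXAMPLE = """4 2
0.10 0.11 0 0
0.14 0.91 0 0
0.43 0.74 1 0
0.28 0.34 1 0
"""

samples = parse_samples(EXAMPLE)

# Count how many samples belong to each class.
per_class = {}
for s in samples:
    per_class[s.label] = per_class.get(s.label, 0) + 1
print(per_class)  # {0: 2, 1: 2}
```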
When the file is saved from MLDemos, the software adds the current algorithm parameters (provided an algorithm was selected), which can be useful for demonstration purposes. If no such information is present, the default algorithm parameters are selected. You should be able to convert your own data to this format with any script.
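A conversion script along those lines could look like the following Python sketch. It is only an illustration: format_samples is a hypothetical helper, not something shipped with MLDemos, and it assumes whitespace-separated fields and a flag of 0 for every sample (as in the example above) unless other flags are given. No algorithm parameters are written, so the defaults would apply.

```python
def format_samples(points, labels, flags=None) -> str:
    """Return the text of a data file: a header line, then one line per sample."""
    if flags is None:
        flags = [0] * len(points)  # assumption: 0, as in the example file
    if not (len(points) == len(labels) == len(flags)):
        raise ValueError("points, labels and flags must have the same length")
    if not points:
        raise ValueError("at least one sample is required")

    dims = len(points[0])
    lines = [f"{len(points)} {dims}"]  # header: sample count, dimensions
    for coords, label, flag in zip(points, labels, flags):
        if len(coords) != dims:
            raise ValueError("all samples must have the same dimension")
        if not 0 <= label <= 255:
            raise ValueError(f"class number out of range: {label}")
        if not 0 <= flag <= 3:
            raise ValueError(f"flag out of range: {flag}")
        values = " ".join(f"{value:.6g}" for value in coords)
        lines.append(f"{values} {label} {flag}")
    return "\n".join(lines) + "\n"


text = format_samples(
    points=[(0.10, 0.11), (0.14, 0.91), (0.43, 0.74), (0.28, 0.34)],
    labels=[0, 0, 1, 1],
)
print(text, end="")
```

The returned string can then be written to disk with the usual file functions, under whatever file name you plan to load in MLDemos.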

Acknowledgements

This program would not exist if a number of people had not put a lot of effort into implementing the different algorithms that are combined here into a single program.

Moreover, the program itself would be far less performant without the work of the support and development team at Lasa: Christophe Paccolat, Nicolas Sommer and Otpal Vittoz

Thanks also to the people who have not contributed code but have contributed no less directly: Aude Billard, for being one of the best bosses one could wish for, François Fleuret, for a bunch of fruitful discussions, and the AML 2010 and 2011 classes for patiently giving it a first test-drive.
contact: mldemosøb4silio.com
© 2010-2012 basilio noris