Posts

Showing posts from July, 2012

CVM and BVM from the LIBCVM toolkit

The Support Vector Machines algorithms are well-known in the supervised learning domain. They are especially appropriate when we handle a dataset with a large number “p” of descriptors . But they are much less efficient when the number of instances “n” is very high. Indeed, a naive implementation is of complexity O(n^3) for the calculation time and O(n^2) for the storing of the values. In consequence, instead of the optimal solution, the learning algorithms often highlight the near-optimal solutions with a tractable computation time . I recently discovered the CVM (Core Vector Machine) and BVM (Ball Vector Machine) approaches. The idea of the authors is really clever: since only approximate best solutions can be highlighted, their approaches try to resolve an equivalent problem which is easier to handle - the minimum enclosing ball problem in computational geometry - to detect the support vectors. So, we have a classifier which is as efficient as those obtained by the other SVM learnin...

Revolution R Community 5.0

The R software is a fascinating project. It becomes a reference tool for the data mining process. With the R package system, we can extend its features potentially at the infinite. Almost all existing statistical / data mining techniques are available in R. But if there are many packages, there are very few projects which intend to improve the R core itself. The source code is freely available. In theory anyone can modify a part or even the whole software. Revolution Analytics proposes an improved version of R. It provides Revolution R Enterprise, it seems (according to their website) that: it improves dramatically the fastness of some calculations; it can handle very large database; it provides a visual development environment with a debugger. Unfortunately, this is a commercial tool. I could not check these features . Fortunately, a community version is available. Of course, I have downloaded the tool to study its behavior. Revolution R Community is a slightly improved version of the...

Introduction to SAS proc logistic

In my courses at the University, I use only free data mining tools (R, Tanagra, Sipina, Knime, Orange, etc.) and the spreadsheet applications (free or not). Sometimes, my students ask me if the commercial tools (e.g. SAS which is very popular in France) have different behavior, in terms of how to use, or for the reading of the results. I say them that some of these commercial tools are available on the computers of our department. They can learn how to use them by taking as a starting point the tutorials available on the Web. But unfortunately, especially in the French language, they are not numerous about the logistic regression. We need a didactic document with clear screenshots which show how to: (1) import a data file into a SAS bank; (2) define an analysis with the appropriate settings; (3) read and understand the results. In this tutorial, we describe the use of the SAS PROC LOGISTIC (SAS 9.3). We measure its quickness when we handle a moderate sized dataset. We compare the resul...