
Showing posts from April, 2010

Linear discriminant analysis on PCA factors

In this tutorial, we show that in certain circumstances it is more convenient to use the factors computed by a principal component analysis (on the original attributes) as input features for the linear discriminant analysis algorithm. The new representation space maintains the proximity between the examples. The new features, known as "factors" or "latent variables", are linear combinations of the original descriptors and have several advantageous properties: (a) their interpretation very often makes it possible to detect patterns in the initial space; (b) a very small number of factors is enough to restore the information contained in the data; moreover, we can remove the noise from the dataset by using only the most relevant factors (a kind of regularization by smoothing the information provided by the dataset); (c) the new features form an orthogonal basis, on which learning algorithms such as linear discriminant analysis behave better. This approach has a connection…
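To make the pipeline concrete, here is a minimal R sketch of the idea, not the tutorial's exact walkthrough: PCA with prcomp(), then LDA (package MASS) on the retained factors. The iris data and the choice of k = 2 factors are stand-in assumptions.

    # Minimal sketch of PCA-then-LDA; dataset and k are illustrative assumptions
    library(MASS)

    dataset <- iris
    X <- dataset[, -5]                    # original descriptors
    y <- dataset$Species                  # class attribute

    pca <- prcomp(X, center = TRUE, scale. = TRUE)   # principal component analysis
    k <- 2                                # number of retained factors (assumed)
    factors <- as.data.frame(pca$x[, 1:k])           # orthogonal latent variables

    # linear discriminant analysis on the factors instead of the raw,
    # possibly correlated, inputs
    model <- lda(factors, grouping = y)
    pred <- predict(model, factors)$class
    table(y, pred)                        # confusion matrix (training set)

Keeping only the first k factors is where the smoothing effect comes from: the discarded components carry mostly noise, and the retained ones form the orthogonal basis mentioned above.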

Induction of fuzzy rules using Knime

This tutorial is the continuation of the one devoted to the induction of decision rules (Supervised rule induction - Software comparison). I have not included Knime in that comparison because it implements a method which differs from those of the other tools: Knime computes fuzzy rules, and it requires the target variable to be continuous. That seems rather mysterious in the supervised learning context, where the class attribute is usually discrete. I thought it more appropriate to detail the implementation of the method in a tutorial exclusively devoted to the Knime rule learner (version 2.1.1). In particular, it is important to explain the reasons for the data preparation and how to read the results. As a reference, we compare the results with those provided by the rule induction tool of Tanagra. Scientific papers about the method are available online. Keywords: induction of rules, supervised learning, fuzzy rules. Components: SAMPLING, RULE INDUCTION, TEST…
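As a hedged illustration of the kind of data preparation this implies (the tutorial itself performs it with Knime's own nodes), one might recode the discrete class attribute as a numeric column before handing the data to a learner that expects a continuous target. The sketch below is in R; the column and file names are hypothetical.

    # Assumed preparation step, sketched in R: recode the discrete class
    # attribute as a numeric 0/1 column, since the learner expects a
    # continuous target. 'class', the file names and the coding are
    # hypothetical, not taken from the tutorial.
    dataset <- read.table("dataset.txt", header = TRUE, sep = "\t")
    dataset$class.num <- as.numeric(as.factor(dataset$class)) - 1   # 0/1 coding
    write.table(dataset, "dataset_for_knime.txt", sep = "\t",
                row.names = FALSE, quote = FALSE)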

"Wrapper" for feature selection (continuation)

This tutorial is the continuation of the preceding one about the wrapper feature selection in the supervised learning context (http://data-mining-tutorials.blogspot.com/2010/03/wrapper-for-feature-selection.html). We analyzed the behavior of Sipina , and we have described the source code for the wrapper process (forward search) under R (http://www.r-project.org/). Now, we show the utilization of the same principle under Knime 2.1.1 , Weka 3.6.0 and RapidMiner 4.6 . The approach is as follows: (1) we use the training set for the selection of the most relevant variables for classification; (2) we learn the model on selected descriptors; (3) we assess the performance on a test set containing all the descriptors. This third point is very important. We cannot know the variables that will be finally selected. We do not have to manually prepare the test file by including only those which have been selected by the wrapper procedure. This is essential for the automation of the process. Indeed