Posts

Showing posts from December, 2012

Discriminant Correspondence Analysis

The aim of canonical discriminant analysis is to explain the membership of the instances of a dataset in predefined groups. The groups are specified by a categorical dependent variable (class attribute, response variable); the explanatory variables (descriptors, predictors, independent variables) are all continuous. The method produces a small number of latent variables that separate the groups as well as possible. These new features, called factors, are linear combinations of the initial descriptors. The process is a valuable dimensionality reduction technique, but its main drawback is that it cannot be applied directly when the descriptors are discrete. The calculations remain possible if we recode the variables, for instance as dummy variables, but the interpretation of the results, which is one of the main goals of canonical discriminant analysis, is then far from obvious. In this tutorial, we present a variant of the discriminant analysis which is applicable to discre
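To make the dummy-variable recoding mentioned above concrete, here is a minimal sketch in Python. The function name `dummy_code` and the toy values are hypothetical and are not taken from Tanagra; the sketch only shows how one discrete descriptor becomes several 0/1 columns, which is why the resulting factors become harder to read.

```python
def dummy_code(values):
    """Recode one discrete descriptor as 0/1 indicator columns.

    Returns the sorted list of levels and, for each instance,
    one indicator per level (1 if the instance takes that level).
    """
    levels = sorted(set(values))
    indicators = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, indicators


# Hypothetical toy descriptor with three levels.
colors = ["red", "blue", "red", "green"]
levels, indicators = dummy_code(colors)

for value, row in zip(colors, indicators):
    print(value, row)
```

Each original descriptor is spread over as many columns as it has levels, so a factor is then a linear combination of indicators rather than of the descriptors themselves.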

Tanagra - Version 1.4.48

New components have been added.

K-Means Strengthening. This component was suggested to me by Mrs. Claire Gauzente. The idea is to strengthen an existing partition (e.g. from a HAC) by using the K-Means algorithm. A comparison of the groups before and after optimization is provided, indicating how effective the optimization was. The approach can be plugged into any clustering algorithm in Tanagra. Thanks to Claire for this valuable idea.

Discriminant Correspondence Analysis. This is an extension of canonical discriminant analysis to discrete attributes (Hervé Abdi, 2007). The approach is based on a clever transformation of the dataset. The initial dataset is transformed into a crosstab: the values of the target attribute are in rows, and all the values of the input attributes are in columns. The algorithm performs a correspondence analysis on this new data table to identify the associations between the values of the target and those of the input variables. Thus, we have at our disposal the tools of the correspond
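The dataset transformation described for Discriminant Correspondence Analysis can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy data, the `attribute=value` column labels, and the function name `build_crosstab` are hypothetical and do not reflect Tanagra's implementation. The sketch stops at the crosstab; the correspondence analysis itself would then be run on this table.

```python
from collections import Counter

# Hypothetical toy dataset: (target value, {input attribute: value}).
rows = [
    ("yes", {"color": "red", "size": "small"}),
    ("yes", {"color": "red", "size": "large"}),
    ("no", {"color": "blue", "size": "small"}),
    ("no", {"color": "red", "size": "small"}),
    ("no", {"color": "blue", "size": "large"}),
]


def build_crosstab(rows):
    """Count co-occurrences of each target value with each attribute value.

    Rows of the table are the target values; columns are all the
    values of all the input attributes, labelled "attribute=value".
    """
    counts = Counter()
    for target, attrs in rows:
        for name, value in attrs.items():
            counts[(target, f"{name}={value}")] += 1
    targets = sorted({t for t, _ in counts})
    columns = sorted({c for _, c in counts})
    table = [[counts[(t, c)] for c in columns] for t in targets]
    return targets, columns, table


targets, columns, table = build_crosstab(rows)

print(columns)
for target, counts_row in zip(targets, table):
    print(target, counts_row)
```

Each row of the table sums to the number of instances in that group multiplied by the number of input attributes, since every instance contributes one count per attribute.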