
Showing posts from March, 2013

Factor Analysis for Mixed Data

For factor analysis, we usually use principal component analysis (PCA) when the active variables are all quantitative, and multiple correspondence analysis (MCA) when they are all categorical. But what should we do when we have a mix of the two types? One possible strategy is to discretize the quantitative variables and use MCA. This procedure is not recommended, however, if the dataset is small (few instances), or if the number of qualitative variables is low compared with the number of quantitative ones. In addition, discretization implies a loss of information, and neither the choice of the number of intervals nor the calculation of the cut points is obvious. Another possible strategy is to replace each qualitative variable with a set of dummy variables (a 0/1 indicator for each category of the variable to recode) and then use PCA. This strategy has a drawback. Indeed, because the dispersions of the variables (the quantitative variables and the indi
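
To make the second strategy concrete, here is a minimal sketch of dummy coding followed by a PCA. The data (`height`, `color`) are invented for illustration, and the PCA is computed by a plain SVD of the standardized matrix, which is an assumption of this sketch rather than the method of any particular tool. It only shows the mechanics of the recoding and does not attempt to address the drawback raised above.

```python
import numpy as np

# Hypothetical toy data: one quantitative and one qualitative variable.
height = np.array([170.0, 182.0, 165.0, 178.0, 160.0, 175.0])
color = np.array(["red", "blue", "red", "green", "blue", "green"])

# Dummy coding: one 0/1 indicator column per category.
levels, codes = np.unique(color, return_inverse=True)
dummies = np.eye(len(levels))[codes]

# Stack the quantitative variable and the indicators, then standardize
# every column (assumption: ordinary centering and unit variance).
X = np.column_stack([height, dummies])
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# PCA through the singular value decomposition of the standardized matrix.
U, sv, Vt = np.linalg.svd(Z, full_matrices=False)
explained = sv**2 / (sv**2).sum()   # share of variance per component
scores = U * sv                     # coordinates of the instances

print(levels)
print(np.round(explained, 3))
```

Because the indicators of one variable always sum to 1, the centered dummy columns are linearly dependent, so the last component carries essentially no variance.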

Correspondence Analysis - Tools comparison

Correspondence analysis (or factorial correspondence analysis) is an exploratory technique for detecting the salient associations in a two-way contingency table. It provides an attractive graphical display in which the rows and the columns of the table are depicted as points. Thus, we can visually identify the similarities and differences between the row profiles (and likewise between the column profiles). We can also detect the associations between rows and columns. Correspondence analysis (CA) can be viewed as a way of decomposing the chi-squared statistic associated with a two-way contingency table into orthogonal factors. In fact, because CA is a descriptive technique, it can be applied to tables even when the chi-squared test of independence is not appropriate. The only restrictions are that the table must contain only positive or zero values, that summing the rows and the columns must be meaningful, and that the row and column profiles must be interpretable. The correspondence ana
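
The link between CA and the chi-squared statistic can be checked numerically. The sketch below is a minimal illustration, assuming the usual formulation of CA as an SVD of the standardized residuals; the function name `correspondence_analysis` and the contingency table are hypothetical, and the code assumes every row and column has a non-zero total.

```python
import numpy as np

def correspondence_analysis(table):
    """CA of a two-way contingency table via SVD of standardized residuals."""
    N = np.asarray(table, dtype=float)
    n = N.sum()
    P = N / n                          # correspondence matrix
    r = P.sum(axis=1)                  # row masses
    c = P.sum(axis=0)                  # column masses
    expected = np.outer(r, c)          # proportions under independence
    S = (P - expected) / np.sqrt(expected)
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    eigenvalues = sv**2                # inertia carried by each factor
    row_coords = (U * sv) / np.sqrt(r)[:, None]
    col_coords = (Vt.T * sv) / np.sqrt(c)[:, None]
    return eigenvalues, row_coords, col_coords

# Hypothetical 3 x 3 contingency table.
table = np.array([[20, 10, 5],
                  [10, 20, 10],
                  [5, 10, 30]])

eigenvalues, row_coords, col_coords = correspondence_analysis(table)

# Chi-squared statistic computed directly from observed and expected counts.
n = table.sum()
E = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
chi2_direct = ((table - E) ** 2 / E).sum()

# The eigenvalues sum to the total inertia, which equals chi2 / n.
print(np.round(eigenvalues, 4))
print(round(eigenvalues.sum() * n, 6), round(chi2_direct, 6))
```

The two printed totals agree, which is the decomposition described above: each factor accounts for a share of the chi-squared statistic, and at most min(rows, columns) - 1 factors carry non-zero inertia.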