Posts

Showing posts from January, 2013

New features for PCA in Tanagra

Principal Component Analysis (PCA)  is a very popular dimension reduction technique. The aim is to produce a few number of factors which summarizes as better as possible the amount of information in the data. The factors are linear combinations of the original variables. From a certain point a view, PCA can be seen as a compression technique. The determination of the appropriate number of factors is a difficult problem in PCA. Various approaches are possible, it does not really exist a state-of-art method. The only way to proceed is to try different approaches in order to obtain a clear indication about the good solution. We had shown how to program them under R in a recent paper . These techniques are now incorporated into Tanagra 1.4.45 . We have also added the KMO index (Measure of Sampling Adequacy – MSA) and the Bartlett's test of sphericity  in the Principal Component Analysis tool. In this tutorial, we present these new features incorporated into Tanagra on a realistic examp

Choosing the number of components in PCA

Principal Component Analysis (PCA)  is a dimension reduction technique. We obtain a set of factors which summarize, as well as possible, the information available in the data. The factors (or components) are linear combinations of the original variables. Choosing the right number of factors is a crucial problem in PCA. If we select too much factors, we include noise from the sampling fluctuations in the analysis. If we choose too few factors, we lose relevant information, the analysis is incomplete. Unfortunately, there is not an indisputable approach for the determination of the number of factors. As a rule of thumb, we must select only the interpretable factors, knowing that the choice depends heavily on the domain expertise. And yet, this last one is not always available. We intend precisely to build on the data analysis to get a better knowledge on the studied domain. In this tutorial, we present various approaches for the determination of the right number of factors for PCA based

PCA using R - KMO index and Bartlett's test

Principal Component Analysis (PCA) is a dimension reduction technique. We obtain a set of factors which summarize, as well as possible, the information available in the data. The factors are linear combinations of the original variables. The approach can handle only quantitative variables. We have presented the PCA in previous tutorials. In this paper, we describe in details two indicators used for the checking of the interest of the implementation of the PCA on a dataset: the Bartlett's sphericity test and the KMO index. They are directly available in some commercial tools (e.g. SAS or SPSS). Here, we describe the formulas and we show how to program them under R. We compare the obtained results with those of SAS on a dataset. Keywords : principal component analysis, pca, spss, sas, proc factor, princomp, kmo index, msa, measure of sampling adequacy, bartlett's sphericity test, xlsx package, psych package, R software Components : VARHCA, PRINCIPAL COMPONENT ANALYSIS Tutorial :