Posts

Showing posts from March, 2012

Sipina add-on for OOCalc

Combining a spreadsheet with data mining tools is essential for the popularity of the latter. Indeed, when we deal with a moderately sized dataset (thousands of rows and tens of variables), the spreadsheet is a practical tool for data preparation. It is also a valuable tool for preparing reports. It is thus not surprising that Excel, and spreadsheets in general, are among the tools most used by data miners. Both Tanagra and Sipina provide an add-on for Excel. The add-on inserts a data mining menu into the spreadsheet. The user can select a dataset and send it to Tanagra (or Sipina), which is launched automatically. Until now, however, only Tanagra provided an add-on for Open Office Calc and Libre Office Calc; none was available for Sipina. This omission has been corrected in the new version of Sipina (Sipina 3.9). In this tutorial, we show how to install and use the “SipinaLibrary.oxt” add-on for Open Office Calc 3.3.0 (OOCalc). The process is the

Tanagra - Version 1.4.43

A few bugs have been fixed and some new features added. The computed contributions of individuals in PCA (PRINCIPAL COMPONENT ANALYSIS) have been corrected. They were not valid when working on a subsample of the data file. This error was reported by Mr. Gilbert Laffond. The standardization of the factors after VARIMAX (FACTOR ROTATION) has been corrected so that their variance coincides with the sum of the squares of the correlations with the axes, and thus with the eigenvalue associated with the axis. This modification was suggested by Mr. Gilbert Laffond. During the calculation of the confidence interval of the PLS regression coefficients (PLS CONF. INTERVAL), an error could occur when the requested number of axes was greater than the number of predictor variables. It is now corrected. This error was reported by Mr. Alain Morineau. In some circumstances, an error could occur in FISHER FILTERING, especially when Tanagra is run under Wine for Linux. We introduce some addit
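For readers unfamiliar with the two quantities corrected in PCA and VARIMAX, the textbook definitions can be written as follows. This is a sketch under stated assumptions, not taken from Tanagra's source code: individuals are assumed to carry uniform weights 1/n, the variables are assumed standardized, and the symbols (F for factor scores, r for variable-axis correlations, lambda for the variance attached to an axis) are notation introduced here for illustration.

```latex
% Contribution of individual i to axis k (uniform weights 1/n assumed),
% where n is the number of individuals actually used in the analysis.
\mathrm{CTR}_k(i) = \frac{\tfrac{1}{n} F_{ik}^2}{\lambda_k},
\qquad
\sum_{i=1}^{n} \mathrm{CTR}_k(i) = 1
\quad\text{since}\quad
\lambda_k = \frac{1}{n} \sum_{i=1}^{n} F_{ik}^2 .

% Variance of factor k expressed through the correlations r_{jk}
% between the p standardized variables and the axis.
\operatorname{Var}(F_k) = \sum_{j=1}^{p} r_{jk}^2 = \lambda_k .
```

Under these definitions, the contributions on an axis sum to one only if n and lambda are both computed on the same set of individuals.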

Sipina - Version 3.9

The add-on “SipinaLibrary.oxt” has been added to the distribution. It incorporates an additional menu into the OOCalc spreadsheet, which makes it possible to launch SIPINA from a dataset (range of cells). The add-on works with Open Office (tested with version 3.3.0) and Libre Office (version 3.5.1). Note that a similar add-on exists for Excel (sipina.xla). It makes a connection between Sipina and Excel. Keywords: sipina, OOCalc, open office, libre office, add-on, add-in. Sipina website: Sipina. Download: Setup file. References: Tanagra - SIPINA add-in for Excel; Tanagra - Tanagra add-in for Excel 2007 and 2010; Open Office - http://www.openoffice.org/; Libre Office - http://www.libreoffice.org/

RExcel, a bridge between Excel and R

Combining a specialized data mining tool with a spreadsheet is a very interesting idea. Most people know how to handle a spreadsheet such as Excel (or LibreOffice Calc, Open Office Calc, Gnumeric, etc.). Spreadsheets are popular because they make data manipulation very easy. Many data mining tools can read the XLS or XLSX file formats. But it is even more interesting to implement a bidirectional bridge between the data mining tool and Excel. We can then easily carry out the whole analysis by navigating between the tools: transforming the variables in Excel, performing the analysis in the data mining tool, and post-processing the results in Excel. In this tutorial, we describe the RExcel library for R. It adds a new menu to Excel. Thus, we can send a dataset to R on the one hand, and retrieve a dataset, or more generally a vector or a matrix, from R on the other hand. The tool is really easy to use. Keywords: data importation, excel file format, xls, xlsx, addin, add-in, a

PSPP, an alternative to SPSS

I spend a lot of time analyzing the available free statistical and data mining tools. There is no bad software, but some tools are more appropriate than others for certain tasks. Thus, we must identify the one best suited to our situation, and for that we must know a large number of tools. In this tutorial, we describe PSPP. It is presented as an alternative to the well-known SPSS: “PSPP is a program for statistical analysis of sampled data. It is a free replacement for the proprietary program SPSS, and appears very similar to it with a few exceptions”. Instead of describing each feature in detail (the documentation is available on the website), we present some statistical techniques. We compare the results with those of Tanagra, R 2.13.2 and OpenStat (build 24/02/2012). This is also a way to validate them: if the tools provide different results, it means that there is a problem. Keywords: pspp, R software, openstat, spss, descriptive statistics, t-test, welch test, comparison of mea
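The cross-checking idea described in this post can also be done by hand for the comparison of means: recompute the pooled (Student) and Welch statistics from the textbook formulas and compare them with what each tool reports. The following is a minimal sketch in Python using only the standard library; the function names are hypothetical and nothing here comes from PSPP, Tanagra, R or OpenStat.

```python
import math
from statistics import mean, variance


def pooled_t(x, y):
    """Student t statistic and degrees of freedom (equal variances assumed)."""
    n1, n2 = len(x), len(y)
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    t = (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def welch_t(x, y):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(x), len(y)
    a = variance(x) / n1  # squared standard error of the first mean
    b = variance(y) / n2  # squared standard error of the second mean
    t = (mean(x) - mean(y)) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df
```

With equal group sizes the two t statistics coincide and only the degrees of freedom differ, which gives a quick sanity check before comparing against the output of a statistical package.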

Regression analysis with LazStats (OpenStat)

LazStats is a statistical software package developed by Bill Miller, the father of OpenStat, a tool that has been well known to statisticians for many years. These are tools of the highest quality. OpenStat is one of the tools that I use when I want to validate my own implementations. Several variants of OpenStat are available. In this tutorial, we study LazStats. It is a version programmed in Lazarus, a development environment very similar to Delphi and based on the Pascal language. Projects developed in Lazarus benefit from the "write once, compile anywhere" principle, i.e. we write our program on one OS (e.g. Windows), but we can compile it on any OS for which Lazarus and the compiler are available (e.g. Linux). This idea was proposed by Borland with Kylix some years ago: we could program a project for both Windows and Linux. Unfortunately, Kylix has been canceled. It seems that Lazarus is more mature. In addition, it also enables us to compile the same project for