Posts

Showing posts from September, 2011

A PRIORI PT updated

A PRIORI PT is a tool dedicated to the extraction of association rules. It is one of the few Tanagra components that rely on an external library: we use Borgelt's "apriori.exe" program. Up to Tanagra 1.4.40, we used version 4.31 of "apriori.exe". From Tanagra 1.4.41 onward, we use the latest update, version 5.57 (2011/09/02). Although the settings of the tool are slightly modified, the extracted rules and the way the results are read remain identical. We revisit a former tutorial to describe the behavior of this component (Association Rule Learning using A PRIORI PT), so we do not detail the construction of the diagram here. Above all, we try to highlight the improvement of the library, especially regarding computation time, which is really impressive.
Keywords: association rule, large dataset
Components: A priori PT
Tutorial: en_Tanagra_AprioriPT_Updated.pdf
Dataset: assoc_census.zip
Reference :
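
To make the notion of an association rule concrete, here is a minimal brute-force sketch in Python. It is not Borgelt's implementation and does not reproduce the Tanagra component: the toy transactions, the function names `support` and `rules`, and the thresholds are all assumptions chosen for illustration. It only shows how support and confidence are computed for rules with a single-item consequent.

```python
from itertools import combinations

# Toy transactions (hypothetical data, not the assoc_census dataset).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def rules(transactions, min_support=0.5, min_confidence=0.6):
    """Enumerate rules antecedent -> consequent (single-item consequent).

    Brute force over all itemsets; assumes min_support > 0 so that the
    antecedent support used as a denominator is never zero.
    """
    items = sorted(set().union(*transactions))
    found = []
    for size in range(2, len(items) + 1):
        for itemset in combinations(items, size):
            s = support(itemset, transactions)
            if s < min_support:
                continue
            for consequent in itemset:
                antecedent = tuple(i for i in itemset if i != consequent)
                confidence = s / support(antecedent, transactions)
                if confidence >= min_confidence:
                    found.append((antecedent, consequent, s, confidence))
    return found

for antecedent, consequent, s, c in rules(transactions):
    print(f"{antecedent} -> {consequent}  support={s:.2f}  confidence={c:.2f}")
```

A dedicated program such as apriori.exe avoids this exhaustive enumeration by pruning candidates, which is why computation time matters so much on large datasets.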

Tanagra - Version 1.4.41

A PRIORI PT. This component generates association rules. It is based on Borgelt's apriori.exe program, which has recently been updated (2011/09/02, version 5.57). The improvement in computation time of this new version is impressive.
FREQUENT ITEMSETS. Also based on Borgelt's apriori.exe program (version 5.57), this component generates frequent itemsets (or closed, maximal, or generator itemsets).
Tutorials describing the use of these new tools are coming soon.
Download page: setup
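
The difference between frequent, closed, and maximal itemsets can be illustrated with a small Python sketch. This is a brute-force illustration under assumed toy data, not the algorithm used by apriori.exe. The function names and the `min_count` parameter are hypothetical, and generator itemsets are not covered.

```python
from itertools import combinations

# Toy transactions (hypothetical data for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def frequent_itemsets(transactions, min_count):
    """Map each itemset occurring in at least `min_count` transactions to its count."""
    items = sorted(set().union(*transactions))
    counts = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            count = sum(set(candidate) <= t for t in transactions)
            if count >= min_count:
                counts[frozenset(candidate)] = count
    return counts

def closed_itemsets(counts):
    """Frequent itemsets with no proper superset having the same count."""
    return {s for s, c in counts.items()
            if not any(s < t and counts[t] == c for t in counts)}

def maximal_itemsets(counts):
    """Frequent itemsets with no frequent proper superset."""
    return {s for s in counts if not any(s < t for t in counts)}

counts = frequent_itemsets(transactions, min_count=2)
print("frequent:", sorted(sorted(s) for s in counts))
print("closed:  ", sorted(sorted(s) for s in closed_itemsets(counts)))
print("maximal: ", sorted(sorted(s) for s in maximal_itemsets(counts)))
```

On this toy data, every maximal itemset is closed and every closed itemset is frequent, which is the general relationship between the three families.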

New GUI for RapidMiner 5.0

RapidMiner is a very popular data mining tool. It is one of the tools most used by data miners according to the annual KDnuggets polls (2011, 2010, 2009, 2008, 2007). There are two versions. We describe here the Community Edition, which is freely downloadable from the publisher's website. The new RapidMiner 5.0 has a new graphical user interface which is very similar to that of Knime. The organization of the workspace is the same. The sequence of data processing steps (using operators) is described with a diagram called a "process" in the RapidMiner documentation. In fact, version 5.0 adopts the presentation used by the vast majority of data mining software. Some features are shared with many tools, among others: the connection to the R software; the meta-nodes, which implement a loop or a standard succession of operations; the description of the methods underlying the operators, which is permanently displayed in the right part of the main window. RapidMiner 5.0 having evolved substantiall

Regression model deployment

Model deployment is one of the main objectives of the data mining process. We want to apply a model learned on a training set to unseen cases, i.e. any individual coming from the population. In the classification framework, the aim is to assign a class value to the instance from its description [e.g. Apply a classifier on a new dataset (Deployment)]. In the clustering framework, we try to detect the group which is as similar as possible to the instance according to its characteristics (e.g. K-Means - Classification of a new instance). Here, we are concerned with the regression framework. The aim is to predict the values of the dependent variable for unseen (or unlabeled) instances from the observed values of the independent variables. The process is rather basic if we handle a linear regression model: we apply the computed parameters to the unseen instances. But it becomes difficult when we want to treat more complex models such as support vector regression with nonlinear k