Posts

Showing posts from April, 2012

Pentaho Data Integration - Kettle

The Pentaho BI Suite is an open source Business Intelligence suite with integrated reporting, dashboard, data mining, workflow and ETL capabilities (http://en.wikipedia.org/wiki/Pentaho). This tutorial covers the Pentaho BI Suite Community Edition (CE), which is freely downloadable. More precisely, we present Pentaho Data Integration (PDI-CE), also called Kettle. We briefly show how to load a dataset and perform a simple data analysis. The main goal of this tutorial is to prepare for the next one, which focuses on deploying models designed with Knime, Sipina or Weka by using PDI-CE. This document is based on the 4.0.1 stable version of PDI-CE.

Keywords: ETL, pentaho data integration, community edition, kettle, BI, business intelligence, data importation, data transformation, data cleansing
Tutorial: PDI-CE
Dataset: titanic32x.csv.zip
References: Pentaho, Pentaho Community

Mining frequent itemsets

Searching for regularities in a dataset is the main goal of data mining. These regularities may take various forms. In market basket analysis, we search for co-occurrences of goods (items), i.e. goods which are often purchased together. Such a set of goods is called a "frequent itemset". For instance, one result may be "milk and bread are purchased together in 10% of shopping carts". Frequent itemset mining is often presented as the step that precedes association rule learning. At the end of that process, we highlight the direction of the relation and obtain rules. For instance, a rule may be "90% of the customers who buy milk and bread also purchase butter". This kind of rule can be used in various ways. For instance, we can promote the sales of milk and bread in order to increase the sales of butter. In fact, frequent itemsets also provide valuable information on their own. Detecting the goods which are purchased together helps to understand the relation
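
To make the two percentages quoted above concrete, here is a minimal Python sketch. It is not the method used in the tutorial: the baskets are invented for illustration, the function names (`support`, `frequent_itemsets`, `confidence`) are hypothetical, and the search is a plain brute-force enumeration rather than an optimized algorithm. The "10% of shopping carts" figure corresponds to the support of an itemset, and the "90% of the customers" figure corresponds to the confidence of a rule.

```python
from itertools import combinations

# Hypothetical toy transactions (invented for illustration, not from the tutorial).
baskets = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration of all itemsets with support >= min_support."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            s = support(candidate, transactions)
            if s >= min_support:
                result[frozenset(candidate)] = s
    return result


def confidence(antecedent, consequent, transactions):
    """Support of antecedent and consequent together, divided by support of the antecedent."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Itemsets present in at least half of the baskets.
itemsets = frequent_itemsets(baskets, min_support=0.5)

# Confidence of the rule "milk and bread => butter".
rule_conf = confidence({"milk", "bread"}, {"butter"}, baskets)

for itemset, s in sorted(itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(s, 2))
print("confidence(milk, bread => butter) =", round(rule_conf, 2))
```

Enumerating every combination is only workable for a handful of items, since the number of candidates doubles with each new item. Dedicated algorithms prune the search by using the fact that a superset of an infrequent itemset cannot be frequent.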