Posts

Showing posts from March, 2016

Dummy coding for categorical predictor variables

In this tutorial, we show how to perform dummy coding of categorical predictor variables in the context of logistic regression. This is in fact an old tutorial that I wrote a long time ago (2007), but it was never referenced on this blog (which was created in 2008). I found it in my archives because I plan to write a tutorial soon about strategies for selecting categorical variables in logistic regression, and I was wondering whether I had already written something related to this subject (the treatment of categorical predictors in logistic regression). Clearly, I should check my archives more often. We use Tanagra 1.4.50 in this tutorial.

Keywords: logistic regression, dummy coding, categorical predictor variables
Components: SAMPLING, O_1_BINARIZE, BINARY LOGISTIC REGRESSION, TEST
Tutorial: Dummy coding - Logistic Regression
Dataset: heart-c.xlsx
References: Wikipedia, "Logistic Regression"
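The idea behind the dummy coding discussed in this post can be sketched in a few lines of Python. This is only an illustration, not the Tanagra implementation: it assumes the usual convention where a categorical variable with k levels is replaced by k-1 indicator (0/1) columns, the omitted level serving as the reference. The function name `dummy_code`, the variable name `chest_pain` and its values are hypothetical and chosen for the example.

```python
def dummy_code(values, name, reference=None):
    """Return k-1 indicator (0/1) columns for a categorical variable with k levels.

    The reference level gets no column: a row belonging to it is coded
    as 0 in every indicator.
    """
    levels = sorted(set(values))
    if reference is None:
        # Assumption for this sketch: the first level in sorted order is the reference.
        reference = levels[0]
    if reference not in levels:
        raise ValueError(f"unknown reference level: {reference!r}")
    return {
        f"{name}_{level}": [1 if v == level else 0 for v in values]
        for level in levels
        if level != reference
    }


# Illustrative values only (not read from heart-c.xlsx).
chest_pain = ["asympt", "typ_angina", "non_anginal", "asympt", "atyp_angina"]
coded = dummy_code(chest_pain, "chest_pain", reference="asympt")

for column, indicators in coded.items():
    print(column, indicators)
```

The resulting indicator columns can then be used as ordinary numeric predictors in a logistic regression. The choice of the reference level changes the interpretation of the coefficients, since each one is read relative to that level.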

Cost-Sensitive Learning (slides)

This course material presents approaches for taking misclassification costs into account in supervised learning. The baseline is the method that ignores the costs altogether. Two issues are studied: the metric used to evaluate a classifier when a misclassification cost matrix is provided, i.e. the expected cost of misclassification (ECM); and approaches that guide the machine learning algorithm towards minimizing the ECM.

Keywords: cost matrix, misclassification, expected cost of misclassification, bagging, metacost, multicost
Slides: Cost Sensitive Learning
References: Tanagra Tutorial, "Cost-sensitive learning - Comparison of tools", March 2009. Tanagra Tutorial, "Cost-sensitive decision tree", November 2008.
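As a rough illustration of the evaluation metric mentioned in this post, the expected cost of misclassification can be computed from a confusion matrix and a cost matrix as the cost-weighted average over all classified instances. The sketch below assumes that rows are the observed classes and columns the predicted classes; the function name `expected_cost` and all the numbers are invented for the example and do not come from the slides.

```python
def expected_cost(confusion, costs):
    """Average misclassification cost per instance.

    confusion[i][j]: number of instances of observed class i predicted as class j.
    costs[i][j]:     cost of predicting class j when the observed class is i.
    """
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("the confusion matrix is empty")
    weighted = sum(
        confusion[i][j] * costs[i][j]
        for i in range(len(confusion))
        for j in range(len(confusion[i]))
    )
    return weighted / total


# Invented two-class example: rows = observed, columns = predicted.
confusion = [[40, 10],
             [5, 45]]
# Predicting the first class for an instance of the second class is assumed
# to cost five times more than the opposite error.
costs = [[0, 1],
         [5, 0]]

ecm = expected_cost(confusion, costs)   # (10 * 1 + 5 * 5) / 100
print(ecm)

# With a 0/1 cost matrix, the same function gives the ordinary error rate.
error_rate = expected_cost(confusion, [[0, 1], [1, 0]])
print(error_rate)
```

Comparing the two values shows why the error rate alone can be misleading: two classifiers with the same error rate may have very different expected costs, depending on which kind of error they make.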

Hyper-threading and solid-state drive

After more than 6 years of good and faithful service, I decided to replace my computer. It must be said that the old one (Intel Core 2 Quad Q9400 2.66 GHz - 4 cores - running Windows 7 - 64 bit) had begun to make disturbing noises. I had to play music to cover the rumbling of the beast in order to work in peace. Choosing the new computer was another matter. I am past the age of chasing raw power, a race that is necessarily futile anyway given the rapid evolution of PCs. Nevertheless, I was interested in two aspects that I could not evaluate previously: Is hyper-threading technology effective for multithreaded data mining algorithms? Does the use of temporary files to reduce memory occupation benefit from SSD technology? The new PC runs Windows 8.1 (I wrote the French version of this tutorial one year ago). The processor is a Core i7 4770S (3.1 GHz). It has 4 physical cores but, thanks to hyper-threading, 8 logical cores. The system disk