Showing posts with the label Statistical methods

Python - Statistics with SciPy (slides)

This course material presents some modules of SciPy, a library for scientific computing in Python. We focus on the stats package, which provides statistical tests such as the comparison of means for independent and related samples, the comparison of variances, and measures of association between two variables. We also study the cluster package, in particular the k-means and hierarchical agglomerative clustering algorithms. SciPy operates on NumPy vectors and matrices, which were presented previously.

Keywords: python, numpy, scipy, descriptive statistics, cumulative distribution functions, sampling, random number generator, normality test, test for comparing populations, pearson correlation, spearman correlation, cluster analysis, k-means, hac, dendrogram
Slides: scipy.stats and scipy.cluster
Dataset and programs: SciPy - Programs and dataset
References: SciPy Reference Guide on SciPy.org; Python - Official Site
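As a rough illustration of the kind of calls covered here, the following minimal sketch runs a few tests from scipy.stats and the two clustering approaches from scipy.cluster. It is not taken from the slides: the synthetic data, the variable names and the parameter choices (sample sizes, Ward linkage, two clusters) are all assumptions made for the example, and the `seed` argument of `kmeans2` assumes a reasonably recent SciPy.

```python
import numpy as np
from scipy import stats
from scipy.cluster.vq import kmeans2
from scipy.cluster.hierarchy import linkage, fcluster

# Synthetic data (an assumption for illustration, not the course dataset)
rng = np.random.default_rng(0)
a = rng.normal(loc=0.0, scale=1.0, size=50)
b = rng.normal(loc=0.5, scale=1.0, size=50)

# Normality test on one sample
shapiro_stat, shapiro_p = stats.shapiro(a)

# Comparison of means: independent samples (Student and Welch variants)
t_student, p_student = stats.ttest_ind(a, b)
t_welch, p_welch = stats.ttest_ind(a, b, equal_var=False)

# Comparison of means: related (paired) samples
t_paired, p_paired = stats.ttest_rel(a, b)

# Comparison of variances
lev_stat, lev_p = stats.levene(a, b)

# Two well-separated groups of 2-D points for the clustering part
points = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(20, 2)),
    rng.normal(loc=5.0, scale=0.3, size=(20, 2)),
])

# k-means with 2 clusters (k-means++ initialisation, fixed seed)
centroids, km_labels = kmeans2(points, 2, minit="++", seed=0)

# Hierarchical agglomerative clustering (Ward), cut into 2 clusters
Z = linkage(points, method="ward")
hac_labels = fcluster(Z, t=2, criterion="maxclust")

print("Welch p-value:", p_welch)
print("Levene p-value:", lev_p)
print("HAC cluster sizes:", np.bincount(hac_labels)[1:])
```

The linkage matrix `Z` is also what `scipy.cluster.hierarchy.dendrogram` expects if a dendrogram plot is wanted.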

Correlation analysis (slides)

The aim of correlation analysis is to characterize the existence, the nature and the strength of the relationship between two quantitative variables. Visual inspection of scatter plots is the primary instrument in a first step, when we have no idea about the form of the underlying relationship between the variables. In a second step, however, we need statistical tools to measure the strength of the relationship and to assess its significance. In these slides, we present Pearson's product-moment correlation. We show how to estimate its value from a sample, and we present the inferential tools used for hypothesis testing and confidence interval estimation. The Pearson correlation, however, is appropriate only for characterizing linear relationships. We study possible solutions for problematic situations, including Spearman's rank correlation coefficient (Spearman's rho). Last, the partial correlation coefficient and the related inferential tools are descr...
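The points above can be sketched in a few lines of Python with scipy.stats. This is only an illustrative sketch on synthetic data, not the material from the slides: the variable names are assumptions, and the confidence interval uses the usual Fisher z-transformation, which is one common approach among others.

```python
import numpy as np
from scipy import stats

# Synthetic data (assumed for illustration)
rng = np.random.default_rng(42)
x = rng.normal(size=100)
y_lin = 2.0 * x + rng.normal(scale=0.5, size=100)  # noisy linear relationship
y_mono = np.exp(x)                                 # monotonic but non-linear

# Pearson correlation and the p-value of the test of H0: rho = 0
r, p_value = stats.pearsonr(x, y_lin)

# Approximate 95% confidence interval via the Fisher z-transformation
n = len(x)
z = np.arctanh(r)
se = 1.0 / np.sqrt(n - 3)
z_crit = stats.norm.ppf(0.975)
ci_low, ci_high = np.tanh(z - z_crit * se), np.tanh(z + z_crit * se)

# On a non-linear monotonic relationship, Pearson understates the
# association, whereas Spearman's rho (computed on ranks) does not
r_mono, _ = stats.pearsonr(x, y_mono)
rho_mono, _ = stats.spearmanr(x, y_mono)

print(f"Pearson r = {r:.3f}, 95% CI = [{ci_low:.3f}, {ci_high:.3f}], p = {p_value:.3g}")
print(f"Non-linear case: Pearson r = {r_mono:.3f}, Spearman rho = {rho_mono:.3f}")
```

Because `y_mono` is a strictly increasing function of `x`, the ranks agree perfectly and Spearman's rho equals 1, while the Pearson coefficient stays below 1.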

PSPP, an alternative to SPSS

I spend a lot of time analyzing the available free statistical and data mining tools. There is no bad software, but some tools are more appropriate for some tasks. We must therefore identify the one best suited to our situation, and for that we must know a large number of tools. In this tutorial, we describe PSPP. It is presented as an alternative to the well-known SPSS: “PSPP is a program for statistical analysis of sampled data. It is a free replacement for the proprietary program SPSS, and appears very similar to it with a few exceptions”. Rather than describing each feature in detail (the documentation is available on the website), we present some statistical techniques. We compare the results with those of Tanagra, R 2.13.2 and OpenStat (build 24/02/2012). This is also a way to validate them: if they provide different results, there is a problem. Keywords: pspp, R software, openstat, spss, descriptive statistics, t-test, welch test, comparison of mea...