# Articles related to "statistics"

## A correlation measure based on Theil-Sen regression

• Association and correlation measures are important tools in descriptive statistics and exploratory data analysis.
• Perhaps surprisingly, for the Anscombe 4 data set, the Theil-Sen, Spearman, and Kendall correlation measures prove highly unstable: adding noise of even the smallest amplitude to the data points produces arbitrary correlation values.
• On the other hand, the Theil-Sen estimator produces arbitrarily large values for the slope of the regression line m(y, x) and values arbitrarily close to zero for m(x, y), leading to an ill-defined product when subjected to noise.
• The median absolute deviation of the x values in the Anscombe 4 data set vanishes, so it comes as no surprise that the Theil-Sen estimated correlation and the rank correlation measures are ill-defined.
• The Theil-Sen estimator for robust simple linear regression can be used to define a correlation measure in analogy to the relation of Pearson’s correlation coefficient with least squares regression.
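The construction in the last bullet can be sketched in plain Python. This is a minimal illustration, not the article's implementation: the function names are hypothetical, and returning 0 when the two slopes disagree in sign is an assumed convention.

```python
import statistics

def theil_sen_slope(x, y):
    """Theil-Sen estimator: median of the pairwise slopes
    (y_j - y_i) / (x_j - x_i), skipping pairs tied in x."""
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i in range(len(x))
              for j in range(i + 1, len(x))
              if x[j] != x[i]]
    return statistics.median(slopes)

def theil_sen_correlation(x, y):
    """Correlation defined from the two regression slopes, mirroring the
    least-squares identity r**2 = b(y,x) * b(x,y) for Pearson's r."""
    m_yx = theil_sen_slope(x, y)  # slope of y regressed on x
    m_xy = theil_sen_slope(y, x)  # slope of x regressed on y
    if m_yx * m_xy < 0:           # slopes disagree in sign: assumed convention
        return 0.0
    sign = 1.0 if m_yx >= 0 else -1.0
    return sign * (m_yx * m_xy) ** 0.5

# Linear data with one gross outlier: the median of pairwise slopes
# ignores the contaminated pairs, so slope and correlation are unaffected.
x = list(range(10))
y = [2 * xi + 1 for xi in x]
y[9] = 100  # outlier
print(theil_sen_slope(x, y))        # → 2.0
print(theil_sen_correlation(x, y))  # → 1.0
```

Note that for Anscombe 4, where all but one of the x values coincide, almost every pairwise slope of x on y is zero and almost every pairwise slope of y on x is excluded as a tie, which is exactly the degeneracy the bullets above describe.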

## Double Slit Experiment and Bayes (2019)

• The short answer is that Bayes' rule holds if we have a joint distribution $p(y, x)$, from which a version of the conditional probability $p(y \vert x)$ is defined.
• Real-world models are important for the application of probability, and it makes sense that such an important concept has many different real-world analogies, none of which are perfect.
• We should not use a mixture model in the first place, and it is therefore misleading to write the conditional law $p(y \vert x = 1)$.
• Just as $p(y \vert x)$ implies a wrong mixture model, by writing $p(y_{1, \dots, n} \vert M_k)$ we have already assumed there is a single model $M_k$ that generates all the data.
• Not surprisingly, fitting a mixture model (and inferring which data point comes from which component) results in a different predictive distribution than fitting two models separately and mixing them back.
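The first bullet's claim, that Bayes' rule follows automatically once a joint distribution is given, can be checked numerically on a small discrete example. The probability table below is illustrative, not taken from the article:

```python
from itertools import product

# Illustrative joint distribution p(x, y) over binary x and y.
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

def p_x(x):
    return sum(p for (xi, _), p in joint.items() if xi == x)

def p_y(y):
    return sum(p for (_, yi), p in joint.items() if yi == y)

def p_y_given_x(y, x):
    # Conditional defined from the joint, as the bullet describes.
    return joint[(x, y)] / p_x(x)

def p_x_given_y(x, y):
    return joint[(x, y)] / p_y(y)

# Bayes' rule p(y|x) = p(x|y) p(y) / p(x) holds cell by cell.
for x, y in product((0, 1), repeat=2):
    lhs = p_y_given_x(y, x)
    rhs = p_x_given_y(x, y) * p_y(y) / p_x(x)
    assert abs(lhs - rhs) < 1e-12
```

The point of the later bullets is the converse situation: when the data are not generated from a single joint model, writing down a conditional such as $p(y \vert x = 1)$ already smuggles in a mixture assumption.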
