Articles related to "statistics"


A correlation measure based on Theil-Sen regression

  • Association and correlation measures are important tools in descriptive statistics and exploratory data analysis.
  • Perhaps surprisingly, for the Anscombe 4 data set, the Theil-Sen, Spearman, and Kendall correlation measures prove to be highly unstable: noise of even the smallest amplitude added to the data points produces arbitrary correlation values.
  • On the other hand, the Theil-Sen estimator produces arbitrarily large values for the slope of the regression line m(y, x) and values arbitrarily close to zero for m(x, y), leading to an ill-defined product when subjected to noise.
  • The median absolute deviation of the x values within the Anscombe 4 data set vanishes, so it comes as no surprise that the Theil-Sen estimated correlation and the rank correlation measures are ill-defined.
  • The Theil-Sen estimator for robust simple linear regression can be used to define a correlation measure in analogy to the relation of Pearson’s correlation coefficient with least squares regression.
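The construction in the last bullet can be sketched in a few lines. In analogy with Pearson's r, whose square equals the product of the two least-squares slopes m(y, x) and m(x, y), one plausible Theil-Sen analogue takes the signed geometric mean of the two Theil-Sen slopes (this is a hedged sketch of the idea, not necessarily the exact definition in the paper):

```python
import numpy as np
from scipy.stats import theilslopes

def theil_sen_corr(x, y):
    """Correlation-like measure built from Theil-Sen slopes, by analogy
    with r**2 = m(y, x) * m(x, y) for least-squares regression."""
    m_yx = theilslopes(y, x)[0]  # Theil-Sen slope of y regressed on x
    m_xy = theilslopes(x, y)[0]  # Theil-Sen slope of x regressed on y
    prod = m_yx * m_xy
    if prod < 0:
        # The two slopes disagree in sign; the geometric mean is undefined.
        return float("nan")
    return np.sign(m_yx) * np.sqrt(prod)

x = np.arange(1.0, 11.0)
print(theil_sen_corr(x, 2 * x + 1))   # perfectly linear, increasing
print(theil_sen_corr(x, -x + 4))      # perfectly linear, decreasing
```

For exactly linear data the two slopes are reciprocal (e.g. 2 and 0.5), so the measure returns ±1, matching Pearson's r; the instability described above arises when, as in Anscombe 4, one of the two slopes degenerates.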



Double Slit Experiment and Bayes (2019)

  • The short answer is that Bayes' rule holds if we have a joint distribution over $(y,x)$, from which a version of the conditional probability $p(y\vert x)$ is defined.
  • Real-world models are important for the application of probability, and it makes a lot of sense to me that such an important concept has many different real-world analogies, none of which are perfect.
  • We should not use a mixture model in the first place, and therefore it is misleading to write the conditional law $p(y\vert x=1)$.
  • Just as $p(y\vert x)$ implies a wrong mixture model, by writing $(y_{1, \dots, n}\vert M_k)$ we have already assumed there is a single model $M_k$ that generates all the data.
  • Not surprisingly, fitting a mixture model (and inferring which data point comes from which component) results in a different predictive distribution than fitting two models separately and mixing them back together.
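The last bullet can be illustrated numerically. Below is a minimal sketch using hypothetical data from two Gaussian sources (not the post's actual example): route (1) fits each source separately using the known labels and mixes the fitted densities; route (2) pools the data and fits a two-component Gaussian mixture by EM, i.e. infers the assignments. The two predictive densities generally differ:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: two Gaussian sources with known labels.
a = rng.normal(0.0, 1.0, 200)
b = rng.normal(3.0, 1.0, 200)

def gauss_pdf(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

# Route (1): fit each source separately, then mix with equal weights.
mu_a, sd_a = a.mean(), a.std()
mu_b, sd_b = b.mean(), b.std()

def separate_then_mix(x):
    return 0.5 * gauss_pdf(x, mu_a, sd_a) + 0.5 * gauss_pdf(x, mu_b, sd_b)

# Route (2): pool the data and fit a two-component mixture by EM,
# inferring which point came from which component.
data = np.concatenate([a, b])
mu = np.array([data.min(), data.max()])     # crude initialization
sd = np.array([data.std(), data.std()])
w = np.array([0.5, 0.5])
for _ in range(200):
    # E-step: responsibilities of each component for each point.
    dens = w * np.stack([gauss_pdf(data, mu[k], sd[k]) for k in range(2)], axis=1)
    r = dens / dens.sum(axis=1, keepdims=True)
    # M-step: reweighted parameter updates.
    n_k = r.sum(axis=0)
    w = n_k / len(data)
    mu = (r * data[:, None]).sum(axis=0) / n_k
    sd = np.sqrt((r * (data[:, None] - mu) ** 2).sum(axis=0) / n_k)

def mixture_fit(x):
    return w[0] * gauss_pdf(x, mu[0], sd[0]) + w[1] * gauss_pdf(x, mu[1], sd[1])
```

Because EM soft-assigns points in the overlap region, its fitted component parameters (and hence the predictive density) differ from the labeled, separately fitted ones, which is the point the bullet makes.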
