t-test

t-distribution

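The t-distribution is used to assess the statistical significance of an estimate of the mean of a normally distributed population whose variance is unknown, based on a small sample. If the population variance is known, or the sample is large enough that the sample variance approximates it well, the normal distribution should be used to estimate the population mean instead.

The t-test addresses the large error of the z-test when the sample size is below 30.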

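Suppose $X$ is a normally distributed random variable observed through a sample of size $n$, and that its mean $\mu$ and variance $\sigma^2$ are both unknown.

The sample mean is

$$\overline{X}_n = \frac{\sum_{i=1}^n X_i}{n}$$

The sample variance (Bessel-corrected) is

$$S_n^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \overline{X}_n)^2$$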

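The following random variable follows a normal distribution with mean 0 and variance 1, where $\mu_0$ is the hypothesized population mean (the statement holds when $\mu = \mu_0$):

$$\frac{\overline{X}_n - \mu_0}{\sigma / \sqrt{n}}$$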

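In the one-sample case, with $S_n$ the sample standard deviation (the square root of the Bessel-corrected sample variance), the t statistic is defined as

$$t = \frac{\overline{X}_n - \mu_0}{S_n / \sqrt{n}}$$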
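As a minimal sketch of the definitions above, the following Python snippet computes the sample mean, the Bessel-corrected sample standard deviation, and the one-sample t statistic using only the standard library. The function name `one_sample_t` and the data are made up for illustration and are not part of the original notes.

```python
import math
import statistics

def one_sample_t(xs, mu0):
    """Return the one-sample t statistic for testing H0: population mean == mu0."""
    n = len(xs)
    mean = statistics.fmean(xs)      # sample mean
    s = statistics.stdev(xs)         # sample standard deviation, divides by n - 1
    return (mean - mu0) / (s / math.sqrt(n))

# Hypothetical measurements, used only to exercise the function.
sample = [9.1, 10.4, 10.0, 11.2, 9.8, 10.5]
t = one_sample_t(sample, mu0=10.0)
print(f"t = {t:.3f} with {len(sample) - 1} degrees of freedom")
```

The resulting value would then be compared against a t-table entry for $n - 1$ degrees of freedom.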

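In the two-sample case, the t statistic is defined as

$$t = \frac{\overline{X}_1 - \overline{X}_2}{S_{\overline{X}_1 - \overline{X}_2}}$$

where $S_{\overline{X}_1 - \overline{X}_2}$ is the estimated standard error of the difference between the two sample means.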
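The notes do not spell out how the denominator of the two-sample statistic is computed. One common choice, assumed here, is the Welch (unequal-variance) standard error $\sqrt{S_1^2/n_1 + S_2^2/n_2}$; a pooled-variance version would give a different denominator. The name `welch_t` is hypothetical.

```python
import math
import statistics

def welch_t(xs1, xs2):
    """Two-sample t statistic using the Welch standard error (an assumption)."""
    n1, n2 = len(xs1), len(xs2)
    v1 = statistics.variance(xs1)    # Bessel-corrected sample variance of group 1
    v2 = statistics.variance(xs2)    # Bessel-corrected sample variance of group 2
    se = math.sqrt(v1 / n1 + v2 / n2)
    return (statistics.fmean(xs1) - statistics.fmean(xs2)) / se

# Hypothetical data for two groups.
print(welch_t([2.0, 3.0, 4.0], [1.0, 2.0, 3.0]))
```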

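Application 1

Given samples: whether each user converted under traffic served by algorithm A or algorithm B.

Hypothesis: $H_0$, algorithm B performs the same as algorithm A.

Computation: compute the sample t statistic $t_0$; given a significance level $\alpha$ (e.g. 0.05, i.e. 95% confidence) and $N - 1$ degrees of freedom ($N$ is the sample size), look up the critical value $t_1$ in a t-table.

Conclusion: if $|t_0| > t_1$, we have grounds to reject the null hypothesis.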
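The procedure above can be sketched for conversion data as follows. This is an illustration under stated assumptions, not the notes' own implementation: the traffic numbers are invented, the standard error is the Welch form, and because A/B samples are usually large, the normal quantile from `statistics.NormalDist` stands in for the t-table value $t_1$. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def ab_t_statistic(conv_a, n_a, conv_b, n_b):
    """t statistic for the difference in conversion rate (B minus A)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Bessel-corrected variance of 0/1 outcomes: n / (n - 1) * p * (1 - p)
    var_a = n_a / (n_a - 1) * p_a * (1 - p_a)
    var_b = n_b / (n_b - 1) * p_b * (1 - p_b)
    se = math.sqrt(var_a / n_a + var_b / n_b)
    return (p_b - p_a) / se

def reject_h0(t0, confidence=0.95):
    """Two-sided decision; normal quantile approximates t1 for large samples."""
    t1 = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return abs(t0) > t1

# Hypothetical traffic: 200 of 5000 users convert under A, 260 of 5000 under B.
t0 = ab_t_statistic(conv_a=200, n_a=5000, conv_b=260, n_b=5000)
print(round(t0, 2), reject_h0(t0))
```

For small samples the normal shortcut is too optimistic, and a proper t-table lookup with the right degrees of freedom would be needed.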

Application 2

For example, given a sample with a sample variance of 2 and a sample mean of 10, taken from a sample of size 11 (10 degrees of freedom), using the formula

$$\overline{X}_n \pm t_{\alpha, n-1}\cdot \frac{S_n}{\sqrt{n}}$$

with $S_n = \sqrt{2}$ and the one-sided critical value $t_{0.10,\,10} \approx 1.372$, we can determine that at 90% confidence the true mean lies below 10.585, and likewise at 90% confidence it lies above 9.415.

In other words, on average, 80% of the times that upper and lower thresholds are calculated by this method, the true mean is both below the upper threshold and above the lower threshold. This is not the same thing as saying that there is an 80% probability that the true mean lies between a particular pair of upper and lower thresholds that have been calculated by this method.
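The arithmetic behind these bounds can be checked with a few lines of Python. The critical value 1.372 is the usual t-table entry for 10 degrees of freedom and an upper-tail probability of 0.10; it is hard-coded here rather than computed, and the standard error is taken as $S_n / \sqrt{n}$.

```python
import math

n = 11
sample_mean = 10.0
sample_variance = 2.0
t_crit = 1.372   # t-table value: upper 10% point, 10 degrees of freedom

# Half-width of the interval: critical value times the standard error.
half_width = t_crit * math.sqrt(sample_variance) / math.sqrt(n)
lower, upper = sample_mean - half_width, sample_mean + half_width
print(f"lower = {lower:.3f}, upper = {upper:.3f}")
# prints: lower = 9.415, upper = 10.585
```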

Minimum sample size calculation

https://www.cnstat.org/samplesize/11/

http://www.evanmiller.org/how-not-to-run-an-ab-test.html

http://www.evanmiller.org/ab-testing/sample-size.html

https://zhuanlan.zhihu.com/p/40919260

Null hypothesis $H_0$: the hypothesis we hope to reject.

Alternative hypothesis $H_1$: the hypothesis we hope to confirm.

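| Decision \ Truth | No difference ($H_0$ true) | Difference ($H_1$ true) |
| --- | --- | --- |
| Difference (reject $H_0$) | $\alpha$: Type I error, significance level | $1-\beta$: statistical power |
| No difference (do not reject $H_0$) | $1-\alpha$ | $\beta$: Type II error |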

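Statistical power ($1-\beta$): in hypothesis testing, the probability of rejecting the null hypothesis when the alternative hypothesis is actually true. Hypothesis testing has two kinds of error, $\alpha$ and $\beta$: an $\alpha$ error is a false positive (FP) and a $\beta$ error is a false negative (FN).

Significance level: $\alpha$. This is also the probability of a Type I error (FP). A Type I error means the new product actually brings no improvement to the business, but we wrongly conclude that it does.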
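To make $\alpha$ and power concrete, here is a small Monte Carlo sketch. It deliberately simplifies: both groups are normal with a known standard deviation of 1, so the decision rule is a z-test rather than a t-test, and the group size of 64 and true difference of 0.5 are arbitrary choices for illustration. The function name `rejection_rate` is hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean

def rejection_rate(true_diff, n, trials=2000, alpha=0.05, seed=0):
    """Fraction of simulated A/B tests that reject H0 (no difference)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = math.sqrt(2 / n)            # sigma = 1 in both groups, assumed known
    rejections = 0
    for _ in range(trials):
        a = fmean([rng.gauss(0.0, 1.0) for _ in range(n)])
        b = fmean([rng.gauss(true_diff, 1.0) for _ in range(n)])
        if abs(b - a) / se > z_crit:
            rejections += 1
    return rejections / trials

type_i_rate = rejection_rate(true_diff=0.0, n=64)   # H0 true: estimates alpha
power = rejection_rate(true_diff=0.5, n=64)         # H1 true: estimates 1 - beta
print(type_i_rate, power)
```

Under these assumptions the first number should land near 0.05 and the second near 0.8, up to simulation noise.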

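Minimum detectable effect (MDE): $\delta = (t_{\alpha/2} + t_{\beta})\,\sigma \sqrt{2/n}$

Variance: $\sigma^2$. For a binary (Bernoulli) outcome such as conversion with rate $p$, $\sigma^2 = p \times (1 - p)$.

Minimum sample size per group for an A/B test: $n = 2\,(t_{\alpha/2} + t_{\beta})^2 \frac{\sigma^2}{\delta^2} \approx 16 \frac{\sigma^2}{\delta^2}$, taking $\alpha = 0.05$ and power $1-\beta = 0.8$, so that $t_{\alpha/2} + t_{\beta} \approx 1.96 + 0.84$.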
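A rough sketch of the sample size calculation, assuming the large-sample case where normal quantiles replace the t quantiles, $\alpha = 0.05$, power 0.8, and a binary conversion metric with $\sigma^2 = p(1-p)$. The baseline rate of 10% and the absolute effect of 1 percentage point are made-up inputs, and both function names are hypothetical.

```python
import math
from statistics import NormalDist

def min_sample_size(p, delta, alpha=0.05, power=0.8):
    """Per-group sample size from n = 2 * (z_{alpha/2} + z_beta)^2 * sigma^2 / delta^2."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided critical value
    z_beta = z.inv_cdf(power)            # quantile matching the desired power
    sigma2 = p * (1 - p)                 # variance of a 0/1 outcome
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sigma2 / delta ** 2)

def rule_of_thumb(p, delta):
    """The shortcut n = 16 * sigma^2 / delta^2."""
    return math.ceil(16 * p * (1 - p) / delta ** 2)

# Hypothetical inputs: 10% baseline conversion, detect a 1 point absolute change.
print(min_sample_size(0.10, 0.01), rule_of_thumb(0.10, 0.01))
```

The shortcut rounds the constant $2(1.96 + 0.84)^2 \approx 15.7$ up to 16, so it comes out slightly above the quantile-based value.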

$t_{\alpha, n-1}$


$t_{\beta, n-1}$
