ANOVA

Confidence Interval

$μ_{X} - μ_{Y}$ with Normal Distribution

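$$P\left(-z_{\frac{α}{2}} \leq \frac{(\bar{X} - \bar{Y}) - (μ_{X} - μ_{Y})}{\sqrt{\frac{σ_{X}^{2}}{n} + \frac{σ_{Y}^{2}}{m}}} \leq z_{\frac{α}{2}}\right) = 1 - α$$

so a $(1 - α)$ confidence interval for $μ_{X} - μ_{Y}$ is

$$(\bar{x} - \bar{y}) \pm z_{\frac{α}{2}} \sqrt{\frac{σ_{X}^{2}}{n} + \frac{σ_{Y}^{2}}{m}}$$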
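A minimal R sketch of the interval above, assuming $σ_{X}$ and $σ_{Y}$ are known. The helper name `z_ci_diff` and the sample values are hypothetical, for illustration only.

```r
# Hypothetical helper: (1 - alpha) CI for mu_X - mu_Y with known sigma_X, sigma_Y
z_ci_diff <- function(x, y, sigma_x, sigma_y, alpha = 0.05) {
  n <- length(x)
  m <- length(y)
  se <- sqrt(sigma_x^2 / n + sigma_y^2 / m)  # standard deviation of Xbar - Ybar
  z <- qnorm(1 - alpha / 2)                  # z_{alpha/2}, the upper alpha/2 quantile
  est <- mean(x) - mean(y)
  c(lower = est - z * se, upper = est + z * se)
}

x <- c(5.1, 4.9, 5.6, 5.0)  # made-up sample X
y <- c(4.2, 4.8, 4.5)       # made-up sample Y
z_ci_diff(x, y, sigma_x = 0.5, sigma_y = 0.4)
```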

Hypothesis Testing

Two-sample z-test with known variance

$$P\left(\left|\frac{\bar{X} - \bar{Y}}{\sqrt{\frac{σ_{X}^{2}}{n} + \frac{σ_{Y}^{2}}{m}}}\right| > z_{\frac{α}{2}}\right) = α \quad \text{under } H_{0}: μ_{X} = μ_{Y}$$

Reject $H_{0}$ in favor of $H_{1}: μ_{X} \neq μ_{Y}$ at significance level $α$ if

$$|\bar{x} - \bar{y}| > z_{\frac{α}{2}} \sqrt{\frac{σ_{X}^{2}}{n} + \frac{σ_{Y}^{2}}{m}}$$
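The rejection rule can be sketched in R as follows, again assuming known variances. `z_test_diff` is a hypothetical helper; it also returns the two-sided p-value, which is below $α$ exactly when the rule above rejects.

```r
# Hypothetical helper: two-sided two-sample z-test with known sigma_X, sigma_Y
z_test_diff <- function(x, y, sigma_x, sigma_y, alpha = 0.05) {
  se <- sqrt(sigma_x^2 / length(x) + sigma_y^2 / length(y))
  z <- (mean(x) - mean(y)) / se      # test statistic under H0: mu_X = mu_Y
  crit <- qnorm(1 - alpha / 2)       # z_{alpha/2}
  list(z = z,
       p_value = 2 * pnorm(-abs(z)), # two-sided p-value
       reject = abs(z) > crit)       # same as |xbar - ybar| > z_{alpha/2} * se
}

z_test_diff(c(5.1, 4.9, 5.6, 5.0), c(4.2, 4.8, 4.5), sigma_x = 0.5, sigma_y = 0.4)
```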

Two-sample t-test with unknown equal variance

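The pooled variance estimate is

$$S_{p}^{2} = \frac{(n - 1) S_{n - 1, X}^{2} + (m - 1) S_{n - 1, Y}^{2}}{n + m - 2}$$

where $S_{n - 1}^{2}$ denotes the sample variance computed with divisor (sample size $- 1$).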

Reject $H_{0}$ at significance level $α$ if

Two-sided test ($H_{1}: μ_{X} \neq μ_{Y}$)

$$|\bar{x} - \bar{y}| > t_{n + m - 2, \frac{α}{2}} S_{p} \sqrt{\frac{1}{n} + \frac{1}{m}}$$

One-sided greater test ($H_{1}: μ_{X} > μ_{Y}$)

$$\bar{x} - \bar{y} > t_{n + m - 2, α} S_{p} \sqrt{\frac{1}{n} + \frac{1}{m}}$$

One-sided less test ($H_{1}: μ_{X} < μ_{Y}$)

$$\bar{x} - \bar{y} < -t_{n + m - 2, α} S_{p} \sqrt{\frac{1}{n} + \frac{1}{m}}$$
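A sketch of the pooled-variance test in R, with the three rejection rules written in terms of $t = (\bar{x} - \bar{y}) / (S_{p}\sqrt{1/n + 1/m})$. The helper `pooled_t` and the data are made up for illustration; base R's `t.test(..., var.equal = TRUE)` computes the same statistic.

```r
# Hypothetical helper: pooled two-sample t statistic and its degrees of freedom
pooled_t <- function(x, y) {
  n <- length(x)
  m <- length(y)
  # var() divides by (sample size - 1), i.e. it returns S_{n-1}^2
  sp2 <- ((n - 1) * var(x) + (m - 1) * var(y)) / (n + m - 2)
  t_stat <- (mean(x) - mean(y)) / (sqrt(sp2) * sqrt(1 / n + 1 / m))
  c(t = t_stat, df = n + m - 2)
}

x <- c(12.1, 11.4, 13.0, 12.7, 11.9)  # made-up sample X
y <- c(10.8, 11.5, 10.9, 11.2)        # made-up sample Y
alpha <- 0.05
pt <- pooled_t(x, y)

crit_two <- qt(1 - alpha / 2, df = pt[["df"]])  # t_{n+m-2, alpha/2}
crit_one <- qt(1 - alpha, df = pt[["df"]])      # t_{n+m-2, alpha}
reject_two_sided <- abs(pt[["t"]]) > crit_two
reject_greater   <- pt[["t"]] > crit_one
reject_less      <- pt[["t"]] < -crit_one

# Built-in equivalent; use alternative = "greater" or "less" for one-sided tests
tt <- t.test(x, y, var.equal = TRUE, alternative = "two.sided")
```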

Two-sample t-test with unknown unequal variance (Welch’s t-test)

Approximate degrees of freedom (Welch-Satterthwaite):

$$v = \frac{\left(\frac{s_{n - 1, X}^{2}}{n} + \frac{s_{n - 1, Y}^{2}}{m}\right)^{2}}{\frac{1}{n - 1}\left(\frac{s_{n - 1, X}^{2}}{n}\right)^{2} + \frac{1}{m - 1}\left(\frac{s_{n - 1, Y}^{2}}{m}\right)^{2}}$$

Reject $H_{0}$ (two-sided) at significance level $α$ if

$$|\bar{x} - \bar{y}| > t_{v, \frac{α}{2}} \sqrt{\frac{s_{n - 1, X}^{2}}{n} + \frac{s_{n - 1, Y}^{2}}{m}}$$

Use Welch’s t-test by default when nothing is known about whether the two variances are equal.
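A sketch that evaluates the degrees-of-freedom formula and the Welch statistic directly, then compares them with `t.test()`, which runs Welch's test unless `var.equal = TRUE` is given. `welch_df` and the data are hypothetical.

```r
# Hypothetical helper: Welch-Satterthwaite degrees of freedom v
welch_df <- function(x, y) {
  a <- var(x) / length(x)  # s_X^2 / n
  b <- var(y) / length(y)  # s_Y^2 / m
  (a + b)^2 / (a^2 / (length(x) - 1) + b^2 / (length(y) - 1))
}

x <- c(20.1, 22.3, 19.8, 21.5, 23.0, 20.7)  # made-up sample X
y <- c(18.2, 25.9, 15.4, 22.8, 17.1)        # made-up sample Y
v <- welch_df(x, y)
t_w <- (mean(x) - mean(y)) / sqrt(var(x) / length(x) + var(y) / length(y))

# t.test() defaults to var.equal = FALSE, i.e. Welch's t-test
tw <- t.test(x, y)
c(manual_df = v, t_test_df = unname(tw$parameter))
```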

Power t-test (optional)

```r
# n is left unspecified, so power.t.test() solves for the sample size per group
power.t.test(delta = 2.5, sd = 2, sig.level = 0.05, power = 0.80,
             type = "two.sample", alternative = "two.sided")
```
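`power.t.test()` solves for whichever of `n`, `delta`, `sd`, `sig.level` or `power` is left as `NULL`, so it also works in the other direction. A small sketch; `n = 20` is an arbitrary illustrative value. The returned `n` is per group and usually not an integer, so round it up.

```r
# Fix an illustrative n = 20 per group and solve for the power instead
pw <- power.t.test(n = 20, delta = 2.5, sd = 2, sig.level = 0.05,
                   type = "two.sample", alternative = "two.sided")
pw$power

# Solve for n; round up because the sample size per group must be an integer
pn <- power.t.test(delta = 2.5, sd = 2, sig.level = 0.05, power = 0.80,
                   type = "two.sample", alternative = "two.sided")
n_per_group <- ceiling(pn$n)
n_per_group
```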

ANOVA

Compares the means of two or more groups (samples)
A factor is a categorical variable denoting which group each observation comes from
The possible values of a factor are called levels
The numerical variable being measured is called the dependent (response) variable
One factor: one-way ANOVA
Multiple factors: multi-way ANOVA (optional for this course)

Median

Order the data points from smallest to largest
If $n$ is odd, the median is the $\frac{n + 1}{2}$-th data point
If $n$ is even, the median is the average of the $\frac{n}{2}$-th and $(\frac{n}{2} + 1)$-th data points
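A direct translation of the two cases into R, as a sketch; `course_median` is a hypothetical name, and it agrees with base R's `median()`.

```r
# Hypothetical helper: median via the odd/even rule above
course_median <- function(x) {
  x <- sort(x)
  n <- length(x)
  if (n %% 2 == 1) {
    x[(n + 1) / 2]                  # the middle data point
  } else {
    (x[n / 2] + x[n / 2 + 1]) / 2   # average of the two middle data points
  }
}

course_median(c(3, 1, 2))           # odd n
course_median(c(7, 2, 9, 4, 5, 1))  # even n
```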

Quantile

Order the data points from smallest to largest and compute $k = nq + 0.5$ for the $q$-quantile
If $k$ is an integer, the $k$-th data point is the quantile
If $k$ is not an integer, the quantile is the average of the $⌊k⌋$-th and $(⌊k⌋ + 1)$-th data points, where $⌊k⌋$ is the largest integer smaller than $k$
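A sketch of this rule in R. `course_quantile` is a hypothetical name, and clamping the indices to $1, \dots, n$ for very small or very large $q$ is an added assumption the rule does not cover. Base R's `quantile()` defaults to `type = 7` (linear interpolation), so it can give a different answer, e.g. 3.25 instead of 3 for the first quartile of 1 to 10.

```r
# Hypothetical helper implementing the k = nq + 0.5 rule (not R's quantile())
course_quantile <- function(x, q) {
  x <- sort(x)
  n <- length(x)
  k <- n * q + 0.5
  clamp <- function(i) min(max(i, 1), n)  # assumption: keep indices within 1..n
  if (abs(k - round(k)) < 1e-9) {         # k is an integer (up to rounding error)
    return(x[clamp(round(k))])
  }
  (x[clamp(floor(k))] + x[clamp(floor(k) + 1)]) / 2
}

x <- c(7, 1, 9, 3, 5, 2, 10, 4, 8, 6)  # made-up data: 1..10 shuffled
course_quantile(x, 0.25)                # k = 3
course_quantile(x, 0.50)                # k = 5.5, average of 5th and 6th
quantile(x, 0.25)                       # R default (type = 7), for comparison
```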

Boxplot

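Points outside $[Q_{1} - 1.5 \times \mathrm{IQR},\ Q_{3} + 1.5 \times \mathrm{IQR}]$, where $\mathrm{IQR} = Q_{3} - Q_{1}$, are potential outliers
Left (lower) whisker: ends at the smallest data value that is at least $Q_{1} - 1.5 \times \mathrm{IQR}$
Right (upper) whisker: ends at the largest data value that is at most $Q_{3} + 1.5 \times \mathrm{IQR}$
- Think of it as pulling each whisker in from its fence until it lands on an actual data point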
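A sketch of the fences, whisker ends and potential outliers in R. It assumes the quartiles come from R's default `quantile()` (type 7); the course quantile rule or the hinges used by base R's `boxplot()` (via `fivenum()`) can give slightly different quartiles. `whisker_summary` is a hypothetical name.

```r
# Hypothetical helper: boxplot fences, whisker ends and potential outliers
whisker_summary <- function(x) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)  # Q1, Q3 (R default, type 7)
  iqr <- q[2] - q[1]
  lower_fence <- q[1] - 1.5 * iqr
  upper_fence <- q[2] + 1.5 * iqr
  inside <- x[x >= lower_fence & x <= upper_fence]
  list(fences   = c(lower_fence, upper_fence),
       whiskers = c(min(inside), max(inside)),    # whisker ends are actual data points
       outliers = x[x < lower_fence | x > upper_fence])
}

whisker_summary(c(1:9, 100))  # made-up data with one extreme value
```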

One-way ANOVA

Group 1: $Y_{1, 1}, \dots, Y_{1, n_{1}}$, iid samples from $N(μ_{1}, σ_{1}^{2})$
Group $k$: $Y_{k, 1}, \dots, Y_{k, n_{k}}$, iid samples from $N(μ_{k}, σ_{k}^{2})$
Samples from different groups are assumed independent
The variances are assumed unknown but equal, i.e. $σ_{1}^{2} = \dots = σ_{k}^{2} = σ^{2}$

$$Y_{ij} = μ_{i} + ϵ_{ij}, \quad ϵ_{ij} \overset{iid}{\sim} N(0, σ^{2})$$

where $ϵ_{ij}$ is the noise (error) term

$H_{0}: μ_{1} = \dots = μ_{k}$
$H_{1}$: at least two of the means differ
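To make the model concrete, a sketch that simulates data from $Y_{ij} = μ_{i} + ϵ_{ij}$. The group means, group sizes and $σ$ below are arbitrary assumptions chosen for illustration.

```r
set.seed(1)
mu    <- c(10, 12, 15)  # assumed group means mu_1, mu_2, mu_3 (k = 3)
n_i   <- c(5, 6, 4)     # assumed group sizes n_1, n_2, n_3
sigma <- 2              # assumed common standard deviation

group <- factor(rep(seq_along(mu), times = n_i))  # group label of each Y_ij
eps   <- rnorm(sum(n_i), mean = 0, sd = sigma)    # eps_ij iid N(0, sigma^2)
y     <- rep(mu, times = n_i) + eps               # Y_ij = mu_i + eps_ij
sim <- data.frame(group, y)
head(sim)
```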

Types of Variance

Total variance: total sum of squares

$$SST = \sum_{i = 1}^{k} \sum_{j = 1}^{n_{i}} (y_{ij} - \bar{y}_{..})^{2}$$

$\bar{y}_{..}$ is the mean of all the sample data (the grand mean)

Between-group variance: treatment sum of squares

$$SS_{Treat} = \sum_{i = 1}^{k} \sum_{j = 1}^{n_{i}} (\bar{y}_{i.} - \bar{y}_{..})^{2} = \sum_{i = 1}^{k} n_{i} (\bar{y}_{i.} - \bar{y}_{..})^{2}$$

where $\bar{y}_{i.}$ is the mean of group $i$

Within-group variance: error sum of squares

$$SSE = \sum_{i = 1}^{k} \sum_{j = 1}^{n_{i}} (y_{ij} - \bar{y}_{i.})^{2}$$

Relations

$$SST = SS_{Treat} + SSE$$
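A sketch that computes the three sums of squares with base R and checks the decomposition numerically. `ss_decomp` and the data are hypothetical; `ave(y, g)` returns each observation's group mean $\bar{y}_{i.}$.

```r
# Hypothetical helper: SST, SS_Treat and SSE for a one-way layout
ss_decomp <- function(y, g) {
  g <- factor(g)
  grand <- mean(y)         # ybar_..
  group_mean <- ave(y, g)  # ybar_i., repeated for every observation in group i
  c(SST     = sum((y - grand)^2),
    SSTreat = sum((group_mean - grand)^2),  # equals sum_i n_i (ybar_i. - ybar_..)^2
    SSE     = sum((y - group_mean)^2))
}

y <- c(4.1, 5.0, 4.6, 6.2, 6.8, 5.9, 7.5, 8.1)  # made-up responses
g <- c("a", "a", "a", "b", "b", "b", "c", "c")  # made-up group labels
ss <- ss_decomp(y, g)
ss
ss[["SST"]] - (ss[["SSTreat"]] + ss[["SSE"]])   # zero up to rounding error
```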

F Statistics

If the null hypothesis is true, the group means should all be close to the grand mean, so $SS_{Treat}$ should be small relative to $SSE$

A large $\frac{SS_{Treat}}{SSE}$ supports $H_{1}$

$$F = \frac{SS_{Treat} / (k - 1)}{SSE / (n - k)} = \frac{MS_{Treat}}{MSE} \sim F(k - 1, n - k) \quad \text{under } H_{0}$$

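$MS_{Treat}$ = mean square for treatment
$MSE$ = mean square error

The error degrees of freedom are $n - k$: group $i$ contributes $n_{i} - 1$ degrees of freedom, and summing over the groups gives $\sum_{i = 1}^{k} (n_{i} - 1) = n - k$, where $n = \sum_{i} n_{i}$ is the total sample size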

Reject $H_{0}$ if F is large

$E(MSE) = σ^{2}$ always; under $H_{0}$, also $E(MS_{Treat}) = E(MST) = σ^{2}$, where $MST = \frac{SST}{n - 1}$

Reject $H_{0}$ at significance level $α$ if $F > F_{k - 1, n - k, α}$, the upper-$α$ quantile of $F(k - 1, n - k)$
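A sketch of the F-test computed from the formulas above and cross-checked against `summary(aov(...))`. `one_way_F` and the data are hypothetical; `qf()` gives the critical value $F_{k - 1, n - k, α}$ and `pf(..., lower.tail = FALSE)` the p-value.

```r
# Hypothetical helper: one-way ANOVA F statistic, critical value and p-value
one_way_F <- function(y, g, alpha = 0.05) {
  g <- factor(g)
  k <- nlevels(g)
  n <- length(y)
  group_mean <- ave(y, g)                              # ybar_i. per observation
  ms_treat <- sum((group_mean - mean(y))^2) / (k - 1)  # MS_Treat
  mse <- sum((y - group_mean)^2) / (n - k)             # MSE
  f_stat <- ms_treat / mse
  list(F = f_stat,
       crit = qf(1 - alpha, k - 1, n - k),             # F_{k-1, n-k, alpha}
       p_value = pf(f_stat, k - 1, n - k, lower.tail = FALSE))
}

y <- c(4.1, 5.0, 4.6, 6.2, 6.8, 5.9, 7.5, 8.1)  # made-up responses
g <- c("a", "a", "a", "b", "b", "b", "c", "c")  # made-up group labels
res <- one_way_F(y, g)
reject <- res$F > res$crit

# Cross-check with base R's ANOVA table
summary(aov(y ~ factor(g)))
```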

Relation to t-test

If $k = 2$, both the two-sided two-sample t-test and one-way ANOVA can be applied

The equal-variance (pooled) t-test and one-way ANOVA are then equivalent: $F = t^{2}$
Welch’s t-test is equivalent to Welch’s ANOVA when $k = 2$: $F_{W} = t^{2}$
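A quick numerical check of both equivalences on made-up two-group data. `oneway.test()` with `var.equal = TRUE` is the classical one-way ANOVA, and with its default `var.equal = FALSE` it is Welch's ANOVA.

```r
y <- c(5.2, 4.8, 6.1, 5.5, 5.9, 7.0, 6.4, 7.8, 6.9, 7.3, 8.2)  # made-up data
g <- factor(rep(c("a", "b"), times = c(5, 6)))                  # k = 2 groups

t_pooled <- unname(t.test(y ~ g, var.equal = TRUE)$statistic)
F_anova  <- unname(oneway.test(y ~ g, var.equal = TRUE)$statistic)

t_welch <- unname(t.test(y ~ g)$statistic)       # Welch's t-test (default)
F_welch <- unname(oneway.test(y ~ g)$statistic)  # Welch's ANOVA (default)

c(t_pooled_sq = t_pooled^2, F = F_anova, t_welch_sq = t_welch^2, F_W = F_welch)
```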

Tukey’s Honestly Significant Difference (HSD)

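After rejecting $H_{0}$, Tukey’s HSD compares every pair of group means, with simultaneous confidence intervals that control the family-wise error rate.

```r
# rat_poison: course data set with response `time` and factor `treatment`
res_aov <- aov(time ~ treatment, data = rat_poison)
TukeyHSD(res_aov)        # pairwise differences with adjusted CIs and p-values
plot(TukeyHSD(res_aov))  # intervals that do not cross 0 indicate a significant difference
```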
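Since `rat_poison` is course data, here is a self-contained sketch of the same workflow on `PlantGrowth`, a data set shipped with base R (`weight` by `group`: `ctrl`, `trt1`, `trt2`). It also picks out the pairs whose adjusted interval excludes 0.

```r
# PlantGrowth comes with base R (datasets package)
fit <- aov(weight ~ group, data = PlantGrowth)
tk <- TukeyHSD(fit, conf.level = 0.95)
tk$group  # one row per pair of groups: diff, lwr, upr, p adj

# Pairs whose family-wise 95% interval excludes 0
signif_pairs <- rownames(tk$group)[tk$group[, "lwr"] > 0 | tk$group[, "upr"] < 0]
signif_pairs
```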
