Confidence Interval

with Normal Distribution

Hypothesis Testing

Two-sample z-test with known variance

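Reject $H_0: \mu_X = \mu_Y$ at confidence level $1 - \alpha$ (two-sided) if

$|z| = \dfrac{|\bar{X} - \bar{Y}|}{\sqrt{\sigma_X^2 / n_X + \sigma_Y^2 / n_Y}} > z_{\alpha/2}$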
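A minimal sketch of the two-sided z-test, assuming the usual statistic (difference of sample means divided by its known standard error). The helper `two_sample_z` and the sample data are hypothetical; base R has no built-in two-sample z-test.

```r
# Hypothetical helper: two-sided two-sample z-test with known standard deviations.
two_sample_z <- function(x, y, sigma_x, sigma_y, alpha = 0.05) {
  # Standard error of mean(x) - mean(y) when both variances are known
  se <- sqrt(sigma_x^2 / length(x) + sigma_y^2 / length(y))
  z <- (mean(x) - mean(y)) / se
  list(
    z = z,
    p_value = 2 * pnorm(-abs(z)),            # two-sided p-value
    reject = abs(z) > qnorm(1 - alpha / 2)   # compare with the critical value
  )
}

# Made-up data, for illustration only
x <- c(5.1, 4.9, 5.6, 5.8, 6.0)
y <- c(4.2, 4.8, 4.4, 4.1, 4.6)
res_z <- two_sample_z(x, y, sigma_x = 0.5, sigma_y = 0.5)
print(res_z)
```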

Two-sample t-test with unknown equal variance

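Reject $H_0$ at confidence level $1 - \alpha$, with $t = \dfrac{\bar{X} - \bar{Y}}{s_p \sqrt{1/n_X + 1/n_Y}}$ and pooled variance $s_p^2 = \dfrac{(n_X - 1) s_X^2 + (n_Y - 1) s_Y^2}{n_X + n_Y - 2}$, if

  • Two-sided test: $|t| > t_{\alpha/2,\, n_X + n_Y - 2}$
  • One-sided greater test: $t > t_{\alpha,\, n_X + n_Y - 2}$
  • One-sided less test: $t < -t_{\alpha,\, n_X + n_Y - 2}$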

Two-sample t-test with unknown unequal variance (Welch’s t-test)

Use Welch’s t-test by default if no information about the variances is given
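The t-test variants above map onto arguments of R's `t.test`. The sketch below uses simulated data (an assumption, only for illustration): `var.equal = TRUE` gives the pooled test, the default gives Welch's test, and `alternative` selects the two-sided, greater, or less version.

```r
set.seed(1)
# Simulated samples, for illustration only
x <- rnorm(20, mean = 10, sd = 2)
y <- rnorm(25, mean = 11, sd = 3)

# Pooled (equal-variance) t-test: df = n_x + n_y - 2
pooled <- t.test(x, y, var.equal = TRUE)

# Welch's t-test: the default, since var.equal = FALSE unless stated
welch <- t.test(x, y)

# One-sided versions: H1 is mean(x) > mean(y) or mean(x) < mean(y)
greater <- t.test(x, y, alternative = "greater")
less <- t.test(x, y, alternative = "less")

print(pooled$parameter)  # degrees of freedom of the pooled test
print(welch$parameter)   # Welch-Satterthwaite degrees of freedom
```

Welch's degrees of freedom are never larger than the pooled value, which is why it is the more conservative default.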

Power t-test (optional)

# n is omitted, so power.t.test solves for the sample size per group
power.t.test(delta = 2.5, sd = 2, sig.level = 0.05, power = 0.80,
             type = "two.sample", alternative = "two.sided")

ANOVA

  • Compare two or more groups of data/samples

  • Factor is a categorical variable denoting the different groups the data come from

  • The possible values of a factor are called levels

  • The main numerical variable of the sample values is called the dependent variable

  • If only 1 factor, one-way ANOVA

  • If multiple factors, multi-way ANOVA (optional for this course)

Median

  1. Order the data points from small to large
  2. If n is odd, then the median is the $\frac{n+1}{2}$-th data point
  3. If n is even, then the median is the average of the $\frac{n}{2}$-th and $\left(\frac{n}{2} + 1\right)$-th data points
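The median rule can be sketched directly in R. `median_by_rule` is a hypothetical helper written for this note; the built-in `median()` gives the same answer.

```r
# Hypothetical helper that follows the odd/even rule step by step.
median_by_rule <- function(x) {
  s <- sort(x)          # step 1: order from small to large
  n <- length(s)
  if (n %% 2 == 1) {
    s[(n + 1) / 2]      # odd n: the middle data point
  } else {
    (s[n / 2] + s[n / 2 + 1]) / 2   # even n: average of the two middle points
  }
}

odd_median <- median_by_rule(c(7, 1, 5))       # sorted: 1 5 7
even_median <- median_by_rule(c(7, 1, 5, 3))   # sorted: 1 3 5 7
print(c(odd_median, even_median))
```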

Quantile

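  1. Order the data points from small to large and calculate $k = p(n + 1)$ for the $p$-quantile
  2. If k is an integer, the k-th data point is the quantile
  3. If k is not an integer, the quantile is the average of the $\lfloor k \rfloor$-th and $(\lfloor k \rfloor + 1)$-th data points, where $\lfloor k \rfloor$ is the largest integer smaller than k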
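A sketch of the quantile rule, assuming the position is $k = p(n + 1)$. That choice is an assumption, picked because it reproduces the median rule at $p = 0.5$. `quantile_by_rule` is a hypothetical helper; R's built-in `quantile()` interpolates differently by default, so its results can differ.

```r
# Hypothetical helper implementing the three steps of the rule.
quantile_by_rule <- function(x, p) {
  s <- sort(x)
  n <- length(s)
  k <- p * (n + 1)                 # ASSUMPTION: position formula k = p(n + 1)
  stopifnot(k >= 1, k <= n)        # the rule is undefined outside the data range
  if (abs(k - round(k)) < 1e-9) {
    return(s[round(k)])            # integer k: take the k-th data point
  }
  lo <- floor(k)                   # largest integer smaller than k
  (s[lo] + s[lo + 1]) / 2          # average of the two neighbouring points
}

x7 <- c(2, 4, 4, 5, 7, 9, 10)       # n = 7
x8 <- c(2, 4, 4, 5, 7, 9, 10, 12)   # n = 8
q_int <- quantile_by_rule(x7, 0.25)     # k = 2, an integer
q_frac <- quantile_by_rule(x8, 0.5)     # k = 4.5, not an integer
print(c(q_int, q_frac))
```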

Boxplot

  • Points outside the whiskers are potential outliers
  • LEFT whisker: smallest data value greater than the lower quartile minus 1.5 times IQR, where IQR = upper quartile minus lower quartile
  • RIGHT whisker: largest data value less than the upper quartile plus 1.5 times IQR
    • Think of it as shrinking each of the 2 bars inward until it lands on a data point
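The whisker rule can be checked by hand in R. The data are made up, and the quartiles come from `quantile()` with its default method (an assumption; `boxplot()` itself uses hinges, which can differ slightly for some sample sizes). Values exactly on a fence are kept inside here.

```r
# Made-up data with one low and one high extreme value
x <- c(1, 12, 13, 14, 15, 16, 17, 18, 40)

q1 <- unname(quantile(x, 0.25))   # lower quartile
q3 <- unname(quantile(x, 0.75))   # upper quartile
iqr <- q3 - q1

lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# Shrink each fence inward to the nearest data point
left_whisker <- min(x[x >= lower_fence])
right_whisker <- max(x[x <= upper_fence])

# Anything beyond the fences is a potential outlier
outliers <- x[x < lower_fence | x > upper_fence]
print(list(left = left_whisker, right = right_whisker, outliers = outliers))
```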

One-way ANOVA

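  1. Group 1: $X_{11}, \dots, X_{1 n_1}$, iid samples from $N(\mu_1, \sigma_1^2)$
  2. Group k: $X_{k1}, \dots, X_{k n_k}$, iid samples from $N(\mu_k, \sigma_k^2)$
  3. Assume samples from different groups are independent
  4. Assume the variance is unknown but equal, i.e. $\sigma_1^2 = \dots = \sigma_k^2 = \sigma^2$

Model: $X_{ij} = \mu_i + \varepsilon_{ij}$, where $\varepsilon_{ij} \sim N(0, \sigma^2)$ is noise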

Types of Variance

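  • Total variance, sum of square total: $SS_{Total} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (X_{ij} - \bar{X})^2$

$\bar{X}$ is the mean of all the sample data, and $\bar{X}_i$ is the mean of group i

  • Between (group) variance, sum of square treatment: $SS_{Treat} = \sum_{i=1}^{k} n_i (\bar{X}_i - \bar{X})^2$
  • Within (group) variance, sum of square error: $SSE = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_i)^2$
  • Relations: $SS_{Total} = SS_{Treat} + SSE$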

F Statistics

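If the null hypothesis $H_0: \mu_1 = \dots = \mu_k$ is true, $SS_{Treat}$ (the sum of square treatment) should be close to 0

A large $SS_{Treat}$ supports $H_1$ (not all group means are equal)

MS_Treat = Mean square treatment = $SS_{Treat} / (k - 1)$
MSE = Mean square error = $SSE / (n - k)$

The error degrees of freedom are n - k because each group has n_i - 1 degrees of freedom; summed over the k groups this gives n - k, where n is the total sample size

$F = MS_{Treat} / MSE$

Reject if F is large

Under $H_0$, $F \sim F_{k-1,\, n-k}$

Reject $H_0$ if $F > F_{\alpha;\, k-1,\, n-k}$, the upper-$\alpha$ quantile of that distribution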
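To make the sums of squares and the F statistic concrete, here is a small hand computation in R, checked against `aov`. The data and the names `values` and `group` are made up for illustration; k is the number of groups and n the total sample size.

```r
# Made-up data: 3 groups with 3 observations each
values <- c(10, 12, 11, 14, 15, 13, 20, 19, 21)
group <- factor(rep(c("A", "B", "C"), each = 3))

n <- length(values)   # total sample size
k <- nlevels(group)   # number of groups

grand_mean <- mean(values)
group_means <- tapply(values, group, mean)
group_sizes <- tapply(values, group, length)

ss_total <- sum((values - grand_mean)^2)
ss_treat <- sum(group_sizes * (group_means - grand_mean)^2)  # between groups
sse <- sum((values - ave(values, group))^2)                  # within groups

ms_treat <- ss_treat / (k - 1)
mse <- sse / (n - k)
f_stat <- ms_treat / mse
p_value <- pf(f_stat, df1 = k - 1, df2 = n - k, lower.tail = FALSE)

# Same analysis with the built-in one-way ANOVA
aov_table <- summary(aov(values ~ group))[[1]]
print(c(F = f_stat, p = p_value))
print(aov_table)
```

The hand-computed F should agree with the `F value` column of the table, and `ss_total` should equal `ss_treat + sse`.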

Relation to t-test

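If $k = 2$, we can apply both the two-sided two-sample t-test and ANOVA

  • Equivalent under the equal-variance assumption: $F = t^2$, and the p-values are identical
  • Welch’s t-test is equivalent to Welch’s ANOVA at $k = 2$: again $F = t^2$, with identical p-values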
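With two groups, the equivalence can be checked numerically: the squared t statistic equals the F statistic and the p-values match. The sketch uses simulated data (an assumption) and `oneway.test`, which runs the classical F-test with `var.equal = TRUE` and Welch's ANOVA by default.

```r
set.seed(42)
# Simulated two-group data, for illustration only
dat <- data.frame(
  value = c(rnorm(12, mean = 5, sd = 1), rnorm(15, mean = 6, sd = 1)),
  group = factor(rep(c("a", "b"), times = c(12, 15)))
)

# Equal-variance versions: pooled t-test vs classical one-way ANOVA
t_pooled <- t.test(value ~ group, data = dat, var.equal = TRUE)
f_classic <- oneway.test(value ~ group, data = dat, var.equal = TRUE)

# Unequal-variance versions: Welch's t-test vs Welch's ANOVA
t_welch <- t.test(value ~ group, data = dat)
f_welch <- oneway.test(value ~ group, data = dat)

print(c(t_squared = unname(t_pooled$statistic)^2, F = unname(f_classic$statistic)))
print(c(t_squared = unname(t_welch$statistic)^2, F = unname(f_welch$statistic)))
```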

Tukey’s Honestly Significant Difference (HSD)

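# Fit the one-way ANOVA first; rat_poison must have columns time and treatment
res_aov <- aov(time ~ treatment, data = rat_poison)
# Pairwise comparisons of all treatment levels with family-wise confidence intervals
TukeyHSD(res_aov)
# Intervals that do not cross 0 indicate a significant pairwise difference
plot(TukeyHSD(res_aov))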