1
a
x = c(121, 105, 111, 119, 108, 101, 90, 131, 106, 112)
y = c(101, 110, 107, 98, 89, 103, 86, 117, 113, 87)
len_x = length(x)
len_y = length(y)
mean_x = mean(x)
mean_y = mean(y)
var_x = var(x)
var_y = var(y)
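# Welch-Satterthwaite degrees of freedom
v = (var_x/len_x+var_y/len_y)^2 /
( (var_x/len_x)^2/(len_x-1) + ((var_y/len_y)^2/(len_y-1)) )
# Critical value t_{0.025, v} for a 95% two-sided CI
t_v = qt(0.05/2, df=v, lower.tail=FALSE)
# 95% Welch CI for mu_x - mu_y
c(mean_x-mean_y-t_v*sqrt(var_x/len_x+var_y/len_y),
mean_x-mean_y+t_v*sqrt(var_x/len_x+var_y/len_y))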
# OR
t.test(x, y, mu=0, alternative="two.sided", conf.level=0.95)
[1] -1.2458 19.8458
Welch Two Sample t-test
data: x and y
t = 1.8529, df = 17.979, p-value = 0.08039
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-1.2458 19.8458
sample estimates:
mean of x mean of y
110.4 101.1
We use Welch’s t-test (unequal variances) to obtain the 95% CI for $\mu_x - \mu_y$: (-1.2458, 19.8458).
We do not reject $H_0: \mu_x = \mu_y$ because 0 lies inside the CI, i.e. we do not have enough evidence to conclude that the means differ.
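For reference, the hand computation in part a is the Welch interval. Written out, with $s_x^2, s_y^2$ the sample variances (`var_x`, `var_y`) and $\nu$ the Welch-Satterthwaite degrees of freedom (`v` in the code):

```latex
(\bar{x} - \bar{y}) \pm t_{\alpha/2,\,\nu}\sqrt{\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}},
\qquad
\nu = \frac{\left(s_x^2/n_x + s_y^2/n_y\right)^2}
{\dfrac{(s_x^2/n_x)^2}{n_x - 1} + \dfrac{(s_y^2/n_y)^2}{n_y - 1}}
```

Here $\alpha = 0.05$ and $\nu \approx 17.979$, which matches the `df` reported by `t.test()`.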
b
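# Test statistic for H0: mu_x - mu_y = 25 (reuses v from part a)
t_val = (mean_x-mean_y-25) / sqrt(var_x/len_x+var_y/len_y)
# One-sided (upper-tail) p-value
p_val = pt(t_val, df=v, lower.tail=FALSE)
# Critical value t_{0.05, v}
t_sta = qt(0.05, df=v, lower.tail=FALSE)
print(c(t_val, p_val, t_sta))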
# OR
t.test(x, y, mu=25, alternative="greater", conf.level=0.95)
[1] -3.1279976 0.9970909 1.7341734
Welch Two Sample t-test
data: x and y
t = -3.128, df = 17.979, p-value = 0.9971
alternative hypothesis: true difference in means is greater than 25
95 percent confidence interval:
0.5958623 Inf
sample estimates:
mean of x mean of y
110.4 101.1
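The p-value is 0.9970909, which is not less than 0.05 (equivalently, $t = -3.128 < t_{0.05,\,\nu} = 1.734$).
We do not reject $H_0: \mu_x - \mu_y = 25$ in favour of $H_1: \mu_x - \mu_y > 25$, i.e. we do not have enough evidence to conclude that the mean difference is greater than 25.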
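Rather than reading numbers off the printed output, the fields of the `htest` object returned by `t.test()` can be used directly. A minimal sketch (the names `res`, `alpha` and `reject_by_*` are our own) showing that the p-value rule, the critical-value rule and the one-sided confidence bound give the same decision:

```r
x <- c(121, 105, 111, 119, 108, 101, 90, 131, 106, 112)
y <- c(101, 110, 107, 98, 89, 103, 86, 117, 113, 87)
alpha <- 0.05

# One-sided Welch test of H0: mu_x - mu_y = 25 vs H1: mu_x - mu_y > 25
res <- t.test(x, y, mu = 25, alternative = "greater", conf.level = 1 - alpha)

t_obs  <- unname(res$statistic)   # observed t statistic
df_w   <- unname(res$parameter)   # Welch-Satterthwaite df
t_crit <- qt(alpha, df = df_w, lower.tail = FALSE)

# Three equivalent decision rules (TRUE means "reject H0")
reject_by_p    <- res$p.value < alpha
reject_by_crit <- t_obs > t_crit
reject_by_ci   <- res$conf.int[1] > 25  # lower bound of the one-sided CI

print(c(t_obs = t_obs, p = res$p.value, t_crit = t_crit))
print(c(reject_by_p, reject_by_crit, reject_by_ci))
```

The CI rule works because, for `alternative = "greater"`, the lower bound exceeds 25 exactly when $t > t_{\alpha,\,\nu}$.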
2
a
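- Group 1: $X_{11}, \dots, X_{1n_1}$, iid samples from $N(\mu_1, \sigma^2)$
- Group 2: $X_{21}, \dots, X_{2n_2}$, iid samples from $N(\mu_2, \sigma^2)$
- …
- …
- Group 5: $X_{51}, \dots, X_{5n_5}$, iid samples from $N(\mu_5, \sigma^2)$
Assume samples from different groups are independent and $\sigma^2$ is unknown but equal across groups.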
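These assumptions amount to the usual one-way ANOVA model. In compact notation (ours), with $k = 5$ aggregates and $n = 30$ observations in total (so the test uses $F_{4,\,25}$), the model, hypotheses and test statistic are:

```latex
\begin{aligned}
&X_{ij} = \mu_i + \varepsilon_{ij}, \qquad \varepsilon_{ij} \overset{\text{iid}}{\sim} N(0, \sigma^2), \qquad i = 1, \dots, k,\; j = 1, \dots, n_i \\
&H_0: \mu_1 = \mu_2 = \cdots = \mu_k \quad \text{vs.} \quad H_1: \mu_i \neq \mu_{i'} \text{ for some } i \neq i' \\
&F = \frac{\mathrm{SSTreat}/(k-1)}{\mathrm{SSE}/(n-k)} \sim F_{k-1,\,n-k} \quad \text{under } H_0
\end{aligned}
```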
concrete_data = read.csv("concrete_data.csv")
x1 = concrete_data$X1
x2 = concrete_data$X2
x3 = concrete_data$X3
x4 = concrete_data$X4
x5 = concrete_data$X5
boxplot(concrete_data)
dat = data.frame(
moisture=c(x1, x2, x3, x4, x5),
aggregate=as.factor(
c(rep(1,length(x1)),
rep(2,length(x2)),
rep(3,length(x3)),
rep(4,length(x4)),
rep(5,length(x5))
)
)
)
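Assuming `concrete_data` contains only the five columns `X1` to `X5`, base R's `stack()` builds the same long-format layout in one call; the grouping factor is then called `ind`, with levels `X1`, ..., `X5` instead of 1, ..., 5. A small sketch on made-up toy values (not the assignment data):

```r
# Toy wide-format data standing in for concrete_data (values are made up)
wide <- data.frame(X1 = c(1, 2), X2 = c(3, 4))

# stack() returns a data frame with a numeric 'values' column and a factor 'ind'
long <- stack(wide)
print(long)

# For the real data this would be, e.g.:
# res_aov2 <- aov(values ~ ind, data = stack(concrete_data))
```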
# Critical value F_{0.02; 4, 25} (alpha = 0.02, 5 groups, 30 observations)
print(qf(0.02, df1=5-1, df2=30-5, lower.tail=FALSE))
res_aov = aov(moisture~aggregate, data=dat)
summary(res_aov)
[1] 3.549423
Df Sum Sq Mean Sq F value Pr(>F)
aggregate 4 85356 21339 4.302 0.00875 **
Residuals 25 124020 4961
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
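Since $F = 4.302 > F_{0.02;\,4,\,25} = 3.549$ (equivalently, p-value $= 0.00875 < 0.02$), we reject $H_0: \mu_1 = \mu_2 = \dots = \mu_5$, i.e. at least two of the aggregate mean moisture contents differ.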
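As a check, the F value, its p-value and the critical value can be recomputed from the sums of squares printed by `summary(res_aov)`. A sketch using only those printed numbers (they are rounded to integers, so the last digit of F may differ slightly):

```r
k <- 5; n <- 30; alpha <- 0.02
ss_treat <- 85356   # 'aggregate' Sum Sq from summary(res_aov)
ss_error <- 124020  # 'Residuals' Sum Sq

ms_treat <- ss_treat / (k - 1)
ms_error <- ss_error / (n - k)
F_obs  <- ms_treat / ms_error
p_val  <- pf(F_obs, df1 = k - 1, df2 = n - k, lower.tail = FALSE)
F_crit <- qf(alpha, df1 = k - 1, df2 = n - k, lower.tail = FALSE)

print(c(F_obs = F_obs, p = p_val, F_crit = F_crit))
# Reject H0 when F_obs > F_crit, equivalently when p_val < alpha
print(F_obs > F_crit)
```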
b
TukeyHSD(res_aov, conf.level=0.98)
Tukey multiple comparisons of means
98% family-wise confidence level
Fit: aov(formula = moisture ~ aggregate, data = dat)
$aggregate
diff lwr upr p adj
2-1 -57.1666667 -193.158527 78.825194 0.6297485
3-1 -145.3333333 -281.325194 -9.341473 0.0116387
4-1 -41.1666667 -177.158527 94.825194 0.8472695
5-1 0.1666667 -135.825194 136.158527 1.0000000
3-2 -88.1666667 -224.158527 47.825194 0.2243248
4-2 16.0000000 -119.991861 151.991861 0.9946026
5-2 57.3333333 -78.658527 193.325194 0.6272414
4-3 104.1666667 -31.825194 240.158527 0.1088202
5-3 145.5000000 9.508139 281.491861 0.0115253
5-4 41.3333333 -94.658527 177.325194 0.8453941
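In the TukeyHSD output, only the pairs 3-1 and 5-3 have 98% family-wise CIs that exclude 0 (equivalently, adjusted p-values 0.0116 and 0.0115, both below 0.02), so these are the only pairs for which we have enough evidence to conclude that the means differ.
Both intervals lie on one side of 0 (3-1 is entirely negative, 5-3 entirely positive), so the mean for aggregate 3 is lower than the means for aggregates 1 and 5.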
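All ten Tukey intervals have the same half-width because the design is balanced: 30 observations in 5 groups gives $n_i = 6$ (inferred from the residual df of 25). A sketch of that half-width, $q_{0.98;\,5,\,25}\sqrt{\mathrm{MSE}/n_i}$, using `qtukey()` and the rounded residual sum of squares from part a:

```r
k <- 5; n_i <- 6; df_error <- 25
mse <- 124020 / df_error   # residual mean square from summary(res_aov)

# Studentized range quantile for a 98% family-wise confidence level
q_crit <- qtukey(0.98, nmeans = k, df = df_error)
half_width <- q_crit * sqrt(mse / n_i)
print(half_width)

# A pair differs when |difference in means| exceeds the half-width
diffs <- c("3-1" = -145.3333333, "5-3" = 145.5, "4-3" = 104.1666667)
print(abs(diffs) > half_width)
```

The value should agree with `upr - diff` in the TukeyHSD table (about 135.99).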
c
oneway.test(moisture~aggregate, data=dat)
One-way analysis of means (not assuming equal variances)
data: moisture and aggregate
F = 5.4163, num df = 4.000, denom df = 12.372, p-value = 0.009433
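By Welch’s ANOVA, which does not assume equal variances, $F = 5.4163$ on 4 and 12.372 df with p-value 0.009433 < 0.02, so we still reject $H_0$: at least two aggregate means differ.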
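Welch's ANOVA needs only the group means, variances and sizes. A minimal sketch of the standard Welch F statistic (the function name `welch_anova` is ours), checked against `oneway.test()` on R's built-in `PlantGrowth` data because the concrete CSV is not reproduced here:

```r
welch_anova <- function(means, vars, n) {
  k  <- length(means)
  w  <- n / vars                        # precision weights n_i / s_i^2
  mw <- sum(w * means) / sum(w)         # weighted grand mean
  a  <- sum(w * (means - mw)^2) / (k - 1)
  lambda <- sum((1 - w / sum(w))^2 / (n - 1))
  b  <- 1 + 2 * (k - 2) / (k^2 - 1) * lambda
  F_stat <- a / b
  df2 <- (k^2 - 1) / (3 * lambda)       # denominator df
  p <- pf(F_stat, df1 = k - 1, df2 = df2, lower.tail = FALSE)
  c(F = F_stat, df1 = k - 1, df2 = df2, p = p)
}

# Check against oneway.test() on a built-in dataset with 3 groups
g <- split(PlantGrowth$weight, PlantGrowth$group)
ours <- welch_anova(sapply(g, mean), sapply(g, var), lengths(g))
ref  <- oneway.test(weight ~ group, data = PlantGrowth)
print(ours)
print(ref)
```

With two groups, this statistic reduces to the square of the Welch t statistic and `df2` to the Welch-Satterthwaite df.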
3
k = 3
ni = 12
n = ni*k
ma = 32
mb = 40
mc = 30
m = (ma+mb+mc)/k
va = 145
vb = 138
vc = 150
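# Between-group (treatment) sum of squares and mean square
SSTreat = ni*(ma-m)^2 + ni*(mb-m)^2 + ni*(mc-m)^2
MSTreat = SSTreat/(k-1)
# Within-group (error) sum of squares and mean square
SSE = (ni-1)*va + (ni-1)*vb + (ni-1)*vc
MSE = SSE/(n-k)
F_value = MSTreat/MSE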
print(F_value)
F_stat = qf(0.05, df1=k-1, df2=n-k, lower.tail=FALSE)
print(F_stat)
p_value = pf(F_value, df1=k-1, df2=n-k, lower.tail=FALSE)
print(p_value)
[1] 2.327945
[1] 3.284918
[1] 0.1133019
Since the p-value 0.1133 > 0.05 (equivalently, $F = 2.328 < F_{0.05;\,2,\,33} = 3.285$), we do not reject $H_0: \mu_A = \mu_B = \mu_C$, i.e. we do not have enough evidence to conclude that the mean time to clear a mild asthmatic attack differs among the three steroids.
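The same calculation can be packaged for any number of groups given only means, variances and sample sizes. A sketch (the function name `anova_from_summary` is hypothetical); it uses the sample-size-weighted grand mean, which reduces to the simple average `(ma+mb+mc)/k` used above when all groups have the same size:

```r
anova_from_summary <- function(means, vars, n, alpha = 0.05) {
  k <- length(means)
  N <- sum(n)
  grand <- sum(n * means) / N              # weighted grand mean
  ss_treat <- sum(n * (means - grand)^2)   # between-group sum of squares
  sse <- sum((n - 1) * vars)               # within-group sum of squares
  F_value <- (ss_treat / (k - 1)) / (sse / (N - k))
  c(F = F_value,
    F_crit = qf(alpha, df1 = k - 1, df2 = N - k, lower.tail = FALSE),
    p = pf(F_value, df1 = k - 1, df2 = N - k, lower.tail = FALSE))
}

# Steroids A, B, C from question 3: 12 patients each
out <- anova_from_summary(means = c(32, 40, 30), vars = c(145, 138, 150), n = rep(12, 3))
print(out)
```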