R code: the t-test
Code for the Hinton examples
# A company has produced a videotape that they claim improves your IQ. We have a sample of IQ scores from 100 people who have watched the videotape. They have a mean IQ of 104. IQ scores in the general population have a mean of 100 and a standard deviation of 15. We want to find the probability that a sample with a mean this high or higher would be drawn from the general population. THIS EXAMPLE IS EQUIVALENT TO HINTON'S CYADAMINE EXAMPLE
# We can calculate the standard deviation of the sampling distribution of the mean. This is the population standard deviation divided by the square root of the sample size, also known as the standard error.
standarderror = 15/sqrt(100)
#We can then calculate a z score for this sample mean
z = (104 - 100)/standarderror
#We can find the probability of a sample with this mean or higher being drawn from the general population by either using the table in our book or using pnorm.
1 - pnorm(z,0,1)
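# A quick cross-check, as a sketch: pnorm can also work directly on the IQ scale
# if it is given the mean and standard deviation of the sampling distribution,
# so the z score does not have to be computed by hand. The variable names below
# (pop.mean, pop.sd, n.sample, ...) are illustrative and not from Hinton; the
# values 100 and 15 are the population values used in the calculation above.
pop.mean = 100
pop.sd = 15
n.sample = 100
sample.mean = 104
se.check = pop.sd / sqrt(n.sample)
z.check = (sample.mean - pop.mean) / se.check
# upper-tail probability from the z score on the standard normal
p.from.z = pnorm(z.check, lower.tail = FALSE)
# upper-tail probability from the raw sample mean on the sampling distribution
p.from.scores = pnorm(sample.mean, mean = pop.mean, sd = se.check, lower.tail = FALSE)
# both routes should agree
all.equal(p.from.z, p.from.scores)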
# WHEN WE DON'T HAVE A KNOWN POPULATION STANDARD DEVIATION
# This is the shopping example from Perry Hinton's Statistics Explained, pp. 64-69
shoppingdata = c(30, 44, 19, 32, 25, 30, 16, 41, 28, 45, 28, 20, 18, 31, 15, 32, 40, 42, 29, 35, 34, 22, 30, 27, 36, 26, 38, 30, 33, 24, 15, 48, 31, 27, 37, 45, 12, 29, 33, 23, 20, 32, 28, 26, 38, 40, 28, 32, 34, 22)
# We know that the average number of purchases in the supermarket is 25, but we do not have the population standard deviation. The closest thing we have is the sample standard deviation. We can use this in place of the population standard deviation and the math remains basically the same, but instead of a z statistic we then have a t statistic, and we need to consult a t distribution with n - 1 degrees of freedom to obtain our probabilities. This is known as a one-sample t-test.
standarderror = (sd(shoppingdata)/sqrt(length(shoppingdata)))
t = (mean(shoppingdata)-25) / standarderror
# and is this a significant difference?
# test whether this t value is far out in the upper tail of the relevant t distribution
pt(t, df = length(shoppingdata) - 1, lower.tail = FALSE)
# A single R command that performs these steps
t.test(shoppingdata, mu = 25, alternative = "greater")
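# t.test returns an object whose components can be compared with the manual
# calculation. A minimal sketch: to keep it runnable on its own it uses only the
# first ten values of the shopping data, retyped here, so its numbers will not
# match the full-sample result. The names purchases.demo, mu0, and so on are
# illustrative.
purchases.demo = c(30, 44, 19, 32, 25, 30, 16, 41, 28, 45)
mu0 = 25
se.demo = sd(purchases.demo) / sqrt(length(purchases.demo))
t.manual = (mean(purchases.demo) - mu0) / se.demo
p.manual = pt(t.manual, df = length(purchases.demo) - 1, lower.tail = FALSE)
result.demo = t.test(purchases.demo, mu = mu0, alternative = "greater")
# unname() drops the "t" label that t.test attaches to the statistic
all.equal(unname(result.demo$statistic), t.manual)
all.equal(result.demo$p.value, p.manual)
# the degrees of freedom are stored in $parameter
unname(result.demo$parameter)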
# comparing two samples, Hinton's "two math tests" example
hinton.math = data.frame(participant = 1:8, morning = c(6,4,3,5,7,6,5,6), afternoon = c(5,2,4,4,3,4,5,3))
n = nrow(hinton.math)
# adding the differences into the data frame
hinton.math$diff = hinton.math$morning - hinton.math$afternoon
# s_{X_1 - X_2}: the sample standard deviation of the differences
mean.diff = mean(hinton.math$diff)
s.sample = sqrt(sum((hinton.math$diff - mean.diff)^2)/(n-1))
# which is the same as:
s.sample = sd(hinton.math$diff)
# s_{mean(X_1) - mean(X_2)}: the standard error of the mean difference
stderr.estimate = s.sample / sqrt(n)
# t value
t = (mean(hinton.math$morning) - mean(hinton.math$afternoon)) / stderr.estimate
# degrees of freedom: n-1 = 7
#
# prediction: scores in sample 1 are higher, that is, mean(X_1) - mean(X_2) is high
pt(t, df=n-1, lower.tail = FALSE)
# which is the same as
t.test(hinton.math$morning, hinton.math$afternoon, alternative = "greater", paired = TRUE)
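# One way to see what paired = TRUE does, as a sketch: a paired t-test gives the
# same result as a one-sample t-test on the differences against a mean of 0.
# The morning and afternoon scores are retyped here so the block runs on its
# own; the *.check names are illustrative.
morning.check = c(6, 4, 3, 5, 7, 6, 5, 6)
afternoon.check = c(5, 2, 4, 4, 3, 4, 5, 3)
paired.result = t.test(morning.check, afternoon.check, alternative = "greater", paired = TRUE)
onesample.result = t.test(morning.check - afternoon.check, mu = 0, alternative = "greater")
# same t statistic, same degrees of freedom, same p-value
all.equal(unname(paired.result$statistic), unname(onesample.result$statistic))
all.equal(unname(paired.result$parameter), unname(onesample.result$parameter))
all.equal(paired.result$p.value, onesample.result$p.value)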
# COMPARING TWO SAMPLES
# Instead of having one shopping sample following one advertising campaign, we might also have two shopping samples following two different advertising campaigns. We want to determine whether one advertising campaign had a significantly greater impact on sales than the other.
shoppingdata2 = c( 31, 33, 43, 12, 43, 53, 46, 39, 37, 37, 31, 28, 27, 37, 39, 41, 42, 37, 51, 30, 22, 31, 44, 19, 38, 32, 32, 48, 31, 39, 32, 39, 34, 41, 46, 31, 30, 42, 35, 33, 32, 38, 36, 35, 30, 25, 45, 40, 49, 27)
# We can do this using a two-sample t-test
# The test used depends on whether the two sets of shopping data were drawn from the same shoppers (paired samples) or from different shoppers (independent samples)
t.test(shoppingdata, shoppingdata2, paired = TRUE)
t.test(shoppingdata, shoppingdata2, paired = FALSE)
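# A note on the independent-samples call, as a sketch: with paired = FALSE,
# t.test by default does not assume equal variances in the two groups (the
# Welch test); var.equal = TRUE requests the pooled-variance test instead.
# To stay runnable on its own, the block uses only the first ten values of each
# shopping sample, retyped here, so its numbers will not match the full-sample
# output. The names group1, group2, and so on are illustrative.
group1 = c(30, 44, 19, 32, 25, 30, 16, 41, 28, 45)
group2 = c(31, 33, 43, 12, 43, 53, 46, 39, 37, 37)
welch.result = t.test(group1, group2, paired = FALSE)
pooled.result = t.test(group1, group2, paired = FALSE, var.equal = TRUE)
# the Welch statistic by hand: each group contributes its own variance / n
se.welch = sqrt(var(group1) / length(group1) + var(group2) / length(group2))
t.welch = (mean(group1) - mean(group2)) / se.welch
all.equal(unname(welch.result$statistic), t.welch)
# the pooled test uses n1 + n2 - 2 degrees of freedom
unname(pooled.result$parameter)
# the Welch degrees of freedom are adjusted and are never larger than that
unname(welch.result$parameter)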