A nonparametric approach to computing the p-value for any test statistic.

#### Overview

In almost all hypothesis testing scenarios, the null hypothesis can be interpreted as follows.

$$H_0$$: Any pattern that has been witnessed in the sampled data is simply due to random chance.

Permutation tests rest entirely on this single idea. If all patterns in the data really are simply due to random chance, then random re-samples of the data should show similar patterns. Thus, the process of a permutation test is:

1. Compute a test statistic for the data.
2. Reuse the data thousands of times to create a sampling distribution of the test statistic.

The sampling distribution is created by permuting (randomly rearranging) the data thousands of times and calculating a test statistic on each permuted version of the data. A histogram of the test statistics then provides the sampling distribution of the test statistic needed to compute the p-value of the original test statistic.
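To make the process above concrete, here is a minimal sketch of a two-sample permutation test using made-up data (the data values, group sizes, and the difference-in-means statistic are all hypothetical choices for illustration; the next section uses a t statistic instead):

```r
set.seed(121)
groupA <- c(5.1, 4.9, 6.2, 5.8, 5.5)  # hypothetical sample 1
groupB <- c(4.2, 4.8, 4.0, 4.6, 4.4)  # hypothetical sample 2
allData <- c(groupA, groupB)

# Step 1: compute the test statistic for the data.
observedStat <- mean(groupA) - mean(groupB)

# Step 2: permute the data thousands of times, recomputing the
# test statistic each time, to build the sampling distribution.
N <- 2000
permutedStats <- rep(NA, N)
for (i in 1:N){
  shuffled <- sample(allData)  # randomly rearrange all 10 values
  permutedStats[i] <- mean(shuffled[1:5]) - mean(shuffled[6:10])
}

# Two-sided p-value: the proportion of permuted statistics at least
# as extreme as the observed statistic.
pvalue <- sum(abs(permutedStats) >= abs(observedStat))/N
```

Because the 10 values are shuffled without regard to group membership, the permuted statistics show what the difference in means looks like when any apparent pattern is due to chance alone.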

#### Explanation

The most difficult part of a permutation test is performing the random permutation of the data. How the data are permuted depends on the type of hypothesis test being performed.

##### Paired Data Example

See the Sleep Paired t Test example for the background and context of the study. Here is how to perform the test as a permutation test instead of a t test.

The question that this sleep data can answer concerns which drug is more effective at increasing the amount of extra sleep an individual receives. The associated hypotheses would be

$$H_0: \mu_d = 0$$

$$H_a: \mu_d \neq 0$$

where $$\mu_d$$ denotes the true mean of the differences between the observations for each drug obtained from each individual. The differences are obtained by $$d_i = \text{extra}_{1i} - \text{extra}_{2i}$$.
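To see what these differences look like, they can be computed directly from R's built-in `sleep` data set, which stores both groups' measurements in one `extra` column indexed by `group`:

```r
# Differences d_i = extra_1i - extra_2i for each of the 10 subjects
# in R's built-in sleep data set:
d <- with(sleep, extra[group==1] - extra[group==2])
d
##  [1] -1.2 -2.4 -1.3 -1.3  0.0 -1.0 -1.8 -0.8 -4.6 -1.4
mean(d)
## [1] -1.58
```

The sample mean of the differences, $$\bar{d} = -1.58$$, is the quantity the hypotheses above are testing against zero.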

To perform a permutation test of the hypothesis that the drugs are equally effective, we use the following code.

```r
# Perform the initial test:
myTest <- with(sleep, t.test(extra[group==1], extra[group==2], paired = TRUE, mu = 0))

# Get the test statistic from the test:
observedTestStat <- myTest$statistic

# Obtain the permutation sampling distribution:
N <- 2000
permutedTestStats <- rep(NA, N)
for (i in 1:N){
  # For paired data, permuting amounts to randomly flipping the sign
  # of each of the 10 paired differences:
  permuteData <- sample(x=c(-1,1), size=10, replace=TRUE)
  # Note: t.test(group1 - group2) is the same as t.test(group1, group2, paired=TRUE).
  permutedTest <- with(sleep, t.test(permuteData*(extra[group==1] - extra[group==2]), mu = 0))
  permutedTestStats[i] <- permutedTest$statistic
}

# Plot the sampling distribution with the observed test statistic marked:
hist(permutedTestStats)
abline(v=observedTestStat, col='skyblue', lwd=3)
```

```r
# Greater-than p-value (not what we want here):
sum(permutedTestStats >= observedTestStat)/N
## [1] 1

# Less-than p-value:
sum(permutedTestStats <= observedTestStat)/N
## [1] 0.0025

# Correct two-sided p-value for this study:
2*sum(permutedTestStats <= observedTestStat)/N
## [1] 0.005
```
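For comparison, the p-value reported by the original paired t test is close to the permutation p-value, which is what we would expect when the t test's normality assumption is reasonable for these data:

```r
# p-value from the paired t test on the same sleep data:
myTest <- with(sleep, t.test(extra[group==1], extra[group==2], paired = TRUE, mu = 0))
myTest$p.value
## [1] 0.002833
```

When the two approaches disagree substantially, the permutation p-value is generally the more trustworthy of the two, since it does not rely on distributional assumptions.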