This post is based on the paper by Berger and Hsu (1996); readers who want detailed derivations and proofs should consult it. Here I only wish to briefly summarize and explain some interesting conclusions, using my own intuition as well as an R simulation.

As discussed earlier, the TOST method (Two One-Sided t Tests) was developed to establish the bioequivalence of two drugs. Three conclusions about the TOST struck me as interesting and unusual, and I think they deserve to be better understood.

**First**, given a specified significance level α (say, 5%) for each one-sided t test, the TOST is a size-α test. In other words, combining two level-α tests yields a size-α test, with no inflation of the type I error. This seems strange at first glance. The rationale is that the TOST is an intersection-union test: the null hypothesis of non-equivalence is the union of the two one-sided null hypotheses (the true difference is at or below the lower equivalence limit, or at or above the upper one), so bioequivalence can be established only when both null hypotheses are rejected. Yes, they need to be **rejected simultaneously**! Any parameter value in the overall null lies inside one of the two one-sided nulls, and rejecting the overall null requires rejecting that one-sided null, which happens with probability at most α. This is also why no multiplicity adjustment of the tests or p-values is needed, even though one might expect an adjustment for testing multiple hypotheses.
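To make the intersection-union logic concrete, here is a minimal R sketch of the TOST for two independent samples. It is not code from the paper: it assumes normal data with equal variances (a pooled t, as in the simulation later in this post), and the function name `tost.pooled` and the equivalence limits `theta.L` and `theta.U` are hypothetical choices for illustration. Equivalence is concluded only if both one-sided p-values are below α, which is the same as comparing max(p.L, p.U) with α, with no multiplicity adjustment.

```r
# Hypothetical TOST helper, assuming two independent normal samples with
# equal variances (pooled t). theta.L / theta.U are equivalence limits
# for the mean difference mean(x.T) - mean(x.R).
tost.pooled <- function(x.T, x.R, theta.L, theta.U, alpha = 0.05) {
  n.T <- length(x.T)
  n.R <- length(x.R)
  r <- n.T + n.R - 2                      # degrees of freedom
  D <- mean(x.T) - mean(x.R)
  s.pool <- sqrt(((n.T - 1) * var(x.T) + (n.R - 1) * var(x.R)) / r)
  se <- s.pool * sqrt(1 / n.T + 1 / n.R)
  # H01: theta <= theta.L, rejected for large (D - theta.L) / se
  p.L <- pt((D - theta.L) / se, df = r, lower.tail = FALSE)
  # H02: theta >= theta.U, rejected for small (D - theta.U) / se
  p.U <- pt((D - theta.U) / se, df = r)
  # Intersection-union rule: reject both nulls at level alpha, unadjusted
  p.tost <- max(p.L, p.U)
  list(D = D, se = se, df = r, p.L = p.L, p.U = p.U,
       p.tost = p.tost, equivalent = p.tost < alpha)
}

set.seed(1)
x.R <- rnorm(100, mean = 0.51, sd = 0.1)
# Identical test sample: D = 0, well inside the limits +/- 0.05
res.same <- tost.pooled(x.R, x.R, theta.L = -0.05, theta.U = 0.05)
# Test sample shifted by 1: D = 1, far outside the limits
res.far <- tost.pooled(x.R + 1, x.R, theta.L = -0.05, theta.U = 0.05)
res.same$equivalent   # TRUE
res.far$equivalent    # FALSE
```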

**Second**, the following confidence interval is the 100(1-2*α*)% confidence interval that corresponds to the size-*α* TOST.

$$\left[\, D - t(\alpha, r)\, SE(D),\; D + t(\alpha, r)\, SE(D) \,\right] \qquad (1)$$

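where *D* is the observed difference in sample means, *SE(D)* is its standard error, and *t(α, r)* is the upper *α* quantile of the t distribution with *r* degrees of freedom. This conclusion is straightforward: interval (1) is the intersection of the two one-sided 100(1-*α*)% confidence bounds, [D - t(α, r)SE(D), ∞) and (-∞, D + t(α, r)SE(D)], that correspond to the two one-sided t tests. Each bound misses the true difference with probability *α*, and both cannot miss at once, so the intersection has coverage 1-2*α*. Equivalently, the TOST concludes equivalence at level *α* exactly when interval (1) lies inside the equivalence limits.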
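The correspondence between interval (1) and the TOST can be checked numerically. The R sketch below is illustrative only: `ci.equiv1` and `tost.rejects` are hypothetical helper names, the equivalence limits ±0.05 and the random values of D and SE(D) are arbitrary, and r = 198 matches the degrees of freedom of the simulation in this post (n = 100 per group). Since (D - θ_L)/SE(D) > t(α, r) is algebraically the same statement as D - t(α, r)SE(D) > θ_L (and likewise at the upper limit), both one-sided tests reject exactly when interval (1) lies strictly inside the limits.

```r
# Hypothetical helper: interval (1), D -/+ t(alpha, r) * SE(D)
ci.equiv1 <- function(D, se, r, alpha = 0.05) {
  t.crit <- qt(1 - alpha, df = r)   # t(alpha, r), the upper alpha quantile
  c(lower = D - t.crit * se, upper = D + t.crit * se)
}

# Hypothetical helper: TRUE when both one-sided t tests reject at level alpha
tost.rejects <- function(D, se, r, theta.L, theta.U, alpha = 0.05) {
  t.crit <- qt(1 - alpha, df = r)
  (D - theta.L) / se > t.crit && (D - theta.U) / se < -t.crit
}

set.seed(2)
r <- 198                 # 2n - 2 with n = 100 per group
theta.L <- -0.05         # arbitrary equivalence limits
theta.U <- 0.05
agree <- replicate(1000, {
  D <- rnorm(1, mean = 0, sd = 0.03)
  se <- runif(1, min = 0.005, max = 0.03)
  ci <- ci.equiv1(D, se, r)
  inside <- ci[["lower"]] > theta.L && ci[["upper"]] < theta.U
  inside == tost.rejects(D, se, r, theta.L, theta.U)
})
all(agree)   # TRUE: the CI rule and the TOST decision always agree
```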

**Third**, the following confidence interval is the 100(1-α)% confidence interval that corresponds to the size-*α* TOST.

$$\left[\, \min\{0,\; D - t(\alpha, r)\, SE(D)\},\; \max\{0,\; D + t(\alpha, r)\, SE(D)\} \,\right] \qquad (2)$$

The notation is the same as in (1).

To quote the paper: “the 100(1-*α*)% interval (2) is equal to the 100(1-2*α*)% interval (1) when the interval (1) contains zero. But, when the interval (1) lies to the right (left) of zero, the interval (2) extends from zero to the upper (lower) endpoint of interval (1)”.
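The quoted description can be checked case by case. In this illustrative R sketch, `ci.equiv1` and `ci.equiv2` are hypothetical helpers implementing intervals (1) and (2), and the values of D, SE(D) and r are arbitrary. The three cases are interval (1) containing zero, lying to the right of zero, and lying to the left of zero.

```r
# Hypothetical helper: interval (1)
ci.equiv1 <- function(D, se, r, alpha = 0.05) {
  t.crit <- qt(1 - alpha, df = r)
  c(lower = D - t.crit * se, upper = D + t.crit * se)
}
# Hypothetical helper: interval (2), interval (1) stretched to include 0
ci.equiv2 <- function(D, se, r, alpha = 0.05) {
  ci1 <- ci.equiv1(D, se, r, alpha)
  c(lower = min(0, ci1[["lower"]]), upper = max(0, ci1[["upper"]]))
}

r <- 198
se <- 0.01
# Case 1: interval (1) contains 0, so interval (2) is identical to it
in.1 <- ci.equiv1(0.001, se, r);  in.2 <- ci.equiv2(0.001, se, r)
# Case 2: interval (1) lies right of 0, so (2) runs from 0 to its upper end
right.1 <- ci.equiv1(0.05, se, r); right.2 <- ci.equiv2(0.05, se, r)
# Case 3: interval (1) lies left of 0, so (2) runs from its lower end to 0
left.1 <- ci.equiv1(-0.05, se, r); left.2 <- ci.equiv2(-0.05, se, r)
print(rbind(in.1, in.2, right.1, right.2, left.1, left.2))
```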

This last point is harder to grasp. The paper gives a rigorous theoretical proof; here I demonstrate the conclusion with an R simulation instead. See the source code below.

```r
# TOST CI simulation, by Aaron Zeng, 3/13/2014
set.seed(20140313)

R100 <- 0.51       # reference mean
T100 <- 1 * R100   # test mean; equal to the reference here, one can change this
D.true <- T100 - R100
# Note: with T100 equal to R100, D.true is 0. Interval (2) always contains 0,
# so its coverage is exactly 1 in that case; coverage near 1 - alpha
# requires a nonzero true difference (T100 != R100).
sigma <- 0.1
alpha <- 0.05
n <- 100           # sample size per group
r <- 2 * n - 2     # degrees of freedom of the pooled t
t.crit <- qt(1 - alpha, df = r)
sim.num <- 100     # number of simulated coverage probabilities (then averaged)
n.rep <- 1000      # confidence intervals used for each coverage probability

coverge.ci90 <- numeric(sim.num)
coverge.ci95 <- numeric(sim.num)
for (k in seq_len(sim.num)) {
  ci.90 <- matrix(NA_real_, nrow = n.rep, ncol = 2)
  ci.95 <- matrix(NA_real_, nrow = n.rep, ncol = 2)
  for (i in seq_len(n.rep)) {
    # simulate data for the reference and test groups
    samp.R100 <- rnorm(n, mean = R100, sd = sigma)
    samp.T100 <- rnorm(n, mean = T100, sd = sigma)
    D.mean <- mean(samp.T100 - samp.R100)
    # assume equal variances and use the pooled standard deviation
    s.pool <- sqrt((n - 1) * var(samp.R100) / r + (n - 1) * var(samp.T100) / r)
    D.se <- s.pool * sqrt(1 / n + 1 / n)
    # interval (1): 100(1 - 2*alpha)% = 90%
    ci.90[i, ] <- c(D.mean - t.crit * D.se, D.mean + t.crit * D.se)
    # interval (2): 100(1 - alpha)% = 95%
    ci.95[i, ] <- c(min(0, D.mean - t.crit * D.se), max(0, D.mean + t.crit * D.se))
  }
  # proportion of intervals that cover the true difference
  coverge.ci90[k] <- mean(ci.90[, 1] <= D.true & ci.90[, 2] >= D.true)
  coverge.ci95[k] <- mean(ci.95[, 1] <= D.true & ci.95[, 2] >= D.true)
}
mean(coverge.ci90)
mean(coverge.ci95)
```

The simulation results are shown below.

```
> coverge.ci90
 [1] 0.891 0.907 0.893 0.886 0.904 0.904 0.890 0.902 0.903 0.889 0.900 0.900 0.876 0.903 0.886
[16] 0.903 0.896 0.907 0.892 0.897 0.894 0.902 0.900 0.906 0.890 0.903 0.885 0.894 0.905 0.891
[31] 0.898 0.898 0.905 0.895 0.900 0.899 0.910 0.907 0.893 0.897 0.894 0.896 0.912 0.894 0.909
[46] 0.900 0.880 0.896 0.896 0.926 0.893 0.912 0.897 0.901 0.885 0.904 0.896 0.894 0.900 0.880
[61] 0.901 0.895 0.892 0.915 0.898 0.917 0.896 0.896 0.890 0.922 0.885 0.907 0.899 0.910 0.906
[76] 0.884 0.902 0.909 0.904 0.909 0.916 0.909 0.912 0.903 0.895 0.909 0.902 0.906 0.901 0.906
[91] 0.892 0.895 0.905 0.914 0.898 0.905 0.906 0.905 0.896 0.884
> coverge.ci95
 [1] 0.937 0.950 0.946 0.945 0.934 0.952 0.943 0.945 0.955 0.951 0.954 0.942 0.944 0.947 0.950
[16] 0.951 0.940 0.949 0.941 0.954 0.948 0.949 0.954 0.959 0.950 0.939 0.944 0.944 0.952 0.945
[31] 0.942 0.958 0.946 0.949 0.954 0.950 0.961 0.954 0.947 0.957 0.954 0.940 0.958 0.947 0.958
[46] 0.946 0.937 0.948 0.951 0.957 0.949 0.960 0.952 0.945 0.933 0.953 0.941 0.945 0.948 0.948
[61] 0.961 0.947 0.945 0.959 0.942 0.954 0.945 0.958 0.941 0.967 0.942 0.954 0.947 0.954 0.961
[76] 0.948 0.952 0.959 0.948 0.955 0.962 0.943 0.952 0.960 0.943 0.954 0.947 0.946 0.948 0.947
[91] 0.947 0.952 0.963 0.955 0.944 0.961 0.951 0.946 0.948 0.941
> mean(coverge.ci90)
[1] 0.89962
> mean(coverge.ci95)
[1] 0.94951
```

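Basically, the simulation supports both conclusions: interval (1) covers the true difference about 90% of the time, and interval (2) about 95% of the time. Keep in mind that interval (2) always contains 0, so when the true difference is exactly 0 its coverage is 1; for any nonzero true difference its coverage is exactly 1-*α*, which matches the coverage near 0.95 reported above. Also note that the simulation assumes equal variances in the two groups, which is what justifies the pooled standard error and the 2n-2 degrees of freedom.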
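The role of the true difference can be seen without the double loop. This R sketch is an alternative to the simulation above, not the author's code: under normality with equal variances it draws D and the pooled standard deviation directly from their exact sampling distributions (D is normal with variance 2σ²/n, and (2n-2)s²/σ² is chi-squared with 2n-2 degrees of freedom, independent of D), using the post's n = 100, σ = 0.1 and α = 0.05. The function name `coverage` and the grid of true differences are arbitrary choices. Interval (1) should cover about 90% of the time at every θ, while interval (2) covers with probability 1 at θ = 0 and about 95% at each nonzero θ.

```r
# Hypothetical helper: Monte Carlo coverage of intervals (1) and (2)
# at true difference theta, with equal group sizes n and common sigma.
coverage <- function(theta, n = 100, sigma = 0.1, alpha = 0.05, M = 1e5) {
  r <- 2 * n - 2
  D <- rnorm(M, mean = theta, sd = sigma * sqrt(2 / n))
  # pooled SD: s^2 = sigma^2 * chisq(r) / r, independent of D
  se <- sigma * sqrt(rchisq(M, df = r) / r) * sqrt(2 / n)
  t.crit <- qt(1 - alpha, df = r)
  lo1 <- D - t.crit * se          # interval (1)
  up1 <- D + t.crit * se
  lo2 <- pmin(0, lo1)             # interval (2)
  up2 <- pmax(0, up1)
  c(theta = theta,
    ci1 = mean(lo1 <= theta & theta <= up1),
    ci2 = mean(lo2 <= theta & theta <= up2))
}

set.seed(3)
res <- t(sapply(c(0, 0.002, 0.01, 0.05), coverage))
print(res)
```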

Reference:

- Berger, R. L., and Hsu, J. C. (1996). Bioequivalence trials, intersection-union tests and equivalence confidence sets. *Statistical Science*, 11(4), 283-319.