Running Non-inferiority Tests
Eppo now supports non-inferiority tests with the Guardrail cutoffs.
This how-to guide walks you through running non-inferiority tests in Eppo. A non-inferiority test lets you check that a new treatment is not worse than an existing or standard treatment, beyond a tolerance you choose, in terms of effectiveness or safety.
Analysis
For this guide, we assume that you run the non-inferiority analysis as a one-sided hypothesis test on whether the impact is at worst $-\delta$%, where $-\delta$ (with $\delta > 0$) is your inferiority tolerance. The closer the tolerance is to $0$, the stricter your test is; you need stronger evidence before you can call a test non-harmful.
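Written out, the one-sided test described above can be stated as follows. The symbols are notation assumed here for illustration: $\Delta$ is the true relative impact of the treatment in percent, for a metric where higher is better, and $-\delta$ (with $\delta > 0$) is the inferiority tolerance.

$$
H_0:\ \Delta \le -\delta \qquad \text{versus} \qquad H_1:\ \Delta > -\delta
$$

Rejecting $H_0$ is what lets you call the treatment non-harmful. Failing to reject it only means the evidence is not yet sufficient.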
The left endpoint of the confidence interval in Eppo carries the same information as a one-sided non-inferiority test at half the significance level ($\alpha/2$).
- To perform the test in Eppo, visually check that the left side of the confidence interval is higher than your non-inferiority tolerance. If it's higher than the tolerance, then you can call the experiment non-harmful. If it's lower than the tolerance, then you don't have enough data to call it non-harmful.
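- If the right endpoint is lower than $0$, then you can say the treatment is harmful. Note that with a permissive tolerance and high statistical power, both conclusions can hold at the same time: the whole interval then sits between the tolerance and $0$, so the treatment is within your tolerance yet measurably negative.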
- For metrics where lower is better, flip everything above. You'll compare the right endpoint to a threshold above 0.
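- If you want to run your non-inferiority test at a one-sided significance level $\alpha$, the matching confidence interval in Eppo is the two-sided one at confidence level $1 - 2\alpha$. Eppo's default confidence interval therefore corresponds to a one-sided test at half of its default significance level. For any other one-sided $\alpha$, you have to change the confidence level in Eppo to $1 - 2\alpha$ to get the same results.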
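The checks in the list above can be sketched in a few lines of Python. This is an illustrative helper, not part of Eppo: the function names `ci_level_for_one_sided_alpha` and `assess_non_inferiority` and their parameters are hypothetical, and the confidence interval endpoints are assumed to be read off the Eppo results page by hand, in the same units as the tolerance.

```python
from typing import Tuple


def ci_level_for_one_sided_alpha(alpha: float) -> float:
    """Two-sided confidence level whose endpoints match a one-sided test at `alpha`."""
    if not 0 < alpha < 0.5:
        raise ValueError("alpha must be strictly between 0 and 0.5")
    return 1 - 2 * alpha


def assess_non_inferiority(
    ci_lower: float,
    ci_upper: float,
    tolerance: float,
    lower_is_better: bool = False,
) -> Tuple[bool, bool]:
    """Return (non_inferior, harmful) for one metric.

    `tolerance` is the worst acceptable impact: a value at or below 0 when
    higher is better, at or above 0 when lower is better.
    """
    if ci_lower > ci_upper:
        raise ValueError("ci_lower must not exceed ci_upper")
    if lower_is_better:
        # Flip signs so the rest of the logic can assume higher is better;
        # negating the interval swaps the roles of its endpoints.
        ci_lower, ci_upper, tolerance = -ci_upper, -ci_lower, -tolerance
    if tolerance > 0:
        raise ValueError("tolerance has the wrong sign for this metric direction")

    # Non-inferior: the left endpoint clears the tolerance.
    non_inferior = ci_lower > tolerance
    # Harmful: the entire interval lies below zero.
    harmful = ci_upper < 0
    return non_inferior, harmful


# Hypothetical numbers, in percent: interval [-1.5, -0.5] with a tolerance of -2.
non_inferior, harmful = assess_non_inferiority(-1.5, -0.5, tolerance=-2.0)
print(f"non-inferior: {non_inferior}, harmful: {harmful}")
```

With these made-up inputs both flags come back true, which is the situation the list warns about: the interval sits entirely between the tolerance and zero. A result of `(False, False)` corresponds to not having enough data to make either call.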
Example
In this example experiment, you might want to do a non-inferiority test on "Total revenue". Let's say you're willing to move forward as long as the impact is no worse than %. You see that the left side of the confidence interval is %, so you can reject the null hypothesis, aka declare that the test caused no harm. If instead you had a stricter threshold of %, you wouldn't have enough evidence (at that sample size) to make the call that the treatment caused no harm.