WMW(): About the Examples
The in-depth examples in the WMW() code go well beyond basic analyses. Some of the strategies challenge conventional "wisdom" in statistical practice. Statistical science should never be mechanistic.
Typical data analyses. Almost all Y measurements encountered in practice can produce ties between Y1 and Y2. When Y has only a few possible values, such ties will typically be plentiful. Example 1 stems from Hanley and McNeil (1982), in which Y is discrete with only five ordered categories. Example 2 reanalyzes data used by Agresti (1980), in which Y has only three categories. For Example 3, I created a dataset that conforms to the touted published results from a small clinical trial in which Y was theoretically continuous but may have had ties due to measurement rounding.
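To make the tie handling concrete, here is a minimal Python sketch of the two point estimates. It assumes WMWprob is defined as P(Y2 > Y1) + 0.5·P(Y1 = Y2) (the direction is an assumption; check the WMW() documentation for its convention) and WMWodds = WMWprob / (1 − WMWprob). The functions wmw_prob() and wmw_odds() and the five-category ratings are hypothetical illustrations, not the datasets of Examples 1 to 3.

```python
def wmw_prob(y1, y2):
    """Estimate WMWprob = P(Y2 > Y1) + 0.5 * P(Y1 == Y2) over all n1*n2 pairs.

    The direction (Y2 relative to Y1) is an assumption of this sketch.
    """
    score = 0.0
    for a in y1:
        for b in y2:
            if b > a:
                score += 1.0  # Y2 wins the pair
            elif b == a:
                score += 0.5  # a tie counts as half a win
    return score / (len(y1) * len(y2))


def wmw_odds(p):
    """Convert WMWprob to the odds scale: WMWodds = p / (1 - p)."""
    return p / (1.0 - p)


# Hypothetical ordinal ratings on a 1-5 scale; 8 of the 49 pairs are ties.
y1 = [1, 1, 2, 2, 3, 3, 4]
y2 = [2, 3, 3, 4, 4, 5, 5]
p = wmw_prob(y1, y2)
print(round(p, 4), round(wmw_odds(p), 4))  # 0.8163 4.4444
```

Counting each tie as half a win is what keeps the estimate sensible when Y has only a handful of categories; dropping tied pairs instead would quietly change the target quantity.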
Tailoring the CI to the research question. We gain analytic precision and improve communication by tailoring the type of CI to the research question. This often leads to the sound use of one-sided CIs, thus challenging the dogma that CIs must be two-sided.
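As a rough illustration of how the choice of CI type changes what is reported, the Python sketch below computes both a two-sided interval and a one-sided lower bound for WMWprob from the same data. It uses a simple Wald interval with a DeLong-style (placement value) standard error, which is not Mee's method (the WMW() default); it is swapped in only because it fits in a few lines. The function names and data are hypothetical, and the direction P(Y2 > Y1) + 0.5·P(Y1 = Y2) is assumed.

```python
import math
from statistics import NormalDist, variance


def psi(a, b):
    """Pair score: 1 if Y2 > Y1, 0.5 if tied, 0 otherwise (direction assumed)."""
    return 1.0 if b > a else (0.5 if b == a else 0.0)


def wmw_prob_and_se(y1, y2):
    """WMWprob estimate and a DeLong-style standard error from placement values."""
    n1, n2 = len(y1), len(y2)
    # Placement value of each observation: its average pair score against the other group.
    v1 = [sum(psi(a, b) for b in y2) / n2 for a in y1]
    v2 = [sum(psi(a, b) for a in y1) / n1 for b in y2]
    est = sum(v1) / n1
    se = math.sqrt(variance(v1) / n1 + variance(v2) / n2)
    return est, se


def wald_intervals(y1, y2, level=0.95):
    """Return (estimate, two-sided CI, one-sided lower bound), limits clipped to [0, 1]."""
    est, se = wmw_prob_and_se(y1, y2)
    z_two = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    z_one = NormalDist().inv_cdf(level)                # about 1.645 for 95%
    two_sided = (max(0.0, est - z_two * se), min(1.0, est + z_two * se))
    lower_bound = max(0.0, est - z_one * se)
    return est, two_sided, lower_bound


y1 = [1, 1, 2, 2, 3, 3, 4]
y2 = [2, 3, 3, 4, 4, 5, 5]
est, two_sided, lower = wald_intervals(y1, y2)
print(round(est, 3), [round(x, 3) for x in two_sided], round(lower, 3))
```

Because the one-sided bound spends the whole error rate on one side, it sits above the lower limit of the two-sided interval, which matters when the research question is only whether WMWprob is at least some value. For these data the Wald upper limit also runs past 1 and has to be clipped, one reason to prefer better-behaved methods such as Mee's.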
Many research questions are not hypotheses. When a statistical estimate fully answers the research question, providing a confidence interval is judicious, but computing a p-value is misguided.
If the null hypothesis is prima facie untrue, why test it? Here, the p-value for the classical hypothesis comparison, H0: WMWprob = 0.50 versus H1: WMWprob ≠ 0.50, assesses how likely it is that the observed WMWprob, or one deviating even more from 0.50, would have been observed if the true WMWprob were exactly 0.50. If H0 is true (and the data satisfy all distributional conditions perfectly), the p-value has a standard uniform distribution, so for any P between 0 and 1, the probability that the p-value is less than or equal to P is P. The p-value is thus a convoluted surrogate for directly assessing how likely it is that H0 is true, as any Bayesian will tell you. Although diehard frequentists may hate the idea, shouldn't anyone using a p-value always consider the a priori viability of the null hypothesis?
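The uniformity claim can be checked with a small Monte Carlo experiment. The Python sketch below draws both groups from the same continuous distribution, so H0: WMWprob = 0.50 holds exactly, and records how often the two-sided p-value falls at or below several cutoffs P. It uses the large-sample normal approximation to the Mann-Whitney U statistic without a tie correction, an assumption chosen for brevity rather than a description of how WMW() computes its p-values; the function names, sample sizes, and seed are illustrative. With uniform p-values, each observed rate should land close to its cutoff.

```python
import math
import random


def mw_pvalue(y1, y2):
    """Two-sided p-value for H0: WMWprob = 0.50 using the normal approximation to U.

    No tie correction is applied, so continuous data are assumed.
    """
    n1, n2 = len(y1), len(y2)
    u = sum((b > a) + 0.5 * (b == a) for a in y1 for b in y2)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def simulate_null_pvalues(n1=20, n2=20, reps=2000, seed=1):
    """P-values from datasets in which both groups share one distribution (H0 true)."""
    rng = random.Random(seed)
    pvals = []
    for _ in range(reps):
        y1 = [rng.gauss(0.0, 1.0) for _ in range(n1)]
        y2 = [rng.gauss(0.0, 1.0) for _ in range(n2)]
        pvals.append(mw_pvalue(y1, y2))
    return pvals


pvals = simulate_null_pvalues()
for cutoff in (0.05, 0.10, 0.25, 0.50):
    rate = sum(p <= cutoff for p in pvals) / len(pvals)
    print(cutoff, round(rate, 3))
```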
Suppose subjects with a given disease were randomly assigned to receive drug A or drug B, both known to be active in the disease process but in different ways, and Y is some biological measure that captures efficacy. What is the chance that groups A and B are identical with respect to Y? Minuscule. If so, the p-value for H0: WMWprob = 0.50 versus H1: WMWprob ≠ 0.50 has little or no probative value in comparing A and B.
Yet some null hypotheses are viable (and some investigators and statisticians remain entrenched in p-value-ism). Hence, if H0 is specified, WMW() will compute a p-value.
Sample-size analysis based on confidence intervals (not p-values). Classical power analysis asks, "What is the probability that the p-value for my hypothesis test will be significant?" If, however, confidence intervals are the primary analysis tool, the statistical planning should guesstimate the properties of the CIs that have been tailored to address the key research questions. Example 4a deals with designing a study to find differences between the groups, the most common type of study, and distinguishes between traditional and essential confidence interval power. Examples 4b and 4c deal with designing non-inferiority and equivalence studies.
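For a flavor of CI-based planning, the Python sketch below finds the smallest equal group size whose expected two-sided 95% CI for WMWprob has a half-width no larger than a target. It uses the Hanley and McNeil (1982) standard-error approximation, which was derived for continuous data under an exponential model, together with a plain Wald interval. This is a simpler precision criterion than the traditional and essential CI power of Example 4a, and the function names and planning values (WMWprob = 0.65, half-width 0.10) are hypothetical.

```python
import math
from statistics import NormalDist


def hanley_mcneil_se(a, n1, n2):
    """Approximate SE of an estimated WMWprob equal to `a` (Hanley and McNeil, 1982).

    Group 2 plays the role of the group expected to score higher (assumed direction).
    """
    q1 = a / (2.0 - a)            # P(two group-2 values both exceed one group-1 value), model-based
    q2 = 2.0 * a * a / (1.0 + a)  # P(one group-2 value exceeds two group-1 values), model-based
    var = (a * (1.0 - a) + (n2 - 1) * (q1 - a * a) + (n1 - 1) * (q2 - a * a)) / (n1 * n2)
    return math.sqrt(var)


def n_per_group(a, target_half_width, level=0.95, n_max=100_000):
    """Smallest equal per-group n whose expected Wald half-width is <= the target."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    for n in range(2, n_max + 1):
        if z * hanley_mcneil_se(a, n, n) <= target_half_width:
            return n
    raise ValueError("target half-width not reached within n_max")


print(n_per_group(0.65, 0.10))  # 58
```

Planning on expected width answers "how precisely will we estimate WMWprob?" rather than "how likely is a significant p-value?", which is the shift in emphasis the paragraph above argues for.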
Qscores. WMW() returns an object called $Qscore, which supports creating custom Tufte-esque plots to "show the data" in a manner consistent with WMWprob. See Example 5.
The WMW method does not compare medians. Contrary to widespread belief, the WMW method has no functional relationship to the difference between the two medians. This was discussed at length by Divine, Norton, Baron, and Juarez-Colunga (2018), who endorsed the use of WMWprob (in their notation, p'' = 1 − WMWprob) and WMWodds. Examples 6.1 and 6.2 revisit their Counterexamples 1 and 4, showing how to obtain effective confidence intervals and congruent p-values.
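A tiny hypothetical example (not one of Divine et al.'s counterexamples) makes the point numerically: the two groups below have identical medians, yet WMWprob is far from 0.50. The Python sketch assumes WMWprob = P(Y2 > Y1) + 0.5·P(Y1 = Y2); the data and the wmw_prob() helper are invented for illustration.

```python
from statistics import median


def wmw_prob(y1, y2):
    """P(Y2 > Y1) + 0.5 * P(Y2 == Y1), estimated over all pairs (direction assumed)."""
    pairs = [(a, b) for a in y1 for b in y2]
    return sum(1.0 if b > a else 0.5 if b == a else 0.0 for a, b in pairs) / len(pairs)


# Hypothetical data: identical medians, very different shapes.
y1 = [1, 2, 3, 4, 5]
y2 = [3, 3, 3, 10, 10]
print(median(y1), median(y2))  # 3 3
print(wmw_prob(y1, y2))        # 0.7
```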
Both SAS and base R have implemented the Hodges-Lehmann method to obtain an estimate and CI for the median difference (Example 7). However, such results are only credible when Y1 and Y2 conform to a model in which their distributions have identical shapes and spread. This is rarely tenable, and violating this model can lead to bizarre results even with infinite sample sizes. In short, the method lacks criterion robustness.
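The Hodges-Lehmann shift estimate is the median of all n1·n2 pairwise differences Y2 − Y1. With the hypothetical data below (identical medians, different shapes), it reports a shift of 2 even though the difference between the two sample medians is 0, which shows why reading it as a "median difference" depends on the equal-shape, equal-spread model. The Python sketch is only an illustration; it does not reproduce the SAS or R output of Example 7.

```python
from statistics import median


def hodges_lehmann_shift(y1, y2):
    """Median of all pairwise differences y2 - y1 (Hodges-Lehmann shift estimate)."""
    return median([b - a for a in y1 for b in y2])


# Hypothetical data: equal medians, very different shapes.
y1 = [1, 2, 3, 4, 5]
y2 = [3, 3, 3, 10, 10]
print(hodges_lehmann_shift(y1, y2))  # 2
print(median(y2) - median(y1))       # 0
```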
Stress testing using Monte Carlo experimentation. The simulation studies carried out in Example 8 amply support the worthiness of Mee's CI method, which is why I made it the default.
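Stress tests like these follow the usual Monte Carlo recipe: simulate many datasets from a model whose true WMWprob is known, compute the CI each time, and record how often it covers the truth. The Python sketch below shows that recipe with normal data, where the true WMWprob for N(0, 1) versus N(delta, 1) is Phi(delta / sqrt(2)). It deliberately uses a simple DeLong-style Wald interval rather than Mee's method, so its coverage says nothing about Mee's CI; the function names, sample sizes, delta, and seed are all hypothetical.

```python
import math
import random
from statistics import NormalDist, variance


def wald_ci(y1, y2, level=0.95):
    """Wald CI for WMWprob = P(Y2 > Y1) + 0.5 * P(tie), DeLong-style SE (not Mee's method)."""
    n1, n2 = len(y1), len(y2)
    scores = [[1.0 if b > a else 0.5 if b == a else 0.0 for b in y2] for a in y1]
    v1 = [sum(row) / n2 for row in scores]                        # placement values, group 1
    v2 = [sum(row[j] for row in scores) / n1 for j in range(n2)]  # placement values, group 2
    est = sum(v1) / n1
    se = math.sqrt(variance(v1) / n1 + variance(v2) / n2)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return est - z * se, est + z * se


def coverage(delta, n1=30, n2=30, reps=1000, level=0.95, seed=2024):
    """Fraction of simulated CIs that contain the true WMWprob."""
    truth = NormalDist().cdf(delta / math.sqrt(2))  # P(Y2 > Y1) when Y2 - Y1 ~ N(delta, 2)
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        y1 = [rng.gauss(0.0, 1.0) for _ in range(n1)]
        y2 = [rng.gauss(delta, 1.0) for _ in range(n2)]
        lo, hi = wald_ci(y1, y2, level)
        hits += lo <= truth <= hi
    return hits / reps


print(coverage(0.5))
```

Repeating this over a grid of sample sizes, effect sizes, tie patterns, and distribution shapes, and swapping in each competing CI method, is the general shape of a comparative coverage study.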