WMW(): Beyond the Basics
Does the WMW method compare medians? The Hodges-Lehmann (1963) method to compare two medians is frequently associated with the WMW method. However, while the methods have one aspect in common, they are not functionally linked. Critically, the Hodges-Lehmann CI is only valid when the population distributions of Y1 and Y2 have identical shapes and spread, perhaps differing only in location, which hardly ever reflects reality. When this location-shift model is violated, the Hodges-Lehmann method can behave illogically, even with infinitely large sample sizes. Divine, Norton, Baron, and Juarez-Colunga (2018) discussed in depth why "The Wilcoxon-Mann-Whitney Procedure Fails as a Test of Medians," and they stress the value of WMWprob (p'', in their notation) and WMWodds for using the WMW method correctly. Example 6a is taken from their Counterexample 1; Example 6b from their Counterexample 4. WMW() goes well beyond what Divine et al. presented.
Complete separation of Y1 and Y2. In rare cases, WMWprob is 0.0 or 1.0, because all Y values in one group exceed all those in the other group. If so, the CI methods degenerate. To produce an acceptable solution, WMW() tweaks the dataset minimally to give it exactly one tied (Y1, Y2) pair but no other overlap, which tweaks WMWprob to 0.50/(n1*n2) or 1 - 0.50/(n1*n2). The reported WMWprob remains 0.0 or 1.0, but the two-sided CI has the form [0.0, UCL] or [LCL, 1.0]. Because any requested p-value is based on this CI, the two are still congruent.
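The arithmetic behind the tweaked value can be checked with a minimal sketch. It does not call WMW(); the helper `wmw_prob`, the toy data, and the particular way of creating the single tie are assumptions made for illustration only, since the text does not say which observation WMW() adjusts.

```python
from fractions import Fraction

def wmw_prob(y1, y2):
    """Sample WMWprob: share of (Y1, Y2) pairs with Y1 > Y2, ties counted as half."""
    wins = sum(a > b for a in y1 for b in y2)
    ties = sum(a == b for a in y1 for b in y2)
    return Fraction(2 * wins + ties, 2 * len(y1) * len(y2))

# Toy data (hypothetical): every Y1 exceeds every Y2, so separation is complete.
y1 = [7, 8, 9, 11]
y2 = [1, 2, 3, 4, 5]
print(wmw_prob(y1, y2))          # 1

# Assumed tweak: move the largest Y2 up to the smallest Y1, creating exactly
# one tied pair and no other overlap.
y2_tweaked = sorted(y2)[:-1] + [min(y1)]
print(wmw_prob(y1, y2_tweaked))  # 39/40, i.e. 1 - 0.50/(4*5)
```

With n1 = 4 and n2 = 5, the single tie contributes half a pair out of 20, which reproduces 1 - 0.50/(n1*n2).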
Study planning: Monte Carlo studies with WMW(). Developing excellent study protocols requires an honest and thorough assessment to demonstrate that the proposed statistical methods are likely to perform satisfactorily in the given situation using the chosen sample size. Too often, such work (including ordinary "power analysis") relies on asymptotic results and/or on distributional assumptions for the data that poorly match the study being planned. As demonstrated in Example 8, the Mee CI for WMWprob performs well even under extreme conditions. While theoretical purists may grumble when a 95% CI has an "inflated" true coverage around 0.96, working statisticians know that most methods in their everyday toolboxes are similarly imperfect, but nevertheless satisfactory. Through Monte Carlo simulation, any function for data analysis can also serve as the core tool for study planning. Example 4 illustrates using WMW() to assess studies with three different missions:
- Finding that Y1 and Y2 are essentially different, the most common goal.
- Finding that Y1 is not essentially "inferior" to (less than) Y2, a non-inferiority study.
- Finding that Y1 and Y2 are essentially equal, an equivalence study.
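The general shape of such a Monte Carlo study can be sketched as follows. This is only an outline under stated assumptions: the normal distributions, the shift of 0.5, the sample sizes, and the helper names `wmw_prob` and `simulate_wmwprob` are all hypothetical, and the sketch records only the point estimate of WMWprob. A real planning study would instead run WMW() on each simulated dataset and tally how often its CI meets the criterion for the study's mission.

```python
import random
import statistics

def wmw_prob(y1, y2):
    """Sample WMWprob: share of (Y1, Y2) pairs with Y1 > Y2, ties counted as half."""
    wins = sum(a > b for a in y1 for b in y2)
    ties = sum(a == b for a in y1 for b in y2)
    return (wins + 0.5 * ties) / (len(y1) * len(y2))

def simulate_wmwprob(n1, n2, shift, reps, seed=1):
    """Draw `reps` datasets from an assumed scenario and estimate WMWprob in each."""
    rng = random.Random(seed)  # fixed seed so the study is reproducible
    estimates = []
    for _ in range(reps):
        y1 = [rng.gauss(shift, 1.0) for _ in range(n1)]  # assumed Y1 ~ N(shift, 1)
        y2 = [rng.gauss(0.0, 1.0) for _ in range(n2)]    # assumed Y2 ~ N(0, 1)
        estimates.append(wmw_prob(y1, y2))
    return estimates

est = simulate_wmwprob(n1=30, n2=30, shift=0.5, reps=2000)
# Center and spread of the estimates under this scenario.
print(round(statistics.mean(est), 3), round(statistics.stdev(est), 3))
```

Swapping the two `rng.gauss` lines for distributions that better match the planned study (skewed, discrete, heavily tied) is the point of simulating rather than relying on asymptotic formulas.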
Plotting the data to align with a WMW analysis. The page "Qscores: Showing the Data to Visualize WMWprob" and Example 5 delineate how to plot the individual data points in a manner consistent with the estimate and standard error of WMWprob.
Why does WMW() include Newcombe's Method 3 but not his Method 5? Newcombe's (2006) excellent work on obtaining CIs for WMWprob was of great help in developing WMW(), but I fault his recommendation of Method 5 over his Methods 3 and 6 (Mee's). See the page "On Newcombe's Method 5."
Area under the receiver operating characteristic curve (AUC). WMWprob is identical to the area under the receiver operating characteristic (ROC) curve, a methodology used to summarize sensitivity versus specificity in signal detection and diagnostic testing studies. Example 1 revisits the well-known medical imaging example introduced by Hanley and McNeil (1982).
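The identity can be verified numerically on a small dataset. The sketch below is not part of WMW(): the ratings are made up, and `wmw_prob` and `roc_auc` are hypothetical helpers. It assumes the empirical ROC curve is joined by straight lines (the trapezoidal rule), which is what makes tied ratings count as half.

```python
def wmw_prob(y1, y2):
    """Sample WMWprob: share of (Y1, Y2) pairs with Y1 > Y2, ties counted as half."""
    wins = sum(a > b for a in y1 for b in y2)
    ties = sum(a == b for a in y1 for b in y2)
    return (wins + 0.5 * ties) / (len(y1) * len(y2))

def roc_auc(pos, neg):
    """Trapezoidal area under the empirical ROC curve.

    Each distinct rating t is used as a cutoff ("call it positive if rating >= t"),
    giving one (false positive rate, true positive rate) point per cutoff.
    """
    points = [(0.0, 0.0)]
    for t in sorted(set(pos) | set(neg), reverse=True):
        tpr = sum(v >= t for v in pos) / len(pos)  # sensitivity
        fpr = sum(v >= t for v in neg) / len(neg)  # 1 - specificity
        points.append((fpr, tpr))
    area = 0.0
    for (fa, ta), (fb, tb) in zip(points, points[1:]):
        area += (fb - fa) * (ta + tb) / 2
    return area

# Made-up ordinal ratings (not the Hanley and McNeil data).
diseased = [3, 4, 4, 5]
healthy = [1, 2, 3, 4]
print(wmw_prob(diseased, healthy), roc_auc(diseased, healthy))  # 0.84375 0.84375
```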
Generalized odds ratio. In those rarest of situations where Y1 and Y2 cannot be tied, WMWodds is equivalent to Agresti's (1980) generalized odds ratio,
GOR = Prob[Y1 > Y2]/Prob[Y1 < Y2].
In Example 2, 39,781 pairs have Y1 > Y2 and 23,552 have Y1 < Y2, so the sample GOR is 39781/23552 = 1.69. However, this ignores the 32,139 ties, which clearly indicate substantial stochastic similarity. Accordingly, ties shrink WMWodds towards 1.00; in this case it is (39781 + 32139/2)/(23552 + 32139/2) = 1.41.
Consider an extreme case having 100 Y1 > Y2 pairs, 5 Y1 < Y2 pairs, and 1000 ties. The GOR value is 100/5 = 20, but WMWodds is only (100 + 1000/2)/(5 + 1000/2) = 1.19.
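Both calculations are easy to reproduce from the three pair counts. The function names `gor` and `wmw_odds` below are hypothetical helpers for this illustration, not part of WMW().

```python
def gor(wins, losses):
    """Generalized odds ratio from pair counts; ties are ignored."""
    return wins / losses

def wmw_odds(wins, losses, ties):
    """WMWodds from pair counts; each tie is split evenly between the two sides."""
    return (wins + ties / 2) / (losses + ties / 2)

# Example 2 counts quoted in the text above.
print(round(gor(39781, 23552), 2), round(wmw_odds(39781, 23552, 32139), 2))  # 1.69 1.41

# The extreme case: few untied pairs, many ties.
print(round(gor(100, 5), 2), round(wmw_odds(100, 5, 1000), 2))  # 20.0 1.19
```

With no ties the two functions return the same value, which is the equivalence stated above.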