Toothpics()
Example 2: Log-scaling works!
Example 3: Other than log-scaling
Example 4: How large can N be in a group?
Example 5: A 2 x 2 factorial design
Example 2: Log-scaling works!
Example 3: Other than log-scaling
Example 4: How large can N be in a group?
Example 5: A 2 x 2 factorial design
Example 1: Basic Ideas and Comparisons
By the start of the 21st century, a major journal like the New England Journal of Medicine should have become a model with respect to publishing quality statistical graphics. Yet, let us consider Ramakrishna et al. Amylase-resistant starch plus oral rehydration solution for cholera. NEJM 2000; 342:308-13.
From the abstract:
METHODS. We randomly assigned 48 adolescents and adults with cholera to
treatment with standard oral rehydration therapy (16 patients), standard
therapy and 50 g of rice flour per liter of oral rehydration solution (16
patients), or standard therapy and 50 g of high-amylose maize starch, an
amylase-resistant starch, per liter of oral rehydration solution (16
patients). The primary end points were fecal weight (for every 12-hour period
during the first 48 hours after enrollment) and the length of time to the
first formed stool.
Figure 1 in the article displays the sample means and standards deviations for the three treatment groups.
This classic "detonator plot" displays only 6 numbers (3 means and 3 SDs). We are left to imagine how the individual 16 raw values in each group were distributed. So, in a clinical study with only 48 cholera patients, we cannot see what their times were or how much they overlapped group to group. This is hardly a bygone problem; see Example 2.
I will spare you my criticism about how this figure's caption presents P-values but no estimates or confidence intervals.
The Toothpics() Rfunc file contains the code I used to generate a dataset that conforms to the results shown in Figure 1 of the Ramakrishna, et al. article. The values were jittered with
TimeFirstStool.jit <- FitterJitter(Y=TimeFirstStool, Groups=Treatment)$y.jit
The window I used for these plots came from running
PlotDirector(PlotSize=list(w=6.5, h=4.5), CloseOld=TRUE)
Example 1a
Toothpics() statements, Graphic 1a
TimeFirstStool.jit <- FitterJitter(Y=TimeFirstStool, Groups=Treatment)$y.jit
The window I used for these plots came from running
PlotDirector(PlotSize=list(w=6.5, h=4.5), CloseOld=TRUE)
Example 1a
Toothpics() statements, Graphic 1a
Toothpics(
Title="Fictionalized Data Based on\nFigure 1 in Ramakrishna et al. (2000)",
Y=TimeFirstStool.jit, # obtained using FitterJitter()
Group=Treatment,
GroupLevels=c("Standard","RiceFlour","ResistantStarch"),
RawY=TimeFirstStool,
YLabel="Time to First Formed Stool (hr)",
YLabelMoveRight = 0.25,
# YAxisLimits=c(0,175), # to anchor the Y-axis at 0 hours
# YTicksAt=seq(0,175,25),
GroupLabel="Supplement to Standard Therapy",
GroupLabelMoveUp = -0.3,
GroupNames=c("Standard Only","Rice Flour","Resistant Starch"),
Quantiles=c(0.50, 0.80),
TPThickness=1.5,
RelFontSize=c(1, 1, 1, 1, 1, 1.2)
)
Graphic 1a
Example 1b
Graphic 1a displays all 48 observations. For each group, we have the means and their 95% confidence intervals, and the median and the 0.80 quantiles. The scale of Y is linear. The Y-axis depicts the range of Y over the whole sample, a Tufte innovation called a range-frame. The graphic has the minimalist look I like so much about Tufte's principles. The text discussion about the graphic would contain estimates of effect sizes and their confidence intervals. If you still like P-values--or some archaic journal policy requires them--then include those, too. The same kind of graphic would support a Bayesian analysis, too.
Note that the specifications for YAxisLimits= and YTicksAt= are commented to suppress their execution. Uncommenting them anchors the Y-axis at 0.00 hours, a practice preferred--for sound reasons--by many for a problem like this. Here, however, I argue that keeping the Y-axis as a range-frame (default) maximizes the amount of space to separate the toothpicks, thus giving us a better picture of how the individual Y values differ by group without--in this case, anyway--exaggerating those differences by not having the 0.00 anchor. The main point is to think about it.
Suppose the focal question was about how much the dietary supplements reduced the 0.80 quantile, Q(0.80). The changes and additions to the code producing Graphic 1a are given in bold purple.
Toothpics() statements, Graphic 1b
Toothpics(
Title=paste("Comparing 0.80 Quantiles, Q(0.80)\n",
"(fictionalized from Ramakrishna et al., 2000)"),
Y=TimeFirstStool.jit, # obtained using FitterJitter()
Group=Treatment,
GroupLevels=c("Standard","RiceFlour","ResistantStarch"),
RawY=TimeFirstStool,
YLabel="Time to First Formed Stool (hr)",
YLabelMoveRight = 0.25,
GroupLabel="Supplement to Standard Therapy",
GroupLabelMoveUp = -0.3,
GroupNames=c("Standard Only","Rice Flour","Resistant Starch"),
PlotMeans=FALSE,
Quantiles=c(0.80),
PlotQCIs=0.95,
QuantileColor="red",
TPThickness=1.5,
RelFontSize=c(1, 1, 1, 1, 1, 1.2)
)
Graphic 1b
Red was used to draw attention to the estimates and 95% confidence intervals for the 0.80 quantiles, which are obtained using the Harrell-Davis method coupled with the bias-corrected accelerated (BCa) percentile bootstrap algorithm of Efron and Tibshirani, as described for the Rfunc HDquantile().
Example 1c
Redo the Toothpics() plot above with log-scaling of the Y-axis. Why consider this? Time is a measure that may not be linearly related to the biological phenomena being studied. In this case, ask: Is the biological outcome difference between 40 and 80 hours the same as between 100 and 140 hours? If 80 vs. 40 is better thought of as 80/40 = 2.0 (twice as long), and 140 vs. 100 as 140/100 = 1.40 (only 40% longer), then log-scaling is appropriate. Also, log(Y) may be more Normally distributed than Y, a benefit when using ANOVA, but not a critical concern unless the sample sizes are quite small. An ANOVA-type analysis on log(Y) can be viewed as comparing the groups' geometric means. As taught routinely in introductory statistics, this lessens the influence of the largest values when estimating central tendency.
For the log scaling,
TimeFirstStool.logjit <- FitterJitter(Y=TimeFirstStool, Groups=Treatment, LogY=TRUE)$y.jit
Toothpics() statements, Graphic 1c
Toothpics(
Title=paste("Comparing Geometric Means\n",
"(fictionalized from Ramakrishna et al., 2000)"),
Y=TimeFirstStool.logjit, # obtained using FitterJitter
Group=Treatment,
GroupLevels=c("Standard","RiceFlour","ResistantStarch"),
RawY=TimeFirstStool,
LogY=TRUE,
YLabel="Time to First Formed Stool\n(hr; log scaling)",
YAxisLimits=c(25,175),
YTicksAt= seq(25,175,25),
GroupLabel="Supplement to Standard Therapy",
GroupLabelMoveUp = -0.3,
GroupNames=c("Standard Only","Rice Flour","Resistant Starch"),
MeanColor="red",
Quantiles=NA,
TPLength=0.9,
TPThickness=1.5,
RelFontSize=c(1, 1, 1, 1, 1, 1.2)
)
Graphic 1c
Note that I forced the Y-axis to have limits of 25 and 175. In this case, using range-frame limits produced such a large, empty gap below 50 that it was difficult to tell what the lower values were.
If these graphics are not clearly superior to the detonator plot of Ramakrishna, et al., then I need to quit working on The Rfunc Project, so I can play more golf.
Comparison #1: R's boxplot() function
Tufte promoted John Tukey's box plot and suggested cool ways to increase the data-ink ratio. But how well do box plots actually show the data? Consider this basic use of R's boxplot() function.
The Toothpics() Rfunc file contains the code that created .TimeFirstStool and .Treatment.
Statements to produce box plot ... abstruse!
PlotDirector(PlotSize=list(w=6.5, h=5), CloseOld=TRUE)
par(mar=c(5, 7, 6, 2) + 0.1, mgp=c(2.3, 0.7, 0.5), bty="n")
boxplot(.TimeFirstStool ~ .Treatment, names=
c("Standard Only","Rice Flour","Resistant Starch"),
yaxt="n", xaxt="n", yaxs="i", ylim = c(20,180))
title(paste("Basic Use of R's boxplot()\n",
"(data fictionalized from Ramakrishna et al., 2000)"))
axis(side=1,tck=0.00, at=1:3,
labels=c("Standard Only","Rice Flour","Resistant Starch"), lwd=0,
cex.axis=1)
axis(side=2,at=seq(25,175,25), cex.axis=0.85, las=1)
mtext("Supplement to Standard Therapy", side=1, line=2.5, cex=1.25)
mtext("Time to First Formed Stool (hr)", side=2, line=3, cex=1.25)
Take a moment to see all the abstruse R graphics specifications I used to produce this plot, such as
par(mar=c(5, 7, 6, 2) + 0.1, mgp=c(2.3, 0.7, 0.5), bty="n")
This indicates what Toothpics() handles "automagically" behind the scenes--and why I may never "finish" developing it.
Box plot produced
This is not the place to discuss how well box plots satisfy Tufte's principles. They certainly display more information than detonator plots, but Toothpics() communicates much more with no increase in ink.
Comparison #2: ggplot2's dot plot
Hadley Wickham's ggplot2 R package is probably the most popular of such tools today. Let's see how its dot plot tool handles this problem.
ggplot2 statements to produce a dot plot
notNA <- !(is.na(.Treatment) | is.na(.TimeFirstStool))
dframe <- data.frame(.Treatment, .TimeFirstStool)[notNA,]
PlotDirector(PlotSize=list(w=6.5, h=5), CloseOld=TRUE)
require(ggplot2)
ggplot(dframe, aes(x = .Treatment, y = .TimeFirstStool)) +
geom_dotplot(binaxis = "y", stackdir = "centerwhole", binwidth=5)
While I trust that a ggplot2 expert could improve on what I did here, the many ggplot2 dot plots I've seen have the same basic look.
Dot plot produced by ggplot2
Dot plots adhere better to Tufte's principles. I've seen a few statistics-savvy biologists do a fine job summarizing their entire study with one well-crafted dot plot. However, this ggplot2 tool partitions all the values into bins. Notice the two smallest observations in the Resistant Starch group (blue arrow). The values are 38.0 and 35.0, but here they appear as ties. One can reduce the binwidth= value, but this can make the data symbols excessively tiny.
Notice the same two observations in the Toothpics() plots. With no reason to cluster or jitter such values, FitterJitter() and Toothpics() worked to display these two observations distinctly and accurately. FitterJitter() only jitters when necessary and does so minimally.
Also, even though the shading and grid lines in the ggplot2 graphic are muted enough as to not be bad chart junk, they still unnecessarily compete with the data and reduce the data-to-ink ratio. A ggplot2 fan told me he easily corrects this, but few people seem to.