I have a function in which I'm trying to use stat_summary() to plot the value of the median just above the median line of a geom_boxplot(). I've reduced my problem to a toy example that is simpler but keeps the context.
```r
library(ggplot2)

set.seed(20191120)
dat <- data.frame(var = sample(c("a", "b"),
                               50,
                               replace = TRUE),
                  value = rpois(50, 5))
lims <- c(0, 10)

myplot <- function(DATA, YLIMS) {
  ggplot(data = DATA,
         aes(x = var)) +
    geom_boxplot(aes(y = value),
                 outlier.shape = NA,
                 coef = 0) +
    stat_summary(aes(y = ifelse(value > (YLIMS[2]*0.9),   # if median in top 10% of plot window
                                (value - (YLIMS[2]/10)),  # put it below bar
                                (value + (YLIMS[2]/10))), # else put it above
                     label = round(..y.., 2)),            # round(median(value), 2))
                 fun.y = median,
                 geom = "text") +
    coord_cartesian(ylim = YLIMS)
}

myplot(dat, lims)
```
My actual plots have several facets and a variety of ranges, and some of the medians are at the top or bottom of the range. As you can see, I've excluded the whiskers (coef = 0) and the outliers (outlier.shape = NA). This is where the YLIMS argument comes in: it zooms in on the boxes and cuts out unused plot space. I've also used the YLIMS values to offset the label by 10% of the upper limit (YLIMS[2]/10), above or below the median, which works out perfectly.
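Written as a function of a single median, the positioning rule described above looks like this. It is a minimal base-R sketch; `label_y` is a hypothetical name, and the rule is meant to be applied to the median, not to each observation.

```r
# Hypothetical helper: offset a median m by 10% of the upper y-limit.
# Medians in the top 10% of the window go below the line, all others above.
label_y <- function(m, ylims) {
  offset <- ylims[2] / 10
  ifelse(m > ylims[2] * 0.9, m - offset, m + offset)
}

label_y(5, c(0, 10))    # 6   (label above the median line)
label_y(9.5, c(0, 10))  # 8.5 (label below the median line)
```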
I tried using ..y.. in the label argument of stat_summary(aes()) to get the value of the median, but it picks up the shifted value instead. In the resulting plot, both labels should read "5", but they read "6" because 10% of 10 (i.e. 1) has already been added.
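A base-R sketch of why the label reads 6, using a hypothetical vector whose median is 5: the y aesthetic shifts every observation first, and only then does stat_summary() apply the median, so the median (and therefore ..y..) is computed over already-shifted values.

```r
# Hypothetical toy values with median 5; limits as in the question
v <- c(2, 4, 5, 5, 7, 10)
ylims <- c(0, 10)

# What the y aesthetic does: shift each observation before summarising
shifted <- ifelse(v > ylims[2] * 0.9, v - ylims[2] / 10, v + ylims[2] / 10)

median(v)        # 5: the value the label should show
median(shifted)  # 6: the y that stat_summary() returns, i.e. ..y..
```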
I also tried recalculating the median inside aes() (the commented-out round(median(value), 2)), but that takes a single median of all the data and ignores groupings, facets, etc.
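A base-R illustration, with hypothetical data, of why median(value) inside aes() ignores groups: aesthetic expressions are evaluated on the layer's full data before ggplot2 splits it by group or facet, so the call returns one overall median rather than one per box.

```r
# Hypothetical two-group data
toy <- data.frame(var = rep(c("a", "b"), each = 3),
                  value = c(1, 2, 3, 7, 8, 9))

median(toy$value)                   # 5: one number shared by every label
tapply(toy$value, toy$var, median)  # a = 2, b = 8: what each box needs
```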
I know I could refactor the code to precompute the label values and positions in the data, or aggregate first and use stat = "identity" with the boxplot, but I'm wondering whether there is a way to calculate this in-line, as my attempt comes close to doing.
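One in-line option is to pass stat_summary() a `fun.data` function that returns both the label position (`y`) and the label text (`label`) for each group. This is a sketch that assumes a ggplot2 version whose stat_summary() accepts `fun.data` and `fun.args`. Because stat_summary() hands the function each group's raw `y` values, the median is computed per box before any offset is applied. `median_label` is a hypothetical helper name, not part of ggplot2.

```r
library(ggplot2)

# Hypothetical helper: for one group's y values, return a one-row data frame
# with the label position (y) and the rounded, unshifted median (label).
median_label <- function(x, ylims) {
  m <- median(x)
  offset <- ylims[2] / 10
  data.frame(
    y = if (m > ylims[2] * 0.9) m - offset else m + offset,
    label = round(m, 2)
  )
}

set.seed(20191120)
dat <- data.frame(var = sample(c("a", "b"), 50, replace = TRUE),
                  value = rpois(50, 5))
lims <- c(0, 10)

myplot <- function(DATA, YLIMS) {
  ggplot(data = DATA, aes(x = var, y = value)) +
    geom_boxplot(outlier.shape = NA, coef = 0) +
    # the median is taken on raw values; only the label position is offset
    stat_summary(fun.data = median_label,
                 fun.args = list(ylims = YLIMS),
                 geom = "text") +
    coord_cartesian(ylim = YLIMS)
}

p <- myplot(dat, lims)
p
```

The offset rule is the same as in the question, but it is applied to the median rather than to each observation, so the label shows the unshifted median. stat_summary() computes per panel and per group, so with facets each box should get its own median.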