I have a tibble with numerical and logical variables, e.g. like this:
      x     f      y
  <dbl> <int>  <dbl>
1 -2        1 -0.801
2 -1.96     0 -2.27
3 -1.92     0 -1.75
4 -1.88     0 -2.44
5 -1.84     1 -0.123
...
For reproducibility, it can be generated using:
library(tidyverse)
set.seed(0)
tb1 = tibble(
  x = (-50:50)/25,
  p = plogis(x),
  f = rbinom(p, 1, p),
  y = x + f + rnorm(x, 0, .5)
) %>% select(-p)
I'd like to plot the points and draw regression lines, once taking x as the predictor and f as the outcome (logistic regression), and once taking x and f as predictors and y as the outcome (linear regression). This works well for the logistic regression:
ggplot(tb1, aes(x, f)) +
  geom_point() +
  geom_smooth(method="glm", method.args=list(family="binomial"))
produces the expected plot, but:
ggplot(tb1, aes(x, y, colour=f)) +
  geom_point() +
  geom_smooth(method="lm")
produces a plot that is wrong. I want f treated as a factor, producing two regression lines and a discrete legend instead of the continuous-coloured one. I can force f manually to a logical value:
tb2 = tb1 %>% mutate(f = f>0)
and obtain the correct linear regression graph, but now I cannot plot the logistic regression. I get:
Warning message:
Computation failed in `stat_smooth()`:
y values must be 0 <= y <= 1
For some reason, both lm() and glm() have no problems:
summary(glm(f ~ x, binomial, tb1))
summary(lm(y ~ x + f, tb1))
summary(glm(f ~ x, binomial, tb2))
summary(lm(y ~ x + f, tb2))
all produce reasonable results, and the results are identical for tb1 and tb2, as they should be. So is there a way of convincing geom_smooth to accept logical variables, or must I use two redundant variables with identical values but different types, e.g. f.int and f.lgl?
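One possible way around the duplicate column, offered as a sketch rather than a definitive answer: do the type conversion inside aes() instead of in the data. It assumes f is kept as the 0/1 integer from tb1 and wrapped in factor() only for the colour aesthetic, or, starting from the logical tb2, wrapped in as.integer() for the y aesthetic. My understanding is that a logical mapped to y ends up on a discrete position scale, so stat_smooth sees positions 1 and 2 rather than 0 and 1, which would explain the warning. The names p_logit, p_lm and p_logit2 are illustrative only.

```r
library(ggplot2)
library(dplyr)
library(tibble)

# Same data as in the question; f is stored as an integer 0/1 column.
set.seed(0)
tb1 <- tibble(
  x = (-50:50) / 25,
  p = plogis(x),
  f = rbinom(p, 1, p),
  y = x + f + rnorm(x, 0, .5)
) %>% select(-p)

# Logistic fit: f stays numeric on the y axis, so the binomial glm sees 0/1.
p_logit <- ggplot(tb1, aes(x, f)) +
  geom_point() +
  geom_smooth(method = "glm", method.args = list(family = "binomial"))

# Linear fit: factor(f) makes the colour scale discrete and splits the data
# into two groups, so one line is fitted per level of f.
p_lm <- ggplot(tb1, aes(x, y, colour = factor(f))) +
  geom_point() +
  geom_smooth(method = "lm")

# Starting from a logical column instead (assumed variant):
# convert back to 0/1 only for the y aesthetic.
tb2 <- tb1 %>% mutate(f = f > 0)
p_logit2 <- ggplot(tb2, aes(x, as.integer(f))) +
  geom_point() +
  geom_smooth(method = "glm", method.args = list(family = "binomial"))
```

Printing p_logit, p_lm or p_logit2 draws the corresponding plot. Note that geom_smooth fits each colour group separately, so the two lines in p_lm each get their own slope, which is not quite the same model as lm(y ~ x + f).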