I have the following data:
set.seed(3)
library(data.table)
library(lme4)
a <- rep(1:5, times = 20)
b <- rep(c(1,1,1,1,1,2,2,2,2,2), times = 10)
ppt <- rep(101:110, each = 10)
item <- rep(1:10, times = 10)
dv <- rnorm(n = 100)
data <- data.table(cbind(ppt, item, a, b, dv))
data$ppt <- as.factor(data$ppt)
data$item <- as.factor(data$item)
data$a <- as.factor(data$a)
data$b <- as.factor(data$b)
# Sum-code a: this has to come after a is a factor, and a has 5 levels
contrasts(data$a) <- contr.sum(5)
I would like to get a coefficient for each level of a. u/omsa_d00d and u/dead-serious pointed me to the idea of running a model without an intercept.
If I run this model:
m1 <- lmer(dv ~ a + b -1 +(1|ppt) + (1|item), data = data)
I get coefficients for each level of a.
However, if I run this model, in which b comes first:
m2 <- lmer(dv ~ b + a -1 +(1|ppt) + (1|item), data = data)
I get coefficients for each level of b, but not a.
What exactly is happening in each case?
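One way to look into this, as far as I can tell, is to inspect the fixed-effects design matrix that each formula produces. The sketch below uses base R's model.matrix() on a stripped-down copy of the predictors. The names d, X1 and X2 are made up for illustration, and the random effects are left out, which I am assuming does not change how the fixed-effects columns are built.

```r
# Illustrative copy of the fixed-effects predictors only (d is a made-up name)
d <- data.frame(
  a = factor(rep(1:5, times = 20)),
  b = factor(rep(rep(1:2, each = 5), times = 10))
)
contrasts(d$a) <- contr.sum(5)  # sum coding for a; b keeps the default treatment coding

# Same fixed-effects parts as m1 and m2
X1 <- model.matrix(~ a + b - 1, data = d)
X2 <- model.matrix(~ b + a - 1, data = d)

colnames(X1)  # a comes first in the formula
colnames(X2)  # b comes first in the formula
unique(X1)    # distinct rows show how each a/b combination is coded
unique(X2)
```

With the intercept removed, R seems to give whichever factor comes first one indicator column per level, while the other factor keeps the contrasts that were set for it. So a is only sum coded in the second matrix, and in the first one the contr.sum() setting on a is effectively ignored.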
Additionally, is running m1 sufficient to get an effect of each level of a compared to the grand mean, while also controlling for b?
Does it matter if I mean-centre my predictors first?
What are the different implications of dummy vs. sum coding factor a?
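To make the grand-mean question concrete, here is a minimal sketch of what I have in mind, under some loud assumptions: it uses lm() instead of lmer() (so no random effects), sum codes both a and b, keeps the intercept, and relies on this toy design being balanced. The names d, fit, dev and a_coefs are made up for illustration.

```r
set.seed(3)
d <- data.frame(
  a  = factor(rep(1:5, times = 20)),
  b  = factor(rep(rep(1:2, each = 5), times = 10)),
  dv = rnorm(100)
)
contrasts(d$a) <- contr.sum(5)
contrasts(d$b) <- contr.sum(2)

# Intercept kept: with sum coding in a balanced design it equals the grand mean
fit <- lm(dv ~ a + b, data = d)

# Deviation of each level of a from the grand mean, computed by hand
dev <- as.numeric(tapply(d$dv, d$a, mean)) - mean(d$dv)

# Coefficients a1..a4 should match the first four deviations;
# the fifth level is minus the sum of the other four
a_coefs <- as.numeric(coef(fit)[2:5])
all.equal(a_coefs, dev[1:4])
all.equal(-sum(a_coefs), dev[5])
all.equal(as.numeric(coef(fit)[1]), mean(d$dv))
```

If a were dummy (treatment) coded instead, with the intercept kept, each coefficient would be the difference from the reference level rather than from the grand mean, as far as I understand it. Whether the same reading carries over to the lmer() fit with random effects is part of what I am asking.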