I'd like a way to compactly use R's formula notation -- or some other formalism -- to include all the quadratic terms between a set of variables A through E, excluding the D:E interaction. (My real problem has a longer list of A-C type variables and D-E type variables.)
I wrote a little function to check my work based on this post (Thanks, @Gregor!).
expand_form <- function(FUN){
  # FUN is a formula: rebuild it from its expanded term labels and its response, FUN[[2]]
  out <- reformulate(labels(terms(FUN)), FUN[[2]])
  out
}
I thought this would do it:
f <- y ~ (A + B + C + D + E)^2 - D:E
> expand_form(f)
y ~ A + B + C + D + E + A:B + A:C + A:D + A:E + B:C + B:D + B:E +
C:D + C:E
<environment: 0x00000218fc153928>
but it does not include the single-variable squared terms. Of course I could just explicitly add those terms as A:A, B:B, etc. -- or no, actually. I just tried that and it has no effect on the output of expand_form(). And neither does adding A^2, B^2, etc. terms. Not sure if this is a problem with my formula or with my expand_form() function.
I looked at five or six posts on related topics, but none seemed to provide a compact solution in formula notation, which I am assuming exists.
In response to @Maurits Evers' very clear and helpful comments/answer below, I want to clarify my question to more clearly recognize:
- that the thing I want to do is what most people will want to do in certain contexts; and
- that I now recognize that R's standard formula notation, used in its usual way, does not do this.
If you have numeric variables, all your two-way interaction terms are second-degree polynomial terms. In that context, if you include interactions between a variable and itself (which you don’t have to do), it is clear that you do not want them interpreted as a second copy of the variable, nor do you want them removed. If that were what you wanted, you would simply not include the self-interaction terms. But this is what R’s standard formula notation does: it interprets the interaction of a numeric variable with itself as identical to the variable, and then removes it as redundant. So formulas that include self-interactions are equivalent to formulas that don’t. I think that is never the behavior one would prefer in a model where all the variables are numeric.
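To make this concrete, here is a small check of how the terms are expanded. It only inspects term labels, so no data are needed; the variable names with_self and with_I are just for illustration. Wrapping the square in I() is, as far as I can tell, the standard way to make the formula treat A^2 arithmetically.

```r
# Self-interaction (A:A) and the formula power operator (A^2) both collapse to A
with_self <- labels(terms(y ~ A + A:A + A^2))
print(with_self)   # "A"

# Wrapping the square in I() protects it, so it survives as its own term
with_I <- labels(terms(y ~ A + I(A^2)))
print(with_I)      # "A" "I(A^2)"
```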
If always removing self-interaction terms is the behavior you want – and with dummy variables it is – R’s formula notation allows you to express any pattern of interactions, including some very complex patterns, very concisely. But the only way I have found to express patterns of interaction that treat self-interactions as squared terms is to individually write out all the squared terms. This is awkward and verbose and in models with a lot of variables I think it will often lead to error. So it seems reasonable to me that in this context the interaction term should normally be the square.
So the question is: is there any straightforward way to tell R’s formula notation that you want to treat self-interaction as squaring? Alternatively, is there a concise way to write such formulas if the formula notation cannot be made to do it?
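For reference, the kind of workaround I have in mind, short of typing every squared term by hand, might look something like the sketch below. It assumes the variable names are available as a character vector (main_vars, squares and f_full are names I made up for this illustration), and it builds the I(x^2) terms programmatically and appends them to the expanded interaction terms. This is not formula notation proper, which is why I am still hoping for something more direct.

```r
# Hypothetical variable list; in the real problem this would be longer
main_vars <- c("A", "B", "C", "D", "E")

# All main effects and two-way interactions, minus D:E
f <- y ~ (A + B + C + D + E)^2 - D:E

# Build protected squared terms: "I(A^2)", "I(B^2)", ...
squares <- paste0("I(", main_vars, "^2)")

# Combine the expanded term labels with the squared terms, keeping the response
f_full <- reformulate(c(labels(terms(f)), squares), response = f[[2]])

print(labels(terms(f_full)))
```

The resulting formula keeps the 14 terms from the original expansion and adds the five squared terms, with D:E still excluded.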
I think this is partially a disciplinary difference. Econometrics is primarily a quasi-experimental field, and we have to take our treatments as we find them. So the treatment effect interpretation of dummy variables does not come as naturally to us.