I've been trying to fit a GLM to data on the success of a genetic test (yes = successful test; no = unsuccessful test).
> head(dataraw)
   success     pre during season observer
1:      no pre-wet    dry winter   JvD&OK
2:     yes pre-wet    dry winter   JvD&OK
3:      no pre-wet    dry winter   JvD&OK
4:     yes pre-wet    dry winter      JvD
5:     yes pre-wet    dry winter      JvD
6:     yes pre-wet    dry winter      JvD
Four predictor variables are used to explain the occurrence of the response variable success: pre (pre-wet or pre-dry), during (wet or dry), season (winter or fall) and observer (up to 10 different observers).
I would like to find which variables are the most important in explaining a successful test, i.e. success: yes.
I've fitted the models with the code below, with and without interactions between the predictors, and chose the most parsimonious model by comparing AIC values (an information-theoretic approach):
library(effects)  # provides allEffects()
m1 <- glm((success) ~ pre , data=dataraw , family=binomial)
summary(m1)
plot(allEffects(m1))
AIC(m1)
m2 <- glm((success) ~ during , data=dataraw , family=binomial)
summary(m2)
plot(allEffects(m2))
AIC(m2)
m3 <- glm((success) ~ season , data=dataraw , family=binomial)
summary(m3)
plot(allEffects(m3))
AIC(m3)
m4 <- glm((success) ~ observer , data=dataraw , family=binomial)
summary(m4)
plot(allEffects(m4))
AIC(m4)
m5 <- glm((success) ~ pre*during , data=dataraw , family=binomial)
summary(m5)
plot(allEffects(m5))
AIC(m5)
etc.
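The repeated fit-then-AIC pattern above could be collapsed into a single comparison table. The following is only a minimal sketch under stated assumptions: `sim` is simulated stand-in data (not the real `dataraw`), the success probabilities are made up, and the names `forms`, `fits` and `aic_table` are hypothetical. It uses base R only.

```r
# Simulated stand-in for dataraw (hypothetical data, not the real dataset)
set.seed(1)
n <- 200
sim <- data.frame(
  pre    = factor(sample(c("pre-wet", "pre-dry"), n, replace = TRUE)),
  during = factor(sample(c("wet", "dry"), n, replace = TRUE)),
  season = factor(sample(c("winter", "fall"), n, replace = TRUE))
)
# Made-up success probabilities that depend on `during` only
p <- ifelse(sim$during == "dry", 0.7, 0.4)
sim$success <- factor(ifelse(runif(n) < p, "yes", "no"),
                      levels = c("no", "yes"))

# Candidate model formulas kept in a named list
forms <- list(
  pre          = success ~ pre,
  during       = success ~ during,
  season       = success ~ season,
  pre_x_during = success ~ pre * during
)

# Fit every candidate with the same data and family
fits <- lapply(forms, glm, data = sim, family = binomial)

# One row per model, sorted by AIC, with the difference to the best model
aic_table <- data.frame(model = names(fits),
                        AIC = sapply(fits, AIC),
                        row.names = NULL)
aic_table <- aic_table[order(aic_table$AIC), ]
aic_table$delta <- aic_table$AIC - min(aic_table$AIC)
print(aic_table)
```

Keeping the fits in a named list also avoids copy-paste slips between model names when a new candidate is added.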
I'm unsure whether I'm following the right approach and whether my code is correct, especially since I've seen other people use 1 (for yes) and 0 (for no) when using a binomial distribution. Does that matter? Is my dataset dataraw set up correctly?
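On the yes/no versus 1/0 point, R's documentation for the binomial family states that a factor response is read with its first level as failure and all other levels as success. A minimal sketch of checking this on toy data follows. It assumes the response is stored as a factor with levels `no`, `yes` (a plain character column would need converting first); `toy`, `success01`, `fit_factor`, `fit_01` and `same_coefs` are hypothetical names, and the data are random, not the real dataset.

```r
# Toy data (hypothetical): random outcome and one two-level predictor
set.seed(42)
n <- 100
toy <- data.frame(
  during = factor(sample(c("wet", "dry"), n, replace = TRUE))
)

# Factor response: the first level ("no") is treated as failure
toy$success <- factor(sample(c("no", "yes"), n, replace = TRUE),
                      levels = c("no", "yes"))

# Equivalent explicit 0/1 coding: 1 for yes, 0 for no
toy$success01 <- as.integer(toy$success == "yes")

fit_factor <- glm(success ~ during, data = toy, family = binomial)
fit_01     <- glm(success01 ~ during, data = toy, family = binomial)

# Compare the estimated coefficients of the two fits
same_coefs <- isTRUE(all.equal(coef(fit_factor), coef(fit_01)))
print(same_coefs)
```

If the level order were reversed (`yes` first), the factor fit would model the probability of `no` instead, so checking `levels()` on the response before fitting is a cheap safeguard.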
I hope somebody can set me on the right track, and that this question is of interest to others.