I have a clinical dataset with one row per subject (identified by a subject ID) and one column per variable. I want to build a prediction model, so I split the data into training and test sets. I then fit a logistic regression model, but the summary of the fit lists the subject IDs as coefficients instead of the columns/variables.
This is what the dataset looks like:
subjectkey        sex  height    weight     interview_age  flanker_score  cardsort_score  intbehaviour_score
NDAR_INV09AUXBBT  M    59.00000  104.00000  118            107            109             GOOD
NDAR_INV0BVP2PTD  F    50.25000   60.00000  120             92            103             GOOD
NDAR_INV0CV2Y4YR  M    55.30000   97.00000  120             83             94             BAD
NDAR_INV0X45NBYM  M    63.50000  104.50000  128            101            103             BAD
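In case it helps to reproduce this, the four rows above can be rebuilt as a data frame roughly as follows. This is only a sketch: the object name `example_rows` is hypothetical, and I am assuming that `subjectkey` and `sex` are stored as text and that `intbehaviour_score` is a factor, which may not match how the real file is read in.

```r
# Hypothetical reconstruction of the four rows shown above.
# Column types are assumptions, not taken from the real dataset.
example_rows <- data.frame(
  subjectkey = c("NDAR_INV09AUXBBT", "NDAR_INV0BVP2PTD",
                 "NDAR_INV0CV2Y4YR", "NDAR_INV0X45NBYM"),
  sex = c("M", "F", "M", "M"),
  height = c(59.00, 50.25, 55.30, 63.50),
  weight = c(104.0, 60.0, 97.0, 104.5),
  interview_age = c(118, 120, 120, 128),
  flanker_score = c(107, 92, 83, 101),
  cardsort_score = c(109, 103, 94, 103),
  intbehaviour_score = factor(c("GOOD", "GOOD", "BAD", "BAD")),
  stringsAsFactors = FALSE
)

# Show the type of each column; note that subjectkey has one
# distinct value per row.
str(example_rows)
```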
This is the code I'm using to fit the model:
data.train.glm <- glm(intbehaviour_score~., data = data.train, family = binomial)
#summary of fit
summary(data.train.glm)
This is the output I'm getting:
Call:
glm(formula = intbehaviour_score ~ ., family = binomial, data = data.train)
Deviance Residuals:
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[34] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[67] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[100] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[133] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[166] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[199] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[232] 0 0 0 0
Coefficients: (11 not defined because of singularities)
                             Estimate Std. Error z value Pr(>|z|)
(Intercept)                -2.657e+01  3.561e+05       0        1
subjectkeyNDAR_INV0BVP2PTD -5.916e-13  5.036e+05       0        1
subjectkeyNDAR_INV0CV2Y4YR  5.313e+01  5.036e+05       0        1
subjectkeyNDAR_INV0X45NBYM  5.313e+01  5.036e+05       0        1
subjectkeyNDAR_INV10EP1VM2 -6.084e-13  5.036e+05       0        1
I don't understand why the subject IDs are showing up as coefficients rather than the other variables.
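One possible explanation, sketched under assumptions: in a formula, `~ .` means "every other column in `data`", so if `subjectkey` is an ordinary text or factor column it is treated as a categorical predictor with one level per subject. That would give one dummy coefficient per ID, a perfect fit (all residuals 0), and the remaining variables dropped as singular. The example below uses made-up data (the name `toy` and its values are hypothetical, not the real clinical data) to show the effect and one way to leave the ID out of the formula.

```r
set.seed(1)
n <- 60

# Made-up data with a similar layout; NOT the real clinical dataset.
toy <- data.frame(
  subjectkey = sprintf("ID_%03d", seq_len(n)),  # one unique ID per row
  height = rnorm(n, mean = 57, sd = 4),
  flanker_score = rnorm(n, mean = 95, sd = 10),
  stringsAsFactors = FALSE
)
toy$intbehaviour_score <- factor(
  ifelse(runif(n) < 0.5, "GOOD", "BAD"),
  levels = c("BAD", "GOOD")
)

# `~ .` includes subjectkey, which expands into n - 1 dummy variables.
# Warnings about fitted probabilities of 0 or 1 are expected here.
fit_all <- suppressWarnings(
  glm(intbehaviour_score ~ ., data = toy, family = binomial)
)
sum(grepl("^subjectkey", names(coef(fit_all))))

# Removing the ID from the formula leaves only the real predictors.
fit_no_id <- glm(intbehaviour_score ~ . - subjectkey,
                 data = toy, family = binomial)
names(coef(fit_no_id))
```

If this is what is happening, an alternative to `. - subjectkey` would be to drop the ID column from `data.train` before fitting, or to keep the IDs as row names rather than as a column.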