Google Forms outputs the survey answers for tickbox questions (i.e. multiple answers can be ticked in a single question) as a single variable with the ticked answers separated by a semicolon.
I now want to turn this compound variable into one dummy variable/column per unique answer. I.e. I start from
df$question <- c("answer1; answer2", "answer1; answer3", "answer2; answer3")
and I want to arrive at:
df$answer1 <- c(1, 1, 0) #i.e. observations 1 and 2 ticked answer 1, observation 3 didn't
df$answer2 <- c(1, 0, 1)
df$answer3 <- c(0, 1, 1)
So far, I have separated the string variable into unique values by using separate_rows() from the tidyverse. This has produced multiple rows per observation, each containing only one of the answers:
df$question <- c("answer1", "answer2", "answer1", "answer3", "answer2", "answer3")
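To make this intermediate long format reproducible, here is a minimal sketch on the toy data. It uses base R's strsplit() as a stand-in for separate_rows(), and the `id` column is an assumption added only to keep track of which observation each row came from; it is not part of the original data.

```r
# Toy data; `id` is a hypothetical column identifying each observation
df <- data.frame(
  id = 1:3,
  question = c("answer1; answer2", "answer1; answer3", "answer2; answer3"),
  stringsAsFactors = FALSE
)

# Split each compound string on the semicolon and any surrounding spaces
parts <- strsplit(df$question, " *; *")

# One row per ticked answer, repeating the id of the originating observation
long <- data.frame(
  id = rep(df$id, lengths(parts)),
  question = unlist(parts),
  stringsAsFactors = FALSE
)
```

Keeping an identifier like this alongside the split answers is what makes it possible to fold the rows back into one row per observation later.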
Then, I have spread this column into dummies using model.matrix:
dummies_personalpl <- model.matrix( ~ personalpl - 1, data = survey_factors_personalpl)
Now, I have the 6 dummy columns I wanted, but I still need to get rid of the additional rows per observation that were produced in the process. In my example above, observation 1 would have 2 rows since it ticked 2 answers, but I of course only want 1 row per observation, with the dummies of the ticked answers set to 1.
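To illustrate the collapsing step on the toy data, here is a minimal sketch. It assumes the long data carries a hypothetical `id` column marking the observation each row belongs to (not present in the code above), and it sums the dummy rows per `id` with base R's rowsum(). This is one possible way to do it, not necessarily the best one.

```r
# Long format: one row per ticked answer; `id` is a hypothetical observation identifier
long <- data.frame(
  id = c(1, 1, 2, 2, 3, 3),
  question = c("answer1", "answer2", "answer1", "answer3", "answer2", "answer3")
)

# One dummy column per unique answer, still one row per ticked answer
dummies_long <- model.matrix(~ question - 1, data = long)

# Sum the dummy rows within each observation, giving one row per id
dummies <- rowsum(dummies_long, group = long$id)

# Drop the variable-name prefix that model.matrix adds to the column names
colnames(dummies) <- sub("^question", "", colnames(dummies))
```

Because each answer can be ticked at most once per observation, the per-observation sums are either 0 or 1, so the result is already a dummy matrix.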
Is my approach valid? Is there a shorter way to extract the unique answers from the original variable and turn them into dummy variables/columns right away without having to spread them over several rows per observation?
Thanks in advance, Daniel