Channel: Active questions tagged r - Stack Overflow

R: Encoding tickbox answers from surveys to dummy variables

Google Forms exports the answers to a tickbox question (i.e. a question where several answers can be ticked) as a single variable, with the ticked answers separated by semicolons.

I now want to turn this compound variable into dummy variables, one column per unique answer. That is, I start from

df$question <- c("answer1; answer2", "answer1; answer3", "answer2; answer3")

and I want to arrive at:

df$answer1 <- c(1, 1, 0) # i.e. observations 1 and 2 ticked answer 1, observation 3 didn't
df$answer2 <- c(1, 0, 1)
df$answer3 <- c(0, 1, 1)

So far, I have split the string variable into its individual answers using separate_rows() from the tidyverse (tidyr). This produces multiple rows per observation, each containing only one of the answers, so the column now looks like this:

df$question <- c("answer1", "answer2", "answer1", "answer3", "answer2", "answer3")

Then I spread this column into dummies using model.matrix (shown here with the object and column names from my actual data):

dummies_personalpl <- model.matrix( ~ personalpl - 1, data = survey_factors_personalpl)

Now I have the 6 dummy columns I wanted, but I still need to get rid of the additional rows per observation that were produced in the process. In my example above, observation 1 has 2 rows because it ticked 2 answers, but of course I only want 1 row per observation, with the dummies of all ticked answers set to 1.
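To make the remaining step concrete, here is a minimal sketch of collapsing the long-format dummies back to one row per observation. It assumes an observation identifier (the hypothetical column id below) was added before the rows were split; separate_rows() keeps the other columns, so such an id would be carried along. The data frame long is toy data built from the example, not the real survey.

```r
# Toy long-format data, as it would look after separate_rows();
# "id" is a hypothetical observation identifier added before splitting
long <- data.frame(
  id = c(1, 1, 2, 2, 3, 3),
  question = factor(c("answer1", "answer2", "answer1",
                      "answer3", "answer2", "answer3"))
)

# One dummy column per answer, still one row per (observation, answer) pair
dummies_long <- model.matrix(~ question - 1, data = long)

# Sum the dummy rows within each observation: one row per id
dummies <- rowsum(dummies_long, group = long$id)
```

Since each answer can only be ticked once per observation, the sums stay at 0 or 1. If duplicates were possible, the result would need to be capped at 1.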

Is my approach valid? Is there a shorter way to extract the unique answers from the original variable and turn them into dummy variables/columns right away without having to spread them over several rows per observation?
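For reference, this is roughly what I imagine a direct route could look like in base R, without the intermediate long format. It is only a sketch on the toy data from above: it assumes the answers are separated by ";" with optional surrounding spaces and that no answer text itself contains a semicolon. The helper names ticked and answers are made up for illustration.

```r
# Toy data mirroring the example in the question
df <- data.frame(
  question = c("answer1; answer2", "answer1; answer3", "answer2; answer3"),
  stringsAsFactors = FALSE
)

# Split each response on ";" and trim the surrounding whitespace
ticked <- lapply(strsplit(df$question, ";", fixed = TRUE), trimws)

# All distinct answers that occur anywhere in the column
answers <- sort(unique(unlist(ticked)))

# One 0/1 column per answer: 1 if that observation ticked it, else 0
for (a in answers) {
  df[[a]] <- vapply(ticked, function(x) as.numeric(a %in% x), numeric(1))
}
```

This keeps one row per observation throughout, so nothing has to be collapsed afterwards.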

Thanks in advance, Daniel

