I am attempting to merge two or more columns in an automated way in some survey data. Example data:
data <- data.frame("Q1: What is your gender?" = c("Male","Male",NA,NA,"Male"),
"Q1: What is your gender?" = c(NA,NA,"Female","Female",NA),
"Q2: Where do you live?" = c("North","North",NA,NA,NA),
"Q2: Where do you live?" = c(NA,NA,NA,NA,"South"),
"Q2: Where do you live?" = c(NA,NA,NA,"West",NA),
"Q2: Where do you live?" = c(NA,NA,"East",NA,NA))
data[] <- lapply(data, as.character)
And this is what I want to achieve:
data.wanted <- data.frame("Q1: What is your gender?" = c("Male","Male","Female","Female","Male"),
"Q2: Where do you live?" = c("North","North","East","West","South"))
data.wanted[] <- lapply(data.wanted, as.character)
Each respondent has exactly one non-NA response per question. I had a look at (amongst others) "Merging two columns into one in R", but I can't figure out how to use dplyr's coalesce across many questions, each of which may span a different number of columns. I could do this for each question:
library(dplyr)  # provides coalesce()
data["Q1"] <- coalesce(data[,1],data[,2])
data["Q2"] <- coalesce(data[,3],data[,4],data[,5],data[,6])
i.e. the manual way. However, since I have many questions, each following the above structure, I am really looking for an automated way to do this, either by looping over the questions and picking out their columns by name with grep, or by some alternative method.
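For illustration, here is a rough base-R sketch of the kind of name-based loop I have in mind. It assumes every column name starts with a question id such as Q1 or Q2; the names question_id, first_non_na and merged are made up for this example, and first_non_na is only a base-R stand-in for coalesce. I don't know whether this is a sensible or idiomatic way to do it:

```r
# Example data from above; check.names = FALSE keeps the repeated question
# text as column names (the sketch also works with the mangled default names).
data <- data.frame("Q1: What is your gender?" = c("Male","Male",NA,NA,"Male"),
                   "Q1: What is your gender?" = c(NA,NA,"Female","Female",NA),
                   "Q2: Where do you live?" = c("North","North",NA,NA,NA),
                   "Q2: Where do you live?" = c(NA,NA,NA,NA,"South"),
                   "Q2: Where do you live?" = c(NA,NA,NA,"West",NA),
                   "Q2: Where do you live?" = c(NA,NA,"East",NA,NA),
                   check.names = FALSE, stringsAsFactors = FALSE)

# Assumption: the question id is the leading "Q<number>" of each column name.
question_id <- sub("^(Q[0-9]+).*$", "\\1", names(data))

# Hypothetical helper: element-wise, keep a unless it is NA, else take b.
first_non_na <- function(a, b) ifelse(is.na(a), b, a)

# For each question id, fold its columns together from left to right.
ids <- unique(question_id)
merged <- lapply(ids, function(id) Reduce(first_non_na, data[question_id == id]))
names(merged) <- ids
merged <- as.data.frame(merged, stringsAsFactors = FALSE)
```

Because the columns are grouped by the extracted id rather than by position, the number of columns per question should not matter here.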
Any suggestions are much appreciated!