I want to block randomize my data into 3 arms, balancing both gender and smoking status as well as possible.
Here is some simulated data similar to my actual data. Note that males & females and smokers & non-smokers are unevenly sampled.
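set.seed(33)
# gender, measurement and outcome have length 50; data.frame() recycles them
# to match the 100 values of smoker, so mydata has 100 rows (80 female, 20 male).
mydata <- data.frame("gender"=rep(c("female", "male"), times=c(40,10)),
                     "smoker"=rep(c("yes", "no"), each=50),
                     "measurement"=rnorm(n=50, mean=15, sd=3),
                     "outcome of interest"= rep(c("positive", "negative"), times=c(20,30)))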
head(mydata)
#   gender smoker measurement outcome.of.interest
# 1 female    yes   12.309256            positive
# 2 female    yes   15.554548            positive
# 3 female    yes   19.763536            positive
# 4 female    yes   11.608873            positive
# 5 female    yes   14.759245            positive
# 6 female    yes    15.39726            positive
I found the randomizr package useful for randomizing according to one variable, but then the other variable ends up unevenly distributed across the arms:
set.seed(2)
library(randomizr)
Z <- block_ra(blocks = mydata[,"gender"], num_arms = 3)
table(Z, mydata$gender)
# Z    female male
#   T1     26    7
#   T2     27    6
#   T3     27    7
table(Z, mydata$smoker)
# Z    no yes
#   T1 17  16
#   T2 13  20
#   T3 20  14
# Blocking on smoking status instead balances smoker but not gender
Z <- block_ra(blocks = mydata[,"smoker"], num_arms = 3)
table(Z, mydata$smoker)
# Z    no yes
#   T1 17  17
#   T2 17  16
#   T3 16  17
table(Z, mydata$gender)
# Z    female male
#   T1     29    5
#   T2     24    9
#   T3     27    6
How can I block randomize according to 2 or more parameters?
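One common way to block on several variables at once is to cross them into a single factor, so that each gender-by-smoker combination becomes its own block, and then randomize within those blocks. The following is only a minimal base R sketch of that idea, not randomizr's own algorithm: `assign_within_blocks()` and the arm labels are hypothetical names introduced for illustration, and the data frame is rebuilt with just the two blocking variables so the snippet runs on its own.

```r
# Rebuild the two blocking variables from the simulated data (100 rows).
set.seed(33)
mydata <- data.frame(
  gender = rep(rep(c("female", "male"), times = c(40, 10)), times = 2),
  smoker = rep(c("yes", "no"), each = 50)
)

# Cross the covariates: one block per gender x smoker combination.
strata <- interaction(mydata$gender, mydata$smoker, sep = "_")

# Hypothetical helper (an assumption, not part of randomizr):
# complete randomization into the given arms within each block.
assign_within_blocks <- function(blocks, arms = c("T1", "T2", "T3")) {
  z <- character(length(blocks))
  for (idx in split(seq_along(blocks), blocks)) {
    # Recycle the arm labels to the block size; the arm order is shuffled
    # first so that leftover units do not always go to the same arm.
    labels <- rep(sample(arms), length.out = length(idx))
    # Shuffle which unit in the block receives which label.
    z[idx] <- sample(labels)
  }
  factor(z, levels = arms)
}

Z <- assign_within_blocks(strata)

# Arm sizes within each crossed block differ by at most one unit.
table(Z, strata)

# Marginal balance on each covariate follows from the within-block balance.
table(Z, mydata$gender)
table(Z, mydata$smoker)
```

Because the block sizes here (40 and 10) are not multiples of 3, exact balance is impossible; the arms can only be made to differ by at most one unit per block. With randomizr, the same crossed factor can presumably be passed directly, as in `block_ra(blocks = strata, num_arms = 3)`, since `blocks` accepts any vector of block labels.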