Quantcast
Channel: Active questions tagged r - Stack Overflow
Viewing all articles
Browse latest Browse all 206126

How can I perform constrained randomization in R, nested by custom number of study factors?

$
0
0

In short: I want to achieve randomization of samples in a nested manner, given an arbitrary number of factors.

Background In several types of biochemical analysis there is systematic bias in response due to e.g., instrumental drift or sample degradation. Hence the need to randomize analytical sequence order. However, we also wish to keep samples from the the same individual and e.g. Treatment close together to minimize drift between these samples. Thus, we can employ constrained randomization:

  1. We first randomize between individuals
  2. For each individual, we randomize between treatments
  3. For each treatment, we randomize different time points

Some pseudocode for a three-factor case:

constRandom <- function(metadata, factors) {
 for (i in factors[1]) {
    sample(factors[j]) # random order of the 2nd factor within the 1st factor
    for (j in factors[2]) {
      sample(factor[3]) # random order of the 3rd factor within the 2nd factor
    }
 }
}

I have looked at Constrained Randomization of data frame in R which makes nice code for constrained/nested randomization for two study factors.

The first issue is that I want to be able to perform this for >2 factors (sometimes I have 4 or even 5 such factors). I have coded a (less tidyverse) example for a three factor case:

constRand3 <- function(metadata, factors) {
 newMeta <- metadata # set up df for constrained randomized data
 uniq1 <- unique(metadata[, factors[1]]) %>% sample() # extract random order of 1st factor
 n1 <- length(uniq1) # Number of levels in 1st factor
 nSamp <- 0 # Index of where in newMeta to put in random samples
 # Loop thru randomized levels of 1st factor
 for (i in 1: n1) {
    level1 <- uniq1[i]
    meta1 <- metadata[metadata[,factors[1]]==level1,] # extract samples at the ith level of factor 1
    uniq2 <- unique(meta1[,factors[2]]) %>% sample() # extract random order of 2nd factor within 1st factor
    n2 <- length(uniq2)
    # Loop thru randomized levels of 2nd factor
    for (j in 1:n2) {
      level2 <- uniq2[j]
      meta2 <- meta1[meta1[,factors[2]]==level2,] # extract samples at the jth level of factor 2
      uniq3 <- unique(meta2[,factors[3]]) %>% sample() # extract random order of 2nd factor within 1st factor
      newMeta[(nSamp+1):(nSamp+nrow(meta2)),] <- meta2[match(meta2[,factors[3]], uniq3) %>% order(),]
      nSamp <- nSamp+nrow(meta2)
    }
 }
 return(newMeta)
}

Which seems to perform well, although I'm sure a more tidyverse solution would look nicer (tips on coding solutions and style are more than welcome):

# Make synthetic data
n_ID <- 3
n_Trt <- 3
n_Time <- 3
df <- data.frame(sample=paste0('s',1: (n_ID*n_Trt*n_Time)), ID=paste0('ID', rep(1: n_ID, each=n_Trt*n_Time)), Trt=rep(paste0('Trt', LETTERS[1: n_Trt]), each=n_Trt), time=paste0('t',1: n_Time))
# Perform constrained randomization
constRand3(df,2:4)

 sample ID Trt time
1 s21 id3 TrtA t3
2 s20 id3 TrtA t2
3 s19 id3 TrtA t1
4 s25 id3 TrtC t1
5 s27 id3 TrtC t3
6 s26 id3 TrtC t2
7 s23 id3 TrtB t2
8 s22 id3 TrtB t1
9 s24 id3 TrtB t3
10 s6 id1 TrtB t3
11 s5 id1 TrtB t2
12 s4 id1 TrtB t1
13 s2 id1 TrtA t2
14 s1 id1 TrtA t1
15 s3 id1 TrtA t3
16 s9 id1 TrtC t3
17 s7 id1 TrtC t1
18 s8 id1 TrtC t2
19 s11 id2 TrtA t2
20 s12 id2 TrtA t3
21 s10 id2 TrtA t1
22 s18 id2 TrtC t3
23 s16 id2 TrtC t1
24 s17 id2 TrtC t2
25 s14 id2 TrtB t2
26 s13 id2 TrtB t1
27 s15 id2 TrtB t3

We see that all samples for an individual are kept together (but in random order between individuals). All samples per treatment are kept together (but in random order between treatments) and timepoints per treatment are in random order.

The second and main issue is that I want to be able to generalize this into a function where I can specify a user-defined number of randomization factors, such that I can call:

constRand(df, 1)

To perform a complete randomization on sample level only. Or:

constRand(df, 2:3)

To randomize between individuals and treatments but leave timepoint order intact. Or:

constRand(otherDF, c(4,2,5,7))

To randomize by four factors in a specific order (I hope you'll forgive that the last is not a reproducible example).

But I don't know how to construct a recursive loop or split to accomodate for a user-defined number of factors. Using my for-loop approach, I don't see how I can nest a dynamic amount of levels. Perhaps something using eval/do.call or tidyverse with splits?


Viewing all articles
Browse latest Browse all 206126

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>