Channel: Active questions tagged r - Stack Overflow

split strings in n columns of a data frame in R


I have the following data frame (actual data has a larger number of columns):

df <- data.frame(
l1=c(ind1='000000',ind2='100100'), 
l2=c(ind1='200204',ind2='124124'), 
l3=c(ind1='400204',ind2='124124'))

In R, I would like to split each column into two columns of three characters each. The new column names don't matter as long as the original column order is preserved. My desired output is therefore:

ind1 000 000 200 204 400 204
ind2 100 100 124 124 124 124

I did find some pointers as to how this could work so I made a function based on one of the answers found in this SO post.

splitGT <- function(x) {
  return(strsplit(x, "(?<=.{3})", perl=TRUE)[[1]])
}
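
To make the behaviour of this helper concrete, here is a small check on a single string. It is only a sketch: the input value is taken from the l2 column above, and the lookbehind pattern splits after every run of three characters, so a six-character string yields two pieces.

```r
# Same helper as above: split a string after every third character.
splitGT <- function(x) {
  strsplit(x, "(?<=.{3})", perl = TRUE)[[1]]
}

# One value from column l2, used purely as an illustration.
pieces <- splitGT("200204")
pieces          # "200" "204"
length(pieces)  # 2
```

Because the function indexes with [[1]], it only handles one string at a time, which is why it has to be applied cell by cell.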

While this does the splitting correctly, applying it to the data frame returns a three-dimensional array with one slice per original column:

apply(df, c(1,2), splitGT)

, , l1

     ind1  ind2 
[1,] "000" "100"
[2,] "000" "100"

, , l2

     ind1  ind2 
[1,] "200" "124"
[2,] "204" "124"

, , l3

     ind1  ind2 
[1,] "400" "124"
[2,] "204" "124"
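
The shape of this result can be inspected directly. The sketch below rebuilds the example data and looks at the dimensions; the name arr is just an illustrative choice. Since splitGT returns two pieces for every cell and apply is called over both margins, the pieces become an extra leading dimension in front of the rows and columns.

```r
df <- data.frame(
  l1 = c(ind1 = "000000", ind2 = "100100"),
  l2 = c(ind1 = "200204", ind2 = "124124"),
  l3 = c(ind1 = "400204", ind2 = "124124"))

splitGT <- function(x) {
  strsplit(x, "(?<=.{3})", perl = TRUE)[[1]]
}

# arr is a hypothetical name for the result shown above.
arr <- apply(df, c(1, 2), splitGT)

dim(arr)        # 2 2 3: pieces, then rows (ind), then columns (l1..l3)
dimnames(arr)   # first dimension unnamed, then the row and column names
arr[, "ind1", "l2"]   # the two pieces of one cell
```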

I managed to get past this with adply from the plyr package, but that produces a data frame with two rows per individual and one column per original column. This is closer to what I need, but it seems far too complicated, and I feel like I am missing something very obvious.

library(plyr)  # provides adply
adply(apply(df, c(1,2), splitGT), c(1, 2))

  X1   X2    l1     l2     l3
1  1 ind1    000    200    400
2  2 ind1    000    204    204
3  1 ind2    100    124    124
4  2 ind2    100    124    124
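
For comparison, one possible shorter route from the three-dimensional array to the layout described at the top is to permute its dimensions and flatten it into a matrix. This is a sketch rather than a definitive solution: it assumes every string has exactly six characters, and arr and out are illustrative names.

```r
df <- data.frame(
  l1 = c(ind1 = "000000", ind2 = "100100"),
  l2 = c(ind1 = "200204", ind2 = "124124"),
  l3 = c(ind1 = "400204", ind2 = "124124"))

splitGT <- function(x) {
  strsplit(x, "(?<=.{3})", perl = TRUE)[[1]]
}

# pieces x rows x columns, as returned by apply over both margins
arr <- apply(df, c(1, 2), splitGT)

# Reorder to rows x pieces x columns, then flatten the last two
# dimensions so the two pieces of each original column sit side by side.
out <- matrix(as.vector(aperm(arr, c(2, 1, 3))),
              nrow = nrow(df),
              dimnames = list(rownames(df), NULL))
out
```

The result is a character matrix with one row per individual and six unnamed columns; wrapping it in as.data.frame would turn it back into a data frame if that is needed.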
