Channel: Active questions tagged r - Stack Overflow
split a data frame with duplicate splits and name the new list

I have a data frame which looks like:

   cat        split_me       
   <chr>      <chr>          
 1 MVHYM7693B c(1, 7)        
 2 ZRRBS1363E c(2, 7, 18)    
 3 ZXYLV2407F 3              
 4 HXPPE8608M 4              
 5 JDARX0644Q c(5, 19)       
 6 HDBOK8136L 6              
 7 DCJPS0833K c(1, 2, 7, 18) 

I can use the following to split the data:

splt <- to_split %>% 
  split(.$split_me)

This gives me a list of 19 elements, but the original data has 20 rows. The value that repeats is c(5, 19), which appears in both row 5 and row 19, so those two rows end up in the same list element. How can I keep the repeated value from being merged, so that c(5, 19) is split out twice?

I want to name the splits according to the cat column in to_split. The two c(5, 19) splits will therefore have different names (JDARX0644Q and BZRXF3978Z).
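One possible way to get 20 named elements, sketched here under assumptions: split on row position instead of on split_me, then assign the cat values as names. The snippet rebuilds to_split as a plain data.frame (not a tibble) so that it runs without extra packages; that substitution is only for illustration.

```r
# Stand-in for to_split from the Data section, as a base data.frame.
to_split <- data.frame(
  cat = c("MVHYM7693B", "ZRRBS1363E", "ZXYLV2407F", "HXPPE8608M",
          "JDARX0644Q", "HDBOK8136L", "DCJPS0833K", "UGDYS1458B",
          "ROQIP3617B", "HZMGG4347S", "EHESH8836T", "YGXZY0073I",
          "NMRDZ9798F", "WXBKD9937H", "JEMQK6388P", "QQMSV0889M",
          "IBMJM4467Q", "IOIDB2993Q", "BZRXF3978Z", "NJLNW3044Z"),
  split_me = c("c(1, 7)", "c(2, 7, 18)", "3", "4", "c(5, 19)", "6",
               "c(1, 2, 7, 18)", "8", "9", "10", "11", "12", "c(13, 18)",
               "14", "15", "16", "17", "c(2, 7, 13, 18)", "c(5, 19)", "20"),
  stringsAsFactors = FALSE
)

# Split on the row index rather than on split_me, so that rows sharing
# the same split_me value ("c(5, 19)") land in separate list elements.
splt <- split(to_split, seq_len(nrow(to_split)))

# Name each element after the cat value of its row.
names(splt) <- to_split$cat

length(splt)                   # one element per row
splt[["JDARX0644Q"]]$split_me  # row 5
splt[["BZRXF3978Z"]]$split_me  # row 19
```

This relies on cat being unique per row, which holds for the data shown; with repeated cat values the names would clash.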

Data:

to_split <- structure(list(cat = c("MVHYM7693B", "ZRRBS1363E", "ZXYLV2407F", 
"HXPPE8608M", "JDARX0644Q", "HDBOK8136L", "DCJPS0833K", "UGDYS1458B", 
"ROQIP3617B", "HZMGG4347S", "EHESH8836T", "YGXZY0073I", "NMRDZ9798F", 
"WXBKD9937H", "JEMQK6388P", "QQMSV0889M", "IBMJM4467Q", "IOIDB2993Q", 
"BZRXF3978Z", "NJLNW3044Z"), split_me = c("c(1, 7)", "c(2, 7, 18)", 
"3", "4", "c(5, 19)", "6", "c(1, 2, 7, 18)", "8", "9", "10", 
"11", "12", "c(13, 18)", "14", "15", "16", "17", "c(2, 7, 13, 18)", 
"c(5, 19)", "20")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-20L))

EDIT:

What I think is best for my data is to re-order the values inside each entry of the split_me column. At the moment the data looks like:

# A tibble: 20 x 2
   cat        split_me       
   <chr>      <chr>          
 1 MVHYM7693B c(1, 7)        
 2 ZRRBS1363E c(2, 7, 18)    
 3 ZXYLV2407F 3              
 4 HXPPE8608M 4              
 5 JDARX0644Q c(5, 19)       
 6 HDBOK8136L 6              
 7 DCJPS0833K c(1, 2, 7, 18) 
 8 UGDYS1458B 8              
 9 ROQIP3617B 9              
10 HZMGG4347S 10             
11 EHESH8836T 11             
12 YGXZY0073I 12             
13 NMRDZ9798F c(13, 18)      
14 WXBKD9937H 14             
15 JEMQK6388P 15             
16 QQMSV0889M 16             
17 IBMJM4467Q 17             
18 IOIDB2993Q c(2, 7, 13, 18)
19 BZRXF3978Z c(5, 19)       
20 NJLNW3044Z 20

Here the 1 in c(1, 7) refers to row 1 of the data and the 7 refers to row 7. I think I should re-arrange each entry so that the row's own number comes first:

Row 2 does not change: in c(2, 7, 18) the 2 is already first. Row 5 does not change either, since the 5 in c(5, 19) is first and matches row number 5.

Row 7 changes. It is originally c(1, 2, 7, 18), where 7 is third in the sequence; I want to move it to the front, giving c(7, 1, 2, 18).

Row 13 does not change. Row 18 changes: original c(2, 7, 13, 18), desired output c(18, 2, 7, 13). Row 19 changes: original c(5, 19), desired output c(19, 5).

This should fix the duplicate issue and the problems I have later on in the code. For example, for the split c(5, 19) I want the data from row 5 arranged above the data from row 19 (in a data frame), and for the split c(19, 5) I want the data from row 19 above the data from row 5. (I hope this makes sense.)
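The re-ordering described above could be sketched roughly as follows. This is only an illustration in base R: the helper name fmt and the intermediate objects nums, reordered and split_me_new are made up for the example, and the numbers are extracted with a regular expression on the assumption that every entry is either a bare integer or a c(...) of integers.

```r
# The split_me column from the Data section, copied as a character vector.
split_me <- c("c(1, 7)", "c(2, 7, 18)", "3", "4", "c(5, 19)", "6",
              "c(1, 2, 7, 18)", "8", "9", "10", "11", "12", "c(13, 18)",
              "14", "15", "16", "17", "c(2, 7, 13, 18)", "c(5, 19)", "20")

# Pull the integers out of each string.
nums <- lapply(regmatches(split_me, gregexpr("[0-9]+", split_me)), as.integer)

# Move the row's own number to the front; the other values keep their order.
# If the row number is not among the values, the entry is left unchanged.
reordered <- Map(function(v, i) c(v[v == i], v[v != i]), nums, seq_along(nums))

# Hypothetical helper: turn an integer vector back into the original notation.
fmt <- function(v) {
  if (length(v) > 1) paste0("c(", paste(v, collapse = ", "), ")") else as.character(v)
}
split_me_new <- vapply(reordered, fmt, character(1))

split_me_new[c(5, 7, 18, 19)]
anyDuplicated(split_me_new)  # 0 means no repeated values remain
```

With the entries stored as integer vectors in reordered, indexing the data frame with one of them (for example rows in the order 19, 5) would presumably give the "19 above 5" arrangement described above.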

