I have a data frame which looks like:
cat split_me
<chr> <chr>
1 MVHYM7693B c(1, 7)
2 ZRRBS1363E c(2, 7, 18)
3 ZXYLV2407F 3
4 HXPPE8608M 4
5 JDARX0644Q c(5, 19)
6 HDBOK8136L 6
7 DCJPS0833K c(1, 2, 7, 18)
I can use the following to split the data:
library(dplyr)  # provides the %>% pipe
splt <- to_split %>%
  split(.$split_me)
Which gives me a list of 19 elements. However, the original data had 20 elements. The split which repeats itself is split c(5, 19). How can I ignore this repetition and split c(5, 19) twice?
I want to name the splits according to the cat column in to_split. Therefore c(5, 19) will have two different names (JDARX0644Q and BZRXF3978Z).
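To make the behaviour concrete, here is a minimal sketch using three rows taken from the data (the name to_split_small is only for illustration). It shows why splitting on split_me collapses the two c(5, 19) rows into one group, and one possible way to get one element per row named by cat, assuming the cat values are unique.

```r
# Three rows from the question's data, including the duplicated "c(5, 19)"
to_split_small <- data.frame(
  cat = c("JDARX0644Q", "HDBOK8136L", "BZRXF3978Z"),
  split_me = c("c(5, 19)", "6", "c(5, 19)"),
  stringsAsFactors = FALSE
)

# Splitting on split_me groups identical strings, so the two
# "c(5, 19)" rows end up in the same list element
by_value <- split(to_split_small, to_split_small$split_me)
length(by_value)  # 2, not 3

# Splitting on the row index keeps one list element per row
by_row <- split(to_split_small, seq_len(nrow(to_split_small)))

# Name each element after its cat value (assumes cat is unique)
names(by_row) <- to_split_small$cat
names(by_row)
```

With this approach both c(5, 19) rows survive as separate elements, one named JDARX0644Q and one named BZRXF3978Z.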
Data:
to_split <- structure(list(cat = c("MVHYM7693B", "ZRRBS1363E", "ZXYLV2407F",
"HXPPE8608M", "JDARX0644Q", "HDBOK8136L", "DCJPS0833K", "UGDYS1458B",
"ROQIP3617B", "HZMGG4347S", "EHESH8836T", "YGXZY0073I", "NMRDZ9798F",
"WXBKD9937H", "JEMQK6388P", "QQMSV0889M", "IBMJM4467Q", "IOIDB2993Q",
"BZRXF3978Z", "NJLNW3044Z"), split_me = c("c(1, 7)", "c(2, 7, 18)",
"3", "4", "c(5, 19)", "6", "c(1, 2, 7, 18)", "8", "9", "10",
"11", "12", "c(13, 18)", "14", "15", "16", "17", "c(2, 7, 13, 18)",
"c(5, 19)", "20")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-20L))
EDIT:
What I think is best for my data is to re-order the split_me column. At the moment the data looks like:
# A tibble: 20 x 2
cat split_me
<chr> <chr>
1 MVHYM7693B c(1, 7)
2 ZRRBS1363E c(2, 7, 18)
3 ZXYLV2407F 3
4 HXPPE8608M 4
5 JDARX0644Q c(5, 19)
6 HDBOK8136L 6
7 DCJPS0833K c(1, 2, 7, 18)
8 UGDYS1458B 8
9 ROQIP3617B 9
10 HZMGG4347S 10
11 EHESH8836T 11
12 YGXZY0073I 12
13 NMRDZ9798F c(13, 18)
14 WXBKD9937H 14
15 JEMQK6388P 15
16 QQMSV0889M 16
17 IBMJM4467Q 17
18 IOIDB2993Q c(2, 7, 13, 18)
19 BZRXF3978Z c(5, 19)
20 NJLNW3044Z 20
Here 1 in the c(1, 7) corresponds to row 1 of the data and 7 corresponds to row 7. I think I should re-arrange the column such that:
Row 2 does not change, i.e. c(2, 7, 18): the 2 is first and thus does not need to change.
Row 5 also does not change, since the 5 in c(5, 19) is first and matches the row number 5.
Row 7 changes. Originally it is c(1, 2, 7, 18); however, 7 is third in the sequence and I want to move it to the first position, giving c(7, 1, 2, 18).
Row 13 does not change.
Row 18 changes: original c(2, 7, 13, 18), desired output c(18, 2, 7, 13).
Row 19 changes: original c(5, 19), desired output c(19, 5).
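The re-ordering rule described above (move each row's own number to the front, keep the rest in their original order) could be sketched as follows. This is only one possible approach in base R; the helper to_string and the intermediate names ids, reordered and reordered_chr are made up for illustration, and it assumes every entry of split_me contains only non-negative integers.

```r
# The split_me column from the question's data
split_me <- c("c(1, 7)", "c(2, 7, 18)", "3", "4", "c(5, 19)", "6",
              "c(1, 2, 7, 18)", "8", "9", "10", "11", "12", "c(13, 18)",
              "14", "15", "16", "17", "c(2, 7, 13, 18)", "c(5, 19)", "20")

# Extract the integers from each string (avoids eval(parse()))
ids <- lapply(regmatches(split_me, gregexpr("[0-9]+", split_me)), as.integer)

# Put the row's own number first, then the remaining values in original order.
# Note: if a row number were missing from its own entry, this would add it.
reordered <- Map(function(v, i) c(i, setdiff(v, i)), ids, seq_along(ids))

# Hypothetical helper: rebuild the original "c(a, b)" / "a" notation
to_string <- function(v) {
  if (length(v) == 1) as.character(v)
  else paste0("c(", paste(v, collapse = ", "), ")")
}
reordered_chr <- vapply(reordered, to_string, character(1))

reordered_chr[c(7, 18, 19)]
```

After this step every value in reordered_chr is distinct, so splitting on it no longer merges rows 5 and 19.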
This should fix the duplicate issue and the problems I have later on in the code. For example, for split c(5, 19) I want the data contained in 5 to be arranged above the data in 19 (in a data frame). Then for split c(19, 5) the data in 19 will be above the data in 5. (I hope this makes sense.)
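As a small illustration of the "above/below" ordering, the sketch below uses a made-up data frame dat (not the real data) in which id records the original row number. Indexing rows with a vector returns them in the order the indices are given, which is why the order inside each split matters.

```r
# Hypothetical data: 20 rows, where id records the original row number
dat <- data.frame(id = 1:20, value = letters[1:20])

# Index vectors as they would look after re-ordering rows 5 and 19
split_5  <- c(5, 19)
split_19 <- c(19, 5)

# Row indexing keeps the order of the index vector
dat[split_5, ]   # row 5 above row 19
dat[split_19, ]  # row 19 above row 5
```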