I am somewhat new to R and am trying to return a column value based on a condition, which is proving rather difficult for me to figure out!
I have two data frames I'm working with. One generally has about 70,000 rows, with a single Unix timestamp in each row (let's call it df1). The other gives a start time and finish time (which I have converted to Unix timestamps) and the name of the activity completed between the start and finish time, for multiple participants (let's call it df2).
I have managed to filter df2 to the participant whose data I am using in df1, which looks like this:
head(df2, 5)
      Name Period.Name Start.Time End.Time Unix.Start.Time Unix.End.Time
27  Name 1    Period 1   17:59:40 18:11:00      1579075181    1579075860
53  Name 1    Period 2   18:11:59 18:15:13      1579075919    1579076114
79  Name 1    Period 3   18:17:55 18:23:22      1579076275    1579076603
96  Name 1    Period 4   18:24:58 18:31:56      1579076699    1579077116
131 Name 1    Period 5   18:37:45 18:45:30      1579077465    1579077930
and df1 looks like this:
head(df1, 20)
   data.point   Label  Timestamp   Name dateCode
1           0 Label 1 1579075180 Name 1   200115
2           1 Label 1 1579075181 Name 1   200115
3           1 Label 1 1579075182 Name 1   200115
4           2 Label 1 1579075183 Name 1   200115
5           2 Label 1 1579075184 Name 1   200115
6           2 Label 1 1579075185 Name 1   200115
7           1 Label 1 1579075186 Name 1   200115
8           1 Label 1 1579075187 Name 1   200115
9           1 Label 1 1579075188 Name 1   200115
10          3 Label 1 1579075189 Name 1   200115
11          3 Label 1 1579075190 Name 1   200115
12          3 Label 1 1579075191 Name 1   200115
13          3 Label 1 1579075192 Name 1   200115
14          4 Label 1 1579075193 Name 1   200115
15          4 Label 1 1579075194 Name 1   200115
16          4 Label 1 1579075195 Name 1   200115
17          2 Label 1 1579075196 Name 1   200115
18          2 Label 1 1579075197 Name 1   200115
19          1 Label 1 1579075198 Name 1   200115
20          0 Label 1 1579075199 Name 1   200115
I am trying to create a new column in df1 that holds the matching period name from df2$Period.Name whenever the df1$Timestamp value falls between df2$Unix.Start.Time and df2$Unix.End.Time, and Null when it falls in no period, so that it looks like this:
     data.point   Label  Timestamp   Name dateCode   Period
1             0 Label 1 1579075180 Name 1   200115     Null
2             1 Label 1 1579075181 Name 1   200115 Period 1
3             1 Label 1 1579075182 Name 1   200115 Period 1
4             2 Label 1 1579075183 Name 1   200115 Period 1
5             2 Label 1 1579075184 Name 1   200115 Period 1
6             2 Label 1 1579075185 Name 1   200115 Period 1
7             1 Label 1 1579075186 Name 1   200115 Period 1
8             1 Label 1 1579075187 Name 1   200115 Period 1
9             1 Label 1 1579075188 Name 1   200115 Period 1
10            3 Label 1 1579075189 Name 1   200115 Period 1
...
1001          3 Label 1 1579075916 Name 1   200115     Null
1002          3 Label 1 1579075917 Name 1   200115     Null
1003          3 Label 1 1579075918 Name 1   200115     Null
1004          4 Label 1 1579075919 Name 1   200115 Period 2
1005          4 Label 1 1579075920 Name 1   200115 Period 2
1006          4 Label 1 1579075921 Name 1   200115 Period 2
1007          2 Label 1 1579075922 Name 1   200115 Period 2
1008          2 Label 1 1579075923 Name 1   200115 Period 2
1009          1 Label 1 1579075924 Name 1   200115 Period 2
1010          0 Label 1 1579075925 Name 1   200115 Period 2
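One way the lookup above could be sketched in base R is with findInterval(). This is only a minimal sketch under a few assumptions that are not stated in the post: the periods in df2 do not overlap, both the start and end times count as inside the period, and NA is used in place of the "Null" shown above. The toy data frames reuse a handful of values from the tables; the variable names idx and inside are made up for illustration.

```r
# Toy versions of the two data frames (values copied from the tables above)
df2 <- data.frame(
  Period.Name     = c("Period 1", "Period 2"),
  Unix.Start.Time = c(1579075181, 1579075919),
  Unix.End.Time   = c(1579075860, 1579076114),
  stringsAsFactors = FALSE
)
df1 <- data.frame(
  Timestamp = c(1579075180, 1579075181, 1579075860, 1579075918, 1579075919)
)

# findInterval() needs sorted break points, so order the periods by start time
df2 <- df2[order(df2$Unix.Start.Time), ]

# For each timestamp: index of the last period that starts at or before it
# (0 means the timestamp is earlier than every period)
idx <- findInterval(df1$Timestamp, df2$Unix.Start.Time)
idx[idx == 0] <- NA

# Assumption: the end time is inclusive; keep the match only if the
# timestamp has not passed the end of that candidate period
inside <- !is.na(idx) & df1$Timestamp <= df2$Unix.End.Time[idx]

# NA stands in for "Null" when a timestamp falls in no period
df1$Period <- ifelse(inside, df2$Period.Name[idx], NA)
```

Because the comparison is vectorised, nothing here depends on the number of rows in either data frame.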
This process is repeated a few times a week, and each time both data frames have a different length and the timestamps are different too.
I have tried the ifelse function, but I haven't been able to figure out how to check each df1$Timestamp value against all of the start and end times in df2 and return the period name from the row where the timestamp fits.
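For context on why a single ifelse call is awkward here: ifelse() compares vectors element by element, so on its own it cannot test every timestamp against every row of df2. A possible workaround, sketched below, is to loop over the rows of df2 and do one vectorised comparison per period. The helper name label_periods is hypothetical, the end time is assumed to be inclusive, and NA is used rather than "Null".

```r
# Hypothetical helper: returns, for each timestamp, the name of the period
# it falls in, or NA if it falls in none.
# Assumes `periods` has the columns Period.Name, Unix.Start.Time, Unix.End.Time.
label_periods <- function(timestamps, periods) {
  out <- rep(NA_character_, length(timestamps))
  for (i in seq_len(nrow(periods))) {
    # One vectorised comparison per period (both ends treated as inclusive)
    hit <- timestamps >= periods$Unix.Start.Time[i] &
      timestamps <= periods$Unix.End.Time[i]
    out[hit] <- as.character(periods$Period.Name[i])
  }
  out
}

# Small example using values from the tables in the post
periods <- data.frame(
  Period.Name     = c("Period 1", "Period 2"),
  Unix.Start.Time = c(1579075181, 1579075919),
  Unix.End.Time   = c(1579075860, 1579076114),
  stringsAsFactors = FALSE
)
timestamps <- c(1579075180, 1579075181, 1579075925)
labels <- label_periods(timestamps, periods)
```

With the real data this would be called as df1$Period <- label_periods(df1$Timestamp, df2). The loop runs once per period rather than once per timestamp, so it should stay manageable for around 70,000 rows, and it does not depend on the lengths of the two data frames.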
Thanks in advance!