I asked a question a few days ago which you guys helped me solve and I am forever grateful! However, a new issue has presented itself and I am in need of your help once again!
Here's a link to the original problem: (R) Cumulatively Count Gaps in Sequential Numbers
I was trying to cumulatively counts gaps in sequential numbers for each selection of UniqueIDs. This was my dataset:
UniqueID Month
ABC123 1
ABC123 2
ABC123 3
ABC123 4
ABC123 6
ABC123 7
DEF456 3
DEF456 4
DEF456 10
DEF456 11
DEF456 12
DEF456 14
GHI789 2
GHI789 3
JKL012 12
JKL012 13
JKL012 14
Using your help, I tweaked the code provided from the link above as follows:
data2=data %>%
group_by(UniqueID) %>%
mutate(Skip = if_else(Month - lag(Month, default = first(Month) - 1) - 1 > 0, 1, 0),
CountSkip = cumsum(Skip))
data2 = data2%>%
group_by(UniqueID) %>%
mutate(LastValue = if_else(Month == last(Month), 1, 0))
data2=as.data.frame(data2)
data2$FinalTally=ifelse(data2$LastValue==1 & data2$Month!=14,1,0)
data2$SeqCount=data2$FinalTally+data2$CountSkip
This was the resulting dataset:
UniqueID Month Skip CountSkip LastValue FinalTally SeqCount
ABC123 1 0 0 0 0 0
ABC123 2 0 0 0 0 0
ABC123 3 0 0 0 0 0
ABC123 4 0 0 0 0 0
ABC123 6 1 1 0 0 1
ABC123 7 1 2 1 1 2
DEF456 3 0 0 0 0 0
DEF456 4 0 0 0 0 0
DEF456 10 1 1 0 0 1
DEF456 11 1 1 0 0 1
DEF456 12 1 1 0 0 1
DEF456 14 2 2 1 0 2
GHI789 2 0 0 0 0 0
GHI789 3 0 1 1 1 1
JKL012 12 0 0 0 0 0
JKL012 13 0 0 0 0 0
JKL012 14 0 0 1 0 0
This is what I wanted...or so I thought.
When adding in new data for the next month (15), I edited the second to last line of my code to account for 15 being the new final month. However, I noticed the sum of SeqCount by Month differed from the sum of that same month before the new data was added. I filtered down to one month and found an example of one UniqueID where the SeqCount sum had differed.
Here is an example before the new data was included:
UniqueID Month Skip CountSkip LastValue FinalTally SeqCount
ZZZ999 2 0 0 0 0 0
ZZZ999 3 0 0 0 0 0
ZZZ999 4 0 0 0 0 0
ZZZ999 5 0 0 0 0 0
ZZZ999 6 0 0 1 1 1
Here is the example when the new data was included:
UniqueID Month Skip CountSkip LastValue FinalTally SeqCount
ZZZ999 2 0 0 0 0 0
ZZZ999 3 0 0 0 0 0
ZZZ999 4 0 0 0 0 0
ZZZ999 5 0 0 0 0 0
ZZZ999 6 0 0 0 0 0
ZZZ999 15 1 1 1 0 1
This is the problem: Month 6 loses a value of SeqCount when new data is added in.
My ultimate goal is to run a regression model for each month with SeqCount as the response with some other columns as predictors (I didn't include them for ease of reading). Whenever I add new data in, the response will change and my estimates will not be consistent.
Is there a way I can structure my code differently so when I add new data, the logic of the code does not change the information from previous values of SeqCount?
Any help would be appreciated!
Thank you!