Channel: Active questions tagged r - Stack Overflow

Cumulatively Count Gaps in Sequential Numbers Results in Different Answers When New Data Added


I asked a question a few days ago which you guys helped me solve and I am forever grateful! However, a new issue has presented itself and I am in need of your help once again!

Here's a link to the original problem: (R) Cumulatively Count Gaps in Sequential Numbers

I was trying to cumulatively count gaps in sequential numbers for each UniqueID. This was my dataset:

UniqueID  Month  
ABC123    1       
ABC123    2      
ABC123    3      
ABC123    4      
ABC123    6      
ABC123    7      
DEF456    3      
DEF456    4      
DEF456    10     
DEF456    11     
DEF456    12     
DEF456    14     
GHI789    2      
GHI789    3  
JKL012    12     
JKL012    13     
JKL012    14    
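
For anyone who wants to reproduce this, the table above can be rebuilt roughly as follows. This is only a sketch: it assumes Month is stored as a plain numeric column and that the data frame is named `data`, as in the code further down.

```r
# Rebuild the example data shown above (values copied from the table).
# Assumption: Month is numeric and the object is called `data`.
data <- data.frame(
  UniqueID = c(rep("ABC123", 6), rep("DEF456", 6),
               rep("GHI789", 2), rep("JKL012", 3)),
  Month    = c(1, 2, 3, 4, 6, 7,
               3, 4, 10, 11, 12, 14,
               2, 3,
               12, 13, 14),
  stringsAsFactors = FALSE
)
```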

Using your help, I tweaked the code provided from the link above as follows:

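library(dplyr)

# Flag a skip when the current month does not directly follow the previous one,
# then keep a running count of skips per UniqueID.
data2 <- data %>%
  group_by(UniqueID) %>%
  mutate(
    Skip = if_else(Month - lag(Month, default = first(Month) - 1) - 1 > 0, 1, 0),
    CountSkip = cumsum(Skip),
    # Mark the last observed month for each UniqueID.
    LastValue = if_else(Month == last(Month), 1, 0)
  ) %>%
  ungroup()

data2 <- as.data.frame(data2)
# Count one more gap when an ID's last month is not the final month (14).
data2$FinalTally <- ifelse(data2$LastValue == 1 & data2$Month != 14, 1, 0)
data2$SeqCount <- data2$FinalTally + data2$CountSkip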

This was the resulting dataset:

UniqueID  Month  Skip CountSkip LastValue  FinalTally   SeqCount
ABC123    1      0    0         0          0            0
ABC123    2      0    0         0          0            0
ABC123    3      0    0         0          0            0 
ABC123    4      0    0         0          0            0
ABC123    6      1    1         0          0            1
ABC123    7      0    1         1          1            2
DEF456    3      0    0         0          0            0
DEF456    4      0    0         0          0            0
DEF456    10     1    1         0          0            1
DEF456    11     0    1         0          0            1
DEF456    12     0    1         0          0            1
DEF456    14     1    2         1          0            2
GHI789    2      0    0         0          0            0
GHI789    3      0    0         1          1            1
JKL012    12     0    0         0          0            0
JKL012    13     0    0         0          0            0 
JKL012    14     0    0         1          0            0

This is what I wanted...or so I thought.

When adding in new data for the next month (15), I edited the second to last line of my code to account for 15 being the new final month. However, I noticed the sum of SeqCount by Month differed from the sum of that same month before the new data was added. I filtered down to one month and found an example of one UniqueID where the SeqCount sum had differed.

Here is an example before the new data was included:

UniqueID  Month  Skip CountSkip LastValue  FinalTally   SeqCount
ZZZ999    2      0    0         0          0            0
ZZZ999    3      0    0         0          0            0
ZZZ999    4      0    0         0          0            0 
ZZZ999    5      0    0         0          0            0
ZZZ999    6      0    0         1          1            1

Here is the example when the new data was included:

UniqueID  Month  Skip CountSkip LastValue  FinalTally   SeqCount
ZZZ999    2      0    0         0          0            0
ZZZ999    3      0    0         0          0            0
ZZZ999    4      0    0         0          0            0 
ZZZ999    5      0    0         0          0            0
ZZZ999    6      0    0         0          0            0
ZZZ999    15     1    1         1          0            1

This is the problem: Month 6 loses a value of SeqCount when new data is added in.
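
The change can be reproduced in isolation with the sketch below. `seq_count()` is a hypothetical helper, not part of dplyr: it only wraps the code above and takes the final month as a parameter. The ZZZ999 rows are copied from the two tables. The value for month 6 moves because `LastValue` depends on `last(Month)`, which changes as soon as a later row exists for the same ID.

```r
library(dplyr)

# Hypothetical helper: the pipeline from the question, with the final month
# passed in instead of being hard-coded as 14.
seq_count <- function(df, final_month) {
  df %>%
    group_by(UniqueID) %>%
    mutate(
      Skip = if_else(Month - lag(Month, default = first(Month) - 1) - 1 > 0, 1, 0),
      CountSkip = cumsum(Skip),
      LastValue = if_else(Month == last(Month), 1, 0)
    ) %>%
    ungroup() %>%
    mutate(
      FinalTally = if_else(LastValue == 1 & Month != final_month, 1, 0),
      SeqCount = FinalTally + CountSkip
    )
}

# ZZZ999 before and after the month-15 row is added.
before <- data.frame(UniqueID = "ZZZ999", Month = c(2, 3, 4, 5, 6))
after  <- data.frame(UniqueID = "ZZZ999", Month = c(2, 3, 4, 5, 6, 15))

old <- seq_count(before, final_month = 14)
new <- seq_count(after, final_month = 15)

old$SeqCount[old$Month == 6]  # 1: month 6 is the last row and is not month 14
new$SeqCount[new$Month == 6]  # 0: month 6 is no longer the last row
```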

My ultimate goal is to run a regression model for each month with SeqCount as the response with some other columns as predictors (I didn't include them for ease of reading). Whenever I add new data in, the response will change and my estimates will not be consistent.

Is there a way I can structure my code differently so when I add new data, the logic of the code does not change the information from previous values of SeqCount?
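
One possible direction, offered only as a sketch and not as a confirmed fix: count each gap on the row that precedes it, using `lead()` instead of `lag()`. Note that this changes where an internal gap is counted (on month 4 instead of month 6 for ABC123), so it assumes that shift is acceptable. `seq_count_forward()`, `NextMonth` and `GapAfter` are hypothetical names introduced here. Under this definition, only rows in the most recent month can still change when new data arrives.

```r
library(dplyr)

# Hypothetical helper (not from the original code): flag a gap on the row
# that precedes it. `final_month` is the latest month present in the data.
seq_count_forward <- function(df, final_month) {
  df %>%
    arrange(UniqueID, Month) %>%
    group_by(UniqueID) %>%
    mutate(
      # Next observed month for this ID. An ID's last row is compared with the
      # month after the final month, so it is flagged only if the ID ends early.
      NextMonth = lead(Month, default = final_month + 1),
      GapAfter  = if_else(NextMonth - Month - 1 > 0, 1, 0),
      SeqCount  = cumsum(GapAfter)
    ) %>%
    ungroup()
}

# ZZZ999 before and after the month-15 row is added.
before <- data.frame(UniqueID = "ZZZ999", Month = c(2, 3, 4, 5, 6))
after  <- data.frame(UniqueID = "ZZZ999", Month = c(2, 3, 4, 5, 6, 15))

old_fwd <- seq_count_forward(before, final_month = 14)
new_fwd <- seq_count_forward(after, final_month = 15)

old_fwd$SeqCount[old_fwd$Month == 6]  # 1
new_fwd$SeqCount[new_fwd$Month == 6]  # 1, unchanged after adding month 15
```

The trade-off is that a row in the final month stays provisional: whether a gap follows it cannot be known until the next month's data is loaded.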

Any help would be appreciated!

Thank you!

