customer  date        sales1  sales2
c1        2019-01-01  67      35
c1        2019-01-07  70      32
c1        2019-01-14  72      40
c2        2019-01-01  100     12
c2        2019-01-07  134     20
c2        2019-01-14  174     23
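For a reproducible example, the sample data above can be built as:

```r
# Sample data from the table above
df <- data.frame(
  customer = c("c1", "c1", "c1", "c2", "c2", "c2"),
  date     = as.Date(c("2019-01-01", "2019-01-07", "2019-01-14",
                       "2019-01-01", "2019-01-07", "2019-01-14")),
  sales1   = c(67, 70, 72, 100, 134, 174),
  sales2   = c(35, 32, 40, 12, 20, 23)
)
```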
I convert the date column to a numeric sequence per customer for forecasting:

library(dplyr)
df <- df %>% group_by(customer) %>% mutate(dt_seq = row_number())
n <- data.frame()
for (i in unique(df$customer)) {
  one <- df[df$customer == i, ]
  one$customer <- NULL
  # Fit a simple linear trend on the numeric time index
  model <- lm(sales1 ~ dt_seq, data = one)
  # Next 52 periods to forecast
  fin <- data.frame(dt_seq = seq(max(one$dt_seq) + 1, max(one$dt_seq) + 52, 1))
  pre <- as.data.frame(predict(model, newdata = fin))
  temp <- cbind(fin, pre, i)
  colnames(temp) <- c("dt_seq", "sales1", "customer")
  # Weekly dates for the forecast horizon (column is `date`, not `dt`)
  temp$dt <- seq(max(one$date) + 7, max(one$date) + 7 * 52, 7)
  n <- rbind(n, temp)
}
This is taking a long time because I have data for many customers. Is there a way to run it in parallel, for example with the spark.lapply() function?

Edit: I want to forecast the next 52 periods for each customer. The for loop is too slow given how many customers I have. Is there another way to fit a linear regression per group on time series data and forecast future values?
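For reference, here is a serial sketch with the per-customer work factored into a single function. My understanding is that SparkR::spark.lapply() takes the same (list, function) shape, so the same function could be handed to Spark workers unchanged, assuming a working Spark session. The helper name fit_one is just something I made up:

```r
# Sample data as in the question
df <- data.frame(
  customer = c("c1", "c1", "c1", "c2", "c2", "c2"),
  date     = as.Date(c("2019-01-01", "2019-01-07", "2019-01-14",
                       "2019-01-01", "2019-01-07", "2019-01-14")),
  sales1   = c(67, 70, 72, 100, 134, 174),
  sales2   = c(35, 32, 40, 12, 20, 23)
)

# Self-contained unit of work for one customer's data; this is what
# could be shipped to workers via SparkR::spark.lapply()
fit_one <- function(one) {
  one$dt_seq <- seq_len(nrow(one))
  model <- lm(sales1 ~ dt_seq, data = one)
  fin <- data.frame(dt_seq = max(one$dt_seq) + 1:52)
  fin$sales1 <- predict(model, newdata = fin)
  fin$customer <- one$customer[1]
  fin$dt <- max(one$date) + 7 * (1:52)
  fin
}

per_customer <- split(df, df$customer)
# Serial version; with a Spark session this would become
# results <- spark.lapply(per_customer, fit_one)
results <- lapply(per_customer, fit_one)
n <- do.call(rbind, results)
```

The same list-of-data-frames plus one function also drops into parallel::mclapply() on a single machine, which avoids the Spark setup if the data fits in memory.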