I'm trying (for the first time) to add an external regressor to Prophet with the add_regressor
function, but the results I'm getting look wildly off. The dataset I'm using is the well-known shampoo sales data, freely available on Kaggle. As the external variable, I'm using freely available price data for SPY (the S&P 500 ETF), downloaded with R's quantmod
package.
Here's how I start the code:
library(prophet)
library(quantmod)
library(dplyr)
df <- read.csv("~/shampoo.csv")
#now get the min and max dates in the column
min_date <- min(df$Date, na.rm = TRUE)
max_date <- max(df$Date, na.rm = TRUE)
#download the SPY stock data
getSymbols("SPY", from = min_date, to = max_date)
#store the SPY closing prices in a data frame, with the dates moved from the row names into a ds column
Close <- data.frame(Cl(SPY))
Close <- cbind(ds = rownames(Close), Close)
rownames(Close) <- NULL
Close_no_rename <- Close
colnames(Close)[colnames(Close) == 'SPY.Close'] <- 'y'
colnames(Close_no_rename)[colnames(Close_no_rename) == 'SPY.Close'] <- 'SPY_CLOSE'
#fit a prophet model on the SPY closes so the regressor itself can be forecast over the future period
stock_model <- prophet(Close)
#make a forecast dataframe
future_stocks <- make_future_dataframe(stock_model, periods = 30, freq = "month", include_history = FALSE)
#predict future SPY prices; keep only the dates (ds) and the point forecast (yhat)
forecast <- predict(stock_model, future_stocks) %>% select(ds, yhat)
colnames(forecast)[colnames(forecast) == 'yhat'] <- 'SPY_CLOSE'
#rename the columns of the actual df
colnames(df)[colnames(df) == 'Date'] <- 'ds'
colnames(df)[colnames(df) == 'Value'] <- 'y'
#merge the historical SPY closes onto the training df, matching rows on the ds date column
df_historic_with_SPY_close <- merge(df, Close_no_rename, by = "ds")
#now actually forecast using prophet
model <- prophet()
#add the SPY regressor
model <- add_regressor(model, 'SPY_CLOSE', prior.scale = 0.0000001, standardize = FALSE)
model <- fit.prophet(model, df_historic_with_SPY_close)
forecast_final <- predict(model, forecast)
plot(model, forecast_final)
This runs without errors, but the forecast plot looks wrong, as if the scale were off. I tried changing the prior.scale and standardize arguments of add_regressor, with no luck. Thanks for any help!
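One sanity check I can think of is whether the merge step keeps every month. Below is a minimal sketch using base R only. It assumes (I have not confirmed this on the real data) that some first-of-month dates are absent from the SPY data because markets are closed on weekends and holidays. The names `sales` and `spy_toy` are hypothetical, and the SPY_CLOSE values are made-up placeholders, not real quotes.

```r
# Three monthly rows from the shampoo data (dates kept as character, as in my code)
sales <- data.frame(ds = c("2017-01-01", "2017-02-01", "2017-03-01"),
                    y  = c(266, 145.9, 183.1),
                    stringsAsFactors = FALSE)

# Toy stand-in for Close_no_rename: trading days only; prices are placeholders
spy_toy <- data.frame(ds = c("2017-01-03", "2017-02-01", "2017-03-01"),
                      SPY_CLOSE = c(100, 101, 102),
                      stringsAsFactors = FALSE)

# merge() performs an inner join by default, so unmatched dates are dropped silently
merged <- merge(sales, spy_toy, by = "ds")
nrow(merged)

# Dates present in the sales data but absent from the regressor data
missing_dates <- setdiff(sales$ds, spy_toy$ds)
missing_dates
```

Comparing nrow(df_historic_with_SPY_close) with nrow(df) in my actual script would show whether rows are being lost the same way there.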
Here is the shampoo dataset used as the target series (y):
Date Value
2017-01-01 266
2017-02-01 145.9
2017-03-01 183.1
2017-04-01 119.3
2017-05-01 180.3
2017-06-01 168.5
2017-07-01 231.8
2017-08-01 224.5
2017-09-01 192.8
2017-10-01 122.9
2017-11-01 336.5
2017-12-01 185.9
2018-01-01 194.3
2018-02-01 149.5
2018-03-01 210.1
2018-04-01 273.3
2018-05-01 191.4
2018-06-01 287
2018-07-01 226
2018-08-01 303.6
2018-09-01 289.9
2018-10-01 421.6
2018-11-01 264.5
2018-12-01 342
2019-01-01 339.7
2019-02-01 440.4
2019-03-01 315.9
2019-04-01 439.3
2019-05-01 401.3
2019-06-01 437.4
2019-07-01 575.5
2019-08-01 407.6
2019-09-01 682
2019-10-01 475.3
2019-11-01 581.3
2019-12-01 646.9
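In case it helps to reproduce this without the CSV file, here is a minimal sketch that loads the first six rows above inline. It assumes a whitespace-separated layout with the Date and Value headers as shown (my actual file is read with read.csv), and `shampoo_txt` is just a placeholder name.

```r
# First six rows of the table above, pasted inline (placeholder name: shampoo_txt)
shampoo_txt <- "Date Value
2017-01-01 266
2017-02-01 145.9
2017-03-01 183.1
2017-04-01 119.3
2017-05-01 180.3
2017-06-01 168.5"

# Parse the text as a whitespace-separated table with a header row
df <- read.table(text = shampoo_txt, header = TRUE, stringsAsFactors = FALSE)

# Convert the date strings to Date objects so min()/max() compare dates, not text
df$Date <- as.Date(df$Date)

str(df)
```

The remaining rows can be pasted into the string in the same way to get the full 36 months.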