I'm trying to write a function that will calculate the partial dependence of all the variables in a model and store them in a data frame. But I'm new to loops in R and I'm not sure how to achieve this. Below is some example code to explain what I'm trying to achieve.
Setting up the model:
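library(randomForest)
set.seed(1)  # make the simulated data reproducible
x1 <- rnorm(100, 0, 1)
x2 <- rnorm(100, 0, 1)
x3 <- rnorm(100, 0, 1)
x4 <- rnorm(100, 0, 1)
x5 <- rnorm(100, 0, 1)
y <- x1 * 100 + x2 * 10  # only x1 and x2 actually influence y
df <- data.frame(x1, x2, x3, x4, x5, y)
rf <- randomForest(y ~ ., data = df)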
Then I'm using the pdp package in R to calculate the partial dependence (pd).
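For reference, the partial dependence of one predictor is the model's average prediction when that predictor is fixed at each value of a grid while every other predictor keeps its observed values. Below is a minimal brute-force sketch of that definition in base R, assuming only that the model has a `predict()` method. The helper name `manual_pd`, the evenly spaced grid and `grid_size = 10` are my own illustrative choices; I believe pdp computes something equivalent by default, but its grid is chosen differently, so the numbers won't match `partial()` exactly.

```r
# Hypothetical helper: brute-force partial dependence of `model` on `var`
manual_pd <- function(model, data, var, grid_size = 10) {
  # Evenly spaced grid over the observed range of the predictor
  grid <- seq(min(data[[var]]), max(data[[var]]), length.out = grid_size)
  yhat <- vapply(grid, function(v) {
    tmp <- data
    tmp[[var]] <- v                      # fix the predictor at one grid value
    mean(predict(model, newdata = tmp))  # average over the other predictors
  }, numeric(1))
  out <- data.frame(grid, yhat)
  names(out)[1] <- var                   # name the grid column after the predictor
  out
}

# Usage with a linear model so the example needs no extra packages;
# with the random forest above it would be manual_pd(rf, df[-6], "x1")
set.seed(1)
df <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
df$y <- df$x1 * 100 + df$x2 * 10
fit <- lm(y ~ ., data = df)
pd_x1 <- manual_pd(fit, df[setdiff(names(df), "y")], "x1")
print(pd_x1)
```

For a linear model the pd curve is a straight line whose slope is the coefficient, which makes this a handy sanity check.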
What I'm trying to achieve is a function that calculates the pd for each variable and then stores those values in a data frame. For example, if I calculated the pd for each variable manually, I would do something like this:
library(pdp)
pdp <- partial(rf, pred.var = "x1")
pdp2 <- partial(rf, pred.var = "x2")
# ... etc. for x3 and x4 ...
pdp5 <- partial(rf, pred.var = "x5")
and then create a data frame of the grid values and all the y-hats, like so:
pdpDF <- data.frame(pdp,pdp2,...,pdp5)
But I would like to automate the process, and I'm not sure how to do this in R. Very naively, I would say it would look something like this:
xVars <- names(df)[-6]  # predictor names only (drop y)
pdpValues <- vector("list", length(xVars))
for (i in seq_along(xVars)) {
  pdpValues[[i]] <- partial(rf, pred.var = xVars[i])  # calc pd for each variable
}
pdpDF <- do.call(cbind, pdpValues)  # column-bind all the pdpValues into one df
but I'm not sure this is the right way to make it work. Any suggestions?
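One way to automate this, as a minimal sketch: loop over the predictor *names* with `lapply()`, call `partial()` once per name, and column-bind the results with `do.call(cbind, ...)`. This assumes the randomForest and pdp packages are installed and that `partial()` returns a data frame with the predictor's grid column plus a `yhat` column (that is what it returns for a single `pred.var`). Renaming each `yhat` to `yhat_<variable>` is my own addition so the combined data frame has no duplicate column names, and passing `train = df` explicitly avoids relying on pdp recovering the training data from inside a function.

```r
library(randomForest)
library(pdp)

# Same simulated setup as above
set.seed(1)
df <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100),
                 x4 = rnorm(100), x5 = rnorm(100))
df$y <- df$x1 * 100 + df$x2 * 10
rf <- randomForest(y ~ ., data = df)

# Predictor names: every column except the response
xVars <- setdiff(names(df), "y")

# One partial() call per predictor; pred.var takes the variable name as a string
pdpList <- lapply(xVars, function(v) {
  pd <- partial(rf, pred.var = v, train = df)
  names(pd)[names(pd) == "yhat"] <- paste0("yhat_", v)  # keep column names unique
  pd
})

# Column-bind into one data frame (works because every grid has the same length)
pdpDF <- do.call(cbind, pdpList)
head(pdpDF)
```

The column bind only works because every numeric predictor here gets a grid of the same length. If the grids can differ (for example with a factor predictor), stacking the results in long format with `rbind()` plus a column holding the variable name is the safer choice.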