I am using a random forest for a classification problem, and my data set contains categorical variables with too many levels (more than 100). I want to reduce the number of levels by applying weight of evidence (WOE). I call the woe function in R as follows:
woe.object <- woe(target ~brand_id, data = train, zeroadj = 0)
However, I receive this error:
Error in woe.default(x, grouping, weights = weights, ...) : x should be of type data frame
I planned to calculate WOE manually, but I can only do that for the training set, because the test set does not have the target variable. Searching online, I found that I can modify the call above by adding another categorical variable to the formula, as follows:
woe(target ~brand_id+item_id, data = train, zeroadj = 0)
When I compare the WOE values I calculated manually with those returned by woe(target ~brand_id+item_id, data = train, zeroadj = 0), they are not the same.
Can anyone help me calculate WOE correctly in R for both the training and test sets? Also, should I bin the levels after obtaining the WOE scores?
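To make the manual calculation concrete, here is a minimal base R sketch of one common way to do it, not necessarily what the woe function does internally. It assumes a binary 0/1 target, defines WOE per level as log(share of events / share of non-events) (some references use the opposite sign), and adds a small constant to the counts to avoid log(0). The toy data, the helper name woe_table and the adj argument are all hypothetical. The mapping is learned on the training set only and then looked up for the test set, so the test set does not need the target.

```r
# Toy data; column names follow the question, the values are made up.
train <- data.frame(
  brand_id = c("a", "a", "a", "b", "b", "b", "c", "c"),
  target   = c(1, 0, 1, 0, 0, 1, 1, 0)
)
test <- data.frame(brand_id = c("a", "c", "d"))  # "d" never appears in train

# Hypothetical helper: WOE per level = log(share of events / share of non-events).
# adj is an assumed smoothing constant added to every count to avoid log(0).
woe_table <- function(x, y, adj = 0.5) {
  counts     <- table(x, y)            # rows: levels, columns: "0" and "1"
  events     <- counts[, "1"] + adj
  non_events <- counts[, "0"] + adj
  log((events / sum(events)) / (non_events / sum(non_events)))
}

# Learn the level -> WOE mapping on the training set only.
woe_map <- woe_table(train$brand_id, train$target)

# Apply the same mapping to both sets by level name.
train$brand_woe <- unname(woe_map[as.character(train$brand_id)])
test$brand_woe  <- unname(woe_map[as.character(test$brand_id)])

# Levels unseen in training have no WOE; 0 (neutral evidence) is one possible choice.
test$brand_woe[is.na(test$brand_woe)] <- 0
```

Differences between hand-computed values and a library's output can come from the sign convention or from how zero counts are adjusted, so those are worth checking first when the numbers disagree.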