Channel: Active questions tagged r - Stack Overflow

How to apply WOE in R (both training and test sets)

I am using a random forest for a classification problem, and my data set contains categorical variables with too many levels (>100). I want to reduce the number of levels by applying WOE (weight of evidence). I use the woe function in R as follows:

woe.object <- woe(target ~ brand_id, data = train, zeroadj = 0)

However, I receive this error:

Error in woe.default(x, grouping, weights = weights, ...) : x should be of type data frame

I planned to calculate WOE manually, but I can only do that for the training set because the test set doesn't have the target variable. When I searched online, I found that I can modify the above woe call by adding other categorical variables as follows:

woe(target ~ brand_id + item_id, data = train, zeroadj = 0)

When I compare the WOE values I calculated manually with those from woe(target ~ brand_id + item_id, data = train, zeroadj = 0), they are not the same.

Can anyone help me calculate WOE correctly for both the training and test sets in R? I would also like to know whether I should bin the levels after getting the WOE scores.
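
For reference, here is a minimal base R sketch of a manual calculation of this kind. The WOE table is learned from the training set only and then looked up for the test set, so the test set does not need the target. The toy data, the helper name woe_table, the 0.5 count adjustment, the sign convention (log of event share over non-event share), and the fallback of 0 for unseen levels are all assumptions for illustration. None of this is claimed to match what the woe function does internally.

```r
# Hypothetical toy data; column names follow the question (target, brand_id).
train <- data.frame(
  brand_id = factor(c("a", "a", "a", "b", "b", "b", "c", "c")),
  target   = c(1, 1, 0, 1, 0, 0, 1, 0)
)
test <- data.frame(brand_id = factor(c("a", "c", "d")))  # "d" never occurs in train

# Hypothetical helper: one WOE value per level, learned from training data only.
# Assumed convention: log(share of events / share of non-events).
# `adj` is an assumed small count added to every cell to avoid log(0).
woe_table <- function(x, y, adj = 0.5) {
  counts     <- table(x, y)          # rows = levels, columns = "0" and "1"
  events     <- counts[, "1"] + adj
  non_events <- counts[, "0"] + adj
  log((events / sum(events)) / (non_events / sum(non_events)))
}

brand_woe <- woe_table(train$brand_id, train$target)

# Look up the training-set WOE for each row, in train and in test alike.
train$brand_woe <- unname(brand_woe[as.character(train$brand_id)])
test$brand_woe  <- unname(brand_woe[as.character(test$brand_id)])

# Levels unseen in training come back as NA; 0 (neutral evidence) is one
# possible fallback, chosen here only for illustration.
test$brand_woe[is.na(test$brand_woe)] <- 0
```

The same lookup would be repeated per categorical column (for example item_id). Because each level is replaced by a single numeric value, the random forest then sees one numeric feature per variable instead of a factor with more than 100 levels.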

