Channel: Active questions tagged r - Stack Overflow

Feature importance plot using xgb and also ranger. Best way to compare


I'm working on a script that trains both a ranger random forest and an xgboost regression. Whichever performs better on RMSE is then tested against hold-out data.

I would also like to return feature importance for both in a comparable way.

With the xgboost library, I can get my feature importance table and plot like so:

> xgb.importance(model = regression_model)
                 Feature        Gain       Cover  Frequency
1:              spend_7d 0.981006272 0.982513621 0.79219969
2:                   IOS 0.006824499 0.011105014 0.08112324
3:  is_publisher_organic 0.006379284 0.002917203 0.06770671
4: is_publisher_facebook 0.005789945 0.003464162 0.05897036

Then I can plot it like so:

> xgb.importance(model = regression_model) %>% xgb.plot.importance()

[Image: bar chart produced by xgb.plot.importance()]

That was using the xgboost library and its functions. With a ranger random forest, if I fit a regression model, I can get feature importance by including importance = 'impurity' when fitting the model. Then:

regression_model$variable.importance
             spend_7d        d7_utility_sum  recent_utility_ratio                   IOS  is_publisher_organic is_publisher_facebook 
         437951687132                     0                     0             775177421             600401959            1306174807 

I could just create a ggplot, but the scales are entirely different: ranger returns raw impurity totals in that table, while the xgb plot shows proportions between 0 and 1.
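To make the scale problem concrete, here is a minimal base R sketch that rescales both importance vectors so each sums to 1 and plots them side by side. The numbers are copied from the outputs above; normalise_importance is a hypothetical helper name, not part of xgboost or ranger. It assumes that a share of total importance is an acceptable common scale. Gain and impurity are still different measures, so this aligns the axes rather than making the values strictly equivalent.

```r
# Values copied from the xgb.importance() and variable.importance outputs above.
xgb_gain <- c(
  spend_7d              = 0.981006272,
  IOS                   = 0.006824499,
  is_publisher_organic  = 0.006379284,
  is_publisher_facebook = 0.005789945
)
ranger_imp <- c(
  spend_7d              = 437951687132,
  d7_utility_sum        = 0,
  recent_utility_ratio  = 0,
  IOS                   = 775177421,
  is_publisher_organic  = 600401959,
  is_publisher_facebook = 1306174807
)

# Hypothetical helper: rescale a named importance vector so it sums to 1.
normalise_importance <- function(imp) imp / sum(imp)

# Align both models on the union of feature names.
features <- union(names(xgb_gain), names(ranger_imp))
comparison <- data.frame(
  feature = features,
  xgb     = unname(normalise_importance(xgb_gain)[features]),
  ranger  = unname(normalise_importance(ranger_imp)[features]),
  stringsAsFactors = FALSE
)

# Features a model does not report get a share of 0 (an assumption of this sketch).
comparison$xgb[is.na(comparison$xgb)]       <- 0
comparison$ranger[is.na(comparison$ranger)] <- 0

# Grouped bar chart: one pair of bars per feature.
barplot(
  t(as.matrix(comparison[, c("xgb", "ranger")])),
  beside      = TRUE,
  names.arg   = comparison$feature,
  legend.text = c("xgb (Gain)", "ranger (impurity)"),
  las         = 2,
  ylab        = "Share of total importance"
)
```

In the script itself, the two hard-coded vectors would be replaced by the Gain column of xgb.importance(model = ...) and by the fitted ranger model's variable.importance.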

Is there an out-of-the-box library or solution that can plot the feature importance of either the xgb or the ranger model in a comparable way?

