Channel: Active questions tagged r - Stack Overflow

R vtreat prepare() cat_P output


I have a dataset containing categorical variables and numeric features:

Experiment Replicate Batch Condition Cellline  Feature1 Feature2   ...
  <chr>      <chr>   <chr>   <chr>    <chr>     <dbl>     <dbl>    ...

I am using the vtreat package in R to treat my data before modeling.

my_treatment <- vtreat::designTreatmentsZ(
  dframe = data,
  varlist = colnames(data),
  minFraction = 0.05
)
data_treated <- vtreat::prepare(my_treatment, data)

After calling prepare(), I inspect the catP column to see the levels of the categorical variable:

> table(data_treated$Cellline_catP)

0.0914634146341463  0.103658536585366  0.109756097560976  0.121951219512195 
                15                 17                 72                 60 

However, although my dataset has 9 cell lines, I see only 4 distinct values in data_treated$Cellline_catP.

> dplyr::count(data, dplyr::n_distinct(Cellline))
# A tibble: 1 x 2
  `dplyr::n_distinct(Cellline)`     n
                          <int> <int>
1                             9   164

Shouldn't there also be 9 different categories in data_treated$Cellline_catP? I tried renaming the cell lines (their names are a mix of numbers and letters) and excluding some of them, but the result doesn't change.
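
One thing worth checking: as far as I understand, a catP column holds the prevalence of each level (rows with that level divided by total rows), not an identifier for the level, so cell lines with the same number of rows would share a value. The values in the table() output above are consistent with that, since 15/164, 17/164, 18/164 and 20/164 give exactly those four numbers. Below is a minimal base-R sketch of the effect. It does not call vtreat, and the cell line names and group sizes are hypothetical, chosen only so that the row counts match the table above.

```r
# Hypothetical stand-in for the real data: 9 cell lines, 164 rows in total.
# Group sizes are an assumption chosen to reproduce the table() counts above.
sizes <- c(A = 15, B = 17, C = 18, D = 18, E = 18, F = 18,
           G = 20, H = 20, I = 20)
cellline <- rep(names(sizes), times = sizes)

# Prevalence of each level: rows with that level / total rows.
prevalence <- table(cellline) / length(cellline)

# Give every row the prevalence of its own level
# (this mimics what a prevalence-coded column would contain).
row_prevalence <- as.numeric(prevalence[cellline])

# Compare the number of distinct levels with the number of distinct
# prevalence values.
n_levels <- length(unique(cellline))
n_prevalence_values <- length(unique(row_prevalence))
print(n_levels)
print(n_prevalence_values)

# Row counts per distinct prevalence value.
print(table(row_prevalence))
```

If the real data behave like this, table(data$Cellline) should show several cell lines with identical counts, and the number of distinct counts should equal the number of distinct catP values.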

