data.table bug: lapply on .SD reorders columns when using get(). Possible workaround?

I found a strange behavior in data.table, and I would like to know whether there is a way to avoid it or work around it.

In my data management, I often use lapply with .SD to assign new values to columns. To assign several columns correctly, the order of the columns in the lapply output must be preserved. I found a situation where this is not the case.

Here is the normal behavior:

library(data.table)
plouf <- data.table(x = 1, y = 2, z = 3)
cols <- c("y","x")
plouf[,.SD,.SDcols = cols ,by = z]
plouf[,lapply(.SD,function(x){x}),.SDcols = cols ,by = z]
plouf[,lapply(.SD[x == 1],function(x){x}),.SDcols = cols ,by = z]

All of these lines give:

   z y x
1: 3 2 1

This is the order I need, for example, to assign the result back to c("y","x"). But if I do:

plouf[,lapply(.SD[get("x") == 1],function(x){x}),.SDcols = c("y","x"),by = z]

   z x y
1: 3 1 2

Here the order of x and y changed for no apparent reason, although the call should yield the same result as the last "working" example. As a consequence, the wrong values are assigned to c("y","x") when I assign the output of lapply to a vector of column names. It seems that using get() in the i part of .SD triggers this bug.

Example of the effect of this on assignment:

plouf[, c(cols ) := lapply(.SD[get("x") == 1],function(x){x}),
      .SDcols = cols ,by = z][]
#    x y z
# 1: 2 1 3

Does anyone have a workaround? The code I am actually using looks more like this:

 plouf[, c(cols) := lapply(.SD[get("x") >= 1 & get("x") <= 3], function(x) mean(x)),
       .SDcols = cols, by = z]
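
One possible workaround, sketched here under the assumption that the list returned by lapply keeps the column names of .SD: select its elements by name with [cols] before assigning, so the result no longer depends on the order in which .SD delivers its columns. The argument name v and the variable res are only illustrative, and I have not checked this against every data.table version.

```r
library(data.table)

plouf <- data.table(x = 1, y = 2, z = 3)
cols <- c("y", "x")

# lapply() over a data.table returns a named list, so indexing it with
# [cols] puts the elements in the order of cols, whatever the order of .SD.
res <- plouf[, lapply(.SD[get("x") == 1], function(v) v)[cols],
             .SDcols = cols, by = z]

# Same idea for assignment by reference: the right-hand side is reordered
# by name so that it lines up with the left-hand side c(cols).
plouf[, c(cols) := lapply(.SD[get("x") == 1], function(v) v)[cols],
      .SDcols = cols, by = z]
```

Because the elements are matched by name rather than by position, this should behave the same whether or not the reordering occurs.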

The issue on GitHub: https://github.com/Rdatatable/data.table/issues/4089

