Channel: Active questions tagged r - Stack Overflow

R data.table: creating a count table of values in multiple columns using .N


Here is my test data.table:

library(data.table)
a <- data.table(cluster = sample(LETTERS[1:3], size = 10, replace = TRUE), a = sample(x = 1:2, size = 10, replace = TRUE), b = sample(x = 1:2, size = 10, replace = TRUE), c = sample(x = 1:2, size = 10, replace = TRUE), d = sample(x = 1:3, size = 10, replace = TRUE))

a
    cluster a b c d
 1:       B 1 2 1 2
 2:       C 1 1 1 1
 3:       B 2 1 1 3
 4:       A 2 2 1 1
 5:       C 2 2 1 2
 6:       A 2 2 1 3
 7:       A 2 2 1 1
 8:       A 2 1 1 2
 9:       C 2 1 1 1
10:       C 2 2 1 1

I use the plyr package's `count()` to generate a count table as follows:

> library(plyr)
> a[, lapply(.SD, function(x) count(x)), .SDcols=2:5]
   a.x a.freq b.x b.freq c.x c.freq d.x d.freq
1:   1      2   1      4   1     10   1      5
2:   2      8   2      6   1     10   2      3
3:   1      2   1      4   1     10   3      2

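It is pretty ugly, but it more or less serves the purpose. The problem is that the shorter count tables are recycled to the length of the longest one (`d`, which has three distinct values), so row 3 repeats row 1 for `a` and `b`, and rows 2 and 3 repeat row 1 for `c`. The output I actually want pads those rows with `NA` instead: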

       a.x a.freq b.x b.freq c.x c.freq d.x d.freq
    1:   1      2   1      4   1     10   1      5
    2:   2      8   2      6  NA     NA   2      3
    3:  NA     NA  NA     NA  NA     NA   3      2
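A minimal sketch of one way to get that shape with `.N` instead of `plyr::count()`, assuming a reasonably recent data.table: it types in the ten rows printed above so the result is reproducible, counts each column separately with `.N` and `keyby`, then pads the shorter count tables with `NA` rather than letting them recycle. The names `cols`, `counts`, `pad_to` and `result` are illustrative only, not part of any package.

```r
library(data.table)

# The test table printed in the question, typed in so the result is reproducible
a <- data.table(cluster = c("B", "C", "B", "A", "C", "A", "A", "A", "C", "C"),
                a = c(1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
                b = c(2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L),
                c = rep(1L, 10),
                d = c(2L, 1L, 3L, 1L, 2L, 3L, 1L, 2L, 1L, 1L))

cols <- c("a", "b", "c", "d")  # hypothetical: the columns to count

# One count table per column: .N rows per distinct value, sorted by value (keyby)
counts <- lapply(cols, function(col) {
  res <- a[, .N, keyby = col]                    # columns: <col>, N
  setnames(res, paste0(col, c(".x", ".freq")))   # rename to e.g. a.x, a.freq
  res
})

# Pad every count table with NA rows up to the longest one instead of recycling
n_max <- max(vapply(counts, nrow, numeric(1)))
pad_to <- function(v, n) { length(v) <- n; v }   # length<- fills new slots with NA
padded <- lapply(counts, function(x) x[, lapply(.SD, pad_to, n = n_max)])

# Put the padded count tables side by side
result <- do.call(cbind, padded)
result
```

The per-column loop is there because each column has a different number of distinct values; grouping by several columns at once with `.N` would count combinations of values rather than each column on its own.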

I would also like to group the counts by the `cluster` column if possible, but adding `by = cluster` fails. I've also tried `uniqueN` and `.N`, which work fine with a single column but not with multiple columns. Any pointers would be much appreciated.
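For the per-cluster counts, one option (a sketch, not the only approach) is to reshape to long format with data.table's `melt()` first, so that a single `.N` grouped by cluster, column and value covers all four columns at once; `dcast()` can then spread the result back to one row per cluster. Again the question's printed rows are typed in, and the names `long`, `by_cluster` and `wide` are hypothetical.

```r
library(data.table)

# The test table printed in the question, typed in so the result is reproducible
a <- data.table(cluster = c("B", "C", "B", "A", "C", "A", "A", "A", "C", "C"),
                a = c(1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
                b = c(2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L),
                c = rep(1L, 10),
                d = c(2L, 1L, 3L, 1L, 2L, 3L, 1L, 2L, 1L, 1L))

# Stack a, b, c, d into two columns: which column (variable) and its value
long <- melt(a, id.vars = "cluster", measure.vars = c("a", "b", "c", "d"),
             variable.name = "variable", value.name = "value")

# One .N per (cluster, column, value) combination, sorted by those keys
by_cluster <- long[, .N, keyby = .(cluster, variable, value)]
by_cluster

# Optional wide view: one row per cluster, one column per column/value pair
wide <- dcast(by_cluster, cluster ~ variable + value, value.var = "N", fill = 0L)
wide
```

In `wide`, a column such as `a_1` holds how often `a == 1` within each cluster; `fill = 0L` records value/cluster combinations that never occur as 0 instead of `NA`.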

