Here is my test data.table:
library(data.table)
a <- data.table(cluster = sample(LETTERS[1:3], size = 10, replace = TRUE), a = sample(x = 1:2, size = 10, replace = TRUE), b = sample(x = 1:2, size = 10, replace = TRUE), c = sample(x = 1:2, size = 10, replace = TRUE), d = sample(x = 1:3, size = 10, replace = TRUE))
a
cluster a b c d
1: B 1 2 1 2
2: C 1 1 1 1
3: B 2 1 1 3
4: A 2 2 1 1
5: C 2 2 1 2
6: A 2 2 1 3
7: A 2 2 1 1
8: A 2 1 1 2
9: C 2 1 1 1
10: C 2 2 1 1
I use count from the plyr package to generate a count table as follows:
> a[, lapply(.SD, function(x) count(x)), .SDcols=2:5]
a.x a.freq b.x b.freq c.x c.freq d.x d.freq
1: 1 2 1 4 1 10 1 5
2: 2 8 2 6 1 10 2 3
3: 1 2 1 4 1 10 3 2
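It is pretty ugly, but it somewhat serves the purpose. Note that the shorter count tables are recycled to fill three rows (row 3 repeats row 1 for a, b, and c), which is misleading. The output I would really like pads them with NA instead: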
a.x a.freq b.x b.freq c.x c.freq d.x d.freq
1: 1 2 1 4 1 10 1 5
2: 2 8 2 6 1 10 2 3
3: NA NA NA NA NA NA 3 2
Also, I would like to group the counts by cluster if possible, but adding by=cluster fails. Furthermore, I've tried using uniqueN and .N, which work fine with a single column but not with multiple columns. At this point, I'd really appreciate any pointers.
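For reference, here is a minimal sketch of one direction that might work, assuming a long-format result is acceptable in place of the wide NA-padded layout shown above. It reshapes the table with data.table's melt and counts rows with .N, so no recycling occurs and grouping by cluster is just one more by column. The seed and the names long, counts, and counts_by_cluster are my own illustrative choices, not part of the original data.

```r
library(data.table)

set.seed(1)  # assumption: seed added only to make the sketch reproducible
a <- data.table(
  cluster = sample(LETTERS[1:3], size = 10, replace = TRUE),
  a = sample(1:2, size = 10, replace = TRUE),
  b = sample(1:2, size = 10, replace = TRUE),
  c = sample(1:2, size = 10, replace = TRUE),
  d = sample(1:3, size = 10, replace = TRUE)
)

# Reshape to long format: one row per (cluster, variable, value)
long <- melt(a, id.vars = "cluster", measure.vars = c("a", "b", "c", "d"))

# Frequency of each value within each original column
counts <- long[, .N, by = .(variable, value)]

# The same counts, additionally split by cluster
counts_by_cluster <- long[, .N, by = .(cluster, variable, value)]

print(counts)
print(counts_by_cluster)
```

Each original column contributes only as many rows as it has distinct values, so there is nothing to pad or recycle. If the wide layout is still needed, the long result could be reshaped afterwards.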