I have a panel data set which looks as follows:
library(plm)
library(Hmisc)
library(data.table)
set.seed(1)
DT <- data.table(panelID = sample(50,50), # Creates a panel ID
                 Country = c(rep("Albania",30),rep("Belarus",50), rep("Chilipepper",20)),
                 some_NA = sample(0:5, 6),
                 some_NA_factor = sample(0:5, 6),
                 Group = c(rep(1,20),rep(2,20),rep(3,20),rep(4,20),rep(5,20)),
                 Time = rep(seq(as.Date("2010-01-03"), length=20, by="1 month") - 1,5),
                 norm = round(runif(100)/10,2),
                 Income = round(rnorm(10,-5,5),2),
                 Happiness = sample(10,10),
                 Sex = round(rnorm(10,0.75,0.3),2),
                 Age = sample(100,100),
                 Educ = round(rnorm(10,0.75,0.3),2))
DT[, uniqueID := .I] # Creates a unique ID
DT[DT == 0] <- NA # https://stackoverflow.com/questions/11036989/replace-all-0-values-to-na
DT$some_NA_factor <- factor(DT$some_NA_factor)
DTp <- plm::pdata.frame(DT, index= c("panelID", "Time"))
I want to evaluate, for each panel observation, whether some_NA_factor or, for example, Country changes from one time period to another (a 1 for a change and a 0 for no change). I would like to write something like:
setDT(DT)[, difference := c(-1,1)*diff(some_NA_factor), by=panelID]
But I don't know how to write this for factors. If I apply this to the data.table, I get, as expected:
Warning messages:
1: In Ops.factor(c(-1, 1), diff(weight)) : ‘*’ not meaningful for factors
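To make the target output concrete, here is a minimal sketch of the kind of indicator described above. It uses a small hypothetical table (`toy`, with a made-up factor column `f`), not the `DT` from the example, and it swaps `diff()` for a comparison against the within-panel lag from `data.table::shift()`. Treat it as an illustration of the desired result under those assumptions rather than a confirmed solution.

```r
library(data.table)

# Hypothetical toy panel (not the DT above): two panels, three periods each.
toy <- data.table(
  panelID = c(1, 1, 1, 2, 2, 2),
  Time    = rep(1:3, 2),
  f       = factor(c("a", "a", "b", "c", "c", "c"))
)
setorder(toy, panelID, Time)  # lags are only meaningful on time-ordered rows

# Compare each value with the previous one inside the same panel.
# as.character() avoids arithmetic on factors. The first period of each
# panel has no predecessor, so its indicator is NA; an NA in the factor
# itself would also yield NA.
toy[, changed := as.integer(as.character(f) != shift(as.character(f))),
    by = panelID]

print(toy)
```

The same pattern should carry over to a character column such as Country, since the comparison is done on character values either way.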
If I apply the same thing to the pdata.frame, I get:
setDT(DTp)[, difference := c(-1,1)*diff(some_NA_factor), by=panelID]
Error in alloc.col(x) :
Internal error: length of names (14) is not length of dt (13)
Additionally, when I apply this to my actual data, I get the following error:
Supplied 107438 items to be assigned to group 1 of size 2 in column 'difference'. The RHS length must either be 1 (single values are ok) or match the LHS length exactly. If you wish to 'recycle' the RHS please use rep() explicitly to make this intent clear to readers of your code.
I am not sure why that happens, and I cannot seem to reproduce it with the example.
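One possibly related point about lengths: `diff()` returns one element fewer than its input, while `:=` with `by` expects the right-hand side to have length 1 or the group's size. The sketch below, on a hypothetical table `toy` with a made-up numeric column `x`, shows the per-group lengths and pads the result with a leading `NA` so there is one value per row. Whether this accounts for the message on the real data is an open assumption, since the 107438 items reported there do not obviously correspond to a single group.

```r
library(data.table)

# Hypothetical toy data: one panel with 2 rows, one with 3 rows.
toy <- data.table(panelID = c(1, 1, 2, 2, 2),
                  x       = c(5, 7, 1, 4, 4))

# diff() yields .N - 1 values per group ...
lens <- toy[, .(n_rows = .N, n_diff = length(diff(x))), by = panelID]
print(lens)

# ... so a leading NA is added to get exactly one value per row in each group.
toy[, d := c(NA, diff(x)), by = panelID]
print(toy)
```

Note that in the example data every panelID occurs exactly twice, so `c(-1,1)*diff(...)` happens to return two values per group there, which would not hold for panels with more than two observations.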
Any ideas?