I have a 14 million row dataset of products, tariff rates, trade volumes, and year-month combinations in the following format:
df <- as.data.frame(matrix(c(1220, "2013-1", 10011900, 29307, .1,
1220, "2013-2", 10011900, 28202, .1,
1220, "2013-3", 10011900, 22383, .15,
1220, "2013-4", 10011900, 21303, .15,
1220, "2013-5", 10011900, 21201, .15,
1220, "2013-1", 10019900, 9960, .12,
1220, "2013-2", 10019900, 10043, .12,
1220, "2013-3", 10019900, 11001, .1,
1220, "2013-4", 10019900, 10997, .1,
1220, "2013-5", 10019900, 12038, .1),
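ncol = 5, byrow = TRUE))
colnames(df) <- c("country", "date", "product", "value", "rate")
# matrix() coerces every value to character, so convert the numeric columns back
num_cols <- c("country", "product", "value", "rate")
df[num_cols] <- lapply(df[num_cols], as.numeric)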
I'm trying to add a column that I can use to create a set of indicator variables marking how many months before or after a change in the tariff rate each observation falls. For the example above, the result would look like this:
df_transformed <- as.data.frame(matrix(c(1220, "2013-1", 10011900, 29307, .1, -2,
1220, "2013-2", 10011900, 28202, .1, -1,
1220, "2013-3", 10011900, 22383, .15, 0,
1220, "2013-4", 10011900, 21303, .15, 1,
1220, "2013-5", 10011900, 21201, .15, 2,
1220, "2013-1", 10019900, 9960, .12, -2,
1220, "2013-2", 10019900, 10043, .12, -1,
1220, "2013-3", 10019900, 11001, .1, 0,
1220, "2013-4", 10019900, 10997, .1, 1,
1220, "2013-5", 10019900, 12038, .1, 2),
ncol = 6, byrow = TRUE))
colnames(df_transformed) <- c("country", "date", "product", "value", "rate", "months_since_change")
I'm not sure how best to find where the tariff rate (the rate column) changes within each country/product series and to create the new column based on that.
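To make the target concrete, here is a naive base-R sketch of the logic I have in mind. It rests on assumptions that may not hold in the full data: rows within each country/product group are already in chronological order with no missing months, and the rate changes exactly once per group. The helper name months_since is just something I made up for illustration, and I have not checked how this scales to 14 million rows.

```r
# Toy data: same values as the example, built with data.frame() so columns keep their types
df <- data.frame(
  country = 1220,
  date    = rep(paste0("2013-", 1:5), times = 2),
  product = rep(c(10011900, 10019900), each = 5),
  value   = c(29307, 28202, 22383, 21303, 21201,
              9960, 10043, 11001, 10997, 12038),
  rate    = c(.1, .1, .15, .15, .15, .12, .12, .1, .1, .1)
)

# Hypothetical helper: offset of each row from the first row carrying a new rate.
# Assumes the vector is in date order; returns NA for a group with no rate change.
months_since <- function(rate) {
  change_row <- which(diff(rate) != 0)[1] + 1  # first row with the changed rate
  seq_along(rate) - change_row
}

# Apply the helper separately within each country/product group
df$months_since_change <- ave(df$rate, df$country, df$product, FUN = months_since)
```

On the toy data this gives -2, -1, 0, 1, 2 for each product, matching the desired output. It only picks up the first change in a group, so series with several changes would need something more careful.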
Thanks for the help!