
R - calculating the average value of previous quarter's dependent variable as independent variable


I am running a logistic regression of the form:

model_1 <- lrm(dependent_variable ~ var1 + var2 + var3, data = merged_dataset, na.action="na.delete")

I would like to include the previous quarter's average of the dependent variable as an additional independent variable.

My data look like this:

> print(head(merged_dataset[,c("dependent_variable", "qy")]))
   dependent_variable      qy
1:                  0 2008 Q1
2:                  0 2008 Q1
3:                  0 2008 Q1
4:                  0 2008 Q1
5:                  0 2008 Q1
6:                  0 2008 Q1

where the dependent variable takes the values 0 or 1, and the qy variable takes values such as 2008 Q1, 2008 Q2, ..., 2017 Q4.
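
A minimal base R sketch of the quantity described above, on made-up toy data. It assumes that qy is held as character labels of the form "YYYY Qn" (which sort chronologically as strings) and that no quarter is missing from the data. The names toy and prev_q_mean are hypothetical.

```r
# Toy data (made up for illustration): two rows per quarter
toy <- data.frame(
  dependent_variable = c(0, 1, 1, 1, 0, 0),
  qy = c("2008 Q1", "2008 Q1", "2008 Q2", "2008 Q2", "2008 Q3", "2008 Q3"),
  stringsAsFactors = FALSE
)

# One mean per quarter; tapply() orders the result by the sorted labels
q_means <- tapply(toy$dependent_variable, toy$qy, mean)

# Shift the per-quarter means down by one position, so that each quarter
# is paired with the mean of the quarter before it (assumes no gaps)
lagged <- c(NA, head(as.vector(q_means), -1))
names(lagged) <- names(q_means)

# Look up the lagged mean for every row by its quarter label
toy$prev_q_mean <- unname(lagged[toy$qy])

print(toy)
```

Rows in the earliest quarter get NA, because no earlier quarter exists to average over.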

EDIT:

I ran the following code:

# function that gets the name of previous quarter
PrevQ = function(quarter = "2018 Q1"){
  ifelse(grepl("Q1", quarter), 
         paste0(as.numeric(substr(quarter, 1,4))-1, " Q4"), 
         paste0(substr(quarter, 1,6), as.numeric(substr(quarter,7,7))-1)
  )
}

# means in current quarters
merged_dataset$dep_means = with(merged_dataset, ave(dependent_variable, qy, FUN = mean))

# get names of previous quarters   
merged_dataset$prev_qy = PrevQ(merged_dataset$qy)

# merge mean of previous quarter by name of the quarter
merged_dataset$var4 = with(merged_dataset, dep_means[match(prev_qy, qy)])

print(head(merged_dataset[,c("dependent_variable", "qy")]))


print(head(cbind(merged_dataset$var4,merged_dataset$qy)))

which gives me:

> print(head(merged_dataset[,c("dependent_variable", "qy")]))
   dependent_variable      qy
1:                  0 2008 Q1
2:                  0 2008 Q1
3:                  0 2008 Q1
4:                  0 2008 Q1
5:                  0 2008 Q1
6:                  0 2008 Q1
> 
> 
> print(head(cbind(merged_dataset$var4,merged_dataset$qy)))
     [,1] [,2]
[1,]   NA 2008
[2,]   NA 2008
[3,]   NA 2008
[4,]   NA 2008
[5,]   NA 2008
[6,]   NA 2008

It looks as if the qy variable has been changed so that only the year remains, and var4 is NA in every row.
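
One possible reading of the output, offered as an assumption because class(qy) is not shown in the post: cbind() builds a plain matrix and drops class attributes, so a classed column prints as its underlying number while the column in the data set is unchanged. The sketch below illustrates this with a base R Date column as a stand-in for qy. The names d, m and df are hypothetical.

```r
# A classed vector (Date) used as a stand-in for a classed quarter column
d <- as.Date(c("2008-01-01", "2008-04-01"))

# cbind() returns a numeric matrix: the Date class is lost and only the
# underlying day counts are printed
m <- cbind(c(NA, 0.5), d)
print(m)

# data.frame() keeps each column's class, so the dates print as dates
df <- data.frame(var4 = c(NA, 0.5), d = d)
print(df)

# The original vector itself is untouched
print(class(d))
```

If that is what is happening here, printing with data.frame() or checking class(merged_dataset$qy) would show whether qy itself was altered.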

> print(head(cbind(merged_dataset$var4,merged_dataset$qy,merged_dataset$dep_means)))
     [,1] [,2]      [,3]
[1,]   NA 2008 0.1292719
[2,]   NA 2008 0.1292719
[3,]   NA 2008 0.1292719
[4,]   NA 2008 0.1292719
[5,]   NA 2008 0.1292719
[6,]   NA 2008 0.1292719


> print(tail(cbind(merged_dataset$var4,merged_dataset$qy,merged_dataset$dep_means)))
         [,1]    [,2]       [,3]
[32008,]   NA 2017.75 0.09802372
[32009,]   NA 2017.75 0.09802372
[32010,]   NA 2017.75 0.09802372
[32011,]   NA 2017.75 0.09802372
[32012,]   NA 2017.75 0.09802372
[32013,]   NA 2017.75 0.09802372
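
The values 2008 and 2017.75 in the output above would be consistent with qy being stored numerically as year + (quarter - 1)/4, which is how the yearqtr class in the zoo package represents quarters. The post does not confirm this, so treat it as an assumption. If it holds, the character labels returned by PrevQ may simply never match the numeric qy values, which would explain the NAs. A sketch on made-up data that works on the numeric representation instead (toy, qy_num, q_means and prev_q are hypothetical names):

```r
# Toy data: qy_num stands in for the numeric form of qy,
# assumed to be year + (quarter - 1) / 4
toy <- data.frame(
  dependent_variable = c(0, 1, 1, 1, 0, 0),
  qy_num = c(2008.00, 2008.00, 2008.25, 2008.25, 2008.50, 2008.50)
)

# Mean of the dependent variable within each quarter
q_means <- tapply(toy$dependent_variable, toy$qy_num, mean)

# The previous quarter is a quarter of a year earlier; multiples of 0.25
# are exactly representable, so matching on the exact value is safe here
prev_q <- toy$qy_num - 0.25

# Look up the previous quarter's mean; quarters with no predecessor get NA
idx <- match(prev_q, as.numeric(names(q_means)))
toy$var4 <- as.vector(q_means)[idx]

print(toy)
```

Unlike shifting the means by position, this lookup returns NA for a quarter whose predecessor is missing from the data rather than borrowing the mean of an older quarter.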
