Channel: Active questions tagged r - Stack Overflow

How do I choose a denominator based on the values in another data frame?


My task is to calculate, for each student, the ratio of academic credits earned to academic credits expected (with no courses failed) during the time period. Since I have limited access to the university databases, I have to create a reference table for each class that shows how many credits the students should have attained by the current date.

My reference table will look something like this (I haven't yet gathered data from more than one class though):

   Cred Sumb weeks  startdate ExpCred Start_date_points End_date_points
1  15.0    0    10 2018-09-02     0.0        2018-09-02      2018-12-01
2  15.0    0    10 2018-09-02    15.0        2018-12-02      2019-02-09
3  15.0    0    10 2018-09-02    30.0        2019-02-10      2019-04-20
4  15.0    0    10 2018-09-02    45.0        2019-04-21      2019-06-29
5   7.5    1     5 2018-09-02    60.0        2019-06-30      2019-10-26
6   7.5    1     5 2018-09-02    67.5        2019-10-27      2019-11-30
7  15.0    1    10 2018-09-02    75.0        2019-12-01      2020-02-08
8   7.5    1     5 2018-09-02    90.0        2020-02-09      2020-03-14
9   7.5    1     5 2018-09-02    97.5        2020-03-15      2020-04-18
10 15.0    1    10 2018-09-02   105.0        2020-04-19      2020-06-27
11 15.0    2    10 2018-09-02   120.0        2020-06-28      2020-11-28
12 15.0    2    10 2018-09-02   135.0        2020-11-29      2021-02-06
13 30.0    2    20 2018-09-02   150.0        2021-02-07      2021-06-26
14  0.0    2     0 2018-09-02   180.0        2021-06-27      2021-06-26
15 15.0    0    10 2019-09-03     0.0        2019-09-03      2019-12-02
16 15.0    0    10 2019-09-03    15.0        2019-12-03      2020-02-10
17 15.0    0    10 2019-09-03    30.0        2020-02-11      2020-04-20
18 15.0    0    10 2019-09-03    45.0        2020-04-21      2020-06-29
19  7.5    1     5 2019-09-03    60.0        2020-06-30      2020-10-26
20  7.5    1     5 2019-09-03    67.5        2020-10-27      2020-11-30
21 15.0    1    10 2019-09-03    75.0        2020-12-01      2021-02-08
22  7.5    1     5 2019-09-03    90.0        2021-02-09      2021-03-15
23  7.5    1     5 2019-09-03    97.5        2021-03-16      2021-04-19
24 15.0    1    10 2019-09-03   105.0        2021-04-20      2021-06-28
25 15.0    2    10 2019-09-03   120.0        2021-06-29      2021-11-29
26 15.0    2    10 2019-09-03   135.0        2021-11-30      2022-02-07
27 30.0    2    20 2019-09-03   150.0        2022-02-08      2022-06-27
28  0.0    2     0 2019-09-03   180.0        2022-06-28      2022-06-27

When I try to use my reference table in calculations, however, I run into problems. I write:

fulldata <- fulldata %>%
  mutate(PERC_CREDIT = CREDITS / ifelse(
    UTBILDNINGSTILLFALLE_STARTDATUM == ekon_program$sd &
      Sys.Date() >= ekon_program$finished_date,
    180,
    ifelse(
      UTBILDNINGSTILLFALLE_STARTDATUM == ekon_program$sd &
        Sys.Date() >= ekon_program$start_date_points &
        Sys.Date() <= ekon_program$end_date_points,
      ekon_program$points_expected,
      100000
    )
  ))

And get the following warning messages:

Warning messages:
1: In UTBILDNINGSTILLFALLE_STARTDATUM == ekon_program$sd :
  longer object length is not a multiple of shorter object length
2: In UTBILDNINGSTILLFALLE_STARTDATUM == ekon_program$sd & Sys.Date() >=  :
  longer object length is not a multiple of shorter object length
3: In UTBILDNINGSTILLFALLE_STARTDATUM == ekon_program$sd :
  longer object length is not a multiple of shorter object length
4: In UTBILDNINGSTILLFALLE_STARTDATUM == ekon_program$sd & Sys.Date() >=  :
  longer object length is not a multiple of shorter object length
5: In UTBILDNINGSTILLFALLE_STARTDATUM == ekon_program$sd & Sys.Date() >=  :
  longer object length is not a multiple of shorter object length
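
These warnings most likely come from R's vector recycling. Inside mutate(), UTBILDNINGSTILLFALLE_STARTDATUM has one element per row of fulldata, while ekon_program$sd has one element per row of the reference table, so == pairs the two vectors position by position rather than looking anything up. A minimal sketch that reproduces the message, using made-up vector lengths (7 and 6 are assumptions for illustration, not the real table sizes):

```r
# Hypothetical stand-ins for the two columns; lengths are illustrative only.
main_dates <- rep("2018-09-03", 7)  # one value per row of the main table
ref_dates  <- rep("2018-09-03", 6)  # one value per row of the reference table

# Capture the warning text instead of printing it.
msg <- tryCatch(
  {
    main_dates == ref_dates
    NA_character_
  },
  warning = function(w) conditionMessage(w)
)

# The comparison still returns a result, one element per main-table row,
# because the shorter vector is recycled to the longer length.
cmp <- suppressWarnings(main_dates == ref_dates)
length(cmp)
```

The result has the length of the longer vector, which is why the code runs at all: element i of the main column is compared with element i of the reference column (wrapping around), not with every reference row that shares its start date.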

What I want to do is to choose a denominator based on the information in my reference table. First and foremost, I want to limit the reference-table rows used in the calculation to those with the same program start date as the current observation in the main table.

If the current date (Sys.Date()) is past the calculated end date for the program (starting at that specified date) in the reference table, I want the denominator to be set to 180.

If Sys.Date() is earlier than that, I want the denominator to be the value of points_expected from the reference-table row (with the same program start date as the current observation in the main table) where Sys.Date() falls between start_date_points and end_date_points.

How can I make this happen and what am I doing wrong?
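
The rules described above can be sketched as a per-row lookup in base R. This is only one possible approach under stated assumptions: the helper name expected_credits and the EXPECTED column are hypothetical, the data are plain data frames rebuilt from the excerpts in this question, and a fixed date is used in place of Sys.Date() so the result is reproducible.

```r
# Reference table rebuilt from the excerpt (plain data.frame, not grouped).
ref <- data.frame(
  sd                = "2018-09-03",
  points_expected   = c(0, 15, 30, 45, 60, 67.5),
  start_date_points = as.Date(c(17776, 17867, 17937, 18007, 18077, 18196),
                              origin = "1970-01-01"),
  end_date_points   = as.Date(c(17866, 17936, 18006, 18076, 18195, 18230),
                              origin = "1970-01-01"),
  finished_date     = as.Date(18805, origin = "1970-01-01"),
  stringsAsFactors  = FALSE
)

# Main table rebuilt from the excerpt.
main <- data.frame(
  UTBILDNINGSTILLFALLE_STARTDATUM = rep("2018-09-03", 6),
  CREDITS = c(30, 0, 22.5, 9.5, 0, 54),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each start date, return the expected credits.
# `today` defaults to Sys.Date() but can be fixed for testing.
expected_credits <- function(start, ref, today = Sys.Date()) {
  vapply(start, function(s) {
    rows <- ref[ref$sd == s, , drop = FALSE]   # same program start only
    if (nrow(rows) == 0) return(NA_real_)      # no reference data
    # Uses >= as in the original code; switch to > for "strictly past".
    if (today >= rows$finished_date[1]) return(180)
    hit <- rows$points_expected[today >= rows$start_date_points &
                                  today <= rows$end_date_points]
    if (length(hit) == 0) NA_real_ else hit[1]
  }, numeric(1), USE.NAMES = FALSE)
}

today <- as.Date("2019-03-01")  # assumed date, for illustration only
main$EXPECTED <- expected_credits(main$UTBILDNINGSTILLFALLE_STARTDATUM,
                                  ref, today)
main$PERC_CREDIT <- main$CREDITS / main$EXPECTED
```

With the fixed date above, the lookup lands on the row whose interval contains 2019-03-01, so every denominator is 30. Rows with no matching start date or interval come back as NA rather than an arbitrary fallback such as 100000, and a denominator of 0 (the first interval) would still need separate handling.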

Excerpt of the main table:

structure(list(UTBILDNINGSTILLFALLE_STARTDATUM = c("2018-09-03", 
"2018-09-03", "2018-09-03", "2018-09-03", "2018-09-03", "2018-09-03"
), CREDITS = c(30, 0, 22.5, 9.5, 0, 54)), row.names = c(NA, 6L
), class = "data.frame")

Excerpt reference table:

structure(list(sd = c("2018-09-03", "2018-09-03", "2018-09-03", 
"2018-09-03", "2018-09-03", "2018-09-03"), points_ekon = c(15, 
15, 15, 15, 7.5, 7.5), summer_break_ekon = c(0, 0, 0, 0, 1, 1
), weeks_course = c(10, 10, 10, 10, 5, 5), points_expected = c(0, 
15, 30, 45, 60, 67.5), order = 1:6, start_date = structure(c(17776, 
17776, 17776, 17776, 17776, 17776), class = "Date"), start_date_points = structure(c(17776, 
17867, 17937, 18007, 18077, 18196), class = "Date"), end_date_points = structure(c(17866, 
17936, 18006, 18076, 18195, 18230), class = "Date"), finished_date = structure(c(18805, 
18805, 18805, 18805, 18805, 18805), class = "Date")), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -6L), groups = structure(list(
    start_date = structure(17776, class = "Date"), .rows = list(
        1:6)), row.names = c(NA, -1L), class = c("tbl_df", "tbl", 
"data.frame"), .drop = TRUE))
