I have a dataframe with four columns: the first has the names of counties, the second has periods, the third has the actual measured values (IPC class) and the fourth has the forecasted values (Forecast). Both the actual and the forecasted values range from 1 to 5. These are the first 32 rows of the dataframe, sorted by county:
structure(list(County = c("Baringo", "Baringo", "Baringo", "Baringo",
"Baringo", "Baringo", "Baringo", "Baringo", "Baringo", "Baringo",
"Baringo", "Baringo", "Baringo", "Baringo", "Baringo", "Baringo",
"Baringo", "Baringo", "Baringo", "Baringo", "Baringo", "Baringo",
"Baringo", "Baringo", "Baringo", "Baringo", "Baringo", "Baringo",
"Baringo", "Baringo", "Baringo", "Baringo"), `Period of measurement Kenya` = c("2011-01",
"2011-04", "2011-07", "2011-10", "2012-01", "2012-04", "2012-07",
"2012-10", "2013-01", "2013-04", "2013-07", "2013-10", "2014-01",
"2014-04", "2014-07", "2014-10", "2015-01", "2015-04", "2015-07",
"2015-10", "2016-02", "2016-06", "2016-10", "2017-02", "2017-06",
"2017-10", "2018-02", "2018-06", "2018-10", "2018-12", "2019-02",
"2019-06"), `IPC class` = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 2, 3, 2, 1, 1, 1, 1, 1, 2
), Forecast = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 1, 1, 2, 2, 1, 1, 2, 1, 2, 3, 1, 1, 1, 1, 2, 1)), row.names = c(1L,
48L, 95L, 142L, 189L, 236L, 283L, 330L, 377L, 424L, 471L, 518L,
565L, 612L, 659L, 706L, 753L, 800L, 847L, 894L, 941L, 988L, 1035L,
1082L, 1129L, 1176L, 1223L, 1270L, 1317L, 1364L, 1411L, 1458L
), class = "data.frame")
For my report I need to know how many crisis transitions and how many misforecasted crisis transitions there were during the period I am researching. A crisis transition is when the value in the actual values column (IPC class) goes from 1 or 2 to 3, 4 or 5 from one period to the next. In the part of the dataframe above you can see that the county Baringo had 1 crisis transition (from 2 in 2017-02 to 3 in 2017-06). To count this, the following code was used:
SUB_count_cristrans_KE <- long.SUB_dfCSKE_tot %>% mutate(crisis = ifelse(`IPC class` %in% 3:5, 1, 0)) %>%
arrange(County, `Period of measurement Kenya`) %>%
group_by(County) %>%
summarize(SUB_crisis_trans_count = sum(diff(crisis) > 0))
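To see why `sum(diff(crisis) > 0)` counts crisis transitions, here is a small base-R check on the Baringo IPC values from the sample above (copied by hand from the `dput()` output, so this is only a sketch for one county, not the full pipeline). Because `crisis` is 0/1, `diff()` is +1 exactly where a 1 or 2 is followed by a 3, 4 or 5, and -1 when a crisis ends.

```r
# IPC class values for Baringo, copied from the dput() output above
ipc <- c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
         1, 1, 1, 2, 3, 2, 1, 1, 1, 1, 1, 2)

# 1 when the period is in crisis (IPC 3, 4 or 5), 0 otherwise
crisis <- as.integer(ipc %in% 3:5)

# diff() is +1 only where a non-crisis period is followed by a crisis
# period, so counting the positive steps counts the crisis transitions
n_transitions <- sum(diff(crisis) > 0)
print(n_transitions)

# Position of the period in which the transition lands (row 25 = 2017-06)
print(which(diff(crisis) > 0) + 1)
```

Note that the first period of a county can never count as a transition with this approach, since it has no previous period to compare with.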
A misforecasted crisis transition is when, at a crisis transition, the Forecast column does not also show a crisis value (3, 4 or 5). As you can see in the part of the dataframe above, the crisis transition of Baringo was misforecasted, because the value in the Forecast column was a 2 instead of a 3, 4 or 5. So my question is: what would be a correct condition in the ifelse function to count the misforecasted crisis transitions per county? In words: first it has to check whether a period is a crisis transition, i.e. whether the IPC class went from a 1 or 2 to a 3, 4 or 5. If so, it checks whether the value in the Forecast column is a 3, 4 or 5. If it is not, then it is a misforecasted crisis transition.
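The two-step condition described above can be written directly as a vectorized check. The following is a minimal base-R sketch; `flag_missed_transitions` is a hypothetical helper name, and it assumes it is given the values of a single county already sorted by period.

```r
# Hypothetical helper: flags misforecasted crisis transitions in ONE
# county's series. Assumes ipc and forecast are sorted by period.
flag_missed_transitions <- function(ipc, forecast) {
  crisis <- ipc %in% 3:5
  # Crisis status of the previous period. The first period has no
  # predecessor, so TRUE is used to make sure it is never a transition.
  prev_crisis <- c(TRUE, head(crisis, -1))
  # Step 1: is this period a crisis transition (1/2 -> 3/4/5)?
  transition <- crisis & !prev_crisis
  # Step 2: if so, was the forecast NOT a 3, 4 or 5?
  transition & !(forecast %in% 3:5)
}

# Toy series: period 2 is a correctly forecasted transition (2 -> 3),
# period 3 has a wrong forecast but is not a transition (IPC stays 3),
# period 5 is a missed transition (2 -> 4 with forecast 1)
ipc      <- c(2, 3, 3, 2, 4)
forecast <- c(2, 3, 2, 2, 1)
missed <- flag_missed_transitions(ipc, forecast)
print(sum(missed))
```

Inside a dplyr pipeline the transition part could instead be written per county (after `group_by(County)`) as `crisis & !lag(crisis, default = TRUE)` with a logical `crisis` column; `default = TRUE` keeps the first period of each county from being counted, which matches the `diff()` approach.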
The code I have right now is:
SUB_count_crismiss_KE <- long.SUB_dfCSKE_tot %>% mutate(crisis_miss = ifelse(`IPC class` %in% 3:5 & (!Forecast %in% 3:5), 1, 0)) %>%
arrange(County, `Period of measurement Kenya`) %>%
group_by(County) %>%
summarize(SUB_crisis_miss_count_KE = sum(diff(crisis_miss) > 0))
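One way to see where this code goes wrong: `diff(crisis_miss) > 0` fires whenever the miss flag switches from 0 to 1, which also happens in a period where the IPC class stays at 3 but the forecast drops to 2. The sketch below compares that count with a transition-based count on the Garissa values printed further down in this post (values copied by hand; a single county only).

```r
# Garissa values, copied from the printed subset in this post
ipc      <- c(2, 2, 3, 3, 2, 2, 3, 3, 2, 2, 2, 2, 2, 2, 2, 3,
              3, 3, 2, 2, 2, 2, 2, 3, 3, 2, 3, 1, 1, 2, 2, 2)
forecast <- c(3, 2, 3, 2, 2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2,
              2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 2, 3, 1, 1, 2, 2)

# Current approach: flag crisis periods with a non-crisis forecast and
# count the rises in that flag
crisis_miss <- as.integer(ipc %in% 3:5 & !(forecast %in% 3:5))
current_count <- sum(diff(crisis_miss) > 0)
print(which(diff(crisis_miss) > 0) + 1)  # includes rows 4 and 8 (2011-10, 2012-10)

# Intended: first find the crisis transitions, then check their forecast
crisis <- as.integer(ipc %in% 3:5)
transition <- c(FALSE, diff(crisis) > 0)  # first period is never a transition
missed <- transition & !(forecast %in% 3:5)
intended_count <- sum(missed)

print(c(current = current_count, intended = intended_count))
```

Here the current code counts 5, while only 3 of Garissa's 5 crisis transitions (2014-10, 2017-02 and 2018-02) were misforecasted.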
Let me know if I have to add something or clarify! Thanks in advance.
Below I've highlighted the county Garissa to make the problem I'd like to solve clearer. ;)
> subset(sorted_long.SUB_dfCSKE_tot, County=="Garissa")
County Period of measurement Kenya IPC class Forecast
7 Garissa 2011-01 2 3
54 Garissa 2011-04 2 2
101 Garissa 2011-07 3 3
148 Garissa 2011-10 3 2
195 Garissa 2012-01 2 2
242 Garissa 2012-04 2 2
289 Garissa 2012-07 3 3
336 Garissa 2012-10 3 2
383 Garissa 2013-01 2 2
430 Garissa 2013-04 2 2
477 Garissa 2013-07 2 2
524 Garissa 2013-10 2 2
571 Garissa 2014-01 2 2
618 Garissa 2014-04 2 2
665 Garissa 2014-07 2 2
712 Garissa 2014-10 3 2
759 Garissa 2015-01 3 2
806 Garissa 2015-04 3 2
853 Garissa 2015-07 2 2
900 Garissa 2015-10 2 2
947 Garissa 2016-02 2 2
994 Garissa 2016-06 2 2
1041 Garissa 2016-10 2 2
1088 Garissa 2017-02 3 2
1135 Garissa 2017-06 3 3
1182 Garissa 2017-10 2 3
1229 Garissa 2018-02 3 2
1276 Garissa 2018-06 1 3
1323 Garissa 2018-10 1 1
1370 Garissa 2018-12 2 1
1417 Garissa 2019-02 2 2
1464 Garissa 2019-06 2 2
A crisis transition occurred between 2011-04 and 2011-07: the IPC value went from 2 to 3. Between 2011-07 and 2011-10, however, there was no crisis transition, because the IPC value stayed at 3. Now the misforecasted part: the crisis transition between 2011-04 and 2011-07 was forecasted correctly, because the forecast value for 2011-07 was a 3, 4 or 5 (here a 3). The forecast value for 2011-10 is incorrect, but because there was no crisis transition in that period it should not be counted. So how can I write a condition that skips the forecast values of periods without a crisis transition? I hope it is clearer now.
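A possible way to get both counts per county in one go is sketched below in base R, so it runs without dplyr. `count_crisis_transitions` is a hypothetical function name, and it assumes the column names shown in the post. The example data frame rebuilds the Baringo and Garissa rows from this post; a period is only checked for a missed forecast when it is a crisis transition, so a case like Garissa 2011-10 is skipped.

```r
# Hypothetical helper: per-county counts of crisis transitions and
# misforecasted crisis transitions (column names as in the post).
count_crisis_transitions <- function(df) {
  # "YYYY-MM" strings sort chronologically as plain text
  df <- df[order(df$County, df$`Period of measurement Kenya`), ]
  per_county <- lapply(split(df, df$County), function(d) {
    crisis <- as.integer(d$`IPC class` %in% 3:5)
    # TRUE where a 1/2 period is followed by a 3/4/5 period; the first
    # period of a county is never counted, as with the diff() approach
    transition <- c(FALSE, diff(crisis) > 0)
    # Only transitions are checked, so wrong forecasts elsewhere are skipped
    missed <- transition & !(d$Forecast %in% 3:5)
    data.frame(County = d$County[1],
               crisis_trans_count = sum(transition),
               crisis_miss_count = sum(missed))
  })
  out <- do.call(rbind, per_county)
  rownames(out) <- NULL
  out
}

# Example data: the Baringo and Garissa rows shown in this post
periods <- c("2011-01", "2011-04", "2011-07", "2011-10", "2012-01", "2012-04",
             "2012-07", "2012-10", "2013-01", "2013-04", "2013-07", "2013-10",
             "2014-01", "2014-04", "2014-07", "2014-10", "2015-01", "2015-04",
             "2015-07", "2015-10", "2016-02", "2016-06", "2016-10", "2017-02",
             "2017-06", "2017-10", "2018-02", "2018-06", "2018-10", "2018-12",
             "2019-02", "2019-06")
baringo_ipc <- c(rep(2, 20), 1, 1, 1, 2, 3, 2, 1, 1, 1, 1, 1, 2)
baringo_fc  <- c(rep(2, 16), 1, 1, 2, 2, 1, 1, 2, 1, 2, 3, 1, 1, 1, 1, 2, 1)
garissa_ipc <- c(2, 2, 3, 3, 2, 2, 3, 3, 2, 2, 2, 2, 2, 2, 2, 3,
                 3, 3, 2, 2, 2, 2, 2, 3, 3, 2, 3, 1, 1, 2, 2, 2)
garissa_fc  <- c(3, 2, 3, 2, 2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2,
                 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 2, 3, 1, 1, 2, 2)

example <- data.frame(
  County = rep(c("Baringo", "Garissa"), each = 32),
  `Period of measurement Kenya` = rep(periods, times = 2),
  `IPC class` = c(baringo_ipc, garissa_ipc),
  Forecast = c(baringo_fc, garissa_fc),
  check.names = FALSE
)

res <- count_crisis_transitions(example)
print(res)
```

For these two counties this gives 1 transition and 1 misforecast for Baringo, and 5 transitions and 3 misforecasts for Garissa. On the full data the call would presumably be `count_crisis_transitions(long.SUB_dfCSKE_tot)`, provided the column names match.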