Channel: Active questions tagged r - Stack Overflow

condition for comparing two columns


I have a data frame with four columns: the first holds the county names, the second the measurement periods, the third the actual measured values (IPC class), and the fourth the forecasted values (Forecast). Both the actual and the forecasted values range from 1 to 5. These are the first 32 rows of the data frame, sorted by county:

structure(list(County = c("Baringo", "Baringo", "Baringo", "Baringo", 
"Baringo", "Baringo", "Baringo", "Baringo", "Baringo", "Baringo", 
"Baringo", "Baringo", "Baringo", "Baringo", "Baringo", "Baringo", 
"Baringo", "Baringo", "Baringo", "Baringo", "Baringo", "Baringo", 
"Baringo", "Baringo", "Baringo", "Baringo", "Baringo", "Baringo", 
"Baringo", "Baringo", "Baringo", "Baringo"), `Period of measurement Kenya` = c("2011-01", 
"2011-04", "2011-07", "2011-10", "2012-01", "2012-04", "2012-07", 
"2012-10", "2013-01", "2013-04", "2013-07", "2013-10", "2014-01", 
"2014-04", "2014-07", "2014-10", "2015-01", "2015-04", "2015-07", 
"2015-10", "2016-02", "2016-06", "2016-10", "2017-02", "2017-06", 
"2017-10", "2018-02", "2018-06", "2018-10", "2018-12", "2019-02", 
"2019-06"), `IPC class` = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 2, 3, 2, 1, 1, 1, 1, 1, 2
), Forecast = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 1, 1, 2, 2, 1, 1, 2, 1, 2, 3, 1, 1, 1, 1, 2, 1)), row.names = c(1L, 
48L, 95L, 142L, 189L, 236L, 283L, 330L, 377L, 424L, 471L, 518L, 
565L, 612L, 659L, 706L, 753L, 800L, 847L, 894L, 941L, 988L, 1035L, 
1082L, 1129L, 1176L, 1223L, 1270L, 1317L, 1364L, 1411L, 1458L
), class = "data.frame") 

For my report I need to know how many crisis transitions and how many misforecasted crisis transitions there were during the period I am researching. A crisis transition is when the actual value goes from 1 or 2 to 3, 4 or 5. In the part of the data frame shown above, the county Baringo had 1 crisis transition. I counted the transitions with the following code:

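SUB_count_cristrans_KE <- long.SUB_dfCSKE_tot %>%
  # crisis = 1 when the measured IPC class is 3, 4 or 5
  mutate(crisis = ifelse(`IPC class` %in% 3:5, 1, 0)) %>%
  arrange(County, `Period of measurement Kenya`) %>%
  group_by(County) %>%
  # a positive difference marks a step from non-crisis (0) to crisis (1)
  summarize(SUB_crisis_trans_count = sum(diff(crisis) > 0))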
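
To illustrate why sum(diff(crisis) > 0) counts transitions, here is a minimal base R sketch using only the Baringo IPC class values from the data above. The variable names ipc and n_transitions are made up for this illustration.

```r
# IPC class values for Baringo, copied from the dput() output above
ipc <- c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
         1, 1, 1, 2, 3, 2, 1, 1, 1, 1, 1, 2)

# 1 when the period is a crisis period (IPC class 3 to 5), 0 otherwise
crisis <- as.integer(ipc %in% 3:5)

# diff() is +1 exactly where a non-crisis period is followed by a crisis period
n_transitions <- sum(diff(crisis) > 0)
n_transitions
```

Note that diff() has no earlier value to compare the first period against, so a county that starts in crisis does not get a transition counted for its first row.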

A misforecasted crisis transition is a crisis transition for which the Forecast column does not also show a crisis value. As you can see in the part of the data frame above, the crisis transition of Baringo was misforecasted, because the value in the Forecast column wasn't a 3, 4 or 5. So my question is: what would be a correct condition in the ifelse function to count the misforecasted crisis transitions by county? In words: first check whether a period is a crisis transition, that is, whether the IPC class went from 1 or 2 to 3, 4 or 5. If so, check whether the value in the Forecast column is a 3, 4 or 5. If it is not, it is a misforecasted crisis transition. The code I have right now is:

SUB_count_crismiss_KE <- long.SUB_dfCSKE_tot %>%
  # flags every crisis period (IPC class 3 to 5) whose forecast is not 3 to 5
  mutate(crisis_miss = ifelse(`IPC class` %in% 3:5 & !(Forecast %in% 3:5), 1, 0)) %>%
  arrange(County, `Period of measurement Kenya`) %>%
  group_by(County) %>%
  summarize(SUB_crisis_miss_count_KE = sum(diff(crisis_miss) > 0))

Let me know if I have to add something or clarify! Thanks in advance.

Below I've highlighted the county Garissa to make clearer what problem I'd like to solve. ;)

> subset(sorted_long.SUB_dfCSKE_tot, County=="Garissa")
      County Period of measurement Kenya IPC class Forecast
7    Garissa                     2011-01         2        3
54   Garissa                     2011-04         2        2
101  Garissa                     2011-07         3        3
148  Garissa                     2011-10         3        2
195  Garissa                     2012-01         2        2
242  Garissa                     2012-04         2        2
289  Garissa                     2012-07         3        3
336  Garissa                     2012-10         3        2
383  Garissa                     2013-01         2        2
430  Garissa                     2013-04         2        2
477  Garissa                     2013-07         2        2
524  Garissa                     2013-10         2        2
571  Garissa                     2014-01         2        2
618  Garissa                     2014-04         2        2
665  Garissa                     2014-07         2        2
712  Garissa                     2014-10         3        2
759  Garissa                     2015-01         3        2
806  Garissa                     2015-04         3        2
853  Garissa                     2015-07         2        2
900  Garissa                     2015-10         2        2
947  Garissa                     2016-02         2        2
994  Garissa                     2016-06         2        2
1041 Garissa                     2016-10         2        2
1088 Garissa                     2017-02         3        2
1135 Garissa                     2017-06         3        3
1182 Garissa                     2017-10         2        3
1229 Garissa                     2018-02         3        2
1276 Garissa                     2018-06         1        3
1323 Garissa                     2018-10         1        1
1370 Garissa                     2018-12         2        1
1417 Garissa                     2019-02         2        2
1464 Garissa                     2019-06         2        2

A crisis transition occurred between 2011-04 and 2011-07: the IPC value went from a 2 to a 3. Between 2011-07 and 2011-10, however, there was no crisis transition, because the IPC value stayed at 3. Now to the misforecasted part. The crisis transition into 2011-07 was properly forecasted, because the forecast value for that period was in the range 3 to 5 (it was 3). The forecasted value for 2011-10 is incorrect, but because there was no crisis transition in that period it should not be counted. So how can I write a condition that skips the forecasted values of periods without a crisis transition? I hope it is clearer now.
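
One possible way to express that condition, as a sketch and not a definitive answer: flag the transition rows first by comparing each period with the previous one inside each county (dplyr's lag()), then look at Forecast only on those rows. The sketch assumes the forecast to check is the one in the same row as the period in which the crisis starts, as in the Garissa walkthrough above. The names garissa, transition, missed and result are made up for this example, and the data is retyped from the Garissa table.

```r
library(dplyr)

# Garissa rows retyped from the table above (illustrative object name)
garissa <- data.frame(
  County = "Garissa",
  `Period of measurement Kenya` = c(
    "2011-01", "2011-04", "2011-07", "2011-10", "2012-01", "2012-04",
    "2012-07", "2012-10", "2013-01", "2013-04", "2013-07", "2013-10",
    "2014-01", "2014-04", "2014-07", "2014-10", "2015-01", "2015-04",
    "2015-07", "2015-10", "2016-02", "2016-06", "2016-10", "2017-02",
    "2017-06", "2017-10", "2018-02", "2018-06", "2018-10", "2018-12",
    "2019-02", "2019-06"),
  `IPC class` = c(2, 2, 3, 3, 2, 2, 3, 3, 2, 2, 2, 2, 2, 2, 2, 3,
                  3, 3, 2, 2, 2, 2, 2, 3, 3, 2, 3, 1, 1, 2, 2, 2),
  Forecast    = c(3, 2, 3, 2, 2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2,
                  2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 2, 3, 1, 1, 2, 2),
  check.names = FALSE  # keep the column names with spaces as they are
)

result <- garissa %>%
  arrange(County, `Period of measurement Kenya`) %>%
  group_by(County) %>%
  mutate(
    crisis = `IPC class` %in% 3:5,
    # transition: crisis now, no crisis in the previous period of the same county;
    # default = TRUE means the first row of a county is never a transition,
    # which mirrors the behaviour of diff() in the counting code above
    transition = crisis & !lag(crisis, default = TRUE),
    # misforecast: a transition row whose forecast is not a crisis value
    missed = transition & !(Forecast %in% 3:5)
  ) %>%
  summarize(
    crisis_trans_count = sum(transition),
    crisis_miss_count  = sum(missed)
  )

result
```

For the Garissa rows shown, this counts 5 crisis transitions (2011-07, 2012-07, 2014-10, 2017-02 and 2018-02), of which 3 are misforecasted (2014-10, 2017-02 and 2018-02). The 2011-10 row is skipped because it is a crisis period but not a transition.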

