Channel: Active questions tagged r - Stack Overflow

R: conditionally combine text from adjacent rows while retaining associated information


The script needs to:

a) combine text in adjacent rows. The number of adjacent rows may vary; each group to be combined starts at the first row preceded by an NA and ends at the last row followed by an NA,

b) retain row ids for future checking

c) retain the numeric value attached to one row within each group of adjacent rows being combined

d) retain the overall order

Before and after tables

I have achieved this using a for loop and a load of data wrangling with dplyr and stringr.

The for loop is inelegant as I'm struggling with the logic to identify adjacent rows sequentially. This is not important as the grouping variable is just a helper - but it galls me.

I also wonder if there might be a more efficient way to do this altogether, maybe using rowwise and mutate with lead or lag.

Any guidance or pointers would be appreciated.
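
To make the lead/lag idea concrete, here is a minimal sketch of how the helper grouping variable might be built without a loop. It assumes a run of text starts wherever a non-NA txt follows an NA txt (or the first row), and it uses only the id and txt values from the example data in this post. The names has_txt, run_start and tib_grouped are made up for illustration.

```r
library(dplyr)
library(tibble)

# cut-down copy of the example data: only the columns needed for grouping
tib <- tibble(id  = 1:11,
              txt = c(NA, "the", "cat", NA, NA, "sat", NA, "on", "the", "mat", NA))

tib_grouped <- tib %>%
  mutate(
    # TRUE for rows that carry text
    has_txt   = !is.na(txt),
    # a run starts where a text row follows a non-text row (or is the first row)
    run_start = has_txt & !lag(has_txt, default = FALSE),
    # number the runs with a cumulative sum; rows without text stay NA
    txt_group = if_else(has_txt, cumsum(run_start), NA_integer_)
  )

print(tib_grouped)
```

Here lag() is applied to the whole column inside mutate(), so each row is compared with the row before it. The group numbers need not match those produced by a loop, but the rows are partitioned into runs of adjacent text.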

library(tidyverse)

tib <- tibble(id = 1:11,
              var = c("a", NA, NA, "b", "c" , NA, "d", NA, NA, NA, "e"),
              txt = c( NA, "the", "cat",  NA,  NA, "sat", NA, "on", "the", "mat", NA),
              nr = c( NA,  NA, 5, NA, NA, 10, 7, NA, NA, 15, 11),
              txt_group = NA_integer_)

# txt_group = helper column for text grouping variable

txt_group_counter <- 1L


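for (i in seq_len(nrow(tib))) {

  # lag() and lead() return NA when given a single element,
  # so only the current row's txt is tested here

  if (!is.na(tib$txt[i])) {
    # a row with text belongs to the current group
    tib$txt_group[i] <- txt_group_counter
  } else {
    # a row without text closes the current group
    txt_group_counter <- txt_group_counter + 1L
  }

}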


tib1 <- 
  tib %>%
  filter(!is.na(txt_group)) %>% 
  group_by(txt_group) %>% 
  mutate(id_comb = paste(id, collapse = ", "),
         txt = paste(txt, collapse = ""),
         nr = paste(nr, collapse = "")) %>% 
  select(-id) %>% 
  distinct() %>% 
  ungroup() %>% 
  mutate(id = as.numeric(str_extract(id_comb, "^\\d+")),  # "+" keeps multi-digit ids intact
         nr = as.numeric(str_remove_all(nr, "[NA]"))) %>% 
  select(id, id_comb, everything()) %>% 
  bind_rows(tib %>% filter(is.na(txt_group))) %>% 
  arrange(id) %>% 
  select(-txt_group)
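
For comparison, the whole task might also be done in a single pass by giving every row a block number and summarising per block. This is only a sketch under assumptions: each run of text carries at most one non-NA nr, the text is pasted with the same empty separator as above, and rows without text keep an NA id_comb. The names has_txt, block and out are hypothetical.

```r
library(dplyr)
library(tibble)

tib <- tibble(id  = 1:11,
              var = c("a", NA, NA, "b", "c", NA, "d", NA, NA, NA, "e"),
              txt = c(NA, "the", "cat", NA, NA, "sat", NA, "on", "the", "mat", NA),
              nr  = c(NA, NA, 5, NA, NA, 10, 7, NA, NA, 15, 11))

out <- tib %>%
  mutate(has_txt = !is.na(txt),
         # a new block starts at every row without text and at the start of each text run
         block = cumsum(!has_txt | !lag(has_txt, default = FALSE))) %>%
  group_by(block) %>%
  summarise(
    # combined ids only for text runs, NA for the untouched rows
    id_comb = if (has_txt[1]) paste(id, collapse = ", ") else NA_character_,
    # first id of the block preserves the overall order
    id      = first(id),
    var     = first(var),
    txt     = if (has_txt[1]) paste(txt, collapse = "") else NA_character_,
    # first non-missing nr in the block; NA if there is none
    nr      = nr[!is.na(nr)][1],
    .groups = "drop"
  ) %>%
  select(id, id_comb, var, txt, nr)

print(out)
```

Because the block number increases with row position, summarise() returns the blocks in their original order, so no separate bind_rows() and arrange() step is needed in this version.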
