We receive a monthly report in Excel format, from which I only need specific values. Previously, I was able to use readxl and grep the relevant column to find the row numbers, then go from there:
library(readxl)
file <- read_excel(readxl_example("deaths.xlsx"), col_names = FALSE)
row_pos <- grep(pattern = "actor", file$..2)
I could then grep some more for the specific column I wanted:
col_pos <- grep(pattern = "Has kids", file)
This used to return the positions I wanted, so I could extract the values and proceed to munge my data.
I am intentionally using the now incorrect $..2 syntax here. A recent update changed this convention to $...2.
My question: how can I make the column selection in the first grep more robust, so that I don't have to update all my code whenever readxl (or any other package) makes a minor syntax change?
I have tried:
library(dplyr)
row_pos <- grep(pattern = "actor", x = file %>% select(contains("2")))
But that only returns the first value.
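For comparison, here is a minimal sketch of one direction that might avoid depending on the generated names at all: indexing the column by position with [[ ]]. The small data frame is a hypothetical stand-in for what read_excel(..., col_names = FALSE) returns, and the sketch assumes the column of interest always sits in second position.

```r
# Hypothetical stand-in for the result of read_excel(..., col_names = FALSE);
# the names mimic readxl's auto-generated "...1", "...2" style.
file <- data.frame(
  c("Name", "David Bowie", "Carrie Fisher"),
  c("Profession", "musician", "actor"),
  c("Has kids", "TRUE", "TRUE"),
  stringsAsFactors = FALSE
)
names(file) <- c("...1", "...2", "...3")

# Pull the second column by position, so the generated name never appears in the code.
row_pos <- grep(pattern = "actor", x = file[[2]])

# grep() on the whole data frame coerces each column to a single string,
# so the match index here is a column position.
col_pos <- grep(pattern = "Has kids", x = file)

file[row_pos, col_pos]
```

Because file[[2]] is a plain character vector, grep returns one index per matching row instead of a single column index. The trade-off is that this breaks if the report's column order changes, which a name-based lookup would not.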
Here's the rest of the pipeline for some context on what happens to the data.
values <- as.data.frame(t(file[row_pos, col_pos]), stringsAsFactors = FALSE, row.names = NULL)
etc. Thanks!