Channel: Active questions tagged r - Stack Overflow

How to apply the same function to several variables in R?


I know that similar questions have already been asked (e.g. "Passing list element names as a variable to functions within lapply" or "R - iteratively apply a function of a list of variables"), but I couldn't manage to find a solution for my problem based on these posts.

I have an event dataset (~100 variables, >2000 observations) that contains variables with information on the involved actors. One variable can only contain one actor, so if several actors have been involved in the event, they are spread over several variables (e.g. actor1, actor2, ...). These actors can be classified into two groups ("s" and "nons"). For later use, I need two lists of actors: one that contains all actors of the category "s" and one that contains all actors of "nons". "s" only consists of three actors while "nons" consists of dozens of actors.

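# load dplyr (provides %>%, mutate, filter, distinct, pull and enquo)
library(dplyr)

# create example data
df <- data.frame(id = 1:8,
                 actor1 = c("s1", "s2", "nons1", "nons2", "nons3", "nons4", "nons5", NA),
                 actor2 = c("s1", NA, "s2", "s3", "nons2", "nons6", "nons1", "nons4"))

# make sure the actor columns are character vectors rather than factors
df <-
  df %>%
  mutate(actor1 = as.character(actor1),
         actor2 = as.character(actor2))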

Since the script I am about to prepare is supposed to be used on updated versions of the dataset in the future, I would like to automate as much as possible and keep the parts of the script that would need to be adapted as limited as possible. My idea was to create one function per category that extracts the actors of the respective category (e.g. "nons") from one variable (e.g. actor1) in a list and then "loop" this function over the other variables (ideally with the apply family).

I know which category each actor belongs to, which allows me to define a separation rule as used in the function below (the filter command). But please note that in the real dataset it is not possible to deduce the category from the actor name. Obvious names like "s1" and "nons1" are used for illustrative purposes in this example only.

# create function
nons_function <- function(col) {
  col_ <- enquo(col)
  nons_list <-
    df %>%
    filter(!is.na(!!col_), !!col_ != "s1", !!col_ != "s2", !!col_ != "s3") %>%
    distinct(!!col_) %>%
    pull()
  nons_list
}

# create list of variables to "loop" over
actorlist <- c("actor1", "actor2")

This results in the following. Instead of two lists of actors I get a list that contains the variable names as character strings.

> lapply(actorlist, nons_function)
[[1]]
[1] "actor1"

[[2]]
[1] "actor2"

What I would like to get is something like the following:

> lapply(actorlist, nons_function)
[[1]]
[1] "nons1" "nons2" "nons3" "nons4" "nons5"

[[2]]
[1] "nons2" "nons6" "nons1" "nons4"

The problem is probably the way I am passing the variable names to my function within lapply. Apparently, my function is not able to use a character input as a variable name. However, I have not found a way either to adapt my function so that it accepts character input or to provide it with a list of variables to loop over in a form it can digest.

Any help appreciated!
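
For illustration, here is a minimal sketch of one way the function might be adapted to accept column names as character strings. It assumes a dplyr version that provides the .data pronoun, and it uses base R's unique() in place of distinct() and pull(). The names nons_function_chr and s_actors are hypothetical and were not part of my original script; this is one possible approach, not necessarily the best one.

```r
library(dplyr)

# example data from above
df <- data.frame(
  id = 1:8,
  actor1 = c("s1", "s2", "nons1", "nons2", "nons3", "nons4", "nons5", NA),
  actor2 = c("s1", NA, "s2", "s3", "nons2", "nons6", "nons1", "nons4"),
  stringsAsFactors = FALSE
)

# hypothetical variant of nons_function:
# 'col' is a column name given as a character string,
# 's_actors' holds the known "s" actors (assumed to be listed by hand)
nons_function_chr <- function(col, data = df, s_actors = c("s1", "s2", "s3")) {
  kept <- data %>%
    # .data[[col]] looks up the column whose name is stored in 'col'
    filter(!is.na(.data[[col]]), !(.data[[col]] %in% s_actors))
  # unique values in order of first appearance
  unique(kept[[col]])
}

# create list of variables to "loop" over
actorlist <- c("actor1", "actor2")

# setNames() only labels the result by variable name
result <- lapply(setNames(actorlist, actorlist), nons_function_chr)
print(result)
```

Keeping the "s" actors in a single vector would mean that only s_actors and actorlist need to be adapted when the dataset is updated.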

