The problem
Hi all,
I am trying to join a few dataframes together dynamically. By that I mean that I start with one dataframe, df_A, to which I want to join multiple other dataframes: df_B1, df_B2, df_B3, etc.
df_A contains a column for each of the df_B... tables to join against: Column_join_B1, Column_join_B2, Column_join_B3, etc. (although in reality these have obscure names). These names are also stored in a vector, df_A_join_names.
df_B1, df_B2, df_B3, etc. are stored in a list, df_B, which I understand is good practice :). This is also how I access them in my loop.
Each of these has two columns: one with the value to join against df_A, and one with information.
I even tried renaming the first column to match the column in df_A before the join, but to no avail.
What I am trying
left_join() does not allow me to simply use by = c(df_A_join_names[1], "Column_join_A"), so I have to use setNames, but I cannot get this to work.
Below is a function that I want to call in a loop:
library(dplyr)

# Left-join df_b onto df_a, matching df_a[[a_name]] against df_b[[b_name]]
my_join <- function(df_a, df_b, a_name, b_name) {
  df_joined <- left_join(df_a, df_b,
                         by = setNames(b_name, a_name))
  return(df_joined)
}
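To make clear what the by argument receives here, this is a minimal sketch of what setNames(b_name, a_name) evaluates to. The two values are just the example names from this question, and it uses base R only.

```r
# Example values, mirroring the column names used in this question
a_name <- "Column_join_B1"  # join column in df_A
b_name <- "Column_join_A"   # join column in a df_B table

# setNames(object, nm): the value is the df_B column, the name is the df_A column
by_vec <- setNames(b_name, a_name)
print(by_vec)

# Same thing written as a literal named vector
same_as_literal <- identical(by_vec, c(Column_join_B1 = "Column_join_A"))
print(same_as_literal)  # TRUE
```

So the by part itself looks like the documented named-vector form, by = c("x_col" = "y_col").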
I want to use this function in a loop to join all my df_B... dataframes against df_A.
for (i in 1:length(df_A_join_names)) {
  df_A <- my_join(df_a = df_A,
                  df_b = df_B[i],
                  a_name = as.character(df_A_join_names[i]),
                  b_name = "Column_join_A")
}
Running this I get:
Error in UseMethod("tbl_vars") :
no applicable method for 'tbl_vars' applied to an object of class "list"
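One possible reading of this error, sketched with a small stand-in for the df_B list (base R only): single-bracket indexing on a list returns a one-element list, while double-bracket indexing returns the element itself. That would match the 'applied to an object of class "list"' part of the message, though I have not confirmed this is the only issue.

```r
# Small stand-in for the df_B list from this question
df_B <- list(
  data.frame(Column_join_A = 11:20, B_a = LETTERS[1:10]),
  data.frame(Column_join_A = 21:30, B_b = LETTERS[11:20])
)

class(df_B[1])    # "list": still a list, of length 1
class(df_B[[1]])  # "data.frame": the element itself

single_is_list <- is.list(df_B[1]) && !is.data.frame(df_B[1])
double_is_df   <- is.data.frame(df_B[[1]])
```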
Some stuff to play with
# Making df_A
A_a <- seq(1, 10, by = 1)
Column_join_B1 <- seq(11, 20, by = 1)
Column_join_B2 <- seq(21, 30, by = 1)
df_A <- data.frame(cbind(A_a, Column_join_B1, Column_join_B2))
# Making df_B
Column_join_A <- seq(11, 20, by = 1)
B_a <- LETTERS[1:10]
df_B1 <- data.frame(Column_join_A, B_a)
Column_join_A <- seq(21, 30, by = 1)
B_b <- LETTERS[11:20]
df_B2 <- data.frame(Column_join_A, B_b)
# In my own code I make this using a loop. Maybe not the prettiest.
df_B <- list()
df_B[[1]] <- df_B1
df_B[[2]] <- df_B2
df_A_join_names <- c("Column_join_B1", "Column_join_B2")
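For anyone who wants to experiment, here is a self-contained sketch of the loop on this toy data. It assumes dplyr is installed and uses df_B[[i]] rather than df_B[i] as a guess at what the join needs. The data is rebuilt compactly with integer sequences, and joined is a name I made up so that df_A is not overwritten.

```r
library(dplyr)

# Compact rebuild of the toy data (integer keys on both sides)
df_A <- data.frame(A_a = 1:10, Column_join_B1 = 11:20, Column_join_B2 = 21:30)
df_B <- list(
  data.frame(Column_join_A = 11:20, B_a = LETTERS[1:10]),
  data.frame(Column_join_A = 21:30, B_b = LETTERS[11:20])
)
df_A_join_names <- c("Column_join_B1", "Column_join_B2")

joined <- df_A
for (i in seq_along(df_A_join_names)) {
  # Name = column in the left table, value = column in the right table
  by_vec <- setNames("Column_join_A", df_A_join_names[i])
  # [[i]] extracts the data frame itself; [i] would pass a one-element list
  joined <- left_join(joined, df_B[[i]], by = by_vec)
}

str(joined)
```

On this toy data, joined should keep the 10 rows of df_A and gain the two information columns B_a and B_b.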
References
I'm trying to apply this:
Dplyr join on by=(a = b), where a and b are variables containing strings?
I'm curious to hear what you guys think!