I have 1000 data frames, each imported from a different txt file, with the structure shown below.
Surname Name Type Age Dept
Gold Craige 1 24 100
Goodwin Madison 1 41 49
Young Emma 2 31 34
Young Rose 2 26 3
Young Brad 2 42 76
Young Kim 2 30 100
Smith Emma 2 18 50
Smith Kim 2 21 70
Hacksaw Ben 2 33 88
Hacksaw Richard 2 28 77
Hacksaw Charles 2 49 250
Based on the Type column, the rows of each data frame need to be categorised and saved as follows: if Type = 1, save the row into an individual xlsx file named after the Surname.
If Type = 2, create a folder named after the Surname (column A) and, inside it, save an individual xlsx file for each Name (column B).
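To make the rule concrete, here is a minimal sketch of the output path each row should map to. The sample rows are copied from the table above; `target_path` is a hypothetical column name used only for illustration and is not part of my real data.

```r
# Four sample rows taken from the example table in the question
df <- data.frame(
  Surname = c("Gold", "Goodwin", "Young", "Young"),
  Name    = c("Craige", "Madison", "Emma", "Rose"),
  Type    = c(1, 1, 2, 2),
  Age     = c(24, 41, 31, 26),
  Dept    = c(100, 49, 34, 3),
  stringsAsFactors = FALSE
)

# target_path is a hypothetical helper column (illustration only):
#   Type 1 -> "<Surname>.xlsx"
#   Type 2 -> "<Surname>/<Name>.xlsx"
df$target_path <- ifelse(
  df$Type == 1,
  paste0(df$Surname, ".xlsx"),
  file.path(df$Surname, paste0(df$Name, ".xlsx"))
)

print(df[, c("Surname", "Name", "Type", "target_path")])
```

So a Type 1 row such as Gold ends up in the working directory, while a Type 2 row such as Young Emma ends up inside a Young folder.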
What I currently do is split each Type by Surname, then use one loop for Type 1 and a nested loop for Type 2 to create the surname folders and the individual name xlsx files. This is very time-consuming (more than 13 hours to finish). The semi-code is below.
library(dplyr)     # filter(), %>%
library(openxlsx)  # createWorkbook(), addWorksheet(), writeData(), saveWorkbook()

for (i in seq_along(file_names)) {
  # Read the i-th txt file (assumed here to be whitespace-separated with a header row)
  rawdata <- read.table(file_names[i], header = TRUE, stringsAsFactors = FALSE)

  TYPE1 <- rawdata %>% filter(Type == 1)
  TYPE2 <- rawdata %>% filter(Type == 2)
  Split.TYPE1 <- split(TYPE1, TYPE1$Surname)
  Split.TYPE2 <- split(TYPE2, TYPE2$Surname)

  #--------------------------------- Save the TYPE 1 reports---------------------------------------
  # One xlsx file per surname, written to the working directory
  for (nm in names(Split.TYPE1)) {
    file <- paste0(nm, ".xlsx")
    d1 <- as.data.frame(Split.TYPE1[[nm]])
    wb <- createWorkbook()
    addWorksheet(wb, "test", gridLines = TRUE)
    writeData(wb, sheet = "test", x = d1)
    saveWorkbook(wb, file, overwrite = TRUE)
  }

  #------------------------------ Save the TYPE 2 in a folder ----------------------------------
  # One folder per surname, one xlsx file per name inside it
  for (dn in names(Split.TYPE2)) {
    dir.create(dn, showWarnings = FALSE)  # no warning if the folder already exists
    sub_Split.TYPE2 <- split(Split.TYPE2[[dn]], Split.TYPE2[[dn]]$Name)
    for (fn in names(sub_Split.TYPE2)) {
      file <- file.path(dn, paste0(fn, ".xlsx"))
      d1 <- as.data.frame(sub_Split.TYPE2[[fn]])
      wb <- createWorkbook()
      addWorksheet(wb, "test", gridLines = TRUE)
      writeData(wb, sheet = "test", x = d1)
      saveWorkbook(wb, file, overwrite = TRUE)
    }
  }
  gc()
}
Is there a faster, more optimised way to generate the same outputs using fewer loops? Running the loop in parallel with the foreach package did not speed it up much.
Thanks.