Channel: Active questions tagged r - Stack Overflow

The most efficient and fastest way to categorise data in dataframes and save in R


I have 1000 data frames, imported from 1000 different txt files, each structured as below.

Surname Name   Type Age Dept
Gold    Craige  1   24  100
Goodwin Madison 1   41  49
Young   Emma    2   31  34
Young   Rose    2   26  3
Young   Brad    2   42  76
Young   Kim     2   30  100
Smith   Emma    2   18  50
Smith   Kim     2   21  70
Hacksaw Ben     2   33  88
Hacksaw Richard 2   28  77
Hacksaw Charles 2   49  250

Based on the Type column, the rows of each data frame need to be categorised and saved as follows. If Type = 1, save the row to an individual xlsx file named after the Surname.

If Type = 2, create a folder named after the Surname (column A) and, inside it, save one xlsx file per person, named after the Name (column B).
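To make the rule concrete, here is a minimal sketch that maps each row to the file it should end up in. The helper name `target_path` and the `path` column are made up for illustration, and the sketch assumes that only Types 1 and 2 occur; the toy rows are taken from the sample table above.

```r
# Toy rows copied from the sample table above
df <- data.frame(
  Surname = c("Gold", "Goodwin", "Young", "Young", "Smith"),
  Name    = c("Craige", "Madison", "Emma", "Rose", "Emma"),
  Type    = c(1, 1, 2, 2, 2),
  Age     = c(24, 41, 31, 26, 18),
  Dept    = c(100, 49, 34, 3, 50),
  stringsAsFactors = FALSE
)

# target_path() is a hypothetical helper: one output file per row.
# Type 1 -> "<Surname>.xlsx"; anything else (assumed to be Type 2)
# -> "<Surname>/<Name>.xlsx"
target_path <- function(df) {
  ifelse(df$Type == 1,
         paste0(df$Surname, ".xlsx"),
         file.path(df$Surname, paste0(df$Name, ".xlsx")))
}

df$path <- target_path(df)
print(df[, c("Surname", "Name", "Type", "path")])
```

With the path computed per row, Type 1 and Type 2 rows no longer need separate handling: both are just "rows that belong to a file".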

What I currently do is filter each data frame by Type and split it by Surname, then use one loop for Type 1 and a nested loop for Type 2 to create the surname folders and the individual xlsx files. This is very time-consuming (more than 13 hours to finish). A simplified version of the code is below.

library(dplyr)     # filter() and %>%
library(openxlsx)  # createWorkbook(), addWorksheet(), writeData(), saveWorkbook()

# file_names: character vector with the paths of the 1000 txt files
for (i in seq_along(file_names)) {

  # Read the i-th file (assumption: whitespace-separated columns with a header row)
  rawdata <- read.table(file_names[i], header = TRUE, stringsAsFactors = FALSE)

  TYPE1 <- rawdata %>% filter(Type == 1)
  TYPE2 <- rawdata %>% filter(Type == 2)

  Split.TYPE1 <- split(TYPE1, TYPE1$Surname)
  Split.TYPE2 <- split(TYPE2, TYPE2$Surname)

  #--------------------------------- Save the TYPE 1 reports ---------------------------------------
  for (nm in names(Split.TYPE1)) {
    file <- paste0(nm, ".xlsx")
    d1 <- as.data.frame(Split.TYPE1[[nm]])

    wb <- createWorkbook()
    addWorksheet(wb, "test", gridLines = TRUE)
    writeData(wb, sheet = "test", x = d1)
    saveWorkbook(wb, file, overwrite = TRUE)
  }

  #------------------------------ Save the TYPE 2 reports in a folder ------------------------------
  for (dn in names(Split.TYPE2)) {
    dir.create(dn, showWarnings = FALSE)  # one folder per surname
    sub_Split.TYPE2 <- split(Split.TYPE2[[dn]], Split.TYPE2[[dn]]$Name)
    for (fn in names(sub_Split.TYPE2)) {
      file <- file.path(dn, paste0(fn, ".xlsx"))
      d1 <- as.data.frame(sub_Split.TYPE2[[fn]])

      wb <- createWorkbook()
      addWorksheet(wb, "test", gridLines = TRUE)
      writeData(wb, sheet = "test", x = d1)
      saveWorkbook(wb, file, overwrite = TRUE)
    }
  }
  gc()
}

Is there a faster, more optimised way to generate the same outputs using fewer loops? Parallelising with the foreach package did not speed things up much.
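One direction I can think of, sketched below and not yet timed on the real data, is to split each data frame only once, on the output path, so that Type 1 and Type 2 share a single writing loop. `save_by_path` is a made-up helper name, and the default writer assumes that `openxlsx::write.xlsx()` is available; any `function(x, file)` can be passed instead. If writing the xlsx files dominates the run time, this may tidy the code more than it speeds it up.

```r
# save_by_path() is a hypothetical helper, not part of any package.
# `writer` is any function(x, file); the default assumes openxlsx is installed.
save_by_path <- function(df, out_dir = ".",
                         writer = function(x, file)
                           openxlsx::write.xlsx(x, file, overwrite = TRUE)) {
  # One relative path per row:
  # Type 1 -> Surname.xlsx, otherwise (assumed Type 2) -> Surname/Name.xlsx
  rel <- ifelse(df$Type == 1,
                paste0(df$Surname, ".xlsx"),
                file.path(df$Surname, paste0(df$Name, ".xlsx")))
  paths <- file.path(out_dir, rel)

  # Create every folder that is needed once, up front
  for (d in unique(dirname(paths))) {
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  }

  # Split once on the path, then write each group with a single call
  groups <- split(df, paths)
  for (p in names(groups)) {
    writer(groups[[p]], p)
  }
  invisible(names(groups))
}
```

It would be called once per input file, for example `save_by_path(read.table(file_names[i], header = TRUE))`, where the `read.table` call is again an assumption about the txt format.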

Thanks.

