
Optimize/Vectorize Database Query with R


I am attempting to use R to query a large database. Due to the size of the database, I have written the query to fetch 100 rows at a time. My code looks something like:

# Set the JVM heap size before rJava starts the JVM
options(java.parameters = "-Xmx8000m")

library(RJDBC)
library(DBI)
library(tidyverse)

drv <- JDBC("driver name", "driver path.jar")

conn <-
  dbConnect(
    drv,
    "database info",
    "username",
    "password"
  )

query <- "SELECT * FROM some_table"

hc <- tibble()
res <- dbSendQuery(conn, query)
repeat {
  chunk <- dbFetch(res, 100)    # next 100 rows of the open result set
  if (nrow(chunk) == 0) break   # result set exhausted
  hc <- bind_rows(hc, chunk)    # append this chunk to the running tibble
  print(nrow(hc))
}
dbClearResult(res)              # release the result set

Basically, I would like to write something that does the same thing, but via a combination of a function and lapply. In theory, given the way R handles loops, using lapply should speed up the query. Some understanding of dbFetch may help here too, specifically why, in the repeat loop, it doesn't just keep selecting the same initial 100 rows.
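My current understanding, which I would like confirmed, is that dbSendQuery() returns a result object holding an open cursor, and each dbFetch() call returns the next n rows and advances that cursor, so successive calls never re-read earlier rows. A minimal illustration of what I mean:

res <- dbSendQuery(conn, query)
first  <- dbFetch(res, 100)   # rows 1-100
second <- dbFetch(res, 100)   # rows 101-200: the cursor has advanced
dbClearResult(res)            # release the result set when done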

I have tried the following, but nothing works:

# Attempt 1: query is a single string, so lapply() just calls
# dbGetQuery() once with the whole query -- nothing is chunked
df_list <- lapply(query, function(x) dbGetQuery(conn, x))

# Attempt 2: fails -- break is only valid inside a loop, test_query
# ignores x and never returns the fetched chunk, and the arguments
# to lapply() are reversed (it should be lapply(X, FUN))
hc <- tibble()
res <- dbSendQuery(conn, query)
test_query <- function(x) {
  chunk <- dbFetch(res, 100)
  if (nrow(chunk) == 0) break
  print(nrow(hc))
}
bind_rows(lapply(test_query, res))
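My suspicion is that the real cost in the repeat loop is bind_rows(hc, chunk) copying the ever-growing tibble on every pass, so collecting the chunks in a list and binding once at the end should help regardless of the loop construct. What I am imagining is something along these lines (a sketch only, assuming the total row count can be pulled first with a COUNT(*); I have not benchmarked it):

# Hypothetical count query; some_table is the table from `query` above
n_total  <- dbGetQuery(conn, "SELECT COUNT(*) FROM some_table")[[1]]
n_chunks <- ceiling(n_total / 100)

res <- dbSendQuery(conn, query)
# each dbFetch() call advances the cursor, so lapply() collects
# successive 100-row chunks into a list
chunks <- lapply(seq_len(n_chunks), function(i) dbFetch(res, 100))
dbClearResult(res)

hc <- bind_rows(chunks)   # one bind at the end, instead of one per chunk

If the row count can't be obtained cheaply, the same idea works with the original repeat loop: append each chunk to a list with chunks[[i]] <- chunk and call bind_rows(chunks) once after the loop.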
