Quantcast
Channel: Active questions tagged r - Stack Overflow
Viewing all articles
Browse latest Browse all 205343

How to pipe SQL into R's dplyr?

$
0
0

I can use the following code in R to select distinct rows in any generic SQL database. I'd use dplyr::distinct() but it's not supported in SQL syntax. Anyways, this does indeed work:

dbGetQuery(database_name, 
           "SELECT t.* 
           FROM (SELECT t.*, ROW_NUMBER() OVER (PARTITION BY column_name ORDER BY column_name) AS SEQNUM 
           FROM table_name t
           ) t 
           WHERE SEQNUM = 1;")

I've been using it with success, but wonder how I can pipe that same SQL query after other dplyr steps, as opposed to just using it as a first step as shown above. This is best illustrated with an example:

distinct.df <- 
  left_join(sql_table_1, sql_table_2, by = "col5") %>% 
  sql("SELECT t.* 
      FROM (SELECT t.*, ROW_NUMBER() OVER (PARTITION BY column_name ORDER BY column_name) AS SEQNUM 
      FROM table_name t
      ) t 
      WHERE SEQNUM = 1;")

So I dplyr::left_join() two SQL tables, then I want to look at distinct rows, and keep all columns. Do I pipe SQL code into R as shown above (simply utilizing the sql() function)? And if so what would I use for the table_name on the line FROM table_name t?

In my first example I use the actual table name that I'm pulling from. It's too obvious! But in this case I am piping and am used to using the magrittr pronoun . or sometimes the .data pronoun from rlang if I were in memory working in R without databases.

I'm in a SQL database though... so how do I handle this situation? How do I properly pipe my known working SQL into my R code (with a proper table name pronoun)? dbplyr's reference page is a good starting point but doesn't really answer this specific question.


Viewing all articles
Browse latest Browse all 205343

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>