Channel: Active questions tagged r - Stack Overflow

How to select a nested field with bigrquery using dplyr syntax?


I'd like to explore Google Analytics 360 data with bigrquery using dplyr syntax (rather than SQL), if possible. The gist is that I want to understand user journeys: I'm interested in finding the most common sequences of pages at the user level (even across sessions).

I thought I could do it this way:

sample_query <- ga_sample %>%
  select(fullVisitorId, date, visitStartTime, totals, channelGrouping,
  hits.page.pagePath) %>% 
  collect()

But I get an error that hits.page.pagePath was not found. Then I tried:

sample_query <- ga_sample %>%
  select(fullVisitorId, date, visitStartTime, totals, channelGrouping, hits) %>% 
  collect() %>% 
  unnest_wider(hits)

But the result is Error: Requested Resource Too Large to Return [responseTooLarge], which makes perfect sense.

From what I've gathered, with the SQL syntax, the workaround is to unnest remotely, and select only the hits.page.pagePath field (rather than the entire hits top-level field).

E.g., something like this (which is a different query, but conveys the point):

SELECT
  hits.page.pagePath
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_20160801` AS GA,
  UNNEST(GA.hits) AS hits
GROUP BY
  hits.page.pagePath

Is it possible to do something similar with dplyr syntax? If it's not possible, what's the best approach with SQL?

Thanks!
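
For what it's worth, here is a minimal sketch of one way this might be approached; it is not a confirmed bigrquery feature for nested fields. The idea is to do the UNNEST remotely in a SQL subquery wrapped with tbl(con, sql(...)), so that only flat columns come back, and then continue with ordinary dplyr verbs. It assumes `con` is a DBI connection created with bigrquery and that the table follows the standard GA360 export schema (hits.type, hits.hitNumber). The helper names fetch_page_paths() and count_journeys() are hypothetical, made up for illustration.

```r
library(dplyr)

# Remote part: unnest `hits` in BigQuery and return only flat columns.
# Assumption: standard GA360 export schema (hits.type, hits.hitNumber).
page_paths_sql <- "
SELECT
  GA.fullVisitorId,
  GA.visitStartTime,
  hits.hitNumber,
  hits.page.pagePath AS pagePath
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_20160801` AS GA,
  UNNEST(GA.hits) AS hits
WHERE
  hits.type = 'PAGE'
"

# Hypothetical helper: `con` is assumed to be a bigrquery DBI connection, e.g.
# con <- DBI::dbConnect(bigrquery::bigquery(),
#                       project = "bigquery-public-data",
#                       dataset = "google_analytics_sample",
#                       billing = "<your-billing-project>")
fetch_page_paths <- function(con) {
  tbl(con, dbplyr::sql(page_paths_sql)) %>%
    collect()
}

# Local part (hypothetical helper): build one ordered page sequence per user,
# across sessions, then count how often each sequence occurs.
count_journeys <- function(hits_df) {
  hits_df %>%
    arrange(fullVisitorId, visitStartTime, hitNumber) %>%
    group_by(fullVisitorId) %>%
    summarise(journey = paste(pagePath, collapse = " > "), .groups = "drop") %>%
    count(journey, sort = TRUE)
}
```

With a live connection, this would be used as count_journeys(fetch_page_paths(con)). Only four flat columns are transferred, which should keep the result well under the size that triggered responseTooLarge for a single day's table, though that is not guaranteed for larger date ranges.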

