I'd like to explore Google Analytics 360 data with bigrquery using dplyr syntax (rather than SQL), if possible. The gist is that I want to understand user journeys: I'm interested in finding the most common sequences of pages at the user level (even across sessions).
I thought I could do it this way:
sample_query <- ga_sample %>%
  select(fullVisitorId, date, visitStartTime, totals, channelGrouping,
         hits.page.pagePath) %>%
  collect()
But I get an error that hits.page.pagePath was not found. Then I tried:
sample_query <- ga_sample %>%
  select(fullVisitorId, date, visitStartTime, totals, channelGrouping, hits) %>%
  collect() %>%
  unnest_wider(hits)
But the result is Error: Requested Resource Too Large to Return [responseTooLarge], which makes perfect sense.
From what I've gathered, with the SQL syntax, the workaround is to unnest remotely and select only the hits.page.pagePath field (rather than the entire hits top-level field).
E.g., something like this (which is a different query, but conveys the point):
SELECT
  hits.page.pagePath
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_20160801` AS GA,
  UNNEST(GA.hits) AS hits
GROUP BY
  hits.page.pagePath
Is it possible to do something similar with dplyr syntax? If it's not possible, what's the best approach with SQL?
Thanks!