Channel: Active questions tagged r - Stack Overflow

How to select a nested field with bigrquery using dplyr syntax?


I'd like to explore a Google Analytics 360 data set with bigrquery using dplyr syntax (rather than SQL), if possible. The goal is to understand user journeys: which sequences of pages are most common at the user level (even across sessions).

I thought I could do it this way:

sample_query <- ga_sample %>%
  select(fullVisitorId, date, visitStartTime, totals, channelGrouping, hits.page.pagePath) %>% 
  collect()

But I get an error that hits.page.pagePath was not found. Then I tried:

sample_query <- ga_sample %>%
  select(fullVisitorId, date, visitStartTime, totals, channelGrouping, hits) %>% 
  collect() %>% 
  unnest_wider(hits)

But the result is Error: Requested Resource Too Large to Return [responseTooLarge], which makes perfect sense.

From what I've gathered, the workaround in SQL is to unnest remotely and select only the hits.page.pagePath field (rather than the entire top-level hits field).

E.g., something like this (which is a different query, but conveys the point):

SELECT
  hits.page.pagePath
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_20160801` AS GA,
  UNNEST(GA.hits) AS hits
GROUP BY
  hits.page.pagePath

Is it possible to do something similar with dplyr syntax? If it's not possible, what's the best approach with SQL?

Thanks!
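
For reference, the remote-unnest idea described above could be sketched from R roughly as follows. This is only a sketch under assumptions, not a confirmed dplyr-native way to unnest: it hands a hand-written UNNEST subquery to tbl() via sql() and then continues with ordinary dplyr verbs, so only the flattened pagePath column is ever processed and returned. The environment variable BQ_BILLING_PROJECT and the names unnest_sql, hits_flat and top_pages are hypothetical, introduced here for illustration.

```r
library(DBI)
library(dplyr)
library(bigrquery)

# Hand-written subquery: UNNEST runs inside BigQuery, so the full nested
# `hits` column is never downloaded. The alias `hit` avoids clashing with
# the `hits` column itself.
unnest_sql <- "
  SELECT
    fullVisitorId,
    visitStartTime,
    hit.hitNumber AS hitNumber,
    hit.page.pagePath AS pagePath
  FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20160801`,
    UNNEST(hits) AS hit
"

# Assumption: the billing project ID is supplied through a (hypothetical)
# environment variable; the remote part is skipped when it is not set.
billing_project <- Sys.getenv("BQ_BILLING_PROJECT")

if (nzchar(billing_project)) {
  con <- dbConnect(
    bigquery(),
    project = "bigquery-public-data",
    dataset = "google_analytics_sample",
    billing = billing_project
  )

  # Treat the subquery as a lazy remote table; the verbs below are
  # translated to SQL and executed remotely until collect() is called.
  hits_flat <- tbl(con, sql(unnest_sql))

  top_pages <- hits_flat %>%
    count(pagePath, sort = TRUE) %>%
    head(20) %>%
    collect()

  print(top_pages)
}
```

Keeping fullVisitorId, visitStartTime and hitNumber in the subquery means the same lazy table could later be ordered per visitor to reconstruct page sequences, instead of only counting pages.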

