Limited support for more user friendly calls to BigQuery Google Analytics 360 exports.

This is intended for users who may be familar with Google Analytics but not with SQL necessary to extract equivalent data from a BigQuery export of Google Analytics 360 data.

google_analytics_bq parses your metrics and dimensions into BigQuery queries.

Setup

To use, you need to also install the bigQueryR library.

Stable version is on CRAN:

install.packages("bigQueryR")

Development version on devtools:

devtools::install_github("MarkEdmondson1234/bigQueryR")

Once installed, authenticate to BigQuery:

library(bigQueryR)

## go through Google oAuth2 flow
## needs email that has access to the BigQuery dataset
bqr_auth()

## get lists of your project and datasets
bqr_list_projects()
bqr_list_datasets("project-id")

If you want to authenticate with Google Analytics and BigQuery in the same session (or others) then its best to authenticate with googleAuthR::gar_auth() with the appropriate scopes set. The below lets you authenticate with Google Analytics, Google Cloud Storage and BigQuery:

options(googleAuthR.scopes.selected = c("https://www.googleapis.com/auth/analytics",
                                        "https://www.googleapis.com/auth/cloud-platform",
                                        "https://www.googleapis.com/auth/bigquery"))
googleAuthR::gar_auth()

You may also want to use a JSON file to authenticate with BigQuery. Make sure to add the service email to the users of the Google project, and then download the JSON file and authenticate via:

googleAuthR::gar_auth_service("gwt-download-XXXX.json")

See googleAuthR help for more details.

Exporting data

For BigQuery Google Analytics 360 exports, the dataset is the same as the GA View ID you are exporting.

You can then export BigQuery data via:

bq <- google_analytics_bq("project-id", "dataset-id-ga-viewid", 
                           start = "2016-01-01", end = "2016-02-01", 
                           metrics = "users", 
                           dimensions = c("source","medium"))
head(bq)
             source   medium  users
1   mavas|blahbs.ru  display  47837
2          examp.ae referral   6090
3   lm.facebook.com referral    335
4       ghtie_yahoo  display 133900
5      fjsfs_inmobi  display  19887
6 tripadvisor.co.dk referral    307                         

At the moment the metrics/dimensions are limited as they need to added manually to a lookup table.
Segments/filters are not supported. The data is unsampled though.

bq2 <- google_analytics_bq("project-id", "dataset-id-ga-viewid", 
                           start = "2016-01-01", end = "2016-02-01", 
                           metrics = "users", 
                           dimensions = c("source","medium","landingPagePath"))
                           
> Error in google_analytics_bq("project-id", "dataset-id-ga-viewid", start = "2016-01-01",  : 
  dimension not yet supported. Must be one of referralPath, campaign, source, medium, keyword, adContent, adwordsCampaignID, adwordsAdGroupID, transactionId, date, visitorId, visitId, visitStartTime, visitNumber                           

Raw BigQuery SQL

You can query the BigQuery table directly using the query parameter.

This is just as you would in the BigQuery interface, and allows queries you can also do there.

q <- "SELECT
  date,
  SUM (totals.visits) visits,
  SUM (totals.pageviews) pageviews,
  SUM (totals.transactions) transactions,
  SUM (totals.transactionRevenue)/1000000 revenue
FROM [87010628.ga_sessions_20160327],[87010628.ga_sessions_20160328],[87010628.ga_sessions_20160329]
GROUP BY date
ORDER BY date ASC "

bq3 <- google_analytics_bq("project-id", "dataset-id-ga-viewid", 
                           query = q)

If you pass in the parameter return_query_only you can output the query for use within the interface:

just_query <- google_analytics_bq("project-id", "dataset-id-ga-viewid", 
                                  start = "2016-01-01", end = "2016-02-01", 
                                  metrics = "users", 
                                  dimensions = c("source","medium"),
                                  return_query_only = TRUE)
just_query
# [1] "SELECT trafficSource.source as source, trafficSource.medium as medium, COUNT(fullVisitorId) as users 
# FROM (TABLE_DATE_RANGE([dataset-id-ga-viewid.ga_sessions_], TIMESTAMP('2016-01-01'),
# TIMESTAMP('2016-02-01'))) GROUP BY source, medium  LIMIT 100"

You could then paste this query into the BigQuery interface, or modify it and send it back in via google_analytics_bq.

Implemented metrics and dimensions

The metrics and dimensions implemented so far are in the two lookups below.

They include the BigQuery exclusive hitTimestamp, fullVisitorId, visitId etc.

Read the full Google Analytics 360 BigQuery data schema for an idea on what can be queried.

lookup_bq_query_m <- c(visits = "SUM(totals.visits) as sessions",
                       sessions = "SUM(totals.visits) as sessions",
                       pageviews = "SUM(totals.pageviews) as pageviews",
                       timeOnSite = "SUM(totals.timeOnSite) as timeOnSite",
                       bounces = "SUM(totals.bounces) as bounces",
                       transactions = "SUM(totals.transactions) as transactions",
                       transactionRevenue = "SUM(totals.transactionRevenue)/1000000 as transactionRevenue",
                       newVisits = "SUM(totals.newVisits) as newVisits",
                       screenviews = "SUM(totals.screenviews) as screenviews",
                       uniqueScreenviews = "SUM(totals.uniqueScreenviews) as uniqueScreenviews",
                       timeOnScreen = "SUM(totals.timeOnScreen) as timeOnScreen",
                       users = "COUNT(fullVisitorId) as users",
                       exits = "COUNT(hits.isExit) as exits",
                       entrances = "COUNT(hits.isEntrance) as entrances",
                       eventValue = "SUM(hits.eventinfo.eventValue) as eventValue",
                       metricXX = {a function to output hit level custom metrics})

lookup_bq_query_d <- c(referralPath = "trafficSource.referralPath as referralPath",
                       hitTimestamp = "(visitStartTime + (hits.time/1000)) as hitTimestamp",
                       campaign = "trafficSource.campaign as campaign",
                       source = "trafficSource.source as source",
                       medium = "trafficSource.medium as medium",
                       keyword = "trafficSource.keyword as keyword",
                       adContent = "trafficSource.adContent as adContent",
                       adwordsCampaignID = "trafficSource.adwordsClickInfo.campaignId as adwordsCampaignId",
                       adwordsAdGroupID = "trafficSource.adwordsClickInfo.adGroupId as adwordsAdGroupId",
                       # adwords...etc...
                       transactionId = "hits.transaction.transactionId as transactionId",
                       date = "date",
                       fullVisitorId = "fullVisitorId",
                       userId = "userId",
                       visitorId = "visitorId",
                       visitId = "visitId",
                       visitStartTime = "visitStartTime",
                       visitNumber = "visitNumber",
                       browser = "device.browser as browser",
                       browserVersion = "device.browserVersion as browserVersion",
                       operatingSystem = "device.operatingSystem as operatingSystem",
                       operatingSystemVersion = "device.operatingSystemVersion as operatingSystemVersion",
                       mobileDeviceBranding = "device.mobileDeviceBranding as mobileDeviceBranding",
                       flashVersion = "device.flashVersion as flashVersion",
                       language = "device.language as language",
                       screenColors = "device.screenColors as screenColors",
                       screenResolution = "device.screenResolution as screenResolution",
                       deviceCategory = "device.deviceCategory as deviceCategory",
                       continent = "geoNetwork.continent as continent",
                       subContinent = "geoNetwork.subContinent as subContinent",
                       country = "geoNetwork.country as country",
                       region = "geoNetwork.region as region",
                       metro = "geoNetwork.region as metro",
                       pagePath = "hits.page.pagePath as pagePath",
                       eventCategory = "hits.eventInfo.eventCategory as eventCategory",
                       eventAction = "hits.eventInfo.eventAction as eventAction",
                       eventLabel = "hits.eventInfo.eventLabel as eventLabel",
                       dimensionXX = {a function to output hit level custom dimensions})

This will increase as I get time to work on it, but feel free to submit a pull request with more.