The Data API and the Google Analytics Admin API are used with Google Analytics 4 (GA4) properties, the newest version of Google Analytics and the successor to Universal Analytics.
See a demo video on GA4 and some of the new ga_data() features in this GA4 API FTW YouTube video.
You can see the GA4 accounts you have access to under your authentication via ga_account_list("ga4"):
library(googleAnalyticsR)
ga_auth(email="my@email.com")
ga_account_list("ga4")
Universal metadata for what you can query via the Data API can be found by specifying that API in the ga_meta() function:
metadata <- ga_meta(version = "data")
You may have custom dimensions and metrics set up for your GA4 property. To list them as well, specify the propertyId in the metadata call:
# Google Analytics 4 metadata for a particular Web Property
ga_meta("data", propertyId = 206670707)
Event-scoped and user-scoped custom dimensions have names starting with customEvent: and customUser: respectively. Include the prefix when you use them in your reports.
Make sure to authenticate first using ga_auth() or another authentication method.
The primary data fetching function is ga_data(). You need your propertyId to query data, plus at least one metric and a date range:
# replace with your propertyId
my_property_id <- 206670707
basic <- ga_data(
my_property_id,
metrics = c("activeUsers","sessions"),
date_range = c("2020-03-31", "2020-04-27")
)
basic
# A tibble: 1 x 2
# activeUsers sessions
# <dbl> <dbl>
#1 1716 2783
Dimensions can be added to split out your results:
# split out metrics by the dimensions specified
dimensions <- ga_data(
my_property_id,
metrics = c("activeUsers","sessions"),
dimensions = c("date","city","dayOfWeek"),
date_range = c("2020-03-31", "2020-04-27")
)
dimensions
# A tibble: 100 x 5
# date city dayOfWeek activeUsers sessions
# <date> <chr> <chr> <dbl> <dbl>
# 1 2020-04-08 (not set) 4 18 21
# 2 2020-04-08 Rome 4 12 14
# 3 2020-04-15 (not set) 4 9 11
# 4 2020-04-27 (not set) 2 9 11
# 5 2020-04-09 (not set) 5 8 10
# 6 2020-04-14 (not set) 3 8 8
# 7 2020-04-22 (not set) 4 8 9
# 8 2020-03-31 (not set) 3 7 10
# 9 2020-04-08 Bologna 4 7 7
#10 2020-04-07 (not set) 3 6 7
# … with 90 more rows
Some of the most useful fields to fetch are the eventName dimension and the eventCount metric, which together give effectively the most granular report in the GA4 event-based data model. Combined with filters (see below), they let you isolate useful conversion events.
# if you set metrics=NULL you will get only the distinct events
ga_data(
my_property_id,
metrics = NULL,
dimensions = "eventName",
date_range = c("2020-03-31", "2020-04-27")
)
## A tibble: 9 x 1
# eventName
# <chr>
#1 click
#2 first_visit
#3 page_view
#4 scroll
#5 session_start
#6 user_engagement
#7 video_complete
#8 video_progress
#9 video_start
# eventCount per eventName
events <- ga_data(
my_property_id,
metrics = c("eventCount"),
dimensions = c("date","eventName"),
date_range = c("2020-03-31", "2020-04-27")
)
events
## A tibble: 100 x 3
# date eventName eventCount
# <date> <chr> <dbl>
# 1 2020-04-08 page_view 239
# 2 2020-04-08 session_start 207
# 3 2020-04-09 page_view 203
# ...
# filter to one event
ga_data(
my_property_id,
metrics = "eventCount",
dimensions = "eventName",
date_range = c("2020-03-31", "2020-04-27"),
dim_filters = ga_data_filter(eventName=="video_start")
)
## A tibble: 1 x 2
# eventName eventCount
# <chr> <dbl>
#1 video_start 4
By default the API returns 100 results. Use the limit parameter to change the number of results returned; to get all results, use limit = -1:
only_5 <- ga_data(
my_property_id,
metrics = c("activeUsers","sessions"),
dimensions = c("date","city","dayOfWeek"),
date_range = c("2020-03-31", "2020-04-27"),
limit = 5
)
only_5
# A tibble: 5 x 5
# date city dayOfWeek activeUsers sessions
# <date> <chr> <chr> <dbl> <dbl>
#1 2020-04-08 (not set) 4 18 21
#2 2020-04-08 Rome 4 12 14
#3 2020-04-15 (not set) 4 9 11
#4 2020-04-27 (not set) 2 9 11
#5 2020-04-09 (not set) 5 8 10
all_results <- ga_data(
my_property_id,
metrics = c("activeUsers","sessions"),
dimensions = c("date","city","dayOfWeek"),
date_range = c("2020-03-31", "2020-04-27"),
limit = -1
)
all_results
# A tibble: 1,763 x 5
# date city dayOfWeek activeUsers sessions
# <date> <chr> <chr> <dbl> <dbl>
# 1 2020-04-08 (not set) 4 18 21
# 2 2020-04-08 Rome 4 12 14
# 3 2020-04-15 (not set) 4 9 11
# 4 2020-04-27 (not set) 2 9 11
# 5 2020-04-09 (not set) 5 8 10
# 6 2020-04-14 (not set) 3 8 8
# 7 2020-04-22 (not set) 4 8 9
# 8 2020-03-31 (not set) 3 7 10
# 9 2020-04-08 Bologna 4 7 7
#10 2020-04-07 (not set) 3 6 7
# … with 1,753 more rows
You can also control the page size used when paging through the API. The default is the maximum of 100000 rows; you may want to lower it if you have problems downloading a lot of data with many columns. Set it via the page_size parameter:
# get all results, but page every 500 rows
all_results_paged <- ga_data(
my_property_id,
metrics = c("activeUsers","sessions"),
dimensions = c("date","city","dayOfWeek"),
date_range = c("2020-03-31", "2020-04-27"),
limit = -1,
page_size = 500L
)
#ℹ 2021-02-20 12:26:11 > Downloaded [ 500 ] of total [ 1763 ] rows
#ℹ 2021-02-20 12:26:11 > Paging API from offset [ 500 ]
#ℹ 2021-02-20 12:26:12 > Downloaded [ 500 ] of total [ 1763 ] rows
#ℹ 2021-02-20 12:26:12 > Paging API from offset [ 1000 ]
#ℹ 2021-02-20 12:26:13 > Downloaded [ 500 ] of total [ 1763 ] rows
#ℹ 2021-02-20 12:26:13 > Paging API from offset [ 1500 ]
#ℹ 2021-02-20 12:26:14 > Downloaded [ 263 ] of total [ 1763 ] rows
# get 510 rows, paging every 500 rows (so 2 API calls in total)
top510_paged500 <- ga_data(
my_property_id,
metrics = c("activeUsers","sessions"),
dimensions = c("date","city","dayOfWeek"),
date_range = c("2020-03-31", "2020-04-27"),
limit = 510,
page_size = 500L
)
#ℹ 2021-02-20 12:26:45 > Downloaded [ 500 ] of total [ 1763 ] rows
#ℹ 2021-02-20 12:26:45 > Paging API from offset [ 500 ]
#ℹ 2021-02-20 12:26:46 > Downloaded [ 10 ] of total [ 1763 ] rows
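The paging logs above follow simple arithmetic. Pages start at offset 0 and step by page_size until the requested rows are covered, or all rows when limit = -1. The hypothetical helper page_plan() below is not part of googleAnalyticsR. It reproduces that arithmetic offline, which may help you estimate how many API calls a large download will make. It assumes the total row count is already known, as it is once the first page has arrived.

```r
# Hypothetical helper (not part of googleAnalyticsR): list the offset and
# row count of each page, given the total rows the API reports.
page_plan <- function(total_rows, limit = 100, page_size = 100000) {
  # limit = -1 means "all rows", otherwise never more than the total
  wanted <- if (limit < 0) total_rows else min(limit, total_rows)
  n_pages <- ceiling(wanted / page_size)
  offsets <- seq(0, by = page_size, length.out = n_pages)
  data.frame(offset = offsets, rows = pmin(page_size, wanted - offsets))
}

page_plan(1763, limit = -1, page_size = 500)  # 4 pages, the last with 263 rows
page_plan(1763, limit = 510, page_size = 500) # 2 pages: 500 + 10 rows
```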
When fetching custom dimensions and metrics, specify them with the prefix exactly as it appears in the metadata table. See this Google article on custom definitions for details.
# will include your custom data
my_meta <- ga_meta("data", propertyId = 206670707)
custom <- ga_data(
my_property_id,
metrics = c("customEvent:credits_spent"),
dimensions = c("date","customUser:last_level","customEvent:achievement_id"),
date_range = c("2020-03-31", "2020-04-27"),
limit = -1
)
You can also create metrics and dimensions on the fly with expressions. Metric expressions combine existing metrics arithmetically, and dimension expressions join existing dimensions together. Name each new field via its vector element name:
# metric and dimension expressions
# create your own named metrics
ga_data(
my_property_id,
metrics = c("activeUsers","sessions",sessionsPerUser = "sessions/activeUsers"),
dimensions = c("date","city","dayOfWeek"),
date_range = c("2020-03-31", "2020-04-27"),
limit = 100
)
## A tibble: 100 x 6
# date city dayOfWeek activeUsers sessions sessionsPerUser
# <date> <chr> <chr> <dbl> <dbl> <dbl>
# 1 2020-04-03 Kaliningrad 6 3 3 1
# 2 2020-04-19 Munich 1 3 4 1.33
# 3 2020-04-27 Hamburg 2 3 3 1
# 4 2020-04-16 London 5 5 6 1.2
#...
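The sessionsPerUser expression is evaluated separately for each row. As a quick offline check, the base R sketch below recomputes it from the activeUsers and sessions values copied from the output above. It only illustrates the arithmetic and does not call the API.

```r
# activeUsers and sessions copied from the first rows of the output above
rows <- data.frame(
  city        = c("Kaliningrad", "Munich", "Hamburg", "London"),
  activeUsers = c(3, 3, 3, 5),
  sessions    = c(3, 4, 3, 6)
)
# "sessions/activeUsers" applied to each row separately
rows$sessionsPerUser <- rows$sessions / rows$activeUsers
rows
```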
# create your own concatenated dimensions
ga_data(
my_property_id,
metrics = c("activeUsers","sessions"),
dimensions = c("date","city","dayOfWeek", cdow = "city/dayOfWeek"),
date_range = c("2020-03-31", "2020-04-27"),
limit = 100
)
## A tibble: 100 x 6
# date city dayOfWeek cdow activeUsers sessions
# <date> <chr> <chr> <chr> <dbl> <dbl>
# 1 2020-04-18 (not set) 7 (not set)/7 5 6
# 2 2020-04-14 Madrid 3 Madrid/3 5 6
# 3 2020-04-17 London 6 London/6 5 6
# useful for creating referral data
ga_data(
my_property_id,
metrics = c("activeUsers","sessions"),
dimensions = c("date", sessionMediumSource = "sessionMedium/sessionSource"),
date_range = c("2020-03-31", "2020-04-27"),
limit = 100
)
## A tibble: 100 x 4
# date sessionMediumSource activeUsers sessions
# <date> <chr> <dbl> <dbl>
# 1 2020-04-27 organic/google 80 93
# 2 2020-04-15 organic/google 79 101
# 3 2020-04-07 organic/google 72 87
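The dimension expressions above join each row's values with “/”. If you have already fetched the component dimensions as separate columns, base R's paste() builds the same combined labels locally. The values below are copied from the outputs above. Treating the API's result as identical to paste() is an assumption based on those outputs.

```r
# Values copied from the cdow output above
city      <- c("(not set)", "Madrid", "London")
dayOfWeek <- c("7", "3", "6")
cdow <- paste(city, dayOfWeek, sep = "/")

# Values copied from the sessionMediumSource output above
sessionMedium <- "organic"
sessionSource <- "google"
sessionMediumSource <- paste(sessionMedium, sessionSource, sep = "/")
```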
You can send in up to 4 date ranges as a vector of start and end dates, one pair per range:
date_range4 <- ga_data(
my_property_id,
metrics = c("activeUsers","sessions"),
dimensions = c("date","city","dayOfWeek"),
date_range = c("2020-03-31", "2020-04-06",
"2020-04-07", "2020-04-14",
"2020-04-15", "2020-04-22",
"2020-04-23", "2020-04-30"),
limit = -1
)
The data is output with a dateRange column indicating which date range each row belongs to:
date_range4
# A tibble: 7,948 x 6
# date city dayOfWeek dateRange activeUsers sessions
# <date> <chr> <chr> <chr> <dbl> <dbl>
# 1 2020-04-06 Laval 2 date_range_0 1 1
# 2 2020-04-29 Ghent 4 date_range_3 1 1
# 3 2020-03-31 Wokingham 3 date_range_0 1 1
# 4 2020-04-01 Zielona Gora 4 date_range_0 1 1
# 5 2020-04-16 Charlottesville 5 date_range_2 1 1
# 6 2020-04-25 Fulshear 7 date_range_3 1 1
# … with 7,938 more rows
You can send in dates as strings in YYYY-MM-DD format, as R Dates (e.g. Sys.Date()), or via the API keywords “today”, “yesterday” or “NdaysAgo”:
# use R to make useful date vectors
dates <- rev(seq(Sys.Date(), by = -7, length.out = 8))
dates
#[1] "2020-10-19" "2020-10-26" "2020-11-02" "2020-11-09" "2020-11-16" "2020-11-23" "2020-11-30" "2020-12-07"
# r dates
ga_data(
my_property_id,
metrics = c("activeUsers","sessions"),
dimensions = c("date","city","dayOfWeek"),
date_range = dates,
limit = -1
)
# text dates
ga_data(
my_property_id,
metrics = c("activeUsers","sessions"),
dimensions = c("date","city","dayOfWeek"),
date_range = c("yesterday","today","3daysAgo","2daysAgo"),
limit = -1)
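The rev(seq(...)) pattern above produces 8 dates that are read as 4 start/end pairs, but the resulting ranges have gaps between them. If you want back-to-back ranges instead, a small helper can build the pairs. consecutive_ranges() below is hypothetical and not part of googleAnalyticsR. It assumes you want at most 4 ranges of equal length, with the last one ending on a given date.

```r
# Hypothetical helper (not part of googleAnalyticsR): build `n` back-to-back
# date ranges of `days` days each, the last one ending on `end`.
# Returns a Date vector of start/end pairs, like the date_range examples above.
consecutive_ranges <- function(end = Sys.Date() - 1, n = 4, days = 7) {
  stopifnot(n >= 1, n <= 4, days >= 1)  # the API accepts up to 4 date ranges
  ends <- end - days * rev(seq_len(n) - 1)  # oldest range first
  starts <- ends - (days - 1)
  sort(c(starts, ends))  # start1, end1, start2, end2, ...
}

ranges <- consecutive_ranges(as.Date("2020-04-30"))
format(ranges)
# then, e.g.:
# ga_data(my_property_id, metrics = "sessions", dimensions = "date",
#         date_range = ranges)
```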
Filters are simpler to create but more flexible than in Universal Analytics. There is now only one filter function, ga_data_filter(). As was the case for google_analytics(), you use the filter function to construct dimension filters or metric filters, and pass them to the dim_filters or met_filters parameters of your ga_data() call.
ga_data(
206670707,
metrics = c("activeUsers","sessions"),
dimensions = c("date","city","dayOfWeek"),
date_range = c("2020-03-31", "2020-04-27"),
dim_filters = ga_data_filter(city=="Copenhagen"),
limit = 100
)
# A tibble: 17 x 5
# date city dayOfWeek activeUsers sessions
# <date> <chr> <chr> <dbl> <dbl>
# 1 2020-04-16 Copenhagen 5 3 4
# 2 2020-04-10 Copenhagen 6 2 2
# 3 2020-04-15 Copenhagen 4 2 2
# 4 2020-04-17 Copenhagen 6 2 2
# ...
The base object is ga_data_filter(), which holds the logic for the specific metric or dimension you are using. The function uses a new DSL for GA4 filters. The syntax rules are:
- Filters are built with ga_data_filter() and then used within ga_data().
- A single filter is made of a field, an operator and a value, e.g. city=="Copenhagen".
- Fields can be quoted ("city", "sessions"), or unquoted (city or sessions) if you want to use validation.
- Unquoted fields are validated if they are default fields (e.g. city or sessions), or if you have fetched your custom fields via ga_meta("data", propertyId=123456).
- Operators are ==, >, >=, <, <= for metrics and ==, %begins%, %ends%, %contains%, %regex%, %regex_partial% for dimensions.
- Dimension operators are case sensitive. Use the UPPERCASE variants %BEGINS%, %ENDS% etc., or %==% for case-insensitive exact matches.
- Values can be characters ("dim1"), numerics (55), string vectors (c("dim1", "dim2")), numeric vectors (c(1,2,3)) or NULL (NULL), which correspond to different types of filters.
- Filters can be combined using &, | and !, which correspond to AND, OR and NOT respectively.
Fields are metrics and dimensions that are available to your GA4 implementation, including custom fields. You can see what is available by calling the ga_meta("data") function.
Do not pass metric filters to the dim_filters argument, or dimension filters to the met_filters argument.
You can use quoted ("city", "sessions") or unquoted (city or sessions) fields. Unquoted fields are checked for validity before the request is sent to the API.
If you want to use custom fields from your property, first call ga_meta("data", propertyId=1234546), replacing the propertyId with that of the GA4 property that has the custom fields. Once fetched, they are placed in your local environment for all future calls to ga_data_filter(). Refer to the custom fields with a cust_ prefix, e.g.:
# gets fields including custom event field "customEvent:test_dim"
my_meta <- ga_meta("data", propertyId = 206670707)
# use the custom event in a filter
ga_data_filter(cust_test_dim == "test")
Values are checked in the filter object based on the R class of the object you pass in as its value:
- character: stringFilter, e.g. "Copenhagen"
- character vector: inListFilter, e.g. c("Copenhagen","London","New York")
- numeric: numericFilter, e.g. 5
- numeric vector of length 2: betweenFilter, e.g. c(5, 10)
- NULL: filters for NULL, e.g. city==NULL; often used with ! (not NULL)
For example, if you pass in a character of length one ("Copenhagen"), it is treated as a stringFilter (dimensions that match “Copenhagen”). If you pass in a character vector of length greater than one (c("Copenhagen","London","New York")), it is treated as an inListFilter (dimensions must match one of the values in the list).
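To make the class-based rules above concrete, here is a plain R function that maps a value to the filter type it would produce. filter_type_for() is hypothetical and only mirrors the rules as listed; it is not googleAnalyticsR's internal code. Numeric vectors longer than 2 are rejected because the rules above do not say which filter they map to.

```r
# Hypothetical sketch of the value-class rules listed above;
# not googleAnalyticsR's actual implementation.
filter_type_for <- function(value) {
  if (is.null(value)) {
    return("null filter")
  }
  if (is.character(value)) {
    # one string -> stringFilter, several strings -> inListFilter
    return(if (length(value) == 1) "stringFilter" else "inListFilter")
  }
  if (is.numeric(value)) {
    if (length(value) == 1) return("numericFilter")
    if (length(value) == 2) return("betweenFilter")
    stop("no rule listed above for numeric vectors of length ", length(value))
  }
  stop("unsupported value class: ", class(value)[1])
}

filter_type_for("Copenhagen")
filter_type_for(c("Copenhagen", "London", "New York"))
filter_type_for(c(5, 10))
```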
All filters are made up of filter expressions using ga_data_filter():
simple_filter <- ga_data(
206670707,
metrics = c("activeUsers","sessions"),
dimensions = c("date","city","dayOfWeek"),
date_range = c("2020-03-31", "2020-04-27"),
dim_filters = ga_data_filter(city=="Copenhagen"),
limit = 100
)
simple_filter
# A tibble: 17 x 5
# date city dayOfWeek activeUsers sessions
# <date> <chr> <chr> <dbl> <dbl>
# 1 2020-04-16 Copenhagen 5 3 4
# 2 2020-04-10 Copenhagen 6 2 2
# 3 2020-04-15 Copenhagen 4 2 2
# 4 2020-04-17 Copenhagen 6 2 2
# ...
If you need more complicated filters, build them using the DSL syntax, which lets you combine ga_data_filter() objects in various ways.
## filter clauses
# OR string filter
ga_data_filter(city=="Copenhagen" | city == "London")
## ===orGroup:
## [[1]]
## --GA4 Filter:
## --| city
## ----stringFilter:
## value: Copenhagen | matchType: EXACT | caseSensitive: TRUE
## [[2]]
## --GA4 Filter:
## --| city
## ----stringFilter:
## value: London | matchType: EXACT | caseSensitive: TRUE
# inlist string filter - equivalent to above
ga_data_filter(city==c("Copenhagen","London"))
## --| city
## ----inListFilter:
## values: Copenhagen London
## caseSensitive: TRUE
# AND string filters
ga_data_filter(city=="Copenhagen" & dayOfWeek == "5")
## ===andGroup:
## [[1]]
## --GA4 Filter:
## --| city
## ----stringFilter:
## value: Copenhagen | matchType: EXACT | caseSensitive: TRUE
## [[2]]
## --GA4 Filter:
## --| dayOfWeek
## ----stringFilter:
## value: 5 | matchType: EXACT | caseSensitive: TRUE
# NOT string filter
ga_data_filter(!(city=="Copenhagen" | city == "London"))
## ===notExpression:
## ===orGroup:
## [[1]]
## --GA4 Filter:
## --| city
## ----stringFilter:
## value: Copenhagen | matchType: EXACT | caseSensitive: TRUE
## [[2]]
## --GA4 Filter:
## --| city
## ----stringFilter:
## value: London | matchType: EXACT | caseSensitive: TRUE
# multiple filter clauses
ga_data_filter(city==c("Copenhagen","London","Paris","New York") &
(dayOfWeek=="5" | dayOfWeek=="6"))
## =======andGroup:
## [[1]]
## --GA4 Filter:
## --| city
## ----inListFilter:
## values: Copenhagen London Paris New York
## caseSensitive: TRUE
##
## [[2]]
## ===orGroup:
## [[1]]
## --GA4 Filter:
## --| dayOfWeek
## ----stringFilter:
## value: 5 | matchType: EXACT | caseSensitive: TRUE
## [[2]]
## --GA4 Filter:
## --| dayOfWeek
## ----stringFilter:
## value: 6 | matchType: EXACT | caseSensitive: TRUE
Validation is carried out if the field is unquoted. If you don’t want validation, use quotes.
# validation of fields - correct
ga_data_filter(city=="Copenhagen")
## --| city
## ----stringFilter:
## value: Copenhagen | matchType: EXACT | caseSensitive: TRUE
# validation of fields - error
tryCatch(ga_data_filter(cittty=="Copenhagen"),
error = function(err) err$message)
## [1] "object 'cittty' not found"
# avoid validation by quoting
ga_data_filter("cittty"=="Copenhagen")
## --| cittty
## ----stringFilter:
## value: Copenhagen | matchType: EXACT | caseSensitive: TRUE
For custom fields, use ga_meta("data", propertyId=12345) first to fetch them.
# gets fields including custom event field "customEvent:test_dim"
my_meta <- ga_meta("data", propertyId = 206670707)
# use the custom event in a filter
ga_data_filter(cust_test_dim == "test")
Metric filters are used in the met_filters argument. An example is below:
metric_filter <- ga_data(
206670707,
metrics = c("activeUsers","sessions"),
dimensions = c("date","city","dayOfWeek"),
date_range = c("2020-03-31", "2020-04-27"),
met_filters = ga_data_filter(sessions>10),
limit = 100
)
metric_filter
# A tibble: 7 x 5
# date city dayOfWeek activeUsers sessions
# <date> <chr> <chr> <dbl> <dbl>
#1 2020-04-08 (not set) 4 18 21
#2 2020-04-08 Rome 4 12 14
#3 2020-04-15 (not set) 4 9 11
#4 2020-04-27 (not set) 2 9 11
# ...
## numeric filter types
# numeric equal filter
ga_data_filter(sessions==5)
## --| sessions
## ----numericFilter:
## operation: EQUAL | value: 5
# between numeric filter
ga_data_filter(sessions==c(5,6))
## --| sessions
## ----betweenFilter:
## from: 5 to: 6
# greater than numeric
ga_data_filter(sessions > 0)
## --| sessions
## ----numericFilter:
## operation: GREATER_THAN | value: 0
# greater than or equal
ga_data_filter(sessions >= 1)
## --| sessions
## ----numericFilter:
## operation: GREATER_THAN_OR_EQUAL | value: 1
# less than numeric
ga_data_filter(sessions < 100)
## --| sessions
## ----numericFilter:
## operation: LESS_THAN | value: 100
# less than or equal numeric
ga_data_filter(sessions <= 100)
## --| sessions
## ----numericFilter:
## operation: LESS_THAN_OR_EQUAL | value: 100
Dimension filters are used in the dim_filters argument. All the string filters that can be used are below:
## string filter types
# begins with string
ga_data_filter(city %begins% "Cope")
## --| city
## ----stringFilter:
## value: Cope | matchType: BEGINS_WITH | caseSensitive: TRUE
# ends with string
ga_data_filter(city %ends% "hagen")
## --| city
## ----stringFilter:
## value: hagen | matchType: ENDS_WITH | caseSensitive: TRUE
# contains string
ga_data_filter(city %contains% "ope")
## --| city
## ----stringFilter:
## value: ope | matchType: CONTAINS | caseSensitive: TRUE
# regex (full) string
ga_data_filter(city %regex% "^Cope")
## --| city
## ----stringFilter:
## value: ^Cope | matchType: FULL_REGEXP | caseSensitive: TRUE
# regex (partial) string
ga_data_filter(city %regex_partial% "ope")
## --| city
## ----stringFilter:
## value: ope | matchType: PARTIAL_REGEXP | caseSensitive: TRUE
By default, string filters are case sensitive. Use the UPPERCASE version of an operator to make it case insensitive:
# begins with string (case insensitive)
ga_data_filter(city %BEGINS% "cope")
## --| city
## ----stringFilter:
## value: cope | matchType: BEGINS_WITH | caseSensitive: FALSE
# ends with string (case insensitive)
ga_data_filter(city %ENDS% "Hagen")
## --| city
## ----stringFilter:
## value: Hagen | matchType: ENDS_WITH | caseSensitive: FALSE
# case insensitive exact match
ga_data_filter(city %==% "copeNHAGen")
## --| city
## ----stringFilter:
## value: copeNHAGen | matchType: EXACT | caseSensitive: FALSE
You can also recursively nest filter expressions to make more complicated ones.
The filter below looks for visitors from Copenhagen, London, Paris or New York who arrived on day 5 or 6 of the week, or visitors from a google session source whose city is not “(not set)”. It then excludes all cpc (Google Ads) traffic.
# use previously created filters to build another filter expression:
# multiple filter clauses
f1 <- ga_data_filter(city==c("Copenhagen","London","Paris","New York") &
(dayOfWeek=="5" | dayOfWeek=="6"))
# build up complicated filters
f2 <- ga_data_filter(f1 | sessionSource=="google" & !city=="(not set)")
f3 <- ga_data_filter(f2 & !sessionMedium=="cpc")
f3
## ===============andGroup:
## [[1]]
## ===========orGroup:
## [[1]]
## =======andGroup:
## [[1]]
## --GA4 Filter:
## --| city
## ----inListFilter:
## values: Copenhagen London Paris New York
## caseSensitive: TRUE
##
## [[2]]
## ===orGroup:
## [[1]]
## --GA4 Filter:
## --| dayOfWeek
## ----stringFilter:
## value: 5 | matchType: EXACT | caseSensitive: TRUE
## [[2]]
## --GA4 Filter:
## --| dayOfWeek
## ----stringFilter:
## value: 6 | matchType: EXACT | caseSensitive: TRUE
##
##
## [[2]]
## ===andGroup:
## [[1]]
## --GA4 Filter:
## --| sessionSource
## ----stringFilter:
## value: google | matchType: EXACT | caseSensitive: TRUE
## [[2]]
## notExpression:
## --GA4 Filter:
## --| city
## ----stringFilter:
## value: (not set) | matchType: EXACT | caseSensitive: TRUE
##
##
## [[2]]
## notExpression:
## --GA4 Filter:
## --| sessionMedium
## ----stringFilter:
## value: cpc | matchType: EXACT | caseSensitive: TRUE
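The filter DSL is written as ordinary R expressions, so R's operator precedence decides how an unbracketed filter groups: & binds more tightly than |, and ! applies to the whole comparison that follows it. That is why, in the f3 printout above, the sessionSource clause and the city NOT clause sit together in an andGroup inside the orGroup. Base R's quote() shows the same grouping without touching the API. This only demonstrates R's parsing; it does not call ga_data_filter().

```r
# Parse, without evaluating, the expression used to build f2 above
e <- quote(f1 | sessionSource == "google" & !city == "(not set)")
e[[1]]           # top-level operator: `|`
e[[3]][[1]]      # right-hand side of | is an `&` call
e[[3]][[3]]      # !city == "(not set)" is !(city == "(not set)")

# Brackets change the grouping: here & becomes the top-level operator
e2 <- quote((f1 | sessionSource == "google") & !city == "(not set)")
e2[[1]]
```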
You can pass filter expression objects like the ones above directly to ga_data():
complex_filter <- ga_data(
206670707,
metrics = c("activeUsers","sessions"),
dimensions = c("date","city","dayOfWeek"),
date_range = c("2020-03-31", "2020-04-27"),
dim_filters = f3,
limit = 100
)
complex_filter
## A tibble: 100 x 5
# date city dayOfWeek activeUsers sessions
# <date> <chr> <chr> <dbl> <dbl>
# 1 2020-04-27 Melbourne 2 3 4
# 2 2020-04-10 Eindhoven 6 2 3
# 3 2020-04-07 Bristol 3 2 2
# 4 2020-04-01 Reston 4 2 2
# ...
You can order or sort results using a new DSL via the function ga_data_order(), which operates on the fields (dimensions/metrics) in your returned data. The DSL validates the fields you send in. Prefix each field with + for ascending order or - for descending order, like so:
# session in descending order
ga_data_order(-sessions)
## [[1]]
## ==GA4 OrderBy==
## Metric: sessions
## Descending: TRUE
# city dimension in ascending alphanumeric order
ga_data_order(+city)
## [[1]]
## ==GA4 OrderBy==
## Dimension: city
## OrderType: ALPHANUMERIC
## Descending: FALSE
You can have multiple orders: add each -/+{field} after the previous one, without commas:
# as above plus sessions in descending order
ga_data_order(+city -sessions)
## [[1]]
## ==GA4 OrderBy==
## Dimension: city
## OrderType: ALPHANUMERIC
## Descending: FALSE
##
## [[2]]
## ==GA4 OrderBy==
## Metric: sessions
## Descending: TRUE
# as above plus activeUsers in ascending order
ga_data_order(+city -sessions +activeUsers)
## [[1]]
## ==GA4 OrderBy==
## Dimension: city
## OrderType: ALPHANUMERIC
## Descending: FALSE
##
## [[2]]
## ==GA4 OrderBy==
## Metric: sessions
## Descending: TRUE
##
## [[3]]
## ==GA4 OrderBy==
## Metric: activeUsers
## Descending: FALSE
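Writing several fields without commas works because the whole argument is still one valid R expression: +city -sessions +activeUsers parses as nested + and - calls, with the sign in front of each field kept in the call tree. The sketch below uses base R's quote() and all.vars() to show this. How ga_data_order() walks the expression internally is not shown here and is an assumption.

```r
# Parse (without evaluating) the ordering used above
e <- quote(+city -sessions +activeUsers)
# field names, in the order written
all.vars(e)
# the operator in front of each field is kept in the call tree:
e[[1]]            # `+` before activeUsers
e[[2]][[1]]       # `-` before sessions
e[[2]][[2]][[1]]  # unary `+` before city
```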
For dimensions, you can choose which type of sort to use via the type argument of ga_data_order():
- ALPHANUMERIC (the default, as in the printouts above): sort by Unicode code point. For example, “2” < “A” < “X” < “b” < “z”
- CASE_INSENSITIVE_ALPHANUMERIC: case-insensitive alphanumeric sort by lower case Unicode code point. For example, “2” < “A” < “b” < “X” < “z”
- NUMERIC: dimension values are converted to numbers before sorting. For example, in NUMERIC sort “25” < “100”, whereas in ALPHANUMERIC sort “100” < “25”. Non-numeric dimension values all have an equal ordering value below all numeric values
Use ga_data_order() objects in the orderBys argument:
ga_data(
206670707,
metrics = c("activeUsers","sessions"),
dimensions = c("date","city","dayOfWeek"),
date_range = c("2020-03-31", "2020-04-27"),
orderBys = ga_data_order(-sessions -dayOfWeek)
)
## A tibble: 100 x 5
# date city dayOfWeek activeUsers sessions
# <date> <chr> <chr> <dbl> <dbl>
# 1 2020-04-08 (not set) 4 18 21
# 2 2020-04-08 Rome 4 12 14
# 3 2020-04-20 London 2 6 14
# 4 2020-04-09 Warsaw 5 5 11
# 5 2020-04-15 (not set) 4 9 11
# 6 2020-04-21 Amsterdam 3 3 11
# 7 2020-04-27 (not set) 2 9 11
# ...
You can also combine ga_data_order() objects with c():
a <- ga_data_order(-sessions)
b <- ga_data_order(-dayOfWeek, type = "NUMERIC")
ga_data(
206670707,
metrics = c("activeUsers","sessions"),
dimensions = c("date","city","dayOfWeek"),
date_range = c("2020-03-31", "2020-04-27"),
orderBys = c(a, b),
limit = 100
)
## A tibble: 100 x 5
# date city dayOfWeek activeUsers sessions
# <date> <chr> <chr> <dbl> <dbl>
# 1 2020-04-08 (not set) 4 18 21
# 2 2020-04-08 Rome 4 12 14
# 3 2020-04-20 London 2 6 14
# 4 2020-04-09 Warsaw 5 5 11
# 5 2020-04-15 (not set) 4 9 11
# 6 2020-04-21 Amsterdam 3 3 11
# 7 2020-04-27 (not set) 2 9 11
# ...
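The three dimension sort types described above can be imitated in base R, which may help when re-sorting a returned tibble locally. This is a sketch with some assumptions. R's default sort() is locale-dependent, so method = "radix" is used to get C-locale (byte) ordering, which matches code-point order for the ASCII values used here. The rule that non-numeric values sort below all numbers is modelled with NA and na.last = FALSE.

```r
# Dimension values to sort (ASCII only, so byte order equals code-point order)
x <- c("b", "A", "z", "2", "X")

# ALPHANUMERIC: radix sort compares strings in the C locale, i.e. by byte
alnum <- sort(x, method = "radix")

# CASE_INSENSITIVE_ALPHANUMERIC: compare the lower-cased values instead
ci <- x[order(tolower(x), method = "radix")]

# NUMERIC vs ALPHANUMERIC for number-like values
y <- c("100", "25", "(not set)")
# non-numeric values become NA and are placed first, below all numbers
num <- y[order(suppressWarnings(as.numeric(y)), na.last = FALSE)]
alnum_y <- sort(y, method = "radix")
```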
Fields are metrics and dimensions that are available to your GA4 implementation, including custom fields. You can see what is available by calling the ga_meta("data") function.
You can use quoted ("city", "sessions") or unquoted (city or sessions) fields. Unquoted fields are checked for validity before the request is sent to the API.
If you want to use custom fields from your property, call ga_meta("data", propertyId=1234546) first, as for the filter fields explained above, and prefix the custom field with cust_.
Real-time data can be fetched with the same ga_data() function, which then calls a different API endpoint. Add the realtime=TRUE argument to the function.
Only a limited subset of dimensions and metrics is available in the real-time API, and date ranges are ignored.
# run a real-time report
realtime <- ga_data(
206670707,
metrics = "activeUsers",
dimensions = "city",
dim_filters = ga_data_filter(city=="Copenhagen"),
limit = 100,
realtime = TRUE)
Each API call includes the TOTAL, MAXIMUM and MINIMUM metric aggregations in the returned data frame’s metadata. You can access this data using base R’s attr() function, or use ga_data_aggregations() to extract them as data frames:
aggs <- ga_data(
206670707,
date_range = c(Sys.Date() - 4, Sys.Date() - 1),
metrics = "activeUsers",
dimensions = c("city","unifiedScreenName"),
limit = 100)
all_aggs <- ga_data_aggregations(aggs, type = "all")
all_aggs
#$totals
## A tibble: 1 x 4
# city unifiedScreenName dateRange activeUsers
# <chr> <chr> <chr> <dbl>
#1 RESERVED_TOTAL RESERVED_TOTAL date_range_0 250
#
#$maximums
## A tibble: 1 x 4
# city unifiedScreenName dateRange activeUsers
# <chr> <chr> <chr> <dbl>
#1 RESERVED_MAX RESERVED_MAX date_range_0 4
#
#$minimums
## A tibble: 1 x 4
# city unifiedScreenName dateRange activeUsers
# <chr> <chr> <chr> <dbl>
#1 RESERVED_MIN RESERVED_MIN date_range_0 0
# the result is a list, so each aggregation can be accessed by name
all_aggs$totals
## A tibble: 1 x 4
# city unifiedScreenName dateRange activeUsers
# <chr> <chr> <chr> <dbl>
#1 RESERVED_TOTAL RESERVED_TOTAL date_range_0 250
# extract a single aggregation
maxes <- ga_data_aggregations(aggs, type = "maximums")
maxes
## A tibble: 1 x 4
# city unifiedScreenName dateRange activeUsers
# <chr> <chr> <chr> <dbl>
#1 RESERVED_MAX RESERVED_MAX date_range_0 4
If you request more than one date range, the aggregations will be calculated for each one:
# use R to generate some dates
dates <- rev(seq(Sys.Date(), by = -7, length.out = 8))
dates
#[1] "2020-10-19" "2020-10-26" "2020-11-02" "2020-11-09" "2020-11-16" "2020-11-23" "2020-11-30" "2020-12-07"
aggs <- ga_data(
206670707,
date_range = dates,
metrics = "activeUsers",
dimensions = c("city","unifiedScreenName"),
limit = 100)
# get the totals
ga_data_aggregations(aggs, type = "totals")
## A tibble: 4 x 4
# city unifiedScreenName dateRange activeUsers
# <chr> <chr> <chr> <dbl>
#1 RESERVED_TOTAL RESERVED_TOTAL date_range_0 200
#2 RESERVED_TOTAL RESERVED_TOTAL date_range_1 4
#3 RESERVED_TOTAL RESERVED_TOTAL date_range_2 2
#4 RESERVED_TOTAL RESERVED_TOTAL date_range_3 399
There is no sampling, but there are token quotas for API fetches on a per-project basis. Normally these are not shown until you are close to the quota limits, but you can see them if you set the googleAuthR verbose level below 3 (options(googleAuthR.verbose=2)) or if the API call costs more than 50 units.
The bigger and more complex the query you make, the more tokens you use.
See this Google guide on how API quotas are assigned. The quotas apply at the GCP project and GA4 property level, and GA360 properties get larger quotas.
An example taken from above:
options(googleAuthR.verbose=2)
complex_filter <- ga_data(
206670707,
metrics = c("activeUsers","sessions"),
dimensions = c("date","city","dayOfWeek"),
date_range = c("2020-03-31", "2020-04-06",
"2020-04-07", "2020-04-14",
"2020-04-15", "2020-04-22",
"2020-04-23", "2020-04-30"),
metricAggregations = c("TOTAL","MINIMUM","MAXIMUM"),
orderBys = ga_data_order(-sessions +city),
dim_filters = ga_data_filter(
city==c("Copenhagen","London","Paris","New York") &
(dayOfWeek=="5" | dayOfWeek=="6")),
limit = -1
)
#>ℹ 2020-11-24 16:12:36 > tokensPerDay: Query Cost [ 15 ] / Remaining [ 24847 ]
#>ℹ 2020-11-24 16:12:36 > tokensPerHour: Query Cost [ 15 ] / Remaining [ 4967 ]
#>ℹ 2020-11-24 16:12:36 > concurrentRequests: 10 / 10
#>ℹ 2020-11-24 16:12:36 > serverErrorsPerProjectPerHour: 10 / 10
The quotas are:
- tokensPerDay: how many tokens you can use in 24 hours
- tokensPerHour: how many tokens you can use in 1 hour
- concurrentRequests: how many API requests you can make at once
- serverErrorsPerProjectPerHour: how many API calls returning server errors are allowed per project per hour
If you ever need to inspect the API request, set options(googleAuthR.verbose=2) to see the request JSON. This is helpful in bug reports.
You can also pass just the raw_json argument (plus the propertyId) to ga_data() to send in a JSON string as specified by the runReport request body. This can be used to fetch API features that are not yet implemented, or to reproduce buggy requests to help debugging:
ga_data(206670707, raw_json = '{"metrics":[{"name":"sessions"}],"orderBys":[{"dimension":{"orderType":"ALPHANUMERIC","dimensionName":"date"},"desc":false}],"dimensions":[{"name":"date"}],"dateRanges":[{"startDate":"2021-01-01","endDate":"2021-01-07"}],"keepEmptyRows":true,"limit":100,"returnPropertyQuota":true}')
# ℹ 2021-04-13 10:50:17 > Making API request with raw JSON: {"metrics":[{"name":"sessions"}],"orderBys":[{"dimension":{"orderType":"ALPHANUMERIC","dimensionName":"date"},"desc":false}],"dimensions":[{"name":"date"}],"dateRanges":[{"startDate":"2021-01-01","endDate":"2021-01-07"}],"keepEmptyRows":true,"limit":100,"returnPropertyQuota":true}
# ℹ 2021-04-13 10:50:17 > Request: https://analyticsdata.googleapis.com/v1beta/properties/206670707:runReport/
# ℹ 2021-04-13 10:50:17 > Body JSON parsed to: "{\"metrics\":[{\"name\":\"sessions\"}],\"orderBys\":[{\"dimension\":{\"orderType\":\"ALPHANUMERIC\",\"dimensionName\":\"date\"},\"desc\":false}],\"dimensions\":[{\"name\":\"date\"}],\"dateRanges\":[{\"startDate\":\"2021-01-01\",\"endDate\":\"2021-01-07\"}],\"keepEmptyRows\":true,\"limit\":100,\"returnPropertyQuota\":true}"
#ℹ 2021-04-13 10:50:18 > tokensPerDay: Query Cost [ 1 ] / Remaining [ 24927 ]
#ℹ 2021-04-13 10:50:18 > tokensPerHour: Query Cost [ 1 ] / Remaining [ 4927 ]
#ℹ 2021-04-13 10:50:18 > concurrentRequests: 10 / 10
#ℹ 2021-04-13 10:50:18 > serverErrorsPerProjectPerHour: 10 / 10
#ℹ 2021-04-13 10:50:18 > Downloaded [ 7 ] of total [ 7 ] rows
## A tibble: 7 x 2
# date sessions
# <date> <dbl>
#1 2021-01-01 33
#2 2021-01-02 34
#3 2021-01-03 66
#4 2021-01-04 89
# ...
You can also use raw_json with the real-time API:
ga_data(
propertyId = 206670707,
raw_json = '{"metrics":[{"name":"activeUsers"}],"limit":100,"returnPropertyQuota":true}',
realtime = TRUE)
## A tibble: 1 x 1
# activeUsers
# <dbl>
#1 2
If you pass in an R list instead of a raw JSON string, it will be converted into JSON via jsonlite::fromJSON().
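As a sketch, the raw JSON runReport request shown earlier could be written as a nested R list, with JSON arrays as unnamed lists and JSON objects as named lists. How ga_data() serialises such a list (for example, whether length-one values are unboxed) is not covered here. Treat the equivalence as an assumption and check the request JSON with options(googleAuthR.verbose=2).

```r
# The raw JSON request from the runReport example above, as a nested R list:
# JSON arrays become unnamed lists, JSON objects become named lists
req <- list(
  metrics = list(list(name = "sessions")),
  orderBys = list(list(
    dimension = list(orderType = "ALPHANUMERIC", dimensionName = "date"),
    desc = FALSE
  )),
  dimensions = list(list(name = "date")),
  dateRanges = list(list(startDate = "2021-01-01", endDate = "2021-01-07")),
  keepEmptyRows = TRUE,
  limit = 100,
  returnPropertyQuota = TRUE
)
# then, assuming ga_data() accepts the list as described above:
# ga_data(206670707, raw_json = req)
```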