The User
Activity API lets you query an individual user’s movement through
your website, by sending in the individual clientId
or
userId
. It is accessed via the
ga_clientid_activity()
function.
Use Activity is available in version of
googleAnalyticsR >= 0.6.9000
and needs
googleAuthR >= 0.7.0.9000
so install via:
remotes::install_github("MarkEdmondson1234/googleAuthR")
remotes::install_github("MarkEdmondson1234/googleAnalyticsR")
You first need to have a clientId
or userId
to query. You can now get this from the reporting API via the dimension
ga:clientId
.
Its also available via the User Explorer report in the Web UI, or via a BigQuery export, or you may be capturing the ID in a custom dimension.
Once you have an ID, specify the Google Analytics view that user was browsing and the data range of the activity you want to query:
viewId <- 123456
date_range <- c("yesterday","yesterday")
cids <- google_analytics(viewId, date_range = date_range,
metrics = "sessions", dimensions = "clientId")
users <- ga_clientid_activity(cids$clientId,
viewId = viewId,
date_range = date_range)
#> 2019-08-15 10:43:15> Fetching id: 19123.15505123
#> 2019-08-15 10:43:16> Fetching id: 12123.15657123
#> 2019-08-15 10:43:17> Fetching id: 11234.15657123
#> 2019-08-15 10:43:17> Fetching id: 21123.1565123
#> ..etc..
You can send in multiple IDs of the same type in a vector:
two_clientIds <- c("1106980347.1461227730", "476443645.1541099566")
two_users <- ga_clientid_activity(two_clientIds,
viewId = 81416156,
date_range = c("2019-01-01","2019-02-01"))
The API returns two types of data: session level and activity hit
level. Access it via $sessions
or $hits
:
two_users$sessions
# sessionId deviceCategory platform dataSource sessionDate id
#1 1548361067 desktop Macintosh web 2019-01-24 1106980347.1461227730
#2 1548261976 desktop Macintosh web 2019-01-23 1106980347.1461227730
#3 1548251272 desktop Macintosh web 2019-01-23 1106980347.1461227730
#4 1548017997 desktop Macintosh web 2019-01-20 1106980347.1461227730
# ...
two_users$hits
# A tibble: 102 x 26
# sessionId activityTime source medium channelGrouping campaign keyword hostname
# <chr> <dttm> <chr> <chr> <chr> <chr> <chr> <chr>
# 1 15483610… 2019-01-24 21:17:47 t.co refer… Social (not se… (not s… code.ma…
# 2 15482619… 2019-01-23 17:46:16 t.co refer… Social (not se… (not s… code.ma…
# 3 15482512… 2019-01-23 14:47:52 t.co refer… Social (not se… (not s… code.ma…
# ...
The amount of data returned is rich for the activity, the data columns are shown below (Although some will be empty for some rows if not applicable).
names(two_users$hits)
# [1] "sessionId" "activityTime" "source" "medium"
# [5] "channelGrouping" "campaign" "keyword" "hostname"
# [9] "landingPagePath" "activityType" "customDimension" "pagePath"
#[13] "pageTitle" "screenName" "mobileDeviceBranding" "mobileDeviceModel"
#[17] "appName" "ecommerce" "goals" "has_goal"
#[21] "eventCategory" "eventAction" "eventLabel" "eventValue"
#[25] "eventCount" "id"
The data.frames returned include the ID you sent in as the
$id
column so you can distinguish between users.
The output uses nested columns for some values so you may want to get
familiar with the tidyr::unnest()
function when working
with the data.
The nested columns are hits$customDimension
,
hits$ecommerce
and hits$goals
.
The nesting is necessary as you can have multiple of these events per
hit, and expanding them in the response would make a very large
data.frame
to work with.
An example on how to unnest goals is shown below:
library(tidyr)
library(purrr)
library(dplyr)
a_user$hits %>%
filter(has_goal) %>% # filter to just hits with goals
select(id, sessionId, activityTime, goals) %>%
unnest(goals) %>% # unnest the goals list column
mutate(goalIndex = map_chr(goals, "goalIndex"),
goalName = map_chr(goals, "goalName"),
goalCompletionLocation = map_chr(goals, "goalCompletionLocation")) %>%
select(-goals)
## A tibble: 4 x 6
# id sessionId activityTime goalIndex goalName goalCompletionLocation
# <chr> <chr> <dttm> <chr> <chr> <chr>
#1 1106980347.1461227… 1548016803 2019-01-20 21:40:53 20 Visited over 4 pa… /googleAnalyticsR/articles/setup.html
#2 1106980347.1461227… 1546979541 2019-01-08 21:34:18 1 Time over a minut… /googleAnalyticsR/articles/ganalytics…
#3 1106980347.1461227… 1546802623 2019-01-06 20:26:59 1 Time over a minut… /googleAnalyticsR/articles/v4.html
#4 1106980347.1461227… 1546467261 2019-01-02 23:15:50 1 Time over a minut… /googleAnalyticsR/
To unnest custom dimensions, some example code is below:
library(tidyr) # devtools::install_github("tidyverse/tidyr")
library(purrr)
library(dplyr)
a_user$hits %>%
select(id, sessionId, activityTime, customDimension) %>%
unnest(customDimension) %>%
mutate(cd_index = map_chr(customDimension, "index"),
cd_value = map_chr(customDimension, ~ .$value %||% NA_character_)) %>%
filter(!is.na(cd_value)) %>%
select(-customDimension) %>%
distinct() %>%
pivot_wider(names_from = cd_index, values_from = cd_value, names_prefix = "customDim")
To unnest ecommerce and filter to only transactions, an example is shown below:
a_user$hits %>%
filter(activityType == "ECOMMERCE") %>%
select(id, sessionId, activityTime, ecommerce) %>%
mutate(transaction = map(ecommerce, "transaction"),
transactionRevenue = map_dbl(transaction, ~.[["transactionRevenue"]] %||% NA),
transactionId = map_chr(transaction, ~.[["transactionId"]] %||% NA)) %>%
filter(!is.na(transactionRevenue)) %>%
select(-transaction, -ecommerce)
To get the traffic sources per hit, you only need the first hit per session so can compute via:
If you specify the activity_type
parameter, you can
filter down the response to only the events you include in a vector.
The permitted types are:
c("PAGEVIEW","SCREENVIEW","GOAL","ECOMMERCE","EVENT")
-
include some of these to specify which you would like to see.
only_goals <- ga_clientid_activity(two_clientIds,
viewId = 81416156,
date_range = c("2019-01-01","2019-02-01"),
activity_types = "GOAL")
If you are capturing Google Analytics cookie ID or a userId in a custom dimension then the below workflow shows how you can use the standard reporting API to fetch more detail on the users:
library(googleAnalyticsR)
al <- ga_account_list()
# get a viewID you know has implemented putting cookie ID in a custom dimension
viewId <- 84714057
view_row <- al[al$viewId == viewId,]
# get the custom dimensions
cus_dims <- ga_custom_vars_list(accountId = view_row$accountId,
webPropertyId = view_row$webPropertyId,
type = "customDimensions")
#In this example, user.id is in ga:dimension2
# date range of ids to query
dates <- c(Sys.Date() - 30, Sys.Date() - 1)
# download all client.ids who had a pageview in last 30 days
# change this query to a segment of users you are interested in
cids <- google_analytics(viewId, date_range = dates,
dimensions = "dimension2", metrics = "pageviews",
order = order_type("pageviews", "DESCENDING"),
max = 1000)
# download user activity for all the users
user_activities <- ga_clientid_activity(cids$dimension2,
viewId = viewId,
date_range = dates)
The API response may be sampled - it will send a message if this happens. If it does, follow the advice on the API documentation such as splitting up the call into smaller date ranges.
Also bear in mind each API call counts against your Analytics Reporting v4 API quota which by default is 50k per day, so you won’t be able to fetch more user activity than that without increasing your API quota.