With Snowplow, AWS and Google Tag Manager
Presentation at Web Analytics Wednesday, Melbourne, 06 Nov 2019.
Based on blog posts at https://www.ownyourbusinessdata.net/
How to collect Google Analytics events to your own data warehouse and do it on a budget
1. How to collect Google Analytics events to your own data warehouse and do it on a budget
Alex Levashov
Web Analytics Wednesday presentation
06 Nov 2019
2. Brief Intro
WWW.OWNYOURBUSINESSDATA.NET
• eCommerce consultant; I run my own small consultancy, Magenable, specializing in Magento
• I deal with many eCommerce-related things, from strategy to implementation to support, so not only web analytics
• Started OwnYourBusinessData.net a couple of months ago
3. OwnYourBusinessData
• Own data warehouse over vendor lock-in
• Central data warehouse over silos
• Open, transferable data format over vendor proprietary
• De-coupled warehouse, ETL and business analysis tool over monolith
• Open-source over proprietary
The data generated by a business should be owned by that business, for its own and its customers' benefit.
4. WHY BOTHER TO COLLECT COPY OF GA DATA?
MOTIVATION
Why in general?
1. Being paranoid and a control freak
2. Centralization
3. Sampling
4. API Limits
Why this way?
1. Affordability
2. Low maintenance
3. Learn something new
5. INSPIRATION AND CREDITS
Existing Snowplow GA Plugin
Google Analytics plugin for Snowplow
Approach in general
Blog post at Bostata.com: “Client-side instrumentation for under $1 per month. No servers necessary.”
6. DISCLAIMERS, NOTES
• I am just starting to use Snowplow
• There are alternative ways, and they may work better in other cases
• A link to the blog post that describes the process in more detail and to the Git repository will be provided, so there is no need to write everything down
10. What do you need to start?
1. Google Analytics account
2. Google Tag Manager account
3. AWS account
4. Terraform (optional, but saves your time)
11. Step 1. Deploy AWS infrastructure
1. Manually
2. Or use Terraform script:
https://github.com/ownyourbusinessdata/snowplow-google-analytics-enrich-lambda
At the end of the process you’ll get:
• CloudFront distribution
• 3 S3 buckets: for logs, the tracking pixel and Athena query results
• Tracking pixel in one S3 bucket
• Python Lambda function that does data processing and enrichment
• Athena table (empty now)
[Diagram: AWS CloudFront, AWS S3, AWS Lambda (Python), AWS Athena]
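To make the role of the tracking pixel and the log bucket more concrete, here is a minimal Python sketch of the idea, not the actual tracker code. It assumes the tracker sends each hit as a GET request for the pixel with the hit fields in the query string, as in the Bostata approach; the CloudFront domain, the pixel path and the sample hit values are hypothetical placeholders.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical values: replace with your own CloudFront domain and pixel path.
CLOUDFRONT_DOMAIN = "d111111abcdef8.cloudfront.net"
PIXEL_PATH = "/i"


def build_pixel_url(hit: dict) -> str:
    """Return the pixel URL with the hit fields encoded in the query string."""
    return f"https://{CLOUDFRONT_DOMAIN}{PIXEL_PATH}?{urlencode(hit)}"


# A few Google Analytics Measurement Protocol fields, used as sample data.
hit = {"v": "1", "t": "pageview", "tid": "UA-XXXXX-Y", "cid": "555", "dp": "/home"}
url = build_pixel_url(hit)

# CloudFront records the requested query string in its access log,
# so the original fields can be read back from the log later.
logged_query = urlsplit(url).query
recovered = {key: values[0] for key, values in parse_qs(logged_query).items()}
print(url)
print(recovered)
```

The point of the design is that no server has to run: the request for a static pixel is enough, because the access log in S3 already contains everything the hit carried.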
12. Step 2. Deploy JS tracker
With Google Tag Manager
Create a User-Defined Variable (Custom JavaScript type) and insert your tracker code into it
13. Step 2. Deploy JS tracker
With Google Tag Manager
Create another variable of type Variable Configuration and add your Custom JavaScript variable to it as a field
14. Step 2. Deploy JS tracker
With Google Tag Manager
Use that configuration variable to modify tag configuration
17. A few words about enrichment
AWS Lambda (Python)
The part that we had to develop
• Processing turns logs into text files
• Enrichment adds geo data (uses MaxMind DB)
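The two bullets above can be sketched roughly as follows. This is an illustrative Python outline under stated assumptions, not the Lambda function from the repository: it assumes the input is CloudFront access log text (tab-separated rows preceded by a `#Fields:` header), the sample header is shortened to a few columns, the output is one JSON line per event, the output field names are borrowed from Snowplow for illustration, and `lookup_geo` is a hypothetical stand-in for a real MaxMind DB lookup.

```python
import json
from urllib.parse import parse_qs


def lookup_geo(ip: str) -> dict:
    """Hypothetical stand-in for a MaxMind DB lookup; returns fixed sample data."""
    return {"geo_country": "AU", "geo_city": "Melbourne"}


def process_log(text: str, geo=lookup_geo) -> list:
    """Turn CloudFront access log text into a list of enriched event dicts."""
    fields, events = [], []
    for line in text.splitlines():
        if line.startswith("#Fields:"):
            # The header names the columns, so no column positions are hard-coded.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#"):
            continue
        row = dict(zip(fields, line.split("\t")))
        if row.get("cs-uri-query", "-") == "-":
            continue  # request carried no payload, e.g. a favicon request
        # Processing: decode the query string into event fields.
        event = {key: values[0] for key, values in parse_qs(row["cs-uri-query"]).items()}
        event["collector_tstamp"] = f"{row['date']} {row['time']}"
        event["user_ipaddress"] = row["c-ip"]
        # Enrichment: add geo data derived from the client IP address.
        event.update(geo(row["c-ip"]))
        events.append(event)
    return events


# Shortened sample log; real CloudFront logs have many more columns.
sample = "\n".join([
    "#Version: 1.0",
    "#Fields: date time c-ip cs-method cs-uri-stem cs-uri-query",
    "2019-11-06\t08:15:00\t203.0.113.7\tGET\t/i\tt=pageview&dp=%2Fhome",
    "2019-11-06\t08:15:02\t203.0.113.7\tGET\t/favicon.ico\t-",
])
events = process_log(sample)
lines = [json.dumps(event) for event in events]
print("\n".join(lines))
```

Passing the geo lookup in as a parameter keeps the parsing logic testable without a MaxMind database file; in a real function the stub would be replaced by an actual database reader.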
18. Let’s check what we get
Access from R demo
The AWR.Athena package comes in handy
# Sample R connector to the Athena DB that holds Snowplow events collected via the Google Analytics plugin
# Required package: AWR.Athena
# install.packages("AWR.Athena")
library(AWR.Athena)
library(DBI)
library(tidyverse)
library(lubridate)
# You need an AWS API user with proper access to S3 and Athena
# The AWS access key and secret should be set via the AWS CLI: run "aws configure" from the command line
# Connect to Athena; S3OutputLocation should be taken from your Athena settings
con <- dbConnect(AWR.Athena::Athena(), region='us-west-2',
                 S3OutputLocation='s3://aws-athena-query-results-518190832416-us-west-2/',
                 Schema='default')
# Get the list of available tables
dbListTables(con)
# Query a specific table (all records; the SQL statement can be any supported by Athena)
df <- as_tibble(dbGetQuery(con, "SELECT * FROM eventsga"))
19. Let’s check what we get
AWS S3 and Athena live demo
20. References
• Collect Google Analytics events in your own cheap AWS warehouse with Snowplow (OwnYourBusinessData)
https://www.ownyourbusinessdata.net/collect-google-analytics-events-in-your-own-cheap-aws-warehouse-with-snowplow
• Snowplow data enrichment with Lambda (OwnYourBusinessData)
https://www.ownyourbusinessdata.net/enrich-snowplow-data-with-aws-lambda-function/
• Connect R to Athena (OwnYourBusinessData)
https://www.ownyourbusinessdata.net/connecting-r-to-athena-to-analyse-snowplow-events/
• Own Your Business Data Git
https://github.com/ownyourbusinessdata/
• Client-side instrumentation for under $1 per month. No servers necessary (Bostata)
https://bostata.com/client-side-instrumentation-for-under-one-dollar/