How to collect Google Analytics events into your own data warehouse and do it on a budget
Alex Levashov
Web Analytics Wednesday presentation
06 Nov 2019
Brief Intro
WWW.OWNYOURBUSINESSDATA.NET
• eCommerce consultant; I run my own small consultancy, Magenable, specializing in Magento
• I deal with many eCommerce-related things, from strategy to implementation to support, so not only web analytics
• Started OwnYourBusinessData.net a couple of months ago
OwnYourBusinessData
• Own data warehouse over vendor lock-in
• Central data warehouse over silos
• Open, transferable data format over proprietary vendor formats
• Decoupled warehouse, ETL and business analysis tool over a monolith
• Open-source over proprietary
The data generated by a business should be owned by that business, for its own benefit and that of its customers.
WHY BOTHER TO COLLECT A COPY OF GA DATA?
MOTIVATION
Why in general?
1. Being paranoid and a control freak
2. Centralization
3. Sampling
4. API Limits
Why this way?
1. Affordability
2. Low maintenance
3. Learn something new
INSPIRATION AND CREDITS
Existing Snowplow GA Plugin
Google Analytics plugin for Snowplow
Approach in general
Blog post at Bostata.com: “Client-side instrumentation for under $1 per month. No servers necessary.”
DISCLAIMERS, NOTES
• I am just starting to use Snowplow
• Alternative approaches exist and may work better in other cases
• Links to a blog post that describes the process in more detail and to the Git repository are provided, so there is no need to write everything down
TECHNOLOGIES USED
Approach
Snowplow architecture
Technologies we used
• AWS CloudFront
• AWS Lambda
• Python
• AWS S3
• AWS Athena
PROCESS
Approach
1. JS tracker: calls the tracking pixel
2. CloudFront: produces logs
3. Lambda function: processes the logs, enriches the data and puts the result into S3
4. Athena: takes the S3 data and creates SQL tables
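The Lambda step is where the custom code lives. As a rough illustration of what "processes logs" involves, here is a minimal Python sketch, not the code from the project's repository. It assumes the standard CloudFront access-log format (gzip-compressed, tab-separated, with `#Version` and `#Fields` header lines) and assumes the tracking-pixel request carries Google Analytics Measurement Protocol parameters such as `t` (hit type), `tid`, `cid` and `dl` in the `cs-uri-query` column. The function names `parse_cloudfront_log` and `extract_hit` are hypothetical.

```python
import gzip
from urllib.parse import parse_qsl, unquote


def parse_cloudfront_log(raw: bytes) -> list:
    """Turn a gzip-compressed CloudFront access log into a list of row dicts.

    Column names come from the '#Fields:' header line instead of being
    hard-coded, so extra columns appended by CloudFront do not break parsing.
    """
    text = gzip.decompress(raw).decode("utf-8")
    fields = []
    rows = []
    for line in text.splitlines():
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#"):
            rows.append(dict(zip(fields, line.split("\t"))))
    return rows


def extract_hit(row: dict) -> dict:
    """Decode the tracking-pixel query string into a flat event dict.

    Hypothetical helper; assumes GA Measurement Protocol parameter names.
    """
    query = row.get("cs-uri-query", "-")
    # CloudFront writes '-' when a request has no query string.
    params = dict(parse_qsl(query)) if query != "-" else {}
    return {
        "timestamp": f"{row['date']} {row['time']}",
        "ip": row["c-ip"],
        # The user agent is URL-encoded in CloudFront logs.
        "user_agent": unquote(row.get("cs(User-Agent)", "")),
        "hit_type": params.get("t", ""),
        "tracking_id": params.get("tid", ""),
        "client_id": params.get("cid", ""),
        "page_url": params.get("dl", ""),
    }
```

Reading column names from the `#Fields` line rather than relying on fixed positions keeps the parser working if CloudFront appends new columns to its log format.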
Why this way?
Benefits
• Easy to implement
• Serverless, low resource usage and costs (under $1/month)
• Reliable/low maintenance
• Easy access to data (SQL)
What do you need to start?
1. Google Analytics account
2. Google Tag Manager account
3. AWS account
4. Terraform (optional, but saves your time)
Step 1. Deploy AWS infrastructure
1. Manually
2. Or using the Terraform script:
https://github.com/ownyourbusinessdata/snowplow-google-analytics-enrich-lambda
At the end of the process you'll have:
• A CloudFront distribution
• Three S3 buckets: for logs, for the tracking pixel and for Athena query results
• The tracking pixel, stored in one of the S3 buckets
• A Python Lambda function that does the data processing and enrichment
• An Athena table (empty for now)
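How the Lambda function picks up new log files is an implementation detail; one common option, assumed in this sketch, is an S3 event notification on the logs bucket. The notification shape (`Records[].s3.bucket.name`, `Records[].s3.object.key`) is the standard S3 event format; `new_log_objects` and `handler` are hypothetical names, and the download, parse and write steps are only indicated in a comment.

```python
from urllib.parse import unquote_plus


def new_log_objects(event: dict) -> list:
    """Hypothetical helper: return (bucket, key) pairs from an S3 event.

    S3 URL-encodes object keys in event notifications (spaces become '+'),
    so keys are decoded before being used with the S3 API.
    """
    objects = []
    for record in event.get("Records", []):
        s3 = record.get("s3", {})
        bucket = s3["bucket"]["name"]
        key = unquote_plus(s3["object"]["key"])
        objects.append((bucket, key))
    return objects


def handler(event, context):
    # Hypothetical Lambda entry point. In a real deployment each object would be
    # fetched with boto3, e.g. s3.get_object(Bucket=bucket, Key=key)["Body"].read(),
    # then parsed, enriched and written to the output bucket for Athena.
    return {"processed": [key for _, key in new_log_objects(event)]}
```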
Step 2. Deploy JS tracker
With Google Tag Manager
Create a User-Defined Variable of the Custom JavaScript type and insert your tracker code into it
Make another variable of the Variable Configuration type and add your Custom JavaScript variable to it as a field
Use that configuration variable to modify the tag configuration
Wait
The data updates every 5-15 minutes
A few words about enrichment
AWS Lambda (Python)
The part we had to develop ourselves:
• Processing turns the CloudFront logs into text files
• Enrichment adds geo data (using MaxMindDB)
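A sketch of the enrichment step, assuming the MaxMind database is read with the `maxminddb` Python package, whose `maxminddb.open_database(path).get(ip)` returns a nested dict (or `None`) for City databases. To keep the example runnable without a database file, the lookup is passed in as a function. `enrich`, `to_tsv_line` and the column list are hypothetical and illustrative, not the project's actual schema.

```python
from typing import Callable, Optional

# Column order of the enriched text output (illustrative, not the project's schema).
COLUMNS = ["timestamp", "ip", "hit_type", "page_url", "country", "city"]

GeoLookup = Callable[[str], Optional[dict]]


def enrich(hit: dict, lookup: GeoLookup) -> dict:
    """Return a copy of the hit with country and city added.

    In Lambda, `lookup` could be maxminddb.open_database("GeoLite2-City.mmdb").get,
    which returns None for IP addresses not found in the database.
    """
    record = lookup(hit["ip"]) or {}
    enriched = dict(hit)  # leave the input dict untouched
    enriched["country"] = record.get("country", {}).get("iso_code", "")
    enriched["city"] = record.get("city", {}).get("names", {}).get("en", "")
    return enriched


def to_tsv_line(hit: dict) -> str:
    """Serialize one enriched hit as a tab-separated line for Athena."""
    # Tabs or newlines inside a value would break the row, so replace them.
    values = [
        str(hit.get(col, "")).replace("\t", " ").replace("\n", " ")
        for col in COLUMNS
    ]
    return "\t".join(values)
```

Injecting the lookup keeps the database handle outside the function, so in Lambda it can be opened once per container rather than once per record.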
Let’s check what we get
Accessing the data from R (demo)
The AWR.Athena package comes in handy
# Sample R connector to an Athena DB holding Snowplow events collected
# via the Google Analytics plugin
# Required package: AWR.Athena (install it once with the line below)
# install.packages("AWR.Athena")
library(AWR.Athena)
library(DBI)
library(tidyverse)
library(lubridate)
# You need an AWS API user with proper access to S3 and Athena.
# The AWS Access Key and Secret should be set via the AWS CLI:
# run "aws configure" from the command line.
# S3OutputLocation should be taken from your Athena settings.
# Connect to Athena
con <- dbConnect(AWR.Athena::Athena(), region = 'us-west-2',
                 S3OutputLocation = 's3://aws-athena-query-results-518190832416-us-west-2/',
                 Schema = 'default')
# Get the list of available tables
dbListTables(con)
# Query a specific table (all records; any SQL statement supported by Athena works)
df <- as_tibble(dbGetQuery(con, "SELECT * FROM eventsga"))
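The demo uses R, but the same table can be queried from Python with boto3's Athena client. This is a sketch under assumptions: credentials are configured with `aws configure` as above, the output location is your own Athena query results bucket, and only the first page of results is read (larger result sets need `NextToken` or a paginator). The function name `run_athena_query` is hypothetical, and the client is passed in so the polling logic can be exercised without AWS.

```python
import time


def run_athena_query(client, sql: str, database: str, output_location: str,
                     poll_seconds: float = 1.0) -> list:
    """Start an Athena query, wait for it to finish and return rows as dicts.

    `client` is expected to be boto3.client("athena"). Only the first page
    of results is read, which is enough for small result sets.
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]
    while True:
        status = client.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state == "SUCCEEDED":
            break
        if state in ("FAILED", "CANCELLED"):
            raise RuntimeError(f"Athena query {query_id} ended in state {state}")
        time.sleep(poll_seconds)  # still QUEUED or RUNNING
    result = client.get_query_results(QueryExecutionId=query_id)
    rows = result["ResultSet"]["Rows"]
    if not rows:
        return []
    # For SELECT queries the first row holds the column names.
    header = [cell.get("VarCharValue", "") for cell in rows[0]["Data"]]
    return [
        dict(zip(header, [cell.get("VarCharValue", "") for cell in row["Data"]]))
        for row in rows[1:]
    ]
```

For example: `run_athena_query(boto3.client("athena", region_name="us-west-2"), "SELECT * FROM eventsga LIMIT 10", "default", "s3://your-athena-results-bucket/")`.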
Let’s check what we get
AWS S3 and Athena live demo
References
• Collect Google Analytics events in your own cheap AWS warehouse with Snowplow (OwnYourBusinessData)
https://www.ownyourbusinessdata.net/collect-google-analytics-events-in-your-own-cheap-aws-warehouse-with-snowplow
• Snowplow data enrichment with Lambda (OwnYourBusinessData)
https://www.ownyourbusinessdata.net/enrich-snowplow-data-with-aws-lambda-function/
• Connect R to Athena (OwnYourBusinessData)
https://www.ownyourbusinessdata.net/connecting-r-to-athena-to-analyse-snowplow-events/
• Own Your Business Data Git
https://github.com/ownyourbusinessdata/
• Client-side instrumentation for under $1 per month. No servers necessary (Bostata)
https://bostata.com/client-side-instrumentation-for-under-one-dollar/
Q&A TIME
Contacts
Web: OwnYourBusinessData.Net
Twitter: https://twitter.com/own_data
LinkedIn: https://www.linkedin.com/groups/12283165/
OwnYourBusinessData
Web: https://levashov.biz/
Twitter: https://twitter.com/levashovbiz
LinkedIn: https://www.linkedin.com/in/alevashov/
Alex Levashov
Looking for people interested in joining the course

Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 

How to collect Google Analytics events to your own data warehouse and do it on a budget

  • 1. How to collect Google Analytics events to your own data warehouse and do it on a budget. Alex Levashov, Web Analytics Wednesday presentation, 06 Nov 2019
  • 2. Brief Intro WWW.OWNYOURBUSINESSDATA.NET • eCommerce consultant; I run my own small consultancy, Magenable, specializing in Magento • I deal with many eCommerce-related things, from strategy to implementation to support, so not only web analytics • Started OwnYourBusinessData.net a couple of months ago
  • 3. OwnYourBusinessData • Own data warehouse over vendor lock-in • Central data warehouse over silos • Open, transferable data formats over proprietary vendor formats • Decoupled warehouse, ETL and business analysis tools over a monolith • Open source over proprietary. The data generated by a business should be owned by that business, for its own benefit and that of its customers.
  • 4. WHY BOTHER TO COLLECT A COPY OF GA DATA? MOTIVATION Why in general? 1. Being paranoid and a control freak 2. Centralization 3. Sampling 4. API limits Why this way? 1. Affordability 2. Low maintenance 3. Learn something new
  • 5. INSPIRATION AND CREDITS The existing Snowplow GA plugin (Google Analytics plugin for Snowplow). The approach in general: the Bostata.com blog post “Client-side instrumentation for under $1 per month. No servers necessary.”
  • 6. DISCLAIMERS, NOTES • I am just starting to use Snowplow • Alternative approaches exist and may work better in other cases • Links to a blog post that describes the process in more detail and to the git repository will be provided, so there is no need to write everything down
  • 7. TECHNOLOGIES USED Approach: Snowplow architecture. Technologies we used: AWS CloudFront, AWS Lambda (Python), AWS S3, AWS Athena
  • 8. PROCESS Approach: JS tracker (calls the tracking pixel) → CloudFront (produces logs) → Lambda function (processes logs, enriches data, puts the results into S3) → Athena (takes the S3 data, creates SQL tables)
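The Lambda step on slide 8 starts from CloudFront access logs. The sketch below is not the code from the repository; it is a minimal Python illustration, assuming CloudFront's standard tab-separated (W3C-style) log format, of how a processing function might read a gzipped log file, take column names from the `#Fields:` header line and decode the tracker's query string. The sample line, its shortened field list and the `/i` pixel path are made up for illustration; real logs contain many more columns.

```python
import gzip
import io
from urllib.parse import parse_qsl


def parse_cloudfront_log(lines):
    """Yield one dict per request from CloudFront standard log lines.

    Column names come from the '#Fields:' header, so the sketch does not
    depend on a fixed column order.
    """
    fields = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
            continue
        if line.startswith("#"):
            continue  # other header lines, e.g. '#Version: 1.0'
        record = dict(zip(fields, line.split("\t")))
        query = record.get("cs-uri-query", "-")
        # CloudFront writes '-' for an empty field
        record["params"] = (
            dict(parse_qsl(query, keep_blank_values=True)) if query != "-" else {}
        )
        yield record


def parse_gzipped_log(data: bytes):
    """Decompress a gzipped log file (as delivered to S3) and parse it."""
    with gzip.open(io.BytesIO(data), mode="rt", encoding="utf-8") as fh:
        return list(parse_cloudfront_log(fh))


# Hypothetical sample with a shortened field list
sample = (
    "#Version: 1.0\n"
    "#Fields: date time c-ip cs-method cs-uri-stem sc-status cs-uri-query\n"
    "2019-11-06\t09:15:00\t203.0.113.7\tGET\t/i\t200\te=pv&url=https%3A%2F%2Fexample.com%2F\n"
)
records = parse_gzipped_log(gzip.compress(sample.encode("utf-8")))
print(records[0]["params"]["url"])  # https://example.com/
```

Reading column names from the header rather than hard-coding positions keeps the parser working if CloudFront appends new fields. Depending on the characters involved, CloudFront may encode parts of the query string again, so a real function may need an extra decoding step.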
  • 9. Why this way? Benefits • Easy to implement • Serverless, with low resource usage and costs (under $1/month) • Reliable, low maintenance • Easy access to data (SQL)
  • 10. What do you need to start? 1. Google Analytics account 2. Google Tag Manager account 3. AWS account 4. Terraform (optional, but saves time)
  • 11. Step 1. Deploy AWS infrastructure 1. Manually, or 2. with the Terraform script: https://github.com/ownyourbusinessdata/snowplow-google-analytics-enrich-lambda At the end of the process you'll get: • a CloudFront distribution • 3 S3 buckets: for logs, for the tracking pixel and for Athena query results • the tracking pixel in one of the S3 buckets • a Python Lambda function that does data processing and enrichment • an Athena table (empty for now)
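Slide 11 lists a Lambda function that processes files from the log bucket. A common way to wire that up is an S3 ObjectCreated notification; the sketch below assumes that trigger and shows only the handler skeleton: pulling bucket and key out of the event (keys arrive URL-encoded) and deriving an output key from the CloudFront log file name pattern `<distribution ID>.YYYY-MM-DD-HH.<unique ID>.gz`. The `enriched/date=YYYY-MM-DD/` layout, the `OUTPUT_PREFIX` value and the bucket name in the example are hypothetical; the actual Terraform setup may differ.

```python
import re
from urllib.parse import unquote_plus

# CloudFront standard log files are named
# <optional-prefix>/<distribution-id>.YYYY-MM-DD-HH.<unique-id>.gz
LOG_NAME = re.compile(
    r"(?P<dist>[A-Z0-9]+)\.(?P<date>\d{4}-\d{2}-\d{2})-(?P<hour>\d{2})\.(?P<uid>[^.]+)\.gz$"
)

OUTPUT_PREFIX = "enriched"  # hypothetical prefix for processed files


def output_key_for(log_key: str) -> str:
    """Map a raw log key to a date-partitioned output key (assumed layout)."""
    name = log_key.rsplit("/", 1)[-1]
    match = LOG_NAME.search(name)
    if match is None:
        raise ValueError(f"not a CloudFront log file name: {log_key}")
    return (
        f"{OUTPUT_PREFIX}/date={match['date']}/"
        f"{match['dist']}.{match['date']}-{match['hour']}.{match['uid']}.json"
    )


def handler(event, context):
    """Entry point for an S3 ObjectCreated notification on the log bucket."""
    jobs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys are URL-encoded in S3 event notifications
        key = unquote_plus(record["s3"]["object"]["key"])
        # A real function would download the object here (for example with
        # boto3's s3.get_object(Bucket=bucket, Key=key)), parse and enrich
        # it, and write the result back with s3.put_object(...).
        jobs.append({"bucket": bucket, "key": key, "output_key": output_key_for(key)})
    return jobs


event = {"Records": [{"s3": {"bucket": {"name": "my-log-bucket"},
                             "object": {"key": "cf-logs/E2EXAMPLE123.2019-11-06-09.a1b2c3d4.gz"}}}]}
print(handler(event, None)[0]["output_key"])
# enriched/date=2019-11-06/E2EXAMPLE123.2019-11-06-09.a1b2c3d4.json
```

A Hive-style `date=` prefix is one option that lets Athena prune partitions by date; it is a design choice for this sketch, not something the slides specify.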
  • 12. Step 2. Deploy the JS tracker with Google Tag Manager. Create a User-Defined Variable (Custom JavaScript type) and insert your tracker into it
  • 13. Step 2 (continued). Make another variable of type Variable Configuration and add your Custom JavaScript variable to it as a field
  • 14. Step 2 (continued). Use that configuration variable to modify the tag configuration
  • 15. Step 2 (continued). Use that configuration variable to modify the tag configuration
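Slides 12 to 15 hook the tracker into the GA tag so that each GA hit is also sent to the CloudFront collector. In the Snowplow GA plugin approach, what gets forwarded is, as far as I understand it, the GA hit payload: a URL-encoded Universal Analytics Measurement Protocol query string. The sketch below decodes such a payload into readable columns. The parameter names (`v`, `tid`, `cid`, `t`, `dl`, `dt`, `ec`, `ea`, `el`, `ev`) are standard Measurement Protocol v1 parameters; the column names they map to and the sample payload are hypothetical.

```python
from urllib.parse import parse_qsl

# A few Universal Analytics Measurement Protocol (v1) parameters,
# mapped to hypothetical readable column names
MP_FIELDS = {
    "v": "protocol_version",
    "tid": "tracking_id",
    "cid": "client_id",
    "t": "hit_type",
    "dl": "document_location",
    "dt": "document_title",
    "ec": "event_category",
    "ea": "event_action",
    "el": "event_label",
    "ev": "event_value",
}


def decode_ga_hit(payload: str) -> dict:
    """Turn a GA hit payload (URL-encoded query string) into named fields.

    Parameters not in MP_FIELDS are kept under their original short name.
    """
    hit = {}
    for name, value in parse_qsl(payload, keep_blank_values=True):
        hit[MP_FIELDS.get(name, name)] = value
    return hit


# Made-up payload with placeholder tracking and client IDs
payload = ("v=1&tid=UA-000000-1&cid=555.1573000000&t=event"
           "&ec=checkout&ea=click&el=pay%20now&dl=https%3A%2F%2Fexample.com%2Fcart")
hit = decode_ga_hit(payload)
print(hit["hit_type"], hit["event_label"])  # event pay now
```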
  • 16. Wait. The data updates every 5-15 minutes
  • 17. A few words about enrichment: AWS Lambda (Python). This is the part we had to develop • Processing turns the logs into text files • Enrichment adds geo data (using a MaxMind DB)
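Slide 17 mentions geo enrichment with a MaxMind database. A minimal sketch of that step, assuming the event already carries the viewer IP in the CloudFront `c-ip` field: the lookup is passed in as a function, so with the `maxminddb` package it could be `maxminddb.open_database("GeoLite2-City.mmdb").get` (the file name is an assumption), which returns nested dicts in the GeoIP2 City layout or `None`. The output column names (`geo_country` and so on) and the `fake_lookup` data are hypothetical; the IPs are documentation addresses.

```python
from typing import Callable, Optional

# Takes an IP string, returns a GeoIP2-City-style record or None
GeoLookup = Callable[[str], Optional[dict]]


def enrich_with_geo(event: dict, lookup: GeoLookup, ip_field: str = "c-ip") -> dict:
    """Return a copy of the event with country, city and coordinates added."""
    enriched = dict(event)
    ip = event.get(ip_field)
    record = (lookup(ip) if ip else None) or {}
    enriched["geo_country"] = record.get("country", {}).get("iso_code")
    enriched["geo_city"] = record.get("city", {}).get("names", {}).get("en")
    location = record.get("location", {})
    enriched["geo_latitude"] = location.get("latitude")
    enriched["geo_longitude"] = location.get("longitude")
    return enriched


def fake_lookup(ip: str) -> Optional[dict]:
    # Stand-in for a real MaxMind reader; made-up data for a single IP
    table = {"203.0.113.7": {"country": {"iso_code": "AU"},
                             "city": {"names": {"en": "Melbourne"}},
                             "location": {"latitude": -37.8, "longitude": 144.9}}}
    return table.get(ip)


print(enrich_with_geo({"c-ip": "203.0.113.7"}, fake_lookup)["geo_country"])  # AU
print(enrich_with_geo({"c-ip": "198.51.100.1"}, fake_lookup)["geo_city"])    # None
```

Passing the lookup in keeps the function testable without the database file; inside Lambda, opening the reader once at module level rather than per event would avoid reloading the file on every invocation.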
  • 18. Let’s check what we get: access from R (demo). The AWR.Athena package comes in handy:

```r
# Sample R connector to the Athena DB with Snowplow events
# generated via the Google Analytics plugin
# Required package to install: AWR.Athena
# install.packages("AWR.Athena")
library(AWR.Athena)
library(DBI)
library(tidyverse)
library(lubridate)

# You need an AWS API user with proper access to S3 and Athena.
# The AWS access key and secret should be set via the AWS CLI:
# run "aws configure" from the command line.
# S3OutputLocation should be taken from your Athena settings.

# Connect to Athena
con <- dbConnect(AWR.Athena::Athena(),
                 region = 'us-west-2',
                 S3OutputLocation = 's3://aws-athena-query-results-518190832416-us-west-2/',
                 Schema = 'default')

# Get the list of available tables
dbListTables(con)

# Query a specific table (all records; any SQL statement supported by Athena works)
df <- as_tibble(dbGetQuery(con, "SELECT * FROM eventsga"))

# Close the connection when done
dbDisconnect(con)
```
  • 19. Let’s check what we get: AWS S3 and Athena live demo
  • 20. References • Collect Google Analytics events in your own cheap AWS warehouse with Snowplow (OwnYourBusinessData): https://www.ownyourbusinessdata.net/collect-google-analytics-events-in-your-own-cheap-aws-warehouse-with-snowplow • Snowplow data enrichment with Lambda (OwnYourBusinessData): https://www.ownyourbusinessdata.net/enrich-snowplow-data-with-aws-lambda-function/ • Connect R to Athena (OwnYourBusinessData): https://www.ownyourbusinessdata.net/connecting-r-to-athena-to-analyse-snowplow-events/ • Own Your Business Data Git: https://github.com/ownyourbusinessdata/ • Client-side instrumentation for under $1 per month. No servers necessary (Bostata): https://bostata.com/client-side-instrumentation-for-under-one-dollar/
  • 22. Contacts. OwnYourBusinessData: Web: OwnYourBusinessData.Net, Twitter: https://twitter.com/own_data, LinkedIn: https://www.linkedin.com/groups/12283165/ Alex Levashov: Web: https://levashov.biz/, Twitter: https://twitter.com/levashovbiz, LinkedIn: https://www.linkedin.com/in/alevashov/ Looking for people interested in joining the course