Data Platform on GCP
1. Proprietary + Confidential
SQL Saturday - Los Angeles
Patrick Alexander
Google - Customer Engineer
Ex-Microsoft - Principal Cloud Solution Architect
PatrickGCP@Google.com
@PatrickCloudArc
9. Data is surging every minute. How are you using it?
● $203,579 in Amazon sales generated
● 500 hours of video uploaded on YouTube
● 142,361,111 emails sent and received
● 2,083,333 minutes used on Skype calls
● 347,222 tweets posted
● 50,200 mobile apps downloaded
● 1,389 Uber rides taken
● 2.4 million Google searches made
● 216,000 photos posted to Instagram
*Stats may be out of date!
11. Confidential & Proprietary
Unintegrated Marketing Tools
Many companies use 20+ separate tools
It’s difficult to get a holistic view of customers.
Company Data in Silos
CRM / ERP / Billing / Inventory / POS
12. Confidential & Proprietary
If you want to unlock the power of your data, you need
a CDP (customer data platform), not just new tools.
14. Fully managed storage & database services
● Object: Cloud Storage. Binary or object data; images, media serving, backups
● Key-value: Memcache. Web/mobile applications, gaming; game state, user sessions
● Non-relational: Cloud Datastore. Hierarchical, mobile, web; user profiles, game state
● Non-relational: Cloud Bigtable. Heavy read + write, events; AdTech, financial, IoT
● Relational: Cloud SQL. Web frameworks; CMS, eCommerce
● Relational: Cloud Spanner. RDBMS+scale, HA, HTAP; transactions, Ad/Fin/MarTech
● Warehouse: BigQuery. Enterprise data warehouse; analytics, dashboards
15. A modern data warehouse on a comprehensive platform
● Capabilities: data ingestion at any scale; reliable streaming data pipeline; advanced analytics; data lake and data warehousing
● Stages: Capture, Process, Store, Analyze, Use
● Data warehousing: BigQuery storage, BigQuery analysis engine
● Other products shown: Cloud Pub/Sub, Cloud Dataflow, Cloud Dataproc, Cloud Storage, Data Transfer Service, Storage Transfer Service, Cloud Composer, Cloud IoT Core, Cloud Dataprep, Cloud Data Fusion, Data Catalog, Cloud AI Services, Google Data Studio, TensorFlow, Sheets, Apache Beam
16. A Leader in Cloud Data Warehouse
Google receives 5 of 5 in 19 different criteria, such as:
● Data Ingestion
● Data Lake Integration
● ML / Data Science
● Performance
● Scalability
● Solution Roadmap
● Strategy Execution
● Customer Adoption
● Use Cases
● Partners
The Forrester Wave™: Cloud Data Warehouse, Q1 2021, Noel Yuhanna
The Forrester Wave™ is a graphical representation of Forrester's call on a market and is plotted using a detailed spreadsheet with exposed scores, weightings, and comments.
Forrester does not endorse any vendor, product, or service depicted in the Forrester Wave. Information is based on best available resources. Opinions reflect judgment at the time and are subject to change.
18. GCP provides a full suite of storage service options
● Cost-effective
● Varied choices based on your:
○ Application
○ Workload
Cloud Storage, Cloud Bigtable, Cloud Datastore, Cloud SQL, Cloud Spanner, BigQuery, Cloud Firestore
19. Cloud Storage
Overview
● Fully managed, highly reliable
● Cost-efficient, scalable object/blob store
● Object access via HTTP requests
● Object name is the only key
Ideal for
● Images and videos
● Objects and blobs
● Unstructured data
● Static website hosting
21. Cloud Firestore
Overview
● Fully managed, serverless, NoSQL
● Scalable
● Native mobile and web client libraries
● Real-time updates
Ideal for
● Document-oriented data
● Large collections of small documents
● Native mobile and web clients
● Durable key-value data
● Hierarchical data
● Managing multiple indexes
● Transactions
22. Cloud Bigtable
Overview
● High performance wide column NoSQL database service
● Sparsely populated table
● Can scale to billions of rows and thousands of columns
● Can store TB to PB of data
Ideal for
● Operational applications
● Analytical applications
● Storing large amounts of single-keyed data
● MapReduce operations
23. Cloud SQL
Overview
● Managed service
○ Replication
○ Failover
○ Backups
● MySQL, PostgreSQL, and SQL Server
● Relational database service
● Proxy allows for secure access to your Cloud SQL Second Generation instances without whitelisting
Ideal for
● Web frameworks
● Structured data
● OLTP workloads
● Applications using MySQL/PostgreSQL
24. Cloud Spanner
Overview
● Mission-critical relational database service
● Transactional consistency
● Global scale
● High availability
● Multi-region replication
● 99.999% SLA
Ideal for
● Mission-critical applications
● High transactions
● Scale and consistency requirements
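To put the 99.999% SLA quoted for Cloud Spanner in concrete terms, the sketch below converts an availability percentage into the downtime it permits. It assumes availability is measured over a full 365-day year (real SLAs are typically measured per billing month and carry their own exclusions), and the function name `allowed_downtime_minutes` is illustrative, not part of any Google API.

```python
def allowed_downtime_minutes(sla_percent: float, period_days: float = 365.0) -> float:
    """Return the maximum downtime, in minutes, that still meets the SLA.

    Assumption: the SLA is evaluated over `period_days` of wall-clock time.
    """
    unavailable_fraction = 1.0 - sla_percent / 100.0
    minutes_in_period = period_days * 24 * 60
    return unavailable_fraction * minutes_in_period


# Compare "five nines" (the Spanner slide) with "four nines".
for label, sla in [("99.999%", 99.999), ("99.99%", 99.99)]:
    per_year = allowed_downtime_minutes(sla)
    per_month = allowed_downtime_minutes(sla, period_days=30)
    print(f"{label}: {per_year:.2f} min/year, {per_month:.2f} min per 30 days")
```

Under these assumptions, five nines allows roughly 5.3 minutes of downtime per year, while four nines allows roughly 52.6 minutes.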
25. BigQuery
Overview
● Low-cost enterprise data warehouse for analytics
● Fully managed
● Petabyte scale
● Fast response times
● Serverless
Ideal for
● Online Analytical Processing (OLAP) workloads
● Big data exploration and processing
● Reporting via Business Intelligence (BI) tools
26. Storage at a glance
Cloud Storage
● Simple description: Binary/object store
● Ideal for: Large or rarely accessed unstructured data
● Not ideal for: Structured data, building fast apps
Datastore
● Simple description: Scalable store for structured serve
● Ideal for: GAE apps, structured pure-serve use cases
● Not ideal for: Relational or analytic data
Firestore
● Simple description: Real-time NoSQL database to store and sync data; cloud-native app data at global scale
● Ideal for: Mobile, web, multi-user, IoT & real-time applications
Bigtable
● Simple description: High-volume, low-latency database
● Ideal for: “Flat,” heavy read/write, or analytical data
● Not ideal for: High structure or transactional data
Cloud SQL
● Simple description: Well-understood VM-based RDBMS
● Ideal for: Web frameworks, existing applications
● Not ideal for: Scaling, analytics, heavy writes
Spanner
● Simple description: Relational DB service
● Ideal for: Low-latency transactional systems
● Not ideal for: Analytic data
BigQuery
● Simple description: Auto-scaling analytic data warehouse
● Ideal for: Interactive analysis of static datasets
● Not ideal for: Building fast apps
28. Which Google Cloud Database is right for me?
[Flowchart: yes/no decision tree over the questions below, ending in one of the listed products]
Questions
● Is your data structured?
● Is your workload analytics?
● Is your data relational?
● Do you need updates or low-latency?
● Do you need Mobile SDKs? (asked on two branches)
● Do you need horizontal scalability?
Outcomes
● Cloud SQL
● Cloud Spanner
● Cloud Datastore
● Cloud Bigtable
● BigQuery
● Cloud Firestore on Firebase
● Firebase Storage
● Cloud Storage
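The questions on this slide can be read as a small decision tree. The sketch below is one way to encode it; the slide text lists the questions and the products but not which answer leads where, so the branch wiring here is an assumption based on the commonly published version of this flowchart. The `Workload` class, its field names, and `pick_database` are hypothetical names for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Answers to the slide's yes/no questions (field names are illustrative)."""
    structured: bool
    analytics: bool = False
    relational: bool = False
    needs_low_latency_updates: bool = False
    needs_mobile_sdk: bool = False
    needs_horizontal_scale: bool = False


def pick_database(w: Workload) -> str:
    """Walk the decision tree; the yes/no edges are an assumed reconstruction."""
    if not w.structured:
        # Unstructured data: object storage, with or without mobile SDKs.
        return "Firebase Storage" if w.needs_mobile_sdk else "Cloud Storage"
    if w.analytics:
        # Analytics workloads split on the need for updates or low latency.
        return "Cloud Bigtable" if w.needs_low_latency_updates else "BigQuery"
    if w.relational:
        # Relational data splits on the need for horizontal scalability.
        return "Cloud Spanner" if w.needs_horizontal_scale else "Cloud SQL"
    # Structured, non-analytic, non-relational data.
    return "Cloud Firestore on Firebase" if w.needs_mobile_sdk else "Cloud Datastore"


print(pick_database(Workload(structured=True, analytics=True)))
print(pick_database(Workload(structured=True, relational=True, needs_horizontal_scale=True)))
```

Encoding the tree as early returns keeps each question on its own line, which makes it easy to compare against the flowchart and to adjust if your copy of the chart routes a branch differently.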
29. A modern data warehouse on a comprehensive platform
● Capabilities: data ingestion at any scale; reliable streaming data pipeline; advanced analytics; data lake and data warehousing
● Stages: Capture, Process, Store, Analyze, Use
● Data warehousing: BigQuery storage, BigQuery analysis engine
● Other products shown: Cloud Pub/Sub, Cloud Dataflow, Cloud Dataproc, Cloud Storage, Data Transfer Service, Storage Transfer Service, Cloud Composer, Cloud IoT Core, Cloud Dataprep, Cloud Data Fusion, Data Catalog, Cloud AI Services, Google Data Studio, TensorFlow, Sheets, Apache Beam
30. Data lifecycle - ingest
Ingest paths: Applications, Streaming, Batch
● Cloud Storage. Good for: binary, object data. Such as: images, media serving, backup.
● Cloud Transfer. Good for: managed bulk (arbitrary) data transfer. Such as: cloud migration, backup, legacy data.
● Cloud Pub/Sub. Good for: global, scalable MQ, durable, de-couple apps. Such as: IoT, user events, system metrics.
● Stackdriver Logging. Good for: centralized log management solution. Such as: log data from applications.
● Cloud SQL. Good for: structured data, web frameworks. Such as: metadata, Fintech, AdTech.
● Cloud Datastore. Good for: hierarchical, mobile, web. Such as: user profiles, game states.
● Cloud Bigtable. Good for: heavy read/write, events. Such as: IoT, user/system events, low latency systems.
● Cloud Firestore. Good for: hierarchical, mobile, web. Such as: user profiles, game states.
● Cloud Spanner. Good for: RDBMS, SQL, horizontal scaling. Such as: metadata, Fintech, AdTech.
31. Data lifecycle - process and analyze
Categories: Large scale data processing, Data Analysis, Custom ML, Task specific Machine Learning
● Cloud Dataproc. Good for: managed Hadoop ecosystems. Such as: batch and streaming analytics over big data, machine learning, ML jobs using Mahout/Spark MLlib.
● Cloud Dataflow. Good for: unified abstraction for batch & streaming data. Such as: new pipelines, windowing operations, watermarking.
● Cloud Dataprep. Good for: UI-driven data preparation. Such as: pre-step to big data jobs (Dataproc/Dataflow), machine learning.
● BigQuery. Good for: enterprise data warehouse. Such as: analytics, dashboards, business intelligence, basic machine learning.
● Vertex AI Platform. Good for: general purpose ML platform. Such as: data scientists, ML on data warehouse.
● AutoML Vision. Good for: object/face detection, emotional facial attributes, safe search, real time or batch, OCR.
● AutoML Video Intelligence. Good for: video metadata, entity analysis, granularity of 1 frame per second, video catalog (timestamped) entity search.
● AutoML NLP. Good for: structure and meaning of text, sentiment analysis.
● AutoML Translation. Good for: auto translation of 90 languages, language detection, both real time and batch.
● AutoML Tables. Good for: analyse structured data, find data traits, data label and target feature selection.
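This slide names windowing operations and watermarking as typical Cloud Dataflow work. As a rough illustration of those two ideas only, the plain-Python sketch below counts events per fixed (tumbling) window and drops events whose window has already closed. It is not the Dataflow or Apache Beam API: the function name, the `(event_time, key)` event shape, and the watermark heuristic (largest event time seen so far, minus an allowed lateness) are all assumptions made for the example.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds, allowed_lateness=0):
    """Count events per (window_start, key), processing them in arrival order.

    events: iterable of (event_time_seconds, key) tuples.
    Returns (counts, dropped), where `dropped` is the number of late events.
    """
    counts = defaultdict(int)
    watermark = float("-inf")  # assumed heuristic: max event time - lateness
    dropped = 0
    for event_time, key in events:
        watermark = max(watermark, event_time - allowed_lateness)
        # Assign the event to the fixed window that contains its timestamp.
        window_start = (event_time // window_seconds) * window_seconds
        window_end = window_start + window_seconds
        if window_end <= watermark:
            # The watermark has passed this window, so the event is too late.
            dropped += 1
            continue
        counts[(window_start, key)] += 1
    return dict(counts), dropped


# The last event (t=3) arrives after an event at t=12 has been seen.
arrivals = [(1, "a"), (4, "a"), (12, "a"), (3, "a")]
print(tumbling_window_counts(arrivals, window_seconds=10))
print(tumbling_window_counts(arrivals, window_seconds=10, allowed_lateness=5))
```

With no allowed lateness the out-of-order event is dropped; with five seconds of allowed lateness it is still counted in the first window. A managed service handles this bookkeeping, plus triggering and state, for you.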
32. Data lifecycle - explore and visualize
Categories: Data Science, Spreadsheet, Business Intelligence
● Cloud Datalab. Good for: Jupyter notebooks for general purpose data visualization.
● Connected Sheets. Good for: running BigQuery queries through Google Apps Script; usually for quick, short analysis on smaller datasets.
● Google Data Studio. Good for: drag-and-drop report builder from Google Sheets, BigQuery, Cloud Storage files, SQL.
● Looker. Good for: custom applications, embedded visualizations, data science workflows; integrates with BigQuery.
● Cloud Dataprep. Good for: UI-driven data preparation and visualization. Also used as a pre-step to big data jobs (Dataproc/Dataflow) and machine learning.
40. Economic value - Data Warehouse Migration lowers your TCO massively
ESG 2019: The Economic Advantage of Migrating Data Warehouse Workloads to BigQuery
● 52% lower TCO (versus legacy Teradata on-premises)
● 41% lower TCO (vs Teradata on AWS)
TCO Calculator
[Chart: expected 3-year total cost of ownership for Teradata on-premises, Teradata on AWS, and Google BigQuery; axis from $0 to $16,000,000; cost components: up-front capital investment, monthly cloud spend, administrative costs, planning/deployment/migration, power/cooling/floorspace. Source: ESG 2019]
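The percentages on this slide are relative savings. As a worked example, the sketch below applies them to a hypothetical baseline. The $10,000,000 on-premises figure is an illustrative assumption, not a number from the ESG report, and the helper names are made up for this example. The implied Teradata-on-AWS cost further assumes that both percentages describe the same BigQuery cost.

```python
def cost_after_saving(baseline_cost: float, saving_percent: float) -> float:
    """Cost that is `saving_percent` percent lower than `baseline_cost`."""
    return baseline_cost * (1.0 - saving_percent / 100.0)


def percent_saving(baseline_cost: float, new_cost: float) -> float:
    """How many percent lower `new_cost` is than `baseline_cost`."""
    return (baseline_cost - new_cost) / baseline_cost * 100.0


# Hypothetical 3-year on-premises TCO, chosen only to make the arithmetic concrete.
HYPOTHETICAL_ON_PREM_TCO = 10_000_000.0

# "52% lower TCO (versus on-premises)"
bigquery_tco = cost_after_saving(HYPOTHETICAL_ON_PREM_TCO, 52.0)

# "41% lower TCO (vs Teradata on AWS)": invert the saving to get the implied
# Teradata-on-AWS cost, assuming it refers to the same BigQuery cost.
teradata_on_aws_tco = bigquery_tco / (1.0 - 41.0 / 100.0)

print(f"BigQuery:                  ${bigquery_tco:,.0f}")
print(f"Teradata on AWS (implied): ${teradata_on_aws_tco:,.0f}")
print(f"Implied AWS saving vs on-prem: "
      f"{percent_saving(HYPOTHETICAL_ON_PREM_TCO, teradata_on_aws_tco):.1f}%")
```

With the $10,000,000 baseline, the BigQuery figure works out to $4,800,000 and the implied Teradata-on-AWS figure to roughly $8.1 million. Substitute your own baseline to see the absolute amounts the percentages would represent.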
41. Google Cloud provides the most modern data warehouse
Columns: Google Cloud BigQuery, Teradata on-prem, AWS RedShift, Snowflake, Azure Synapse Analytics
Scale
✓ Fully managed and serverless
✓ Petabyte-scale
✓ No warm-up or maintenance
✕ Tied to cluster
✕ Significant performance bottlenecks
✕ Tied to cluster (RedShift Spectrum is serverless)
✕ Considerable amount of tuning needed
✕ Huge performance bottlenecks
✕ Reclustering, shuffles, and loads hurt performance
✕ SSDs tied to VMs
✕ Significant performance bottlenecks on large data
✕ Compute has to be scaled up manually
✕ Capacity limits based on instance size
Real-time
✓ Streaming data
✓ BI Engine
✓ Streaming SQL
✓ Streaming data, dashboards, SQL
✓ Streaming data, dashboards, SQL
✕ Poor streaming performance
✓ Streaming data, dashboards
✕ Requires Databricks for streaming scenarios
AI support
✓ Built-in BigQuery ML
✓ Two-way connections to AI Platform
✓ Storage API for Spark/Dataproc
✓ Some built-in ML
✕ Only basic techniques
✕ No SQL-based ML
✓ Integration with SageMaker
✕ No SQL-based AI/ML
✕ No high-performance support for Spark
✕ No SQL-based ML
✕ Just a rebrand of three separate products; no deep integration
Data security
✓ Encrypted at rest and in transit
✓ Immutable audit logs
✓ Data Catalog
✓ DLP API for redaction
✓ Integrated security
✓ Integrated security
✕ Partner tool (DgSecure) needed for redaction
✕ No VPC-SC means no guards against data exfiltration
✕ Standalone authentication system
✕ No native redaction capability
✓ Integrated security
✕ Patches applied during maintenance windows, with downtime
42. BigQuery Hands on Lab
https://google.qwiklabs.com/focuses/1145?parent=catalog
Qwiklab
44. Performance at Scale
Petabyte scale, automated, and intelligent: lets your enterprise focus on delivering insights, not infrastructure
[Chart: workloads and analytics (basic reporting; ad-hoc reporting, operational insight; built-in advanced analytics capabilities) plotted against degree of automation (from manual configuration to completely automated and serverless), positioning Legacy DW and BigQuery]
45. Economic Value - BigQuery lowers your data warehouse TCO massively
Expected 3-Year Total Cost of Ownership
● 52% lower TCO (versus on-premises) (1)
● 26-34% lower TCO (vs other cloud DWs) (2)
Flat-rate and variable pricing options to give customers control over TCO
1) Migrating Enterprise Data Warehouse Workloads - ESG 2019
2) Google BigQuery vs. Alternative Cloud-based EDW Solutions - ESG 2019
46. How Google’s Smart Analytics Platform is Unique in the Industry
Scale ✓ Partial ✕
BigQuery is fully managed, serverless, and architected for petabyte scale. While others are tied to clusters or require manual reclustering efforts, BQ manages the infrastructure for you and allows your teams to focus on delivering insights.
Total Cost of Ownership ✓ ✕ ✕
BigQuery eliminates the need for upfront investment and planning for your EDW and reduces operational and administrative expenses, all while delivering on business agility. Enterprise Strategy Group (ESG) estimated savings of 26-34% over cloud-based EDW alternatives and >40% over legacy on-premises solutions.
Interoperability ✓ ✕ ✕
BigQuery provides a unified, interoperable best of breed platform across your Data Warehouse and Data Lakes and data integration across
on-prem and cloud sources. BQ was made to tear down data silos and allow you to avoid creating new ones.
Democratized ML/AI ✓ ✕ ✕
BigQuery democratizes machine learning for the enterprise user (not just data scientists) with accessible capabilities using SQL, while allowing more sophisticated data science teams to access the power of Google’s leading-edge AI technologies via Cloud AI. More than 80% of our BigQuery customers have incorporated ML into their business analysis.
Reliable & Secure ✓ Partial Partial
BigQuery offers robust security, governance and reliability that is unmatched in the industry. High availability and a 99.99% SLA, automatic data
replication, restore and backup to ensure business continuity. Ability to classify and redact sensitive data, fine-grained identity and access
management including access transparency so you can log each view. Data is encrypted at rest and in transit by default, and
customer-managed encryption keys provide control over your data
Real-Time ✓ ✕ ✕
Designed to excel in IoT and other scenarios where your analysis depends on real-time streaming data, with a BI acceleration engine for high-concurrency, low-latency use cases. Both are unique differentiators for Google Cloud and essential for businesses that need to make real-time decisions.
Usefully Multi-Cloud ✓ ✕ ✕
BigQuery breaks down the silos to provide a single pane of glass for all your data across multiple clouds (AWS, Azure). Most other vendors focus on providing the same service running in 3 clouds, but these are 3 silos. BigQuery breaks the silo and enables customers to analyze data across datasets.
Industry Leadership ✓ ? ?
Recognized as an industry leader by both Gartner and Forrester in Data Management and Analytics. With 9 Google products with more than a billion users running on our platform, you can be sure that big data is in our DNA and we are ready to help your business build a future-ready data platform.
Legacy Solutions