Proprietary + Confidential
SQL Saturday - Los Angeles
Data Platform on GCP
Patrick Alexander
Google - Customer Engineer
Ex-Microsoft - Principal Cloud Solution Architect
PatrickGCP@Google.com
@PatrickCloudArc
Google Office Spruce Goose
Playa Vista - California
2022
Spruce Goose
Hughes H-4 Hercules
1942 - 1947
https://en.wikipedia.org/wiki/Hughes_H-4_Hercules
November 2, 1947
Long Beach, California
THE STATS
Wingspan: 320′ 11″
Length: 218′ 8″
Height: 79′ 4″
Pounds, Empty Weight: 300,000
Cruise Speed: 135 MPH
Intro to GCP’s Data Platform
Agenda
01 The Data Landscape
02 Google Cloud Platform
03 Google Cloud Big Data Portfolio
The Data Landscape
Data is surging every minute. How are you using it?
● $203,579 in Amazon sales generated
● 500 hours of video uploaded on YouTube
● 142,361,111 emails sent and received
● 2,083,333 minutes used on Skype calls
● 347,222 tweets posted
● 50,200 mobile apps downloaded
● 1,389 Uber rides taken
● 2.4 million Google searches made
● 216,000 photos posted to Instagram
*Stats may be out of date!
Unintegrated Marketing Tools
Many companies use 20+ separate tools
It’s difficult to get a holistic view of customers.
Company Data in Silos
CRM / ERP / Billing / Inventory / POS
If you want to unlock the power of your data, you need
a CDP (customer data platform), not just new tools.
Google Cloud Platform
Fully managed storage & database services
Category | Product | Good for | Such as
Object | Cloud Storage | Binary or object data | Images, media serving, backups
Key-value | Memcache | Web/mobile applications, gaming | Game state, user sessions
Non-relational | Cloud Datastore | Hierarchical, mobile, web | User profiles, game state
Non-relational | Cloud Bigtable | Heavy read + write, events | AdTech, financial, IoT
Relational | Cloud SQL | Web frameworks | CMS, eCommerce
Relational | Cloud Spanner | RDBMS+scale, HA, HTAP | Transactions, Ad/Fin/MarTech
Warehouse | BigQuery | Enterprise Data Warehouse | Analytics, dashboards
A modern data warehouse on a comprehensive platform
Capabilities: Data ingestion at any scale; Reliable streaming data pipeline; Data lake and data warehousing; Advanced analytics
Stages: Capture, Process, Store, Analyze, Use
Products shown: Cloud Pub/Sub, Cloud Dataflow, Cloud Dataproc, Cloud Storage, Data Transfer Service, Cloud Composer, Cloud IoT Core, Cloud Dataprep, Cloud AI Services, Google Data Studio, TensorFlow, Sheets, Storage Transfer Service, Data Catalog, Cloud Data Fusion, BigQuery storage, BigQuery analysis engine, Apache Beam
A Leader in Cloud Data Warehouse
Google receives 5 of 5 in 19 different criteria, such as:
● Data Ingestion
● Data Lake Integration
● ML / Data Science
● Performance
● Scalability
● Solution Roadmap
● Strategy Execution
● Customer Adoption
● Use Cases
● Partners
The Forrester Wave™: Cloud Data Warehouse, Q1 2021, Noel Yuhanna
The Forrester Wave™ is a graphical representation of Forrester's call on a market and is plotted using a detailed spreadsheet with exposed scores, weightings, and comments.
Forrester does not endorse any vendor, product, or service depicted in the Forrester Wave. Information is based on best available resources. Opinions reflect judgment at the time
and are subject to change.
Google Cloud Big Data Portfolio
GCP provides a full suite of storage service options
● Cost-effective
● Varied choices based on your:
○ Application
○ Workload
Cloud Storage
Cloud Bigtable
Cloud Datastore
Cloud SQL
Cloud Spanner
BigQuery
Cloud Firestore
Cloud Storage
Overview
● Fully managed, highly reliable
● Cost-efficient, scalable object/blob store
● Object access via HTTP requests
● Object name is the only key
Ideal for
● Images and videos
● Objects and blobs
● Unstructured data
● Static website hosting
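Because the object name is the only key, "folders" in a bucket are just shared name prefixes, and each object is addressed over HTTP by bucket and name. A minimal Python sketch of both ideas follows. The bucket and object names are hypothetical, `list_by_prefix` is an in-memory illustration rather than a Cloud Storage API call, and the URL form assumes the `storage.googleapis.com/BUCKET/OBJECT` endpoint (an actual request would still need a public object or credentials).

```python
from urllib.parse import quote


def object_url(bucket: str, object_name: str) -> str:
    """Build the HTTPS URL for an object; the object name is the whole key."""
    # Percent-encode the name but keep "/" so prefix-style names stay readable.
    return f"https://storage.googleapis.com/{bucket}/{quote(object_name, safe='/')}"


def list_by_prefix(object_names: list[str], prefix: str) -> list[str]:
    """Emulate a folder listing: the namespace is flat, so a folder is only a shared prefix."""
    return sorted(name for name in object_names if name.startswith(prefix))


# Hypothetical object names in a hypothetical bucket.
names = ["images/cat photo.png", "images/dog.png", "backups/2022-01-01.tar"]
print(object_url("example-bucket", names[0]))
print(list_by_prefix(names, "images/"))
```

Keeping the whole path in the object name is what lets static website hosting and media serving map URLs directly onto stored objects.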
Cloud Datastore
Overview
● Fully managed NoSQL
● Scalable
Ideal for
● Semi-structured application data
● Durable key-value data
● Hierarchical data
● Managing multiple indexes
● Transactions
Cloud Firestore
Overview
● Fully managed, serverless, NoSQL
● Scalable
● Native mobile and web client libraries
● Real-time updates
Ideal for
● Document-oriented data
● Large collections of small documents
● Native mobile and web clients
● Durable key-value data
● Hierarchical data
● Managing multiple indexes
● Transactions
Cloud Bigtable
Overview
● High performance wide column NoSQL database service
● Sparsely populated table
● Can scale to billions of rows and thousands of columns
● Can store TB to PB of data
Ideal for
● Operational applications
● Analytical applications
● Storing large amounts of single-keyed data
● MapReduce operations
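To make "sparsely populated" and "single-keyed" concrete, here is a small in-memory sketch of a wide-column table in Python. It is a stand-in for illustration only, not the Bigtable client API, and the `device#timestamp` row-key layout and the column names are hypothetical choices.

```python
from collections import defaultdict

# In-memory stand-in for a wide-column table (not the Bigtable client API):
# row key -> {"family:qualifier": value}. Cells that were never written are
# simply absent, which is what makes a sparsely populated table cheap.
table = defaultdict(dict)


def row_key(device_id, timestamp):
    """Build the single key each row is stored and looked up under."""
    # Zero-padding keeps lexicographic key order equal to time order.
    return f"{device_id}#{timestamp:010d}"


def write_cell(device_id, timestamp, column, value):
    table[row_key(device_id, timestamp)][column] = value


def scan_prefix(prefix):
    """Return (key, cells) pairs whose key starts with prefix, in key order."""
    return [(key, table[key]) for key in sorted(table) if key.startswith(prefix)]


write_cell("sensor-a", 1700000000, "metrics:temp", "21.5")
write_cell("sensor-a", 1700000060, "metrics:humidity", "40")
write_cell("sensor-b", 1700000000, "metrics:temp", "19.0")

for key, cells in scan_prefix("sensor-a#"):
    print(key, cells)
```

Since the row key is the only index in this model, reads are either a lookup by exact key or a scan over a key range, so the key layout decides which queries are efficient.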
Cloud SQL
Overview
● Managed service
○ Replication
○ Failover
○ Backups
● MySQL, PostgreSQL, and SQL Server
● Relational database service
● Proxy allows for secure access to your Cloud SQL Second Generation instances without whitelisting
Ideal for
● Web frameworks
● Structured data
● OLTP workloads
● Applications using MySQL/PostgreSQL
Cloud Spanner
Overview
● Mission-critical relational database service
● Transactional consistency
● Global scale
● High availability
● Multi-region replication
● 99.999% SLA
Ideal for
● Mission-critical applications
● High transactions
● Scale and consistency requirements
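As a rough worked example of what a 99.999% SLA means, the Python sketch below converts an availability percentage into a yearly downtime budget. It treats the percentage as a plain yearly availability figure, which is a simplifying assumption; the actual SLA terms define how availability is measured.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def max_downtime_minutes_per_year(availability_percent: float) -> float:
    """Downtime budget implied by an availability percentage, in minutes per year."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for sla in (99.99, 99.999):
    print(f"{sla}% -> {max_downtime_minutes_per_year(sla):.2f} minutes per year")
```

Five nines leaves a budget of roughly 5.26 minutes per year, one tenth of the roughly 52.6 minutes that four nines allows.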
BigQuery
Overview
● Low-cost enterprise data warehouse for analytics
● Fully managed
● Petabyte scale
● Fast response times
● Serverless
Ideal for
● Online Analytical Processing (OLAP) workloads
● Big data exploration and processing
● Reporting via Business Intelligence (BI) tools
Storage at a glance
Product | Simple Description | Ideal for | Not Ideal for
Cloud Storage | Binary/object store | Large or rarely accessed unstructured data | Structured data, building fast apps
Datastore | Scalable store for structured serve | GAE apps, structured pure-serve use cases | Relational or analytic data
Firestore | Cloud-native app data at global scale; real-time NoSQL database to store and sync data | Mobile, web, multi-user, IoT & real-time applications |
Bigtable | High-volume, low-latency database | “Flat,” heavy read/write, or analytical data | High structure or transactional data
Cloud SQL | Well-understood VM-based RDBMS | Web frameworks, existing applications | Scaling, analytics, heavy writes
Spanner | Relational DB service | Low-latency transactional systems | Analytic data
BigQuery | Auto-scaling analytic data warehouse | Interactive analysis of static datasets | Building fast apps
Which Google Cloud Database is right for me?
Is your data structured?
  No → Do you need Mobile SDKs?
    Yes → Firebase Storage
    No → Cloud Storage
  Yes → Is your workload analytics?
    Yes → Do you need updates or low-latency?
      Yes → Cloud Bigtable
      No → BigQuery
    No → Is your data relational?
      Yes → Do you need horizontal scalability?
        Yes → Cloud Spanner
        No → Cloud SQL
      No → Do you need Mobile SDKs?
        Yes → Cloud Firestore on Firebase
        No → Cloud Datastore
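The decision flow on this slide can be sketched as a small Python function. The branch order is an assumption taken from the commonly published Google Cloud storage-options flowchart, because the flattened slide text does not preserve the arrows; the function and parameter names are hypothetical.

```python
def choose_storage(structured: bool, analytics: bool = False,
                   relational: bool = False, low_latency_updates: bool = False,
                   horizontal_scale: bool = False, mobile_sdks: bool = False) -> str:
    """Walk the slide's questions in order and return the suggested product."""
    if not structured:
        # Unstructured data: the only remaining question is mobile SDK support.
        return "Firebase Storage" if mobile_sdks else "Cloud Storage"
    if analytics:
        # Analytics workloads split on the need for updates or low latency.
        return "Cloud Bigtable" if low_latency_updates else "BigQuery"
    if relational:
        # Relational data splits on the need for horizontal scalability.
        return "Cloud Spanner" if horizontal_scale else "Cloud SQL"
    # Structured, non-analytic, non-relational data.
    return "Cloud Firestore on Firebase" if mobile_sdks else "Cloud Datastore"


print(choose_storage(structured=True, analytics=True))
print(choose_storage(structured=True, relational=True, horizontal_scale=True))
print(choose_storage(structured=False))
```

Each question is asked only on the branch where it matters, so flags that do not apply to a branch are ignored.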
Data lifecycle - ingest
Categories: Streaming, Batch, Applications
Cloud Storage
Good for: Binary, object data
Such as: Images, media serving, backup
Cloud Transfer
Good for: Managed bulk (arbitrary) data transfer
Such as: Cloud migration, backup, legacy data
Cloud Pub/Sub
Good for: Global, scalable MQ, durable, de-couple apps
Such as: IoT, user events, system metrics
Stackdriver Logging
Good for: Centralized log management solution
Such as: Log data from applications
Cloud SQL
Good for: Structured data, web frameworks
Such as: Metadata, FinTech, AdTech
Cloud Datastore
Good for: Hierarchical, mobile, web
Such as: User profiles, game states
Cloud Bigtable
Good for: Heavy read/write, events
Such as: IoT, user/system events, low-latency systems
Cloud Firestore
Good for: Hierarchical, mobile, web
Such as: User profiles, game states
Cloud Spanner
Good for: RDBMS, SQL, horizontal scaling
Such as: Metadata, FinTech, AdTech
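The "de-couple apps" role that the ingest slide gives Cloud Pub/Sub can be illustrated with a toy in-memory broker in Python. This is a stand-in, not the Cloud Pub/Sub client API; the class, topic, and subscription names are hypothetical.

```python
from collections import defaultdict, deque


class InMemoryBroker:
    """Toy topic/subscription broker; a stand-in, not the Cloud Pub/Sub API."""

    def __init__(self):
        # topic -> {subscription name: queue of undelivered messages}
        self._subscriptions = defaultdict(dict)

    def subscribe(self, topic, subscription):
        self._subscriptions[topic].setdefault(subscription, deque())

    def publish(self, topic, message):
        # The publisher never references a consumer: every subscription
        # on the topic receives its own copy of the message.
        for queue in self._subscriptions[topic].values():
            queue.append(message)

    def pull(self, topic, subscription):
        """Return the oldest undelivered message, or None if the queue is empty."""
        queue = self._subscriptions[topic][subscription]
        return queue.popleft() if queue else None


broker = InMemoryBroker()
broker.subscribe("device-events", "dashboard")
broker.subscribe("device-events", "archiver")
broker.publish("device-events", {"device": "sensor-a", "temp": 21.5})

print(broker.pull("device-events", "dashboard"))
print(broker.pull("device-events", "archiver"))
```

Adding a third consumer means adding a subscription; the publishing application does not change, which is the decoupling the slide refers to.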
Data lifecycle - process and analyze
Categories: Large scale data processing, Data Analysis, Custom ML, Task specific Machine Learning
Cloud Dataproc
Good for: Managed Hadoop ecosystems
Such as: Batch and streaming analytics over big data, machine learning
Cloud Dataflow
Good for: Unified abstraction for batch & streaming data
Such as: New pipelines, windowing operations, watermarking
Cloud Dataprep
Good for: UI-driven data preparation
Such as: Pre-step to big data jobs (Dataproc/Dataflow), machine learning
BigQuery
Good for: Enterprise Data Warehouse
Such as: Analytics, dashboards, Business Intelligence, basic machine learning
Vertex AI Platform
Good for: General purpose ML platform
Such as: Data scientists, ML on data warehouse
Cloud Dataproc (Custom ML)
Good for: Managed Hadoop ecosystems
Such as: ML jobs using Mahout/Spark MLlib
AutoML Vision
Good for: Object/face detection, emotional facial attributes, safe search, real time or batch, OCR
AutoML Video Intelligence
Good for: Video metadata, entity analysis, granularity of 1 frame per second, video catalog (timestamped) entity search
AutoML NLP
Good for: Structure and meaning of text, sentiment analysis
AutoML Translation
Good for: Auto translation of 90 languages, language detection, both real time and batch
AutoML Tables
Good for: Analyse structured data, find data traits, data label and target feature selection
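Windowing and watermarking, listed under Cloud Dataflow on the process-and-analyze slide, can be illustrated without any pipeline framework. The pure-Python sketch below is not the Apache Beam or Dataflow API; it assumes fixed event-time windows and a simplistic watermark (the largest event time seen so far, minus an allowed lateness), and all names are hypothetical.

```python
from collections import defaultdict


def fixed_window_counts(events, window_size, allowed_lateness=0):
    """Count events per (window start, key) using fixed event-time windows.

    events: iterable of (event_time, key) pairs in arrival order.
    An event is dropped as late when its window ended at or before the
    current watermark.
    """
    counts = defaultdict(int)
    dropped = []
    watermark = float("-inf")
    for event_time, key in events:
        window_start = (event_time // window_size) * window_size
        window_end = window_start + window_size
        if window_end <= watermark:
            dropped.append((event_time, key))
            continue
        counts[(window_start, key)] += 1
        # Advance the watermark; it never moves backwards.
        watermark = max(watermark, event_time - allowed_lateness)
    return dict(counts), dropped


# The event at time 3 arrives after an event at time 12, so it is out of order.
events = [(1, "a"), (4, "a"), (12, "a"), (3, "a"), (13, "b")]
print(fixed_window_counts(events, window_size=10))
print(fixed_window_counts(events, window_size=10, allowed_lateness=5))
```

With no allowed lateness the out-of-order event is dropped because its window has already closed; with an allowed lateness of 5 it is still counted in the first window. The same function works on a finite list (batch) or on events consumed as they arrive (streaming), which is the unified model the slide describes.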
Data lifecycle - explore and visualize
Categories: Data Science, Spreadsheet, Business Intelligence
Cloud Datalab
Good for: Jupyter notebooks for general purpose data visualization
Connected Sheets
Good for: Running BigQuery queries through Google Apps Script, usually for quick, short analysis on smaller datasets
Google Data Studio
Good for: Drag and drop report builder from Google Sheets, BigQuery, Cloud Storage files, SQL
Looker
Good for: Custom applications, embedded visualizations, data science workflows; integrates with BigQuery
Cloud Dataprep
Good for: UI-driven data preparation and visualization. Also used as a pre-step to big data jobs (Dataproc/Dataflow) and machine learning
Big Data Reference Architecture
Data Science Reference Architecture
(High Performance Computing)
BigQuery | Architecture
Decoupled storage and compute for maximum flexibility
● SQL:2011 Compliant
● BigQuery High-Available Cluster Compute (Dremel)
● Distributed Memory Shuffle Tier
● Petabit Network
● Replicated, Distributed Storage (99.9999999999% durability)
● Streaming Ingest
● Free Bulk Loading
● REST API
● Web UI, CLI
● Client libraries for: C#, Go, Java, Node.js, PHP, Python, Ruby
Economic value - Data Warehouse Migration lowers your TCO massively
ESG 2019: The Economic Advantage of Migrating Data Warehouse Workloads to BigQuery
● 52% Lower TCO (versus on-premises)
● 41% Lower TCO (vs Teradata on AWS)
TCO Calculator
[Chart: Expected 3-year total cost of ownership for Teradata on-premises, Teradata on AWS, and Google BigQuery; y-axis from $0 to $16,000,000; cost components: up-front capital investment, monthly cloud spend, administrative costs, planning/deployment/migration, power/cooling/floorspace; annotations: 41% lower TCO (vs EDW on AWS), 52% lower TCO vs legacy TD on-prem]
Source: ESG 2019
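The two percentages on this slide also imply how the two Teradata options compare with each other. The Python sketch below works that out, assuming both savings figures are measured against the same BigQuery cost for the same workload, which the slide suggests but does not state; the function name is hypothetical.

```python
def implied_cost_ratio(saving_vs_a: float, saving_vs_b: float) -> float:
    """Cost of option B relative to option A, given one product's savings against each.

    If the product costs (1 - saving_vs_a) * A and also (1 - saving_vs_b) * B,
    then B / A = (1 - saving_vs_a) / (1 - saving_vs_b).
    """
    return (1 - saving_vs_a) / (1 - saving_vs_b)


# 52% lower than on-premises, 41% lower than Teradata on AWS (figures from the slide).
ratio = implied_cost_ratio(0.52, 0.41)
print(f"Teradata on AWS costs about {ratio:.1%} of Teradata on-premises")
```

Under that assumption, Teradata on AWS comes out at roughly 81% of the on-premises cost, that is, a little under 19% lower.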
Google Cloud provides the most modern data warehouse
Impact Google Cloud
BigQuery
Teradata
on-prem
AWS RedShift Snowflake Azure Synapse Analytics
Scale ✓ Fully managed and
serverless
✓ Petabyte-scale
✓ No warm-up or
maintenance
✕ Tied to cluster
✕ Significant
performance
bottlenecks
✕ Tied to cluster (RedShift
Spectrum is serverless)
✕ Considerable amount of
tuning needed
✕ Huge performance
bottlenecks
✕ Reclustering, shuffles, and
loads hurt performance
✕ SSDs tied to VMs
✕ Significant performance
bottlenecks on large data
✕ Compute has to be scaled up
manually
✕ Capacity limits based on
instance size
Real-time ✓ Streaming data
✓ BI Engine
✓ Streaming SQL
✓ Streaming data,
dashboards, SQL
✓ Streaming data,
dashboards, SQL
✕ Poor streaming performance ✓ Streaming data, dashboards
✕ Requires Databricks for
streaming scenarios
AI support ✓ Built-in BigQuery ML
✓ Two-way connections
to AI Platform
✓ Storage API for
Spark/Dataproc
✓ Some built-in ML
✕ Only basic
techniques
✕ No SQL-based ML
✓ Integration with SageMaker
✕ No SQL-based AI/ML
✕ No high-performance support
for Spark
✕ No SQL-based ML
✕ Just a rebrand of three
separate products; no deep
integration
Data
security
✓ Encrypted at rest and
in transit
✓ Immutable audit logs
✓ Data Catalog
✓ DLP API for redaction
✓ Integrated security ✓ Integrated security
✕ Partner tool (DgSecure)
needed for redaction
✕ No VPC-SC means no guards
against data exfiltration
✕ Standalone authentication
system
✕ No native redaction capability
✓ Integrated security
✕ Patches applied during
maintenance windows, with
downtime
BigQuery Hands on Lab
https://google.qwiklabs.com/focuses/1145?parent=catalog
Qwiklab
Any questions?
Thank you!
NDA
Performance at Scale
Petabyte scale, automated, and intelligent: lets your enterprise focus on delivering insights, not infrastructure
[Chart: workloads and analytics (basic reporting; ad-hoc reporting, operational insight; built-in advanced analytics capabilities) plotted against degree of automation (manual configuration to completely automated and serverless), positioning Legacy DW and BigQuery]
Economic Value - BigQuery lowers your data warehouse TCO massively
Expected 3-Year Total Cost of Ownership
● 52% Lower TCO [1] (versus on-premises)
● 26-34% Lower TCO [2] (vs other cloud DWs)
● Flat-rate and variable pricing options to give customers control over TCO
[1] Migrating Enterprise Data Warehouse Workloads - ESG 2019
[2] Google BigQuery vs. Alternative Cloud-based EDW Solutions - ESG 2019
How Google’s Smart Analytics Platform is Unique in the Industry
Scale ✓ Partial ✕
BigQuery is fully managed, serverless, and architected for petabyte scale. While others are tied to clusters or require manual reclustering efforts, BQ manages the infrastructure for you and allows your teams to focus on delivering insights.
Total Cost of Ownership ✓ ✕ ✕
BigQuery eliminates the need for upfront investment and planning for your EDW, reduces operational and administrative expenses - all while
delivering on business agility. Enterprise Strategy Group (ESG) estimated savings of 26-34% over cloud-based EDW alternatives and >40% over
legacy on-premise solutions
Interoperability ✓ ✕ ✕
BigQuery provides a unified, interoperable best of breed platform across your Data Warehouse and Data Lakes and data integration across
on-prem and cloud sources. BQ was made to tear down data silos and allow you to avoid creating new ones.
Democratized ML/AI ✓ ✕ ✕
BigQuery democratizes machine learning for the enterprise user (not just data scientists) with accessible capabilities using SQL, while allowing more sophisticated data science teams to access the power of Google’s leading-edge AI technologies via Cloud AI. More than 80% of our BigQuery customers have incorporated ML into their business analysis.
Reliable & Secure ✓ Partial Partial
BigQuery offers robust security, governance and reliability that is unmatched in the industry. High availability and a 99.99% SLA, automatic data
replication, restore and backup to ensure business continuity. Ability to classify and redact sensitive data, fine-grained identity and access
management including access transparency so you can log each view. Data is encrypted at rest and in transit by default, and
customer-managed encryption keys provide control over your data
Real-Time ✓ ✕ ✕
Designed to excel in IoT and other scenarios where your analysis depends on real-time streaming data as well as a BI acceleration engine for
high-concurrency low-latency use cases - both are unique differentiators for Google Cloud and essential for businesses that need to make real
time decisions
Usefully Multi-Cloud ✓ ✕ ✕
BigQuery breaks down the silos to provide a single pane of glass for all your data across multiple clouds (AWS, Azure). Most other vendors focus on providing the same service running in 3 clouds, but these are 3 silos. BigQuery breaks the silo and enables customers to analyze data across datasets.
Industry Leadership ✓ ? ?
Recognized as an industry leader by both Gartner and Forrester in Data Management and Analytics. With 9 Google products with more than a billion users running on our platform, you can be sure that big data is in our DNA and that we are ready to help your business build a future-ready data platform.
Legacy Solutions
Appendix
Data Platform on GCP

  • 1. Proprietary + Confidential SQL Saturday - Los Angeles Data Platform on GCP Patrick Alexander Google - Customer Engineer Ex-Microsoft - Principal Cloud Solution Architect PatrickGCP@Google.com @PatrickCloudArc
  • 2. Google Office Spruce Goose Playa Vista - California 2022
  • 3. Spruce Goose Hughes H-4 Hercules 1942 - 1947
  • 4. https://en.wikipedia.org/wiki/Hughes_H-4_Hercules November 2, 1947 Long Beach, California THE STATS Wingspan: 320′ 11″ Length: 218′ 8″ Height: 79′ 4″ Pounds, Empty Weight: 300,000 Cruise Speed: 135 MPH
  • 5.
  • 6. Intro to GCP’s Data Platform
  • 7. Confidential & Proprietary Agenda 01 The Data Landscape 02 Google Cloud Platform 03 Google Cloud Big Data Portfolio
  • 9. Confidential & Proprietary $203,579 In Amazon sales generated Data is surging every minute. How are you using it? 500 Hours of video uploaded on YouTube 142,361,111 Emails sent and received 2,083,333 Minutes used on Skype calls 347,222 Tweets posted 50,200 Mobile apps downloaded 1,389 Uber rides taken 2.4 Million Google searches made 216,000 Photos posted to Instagram *Stats may be out of date!
  • 10.
  • 11. Confidential & Proprietary Unintegrated Marketing Tools Many companies use 20+ separate tools It’s difficult to get a holistic view of customers. Company Data in Silos CRM / ERP / Billing / Inventory / POS
  • 12. Confidential & Proprietary If you want to unlock the power of your data, you need a CDP (customer data platform), not just new tools.
  • 14. Warehouse Cloud Storage Object Binary or object data Images, media serving, backups Memcache Key-value Web/mobile applications, gaming Game state, user sessions Non-relational Cloud Datastore Hierarchical, mobile, web User profiles, Game State Cloud Bigtable Heavy read + write, events AdTech, financial, IoT Relational Cloud SQL Web frameworks CMS, eCommerce Cloud Spanner RDBMS+scale, HA, HTAP Transactions, Ad/Fin/MarTech BigQuery Enterprise Data Warehouse Analytics, Dashboards Fully managed storage & database services
  • 15. A modern data warehouse on a comprehensive platform Data ingestion at any scale Reliable streaming data pipeline Advanced analytics Data lake and data warehousing Cloud Pub/Sub Cloud Dataflow Cloud Dataproc Cloud Storage Data Transfer Service Cloud Composer Cloud IoT Core Cloud Dataprep Cloud AI Services Google Data Studio Tensorflow Sheets Storage Transfer Service Data Catalog Cloud Data Fusion Process Capture Store Data warehousing Analyze BigQuery storage BigQuery analysis engine Use Apache Beam
  • 16. A Leader in Cloud Data Warehouse Data Ingestion Data Lake Integration ML / Data Science Performance Scalability Google receives 5 of 5 in 19 different criteria, such as: Solution Roadmap Strategy Execution Customer Adoption Use Cases Partners The Forrester Wave™: Cloud Data Warehouse, Q1 2021, Noel Yuhanna The Forrester Wave™ is a graphical representation of Forrester's call on a market and is plotted using a detailed spreadsheet with exposed scores, weightings, and comments. Forrester does not endorse any vendor, product, or service depicted in the Forrester Wave. Information is based on best available resources. Opinions reflect judgment at the time and are subject to change.
  • 17. Google Cloud Big Data Portfolio
  • 18. GCP provides a full suite of storage service options ● Cost-effective ● Varied choices based on your: ○ Application ○ Workload Cloud Storage Cloud Bigtable Cloud Datastore Cloud SQL Cloud Spanner BigQuery Cloud Firestore
  • 19. Overview Ideal for ● Fully managed, highly reliable ● Images and videos ● Cost-efficient, scalable object/blob store ● Objects and blobs ● Objects access via HTTP requests ● Unstructured data ● Object name is the only key ● Static website hosting Cloud Storage Cloud Bigtable Cloud Datastore Cloud SQL Cloud Spanner BigQuery Cloud Firestore Cloud Storage
  • 20. Cloud Datastore Overview Ideal for ● Fully managed NoSQL ● Semi-structured application data ● Scalable ● Durable key-value data ● Hierarchical data ● Managing multiple indexes ● Transactions Cloud Storage Cloud Bigtable Cloud SQL Cloud Spanner BigQuery Cloud Firestore Cloud Datastore
  • 21. Cloud Firestore Overview Ideal for ● Fully managed, serverless, NoSQL ● Scalable ● Native mobile and web client libraries ● Real-time updates ● Document-oriented data ● Large collections of small documents ● Native mobile and web clients ● Durable key-value data ● Hierarchical data ● Managing multiple indexes ● Transactions Cloud Storage Cloud Bigtable Cloud Datastore Cloud SQL Cloud Spanner BigQuery Cloud Firestore
  • 22. Cloud Bigtable Overview Ideal for ● High performance wide column NoSQL database service ● Operational applications ● Sparsely populated table ● Analytical applications ● Can scale to billions of rows and thousands of columns ● Storing large amounts of single-keyed data ● Can store TB to PB of data ● MapReduce operations Cloud Storage Cloud Datastore Cloud SQL Cloud Spanner BigQuery Cloud Firestore Cloud Bigtable
  • 23. Cloud SQL Overview Ideal for ● Managed service ○ Replication ○ Failover ○ Backups ● Web frameworks ● MySQL, PostgreSQL, and SQL Server ● Structured data ● Relational database service ● OLTP workloads ● Proxy allows for secure access to your Cloud SQL Second Generation instances without whitelisting ● Applications using MySQL/PostgreSQL Cloud Storage Cloud Bigtable Cloud Datastore Cloud Spanner BigQuery Cloud Firestore Cloud SQL
  • 24. Cloud Spanner Overview Ideal for ● Mission-critical relational database service ● Mission-critical applications ● Transactional consistency ● High transactions ● Global scale ● Scale and consistency requirements ● High availability ● Multi-region replication ● 99.999% SLA Cloud Storage Cloud Bigtable Cloud Datastore Cloud SQL BigQuery Cloud Firestore Cloud Spanner
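To put the 99.999% SLA figure on the Cloud Spanner slide in concrete terms, the sketch below converts an availability percentage into the downtime it permits per year. It is plain arithmetic, not a statement of how Google measures or credits its SLAs; the function name and the 365-day year are assumptions made for illustration.

```python
# Minutes in a 365-day year (leap years ignored for simplicity).
MINUTES_PER_YEAR = 365 * 24 * 60


def allowed_downtime_minutes(sla_percent: float) -> float:
    """Return the downtime per year, in minutes, that an availability SLA permits."""
    return (100 - sla_percent) / 100 * MINUTES_PER_YEAR


if __name__ == "__main__":
    # 99.999% ("five nines") allows roughly 5.26 minutes per year.
    print(f"{allowed_downtime_minutes(99.999):.2f} minutes/year")
    # 99.99% ("four nines") allows roughly 52.56 minutes per year.
    print(f"{allowed_downtime_minutes(99.99):.2f} minutes/year")
```

Each additional nine cuts the permitted downtime by a factor of ten, which is why the gap between four and five nines matters for mission-critical workloads.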
  • 25. BigQuery Overview Ideal for ● Low-cost enterprise data warehouse for analytics ● Online Analytical Processing (OLAP) workloads ● Fully managed ● Big data exploration and processing ● Petabyte scale ● Reporting via Business Intelligence (BI) tools ● Fast response times ● Serverless Cloud Storage Cloud Bigtable Cloud Datastore Cloud SQL Cloud Spanner Cloud Firestore BigQuery
  • 26. Storage at a glance (Product | Simple Description | Ideal for | Not Ideal for)
      Cloud Storage | Binary/object store | Large or rarely accessed unstructured data | Structured data, building fast apps
      Datastore | Scalable store for structured serve | GAE apps, structured pure-serve use cases | Relational or analytic data
      Firestore | Cloud-native app data at global scale; real-time NoSQL database to store and sync data | Mobile, web, multi-user, IoT & real-time applications |
      Bigtable | High-volume, low-latency database | “Flat,” heavy read/write, or analytical data | High structure or transactional data
      Cloud SQL | Well-understood VM-based RDBMS | Web frameworks, existing applications | Scaling, analytics, heavy writes
      Spanner | Relational DB service | Low-latency transactional systems | Analytic data
      BigQuery | Auto-scaling analytic data warehouse | Interactive analysis of static datasets | Building fast apps
  • 27.
  • 28. Which Google Cloud Database is right for me? [Decision flowchart; branch labels omitted] Questions: Is your data structured? Is your workload analytics? Is your data relational? Do you need updates or low-latency? Do you need mobile SDKs? Do you need horizontal scalability? Outcomes: Cloud SQL, Cloud Spanner, Cloud Datastore, Cloud Bigtable, BigQuery, Cloud Firestore on Firebase, Firebase Storage, Cloud Storage
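The flowchart on this slide can be read as a small decision function. The sketch below is one plausible reconstruction: the questions and product names come from the slide, but the transcript does not preserve which Yes/No branch leads where, so the wiring here is an assumption, as are the function and parameter names.

```python
def choose_database(
    structured: bool,
    analytics: bool = False,
    relational: bool = False,
    low_latency_updates: bool = False,
    mobile_sdks: bool = False,
    horizontal_scale: bool = False,
) -> str:
    """Suggest a Google Cloud storage product from the slide's six questions.

    The branch order is assumed, not taken from the transcript.
    """
    # Unstructured data: choose between the two object stores.
    if not structured:
        return "Firebase Storage" if mobile_sdks else "Cloud Storage"
    # Structured analytics workloads.
    if analytics:
        return "Cloud Bigtable" if low_latency_updates else "BigQuery"
    # Structured, relational, non-analytics workloads.
    if relational:
        return "Cloud Spanner" if horizontal_scale else "Cloud SQL"
    # Structured, non-relational, non-analytics workloads.
    return "Cloud Firestore" if mobile_sdks else "Cloud Datastore"


if __name__ == "__main__":
    print(choose_database(structured=True, analytics=True))
    print(choose_database(structured=True, relational=True, horizontal_scale=True))
```

Treat the result as a starting point for discussion rather than a rule; real workloads often mix several of these products.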
  • 29. A modern data warehouse on a comprehensive platform Data ingestion at any scale Reliable streaming data pipeline Advanced analytics Data lake and data warehousing Apache Beam Cloud Pub/Sub Cloud Dataflow Cloud Dataproc Cloud Storage Data Transfer Service Cloud Composer Cloud IoT Core Cloud Dataprep Cloud AI Services Google Data Studio Tensorflow Sheets Storage Transfer Service Data Catalog Cloud Data Fusion Process Capture Store Data warehousing Analyze BigQuery storage BigQuery analysis engine Use
  • 30. Cloud Storage Cloud Transfer Good for: Managed Bulk (arbitrary) data transfer Such as: Cloud migration, backup, legacy data Cloud Pub/Sub Streaming Batch Applications Data lifecycle - ingest Stackdriver Logging Good for: Centralized Log management solution Such as: Log data from Applications Cloud Pub/Sub Good for: Global, Scalable MQ, durable, de-couple apps Such as: IOT, User event, System metrics Cloud SQL Good for: Structured data, Web frameworks Such as: Meta-data, Fintech, AdTech Cloud Datastore Good for: Hierarchical, Mobile, Web Such as: User profile, Game states Cloud Bigtable Good for: Heavy read/write, events Such as: IOT, User/system events, low latency systems Cloud Firestore Cloud Spanner Good for: RDBMS, SQL, Horizontal scaling Such as: Meta-data, Fintech, AdTech Good for: Hierarchical, Mobile, Web Such as: User profile, Game states Good for: Global, Scalable MQ, durable, de-couple apps Such as: IOT, User event, System metrics Good for: Binary, Object data Such as: Images, Media serving, Backup
  • 31. AutoML Video Intelligence AutoML Vision Good for: Object/face detection, emotional facial attributes, Safe search, real time or batch, OCR Good for: Video metadata, entity analysis, granularity of 1 frame per second, Video catalog (timestamped) entity search Data Analysis Task specific Machine Learning Large scale data processing Data lifecycle - process and analyze Cloud Dataproc Good for: Managed Hadoop eco-systems Such as: Batch and streaming analytics over Big Data, Machine Learning Cloud Dataflow Good for: Unified abstraction for batch & streaming data. Such as: New pipelines, Windowing operations, Watermarking Cloud Dataprep Good for: UI Driven data preparation Such as: Pre-step to Big data jobs (Dataproc/Dataflow), Machine Learning BigQuery Vertex AI Platform Good for: General purpose ML platform. Such as: Data scientists, ML on Data warehouse Custom ML Cloud Dataproc Good for: Managed Hadoop eco-systems Such as: ML Jobs using Mahout/Spark MLlib AutoML Translation AutoML NLP Good for: Structure and meaning of text, sentiment analysis Good for: Auto translation of 90 languages, language detection, both real time and batch AutoML Tables Good for: Analyse structured data, find data traits, data label and target feature selection Good for: Enterprise Data Warehouse Such as: Analytics, Dashboards, Business Intelligence, Basic Machine Learning
  • 32. Cloud Datalab Connected Sheets Good for: Jupyter notebooks for general purpose data visualization Good for: Using Google Apps Script to run BigQuery queries. Usually for quick short analysis on smaller datasets Google Data Studio Good for: Drag and Drop report builder from Google Sheets, BigQuery, Cloud Storage files, SQL Business Intelligence Spreadsheet Data Science Data lifecycle - explore and visualize Looker Good for: Custom applications, embedded visualizations, data science workflows, Integrates with BigQuery Cloud Dataprep Good for: UI Driven data preparation and visualization. Also used as Pre-step to Big data jobs (Dataproc/Dataflow), Machine Learning
  • 33. Big Data Reference Architecture
  • 34. Data Science Reference Architecture
  • 35.
  • 36.
  • 38.
  • 39. Proprietary + Confidential SQL:2011 Compliant Petabit Network BigQuery High-Available Cluster Compute (Dremel) Streaming Ingest Free Bulk Loading Replicated, Distributed Storage (99.9999999999% durability) REST API Client libraries for: C#, Go, Java, Node.js, PHP, Python, Ruby Web UI, CLI Distributed Memory Shuffle Tier BigQuery | Architecture Decoupled storage and compute for maximum flexibility
  • 40. Proprietary + Confidential Economic value - Data Warehouse Migration lowers your TCO massively. ESG 2019: The Economic Advantage of Migrating Data Warehouse Workloads to BigQuery. 52% lower TCO (versus legacy Teradata on-premises); 41% lower TCO (vs Teradata on AWS). TCO Calculator [Chart: expected 3-year total cost of ownership for Teradata on-premises, Teradata on AWS, and Google BigQuery; axis from $0 to $16,000,000; cost components: up-front capital investment, monthly cloud spend, administrative costs, planning/deployment/migration, power/cooling/floorspace]
  • 41. Proprietary + Confidential Google Cloud provides the most modern data warehouse Impact Google Cloud BigQuery Teradata on-prem AWS Redshift Snowflake Azure Synapse Analytics Scale ✓ Fully managed and serverless ✓ Petabyte-scale ✓ No warm-up or maintenance ✕ Tied to cluster ✕ Significant performance bottlenecks ✕ Tied to cluster (Redshift Spectrum is serverless) ✕ Considerable amount of tuning needed ✕ Huge performance bottlenecks ✕ Reclustering, shuffles, and loads hurt performance ✕ SSDs tied to VMs ✕ Significant performance bottlenecks on large data ✕ Compute has to be scaled up manually ✕ Capacity limits based on instance size Real-time ✓ Streaming data ✓ BI Engine ✓ Streaming SQL ✓ Streaming data, dashboards, SQL ✓ Streaming data, dashboards, SQL ✕ Poor streaming performance ✓ Streaming data, dashboards ✕ Requires Databricks for streaming scenarios AI support ✓ Built-in BigQuery ML ✓ Two-way connections to AI Platform ✓ Storage API for Spark/Dataproc ✓ Some built-in ML ✕ Only basic techniques ✕ No SQL-based ML ✓ Integration with SageMaker ✕ No SQL-based AI/ML ✕ No high-performance support for Spark ✕ No SQL-based ML ✕ Just a rebrand of three separate products; no deep integration Data security ✓ Encrypted at rest and in transit ✓ Immutable audit logs ✓ Data Catalog ✓ DLP API for redaction ✓ Integrated security ✓ Integrated security ✕ Partner tool (DgSecure) needed for redaction ✕ No VPC-SC means no guards against data exfiltration ✕ Standalone authentication system ✕ No native redaction capability ✓ Integrated security ✕ Patches applied during maintenance windows, with downtime
  • 42. BigQuery Hands on Lab https://google.qwiklabs.com/focuses/1145?parent=catalog Qwiklab
  • 44. Proprietary + Confidential NDA Performance at Scale Petabyte scale, automated, and intelligent - lets your enterprise focus on delivering insights not infrastructure Built-in advanced analytics capabilities Completely automated and serverless Manual configuration Workloads and analytics Degree of automation BigQuery Ad-hoc reporting, operational insight Basic reporting Legacy DW
  • 45. Proprietary + Confidential Expected 3-Year Total Cost of Ownership 52% Lower TCO1 (versus on-premises) 26-34% Lower TCO2 (vs other Cloud DW’s) Flat-rate and variable pricing options to give customers control over TCO 1) Migrating Enterprise Data Warehouse Workloads - ESG 2019 2) Google BigQuery vs. Alternative Cloud-based EDW Solutions - ESG 2019 Economic Value - BigQuery lowers your data warehouse TCO massively
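To make the percentages on this slide tangible, the sketch below applies them to a baseline cost. The $10,000,000 baseline is purely hypothetical and is not a figure from the ESG reports; the function name is also an illustrative assumption. Only the percentages (52% and 26-34%) come from the slide.

```python
def tco_after_savings(baseline: int, percent_lower: int) -> float:
    """Return the cost that is `percent_lower` percent below `baseline`."""
    return baseline * (100 - percent_lower) / 100


if __name__ == "__main__":
    # Hypothetical 3-year TCO of the system being replaced (not from the slide).
    baseline = 10_000_000

    # 52% lower than on-premises, as quoted on the slide.
    print(f"vs on-premises: ${tco_after_savings(baseline, 52):,.0f}")

    # 26-34% lower than other cloud data warehouses, as quoted on the slide.
    low = tco_after_savings(baseline, 34)
    high = tco_after_savings(baseline, 26)
    print(f"vs other cloud DWs: ${low:,.0f} to ${high:,.0f}")
```

Note that the two percentages are measured against different baselines (on-premises versus other cloud warehouses), so they should not be combined into a single saving.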
  • 46. How Google’s Smart Analytics Platform is Unique in the Industry
      Scale ✓ Partial ✕ BigQuery is fully managed, serverless, and architected for petabyte scale. While others are tied to clusters or require manual reclustering efforts, BQ manages the infrastructure for you and allows your teams to focus on delivering insights.
      Total Cost of Ownership ✓ ✕ ✕ BigQuery eliminates the need for upfront investment and planning for your EDW and reduces operational and administrative expenses, all while delivering on business agility. Enterprise Strategy Group (ESG) estimated savings of 26-34% over cloud-based EDW alternatives and >40% over legacy on-premises solutions.
      Interoperability ✓ ✕ ✕ BigQuery provides a unified, interoperable, best-of-breed platform across your Data Warehouse and Data Lakes, with data integration across on-prem and cloud sources. BQ was made to tear down data silos and allow you to avoid creating new ones.
      Democratized ML/AI ✓ ✕ ✕ BigQuery democratizes Machine Learning for the enterprise user (not just data scientists) with accessible capabilities using SQL, while allowing more sophisticated data science teams to access the power of Google’s leading-edge AI technologies via Cloud AI. More than 80% of our BigQuery customers have incorporated ML into their business analysis.
      Reliable & Secure ✓ Partial Partial BigQuery offers robust security, governance, and reliability that is unmatched in the industry: high availability and a 99.99% SLA, plus automatic data replication, restore, and backup to ensure business continuity. It provides the ability to classify and redact sensitive data, and fine-grained identity and access management, including access transparency so you can log each view. Data is encrypted at rest and in transit by default, and customer-managed encryption keys provide control over your data.
      Real-Time ✓ ✕ ✕ Designed to excel in IoT and other scenarios where your analysis depends on real-time streaming data, and includes a BI acceleration engine for high-concurrency, low-latency use cases. Both are unique differentiators for Google Cloud and essential for businesses that need to make real-time decisions.
      Usefully Multi-Cloud ✓ ✕ ✕ BigQuery breaks down the silos to provide a single pane of glass for all your data across multiple clouds (AWS, Azure). Most other vendors are focused on providing the same service running in 3 clouds, but these are 3 silos. BigQuery breaks the silo and enables customers to analyze data across datasets.
      Industry Leadership ✓ ? ? Recognized industry leader by both Gartner and Forrester in Data Management and Analytics. With 9 Google products with more than a billion users running on our platform, you can be sure that big data is in our DNA and we are ready to help your business build a future-ready data platform.
      Legacy Solutions