1. BIG DATA AND MACHINE LEARNING ON GOOGLE CLOUD PLATFORM
Wlodek Bielski | CEO Pure Company
2. AGENDA
Google Cloud Platform at a glance
BigQuery: Cloud DWH and Big Data
Managed Hadoop, Beam, Airflow – overview of GCP PaaS
Machine Learning: from ML APIs to TensorFlow
5. GARTNER MQ FOR CLOUD IAAS, 2018
"Google has been most differentiated on the forward edge of IT,
with deep investments in analytics and ML, and many customers who
choose Google for strategic adoption have applications that are
anchored by BigQuery"
8. BIGQUERY: NO-OPS CLOUD DWH
Near real-time analysis of massive datasets
Standard SQL syntax (ANSI SQL 2011)
No-ops for performance and scaling (global black-box)
Separated storage and compute, linked with petabit network
Pay-as-you-go: only for queries and storage used
Automatic discount for long-term storage
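The pay-as-you-go model above (queries plus storage, with a discount on long-term storage) can be sketched as simple arithmetic. This is only an illustration: the function name, its parameters and every rate passed in are hypothetical placeholders, not actual BigQuery prices.

```python
def estimate_monthly_cost(
    tb_scanned: float,
    active_storage_gb: float,
    long_term_storage_gb: float,
    price_per_tb_scanned: float,
    price_per_gb_active: float,
    long_term_discount: float,
) -> float:
    """Estimate a monthly bill: queries + active storage + discounted long-term storage.

    All rates are supplied by the caller; none of them are real list prices.
    long_term_discount is a fraction between 0 and 1 (0.5 means 50% off).
    """
    if not 0.0 <= long_term_discount <= 1.0:
        raise ValueError("long_term_discount must be between 0 and 1")

    # Queries are billed by the amount of data scanned.
    query_cost = tb_scanned * price_per_tb_scanned
    # Storage is billed by volume; long-term storage gets the discount.
    active_cost = active_storage_gb * price_per_gb_active
    long_term_cost = (
        long_term_storage_gb * price_per_gb_active * (1.0 - long_term_discount)
    )
    return query_cost + active_cost + long_term_cost


# Placeholder rates chosen only to make the arithmetic easy to follow.
example_cost = estimate_monthly_cost(
    tb_scanned=2.0,
    active_storage_gb=100.0,
    long_term_storage_gb=100.0,
    price_per_tb_scanned=5.0,
    price_per_gb_active=0.02,
    long_term_discount=0.5,
)
```

With no queries and no stored data the estimate is zero, which is the point of the model: there is no fixed cluster cost.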
13. DATA PROCESSING ON GCP
Cloud Composer – managed Airflow
Dataproc – Hadoop + Spark
Dataflow – Apache Beam
Matillion – ETL/ELT, mainly for BigQuery
14. DATAPROC VS DATAFLOW
Cloud Dataproc
Migrating existing Hadoop workloads
Iterative processing and Notebooks
ML with Spark ML
Cloud Dataflow
Better for greenfield projects
Batch + streaming in one tool
Based on Apache Beam
Multiple Beam runners, e.g. Spark, Flink
Preprocessing for CloudML
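"Batch + streaming in one tool" means the transform is written once and applied to both bounded and unbounded input. The sketch below shows that idea in plain Python only; it does not use the Apache Beam API, and the function names and sample records are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_events(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Transform defined once: 'user,amount' lines become (user, amount) pairs."""
    for line in lines:
        user, amount = line.strip().split(",")
        yield user, int(amount)


def batch_source() -> List[str]:
    """Bounded input: the whole dataset is available up front."""
    return ["ann,3", "bob,5", "ann,4"]


def streaming_source() -> Iterator[str]:
    """Stand-in for an unbounded source: records arrive one at a time."""
    for line in ["ann,3", "bob,5", "ann,4"]:
        yield line


# The same transform runs unchanged over both kinds of input.
batch_result = list(parse_events(batch_source()))
stream_result = list(parse_events(streaming_source()))
```

In Beam itself the pipeline definition plays the role of `parse_events`, and the choice of source and runner decides whether it executes as batch or streaming.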
16. GOOGLE ML OFFERING
ML APIs | AutoML | CloudML | TensorFlow (DIY)
Data Science expertise required
17. VISION API
Pretrained models via API
Label detection
Face detection
NO Face recognition
Logo detection
REST API
Cloud Storage integration
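A minimal sketch of what a REST call to the Vision API looks like when the image sits in Cloud Storage. It only builds the JSON body for the `images:annotate` method and sends nothing; the helper name, the bucket path and the `max_results` default are assumptions for illustration, and authentication is left out.

```python
import json
from typing import Dict, List

# REST endpoint of the Vision API annotate method.
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def build_annotate_request(
    gcs_uri: str, feature_types: List[str], max_results: int = 10
) -> Dict:
    """Build the request body for one image stored in Cloud Storage."""
    if not gcs_uri.startswith("gs://"):
        raise ValueError("expected a Cloud Storage URI such as gs://bucket/object")
    return {
        "requests": [
            {
                # Cloud Storage integration: the image is referenced, not uploaded.
                "image": {"source": {"gcsImageUri": gcs_uri}},
                "features": [
                    {"type": feature, "maxResults": max_results}
                    for feature in feature_types
                ],
            }
        ]
    }


# Hypothetical bucket and object name.
body = build_annotate_request(
    "gs://example-bucket/photo.jpg",
    ["LABEL_DETECTION", "FACE_DETECTION", "LOGO_DETECTION"],
)
payload = json.dumps(body)
```

The resulting `payload` would be sent with an HTTP POST to `VISION_ENDPOINT` together with credentials, for example an API key or an OAuth token.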
18. AUTOML VISION
Custom ML models without coding / ML skills
Human labeling available (2-20 labels, up to 5 working days)
Powered by Google research in AutoML and Transfer Learning
19. CLOUD ML ENGINE
Complete operationalization service in Cloud
GCP console, command line gcloud ml-engine, REST API
TensorFlow, scikit-learn and XGBoost support
Both Python 2 and Python 3 are supported in v1.4+
Training with GPUs and TPUs (beta)
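As a sketch of the command-line route, the helper below assembles a `gcloud ml-engine jobs submit training` call as an argument list without running it. The helper name, job name, bucket, package layout and region are hypothetical; the runtime and Python versions are example values only.

```python
import shlex
from typing import List


def build_training_command(
    job_name: str,
    module_name: str,
    package_path: str,
    job_dir: str,
    region: str,
    runtime_version: str,
    python_version: str,
    scale_tier: str,
) -> List[str]:
    """Return the gcloud invocation as an argument list (nothing is executed)."""
    return [
        "gcloud", "ml-engine", "jobs", "submit", "training", job_name,
        "--module-name", module_name,          # Python module with the training entry point
        "--package-path", package_path,        # local directory of the trainer package
        "--job-dir", job_dir,                  # Cloud Storage location for job output
        "--region", region,
        "--runtime-version", runtime_version,
        "--python-version", python_version,
        "--scale-tier", scale_tier,            # e.g. BASIC_GPU to train on a GPU machine
    ]


# All values below are illustrative placeholders.
command = build_training_command(
    job_name="example_job_001",
    module_name="trainer.task",
    package_path="trainer/",
    job_dir="gs://example-bucket/jobs/example_job_001",
    region="us-central1",
    runtime_version="1.4",
    python_version="3.5",
    scale_tier="BASIC_GPU",
)
command_line = " ".join(shlex.quote(arg) for arg in command)
```

The list form can be passed to `subprocess.run(command)` on a machine with the Cloud SDK installed; `command_line` is the same call as a single shell string.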
23. WRAP-UP
BigQuery – Cloud DWH
BigQuery ML – pure SQL, for devs / data analysts
Set of APIs for developers (e.g. Vision API)
AutoML for analysts
CloudML – for data scientists