With BigQuery ML, you can build machine learning models and train them on massive datasets without leaving the data warehouse environment. We are going to demonstrate how to build, train, evaluate, and predict with your own scalable machine learning models using standard SQL in Google BigQuery.
We will see how to use the CREATE MODEL SQL syntax to build different models, such as:
Linear regression
Multiclass logistic regression for classification
K-means clustering
Import TensorFlow models for prediction in BigQuery
We will see how we can apply these models on tabular data in retail and marketing use cases.
Models are trained and accessed in BigQuery using SQL — a language data analysts know. This enables business decision making through predictive analytics across the organization without leaving the query editor.
2. ● Geek. Hiker. Do-er.
● Among the top 3 Romanians on Stack Overflow (133k reputation)
● Google Developer Expert on Cloud technologies
● Crafting Web/Mobile backends at REEA.net
● BigQuery/Redis and database engine expert
● Active in mentoring and IT community
StackOverflow: pentium10
GitHub: pentium10
Slideshare: martonkodok
Twitter: @martonkodok
BigQuery ML - Machine Learning at Scale using SQL @martonkodok
About me
3. 1. Application development in the Cloud using Serverless services
2. What is BigQuery? - Data warehouse in the Cloud
3. Introduction to BigQuery ML - execute ML models using SQL
4. Practical use cases
5. Segment and recommend with BigQuery ML
6. Conclusions
Agenda
5. Google sees serverless as
Programming model
● Focus on code
● Event-driven
● Stateless
Operational model
● Zero ops
● Automatic scaling
● Managed security
Billing model
● Pay for usage
6. Serverless is more than a set of functions
● Cloud Dataflow
● Cloud Tasks
● Cloud Storage
● Cloud Pub/Sub
● Cloud Functions
● App Engine
● BigQuery
● Stackdriver
7. Serverless is about maximizing elasticity, cost
savings, and agility of cloud computing.
11. Analytics-as-a-Service - Data Warehouse in the Cloud
Scales into petabytes on managed infrastructure - load files up to 5 TB
Familiar DB structure (table, columns, views, struct, nested, JSON)
SQL 2011 + JavaScript UDFs (user-defined functions)
Integrates with Google Sheets + Cloud Storage + Pub/Sub connectors
BigQuery ML enables users to create machine learning models with SQL queries
Decent pricing (storage: $20/TB, cold storage: $10/TB, queries: $5/TB) *May 2019
What is BigQuery?
12. BigQuery: federated data access warehouse
[Diagram labels: Application & Presentation; Audit logs; Billing entries; Stackdriver; Firebase; Google Marketing Platform; Cloud Dataflow; Cloud Storage; Report & Share; Business Analysis; BI Interface; Data Studio 360; Analysis; Processing; ML; Frontend Platform Services; Real-Time Events; Multiple Platforms; Database SQL]
13. “Data needs to be processed in multiple services. How can we pipe to multiple places?”
14. Architecting for The Cloud
[Diagram labels: BigQuery; On-Premises Servers; Pipelines; ETL Engine; Event Sourcing; Frontend Platform Services; Metrics / Logs / Streaming]
15. Data Pipeline Integration at REEA.net
[Diagram labels: Analytics Backend; BigQuery; On-Premises Servers; Pipelines; FluentD; Event Sourcing; Frontend Platform Services; Metrics / Logs / Streaming; Development Team; Report & Share; Business Analysis; Tools (Tableau, QlikView, Data Studio, Internal Dashboard); Database SQL; Application Servers; Cloud Storage archive; Load / Export; Replay; Standard Devices; HTTPS; Cloud Functions]
16. ● SQL 2011 standard
● big cost savings with partitioning/clustering
● ability to throw in / join all kinds of data
● run raw ad-hoc queries (by analysts, sales, or devs)
● inspiring ML functions - devs no longer leave the IDE
● pricing model: 1 TB of queries free every month
● no more throwing away, expiring, or aggregating old data
● no running out of resources
Our benefits using BigQuery
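The savings from partitioning and clustering come from letting BigQuery skip data a query does not need. A minimal sketch, assuming a hypothetical `mydataset.events` table (the table, column names, and literal values are illustrative and not from the slides):

```sql
-- Hypothetical table: all names here are illustrative assumptions.
CREATE TABLE mydataset.events
(
  event_time  TIMESTAMP,
  customer_id STRING,
  event_name  STRING,
  payload     STRING
)
PARTITION BY DATE(event_time)        -- one partition per day
CLUSTER BY customer_id, event_name;  -- co-locate rows with similar values

-- Filtering on the partitioning column lets BigQuery scan only the
-- matching daily partitions instead of the whole table.
SELECT event_name, COUNT(*) AS events
FROM mydataset.events
WHERE event_time >= TIMESTAMP '2019-05-01'
  AND event_time <  TIMESTAMP '2019-05-08'
  AND customer_id = 'c-123'
GROUP BY event_name;
```

Since on-demand queries are billed by bytes scanned, pruning partitions this way directly lowers the per-query cost.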
18. BigQuery ML
1. Execute ML initiatives without moving data from BigQuery
2. Iterate on models in SQL in BigQuery to increase development speed
3. Automate common ML tasks and hyperparameter tuning
19. ● Leverage BigQuery’s processing power to build a model with SQL syntax
● Create model from tabular data
● Auto-split of data into training and test
● Auto-tuned learning rate
● Model evaluation charts on BigQuery UI
● Ability to join the recommendation output with your own tables
Behind the scenes - through two lines of SQL
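The "two lines of SQL" can be sketched as one statement to train and one to predict. This is an illustration only: the dataset `mydataset`, the tables `sales_history` and `upcoming_days`, and all column names are hypothetical.

```sql
-- Statement 1: train. BigQuery ML handles the train/test split and
-- the learning rate unless you override them in OPTIONS.
CREATE OR REPLACE MODEL mydataset.sales_forecast
OPTIONS (model_type = 'linear_reg', input_label_cols = ['units_sold']) AS
SELECT units_sold, store_id, day_of_week, promo_active
FROM mydataset.sales_history;

-- Statement 2: predict. The output keeps the input columns and adds
-- predicted_units_sold, so it can be joined with your own tables.
SELECT *
FROM ML.PREDICT(MODEL mydataset.sales_forecast,
                TABLE mydataset.upcoming_days);
```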
20. Audiences: Developer, SQL Analyst, Data Scientist. Use cases and skills:
TensorFlow and Cloud ML Engine
● Build and deploy state-of-the-art custom models
● Requires deep understanding of ML and programming
BigQuery ML
● Build and deploy custom models using SQL
● Requires only basic understanding of ML
AutoML and Cloud ML APIs
● Build and deploy Google-provided models for standard use cases
● Requires almost no ML knowledge
Making ML accessible for all audiences
21. ● Linear regression for forecasting
● Binary or multiclass logistic regression for classification (labels can have up to 50 unique values)
● K-means clustering for data segmentation (unsupervised learning - does not require labels)
● Matrix factorization (Alpha)
● Deep neural networks using TensorFlow (Alpha)
● Import TensorFlow models for prediction in BigQuery (Alpha)
● Feature pre-processing functions (Alpha)
Alphas are whitelist only. Please contact your Google CE/Sales/TAM.
Supported models in BigQuery ML
22. Objectives:
● Create a binary logistic regression model using the CREATE MODEL statement
● Use the ML.EVALUATE function to evaluate the ML model
● Use the ML.PREDICT function to make predictions using the ML model
In this tutorial, you use the sample Google Analytics dataset for BigQuery
to create a model that predicts whether a website visitor will make a transaction.
https://cloud.google.com/bigquery-ml/docs/bigqueryml-web-ui-start
Getting started with BigQuery ML
23. Create a binary logistic regression model
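The slide's query is not in the transcript. A sketch along the lines of the Google tutorial linked on the previous slide, assuming you have already created a dataset named `bqml_tutorial` (that name and the model name `sample_model` are assumptions):

```sql
-- Train a binary logistic regression model on the public
-- Google Analytics sample. label = 1 if the session had a transaction.
CREATE OR REPLACE MODEL `bqml_tutorial.sample_model`
OPTIONS (model_type = 'logistic_reg') AS
SELECT
  IF(totals.transactions IS NULL, 0, 1) AS label,
  IFNULL(device.operatingSystem, "") AS os,
  device.isMobile AS is_mobile,
  IFNULL(geoNetwork.country, "") AS country,
  IFNULL(totals.pageviews, 0) AS pageviews
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE _TABLE_SUFFIX BETWEEN '20160801' AND '20170630';

-- Evaluate on a later, held-out date range.
SELECT *
FROM ML.EVALUATE(MODEL `bqml_tutorial.sample_model`, (
  SELECT
    IF(totals.transactions IS NULL, 0, 1) AS label,
    IFNULL(device.operatingSystem, "") AS os,
    device.isMobile AS is_mobile,
    IFNULL(geoNetwork.country, "") AS country,
    IFNULL(totals.pageviews, 0) AS pageviews
  FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
  WHERE _TABLE_SUFFIX BETWEEN '20170701' AND '20170801'));
```

For a logistic regression model, ML.EVALUATE returns metrics such as precision, recall, accuracy, and ROC AUC.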
26. Predict purchases per user
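The slide's query is not in the transcript. A sketch of a per-user prediction, assuming a logistic regression model named `bqml_tutorial.sample_model` has already been trained on the Google Analytics sample with the features `os`, `is_mobile`, `pageviews`, and `country` (the model name and feature set are assumptions):

```sql
-- Sum the predicted purchase labels per visitor and list the top 10.
SELECT
  fullVisitorId,
  SUM(predicted_label) AS total_predicted_purchases
FROM ML.PREDICT(MODEL `bqml_tutorial.sample_model`, (
  SELECT
    IFNULL(device.operatingSystem, "") AS os,
    device.isMobile AS is_mobile,
    IFNULL(totals.pageviews, 0) AS pageviews,
    IFNULL(geoNetwork.country, "") AS country,
    fullVisitorId  -- not a feature; passed through to the output
  FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
  WHERE _TABLE_SUFFIX BETWEEN '20170701' AND '20170801'))
GROUP BY fullVisitorId
ORDER BY total_predicted_purchases DESC
LIMIT 10;
```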
27. Use cases:
● Customer segmentation
● Data quality
Options and defaults
● Number of clusters: default is log10(num_rows)
● Distance type: Euclidean (default) or Cosine
● Supports all major SQL data types including GIS
K-means clustering
CREATE MODEL yourmodel
OPTIONS (model_type = 'kmeans')
AS SELECT..
ML.PREDICT maps rows to closest clusters
ML.CENTROIDS for cluster centroids
ML.EVALUATE
ML.TRAINING_INFO
ML.FEATURE_INFO
28. Available data:
● Encode yes/no features (e.g. has a microwave, has a kitchen, has a TV, has a bathroom)
● Can apply clustering on the encoded data
K-means clustering: Problem definition
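A sketch of clustering on encoded yes/no features, assuming a hypothetical `mydataset.listings` table whose amenity columns hold the strings 'yes' or 'no' (table, columns, model name, and the choice of 4 clusters are all illustrative assumptions):

```sql
-- Encode each yes/no amenity as 1/0 and cluster on the encoded values.
-- listing_id is deliberately left out so it is not used as a feature.
CREATE OR REPLACE MODEL mydataset.listing_clusters
OPTIONS (model_type = 'kmeans', num_clusters = 4) AS
SELECT
  IF(has_microwave = 'yes', 1, 0) AS has_microwave,
  IF(has_kitchen   = 'yes', 1, 0) AS has_kitchen,
  IF(has_tv        = 'yes', 1, 0) AS has_tv,
  IF(has_bathroom  = 'yes', 1, 0) AS has_bathroom
FROM mydataset.listings;

-- Inspect what each cluster looks like, one row per centroid and feature.
SELECT *
FROM ML.CENTROIDS(MODEL mydataset.listing_clusters)
ORDER BY centroid_id, feature;
```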
29. Premise
We can identify oddities (potential data quality issues) by grouping things together and separating outliers.
K-means clustering: Problem definition
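One way to surface outliers is to rank rows by their distance to the closest centroid. A sketch, assuming a k-means model named `mydataset.listing_clusters` already trained on the four encoded amenity columns of a hypothetical `mydataset.listings` table (all names are assumptions):

```sql
-- Rows far from every centroid are candidates for a data quality check.
SELECT
  listing_id,
  centroid_id,
  -- nearest_centroids_distance is an array of (centroid_id, distance);
  -- take the smallest distance explicitly rather than rely on ordering.
  (SELECT MIN(d.distance)
   FROM UNNEST(nearest_centroids_distance) AS d) AS distance_to_centroid
FROM ML.PREDICT(MODEL mydataset.listing_clusters, (
  SELECT
    listing_id,  -- not a feature; passed through to the output
    IF(has_microwave = 'yes', 1, 0) AS has_microwave,
    IF(has_kitchen   = 'yes', 1, 0) AS has_kitchen,
    IF(has_tv        = 'yes', 1, 0) AS has_tv,
    IF(has_bathroom  = 'yes', 1, 0) AS has_bathroom
  FROM mydataset.listings))
ORDER BY distance_to_centroid DESC
LIMIT 20;
```

The cutoff of 20 rows is arbitrary. In practice you would pick a distance threshold after looking at the distribution.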
30. Use cases:
● Product recommendation
● Marketing campaign target optimization tool
Options and defaults
● Input: User, Item, Rating
● Can use L2 regularization
● Specify training-test split (default random 80-20)
Matrix Factorization (Alpha)
CREATE MODEL yourmodel
OPTIONS (model_type = 'matrix_factorization')
AS SELECT..
ML.PREDICT for user-item ratings
ML.RECOMMEND for full user-item matrix
ML.EVALUATE
ML.WEIGHTS
ML.TRAINING_INFO
ML.FEATURE_INFO
31. Available data:
● User
● Item
● Rating
Problem
● assigning values for previously unknown values (zeros in our case)
Matrix Factorization: Problem definition
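A sketch of training on user, item, rating triples and filling in the unknown values. The table `mydataset.user_item_ratings`, its columns, and the regularization value are hypothetical, and the option names follow the generally available syntax, which may differ from the alpha shown in the talk:

```sql
-- Train a matrix factorization model on (user, item, rating) triples.
CREATE OR REPLACE MODEL mydataset.product_recommender
OPTIONS (
  model_type = 'matrix_factorization',
  user_col   = 'user_id',
  item_col   = 'item_id',
  rating_col = 'rating',
  l2_reg     = 0.2          -- illustrative value, not a recommendation
) AS
SELECT user_id, item_id, rating
FROM mydataset.user_item_ratings;

-- Predicted rating for every user-item pair, including pairs that had
-- no rating in the training data.
SELECT *
FROM ML.RECOMMEND(MODEL mydataset.product_recommender);
```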
32. Segmentation
● Rating can be any metric of views, visits, purchases, edits, saves etc… or combined.
● Try and play with different models based on different rating values.
Recommendation
● assigning values for previously unknown values (zeros in our case)
● based on the recommendation results you can order by / display your results
Marketing campaign
● who to target with an ad campaign? I have budget for only 1000 people.
● use as an optimization tool - which customers are likely to buy?
Summary: Segment and recommend with BigQuery ML
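The campaign question can be sketched as a top-N query over the recommendation output. This assumes a matrix factorization model named `mydataset.product_recommender` trained with `user_id`, `item_id`, and `rating` columns, so that the output column is called `predicted_rating`. The model name and the item id 'sku-42' are hypothetical.

```sql
-- The 1000 users with the highest predicted rating for one product.
SELECT user_id, predicted_rating
FROM ML.RECOMMEND(MODEL mydataset.product_recommender)
WHERE item_id = 'sku-42'
ORDER BY predicted_rating DESC
LIMIT 1000;
```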
33. Automation
● Run the process daily
● Determine hyperparameters
● Surface the results and route them somewhere for inspection and improvement
Testing
● A/B test the impact of data quality on conversion and customer NPS (Net Promoter Score)
Improvements
● Identify and explore outliers
● Repeat, automate
Considerations
34. What is on the roadmap of BigQuery ML?
Cloud Next 19 announcements
35. New on BigQuery UI - Training tab charts
36. New on BigQuery UI - Evaluation charts
37. New on BigQuery UI - Confusion Matrix
Percentage of actual labels that were classified correctly (blue) and incorrectly (grey).
38. Use cases:
● Capture non-linear relationships between features and label for classification and regression
Options and defaults
● Hidden units (optional)
● Hidden layers (optional)
● Drop_out (optional)
● Batch_size (optional)
Deep Neural Networks using TensorFlow (Alpha)
CREATE MODEL yourmodel
OPTIONS (model_type = 'dnn_classifier')
AS SELECT..
CREATE MODEL yourmodel
OPTIONS (model_type = 'dnn_regressor')
AS SELECT..
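A sketch with the optional settings filled in. The table `mydataset.sessions`, its columns, and every option value are illustrative assumptions, and the option names follow the generally available syntax, which may differ from the alpha:

```sql
-- A DNN classifier with two hidden layers of 64 and 32 units.
CREATE OR REPLACE MODEL mydataset.purchase_dnn
OPTIONS (
  model_type       = 'dnn_classifier',
  input_label_cols = ['purchased'],
  hidden_units     = [64, 32],   -- one entry per hidden layer
  dropout          = 0.1,
  batch_size       = 64
) AS
SELECT purchased, country, device_type, pageviews
FROM mydataset.sessions;
```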
39. NCAA Basketball 3 point attempt prediction
40. Use cases:
● Easily add TensorFlow predictions to BigQuery (Airflow or Composer) pipelines
● Build unstructured data models in TensorFlow, predict in BigQuery
Key alpha restrictions
● Model size limit of 250MB
Import TensorFlow models for prediction (Alpha)
CREATE MODEL yourmodel
OPTIONS (model_type = 'tensorflow',
model_path = 'gs://')
ML.PREDICT()
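A sketch of importing a SavedModel and predicting with it. The bucket path, model name, table, and the input column name `input` are all hypothetical. The input column names must match the input names in your model's serving signature.

```sql
-- Import a TensorFlow SavedModel from Cloud Storage (no AS SELECT:
-- the model is already trained).
CREATE OR REPLACE MODEL mydataset.imported_tf_model
OPTIONS (
  model_type = 'tensorflow',
  model_path = 'gs://your-bucket/your-saved-model/*'  -- hypothetical path
);

-- Predict by aliasing a table column to the model's expected input name.
SELECT *
FROM ML.PREDICT(MODEL mydataset.imported_tf_model, (
  SELECT review_text AS input
  FROM mydataset.reviews));
```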
DEMO
Search 'QueryIt Smart' on GitHub to learn more.
42. ● 10 GB of data processed by queries that contain CREATE MODEL statements per month is free.
● Model creation: $250 per TB
● Evaluation, inspection, and prediction: $5 per TB
● Limited to 50 iterations
● You are limited to 1,000 CREATE MODEL queries per day per project
● BigQuery ML supports the same regions as BigQuery (US, EU, ASIA)
Pricing/quotas/limits of BigQuery ML
43. ● ML is hard, and we don’t have a dedicated team.
With BigQuery ML you only need devs who have good SQL skills.
● Extending your current stack with ML is no longer a steep learning curve with BigQuery ML
● Understand how to connect pieces of tabular data to fulfil a business requirement
● Start using the Cloud benefits and BigQuery ML as a complementary system
● Understand BigQuery ML to see that you don’t need a large budget to add ML product improvements
#increase #innovation #work on #fun #stuff
Common mindset blockers
44. ● Democratizes the use of ML by empowering data analysts to build and run models using existing business intelligence tools and spreadsheets
● Generalist team. Models are trained using SQL. There is no need to program an ML solution using Python or Java.
● Increases the innovation and speed of model development by removing the need to export data from the data warehouse.
● A model serves a purpose. Easy to change/recycle.
Benefits of BigQuery ML
45. The possibilities are endless
Marketing
● Predict customer value
● Predict funnel conversion
● Personalize ads, email, webpage content
Retail
● Optimize inventory
● Forecast revenue
● Enable product recommendations
● Optimize staff promotions
Industrial and IoT
● Forecast demand for parking, traffic utilities, personnel
● Prevent equipment downtime
● Predict maintenance needs
Media/gaming
● Personalize content
● Predict game difficulty
● Predict player lifetime value
46. Thank you.
Slides available on: slideshare.net/martonkodok
Reea.net - Integrated web solutions driven by creativity to deliver projects.