Learn how Google Cloud addresses the key challenges of building an agile Data & AI platform. This lecture is relevant regardless of which cloud you use (or will use), because most businesses face the same six challenges:
1. High-quality AI requires a lot of data
2. AI Expertise is in high demand
3. Getting the value of ML requires a modern data platform
4. Activating ML requires surfacing AI into decision UIs
5. Operationalizing ML is hard
6. State-of-the-art changes rapidly
The lecture recording with Q&A is at https://youtu.be/ntBEQdD1IeQ
Building a Data Cloud to enable Analytics & AI-Driven Innovation - Lak Lakshmanan
1. Evening with Lak Lakshmanan
Head, Data Analytics & AI Solutions, GCP
2. Welcome to ServerlessToronto.org
Introduce Yourself:
- Where from?
- Why are you here?
Fill in the survey to win prizes!
Aug 9, 2021: "Building a Data Cloud to enable Analytics and AI-Driven Innovation" starts at 6:10pm…
3. Serverless Evolution (since FaaS started)
Serverless is New Agile & Mindset
#1 We started as Back-end FaaS (Serverless) Developers who enjoyed "gluing" other people's APIs and Managed Services
#2 We build bridges between the Serverless Community ("Dev leg"), and Front-end, Voice-First & UX folks ("UX leg")
#3 We're obsessed with creating business value (meaningful MVPs, Products), focusing on Outcomes, NOT Outputs, and we mesh well with Product Managers
#4 Achieve agility NOT by "sprinting" faster (like in Scrum), but by working smarter (using bigger building blocks and less Ops)
4. Disconnect between IT & Business needs
How to help companies accelerate?
Technology is not the point => We are here to create Value
Adopting a Serverless Mindset allowed us to shift the focus from "pimping
up our cars" (infrastructure/code), towards "driving" (the business) forward.
5. Let's bridge the Business & IT gap by:
1. bringing more Business-focused
topics (like one today) to educate,
2. offering free Second Opinions on
Application/Data Architecture
modernization (to Businesses),
3. offering for-fee Consulting services
(regardless of how short they are),
4. connecting Cloud enthusiasts from
the Community with Employers
Fill in the survey to help us serve you better,
and for a chance to win the Manning raffle:
https://forms.gle/oH2ZTnSgMTH41xsg7
6. Knowledge Sponsor
1. Go to www.manning.com
2. Select *any* e-Book, Video course, or liveProject you want!
3. Add it to your shopping cart (no more than 1 item in the cart)
4. Raffle winners send me the email address they use on the Manning portal,
5. so the publisher can move the item to your Dashboard, as if purchased.
Fill the survey to win!
7. Upcoming ServerlessToronto.org Meetups
1) Get started with Dialogflow & Contact Center AI on Google Cloud - Lee Boonstra, Conversational AI @ Google
2) Dr. Maloy - Empowering Developers to be Healthcare Heroes
3) Snowflake talk… getting closer
4) YOUR "This is my Architecture" style presentations are welcome! Regardless of how big or small your learning & sharing will be ☺
Please rate us on Meetup, tell your peers
We're here to Help YOU help others
9. Building a Data Cloud to
enable Analytics and
AI-Driven Innovation
How Google Cloud addresses key challenges
when building an agile Data & AI Platform
Lak Lakshmanan,
Director, Analytics & AI Solutions
@lak_gcp
https://www.meetup.com/Serverless-Toronto/events/277818918/
10. Proprietary + Confidential
Bring great products to market faster
Unique customer segmentation
Fundamentally change the way you
go to market: identify highest value
customers and provide right products
at the right time
Build better products
Improve end-user experience
through data-driven innovation and
launch of new revenue-generating
opportunities
Adapt in real-time
Run advanced analytics and predict
the future through real-time tests
that allow you to react and respond
immediately to evolving user &
market needs
11. Achieve better economics to scale your business
Operations productivity
Apply machine learning techniques to
identify and rectify systemic
inefficiencies, allowing you to learn
and adapt your operations efficiently
Developer productivity
Discover patterns in code and build
tools that improve developer
productivity, e.g., code
recommendation and automatic bug
fixing
AI/ML infrastructure efficiencies
Reduce your cash burn rate by
utilizing machine learning to run and
execute your models quicker
12. Why is AI/ML so exciting today? Why all the hype?
Artificial Intelligence: Class of problems we can solve when computers think/act like humans
Machine Learning: Scalably solve those problems using data examples (not custom code)
Deep Learning: Even when that data consists of unstructured data like images, speech, video, natural language text, etc.
14. Many recent AI advances can be attributed to
increases in data size and compute power
Deep Learning scaling is predictable, empirically
https://arxiv.org/abs/1712.00409
https://blog.openai.com/ai-and-compute/
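The "predictable scaling" claim refers to generalization error falling roughly as a power law of training set size. A minimal sketch of what fitting such a curve looks like, assuming error ≈ a · size^b with b < 0. The data points are synthetic (generated from a known curve for illustration, not taken from the linked paper), and `fit_power_law` and `predict_error` are hypothetical helper names.

```python
import math

def fit_power_law(sizes, errors):
    """Least-squares fit of log(error) = log(a) + b * log(size)."""
    xs = [math.log(m) for m in sizes]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope in log-log space = exponent
    a = math.exp(mean_y - b * mean_x)  # intercept in log-log space = log(a)
    return a, b

def predict_error(a, b, size):
    """Extrapolate the fitted curve to a new training set size."""
    return a * size ** b

# Synthetic points generated from error = 2.0 * size ** -0.3 (assumption).
sizes = [1_000, 10_000, 100_000, 1_000_000]
errors = [2.0 * m ** -0.3 for m in sizes]

a, b = fit_power_law(sizes, errors)
projected = predict_error(a, b, 10_000_000)
print(f"a={a:.3f}, b={b:.3f}, projected error at 10M examples={projected:.4f}")
```

A straight line in log-log space is what makes the trend usable for planning: the fitted exponent gives an estimate of how much extra data a target error rate would take, under the assumption that the trend continues.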
16. Customizing Google's video models with AutoML Video
VIDEO CLASSIFICATION: Predict labels on entire videos (not segments within the video)
SHOT CLASSIFICATION: Predict shot boundaries inside a video, and predict labels on each of those shots
ACTION RECOGNITION: Use a 1-second sliding window to predict actions, e.g., goal celebration
OBJECT TRACKING: Predict bounding boxes and start/end tracks of objects inside videos, e.g., track a drifting car
Enable powerful content discovery and engaging video experiences
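The action-recognition idea of scoring a 1-second sliding window can be sketched as follows. This is an illustration of the windowing technique only, not AutoML Video's implementation: the 0.5-second stride, the function names, and the stub classifier are all assumptions.

```python
def sliding_windows(duration_s, window_s=1.0, stride_s=0.5):
    """Return (start, end) windows of length window_s that fit inside the clip."""
    windows = []
    start = 0.0
    while start + window_s <= duration_s + 1e-9:
        windows.append((start, start + window_s))
        start += stride_s
    return windows

def detect_actions(duration_s, classify, window_s=1.0, stride_s=0.5):
    """Run a classifier on every window and keep the windows that get a label."""
    detections = []
    for start, end in sliding_windows(duration_s, window_s, stride_s):
        label = classify(start, end)
        if label is not None:
            detections.append((start, end, label))
    return detections

def stub_classifier(start, end):
    """Hypothetical stand-in for a trained model: 'sees' an action from 2.0s to 3.5s."""
    return "goal celebration" if start >= 2.0 and end <= 3.5 else None

detections = detect_actions(4.0, stub_classifier)
print(detections)
```

Overlapping windows (stride shorter than the window) trade extra inference calls for a lower chance of splitting an action across a window boundary.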
17. Data Science
Key to ML on structured data is having more features, so it's essential to break
down data silos and do ML while minimizing data movement
BigQuery
Analytics
Project: Sales Data Mesh
Transactions dataset
Project: Customers
Data Mesh
CRM dataset
Project: Products Data Mesh
Products dataset
Offline tickets Online orders Cust. Details P. Referential
ML tooling BI tooling
Global Logical Semantic Layer
Raw data access
still possible
19. Very few people
can create
net-new ML
models today
10K
DL researchers
2M
ML experts
+23M
Developers
+100M
Business users
20. Democratize predictive analytics for business users using BigQuery ML
1. Execute ML initiatives without moving data from BigQuery
2. Iterate on models in SQL in BigQuery to increase development speed
3. Automate common ML tasks, and hyperparameter tuning
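-- Train a logistic regression model without moving data out of BigQuery
CREATE MODEL my_models.car_accidents
OPTIONS(model_type='logistic_reg', input_label_cols=['bad_accident'])
AS SELECT
  speed, age, ...
FROM
  input_table;
-- Predict with the trained model; the output column is predicted_<label column>
SELECT predicted_bad_accident
FROM
  ML.PREDICT(
    MODEL my_models.car_accidents,
    (SELECT speed, age, ...
     FROM input_table));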
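For readers who want to see what a logistic regression does between "create model" and "predict", here is a minimal pure-Python sketch of the same train-then-predict cycle. It is not BigQuery ML's implementation: the pre-scaled (speed, age) rows, the learning rate, the epoch count, and the function names are all made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_reg(rows, labels, lr=0.1, epochs=2000):
    """Fit weights and a bias with plain stochastic gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(rows, labels):
            p = sigmoid(bias + sum(w * x for w, x in zip(weights, features)))
            err = p - label  # gradient of the log loss w.r.t. the logit
            weights = [w - lr * err * x for w, x in zip(weights, features)]
            bias -= lr * err
    return weights, bias

def predict(model, features):
    """Return the predicted label (1 = bad accident) for one row."""
    weights, bias = model
    p = sigmoid(bias + sum(w * x for w, x in zip(weights, features)))
    return 1 if p >= 0.5 else 0

# Made-up training rows: (speed, age), both already scaled to 0..1.
rows = [(0.2, 0.5), (0.3, 0.4), (0.25, 0.6), (0.9, 0.3), (0.8, 0.2), (0.95, 0.25)]
labels = [0, 0, 0, 1, 1, 1]

model = train_logistic_reg(rows, labels)
print(predict(model, (0.1, 0.5)), predict(model, (1.0, 0.2)))
```

The SQL version hides this loop, along with feature preprocessing and evaluation, behind a single statement, which is the point the slide makes about development speed.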
21. Build a portfolio of AI use cases
Use AI out of the box
Benefit: Maximize the value AI delivers into business workflows
Effort: Low effort, high volume
Requires: High quality AI building blocks and industry solutions
Deploy custom AI
Benefit: Extract value from your data
Effort: Medium effort, customization
Requires: High quality baselines and ease-of-use
Build end-to-end AI
Benefit: Use AI on your data to differentiate your product
Effort: High effort, low volume
Requires: Powerful, easy-to-use platform that allows you to reuse pre-built and/or customizable components
Similar value creation ("reward") from all 3 buckets
Need a unified platform that supports all 3 buckets
22. AI Platform for every level of expertise
Pre-trained APIs
No training data needed,
get started right away
Custom AI with AutoML
Easily create custom models
(A no-code approach)
End-to-end AI with core tools
Help data scientists and ML
engineers build and deploy AI
23. Google Cloud AI
Data Science and Machine Learning
AI Solutions
Horizontal solutions: Contact Center AI, Document AI, Dialogflow, Talent Solution, Recommendation AI
Industry solutions
Prebuilt ML APIs
Sight: Vision, Video Intelligence
Language: Natural Language, Translate
Conversation: Speech-to-Text, Text-to-Speech
AutoML
Sight: Vision, Video
Language: Translate, Natural Language
Structured Data: Tables
AI Platform: Notebooks, Data Labeling, Training, Prediction, Continuous evaluation, Explainability, Pipelines
Buy, Build, Customize
29. Google's data cloud allows customers to unify data
across the entire organization
Break down silos
Increase agility
Innovate faster
Get value from data
Support business transformation
+consistent governance & security
30. The data and ML infrastructure have to be integrated because
real-time, personalized machine learning is where the value is
Speed of Analytics
Systems must be able to ingest, process, and serve
data in real-time, or opportunities are lost
Speed of Action
Machine-learning drives personalized services,
based on the customer's context
31. Google Cloud significantly simplifies big data analytics
Deliver serverless analytics, not infrastructure
Build for growth to any scale
Embed ML and drive an end-to-end lifecycle
Empower analytics across the entire data lifecycle
Enable the best OSS technologies
32. Google Cloud Smart Analytics & AI
Data Science and Machine Learning
AI Solutions
Horizontal solutions: Contact Center AI, Document AI, Dialogflow, Talent Solution, Recommendation AI
Industry solutions
Prebuilt ML APIs
Sight: Vision, Video Intelligence
Language: Natural Language, Translate
Conversation: Speech-to-Text, Text-to-Speech
AutoML
Sight: Vision, Video
Language: Translate, Natural Language
Structured Data: Tables
AI Platform: Notebooks, Data Labeling, Training, Prediction, Continuous evaluation, Explainability, Pipelines
Data Analytics & Management
Ingestion and Processing: Cloud Pub/Sub, Cloud Dataflow, Cloud Dataproc, Data Fusion
Storage and Analytics: Cloud Storage, BigQuery, Cloud Bigtable, Cloud SQL, Data Catalog, Data Studio
Orchestration: Cloud Scheduler, Cloud Composer
Foundation
Frameworks
Compute: Compute Engine, Cloud TPU, Cloud GPU
Instrumentation: Cloud Build, Container Registry
33. Supports the entire Data Science team
Data Engineer: Uses Pub/Sub, Dataflow, and Dataprep to ingest, prepare and transform data
Data Scientist: Uses AI Platform Notebooks and Training services to build and evaluate models
ML Engineer: Uses AI Platform Prediction to serve models, and Kubeflow Pipelines to encapsulate ML workflows for reuse
Developer: Collaborates with data scientists to embed AI through REST APIs into applications
Business Analyst: Discovers solutions from AI Hub and deploys them into production
34. A platform for all users and intents throughout the data lifecycle
Fine-grained access control: Cloud IAM
Metadata management: Data Catalog
Always encrypted: Data at rest and in transit
Redact sensitive data: Cloud DLP
Security Admin: Protecting data
Messaging: Pub/Sub
Data Processing: Dataflow
Data Apps: Looker (LookML)
OSS Engines: Dataproc (Spark, Flink)
Developer: Intelligent apps
DW & DB: BigQuery, Bigtable
Data processing (OSS) pipelines: Dataproc (Spark, Presto, Flink)
Data Processing (Native) pipelines: Dataflow
Orchestration: Composer
Data engineer: Get clean, useful data
Messaging: Pub/Sub or Confluent Kafka
CDW: BigQuery
CDW & Orchestration: BigQuery
Visual data Integration: Data Fusion
ML in SQL: BigQuery ML
Data models, catalog: Looker, Data Catalog
Data analyst: Query and analyze
Ingestion: BigQuery Streaming & DTS
Governed BI: Looker
CDW in a Spreadsheet: Connected Sheets
Natural Language Query: Data QnA
Business User: Insights Everywhere
Data models, catalog: Looker, Data Catalog
CDW: BigQuery
Portable notebooks: AI Platform Notebooks
Simplified ML: BigQuery ML & AutoML
Collaboration: Feature Store, AI Platform Pipelines
Spark: Dataproc / Dataproc Hub
Data scientist: Models that work
CDW: BigQuery
Secure data sharing: BigQuery
35. Dataflow: fully integrated data processor for ML
BigQuery ML
Onboard training data for BQML in batch or streaming mode
Use BQML models for online inference
Export BQML-trained models as SavedModel and do streaming inference
Cloud AI Platform
TensorFlow Extended (TFX)
Integrated with Kubeflow for production ML pipelines
Data processor for CAIP training on CPUs and GPUs
Enables large-scale processing for TFX Transform, Data Validation, Model Analysis
Brings streaming events and generates Examples for training and inference
Robust ingestion services
Advanced Analytics at speed
Actionable Intelligence
37. Use Looker to push data into apps
OEM
White labeled Looker
Customers log directly into Looker
Embed/iFrames
Displaying data visualizations and all
advanced BI capabilities using iframes
38. A lot of ML predictions happen at the edge
On-premises deployments
Anthos on-prem: Create, manage, and upgrade Kubernetes clusters in on-premises environments
IoT Devices
Cloud IoT Core: Managed service to securely connect, manage, and ingest data from global device fleets
Machine Learning at the Edge
Device inference framework: TensorFlow Lite, a deep learning framework for on-device inference
Model deployment to the Edge: Edge Manager for ML, to deploy, manage and run ML models on edge devices with Cloud AI Platform
ML HW accelerator for the Edge: Coral, a platform of hardware components, software tools and precompiled models
40. ML requires operationalizing both data and code
Configuration
Data Collection
Data Verification
Feature Extraction
Process Management Tools
Analysis Tools
Machine Resource Management
Serving Infrastructure
Monitoring
ML Code
Source: Sculley et al.: Hidden Technical Debt in Machine Learning Systems
41. Get models into production faster
Ingestion Analysis Transform Training Tracking Evaluating Deploying
Managed dataset
AutoML Models
Evaluation
Endpoints
and
Batch
predictions
START
END
Custom
Code
Labeling
tasks
Unmanaged dataset
Flexibility in training methods, and
running parallel experiments
Access model evaluation, optimization, and
XAI capabilities built into the platform
Robust backend for deployment
with all relevant MLOps services
42. Scaling ML workflows with AI Platform Pipelines and the
MLOps suite of services
Artifact Store
Cloud Storage
Scalable Inference
AI Platform Prediction
Processing
Cloud Dataflow
Serverless Training
AI Platform Training
Extract Data
Prepare
Data
Train
Model
Validate
Data
Vertex AI Pipelines
Evaluate
Model
Validate
Model
Deploy
Model
Container Registry
Data warehouse
BigQuery
43. Vertex AI GA
Experiment, Train, Deploy
No code / low code workflow: AutoML (Vision, Video, Language, Tables, Forecast)
Custom training workflow: Data Labeling, TensorBoard, Model Builder SDK, Training, Vizier, NAS, Prediction
Model Monitoring, Explainable AI, Feature Store, ML Metadata
Pipelines
Notebooks
Matching Engine
BigQuery ML, Translation
44. Unified machine learning and data science
Increased productivity &
reduced learning curve
Learn a single workflow and
vocabulary for all of our AI
products, regardless of the
layer of abstraction.
Easy experimentation
Train models quickly using
AutoML by building on
Google's proprietary IP and
compare the results easily
against custom-built models
trained on the same dataset
and managed in one place on
the unified platform.
Seamless integration
& flexibility
Easily interchange custom and
AutoML-trained models as
they are now leveraging the
same format and technical
foundation. Take them with
you and deploy them
anywhere.
MLOps
Tooling and automated
workflows for rapid,
continuous delivery and
management of models to
production
46. Use Google's best-in-class algorithms like NAS
Use of Neural Architecture Search (NAS) at Waymo
20-30% lower latency/same quality
8-10% lower error rate/same latency
NAS model in 2 weeks vs months (1 year of GPU time) searching over 10k architectures
"Going from months of engineering time to generate and fine tune an architecture manually to 'automatically generating' neural nets with NAS"
- Waymo ML Expert
47. The AI-Powered Enterprise Data Warehouse
DocAI + EKG + CMS = Unstructured Data ETL
Pipeline stages: Capture (Unstructured Content), Process (Best-in-class AI), Store (Unified Data Lake), Analyze (Data Warehousing), Use (Advanced analytics)
Unstructured Documents
Document AI: Get structured data from unstructured content
{ Type: Check, Amount: $100, To: Allstate, … }
Enterprise Knowledge Graph (EKG): Normalize, validate & link entities across your data
{ Type: Check, Amount: $100, To: Allstate Insurance, Inc, … }
Content Warehouse: Integrated unstructured + structured storage
"10 checks indicate $100 payments to Allstate Insurance, Inc"
BigQuery analysis engine
Unified Analytics: Easily join structured & unstructured data into analysis, models, and processes
"We're seeing new payment patterns to Allstate Insurance, Inc correlating with Jumbo Loan volumes in the North East"
Human in the Loop (HITL): Comprehensive tooling for human review of AI model creation & outputs
48. Google Cloud can help you leverage AI more effectively
Train on less data: Get a jump start by customizing Google's high-quality APIs through AutoML
Develop AI fast: Train ML models in SQL without moving data around. Less code to maintain.
Large Scale ML: Take advantage of AI accelerators and notebooks to dramatically speed up ML development
Integrated with Data Cloud: Build unified AI and data pipelines to support recommendations, streaming, and other use cases
Activate Analytics, ML easily: Incorporate Looker embedded analytics widgets into websites and mobile applications
Operationalize ML: Take advantage of Vertex AI Pipelines, Feature Store, Notebooks, Continuous Evaluation, etc.
Deploy ML on the Edge: Leverage TensorFlow Lite and Coral to deploy AI to iOS, Android, custom hardware
Work with a leader in Data & AI: Google Cloud sets the innovation bar in data and AI. Work with us.