With the rapid growth in data and the move towards data commercialisation, enterprises have many aspects to focus on and must prioritize the steps they take. Enterprises face many challenges in truly becoming data-driven organizations and realizing the full potential of their data. These challenges include data availability; the capacity to process, store, and analyze this data; and sharing models and data artefacts across different teams. Most of these challenges can be handled through a platform that is cloud-based, scalable, and offers capabilities for governance, security, reusability, and the like. In this talk, I will discuss how IBM Cloud Pak for Data serves as a framework for implementing your AI strategy, and how it can be used to build different artefacts while meeting the requirements listed above and staying future-ready. We will further illustrate how Cloud Pak for Data speeds up and shortens the route to data commercialisation.
A journey to faster, repeatable data commercialization
1. A Journey to Faster and Repeatable Data Commercialization
Feature Store: Road to Data Commercialization
Dr. Chan Naseeb
Lead Data Scientist, DSE, IBM Data & AI
chan.naseeb@ibm.com
Twitter: @ChanNaseeb
© 2019 IBM Corporation
2. Agenda
• Introduction
• AI Lifecycle
• Cloud Pak for Data (CP4D)
• Feature Store
• Lessons Learned
3. Background
• Data Growth
• Becoming data-driven
• Challenges: data availability; capacity to process, store, and analyze data; sharing models & data artefacts
• Need for a platform (cloud-based, scalable, governance, security, reusability, …)
• CP4D serves as a framework for your models and a basis for implementing your AI strategy
• Lessons Learned
5. IBM Analytics | Data Science Elite
DSE: Operationalizing Data Science for the AI Enterprise
Four elements are required to effectively operationalize data science for the AI Enterprise.
6. Data Science / AI Life Cycle
• Understand use case & feasibility
• Find & provide data
• Explore & understand data
• Prepare and, if needed, label data
• Extract features
• Train models
• Automate training
• Experiments
• Deploy, manage, scale, and monitor in production
• Predictions for apps or processes
• Optimize actions
• Catalog data for reuse in the company
• Share insights
• Build trust in models: fairness, …
• Automate pipeline generation where possible
Personas: Tanya (Domain Expert), Mike (Data Scientist), Ed (Data Engineer), Deb (Developer), Jennifer (AI Ops), John (Business Owner)
7. IBM Solutions Supporting the AI Life Cycle - Flexibility in Choice of Technologies and Tools
IBM Cloud Private for Data (or IBM Cloud Public)
• Build, Deploy, Monitor: IBM tools and 3rd-party IDEs & frameworks (Watson Studio); IBM AI runtime and 3rd-party runtimes such as Azure ML (Watson Machine Learning: manage AI at scale)
• Operate AI: Watson OpenScale (fairness & explainable AI, accuracy, automated anomaly and drift detection, validation & feedback, business KPIs, inputs for continuous evolution)
• Organize & Govern: Knowledge Catalog (data profiling, quality and lineage, data governance)
9. End-to-End Data & AI Platform
IBM Cloud Pak for Data (CP4D) is an extensible platform that spans the analytics space and addresses multiple personas, providing them with resources of their choice and helping them perform their tasks better in a collaborative environment.
• Collect, connect, and access data
• Govern, search, and find data
• Understand and prepare data for analysis
• Build descriptive, predictive, and prescriptive models
• Model management and deployment
• Create AI-powered applications
10. An Integrated Ecosystem of Data & AI Services…
IBM Cloud Pak for Data (Collect, Organize, Analyze and Infuse)
• Core services: data virtualization; data warehouse (Db2); governance catalog & discovery services; data integration services; data visualization & dashboards; data science: model design & deployment
• Add-ons: Db2 AESE; Db2 Eventstore; Infosphere Streams; Watson Knowledge Catalog Pro*; Infosphere Regulatory Accelerator; Infosphere DataStage Edition*; Data Science Premium; Cognos Analytics; AI OpenScale; Watson Assistant; Customer Care; Discovery Extension; Watson Speech to Text; Watson Compare & Comply; other Watson APIs*
11. … on a Modularized Platform…
Premium add-ons: Data Science Premium, Db2 Eventstore, Db2 AESE, MongoDB, AI OpenScale
Core Platform Services + Data Science (includes services from Watson Studio):
• User Management
• Roles & Privileges
• Add-on service mgmt.
• Projects
• R Scripts
• Python Scripts
• Jupyter with Python 3.6 & Anaconda 5.2
• Project Deployments
• Models
• Jobs & Scheduling
Unified Governance & Integration (includes services from Information Server):
• Discover Data
• Transform Data
• Enterprise Data Catalog
• Business Glossary
• Policies & Rules
• Enterprise Search
CP Foundation:
• Logging
• Monitoring
• Metering
• Persistent volume / Storage
• Identity Access Mgmt.
• Docker registry / Helm chart mgmt.
• Kubernetes
• Security
Additional modules:
• Analytics Dashboards (powered by Cognos CDE)
• Db2 Warehouse (includes services from Db2 Warehouse)
• Data Virtualization (includes services from new Db2 technology)
• Additional Analytics Environments & Frameworks
• Optional Watson Studio content*
• … more coming
Install options: Min Install and Recommended Install
* Refer to appendix for details
12. … to deliver Data and AI Applications
• Continuous Delivery of Insights: search for data; acquire data (self-service) through virtualized data access to Hadoop, DW, and NoSQL sources; refine data; build models; deploy insights (models, dashboards, etc.); monitor; retrain
• Continuous Delivery of Applications: new requirements & engagement; development; test; app deployment
• Foundations: multi-cloud governance; microservices & APIs
13. The AI Ladder
A prescriptive approach to accelerating the journey to AI
• COLLECT - Make data simple and accessible
• ORGANIZE - Create a trusted analytics foundation
• ANALYZE - Scale insights with AI everywhere
• INFUSE - Operationalize AI with trust and transparency
• MODERNIZE your data estate for an AI and multicloud world
Data of every type, regardless of where it lives
14. The AI Ladder in Cloud Pak for Data (CP4D): Collect, Organize, Analyze, Trust, Infuse
15. Watson OpenScale helps validate and monitor AI models, deployed anywhere, to help comply with regulations and mitigate business risk
• Align model performance with business outcomes: correlate model metrics and business KPIs to measure business impact; actionable metrics and alerts
• Ensure that models are resilient to changing situations: detect drift in data and anomalies in model behavior; specific inputs and triggers to the model lifecycle
• Prove regulatory compliance and safeguards: detect and mitigate model biases; audit and explain model decisions
Relevance: foundational to all AI implementations; required in regulated industries and use cases (FSS, HR, etc.) in the short term, others in the longer term*; required to meet transformational goals
* E.g., fair lending practices in finance vs. GDPR across all industries
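The slide does not describe how Watson OpenScale detects drift internally. As a generic illustration of what "detect drift in data" means, the sketch below computes the Population Stability Index (PSI), a commonly used drift statistic, for one numeric feature by comparing its training-time distribution with its production distribution. PSI is a swapped-in technique, and the function name, equal-width binning, and `eps` smoothing are assumptions chosen for illustration.

```python
import math
from typing import Sequence


def population_stability_index(
    expected: Sequence[float],
    actual: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """Compare a feature's training distribution (expected) with its
    production distribution (actual) using equal-width bins."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature

    def shares(values: Sequence[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            # Clamp production values outside the training range into the edge bins.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # eps keeps empty bins from producing log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    exp_shares, act_shares = shares(expected), shares(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(exp_shares, act_shares))


training = [float(i % 100) for i in range(1000)]
print(population_stability_index(training, training))  # 0.0 for identical data
print(population_stability_index(training, [v + 30 for v in training]))  # well above 0.25
```

Identical distributions give a PSI of 0. A frequently quoted rule of thumb treats values above about 0.25 as a significant shift, although thresholds should be tuned per use case and feature.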
16. IBM Watson & Cloud Platform
Bias mitigation
• Pre-processing
• Adapt classifier (in-processing)
• Post-processing
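As a concrete instance of the pre-processing category, the sketch below implements reweighing: each training example gets the weight P(group) · P(label) / P(group, label), so that after weighting the label is statistically independent of the protected group. The slide does not say which pre-processing algorithm IBM's tooling applies; this choice and the function name are assumptions for illustration. IBM's open-source AI Fairness 360 toolkit packages a version of this technique.

```python
from collections import Counter
from typing import Hashable, Sequence


def reweighing_weights(
    groups: Sequence[Hashable], labels: Sequence[Hashable]
) -> list[float]:
    """Pre-processing bias mitigation: weight each example so that group
    membership and label become independent in the weighted data."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    # w(g, y) = P(g) * P(y) / P(g, y); under-represented (g, y) pairs get weight > 1.
    return [
        (group_counts[g] / n) * (label_counts[y] / n) / (joint_counts[(g, y)] / n)
        for g, y in zip(groups, labels)
    ]


groups = ["a", "a", "a", "a", "b", "b"]
labels = [1, 1, 1, 0, 1, 0]
weights = reweighing_weights(groups, labels)
print(weights)
```

The resulting weights can be passed, for example, as `sample_weight` to the `fit` method of many scikit-learn estimators, so the classifier itself stays unchanged.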
17. Scenarios
• Small Enterprise
• Large Enterprise
• Different Depts/Teams
• Different Use Cases
  • Credit Loan
  • Term Deposit
  • Segmentation
  • Churn
  • Lead
  • …
Feature store can help
18. Feature Store for Data Commercialization
Commercialize more use cases faster, while ensuring trust & security
• Share skills, knowledge, and “features”*
• Prototype & deploy rapidly
• Infuse AI into the business
* “Features” are reusable data, models, scripts, and other Data & AI assets
19. Possible Use Cases
Manage your Data anywhere
• Manage all your enterprise data regardless of where it lives
• Gain control & leverage your data from connected devices
Operationalize Data Science & AI
• Build, deploy, manage & govern models and data
Shift to Next-Gen Workloads
• Shift to cloud native
• Provision and scale in minutes
• Build once, deploy anywhere
Smart Governance
• Build in automation & collaboration to increase productivity
• Govern to enable self-service analytics
20. Feature Store Use Case…
• Data Ingestion: fetch (un)structured/raw data from different types of sources, e.g., CSV files, Kafka streams
• Data Cleansing: process raw data into structured tables to reduce the amount of noise
• Feature Engineering: extract features representing the meaning of the data and its context, e.g., topic of interest: business, market, etc.
• Building Models: train multiple algorithms to identify the “relevant” and “non-relevant” items, and get the results depending on the nature of the problem
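The four stages can be sketched end to end in plain Python. Everything here is hypothetical: the CSV columns (`text`, `label`), the keyword vocabulary in `TOPIC_TERMS`, and the nearest-centroid model are stand-ins chosen to keep the example self-contained. A real pipeline would read from the actual sources (for example Kafka streams) and compare several algorithms rather than one.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical topic vocabulary; a real project would curate or learn this.
TOPIC_TERMS = {
    "business": {"revenue", "profit", "client"},
    "market": {"market", "stock", "price"},
}


def ingest(csv_text: str) -> list[dict[str, str]]:
    """Data ingestion: read raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def cleanse(rows: list[dict[str, str]]) -> list[dict[str, str]]:
    """Data cleansing: drop empty texts, lowercase, replace non-letters with spaces."""
    clean = []
    for row in rows:
        text = (row.get("text") or "").strip()
        if text:
            clean.append({"text": re.sub(r"[^a-z ]", " ", text.lower()),
                          "label": row.get("label", "")})
    return clean


def extract_features(text: str) -> dict[str, int]:
    """Feature engineering: count occurrences of each topic's terms."""
    words = Counter(text.split())
    return {topic: sum(words[t] for t in terms) for topic, terms in TOPIC_TERMS.items()}


def train_centroids(rows: list[dict[str, str]]) -> dict[str, dict[str, float]]:
    """Model building: average feature vector (centroid) per label."""
    sums: dict[str, dict[str, float]] = {}
    counts: Counter = Counter()
    for row in rows:
        acc = sums.setdefault(row["label"], dict.fromkeys(TOPIC_TERMS, 0.0))
        for topic, value in extract_features(row["text"]).items():
            acc[topic] += value
        counts[row["label"]] += 1
    return {label: {t: v / counts[label] for t, v in acc.items()}
            for label, acc in sums.items()}


def predict(centroids: dict[str, dict[str, float]], text: str) -> str:
    """Assign the label whose centroid is nearest (squared Euclidean distance)."""
    feats = extract_features(cleanse([{"text": text}])[0]["text"])

    def distance(label: str) -> float:
        return sum((feats[t] - centroids[label][t]) ** 2 for t in TOPIC_TERMS)

    return min(centroids, key=distance)


raw = (
    "text,label\n"
    "Revenue and profit grew for every client,relevant\n"
    "Stock market price rally,relevant\n"
    ",relevant\n"
    "The cat sat on the mat,non_relevant\n"
    "Lunch was tasty,non_relevant\n"
)
model = train_centroids(cleanse(ingest(raw)))
print(predict(model, "Market price and profit outlook"))  # relevant
print(predict(model, "A quiet afternoon"))  # non_relevant
```

Keeping each stage a separate function mirrors the slide's split, so the cleansed tables and extracted features can be stored and reused independently of any one model.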
21. Feature Store Use Case
Develop a central store for curated, controlled machine learning features, where a feature can be a model, script, or data set.
Business User
• To identify a new data science opportunity
• To collaborate more closely with Data Scientists in order to support them with new use cases
Data Engineer
• To maintain features that Data Scientists can search for and discover
Data Scientist
• To develop, score, and evaluate several AI models
• To publish and share the best models across the organization
Data Steward
• To approve and control the features to be published, in order to maintain a high standard and secure control over features
• To make sure the artifacts remain organized, secure, and populated with high-quality assets
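To make these roles concrete, here is a minimal in-memory sketch of a curated feature store in which submitted features stay hidden until a data steward approves them. The `Feature` and `FeatureStore` classes, their methods, and the user names are hypothetical and are not the API of the Feature Store built on Cloud Pak for Data. A real implementation would persist assets (for example in a catalog) and enforce roles through the platform's access control rather than trusting a `steward` string.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    DRAFT = "draft"
    APPROVED = "approved"


@dataclass
class Feature:
    """A reusable asset: a model, script, or data set."""
    name: str
    kind: str                      # e.g. "model", "script", "data set"
    owner: str
    tags: set[str] = field(default_factory=set)
    status: Status = Status.DRAFT
    history: list[str] = field(default_factory=list)  # simple audit trail


class FeatureStore:
    """Hypothetical in-memory feature store with steward approval."""

    def __init__(self) -> None:
        self._features: dict[str, Feature] = {}

    def submit(self, feature: Feature) -> None:
        # Data engineers and data scientists submit candidate features.
        if feature.name in self._features:
            raise ValueError(f"feature {feature.name!r} already exists")
        feature.history.append(f"submitted by {feature.owner}")
        self._features[feature.name] = feature

    def approve(self, name: str, steward: str) -> None:
        # Approval by a data steward makes the feature discoverable.
        feature = self._features[name]
        feature.status = Status.APPROVED
        feature.history.append(f"approved by {steward}")

    def search(self, tag: str) -> list[Feature]:
        # Consumers only see approved features carrying the requested tag.
        return [f for f in self._features.values()
                if f.status is Status.APPROVED and tag in f.tags]


store = FeatureStore()
store.submit(Feature("customer_churn_features", "data set", owner="ed", tags={"churn"}))
store.submit(Feature("churn_model_v1", "model", owner="mike", tags={"churn"}))
store.approve("customer_churn_features", steward="dana")
print([f.name for f in store.search("churn")])  # ['customer_churn_features']
```

The `history` list gives each feature the audit trail mentioned among the benefits, and the draft/approved split is what lets a steward keep published content to a high standard.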
22. The Feature Store is built on Cloud Pak for Data
Benefits of the Feature Store
Of course, quicker turnaround time is a bonus to the business. But in addition…
Business users will also get a view of the Feature Store
• They will be able to see the library of use cases
• They can view project results, or request that the results be served to them
• The Feature Store can inspire new use cases within the business areas
Data Scientists benefit from the Feature Store
• A central location for community and collaboration
• Reuse of data assets created by the community
• An audit trail of work done and a guarantee of reliability
• Faster access to data and compliant data assets
23. The Feature Store is built on Cloud Pak for Data
(Diagram layers: Data, Analysis, Deploy)
24. Some Lessons Learned
• Collaboration: reusability and scaling
• AI & ML modeling
• Framework: CP4D
• Visualization & storytelling
• Business need
• Global expertise of IBM
25. IBM Data and AI Capabilities - Modular with open APIs
Watson Studio (Cloud-Local-Desktop*)
• Create project, add members, add/connect data
• Work with data and models, as a team or individually:
  • Connect/prepare data
  • Create/run notebooks
  • Analyze/visualize data
  • Create dashboards
  • Train AI/ML/DL models
• Intuitive online collaboration over shared assets and project data storage
• Can integrate with Git
Watson Knowledge Catalog (Cloud-Local)
• Share enterprise assets, governed at scale: data, connections, notebooks, models, dashboards, soon: project ZIPs, …
• Reuse assets from the catalog; publish assets to the catalog
Gallery
• Try assets from the community: samples to get started with, including notebooks and data sets; soon: models, dashboards, and project ZIPs
WML/WML-A (Cloud-Local)
• Create and manage production deployments of models, scripts, notebooks, …
• Provide APIs used by apps, processes, and devices
Customer's Git and build process
• Export, e.g., pull a model from Git or a WML export and feed it into building an image to deploy on the customer's environment or a device
Watson OpenScale (Cloud-Local)
• Connect for explainability for LOB and fairness
Runs on IBM Cloud, other clouds, and on-prem.
Option: Engage Data Science Elite
*Watson Studio Desktop is a new single-user desktop tool that currently has a subset of Watson Studio Cloud/Local function, with direction to connect to WML, Catalog, and Community in the future
26. Training ML Models → Operationalization in Prod. Deployments
Create & train models in Watson Studio projects (ML Flows, ML Builder, ML Notebooks, DO Model Builder, DO Notebooks), producing ML models, ML pipelines, DO models, and scripts. Operationalize them in WML-managed production deployments, which apps on IBM Cloud or other apps invoke by calling the model through a REST API.
• Data Scientists and Subject Matter Experts: create & train models; make them available to be consumed by IT
• IT / Application Owners: get models and related assets; deploy and run them in production
• Data Scientists can create, train, and validate ML models in Projects
• Use ML Builder for guided creation and training of models using common patterns and algorithms
• Use Notebooks or Flows to train models for more advanced use cases with more flexibility
• The operations team can deploy models to production
• Use the REST API to invoke your models for online scoring / predictions
• Batch scoring of data, e.g., stored in relational database tables
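The online-scoring and batch-scoring bullets can be illustrated with Python's standard `urllib.request`. The endpoint URL, the bearer-token header, and the `input_data`/`fields`/`values` payload shape are assumptions modeled loosely on Watson Machine Learning scoring calls; check the WML API reference for your Cloud Pak for Data version before relying on them. The `chunked` helper is a simple, hypothetical client-side way to score table rows in batches; the platform's own batch-scoring features are a separate mechanism not shown here. The code only builds requests, so it runs without network access.

```python
import json
import urllib.request
from typing import Any, Iterator, Sequence


def build_scoring_request(
    endpoint_url: str,
    token: str,
    fields: Sequence[str],
    rows: Sequence[Sequence[Any]],
) -> urllib.request.Request:
    """Build (but do not send) a POST request for online scoring.

    ASSUMPTION: payload shape and auth header are illustrative only.
    """
    payload = {"input_data": [{"fields": list(fields),
                               "values": [list(r) for r in rows]}]}
    return urllib.request.Request(
        endpoint_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
        method="POST",
    )


def chunked(rows: Sequence[Sequence[Any]], size: int) -> Iterator[Sequence[Sequence[Any]]]:
    """Split table rows into fixed-size batches, one scoring request per batch."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


# Placeholder endpoint, token, and rows (hypothetical values).
rows = [[42, 1200.0], [35, 80.5], [51, 0.0]]
requests_to_send = [
    build_scoring_request("https://example.com/score", "<token>", ["age", "balance"], batch)
    for batch in chunked(rows, size=2)
]
# Each request could then be sent with urllib.request.urlopen(request).
```

Separating request construction from sending keeps the payload logic testable and lets the same function serve both single online predictions and chunked batch runs.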
27. Case Study
IBM Data Science Elite helps Standard Bank South Africa accelerate and commercialize data science and AI use cases across the bank.
EXPECTED BENEFIT
Increase and expand deployment of AI at Standard Bank, to enhance the value of data science, inject the results into workflows for business users and clients, and map out a way forward for AI at the bank.
UNIQUE CHALLENGE
Data scientists and engineers spend much of their time on grunt work. Workflows are often repeated, and models are remade and retrained for every new use case.
“It was an immense pleasure to partner with IBM Global DSE team. The 12 week journey allowed us to identify gaps in our processes, re-imagine our delivery and commercialization process as well as leverage some of the best in class technologies and practices to deliver our solutions in a rapidly changing environment.”
John Mukomberanwa
Head: Digital Insights
Corporate and Investment Banking
Standard Bank South Africa
Reimagine & Scale Data Science | IBM Data Science Elite & Services