2. Presenters
22018 IBM Corporation
Ahsan Rehman
Data Scientist & Developer
Advocate
IBM Analytics
Masters of Science in Analytics
rehmana@us.ibm.com
IBM Chicago
Svetlana Levitan
Developer Advocate and PMML
Release Manager
Center for Open Data and AI
Technologies (CODAIT)
IBM Digital Business Group
slevitan@us.ibm.com
Adam Massachi
Offering Manager for Watson
Studio
IBM Analytics
Degrees in Math and Philosophy
adam.massachi@ibm.com
3. Special Speaker
32018 IBM Corporation
IBM Chicago
Jing Shyr
Chief Statistician, IBM Fellow - Analytics
Subject matter expert for statistical analysis and data mining,
working knowledge of applying predictive analytics to business
problems
4. Machine Learning
Deep Learning
and AI
are everywhere
42018 IBM Corporation
facial recognition
unlocks your phone
fraud detection
protects your credit
recommendations
help you shop faster
what other users like you buy
what gets bought together
speech recognition
lets you go hands-free
business processes
automate human tasks
e.g. train based on human decisions
autonomous vehicles
detect pedestrians
machine vision
detects cancer early
spam detection
unclogs your Inbox
The future is now
7. Tools & Infrastructure
• Need an environment
that enables quick
experiments and real
outcomes
• Discrete tools present
barriers to productivity
Data Governance
• If the data isn’t
secure, self-service
isn’t a reality
• Challenge
understanding data
lineage and getting to
a system of truth
Skills
• Data Science skills
are in low supply and
high demand
• Nurturing new data
professionals is
challenging
Data Access
• Data resides in silos
& difficult to access
• Unstructured and
external data wasn’t
considered
7
2018 IBM Corporation
Why are enterprises struggling to
capture the value of AI?
8. Watson Studio: Accelerating value from AI for
EnterprisesWatson Studio with Knowledge Catalog
accelerates the machine and deep learning workflows
required to infuse AI into your business to drive innovation.
It provides a suite of tools for data scientists/engineers, app developers, domain experts
to collaboratively connect, work with and analyze data,
and build, train, and deploy models.
AI Requires Teamwork
82018 IBM Corporation
• AI is not magic
• AI is algorithms + data + team
9. Her Job:
Builds AI application that meet the
requirements of the business.
What she does:
• Starts PoCs which includes
gathering content, dialog
building and model training
• Focus is on app building for the
team or company to use. Will
handle ML Ops as needed
Sometimes known as:
Front-end, back-end, full stack,
mobile or low-code developer
Tanya
Domain Expert
Her Job:
To transfer knowledge to Watson for
a successful user experience.
What she does:
• Range of domain knowledge and
uses that to teach Watson and
develop a custom models
• As Tanya gains more experience
she optimizes her knowledge to
teach Watson to design better
end-user experiences.
Sometimes known as:
Subject matter expert, content
strategist.
His Job:
Transform data into knowledge for
solving business problems.
What he does:
•Runs experiments to build custom
models that solve business problems.
•Use techniques such as Machine
Learning or Deep Learning and works
with Tanya to validate success of
trained models.
Data Science and AI Teams
Need to be enabled for Productivity and Collaboration
Sometimes known as:
ML/DL engineer, Modeler, Data Miner
Ed
Data Engineer
His Job:
Architects how data is organized
and ensures operability
What he does:
• Builds data infrastructure and ETL
pipelines. Works with Spark,
Hadoop, and HDFS.
• Works with data scientist to
transform research models into
production quality systems.
Sometimes known as:
Data infrastructure engineer
Mike
Data Scientist
Deb
The Developer
9
10. Watson Studio
Supporting the end-to-end ML workflow
Prepare Data
for Analysis
Analyze Data
Build and Train
ML/DL Models
Deploy Models
on Public or
Private Cloud
Monitor, Analyze
and Manage Model
Deployments
Search and Find
Relevant Data
Connect &
Access Data
on Cloud(s) or
On-Premise
Catalog Data
11. Watson Studio
Tools for supporting the end-to-end AI workflow
Model Lifecycle Management
Machine Learning Runtimes Deep Learning Runtimes
Authoring Tools
Cloud Infrastructure as a Service
• Most popular open source frameworks
• IBM best-in-class frameworks
• Create, collaborate, deploy, and monitor
• Best of breed open source & IBM tools
• Code (R, Python or Scala) and no-code/visual
modeling tools
• Fully managed service
• Container-based resource management
• Elastic pay as you go cpu/gpu power
15. DMG to the rescue!
15
Data Mining Group
dmg.org
Founded in late 1990’s by Professor Robert
Grossman
16. What is PMML?
Predictive Model Markup Language
• An Open Standard for XML Representation
• Developed by DMG
• Over 30 vendors and organizations
• PMML 4.3, 4.4 Release manager: Svetlana Levitan dmg.org/pmml
17. Main Components of PMML
Header
Data Dictionary
Transformation Dictionary
Model(s)
19. Transformations in PMML
• NormContinuous: piece-wise linear transform
• NormDiscrete: map a categorical
field to a set of dummy fields
• Discretize: binning
• MapValues: map one or more categorical fields into another categorical one
• Functions: built-in and user-defined
• Aggregation
20. PMML Models
o Association Rules Model
o Clustering Model
o General Regression
o Naïve Bayes
o Nearest Neighbor Model
o Neural Network
o Regression
o Tree Model
o Mining Model: composition or
ensemble (or both) of models
o Baseline Model
o Bayesian Network
o Gaussian Process
o Ruleset
o Scorecard
o Sequence Model
o Support Vector Machine
o Time Series
27. PMML in Python
JPMML package is created and maintained by Villu Rasmussen.
From https://stackoverflow.com/questions/33221331/export-python-scikit-learn-models-into-pmml
pip install git+https://github.com/jpmml/sklearn2pmml.git
Example of how export a classifier tree to PMML. First grow the tree:
# example tree & viz from http://scikit-learn.org/stable/modules/tree.html
from sklearn import datasets, tree
iris = datasets.load_iris()
clf = tree.DecisionTreeClassifier()
clf = clf.fit(iris.data, iris.target)
SkLearn2PMML conversion takes 2 arguments: an estimator (our clf) and a mapper for preprocessing.
Our mapper is pretty basic, since no transformations.
from sklearn_pandas import DataFrameMapper
default_mapper = DataFrameMapper([(i, None) for i in iris.feature_names + ['Species']])
from sklearn2pmml import sklearn2pmml
sklearn2pmml(estimator=clf, mapper=default_mapper, pmml=“C:/workspace/IrisClassificationTree.pmml")
28. PMML in R
R is a programming language and software environment for statistical
computing and graphics supported by the R Foundation for Statistical Computing.
R package “pmml”
https://cran.r-project.org/package=pmml
Depends on XML package
Supports a number of R models
Maintained by Dmitriy Bolotov and Tridivesh Jena from Software AG
29. Create PMML in R (using R Studio)
>library(XML);
>library(pmml);
> data(iris);
Build and save a linear regression model predicting Sepal length:
> irisLR<-lm(Sepal.Length~.,iris)
>saveXML( pmml(irisLR), "IrisLR.xml" )
Build and save a decision tree (C&RT) model predicting class:
> irisTree <- rpart( Species~., iris )
> saveXML( pmml( irisTree ), "IrisTree.xml" )
30. Deploy a model or a pipeline into Watson Machine Learning
30
In Project view
“+ New Watson Machine Learning model”,
Give a name, select “From File”
32. 32
Other model deployment formats
PFA from DMG – an emerging format, JSON-based, big potential
Pickle in Python – binary serialization of Scikit Learn models
MLeap - open format, not a standard, uses protobuf. Works for Spark ML models
Open Neural Network Exchange (ONNX) from Microsoft and Facebook
• binary (protobuf) format for deep learning models
• describes computation graph (including operators)
• supported by most deep learning frameworks (TF still in progress)
• now adding support for traditional ML
Neural Network Exchange Format (NNEF) by Khronos Group
• Dual format (weights in binary file)
• Allows user-defined functions
• Only neural network models
Docker containers
33. ONNX use pattern
ONNX IR Spec
.onnxFrontend
Models in different
frameworks
Tools
Netron visualizer
Net Drawer visualizer
Checker
Shape Inferencer
Graph Optimizer
Opset Version Converter
Backend
Models in different
frameworks
Training
Inference
Export Import
Run
33
Editor's Notes
AI is typically defined as the ability of a machine to perform cognitive functions we associate with human minds, such as perceiving, reasoning, learning, interacting with the environment, problem solving, and even exercising creativity. Examples of technologies that enable AI to solve business problems are robotics and autonomous vehicles, computer vision, language, virtual agents, and machine learning.
A convergence of algorithmic advances, data proliferation, and tremendous increases in computing power and storage has propelled AI from hype to reality.
Most recent advances in AI have been achieved by applying machine learning to very large data sets. Machine learning algorithms detect patterns and learn how to make predictions and recommendations by processing data and experiences, rather than by receiving explicit programming instruction.
The algorithms also adapt in response to new data and experiences to improve efficacy over time.
dmg.org: PMML is the leading standard for statistical and data mining models and supported by over 20 vendors and organizations. With PMML, it is easy to develop a model on one system using one application and deploy the model on another system using another application, simply by transmitting an XML configuration file.
PMML can describe the entire data processing pipeline, including data preparation, one or several models, post-processing of model predictions.
Villu creates many Mantis issues for PMML, helps to improve it.
Here we build a tree model for classifying the iris flowers based on its dimensions, then export PMML.