SlideShare ist ein Scribd-Unternehmen logo
1 von 19
Downloaden Sie, um offline zu lesen
State of the (J)PMML art
Villu Ruusmann
Openscoring OÜ
Overview
Data Science paradigms
Structured ("ML") Unstructured ("AI")
Data Scalars Arrays, Sequences
Modeling Statistics/CPU,
Human-driven
Brute force/GPU,
"Deep learning"
Audience Global, from SMEs to
mega-corps
SV tech companies
(proprietary datasets)
Standards PMML ONNX, TensorFlow
Structured information
● Relational databases
○ Schemaful
○ High-level, conceptual features
● Human interpretable algorithms
○ Decision tree
○ Logistic regression
○ Scorecard
○ Business rules
● Machine interpretable algorithms
Training perspective
Structured training workflow
1. Feature engineering
1.1. Transformation
1.2. Filtering
2. Modeling
2.1. Choice of algorithm
2.2. Hyperparameter tuning
3. Decision engineering
3.1. Transformation
3.2. Packaging (following consumer API conventions)
OSS ML frameworks
R Scikit-Learn Apache Spark
Scale Desktop Desktop, Server Server, Cluster
Data model Rich Poor Mixed
Main concept Model formula Pipeline Pipeline
Feature eng. Average Very good Good
Modeling Very good Good Average
Decision eng. N/A N/A Very good
OSS ML algorithm providers
H2O.ai XGBoost LightGBM
Scale Desktop, Server, Cluster
Data model Mixed
Algorithms Decision tree
ensembles,
GLM,
Naive Bayes,
Neural Networks,
Stacking
Decision tree
ensembles
Decision tree
ensembles
Workflow implementation
● R/Scikit-Learn/Apache Spark everything
● R/Scikit-Learn feature eng. + H2O.ai/XGBoost/LightGBM
modeling
● Apache Spark feature eng. + H2O.ai/XGBoost modeling +
Apache Spark decision eng.
Mixed workflows bring better results, but are considearbly
harder to deploy.
Deployment perspective
The challenge
Productionalization happens on Java, C# or SQL, which have
very limited interoperability with R and Python:
● Desktop vs. Server/Cluster
● Language-level threading, memory management issues
● Zero overlap between library sets
ML training and deployment are completely separate
concerns and shouldn't be (force-)blended into one.
Possible solutions
● Translation from R/Python application code to
Java/C#/SQL application code
● Translation from R/Python application code to some
standardized intermediate representation:
○ Higher abstraction level
○ Reorganized, refactored and simplified
○ Stable in time
● Containerization
Deployment options on Java/JVM
● R: N/A
● Scikit-Learn: nok/sklearn-porter
● Apache Spark: Java/Scala APIs, which are tightly coupled
to the Apache Spark runtime
● H2O.ai: POJO and MOJO mechanisms
● XGBoost: JNI wrapper,
komiya-atsushi/xgboost-predictor-java
● LightGBM: JNI wrapper(?)
The JPMML ecosystem
● "The API is the product"
○ Conversion APIs
○ Analysis and interpretation APIs
○ Scoring APIs
● Layered, role-based design
○ Data scientists, business analysts
○ Data engineers
○ Java/Scala developers
● Automated annotation, quality control
Converting R to PMML
library("r2pmml")
audit = read.csv("Audit.csv")
audit$Adjusted = as.factor(audit$Adjusted)
audit.glm = glm(Adjusted ~ . - Age + cut(Age, breaks = c(0, 18, 65, 100))
+ Gender:Education + I(Income / (Hours * 52)), data = audit, family =
"binomial")
audit.glm = r2pmml::verify(audit.glm, audit[sample(nrow(audit), 100), ])
r2pmml::r2pmml(audit.glm, "GLMAudit.pmml")
Converting Scikit-Learn to PMML
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline
pipeline = PMMLPipeline([
("mapper", DataFrameMapper([...])),
("classifier", DecisionTreeClassifier())
])
pipeline.fit(audit_X, audit_y)
pipeline.configure(compact = True, flat = True)
pipeline.verify(audit_X.sample(100))
sklearn2pmml(pipeline, "DecisionTreeAudit.pmml")
Scoring PMML
The JPMML-Evaluator®
library:
● Small (<1 MB), nearly self-contained (F)OSS Java library
● Developer API consists a single Java interface and a
couple of utility classes (model loading, optimization)
● Supports all PMML 3.X and 4.X schema versions
● Vendor extensions:
○ Accessing 3rd party Java libraries in PMML data flow
○ MathML prediction reports
Scoring application scenarios
● Synchronous vs. asynchronous
● Low-latency vs. high-latency
● Individual data records vs. batches of data records
● Embedded vs. over-the-network
Q&A
villu@openscoring.io
https://github.com/jpmml
https://github.com/openscoring
https://groups.google.com/forum/#!forum/jpmml

Weitere ähnliche Inhalte

Ähnlich wie State of the (J)PMML art

Ähnlich wie State of the (J)PMML art (20)

Performance Characterization and Optimization of In-Memory Data Analytics on ...
Performance Characterization and Optimization of In-Memory Data Analytics on ...Performance Characterization and Optimization of In-Memory Data Analytics on ...
Performance Characterization and Optimization of In-Memory Data Analytics on ...
 
Rust & Apache Arrow @ RMS
Rust & Apache Arrow @ RMSRust & Apache Arrow @ RMS
Rust & Apache Arrow @ RMS
 
Data Analytics and Machine Learning: From Node to Cluster on ARM64
Data Analytics and Machine Learning: From Node to Cluster on ARM64Data Analytics and Machine Learning: From Node to Cluster on ARM64
Data Analytics and Machine Learning: From Node to Cluster on ARM64
 
BKK16-404B Data Analytics and Machine Learning- from Node to Cluster
BKK16-404B Data Analytics and Machine Learning- from Node to ClusterBKK16-404B Data Analytics and Machine Learning- from Node to Cluster
BKK16-404B Data Analytics and Machine Learning- from Node to Cluster
 
BKK16-408B Data Analytics and Machine Learning From Node to Cluster
BKK16-408B Data Analytics and Machine Learning From Node to ClusterBKK16-408B Data Analytics and Machine Learning From Node to Cluster
BKK16-408B Data Analytics and Machine Learning From Node to Cluster
 
BigDL webinar - Deep Learning Library for Spark
BigDL webinar - Deep Learning Library for SparkBigDL webinar - Deep Learning Library for Spark
BigDL webinar - Deep Learning Library for Spark
 
Joker'14 Java as a fundamental working tool of the Data Scientist
Joker'14 Java as a fundamental working tool of the Data ScientistJoker'14 Java as a fundamental working tool of the Data Scientist
Joker'14 Java as a fundamental working tool of the Data Scientist
 
Ai platform at scale
Ai platform at scaleAi platform at scale
Ai platform at scale
 
The Data Science Process - Do we need it and how to apply?
The Data Science Process - Do we need it and how to apply?The Data Science Process - Do we need it and how to apply?
The Data Science Process - Do we need it and how to apply?
 
Near Data Computing Architectures: Opportunities and Challenges for Apache Spark
Near Data Computing Architectures: Opportunities and Challenges for Apache SparkNear Data Computing Architectures: Opportunities and Challenges for Apache Spark
Near Data Computing Architectures: Opportunities and Challenges for Apache Spark
 
Near Data Computing Architectures for Apache Spark: Challenges and Opportunit...
Near Data Computing Architectures for Apache Spark: Challenges and Opportunit...Near Data Computing Architectures for Apache Spark: Challenges and Opportunit...
Near Data Computing Architectures for Apache Spark: Challenges and Opportunit...
 
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習 Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
 
Mirabilis_Design AMD Versal System-Level IP Library
Mirabilis_Design AMD Versal System-Level IP LibraryMirabilis_Design AMD Versal System-Level IP Library
Mirabilis_Design AMD Versal System-Level IP Library
 
MLlib and Machine Learning on Spark
MLlib and Machine Learning on SparkMLlib and Machine Learning on Spark
MLlib and Machine Learning on Spark
 
A Hands-on Intro to Data Science and R Presentation.ppt
A Hands-on Intro to Data Science and R Presentation.pptA Hands-on Intro to Data Science and R Presentation.ppt
A Hands-on Intro to Data Science and R Presentation.ppt
 
ISC Frankfurt 2015: Good, bad and ugly of accelerators and a complementary path
ISC Frankfurt 2015: Good, bad and ugly of accelerators and a complementary pathISC Frankfurt 2015: Good, bad and ugly of accelerators and a complementary path
ISC Frankfurt 2015: Good, bad and ugly of accelerators and a complementary path
 
System mldl meetup
System mldl meetupSystem mldl meetup
System mldl meetup
 
SCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARK
SCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARKSCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARK
SCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARK
 
Grokking TechTalk #29: Building Realtime Metrics Platform at LinkedIn
Grokking TechTalk #29: Building Realtime Metrics Platform at LinkedInGrokking TechTalk #29: Building Realtime Metrics Platform at LinkedIn
Grokking TechTalk #29: Building Realtime Metrics Platform at LinkedIn
 
Performance and Scale Options for R with Hadoop: A comparison of potential ar...
Performance and Scale Options for R with Hadoop: A comparison of potential ar...Performance and Scale Options for R with Hadoop: A comparison of potential ar...
Performance and Scale Options for R with Hadoop: A comparison of potential ar...
 

Kürzlich hochgeladen

Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Klinik kandungan
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
nirzagarg
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
vexqp
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
gajnagarg
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
HyderabadDolls
 
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
gajnagarg
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Bertram Ludäscher
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
HyderabadDolls
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
ranjankumarbehera14
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
wsppdmt
 

Kürzlich hochgeladen (20)

Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham Ware
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
 
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 

State of the (J)PMML art

  • 1. State of the (J)PMML art Villu Ruusmann Openscoring OÜ
  • 3. Data Science paradigms Structured ("ML") Unstructured ("AI") Data Scalars Arrays, Sequences Modeling Statistics/CPU, Human-driven Brute force/GPU, "Deep learning" Audience Global, from SMEs to mega-corps SV tech companies (proprietary datasets) Standards PMML ONNX, TensorFlow
  • 4. Structured information ● Relational databases ○ Schemaful ○ High-level, conceptual features ● Human interpretable algorithms ○ Decision tree ○ Logistic regression ○ Scorecard ○ Business rules ● Machine interpretable algorithms
  • 6. Structured training workflow 1. Feature engineering 1.1. Transformation 1.2. Filtering 2. Modeling 2.1. Choice of algorithm 2.2. Hyperparameter tuning 3. Decision engineering 3.1. Transformation 3.2. Packaging (following consumer API conventions)
  • 7. OSS ML frameworks R Scikit-Learn Apache Spark Scale Desktop Desktop, Server Server, Cluster Data model Rich Poor Mixed Main concept Model formula Pipeline Pipeline Feature eng. Average Very good Good Modeling Very good Good Average Decision eng. N/A N/A Very good
  • 8. OSS ML algorithm providers H2O.ai XGBoost LightGBM Scale Desktop, Server, Cluster Data model Mixed Algorithms Decision tree ensembles, GLM, Naive Bayes, Neural Networks, Stacking Decision tree ensembles Decision tree ensembles
  • 9. Workflow implementation ● R/Scikit-Learn/Apache Spark everything ● R/Scikit-Learn feature eng. + H2O.ai/XGBoost/LightGBM modeling ● Apache Spark feature eng. + H2O.ai/XGBoost modeling + Apache Spark decision eng. Mixed workflows bring better results, but are considearbly harder to deploy.
  • 11. The challenge Productionalization happens on Java, C# or SQL, which have very limited interoperability with R and Python: ● Desktop vs. Server/Cluster ● Language-level threading, memory management issues ● Zero overlap between library sets ML training and deployment are completely separate concerns and shouldn't be (force-)blended into one.
  • 12. Possible solutions ● Translation from R/Python application code to Java/C#/SQL application code ● Translation from R/Python application code to some standardized intermediate representation: ○ Higher abstraction level ○ Reorganized, refactored and simplified ○ Stable in time ● Containerization
  • 13. Deployment options on Java/JVM ● R: N/A ● Scikit-Learn: nok/sklearn-porter ● Apache Spark: Java/Scala APIs, which are tightly coupled to the Apache Spark runtime ● H2O.ai: POJO and MOJO mechanisms ● XGBoost: JNI wrapper, komiya-atsushi/xgboost-predictor-java ● LightGBM: JNI wrapper(?)
  • 14. The JPMML ecosystem ● "The API is the product" ○ Conversion APIs ○ Analysis and interpretation APIs ○ Scoring APIs ● Layered, role-based design ○ Data scientists, business analysts ○ Data engineers ○ Java/Scala developers ● Automated annotation, quality control
  • 15. Converting R to PMML library("r2pmml") audit = read.csv("Audit.csv") audit$Adjusted = as.factor(audit$Adjusted) audit.glm = glm(Adjusted ~ . - Age + cut(Age, breaks = c(0, 18, 65, 100)) + Gender:Education + I(Income / (Hours * 52)), data = audit, family = "binomial") audit.glm = r2pmml::verify(audit.glm, audit[sample(nrow(audit), 100), ]) r2pmml::r2pmml(audit.glm, "GLMAudit.pmml")
  • 16. Converting Scikit-Learn to PMML from sklearn2pmml import sklearn2pmml from sklearn2pmml.pipeline import PMMLPipeline pipeline = PMMLPipeline([ ("mapper", DataFrameMapper([...])), ("classifier", DecisionTreeClassifier()) ]) pipeline.fit(audit_X, audit_y) pipeline.configure(compact = True, flat = True) pipeline.verify(audit_X.sample(100)) sklearn2pmml(pipeline, "DecisionTreeAudit.pmml")
  • 17. Scoring PMML The JPMML-Evaluator® library: ● Small (<1 MB), nearly self-contained (F)OSS Java library ● Developer API consists a single Java interface and a couple of utility classes (model loading, optimization) ● Supports all PMML 3.X and 4.X schema versions ● Vendor extensions: ○ Accessing 3rd party Java libraries in PMML data flow ○ MathML prediction reports
  • 18. Scoring application scenarios ● Synchronous vs. asynchronous ● Low-latency vs. high-latency ● Individual data records vs. batches of data records ● Embedded vs. over-the-network