2019 HPCC Systems® Community Day
Challenge Yourself – Challenge the Status Quo
Advancements in HPCC Systems Machine Learning
Lili Xu
Software Engineer III
HPCC Systems
LexisNexis Risk
Roger Dev
Sr. Architect
Machine Learning Library
Overview
• Theme: Expand the ML library to handle multimedia and unsupervised learning
• Extended set of model evaluation metrics
• Text Vectors – Machine Learning for textual data
• Generalized Neural Networks (GNN) – ECL Deep Learning for Image and Video and more
• Unsupervised Clustering
  • K-Means – Centroid based clustering
  • DBSCAN – Density based clustering
HPCC Systems Machine Learning Library
[Diagram: library bundles grouped by category]
• Bundles: ML_Core, PBblas, LinearRegression, LogisticRegression, SVM, GLM, LearningTrees, TextVectors, DBSCAN, K-Means, GNN
• Categories: Base, Supervised Learning, Unsupervised Learning, Deep Learning
Supervised Learning vs. Unsupervised Learning
[Figure: Supervised Learning, labeled "Previous Knowledge". Photo credit [1]]
[Figure: Unsupervised Learning. Photo credit [1]]
Extended Evaluation Metrics
Evaluation Metrics
• Extensions to ML_Core to better evaluate the ML Models or compare alternative Models.
• Done by our intern: A. Suryanarayanan (“Surya”)
• Enhanced ML_Core Accuracy module:
  • Regression Accuracy
    • Standard Error
    • ANOVA (Analysis of Variance)
    • T-Statistic
    • P-value
    • Confidence Interval
    • R-Squared
    • Root Mean Squared Error
    • Akaike Information Criterion (AIC)
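To make two of the regression metrics listed above concrete, here is a minimal Python sketch of R-Squared and Root Mean Squared Error. It uses the textbook definitions and is not the ML_Core implementation; the function name `regression_metrics` and the sample values are assumptions made for illustration.

```python
import math

def regression_metrics(actual, predicted):
    """Return (R-squared, RMSE) for paired actual and predicted values."""
    n = len(actual)
    mean_actual = sum(actual) / n
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: variation of the actual values around their mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r_squared = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    return r_squared, rmse

# Hypothetical data: three perfect predictions and one that is off by 1.
r2, rmse = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
print(r2, rmse)
```

R-Squared compares the model's error with that of always predicting the mean, while RMSE reports the typical error in the units of the target.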
Evaluation Metrics (cont’d)
• Enhanced ML_Core Accuracy module (cont’d)
  • Classification Accuracy
    • Raw Accuracy
    • Power-of-Discrimination (PoD)
    • Extended Power-of-Discrimination (PoDE)
    • Confusion Matrix
    • Precision, Recall, False-Positive-Rate
    • Balanced F-Score – Combines Precision and Recall into a score for each class
    • Hamming Loss
    • Area Under Curve (AUC)
  • Clustering Accuracy
    • Silhouette Coefficient
    • Adjusted Rand Index
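The relationship between the confusion matrix, Precision, Recall, and the balanced F-Score can be sketched in a few lines of Python. This is an illustration of the standard definitions, not the ML_Core code; the helper names and the label data are hypothetical.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count how often each (actual class, predicted class) pair occurs."""
    return Counter(zip(actual, predicted))

def precision_recall_f1(actual, predicted, cls):
    """Per-class precision, recall, and balanced F-score (F1)."""
    cm = confusion_matrix(actual, predicted)
    tp = cm[(cls, cls)]
    fp = sum(c for (a, p), c in cm.items() if p == cls and a != cls)
    fn = sum(c for (a, p), c in cm.items() if a == cls and p != cls)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Hypothetical binary labels, invented for illustration.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 0, 1]

raw_accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
precision, recall, f1 = precision_recall_f1(actual, predicted, cls=1)
print(raw_accuracy, precision, recall, f1)
```

Because the scores are computed per class, the same function can be called once for every class label in a multi-class problem.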
Other
• New Feature Selection Module
• Chi Squared Feature Selection Test
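As a rough illustration of what a chi-squared feature selection test measures, the sketch below computes the Pearson chi-squared statistic for a contingency table of feature value against class label. It assumes every row and column total is nonzero; the function name and the tables are hypothetical and do not come from ML_Core.FeatureSelection.

```python
def chi_squared(table):
    """Pearson chi-squared statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if feature and class were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Rows: feature value (present / absent). Columns: class (A / B).
informative = chi_squared([[30, 10], [10, 30]])
uninformative = chi_squared([[20, 20], [20, 20]])
print(informative, uninformative)
```

A larger statistic means the feature's distribution differs more between classes, which is why such a test can rank features by how much they tell us about the label.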
For more details
• Module Documentation:
  ML_Core.Accuracy
  ML_Core.FeatureSelection
• Research Publication:
  Design and implementation of Machine Learning Evaluation Metrics on HPCC Systems
  A. Suryanarayanan, Arjuna Chala, Lili Xu, Shobha G, Jyothi Shetty, Roger Dev
Text Vectors – Machine Learning for Free-form Text
Introduction to Text Vectors
• Fully unsupervised learning – Give it a Corpus (large body of text) and it will learn on its own.
• Convert free-form text into numeric vectors to allow mathematical treatment of text.
  • Word Vectors
  • Sentence Vectors
• Vector: An ordered list of numbers – A coordinate in N-dimensional space
  • (11.3, -2.5) – A two dimensional vector
  • (-.0138, .5247, .9831) – A three dimensional vector
  • (.1, -.3, -.1, … , .5) – An N dimensional vector
• Text Vectors typically have between 20 and 1000 dimensions.
• Vectors that are close in space are also close in meaning.
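One common way to measure whether two vectors are "close in space" is cosine similarity. The Python sketch below assumes that measure (the slides do not say which distance TextVectors uses) and works on three invented 3-dimensional vectors; real text vectors would be learned from a corpus and have many more dimensions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical word vectors, invented for illustration only.
vectors = {
    "dog": (0.8, 0.6, 0.1),
    "cat": (0.7, 0.7, 0.1),
    "piston": (0.1, 0.0, 0.9),
}

dog_cat = cosine_similarity(vectors["dog"], vectors["cat"])
dog_piston = cosine_similarity(vectors["dog"], vectors["piston"])
print(dog_cat, dog_piston)
```

With these made-up numbers, "dog" scores much closer to "cat" than to "piston", which is the behavior a trained model aims for.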
Text Vectorization – The theory
"You shall know a word by the company it keeps."
- Linguist John Rupert Firth, 1957
Or more rigorously:
"The meaning of a word is closely associated with the distribution of the words that surround it in coherent text."
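The distributional idea in the quote can be illustrated by counting which words appear near each target word. The Python sketch below is a toy illustration only, not the TextVectors training algorithm; the corpus, the window size, and the function name are assumptions.

```python
from collections import Counter, defaultdict

def context_counts(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions."""
    contexts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            lo = max(0, i - window)
            hi = min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    contexts[word][words[j]] += 1
    return contexts

# Tiny hypothetical corpus, invented for illustration.
corpus = [
    "the dog chased the ball",
    "the cat chased the mouse",
    "the piston drives the crankshaft",
]
counts = context_counts(corpus)
print(counts["dog"], counts["cat"], counts["piston"])
```

In this toy corpus "dog" and "cat" keep the same company while "piston" does not, so a method built on such distributions has a reason to place the first two near each other.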
Optimizing Text Vectors
[Figure: example words Dog, Cat, and Piston]
Applications
• Analysis of Free-form Text
• Turn text into features for any ML algorithm
• Classification of text (e.g. Positive, Negative, Neutral)
• Find closest sentence to a new sentence
• Free-form Search
• Translation
• Textual Mining
• Many more undiscovered uses
For more details
• Theory and Tutorial:
Text Vectors – Machine Learning for Textual Data
Link: https://hpccsystems.com/blog/textvectors
Generalized Neural Networks (GNN)
Introduction to GNN
• Flexible ECL interface to Keras / TensorFlow.
  • Google’s TensorFlow is the most widely used Deep Learning framework.
  • Keras is a high-level interface to TensorFlow (and other frameworks) and is the most widely used DL interface. It is included as a standard component of TensorFlow.
• Parallelized training using Batch Synchronous Network Optimization.
• Provides full access to Keras Sequential Model capabilities.
• Can handle nearly any style of Neural Network:
  • Classical (Densely connected)
  • Convolutional (commonly used for image processing)
  • Recurrent (used for video / time-series)
  • Auto-encoders (unsupervised training of weight vectors)
• ECL Tensor module allows N dimensional datasets
Tensors
• Think of Tensors as N dimensional arrays or matrices.
  • A single number is a 0 dimensional Tensor
  • A vector is a 1 dimensional Tensor
  • A matrix is a 2 dimensional Tensor
• For traditional ML, 2 dimensional works well.
  • Example: nObservations X nFeatures
• For multi-media ML (e.g. images, video, time-series) more dimensions are required. E.g.
  • For color images: nObservations X pixel-width X pixel-height X 3 (i.e. Red/Green/Blue)
  • For video: nObservations X pixel-width X pixel-height X 3 X time-steps
• The GNN Tensor module provides efficient storage and distribution for Tensor-based data of any dimension.
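The dimension counts described above can be checked with plain nested lists in Python. This sketch only illustrates the notion of tensor shape; it is unrelated to how the GNN Tensor module stores or distributes data, and the helper names and sizes are hypothetical.

```python
def tensor_shape(data):
    """Return the shape of a regularly nested list; a bare number has shape ()."""
    shape = []
    while isinstance(data, list):
        shape.append(len(data))
        if not data:
            break
        data = data[0]
    return tuple(shape)

def zeros(shape):
    """Build a nested list of zeros with the given shape."""
    if not shape:
        return 0.0
    return [zeros(shape[1:]) for _ in range(shape[0])]

scalar = 3.5                      # 0 dimensional
vector = [1.0, 2.0, 3.0]          # 1 dimensional
matrix = zeros((4, 2))            # nObservations x nFeatures
images = zeros((2, 8, 6, 3))      # nObservations x width x height x RGB

for t in (scalar, vector, matrix, images):
    print(len(tensor_shape(t)), tensor_shape(t))
```

The number of entries in the shape is the tensor's dimensionality, so the color-image example is a 4 dimensional tensor and video adds a fifth axis for time-steps.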
GNNI
• The GNNI module provides an easy-to-use interface for defining, training, and utilizing Neural Networks.
• It handles all of the parallelization and distribution of data transparently.
• Under the hood, a separate Keras / TensorFlow network is trained on each node, and the resulting weights are combined periodically.
• Neural Networks and their training mechanism are defined using the same Python syntax as native Keras.
  • In native (Python) Keras, you create a Sequential Model and add layers one at a time.
  • In ECL, you create a list of layers (as Python text) and call DefineModel(…) with that list.
• Inputs to training and prediction are provided as Tensors. Tensors are also used to get and set weights.
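The slides say that each node trains its own network and that the weights are combined periodically, without stating how. One simple possibility is element-wise averaging, sketched below in Python. The averaging rule, the function name, and the weight values are all assumptions for illustration and may differ from what GNN actually does.

```python
def average_weights(per_node_weights):
    """Element-wise mean of the flat weight lists reported by each node."""
    n_nodes = len(per_node_weights)
    return [sum(ws) / n_nodes for ws in zip(*per_node_weights)]

# Hypothetical weights after one round of local training on three nodes.
node_weights = [
    [0.10, 0.50, -0.20],   # node 1
    [0.30, 0.40, -0.40],   # node 2
    [0.20, 0.60, -0.30],   # node 3
]

combined = average_weights(node_weights)
print(combined)
# In a synchronous scheme, every node would continue training from `combined`.
```

Combining periodically rather than after every sample keeps the communication between nodes small compared with the local training work.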
Future Directions
• Non-sequential Models (Complex hybrid deep learning)
• Support for textual data
• Generative Adversarial Networks (GANs) and their derivatives
Applications
• Machine Learning for Images, Video, or Time-series
• Scoring (i.e. Regression)
• Classification
• Multivariate Optimization
• Auto-encoders
• Vectorization
• Many others TBD
For more details
• The bundle will be released soon.
• Look for the blog article announcing the release on hpccsystems.com >> Community >> Blog
Clustering Methods
Clustering Methods in HPCC Systems: KMeans & DBSCAN
• Unsupervised Machine Learning (ML) algorithms
• Automatically find the clusters/groups in the data without previous knowledge
• Highly scalable and parallelized for Big Data machine learning challenges
Clustering Methods of the HPCC Systems Machine Learning Library
Applications
• Image segmentation
• Claim
• Customer segmentation
• Clustering gene expressions (Eisen et al., PNAS 1998)
KMeans vs. DBSCAN
KMeans
• Most popular clustering method
• Highly scalable and parallelized
• Parametric: K, Tolerance
• Sensitive to initialization
• Spherical clusters
• Sensitive to outliers
• Curse of dimensionality
[Figure: KMeans example with K = 3, Tolerance = 0.0 [3]]
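To show how K and Tolerance enter the algorithm, here is a minimal single-machine Python sketch of Lloyd's K-Means iteration. It is not the parallel HPCC Systems implementation; the function signature, the stopping rule (largest centroid movement not exceeding the tolerance), and the sample points are assumptions made for illustration. K is implied by the number of initial centroids.

```python
import math

def kmeans(samples, initial_centroids, max_iterations=100, tolerance=0.0):
    """Lloyd's algorithm: returns (centroids, labels)."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = []
    for _ in range(max_iterations):
        # Assignment step: attach each sample to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(s, centroids[k]))
            for s in samples
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [s for s, lab in zip(samples, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep the centroid of an empty cluster
        # Stop once no centroid moved farther than the tolerance.
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tolerance:
            break
    return centroids, labels

# Hypothetical 2-D samples forming three groups, with K = 3 initial centroids.
samples = [(0, 0), (0, 1), (10, 10), (10, 11), (20, 0), (21, 0)]
centroids, labels = kmeans(samples, [(0, 0), (10, 10), (20, 0)], tolerance=0.0)
print(centroids, labels)
```

The result depends on the initial centroids, which is the sensitivity to initialization noted in the list above.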
KMeans vs. DBSCAN
DBSCAN
• Density-based clustering method
• Highly scalable and parallelized
• Parametric: epsilon, minPoints
• Sensitive to initialization
• Arbitrarily shaped clusters
• Outlier detection
• Sensitive to density variance
• Curse of dimensionality
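The roles of epsilon and minPoints can be seen in a compact single-machine Python sketch of the classic DBSCAN procedure. It is not the parallel HPCC Systems implementation; the function name, the convention that a point counts as its own neighbor, the use of -1 for outliers, and the sample points are assumptions for illustration.

```python
import math

def dbscan(points, epsilon, min_points):
    """Return one label per point: a cluster index (0, 1, ...) or -1 for noise."""
    def neighbors(i):
        # All points within epsilon of point i, including point i itself.
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= epsilon]

    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_points:
            labels[i] = NOISE            # may later become a border point
            continue
        cluster += 1                     # point i is a core point: new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_points:
                queue.extend(j_neighbors)  # j is also a core point: keep expanding
    return labels

# Hypothetical 2-D points: two dense groups and one isolated outlier.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10),
          (50, 50)]
labels = dbscan(points, epsilon=1.5, min_points=3)
print(labels)
```

Unlike K-Means, the number of clusters is not given up front; it follows from the density parameters, and points in sparse regions are reported as outliers.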
KMeans vs. DBSCAN
Comparison criteria for KMeans and DBSCAN:
• Cluster shape
• Cluster size
• Model parameters
• Number of clusters (fixed vs. variable)
• Outlier detection
• Curse of dimensionality
Application Domains
• Recommendation System
• Clustering Demographic/Geospatial Data
Easy to use
Step 1 Import K-Means bundle
IMPORT KMeans AS KM;
Step 2 Train K-Means Model
Model := KM.KMeans(Max_iterations, Tolerance).Fit(Samples, InitialCentroids);
Step 3 Predict the cluster index of the new samples (Optional)
Labels := KM.KMeans().Predict(Model, NewSamples);
[Slide callouts: Required, Optional]
Easy to use – Cont.
Step 4 Visualization (Optional)
ECL Cloud IDE: KMeans Visualization
For more details
• Tutorial:
Automatically Cluster your Data with Massively Scalable K-Means
Link: https://hpccsystems.com/blog/kmeans
• Research Publication:
Massively Scalable Parallel KMeans on the HPCC Systems Platform
Lili Xu, Amy Apon, Flavio Villanustre, Roger Dev, Arjuna Chala
Reference
ECL-ML module: https://hpccsystems.com/ml
Download: https://hpccsystems.com/download/free-modules/machine-learning-library
Source code: https://github.com/hpcc-systems
Forum: http://hpccsystems.com/bb/viewforum.php?f=23
Contact us:
Lili Xu
Software Engineer III
HPCC Systems
Lili.xu@lexisnexisrisk.com
Roger Dev
Sr. Architect
Machine Learning Library
roger.dev@lexisnexisrisk.com
View this presentation on YouTube:
https://www.youtube.com/watch?v=Z1A3nOuhv3A&list=PL-8MJMUpp8IKH5-d56az56t52YccleX5h&index=11&t=43s (12:32)

Weitere ähnliche Inhalte

Was ist angesagt?

"Semantic Segmentation for Scene Understanding: Algorithms and Implementation...
"Semantic Segmentation for Scene Understanding: Algorithms and Implementation..."Semantic Segmentation for Scene Understanding: Algorithms and Implementation...
"Semantic Segmentation for Scene Understanding: Algorithms and Implementation...
Edge AI and Vision Alliance
 

Was ist angesagt? (20)

Convolutional Neural Network and RNN for OCR problem.
Convolutional Neural Network and RNN for OCR problem.Convolutional Neural Network and RNN for OCR problem.
Convolutional Neural Network and RNN for OCR problem.
 
[PR12] Inception and Xception - Jaejun Yoo
[PR12] Inception and Xception - Jaejun Yoo[PR12] Inception and Xception - Jaejun Yoo
[PR12] Inception and Xception - Jaejun Yoo
 
Convolutional neural network from VGG to DenseNet
Convolutional neural network from VGG to DenseNetConvolutional neural network from VGG to DenseNet
Convolutional neural network from VGG to DenseNet
 
"Trade-offs in Implementing Deep Neural Networks on FPGAs," a Presentation fr...
"Trade-offs in Implementing Deep Neural Networks on FPGAs," a Presentation fr..."Trade-offs in Implementing Deep Neural Networks on FPGAs," a Presentation fr...
"Trade-offs in Implementing Deep Neural Networks on FPGAs," a Presentation fr...
 
LeNet to ResNet
LeNet to ResNetLeNet to ResNet
LeNet to ResNet
 
Introduction to Convolutional Neural Networks
Introduction to Convolutional Neural NetworksIntroduction to Convolutional Neural Networks
Introduction to Convolutional Neural Networks
 
PR-217: EfficientDet: Scalable and Efficient Object Detection
PR-217: EfficientDet: Scalable and Efficient Object DetectionPR-217: EfficientDet: Scalable and Efficient Object Detection
PR-217: EfficientDet: Scalable and Efficient Object Detection
 
PR-120: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture De...
PR-120: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture De...PR-120: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture De...
PR-120: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture De...
 
TensorFlow Tutorial Part2
TensorFlow Tutorial Part2TensorFlow Tutorial Part2
TensorFlow Tutorial Part2
 
Spine net learning scale permuted backbone for recognition and localization
Spine net learning scale permuted backbone for recognition and localizationSpine net learning scale permuted backbone for recognition and localization
Spine net learning scale permuted backbone for recognition and localization
 
Finding the best solution for Image Processing
Finding the best solution for Image ProcessingFinding the best solution for Image Processing
Finding the best solution for Image Processing
 
Lecture 11 neural network principles
Lecture 11 neural network principlesLecture 11 neural network principles
Lecture 11 neural network principles
 
Efficient de cvpr_2020_paper
Efficient de cvpr_2020_paperEfficient de cvpr_2020_paper
Efficient de cvpr_2020_paper
 
Image Segmentation Using Deep Learning : A survey
Image Segmentation Using Deep Learning : A surveyImage Segmentation Using Deep Learning : A survey
Image Segmentation Using Deep Learning : A survey
 
Big data 2.0, deep learning and financial Usecases
Big data 2.0, deep learning and financial UsecasesBig data 2.0, deep learning and financial Usecases
Big data 2.0, deep learning and financial Usecases
 
In datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unitIn datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unit
 
Efficient Neural Architecture Search via Parameter Sharing
Efficient Neural Architecture Search via Parameter SharingEfficient Neural Architecture Search via Parameter Sharing
Efficient Neural Architecture Search via Parameter Sharing
 
"Semantic Segmentation for Scene Understanding: Algorithms and Implementation...
"Semantic Segmentation for Scene Understanding: Algorithms and Implementation..."Semantic Segmentation for Scene Understanding: Algorithms and Implementation...
"Semantic Segmentation for Scene Understanding: Algorithms and Implementation...
 
201907 AutoML and Neural Architecture Search
201907 AutoML and Neural Architecture Search201907 AutoML and Neural Architecture Search
201907 AutoML and Neural Architecture Search
 
Robustness of compressed CNNs
Robustness of compressed CNNsRobustness of compressed CNNs
Robustness of compressed CNNs
 

Ähnlich wie Advancements in HPCC Systems Machine Learning

ExaLearn Overview - ECP Co-Design Center for Machine Learning
ExaLearn Overview - ECP Co-Design Center for Machine LearningExaLearn Overview - ECP Co-Design Center for Machine Learning
ExaLearn Overview - ECP Co-Design Center for Machine Learning
inside-BigData.com
 
Tianqi Chen, PhD Student, University of Washington, at MLconf Seattle 2017
Tianqi Chen, PhD Student, University of Washington, at MLconf Seattle 2017Tianqi Chen, PhD Student, University of Washington, at MLconf Seattle 2017
Tianqi Chen, PhD Student, University of Washington, at MLconf Seattle 2017
MLconf
 

Ähnlich wie Advancements in HPCC Systems Machine Learning (20)

Improving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC SystemsImproving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
 
Parallel Distributed Deep Learning on HPCC Systems
Parallel Distributed Deep Learning on HPCC SystemsParallel Distributed Deep Learning on HPCC Systems
Parallel Distributed Deep Learning on HPCC Systems
 
A Deeper Dive into Apache MXNet - March 2017 AWS Online Tech Talks
A Deeper Dive into Apache MXNet - March 2017 AWS Online Tech TalksA Deeper Dive into Apache MXNet - March 2017 AWS Online Tech Talks
A Deeper Dive into Apache MXNet - March 2017 AWS Online Tech Talks
 
A Deeper Dive into Apache MXNet - March 2017 AWS Online Tech Talks
A Deeper Dive into Apache MXNet - March 2017 AWS Online Tech TalksA Deeper Dive into Apache MXNet - March 2017 AWS Online Tech Talks
A Deeper Dive into Apache MXNet - March 2017 AWS Online Tech Talks
 
CONVOLUTIONAL NEURAL NETWORKS: The workhorse of image and video
CONVOLUTIONAL NEURAL NETWORKS: The workhorse of image and videoCONVOLUTIONAL NEURAL NETWORKS: The workhorse of image and video
CONVOLUTIONAL NEURAL NETWORKS: The workhorse of image and video
 
2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup
 
From Hours to Minutes: The Journey of Optimizing Mask-RCNN and BERT Using MXNet
From Hours to Minutes: The Journey of Optimizing Mask-RCNN and BERT Using MXNetFrom Hours to Minutes: The Journey of Optimizing Mask-RCNN and BERT Using MXNet
From Hours to Minutes: The Journey of Optimizing Mask-RCNN and BERT Using MXNet
 
Studies of HPCC Systems from Machine Learning Perspectives
Studies of HPCC Systems from Machine Learning PerspectivesStudies of HPCC Systems from Machine Learning Perspectives
Studies of HPCC Systems from Machine Learning Perspectives
 
Deep Domain
Deep DomainDeep Domain
Deep Domain
 
ExaLearn Overview - ECP Co-Design Center for Machine Learning
ExaLearn Overview - ECP Co-Design Center for Machine LearningExaLearn Overview - ECP Co-Design Center for Machine Learning
ExaLearn Overview - ECP Co-Design Center for Machine Learning
 
Assisting User’s Transition to Titan’s Accelerated Architecture
Assisting User’s Transition to Titan’s Accelerated ArchitectureAssisting User’s Transition to Titan’s Accelerated Architecture
Assisting User’s Transition to Titan’s Accelerated Architecture
 
Tianqi Chen, PhD Student, University of Washington, at MLconf Seattle 2017
Tianqi Chen, PhD Student, University of Washington, at MLconf Seattle 2017Tianqi Chen, PhD Student, University of Washington, at MLconf Seattle 2017
Tianqi Chen, PhD Student, University of Washington, at MLconf Seattle 2017
 
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習 Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
 
TensorFlow.pptx
TensorFlow.pptxTensorFlow.pptx
TensorFlow.pptx
 
Cognitive Toolkit - Deep Learning framework from Microsoft
Cognitive Toolkit - Deep Learning framework from MicrosoftCognitive Toolkit - Deep Learning framework from Microsoft
Cognitive Toolkit - Deep Learning framework from Microsoft
 
Democratizing machine learning on kubernetes
Democratizing machine learning on kubernetesDemocratizing machine learning on kubernetes
Democratizing machine learning on kubernetes
 
WINSEM2023-24_BCSE429L_TH_CH2023240501528_Reference_Material_III_S3-Homoheter...
WINSEM2023-24_BCSE429L_TH_CH2023240501528_Reference_Material_III_S3-Homoheter...WINSEM2023-24_BCSE429L_TH_CH2023240501528_Reference_Material_III_S3-Homoheter...
WINSEM2023-24_BCSE429L_TH_CH2023240501528_Reference_Material_III_S3-Homoheter...
 
Computer Design Concepts for Machine Learning
Computer Design Concepts for Machine LearningComputer Design Concepts for Machine Learning
Computer Design Concepts for Machine Learning
 
Mx net image segmentation to predict and diagnose the cardiac diseases karp...
Mx net image segmentation to predict and diagnose the cardiac diseases   karp...Mx net image segmentation to predict and diagnose the cardiac diseases   karp...
Mx net image segmentation to predict and diagnose the cardiac diseases karp...
 
A Neural Network that Understands Handwriting
A Neural Network that Understands HandwritingA Neural Network that Understands Handwriting
A Neural Network that Understands Handwriting
 

Mehr von HPCC Systems

Leveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC SystemsLeveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC Systems
HPCC Systems
 

Mehr von HPCC Systems (20)

Natural Language to SQL Query conversion using Machine Learning Techniques on...
Natural Language to SQL Query conversion using Machine Learning Techniques on...Natural Language to SQL Query conversion using Machine Learning Techniques on...
Natural Language to SQL Query conversion using Machine Learning Techniques on...
 
Towards Trustable AI for Complex Systems
Towards Trustable AI for Complex SystemsTowards Trustable AI for Complex Systems
Towards Trustable AI for Complex Systems
 
Welcome
WelcomeWelcome
Welcome
 
Closing / Adjourn
Closing / Adjourn Closing / Adjourn
Closing / Adjourn
 
Community Website: Virtual Ribbon Cutting
Community Website: Virtual Ribbon CuttingCommunity Website: Virtual Ribbon Cutting
Community Website: Virtual Ribbon Cutting
 
Path to 8.0
Path to 8.0 Path to 8.0
Path to 8.0
 
Release Cycle Changes
Release Cycle ChangesRelease Cycle Changes
Release Cycle Changes
 
Geohashing with Uber’s H3 Geospatial Index
Geohashing with Uber’s H3 Geospatial Index Geohashing with Uber’s H3 Geospatial Index
Geohashing with Uber’s H3 Geospatial Index
 
Docker Support
Docker Support Docker Support
Docker Support
 
Expanding HPCC Systems Deep Neural Network Capabilities
Expanding HPCC Systems Deep Neural Network CapabilitiesExpanding HPCC Systems Deep Neural Network Capabilities
Expanding HPCC Systems Deep Neural Network Capabilities
 
Leveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC SystemsLeveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC Systems
 
DataPatterns - Profiling in ECL Watch
DataPatterns - Profiling in ECL Watch DataPatterns - Profiling in ECL Watch
DataPatterns - Profiling in ECL Watch
 
Leveraging the Spark-HPCC Ecosystem
Leveraging the Spark-HPCC Ecosystem Leveraging the Spark-HPCC Ecosystem
Leveraging the Spark-HPCC Ecosystem
 
Work Unit Analysis Tool
Work Unit Analysis ToolWork Unit Analysis Tool
Work Unit Analysis Tool
 
Community Award Ceremony
Community Award Ceremony Community Award Ceremony
Community Award Ceremony
 
Dapper Tool - A Bundle to Make your ECL Neater
Dapper Tool - A Bundle to Make your ECL NeaterDapper Tool - A Bundle to Make your ECL Neater
Dapper Tool - A Bundle to Make your ECL Neater
 
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...
 
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...
 
Using High Dimensional Representation of Words (CBOW) to Find Domain Based Co...
Using High Dimensional Representation of Words (CBOW) to Find Domain Based Co...Using High Dimensional Representation of Words (CBOW) to Find Domain Based Co...
Using High Dimensional Representation of Words (CBOW) to Find Domain Based Co...
 
Leveraging HPCC Systems as Part of an Information Security, Privacy, and Comp...
Leveraging HPCC Systems as Part of an Information Security, Privacy, and Comp...Leveraging HPCC Systems as Part of an Information Security, Privacy, and Comp...
Leveraging HPCC Systems as Part of an Information Security, Privacy, and Comp...
 

Kürzlich hochgeladen

Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
shivangimorya083
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
AroojKhan71
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
MarinCaroMartnezBerg
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
amitlee9823
 

Kürzlich hochgeladen (20)

VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 

Advancements in HPCC Systems Machine Learning

  • 1. 2019 HPCC Systems® Community Day: Challenge Yourself – Challenge the Status Quo
      • Lili Xu, Software Engineer III, HPCC Systems, LexisNexis Risk
      • Roger Dev, Sr. Architect, Machine Learning Library
  • 3. Overview
      • Theme: Expand the ML library to handle multimedia and unsupervised learning
      • Extended set of model evaluation metrics
      • Text Vectors – Machine Learning for textual data
      • Generalized Neural Networks (GNN) – ECL Deep Learning for image, video, and more
      • Unsupervised Clustering
          • K-Means – Centroid-based clustering
          • DBSCAN – Density-based clustering
  • 4. HPCC Systems Machine Learning Library (diagram)
      • Groupings shown: Base, Supervised Learning, Unsupervised Learning, Deep Learning
      • Bundles shown: ML_Core, PBblas, LinearRegression, LogisticRegression, SVM, GLM, LearningTrees, TextVectors, DBSCAN, K-Means, GNN
  • 5. Supervised Learning vs. Unsupervised Learning: Supervised Learning (figure label: Previous Knowledge). Photo credit [1]
  • 6. Supervised Learning vs. Unsupervised Learning: Unsupervised Learning. Photo credit [1]
  • 7. Extended Evaluation Metrics
  • 8. Evaluation Metrics
      • Extensions to ML_Core to better evaluate ML models or compare alternative models.
      • Done by our intern: A. Suryanarayanan (“Surya”)
      • Enhanced ML_Core Accuracy module:
          • Regression Accuracy
              • Standard Error
              • ANOVA (Analysis of Variance)
              • T-Statistic
              • P-value
              • Confidence Interval
              • R-Squared
              • Root Mean Squared Error
              • Akaike Information Criterion (AIC)
  • 9. Evaluation Metrics (cont’d)
      • Enhanced ML_Core Accuracy module (cont’d):
          • Classification Accuracy
              • Raw Accuracy
              • Power-of-Discrimination (PoD)
              • Extended Power-of-Discrimination (PoDE)
              • Confusion Matrix
              • Precision, Recall, False-Positive-Rate
              • Balanced F-Score – Combines Precision and Recall into a score for each class
              • Hamming Loss
              • Area Under Curve (AUC)
          • Clustering Accuracy
              • Silhouette Coefficient
              • Adjusted Rand Index
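To make the classification metrics on slide 9 concrete, here is a minimal Python sketch of per-class precision, recall, false-positive rate, and balanced F-score computed from confusion-matrix counts. It is an illustration only and not the ML_Core.Accuracy API; the function names and the sample labels are assumptions made for this example.

```python
from collections import Counter


def per_class_metrics(actual, predicted):
    """Return {label: (precision, recall, false_positive_rate, f1)}.

    Illustrative helper, not part of ML_Core; names are hypothetical.
    """
    # Confusion matrix stored as counts of (actual, predicted) pairs.
    counts = Counter(zip(actual, predicted))
    labels = sorted(set(actual) | set(predicted))
    total = len(actual)
    metrics = {}
    for label in labels:
        tp = counts[(label, label)]
        fp = sum(counts[(a, label)] for a in labels if a != label)
        fn = sum(counts[(label, p)] for p in labels if p != label)
        tn = total - tp - fp - fn
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        fpr = fp / (fp + tn) if fp + tn else 0.0
        # Balanced F-score: harmonic mean of precision and recall.
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[label] = (precision, recall, fpr, f1)
    return metrics


# Made-up labels for a two-class problem.
actual = [0, 0, 1, 1, 1, 0]
predicted = [0, 1, 1, 1, 0, 0]
metrics = per_class_metrics(actual, predicted)
```

Each class gets its own score because the counts are taken one class against the rest, which is how the balanced F-score can be reported per class.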
  • 10. Other
      • New Feature Selection Module
          • Chi Squared Feature Selection Test
  • 11. For more details
      • Module Documentation: ML_Core.Accuracy, ML_Core.FeatureSelection
      • Research Publication: "Design and implementation of Machine Learning Evaluation Metrics on HPCC Systems", A. Suryanarayanan, Arjuna Chala, Lili Xu, Shobha G, Jyothi Shetty, Roger Dev
  • 12. Text Vectors – Machine Learning for Free-form Text
  • 13. Introduction to Text Vectors
      • Fully unsupervised learning – give it a corpus (a large body of text) and it will learn on its own.
      • Converts free-form text into numeric vectors to allow mathematical treatment of text.
          • Word Vectors
          • Sentence Vectors
      • Vector: an ordered list of numbers – a coordinate in N-dimensional space
          • (11.3, -2.5) – a two-dimensional vector
          • (-.0138, .5247, .9831) – a three-dimensional vector
          • (.1, -.3, -.1, … , .5) – an N-dimensional vector
      • Text Vectors typically have between 20 and 1000 dimensions.
      • Vectors that are close in space are also close in meaning.
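The idea that vectors close in space are close in meaning can be sketched with cosine similarity, one common way to measure closeness between vectors. The Python below is a toy illustration, not the TextVectors bundle: the three-dimensional vectors are invented values (real text vectors are learned from a corpus), and the words are borrowed from the deck's Dog, Cat, Piston illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical, hand-picked vectors; not the output of any trained model.
word_vectors = {
    "dog": (0.9, 0.8, 0.1),
    "cat": (0.8, 0.9, 0.2),
    "piston": (0.1, 0.2, 0.9),
}


def closest_word(word, vectors):
    """Return the other word whose vector is most similar to `word`."""
    return max(
        (w for w in vectors if w != word),
        key=lambda w: cosine_similarity(vectors[word], vectors[w]),
    )


nearest_to_dog = closest_word("dog", word_vectors)
```

With these made-up values, "dog" lands nearer to "cat" than to "piston", which is the behavior a trained model is expected to produce for words that appear in similar contexts.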
  • 14. Text Vectorization – The theory
      • "You shall know a word by the company it keeps." – Linguist John Rupert Firth, 1957
      • Or more rigorously: "The meaning of a word is closely associated with the distribution of the words that surround it in coherent text."
  • 15. Optimizing Text Vectors (figure; example words: Dog, Cat, Piston)
  • 16. Applications
      • Analysis of free-form text
      • Turn text into features for any ML algorithm
      • Classification of text (e.g. Positive, Negative, Neutral)
      • Find closest sentence to a new sentence
      • Free-form search
      • Translation
      • Textual mining
      • Many more undiscovered uses
  • 17. For more details
      • Theory and Tutorial: Text Vectors – Machine Learning for Textual Data
      • Link: https://hpccsystems.com/blog/textvectors
  • 18. Generalized Neural Networks (GNN)
  • 19. Introduction to GNN
      • Flexible ECL interface to Keras / TensorFlow.
          • Google’s TensorFlow is the most widely used Deep Learning framework.
          • Keras is a high-level interface to TensorFlow (and other frameworks) and is the most widely used DL interface. It is included as a standard component of TensorFlow.
      • Parallelized training using Batch Synchronous Network Optimization.
      • Provides full access to Keras Sequential Model capabilities.
      • Can handle nearly any style of Neural Network:
          • Classical (densely connected)
          • Convolutional (commonly used for image processing)
          • Recurrent (used for video / time-series)
          • Auto-encoders (unsupervised training of weight vectors)
      • ECL Tensor module allows N-dimensional datasets.
  • 20. Tensors
      • Think of Tensors as N-dimensional arrays or matrices.
          • A single number is a 0-dimensional Tensor
          • A vector is a 1-dimensional Tensor
          • A matrix is a 2-dimensional Tensor
      • For traditional ML, 2 dimensions work well.
          • Example: nObservations X nFeatures
      • For multimedia ML (e.g. images, video, time-series) more dimensions are required. E.g.
          • For color images: nObservations X pixel-width X pixel-height X 3 (i.e. Red/Green/Blue)
          • For video: nObservations X pixel-width X pixel-height X 3 X time-steps
      • The GNN Tensor module provides efficient storage and distribution for Tensor-based data of any dimension.
  • 21. GNNI
      • The GNNI module provides an easy-to-use interface for defining, training, and utilizing Neural Networks.
      • It handles all of the parallelization and distribution of data transparently.
      • Under the hood, a separate Keras / TensorFlow network is trained on each node, and the resulting weights are combined periodically.
      • Neural Networks and their training mechanism are defined using the same Python syntax as native Keras.
          • In native (Python) Keras, you create a Sequential Model and add layers one at a time.
          • In ECL, you create a list of layers (as Python text) and call DefineModel(…) with that list.
      • Inputs to training and prediction are supplied as Tensors. Tensors are also used to get and set weights.
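Slide 21 says that each node trains its own network and that the resulting weights are combined periodically. The sketch below shows one possible shape for such a cycle in plain Python. It assumes the combination is an element-wise average and that each node takes a single gradient step between synchronizations; the slides do not say how GNN actually combines weights, and every name and number here is hypothetical.

```python
def local_step(weights, gradient, learning_rate=0.1):
    """One gradient-descent step performed independently on a node."""
    return [w - learning_rate * g for w, g in zip(weights, gradient)]


def average_weights(node_weights):
    """Element-wise mean of the weight vectors reported by each node.

    Averaging is an assumption made for this sketch, not GNN's documented rule.
    """
    n_nodes = len(node_weights)
    return [sum(ws) / n_nodes for ws in zip(*node_weights)]


# Every node starts the cycle from the same shared weights.
shared = [0.0, 0.0]

# Made-up gradients, standing in for what each node computes on its own data.
node_gradients = [[1.0, -2.0], [3.0, 0.0], [2.0, 2.0]]

# Each node updates locally, then the results are combined and redistributed.
node_weights = [local_step(shared, g) for g in node_gradients]
shared = average_weights(node_weights)
```

Repeating this cycle keeps the per-node copies from drifting apart while letting each node work only on its local portion of the data.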
  • 22. Future Directions
      • Non-sequential Models (complex hybrid deep learning)
      • Support for textual data
      • Generative Adversarial Networks (GANs) and their derivatives
  • 23. Applications
      • Machine Learning for images, video, or time-series
          • Scoring (i.e. Regression)
          • Classification
      • Multivariate Optimization
      • Auto-encoders
      • Vectorization
      • Many others TBD
  • 24. For more details
      • The bundle will be released soon.
      • Look for the blog article announcing the release on hpccsystems.com >> Community >> Blog
  • 26. Clustering Methods in HPCC Systems: KMeans & DBSCAN
      • Unsupervised Machine Learning (ML) algorithms
      • Automatically find the clusters/groups in the data without previous knowledge
      • Highly scalable and parallelized for Big Data machine learning challenges
  • 27. Applications
      • Image segmentation
      • Claim
      • Customer segmentation
      • Clustering gene expressions (Eisen et al., PNAS 1998)
  • 28. KMeans vs. DBSCAN: KMeans [3]
      • Most popular clustering method
      • Highly scalable, parallelized
      • Parametric: K, Tolerance
      • Sensitive to initialization
      • Spherical clusters
      • Sensitive to outliers
      • Curse of dimensionality
      • Example shown: K = 3, Tolerance = 0.0
  • 29. KMeans vs. DBSCAN: DBSCAN
      • Density-based clustering method
      • Highly scalable, parallelized
      • Parametric: epsilon, minPoints
      • Sensitive to initialization
      • Arbitrarily shaped clusters
      • Outlier detection
      • Sensitive to density variance
      • Curse of dimensionality
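To show how the two DBSCAN parameters on slide 29 (epsilon and minPoints) drive the result, here is a small single-machine Python sketch of the textbook algorithm. It is not the parallel HPCC Systems DBSCAN bundle; the function names, the noise label of -1, and the sample points are assumptions for illustration.

```python
import math

NOISE = -1  # label used here for outliers (a convention chosen for this sketch)


def region_query(points, index, epsilon):
    """Indices of all points within `epsilon` of points[index], itself included."""
    return [j for j, q in enumerate(points)
            if math.dist(points[index], q) <= epsilon]


def dbscan(points, epsilon, min_points):
    """Return one cluster label per point; outliers are labeled NOISE."""
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, epsilon)
        if len(neighbors) < min_points:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not dense
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, epsilon)
            if len(j_neighbors) >= min_points:
                queue.extend(j_neighbors)  # core point: keep expanding
        cluster += 1
    return labels


# Two dense groups and one isolated point (made-up data).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, epsilon=1.5, min_points=3)
```

The number of clusters is never specified: it falls out of the density parameters, and the isolated point is reported as an outlier instead of being forced into a group.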
  • 30. KMeans vs. DBSCAN (comparison table; columns: KMeans, DBSCAN)
      • Cluster shape
      • Cluster size
      • Model parameters
      • Number of clusters (fixed vs. variable)
      • Outlier detection
      • Curse of dimensionality
  • 31. Application Domains
      • Recommendation System
      • Clustering Demographic/Geospatial Data
  • 32. Easy to use
      • Step 1: Import the K-Means bundle
          IMPORT KMeans AS KM;
      • Step 2: Train the K-Means model
          Model := KM.KMeans(Max_iterations, Tolerance).Fit(Samples, InitialCentroids);
      • Step 3 (optional): Predict the cluster index of the new samples
          Labels := KM.KMeans().Predict(Model, NewSamples);
      • Slide callouts on the parameters: Optional, Required
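For readers who want to see what the Fit and Predict steps on slide 32 compute, the following is a single-machine Python sketch of K-Means. It is not the HPCC Systems KMeans bundle and does not reproduce its parallel implementation. The parameter names mirror the slide, but the stopping rule (stop once no centroid moves farther than the tolerance) is an assumption, as are the function names and sample data.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)),
               key=lambda k: math.dist(point, centroids[k]))


def fit(samples, initial_centroids, max_iterations=100, tolerance=0.0):
    """Return the final centroids (the 'model' in this sketch)."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iterations):
        # Assignment step: group each sample with its nearest centroid.
        groups = [[] for _ in centroids]
        for p in samples:
            groups[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its group;
        # a centroid with no samples stays where it is.
        updated = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
        shift = max(math.dist(c, u) for c, u in zip(centroids, updated))
        centroids = updated
        if shift <= tolerance:  # assumed convergence test
            break
    return centroids


def predict(centroids, new_samples):
    """Cluster index for each new sample."""
    return [nearest(p, centroids) for p in new_samples]


# Made-up data: two obvious groups and two starting centroids.
samples = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (10.0, 9.0)]
model = fit(samples, initial_centroids=[(0.0, 0.0), (5.0, 5.0)])
labels = predict(model, [(0.0, 0.0), (8.0, 8.0)])
```

Because the result depends on the starting centroids, different initial values can yield different clusters, which is the sensitivity to initialization noted on the KMeans slide.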
  • 33. Easy to use – Cont.
      • Step 4 (optional): Visualization
      • ECL Cloud IDE: KMeans Visualization
  • 34. For more details
      • Tutorial: Automatically Cluster your Data with Massively Scalable K-Means
      • Link: https://hpccsystems.com/blog/kmeans
      • Research Publication: "Massively Scalable Parallel KMeans on the HPCC Systems Platform", Lili Xu, Amy Apon, Flavio Villanustre, Roger Dev, Arjuna Chala
  • 35. Reference
      • ECL-ML module: https://hpccsystems.com/ml
      • Download: https://hpccsystems.com/download/free-modules/machine-learning-library
      • Source code: https://github.com/hpcc-systems
      • Forum: http://hpccsystems.com/bb/viewforum.php?f=23
      • Contact us:
          • Lili Xu, Software Engineer III, HPCC Systems, Lili.xu@lexisnexisrisk.com
          • Roger Dev, Sr. Architect, Machine Learning Library, roger.dev@lexisnexisrisk.com
  • 36. View this presentation on YouTube: https://www.youtube.com/watch?v=Z1A3nOuhv3A&list=PL-8MJMUpp8IKH5-d56az56t52YccleX5h&index=11&t=43s (12:32)

Editor's notes

  1. The last step is optional. If you want to predict which group other samples belong to, you can simply take the model you built in the previous step and feed it, together with the new sample set, into the predict function. The result gives the group label of each new sample.
  3. Now that you understand how KMeans works, let’s take a look at how to use this simple but powerful tool to cluster your data in HPCC Systems. It’s very easy, only three steps. The steps below assume that you have already downloaded and installed the KMeans bundle on your machine. You can go to our website for more details if you have questions. With the KMeans bundle installed, step 1 is simply to import the KMeans bundle in your ECL code. Step 2 is to train your model. Copy this code into your ECL code and then change the content in the parentheses, which we call the model parameters. Let me explain each parameter here. The sample set and the centroid set are easy to understand. The sample set is the set of data points in which you want to find groups. The centroid set is the set of centroids you want to initialize with. Another parameter is max_iterations. It defines the maximum number of iterations our model can run. The last one is the tolerance; it defines the