SlideShare ist ein Scribd-Unternehmen logo
1 von 44
Distributed Machine and
Deep Learning at Scale with
Apache Ignite
Yuriy Babak
Yuriy Babak
Head of ML/DL framework
development at GridGain
Apache Ignite committer
2019 © GridGain Systems
Agenda
● Introduction of distributed ML/DL
● Small math background
● Apache Ignite ML overview
● External integrations
2019 © GridGain Systems
Apache Ignite
Distributed ML
with Apache Ignite
2019 © GridGain Systems
Why do we need distributed ML?
● Data preprocessing
○ Big amount of data on different machines
○ High costs of ETL
● Model training
○ We cannot store all data on single server
○ Data irregularity
● Inference
○ Intensive data flow
○ Data co-located inference
○ Network latency
2019 © GridGain Systems
Why do we need distributed ML?
● Data preprocessing
○ Big amount of data on different machines
○ High costs of ETL
● Model training
○ We cannot store all data on single server
○ Data irregularity
● Inference
○ Intensive data flow
○ Data co-located inference
○ Network latency
2019 © GridGain Systems
Why do we need distributed ML?
● Data preprocessing
○ Big amount of data on different machines
○ High costs of ETL
● Model training
○ We cannot store all data on single server
○ Data irregularity
● Inference
○ Intensive data flow
○ Data co-located inference
○ Network latency
2019 © GridGain Systems
Common approach
Local calculations Result aggregationModel
Model update
2019 © GridGain Systems
Ignite partition-based distributed dataset
Partition Data Dataset Context Dataset Data
Upstream Cache Context Cache On-Heap
Learning Env
On-Heap
Durable Stateless Durable Recoverable
double[][] x = …
double[] y = ...
double[][] x = …
double[] y = ...
Partition Based Dataset StructuresSource Data
2019 © GridGain Systems
Let`s do the math!
2019 © GridGain Systems
Example: Gradient descent
• Model: Linear regression
f(x)
x
2019 © GridGain Systems
Example: Gradient Descent
2019 © GridGain Systems
• Model: Linear regression
• Loss function:
2019 © GridGain Systems
Example: Gradient Descent
2019 © GridGain Systems
• Model: Linear regression
• Loss function: MSE
• Gradient:
2019 © GridGain Systems
Example: Gradient Descent
2019 © GridGain Systems
• Model: Linear regression
• Loss function:
• Gradient:
2019 © GridGain Systems
Example: Random Forest
• Model: Random forest
2019 © GridGain Systems
2019 © GridGain Systems
Example: Random Forest
• Model: Random forest
• Loss function: min Gini coefficient
2019 © GridGain Systems
2019 © GridGain Systems
Example: Random Forest
• Model: Random forest
• Objective function: min Gini coefficient for each sub-tree
• Gradient: doesn’t exist (what shall we do?)
2019 © GridGain Systems
2019 © GridGain Systems
Example: Random Forest
2019 © GridGain Systems
2019 © GridGain Systems
Example: Random Forest
• Model: Random forest
• Objective function: min Gini coefficient for each sub-tree
• Gradient: doesn’t exist
• What to do: sum histograms
2019 © GridGain Systems
=+
2019 © GridGain Systems
Example: Random Forest
• Model: Random forest
• Objective function: min Gini coefficient for each sub-tree
• Gradient: doesn’t exist
• What to do:
– One can sum histograms
– Summarizing histograms - commutative and associative operation
– So you can count the histograms using map-reduce!
2019 © GridGain Systems
2019 © GridGain Systems
Example: Random Forest
• Model: Random forest
• Objective function: min Gini coefficient for each sub-tree
• Gradient: doesn’t exist
• What to do: approximation using histogram partitioning
• Code: RandomForestTrainer, RandomForestClassifierTrainer
2019 © GridGain Systems
Distributed ML
with Apache Ignite
2019 © GridGain Systems
What does Apache Ignite support?
• Classical ML
– classification
– regression
– clustering
2019 © GridGain Systems
2019 © GridGain Systems
Ignite ML API
Vectorizer<Integer, BinaryObject, String, Double> vectorizer =
new BinaryObjectVectorizer<Integer>(
"feature1",
“feature2”
).labeled("label");
LogisticRegressionSGDTrainer trainer = new LogisticRegressionSGDTrainer();
LogisticRegressionModel mdl = trainer.fit(ignite, dataCache, vectorizer);
2019 © GridGain Systems
What does Apache Ignite support?
• Classical ML
– classification
– regression
– clustering
• Preprocessing
– binarization, imputing
– normalization, scaling
– string encoding
• Pipelines
• Фреймворк для ансамблей
• Интеграция с Tensorflow
– как источник данных
– как менеджмент кластера
2019 © GridGain Systems
2019 © GridGain Systems
Preprocessing and Pipelines
Vectorizer<Integer, Vector, Integer, Double> vectorizer
= new DummyVectorizer<Integer>(0, 3, 4, 5, 6, 8, 10).labeled(1);
PipelineMdl<Integer, Vector> mdl =
new Pipeline<Integer, Vector, Integer, Double>()
.addVectorizer(vectorizer)
.addPreprocessingTrainer(new EncoderTrainer<Integer, Vector>()
.withEncoderType(EncoderType.STRING_ENCODER)
.withEncodedFeature(1)
.withEncodedFeature(6))
.addPreprocessingTrainer(new ImputerTrainer<Integer, Vector>())
.addPreprocessingTrainer(new MinMaxScalerTrainer<Integer, Vector>())
.addPreprocessingTrainer(new NormalizationTrainer<Integer, Vector>()
.withP(1))
.addTrainer(new DecisionTreeClassificationTrainer(5, 0))
.fit(ignite, dataCache);
2019 © GridGain Systems
What does Apache Ignite support?
• Classical ML
– classification
– regression
– clustering
• Preprocessing
– binarization, imputing
– normalization, scaling
– string encoding
• Pipelines
• Ensemble Framework
2019 © GridGain Systems
2019 © GridGain Systems
Bagging, Boosting and Stacking
DatasetTrainer<LogisticRegressionModel, Double> trainer =
new LogisticRegressionSGDTrainer(...)...;
BaggedTrainer<Double> baggedTrainer = TrainerTransformers.makeBagged(trainer,
// ensemble size, subsample ration, feature vector size, features subspace dim
7, 0.7, 2, 2,
new onMajorityPredictionsAggregator());
2019 © GridGain Systems
What does Apache Ignite support?
• Classical ML
– classification
– regression
– clustering
• Preprocessing
– binarization, imputing
– normalization, scaling
– string encoding
• Pipelines
• Ensemble Framework
• Continuous learning
2019 © GridGain Systems
2019 © GridGain Systems
Online learning
SVMLinearClassificationTrainer trainer = new SVMLinearClassificationTrainer();
SVMLinearClassificationModel mdl1 = trainer.fit(ignite, dataCache1, vectorizer);
SVMLinearClassificationModel mdl2 = trainer.update(mdl1, ignite, dataCache2,
vectorizer);
2019 © GridGain Systems
What does Apache Ignite support?
• Classical ML
– classification
– regression
– clustering
• Preprocessing
– binarization, imputing
– normalization, scaling
– string encoding
• Pipelines
• Ensemble Framework
• Continuous learning
• SQL in-place inference
2019 © GridGain Systems
2019 © GridGain Systems
IMDB with built-in ML
IgniteModelStorageUtil.saveModel(ignite, model, “titanik_model_tree”);
QueryCursor<List<?>> cursor = cache.query(new SqlFieldsQuery("select " +
"survived as truth, " +
"predict('titanik_model_tree', pclass, age, sibsp, parch, fare, case
sex when 'male' then 1 else 0 end) as prediction " +
"from titanik_train"))
2019 © GridGain Systems
External integrations
2019 © GridGain Systems
Model import
2019 © GridGain Systems
Inference in Ignite ML
2019 © GridGain Systems
Request queue
Response queue
Inference
Service
Inference
Service
Inference
Gateway
2019 © GridGain Systems
Import / Inference API
AsyncModelBuilder mdlBuilder = new IgniteDistributedModelBuilder(ignite, 4, 4);
XGModelParser parser = new XGModelParser();
try (Model<NamedVector, Future<Double>> mdl = mdlBuilder.build(reader, parser);
...
double prediction = mdl.predict(testObj).get();
}
2019 © GridGain Systems
TensorFlow on Apache Ignite
• Ignite Dataset
• IGFS Plugin
• Distributed Training
• More info here
#UnifiedAnalytics #SparkAISummit
>>> import tensorflow as tf
>>> from tensorflow.contrib.ignite import IgniteDataset
>>>
>>> dataset = IgniteDataset(cache_name="SQL_PUBLIC_KITTEN_CACHE")
>>> iterator = dataset.make_one_shot_iterator()
>>> next_obj = iterator.get_next()
>>>
>>> with tf.Session() as sess:
>>> for _ in range(3):
>>> print(sess.run(next_obj))
{'key': 1, 'val': {'NAME': b'WARM KITTY'}}
{'key': 2, 'val': {'NAME': b'SOFT KITTY'}}
{'key': 3, 'val': {'NAME': b'LITTLE BALL OF FUR'}}
2019 © GridGain Systems
Apache Ignite with GridGain ML Python API
GridGain ML client library provides user applications the ability to work
with GridGain ML functionality using Py4J as an integration mechanism.
If you want to use ggml in your project, you may install it from PyPI:
$ pip install ggml
2019 © GridGain Systems
Apache Ignite with GridGain ML Python API
GridGain ML client library provides user applications the ability to work
with GridGain ML functionality using Py4J as an integration mechanism.
If you want to use ggml in your project, you may install it from PyPI:
$ pip install ggml
NB: available only for Apache Ignite master and for GG 8.8
2019 © GridGain Systems
Apache Ignite with GridGain ML Python API
from ggml.regression import LinearRegressionTrainer
trainer = LinearRegressionTrainer()
model = trainer.fit(x_train, y_train)
r2_score(y_test, model.predict(x_test))
2019 © GridGain Systems
Apache Ignite ML Tutorial
https://github.com/apache/ignite/
org.apache.ignite.examples.ml.tutorial
2019 © GridGain Systems
2019 © GridGain Systems
Distributed Machine and Deep Learning
at Scale with Apache Ignite
Links:
• http://ignite.apache.org/
• https://medium.com/tensorflow/tensorflow-on-apache-ignite-99f1fc60efeb
• https://github.com/gridgain/ml-python-api
• http://bit.ly/DataNativesOslo
Email:
● user@ignite.apache.org
● dev@ignite.apache.org
● ybabak@gridgain.com
2019 © GridGain Systems
Preprocessing and Pipelines

Weitere ähnliche Inhalte

Was ist angesagt?

Tape Is a Four Letter Word: Back Up to the Cloud in Under an Hour (STG201) - ...
Tape Is a Four Letter Word: Back Up to the Cloud in Under an Hour (STG201) - ...Tape Is a Four Letter Word: Back Up to the Cloud in Under an Hour (STG201) - ...
Tape Is a Four Letter Word: Back Up to the Cloud in Under an Hour (STG201) - ...Amazon Web Services
 
Google Cloud Platform Tutorial | GCP Fundamentals | Edureka
Google Cloud Platform Tutorial | GCP Fundamentals | EdurekaGoogle Cloud Platform Tutorial | GCP Fundamentals | Edureka
Google Cloud Platform Tutorial | GCP Fundamentals | EdurekaEdureka!
 
Deep learning acceleration with Amazon Elastic Inference
Deep learning acceleration with Amazon Elastic Inference  Deep learning acceleration with Amazon Elastic Inference
Deep learning acceleration with Amazon Elastic Inference Hagay Lupesko
 
BDW16 London - William Vambenepe, Google - 3rd Generation Data Platform
BDW16 London - William Vambenepe, Google - 3rd Generation Data PlatformBDW16 London - William Vambenepe, Google - 3rd Generation Data Platform
BDW16 London - William Vambenepe, Google - 3rd Generation Data PlatformBig Data Week
 
AWS CodeStar 및 Cloud9을 통한 서버리스(Serverless) 앱 개발 길잡이 - 윤석찬 (AWS 테크에반젤리스트)
AWS CodeStar 및 Cloud9을 통한 서버리스(Serverless) 앱 개발 길잡이 - 윤석찬 (AWS 테크에반젤리스트)AWS CodeStar 및 Cloud9을 통한 서버리스(Serverless) 앱 개발 길잡이 - 윤석찬 (AWS 테크에반젤리스트)
AWS CodeStar 및 Cloud9을 통한 서버리스(Serverless) 앱 개발 길잡이 - 윤석찬 (AWS 테크에반젤리스트)Amazon Web Services Korea
 
Special Purpose Quantum Annealing Quantum Computer v1.0
Special Purpose Quantum Annealing Quantum Computer v1.0Special Purpose Quantum Annealing Quantum Computer v1.0
Special Purpose Quantum Annealing Quantum Computer v1.0Aditya Yadav
 
Monitor the World: Meaningful Metrics for Containerized Apps and Clusters (CO...
Monitor the World: Meaningful Metrics for Containerized Apps and Clusters (CO...Monitor the World: Meaningful Metrics for Containerized Apps and Clusters (CO...
Monitor the World: Meaningful Metrics for Containerized Apps and Clusters (CO...Amazon Web Services
 
Scaling up data science applications
Scaling up data science applicationsScaling up data science applications
Scaling up data science applicationsKexin Xie
 
Easy and Efficient Batch Computing on AWS
Easy and Efficient Batch Computing on AWSEasy and Efficient Batch Computing on AWS
Easy and Efficient Batch Computing on AWSAmazon Web Services
 
(CMP202) Engineering Simulation and Analysis in the Cloud
(CMP202) Engineering Simulation and Analysis in the Cloud(CMP202) Engineering Simulation and Analysis in the Cloud
(CMP202) Engineering Simulation and Analysis in the CloudAmazon Web Services
 
Optimize Your Amazon ECS Environment
Optimize Your Amazon ECS EnvironmentOptimize Your Amazon ECS Environment
Optimize Your Amazon ECS EnvironmentAmazon Web Services
 
Engineering Genomic Big Data Analytics at A Global Scale
Engineering Genomic Big Data Analytics at A Global ScaleEngineering Genomic Big Data Analytics at A Global Scale
Engineering Genomic Big Data Analytics at A Global ScaleAmazon Web Services
 
Titan and Cassandra at WellAware
Titan and Cassandra at WellAwareTitan and Cassandra at WellAware
Titan and Cassandra at WellAwaretwilmes
 
Top 13 best security practices for Azure
Top 13 best security practices for AzureTop 13 best security practices for Azure
Top 13 best security practices for AzureRadu Vunvulea
 
Working with Scalable Machine Learning Algorithms in Amazon SageMaker - AWS O...
Working with Scalable Machine Learning Algorithms in Amazon SageMaker - AWS O...Working with Scalable Machine Learning Algorithms in Amazon SageMaker - AWS O...
Working with Scalable Machine Learning Algorithms in Amazon SageMaker - AWS O...Amazon Web Services
 
PAC 2019 virtual Stefano Doni
PAC 2019 virtual Stefano Doni   PAC 2019 virtual Stefano Doni
PAC 2019 virtual Stefano Doni Neotys
 
Intro to High Performance Computing in the AWS Cloud
Intro to High Performance Computing in the AWS CloudIntro to High Performance Computing in the AWS Cloud
Intro to High Performance Computing in the AWS CloudAmazon Web Services
 
Google Cloud Platform Empowers TensorFlow and Machine Learning
Google Cloud Platform Empowers TensorFlow and Machine LearningGoogle Cloud Platform Empowers TensorFlow and Machine Learning
Google Cloud Platform Empowers TensorFlow and Machine LearningDataWorks Summit/Hadoop Summit
 

Was ist angesagt? (20)

Tape Is a Four Letter Word: Back Up to the Cloud in Under an Hour (STG201) - ...
Tape Is a Four Letter Word: Back Up to the Cloud in Under an Hour (STG201) - ...Tape Is a Four Letter Word: Back Up to the Cloud in Under an Hour (STG201) - ...
Tape Is a Four Letter Word: Back Up to the Cloud in Under an Hour (STG201) - ...
 
Google Cloud Platform Tutorial | GCP Fundamentals | Edureka
Google Cloud Platform Tutorial | GCP Fundamentals | EdurekaGoogle Cloud Platform Tutorial | GCP Fundamentals | Edureka
Google Cloud Platform Tutorial | GCP Fundamentals | Edureka
 
Data Science on Google Cloud Platform
Data Science on Google Cloud PlatformData Science on Google Cloud Platform
Data Science on Google Cloud Platform
 
Deep learning acceleration with Amazon Elastic Inference
Deep learning acceleration with Amazon Elastic Inference  Deep learning acceleration with Amazon Elastic Inference
Deep learning acceleration with Amazon Elastic Inference
 
BDW16 London - William Vambenepe, Google - 3rd Generation Data Platform
BDW16 London - William Vambenepe, Google - 3rd Generation Data PlatformBDW16 London - William Vambenepe, Google - 3rd Generation Data Platform
BDW16 London - William Vambenepe, Google - 3rd Generation Data Platform
 
AWS CodeStar 및 Cloud9을 통한 서버리스(Serverless) 앱 개발 길잡이 - 윤석찬 (AWS 테크에반젤리스트)
AWS CodeStar 및 Cloud9을 통한 서버리스(Serverless) 앱 개발 길잡이 - 윤석찬 (AWS 테크에반젤리스트)AWS CodeStar 및 Cloud9을 통한 서버리스(Serverless) 앱 개발 길잡이 - 윤석찬 (AWS 테크에반젤리스트)
AWS CodeStar 및 Cloud9을 통한 서버리스(Serverless) 앱 개발 길잡이 - 윤석찬 (AWS 테크에반젤리스트)
 
Special Purpose Quantum Annealing Quantum Computer v1.0
Special Purpose Quantum Annealing Quantum Computer v1.0Special Purpose Quantum Annealing Quantum Computer v1.0
Special Purpose Quantum Annealing Quantum Computer v1.0
 
Amazon Aurora 深度探討
Amazon Aurora 深度探討Amazon Aurora 深度探討
Amazon Aurora 深度探討
 
Monitor the World: Meaningful Metrics for Containerized Apps and Clusters (CO...
Monitor the World: Meaningful Metrics for Containerized Apps and Clusters (CO...Monitor the World: Meaningful Metrics for Containerized Apps and Clusters (CO...
Monitor the World: Meaningful Metrics for Containerized Apps and Clusters (CO...
 
Scaling up data science applications
Scaling up data science applicationsScaling up data science applications
Scaling up data science applications
 
Easy and Efficient Batch Computing on AWS
Easy and Efficient Batch Computing on AWSEasy and Efficient Batch Computing on AWS
Easy and Efficient Batch Computing on AWS
 
(CMP202) Engineering Simulation and Analysis in the Cloud
(CMP202) Engineering Simulation and Analysis in the Cloud(CMP202) Engineering Simulation and Analysis in the Cloud
(CMP202) Engineering Simulation and Analysis in the Cloud
 
Optimize Your Amazon ECS Environment
Optimize Your Amazon ECS EnvironmentOptimize Your Amazon ECS Environment
Optimize Your Amazon ECS Environment
 
Engineering Genomic Big Data Analytics at A Global Scale
Engineering Genomic Big Data Analytics at A Global ScaleEngineering Genomic Big Data Analytics at A Global Scale
Engineering Genomic Big Data Analytics at A Global Scale
 
Titan and Cassandra at WellAware
Titan and Cassandra at WellAwareTitan and Cassandra at WellAware
Titan and Cassandra at WellAware
 
Top 13 best security practices for Azure
Top 13 best security practices for AzureTop 13 best security practices for Azure
Top 13 best security practices for Azure
 
Working with Scalable Machine Learning Algorithms in Amazon SageMaker - AWS O...
Working with Scalable Machine Learning Algorithms in Amazon SageMaker - AWS O...Working with Scalable Machine Learning Algorithms in Amazon SageMaker - AWS O...
Working with Scalable Machine Learning Algorithms in Amazon SageMaker - AWS O...
 
PAC 2019 virtual Stefano Doni
PAC 2019 virtual Stefano Doni   PAC 2019 virtual Stefano Doni
PAC 2019 virtual Stefano Doni
 
Intro to High Performance Computing in the AWS Cloud
Intro to High Performance Computing in the AWS CloudIntro to High Performance Computing in the AWS Cloud
Intro to High Performance Computing in the AWS Cloud
 
Google Cloud Platform Empowers TensorFlow and Machine Learning
Google Cloud Platform Empowers TensorFlow and Machine LearningGoogle Cloud Platform Empowers TensorFlow and Machine Learning
Google Cloud Platform Empowers TensorFlow and Machine Learning
 

Ähnlich wie Big Data Helsinki v 3 | "Distributed Machine and Deep Learning at Scale with Apache Ignite" - Yuriy Babak

Continuous Machine and Deep Learning with Apache Ignite
Continuous Machine and Deep Learning with Apache IgniteContinuous Machine and Deep Learning with Apache Ignite
Continuous Machine and Deep Learning with Apache IgniteDenis Magda
 
Python tutorial for ML
Python tutorial for MLPython tutorial for ML
Python tutorial for MLBin Han
 
In-Memory Computing Essentials for Software Engineers
In-Memory Computing Essentials for Software EngineersIn-Memory Computing Essentials for Software Engineers
In-Memory Computing Essentials for Software EngineersDenis Magda
 
Machine Learning for Capacity Management
 Machine Learning for Capacity Management Machine Learning for Capacity Management
Machine Learning for Capacity ManagementEDB
 
Deploying MariaDB for HA on Google Cloud Platform
Deploying MariaDB for HA on Google Cloud PlatformDeploying MariaDB for HA on Google Cloud Platform
Deploying MariaDB for HA on Google Cloud PlatformMariaDB plc
 
QCon 2018 | Gimel | PayPal's Analytic Platform
QCon 2018 | Gimel | PayPal's Analytic PlatformQCon 2018 | Gimel | PayPal's Analytic Platform
QCon 2018 | Gimel | PayPal's Analytic PlatformDeepak Chandramouli
 
Ippon Technologies: Multi-criteria queries on a Cassandra application
Ippon Technologies: Multi-criteria queries on a Cassandra applicationIppon Technologies: Multi-criteria queries on a Cassandra application
Ippon Technologies: Multi-criteria queries on a Cassandra applicationDataStax Academy
 
Multi criteria queries on a cassandra application
Multi criteria queries on a cassandra applicationMulti criteria queries on a cassandra application
Multi criteria queries on a cassandra applicationIppon
 
DevOps and Machine Learning (Geekwire Cloud Tech Summit)
DevOps and Machine Learning (Geekwire Cloud Tech Summit)DevOps and Machine Learning (Geekwire Cloud Tech Summit)
DevOps and Machine Learning (Geekwire Cloud Tech Summit)Jasjeet Thind
 
MongoDB World 2019: Unleash the Power of the MongoDB Aggregation Framework
MongoDB World 2019: Unleash the Power of the MongoDB Aggregation FrameworkMongoDB World 2019: Unleash the Power of the MongoDB Aggregation Framework
MongoDB World 2019: Unleash the Power of the MongoDB Aggregation FrameworkMongoDB
 
Distributed Model Training using MXNet with Horovod
Distributed Model Training using MXNet with HorovodDistributed Model Training using MXNet with Horovod
Distributed Model Training using MXNet with HorovodLin Yuan
 
Comparing three data ingestion approaches where Apache Kafka integrates with ...
Comparing three data ingestion approaches where Apache Kafka integrates with ...Comparing three data ingestion approaches where Apache Kafka integrates with ...
Comparing three data ingestion approaches where Apache Kafka integrates with ...HostedbyConfluent
 
Using Grid Technologies in the Cloud for High Scalability
Using Grid Technologies in the Cloud for High ScalabilityUsing Grid Technologies in the Cloud for High Scalability
Using Grid Technologies in the Cloud for High Scalabilitymabuhr
 
Spark Summit EU talk by Christos Erotocritou
Spark Summit EU talk by Christos ErotocritouSpark Summit EU talk by Christos Erotocritou
Spark Summit EU talk by Christos ErotocritouSpark Summit
 
IRJET- A Review on K-Means++ Clustering Algorithm and Cloud Computing wit...
IRJET-  	  A Review on K-Means++ Clustering Algorithm and Cloud Computing wit...IRJET-  	  A Review on K-Means++ Clustering Algorithm and Cloud Computing wit...
IRJET- A Review on K-Means++ Clustering Algorithm and Cloud Computing wit...IRJET Journal
 
TensorFlow 16: Building a Data Science Platform
TensorFlow 16: Building a Data Science Platform TensorFlow 16: Building a Data Science Platform
TensorFlow 16: Building a Data Science Platform Seldon
 
KNIME Data Science Learnathon: From Raw Data To Deployment - Paris - November...
KNIME Data Science Learnathon: From Raw Data To Deployment - Paris - November...KNIME Data Science Learnathon: From Raw Data To Deployment - Paris - November...
KNIME Data Science Learnathon: From Raw Data To Deployment - Paris - November...KNIMESlides
 
Key projects Data Science and Engineering
Key projects Data Science and EngineeringKey projects Data Science and Engineering
Key projects Data Science and EngineeringVijayananda Mohire
 
Key projects Data Science and Engineering
Key projects Data Science and EngineeringKey projects Data Science and Engineering
Key projects Data Science and EngineeringVijayananda Mohire
 
Very large scale distributed deep learning on BigDL
Very large scale distributed deep learning on BigDLVery large scale distributed deep learning on BigDL
Very large scale distributed deep learning on BigDLDESMOND YUEN
 

Ähnlich wie Big Data Helsinki v 3 | "Distributed Machine and Deep Learning at Scale with Apache Ignite" - Yuriy Babak (20)

Continuous Machine and Deep Learning with Apache Ignite
Continuous Machine and Deep Learning with Apache IgniteContinuous Machine and Deep Learning with Apache Ignite
Continuous Machine and Deep Learning with Apache Ignite
 
Python tutorial for ML
Python tutorial for MLPython tutorial for ML
Python tutorial for ML
 
In-Memory Computing Essentials for Software Engineers
In-Memory Computing Essentials for Software EngineersIn-Memory Computing Essentials for Software Engineers
In-Memory Computing Essentials for Software Engineers
 
Machine Learning for Capacity Management
 Machine Learning for Capacity Management Machine Learning for Capacity Management
Machine Learning for Capacity Management
 
Deploying MariaDB for HA on Google Cloud Platform
Deploying MariaDB for HA on Google Cloud PlatformDeploying MariaDB for HA on Google Cloud Platform
Deploying MariaDB for HA on Google Cloud Platform
 
QCon 2018 | Gimel | PayPal's Analytic Platform
QCon 2018 | Gimel | PayPal's Analytic PlatformQCon 2018 | Gimel | PayPal's Analytic Platform
QCon 2018 | Gimel | PayPal's Analytic Platform
 
Ippon Technologies: Multi-criteria queries on a Cassandra application
Ippon Technologies: Multi-criteria queries on a Cassandra applicationIppon Technologies: Multi-criteria queries on a Cassandra application
Ippon Technologies: Multi-criteria queries on a Cassandra application
 
Multi criteria queries on a cassandra application
Multi criteria queries on a cassandra applicationMulti criteria queries on a cassandra application
Multi criteria queries on a cassandra application
 
DevOps and Machine Learning (Geekwire Cloud Tech Summit)
DevOps and Machine Learning (Geekwire Cloud Tech Summit)DevOps and Machine Learning (Geekwire Cloud Tech Summit)
DevOps and Machine Learning (Geekwire Cloud Tech Summit)
 
MongoDB World 2019: Unleash the Power of the MongoDB Aggregation Framework
MongoDB World 2019: Unleash the Power of the MongoDB Aggregation FrameworkMongoDB World 2019: Unleash the Power of the MongoDB Aggregation Framework
MongoDB World 2019: Unleash the Power of the MongoDB Aggregation Framework
 
Distributed Model Training using MXNet with Horovod
Distributed Model Training using MXNet with HorovodDistributed Model Training using MXNet with Horovod
Distributed Model Training using MXNet with Horovod
 
Comparing three data ingestion approaches where Apache Kafka integrates with ...
Comparing three data ingestion approaches where Apache Kafka integrates with ...Comparing three data ingestion approaches where Apache Kafka integrates with ...
Comparing three data ingestion approaches where Apache Kafka integrates with ...
 
Using Grid Technologies in the Cloud for High Scalability
Using Grid Technologies in the Cloud for High ScalabilityUsing Grid Technologies in the Cloud for High Scalability
Using Grid Technologies in the Cloud for High Scalability
 
Spark Summit EU talk by Christos Erotocritou
Spark Summit EU talk by Christos ErotocritouSpark Summit EU talk by Christos Erotocritou
Spark Summit EU talk by Christos Erotocritou
 
IRJET- A Review on K-Means++ Clustering Algorithm and Cloud Computing wit...
IRJET-  	  A Review on K-Means++ Clustering Algorithm and Cloud Computing wit...IRJET-  	  A Review on K-Means++ Clustering Algorithm and Cloud Computing wit...
IRJET- A Review on K-Means++ Clustering Algorithm and Cloud Computing wit...
 
TensorFlow 16: Building a Data Science Platform
TensorFlow 16: Building a Data Science Platform TensorFlow 16: Building a Data Science Platform
TensorFlow 16: Building a Data Science Platform
 
KNIME Data Science Learnathon: From Raw Data To Deployment - Paris - November...
KNIME Data Science Learnathon: From Raw Data To Deployment - Paris - November...KNIME Data Science Learnathon: From Raw Data To Deployment - Paris - November...
KNIME Data Science Learnathon: From Raw Data To Deployment - Paris - November...
 
Key projects Data Science and Engineering
Key projects Data Science and EngineeringKey projects Data Science and Engineering
Key projects Data Science and Engineering
 
Key projects Data Science and Engineering
Key projects Data Science and EngineeringKey projects Data Science and Engineering
Key projects Data Science and Engineering
 
Very large scale distributed deep learning on BigDL
Very large scale distributed deep learning on BigDLVery large scale distributed deep learning on BigDL
Very large scale distributed deep learning on BigDL
 

Mehr von Dataconomy Media

Data Natives Paris v 10.0 | "Blockchain in Healthcare" - Lea Dias & David An...
Data Natives Paris v 10.0 | "Blockchain in Healthcare" - Lea Dias & 	David An...Data Natives Paris v 10.0 | "Blockchain in Healthcare" - Lea Dias & 	David An...
Data Natives Paris v 10.0 | "Blockchain in Healthcare" - Lea Dias & David An...Dataconomy Media
 
Data Natives Frankfurt v 11.0 | "Competitive advantages with knowledge graphs...
Data Natives Frankfurt v 11.0 | "Competitive advantages with knowledge graphs...Data Natives Frankfurt v 11.0 | "Competitive advantages with knowledge graphs...
Data Natives Frankfurt v 11.0 | "Competitive advantages with knowledge graphs...Dataconomy Media
 
Data Natives Frankfurt v 11.0 | "Can we be responsible for misuse of data & a...
Data Natives Frankfurt v 11.0 | "Can we be responsible for misuse of data & a...Data Natives Frankfurt v 11.0 | "Can we be responsible for misuse of data & a...
Data Natives Frankfurt v 11.0 | "Can we be responsible for misuse of data & a...Dataconomy Media
 
Data Natives Munich v 12.0 | "How to be more productive with Autonomous Data ...
Data Natives Munich v 12.0 | "How to be more productive with Autonomous Data ...Data Natives Munich v 12.0 | "How to be more productive with Autonomous Data ...
Data Natives Munich v 12.0 | "How to be more productive with Autonomous Data ...Dataconomy Media
 
Data Natives meets DataRobot | "Build and deploy an anti-money laundering mo...
Data Natives meets DataRobot |  "Build and deploy an anti-money laundering mo...Data Natives meets DataRobot |  "Build and deploy an anti-money laundering mo...
Data Natives meets DataRobot | "Build and deploy an anti-money laundering mo...Dataconomy Media
 
Data Natives Munich v 12.0 | "Political Data Science: A tale of Fake News, So...
Data Natives Munich v 12.0 | "Political Data Science: A tale of Fake News, So...Data Natives Munich v 12.0 | "Political Data Science: A tale of Fake News, So...
Data Natives Munich v 12.0 | "Political Data Science: A tale of Fake News, So...Dataconomy Media
 
Data Natives Vienna v 7.0 | "Building Kubernetes Operators with KUDO for Dat...
Data Natives Vienna v 7.0  | "Building Kubernetes Operators with KUDO for Dat...Data Natives Vienna v 7.0  | "Building Kubernetes Operators with KUDO for Dat...
Data Natives Vienna v 7.0 | "Building Kubernetes Operators with KUDO for Dat...Dataconomy Media
 
Data Natives Vienna v 7.0 | "The Ingredients of Data Innovation" - Robbert de...
Data Natives Vienna v 7.0 | "The Ingredients of Data Innovation" - Robbert de...Data Natives Vienna v 7.0 | "The Ingredients of Data Innovation" - Robbert de...
Data Natives Vienna v 7.0 | "The Ingredients of Data Innovation" - Robbert de...Dataconomy Media
 
Data Natives Cologne v 4.0 | "The Data Lorax: Planting the Seeds of Fairness...
Data Natives Cologne v 4.0  | "The Data Lorax: Planting the Seeds of Fairness...Data Natives Cologne v 4.0  | "The Data Lorax: Planting the Seeds of Fairness...
Data Natives Cologne v 4.0 | "The Data Lorax: Planting the Seeds of Fairness...Dataconomy Media
 
Data Natives Cologne v 4.0 | "How People Analytics Can Reveal the Hidden Aspe...
Data Natives Cologne v 4.0 | "How People Analytics Can Reveal the Hidden Aspe...Data Natives Cologne v 4.0 | "How People Analytics Can Reveal the Hidden Aspe...
Data Natives Cologne v 4.0 | "How People Analytics Can Reveal the Hidden Aspe...Dataconomy Media
 
Data Natives Amsterdam v 9.0 | "Ten Little Servers: A Story of no Downtime" -...
Data Natives Amsterdam v 9.0 | "Ten Little Servers: A Story of no Downtime" -...Data Natives Amsterdam v 9.0 | "Ten Little Servers: A Story of no Downtime" -...
Data Natives Amsterdam v 9.0 | "Ten Little Servers: A Story of no Downtime" -...Dataconomy Media
 
Data Natives Amsterdam v 9.0 | "Point in Time Labeling at Scale" - Timothy Th...
Data Natives Amsterdam v 9.0 | "Point in Time Labeling at Scale" - Timothy Th...Data Natives Amsterdam v 9.0 | "Point in Time Labeling at Scale" - Timothy Th...
Data Natives Amsterdam v 9.0 | "Point in Time Labeling at Scale" - Timothy Th...Dataconomy Media
 
Data Natives Hamburg v 6.0 | "Interpersonal behavior: observing Alex to under...
Data Natives Hamburg v 6.0 | "Interpersonal behavior: observing Alex to under...Data Natives Hamburg v 6.0 | "Interpersonal behavior: observing Alex to under...
Data Natives Hamburg v 6.0 | "Interpersonal behavior: observing Alex to under...Dataconomy Media
 
Data Natives Hamburg v 6.0 | "About Surfing, Failing & Scaling" - Florian Sch...
Data Natives Hamburg v 6.0 | "About Surfing, Failing & Scaling" - Florian Sch...Data Natives Hamburg v 6.0 | "About Surfing, Failing & Scaling" - Florian Sch...
Data Natives Hamburg v 6.0 | "About Surfing, Failing & Scaling" - Florian Sch...Dataconomy Media
 
Data NativesBerlin v 20.0 | "Serving A/B experimentation platform end-to-end"...
Data NativesBerlin v 20.0 | "Serving A/B experimentation platform end-to-end"...Data NativesBerlin v 20.0 | "Serving A/B experimentation platform end-to-end"...
Data NativesBerlin v 20.0 | "Serving A/B experimentation platform end-to-end"...Dataconomy Media
 
Data Natives Berlin v 20.0 | "Ten Little Servers: A Story of no Downtime" - A...
Data Natives Berlin v 20.0 | "Ten Little Servers: A Story of no Downtime" - A...Data Natives Berlin v 20.0 | "Ten Little Servers: A Story of no Downtime" - A...
Data Natives Berlin v 20.0 | "Ten Little Servers: A Story of no Downtime" - A...Dataconomy Media
 
Big Data Frankfurt meets Thinkport | "The Cloud as a Driver of Innovation" - ...
Big Data Frankfurt meets Thinkport | "The Cloud as a Driver of Innovation" - ...Big Data Frankfurt meets Thinkport | "The Cloud as a Driver of Innovation" - ...
Big Data Frankfurt meets Thinkport | "The Cloud as a Driver of Innovation" - ...Dataconomy Media
 
Thinkport meets Frankfurt | "Financial Time Series Analysis using Wavelets" -...
Thinkport meets Frankfurt | "Financial Time Series Analysis using Wavelets" -...Thinkport meets Frankfurt | "Financial Time Series Analysis using Wavelets" -...
Thinkport meets Frankfurt | "Financial Time Series Analysis using Wavelets" -...Dataconomy Media
 
Big Data Helsinki v 3 | "Federated Learning and Privacy-preserving AI" - Oguz...
Big Data Helsinki v 3 | "Federated Learning and Privacy-preserving AI" - Oguz...Big Data Helsinki v 3 | "Federated Learning and Privacy-preserving AI" - Oguz...
Big Data Helsinki v 3 | "Federated Learning and Privacy-preserving AI" - Oguz...Dataconomy Media
 
Big Data Helsinki v 3 | "What you should know about PSD2 APIs?" - Joonas Tomperi
Big Data Helsinki v 3 | "What you should know about PSD2 APIs?" - Joonas TomperiBig Data Helsinki v 3 | "What you should know about PSD2 APIs?" - Joonas Tomperi
Big Data Helsinki v 3 | "What you should know about PSD2 APIs?" - Joonas TomperiDataconomy Media
 

Mehr von Dataconomy Media (20)

Data Natives Paris v 10.0 | "Blockchain in Healthcare" - Lea Dias & David An...
Data Natives Paris v 10.0 | "Blockchain in Healthcare" - Lea Dias & 	David An...Data Natives Paris v 10.0 | "Blockchain in Healthcare" - Lea Dias & 	David An...
Data Natives Paris v 10.0 | "Blockchain in Healthcare" - Lea Dias & David An...
 
Data Natives Frankfurt v 11.0 | "Competitive advantages with knowledge graphs...
Data Natives Frankfurt v 11.0 | "Competitive advantages with knowledge graphs...Data Natives Frankfurt v 11.0 | "Competitive advantages with knowledge graphs...
Data Natives Frankfurt v 11.0 | "Competitive advantages with knowledge graphs...
 
Data Natives Frankfurt v 11.0 | "Can we be responsible for misuse of data & a...
Data Natives Frankfurt v 11.0 | "Can we be responsible for misuse of data & a...Data Natives Frankfurt v 11.0 | "Can we be responsible for misuse of data & a...
Data Natives Frankfurt v 11.0 | "Can we be responsible for misuse of data & a...
 
Data Natives Munich v 12.0 | "How to be more productive with Autonomous Data ...
Data Natives Munich v 12.0 | "How to be more productive with Autonomous Data ...Data Natives Munich v 12.0 | "How to be more productive with Autonomous Data ...
Data Natives Munich v 12.0 | "How to be more productive with Autonomous Data ...
 
Data Natives meets DataRobot | "Build and deploy an anti-money laundering mo...
Data Natives meets DataRobot |  "Build and deploy an anti-money laundering mo...Data Natives meets DataRobot |  "Build and deploy an anti-money laundering mo...
Data Natives meets DataRobot | "Build and deploy an anti-money laundering mo...
 
Data Natives Munich v 12.0 | "Political Data Science: A tale of Fake News, So...
Data Natives Munich v 12.0 | "Political Data Science: A tale of Fake News, So...Data Natives Munich v 12.0 | "Political Data Science: A tale of Fake News, So...
Data Natives Munich v 12.0 | "Political Data Science: A tale of Fake News, So...
 
Data Natives Vienna v 7.0 | "Building Kubernetes Operators with KUDO for Dat...
Data Natives Vienna v 7.0  | "Building Kubernetes Operators with KUDO for Dat...Data Natives Vienna v 7.0  | "Building Kubernetes Operators with KUDO for Dat...
Data Natives Vienna v 7.0 | "Building Kubernetes Operators with KUDO for Dat...
 
Data Natives Vienna v 7.0 | "The Ingredients of Data Innovation" - Robbert de...
Data Natives Vienna v 7.0 | "The Ingredients of Data Innovation" - Robbert de...Data Natives Vienna v 7.0 | "The Ingredients of Data Innovation" - Robbert de...
Data Natives Vienna v 7.0 | "The Ingredients of Data Innovation" - Robbert de...
 
Data Natives Cologne v 4.0 | "The Data Lorax: Planting the Seeds of Fairness...
Data Natives Cologne v 4.0  | "The Data Lorax: Planting the Seeds of Fairness...Data Natives Cologne v 4.0  | "The Data Lorax: Planting the Seeds of Fairness...
Data Natives Cologne v 4.0 | "The Data Lorax: Planting the Seeds of Fairness...
 
Data Natives Cologne v 4.0 | "How People Analytics Can Reveal the Hidden Aspe...
Data Natives Cologne v 4.0 | "How People Analytics Can Reveal the Hidden Aspe...Data Natives Cologne v 4.0 | "How People Analytics Can Reveal the Hidden Aspe...
Data Natives Cologne v 4.0 | "How People Analytics Can Reveal the Hidden Aspe...
 
Data Natives Amsterdam v 9.0 | "Ten Little Servers: A Story of no Downtime" -...
Data Natives Amsterdam v 9.0 | "Ten Little Servers: A Story of no Downtime" -...Data Natives Amsterdam v 9.0 | "Ten Little Servers: A Story of no Downtime" -...
Data Natives Amsterdam v 9.0 | "Ten Little Servers: A Story of no Downtime" -...
 
Data Natives Amsterdam v 9.0 | "Point in Time Labeling at Scale" - Timothy Th...
Data Natives Amsterdam v 9.0 | "Point in Time Labeling at Scale" - Timothy Th...Data Natives Amsterdam v 9.0 | "Point in Time Labeling at Scale" - Timothy Th...
Data Natives Amsterdam v 9.0 | "Point in Time Labeling at Scale" - Timothy Th...
 
Data Natives Hamburg v 6.0 | "Interpersonal behavior: observing Alex to under...
Data Natives Hamburg v 6.0 | "Interpersonal behavior: observing Alex to under...Data Natives Hamburg v 6.0 | "Interpersonal behavior: observing Alex to under...
Data Natives Hamburg v 6.0 | "Interpersonal behavior: observing Alex to under...
 
Data Natives Hamburg v 6.0 | "About Surfing, Failing & Scaling" - Florian Sch...
Data Natives Hamburg v 6.0 | "About Surfing, Failing & Scaling" - Florian Sch...Data Natives Hamburg v 6.0 | "About Surfing, Failing & Scaling" - Florian Sch...
Data Natives Hamburg v 6.0 | "About Surfing, Failing & Scaling" - Florian Sch...
 
Data NativesBerlin v 20.0 | "Serving A/B experimentation platform end-to-end"...
Data NativesBerlin v 20.0 | "Serving A/B experimentation platform end-to-end"...Data NativesBerlin v 20.0 | "Serving A/B experimentation platform end-to-end"...
Data NativesBerlin v 20.0 | "Serving A/B experimentation platform end-to-end"...
 
Data Natives Berlin v 20.0 | "Ten Little Servers: A Story of no Downtime" - A...
Data Natives Berlin v 20.0 | "Ten Little Servers: A Story of no Downtime" - A...Data Natives Berlin v 20.0 | "Ten Little Servers: A Story of no Downtime" - A...
Data Natives Berlin v 20.0 | "Ten Little Servers: A Story of no Downtime" - A...
 
Big Data Frankfurt meets Thinkport | "The Cloud as a Driver of Innovation" - ...
Big Data Frankfurt meets Thinkport | "The Cloud as a Driver of Innovation" - ...Big Data Frankfurt meets Thinkport | "The Cloud as a Driver of Innovation" - ...
Big Data Frankfurt meets Thinkport | "The Cloud as a Driver of Innovation" - ...
 
Thinkport meets Frankfurt | "Financial Time Series Analysis using Wavelets" -...
Thinkport meets Frankfurt | "Financial Time Series Analysis using Wavelets" -...Thinkport meets Frankfurt | "Financial Time Series Analysis using Wavelets" -...
Thinkport meets Frankfurt | "Financial Time Series Analysis using Wavelets" -...
 
Big Data Helsinki v 3 | "Federated Learning and Privacy-preserving AI" - Oguz...
Big Data Helsinki v 3 | "Federated Learning and Privacy-preserving AI" - Oguz...Big Data Helsinki v 3 | "Federated Learning and Privacy-preserving AI" - Oguz...
Big Data Helsinki v 3 | "Federated Learning and Privacy-preserving AI" - Oguz...
 
Big Data Helsinki v 3 | "What you should know about PSD2 APIs?" - Joonas Tomperi
Big Data Helsinki v 3 | "What you should know about PSD2 APIs?" - Joonas TomperiBig Data Helsinki v 3 | "What you should know about PSD2 APIs?" - Joonas Tomperi
Big Data Helsinki v 3 | "What you should know about PSD2 APIs?" - Joonas Tomperi
 

Kürzlich hochgeladen

08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 

Kürzlich hochgeladen (20)

08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 

Big Data Helsinki v 3 | "Distributed Machine and Deep Learning at Scale with Apache Ignite" - Yuriy Babak

  • 1. Distributed Machine and Deep Learning at Scale with Apache Ignite Yuriy Babak
  • 2. Yuriy Babak Head of ML/DL framework development at GridGain Apache Ignite committer
  • 3. 2019 © GridGain Systems Agenda ● Introduction of distributed ML/DL ● Small math background ● Apache Ignite ML overview ● External integrations
  • 4. 2019 © GridGain Systems Apache Ignite
  • 6. 2019 © GridGain Systems Why do we need distributed ML? ● Data preprocessing ○ Big amount of data on different machines ○ High costs of ETL ● Model training ○ We cannot store all data on single server ○ Data irregularity ● Inference ○ Intensive data flow ○ Data co-located inference ○ Network latency
  • 7. 2019 © GridGain Systems Why do we need distributed ML? ● Data preprocessing ○ Big amount of data on different machines ○ High costs of ETL ● Model training ○ We cannot store all data on single server ○ Data irregularity ● Inference ○ Intensive data flow ○ Data co-located inference ○ Network latency
  • 8. 2019 © GridGain Systems Why do we need distributed ML? ● Data preprocessing ○ Big amount of data on different machines ○ High costs of ETL ● Model training ○ We cannot store all data on single server ○ Data irregularity ● Inference ○ Intensive data flow ○ Data co-located inference ○ Network latency
  • 9. 2019 © GridGain Systems Common approach Local calculations Result aggregationModel Model update
  • 10. 2019 © GridGain Systems Ignite partition-based distributed dataset Partition Data Dataset Context Dataset Data Upstream Cache Context Cache On-Heap Learning Env On-Heap Durable Stateless Durable Recoverable double[][] x = … double[] y = ... double[][] x = … double[] y = ... Partition Based Dataset StructuresSource Data
  • 11. 2019 © GridGain Systems Let`s do the math!
  • 12. 2019 © GridGain Systems Example: Gradient descent • Model: Linear regression f(x) x
  • 13. 2019 © GridGain Systems Example: Gradient Descent 2019 © GridGain Systems • Model: Linear regression • Loss function:
  • 14. 2019 © GridGain Systems Example: Gradient Descent 2019 © GridGain Systems • Model: Linear regression • Loss function: MSE • Gradient:
  • 15. 2019 © GridGain Systems Example: Gradient Descent 2019 © GridGain Systems • Model: Linear regression • Loss function: • Gradient:
  • 16. 2019 © GridGain Systems Example: Random Forest • Model: Random forest 2019 © GridGain Systems
  • 17. 2019 © GridGain Systems Example: Random Forest • Model: Random forest • Loss function: min Gini coefficient 2019 © GridGain Systems
  • 18. 2019 © GridGain Systems Example: Random Forest • Model: Random forest • Objective function: min Gini coefficient for each sub-tree • Gradient: doesn’t exist (what shall we do?) 2019 © GridGain Systems
  • 19. 2019 © GridGain Systems Example: Random Forest 2019 © GridGain Systems
  • 20. 2019 © GridGain Systems Example: Random Forest • Model: Random forest • Objective function: min Gini coefficient for each sub-tree • Gradient: doesn’t exist • What to do: sum histograms 2019 © GridGain Systems =+
  • 21. 2019 © GridGain Systems Example: Random Forest • Model: Random forest • Objective function: min Gini coefficient for each sub-tree • Gradient: doesn’t exist • What to do: – One can sum histograms – Summarizing histograms - commutative and associative operation – So you can count the histograms using map-reduce! 2019 © GridGain Systems
  • 22. 2019 © GridGain Systems Example: Random Forest • Model: Random forest • Objective function: min Gini coefficient for each sub-tree • Gradient: doesn’t exist • What to do: approximation using histogram partitioning • Code: RandomForestTrainer, RandomForestClassifierTrainer 2019 © GridGain Systems
  • 24. 2019 © GridGain Systems What does Apache Ignite support? • Classical ML – classification – regression – clustering 2019 © GridGain Systems
  • 25. 2019 © GridGain Systems Ignite ML API Vectorizer<Integer, BinaryObject, String, Double> vectorizer = new BinaryObjectVectorizer<Integer>( "feature1", “feature2” ).labeled("label"); LogisticRegressionSGDTrainer trainer = new LogisticRegressionSGDTrainer(); LogisticRegressionModel mdl = trainer.fit(ignite, dataCache, vectorizer);
  • 26. 2019 © GridGain Systems What does Apache Ignite support? • Classical ML – classification – regression – clustering • Preprocessing – binarization, imputing – normalization, scaling – string encoding • Pipelines • Фреймворк для ансамблей • Интеграция с Tensorflow – как источник данных – как менеджмент кластера 2019 © GridGain Systems
  • 27. 2019 © GridGain Systems Preprocessing and Pipelines Vectorizer<Integer, Vector, Integer, Double> vectorizer = new DummyVectorizer<Integer>(0, 3, 4, 5, 6, 8, 10).labeled(1); PipelineMdl<Integer, Vector> mdl = new Pipeline<Integer, Vector, Integer, Double>() .addVectorizer(vectorizer) .addPreprocessingTrainer(new EncoderTrainer<Integer, Vector>() .withEncoderType(EncoderType.STRING_ENCODER) .withEncodedFeature(1) .withEncodedFeature(6)) .addPreprocessingTrainer(new ImputerTrainer<Integer, Vector>()) .addPreprocessingTrainer(new MinMaxScalerTrainer<Integer, Vector>()) .addPreprocessingTrainer(new NormalizationTrainer<Integer, Vector>() .withP(1)) .addTrainer(new DecisionTreeClassificationTrainer(5, 0)) .fit(ignite, dataCache);
  • 28. 2019 © GridGain Systems What does Apache Ignite support? • Classical ML – classification – regression – clustering • Preprocessing – binarization, imputing – normalization, scaling – string encoding • Pipelines • Ensemble Framework 2019 © GridGain Systems
  • 29. 2019 © GridGain Systems Bagging, Boosting and Stacking DatasetTrainer<LogisticRegressionModel, Double> trainer = new LogisticRegressionSGDTrainer(...)...; BaggedTrainer<Double> baggedTrainer = TrainerTransformers.makeBagged(trainer, // ensemble size, subsample ration, feature vector size, features subspace dim 7, 0.7, 2, 2, new onMajorityPredictionsAggregator());
  • 30. 2019 © GridGain Systems What does Apache Ignite support? • Classical ML – classification – regression – clustering • Preprocessing – binarization, imputing – normalization, scaling – string encoding • Pipelines • Ensemble Framework • Continuous learning 2019 © GridGain Systems
  • 31. 2019 © GridGain Systems Online learning SVMLinearClassificationTrainer trainer = new SVMLinearClassificationTrainer(); SVMLinearClassificationModel mdl1 = trainer.fit(ignite, dataCache1, vectorizer); SVMLinearClassificationModel mdl2 = trainer.update(mdl1, ignite, dataCache2, vectorizer);
  • 32. 2019 © GridGain Systems What does Apache Ignite support? • Classical ML – classification – regression – clustering • Preprocessing – binarization, imputing – normalization, scaling – string encoding • Pipelines • Ensemble Framework • Continuous learning • SQL in-place inference 2019 © GridGain Systems
  • 33. 2019 © GridGain Systems IMDB with built-in ML IgniteModelStorageUtil.saveModel(ignite, model, “titanik_model_tree”); QueryCursor<List<?>> cursor = cache.query(new SqlFieldsQuery("select " + "survived as truth, " + "predict('titanik_model_tree', pclass, age, sibsp, parch, fare, case sex when 'male' then 1 else 0 end) as prediction " + "from titanik_train")) 2019 © GridGain Systems
  • 35. 2019 © GridGain Systems Model import
  • 36. 2019 © GridGain Systems Inference in Ignite ML 2019 © GridGain Systems Request queue Response queue Inference Service Inference Service Inference Gateway
  • 37. 2019 © GridGain Systems Import / Inference API AsyncModelBuilder mdlBuilder = new IgniteDistributedModelBuilder(ignite, 4, 4); XGModelParser parser = new XGModelParser(); try (Model<NamedVector, Future<Double>> mdl = mdlBuilder.build(reader, parser); ... double prediction = mdl.predict(testObj).get(); }
  • 38. 2019 © GridGain Systems TensorFlow on Apache Ignite • Ignite Dataset • IGFS Plugin • Distributed Training • More info here #UnifiedAnalytics #SparkAISummit >>> import tensorflow as tf >>> from tensorflow.contrib.ignite import IgniteDataset >>> >>> dataset = IgniteDataset(cache_name="SQL_PUBLIC_KITTEN_CACHE") >>> iterator = dataset.make_one_shot_iterator() >>> next_obj = iterator.get_next() >>> >>> with tf.Session() as sess: >>> for _ in range(3): >>> print(sess.run(next_obj)) {'key': 1, 'val': {'NAME': b'WARM KITTY'}} {'key': 2, 'val': {'NAME': b'SOFT KITTY'}} {'key': 3, 'val': {'NAME': b'LITTLE BALL OF FUR'}}
  • 39. 2019 © GridGain Systems Apache Ignite with GridGain ML Python API GridGain ML client library provides user applications the ability to work with GridGain ML functionality using Py4J as an integration mechanism. If you want to use ggml in your project, you may install it from PyPI: $ pip install ggml
  • 40. 2019 © GridGain Systems Apache Ignite with GridGain ML Python API GridGain ML client library provides user applications the ability to work with GridGain ML functionality using Py4J as an integration mechanism. If you want to use ggml in your project, you may install it from PyPI: $ pip install ggml NB: available only for Apache Ignite master and for GG 8.8
  • 41. 2019 © GridGain Systems Apache Ignite with GridGain ML Python API from ggml.regression import LinearRegressionTrainer trainer = LinearRegressionTrainer() model = trainer.fit(x_train, y_train) r2_score(y_test, model.predict(x_test))
  • 42. 2019 © GridGain Systems Apache Ignite ML Tutorial https://github.com/apache/ignite/ org.apache.ignite.examples.ml.tutorial 2019 © GridGain Systems
  • 43. 2019 © GridGain Systems Distributed Machine and Deep Learning at Scale with Apache Ignite Links: • http://ignite.apache.org/ • https://medium.com/tensorflow/tensorflow-on-apache-ignite-99f1fc60efeb • https://github.com/gridgain/ml-python-api • http://bit.ly/DataNativesOslo Email: ● user@ignite.apache.org ● dev@ignite.apache.org ● ybabak@gridgain.com
  • 44. 2019 © GridGain Systems Preprocessing and Pipelines