Real World Machine Learning in Java 8 at Fumankaitori.com
Mathieu Dumoulin, Chief Data Scientist fumankaitori.com,
Data Science Team manager at en-japan
Today’s menu
● About me and 不満買取センター
● The business problem: Post pricing
● Project Overview
○ Why use ML
○ How to use ML in projects
○ How we used ML in this project
● Results
● Live code (depends on time)
● Conclusion
Presentation goals
● Any Java engineer can do machine learning
● Java is a great programming language for real-world machine learning systems
● New ML APIs make it easy to focus on the problem and the data, and get a well-performing model “for free”
● You don’t need a Ph.D. to use machine learning: some self-study, good tools and libraries, and experience built one project at a time are enough
About me
From Quebec City (shown on a Google map)
My Work: Java SE, Hadoop Engineer, Data Scientist
● Launched in March 2015; available as web, Android, and iOS applications.
● An application that collects data about people's dissatisfactions.
● Features:
○ Users can post any dissatisfaction with any product or service.
○ Users earn points as a reward for their posts; points can be exchanged for coupon codes at e-commerce sites.
● 250,000 users and 1,500,000 posts in total (as of the end of November 2015)
Problem statement: post point value prediction
● Fuman user posts have a monetary value
● We want to give more points for “good” posts
● At first, operations staff checked all posts, but they can’t check 10,000 posts each day...
We made rules, but the resulting point values were worse:
● Rules can’t check the content of the posts
● Rules always miss something
● Making hundreds or thousands of rules by hand is ridiculous
ML is the best solution for 不満買取センター
● ML Problem: Estimate the point value of a user post (0-25)
● Project goal: Estimate the value of posts with a difference of less than 5 points from human judgement
● Data: All user posts and user profile data
● Data with known output (labels): staff already set points for 200k posts manually
This is a classic case of supervised learning (Wiki; Microsoft has another reference).
Predicting a price requires building a regression model because the prediction is a number, as opposed to a classification problem, where the model predicts which class (for example, one of two) each post belongs to.
Real world ML project overview
● Machine Learning Workflow
● Data Scientist and Java Engineer roles
● Java for production ML
● Java 8 benefits
● Our point prediction system details
● Results
Machine Learning Workflow
Training: Load data → (data, labels) → Extract Features → (feature vectors, labels) → Train Model → (prediction, labels) → Evaluate vs. business goal → iterate
Prediction: Load new data → (data) → Extract Features (the same as in training) → (feature vectors) → Predict using model (the best model from training) → (predictions) → Act on prediction
Workflow for machine learning system
1. Set a goal with business value
2. Get data (fuman user posts) with a price already set
3. Transform data for input into the machine learning algorithm
4. Train and evaluate machine learning models until the goal is reached
5. Deploy the best model
Data Scientist’s role
Choose features
Build many models
Software Engineer’s role
Implement and integrate into production system
Get data from data source
Implement production code
But we don’t have a data scientist...
You can outsource!
Java for production ML
● Easy integration with Java applications
● Fast (vs. Python or R)
● Easy to program (vs. C++)
● The most common enterprise programming language, with strong IDE support and excellent supporting libraries
● Many state-of-the-art machine learning libraries have a Java API
Machine Learning libraries
Benefits of Java 8
● Java 8’s functional style is a very good match with ML operations
a. Feature extraction: data in → transform → data out
● Java 8’s streams and lambdas
a. Code is easier to understand and less verbose
● Easy parallel code
a. Faster “for free”
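A minimal sketch of the "data in → transform → data out" style with Java 8 streams and lambdas. The class name `PostLengthFeatures` and the two features (character count, whitespace-separated token count) are illustrative assumptions, not the features used in the project. Calling `parallelStream()` instead of `stream()` is the "easy parallel code"; whether it is actually faster depends on data size and the cost of each transformation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class PostLengthFeatures {
    // Hypothetical extractor: maps one post to {character count, token count}.
    static double[] extract(String post) {
        String trimmed = post.trim();
        double tokens = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        return new double[] { post.codePointCount(0, post.length()), tokens };
    }

    // data in -> transform -> data out.
    // The collected list keeps the order of the input list even when run in parallel.
    static List<double[]> extractAll(List<String> posts) {
        return posts.parallelStream()
                .map(PostLengthFeatures::extract)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> posts = Arrays.asList("the fries were cold", "late delivery");
        extractAll(posts).forEach(v -> System.out.println(Arrays.toString(v)));
    }
}
```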
Post point prediction system: step by step
Fuman DB → Feature Extraction → CSV format (posts, label) → DataRobot (DR) → DR Prediction API → Prediction Service
DataRobot steps:
● Train/Test split
● Categorical features transformation
● Select best features
● Try many algorithms
● Tune algorithms
● Evaluate models
● REST Prediction API
Iterate until results meet business goals
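In this system the train/test split was handled by the modelling service, so the following is only a sketch of what that step means. The 80/20 ratio used in the usage note, the fixed seed, and the class name `TrainTestSplit` are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class TrainTestSplit<T> {
    final List<T> train;
    final List<T> test;

    private TrainTestSplit(List<T> train, List<T> test) {
        this.train = train;
        this.test = test;
    }

    // Shuffles a copy of the rows with a fixed seed (reproducible), then cuts it at trainRatio.
    static <T> TrainTestSplit<T> of(List<T> rows, double trainRatio, long seed) {
        if (trainRatio <= 0.0 || trainRatio >= 1.0) {
            throw new IllegalArgumentException("trainRatio must be between 0 and 1");
        }
        List<T> shuffled = new ArrayList<>(rows);
        Collections.shuffle(shuffled, new Random(seed));
        int cut = (int) Math.round(shuffled.size() * trainRatio);
        return new TrainTestSplit<>(
                new ArrayList<>(shuffled.subList(0, cut)),
                new ArrayList<>(shuffled.subList(cut, shuffled.size())));
    }
}
```

Usage: `TrainTestSplit.of(labelledPosts, 0.8, 42L)` keeps 80% of the rows for training and holds out the rest for evaluation; the held-out rows must never be used for training.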
Feature Extraction details
● We added character and word statistics about each fuman user post
○ Number of hiragana, katakana, kanji, alphabet characters and words
○ Number of words, length of words
○ Ratio of hiragana, katakana, kanji, alphabet words to the number of tokens in a post
● User profile information
○ age, gender, job category, etc.
● Bag-of-words models:
○ Words using TF-IDF, removing stopwords (これ、あれ、それ、です、など、 …)
○ Part-of-speech (名詞 noun、動詞 verb、形容詞 adjective、 …)
○ Word type features (hiragana word, katakana word, kanji word, …)
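A sketch of TF-IDF weighting over posts that are already tokenized. The slides do not say which TF-IDF variant was used; this assumes tf = count / post length and idf = ln(N / df), one common definition. The class name `TfIdf` is hypothetical, and stopword removal is left out (it would be a `filter` before counting).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

final class TfIdf {
    // Document frequency: in how many posts each term appears at least once.
    static Map<String, Long> documentFrequency(List<List<String>> posts) {
        return posts.stream()
                .flatMap(tokens -> tokens.stream().distinct())
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    // TF-IDF weight of every term in one post (assumed variant: tf * ln(N / df)).
    static Map<String, Double> weights(List<String> post, Map<String, Long> df, int totalPosts) {
        Map<String, Long> counts = post.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        Map<String, Double> result = new HashMap<>();
        counts.forEach((term, count) -> {
            double tf = count.doubleValue() / post.size();
            double idf = Math.log((double) totalPosts / df.getOrDefault(term, 1L));
            result.put(term, tf * idf);
        });
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> posts = Arrays.asList(
                Arrays.asList("fries", "cold"),
                Arrays.asList("fries", "late"));
        Map<String, Long> df = documentFrequency(posts);
        System.out.println(weights(posts.get(0), df, posts.size()));
    }
}
```

A term that appears in every post gets weight 0 under this variant, which is the point of the weighting: words common to all posts carry little information about any one of them.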
Feature Extraction: Example
マックのポテト揚げたてでお願いしたのに、揚げたてじゃなかった。
(“I asked for freshly fried McDonald’s fries, but they weren’t freshly fried.”)
Feature Example: MeCab analyzer
マックのポテト揚げたてでお願いしたのに、揚げたてじゃなかった。
マック 名詞,固有名詞,一般,*,*,*,マック,マック,マック
の 助詞,連体化,*,*,*,*,の,ノ,ノ
ポテト 名詞,一般,*,*,*,*,ポテト,ポテト,ポテト
揚げたて 名詞,一般,*,*,*,*,揚げたて,アゲタテ,アゲタテ
で 助詞,格助詞,一般,*,*,*,で,デ,デ
お願い 名詞,サ変接続,*,*,*,*,お願い,オネガイ,オネガイ
し 動詞,自立,*,*,サ変・スル,連用形,する,シ,シ
た 助動詞,*,*,*,特殊・タ,基本形,た,タ,タ
のに 助詞,接続助詞,*,*,*,*,のに,ノニ,ノニ
、 記号,読点,*,*,*,*,、,、,、
揚げたて 名詞,一般,*,*,*,*,揚げたて,アゲタテ,アゲタテ
じゃ 助詞,副助詞,*,*,*,*,じゃ,ジャ,ジャ
なかっ 助動詞,*,*,*,特殊・ナイ,連用タ接続,ない,ナカッ,ナカッ
た 助動詞,*,*,*,特殊・タ,基本形,た,タ,タ
。 記号,句点,*,*,*,*,。,。,。
EOS
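MeCab's default output puts the surface form first, then a tab, then comma-separated features with the part of speech as the first field (the tab shows up as a space in this transcript). A sketch of turning that text into tokens; the class name `MecabToken` is hypothetical, and this only parses text already produced by MeCab rather than calling the analyzer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class MecabToken {
    final String surface;      // the token text as it appears in the post
    final String partOfSpeech; // first feature field, e.g. 名詞 (noun)

    MecabToken(String surface, String partOfSpeech) {
        this.surface = surface;
        this.partOfSpeech = partOfSpeech;
    }

    // Parses one output line of the form "surface<TAB>pos,sub1,sub2,...".
    static MecabToken parseLine(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            throw new IllegalArgumentException("not a token line: " + line);
        }
        String features = line.substring(tab + 1);
        int comma = features.indexOf(',');
        String pos = comma < 0 ? features : features.substring(0, comma);
        return new MecabToken(line.substring(0, tab), pos);
    }

    // Parses a whole analysis, skipping blank lines and the EOS marker.
    static List<MecabToken> parse(String output) {
        return Arrays.stream(output.split("\\R"))
                .filter(line -> !line.isEmpty() && !line.equals("EOS"))
                .map(MecabToken::parseLine)
                .collect(Collectors.toList());
    }
}
```

The resulting tokens are the input for the part-of-speech and word type features: each token contributes its `partOfSpeech` to one bag and its `surface` to the token statistics.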
Feature Extraction: Example
Character counts
Hiragana: 20
Katakana: 6
Kanji: 3
Alpha: 0
Digits: 0
Marks (!,?): 0
Token type counts
Hiragana: 8
Katakana: 2
Kanji: 3
Alpha: 0
Digits: 0
Marks: 0
Token length
1: 5
2: 2
3: 4
4: 2
5+: 0
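The character counts can be sketched with the JDK's `Character.UnicodeScript`, which classifies each code point as HIRAGANA, KATAKANA, HAN (kanji), LATIN and so on. This is one possible approach and not necessarily how the project counted; the class name `CharacterCounts` is hypothetical. Punctuation such as 、 and 。 falls under the COMMON script, so it is not counted as kana or kanji.

```java
import java.lang.Character.UnicodeScript;
import java.util.EnumMap;
import java.util.Map;

final class CharacterCounts {
    // Counts code points per Unicode script.
    static Map<UnicodeScript, Integer> byScript(String text) {
        Map<UnicodeScript, Integer> counts = new EnumMap<>(UnicodeScript.class);
        text.codePoints()
                .mapToObj(UnicodeScript::of)
                .forEach(script -> counts.merge(script, 1, Integer::sum));
        return counts;
    }

    // Number of code points in the given script, 0 if none occur.
    static int count(String text, UnicodeScript script) {
        return byScript(text).getOrDefault(script, 0);
    }

    public static void main(String[] args) {
        String post = "マックのポテト揚げたてでお願いしたのに、揚げたてじゃなかった。";
        System.out.println("Hiragana: " + count(post, UnicodeScript.HIRAGANA));
        System.out.println("Katakana: " + count(post, UnicodeScript.KATAKANA));
        System.out.println("Kanji: " + count(post, UnicodeScript.HAN));
        System.out.println("Alpha: " + count(post, UnicodeScript.LATIN));
    }
}
```

For the example sentence this should reproduce the character counts on the slide (20 hiragana, 6 katakana, 3 kanji, 0 alphabet). The token type counts and token lengths need the tokenizer output in addition to the raw text.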
Training and evaluation of our model
We reached the project goal!
● DataRobot’s best model
○ eXtreme Gradient Boosted Trees
○ RMSE: 3.54
○ MSE: 12.53
Business result:
● Higher quality evaluation than rules
● Operations staff no longer need to check posts manually
● We can validate points every day
Our result: 3.5 point difference from human judgement
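How the two reported numbers relate: RMSE is the square root of MSE (√12.53 ≈ 3.54), and because RMSE is in the same unit as the labels (points), it is the figure to compare against the 5-point goal. A sketch of both metrics, with a hypothetical class name `RegressionMetrics`:

```java
import java.util.stream.IntStream;

final class RegressionMetrics {
    // Mean squared error between staff-assigned and predicted point values.
    static double mse(double[] actual, double[] predicted) {
        if (actual.length != predicted.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        return IntStream.range(0, actual.length)
                .mapToDouble(i -> {
                    double error = predicted[i] - actual[i];
                    return error * error;
                })
                .average()
                .orElseThrow(() -> new IllegalArgumentException("no samples"));
    }

    // Root mean squared error: same unit as the labels (points).
    static double rmse(double[] actual, double[] predicted) {
        return Math.sqrt(mse(actual, predicted));
    }
}
```

Because errors are squared before averaging, a few badly mispriced posts raise RMSE more than many small misses do.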
Deployment issues
● Problem: The Prediction API was very slow (>1s / post), so we had to run it as a batch process each night.
● Goal: Make predictions locally with low latency, without losing the good prediction performance we already have.
We solved this problem using the excellent open source, distributed machine learning library H2O by H2O.ai.
Co-founder: Cliff Click, who made the Java HotSpot Server Compiler
Post point prediction system: Current system
Fuman DB → Feature Extraction → CSV format (posts, label) → Train Production Model: H2O → Prediction POJO → Prediction Service
Train Production Model: H2O
● Train/Test split
● Categorical features transformation
● Distributed, fast and state-of-the-art algorithms
● POJO prediction class generation
Fuman Webapp and Prediction Service: get new post values, make feature vectors
Overview: Making Predictions
● Use the prediction POJO generated by H2O
● For each new post, query the Prediction Service
○ Convert to vector (Double[] for H2O)
○ Get prediction from prediction POJO (Double value, round to integer)
○ Update database with predicted price
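A sketch of the "get prediction, round to integer" step. The generated H2O class is not shown in the slides, so it is hidden here behind a plain `ToDoubleFunction<double[]>` as a stand-in; `PointPredictor` is a hypothetical name, and clamping the result to the 0-25 point range is an assumption based on the range given in the problem statement.

```java
import java.util.function.ToDoubleFunction;

final class PointPredictor {
    static final int MIN_POINTS = 0;
    static final int MAX_POINTS = 25;

    // Stand-in for the generated prediction POJO: feature vector in, raw score out.
    private final ToDoubleFunction<double[]> model;

    PointPredictor(ToDoubleFunction<double[]> model) {
        this.model = model;
    }

    // Rounds the raw regression output to whole points and keeps it in the valid range.
    int predictPoints(double[] featureVector) {
        long rounded = Math.round(model.applyAsDouble(featureVector));
        return (int) Math.max(MIN_POINTS, Math.min(MAX_POINTS, rounded));
    }
}
```

Because the model runs in the same JVM as the service, each prediction is a method call rather than a network request, which is what removes the per-post latency of a remote API.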
We reached the business goal!
Project goal: Get similar performance from H2O as from DataRobot
H2O is not ideal for exploring different models and features, but for production, it is FAST with similar predictive performance. It is implemented in pure Java (Github).
● H2O: Train a new model for production
○ GBM (Gradient Boosting Machine)
○ MSE: 12.8
● DataRobot’s best model
○ eXtreme Gradient Boosted Trees
○ RMSE: 3.54
○ MSE: 12.53
Real world ML loves Java!
● Java is a top choice for making production machine learning systems
● Java 8’s benefits make Java fun and relevant again
● Integration in a Java web application was not hard
● Java is not a good choice for experimentation
○ Start with a Python prototype with Scikit-learn
○ Use a Machine Learning service like DataRobot.com
You can use ML in your projects!
● Web API services are like a personal data scientist
○ No need for a Data Scientist for simple uses of ML
○ But harder datasets will need expertise
● Real world ML projects need engineers:
○ Get data to train a good model (log files, sales results, mail campaign results, …)
○ Transform data into input for ML library or web service
○ Deploy and integrate into production
● Most steps are just normal programming
○ Get data from DB
○ Transform data into a CSV
○ Call a REST API or Java POJO to make predictions
○ Integrate with the system that needs predictions
Questions?
Live code
Feature engineering with streams and lambdas
The goal is to take raw data from the DB and create arrays of numerical or categorical features.
1. Get Fuman user post data from DB -> UserPost
2. Learn the vocabulary of word types across all user posts
3. Create the dataset:
a. For each post,
i. Add the statistics features
ii. Add the word type features
4. Transform to CSV output (for DataRobot)
Instances are Weka SparseInstance (sparse vectors for memory efficiency), but in retrospect, a specialized vector library would have been better, I think. Weka is a terrible production library.
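Steps 2 to 4 of the outline can be sketched with plain collections and streams instead of Weka. This is not the live-demo code: `WordTypeCsv` is a hypothetical name, posts are assumed to be already reduced to lists of word type strings, the statistics features are left out, and the CSV writer assumes values contain no commas or quotes.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

final class WordTypeCsv {
    // Step 2: assign a column index to every distinct word type, in first-seen order.
    static Map<String, Integer> learnVocabulary(List<List<String>> posts) {
        Map<String, Integer> vocabulary = new LinkedHashMap<>();
        posts.stream()
                .flatMap(List::stream)
                .forEach(type -> vocabulary.putIfAbsent(type, vocabulary.size()));
        return vocabulary;
    }

    // Step 3: one count per vocabulary column for a single post.
    static int[] countVector(List<String> post, Map<String, Integer> vocabulary) {
        int[] counts = new int[vocabulary.size()];
        post.stream()
                .map(vocabulary::get)
                .filter(Objects::nonNull) // ignore word types not seen when learning
                .forEach(index -> counts[index]++);
        return counts;
    }

    // Step 4: header line plus one comma-separated row per post.
    static String toCsv(List<List<String>> posts, Map<String, Integer> vocabulary) {
        String header = String.join(",", vocabulary.keySet());
        String rows = posts.stream()
                .map(post -> Arrays.stream(countVector(post, vocabulary))
                        .mapToObj(n -> Integer.toString(n))
                        .collect(Collectors.joining(",")))
                .collect(Collectors.joining("\n"));
        return header + "\n" + rows;
    }
}
```

The vocabulary learned at training time has to be reused unchanged at prediction time so that each column keeps the same meaning; a dense `int[]` is used here for brevity, whereas the original used sparse vectors for memory efficiency.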

Weitere ähnliche Inhalte

Was ist angesagt?

Big Data Analytics-Open Source Toolkits
Big Data Analytics-Open Source ToolkitsBig Data Analytics-Open Source Toolkits
Big Data Analytics-Open Source ToolkitsDataWorks Summit
 
Spark and machine learning in microservices architecture
Spark and machine learning in microservices architectureSpark and machine learning in microservices architecture
Spark and machine learning in microservices architectureStepan Pushkarev
 
Distributed Inference on Large Datasets Using Apache MXNet and Apache Spark ...
 Distributed Inference on Large Datasets Using Apache MXNet and Apache Spark ... Distributed Inference on Large Datasets Using Apache MXNet and Apache Spark ...
Distributed Inference on Large Datasets Using Apache MXNet and Apache Spark ...Databricks
 
Productive Use of the Apache Spark Prompt with Sam Penrose
Productive Use of the Apache Spark Prompt with Sam PenroseProductive Use of the Apache Spark Prompt with Sam Penrose
Productive Use of the Apache Spark Prompt with Sam PenroseDatabricks
 
Simplify Distributed TensorFlow Training for Fast Image Categorization at Sta...
Simplify Distributed TensorFlow Training for Fast Image Categorization at Sta...Simplify Distributed TensorFlow Training for Fast Image Categorization at Sta...
Simplify Distributed TensorFlow Training for Fast Image Categorization at Sta...Databricks
 
Very large scale distributed deep learning on BigDL
Very large scale distributed deep learning on BigDLVery large scale distributed deep learning on BigDL
Very large scale distributed deep learning on BigDLDESMOND YUEN
 
Scaling Machine Learning To Billions Of Parameters
Scaling Machine Learning To Billions Of ParametersScaling Machine Learning To Billions Of Parameters
Scaling Machine Learning To Billions Of ParametersJen Aman
 
MapR and Machine Learning Primer
MapR and Machine Learning PrimerMapR and Machine Learning Primer
MapR and Machine Learning PrimerMathieu Dumoulin
 
Data Agility—A Journey to Advanced Analytics and Machine Learning at Scale
Data Agility—A Journey to Advanced Analytics and Machine Learning at ScaleData Agility—A Journey to Advanced Analytics and Machine Learning at Scale
Data Agility—A Journey to Advanced Analytics and Machine Learning at ScaleDatabricks
 
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...Mathieu Dumoulin
 
Fast and Reliable Apache Spark SQL Releases
Fast and Reliable Apache Spark SQL ReleasesFast and Reliable Apache Spark SQL Releases
Fast and Reliable Apache Spark SQL ReleasesDataWorks Summit
 
Validating Big Data Jobs—Stopping Failures Before Production on Apache Spark...
 Validating Big Data Jobs—Stopping Failures Before Production on Apache Spark... Validating Big Data Jobs—Stopping Failures Before Production on Apache Spark...
Validating Big Data Jobs—Stopping Failures Before Production on Apache Spark...Databricks
 
SparkApplicationDevMadeEasy_Spark_Summit_2015
SparkApplicationDevMadeEasy_Spark_Summit_2015SparkApplicationDevMadeEasy_Spark_Summit_2015
SparkApplicationDevMadeEasy_Spark_Summit_2015Lance Co Ting Keh
 
Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...
Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...
Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...Databricks
 
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksLessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksDatabricks
 
Handling Data Skew Adaptively In Spark Using Dynamic Repartitioning
Handling Data Skew Adaptively In Spark Using Dynamic RepartitioningHandling Data Skew Adaptively In Spark Using Dynamic Repartitioning
Handling Data Skew Adaptively In Spark Using Dynamic RepartitioningSpark Summit
 
Deep Learning Pipelines for High Energy Physics using Apache Spark with Distr...
Deep Learning Pipelines for High Energy Physics using Apache Spark with Distr...Deep Learning Pipelines for High Energy Physics using Apache Spark with Distr...
Deep Learning Pipelines for High Energy Physics using Apache Spark with Distr...Databricks
 
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDSAccelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDSDatabricks
 
Thing you didn't know you could do in Spark
Thing you didn't know you could do in SparkThing you didn't know you could do in Spark
Thing you didn't know you could do in SparkSnappyData
 

Was ist angesagt? (20)

Big Data Analytics-Open Source Toolkits
Big Data Analytics-Open Source ToolkitsBig Data Analytics-Open Source Toolkits
Big Data Analytics-Open Source Toolkits
 
Spark and machine learning in microservices architecture
Spark and machine learning in microservices architectureSpark and machine learning in microservices architecture
Spark and machine learning in microservices architecture
 
Distributed Inference on Large Datasets Using Apache MXNet and Apache Spark ...
 Distributed Inference on Large Datasets Using Apache MXNet and Apache Spark ... Distributed Inference on Large Datasets Using Apache MXNet and Apache Spark ...
Distributed Inference on Large Datasets Using Apache MXNet and Apache Spark ...
 
Productive Use of the Apache Spark Prompt with Sam Penrose
Productive Use of the Apache Spark Prompt with Sam PenroseProductive Use of the Apache Spark Prompt with Sam Penrose
Productive Use of the Apache Spark Prompt with Sam Penrose
 
Simplify Distributed TensorFlow Training for Fast Image Categorization at Sta...
Simplify Distributed TensorFlow Training for Fast Image Categorization at Sta...Simplify Distributed TensorFlow Training for Fast Image Categorization at Sta...
Simplify Distributed TensorFlow Training for Fast Image Categorization at Sta...
 
Very large scale distributed deep learning on BigDL
Very large scale distributed deep learning on BigDLVery large scale distributed deep learning on BigDL
Very large scale distributed deep learning on BigDL
 
Scaling Machine Learning To Billions Of Parameters
Scaling Machine Learning To Billions Of ParametersScaling Machine Learning To Billions Of Parameters
Scaling Machine Learning To Billions Of Parameters
 
MapR and Machine Learning Primer
MapR and Machine Learning PrimerMapR and Machine Learning Primer
MapR and Machine Learning Primer
 
Data Agility—A Journey to Advanced Analytics and Machine Learning at Scale
Data Agility—A Journey to Advanced Analytics and Machine Learning at ScaleData Agility—A Journey to Advanced Analytics and Machine Learning at Scale
Data Agility—A Journey to Advanced Analytics and Machine Learning at Scale
 
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
 
Fast and Reliable Apache Spark SQL Releases
Fast and Reliable Apache Spark SQL ReleasesFast and Reliable Apache Spark SQL Releases
Fast and Reliable Apache Spark SQL Releases
 
Validating Big Data Jobs—Stopping Failures Before Production on Apache Spark...
 Validating Big Data Jobs—Stopping Failures Before Production on Apache Spark... Validating Big Data Jobs—Stopping Failures Before Production on Apache Spark...
Validating Big Data Jobs—Stopping Failures Before Production on Apache Spark...
 
SparkApplicationDevMadeEasy_Spark_Summit_2015
SparkApplicationDevMadeEasy_Spark_Summit_2015SparkApplicationDevMadeEasy_Spark_Summit_2015
SparkApplicationDevMadeEasy_Spark_Summit_2015
 
Rapids: Data Science on GPUs
Rapids: Data Science on GPUsRapids: Data Science on GPUs
Rapids: Data Science on GPUs
 
Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...
Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...
Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...
 
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksLessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
 
Handling Data Skew Adaptively In Spark Using Dynamic Repartitioning
Handling Data Skew Adaptively In Spark Using Dynamic RepartitioningHandling Data Skew Adaptively In Spark Using Dynamic Repartitioning
Handling Data Skew Adaptively In Spark Using Dynamic Repartitioning
 
Deep Learning Pipelines for High Energy Physics using Apache Spark with Distr...
Deep Learning Pipelines for High Energy Physics using Apache Spark with Distr...Deep Learning Pipelines for High Energy Physics using Apache Spark with Distr...
Deep Learning Pipelines for High Energy Physics using Apache Spark with Distr...
 
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDSAccelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
 
Thing you didn't know you could do in Spark
Thing you didn't know you could do in SparkThing you didn't know you could do in Spark
Thing you didn't know you could do in Spark
 

Andere mochten auch

よくある業務開発の自動化事情 #jjug_ccc #ccc_cd3
よくある業務開発の自動化事情 #jjug_ccc #ccc_cd3よくある業務開発の自動化事情 #jjug_ccc #ccc_cd3
よくある業務開発の自動化事情 #jjug_ccc #ccc_cd3irof N
 
【こっそり始める】Javaプログラマコーディングマイグレーション
【こっそり始める】Javaプログラマコーディングマイグレーション【こっそり始める】Javaプログラマコーディングマイグレーション
【こっそり始める】Javaプログラマコーディングマイグレーションyy yank
 
プログラム初心者がWebサービスをリリースして運営するまで
プログラム初心者がWebサービスをリリースして運営するまでプログラム初心者がWebサービスをリリースして運営するまで
プログラム初心者がWebサービスをリリースして運営するまでTomoaki Iwasaki
 
日本 Java ユーザーグループ JJUG CCC 2015 Fall by ソラコム 片山
日本 Java ユーザーグループ JJUG CCC 2015 Fall  by ソラコム 片山 日本 Java ユーザーグループ JJUG CCC 2015 Fall  by ソラコム 片山
日本 Java ユーザーグループ JJUG CCC 2015 Fall by ソラコム 片山 SORACOM,INC
 
Javaにおけるネイティブコード連携の各種手法の紹介
Javaにおけるネイティブコード連携の各種手法の紹介Javaにおけるネイティブコード連携の各種手法の紹介
Javaにおけるネイティブコード連携の各種手法の紹介khisano
 
Java8 Stream APIとApache SparkとAsakusa Frameworkの類似点・相違点
Java8 Stream APIとApache SparkとAsakusa Frameworkの類似点・相違点Java8 Stream APIとApache SparkとAsakusa Frameworkの類似点・相違点
Java8 Stream APIとApache SparkとAsakusa Frameworkの類似点・相違点hishidama
 
Java8移行から始めた技術的負債との戦い(jjug ccc 2015 fall)
Java8移行から始めた技術的負債との戦い(jjug ccc 2015 fall)Java8移行から始めた技術的負債との戦い(jjug ccc 2015 fall)
Java8移行から始めた技術的負債との戦い(jjug ccc 2015 fall)sogdice
 
デバッガのしくみ(JDI)を学んでみよう
デバッガのしくみ(JDI)を学んでみようデバッガのしくみ(JDI)を学んでみよう
デバッガのしくみ(JDI)を学んでみようfukai_yas
 
Reactive Webアプリケーション - そしてSpring 5へ #jjug_ccc #ccc_ef3
Reactive Webアプリケーション - そしてSpring 5へ #jjug_ccc #ccc_ef3Reactive Webアプリケーション - そしてSpring 5へ #jjug_ccc #ccc_ef3
Reactive Webアプリケーション - そしてSpring 5へ #jjug_ccc #ccc_ef3Toshiaki Maki
 
Java EEハンズオン資料 JJUG CCC 2015 Fall
Java EEハンズオン資料 JJUG CCC 2015 FallJava EEハンズオン資料 JJUG CCC 2015 Fall
Java EEハンズオン資料 JJUG CCC 2015 FallMasatoshi Tada
 
マイクロサービスアーキテクチャ - アーキテクチャ設計の歴史を背景に
マイクロサービスアーキテクチャ - アーキテクチャ設計の歴史を背景にマイクロサービスアーキテクチャ - アーキテクチャ設計の歴史を背景に
マイクロサービスアーキテクチャ - アーキテクチャ設計の歴史を背景にYusuke Suzuki
 
タイムマシン採用:明日のエンタープライズJavaの世界を予想する -Java EE7/クラウド/Docker/etc.-
タイムマシン採用:明日のエンタープライズJavaの世界を予想する -Java EE7/クラウド/Docker/etc.-タイムマシン採用:明日のエンタープライズJavaの世界を予想する -Java EE7/クラウド/Docker/etc.-
タイムマシン採用:明日のエンタープライズJavaの世界を予想する -Java EE7/クラウド/Docker/etc.-Takakiyo Tanaka
 
Getting start Java EE Action-Based MVC with Thymeleaf
Getting start Java EE Action-Based MVC with ThymeleafGetting start Java EE Action-Based MVC with Thymeleaf
Getting start Java EE Action-Based MVC with ThymeleafMasatoshi Tada
 
VMの歩む道。 Dalvik、ART、そしてJava VM
VMの歩む道。 Dalvik、ART、そしてJava VMVMの歩む道。 Dalvik、ART、そしてJava VM
VMの歩む道。 Dalvik、ART、そしてJava VMyy yank
 
Java8移行は怖くない~エンタープライズ案件でのJava8移行事例~
Java8移行は怖くない~エンタープライズ案件でのJava8移行事例~Java8移行は怖くない~エンタープライズ案件でのJava8移行事例~
Java8移行は怖くない~エンタープライズ案件でのJava8移行事例~Hiroyuki Ohnaka
 
Kotlin is charming; The reasons Java engineers should start Kotlin.
Kotlin is charming; The reasons Java engineers should start Kotlin.Kotlin is charming; The reasons Java engineers should start Kotlin.
Kotlin is charming; The reasons Java engineers should start Kotlin.JustSystems Corporation
 
U-NEXT学生インターン、過激なJavaの学び方と過激な要求
U-NEXT学生インターン、過激なJavaの学び方と過激な要求U-NEXT学生インターン、過激なJavaの学び方と過激な要求
U-NEXT学生インターン、過激なJavaの学び方と過激な要求hajime funaki
 
Java libraries you can't afford to miss
Java libraries you can't afford to missJava libraries you can't afford to miss
Java libraries you can't afford to missAndres Almiray
 
2017spring jjug ccc_f2
2017spring jjug ccc_f22017spring jjug ccc_f2
2017spring jjug ccc_f2Kazuhiro Wada
 

Andere mochten auch (20)

よくある業務開発の自動化事情 #jjug_ccc #ccc_cd3
よくある業務開発の自動化事情 #jjug_ccc #ccc_cd3よくある業務開発の自動化事情 #jjug_ccc #ccc_cd3
よくある業務開発の自動化事情 #jjug_ccc #ccc_cd3
 
【こっそり始める】Javaプログラマコーディングマイグレーション
【こっそり始める】Javaプログラマコーディングマイグレーション【こっそり始める】Javaプログラマコーディングマイグレーション
【こっそり始める】Javaプログラマコーディングマイグレーション
 
プログラム初心者がWebサービスをリリースして運営するまで
プログラム初心者がWebサービスをリリースして運営するまでプログラム初心者がWebサービスをリリースして運営するまで
プログラム初心者がWebサービスをリリースして運営するまで
 
日本 Java ユーザーグループ JJUG CCC 2015 Fall by ソラコム 片山
日本 Java ユーザーグループ JJUG CCC 2015 Fall  by ソラコム 片山 日本 Java ユーザーグループ JJUG CCC 2015 Fall  by ソラコム 片山
日本 Java ユーザーグループ JJUG CCC 2015 Fall by ソラコム 片山
 
Javaにおけるネイティブコード連携の各種手法の紹介
Javaにおけるネイティブコード連携の各種手法の紹介Javaにおけるネイティブコード連携の各種手法の紹介
Javaにおけるネイティブコード連携の各種手法の紹介
 
Java8 Stream APIとApache SparkとAsakusa Frameworkの類似点・相違点
Java8 Stream APIとApache SparkとAsakusa Frameworkの類似点・相違点Java8 Stream APIとApache SparkとAsakusa Frameworkの類似点・相違点
Java8 Stream APIとApache SparkとAsakusa Frameworkの類似点・相違点
 
Java8移行から始めた技術的負債との戦い(jjug ccc 2015 fall)
Java8移行から始めた技術的負債との戦い(jjug ccc 2015 fall)Java8移行から始めた技術的負債との戦い(jjug ccc 2015 fall)
Java8移行から始めた技術的負債との戦い(jjug ccc 2015 fall)
 
デバッガのしくみ(JDI)を学んでみよう
デバッガのしくみ(JDI)を学んでみようデバッガのしくみ(JDI)を学んでみよう
デバッガのしくみ(JDI)を学んでみよう
 
Reactive Webアプリケーション - そしてSpring 5へ #jjug_ccc #ccc_ef3
Reactive Webアプリケーション - そしてSpring 5へ #jjug_ccc #ccc_ef3Reactive Webアプリケーション - そしてSpring 5へ #jjug_ccc #ccc_ef3
Reactive Webアプリケーション - そしてSpring 5へ #jjug_ccc #ccc_ef3
 
Java EEハンズオン資料 JJUG CCC 2015 Fall
Java EEハンズオン資料 JJUG CCC 2015 FallJava EEハンズオン資料 JJUG CCC 2015 Fall
Java EEハンズオン資料 JJUG CCC 2015 Fall
 
マイクロサービスアーキテクチャ - アーキテクチャ設計の歴史を背景に
マイクロサービスアーキテクチャ - アーキテクチャ設計の歴史を背景にマイクロサービスアーキテクチャ - アーキテクチャ設計の歴史を背景に
マイクロサービスアーキテクチャ - アーキテクチャ設計の歴史を背景に
 
タイムマシン採用:明日のエンタープライズJavaの世界を予想する -Java EE7/クラウド/Docker/etc.-
タイムマシン採用:明日のエンタープライズJavaの世界を予想する -Java EE7/クラウド/Docker/etc.-タイムマシン採用:明日のエンタープライズJavaの世界を予想する -Java EE7/クラウド/Docker/etc.-
タイムマシン採用:明日のエンタープライズJavaの世界を予想する -Java EE7/クラウド/Docker/etc.-
 
Getting start Java EE Action-Based MVC with Thymeleaf
Getting start Java EE Action-Based MVC with ThymeleafGetting start Java EE Action-Based MVC with Thymeleaf
Getting start Java EE Action-Based MVC with Thymeleaf
 
VMの歩む道。 Dalvik、ART、そしてJava VM
VMの歩む道。 Dalvik、ART、そしてJava VMVMの歩む道。 Dalvik、ART、そしてJava VM
VMの歩む道。 Dalvik、ART、そしてJava VM
 
Java8移行は怖くない~エンタープライズ案件でのJava8移行事例~
Java8移行は怖くない~エンタープライズ案件でのJava8移行事例~Java8移行は怖くない~エンタープライズ案件でのJava8移行事例~
Java8移行は怖くない~エンタープライズ案件でのJava8移行事例~
 
Kotlin is charming; The reasons Java engineers should start Kotlin.
Kotlin is charming; The reasons Java engineers should start Kotlin.Kotlin is charming; The reasons Java engineers should start Kotlin.
Kotlin is charming; The reasons Java engineers should start Kotlin.
 
U-NEXT学生インターン、過激なJavaの学び方と過激な要求
U-NEXT学生インターン、過激なJavaの学び方と過激な要求U-NEXT学生インターン、過激なJavaの学び方と過激な要求
U-NEXT学生インターン、過激なJavaの学び方と過激な要求
 
Java libraries you can't afford to miss
Java libraries you can't afford to missJava libraries you can't afford to miss
Java libraries you can't afford to miss
 
Jjug ccc
Jjug cccJjug ccc
Jjug ccc
 
2017spring jjug ccc_f2
2017spring jjug ccc_f22017spring jjug ccc_f2
2017spring jjug ccc_f2
 

Ähnlich wie Real World Machine Learning in Java 8 at Fumankaitori.com

Machine learning at scale - Webinar By zekeLabs
Machine learning at scale - Webinar By zekeLabsMachine learning at scale - Webinar By zekeLabs
Machine learning at scale - Webinar By zekeLabszekeLabs Technologies
 
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning Infrastructure
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning InfrastructureML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning Infrastructure
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning InfrastructureFei Chen
 
AI hype or reality
AI  hype or realityAI  hype or reality
AI hype or realityAwantik Das
 
Moving from BI to AI : For decision makers
Moving from BI to AI : For decision makersMoving from BI to AI : For decision makers
Moving from BI to AI : For decision makerszekeLabs Technologies
 
From science to engineering, the process to build a machine learning product
From science to engineering, the process to build a machine learning productFrom science to engineering, the process to build a machine learning product
From science to engineering, the process to build a machine learning productBruce Kuo
 
Bridging the gap in enterprise AI
Bridging the gap in enterprise AIBridging the gap in enterprise AI
Bridging the gap in enterprise AIMax Pumperla
 
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdfPyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdfJim Dowling
 
BSSML16 L10. Summary Day 2 Sessions
BSSML16 L10. Summary Day 2 SessionsBSSML16 L10. Summary Day 2 Sessions
BSSML16 L10. Summary Day 2 SessionsBigML, Inc
 
Enabling Scalable Data Science Pipeline with Mlflow at Thermo Fisher Scientific
Enabling Scalable Data Science Pipeline with Mlflow at Thermo Fisher ScientificEnabling Scalable Data Science Pipeline with Mlflow at Thermo Fisher Scientific
Enabling Scalable Data Science Pipeline with Mlflow at Thermo Fisher ScientificDatabricks
 
Slides-Артем Коваль-Cloud-Native MLOps Framework - DataFest 2021.pdf
Slides-Артем Коваль-Cloud-Native MLOps Framework - DataFest 2021.pdfSlides-Артем Коваль-Cloud-Native MLOps Framework - DataFest 2021.pdf
Slides-Артем Коваль-Cloud-Native MLOps Framework - DataFest 2021.pdfvitm11
 
Vitalii Bondarenko and Eugene Berko "Cloud AI Platform as an accelerator of e...
Vitalii Bondarenko and Eugene Berko "Cloud AI Platform as an accelerator of e...Vitalii Bondarenko and Eugene Berko "Cloud AI Platform as an accelerator of e...
Vitalii Bondarenko and Eugene Berko "Cloud AI Platform as an accelerator of e...Lviv Startup Club
 
From Data Science to MLOps
From Data Science to MLOpsFrom Data Science to MLOps
From Data Science to MLOpsCarl W. Handlin
 
Reproducibility and experiments management in Machine Learning
Reproducibility and experiments management in Machine Learning Reproducibility and experiments management in Machine Learning
Reproducibility and experiments management in Machine Learning Mikhail Rozhkov
 
World Artificial Intelligence Conference Shanghai 2018
World Artificial Intelligence Conference Shanghai 2018World Artificial Intelligence Conference Shanghai 2018
World Artificial Intelligence Conference Shanghai 2018Adam Gibson
 
Sf big analytics: bighead
Sf big analytics: bigheadSf big analytics: bighead
Sf big analytics: bigheadChester Chen
 
The Data Science Process - Do we need it and how to apply?
The Data Science Process - Do we need it and how to apply?The Data Science Process - Do we need it and how to apply?
The Data Science Process - Do we need it and how to apply?Ivo Andreev
 

Ähnlich wie Real World Machine Learning in Java 8 at Fumankaitori.com (20)

DevOps Days Rockies MLOps
DevOps Days Rockies MLOpsDevOps Days Rockies MLOps
DevOps Days Rockies MLOps
 
Machine learning at scale - Webinar By zekeLabs
Machine learning at scale - Webinar By zekeLabsMachine learning at scale - Webinar By zekeLabs
Machine learning at scale - Webinar By zekeLabs
 
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning Infrastructure
Real World Machine Learning in Java 8 at Fumankaitori.com

  • 1. Real World Machine Learning in Java 8 at Fumankaitori.com Mathieu Dumoulin, Chief Data Scientist fumankaitori.com, Data Science Team manager at en-japan
  • 2. Today’s menu
    ● About me and 不満買取センター (Fuman Kaitori Center)
    ● The business problem: Post pricing
    ● Project Overview
      ○ Why use ML
      ○ How to use ML in projects
      ○ How we used ML in this project
    ● Results
    ● Live code (depends on time)
    ● Conclusion
  • 3. Presentation goals
    ● Machine learning is within reach of any Java engineer
    ● Java is a great programming language for real-world machine learning systems
    ● New ML APIs make it easy to focus on the problem and the data, and get a well-performing model “for free”
    ● You don’t need a PhD to use machine learning, just some self-study, good tools and libraries, and experience built one project at a time
  • 5. Google map for Quebec City here!
  • 6. My Work: Java SE, Hadoop Engineer, Data Scientist
  • 7. 不満買取センター
    ● Launched in March 2015. Provides web, Android and iOS applications.
    ● An application that collects data about people's dissatisfactions.
    ● Features:
      ○ Users can post any dissatisfaction with any product or service.
      ○ Users get points as a reward for their posts. Points can be exchanged for coupon codes on e-commerce (EC) sites.
    ● 250,000 users with 1,500,000 posts (accumulated, end of Nov 2015)
  • 8. Problem statement: post point value prediction
    ● Fuman user posts have a money value
    ● We want to give more points for “good” posts
    ● At first, operations staff checked all posts, but they can’t check 10,000 posts each day...
    We made rules, but the point values were worse:
    ● Rules can’t check the content of the posts
    ● Rules always miss something
    ● Making hundreds or thousands of rules by hand is ridiculous
  • 9. ML is the best solution for 不満買取センター
    ● ML problem: Estimate the point value of a user post (0-25)
    ● Project goal: Estimate the value of posts to within 5 points of human judgement
    ● Data: All user posts and user profile data
    ● Data with known output (labels): staff already set points for 200k posts manually
    This is a classic case of supervised learning. Predicting a price calls for a regression model because the prediction is a number, as opposed to a classification problem, which predicts which of a fixed set of classes each post belongs to.
  • 10. Real world ML project overview ● Machine Learning Workflow ● Data Scientist and Java Engineer roles ● Java for production ML ● Java 8 benefits ● Our point prediction system details ● Results
  • 11. Machine Learning Workflow
    Training: Load data → (data, labels with known result) → Extract features → (feature vectors, labels) → Train model → (predictions, labels) → Evaluate vs. business goal; iterate to find the best model.
    Prediction: Load new data → (data) → Extract features (the same as in training) → (feature vectors) → Predict using the best model → (predictions) → Act on prediction.
  • 12. Workflow for a machine learning system
    1. Set a goal with business value
    2. Get data (fuman user posts) with a price already set
    3. Transform the data for input into a machine learning algorithm
    4. Train and evaluate machine learning models until the goal is reached
    5. Deploy the best model
  • 13. Data Scientist’s role
    1. Set a goal with business value
    2. Get data (fuman user posts) with a price already set
    3. Transform the data for input into a machine learning algorithm
    4. Train and evaluate machine learning models until the goal is reached
    5. Deploy the best model
    Callouts: Choose features; Build many models
  • 14. Software Engineer’s role: implement and integrate into the production system
    1. Set a goal with business value
    2. Get data (fuman user posts) with a price already set
    3. Transform the data for input into a machine learning algorithm
    4. Train and evaluate machine learning models until the goal is reached
    5. Deploy the best model
    Callouts: Get data from the data source; Implement production code
  • 15. But we don’t have a data scientist...
  • 17. Java for production ML
    ● Easy integration with Java applications
    ● Fast (vs. Python or R)
    ● Easy to program (vs. C++)
    ● Most common enterprise programming language, with IDE support and excellent support libraries
    ● Many state-of-the-art machine learning libraries have a Java API
  • 19. Benefits of Java 8
    ● Java 8’s functional style is a very good match for ML operations
      ○ Feature extraction: data in → transform → data out
    ● Java 8’s streams and lambdas
      ○ Code is easier to understand and less verbose
    ● Easy parallel code
      ○ Faster “for free”
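The "data in → transform → data out" pattern from slide 19 can be sketched with a Java 8 stream. This is only an illustration, not the project's code: the class name `TokenLengthHistogram` is hypothetical, and the tokens are assumed to have been produced already by a tokenizer such as MeCab, with punctuation tokens dropped. It buckets token lengths the way the "Token length" feature on slide 24 does (1, 2, 3, 4, 5+), and `parallelStream()` shows how little is needed to run the same transformation in parallel.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper: counts tokens per length bucket (1, 2, 3, 4, 5 = "5 or more").
class TokenLengthHistogram {

    static Map<Integer, Long> histogram(List<String> tokens) {
        return tokens.parallelStream()
                // data in -> transform: token -> length in code points, capped at 5
                .map(t -> Math.min(t.codePointCount(0, t.length()), 5))
                // -> data out: sorted map from length bucket to number of tokens
                .collect(Collectors.groupingBy(
                        Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Tokens of the example post from slide 23, punctuation removed (an assumption).
        List<String> tokens = Arrays.asList(
                "マック", "の", "ポテト", "揚げたて", "で", "お願い", "し",
                "た", "のに", "揚げたて", "じゃ", "なかっ", "た");
        System.out.println(histogram(tokens));
    }
}
```

For the example post, the buckets should come out as on slide 24: five tokens of length 1, two of length 2, four of length 3 and two of length 4.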
  • 20. Post point prediction system: step by step
    Components: Fuman DB, Feature Extraction, DataRobot (DR) Prediction API, Prediction Service. Data passed along: posts and labels, in CSV format.
    ● Train/Test split
    ● Categorical features transformation
    ● Select best features
    ● Try many algorithms
    ● Tune algorithms
    ● Evaluate models
    ● REST Prediction API
    Iterate until results meet business goals.
  • 21. Feature Extraction details
    ● We added character and word statistics about each fuman user post
      ○ Number of hiragana, katakana, kanji, alphabet characters and words
      ○ Number of words, length of words
      ○ Ratio of hiragana, katakana, kanji, alphabet words to the number of tokens in a post
    ● User profile information
      ○ age, gender, job category, etc.
    ● Bag-of-words models:
      ○ Words using Tf-Idf, removing stopwords (これ、あれ、それ、です、など、 …; i.e. "this", "that", "is", "etc.", …)
      ○ Part-of-speech (名詞、動詞、形容詞、 …; i.e. noun, verb, adjective, …)
      ○ Word type features (hiragana word, katakana word, kanji word, …)
  • 23. Feature Example: MeCab analyzer
    マックのポテト揚げたてでお願いしたのに、揚げたてじゃなかった。
    マック  名詞,固有名詞,一般,*,*,*,マック,マック,マック
    の  助詞,連体化,*,*,*,*,の,ノ,ノ
    ポテト  名詞,一般,*,*,*,*,ポテト,ポテト,ポテト
    揚げたて  名詞,一般,*,*,*,*,揚げたて,アゲタテ,アゲタテ
    で  助詞,格助詞,一般,*,*,*,で,デ,デ
    お願い  名詞,サ変接続,*,*,*,*,お願い,オネガイ,オネガイ
    し  動詞,自立,*,*,サ変・スル,連用形,する,シ,シ
    た  助動詞,*,*,*,特殊・タ,基本形,た,タ,タ
    のに  助詞,接続助詞,*,*,*,*,のに,ノニ,ノニ
    、  記号,読点,*,*,*,*,、,、,、
    揚げたて  名詞,一般,*,*,*,*,揚げたて,アゲタテ,アゲタテ
    じゃ  助詞,副助詞,*,*,*,*,じゃ,ジャ,ジャ
    なかっ  助動詞,*,*,*,特殊・ナイ,連用タ接続,ない,ナカッ,ナカッ
    た  助動詞,*,*,*,特殊・タ,基本形,た,タ,タ
    。  記号,句点,*,*,*,*,。,。,。
    EOS
  • 24. Feature Extraction: Example
    Character counts: Hiragana: 20, Katakana: 6, Kanji: 3, Alpha: 0, Digits: 0, Marks (!,?): 0
    Token type counts: Hiragana: 8, Katakana: 2, Kanji: 3, Alpha: 0, Digits: 0, Marks: 0
    Token length: 1: 5, 2: 2, 3: 4, 4: 2, 5+: 0
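The character counts on slide 24 can be reproduced with the JDK alone. The sketch below is one possible way to compute them, not necessarily how the original system did it: the class name `ScriptCounter` is hypothetical, and it relies on `Character.UnicodeScript` to classify each code point, treating the `HAN` script as kanji.

```java
import java.lang.Character.UnicodeScript;

// Hypothetical helper: counts the characters of a post that belong to one Unicode script.
class ScriptCounter {

    static long count(String text, UnicodeScript script) {
        return text.codePoints()
                .filter(cp -> UnicodeScript.of(cp) == script)
                .count();
    }

    public static void main(String[] args) {
        // The example post from slide 23.
        String post = "マックのポテト揚げたてでお願いしたのに、揚げたてじゃなかった。";
        System.out.println("Hiragana: " + count(post, UnicodeScript.HIRAGANA));
        System.out.println("Katakana: " + count(post, UnicodeScript.KATAKANA));
        System.out.println("Kanji: " + count(post, UnicodeScript.HAN));
    }
}
```

Punctuation such as 、 and 。 belongs to the `COMMON` script, so it is not counted in any of the three groups. Note that the long vowel mark ー is also classified as `COMMON`, so posts containing it would need an extra rule if it should count as katakana.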
  • 25. Training and evaluation of our model
  • 26. We reached the project goal!
    ● DataRobot’s best model
      ○ eXtreme Gradient Boosted Trees
      ○ RMSE: 3.54
      ○ MSE: 12.53
    Our result: about 3.5 points difference from human judgement
    Business result:
    ● Higher quality evaluation than rules
    ● Operations staff no longer need to check posts manually
    ● We can validate points every day
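The two error figures on slide 26 are related: RMSE is the square root of MSE, so an MSE of 12.53 gives an RMSE of about 3.54, which is where the "3.5 point difference" comes from. A minimal sketch of both metrics, with a hypothetical class name and made-up sample values:

```java
// Hypothetical helper: mean squared error and its square root for a regression model.
class RegressionError {

    static double mse(double[] actual, double[] predicted) {
        if (actual.length == 0 || actual.length != predicted.length) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double diff = actual[i] - predicted[i];
            sum += diff * diff;
        }
        return sum / actual.length;
    }

    static double rmse(double[] actual, double[] predicted) {
        return Math.sqrt(mse(actual, predicted));
    }

    public static void main(String[] args) {
        // Made-up staff points and model predictions, for illustration only.
        double[] staffPoints = {10, 20, 5};
        double[] modelPoints = {12, 17, 5};
        System.out.println("MSE:  " + mse(staffPoints, modelPoints));
        System.out.println("RMSE: " + rmse(staffPoints, modelPoints));
        // Relation between the figures reported on the slide.
        System.out.println("sqrt(12.53) = " + Math.sqrt(12.53));
    }
}
```

Because RMSE is expressed in the same unit as the target (points), it is the figure to compare against the project goal of staying within 5 points of human judgement.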
  • 27. Deployment issues
    ● Problem: The Prediction API was very slow (>1s / post), so we had to run it as a batch process each night.
    ● What we want: Make predictions locally with low latency, without losing the good prediction performance we already have.
    We solved this problem using the excellent open source, distributed machine learning library H2O by H2O.ai. Co-founder: Cliff Click, who made the Java HotSpot Server Compiler.
  • 28. Post point prediction system: Current system
    Components: Fuman DB, Feature Extraction, Prediction Service, Prediction POJO, Fuman Webapp. Data passed along: posts and labels, in CSV format; the Fuman Webapp gets new post values; the Prediction Service makes feature vectors.
    ● Train/Test split
    ● Categorical features transformation
    ● Distributed, fast and state-of-the-art algorithms
    ● POJO prediction class generation
  • 30. Overview: Making Predictions
    ● Use the prediction POJO generated by H2O
    ● For each new post, query the Prediction Service
      ○ Convert to vector (Double[] for H2O)
      ○ Get prediction from the prediction POJO (Double value, rounded to an integer)
      ○ Update the database with the predicted price
  • 31. We reached the business goal!
    Project goal: Get similar performance from H2O as from DataRobot
    H2O is not ideal for exploring different models and features, but for production it is FAST, with similar predictive performance. It is implemented in pure Java (Github).
    ● H2O: Train a new model for production
      ○ GBM (Gradient Boosting Machine)
      ○ MSE: 12.8
    ● DataRobot’s best model
      ○ eXtreme Gradient Boosted Trees
      ○ RMSE: 3.54
      ○ MSE: 12.53
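The prediction steps on slide 30 can be sketched as follows. This is an assumption-laden illustration rather than the real service: the generated H2O POJO is replaced by a plain `ToDoubleFunction<double[]>` stand-in so the example runs without any H2O dependency, the class name `PostPointPredictor` is hypothetical, and clamping the result to the 0-25 point range from slide 9 is an added safeguard that the slides do not mention.

```java
import java.util.function.ToDoubleFunction;

// Hypothetical wrapper around a regression model that scores one feature vector.
class PostPointPredictor {

    static final int MIN_POINTS = 0;   // point range taken from slide 9
    static final int MAX_POINTS = 25;

    // Stand-in for the generated prediction POJO (an assumption, not the H2O API).
    private final ToDoubleFunction<double[]> model;

    PostPointPredictor(ToDoubleFunction<double[]> model) {
        this.model = model;
    }

    // Returns the predicted point value, rounded to an integer and kept within range.
    int predictPoints(double[] features) {
        double raw = model.applyAsDouble(features);
        long rounded = Math.round(raw);
        return (int) Math.max(MIN_POINTS, Math.min(MAX_POINTS, rounded));
    }

    public static void main(String[] args) {
        // Toy model for demonstration: points grow with the first feature.
        PostPointPredictor predictor = new PostPointPredictor(f -> 0.5 * f[0]);
        double[] featureVector = {25.0, 6.0, 3.0};
        System.out.println(predictor.predictPoints(featureVector));
    }
}
```

Keeping the model behind a small functional interface also makes the service easy to unit test, since a lambda can replace the real model in tests.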
  • 32. Real world ML loves Java!
    ● Java is a top choice for making production machine learning systems
    ● The benefits of Java 8 make Java fun and relevant again
    ● Integration in a Java web application was not hard
    ● Java is not a good choice for experimentation
      ○ Start with a Python prototype with Scikit-learn
      ○ Use a machine learning service like DataRobot.com
  • 33. You can use ML in your projects!
    ● Web API services are like a personal data scientist
      ○ No need for a data scientist for simple uses of ML
      ○ But harder datasets will need expertise
    ● Real world ML projects need engineers:
      ○ Get data to train a good model (log files, sales results, mail campaign results, …)
      ○ Transform data into input for an ML library or web service
      ○ Deploy and integrate into production
    ● Most steps are just normal programming
      ○ Get data from the DB
      ○ Transform data into a CSV
      ○ Call a REST API or Java POJO to make predictions
      ○ Integrate with the system that needs predictions
  • 36. Feature engineering with streams and lambdas
    The goal is to take raw data from the DB and create arrays of numerical or categorical features.
    1. Get Fuman user post data from DB -> UserPost
    2. Learn the vocabulary of word types across all user posts
    3. Create the dataset:
       a. For each post,
          i. Add the statistics features
          ii. Add the word types features
    4. Transform to CSV output (for DataRobot)
    Instances are Weka SparseInstance objects (sparse vectors for memory efficiency), but in retrospect a specialized vector library would probably have been better. Weka is a terrible production library.
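Steps 2 to 4 of slide 36 can be sketched with streams and lambdas. The code below is a simplified illustration under several assumptions: each post is already reduced to a list of string tags (for example word types or part-of-speech tags), the class name `PostCsvBuilder` is hypothetical, and rows are written as dense count vectors rather than the Weka `SparseInstance` objects the talk mentions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper: turns tagged posts into a CSV of per-post tag counts.
class PostCsvBuilder {

    // Step 2: learn the vocabulary (sorted distinct tags) over all posts.
    static List<String> learnVocabulary(List<List<String>> posts) {
        return posts.stream()
                .flatMap(List::stream)
                .distinct()
                .sorted()
                .collect(Collectors.toList());
    }

    // Step 3: one row per post, with one count per vocabulary entry.
    static String toCsvRow(List<String> post, List<String> vocabulary) {
        Map<String, Long> counts = post.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        return vocabulary.stream()
                .map(tag -> String.valueOf(counts.getOrDefault(tag, 0L)))
                .collect(Collectors.joining(","));
    }

    // Step 4: header line followed by one line per post.
    static String toCsv(List<List<String>> posts) {
        List<String> vocabulary = learnVocabulary(posts);
        String header = String.join(",", vocabulary);
        String rows = posts.stream()
                .map(post -> toCsvRow(post, vocabulary))
                .collect(Collectors.joining("\n"));
        return header + "\n" + rows;
    }

    public static void main(String[] args) {
        List<List<String>> posts = Arrays.asList(
                Arrays.asList("kanji", "hiragana", "hiragana"),
                Arrays.asList("katakana"));
        System.out.println(toCsv(posts));
    }
}
```

A real export would also append the statistics features and the label column, and would need to quote any values that contain commas.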