What is Jubatus?
How it works for you?
NTT SIC Hiroki Kumazaki
Jubatus is…
• A Distributed Online Machine-Learning framework
• Distributed
– Fault-Tolerance
– Scale out
• Online
– Fixed time computation
• Machine-Learning
– More than “word count”!
Architecture
• The ML model is combined with a feature extractor
[Figure: a Jubatus Server holding a Feature Extractor and a Machine Learning Model, reached through Jubatus RPC]
Architecture
• Distributed Computation
– Shared-Everything Architecture
• It’s fast and fault-tolerant!
Mix
Architecture
• It looks as if a single server is running.
Client
Jubatus RPC
Proxy
Architecture
• It looks as if a single server is running
– You can use a single local Jubatus server for development
– and a cluster of multiple Jubatus servers for production
Client
Jubatus RPC
The same RPC!
Architecture
• With heavy load…
Client
Jubatus RPC
Proxy
Architecture
• Scale out dynamically!
Client
Jubatus RPC
Proxy
Architecture
• When servers break down
– The proxy conceals the failures, so the service continues.
Client
Jubatus RPC
Proxy
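The failover idea above can be pictured with a toy proxy. This is only a sketch under assumptions: `FakeServer`, `FailoverProxy`, and the return value of `classify` are hypothetical names made up for illustration, and nothing here is taken from the actual Jubatus proxy implementation.

```python
class FakeServer:
    """Hypothetical stand-in for one Jubatus server (not a real API)."""

    def __init__(self, name, alive=True):
        self.name = name
        self.alive = alive

    def classify(self, data):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} handled {len(data)} item(s)"


class FailoverProxy:
    """Forwards each call to the first server that answers."""

    def __init__(self, servers):
        self.servers = list(servers)

    def call(self, method, *args):
        last_error = None
        for server in self.servers:
            try:
                return getattr(server, method)(*args)
            except ConnectionError as error:
                last_error = error  # hide the failure and try the next server
        raise ConnectionError("no server is available") from last_error


proxy = FailoverProxy([FakeServer("server-1", alive=False), FakeServer("server-2")])
result = proxy.call("classify", ["some source code"])
print(result)  # the client never notices that server-1 failed
```

The client only talks to the proxy, so a broken server shows up as a retry inside `call`, not as an error on the client side.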
Architecture
• Multi-language client libraries
– gem, pip, cpan, and maven packages are ready!
– Under the hood, it is essentially MessagePack-RPC.
• So you can use OCaml, Haskell, JavaScript, or Go at your own risk.
Client
Jubatus RPC
Architecture
• Many ML algorithms
– Classifier
– Recommender
– Anomaly Detection
– Clustering
– Regression
– Graph Mining
Useful!
Classifier
• Task: classification of a datum
import sys

def fib(a):
    if a == 1 or a == 0:
        return 1
    else:
        return fib(a-1) + fib(a-2)

if __name__ == "__main__":
    print(fib(int(sys.argv[1])))

def fib(a)
  if a == 1 or a == 0
    1
  else
    return fib(a-1) + fib(a-2)
  end
end

if __FILE__ == $0
  puts fib(ARGV[0].to_i)
end
Sample task: classify which programming language the code is written in
It’s Python! It’s Ruby!
Classifier
• Set configuration in the Jubatus server
[Figure: Jubatus server with two stages, Feature Extractor and Classifier]
Feature Extractor configuration:
"converter": {
  "string_types": {
    "bigram": {
      "method": "ngram",
      "char_num": "2"
    }
  },
  "string_rules": [
    {
      "key": "*",
      "type": "bigram",
      "sample_weight": "tf",
      "global_weight": "idf"
    }
  ]
}
Classifier
• Configuration JSON
– It defines the “feature vector design”
– A very important step in machine learning
"converter": {
  "string_types": {
    "bigram": {
      "method": "ngram",
      "char_num": "2"
    }
  },
  "string_rules": [
    {
      "key": "*",
      "type": "bigram",
      "sample_weight": "tf",
      "global_weight": "idf"
    }
  ]
}
Reading the configuration from top to bottom:
– settings for extracting features from strings
– define a function named “bigram”
– based on the built-in function “ngram”
– pass “2” to “ngram” to create “bigram”
– for all data,
– apply “bigram”
– feature weights are based on tf/idf (see the Wikipedia article on tf-idf)
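To make the "tf" and "idf" settings more concrete, here is a minimal sketch of textbook tf-idf weighting over character bigrams. It is an assumption-laden illustration: the helper names `char_bigrams` and `tf_idf` are made up, the formula is the common log(N / df) variant, and Jubatus may compute its weights differently (an online system cannot see the whole corpus at once).

```python
import math
from collections import Counter


def char_bigrams(text):
    """Return every pair of adjacent characters in text."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def tf_idf(documents):
    """Weight each bigram by term frequency times inverse document frequency."""
    term_counts = [Counter(char_bigrams(doc)) for doc in documents]
    # Document frequency: in how many documents does each bigram occur?
    doc_freq = Counter(key for counts in term_counts for key in counts)
    total = len(documents)
    return [
        {key: count * math.log(total / doc_freq[key]) for key, count in counts.items()}
        for counts in term_counts
    ]


weights = tf_idf(["def fib(a):", "def fib(a)"])
# Bigrams shared by both documents get weight 0; only "):" stands out.
print({key: value for key, value in weights[0].items() if value > 0})
```

Bigrams that appear in every document carry no information about the class, which is why the idf factor pushes their weight to zero.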
Classifier
• The Feature Extractor becomes a “bigram extractor”
[Figure: bigram extractor → Classifier]
Feature Extractor
• What does the bigram extractor do?
Input (source code):
import sys

def fib(a):
    if a == 1 or a == 0:
        return 1
    else:
        return fib(a-1) + fib(a-2)

if __name__ == "__main__":
    print(fib(int(sys.argv[1])))
Output of the bigram extractor (Feature Vector):
key   value
im    1
mp    1
po    1
...   ...
):    1
...   ...
de    1
ef    1
...   ...
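The extraction step can be sketched in a few lines of Python. This is only an illustration of character n-gram counting, not the Jubatus feature extractor itself; the function name `bigram_features` is hypothetical.

```python
from collections import Counter


def bigram_features(source, n=2):
    """Count each run of n consecutive characters in source."""
    return Counter(source[i:i + n] for i in range(len(source) - n + 1))


features = bigram_features("import sys\ndef fib(a):")
for key in ("im", "mp", "po", "):", "de", "ef"):
    print(repr(key), features[key])
```

Each key is a two-character window and each value is how often that window occurs, which gives a sparse key/value vector of the kind shown in the table.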
Classifier
• Train the model with feature vectors
key value
im 1
mp 1
po 1
... ...
): 1
... ...
de 1
ef 1
... ...
Classifier
key value
pu 1
ut 1
... ...
{| ...
|m 1
m| 1
{| 1
en 1
nd 1
key value
@a 1
$_ 1
... ...
my ...
su 1
ub 1
us 1
se 1
... ...
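Online training on such feature vectors can be sketched with a multiclass perceptron, one of the simplest online learners. This is a toy under stated assumptions: the class `OnlinePerceptron`, the helper `bigram_features`, and the two one-line training samples are all made up for illustration, and it is not the code Jubatus runs.

```python
from collections import Counter


def bigram_features(source):
    """Character-bigram counts, as in the feature vectors above."""
    return Counter(source[i:i + 2] for i in range(len(source) - 1))


class OnlinePerceptron:
    """Multiclass perceptron over sparse feature dictionaries."""

    def __init__(self):
        self.weights = {}  # label -> {feature key -> weight}

    def score(self, label, features):
        w = self.weights[label]
        return sum(w.get(key, 0.0) * value for key, value in features.items())

    def classify(self, features):
        # Ties go to the label that was registered first.
        return max(self.weights, key=lambda label: self.score(label, features))

    def train(self, label, features):
        self.weights.setdefault(label, {})
        predicted = self.classify(features)
        if predicted != label:  # update only on a mistake
            right, wrong = self.weights[label], self.weights[predicted]
            for key, value in features.items():
                right[key] = right.get(key, 0.0) + value
                wrong[key] = wrong.get(key, 0.0) - value


samples = [("python", "def f(a):"), ("ruby", "def f(a) end")]
model = OnlinePerceptron()
for _ in range(3):  # a few passes over the tiny training set
    for label, source in samples:
        model.train(label, bigram_features(source))

print(model.classify(bigram_features("def g(b):")))
print(model.classify(bigram_features("def g(b) end")))
```

Each `train` call touches only the features of one datum, so the cost per update does not grow with the amount of data seen so far.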
Classifier
• Set configuration in the Jubatus server
Classifier configuration:
"method" : "AROW",
"parameter" : {
  "regularization_weight" : 1.0
}
[Figure: Feature Extractor (bigram extractor) → Classifier]
Classifier Algorithms
• Perceptron
• Passive Aggressive
• Confidence Weighted
• Adaptive Regularization of Weights (AROW)
• Normal Herd
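The two configuration fragments from the slides can be assembled and checked before they are handed to a server. A minimal sketch, assuming both fragments live in one JSON object; the consistency check is a simplification added for illustration and may reject type names that a server provides out of the box.

```python
import json

# Both fragments come from the slides; combining them into one object is an assumption.
config = {
    "method": "AROW",
    "parameter": {"regularization_weight": 1.0},
    "converter": {
        "string_types": {
            "bigram": {"method": "ngram", "char_num": "2"}
        },
        "string_rules": [
            {"key": "*", "type": "bigram",
             "sample_weight": "tf", "global_weight": "idf"}
        ],
    },
}

# Sanity check: every rule should refer to a string type defined above.
defined_types = set(config["converter"]["string_types"])
for rule in config["converter"]["string_rules"]:
    if rule["type"] not in defined_types:
        raise ValueError(f"undefined string type: {rule['type']}")

text = json.dumps(config, indent=2)
print(text)
```

Building the configuration as a Python dictionary and serializing it with `json.dumps` also avoids hand-typed quoting mistakes in the JSON.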
Classifier
• Use the model for the classification task
– Jubatus finds the clues for classification
AROW
key value
si 1
il 1
... ...
{| 1
... ...
It’s
Classifier
• Use the model for the classification task
– Jubatus finds the clues for classification
AROW
key value
re 1
): 1
... ...
s[ 1
... ...
It’s
Via RPC
• Call feature extraction and classification from the client via RPC
[Figure: bigram extractor → AROW]
lang = client.classify([sourcecode])
import sys

def fib(a):
    if a == 1 or a == 0:
        return 1
    else:
        return fib(a-1) + fib(a-2)

if __name__ == "__main__":
    print(fib(int(sys.argv[1])))
key value
im 1
mp 1
po 1
... ...
): 1
... ...
de 1
ef 1
... ...
It may be
What can the classifier do?
• You can
– estimate the topic of tweets
– move spam mail to the trash automatically
– monitor server failures from syslog
– estimate a user's sentiment from blog posts
– detect malicious attacks
– find which feature is the best clue for classification
What can't the classifier do?
• You cannot
– train a model from data that has no supervised labels
– create a class without any knowledge of that class
– get a good model without proper feature design
How to use?
• see examples in
http://github.com/jubatus/jubatus-example
– gender
– shogun
– malware classification
– language detection
Recommender
• Task: which data are similar to a given datum?
Name  Star Wars  Harry Potter  Star Trek  Titanic  Frozen
John 4 3 2 2
Bob 5 3
Erika 1 3 4 5
Jack 2 5
Ann 4 5
Emily 1 4 2 5 4
Which movie should we recommend to Ann?
Recommender
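• Recommendations are based on nearest-neighbor search
[Figure: users plotted in a high-dimensional movie-rating space. Region labels: Science Fiction, Love Romance, Fantasy. Groups: Star Trek lover, StarWars lover. Users: John, Jack, Erika, Ann, Bob, Emily. Distances are marked Near and Far.]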
Recommender
• Ann and Emily are near each other
– We should recommend Frozen to Ann
Name  Star Wars  Harry Potter  Star Trek  Titanic  Frozen
Ann 4 5 ★
Emily 1 4 2 5 4
I bet Ann would like it!
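The nearest-neighbor reasoning can be sketched as follows. Emily's row is complete in the table, but the table does not say which columns Ann's and Bob's ratings belong to, so their placement here is an assumption made purely for illustration. Cosine similarity is also an assumed choice of distance, and the function names are hypothetical.

```python
import math

# Emily's row is complete in the table. Which columns Ann's and Bob's
# ratings belong to is an ASSUMPTION made only for this illustration.
ratings = {
    "Emily": {"Star Wars": 1, "Harry Potter": 4, "Star Trek": 2, "Titanic": 5, "Frozen": 4},
    "Ann": {"Harry Potter": 4, "Titanic": 5},
    "Bob": {"Star Wars": 5, "Harry Potter": 3},
}


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse rating vectors."""
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def nearest_neighbor(user):
    """Return the other user whose ratings are most similar."""
    others = (name for name in ratings if name != user)
    return max(others, key=lambda name: cosine_similarity(ratings[user], ratings[name]))


def recommend(user):
    """Suggest the neighbor's best-rated movie that the user has not rated."""
    neighbor = ratings[nearest_neighbor(user)]
    unseen = {movie: score for movie, score in neighbor.items() if movie not in ratings[user]}
    return max(unseen, key=unseen.get)


print(nearest_neighbor("Ann"), recommend("Ann"))
```

Under these assumed ratings, Ann's nearest neighbor is Emily, and the highest-rated movie Emily has seen that Ann has not is Frozen.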
Recommender with Feature Extractor
• The Recommender server consists of a Feature Extractor and a Recommender engine.
– Jubatus calculates the distance between feature vectors
[Figure: Feature Extractor → Recommender]
To define distance, the Recommender engine can use:
• MinHash
• Locality Sensitive Hashing
• Euclid Locality Sensitive Hashing
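As an illustration of the first option, here is a minimal MinHash sketch that estimates the Jaccard similarity of two bigram sets. It assumes non-empty sets and uses salted SHA-1 as a stand-in family of hash functions. All function names are hypothetical, and this is not how Jubatus implements it.

```python
import hashlib


def _hash(seed, item):
    """Deterministic 64-bit hash of item, varied by seed."""
    digest = hashlib.sha1(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(items, num_hashes=128):
    """Keep the smallest hash value of the set under each hash function."""
    return [min(_hash(seed, item) for item in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of positions where the two signatures agree."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def bigram_set(text):
    """Set of distinct character bigrams in text."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


python_a = minhash_signature(bigram_set("def fib(a): return 1"))
python_b = minhash_signature(bigram_set("def fib(b): return 1"))
print(estimate_jaccard(python_a, python_b))
```

The signatures have a fixed length no matter how large the original sets are, so comparing two data points costs the same small amount every time.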
Recommender with Feature Extractor
• Jubatus maps data into a feature space
– Distances are defined between data points
• How near or far are they?
key value
pu 1
ut 1
... ...
{| ...
|m 1
m| 1
{| 1
Feature
Extractor
key value
im 1
mp 1
... ...
... ...
"{ 1
fo 1
... ...
key value
Ma 1
ap 1
... ...
in 1
nt 1
te 1
er 1
Recommender
Ruby
Python
Java
What can Recommender do?
• You can
– create a recommendation engine for e-commerce
– calculate the similarity of tweets
– find NBA players with similar tendencies
– visualize the distance between “Star Wars” and “Star Trek”
What can't Recommender do?
• You cannot
– label data (use the classifier!)
– get a decision tree
– get Apriori-based recommendations
Anomaly Detection
• Task: Which datum is far from the others?
This One!
Anomaly Detection
• Purely distance-based detection does not work well
– We cannot choose an appropriate distance threshold
The distances are equal!
Anomaly Detection with Feature Extractor
• The anomaly detection server consists of a Feature Extractor and an anomaly detection engine.
– Jubatus finds outliers among feature vectors
[Figure: Feature Extractor → Anomaly Detection]
To define distance, the anomaly detection engine can use:
• MinHash
• Locality Sensitive Hashing
• Euclid Locality Sensitive Hashing
Anomaly Detection
• jubaanomaly can do it!
– It is based on the Local Outlier Factor (LOF) algorithm
key value
pu 1
ut 1
... ...
{| ...
|m 1
m| 1
{| 1
Feature
Extractor
key value
im 1
mp 1
... ...
... ...
"{ 1
fo 1
... ...
key value
Ma 1
ap 1
... ...
in 1
nt 1
te 1
er 1
Anomaly
Detection
Outlier!
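The Local Outlier Factor idea can be sketched in plain Python. This is a brute-force illustration on made-up 2-D points, assuming exact Euclidean distances and no duplicate points. The function name `lof_scores` is hypothetical, and jubaanomaly itself uses approximate distances rather than this code.

```python
import math


def lof_scores(points, k=2):
    """Local Outlier Factor of every point, computed by brute force."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # k nearest neighbors of each point, excluding the point itself
    neighbors = [
        sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for i in range(n)
    ]
    # distance from each point to its k-th nearest neighbor
    k_distance = [dist[i][neighbors[i][-1]] for i in range(n)]

    def local_reachability_density(i):
        reach = [max(k_distance[j], dist[i][j]) for j in neighbors[i]]
        return len(reach) / sum(reach)

    lrd = [local_reachability_density(i) for i in range(n)]
    # LOF: average density of the neighbors divided by the point's own density
    return [
        sum(lrd[j] for j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]


points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]  # illustrative data only
scores = lof_scores(points)
for point, score in zip(points, scores):
    print(point, round(score, 2))
```

A score near 1 means a point is about as dense as its neighbors. The isolated point (5, 5) gets a much larger score because its neighbors sit in a far denser region, so no fixed distance threshold is needed.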
What can Anomaly Detection do?
• You might be able to
– find outliers
– grasp the trend and get an overview of the current data stream
– detect or predict server failures
– protect Web services from zero-day attacks
What can't Anomaly Detection do?
• You cannot
– know the cluster distribution of the data
– find every kind of outlier with 100% accuracy
– easily understand how each outlier occurs
– know why a datum is assigned a high outlier score
Conclusion
• Jubatus has feature extractors embedded alongside its algorithms.
• Users should configure both the feature extractor and the algorithm properly.
• Clients use the configured machine learning via Jubatus RPC.
• Classifier, Recommender, and Anomaly Detection may be useful for your task.
DEMO
• I will try running jubatus-example.

What is jubatus? How it works for you?

  • 1. What is Jubatus? How it works for you? NTT SIC Hiroki Kumazaki
  • 2. Jubatus is… • A Distributed Online Machine-Learning framework • Distributed – Fault-Tolerance – Scale out • Online – Fixed time computation • Machine-Learning – More than “word count”!
  • 3. Architecture • ML model is combined with feature-extractor Machine Learning Model Feature Extractor Jubatus Server Jubatus RPC
  • 4. Architecture • Distributed Computation – Shared-Everything Architecture • It’s fast and fault-tolerant! Mix
  • 5. Architecture • It looks as if one server running. Client Jubatus RPC Proxy
  • 6. Architecture • It looks as if one server running – You can use single local Jubatus server for develop – Multiple Jubatus server cluster for production Client Jubatus RPC The same RPC!
  • 7. Architecture • With heavy load… Client Jubatus RPC Proxy
  • 9. Architecture • Whenever servers break down – Proxy conceals failures, so the service will continue. Client Jubatus RPC Proxy
  • 10. Architecture • Multilanguage client library – gem, pip, cpan, maven Ready! – It essentially uses a messagepack-rpc. • So you can use OCaml, Haskell, JavaScript, Go with your own risk. Client Jubatus RPC
  • 11. Architecture • Many ML algorithms – Classifier – Recommender – Anomaly Detection – Clustering – Regression – Graph Mining Useful!
  • 12. Classifier • Task: Classification of Datum import sys def fib(a): if a == 1 or a == 0: return 1 else: return fib(a-1) + fib(a-2) if __name__ == “__main__”: print(fib(int(sys.argv[1]))) def fib(a) if a == 1 or a == 0 1 else return fib(a-1) + fib(a-2) end end if __FILE__ == $0 puts fib(ARGV[0].to_i) end Sample Task: Classify what programming language used It’s It’s
  • 13. Classifier • Set configuration in the Jubatus server ClassifierFreature Extractor "converter": { "string_types": { "bigram": { "method": "ngram", "char_num": "2" } }, "string_rules": [ { "key": "*", "type": "bigram", "sample_weight": "tf", "global_weight": "idf“ } ] } Feature Extractor
  • 14. Classifier • Configuration JSON – It does “feature vector design” – very important step for machine learning "converter": { "string_types": { "bigram": { "method": "ngram", "char_num": "2" } }, "string_rules": [ { "key": "*", "type": "bigram", "sample_weight": "tf", "global_weight": "idf“ } ] } setteings for extract feature from string define function named “bigram” original embedded function “ngram” pass “2” to “ngram” to create “bigram” for all data apply “bigram” feature weights based on tf/idf see wikipedia/tf-idf
  • 15. Classifier • Feature Extractor becomes “bigram extractor” Classifierbigram extractor
  • 16. Feature Extractor • What bigram extractor does? bigram extractor import sys def fib(a): if a == 1 or a == 0: return 1 else: return fib(a-1) + fib(a-2) if __name__ == “__main__”: print(fib(int(sys.argv[1]))) key value im 1 mp 1 po 1 ... ... ): 1 ... ... de 1 ef 1 ... ... Feature Vector
  • 17. Classifier • Training model with feature vectors key value im 1 mp 1 po 1 ... ... ): 1 ... ... de 1 ef 1 ... ... Classifier key value pu 1 ut 1 ... ... {| ... |m 1 m| 1 {| 1 en 1 nd 1 key value @a 1 $_ 1 ... ... my ... su 1 ub 1 us 1 se 1 ... ...
  • 18. Classifier • Set configuration in the Jubatus server Classifier "method" : "AROW", "parameter" : { "regularization_weight" : 1.0 } Feature Extractor bigram extractor Classifier Algorithms • Perceptron • Passive Aggressive • Confidence Weight • Adaptive Regularization of Weights • Normal Herd
  • 19. Classifier • Use model to classification task – Jubatus will find clue for classification AROW key value si 1 il 1 ... ... {| 1 ... ... It’s
  • 20. Classifier • Use model to classification task – Jubatus will find clue for classification AROW key value re 1 ): 1 ... ... s[ 1 ... ... It’s
  • 21. Via RPC • call feature extraction and classification from client via RPC AROWbigram extractor lang = client.classify([sourcecode]) import sys def fib(a): if a == 1 or a == 0: return 1 else: return fib(a-1) + fib(a-2) if __name__ == “__main__”: print(fib(int(sys.argv[1]))) key value im 1 mp 1 po 1 ... ... ): 1 ... ... de 1 ef 1 ... ... It may be
  • 22. What classifier can do? • You can – estimate the topic of tweets – trash spam mail automatically – monitor server failure from syslog – estimate sentiment of user from blog post – detect malicious attack – find what feature is the best clue to classification
  • 23. What classifier cannot do • You cannot – train model from data without supervised answer – create a class without knowledge of the class – get fine model without correct feature designing
  • 24. How to use? • see examples in http://github.com/jubatus/jubatus-example – gender – shogun – malware classification – language detection
  • 25. Recommender • Task: what datum is similar to the datum? Name Star Wars Harry Potter Star Trek Titanic Frozen John 4 3 2 2 Bob 5 3 Erika 1 3 4 5 Jack 2 5 Ann 4 5 Emily 1 4 2 5 4 Which movie should we recommend Ann?
  • 26. Recommender • Do recommendation based on Nearest Neighbor Movie Rating(high-dimensional) Science Fiction Star Trek lover John Jack Love Romance Fantasy Erika Ann StarWars lover Bob Emily Near Far
  • 27. Recommender • Ann and Emily is near – we should recommend Flozen for Ann Name Star Wars Harry Potter Star Trek Titanic Frozen Ann 4 5 ★ Emily 1 4 2 5 4 I bet Ann would like it!
  • 28. Recommender with Feature Extractor • Recommender server consist of Feature Extractor and Recommender engine. – Jubatus calculates distance between feature vectors RecommenderFeature Extractor Recommender Engine can use • Minhash • Locality Sensitive Hashing • Euclid Locality Sensitive Hashing for defining distance.
  • 29. Recommender with Feature Extractor • Jubatus maps data in feature space – There are distances between data • How are they near or far? key value pu 1 ut 1 ... ... {| ... |m 1 m| 1 {| 1 Feature Extractor key value im 1 mp 1 ... ... ... ... “{ 1 fo 1 ... ... key value Ma 1 ap 1 ... ... in 1 nt 1 te 1 er 1 Recommender Ruby Python Java
  • 30. What can the Recommender do? • You can – create a recommendation engine for e-commerce – calculate the similarity of tweets – find NBA players of a similar type – visualize the distance between “Star Wars” and “Star Trek”
  • 31. What can't the Recommender do? • You cannot – label data (use the classifier!) – get a decision tree – get a-priori based recommendations
  • 32. Anomaly Detection • Task: Which datum is far from the others?
  • 33. Anomaly Detection • Task: Which datum is far from the others? This One!
  • 34. Anomaly Detection • Distance-based detection is not good – We cannot decide on an appropriate distance threshold • [Figure caption: Distance is equal!]
  • 35. Anomaly Detection with Feature Extractor • The anomaly detection server consists of a Feature Extractor and an anomaly detection engine. – Jubatus finds outliers from feature vectors • The Anomaly Detection Engine can use • Minhash • Locality Sensitive Hashing • Euclid Locality Sensitive Hashing for defining distance.
  • 36. Anomaly Detection • jubaanomaly can do it! – It is based on the local outlier factor algorithm • [Figure: the bigram key/value vectors from slide 29 pass through the Feature Extractor into Anomaly Detection, which flags one of them as “Outlier!”]
  • 37. What can Anomaly Detection do? • You might be able to – find outliers – grasp the trend and overview of the current data stream – detect or predict server failures – protect Web services from zero-day attacks
  • 38. What can't Anomaly Detection do? • You cannot – know the cluster distribution of the data – find every kind of outlier with 100% accuracy – easily understand how each outlier occurs – know why a datum is assigned a high outlier score
  • 39. Conclusion • Jubatus has a feature extractor embedded alongside its algorithms. • Users should configure both the feature extractor and the algorithm properly. • Clients use the configured machine learning via Jubatus RPC. • Classifier, Recommender, and Anomaly Detection may be useful for your task.
  • 40. DEMO • I will try running jubatus-example.

Editor's notes

  1. Hello, I’ll talk about Jubatus. You may have heard of Jubatus, but I’m afraid you may not know it well. In this talk, I hope you will see what Jubatus can do and how to use it for your task.
  2. Jubatus has 3 features: it is a distributed online machine-learning framework. Distributed means it is resilient to machine failure, and Jubatus can increase its performance for your task by coordinating a multi-machine cluster. Online means fixed-time computation. The Jubatus developers carefully designed the Jubatus API so that users can balance performance and computation time. Machine learning is a key factor in the Big Data age. You’ll need more than “word count”.
  3. This is an overview of a Jubatus process. This red rectangle is one Jubatus process. Inside the process there are two components: the Feature Extractor and the Machine-Learning Model. You can connect your program to Jubatus via Jubatus RPC, so you can do machine learning in a client-server model.
  4. You can combine these processes into a cluster. Jubatus servers in a cluster communicate with each other to make machine learning faster and more reliable. The whole model is shared and resilient to machine failure.
  5. If many Jubatus servers are running and continually mixing, users can communicate with the cluster via the Jubatus proxy as if it were a single Jubatus server.
  6. The communication protocol between a Jubatus server and a client is exactly the same as that between the Jubatus proxy and a client. This is useful for developers because they can run Jubatus on a local machine as a development environment and then deploy the same client code against production clusters.
  7. A big benefit of a distributed system is that Jubatus can scale out its performance. In your production environment, the RPC load may become too heavy for the cluster's throughput.
  8. You can add machines to the cluster, and the cluster will increase its performance. This suits the cloud computing era.
  9. A Jubatus cluster is also resilient to machine failure. Whenever servers break down, the proxy server conceals the failure, so the service continues. So you can add or remove cluster machines dynamically.
  10. The Jubatus client library is implemented in many languages. You can get it via gem, pip, cpan, or maven. If you want to use Jubatus from another language, you can use a messagepack-rpc client at your own risk. It will work! (I tried JavaScript.)
  11. Jubatus has many kinds of machine-learning modules, and you can start using them quickly. Among the 6 modules, Classifier, Recommender, and Anomaly Detection will be a great help to you. I’ll introduce these 3 modules.
  12. The classifier classifies data. As a sample task, you may want to detect the programming language of a piece of source code. In this case, you can classify the language from a sequence of text.
  13. First of all, you have to set the configuration in the Jubatus server. The configuration is written in JSON.
  14. In this case, you choose the built-in ngram function and pass the number 2 to it, which gives you a bigram function. Then you set a rule. With this rule, all inserted data will be handled with bigram, and the weights of the words are adjusted with the tf/idf scheme.
  15. Now the Feature Extractor becomes a “bigram extractor”.
  16. With this bigram extractor, every datum is split into two-character words. “import” becomes “im”, “mp”, “po”, “or”, “rt” under the bigram scheme. This form of datum representation is the Feature Vector. The bigram extractor extracts bigrams from a datum and produces a Feature Vector.
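
The bigram step in note 16 can be sketched in a few lines of Python. This is only an illustration of the idea, not the Jubatus feature extractor itself: the function names `char_bigrams` and `bigram_feature_vector` are made up for this sketch, and it uses raw counts only, without the idf weighting mentioned in note 14.

```python
from collections import Counter


def char_bigrams(text):
    """Return every pair of adjacent characters in text, in order."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def bigram_feature_vector(text):
    """Count each bigram, giving a sparse key -> value feature vector."""
    return Counter(char_bigrams(text))


# "import" is split into the five bigrams listed in note 16.
print(char_bigrams("import"))

# A whole line of source code becomes one sparse vector of bigram counts.
vector = bigram_feature_vector("import sys")
print(vector["im"], vector["mp"], vector["po"])
```

Because the vector is sparse, a bigram that never appears in the datum simply has no entry, which keeps long source files cheap to represent.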
  17. You extract feature vectors from source code in many languages. The Jubatus Classifier learns from the feature vectors and creates a model.
  18. Next, the classifier algorithm should be configured. You can select the classifier algorithm from Perceptron, Passive Aggressive, and others.
  19. The trained model can classify a datum from its feature vector. In this case, the Jubatus classifier finds a characteristic Ruby feature like "{|" and scores it highly for Ruby, so Jubatus estimates that this source code is Ruby.
  20. For another datum, Jubatus finds a characteristic Python feature like “):”. Jubatus scores this feature highly, and it estimates that this source code is Python.
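
Notes 17 to 20 describe learning per-language scores for features such as "{|" and "):". One way to picture this is a textbook online multiclass perceptron over sparse feature vectors. The sketch below is an assumption-laden illustration, not Jubatus's implementation: the class name `MulticlassPerceptron` and the tiny hand-written feature vectors are invented for this example.

```python
from collections import defaultdict


class MulticlassPerceptron:
    """Textbook online multiclass perceptron over sparse feature dicts."""

    def __init__(self):
        # weights[label][feature] -> float; labels are added as they are seen
        self.weights = {}

    def scores(self, features):
        """Score of each known label: dot product of weights and features."""
        return {
            label: sum(w[f] * v for f, v in features.items() if f in w)
            for label, w in self.weights.items()
        }

    def classify(self, features):
        """Return the highest-scoring label, or None before any training."""
        s = self.scores(features)
        return max(s, key=s.get) if s else None

    def train(self, features, label):
        """Update weights only when the current prediction is wrong."""
        self.weights.setdefault(label, defaultdict(float))
        predicted = self.classify(features)
        if predicted != label:
            for f, v in features.items():
                self.weights[label][f] += v      # pull the true label up
                self.weights[predicted][f] -= v  # push the wrong label down


# Hypothetical bigram vectors: "de"/"ef" (from "def") occur in both languages,
# while "{|" and "):" are the characteristic features named in the notes.
ruby_features = {"{|": 1, "de": 1, "ef": 1, "en": 1, "nd": 1}
python_features = {"):": 1, "de": 1, "ef": 1}

model = MulticlassPerceptron()
for _ in range(3):  # a few online passes over the two examples
    model.train(python_features, "python")
    model.train(ruby_features, "ruby")

print(model.classify({"{|": 1}), model.classify({"):": 1}))
```

After training, the shared bigrams carry no weight for either label, and the characteristic ones decide the result, which matches the behaviour the notes describe.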
  21. You can do these procedures via Jubatus RPC. Over RPC, you give a datum for classification and Jubatus returns the classification result. All you have to do is write a precise JSON configuration and the client source code.
  22. You can: estimate the topic of a tweet; trash spam mail automatically; monitor server failures from syslog; estimate sentiment from blog posts; detect attacks over the network; calculate which feature is the best clue for classification.
  23. You cannot: train a model from data without supervised answers; create a class without knowledge of that class; get a good model without proper feature design.
  24. More information on using the classifier is available in the official Jubatus example repository. These 4 samples may be useful for study.
  25. The next Jubatus algorithm is the recommender. Given this matrix of movies and review ratings, which movie should we recommend to Ann? Jubatus can answer.
  26. Imagine a high-dimensional rating space. Star Wars lovers and Star Trek lovers are relatively close, because both movies are Science Fiction. Ann and Emily are relatively close. These distances are useful for recommendation, because human preferences tend to be similar.
  27. In this case, Ann would like Frozen.
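
The nearest-neighbor reasoning in notes 25 to 27 can be sketched as follows. This is a minimal illustration under stated assumptions, not the Jubatus recommender: Emily's row is taken from the slide, but the column positions of Ann's two ratings were lost in the transcript, so placing them under Harry Potter and Titanic is an assumption, and the user "SciFiFan" is entirely hypothetical. Cosine similarity is used here as one common choice of closeness; missing ratings are simply skipped.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two sparse rating dicts (missing = not rated)."""
    shared = set(a) & set(b)
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbor(ratings, user):
    """Name of the other user whose ratings are most similar to user's."""
    others = (name for name in ratings if name != user)
    return max(
        others,
        key=lambda name: cosine_similarity(ratings[user], ratings[name]),
    )


def recommend(ratings, user):
    """Movies the nearest neighbor rated but user has not, best first."""
    neighbor = nearest_neighbor(ratings, user)
    unseen = {
        movie: score
        for movie, score in ratings[neighbor].items()
        if movie not in ratings[user]
    }
    return sorted(unseen, key=unseen.get, reverse=True)


ratings = {
    # Emily's row as shown on the slide.
    "Emily": {"Star Wars": 1, "Harry Potter": 4, "Star Trek": 2,
              "Titanic": 5, "Frozen": 4},
    # Assumed column placement for Ann's two ratings.
    "Ann": {"Harry Potter": 4, "Titanic": 5},
    # Hypothetical science-fiction fan, added only for contrast.
    "SciFiFan": {"Star Wars": 5, "Star Trek": 5, "Frozen": 1},
}

print(nearest_neighbor(ratings, "Ann"), recommend(ratings, "Ann"))
```

With these inputs Ann's nearest neighbor is Emily, and the top unseen movie from Emily's ratings is Frozen, in line with the slide's conclusion.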
  28. The Jubatus recommender server consists of a Feature Extractor and a recommender engine. The feature extractor is exactly the same as the classifier’s. Jubatus calculates the distance between feature vectors.
  29. As in the former example, the Jubatus recommender extracts feature vectors from source code, and the recommender engine maps each vector into the feature space.
  30. You can: create a recommendation engine; calculate the similarity of tweets; find NBA players of a similar type; visualize the distance between “Star Wars” and “Star Trek”. Notice that you can use the recommender for more than recommendation.
  31. The recommender is based on an unsupervised algorithm, so you cannot label data (use the classifier!) or get a decision tree. And since it is nearest-neighbor-based recommendation, you cannot get a-priori based recommendations.
  32. Another algorithm is Anomaly Detection. It calculates how far a datum is from the others.
  33. Jubatus can detect the outlier in a mass of data.
  34. As an easy approach, you might use the recommender’s distance score to find outliers. But distances are not homogeneous across the data, so a distance score alone cannot be used to discover outliers.
  35. The anomaly detection server consists of a Feature Extractor and an anomaly detection engine. The feature extractor is exactly the same as the classifier’s and the recommender’s. Jubatus finds outliers from feature vectors.
  36. As with the recommender, Jubatus detects anomalies from Feature Vectors. You access this procedure via RPC too.
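
Slide 36 names the local outlier factor (LOF) algorithm, and note 34 explains why a single distance threshold fails. The sketch below is an exact, brute-force textbook LOF in plain Python, written only to illustrate that point; it is not how jubaanomaly is implemented, the function names and sample points are invented, and it assumes distinct points and `k` smaller than the number of points.

```python
import math


def knn(points, i, k):
    """Indices of the k nearest neighbours of points[i] (ties by index)."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def local_outlier_factors(points, k):
    """LOF score per point: about 1 for inliers, well above 1 for outliers."""
    n = len(points)
    neighbors = [knn(points, i, k) for i in range(n)]
    # k-distance: distance from each point to its k-th nearest neighbour
    k_distance = [
        math.dist(points[i], points[neighbors[i][-1]]) for i in range(n)
    ]

    def reach_dist(i, j):
        # reachability distance of point i with respect to neighbour j
        return max(k_distance[j], math.dist(points[i], points[j]))

    # local reachability density: inverse of the mean reachability distance
    lrd = [k / sum(reach_dist(i, j) for j in neighbors[i]) for i in range(n)]
    # LOF: mean density of the neighbours relative to the point's own density
    return [sum(lrd[j] for j in neighbors[i]) / (k * lrd[i]) for i in range(n)]


# A dense cluster, a sparse cluster, and one point just outside the dense one.
points = [
    (0.0, 0.0), (0.0, 0.1), (0.1, 0.0), (0.1, 0.1),  # dense, spacing 0.1
    (5.0, 5.0), (5.0, 6.0), (6.0, 5.0), (6.0, 6.0),  # sparse, spacing 1.0
    (0.6, 0.6),                                      # the outlier
]
scores = local_outlier_factors(points, k=2)
print(max(range(len(points)), key=scores.__getitem__))
```

The last point sits about 0.7 from its nearest neighbour, which is closer than the normal spacing of 1.0 inside the sparse cluster, so no single distance threshold separates it. LOF still ranks it highest because its neighbours are far denser than it is.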
  37. You might be able to: find outliers; detect or predict server failures; protect services against zero-day attacks; know the trend of the entire data stream.
  38. You cannot: get the most common datum; get a cluster map of the data; automatically diagnose the reason for an outlier.
  39. Jubatus has a feature extractor embedded alongside its algorithms. Users should configure both the feature extractor and the algorithm properly. Clients use the configured machine learning via Jubatus RPC. Classifier, Recommender, and Anomaly Detection may be useful for your task.
  40. I will try running jubatus-example.