Jubatus: Real-time and Highly-scalable
Machine Learning Platform
Shohei Hido
Preferred Infrastructure, Inc. Japan.
HadoopSummit 2013 @ San Jose, CA
2013/06/27
Jubatus: OSS for real-time big data analytics
l  Joint development with NTT laboratory in Japan
l  Released Oct. 2011 (current version is v0.4.3)
l  You can download it from https://github.com/jubatus/
2
1. Bigger data
2. More in real-time
3. Machine learning
Bottom line: Just two words
3
l  Software company in Tokyo, Japan (founded in 2006)
l  Focus on long-term technology innovation
l  28 regular employees, many top-notch engineers
l  Customers: media, e-commerce, research institutes
Distributed computing
Natural language
processing
Machine learning
Information retrieval
Preferred Infrastructure, Inc. (PFI)
-To bring cutting-edge research advances to the real world-
4
l  What is Jubatus? : Motivation and applications
l  How Jubatus works? : The architecture
l  How to use it : Quick-start steps
l  Summary and future
Agenda
At HadoopSummit Last year:
Everyone talked about “real-time”
6
[Figure: real-time BI, definition of real-time, real-time analytics, real-time SQL-like query, real-time processing, real-time ad-hoc query, real-time visualization]
Real-time big data analytics: A trend
From an O’Reilly article (2013)
“Real-time big data isn’t just a process for storing petabytes
or exabytes of data in a data warehouse”
“It’s about the ability to make better decisions and take
meaningful actions at the right time.”
“It’s about combining and analyzing data so you can take
the right action, at the right time, and at the right place”
- Michael Minelli, Co-author of “Big Data, Big Analytics”
7
Speed and depth of big data analytics: a whitespace
[Figure: chart of decision speed against depth of analytics, showing the Hadoop ecosystem, Sedue, and Jubatus; application domains: surveillance camera, security traffic, automobile, agriculture, market research, education, bio, health care]
Big data analytics will go real-time and deeper
9
1. Bigger data
2. More in real-time
3. Machine learning
l  Future: Deeper analytics for rapid decisions and actions
l  Twitter analysis for personalized advertisement optimization
l  Anomaly detection from M2M sensor data
l  Energy demand forecast / Smart grid optimization
l  Security monitoring on Network traffic or financial fraud
Demo: real-time tweet categorization
l  Automatically learns “Apple + iPad => Apple” then “iPad => Apple” in real-time
Jubatus in the Twitter ecosystem in Japan
l  NTT Data: Exclusive tweet reseller in Japan
l  Firehose contract with Twitter
l  Jubatus is an official tool for analytics on Japanese tweets
l  Jubatus can classify 5,000+ tweets per second on a few servers
11
http://blog.jp.twitter.com/2012/09/twitter.html
http://www.nttdata.com/jp/ja/news/release/2012/092700.html
Our twitter analysis modules
Jubatus as a big data analytics platform for industry
l  Gov. fund for IT fusion: big-data new business creation
l  In collaboration with NEC and other research labs.
l  Focus on performance improvement for larger M2M data
12	
[Figure: development plan, scaling up with data size: human-generated data (SNS data), then machine-generated data (healthcare, agriculture), then data with severe real-time requirements (network traffic, video surveillance)]
Active development & growing business/community
l  10+ active committers
l  & Pull requests from users
l  Monthly minor update
l  Bug & usability fix
l  Quarterly major update
l  Add new features & interface
13
l  PoC on user companies
l  Real-time ad optimization
l  Server monitoring
l  Smart-house / smart-grid
l  Intelligent camera	
l  Deployment & Experiment
l  Twitter analysis
l  Social media monitoring
l  Malicious attack detection
l  Malware detection	
l  2 Hands-on: 90+ attend in total
l  1 Meetup: 90+ attendees
l  What is Jubatus? : Motivation and applications
l  How Jubatus works? : The architecture
l  How to use it : Quick-start steps
l  Summary and future
Agenda
Online machine learning in Jubatus
l  Batch learning
l  Scan all data before building a model
l  Data must be stored in memory or storage
l  Online learning
l  Model will be updated by each data sample
l  Sometimes with theory that the online model converges
to the batch model
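A minimal sketch of the batch/online contrast, using a running mean as the simplest possible "model". This is an illustration only, not Jubatus code; `batch_mean` and `OnlineMean` are hypothetical names. The batch version needs every sample in memory, while the online version keeps only a count and the current estimate, and here the two agree after the last sample.

```python
from typing import List


def batch_mean(samples: List[float]) -> float:
    # Batch: every sample must be stored before the model can be built.
    return sum(samples) / len(samples)


class OnlineMean:
    """Keeps only a count and the current estimate (hypothetical example)."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0

    def update(self, x: float) -> None:
        # Online: fold one sample into the model, then the sample can be dropped.
        self.count += 1
        self.mean += (x - self.mean) / self.count


data = [40.0, 10.0, 20.0, 30.0]
model = OnlineMean()
for x in data:
    model.update(x)

print(model.mean, batch_mean(data))
```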
15	
What Jubatus currently supports
l  Classification (multi-class)
l  Perceptron / PA / CW / AROW
l  Regression
l  PA-based regression
l  Nearest neighbor
l  LSH / MinHash / Euclid LSH
l  Recommendation
l  Nearest neighbor based
l  Anomaly detection
l  LOF (Local Outlier Factor)
l  Graph analysis
l  Shortest path / Centrality (PageRank)
l  Some simple statistics
16
Online learning or distributed learning:
No unified solution has been available
l  Jubatus combines them into a unified computation framework
17
[Figure: positioning chart. Axes: batch vs. real-time/online, and small scale (stand-alone) vs. large scale (distributed/parallel computing). Entries: WEKA (1993-), SPSS (1988-), Mahout (2006-), online ML algorithms (PA [2003], CW [2008]), Jubatus (2011-)]
Q: How to make online algorithms distributed?
A: It is not trivial; some tricks are needed
l  Online learning requires frequent model updates
l  Naïve data distribution leads to too many synchronization operations
l  This causes performance problems in both network communication and accuracy
[Figure: timeline for Server A, Server B, and Server C, where local model updates (L) are interrupted by frequent Sync operations; caption: Data synchronization?]
18
Our approach: Loose model sharing
l  Jubatus only shares the local models in a loose manner
l  Model size << Data size
l  Jubatus DOES NOT share datasets
l  Unique approach compared to existing framework
l  Local models can be different on the servers
l  Different models will be gradually merged
l  We define three fundamental operations
l  UPDATE / MIX / ANALYZE
l  Algorithms can be implemented independently from
l  Distribution logic
l  Data sharing
l  Failover
19
UPDATE, MIX, and ANALYZE
1.  UPDATE - locally
l  Receive a sample, learn and update the local model
2.  MIX - globally
l  Exchange and merge the local models between servers
3.  ANALYZE - locally
l  Receive a sample, apply the local model, return result
[Figure: UPDATE (distributed training) changes the local model on each server; MIX shares only models and leaves a unified model on each server; ANALYZE (distributed prediction) applies that model]
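The three operations can be sketched as a small interface plus a toy model. Everything here is hypothetical and for illustration: `MixableModel`, `WordCounter`, `get_diff`, `put_diff`, and `mix` are names chosen for this example, and a word counter stands in for a real learning algorithm. The point is that only diffs cross server boundaries, never the samples.

```python
from abc import ABC, abstractmethod
from collections import Counter
from typing import List


class MixableModel(ABC):
    """Hypothetical interface for the three operations named on the slide."""

    @abstractmethod
    def update(self, sample: str) -> None: ...      # UPDATE, local

    @abstractmethod
    def get_diff(self) -> Counter: ...              # MIX, step 1

    @abstractmethod
    def put_diff(self, merged: Counter) -> None: ...  # MIX, step 2

    @abstractmethod
    def analyze(self, sample: str) -> int: ...      # ANALYZE, local


class WordCounter(MixableModel):
    def __init__(self) -> None:
        self.base: Counter = Counter()   # state agreed on at the last MIX
        self.diff: Counter = Counter()   # local changes since the last MIX

    def update(self, sample: str) -> None:
        self.diff[sample] += 1

    def get_diff(self) -> Counter:
        return Counter(self.diff)

    def put_diff(self, merged: Counter) -> None:
        self.base += merged
        self.diff = Counter()            # local changes are now part of base

    def analyze(self, sample: str) -> int:
        return self.base[sample] + self.diff[sample]


def mix(servers: List[MixableModel]) -> None:
    merged: Counter = Counter()
    for s in servers:                    # collect and merge the diffs
        merged += s.get_diff()
    for s in servers:                    # distribute the merged diff
        s.put_diff(merged)


servers = [WordCounter() for _ in range(3)]
servers[0].update("ipad")
servers[1].update("ipad")
servers[2].update("apple")
before = servers[2].analyze("ipad")      # server 2 has not seen "ipad" yet
mix(servers)
after = [s.analyze("ipad") for s in servers]
print(before, after)
```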
20
UPDATE
l  Each server starts from an initial model
l  Each data sample are sent to one (or two) servers
l  Local models updated based on the sample
l  Data samples are NEVER shared
21	
[Figure: samples are distributed randomly or consistently to the servers; each server's initial model becomes local model 1 or local model 2]
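A sketch of the two routing styles mentioned on this slide. The slides do not say how "consistently" is implemented, so a plain hash-modulo on a sample key is used here as a stand-in; `pick_random`, `pick_consistent`, and the server names are hypothetical.

```python
import hashlib
import random
from typing import List


def pick_random(servers: List[str], rng: random.Random) -> str:
    # Random routing: any server may receive the sample.
    return rng.choice(servers)


def pick_consistent(servers: List[str], key: str) -> str:
    # Key-based routing: the same key always maps to the same server
    # (as long as the server list does not change).
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["server-a", "server-b", "server-c"]
rng = random.Random(0)
print(pick_random(servers, rng))
print(pick_consistent(servers, "user-A33"))
```

Key-based routing matters for models that keep per-key state; for a shared linear model, random routing is enough to spread the load.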
MIX
l  Each server sends its model diff
l  Model diffs are merged and distributed
l  Only model diffs are transmitted
Model diff 1 = Local model 1 - Initial model
Model diff 2 = Local model 2 - Initial model
Merged diff = Model diff 1 + Model diff 2
Mixed model = Initial model + Merged diff (on each server)
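The diff arithmetic on this slide can be sketched with sparse weight vectors. One assumption is made loudly here: the merged diff is computed as the mean of the per-server diffs, which suits linear models; the slide itself only shows the diffs being combined. All function names are hypothetical.

```python
from typing import Dict, List

Vector = Dict[str, float]


def subtract(a: Vector, b: Vector) -> Vector:
    return {k: a.get(k, 0.0) - b.get(k, 0.0) for k in set(a) | set(b)}


def add(a: Vector, b: Vector) -> Vector:
    return {k: a.get(k, 0.0) + b.get(k, 0.0) for k in set(a) | set(b)}


def merge_diffs(diffs: List[Vector]) -> Vector:
    # Assumption: merge by averaging the diffs key by key.
    keys = set().union(*diffs)
    return {k: sum(d.get(k, 0.0) for d in diffs) / len(diffs) for k in keys}


def mix(initial: Vector, local_models: List[Vector]) -> Vector:
    diffs = [subtract(m, initial) for m in local_models]  # only these are sent
    return add(initial, merge_diffs(diffs))               # same on every server


initial = {"a": 1.0}
local_1 = {"a": 2.0, "b": 1.0}
local_2 = {"a": 1.0, "b": -1.0}
mixed = mix(initial, [local_1, local_2])
print(mixed)
```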
22
UPDATE (iteration)
l  Locally updated models after MIX are discarded
l  Each server starts updating from the mixed model
l  The mixed model improves gradually thanks to all of the servers
[Figure: samples are again distributed randomly or consistently; each server updates from the mixed model, producing new local models 1 and 2]
23
ANALYZE
l  For prediction, each sample randomly goes to a server
l  Server applies the current mixed model to the sample
l  The prediction will be returned to the client
l  You add servers for higher throughput
24	
[Figure: samples are distributed randomly; each server applies its mixed model and returns the prediction]
Model inside Jubatus (1): classification
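[Figure: the servers hold local weight vectors w_1, w_2, ..., w_n; after MIX every server holds the same w]
$w = \frac{1}{n}(w_1 + \cdots + w_n)$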
l  Each server updates local linear models
l  MIX computes the averaged coefficients
25
Model inside Jubatus (2): nearest neighbor
[Figure: each sample ID maps to a bit array (1: 011010010, 2: 110001100, 3: 110010111, 4: 000100101, 5: 110101011, 6: 000010110); after MIX every server holds the ID-to-bit-array table]
l  Samples are approximated by LSH, MinHash, etc
l  Only bit-arrays are shared between servers
Jubatus architecture
Standard client-server system
l  Zookeeper and RPC handles connections between clients and servers
l  We have clients for C++/Java/Ruby/Python (All under MIT license)
27
[Figure: clients on Linux servers connect over RPC either to a separate JubaKeeper or through a co-located one (Client+JubaKeeper); requests reach multi-threaded JubaServer processes on Linux servers, each containing fv_converter, the algorithm, and the model]
Best QPS performance (evaluated on an old version)
l  Experimental settings
l  Standalone vs. multiple servers
l  Client processes: 1 - 4
l  Server processes: 1 - 6
l  Server threads: 1 - 6
l  Results
l  Classification scales linearly with #server-processes & threads
l  Recommendation performance highly depends on collected #samples
28
Task            Operation  Max QPS
Classification  UPDATE     3,000
Classification  ANALYZE    6,500
Recommendation  UPDATE     400
Recommendation  ANALYZE    2,500
l  What is Jubatus? : Motivation and applications
l  How Jubatus works? : The architecture
l  How to use it : Quick-start steps
l  Summary and future
Agenda
Step (0): Visit Jubatus website (http://jubat.us/)
l  Overview
l  Installation
l  Tutorials
l  API documents
l  Reference
30
Step (1): VM images and tutorial
http://download.jubat.us/event/handson_01/en/
31
l  Hands-on tutorial
l  Intro to ML, How to start, examples, configurations
l  VM images running on any OS
l  VirtualBox / VMware
Step (2): Download from GitHub
l  https://github.com/jubatus/jubatus/
32
Step (3): Play with Jubatus examples
l  https://github.com/jubatus/jubatus-example/
33
Step (4) : Build your own apps
l  Examples
l  Tweet categorization
l  User segmentation
l  Power consumption estimation
l  Stock price prediction
l  Real-time recommendation
l  Advertisement optimization
l  Online fraud prevention
l  Early-stage defect detection
l  Proactive network monitoring
l  Online malware detection
34
l  What is Jubatus? : Motivation and applications
l  How Jubatus works? : The architecture
l  How to use it : Quick-start steps
l  Summary and future
Agenda
Summary
l  Jubatus is an OSS for online distributed machine learning
l  UPDATE-MIX-ANALYZE for abstracting ML algorithms
l  Most of the tasks
l  Future plans
l  Clustering
l  P2P-like MIX method
l  Time-series preprocessing in fv_converter
l  Unlearning
36
1. Bigger data
2. More in real-time
3. Machine learning
Current: As a meta-data predictor
37
User  Time  Bet  Act.  Gain  Class  Est.  Cluster  Outlier
A33   5:34  40   ↑C    +20   Good   +18   C1       0.07
A33   5:34  10   ←B    +80   Good   -10   C3       0.92
A33   5:35  20   ↑B    -16   Bad    -15   C1       0.11
…     …     …    …     …
[Figure: input data passes through online learning and real-time prediction; the enriched data, with predicted columns added, goes to RDB, NoSQL, HDFS, or Search for aggregation, reporting, and analytics]
l  Apply Jubatus models before storing
l  Adaptive and memory-efficient
Future: For edge-heavy data
38
l  Emerging apps that can’t collect data into one place
l  Due to data intensity: video streams from millions of devices
l  Due to latency: real-time decision within <100 msec
l  Due to privacy: sensitive raw data cannot be shared
Smartphones	
Intelligent cars	
Intelligent cameras	
Healthcare
monitoring	
Bio-medical
How can we help you?
One more thing…
39
We opened a subsidiary in San Jose
l  Preferred Infrastructure America, Inc.
l  Established in March, Office opened in April
l  Next to the SJC airport
l  Start doing business in the U.S.
40
Thank you
l  Follow us
l  github.com/jubatus
l  jubatus@googlegroups.com
l  Twitter: @JubatusOfficial
l  We welcome your contribution and collaboration
41

Weitere ähnliche Inhalte

Was ist angesagt?

Video Analytics on Hadoop webinar victor fang-201309
Video Analytics on Hadoop webinar victor fang-201309Video Analytics on Hadoop webinar victor fang-201309
Video Analytics on Hadoop webinar victor fang-201309DrVictorFang
 
Retrieving Visually-Similar Products for Shopping Recommendations using Spark...
Retrieving Visually-Similar Products for Shopping Recommendations using Spark...Retrieving Visually-Similar Products for Shopping Recommendations using Spark...
Retrieving Visually-Similar Products for Shopping Recommendations using Spark...Databricks
 
Day 2 (Lecture 5): A Practitioner's Perspective on Building Machine Product i...
Day 2 (Lecture 5): A Practitioner's Perspective on Building Machine Product i...Day 2 (Lecture 5): A Practitioner's Perspective on Building Machine Product i...
Day 2 (Lecture 5): A Practitioner's Perspective on Building Machine Product i...Aseda Owusua Addai-Deseh
 
Deeplearning on Hadoop @OSCON 2014
Deeplearning on Hadoop @OSCON 2014Deeplearning on Hadoop @OSCON 2014
Deeplearning on Hadoop @OSCON 2014Adam Gibson
 
TECHNICAL OVERVIEW NVIDIA DEEP LEARNING PLATFORM Giant Leaps in Performance ...
TECHNICAL OVERVIEW NVIDIA DEEP  LEARNING PLATFORM Giant Leaps in Performance ...TECHNICAL OVERVIEW NVIDIA DEEP  LEARNING PLATFORM Giant Leaps in Performance ...
TECHNICAL OVERVIEW NVIDIA DEEP LEARNING PLATFORM Giant Leaps in Performance ...Willy Marroquin (WillyDevNET)
 
Towards a Comprehensive Machine Learning Benchmark
Towards a Comprehensive Machine Learning BenchmarkTowards a Comprehensive Machine Learning Benchmark
Towards a Comprehensive Machine Learning BenchmarkTuri, Inc.
 

Was ist angesagt? (8)

Deep Learning
Deep LearningDeep Learning
Deep Learning
 
Video Analytics on Hadoop webinar victor fang-201309
Video Analytics on Hadoop webinar victor fang-201309Video Analytics on Hadoop webinar victor fang-201309
Video Analytics on Hadoop webinar victor fang-201309
 
Retrieving Visually-Similar Products for Shopping Recommendations using Spark...
Retrieving Visually-Similar Products for Shopping Recommendations using Spark...Retrieving Visually-Similar Products for Shopping Recommendations using Spark...
Retrieving Visually-Similar Products for Shopping Recommendations using Spark...
 
Day 2 (Lecture 5): A Practitioner's Perspective on Building Machine Product i...
Day 2 (Lecture 5): A Practitioner's Perspective on Building Machine Product i...Day 2 (Lecture 5): A Practitioner's Perspective on Building Machine Product i...
Day 2 (Lecture 5): A Practitioner's Perspective on Building Machine Product i...
 
Deeplearning on Hadoop @OSCON 2014
Deeplearning on Hadoop @OSCON 2014Deeplearning on Hadoop @OSCON 2014
Deeplearning on Hadoop @OSCON 2014
 
Introduction to Auto ML
Introduction to Auto MLIntroduction to Auto ML
Introduction to Auto ML
 
TECHNICAL OVERVIEW NVIDIA DEEP LEARNING PLATFORM Giant Leaps in Performance ...
TECHNICAL OVERVIEW NVIDIA DEEP  LEARNING PLATFORM Giant Leaps in Performance ...TECHNICAL OVERVIEW NVIDIA DEEP  LEARNING PLATFORM Giant Leaps in Performance ...
TECHNICAL OVERVIEW NVIDIA DEEP LEARNING PLATFORM Giant Leaps in Performance ...
 
Towards a Comprehensive Machine Learning Benchmark
Towards a Comprehensive Machine Learning BenchmarkTowards a Comprehensive Machine Learning Benchmark
Towards a Comprehensive Machine Learning Benchmark
 

Andere mochten auch

Hadoop Turns a Corner and Sees the Future
Hadoop Turns a Corner and Sees the FutureHadoop Turns a Corner and Sees the Future
Hadoop Turns a Corner and Sees the FutureDataWorks Summit
 
Demystifying Systems for Interactive and Real-time Analytics
Demystifying Systems for Interactive and Real-time AnalyticsDemystifying Systems for Interactive and Real-time Analytics
Demystifying Systems for Interactive and Real-time AnalyticsDataWorks Summit
 
前回のCasual Talkでいただいたご要望に対する進捗状況
前回のCasual Talkでいただいたご要望に対する進捗状況前回のCasual Talkでいただいたご要望に対する進捗状況
前回のCasual Talkでいただいたご要望に対する進捗状況JubatusOfficial
 
Jubatusハンズオン分散編
Jubatusハンズオン分散編Jubatusハンズオン分散編
Jubatusハンズオン分散編odasatoshi
 
機械学習チュートリアル@Jubatus Casual Talks
機械学習チュートリアル@Jubatus Casual Talks機械学習チュートリアル@Jubatus Casual Talks
機械学習チュートリアル@Jubatus Casual TalksYuya Unno
 
Jubatusをベースにしたオーディエンスの分析エンジンの紹介
Jubatusをベースにしたオーディエンスの分析エンジンの紹介Jubatusをベースにしたオーディエンスの分析エンジンの紹介
Jubatusをベースにしたオーディエンスの分析エンジンの紹介JubatusOfficial
 
評BanにおけるJubatus活用事例
評BanにおけるJubatus活用事例評BanにおけるJubatus活用事例
評BanにおけるJubatus活用事例JubatusOfficial
 
標的型メール対策製品でのJubatus活用事例
標的型メール対策製品でのJubatus活用事例標的型メール対策製品でのJubatus活用事例
標的型メール対策製品でのJubatus活用事例JubatusOfficial
 
Jubatus 0.6.0 新機能紹介
Jubatus 0.6.0 新機能紹介Jubatus 0.6.0 新機能紹介
Jubatus 0.6.0 新機能紹介JubatusOfficial
 
Jubatus Casual Talks #2: 大量映像・画像のための異常値検知とクラス分類
Jubatus Casual Talks #2: 大量映像・画像のための異常値検知とクラス分類Jubatus Casual Talks #2: 大量映像・画像のための異常値検知とクラス分類
Jubatus Casual Talks #2: 大量映像・画像のための異常値検知とクラス分類Hirotaka Ogawa
 
Jubatusで始める機械学習
Jubatusで始める機械学習Jubatusで始める機械学習
Jubatusで始める機械学習JubatusOfficial
 
Jubatus Casual Talks #2 Jubatus開発者入門
Jubatus Casual Talks #2 Jubatus開発者入門Jubatus Casual Talks #2 Jubatus開発者入門
Jubatus Casual Talks #2 Jubatus開発者入門Shuzo Kashihara
 
世界征服を目指すJubatusだからこそ期待する5つのポイント
世界征服を目指すJubatusだからこそ期待する5つのポイント世界征服を目指すJubatusだからこそ期待する5つのポイント
世界征服を目指すJubatusだからこそ期待する5つのポイントNTT DATA OSS Professional Services
 
Jubatus Casual Talks #2 : 0.5.0の新機能(クラスタリング)の紹介
Jubatus Casual Talks #2 : 0.5.0の新機能(クラスタリング)の紹介Jubatus Casual Talks #2 : 0.5.0の新機能(クラスタリング)の紹介
Jubatus Casual Talks #2 : 0.5.0の新機能(クラスタリング)の紹介瑛 村下
 
センサデータ解析におけるJubatus活用事例
センサデータ解析におけるJubatus活用事例センサデータ解析におけるJubatus活用事例
センサデータ解析におけるJubatus活用事例JubatusOfficial
 
Jubatus分類器の活用テクニック
Jubatus分類器の活用テクニックJubatus分類器の活用テクニック
Jubatus分類器の活用テクニックJubatusOfficial
 

Andere mochten auch (20)

Hadoop Turns a Corner and Sees the Future
Hadoop Turns a Corner and Sees the FutureHadoop Turns a Corner and Sees the Future
Hadoop Turns a Corner and Sees the Future
 
Demystifying Systems for Interactive and Real-time Analytics
Demystifying Systems for Interactive and Real-time AnalyticsDemystifying Systems for Interactive and Real-time Analytics
Demystifying Systems for Interactive and Real-time Analytics
 
前回のCasual Talkでいただいたご要望に対する進捗状況
前回のCasual Talkでいただいたご要望に対する進捗状況前回のCasual Talkでいただいたご要望に対する進捗状況
前回のCasual Talkでいただいたご要望に対する進捗状況
 
Jubatusハンズオン分散編
Jubatusハンズオン分散編Jubatusハンズオン分散編
Jubatusハンズオン分散編
 
Video Analysis in Hadoop
Video Analysis in HadoopVideo Analysis in Hadoop
Video Analysis in Hadoop
 
機械学習チュートリアル@Jubatus Casual Talks
機械学習チュートリアル@Jubatus Casual Talks機械学習チュートリアル@Jubatus Casual Talks
機械学習チュートリアル@Jubatus Casual Talks
 
Jubatus on Mavericks
Jubatus on MavericksJubatus on Mavericks
Jubatus on Mavericks
 
Jubatusをベースにしたオーディエンスの分析エンジンの紹介
Jubatusをベースにしたオーディエンスの分析エンジンの紹介Jubatusをベースにしたオーディエンスの分析エンジンの紹介
Jubatusをベースにしたオーディエンスの分析エンジンの紹介
 
評BanにおけるJubatus活用事例
評BanにおけるJubatus活用事例評BanにおけるJubatus活用事例
評BanにおけるJubatus活用事例
 
標的型メール対策製品でのJubatus活用事例
標的型メール対策製品でのJubatus活用事例標的型メール対策製品でのJubatus活用事例
標的型メール対策製品でのJubatus活用事例
 
Jubatus 0.6.0 新機能紹介
Jubatus 0.6.0 新機能紹介Jubatus 0.6.0 新機能紹介
Jubatus 0.6.0 新機能紹介
 
Jubatus Casual Talks #2: 大量映像・画像のための異常値検知とクラス分類
Jubatus Casual Talks #2: 大量映像・画像のための異常値検知とクラス分類Jubatus Casual Talks #2: 大量映像・画像のための異常値検知とクラス分類
Jubatus Casual Talks #2: 大量映像・画像のための異常値検知とクラス分類
 
Jubatusで始める機械学習
Jubatusで始める機械学習Jubatusで始める機械学習
Jubatusで始める機械学習
 
Jubatus Casual Talks #2 Jubatus開発者入門
Jubatus Casual Talks #2 Jubatus開発者入門Jubatus Casual Talks #2 Jubatus開発者入門
Jubatus Casual Talks #2 Jubatus開発者入門
 
世界征服を目指すJubatusだからこそ期待する5つのポイント
世界征服を目指すJubatusだからこそ期待する5つのポイント世界征服を目指すJubatusだからこそ期待する5つのポイント
世界征服を目指すJubatusだからこそ期待する5つのポイント
 
Jubatus Casual Talks #2 : 0.5.0の新機能(クラスタリング)の紹介
Jubatus Casual Talks #2 : 0.5.0の新機能(クラスタリング)の紹介Jubatus Casual Talks #2 : 0.5.0の新機能(クラスタリング)の紹介
Jubatus Casual Talks #2 : 0.5.0の新機能(クラスタリング)の紹介
 
センサデータ解析におけるJubatus活用事例
センサデータ解析におけるJubatus活用事例センサデータ解析におけるJubatus活用事例
センサデータ解析におけるJubatus活用事例
 
Jubatus casulatalks2
Jubatus casulatalks2Jubatus casulatalks2
Jubatus casulatalks2
 
Jubatus分類器の活用テクニック
Jubatus分類器の活用テクニックJubatus分類器の活用テクニック
Jubatus分類器の活用テクニック
 
A use case of online machine learning using Jubatus
A use case of online machine learning using JubatusA use case of online machine learning using Jubatus
A use case of online machine learning using Jubatus
 

Ähnlich wie Jubatus talk at HadoopSummit 2013

Chainer GTC 2016
Chainer GTC 2016Chainer GTC 2016
Chainer GTC 2016Shohei Hido
 
Machine Learning on mobile devices
Machine Learning on mobile devicesMachine Learning on mobile devices
Machine Learning on mobile devicesSergey Burkov
 
Lambda architecture for real time big data
Lambda architecture for real time big dataLambda architecture for real time big data
Lambda architecture for real time big dataTrieu Nguyen
 
Cytoscape: Now and Future
Cytoscape: Now and FutureCytoscape: Now and Future
Cytoscape: Now and FutureKeiichiro Ono
 
Big learning 1.2
Big learning   1.2Big learning   1.2
Big learning 1.2Mohit Garg
 
Lambda Architecture and open source technology stack for real time big data
Lambda Architecture and open source technology stack for real time big dataLambda Architecture and open source technology stack for real time big data
Lambda Architecture and open source technology stack for real time big dataTrieu Nguyen
 
Jubatus: Realtime deep analytics for BIgData@Rakuten Technology Conference 2012
Jubatus: Realtime deep analytics for BIgData@Rakuten Technology Conference 2012Jubatus: Realtime deep analytics for BIgData@Rakuten Technology Conference 2012
Jubatus: Realtime deep analytics for BIgData@Rakuten Technology Conference 2012Preferred Networks
 
A Pipeline for Distributed Topic and Sentiment Analysis of Tweets on Pivotal ...
A Pipeline for Distributed Topic and Sentiment Analysis of Tweets on Pivotal ...A Pipeline for Distributed Topic and Sentiment Analysis of Tweets on Pivotal ...
A Pipeline for Distributed Topic and Sentiment Analysis of Tweets on Pivotal ...Srivatsan Ramanujam
 
IPv4 to IPv6 network transformation
IPv4 to IPv6 network transformationIPv4 to IPv6 network transformation
IPv4 to IPv6 network transformationNikolay Milovanov
 
Current & Future Use-Cases of OpenDaylight
Current & Future Use-Cases of OpenDaylightCurrent & Future Use-Cases of OpenDaylight
Current & Future Use-Cases of OpenDaylightabhijit2511
 
Lessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at NetflixLessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at NetflixJustin Basilico
 
Big data analytics for transport
Big data analytics for transportBig data analytics for transport
Big data analytics for transportUKinItaly
 
OpenTelemetry For Architects
OpenTelemetry For ArchitectsOpenTelemetry For Architects
OpenTelemetry For ArchitectsKevin Brockhoff
 
Distributed computing poli
Distributed computing poliDistributed computing poli
Distributed computing poliivascucristian
 
BISSA: Empowering Web gadget Communication with Tuple Spaces
BISSA: Empowering Web gadget Communication with Tuple SpacesBISSA: Empowering Web gadget Communication with Tuple Spaces
BISSA: Empowering Web gadget Communication with Tuple SpacesSrinath Perera
 

Ähnlich wie Jubatus talk at HadoopSummit 2013 (20)

Opnet simulator
Opnet simulatorOpnet simulator
Opnet simulator
 
Chainer GTC 2016
Chainer GTC 2016Chainer GTC 2016
Chainer GTC 2016
 
Machine Learning on mobile devices
Machine Learning on mobile devicesMachine Learning on mobile devices
Machine Learning on mobile devices
 
Bitcoin Price Prediction
Bitcoin Price PredictionBitcoin Price Prediction
Bitcoin Price Prediction
 
Lambda architecture for real time big data
Lambda architecture for real time big dataLambda architecture for real time big data
Lambda architecture for real time big data
 
Cytoscape: Now and Future
Cytoscape: Now and FutureCytoscape: Now and Future
Cytoscape: Now and Future
 
Monitoring in 2017 - TIAD Camp Docker
Monitoring in 2017 - TIAD Camp DockerMonitoring in 2017 - TIAD Camp Docker
Monitoring in 2017 - TIAD Camp Docker
 
Big learning 1.2
Big learning   1.2Big learning   1.2
Big learning 1.2
 
DevOps for DataScience
DevOps for DataScienceDevOps for DataScience
DevOps for DataScience
 
Lambda Architecture and open source technology stack for real time big data
Lambda Architecture and open source technology stack for real time big dataLambda Architecture and open source technology stack for real time big data
Lambda Architecture and open source technology stack for real time big data
 
Jubatus: Realtime deep analytics for BIgData@Rakuten Technology Conference 2012
Jubatus: Realtime deep analytics for BIgData@Rakuten Technology Conference 2012Jubatus: Realtime deep analytics for BIgData@Rakuten Technology Conference 2012
Jubatus: Realtime deep analytics for BIgData@Rakuten Technology Conference 2012
 
A Pipeline for Distributed Topic and Sentiment Analysis of Tweets on Pivotal ...
A Pipeline for Distributed Topic and Sentiment Analysis of Tweets on Pivotal ...A Pipeline for Distributed Topic and Sentiment Analysis of Tweets on Pivotal ...
A Pipeline for Distributed Topic and Sentiment Analysis of Tweets on Pivotal ...
 
IPv4 to IPv6 network transformation
IPv4 to IPv6 network transformationIPv4 to IPv6 network transformation
IPv4 to IPv6 network transformation
 
Meet with Meteor
Meet with MeteorMeet with Meteor
Meet with Meteor
 
Current & Future Use-Cases of OpenDaylight
Current & Future Use-Cases of OpenDaylightCurrent & Future Use-Cases of OpenDaylight
Current & Future Use-Cases of OpenDaylight
 
Lessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at NetflixLessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at Netflix
 
Big data analytics for transport
Big data analytics for transportBig data analytics for transport
Big data analytics for transport
 
OpenTelemetry For Architects
OpenTelemetry For ArchitectsOpenTelemetry For Architects
OpenTelemetry For Architects
 
Distributed computing poli
Distributed computing poliDistributed computing poli
Distributed computing poli
 
BISSA: Empowering Web gadget Communication with Tuple Spaces
BISSA: Empowering Web gadget Communication with Tuple SpacesBISSA: Empowering Web gadget Communication with Tuple Spaces
BISSA: Empowering Web gadget Communication with Tuple Spaces
 

Mehr von Preferred Networks

PodSecurityPolicy からGatekeeper に移行しました / Kubernetes Meetup Tokyo #57
PodSecurityPolicy からGatekeeper に移行しました / Kubernetes Meetup Tokyo #57PodSecurityPolicy からGatekeeper に移行しました / Kubernetes Meetup Tokyo #57
PodSecurityPolicy からGatekeeper に移行しました / Kubernetes Meetup Tokyo #57Preferred Networks
 
Optunaを使ったHuman-in-the-loop最適化の紹介 - 2023/04/27 W&B 東京ミートアップ #3
Optunaを使ったHuman-in-the-loop最適化の紹介 - 2023/04/27 W&B 東京ミートアップ #3Optunaを使ったHuman-in-the-loop最適化の紹介 - 2023/04/27 W&B 東京ミートアップ #3
Optunaを使ったHuman-in-the-loop最適化の紹介 - 2023/04/27 W&B 東京ミートアップ #3Preferred Networks
 
Kubernetes + containerd で cgroup v2 に移行したら "failed to create fsnotify watcher...
Kubernetes + containerd で cgroup v2 に移行したら "failed to create fsnotify watcher...Kubernetes + containerd で cgroup v2 に移行したら "failed to create fsnotify watcher...
Kubernetes + containerd で cgroup v2 に移行したら "failed to create fsnotify watcher...Preferred Networks
 
深層学習の新しい応用と、 それを支える計算機の進化 - Preferred Networks CEO 西川徹 (SEMICON Japan 2022 Ke...
深層学習の新しい応用と、 それを支える計算機の進化 - Preferred Networks CEO 西川徹 (SEMICON Japan 2022 Ke...深層学習の新しい応用と、 それを支える計算機の進化 - Preferred Networks CEO 西川徹 (SEMICON Japan 2022 Ke...
深層学習の新しい応用と、 それを支える計算機の進化 - Preferred Networks CEO 西川徹 (SEMICON Japan 2022 Ke...Preferred Networks
 
Kubernetes ControllerをScale-Outさせる方法 / Kubernetes Meetup Tokyo #55
Kubernetes ControllerをScale-Outさせる方法 / Kubernetes Meetup Tokyo #55Kubernetes ControllerをScale-Outさせる方法 / Kubernetes Meetup Tokyo #55
Kubernetes ControllerをScale-Outさせる方法 / Kubernetes Meetup Tokyo #55Preferred Networks
 
Kaggle Happywhaleコンペ優勝解法でのOptuna使用事例 - 2022/12/10 Optuna Meetup #2
Kaggle Happywhaleコンペ優勝解法でのOptuna使用事例 - 2022/12/10 Optuna Meetup #2Kaggle Happywhaleコンペ優勝解法でのOptuna使用事例 - 2022/12/10 Optuna Meetup #2
Kaggle Happywhaleコンペ優勝解法でのOptuna使用事例 - 2022/12/10 Optuna Meetup #2Preferred Networks
 
最新リリース:Optuna V3の全て - 2022/12/10 Optuna Meetup #2
最新リリース:Optuna V3の全て - 2022/12/10 Optuna Meetup #2最新リリース:Optuna V3の全て - 2022/12/10 Optuna Meetup #2
最新リリース:Optuna V3の全て - 2022/12/10 Optuna Meetup #2Preferred Networks
 
Optuna Dashboardの紹介と設計解説 - 2022/12/10 Optuna Meetup #2
Optuna Dashboardの紹介と設計解説 - 2022/12/10 Optuna Meetup #2Optuna Dashboardの紹介と設計解説 - 2022/12/10 Optuna Meetup #2
Optuna Dashboardの紹介と設計解説 - 2022/12/10 Optuna Meetup #2Preferred Networks
 
スタートアップが提案する2030年の材料開発 - 2022/11/11 QPARC講演
スタートアップが提案する2030年の材料開発 - 2022/11/11 QPARC講演スタートアップが提案する2030年の材料開発 - 2022/11/11 QPARC講演
スタートアップが提案する2030年の材料開発 - 2022/11/11 QPARC講演Preferred Networks
 
Deep Learningのための専用プロセッサ「MN-Core」の開発と活用(2022/10/19東大大学院「 融合情報学特別講義Ⅲ」)
Deep Learningのための専用プロセッサ「MN-Core」の開発と活用(2022/10/19東大大学院「 融合情報学特別講義Ⅲ」)Deep Learningのための専用プロセッサ「MN-Core」の開発と活用(2022/10/19東大大学院「 融合情報学特別講義Ⅲ」)
Deep Learningのための専用プロセッサ「MN-Core」の開発と活用(2022/10/19東大大学院「 融合情報学特別講義Ⅲ」)Preferred Networks
 
PFNにおける研究開発(2022/10/19 東大大学院「融合情報学特別講義Ⅲ」)
PFNにおける研究開発(2022/10/19 東大大学院「融合情報学特別講義Ⅲ」)PFNにおける研究開発(2022/10/19 東大大学院「融合情報学特別講義Ⅲ」)
PFNにおける研究開発(2022/10/19 東大大学院「融合情報学特別講義Ⅲ」)Preferred Networks
 
自然言語処理を 役立てるのはなぜ難しいのか(2022/10/25東大大学院「自然言語処理応用」)
自然言語処理を 役立てるのはなぜ難しいのか(2022/10/25東大大学院「自然言語処理応用」)自然言語処理を 役立てるのはなぜ難しいのか(2022/10/25東大大学院「自然言語処理応用」)
自然言語処理を 役立てるのはなぜ難しいのか(2022/10/25東大大学院「自然言語処理応用」)Preferred Networks
 
Kubernetes にこれから入るかもしれない注目機能!(2022年11月版) / TechFeed Experts Night #7 〜 コンテナ技術を語る
Kubernetes にこれから入るかもしれない注目機能!(2022年11月版) / TechFeed Experts Night #7 〜 コンテナ技術を語るKubernetes にこれから入るかもしれない注目機能!(2022年11月版) / TechFeed Experts Night #7 〜 コンテナ技術を語る
Kubernetes にこれから入るかもしれない注目機能!(2022年11月版) / TechFeed Experts Night #7 〜 コンテナ技術を語るPreferred Networks
 
Matlantis™のニューラルネットワークポテンシャルPFPの適用範囲拡張
Matlantis™のニューラルネットワークポテンシャルPFPの適用範囲拡張Matlantis™のニューラルネットワークポテンシャルPFPの適用範囲拡張
Matlantis™のニューラルネットワークポテンシャルPFPの適用範囲拡張Preferred Networks
 
PFNのオンプレ計算機クラスタの取り組み_第55回情報科学若手の会
PFNのオンプレ計算機クラスタの取り組み_第55回情報科学若手の会PFNのオンプレ計算機クラスタの取り組み_第55回情報科学若手の会
PFNのオンプレ計算機クラスタの取り組み_第55回情報科学若手の会Preferred Networks
 
続・PFN のオンプレML基盤の取り組み / オンプレML基盤 on Kubernetes 〜PFN、ヤフー〜 #2
続・PFN のオンプレML基盤の取り組み / オンプレML基盤 on Kubernetes 〜PFN、ヤフー〜 #2続・PFN のオンプレML基盤の取り組み / オンプレML基盤 on Kubernetes 〜PFN、ヤフー〜 #2
続・PFN のオンプレML基盤の取り組み / オンプレML基盤 on Kubernetes 〜PFN、ヤフー〜 #2Preferred Networks
 
Kubernetes Service Account As Multi-Cloud Identity / Cloud Native Security Co...
Kubernetes Service Account As Multi-Cloud Identity / Cloud Native Security Co...Kubernetes Service Account As Multi-Cloud Identity / Cloud Native Security Co...
Kubernetes Service Account As Multi-Cloud Identity / Cloud Native Security Co...Preferred Networks
 
KubeCon + CloudNativeCon Europe 2022 Recap / Kubernetes Meetup Tokyo #51 / #k...
KubeCon + CloudNativeCon Europe 2022 Recap / Kubernetes Meetup Tokyo #51 / #k...KubeCon + CloudNativeCon Europe 2022 Recap / Kubernetes Meetup Tokyo #51 / #k...
KubeCon + CloudNativeCon Europe 2022 Recap / Kubernetes Meetup Tokyo #51 / #k...Preferred Networks
 
KubeCon + CloudNativeCon Europe 2022 Recap - Batch/HPCの潮流とScheduler拡張事例 / Kub...
KubeCon + CloudNativeCon Europe 2022 Recap - Batch/HPCの潮流とScheduler拡張事例 / Kub...KubeCon + CloudNativeCon Europe 2022 Recap - Batch/HPCの潮流とScheduler拡張事例 / Kub...
KubeCon + CloudNativeCon Europe 2022 Recap - Batch/HPCの潮流とScheduler拡張事例 / Kub...Preferred Networks
 
独断と偏見で選んだ Kubernetes 1.24 の注目機能と今後! / Kubernetes Meetup Tokyo 50
独断と偏見で選んだ Kubernetes 1.24 の注目機能と今後! / Kubernetes Meetup Tokyo 50独断と偏見で選んだ Kubernetes 1.24 の注目機能と今後! / Kubernetes Meetup Tokyo 50
独断と偏見で選んだ Kubernetes 1.24 の注目機能と今後! / Kubernetes Meetup Tokyo 50Preferred Networks
 

Jubatus talk at HadoopSummit 2013

  • 1. Jubatus: Real-time and Highly-scalable Machine Learning Platform. Shohei Hido, Preferred Infrastructure, Inc., Japan. HadoopSummit 2013 @ San Jose, CA, 2013/06/27.
  • 2. Jubatus: OSS for real-time big data analytics. Joint development with NTT laboratory in Japan. Released Oct. 2011 (current version is v0.4.3). Available for download from https://github.com/jubatus/. Three themes: 1. Bigger data; 2. More in real-time; 3. Machine learning.
  • 3. Bottom line: Just two words.
  • 4. Preferred Infrastructure, Inc. (PFI): "To bring cutting-edge research advances to the real world". Software company in Tokyo, Japan (founded in 2006). Focus on long-term technology innovation. 28 regular employees, many top-notch engineers. Customers: media, e-commerce, research institutes. Fields: distributed computing, natural language processing, machine learning, information retrieval.
  • 5. Agenda: What is Jubatus? (motivation and applications); How does Jubatus work? (the architecture); How to use it (quick-start steps); Summary and future.
  • 6. At HadoopSummit last year, everyone talked about “real-time”: real-time BI, definition of real-time, real-time analytics, real-time SQL-like query, real-time processing, real-time ad-hoc query, real-time visualization.
  • 7. Real-time big data analytics: A trend. From an O’Reilly article (2013): “Real-time big data isn’t just a process for storing petabytes or exabytes of data in a data warehouse.” “It’s about the ability to make better decisions and take meaningful actions at the right time.” “It’s about combining and analyzing data so you can take the right action, at the right time, and at the right place.” - Michael Minelli, co-author of “Big Data, Big Analytics”
  • 8. Speed and depth of big data analytics: a whitespace. [Figure: decision speed plotted against deeper analytics, positioning the Hadoop ecosystem, Sedue, and Jubatus. Application areas shown: surveillance camera, security traffic, automobile, agriculture, market research, education, bio, health care.]
  • 9. Big data analytics will go real-time and deeper (1. Bigger data; 2. More in real-time; 3. Machine learning). Future: deeper analytics for rapid decisions and actions. Examples: Twitter analysis for personalized advertisement optimization; anomaly detection from M2M sensor data; energy demand forecast / smart grid optimization; security monitoring on network traffic or financial fraud.
  • 10. Demo: real-time tweet categorization. Automatically learns “Apple + iPad => Apple” and then “iPad => Apple” in real time.
  • 11. Jubatus in the Twitter ecosystem in Japan. NTT Data: exclusive tweet reseller in Japan, with a Firehose contract with Twitter. Jubatus is an official tool for analytics on Japanese tweets. Jubatus can classify 5,000+ tweets per second on a few servers. Our Twitter analysis modules. Sources: http://blog.jp.twitter.com/2012/09/twitter.html, http://www.nttdata.com/jp/ja/news/release/2012/092700.html
  • 12. Jubatus as a big data analytics platform for industry. Government fund for IT fusion: big-data new business creation. In collaboration with NEC and other research labs. Focus on performance improvement for larger M2M data. [Figure: development plan scaling up with data size, from human-generated data to machine-generated data to data with severe real-time requirements: SNS data, healthcare, agriculture, network traffic, video surveillance.]
  • 13. Active development and growing business/community. Development: 10+ active committers, plus pull requests from users; monthly minor updates (bug and usability fixes); quarterly major updates (new features and interfaces). PoC at user companies: real-time ad optimization, server monitoring, smart-house / smart-grid, intelligent camera. Deployment and experiments: Twitter analysis, social media monitoring, malicious attack detection, malware detection. Community: 2 hands-on sessions (90+ attendees in total), 1 meetup (90+ attendees).
  • 14. Agenda: What is Jubatus? (motivation and applications); How does Jubatus work? (the architecture); How to use it (quick-start steps); Summary and future.
  • 15. Online machine learning in Jubatus. Batch learning: all data is scanned before a model is built, so the data must be stored in memory or storage. Online learning: the model is updated by each data sample as it arrives, sometimes with a theoretical guarantee that the online model converges to the batch model.
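
The difference between batch and online learning on slide 15 can be illustrated with a small sketch. This is not Jubatus code: it is a plain perceptron (one of the classifiers the talk lists) written from scratch, and the names `margin`, `update`, `train_online`, and the toy tweet features are hypothetical. The point is only that the model changes after each sample and the sample itself is never stored.

```python
from typing import Dict, Iterable, Tuple

Features = Dict[str, float]   # sparse feature vector, e.g. word counts
Weights = Dict[str, float]    # linear model: one weight per feature


def margin(weights: Weights, x: Features) -> float:
    """Raw linear score w . x over the features present in x."""
    return sum(weights.get(name, 0.0) * value for name, value in x.items())


def predict(weights: Weights, x: Features) -> int:
    """Return +1 or -1 depending on the sign of the score."""
    return 1 if margin(weights, x) >= 0.0 else -1


def update(weights: Weights, x: Features, y: int) -> None:
    """Perceptron update: adjust the model only if the sample is not
    classified with a positive margin. The sample is then discarded."""
    if y * margin(weights, x) <= 0.0:
        for name, value in x.items():
            weights[name] = weights.get(name, 0.0) + y * value


def train_online(stream: Iterable[Tuple[Features, int]]) -> Weights:
    """Consume a stream one sample at a time; no dataset is kept."""
    weights: Weights = {}
    for x, y in stream:
        update(weights, x, y)
    return weights


# Toy stream (assumed data): +1 means "Apple", -1 means "other".
stream = [
    ({"apple": 1.0, "ipad": 1.0}, 1),
    ({"galaxy": 1.0}, -1),
    ({"ipad": 1.0}, 1),
]
model = train_online(stream)
```

After the first sample the model already associates "ipad" with the positive class, so the third sample needs no further update, which loosely mirrors the tweet categorization demo described in the talk.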
  • 16. What Jubatus currently supports. Classification (multi-class): Perceptron / PA / CW / AROW. Regression: PA-based regression. Nearest neighbor: LSH / MinHash / Euclid LSH. Recommendation: nearest neighbor based. Anomaly detection: LOF (Local Outlier Factor). Graph analysis: shortest path / centrality (PageRank). Some simple statistics.
  • 17. Online learning or distributed learning: no unified solution has been available. Jubatus combines them into a unified computation framework. [Figure: tools placed on two axes, batch vs. real-time/online and small-scale stand-alone vs. large-scale distributed/parallel computing: WEKA (1993-), SPSS (1988-), Mahout (2006-), online ML algorithms PA [2003] and CW [2008], Jubatus (2011-).]
  • 18. Q: How can online algorithms be made distributed? A: It is not trivial, and some tricks are needed. Online learning requires frequent model updates. Naïve data distribution leads to too many synchronization operations, which causes performance problems in terms of network communication and accuracy. [Figure: timeline of servers A, B, and C alternating local model updates (L) with Sync steps, labeled "Data synchronization?"]
  • 19. Our approach: loose model sharing. Jubatus only shares the local models, in a loose manner (model size << data size). Jubatus DOES NOT share datasets, which is a unique approach compared to existing frameworks. Local models can differ between servers; different models are gradually merged. We define three fundamental operations: UPDATE / MIX / ANALYZE. Algorithms can be implemented independently of distribution logic, data sharing, and failover.
  • 20. UPDATE, MIX, and ANALYZE. 1. UPDATE (locally, distributed training): receive a sample, learn, and update the local model. 2. MIX (globally, share only models): exchange and merge the local models between servers into a unified model. 3. ANALYZE (locally, distributed prediction): receive a sample, apply the local model, and return the result.
  • 21. UPDATE. Each server starts from an initial model. Each data sample is sent to one (or two) servers, distributed randomly or consistently. Local models are updated based on the sample. Data samples are NEVER shared.
  • 22. MIX. Each server sends its model diff. Model diffs are merged and distributed. Only model diffs are transmitted. [Figure: Local model i - Initial model = Model diff i; Model diff 1 + Model diff 2 = Merged diff; Initial model + Merged diff = Mixed model, on every server.]
  • 23. UPDATE (iteration). Locally updated models are discarded after MIX. Each server starts updating from the mixed model, with samples again distributed randomly or consistently. The mixed model improves gradually thanks to all of the servers.
  • 24. ANALYZE. For prediction, each sample goes to a randomly chosen server. The server applies the current mixed model to the sample. The prediction is returned to the client. You add servers for higher throughput.
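
The UPDATE / MIX / ANALYZE cycle from slides 20 to 24 can be sketched in a few lines. This is an illustrative toy, not the Jubatus implementation: the `Server` class, the in-process `mix` function, and the perceptron-style update are all hypothetical, and merging by averaging the diffs is an assumption borrowed from the talk's statement that MIX averages the coefficients of linear models.

```python
from typing import Dict, List

Model = Dict[str, float]
Features = Dict[str, float]


class Server:
    """One worker holding a local linear model (hypothetical sketch)."""

    def __init__(self, base: Model) -> None:
        self.base: Model = dict(base)    # model as of the last MIX
        self.local: Model = dict(base)   # model including local updates

    def _score(self, x: Features) -> float:
        return sum(self.local.get(k, 0.0) * v for k, v in x.items())

    def update(self, x: Features, y: int) -> None:
        """UPDATE: learn from one sample locally; the sample is not kept."""
        if y * self._score(x) <= 0.0:
            for k, v in x.items():
                self.local[k] = self.local.get(k, 0.0) + y * v

    def diff(self) -> Model:
        """Model diff = local model - model from the last MIX."""
        keys = set(self.local) | set(self.base)
        return {k: self.local.get(k, 0.0) - self.base.get(k, 0.0) for k in keys}

    def apply_mix(self, merged: Model) -> None:
        """Mixed model = base model + merged diff; local changes restart from it."""
        keys = set(self.base) | set(merged)
        mixed = {k: self.base.get(k, 0.0) + merged.get(k, 0.0) for k in keys}
        self.base = mixed
        self.local = dict(mixed)

    def analyze(self, x: Features) -> int:
        """ANALYZE: apply the current model to a sample and return a label."""
        return 1 if self._score(x) >= 0.0 else -1


def mix(servers: List[Server]) -> None:
    """MIX: exchange only the model diffs, merge them, and redistribute.
    Averaging is an assumption made for this sketch."""
    diffs = [s.diff() for s in servers]
    keys = set().union(*diffs)
    merged = {k: sum(d.get(k, 0.0) for d in diffs) / len(servers) for k in keys}
    for s in servers:
        s.apply_mix(merged)


# Two servers see different samples, then share what they learned.
a, b = Server({}), Server({})
a.update({"ipad": 1.0}, 1)
b.update({"galaxy": 1.0}, -1)
mix([a, b])
```

After `mix`, both servers hold the same model even though neither saw the other's sample, so either one can answer an ANALYZE request for "ipad" or "galaxy".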
  • 25. Model inside Jubatus (1): classification. Each server updates its local linear model $w_i$. MIX computes the averaged coefficients: $w = \frac{1}{n}(w_1 + \cdots + w_n)$.
  • 26. Model inside Jubatus (2): nearest neighbor. Samples are approximated by LSH, MinHash, etc. Only bit arrays are shared between servers. [Figure: six samples, each with a bit-array signature such as 011010010 or 000010110; after MIX, every server holds the same table of sample IDs and bit arrays.]
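
To make the bit-array idea on slide 26 concrete, here is a minimal sketch of one common LSH scheme, random-hyperplane hashing. Which exact hashing Jubatus uses is not stated in the talk, so treat the scheme, the function names, and the parameters (`n_bits`, `seed`) as assumptions. Each sample is reduced to a short bit signature, and similarity is estimated from the Hamming distance between signatures, so servers would only need to exchange the bits rather than the original vectors.

```python
import random
from typing import List, Sequence


def make_hyperplanes(n_bits: int, dim: int, seed: int = 0) -> List[List[float]]:
    """Draw n_bits random hyperplanes; every server must use the same ones."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(v: Sequence[float], planes: List[List[float]]) -> List[int]:
    """One bit per hyperplane: 1 if v lies on its non-negative side."""
    return [
        1 if sum(p_i * v_i for p_i, v_i in zip(plane, v)) >= 0.0 else 0
        for plane in planes
    ]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of differing bits; a small distance suggests a small angle."""
    return sum(x != y for x, y in zip(a, b))


planes = make_hyperplanes(n_bits=16, dim=3)
v = [1.0, 2.0, 3.0]
sig_v = signature(v, planes)
sig_scaled = signature([2.0 * x for x in v], planes)   # same direction as v
sig_neg = signature([-x for x in v], planes)           # opposite direction
```

Vectors pointing in the same direction receive identical signatures, while the opposite vector differs in every bit, so the Hamming distance between signatures acts as a rough proxy for the angle between the original samples.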
  • 27. Jubatus architecture: a standard client-server system. ZooKeeper and RPC handle connections between clients and servers. We have clients for C++/Java/Ruby/Python (all under the MIT license). [Figure: clients, optionally combined with a JubaKeeper, talk over RPC to multi-threaded JubaServer processes on Linux servers; each JubaServer contains an fv_converter, an algorithm, and a model.]
  • 28. Best QPS performance (evaluated on an old version). Experimental settings: standalone vs. multiple servers; client processes: 1-4; server processes: 1-6; server threads: 1-6. Maximum QPS: classification UPDATE 3,000 qps and ANALYZE 6,500 qps; recommendation UPDATE 400 qps and ANALYZE 2,500 qps. Results: classification scales linearly with the number of server processes and threads; recommendation performance depends heavily on the number of collected samples.
  • 29. l  What is Jubatus? : Motivation and applications l  How Jubatus works? : The architecture l  How to use it : Quick-start steps l  Summary and future Agenda
  • 30. Step (0): Visit Jubatus website (http://jubat.us/) l  Overview l  Installation l  Tutorials l  API documents l  Reference 30
  • 31. Step (1): VM images and tutorial http://download.jubat.us/event/handson_01/en/ 31 l  Hands-on tutorial l  Intro to ML, How to start, examples, configurations l  VM images running on any OS l  VirtualBox / VMware
  • 32. Step (2) : Download from github l  https://github.com/jubatus/jubatus/ 32
  • 33. Step (3): Play with Jubatus examples l  https://github.com/jubatus/jubatus-example/ 33
  • 34. Step (4) : Build your own apps l  Examples l  Tweet categorization l  User segmentation l  Power consumption estimation l  Stock price prediction l  Real-time recommendation l  Advertisement optimization l  Online fraud prevention l  Early-stage defect detection l  Proactive network monitoring l  Online malware detection 34
  • 35. l  What is Jubatus? : Motivation and applications l  How Jubatus works? : The architecture l  How to use it : Quick-start steps l  Summary and future Agenda
  • 36. Summary l  Jubatus is an OSS for online distributed machine learning l  UPDATE-MIX-ANALYZE for abstracting ML algorithms l  Most of the tasks l  Future plans l  Clustering l  P2P-like MIX method l  Time-series preprocessing in fv_converter l  Unlearning 36 1. Bigger data 3. Machine learning 2. More in real-time
  • 37. Current: As a meta-data predictor 37 User Time Bet Act. Gain Class Est, Cluster Outlier A33 5:34 40 ↑C +20 Good +18 C1 0.07 A33 5:34 10 ←B +80 Good -10 C3 0.92 A33 5:35 20 ↑B -16 Bad -15 C1 0.11 … … … … … RDB •  Aggregation •  Reporting •  AnalyticsOnline learning Input data Enriched data Real-time prediction NoSQL HDFS Predicted columns Search l  Apply Jubatus models before storing l  Adaptive and memory-efficient
  • 38. Future: For edge-heavy data 38 l  Emerging apps that can’t collect data into one place l  Due to data intensity: video streams from millions of devices l  Due to latency: real-time decision within <100 msec l  Due to privacy: sensitive raw data cannot be shared Smartphones Intelligent cars Intelligent cameras Healthcare monitoring Bio-medical
  • 39. How can we help you? One more thing… 39
  • 40. We opened a subsidiary in San Jose l  Preferred Infrastructure America, Inc. l  Established in March, Office opened in April l  Next to the SJC airport l  Start doing business in the U.S. 40
  • 41. Thank you l  Follow us l  github.com/jubatus l  jubatus@googlegroups.com l  Twitter: @JubatusOfficial l  We welcome your contribution and collaboration 41