Distributed Online Machine Learning
        Framework for Big Data




                 Shohei Hido
     Preferred Infrastructure, Inc. Japan.
        XLDB Asia, June 22nd, 2012
Overview:
Big Data analytics will go real-time and deeper

    1. Bigger data
    2. More in real-time
    3. Deep analysis

    No storage / No data sharing / Only mix models
Jubatus: OSS platform for Big Data analytics




l    Joint development with NTT laboratory in Japan
      l    Project started April 2011
l    Released as an open source software
      l    Just released 0.3.0
l    You can download it from
l    http://github.com/jubatus/
l    Waiting for your contribution and collaboration

Agenda

l    What’s missing for Big Data analytics


l    Comparison with existing software


l    Inside Jubatus: Update, Analyze, and Mix


l    Jubatus demo


l    Summary




Increasing demand in Big Data applications:
    Real-time deeper analysis
    l  Current focus: aggregation and rule processing on bigger data
         l  CEP (Complex Event Processing) for real-time processing

         l  Hadoop/MapReduce for distributed computation

    l  Future: deeper analysis for rapid decisions and actions
         l  Ex. 1: Defect detection on NY power grid [Rubin+,TPAMI2012]

         l  Ex. 2: Proactive algorithmic trading [ComputerWorldUK, 2011]


[Figure: CEP, Hadoop, and "What will come?" placed along two axes, data size and depth of analysis]

Reference: http://web.mit.edu/rudin/www/TPAMIPreprint.pdf
Reference: http://www.computerworlduk.com/news/networking/3302464/
Key technology: Machine learning

l    Examples need rapid decisions under uncertainty
      l    Anomaly detection from M2M sensor data
      l    Energy demand forecast / Smart grid optimization
      l    Security monitoring on raw Internet traffic
l    What is missing for fast & deep analytics on Big Data?
      l    Online/real-time machine learning platform
      l    + Scale-out distributed machine learning platform



            1. Bigger data

      2. More in real-time

       3. Deep analysis
Online machine learning in Jubatus
l    Batch learning
       l  Scan all data before building a model
       l  Data must be stored in memory or storage


                                          Model


l    Online learning
       l  Model will be updated by each data sample
       l  Sometimes with theory that the online model
           converges to the batch model


                                              Model
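To make the batch/online contrast concrete, here is a minimal Python sketch of online learning, assuming a binary perceptron (one of the algorithms listed on the next slide) and dict-based sparse feature vectors. The function names are hypothetical, not Jubatus's API. Each sample is used once and then discarded, so memory grows with the model, not with the data.

```python
from typing import Dict, Iterable, Tuple

Features = Dict[str, float]  # sparse feature vector: feature name -> value


def score(weights: Features, x: Features) -> float:
    """Dot product between the weight vector and a sparse sample."""
    return sum(weights.get(k, 0.0) * v for k, v in x.items())


def perceptron_update(weights: Features, x: Features, y: int) -> None:
    """Online step: update the model in place from one sample (y in {-1, +1})."""
    if y * score(weights, x) <= 0:  # misclassified or on the decision boundary
        for k, v in x.items():
            weights[k] = weights.get(k, 0.0) + y * v


def train_online(stream: Iterable[Tuple[Features, int]]) -> Features:
    """Consume a (possibly unbounded) stream; no sample is ever stored."""
    weights: Features = {}
    for x, y in stream:
        perceptron_update(weights, x, y)
    return weights
```

A batch learner would instead need the whole list of samples in memory before it could start; `train_online` accepts a generator and works the same way on an endless stream.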


Jubatus focuses on latest online algorithms

l    Advantage: fast and not memory-intensive
       l  Low latency & high throughput
       l  No need for storing large datasets


l    Eg. Linear classification algorithms
      l    Perceptron (1958)
      l    Passive Aggressive (PA) (2003)             Very recent
                                                        progress
      l    Confidence Weighted Learning (CW) (2008)
      l    AROW (2009)
      l    Normal HERD (NHERD) (2010)




Online learning or distributed learning:
   No unified solution has been available
   l    Jubatus combines them into a unified computation framework
                                  Real-time/
                                    Online
                Online ML alg.:                Jubatus
                  PA [2003]                    2011-
                  CW[2008]

                                                                  Large scale
Small scale                                                             &
Stand-alone                                                       Distributed/
                                                                    Parallel
                WEKA                                     Mahout    computing
                   1993-                                  2006-
                SPSS
                   1988-
                                    Batch
                                      9
What Jubatus currently supports

l    Classification (multi-class)
       l  Perceptron / PA / CW / AROW

l    Regression
       l  PA-based regression

l    Nearest neighbor
       l  LSH / MinHash / Euclid LSH

l    Recommendation
       l  Based on nearest neighbor

l    Anomaly detection*
       l  LOF based on nearest neighbor

l  Graph analysis*
     l  Shortest path / Centrality (PageRank)

l  Some simple statistics
Agenda: Comparison with existing software
Hadoop and Mahout: Not good for online learning

l    Hadoop
       l  Advantage

              l    Many extensions for a variety of applications
              l    Good for distributed data storing and aggregation
       l    Disadvantage
              l    No direct support for machine learning and online processing
l    Mahout
       l  Advantage

              l    Popular machine learning algorithms are implemented
       l    Disadvantage
              l    Some implementation are less mature
              l    Still not capable of online machine learning

Jubatus vs. Hadoop, RDB-based, and Storm:
    Advantage in online AND distributed ML
    l    Only Jubatus satisfies both of them at the same time

                            Jubatus       Hadoop           RDB        Storm
                Storing          ✓               ✓✓                     ✓
                                                             ✓
                Big Data    External DB          HDFS                 Ext. DB
                 Batch                             ✓        ✓✓
                                ✓                                       ✕
                learning                         Mahout   SPSS, etc
                 Stream
                                ✓                  ✕         ✕         ✓✓
               processing
             Distributed                           ✓
                               ✓✓                            ✕          ✕
              learning                           Mahout
   High
         Online
importance	
                   ✓✓                  ✕         ✕          ✕
                learning
                                          13
Agenda: Inside Jubatus: Update, Analyze, and Mix
How to make online algorithms distributed?
=> Not trivial!
[Figure: timeline comparison. Batch learning: learn the update, then apply one model update; easy to parallelize. Online learning: learn and update the model after every sample; hard to parallelize due to frequent updates.]


l    Online learning requires frequent model updates
l    Naïve distributed architecture leads to too many
      synchronization operations
l    It causes performance problems in terms of network
      communications and accuracy
Solution: Loose model sharing

l  Jubatus only shares the local models in a loose manner
     l  Model size << Data size

l  Jubatus DOES NOT share datasets
     l  Unique approach compared to existing framework

l  Local models can be different on the servers
     l  Different models will be gradually merged




                  [Figure: three servers, each with its own local model, all end up with the same mixed model]
Three fundamental operations on Jubatus:
UPDATE, ANALYZE, and MIX
1.    UPDATE
      •  Receive a sample, learn from it, and update the local model

2.    ANALYZE
      •  Receive a sample, apply the local model, and return the result

3.    MIX (called automatically in the backend)
      •  Exchange and merge the local models between servers

•    Cf. the Map-Shuffle-Reduce operations in Hadoop
•    Algorithms can be implemented independently of
      •    Distribution logic
      •    Data sharing
      •    Failover

UPDATE

   l  Each server starts from an initial model
   l  Each data sample are sent to one (or two) servers
   l  Local models updated based on the sample
   l  Data samples are NEVER shared




[Figure: samples are distributed randomly or consistently to servers 1 and 2; each server updates its local model, starting from the initial model]
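"Distributed randomly or consistently" refers to how a sample is assigned to a server. Below is a minimal sketch of both policies, assuming a consistent-hash ring keyed by an ID attached to each sample (for example, a row or user ID). The class name, the use of MD5, and the number of virtual nodes are illustrative assumptions, not Jubatus's implementation.

```python
import bisect
import hashlib
import random
from typing import List


def _hash(key: str) -> int:
    """Map a string to a large integer position on the ring."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent hashing: the same key always goes to the same server,
    and adding or removing a server only moves a fraction of the keys."""

    def __init__(self, servers: List[str], vnodes: int = 100) -> None:
        self._ring = sorted(
            (_hash(f"{s}#{i}"), s) for s in servers for i in range(vnodes)
        )
        self._points = [p for p, _ in self._ring]

    def route(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


def route_randomly(servers: List[str], rng: random.Random) -> str:
    """Random routing: any server may receive any sample."""
    return rng.choice(servers)
```

Random routing spreads load evenly and suits algorithms whose local models are all mixed anyway; consistent routing is useful when a given key should always update the same server's state.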
MIX

l  Each server sends its model diff
l  Model diffs are merged and distributed
l  Only model diffs are transmitted




$$\text{model diff}_i = \text{local model}_i - \text{initial model}, \qquad i \in \{1, 2\}$$

$$\text{merged diff} = \text{model diff}_1 + \text{model diff}_2$$

$$\text{mixed model} = \text{initial model} + \text{merged diff} \qquad \text{(on each server)}$$
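The same arithmetic in code: a minimal sketch, assuming dense weight vectors stored as Python lists and a single coordinator that collects every server's diff. It follows the slide in summing the diffs; averaging them instead (dividing by the number of servers) is a common alternative, exposed here through a hypothetical `average` flag. Each server then continues UPDATE from the returned mixed model.

```python
from typing import List

Model = List[float]


def model_diff(local: Model, initial: Model) -> Model:
    """model diff_i = local model_i - initial model"""
    return [loc - init for loc, init in zip(local, initial)]


def merge_diffs(diffs: List[Model], average: bool = False) -> Model:
    """merged diff = diff_1 + diff_2 + ... (optionally divided by the count)."""
    merged = [sum(column) for column in zip(*diffs)]
    if average:
        merged = [m / len(diffs) for m in merged]
    return merged


def mix(initial: Model, local_models: List[Model], average: bool = False) -> Model:
    """mixed model = initial model + merged diff.

    Only the diffs would travel over the network; training data never does.
    """
    diffs = [model_diff(local, initial) for local in local_models]
    merged = merge_diffs(diffs, average)
    return [init + d for init, d in zip(initial, merged)]
```

For example, with an initial model `[0.0, 0.0]` and local models `[1.0, 0.0]` and `[0.0, 2.0]`, summing gives the mixed model `[1.0, 2.0]`, while averaging gives `[0.5, 1.0]`.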


UPDATE (iteration)

   l  Locally updated models after MIX are discarded
   l  Each server starts updating from the mixed model
   l  The mixed model improves gradually thanks to all of the servers




[Figure: samples are distributed randomly or consistently to servers 1 and 2; each server updates its local model, now starting from the mixed model]
ANALYZE

   l  For prediction, each sample randomly goes to a server
   l  Server applies the current mixed model to the sample
   l  The prediction will be returned to the client




[Figure: samples are distributed randomly to servers holding the mixed model; each server returns its prediction]
Why can Jubatus work in real time?

•  Focus on online machine learning
     •  Make online machine learning algorithms distributed
•  Update locally
     •  Online training without communicating with other servers
•  Mix only models globally
     •  Small communication cost, low latency, good performance
     •  An advantage over the costly Shuffle phase in MapReduce
•  Analyze locally
     •  Each server holds the mixed model
     •  Low latency for making predictions
•  Everything in-memory
     •  Process data on the fly


Agenda: Jubatus demo
Demo: Twitter analysis using natural language
processing and machine learning
Jubatus classifies each tweet from the Twitter data stream into predefined
categories. A single Jubatus server can classify over 5,000 queries per second (QPS),
which is close to the volume of the raw Twitter stream. We provide a browser-based GUI.




Experiment: Estimation of power consumption
Jubatus learns how the power usage of certain servers relates to their
network data flow patterns. The power consumption of individual servers can then be
estimated in real time by monitoring and analyzing packets, without
having to install power measurement modules on all servers.




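[Figure: packet data captured through a TAP in a data center / office feeds the estimation, for servers with a power meter and servers without one; a scatter plot compares predicted value (W) with actual value (W).]
Consumption differs for different types of packets.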
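The slide does not say which regression algorithm was used. As a hedged illustration of the PA-based regression listed among Jubatus's features, here is a sketch of the epsilon-insensitive PA-I regression update. Treating per-packet-type counts as features and metered power readings as targets is an assumption made for this example, as are the default parameter values.

```python
from typing import List


def predict(w: List[float], x: List[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def pa1_regression_update(
    w: List[float], x: List[float], y: float, c: float = 1.0, epsilon: float = 0.1
) -> List[float]:
    """One PA-I regression step with the epsilon-insensitive loss.

    Hypothetical usage: x holds per-packet-type counts observed for a server,
    y is the power reading (W) from a server that does have a power meter.
    """
    error = y - predict(w, x)
    loss = max(0.0, abs(error) - epsilon)  # ignore errors within +/- epsilon
    sq_norm = sum(xi * xi for xi in x)
    if loss == 0.0 or sq_norm == 0.0:
        return list(w)  # passive: prediction is already close enough
    tau = min(c, loss / sq_norm)  # step size, capped by C
    sign = 1.0 if error > 0 else -1.0
    return [wi + sign * tau * xi for wi, xi in zip(w, x)]
```

Once trained on metered servers, `predict` could be applied to the packet features of unmetered servers, which matches the setup the slide describes.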
Agenda: Summary
Summary

l    Jubatus is the first OSS platform for online
      distributed machine learning on Big Data streams.
l    Download it from http://github.com/jubatus/
l    We welcome your contribution and collaboration
               1. Bigger data

            2. More in real-time

              3. Deep analysis
                                      No storage
                                      No data sharing
                                      Only mix model

Weitere ähnliche Inhalte

Was ist angesagt?

Big data: Descoberta de conhecimento em ambientes de big data e computação na...
Big data: Descoberta de conhecimento em ambientes de big data e computação na...Big data: Descoberta de conhecimento em ambientes de big data e computação na...
Big data: Descoberta de conhecimento em ambientes de big data e computação na...Rio Info
 
The State of Artificial Intelligence in 2018: A Good Old Fashioned Report
The State of Artificial Intelligence in 2018: A Good Old Fashioned ReportThe State of Artificial Intelligence in 2018: A Good Old Fashioned Report
The State of Artificial Intelligence in 2018: A Good Old Fashioned ReportNathan Benaich
 
Deep Credit Risk Ranking with LSTM with Kyle Grove
Deep Credit Risk Ranking with LSTM with Kyle GroveDeep Credit Risk Ranking with LSTM with Kyle Grove
Deep Credit Risk Ranking with LSTM with Kyle GroveDatabricks
 
Artificial Intelligence at LinkedIn
Artificial Intelligence at LinkedInArtificial Intelligence at LinkedIn
Artificial Intelligence at LinkedInBill Liu
 
Introduction to Mahout and Machine Learning
Introduction to Mahout and Machine LearningIntroduction to Mahout and Machine Learning
Introduction to Mahout and Machine LearningVarad Meru
 
Aditya Bhattacharya - Enterprise DL - Accelerating Deep Learning Solutions to...
Aditya Bhattacharya - Enterprise DL - Accelerating Deep Learning Solutions to...Aditya Bhattacharya - Enterprise DL - Accelerating Deep Learning Solutions to...
Aditya Bhattacharya - Enterprise DL - Accelerating Deep Learning Solutions to...Aditya Bhattacharya
 
Data Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAMLData Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAMLPaco Nathan
 
Intro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWSIntro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWSSri Ambati
 
Computer vision-must-nit-silchar-ml-hackathon-2019
Computer vision-must-nit-silchar-ml-hackathon-2019Computer vision-must-nit-silchar-ml-hackathon-2019
Computer vision-must-nit-silchar-ml-hackathon-2019Aditya Bhattacharya
 
Large-Scale Machine Learning at Twitter
Large-Scale Machine Learning at TwitterLarge-Scale Machine Learning at Twitter
Large-Scale Machine Learning at Twitternep_test_account
 
Azure Machine Learning
Azure Machine LearningAzure Machine Learning
Azure Machine LearningMostafa
 
Video Analytics on Hadoop webinar victor fang-201309
Video Analytics on Hadoop webinar victor fang-201309Video Analytics on Hadoop webinar victor fang-201309
Video Analytics on Hadoop webinar victor fang-201309DrVictorFang
 
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...MLconf
 
How Artificial Intelligence & Machine Learning Are Transforming Modern Marketing
How Artificial Intelligence & Machine Learning Are Transforming Modern MarketingHow Artificial Intelligence & Machine Learning Are Transforming Modern Marketing
How Artificial Intelligence & Machine Learning Are Transforming Modern MarketingCleverTap
 
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduceBIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduceMahantesh Angadi
 
Retrieving Visually-Similar Products for Shopping Recommendations using Spark...
Retrieving Visually-Similar Products for Shopping Recommendations using Spark...Retrieving Visually-Similar Products for Shopping Recommendations using Spark...
Retrieving Visually-Similar Products for Shopping Recommendations using Spark...Databricks
 

Was ist angesagt? (20)

Big data: Descoberta de conhecimento em ambientes de big data e computação na...
Big data: Descoberta de conhecimento em ambientes de big data e computação na...Big data: Descoberta de conhecimento em ambientes de big data e computação na...
Big data: Descoberta de conhecimento em ambientes de big data e computação na...
 
The State of Artificial Intelligence in 2018: A Good Old Fashioned Report
The State of Artificial Intelligence in 2018: A Good Old Fashioned ReportThe State of Artificial Intelligence in 2018: A Good Old Fashioned Report
The State of Artificial Intelligence in 2018: A Good Old Fashioned Report
 
Deep Credit Risk Ranking with LSTM with Kyle Grove
Deep Credit Risk Ranking with LSTM with Kyle GroveDeep Credit Risk Ranking with LSTM with Kyle Grove
Deep Credit Risk Ranking with LSTM with Kyle Grove
 
Artificial Intelligence at LinkedIn
Artificial Intelligence at LinkedInArtificial Intelligence at LinkedIn
Artificial Intelligence at LinkedIn
 
Distributed deep learning_over_spark_20_nov_2014_ver_2.8
Distributed deep learning_over_spark_20_nov_2014_ver_2.8Distributed deep learning_over_spark_20_nov_2014_ver_2.8
Distributed deep learning_over_spark_20_nov_2014_ver_2.8
 
Introduction to Mahout and Machine Learning
Introduction to Mahout and Machine LearningIntroduction to Mahout and Machine Learning
Introduction to Mahout and Machine Learning
 
Aditya Bhattacharya - Enterprise DL - Accelerating Deep Learning Solutions to...
Aditya Bhattacharya - Enterprise DL - Accelerating Deep Learning Solutions to...Aditya Bhattacharya - Enterprise DL - Accelerating Deep Learning Solutions to...
Aditya Bhattacharya - Enterprise DL - Accelerating Deep Learning Solutions to...
 
Data Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAMLData Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAML
 
Ai use cases
Ai use casesAi use cases
Ai use cases
 
Intro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWSIntro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWS
 
Computer vision-must-nit-silchar-ml-hackathon-2019
Computer vision-must-nit-silchar-ml-hackathon-2019Computer vision-must-nit-silchar-ml-hackathon-2019
Computer vision-must-nit-silchar-ml-hackathon-2019
 
Large-Scale Machine Learning at Twitter
Large-Scale Machine Learning at TwitterLarge-Scale Machine Learning at Twitter
Large-Scale Machine Learning at Twitter
 
Azure Machine Learning
Azure Machine LearningAzure Machine Learning
Azure Machine Learning
 
Video Analytics on Hadoop webinar victor fang-201309
Video Analytics on Hadoop webinar victor fang-201309Video Analytics on Hadoop webinar victor fang-201309
Video Analytics on Hadoop webinar victor fang-201309
 
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
 
How Artificial Intelligence & Machine Learning Are Transforming Modern Marketing
How Artificial Intelligence & Machine Learning Are Transforming Modern MarketingHow Artificial Intelligence & Machine Learning Are Transforming Modern Marketing
How Artificial Intelligence & Machine Learning Are Transforming Modern Marketing
 
How Artificial Intelligence & Machine Learning Are Transforming Modern Market...
How Artificial Intelligence & Machine Learning Are Transforming Modern Market...How Artificial Intelligence & Machine Learning Are Transforming Modern Market...
How Artificial Intelligence & Machine Learning Are Transforming Modern Market...
 
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduceBIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
 
Retrieving Visually-Similar Products for Shopping Recommendations using Spark...
Retrieving Visually-Similar Products for Shopping Recommendations using Spark...Retrieving Visually-Similar Products for Shopping Recommendations using Spark...
Retrieving Visually-Similar Products for Shopping Recommendations using Spark...
 
Apache Mahout
Apache MahoutApache Mahout
Apache Mahout
 

Andere mochten auch

Online Machine Learning: introduction and examples
Online Machine Learning:  introduction and examplesOnline Machine Learning:  introduction and examples
Online Machine Learning: introduction and examplesFelipe
 
Online algorithms in Machine Learning
Online algorithms in Machine LearningOnline algorithms in Machine Learning
Online algorithms in Machine LearningAmrinder Arora
 
Learn Like a Human: Taking Machine Learning from Batch to Real-Time
Learn Like a Human: Taking Machine Learning from Batch to Real-TimeLearn Like a Human: Taking Machine Learning from Batch to Real-Time
Learn Like a Human: Taking Machine Learning from Batch to Real-TimeDynamic Yield
 
Online Optimization Problem-1 (Online machine learning)
Online Optimization Problem-1 (Online machine learning)Online Optimization Problem-1 (Online machine learning)
Online Optimization Problem-1 (Online machine learning)Shao-Yen Hung
 
Cost savings from auto-scaling of network resources using machine learning
Cost savings from auto-scaling of network resources using machine learningCost savings from auto-scaling of network resources using machine learning
Cost savings from auto-scaling of network resources using machine learningSabidur Rahman
 
Lecture 9 - Machine Learning and Support Vector Machines (SVM)
Lecture 9 - Machine Learning and Support Vector Machines (SVM)Lecture 9 - Machine Learning and Support Vector Machines (SVM)
Lecture 9 - Machine Learning and Support Vector Machines (SVM)Sean Golliher
 
Applications of Machine Learning to Location-based Social Networks
Applications of Machine Learning to Location-based Social NetworksApplications of Machine Learning to Location-based Social Networks
Applications of Machine Learning to Location-based Social NetworksJoan Capdevila Pujol
 
IoT Mobility Forensics
IoT Mobility ForensicsIoT Mobility Forensics
IoT Mobility ForensicsSabidur Rahman
 
Network_Intrusion_Detection_System_Team1
Network_Intrusion_Detection_System_Team1Network_Intrusion_Detection_System_Team1
Network_Intrusion_Detection_System_Team1Saksham Agrawal
 
Airline passenger profiling based on fuzzy deep machine learning
Airline passenger profiling based on fuzzy deep machine learningAirline passenger profiling based on fuzzy deep machine learning
Airline passenger profiling based on fuzzy deep machine learningAyman Qaddumi
 
Computer security using machine learning
Computer security using machine learningComputer security using machine learning
Computer security using machine learningSandeep Sabnani
 
Seo easywebshop notion_technologies
Seo easywebshop notion_technologiesSeo easywebshop notion_technologies
Seo easywebshop notion_technologiesFind-U België
 
DataEd Online: Demystifying Big Data
DataEd Online: Demystifying Big DataDataEd Online: Demystifying Big Data
DataEd Online: Demystifying Big DataDATAVERSITY
 
jubaanomalyでキーストローク認証
jubaanomalyでキーストローク認証jubaanomalyでキーストローク認証
jubaanomalyでキーストローク認証odasatoshi
 
Classification Based Machine Learning Algorithms
Classification Based Machine Learning AlgorithmsClassification Based Machine Learning Algorithms
Classification Based Machine Learning AlgorithmsMd. Main Uddin Rony
 
BSidesLV 2013 - Using Machine Learning to Support Information Security
BSidesLV 2013 - Using Machine Learning to Support Information SecurityBSidesLV 2013 - Using Machine Learning to Support Information Security
BSidesLV 2013 - Using Machine Learning to Support Information SecurityAlex Pinto
 
Machine learning support vector machines
Machine learning   support vector machinesMachine learning   support vector machines
Machine learning support vector machinesSjoerd Maessen
 

Andere mochten auch (20)

Online Machine Learning: introduction and examples
Online Machine Learning:  introduction and examplesOnline Machine Learning:  introduction and examples
Online Machine Learning: introduction and examples
 
Online algorithms in Machine Learning
Online algorithms in Machine LearningOnline algorithms in Machine Learning
Online algorithms in Machine Learning
 
A use case of online machine learning using Jubatus
A use case of online machine learning using JubatusA use case of online machine learning using Jubatus
A use case of online machine learning using Jubatus
 
Learn Like a Human: Taking Machine Learning from Batch to Real-Time
Learn Like a Human: Taking Machine Learning from Batch to Real-TimeLearn Like a Human: Taking Machine Learning from Batch to Real-Time
Learn Like a Human: Taking Machine Learning from Batch to Real-Time
 
Online Optimization Problem-1 (Online machine learning)
Online Optimization Problem-1 (Online machine learning)Online Optimization Problem-1 (Online machine learning)
Online Optimization Problem-1 (Online machine learning)
 
OnlineClassifiers
OnlineClassifiersOnlineClassifiers
OnlineClassifiers
 
Cost savings from auto-scaling of network resources using machine learning
Cost savings from auto-scaling of network resources using machine learningCost savings from auto-scaling of network resources using machine learning
Cost savings from auto-scaling of network resources using machine learning
 
Lecture 9 - Machine Learning and Support Vector Machines (SVM)
Lecture 9 - Machine Learning and Support Vector Machines (SVM)Lecture 9 - Machine Learning and Support Vector Machines (SVM)
Lecture 9 - Machine Learning and Support Vector Machines (SVM)
 
Applications of Machine Learning to Location-based Social Networks
Applications of Machine Learning to Location-based Social NetworksApplications of Machine Learning to Location-based Social Networks
Applications of Machine Learning to Location-based Social Networks
 
IoT Mobility Forensics
IoT Mobility ForensicsIoT Mobility Forensics
IoT Mobility Forensics
 
Network_Intrusion_Detection_System_Team1
Network_Intrusion_Detection_System_Team1Network_Intrusion_Detection_System_Team1
Network_Intrusion_Detection_System_Team1
 
Airline passenger profiling based on fuzzy deep machine learning
Airline passenger profiling based on fuzzy deep machine learningAirline passenger profiling based on fuzzy deep machine learning
Airline passenger profiling based on fuzzy deep machine learning
 
Machine Learning for dummies
Machine Learning for dummiesMachine Learning for dummies
Machine Learning for dummies
 
Computer security using machine learning
Computer security using machine learningComputer security using machine learning
Computer security using machine learning
 
Seo easywebshop notion_technologies
Seo easywebshop notion_technologiesSeo easywebshop notion_technologies
Seo easywebshop notion_technologies
 
Distributed Online Machine Learning Framework for Big Data

  • 1. Distributed Online Machine Learning Framework for Big Data
    Shohei Hido, Preferred Infrastructure, Inc., Japan
    XLDB Asia, June 22nd, 2012
  • 2. Overview: Big Data analytics will become real-time and deeper
    - Three trends: 1. bigger data, 2. more real-time processing, 3. deeper analysis
    - Jubatus's approach: no storage, no data sharing, only model mixing
  • 3. Jubatus: OSS platform for Big Data analytics
    - Joint development with an NTT laboratory in Japan
      - Project started in April 2011
    - Released as open-source software
      - Version 0.3.0 was just released
    - Download it from http://github.com/jubatus/
    - Your contributions and collaboration are welcome
  • 4. Agenda
    - What's missing for Big Data analytics
    - Comparison with existing software
    - Inside Jubatus: Update, Analyze, and Mix
    - Jubatus demo
    - Summary
  • 5. Increasing demand in Big Data applications: real-time, deeper analysis
    - Current focus: aggregation and rule processing on bigger data
      - CEP (Complex Event Processing) for real-time processing
      - Hadoop/MapReduce for distributed computation
    - Future: deeper analysis for rapid decisions and actions
      - Ex. 1: Defect detection on the NY power grid [Rudin+, TPAMI 2012]
      - Ex. 2: Proactive algorithmic trading [ComputerWorldUK, 2011]
    [Figure: CEP and Hadoop placed along a data-size axis, with "What will come?" pointing toward deep analysis]
    References: http://web.mit.edu/rudin/www/TPAMIPreprint.pdf, http://www.computerworlduk.com/news/networking/3302464/
  • 6. Key technology: Machine learning
    - Example applications that need rapid decisions under uncertainty:
      - Anomaly detection from M2M sensor data
      - Energy demand forecasting / smart grid optimization
      - Security monitoring of raw Internet traffic
    - What is missing for fast and deep analytics on Big Data?
      - An online/real-time machine learning platform
      - + a scale-out distributed machine learning platform
    (Diagram: 1. Bigger data, 2. More in real time, 3. Deep analysis)
  • 7. Online machine learning in Jubatus
    - Batch learning
      - Scans all data before building a model
      - Data must be kept in memory or storage
    - Online learning
      - The model is updated with each data sample
      - For some algorithms, theory shows that the online model converges to the batch model
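The batch/online contrast can be made concrete with the simplest possible "model", a mean. This is a minimal Python sketch for illustration only, not Jubatus code, and the class name `OnlineMean` is hypothetical. The batch version needs the whole dataset in memory, while the online version keeps only a count and a running value. For this particular statistic the online estimate matches the batch result exactly.

```python
from typing import Iterable


def batch_mean(samples: Iterable[float]) -> float:
    """Batch learning: store and scan the whole dataset, then build the 'model'."""
    data = list(samples)  # all data must be held in memory
    return sum(data) / len(data)


class OnlineMean:
    """Online learning: update the 'model' with each sample and store nothing else."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0

    def update(self, x: float) -> float:
        self.count += 1
        # Incremental rule: new_mean = old_mean + (x - old_mean) / n
        self.mean += (x - self.mean) / self.count
        return self.mean


stream = [3.0, 5.0, 4.0, 8.0]
online = OnlineMean()
for x in stream:
    online.update(x)  # each sample is seen once and then discarded

print(batch_mean(stream), online.mean)  # 5.0 5.0
```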
  • 8. Jubatus focuses on the latest online algorithms
    - Advantage: fast and not memory-intensive
      - Low latency and high throughput
      - No need to store large datasets
    - E.g., linear classification algorithms
      - Perceptron (1958)
      - Passive Aggressive (PA) (2003)
      - Very recent progress:
        - Confidence Weighted Learning (CW) (2008)
        - AROW (2009)
        - Normal HERD (NHERD) (2010)
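The update rules of the two oldest algorithms in this list fit in a few lines. This is a minimal Python sketch for binary labels y ∈ {+1, -1} on sparse dict vectors. It implements the basic PA rule of Crammer et al. (no aggressiveness parameter C, so neither PA-I nor PA-II). The function names are hypothetical, and this is not Jubatus's implementation. CW, AROW and NHERD also keep a per-feature confidence (variance), which is not shown here.

```python
from typing import Dict

Vector = Dict[str, float]  # sparse feature vector: feature name -> value


def dot(w: Vector, x: Vector) -> float:
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def perceptron_update(w: Vector, x: Vector, y: int) -> None:
    """Perceptron: on a mistake (y * w.x <= 0), add y * x to the weights in place."""
    if y * dot(w, x) <= 0.0:
        for k, v in x.items():
            w[k] = w.get(k, 0.0) + y * v


def pa_update(w: Vector, x: Vector, y: int) -> None:
    """Basic Passive-Aggressive: the smallest change to w that makes y * w.x >= 1."""
    loss = max(0.0, 1.0 - y * dot(w, x))  # hinge loss
    sq_norm = sum(v * v for v in x.values())
    if loss > 0.0 and sq_norm > 0.0:
        tau = loss / sq_norm  # step size
        for k, v in x.items():
            w[k] = w.get(k, 0.0) + tau * y * v


w: Vector = {}
pa_update(w, {"good": 1.0, "movie": 1.0}, +1)
pa_update(w, {"bad": 1.0, "movie": 1.0}, -1)
print(w)  # {'good': 0.5, 'movie': -0.25, 'bad': -0.75}
```

Each update touches only the features present in the sample, and only the weight vector is stored. This is where the low latency and small memory footprint come from.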
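  • 9. Online learning or distributed learning: no unified solution has been available
    - Jubatus combines them into a unified computation framework
    [Figure: processing mode (real-time/online vs. batch) against scale (small scale, stand-alone vs. large scale, distributed/parallel computing):]
      - Real-time/online, small scale: online ML algorithms such as PA [2003] and CW [2008]
      - Real-time/online, large scale: Jubatus (2011-)
      - Batch, small scale: WEKA (1993-), SPSS (1988-)
      - Batch, large scale: Mahout (2006-)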
  • 10. What Jubatus currently supports
    - Classification (multi-class): Perceptron / PA / CW / AROW
    - Regression: PA-based regression
    - Nearest neighbor: LSH / MinHash / Euclid LSH
    - Recommendation: based on nearest neighbor
    - Anomaly detection*: LOF based on nearest neighbor
    - Graph analysis*: shortest path / centrality (PageRank)
    - Some simple statistics
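MinHash, one of the nearest-neighbor methods listed, estimates the Jaccard similarity of two sets from short signatures. This is a minimal Python sketch, not Jubatus's implementation. It assumes salted BLAKE2b from `hashlib` as the family of hash functions and 128 hashes per signature. The function names are hypothetical.

```python
import hashlib
from typing import List, Set


def minhash_signature(items: Set[str], num_hashes: int = 128) -> List[int]:
    """For each of num_hashes salted hash functions, keep the minimum hash value."""
    if not items:
        raise ValueError("MinHash needs a non-empty set")
    signature = []
    for i in range(num_hashes):
        salt = i.to_bytes(8, "little")  # a different salt acts as a different hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(item.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "little",
            )
            for item in items
        ))
    return signature


def estimated_jaccard(sig_a: List[int], sig_b: List[int]) -> float:
    """The fraction of positions where the minima agree estimates |A & B| / |A | B|."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


def exact_jaccard(a: Set[str], b: Set[str]) -> float:
    return len(a & b) / len(a | b)


a = set("the quick brown fox jumps".split())
b = set("the quick brown dog jumps".split())
print(exact_jaccard(a, b), estimated_jaccard(minhash_signature(a), minhash_signature(b)))
```

Signatures have a fixed size regardless of how large the sets are. That fixed size is what makes candidate search by signature cheap compared with comparing the raw sets.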
  • 11. Agenda
    - What's missing for Big Data analytics
    - Comparison with existing software
    - Inside Jubatus: Update, Analyze, and Mix
    - Jubatus demo
    - Summary
  • 12. Hadoop and Mahout: not good for online learning
    - Hadoop
      - Advantages
        - Many extensions for a variety of applications
        - Good for distributed data storage and aggregation
      - Disadvantage
        - No direct support for machine learning or online processing
    - Mahout
      - Advantage
        - Popular machine learning algorithms are implemented
      - Disadvantages
        - Some implementations are less mature
        - Still not capable of online machine learning
  • 13. Jubatus vs. Hadoop, RDB-based systems, and Storm: the advantage in online AND distributed ML
    - Only Jubatus satisfies both at the same time

      Criterion                           Jubatus          Hadoop         RDB             Storm
      Storing Big Data                    ✓ (ext. DB)      ✓✓ (HDFS)      ✓               ✓ (ext. DB)
      Batch learning                      ✓                ✓✓ (Mahout)    ✓ (SPSS, etc.)  ✕
      Stream processing                   ✓                ✕              ✕               ✓✓
      Distributed learning                ✓                ✓✓ (Mahout)    ✕               ✕
      Online learning (high importance)   ✓✓               ✕              ✕               ✕
  • 14. Agenda
    - What's missing for Big Data analytics
    - Comparison with existing software
    - Inside Jubatus: Update, Analyze, and Mix
    - Jubatus demo
    - Summary
  • 15. How to make online algorithms distributed? Not trivial!
    [Figure: timeline. Batch learning: several Learn steps followed by one Model update, so the update is easy to parallelize. Online learning: Learn and Model update alternate, so the update is hard to parallelize due to frequent updates.]
    - Online learning requires frequent model updates
    - A naive distributed architecture leads to too many synchronization operations
    - This causes performance problems in terms of network communication and accuracy
  • 16. Solution: loose model sharing
    - Jubatus shares only the local models, and only loosely
      - Model size << data size
      - Jubatus does NOT share datasets
    - A unique approach compared to existing frameworks
      - Local models can differ across servers
      - The different models are gradually merged
    [Figure: three servers' local models, each turning into a mixed model]
  • 17. Three fundamental operations in Jubatus: UPDATE, ANALYZE, and MIX
    1. UPDATE
       - Receive a sample, learn from it, and update the local model
    2. ANALYZE
       - Receive a sample, apply the local model, and return the result
    3. MIX (called automatically in the backend)
       - Exchange and merge the local models between servers
    - Cf. the Map-Shuffle-Reduce operations in Hadoop
    - Algorithms can be implemented independently of:
      - Distribution logic
      - Data sharing
      - Failover
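The separation between algorithm and distribution logic can be illustrated in a single process. This is a hypothetical Python sketch, not Jubatus's real interfaces, and all class and method names are invented. The algorithm object exposes only UPDATE, ANALYZE and a pair of diff methods, while a separate `mix()` function plays the framework's role of collecting, merging and redistributing diffs. A word-count model is used because its diffs simply add up.

```python
from collections import Counter
from typing import List


class WordCountModel:
    """Hypothetical algorithm plug-in: it knows nothing about servers or networking."""

    def __init__(self) -> None:
        self.mixed: Counter = Counter()  # state agreed on at the last MIX
        self.local: Counter = Counter()  # changes since the last MIX

    def update(self, words: List[str]) -> None:  # UPDATE
        self.local.update(words)

    def analyze(self, word: str) -> int:  # ANALYZE
        return self.mixed[word] + self.local[word]

    def get_diff(self) -> Counter:
        return Counter(self.local)  # a copy of the local changes

    def put_diff(self, merged: Counter) -> None:
        self.mixed.update(merged)  # Counter.update adds counts
        self.local = Counter()  # local changes are now part of the mixed state


def mix(servers: List[WordCountModel]) -> None:
    """Distribution logic (the framework's job): collect, merge and redistribute diffs."""
    merged: Counter = Counter()
    for server in servers:
        merged.update(server.get_diff())
    for server in servers:
        server.put_diff(merged)


servers = [WordCountModel(), WordCountModel()]
servers[0].update(["jubatus", "online"])
servers[1].update(["jubatus", "mix"])
print(servers[0].analyze("mix"))  # 0: servers[0] has not seen servers[1]'s data yet
mix(servers)
print(servers[0].analyze("mix"), servers[1].analyze("jubatus"))  # 1 2
```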
  • 18. UPDATE
    - Each server starts from an initial model
    - Each data sample is sent to one (or two) servers
    - Local models are updated based on the sample
    - Data samples are NEVER shared
    [Figure: samples distributed randomly or consistently; initial model becomes local model 1 on one server and local model 2 on another]
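The slide says samples are distributed "randomly or consistently" to one or two servers. This is a minimal Python sketch of both policies under our own assumptions. The function names are hypothetical. Consistent routing is simplified to hash-mod-N, with the following server used as a second replica, rather than a full consistent-hashing ring. Keyed routing is useful whenever later samples for the same ID should reach the same server.

```python
import hashlib
import random
from typing import List


def route_random(num_servers: int, rng: random.Random) -> List[int]:
    """Random distribution: any single server may receive the sample."""
    return [rng.randrange(num_servers)]


def route_consistent(key: str, num_servers: int, replicas: int = 1) -> List[int]:
    """Consistent distribution: the same key always maps to the same server(s).

    Simplification: hash-mod-N plus the next server(s) for extra replicas.
    A real consistent-hashing ring would remap far fewer keys when servers
    join or leave.
    """
    if not 1 <= replicas <= num_servers:
        raise ValueError("replicas must be between 1 and num_servers")
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    first = int.from_bytes(digest[:8], "big") % num_servers
    return [(first + i) % num_servers for i in range(replicas)]


print(route_consistent("user-42", num_servers=4, replicas=2))  # same pair on every call
print(route_random(4, random.Random(0)))
```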
  • 19. MIX
    - Each server sends its model diff
    - Model diffs are merged and distributed
    - Only model diffs are transmitted
    For each server i ∈ {1, 2}:
    $$\text{model diff}_i = \text{local model}_i - \text{initial model}$$
    $$\text{merged diff} = \text{model diff}_1 + \text{model diff}_2$$
    $$\text{mixed model} = \text{initial model} + \text{merged diff}$$
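The three MIX equations can be run directly on weight vectors. This is a minimal Python sketch for a two-server linear model, and the function names are hypothetical. The slide writes the merge as "+". This sketch divides that sum by the number of servers (averaging), an assumption of ours that keeps the step size independent of cluster size. For additive statistics such as counts, a plain sum would be the right merge.

```python
from typing import List

Weights = List[float]


def model_diff(local: Weights, initial: Weights) -> Weights:
    """model diff_i = local model_i - initial model"""
    return [l - i for l, i in zip(local, initial)]


def merge_diffs(diffs: List[Weights]) -> Weights:
    """Merged diff. Assumption: average of the diffs (sum / number of servers)."""
    n = len(diffs)
    return [sum(column) / n for column in zip(*diffs)]


def apply_diff(initial: Weights, merged: Weights) -> Weights:
    """mixed model = initial model + merged diff (identical on every server)"""
    return [i + d for i, d in zip(initial, merged)]


initial = [0.0, 0.0]
local_1, local_2 = [1.0, 0.5], [0.0, 1.5]  # hypothetical local models after UPDATE
diffs = [model_diff(local_1, initial), model_diff(local_2, initial)]  # only these are sent
mixed = apply_diff(initial, merge_diffs(diffs))
print(mixed)  # [0.5, 1.0]
```

Only the diffs travel over the network, and their size equals the model size, not the data size.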
  • 20. UPDATE (iteration)
    - Locally updated models are discarded after MIX
    - Each server resumes updating from the mixed model
    - The mixed model improves gradually thanks to all of the servers
    [Figure: samples distributed randomly or consistently; the mixed model becomes local model 1 on one server and local model 2 on another]
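The UPDATE/MIX cycle can be simulated in a single process. This is a minimal Python sketch under our own assumptions: two hypothetical servers with fixed data shards, a Perceptron as the local learner, and averaged diffs as the merge. The servers are processed one after another here rather than in parallel. In each round every server restarts from the mixed model, and its previous local model is dropped.

```python
from typing import List, Tuple

Sample = Tuple[List[float], int]  # (features, label in {+1, -1})


def perceptron_pass(w: List[float], shard: List[Sample]) -> List[float]:
    """UPDATE: one local Perceptron pass over this server's shard, starting from w."""
    w = list(w)  # work on a copy of the mixed model
    for x, y in shard:
        if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
            w = [wi + y * xi for wi, xi in zip(w, x)]
    return w


def mix(mixed: List[float], local_models: List[List[float]]) -> List[float]:
    """MIX: average the diffs (local - mixed) and add them to the mixed model."""
    n = len(local_models)
    diffs = [[l - m for l, m in zip(local, mixed)] for local in local_models]
    return [m + sum(column) / n for m, column in zip(mixed, zip(*diffs))]


def train(shards: List[List[Sample]], dim: int, rounds: int) -> List[float]:
    mixed = [0.0] * dim  # every server starts from the same initial model
    for _ in range(rounds):
        # Each server resumes from the latest mixed model; old local models are discarded.
        local_models = [perceptron_pass(mixed, shard) for shard in shards]
        mixed = mix(mixed, local_models)
    return mixed


shards = [
    [([1.0, 1.0], +1), ([-2.0, 1.0], -1)],  # data seen only by server 1
    [([2.0, 1.0], +1), ([-1.0, 1.0], -1)],  # data seen only by server 2
]
mixed = train(shards, dim=2, rounds=3)
print(mixed)  # [1.5, 1.0]
```

With this toy data, the first MIX already yields weights that classify all four samples correctly, even though neither server saw the other's data. Later rounds therefore leave the model unchanged.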
  • 21. ANALYZE
    - For prediction, each sample goes to a randomly chosen server
    - The server applies the current mixed model to the sample
    - The prediction is returned to the client
    [Figure: samples distributed randomly; each server applies its mixed model and returns a prediction]
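Right after a MIX every server holds the same mixed model, so the choice of server does not change the prediction. This is a minimal Python sketch with hypothetical weights and a hypothetical function name. In a running system the local models can drift apart slightly between MIX rounds, which this sketch ignores.

```python
import random
from typing import List


def analyze(servers: List[List[float]], x: List[float], rng: random.Random) -> int:
    """ANALYZE: route the sample to a random server and apply its model."""
    w = servers[rng.randrange(len(servers))]
    score = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if score > 0 else -1


mixed = [1.5, 1.0]  # hypothetical mixed weights after a MIX
servers = [list(mixed) for _ in range(3)]  # every server holds a copy
rng = random.Random(0)
print([analyze(servers, [2.0, 1.0], rng) for _ in range(5)])  # [1, 1, 1, 1, 1]
```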
  • 22. Why can Jubatus work in real time?
    - Focus on online machine learning
      - Make online machine learning algorithms distributed
    - Update locally
      - Online training without communicating with other servers
    - Mix only models globally
      - Small communication cost, low latency, good performance
      - An advantage over the costly Shuffle in MapReduce
    - Analyze locally
      - Each server has the mixed model
      - Low latency for making predictions
    - Everything in memory
      - Process data on the fly
  • 23. Agenda
    - What's missing for Big Data analytics
    - Comparison with existing software
    - Inside Jubatus: Update, Analyze, and Mix
    - Jubatus demo
    - Summary
  • 24. Demo: Twitter analysis using natural language processing and machine learning
    - Jubatus classifies each tweet from the Twitter data stream into predefined categories.
    - A single Jubatus server can classify over 5,000 queries per second (QPS), which is close to the rate of the raw Twitter data stream.
    - We provide a browser-based GUI.
  • 25. Experiment: Estimation of power consumption
    - Jubatus learns the power usage and network data flow patterns of certain servers.
    - The power consumption of individual servers can then be estimated in real time by monitoring and analyzing packets, without installing power measurement modules on every server.
    - Consumption differs for different types of packets.
    [Figure: data center/office setup with packet data captured via a TAP, servers with and without a power meter, and estimation for the latter; scatter plot of predicted value (W) vs. actual value (W)]
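Jubatus offers PA-based regression (slide 10), which suits this kind of estimation task. The slide does not say which method the experiment used, so this is only an illustration. It is a minimal Python sketch of epsilon-insensitive Passive-Aggressive regression under clearly hypothetical assumptions. The two features are invented web and video packet rates (in thousands per second), and invented "true" coefficients generate synthetic targets. There is no idle-power term. None of these numbers come from the experiment.

```python
from typing import List

# Hypothetical "true" watts per unit packet rate (web, video); synthetic, not measured.
TRUE_W = [30.0, 80.0]


def predict(w: List[float], x: List[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def pa_regression_update(w: List[float], x: List[float], y: float, epsilon: float = 0.0) -> None:
    """Passive-Aggressive regression with epsilon-insensitive loss, updating w in place."""
    residual = y - predict(w, x)
    loss = max(0.0, abs(residual) - epsilon)
    sq_norm = sum(xi * xi for xi in x)
    if loss > 0.0 and sq_norm > 0.0:
        tau = loss / sq_norm
        sign = 1.0 if residual > 0 else -1.0
        for i, xi in enumerate(x):
            w[i] += sign * tau * xi


# Synthetic training stream, standing in for servers that do have a power meter.
samples = [[1.0, 0.2], [0.3, 1.0], [0.8, 0.8]]
stream = [(x, predict(TRUE_W, x)) for x in samples]

w = [0.0, 0.0]
for _ in range(50):  # replay the tiny stream; a live feed would not repeat
    for x, y in stream:
        pa_regression_update(w, x, y)

# Estimate the power of a server without a meter from its packet rates alone.
print(round(predict(w, [0.5, 0.5]), 3))  # 55.0
```

With `epsilon = 0` each update moves the weights exactly onto the sample's target. A positive `epsilon` leaves small errors uncorrected, which makes the model less sensitive to noise.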
  • 26. Agenda
    - What's missing for Big Data analytics
    - Comparison with existing software
    - Inside Jubatus: Update, Analyze, and Mix
    - Jubatus demo
    - Summary
  • 27. Summary
    - Jubatus is the first OSS platform for online distributed machine learning on Big Data streams.
    - Download it from http://github.com/jubatus/
    - We welcome your contribution and collaboration.
    (Diagram: 1. Bigger data, 2. More in real time, 3. Deep analysis; no storage, no data sharing, only model mixing)