RUBY AND R


Chang Sau Sheong
Director, Applied Research, HP Labs Singapore


1   © Copyright 2010 Hewlett-Packard Development Company, L.P.
About HP Labs

HP LABS
– Exploratory and advanced research group for Hewlett-Packard
– Global organization that tackles complex challenges facing our customers and society over the next decade
– Pushes the frontiers of fundamental science
– Headquartered in Palo Alto

HP LABS AROUND THE WORLD
– Labs in Palo Alto, Bristol, St. Petersburg, Haifa, Bangalore, Beijing and Singapore

HP LABS SINGAPORE
– Set up in February 2010
– Focus on Cloud Computing
– Research
     • Exploratory research
     • Researchers
     • Change the state of the art
     • Working closely with the academic community
– Applied Research
     • Applied research
     • Innovators
     • Take the research to the next stage
     • Work closely with customers and business units

Ruby and R

R: a programming language and platform for statistical computing, licensed under the GPL

Strengths in statistical processing and data visualization

Extensive library of statistical computing packages (CRAN), written by statisticians

Statistics is not just for statisticians

– Recommendation engine
– Speech recognition
– Fingerprint identification
– Spam detection
– Card fraud detection
– Financial forecasting
– Face recognition
– Data mining
– OCR
– Credit scoring

CRAN
– Almost 2,000 packages, mostly created by statisticians
     • BiodiversityR – GUI for biodiversity and community ecology analysis
     • Emu – analyze speech patterns
     • GenABEL – study the human genome
     • Quantmod – quantitative financial modeling framework
     • Ftrading – technical trading analysis
     • Cyclones – cyclone identification
     • DOSim – disease analysis toolkit for gene sets
     • Agricolae – statistical procedures for agricultural research

EXAMPLE R CODE
– EPL (English Premier League) data from football-data.co.uk
– Show the distribution of home and away goals for the 2011 season
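The R code for this example is not reproduced in this transcript. The sketch below is a rough Ruby counterpart that counts how many matches ended with each goal total, separately for home and away teams. It assumes the football-data.co.uk files use the columns FTHG (full-time home goals) and FTAG (full-time away goals). The sample rows and team names are made up for illustration.

```ruby
require "csv"

# Tally how often each goal count occurs for home and away teams.
# Assumption: the CSV has FTHG (full-time home goals) and FTAG
# (full-time away goals) columns, as in football-data.co.uk files.
def goal_distribution(csv_text)
  home = Hash.new(0)
  away = Hash.new(0)
  CSV.parse(csv_text, headers: true).each do |row|
    home[Integer(row["FTHG"])] += 1
    away[Integer(row["FTAG"])] += 1
  end
  # Sort by goal count so each hash reads like a histogram.
  { home: home.sort.to_h, away: away.sort.to_h }
end

# Made-up sample rows, for illustration only (not real match results).
sample = <<~DATA
  HomeTeam,AwayTeam,FTHG,FTAG
  TeamA,TeamB,2,1
  TeamC,TeamD,0,0
  TeamB,TeamC,2,3
DATA

dist = goal_distribution(sample)
dist.each do |side, counts|
  counts.each { |goals, matches| puts "#{side}: #{goals} goals in #{matches} matches" }
end
```

With a full season file, the two hashes could then be charted. On the slide that plotting step was done in R.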

Why Ruby and R?

Stand on the shoulders of giants

– Ruby
     • Human-focused programming!
     • Better general-purpose programming capabilities
     • Great frameworks!
     • Great libraries (20,000+ gems in RubyGems)
– R
     • Focus on statistical computing/crunching
     • Lots of packages written by domain experts/statisticians
     • Great graphing libraries

Ruby and R integration

RINRUBY
– 100% Ruby
– Uses pipes to send commands and evaluations to R
– Uses TCP/IP sockets to send and retrieve data
– Pros:
     •   Doesn't require anything but R
     •   Works flawlessly on Windows
     •   Works with Ruby 1.8, 1.9 and JRuby 1.5
     •   Entire API is tested

– Cons:
     •   VERY SLOW at assigning data
     •   Very limited data types: only Vector and Matrix
     •   Not released since 2009
     •   Poor documentation

RSRUBY
– C extension for Ruby, linked to R's shared library
– Pros:
     •   Blazing speed! 5-10 times faster than Rserve and 100-1000 times faster than RinRuby
     •   Seamless integration with Ruby: every method and object is treated like a Ruby object

– Cons:
     •   Transformation between R and Ruby types isn't trivial
     •   Dependent on operating system, Ruby implementation and R version
     •   Not available for alternative Ruby implementations (e.g. JRuby)
     •   Not released since 2009
     •   Poor documentation

RSERVE
– 100% Ruby
– Uses TCP/IP sockets to exchange data and commands
– Requires Rserve installed on the server machine
– Accessed from Ruby through the Rserve-Ruby-client library
– Pros:
     •   Works with Ruby 1.8, 1.9 and JRuby 1.5
     •   Sessions allow data to be processed asynchronously
     •   Fast: 5-10 times faster than RinRuby
     •   Most recently updated (Jan 2011)

– Cons:
     •   Requires Rserve
     •   Limited features on Windows
     •   Poor documentation

RAPACHE/RRACK
– Web service based
– Run R scripts as web services, consumed by Ruby front-end apps
– Pros:
     •   Modular and separate (no direct integration)
     •   Can be scalable, ‘cloud’-ready

– Cons:
     •   Requires rApache or rRack
     •   rRack is very new (not yet accepted by CRAN as of today!) and requires R 2.13 (released just a few weeks ago)
     •   rApache works only with the Apache web server
     •   Communication overhead for smaller integrations

Let’s look at some code! (I’m going to use Rserve)

Text classification

TEXT CLASSIFICATION
– Automatically sorting a set of documents into different categories from a predefined set
– Classic uses:
     • Spam filtering
     • Email prioritization
(Diagram: training data and test data feed a classifier, which outputs a category)

TEXT CLASSIFIER CODE

 Prepare

Train the classifier by counting the frequency of each word in the document

Get word count
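The code on this slide is not reproduced here. Below is a minimal Ruby sketch of a word-count step. It lower-cases the text, splits it into alphanumeric tokens, drops single-character tokens and a small stop-word list, and tallies the rest. The stop-word list is hypothetical and chosen only for illustration. The output on the next slide also looks stemmed ("googl", "softwar", "comput"), but this sketch does no stemming.

```ruby
# Hypothetical stop-word list for illustration; the talk's own list is not shown.
STOP_WORDS = %w[a an and are as at be by for from has have in is it of on or that the to was were with].freeze

# Count how often each word appears in a document.
def word_count(text)
  counts = Hash.new(0)
  # Lower-case, then pull out runs of letters and digits as tokens.
  text.downcase.scan(/[a-z0-9]+/) do |word|
    # Skip stop words and one-character tokens.
    next if word.length < 2 || STOP_WORDS.include?(word)
    counts[word] += 1
  end
  counts
end

p word_count("Google flagged the site; users of the site saw a warning.")
```

A stemmer could be applied to each token before counting, so that related forms such as "warn" and "warning" collapse into one column.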

What you get
     {"check"=>1, "result"=>3, "marissa"=>1, "experi"=>1,
     "click"=>1, "engin"=>1, "simpli"=>1, "mistakenli"=>1,
     "pick"=>1, "prevent"=>1, "40"=>1, "regularli"=>1, "place"=>1,
     "user"=>5, "prefer"=>1, "malevol"=>1, "access"=>1,
     "robust"=>1, "servic"=>1, "fault"=>1, "malici"=>1, "list"=>2,
     "hand"=>1, "internet"=>1, "attribut"=>1, "instal"=>1,
     "file"=>1, "unabl"=>1, "vice"=>1, "stopbadwareorg"=>2,
     "merit"=>1, "decid"=>1, "flag"=>2, "saturdai"=>2, "hit"=>2,
     "offici"=>1, "error"=>3, "work"=>1, "site"=>5, "happen"=>2,
     "incid"=>1, "technic"=>1, "advis"=>1, "put"=>1, "human"=>3,
     "harm"=>2, "softwar"=>1, "ms"=>1, "affect"=>1, "carefulli"=>1,
     "product"=>1, "presid"=>1, "complaint"=>1, "potenti"=>2,
     "googl"=>6, "comput"=>2, "peopl"=>1, "investig"=>2,
     "consum"=>1, "danger"=>2, "period"=>1, "wrote"=>2,
     "search"=>7, "ascertain"=>1, "blog"=>1, "warn"=>2,
     "problem"=>1, "updat"=>2, "minut"=>1, "mayer"=>2}

Generate training data for prediction
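This is a sketch of how per-document word counts could be turned into CSV data shaped like the training and test data on the following slides. It assumes each document has already been reduced to a hash of word counts. The method names vocabulary and feature_csv are hypothetical. Each document becomes one row: its category, then the count of each vocabulary word. A test row uses the same columns, with the placeholder value "category" in the first column, as the test-data slide shows. The slide's header actually lists 50 words, with "year" appearing twice. This suggests the original may have taken the top 25 words from each category. This sketch ranks words across all documents instead.

```ruby
require "csv"

# Pick the top_n most frequent words across all documents.
# docs is an array of [category, word_counts] pairs (word_counts: word => count).
def vocabulary(docs, top_n)
  totals = Hash.new(0)
  docs.each { |_category, counts| counts.each { |word, n| totals[word] += n } }
  # Highest total first; ties broken alphabetically so the order is stable.
  totals.sort_by { |word, n| [-n, word] }.first(top_n).map(&:first)
end

# Emit one CSV row per document: its category, then a count per vocabulary word.
def feature_csv(vocab, rows)
  CSV.generate do |csv|
    csv << ["category", *vocab]
    rows.each do |category, counts|
      csv << [category, *vocab.map { |word| counts.fetch(word, 0) }]
    end
  end
end

# Illustrative counts only ("googl", "search", "user" values echo the word-count output).
docs = [
  ["interesting", { "googl" => 6, "search" => 7, "user" => 5 }],
  ["not_interesting", { "obama" => 3, "user" => 1 }]
]
vocab = vocabulary(docs, 3)
training = feature_csv(vocab, docs)
# The category of a test document is unknown, so "category" is a placeholder.
test_data = feature_csv(vocab, [["category", { "search" => 2 }]])
puts training, test_data
```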

Training data

     category,googl,report,search,user,review,court,mckinnon,year,internet,microsoft,site,softwar,warn,browser,oper,expert,rise,lawyer,digit,extradit,sharpli,error,group,result,system,rebel,econom,presid,crisi,find,year,accus,global,obama,china,civilian,shrink,hous,wall,street,quarter,white,heavi,lehman,economi,session,ey,time,davo,human
     not_interesting,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
     not_interesting,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,5,0,2,0,0,0,3,0,0,0,3,1,0,0,0,0,0,3,0,0,0,0,0,0,2
     not_interesting,0,1,0,0,0,0,0,2,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,3,0,3,1,2,0,2,0,0,0,0,0,0,0,0,0,0,3,1,3,1,0,2,0
     not_interesting,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
     not_interesting,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,2,0,0,1,2,1,4,0,0,2,0,0,0,2,0,0,0,0,2,0,1,0
     not_interesting,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,1,0,0,0,0,3,3,0,0,0,0,0,0,0,2,0,0
     not_interesting,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,2,0,0,2,0,0,2,1,0,0,2,1,0,0,2,0,0,1,0,0
     interesting,6,0,7,5,0,0,0,0,1,0,5,1,2,0,0,0,0,0,0,0,0,3,0,3,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3
     interesting,0,7,0,0,2,0,0,0,0,0,0,0,1,0,0,1,0,0,3,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
     interesting,0,1,0,0,0,0,0,3,3,1,0,1,1,1,0,3,3,0,1,0,3,0,1,0,2,0,1,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,3,0
     interesting,0,0,0,0,3,5,5,0,0,0,0,0,0,0,0,0,1,4,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
     interesting,6,0,1,1,0,0,0,0,0,0,0,1,0,0,4,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
     interesting,0,0,0,2,0,0,0,2,1,4,0,2,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0

The top 25 most frequent words in the training dataset

Each line represents one document in the training set

The categories are set when the classifier is created

Each number is the number of times the word appears in that document

Test data

     category,googl,report,search,user,review,court,mckinnon,year,internet,microsoft,site,softwar,warn,browser,oper,expert,rise,lawyer,digit,extradit,sharpli,error,group,result,system,rebel,econom,presid,crisi,find,year,accus,global,obama,china,civilian,shrink,hous,wall,street,quarter,white,heavi,lehman,economi,session,ey,time,davo,human
     category,0,0,0,2,0,0,0,2,1,4,0,2,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0

Using different classification models

NAÏVE BAYES
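The slide's R code (driven through Rserve) is not reproduced. To illustrate the idea, here is a pure-Ruby multinomial naive Bayes with add-one (Laplace) smoothing. It is swapped in for illustration and is not the talk's implementation. It scores each category by its log prior plus the count-weighted log probability of each word. It works on rows shaped like the training data: a category plus a vector of word counts. The toy vectors are made up.

```ruby
# Multinomial naive Bayes over word-count vectors, with add-one (Laplace) smoothing.
class NaiveBayes
  def initialize
    @doc_counts = Hash.new(0) # category => number of training documents
    @word_totals = {}         # category => summed word-count vector
  end

  # Add one training document: its category and its word-count vector.
  def train(category, counts)
    @doc_counts[category] += 1
    totals = (@word_totals[category] ||= Array.new(counts.size, 0))
    counts.each_with_index { |n, i| totals[i] += n }
    self
  end

  # Return the category with the highest log posterior for a word-count vector.
  def classify(counts)
    total_docs = @doc_counts.values.sum.to_f
    @doc_counts.keys.max_by do |category|
      totals = @word_totals[category]
      denom = (totals.sum + totals.size).to_f # add one per vocabulary word
      score = Math.log(@doc_counts[category] / total_docs) # log prior
      counts.each_with_index do |n, i|
        score += n * Math.log((totals[i] + 1) / denom)
      end
      score
    end
  end
end

nb = NaiveBayes.new
# Made-up vectors over a 3-word vocabulary: [googl, search, obama].
nb.train("interesting", [6, 7, 0])
nb.train("interesting", [2, 3, 0])
nb.train("not_interesting", [0, 0, 3])
nb.train("not_interesting", [0, 1, 4])
puts nb.classify([3, 2, 0])
puts nb.classify([0, 0, 2])
```

Working in log space avoids underflow when many small word probabilities are multiplied together.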

SVM
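The slide's SVM code is not reproduced either. As an illustration only, here is a linear SVM trained with the basic Pegasos stochastic sub-gradient update in plain Ruby. This is a swapped-in technique, not necessarily what the talk's R code used. Labels must be +1 or -1. A constant 1 feature is appended to stand in for the bias term. The regularization strength, iteration count and random seed are arbitrary illustrative defaults.

```ruby
# Linear SVM trained with the basic Pegasos update
# (stochastic sub-gradient descent on the regularized hinge loss).
class LinearSVM
  def initialize(reg: 0.01, iterations: 2000, seed: 42)
    @reg = reg               # regularization strength (lambda in Pegasos)
    @iterations = iterations
    @rng = Random.new(seed)
  end

  # xs: array of numeric feature vectors; ys: labels, each +1 or -1.
  def train(xs, ys)
    data = xs.map { |x| x + [1.0] }          # constant feature acts as the bias
    @w = Array.new(data.first.size, 0.0)
    (1..@iterations).each do |t|
      i = @rng.rand(data.size)               # pick one example at random
      eta = 1.0 / (@reg * t)                 # Pegasos step size
      margin = ys[i] * dot(@w, data[i])      # margin under the current weights
      @w.map! { |wj| (1 - eta * @reg) * wj } # shrink step from the regularizer
      next unless margin < 1
      # Hinge loss is active: move the weights toward this example.
      data[i].each_with_index { |xj, j| @w[j] += eta * ys[i] * xj }
    end
    self
  end

  def predict(x)
    dot(@w, x + [1.0]) >= 0 ? 1 : -1
  end

  private

  def dot(a, b)
    a.zip(b).sum { |ai, bi| ai * bi }
  end
end

xs = [[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]]
ys = [1, 1, -1, -1]
svm = LinearSVM.new.train(xs, ys)
puts svm.predict([2.5, 2.0])
puts svm.predict([-2.5, -1.0])
```

To apply it to the training data, map "interesting" to +1 and "not_interesting" to -1, then pass the word-count vectors as features.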

RANDOM FOREST

NEURAL NETWORKS

Using the classifier

RESOURCES
– HP Labs Worldwide
http://www.hpl.hp.com/
– R Project
http://www.r-project.org/
– RsRuby
https://github.com/alexgutteridge/rsruby
– RinRuby
http://rinruby.ddahl.org/
– Rserve
http://www.rforge.net/Rserve/
– Rserve-Ruby-Client
https://github.com/clbustos/Rserve-Ruby-client
– rApache
http://rapache.net/index.html
– rRack
https://github.com/jeffreyhorner/rRack/

Thank you

 sausheong@hp.com
 http://twitter.com/sausheong
 http://blog.saush.com