Big Data, Bigger Data & Big R Data
Birmingham R Users Meeting
23rd April 2013
Andy Pryke
Andy@The-Data-Mine.co.uk / @AndyPryke
My Bias…
www.the-data-mine.co.uk




      I work in commercial data
      mining, data analysis and data
      visualisation
      Background in computing and
      artificial intelligence
      Use R to write programs which
      analyse data
What is Big Data?
      Depends who you ask.
      Answers are often “too big to…”
        …load into memory
        …store on a hard drive
        …fit in a standard database
      Plus
        “Fast changing”
        Not just relational
My “Big Data” Definition
     “Data collections big
     enough to require you to
     change the way you
     store and process them.”
               - Andy Pryke
Data Size Limits in R
      Standard R packages use a single
      thread, with data held in memory (RAM)
      help("Memory-limits")
               •     Vectors limited to about 2 billion items (2^31 - 1)
               •     Memory limit of ~128 TB
      Servers with 1 TB+ memory are available
               • Also, Amazon EC2 servers up to 244 GB
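
As a rough illustration of these limits (a sketch added for this write-up, not part of the original slides), the snippet below checks the 2^31 - 1 figure and estimates how much RAM a dense numeric table would need before you try to load it. The helper name `estimate_gb` and the example dimensions are invented for illustration, and the estimate assumes 8 bytes per value with no object overhead. Note that R 3.0.0 and later also support longer vectors on 64-bit builds.

```r
# Largest value of a standard R integer index: 2^31 - 1
max_items <- .Machine$integer.max

# Hypothetical helper: approximate size in GB of a dense numeric table,
# assuming 8 bytes per double and ignoring object overhead
estimate_gb <- function(rows, cols, bytes_per_value = 8) {
  rows * cols * bytes_per_value / 1024^3
}

# Illustrative dimensions only: 100 million rows by 20 numeric columns
needed_gb <- estimate_gb(1e8, 20)

# Compare the rule of thumb with the measured size of a small vector
small <- numeric(1e6)
actual_mb <- as.numeric(object.size(small)) / 1024^2

cat("Max vector items:", max_items, "\n")
cat("Estimated GB needed:", round(needed_gb, 1), "\n")
cat("1e6 doubles take about", round(actual_mb, 1), "MB\n")
```

If the estimate is well above the RAM you have, that is a hint to look at the disk-based or distributed options covered later.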
Overview
      • Problems using R with Big Data
      • Processing data on disk
      • Hadoop for parallel computation and Big
        Data storage / access
      • “In Database” analysis
      • What next for Birmingham R User Group?
Background: R matrix class
      “matrix”
       - Built in (package base)
       - Stored in RAM
       - “Dense” (takes up memory
                 to store zero values)

      Can be replaced by…
Sparse / Disk Based Matrices
      • Matrix – Package Matrix. Sparse. In RAM
      • big.matrix – Package bigmemory /
        bigmemoryExtras & VAM. On disk. VAM
        allows access from parallel R sessions
      • Analysis – Packages irlba, bigalgebra,
        biganalytics (R-Forge list), etc.
      More details?
        “Large-Scale Linear Algebra with R”, Bryan
          W. Lewis, Boston R Users Meetup
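
To make the dense versus sparse difference concrete, here is a minimal sketch (added for illustration, not from the original slides) that stores the same mostly-zero matrix both ways and compares the memory used. It assumes the Matrix package is installed, which is normally the case as it ships with R as a recommended package. The size `n` is arbitrary.

```r
library(Matrix)  # recommended package, normally shipped with R

n <- 1000

# Dense base matrix: every zero is stored explicitly
dense <- matrix(0, nrow = n, ncol = n)
diag(dense) <- 1

# Sparse equivalent: only the n non-zero entries are stored
sparse <- sparseMatrix(i = 1:n, j = 1:n, x = 1, dims = c(n, n))

dense_bytes  <- as.numeric(object.size(dense))
sparse_bytes <- as.numeric(object.size(sparse))

cat("Dense: ", dense_bytes, "bytes\n")
cat("Sparse:", sparse_bytes, "bytes\n")

# Both representations hold the same values
same_values <- all(as.matrix(sparse) == dense)
```

The saving depends entirely on how many zeros the data has. For a matrix with few zeros, the sparse form can be larger than the dense one.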
Commercial Versions of R
      Revolution Analytics have specialised
      versions of R for parallel execution & big data

      I believe many if not most components are
      also available under Free Open Source
      licences, including the RHadoop set of
      packages

      Plenty more info here
Background: Hadoop
      • Parallel data processing environment
         based on Google’s “MapReduce” model
      • “Map” – divides up the data and sends it
         to multiple nodes for processing
      • “Reduce” – combines the results
      Plus:
      • Hadoop Distributed File System (HDFS)
      • HBase – Distributed database like
                 Google’s BigTable
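
The map and reduce steps can be sketched in plain R without any Hadoop installation. This is only an illustration of the idea on a single machine, added for this write-up. The sample text, the two-way split and the function name `map_chunk` are all made up, and a real Hadoop job would distribute the chunks across nodes rather than loop over them locally.

```r
lines <- c("in the beginning", "the word", "the end")

# Divide the input into chunks, as if each went to a different node
chunks <- split(lines, c(1, 1, 2))  # hypothetical two-node split

# "Map": each chunk independently produces word counts
map_chunk <- function(chunk) {
  words <- unlist(strsplit(chunk, split = " "))
  counts <- table(words)
  setNames(as.vector(counts), names(counts))
}
mapped <- lapply(chunks, map_chunk)

# "Reduce": combine the per-chunk counts, grouping them by word
all_counts <- unlist(unname(mapped))
totals <- tapply(all_counts, names(all_counts), sum)

print(totals)
```

Because each chunk is mapped independently, the `lapply()` step is the part that a cluster can run in parallel.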
RHadoop – Revolution Analytics
       Packages: rmr2, rhbase, rhdfs

       • Example code using RMR (R Map-Reduce)
       • R and Hadoop – Step by Step Tutorials
       • Install and Demo RHadoop (Google for
         more of these online)
       • Data Hacking with RHadoop
RHadoop e.g.: functions, with example output shown in comments

library(rmr2)

wc.map <- function(., lines) {
  ## split "lines" of text into a vector of individual "words"
  words <- unlist(strsplit(x = lines, split = " "))
  keyval(words, 1) ## each word occurs once
}
## Example map output:
##   In, 1
##   the, 1
##   beginning, 1
##   ...

wc.reduce <- function(word, counts) {
  ## Add up the counts, grouping them by word
  keyval(word, sum(counts))
}
## Example reduce output:
##   the, 2345
##   word, 987
##   beginning, 123
##   ...

wordcount <- function(input, output = NULL) {
  mapreduce(
    input = input,
    output = output,
    input.format = "text",
    map = wc.map,
    reduce = wc.reduce,
    combine = TRUE)
}
Other Hadoop libraries for R
 Other packages: hive, segue, RHIPE…

 segue
 - Easy way to distribute CPU-intensive work
 - Uses Amazon’s Elastic MapReduce service,
   which costs money
 - Not designed for big data, but easy and fun

 Example follows…
# First, let's generate a 10-element list of
# 999 random numbers + 1 NA:
> myList <- getMyTestList()
# Take the mean of each set of numbers, ignoring the NA
# (myCluster is assumed to be an existing segue cluster object)
> outputLocal <- lapply(myList, mean, na.rm=TRUE)
> outputEmr <- emrlapply(myCluster, myList, mean, na.rm=TRUE)
RUNNING - 2011-01-04 15:16:57
RUNNING - 2011-01-04 15:17:27
RUNNING - 2011-01-04 15:17:58
WAITING - 2011-01-04 15:18:29

## Check local and cluster results match
> all.equal(outputEmr, outputLocal)
[1] TRUE

# The key is the emrlapply() function. It works just like lapply(),
# but automagically spreads its work across the specified cluster
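
The same lapply-shaped pattern can be tried locally without an Amazon account. The sketch below is an addition for illustration: it uses R's built-in `parallel` package as a stand-in, not segue itself, and the way `myList` is built is only a guess at what `getMyTestList()` produces, based on the comment above. The worker count of 2 is arbitrary.

```r
library(parallel)

set.seed(1)
# Assumed stand-in for getMyTestList():
# 10 elements, each holding 999 random numbers plus 1 NA
myList <- replicate(10, c(rnorm(999), NA), simplify = FALSE)

# Sequential version
outputLocal <- lapply(myList, mean, na.rm = TRUE)

# Parallel version on a small local cluster of worker processes
cl <- makeCluster(2)
outputPar <- parLapply(cl, myList, mean, na.rm = TRUE)
stopCluster(cl)

# Check that sequential and parallel results match
results_match <- isTRUE(all.equal(outputPar, outputLocal))
print(results_match)
```

For work this small, the cost of starting the workers outweighs any speed-up. The pattern pays off when each list element takes real CPU time.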
Oracle R Connector for Hadoop
 • Integrates with Oracle Database, “Oracle Big Data
   Appliance” (sounds expensive!) & HDFS
 • Map-Reduce is very similar to the rmr example
 • Documentation lists examples for Linear
   Regression, k-means, working with graphs
   amongst others
 • Introduction to Oracle R Connector for Hadoop.
 • Oracle also offer some in-database algorithms
   for R via Oracle R Enterprise (overview)
Teradata Integration
 Package: teradataR
 • Teradata offer in-database analytics, accessible
   through R
 • These include k-means clustering, descriptive
   statistics and the ability to create and call
   in-database user-defined functions
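
The general idea of “in database” analysis is to let the database do the heavy computation and send only the small result back to R. The sketch below is an illustration added for this write-up: it uses the DBI and RSQLite packages with an in-memory SQLite database as a stand-in, not teradataR or Oracle R Enterprise, and the table and column names are invented. It assumes both packages are installed.

```r
library(DBI)

# In-memory SQLite database as a stand-in for a real data warehouse
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical table: in practice this would already live in the database
sales <- data.frame(
  region = c("north", "north", "south", "south", "south"),
  amount = c(10, 20, 5, 15, 25)
)
dbWriteTable(con, "sales", sales)

# The aggregation runs inside the database;
# only one summary row per region comes back to R
summary_by_region <- dbGetQuery(
  con,
  "SELECT region, COUNT(*) AS n, AVG(amount) AS mean_amount
     FROM sales
    GROUP BY region
    ORDER BY region"
)

dbDisconnect(con)
print(summary_by_region)
```

With a table of billions of rows, R would still only receive the handful of summary rows, which is what keeps this approach within R's memory limits.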
What Next?
 I propose an informal “big data” Special Interest
 Group, where we collaborate to explore big data
 options within R, producing example code etc.


         “R” you interested?
