SparkR: Enabling Interactive Data Science at Scale
"SparkR" presentation by Shivaram Venkataraman and Zongheng Yang
1.
SparkR: Enabling Interactive Data Science at Scale
Shivaram Venkataraman, Zongheng Yang
2.
Fast, Scalable, Flexible
3.
Statistics, Packages, Plots
4.
Fast, Scalable, Flexible
Statistics, Plots, Packages
5.
Outline
SparkR API
Live Demo
Design Details
6.
RDD: Parallel Collection
Transformations: map, filter, groupBy, …
Actions: count, collect, saveAsTextFile, …
7.
R + RDD = R2D2
8.
R + RDD = RRDD
lapply, lapplyPartition, groupByKey, reduceByKey, sampleRDD, collect, cache, filter, …
broadcast, includePackage, textFile, parallelize
9.
SparkR – R package for Spark
[Diagram: R, RRDD, Spark]
10.
Example: word_count.R
library(SparkR)
lines <- textFile(sc, "hdfs://my_text_file")
11.
Example: word_count.R
library(SparkR)
lines <- textFile(sc, "hdfs://my_text_file")
words <- flatMap(lines, function(line) { strsplit(line, " ")[[1]] })
wordCount <- lapply(words, function(word) { list(word, 1L) })
12.
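Example: word_count.R
library(SparkR)
# sc is assumed to be an existing Spark context
lines <- textFile(sc, "hdfs://my_text_file")
# Split each line into words, then pair each word with a count of 1
words <- flatMap(lines, function(line) { strsplit(line, " ")[[1]] })
wordCount <- lapply(words, function(word) { list(word, 1L) })
# Sum the counts per word (2 partitions) and fetch the result locally
counts <- reduceByKey(wordCount, "+", 2L)
output <- collect(counts)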
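To see what the word count pipeline above computes without a cluster, the same three steps can be mimicked on an in-memory character vector in base R. This is only a local sketch: `lines` is made-up sample text, and `unlist`/`lapply`/`split` stand in for SparkR's `flatMap`, `lapply`, and `reduceByKey`. Nothing here touches Spark.

```r
# Hypothetical sample input standing in for the HDFS text file.
lines <- c("to be or not to be", "that is the question")

# flatMap step: split every line into words and flatten the result.
words <- unlist(lapply(lines, function(line) strsplit(line, " ")[[1]]))

# lapply step: pair each word with the count 1L.
wordCount <- lapply(words, function(word) list(word, 1L))

# reduceByKey step: group the counts by word and sum each group.
keys <- vapply(wordCount, function(kv) kv[[1]], character(1))
vals <- vapply(wordCount, function(kv) kv[[2]], integer(1))
counts <- vapply(split(vals, keys), sum, integer(1))

# counts is a named integer vector, one entry per distinct word.
print(counts)
```

In the distributed version the grouping and summing happen across partitions; here they happen in a single R process.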
13.
Demo: Digit Classification
14.
MNIST
15.
[Figure: matrix A and vector b]
Minimize $\| Ax - b \|_2$
$x = (A^T A)^{-1} A^T b$
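The closed form on this slide follows from setting the gradient of the squared residual to zero. A short derivation, assuming $A$ has full column rank so that $A^T A$ is invertible (an assumption the slide does not state). Minimizing the norm and minimizing its square give the same $x$:

```latex
\begin{aligned}
f(x) &= \| Ax - b \|_2^2 = (Ax - b)^T (Ax - b) \\
\nabla f(x) &= 2 A^T (Ax - b) = 0 \\
A^T A \, x &= A^T b \\
x &= (A^T A)^{-1} A^T b
\end{aligned}
```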
16.
How does this work?
17.
Dataflow
[Diagram: Local; Worker; Worker]
18.
Dataflow
[Diagram: Local: R; Worker; Worker]
19.
Dataflow
[Diagram: Local: R, Spark Context, Java Spark Context, JNI; Worker; Worker]
20.
Dataflow
[Diagram: Local: R, Spark Context, Java Spark Context, JNI; each Worker: Spark Executor, exec, R]
21.
22.
From http://obeautifulcode.com/R/How-R-Searches-And-Finds-Stuff/
23.
24.
Dataflow
[Diagram: Local: R, Spark Context, Java Spark Context, JNI; each Worker: Spark Executor, exec, R]
25.
Pipelined RDD
words <- flatMap(lines, …)
wordCount <- lapply(words, …)
[Diagram: Spark Executor, exec, R; Spark Executor, R, exec]
26.
Pipelined RDD
[Diagram: Spark Executors running R via exec]
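The idea behind a pipelined RDD, as these slides suggest, is that consecutive transformations such as flatMap and lapply do not each need their own R invocation on the executor: their functions can be composed and applied in one pass over a partition. A rough local sketch in base R, where `partition`, `splitWords`, `toPair`, and `pipelined` are hypothetical names chosen for illustration and no Spark is involved:

```r
# Hypothetical partition of input lines held by one executor.
partition <- list("a b", "c")

# Per-partition function for the flatMap step:
# each line yields several words, flattened into one list.
splitWords <- function(part) {
  as.list(unlist(lapply(part, function(line) strsplit(line, " ")[[1]])))
}

# Per-partition function for the lapply step:
# each word becomes a (word, 1L) pair.
toPair <- function(part) {
  lapply(part, function(word) list(word, 1L))
}

# Pipelining: compose the two functions so that one call
# runs both steps over the partition.
pipelined <- function(part) toPair(splitWords(part))

result <- pipelined(partition)
print(length(result))
```

The composed function gives the same result as running the two steps one after the other, which is what makes the composition safe.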
27.
Alpha developer release
One line install!
devtools::install_github("amplab-extras/SparkR-pkg", subdir="pkg")
28.
SparkR Implementation
Very similar to PySpark
Spark is easy to extend
329 lines of Scala code
2079 lines of R code
693 lines of test code in R
29.
EC2 setup scripts
All Spark examples
MNIST demo
YARN, Windows support
Also on GitHub
30.
Developer Community
13 contributors (10 from outside AMPLab)
Collaboration with Alteryx
31.
On the Roadmap
High-level DataFrame API
Integrating Spark's MLlib from R
Merge with Apache Spark
32.
SparkR
RDD → distributed lists
Run R on clusters
Re-use existing packages
Combine scalability & utility
33.
SparkR
https://github.com/amplab-extras/SparkR-pkg
Shivaram Venkataraman: shivaram@cs.berkeley.edu
Zongheng Yang: zongheng.y@gmail.com
SparkR mailing list: sparkr-dev@googlegroups.com