SlideShare ist ein Scribd-Unternehmen logo
1 von 24
Downloaden Sie, um offline zu lesen
Spark As A Distributed Scala
Write a lot to my blog www.Fruzenshtein.com
Currently interested in Scala, Akka, Spark…
Who is who?
Alexey Zvolinskiy
~4 years of Scala experience
Passing through Functional Programming
in Scala Specialization on Coursera
@Fruzenshtein
Why Scala?
What makes Scala so great?
1. Functional programming language*
2. Immutability
3. Type system
4. Collections API
5. Pattern matching
6. Implicit
Functional programming language
1. Function is a first class citizen
2. Totality
3. Determinism
4. Purity
A => B
A1
A2
…
An
B1
B2
…
Bn
A => BAi Bi A => BAi Bi
A => BAi Bi
Immutability
1. Makes a code more predictable
2. Reduces efforts to understand a code
3. Key to thread-safety
Books:
Java concurrency in practice
Effective Java 2nd Edition
Type system
1. Static typing
2. Type inference
3. Bounds Map[V, K]
List[T1 <: T2]
Set[+T]
Collections API
val numbers = List(1,2,3,4,5,6,7,8,9,10)
numbers.filter(_ % 2 == 0)
.map(_ * 10)
//List(20, 40, 60, 80, 100)
filter(n:Int => Boolean)
//(n => n % 2 == 0)
//(n => n * 10)
map(n:Int => Int)
Collections API
val groupsOfStudents = List(
List(("Alex", 65), ("Kate", 87), ("Sam", 98)),
List(("Peter", 84), ("Bob", 79), ("Samanta", 71)),
List(("Rob", 82), ("Jack", 55), ("Ann", 90))
)
groupsOfStudents.flatMap(students => students)
.groupBy(student => student._2 > 75)
.get(true).get
//List((Kate,87), (Sam,98), (Peter,84), (Bob,79), (Rob,82), (Ann,90))
And what?!
=Parallelism=
Idea of parallelism
How to divide a problem
into subproblems?
How to use a hardware
optimally?
Parallelism background
Scala parallel collections
val from0to100000: Range = 0 until 100000
val list = from0to100000.toList
//scala.collection.parallel.immutable.ParSeq[Int]
val parList = list.par
Some benchmarks
val list = from0to100000.toList
for (i <- 1 to 10) {
val t0 = System.currentTimeMillis()
list.filter(isPrime(_))
println(System.currentTimeMillis - t0)
}
def isPrime(n: Int): Boolean = ! (
(2 until n-1) exists (n % _ == 0)
)
val parList = list.par
for (i <- 1 to 10) {
val t1 = System.currentTimeMillis()
parList.filter(isPrime(_))
println(System.currentTimeMillis - t1)
}
7106
6467
6315
6275
6478
8732
6543
6296
6299
6286
5130
5106
4649
4568
4580
4446
4447
4437
4290
4476
Ok, but what about
Spark?!
Why distributed computations?
single machine
(shared memory)
Multiple nodes
(network)
Parallel collections
(scala)
RDDs
(spark)
Almost the same API
RDD example
Spark
Spark
Spark
val tweets: RDD[Tweet] = …
tweets.filter(
_.contains(“bigdata”)
)
Latency
Numbers from Jeff Dean http://research.google.com/people/jeff/ https://gist.github.com/2841832 Graph and scale by Thomas Lee
Computation model
memory disk network
seconds
-
days
weeks
-
months
weeks
-
years
Scala transformations & actions
1. Transformations are lazy
2. Actions are eager
map
filter
flatMap
…
reduce
collect
count
…
val tweets: RDD[Tweet] = …
tweets.filter(_.contains(“bigdata”))
.map(t => (t.author, t.body)
val tweets: RDD[Tweet] = …
tweets.filter(_.contains(“bigdata”))
.map(t => (t.author, t.body)
.collect()
Rules of thumb
1. Cache
2. Apply efficiently
3. Avoid shuffling
val tweets: RDD[Tweet] = …
val cachedTweets = tweets.cache()
cachedTweets.filter(_.contains(“USA”))
.map(t => (t.author, t.body)
cachedTweets.map(t => (t.author, t.body)
.filter(_.contains(“USA”))
Shuffling
(1, 240)
(2, 500)
(2, 105)
(3, 100)
(1, 200)
(1, 500)
(1, 450)
(3, 100)
(3, 100)
(2, [500, 105]) (1, [240, 200, 500, 450]) (3, [100, 100, 100])
groupByKey()
Transaction(id: Int, amount: Int)
We want to know how much money spent each client
Reduce before group
(1, 240)
(2, 605)
(3, 100)
(1, 700)
(1, 450)
(3, 200)
(2, [605]) (1, [240, 700, 450]) (3, [100, 200])
groupByKey()
(1, 240)
(2, 500)
(2, 105)
(3, 100)
(1, 200)
(1, 500)
(1, 450)
(3, 100)
(3, 100)
reduceByKey(…)
Thanks :)
@Fruzenshtein

Weitere ähnliche Inhalte

Was ist angesagt?

Xebicon2013 scala vsjava_final
Xebicon2013 scala vsjava_finalXebicon2013 scala vsjava_final
Xebicon2013 scala vsjava_finalUrs Peter
 
Lisp Programming Languge
Lisp Programming LangugeLisp Programming Languge
Lisp Programming LangugeYaser Jaradeh
 
Gentle Introduction To Lisp
Gentle Introduction To LispGentle Introduction To Lisp
Gentle Introduction To LispDamien Garaud
 
Verified Subtyping with Traits and Mixins
Verified Subtyping with Traits and MixinsVerified Subtyping with Traits and Mixins
Verified Subtyping with Traits and MixinsAsankhaya Sharma
 
Introduction to lambda calculus
Introduction to lambda calculusIntroduction to lambda calculus
Introduction to lambda calculusAfaq Siddiqui
 
Using VI Java from Scala
Using VI Java from ScalaUsing VI Java from Scala
Using VI Java from Scaladcbriccetti
 
Getting Started With Scala
Getting Started With ScalaGetting Started With Scala
Getting Started With ScalaMeetu Maltiar
 
(3) collections algorithms
(3) collections algorithms(3) collections algorithms
(3) collections algorithmsNico Ludwig
 
RxJS - The Reactive Extensions for JavaScript
RxJS - The Reactive Extensions for JavaScriptRxJS - The Reactive Extensions for JavaScript
RxJS - The Reactive Extensions for JavaScriptViliam Elischer
 
Object Oriented JavaScript
Object Oriented JavaScriptObject Oriented JavaScript
Object Oriented JavaScriptinjulkarnilesh
 
Introducing Pattern Matching in Scala
 Introducing Pattern Matching  in Scala Introducing Pattern Matching  in Scala
Introducing Pattern Matching in ScalaAyush Mishra
 
Safe navigation in OCL
Safe navigation in OCLSafe navigation in OCL
Safe navigation in OCLEdward Willink
 
The Scala Refactoring Library: Problems and Perspectives
The Scala Refactoring Library: Problems and PerspectivesThe Scala Refactoring Library: Problems and Perspectives
The Scala Refactoring Library: Problems and PerspectivesMatthias Langer
 

Was ist angesagt? (16)

Xebicon2013 scala vsjava_final
Xebicon2013 scala vsjava_finalXebicon2013 scala vsjava_final
Xebicon2013 scala vsjava_final
 
Lisp Programming Languge
Lisp Programming LangugeLisp Programming Languge
Lisp Programming Languge
 
Gentle Introduction To Lisp
Gentle Introduction To LispGentle Introduction To Lisp
Gentle Introduction To Lisp
 
Verified Subtyping with Traits and Mixins
Verified Subtyping with Traits and MixinsVerified Subtyping with Traits and Mixins
Verified Subtyping with Traits and Mixins
 
Linked lists c7
Linked lists c7Linked lists c7
Linked lists c7
 
Introduction to lambda calculus
Introduction to lambda calculusIntroduction to lambda calculus
Introduction to lambda calculus
 
Using VI Java from Scala
Using VI Java from ScalaUsing VI Java from Scala
Using VI Java from Scala
 
Getting Started With Scala
Getting Started With ScalaGetting Started With Scala
Getting Started With Scala
 
(3) collections algorithms
(3) collections algorithms(3) collections algorithms
(3) collections algorithms
 
RxJS - The Reactive Extensions for JavaScript
RxJS - The Reactive Extensions for JavaScriptRxJS - The Reactive Extensions for JavaScript
RxJS - The Reactive Extensions for JavaScript
 
OCL 2.4. (... 2.5)
OCL 2.4. (... 2.5)OCL 2.4. (... 2.5)
OCL 2.4. (... 2.5)
 
Lisp
LispLisp
Lisp
 
Object Oriented JavaScript
Object Oriented JavaScriptObject Oriented JavaScript
Object Oriented JavaScript
 
Introducing Pattern Matching in Scala
 Introducing Pattern Matching  in Scala Introducing Pattern Matching  in Scala
Introducing Pattern Matching in Scala
 
Safe navigation in OCL
Safe navigation in OCLSafe navigation in OCL
Safe navigation in OCL
 
The Scala Refactoring Library: Problems and Perspectives
The Scala Refactoring Library: Problems and PerspectivesThe Scala Refactoring Library: Problems and Perspectives
The Scala Refactoring Library: Problems and Perspectives
 

Andere mochten auch

ELIXIR Webinar: Introducing TeSS
ELIXIR Webinar: Introducing TeSSELIXIR Webinar: Introducing TeSS
ELIXIR Webinar: Introducing TeSSNiall Beard
 
WEB MINING: PATTERN DISCOVERY ON THE WORLD WIDE WEB - 2011
WEB MINING: PATTERN DISCOVERY ON THE WORLD WIDE WEB - 2011WEB MINING: PATTERN DISCOVERY ON THE WORLD WIDE WEB - 2011
WEB MINING: PATTERN DISCOVERY ON THE WORLD WIDE WEB - 2011Mustafa TURAN
 
Magic Clusters and Where to Find Them 2.0 - Eugene Pirogov
Magic Clusters and Where to Find Them 2.0 - Eugene Pirogov Magic Clusters and Where to Find Them 2.0 - Eugene Pirogov
Magic Clusters and Where to Find Them 2.0 - Eugene Pirogov Elixir Club
 
나프다 웨비너 1604: Elixir와 함수형 프로그래밍을 이용한 웹 개발
나프다 웨비너 1604: Elixir와 함수형 프로그래밍을 이용한 웹 개발나프다 웨비너 1604: Elixir와 함수형 프로그래밍을 이용한 웹 개발
나프다 웨비너 1604: Elixir와 함수형 프로그래밍을 이용한 웹 개발Changwook Park
 
Control flow in_elixir
Control flow in_elixirControl flow in_elixir
Control flow in_elixirAnna Neyzberg
 
Flowex: Flow-Based Programming with Elixir GenStage - Anton Mishchuk
Flowex: Flow-Based Programming with Elixir GenStage - Anton MishchukFlowex: Flow-Based Programming with Elixir GenStage - Anton Mishchuk
Flowex: Flow-Based Programming with Elixir GenStage - Anton MishchukElixir Club
 
Phoenix: Inflame the Web - Alex Troush
Phoenix: Inflame the Web - Alex TroushPhoenix: Inflame the Web - Alex Troush
Phoenix: Inflame the Web - Alex TroushElixir Club
 
GenStage and Flow - Jose Valim
GenStage and Flow - Jose Valim GenStage and Flow - Jose Valim
GenStage and Flow - Jose Valim Elixir Club
 
Distributed system in Elixir
Distributed system in ElixirDistributed system in Elixir
Distributed system in ElixirChangwook Park
 
Proteome array - antibody based proteome arrays
Proteome array - antibody based proteome arrays Proteome array - antibody based proteome arrays
Proteome array - antibody based proteome arrays saraswathi rajakumar
 
Elixir & Phoenix 推坑
Elixir & Phoenix 推坑Elixir & Phoenix 推坑
Elixir & Phoenix 推坑Chao-Ju Huang
 
Bottleneck in Elixir Application - Alexey Osipenko
 Bottleneck in Elixir Application - Alexey Osipenko  Bottleneck in Elixir Application - Alexey Osipenko
Bottleneck in Elixir Application - Alexey Osipenko Elixir Club
 
Anatomy of an elixir process and Actor Communication
Anatomy of an elixir process and Actor CommunicationAnatomy of an elixir process and Actor Communication
Anatomy of an elixir process and Actor CommunicationMustafa TURAN
 
Build Your Own Real-Time Web Service with Elixir Phoenix
Build Your Own Real-Time Web Service with Elixir PhoenixBuild Your Own Real-Time Web Service with Elixir Phoenix
Build Your Own Real-Time Web Service with Elixir PhoenixChi-chi Ekweozor
 
Professional Programmer (3 Years Later)
Professional Programmer (3 Years Later)Professional Programmer (3 Years Later)
Professional Programmer (3 Years Later)Gabriele Lana
 
Embedded Erlang, Nerves, and SumoBots
Embedded Erlang, Nerves, and SumoBotsEmbedded Erlang, Nerves, and SumoBots
Embedded Erlang, Nerves, and SumoBotsFrank Hunleth
 

Andere mochten auch (20)

Big Data eBook
Big Data eBookBig Data eBook
Big Data eBook
 
ELIXIR Webinar: Introducing TeSS
ELIXIR Webinar: Introducing TeSSELIXIR Webinar: Introducing TeSS
ELIXIR Webinar: Introducing TeSS
 
WEB MINING: PATTERN DISCOVERY ON THE WORLD WIDE WEB - 2011
WEB MINING: PATTERN DISCOVERY ON THE WORLD WIDE WEB - 2011WEB MINING: PATTERN DISCOVERY ON THE WORLD WIDE WEB - 2011
WEB MINING: PATTERN DISCOVERY ON THE WORLD WIDE WEB - 2011
 
Magic Clusters and Where to Find Them 2.0 - Eugene Pirogov
Magic Clusters and Where to Find Them 2.0 - Eugene Pirogov Magic Clusters and Where to Find Them 2.0 - Eugene Pirogov
Magic Clusters and Where to Find Them 2.0 - Eugene Pirogov
 
나프다 웨비너 1604: Elixir와 함수형 프로그래밍을 이용한 웹 개발
나프다 웨비너 1604: Elixir와 함수형 프로그래밍을 이용한 웹 개발나프다 웨비너 1604: Elixir와 함수형 프로그래밍을 이용한 웹 개발
나프다 웨비너 1604: Elixir와 함수형 프로그래밍을 이용한 웹 개발
 
Control flow in_elixir
Control flow in_elixirControl flow in_elixir
Control flow in_elixir
 
Spring IO for startups
Spring IO for startupsSpring IO for startups
Spring IO for startups
 
Flowex: Flow-Based Programming with Elixir GenStage - Anton Mishchuk
Flowex: Flow-Based Programming with Elixir GenStage - Anton MishchukFlowex: Flow-Based Programming with Elixir GenStage - Anton Mishchuk
Flowex: Flow-Based Programming with Elixir GenStage - Anton Mishchuk
 
Phoenix: Inflame the Web - Alex Troush
Phoenix: Inflame the Web - Alex TroushPhoenix: Inflame the Web - Alex Troush
Phoenix: Inflame the Web - Alex Troush
 
GenStage and Flow - Jose Valim
GenStage and Flow - Jose Valim GenStage and Flow - Jose Valim
GenStage and Flow - Jose Valim
 
Distributed system in Elixir
Distributed system in ElixirDistributed system in Elixir
Distributed system in Elixir
 
Proteome array - antibody based proteome arrays
Proteome array - antibody based proteome arrays Proteome array - antibody based proteome arrays
Proteome array - antibody based proteome arrays
 
Elixir & Phoenix 推坑
Elixir & Phoenix 推坑Elixir & Phoenix 推坑
Elixir & Phoenix 推坑
 
Bottleneck in Elixir Application - Alexey Osipenko
 Bottleneck in Elixir Application - Alexey Osipenko  Bottleneck in Elixir Application - Alexey Osipenko
Bottleneck in Elixir Application - Alexey Osipenko
 
Anatomy of an elixir process and Actor Communication
Anatomy of an elixir process and Actor CommunicationAnatomy of an elixir process and Actor Communication
Anatomy of an elixir process and Actor Communication
 
Build Your Own Real-Time Web Service with Elixir Phoenix
Build Your Own Real-Time Web Service with Elixir PhoenixBuild Your Own Real-Time Web Service with Elixir Phoenix
Build Your Own Real-Time Web Service with Elixir Phoenix
 
Play vs Rails
Play vs RailsPlay vs Rails
Play vs Rails
 
Elixir basics
Elixir basicsElixir basics
Elixir basics
 
Professional Programmer (3 Years Later)
Professional Programmer (3 Years Later)Professional Programmer (3 Years Later)
Professional Programmer (3 Years Later)
 
Embedded Erlang, Nerves, and SumoBots
Embedded Erlang, Nerves, and SumoBotsEmbedded Erlang, Nerves, and SumoBots
Embedded Erlang, Nerves, and SumoBots
 

Ähnlich wie Spark as a distributed Scala

Introduction to parallel and distributed computation with spark
Introduction to parallel and distributed computation with sparkIntroduction to parallel and distributed computation with spark
Introduction to parallel and distributed computation with sparkAngelo Leto
 
The Scala Programming Language
The Scala Programming LanguageThe Scala Programming Language
The Scala Programming Languageleague
 
Scala training workshop 02
Scala training workshop 02Scala training workshop 02
Scala training workshop 02Nguyen Tuan
 
Functional programming with Scala
Functional programming with ScalaFunctional programming with Scala
Functional programming with ScalaNeelkanth Sachdeva
 
Functional Programming With Scala
Functional Programming With ScalaFunctional Programming With Scala
Functional Programming With ScalaKnoldus Inc.
 
Spark - The Ultimate Scala Collections by Martin Odersky
Spark - The Ultimate Scala Collections by Martin OderskySpark - The Ultimate Scala Collections by Martin Odersky
Spark - The Ultimate Scala Collections by Martin OderskySpark Summit
 
Real Time Big Data Management
Real Time Big Data ManagementReal Time Big Data Management
Real Time Big Data ManagementAlbert Bifet
 
Scala Talk at FOSDEM 2009
Scala Talk at FOSDEM 2009Scala Talk at FOSDEM 2009
Scala Talk at FOSDEM 2009Martin Odersky
 
The Evolution of Scala
The Evolution of ScalaThe Evolution of Scala
The Evolution of ScalaMartin Odersky
 
scalaliftoff2009.pdf
scalaliftoff2009.pdfscalaliftoff2009.pdf
scalaliftoff2009.pdfHiroshi Ono
 
scalaliftoff2009.pdf
scalaliftoff2009.pdfscalaliftoff2009.pdf
scalaliftoff2009.pdfHiroshi Ono
 
scalaliftoff2009.pdf
scalaliftoff2009.pdfscalaliftoff2009.pdf
scalaliftoff2009.pdfHiroshi Ono
 
scalaliftoff2009.pdf
scalaliftoff2009.pdfscalaliftoff2009.pdf
scalaliftoff2009.pdfHiroshi Ono
 
Scala final ppt vinay
Scala final ppt vinayScala final ppt vinay
Scala final ppt vinayViplav Jain
 
Scala for Java Programmers
Scala for Java ProgrammersScala for Java Programmers
Scala for Java ProgrammersEric Pederson
 
Metaprogramming in Scala 2.10, Eugene Burmako,
Metaprogramming  in Scala 2.10, Eugene Burmako, Metaprogramming  in Scala 2.10, Eugene Burmako,
Metaprogramming in Scala 2.10, Eugene Burmako, Vasil Remeniuk
 
Artigo 81 - spark_tutorial.pdf
Artigo 81 - spark_tutorial.pdfArtigo 81 - spark_tutorial.pdf
Artigo 81 - spark_tutorial.pdfWalmirCouto3
 

Ähnlich wie Spark as a distributed Scala (20)

Introduction to parallel and distributed computation with spark
Introduction to parallel and distributed computation with sparkIntroduction to parallel and distributed computation with spark
Introduction to parallel and distributed computation with spark
 
The Scala Programming Language
The Scala Programming LanguageThe Scala Programming Language
The Scala Programming Language
 
Scala training workshop 02
Scala training workshop 02Scala training workshop 02
Scala training workshop 02
 
Functional programming with Scala
Functional programming with ScalaFunctional programming with Scala
Functional programming with Scala
 
Functional Programming With Scala
Functional Programming With ScalaFunctional Programming With Scala
Functional Programming With Scala
 
Devoxx
DevoxxDevoxx
Devoxx
 
Spark - The Ultimate Scala Collections by Martin Odersky
Spark - The Ultimate Scala Collections by Martin OderskySpark - The Ultimate Scala Collections by Martin Odersky
Spark - The Ultimate Scala Collections by Martin Odersky
 
Quick introduction to scala
Quick introduction to scalaQuick introduction to scala
Quick introduction to scala
 
Real Time Big Data Management
Real Time Big Data ManagementReal Time Big Data Management
Real Time Big Data Management
 
Scala Talk at FOSDEM 2009
Scala Talk at FOSDEM 2009Scala Talk at FOSDEM 2009
Scala Talk at FOSDEM 2009
 
Lecture1
Lecture1Lecture1
Lecture1
 
The Evolution of Scala
The Evolution of ScalaThe Evolution of Scala
The Evolution of Scala
 
scalaliftoff2009.pdf
scalaliftoff2009.pdfscalaliftoff2009.pdf
scalaliftoff2009.pdf
 
scalaliftoff2009.pdf
scalaliftoff2009.pdfscalaliftoff2009.pdf
scalaliftoff2009.pdf
 
scalaliftoff2009.pdf
scalaliftoff2009.pdfscalaliftoff2009.pdf
scalaliftoff2009.pdf
 
scalaliftoff2009.pdf
scalaliftoff2009.pdfscalaliftoff2009.pdf
scalaliftoff2009.pdf
 
Scala final ppt vinay
Scala final ppt vinayScala final ppt vinay
Scala final ppt vinay
 
Scala for Java Programmers
Scala for Java ProgrammersScala for Java Programmers
Scala for Java Programmers
 
Metaprogramming in Scala 2.10, Eugene Burmako,
Metaprogramming  in Scala 2.10, Eugene Burmako, Metaprogramming  in Scala 2.10, Eugene Burmako,
Metaprogramming in Scala 2.10, Eugene Burmako,
 
Artigo 81 - spark_tutorial.pdf
Artigo 81 - spark_tutorial.pdfArtigo 81 - spark_tutorial.pdf
Artigo 81 - spark_tutorial.pdf
 

Mehr von Alex Fruzenshtein

Scala UA: Big Step To Functional Programming
Scala UA: Big Step To Functional ProgrammingScala UA: Big Step To Functional Programming
Scala UA: Big Step To Functional ProgrammingAlex Fruzenshtein
 
Akka: Actor Design & Communication Technics
Akka: Actor Design & Communication TechnicsAkka: Actor Design & Communication Technics
Akka: Actor Design & Communication TechnicsAlex Fruzenshtein
 
N аргументов не идти в QA
N аргументов не идти в QAN аргументов не идти в QA
N аргументов не идти в QAAlex Fruzenshtein
 
XpDays - Automated testing of responsive design (GalenFramework)
XpDays - Automated testing of responsive design (GalenFramework)XpDays - Automated testing of responsive design (GalenFramework)
XpDays - Automated testing of responsive design (GalenFramework)Alex Fruzenshtein
 

Mehr von Alex Fruzenshtein (6)

Scala UA: Big Step To Functional Programming
Scala UA: Big Step To Functional ProgrammingScala UA: Big Step To Functional Programming
Scala UA: Big Step To Functional Programming
 
Akka: Actor Design & Communication Technics
Akka: Actor Design & Communication TechnicsAkka: Actor Design & Communication Technics
Akka: Actor Design & Communication Technics
 
Boost UI tests
Boost UI testsBoost UI tests
Boost UI tests
 
N аргументов не идти в QA
N аргументов не идти в QAN аргументов не идти в QA
N аргументов не идти в QA
 
XpDays - Automated testing of responsive design (GalenFramework)
XpDays - Automated testing of responsive design (GalenFramework)XpDays - Automated testing of responsive design (GalenFramework)
XpDays - Automated testing of responsive design (GalenFramework)
 
Test-case design
Test-case designTest-case design
Test-case design
 

Kürzlich hochgeladen

Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...gajnagarg
 
Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........EfruzAsilolu
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubaikojalkojal131
 
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptxThe-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptxVivek487417
 
PLE-statistics document for primary schs
PLE-statistics document for primary schsPLE-statistics document for primary schs
PLE-statistics document for primary schscnajjemba
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...nirzagarg
 
Harnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxHarnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxParas Gupta
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
 
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制vexqp
 
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制vexqp
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...nirzagarg
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1ranjankumarbehera14
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...Health
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...Elaine Werffeli
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRajesh Mondal
 

Kürzlich hochgeladen (20)

Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 
Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptxThe-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
 
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit RiyadhCytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
 
PLE-statistics document for primary schs
PLE-statistics document for primary schsPLE-statistics document for primary schs
PLE-statistics document for primary schs
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
 
Harnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxHarnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptx
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
 
Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...
 
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
 
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 

Spark as a distributed Scala

  • 1. Spark As A Distributed Scala
  • 2. Write a lot to my blog www.Fruzenshtein.com Currently interested in Scala, Akka, Spark… Who is who? Alexey Zvolinskiy ~4 years of Scala experience Passing through Functional Programming in Scala Specialization on Coursera @Fruzenshtein
  • 4. What makes Scala so great? 1. Functional programming language* 2. Immutability 3. Type system 4. Collections API 5. Pattern matching 6. Implicit
  • 5. Functional programming language 1. Function is a first class citizen 2. Totality 3. Determinism 4. Purity A => B A1 A2 … An B1 B2 … Bn A => BAi Bi A => BAi Bi A => BAi Bi
  • 6. Immutability 1. Makes a code more predictable 2. Reduces efforts to understand a code 3. Key to thread-safety Books: Java concurrency in practice Effective Java 2nd Edition
  • 7. Type system 1. Static typing 2. Type inference 3. Bounds Map[V, K] List[T1 <: T2] Set[+T]
  • 8. Collections API val numbers = List(1,2,3,4,5,6,7,8,9,10) numbers.filter(_ % 2 == 0) .map(_ * 10) //List(20, 40, 60, 80, 100) filter(n:Int => Boolean) //(n => n % 2 == 0) //(n => n * 10) map(n:Int => Int)
  • 9. Collections API val groupsOfStudents = List( List(("Alex", 65), ("Kate", 87), ("Sam", 98)), List(("Peter", 84), ("Bob", 79), ("Samanta", 71)), List(("Rob", 82), ("Jack", 55), ("Ann", 90)) ) groupsOfStudents.flatMap(students => students) .groupBy(student => student._2 > 75) .get(true).get //List((Kate,87), (Sam,98), (Peter,84), (Bob,79), (Rob,82), (Ann,90))
  • 11. Idea of parallelism How to divide a problem into subproblems? How to use a hardware optimally?
  • 13. Scala parallel collections val from0to100000: Range = 0 until 100000 val list = from0to100000.toList //scala.collection.parallel.immutable.ParSeq[Int] val parList = list.par
  • 14. Some benchmarks val list = from0to100000.toList for (i <- 1 to 10) { val t0 = System.currentTimeMillis() list.filter(isPrime(_)) println(System.currentTimeMillis - t0) } def isPrime(n: Int): Boolean = ! ( (2 until n-1) exists (n % _ == 0) ) val parList = list.par for (i <- 1 to 10) { val t1 = System.currentTimeMillis() parList.filter(isPrime(_)) println(System.currentTimeMillis - t1) } 7106 6467 6315 6275 6478 8732 6543 6296 6299 6286 5130 5106 4649 4568 4580 4446 4447 4437 4290 4476
  • 15. Ok, but what about Spark?!
  • 16. Why distributed computations? single machine (shared memory) Multiple nodes (network) Parallel collections (scala) RDDs (spark) Almost the same API
  • 17. RDD example Spark Spark Spark val tweets: RDD[Tweet] = … tweets.filter( _.contains(“bigdata”) )
  • 18. Latency Numbers from Jeff Dean http://research.google.com/people/jeff/ https://gist.github.com/2841832 Graph and scale by Thomas Lee
  • 19. Computation model memory disk network seconds - days weeks - months weeks - years
  • 20. Scala transformations & actions 1. Transformations are lazy 2. Actions are eager map filter flatMap … reduce collect count … val tweets: RDD[Tweet] = … tweets.filter(_.contains(“bigdata”)) .map(t => (t.author, t.body) val tweets: RDD[Tweet] = … tweets.filter(_.contains(“bigdata”)) .map(t => (t.author, t.body) .collect()
  • 21. Rules of thumb 1. Cache 2. Apply efficiently 3. Avoid shuffling val tweets: RDD[Tweet] = … val cachedTweets = tweets.cache() cachedTweets.filter(_.contains(“USA”)) .map(t => (t.author, t.body) cachedTweets.map(t => (t.author, t.body) .filter(_.contains(“USA”))
  • 22. Shuffling (1, 240) (2, 500) (2, 105) (3, 100) (1, 200) (1, 500) (1, 450) (3, 100) (3, 100) (2, [500, 105]) (1, [240, 200, 500, 450]) (3, [100, 100, 100]) groupByKey() Transaction(id: Int, amount: Int) We want to know how much money spent each client
  • 23. Reduce before group (1, 240) (2, 605) (3, 100) (1, 700) (1, 450) (3, 200) (2, [605]) (1, [240, 700, 450]) (3, [100, 200]) groupByKey() (1, 240) (2, 500) (2, 105) (3, 100) (1, 200) (1, 500) (1, 450) (3, 100) (3, 100) reduceByKey(…)