Big Data Processing in Pharo
Matteo Marra

ESUG - August 2019 - Köln
Matteo Marra - Big Data Processing in Pharo
Big Data Frameworks
Programming Model
To express large data processing jobs in terms of simple
functions (e.g., Map/Reduce)
Parallel Execution Model
To execute computations in parallel on large amounts of data
Why Big Data?

Google MapReduce
Index the WWW
More than 20 billion web pages (> 400 TB)
Do it fast
With 1000 machines: < 3 hours; with one machine: 4 months.
Keep it cheap
Use a lot of recycled hardware, but with robust error recovery
MapReduce: Simplified Data Processing on Large Clusters
Dean J. and Ghemawat S., OSDI '04
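As a rough consistency check of the two timings, assuming perfect linear scaling across machines (an idealisation that the slide does not state), the 1000-machine figure translates to the single-machine figure as follows:

\[
1000 \times 3\,\text{h} = 3000\,\text{h} = 125\,\text{days} \approx 4\ \text{months}
\]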
Map/Reduce by Example
Dataset
Rossi Abruzzo 1522706400
Verdi Toscana 1527976800
Bianchi Toscana 1525298400
Verdi Lombardia 1527976800
Gialli Toscana 1525298400
Rossi Abruzzo 1525298400
Gialli Veneto 1527976800
Bianchi Sardegna 1522706400
Gialli Toscana 1527976800
Bianchi Lombardia 1527976800
Bianchi Toscana 1522706400
Gialli Sicilia 1522706400
Gialli Sicilia 1527976800
Bianchi Sardegna 1527976800
KeyBy
(Rossi, Abruzzo …)
(Verdi, Toscana …)
(Bianchi, Toscana …)
(Verdi, Lombardia …)
(Gialli, Toscana …)
(Rossi, Abruzzo …)
(Gialli, Veneto …)
(Bianchi, Sardegna …)
(Gialli, Toscana …)
(Bianchi, Lombardia …)
(Bianchi, Toscana …)
(Gialli, Sicilia …)
(Gialli, Sicilia …)
(Bianchi, Sardegna …)
Map
NIL
(Verdi, Toscana …)
(Bianchi, Toscana …)
NIL
(Gialli, Toscana …)
NIL
NIL
NIL
(Gialli, Toscana …)
NIL
(Bianchi, Toscana …)
NIL
NIL
NIL
GroupBy
(Verdi, Toscana …)
(Bianchi, Toscana …)
(Bianchi, Toscana …)
(Gialli, Toscana …)
(Gialli, Toscana …)
Reduce
(Verdi, 1)
(Bianchi, 2)
(Gialli, 2)
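The stages of this example can be reproduced on a single machine with ordinary Pharo collections. The sketch below is not Port code and makes no claim about how Port implements these stages: it uses only standard collection messages, the variable names are illustrative, and it leaves out any date check so that the counts match the Reduce column of the slide.

```smalltalk
| lines pairs toscana counts |
"Sample records from the slide: name, region, Unix timestamp."
lines := #(
    'Rossi Abruzzo 1522706400'
    'Verdi Toscana 1527976800'
    'Bianchi Toscana 1525298400'
    'Verdi Lombardia 1527976800'
    'Gialli Toscana 1525298400'
    'Rossi Abruzzo 1525298400'
    'Gialli Veneto 1527976800'
    'Bianchi Sardegna 1522706400'
    'Gialli Toscana 1527976800'
    'Bianchi Lombardia 1527976800'
    'Bianchi Toscana 1522706400'
    'Gialli Sicilia 1522706400'
    'Gialli Sicilia 1527976800'
    'Bianchi Sardegna 1527976800' ).

"KeyBy: split each line and key it by its first field (the name)."
pairs := lines collect: [ :line |
    | fields |
    fields := line substrings: ' '.
    fields first -> fields ].

"Map: keep only the Toscana records and emit (name -> 1) for each."
toscana := (pairs select: [ :pair | (pair value at: 2) = 'Toscana' ])
    collect: [ :pair | pair key -> 1 ].

"GroupBy and Reduce: sum the ones emitted for each name."
counts := Dictionary new.
toscana do: [ :pair |
    counts
        at: pair key
        put: (counts at: pair key ifAbsent: [ 0 ]) + pair value ].
counts
```

Evaluated in a playground, `counts` holds one entry per name that voted in Toscana.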
A Map/Reduce application in Pharo
VoteCountingApp>>map: aLine
    | splitted |
    (aLine includesSubstring: 'Toscana')
        ifTrue: [ splitted := aLine substrings: ' '.
            (DateAndTime fromUnixTime: (splitted at: 3) asNumber)
                > DateAndTime yesterday
                ifTrue: [ ^ (splitted at: 1) -> 1 ] ].
    ^ nil -> nil
VoteCountingApp>>reduceByKey: values
    ^ values inject: 0 into: [:sum :current | sum + current]
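When parsing the timestamp field, message precedence matters: Pharo evaluates unary messages first, then binary, then keyword messages, so an unparenthesised `fromUnixTime: splitted at: 3` is read as one two-keyword message `fromUnixTime:at:`. The playground sketch below shows the parenthesised form on a single sample record; it uses only standard Pharo classes, and the variable names are illustrative.

```smalltalk
| fields stamp isRecent |
fields := 'Verdi Toscana 1527976800' substrings: ' '.

"Parenthesise the inner keyword send and convert the string field to a number."
stamp := DateAndTime fromUnixTime: (fields at: 3) asNumber.

"Unary 'yesterday' binds first, then the binary '>' comparison."
isRecent := stamp > DateAndTime yesterday.
isRecent
```

Because the sample timestamps date from 2018, `isRecent` is false for this record when evaluated today.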
Port: a Map/Reduce Framework for Pharo
Apply Map and Reduce
Coordinate
Deploying Port
Local
Deploy on one machine using multiple cores.
Standalone
Deploy on multiple machines in the same network.
YARN Mode
Deploy on a cluster using Hadoop YARN.
Using Port
app := VoteCountingApp new.
port startMapReduceApplication: app on: data.
Handling Results
VoteCountingApp>>handleResult: resultPair
self storeToHDFS: resultPair
Asynchronous Execution
The application is executed asynchronously.
Asynchronous Result Handling
The result is handled asynchronously by the Map/Reduce Master.
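The general pattern of asynchronous execution with a result handler can be sketched with standard Pharo processes. This is only an illustration of the idea, not Port's implementation: the `handleResult` block stands in for a method such as `handleResult:`, and the semaphore exists solely so that a playground can observe the outcome.

```smalltalk
| handled done handleResult |
handled := OrderedCollection new.
done := Semaphore new.

"Hypothetical handler: record the result pair, then announce completion."
handleResult := [ :pair |
    handled add: pair.
    done signal ].

"Run the job in a separate process; the caller is not blocked by it."
[ handleResult value: 'Bianchi' -> (#(1 1) inject: 0 into: [ :sum :each | sum + each ]) ] fork.

"Only for this demonstration: wait until the handler has run."
done wait.
handled first
```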
From Map/Reduce to Spark
Distributed Collection
Instead of executing Map/Reduce on a dataset, you load the dataset in memory
Broader API
Not only map and reduce, but a broad set of methods applicable to the distributed collection
In-Memory operations
Avoid heavy storing of intermediate results by explicitly keeping them in memory
map:
reduce:
filter:
aggregate:
groupBy:
map: reduce:
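For readers more familiar with Pharo's ordinary collections, most of these operations have a close local counterpart. The sketch below runs on a plain in-memory Array, not on a distributed collection, and uses only standard collection messages; the sample data and variable names are illustrative, and `aggregate:` is left out because the slides do not describe its semantics.

```smalltalk
| votes toscana ones total byName |
votes := #( 'Verdi Toscana' 'Bianchi Toscana' 'Gialli Veneto' 'Bianchi Toscana' ).

"filter: corresponds to select: on a local collection."
toscana := votes select: [ :line | line includesSubstring: 'Toscana' ].

"map: corresponds to collect:."
ones := toscana collect: [ :line | 1 ].

"reduce: corresponds to inject:into:."
total := ones inject: 0 into: [ :sum :each | sum + each ].

"groupBy: corresponds to groupedBy:, which answers a dictionary of groups."
byName := toscana groupedBy: [ :line | (line substrings: ' ') first ].
byName
```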
The Spark-like API in Port
port startMapReduceApplication: app on: dataSet.
data := port distribute: dataSet.
VoteCountingApp>>map: aLine
    …
result := data map: […].
result storeToHDFSPath: '…'.
VoteCountingApp>>handleResult: res
    self storeToHDFS: res
VoteCountingApp>>run
The Polls Analyser in Port (V2)
port := Port withServerURL: 'http://myServerURL'.
data := port distribute: dataSet.
data := data filter: [:line | line includesSubstring: 'Toscana'].
data := data map: [:line | | splitted |
    splitted := line substrings: ' '.
    (DateAndTime fromUnixTime: (splitted at: 3) asNumber)
        > DateAndTime yesterday
        ifTrue: [ (splitted at: 1) -> 1 ] ].
result := data reduceByKey: [:value :sum | sum + value].
result getCollection.
result storeToHDFSPath: '…'
DEMO
Uses of Port
Blockchain Analysis
In collaboration with Santiago at RMoD Lille.
Full blockchain in 6 hours vs 2 days!
Genetic algorithms
In collaboration with Alexandre Bergel at Universidad de Chile
Towards Live Big Data Development Tools
Online Debugging
Debug the system as the bug is happening
Global View
Centralised debugging of the distributed system
Update of the Running System
Deploy code-fixes without restarting the whole system
Isolation
Debug the system without interfering with its execution
Conclusion
Matteo Marra - Vrije Universiteit Brussel
mmarra@vub.be gitlab.soft.vub.ac.be/Marra github.com/Marmat21
Want to try it?
CONTACT ME!
mmarra@vub.be

