C-MR: Continuously Executing MapReduce Workflows on Multi-Core Processors

C-MR enables continuously executing MapReduce workflows on streams of data by using windows to subdivide streams into finite batches and a pull-based scheduling model. It provides a programming interface for defining MapReduce jobs on input/output streams and coordinating workflows. Evaluation shows C-MR can process streams with lower latency than batch systems by incrementally sharing computation across windows and using hybrid scheduling policies that prioritize oldest data first but also optimize memory usage.

Speaker: LIN Qian
http://www.comp.nus.edu.sg/~linqian

Problem
• Stream applications are often time-critical
• Enabling stream support for MapReduce jobs
– Simple for the Map operations
– Hard for the Reduce operations
• Continuously executing MapReduce workflows requires a great deal of coordination

C-MR Workflow

• Windows: temporal subdivisions of a stream, described by
– size (how much of the stream each window spans)
– slide (the interval between the starts of consecutive windows)
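
The size and slide parameters above can be illustrated with a minimal Python sketch. It assumes records are (timestamp, payload) pairs and that a window covers the half-open interval [start, start + size); the function name sliding_windows and the record shape are hypothetical and not part of C-MR.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[int, str]  # (timestamp, payload); hypothetical record shape


def sliding_windows(
    records: Iterable[Record], size: int, slide: int
) -> Iterator[Tuple[int, List[Record]]]:
    """Yield (window_start, records) for each window [start, start + size)."""
    ordered = sorted(records, key=lambda r: r[0])
    if not ordered:
        return
    start = ordered[0][0]
    last = ordered[-1][0]
    while start <= last:
        # A record belongs to every window whose interval covers its timestamp.
        yield start, [r for r in ordered if start <= r[0] < start + size]
        start += slide  # the next window begins one slide later
```

When slide is smaller than size, consecutive windows overlap, so the same record appears in more than one window. A real stream is unbounded; this sketch works on a finite list only to keep the example short.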

C-MR Programming Interface
• Map/Reduce operations

C-MR Programming Interface (cont.1)
• Input/Output streams

C-MR Programming Interface (cont.2)
• Create workflows of continuous MapReduce jobs
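
The code shown on the original interface slides is not preserved in this text. As a rough stand-in, the Python sketch below shows what Map/Reduce operations and a workflow of chained jobs over windows could look like. Every name here (run_job, run_workflow, word_map, count_reduce, total_map) is hypothetical; this is not C-MR's actual API.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

MapFn = Callable[[Any], List[Tuple[Any, Any]]]
ReduceFn = Callable[[Any, List[Any]], Any]


def run_job(window: Iterable[Any], map_fn: MapFn, reduce_fn: ReduceFn) -> Dict[Any, Any]:
    """Apply Map to each record of one window, group by key, then Reduce."""
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for record in window:
        for key, value in map_fn(record):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def run_workflow(
    windows: Iterable[Iterable[Any]], jobs: List[Tuple[MapFn, ReduceFn]]
) -> Iterator[Dict[Any, Any]]:
    """Chain jobs: each job's (key, value) output is the next job's input."""
    for window in windows:
        records = list(window)
        result: Dict[Any, Any] = {}
        for map_fn, reduce_fn in jobs:
            result = run_job(records, map_fn, reduce_fn)
            records = list(result.items())
        yield result


def word_map(line: str) -> List[Tuple[str, int]]:
    return [(word, 1) for word in line.split()]


def count_reduce(key: Any, values: List[int]) -> int:
    return sum(values)


def total_map(pair: Tuple[str, int]) -> List[Tuple[str, int]]:
    return [("total", pair[1])]
```

For example, run_workflow(windows, [(word_map, count_reduce), (total_map, count_reduce)]) counts words per window and then sums those counts into a per-window total.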

C-MR vs. MapReduce
• MapReduce computing nodes receive a set of Map or Reduce tasks, and each node must wait for all other nodes to complete their tasks before being allocated additional tasks.
• C-MR uses pull-based data acquisition, allowing computing nodes to execute any Map or Reduce workload as they are able. Thus, straggling nodes will not hinder the progress of the other nodes if there is data available to process elsewhere in the workflow.
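
The pull-based model can be sketched in Python with worker threads that take whatever task is available from a shared queue. This is a simplified illustration, not C-MR's scheduler: the task tuples, the per-worker delays that model stragglers, and the name run_pull_based are assumptions, and all tasks are queued up front, whereas in a real workflow Reduce work only becomes available once a window is materialized.

```python
import queue
import threading
import time
from typing import List, Tuple

Task = Tuple[str, int]  # (kind, payload); kind is "map" or "reduce" (hypothetical)


def run_pull_based(tasks: List[Task], delays: List[float]) -> List[Tuple[int, str, int]]:
    """Run tasks with one worker per entry in `delays` (seconds slept per task)."""
    pending: "queue.Queue[Task]" = queue.Queue()
    for task in tasks:
        pending.put(task)
    done: List[Tuple[int, str, int]] = []
    lock = threading.Lock()

    def worker(worker_id: int, delay: float) -> None:
        while True:
            try:
                # Pull the next available task, whether Map or Reduce.
                kind, payload = pending.get_nowait()
            except queue.Empty:
                return
            time.sleep(delay)  # a slow worker only delays its own task
            result = payload * 2 if kind == "map" else payload + 1
            with lock:
                done.append((worker_id, kind, result))

    threads = [
        threading.Thread(target=worker, args=(i, d)) for i, d in enumerate(delays)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return done
```

Because no worker waits at a barrier for the others, a fast worker keeps pulling tasks while a straggler is still busy with its current one.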

C-MR Architecture

Stream and Window Management
• The merged output streams are not guaranteed to retain their original orderings.
• Solution: replicating window-bounding punctuations

Stream and Window Management (cont.1)

A node consumes the punctuation from the sorted input stream buffer

Stream and Window Management (cont.2)

Replicate that punctuation to the other nodes

Stream and Window Management (cont.3)

After all replicas are received at the intermediate buffer, collect the data whose timestamps fall into the applicable interval and materialize them as a window
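
The three steps above can be sketched as a small Python class for the intermediate buffer. It assumes each punctuation carries a window interval (start, end), treated here as half-open, and that every node forwards its replica only after emitting its earlier data. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class IntermediateBuffer:
    num_nodes: int
    data: List[Tuple[int, str]] = field(default_factory=list)
    replicas: Dict[Tuple[int, int], Set[int]] = field(default_factory=dict)

    def add_data(self, timestamp: int, value: str) -> None:
        # Outputs from different nodes may arrive in any order.
        self.data.append((timestamp, value))

    def add_punctuation(
        self, node_id: int, start: int, end: int
    ) -> Optional[List[Tuple[int, str]]]:
        """Record one replica; return the window once all nodes have reported."""
        seen = self.replicas.setdefault((start, end), set())
        seen.add(node_id)
        if len(seen) < self.num_nodes:
            return None  # some node may still emit data for this interval
        del self.replicas[(start, end)]
        # Data is kept in the buffer because overlapping windows may reuse it.
        return sorted(item for item in self.data if start <= item[0] < end)
```

The window is materialized only when the last replica arrives, which is what makes it safe to merge unordered outputs from several nodes. Eviction of data that no future window needs is left out of this sketch.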

Operator Scheduling
• Scheduling framework
– Execute multiple policies simultaneously
– Transition between policies based on resource availability
• Scheduling policies

Incremental Computation

Output_1 = d_1 + d_2 + d_3 + ... + d_n
Output_2 = d_2 + d_3 + d_4 + ... + d_{n+1}
Output_3 = d_3 + d_4 + d_5 + ... + d_{n+2}
Output_4 = d_4 + d_5 + d_6 + ... + d_{n+3}

Share the computation over the data subset common to consecutive windows
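
A minimal Python sketch of this sharing, assuming each d_i is the partial result of one slide-sized chunk of the stream and that partial results can be merged with an associative combine function. The names incremental_outputs, reduce_chunk and combine are hypothetical.

```python
import functools
from collections import deque
from typing import Any, Callable, Iterable, Iterator


def incremental_outputs(
    chunks: Iterable[Any],
    n: int,
    reduce_chunk: Callable[[Any], Any],
    combine: Callable[[Any, Any], Any],
) -> Iterator[Any]:
    """Yield one output per window of n consecutive chunks."""
    recent: deque = deque(maxlen=n)  # partial results d_k .. d_{k+n-1}
    for chunk in chunks:
        # Each chunk is reduced exactly once, even though it is part of
        # up to n different windows.
        recent.append(reduce_chunk(chunk))
        if len(recent) == n:
            yield functools.reduce(combine, recent)
```

With n = 3, sum as reduce_chunk and addition as combine, the chunks [1, 2], [3], [4, 5], [6] give the partial results 3, 3, 9, 6 and the window outputs 15 and 18. Each window still merges n partial results, but the work of reducing the raw data is done only once per chunk.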

Evaluation
• Continuously executing a MapReduce job
– Compare with Phoenix++


Evaluation (cont.1)
• Operator scheduling
– Oldest data first (ODF)
– Best memory trade-off (MEM)
– Hybrid utilization of both policies
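
One way to read the hybrid policy is sketched below in Python: schedule the oldest data first while memory is plentiful, and switch to the memory-oriented policy once usage crosses a threshold. The batch fields, the 0.8 threshold and the scoring of MEM as bytes freed minus bytes produced are all assumptions made for illustration, not the definitions used in the C-MR evaluation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PendingBatch:
    operator: str          # e.g. "map" or "reduce"
    oldest_timestamp: int  # timestamp of the oldest record in the batch
    bytes_in: int          # buffer memory freed by consuming the batch
    bytes_out: int         # estimated memory taken by the operator's output


def pick_odf(batches: List[PendingBatch]) -> PendingBatch:
    """Oldest data first: favors low latency."""
    return min(batches, key=lambda b: b.oldest_timestamp)


def pick_mem(batches: List[PendingBatch]) -> PendingBatch:
    """Memory-oriented: favors the batch that frees the most memory net."""
    return max(batches, key=lambda b: b.bytes_in - b.bytes_out)


def pick_hybrid(
    batches: List[PendingBatch],
    memory_used: int,
    memory_limit: int,
    threshold: float = 0.8,
) -> PendingBatch:
    """Use ODF normally; transition to MEM when memory runs short."""
    if memory_used >= threshold * memory_limit:
        return pick_mem(batches)
    return pick_odf(batches)
```

The point of the sketch is the transition rule: the policy in force depends on resource availability at the moment a node asks for work.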


Evaluation (cont.2)
• Workflow optimization


Evaluation (cont.3)
• Workflow optimization
– Latency and throughput


Two Properties of Streams
• Unbounded
• Accessed sequentially

Hard to handle with a traditional DBMS


Query Operators
• Unbounded stateful operators
– maintain state with no upper bound in size
→ run out of memory
• Blocking operators
– read an entire input before emitting a single output
→ might never produce a result
• Never use them, or
• Use them in a refactored form

Punctuations
• Mark the end of substreams
– allowing us to view an infinite stream as a mixture of finite streams
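
A minimal Python sketch of how punctuations let a blocking operator run on an unbounded stream: the operator is applied to each finite substream and emits a result whenever a punctuation arrives. The PUNCTUATION marker and the function name are hypothetical.

```python
from typing import Any, Callable, Iterable, Iterator, List

PUNCTUATION = object()  # hypothetical end-of-substream marker


def aggregate_per_substream(
    stream: Iterable[Any], aggregate: Callable[[List[Any]], Any]
) -> Iterator[Any]:
    """Apply a blocking aggregate (max, sorted, ...) to each finite substream."""
    pending: List[Any] = []
    for item in stream:
        if item is PUNCTUATION:
            if pending:  # skip empty substreams
                yield aggregate(pending)
            pending = []  # state is bounded by the substream length
        else:
            pending.append(item)
```

For the stream 3, 1, PUNCTUATION, 7, 2, 9, PUNCTUATION, 4 with max as the aggregate, the results are 3 and 9; the trailing 4 is held until the next punctuation arrives. The same generator works on an endless input, since it never needs to see the whole stream.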


Editor's notes

Repeatedly invoking a Phoenix++ MapReduce job over a stream results in many redundant computations (at both the Map and Reduce operations). C-MR allows data to be processed only once by Map, and the inclusion of the Combine operator significantly decreases the redundant work performed at the Reduce operator.
1. Data is often generated from a source that can potentially produce an unbounded stream.
2. A stream's contents can only be accessed sequentially.
Traditional queries are composed of relational operators that assume a finite data source that can be accessed randomly.