This presentation explains the life cycle of a classic MapReduce job and how it works as an efficient data-processing model. For more detailed information, visit http://hashprompt.blogspot.in/search/label/Hadoop
6. SHUFFLE / SORT - MAP SIDE
Mappers run on the unsorted data in the input splits.
Mappers generate multiple key/value pairs.
[Diagram: mappers turn unsorted input data into key/value pairs]
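The map step above can be sketched as a plain function that turns an input record into key/value pairs. This is a minimal illustration, assuming a word-count style job (the classic MapReduce example); the record format and the `(word, 1)` pairs are assumptions, not from the slides.

```python
# A minimal map-function sketch, assuming a word-count style job:
# each input record is an unsorted line of text, and the mapper
# emits one (word, 1) key/value pair per token.
def mapper(record):
    for word in record.split():
        yield (word, 1)

pairs = list(mapper("the quick brown fox the"))
# → [("the", 1), ("quick", 1), ("brown", 1), ("fox", 1), ("the", 1)]
```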
Mappers write their output to a circular memory buffer (default size = 100 MB).
The buffer spills its contents to disk once a threshold is reached (default = 80% of the buffer, i.e. 80 MB).
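The threshold arithmetic above follows from two settings of classic (pre-YARN) MapReduce, `io.sort.mb` and `io.sort.spill.percent`; the snippet below just shows how their defaults combine, as a sketch rather than actual Hadoop configuration code.

```python
# Spill-threshold arithmetic, using the classic configuration names
# io.sort.mb (buffer size, default 100 MB) and io.sort.spill.percent
# (spill threshold fraction, default 0.80).
io_sort_mb = 100        # circular memory buffer size in MB
spill_percent = 0.80    # fraction of the buffer that triggers a spill

spill_threshold_mb = io_sort_mb * spill_percent
# 100 MB * 0.80 = 80 MB, matching the default spill point above
```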
Spilled data is first partitioned.
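Partitioning can be sketched as the default hash scheme: each key is assigned to one partition per reduce task by hashing. Hadoop's `HashPartitioner` uses the key's `hashCode()`; Python's built-in `hash()` stands in for it here as an assumption.

```python
# Sketch of hash partitioning: a key always maps to the same
# partition, and there is one partition per reduce task.
def partition(key, num_reduce_tasks):
    return hash(key) % num_reduce_tasks

# All pairs with the same key land in the same partition,
# so a single reducer sees all values for that key.
p = partition("k1", num_reduce_tasks=3)   # some value in 0..2
```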
Data in each partition is sorted so that identical keys are grouped together, e.g. (k1,v1), (k1,v2).
An optional combiner then merges the values for each key within a partition, e.g. (k1,v1), (k1,v2) → (k1,v3), reducing the amount of data written to disk.
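The sort and combine steps within one partition can be sketched together; this assumes a combiner that simply sums the values, as in word count (the actual combiner is job-specific).

```python
from itertools import groupby
from operator import itemgetter

# Sort a partition by key, then combine values for identical keys --
# a sketch of the sort and optional combine steps, assuming the
# combiner sums values (word-count style).
def sort_and_combine(pairs):
    pairs = sorted(pairs, key=itemgetter(0))      # group same keys together
    return [(k, sum(v for _, v in group))         # combine values per key
            for k, group in groupby(pairs, key=itemgetter(0))]

sort_and_combine([("k2", 1), ("k1", 1), ("k1", 1)])
# → [("k1", 2), ("k2", 1)]
```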
Finally, the output is spilled to the local disk of the tasktracker.
# HashPrompt
7. SHUFFLE / SORT - REDUCE SIDE
[Diagram: sorted map outputs on each tasktracker's local disk are copied to the reducers, merged, and fed in as reduce inputs; the reducers then write the reduce outputs.]
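Because each copied map output is already sorted by key, the reduce side can combine them with a k-way merge instead of a full re-sort. A minimal sketch, using Python's `heapq.merge` and two assumed example runs:

```python
import heapq

# Reduce-side merge sketch: each mapper's spilled output is already
# sorted by key, so the reducer merges the copied runs rather than
# re-sorting everything from scratch.
run_a = [("apple", 1), ("cherry", 2)]   # sorted output from one mapper
run_b = [("banana", 3), ("cherry", 1)]  # sorted output from another mapper

merged = list(heapq.merge(run_a, run_b))
# → [("apple", 1), ("banana", 3), ("cherry", 1), ("cherry", 2)]
```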
8. SUMMARY OF MAPREDUCE DATA FLOW
[Diagram: input files in HDFS are divided into input splits; each split feeds a mapper, whose output is partitioned, sorted, and spilled to disk; the spills are merged, shuffled and sorted, and passed to the reduce tasks, which run reduce and write the output files.]
1. Client stores input files into HDFS.
2. Client submits the job.
3. Input files are split by the client.
4. Jobtracker retrieves the input splits.
5. Input splits are assigned map tasks by the jobtracker.
6. Map outputs are sorted and shuffled.
[Diagram: the user's MapReduce application runs a JobClient in the client JVM on the client node; the jobtracker JVM assigns work and launches map and reduce tasks in child JVMs under the tasktracker JVM; the namenode JVM runs on the namenode, with datanode JVMs on the datanodes.]
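The whole data flow summarized above can be sketched end to end in memory. This is an illustration only, again assuming a word-count job; the splitting, tracking, and JVM machinery of real Hadoop is collapsed into a single function.

```python
from collections import defaultdict

# End-to-end in-memory sketch of the MapReduce data flow, assuming a
# word-count job: split input, map to key/value pairs, partition and
# sort (shuffle), then reduce per key.
def run_job(lines, num_reducers=2):
    # Map phase: one (word, 1) pair per token.
    pairs = [(w, 1) for line in lines for w in line.split()]
    # Shuffle: partition pairs by key, one partition per reducer.
    partitions = defaultdict(list)
    for k, v in pairs:
        partitions[hash(k) % num_reducers].append((k, v))
    # Reduce phase: sort each partition, then sum values per key.
    output = {}
    for part in partitions.values():
        for k, v in sorted(part):
            output[k] = output.get(k, 0) + v
    return output

run_job(["the quick fox", "the fox"])
# → {"the": 2, "quick": 1, "fox": 2}
```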