Pig's Map Reduce Execution
xiafei.qiu@PCA

Agenda
  • Data type
  • Data structure
  • Pig-Latin to Map-Reduce job compilation
  • Physical Plan Execution
  • UDF Invocation
Data Type
  • Tuple: an ordered list of data. DefaultTuple has List<Object> mFields.
  • DataBag: a collection of Tuples. The memory manager calls spill() to spill the bag to disk.
  • Map: a Java type.
  • Integer, Double, etc.: Java types.
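The tuple and bag described above can be sketched in plain Java. This is only an illustration under stated assumptions: SimpleTuple, SimpleBag and their methods are hypothetical stand-ins, not Pig's DefaultTuple or DataBag, and the spill here simply writes the tuples' text form to a temp file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; not Pig's Tuple or DataBag classes.
final class TupleBagSketch {

    /** An ordered list of fields, backed by a List<Object> as the slide describes. */
    static final class SimpleTuple {
        private final List<Object> fields = new ArrayList<>();

        SimpleTuple append(Object value) { fields.add(value); return this; }
        Object get(int index) { return fields.get(index); }
        int size() { return fields.size(); }
        @Override public String toString() { return fields.toString(); }
    }

    /** A collection of tuples that can move its in-memory contents to disk. */
    static final class SimpleBag {
        private final List<SimpleTuple> inMemory = new ArrayList<>();
        private final List<Path> spillFiles = new ArrayList<>();
        private long spilledCount = 0;

        void add(SimpleTuple t) { inMemory.add(t); }
        long size() { return spilledCount + inMemory.size(); }
        int inMemorySize() { return inMemory.size(); }
        int spillFileCount() { return spillFiles.size(); }

        /** Writes the in-memory tuples to a temp file, frees them, returns how many were written. */
        int spill() {
            if (inMemory.isEmpty()) return 0;
            List<String> lines = new ArrayList<>();
            for (SimpleTuple t : inMemory) lines.add(t.toString());
            try {
                Path file = Files.createTempFile("bag-spill", ".txt");
                file.toFile().deleteOnExit();
                Files.write(file, lines, StandardCharsets.UTF_8);
                spillFiles.add(file);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            int written = inMemory.size();
            spilledCount += written;
            inMemory.clear();
            return written;
        }
    }

    public static void main(String[] args) {
        SimpleBag bag = new SimpleBag();
        bag.add(new SimpleTuple().append("alice").append(30));
        bag.add(new SimpleTuple().append("bob").append(25));
        // In this sketch the caller plays the role of the memory manager.
        System.out.println("spilled " + bag.spill() + " tuples, bag size " + bag.size());
    }
}
```

In the sketch the caller triggers spill() directly; in the deck, that call comes from the memory manager.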
Data Structure
Map-Reduce Compilation
  • Pig-Latin to Logical Plan: the parser invokes logicalPlanBuilder.
  • Logical Plan to Physical Plan: LogToPhyTranslationVisitor
      • group, distinct: LR-GR-Pack (LocalRearrange, GlobalRearrange, Package)
      • join: LR-GR-JoinPack (with inner foreach)
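To make the LR-GR-Pack shape for a group concrete, here is a minimal in-memory sketch. It assumes that the local rearrange tags each record with its key, the global rearrange brings records with equal keys together, and the package step emits one (key, bag) output per key. All class and method names are hypothetical, and nothing here is Pig's own implementation.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of the LR-GR-Pack shape; not Pig's operators.
final class GroupBySketch {

    /** Local rearrange: tag each record with its group key (field 0 in this sketch). */
    static List<Map.Entry<String, List<String>>> localRearrange(List<List<String>> records) {
        List<Map.Entry<String, List<String>>> keyed = new ArrayList<>();
        for (List<String> record : records) {
            keyed.add(new AbstractMap.SimpleEntry<>(record.get(0), record));
        }
        return keyed;
    }

    /** Global rearrange: bring together all records that share a key, sorted by key. */
    static Map<String, List<List<String>>> globalRearrange(
            List<Map.Entry<String, List<String>>> keyed) {
        Map<String, List<List<String>>> grouped = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : keyed) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        return grouped;
    }

    /** Package: one output per key, rendered as "key=[bag of records]". */
    static List<String> pack(Map<String, List<List<String>>> grouped) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<List<String>>> e : grouped.entrySet()) {
            out.add(e.getKey() + "=" + e.getValue());
        }
        return out;
    }

    static List<String> group(List<List<String>> records) {
        return pack(globalRearrange(localRearrange(records)));
    }

    public static void main(String[] args) {
        List<List<String>> records = Arrays.asList(
                Arrays.asList("alice", "30"),
                Arrays.asList("bob", "25"),
                Arrays.asList("alice", "31"));
        for (String line : group(records)) System.out.println(line);
    }
}
```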
Map-Reduce Compilation
  • Physical Plan to Map-Reduce Plan
  • An MROperator stands for one MR job.
  • The physical plan is traversed in topological order.
  • On a POLoad or GlobalRearrange, a new MR operator/job is started.
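The traversal step can be sketched with Kahn's algorithm over a small operator graph. This is an illustration only: the plan is a map from operator name to successor names, the operator names are made up, and the sketch merely marks the operators that the rule above names as boundaries (POLoad, GlobalRearrange) instead of building real MR operators.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not Pig's compiler.
final class PlanTraversalSketch {

    /** Kahn's algorithm: every operator appears after all of its inputs. */
    static List<String> topologicalOrder(Map<String, List<String>> successors) {
        Map<String, Integer> inDegree = new LinkedHashMap<>();
        for (String op : successors.keySet()) inDegree.putIfAbsent(op, 0);
        for (List<String> outs : successors.values()) {
            for (String out : outs) inDegree.merge(out, 1, Integer::sum);
        }
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : inDegree.entrySet()) {
            if (e.getValue() == 0) ready.add(e.getKey());
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String op = ready.remove();
            order.add(op);
            for (String out : successors.getOrDefault(op, Collections.<String>emptyList())) {
                if (inDegree.merge(out, -1, Integer::sum) == 0) ready.add(out);
            }
        }
        if (order.size() != inDegree.size()) {
            throw new IllegalStateException("plan contains a cycle");
        }
        return order;
    }

    /** Operators at which, per the slide's rule, a new MR operator is started. */
    static List<String> boundaries(List<String> order) {
        List<String> result = new ArrayList<>();
        for (String op : order) {
            if (op.startsWith("POLoad") || op.startsWith("GlobalRearrange")) result.add(op);
        }
        return result;
    }

    /** A two-input plan: two loads feeding one global rearrange. */
    static Map<String, List<String>> joinPlan() {
        Map<String, List<String>> plan = new LinkedHashMap<>();
        plan.put("POLoad-a", Arrays.asList("LocalRearrange-a"));
        plan.put("POLoad-b", Arrays.asList("LocalRearrange-b"));
        plan.put("LocalRearrange-a", Arrays.asList("GlobalRearrange"));
        plan.put("LocalRearrange-b", Arrays.asList("GlobalRearrange"));
        plan.put("GlobalRearrange", Arrays.asList("Package"));
        plan.put("Package", Arrays.asList("POStore"));
        plan.put("POStore", Collections.<String>emptyList());
        return plan;
    }

    public static void main(String[] args) {
        List<String> order = topologicalOrder(joinPlan());
        System.out.println("order:      " + order);
        System.out.println("boundaries: " + boundaries(order));
    }
}
```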
Map Execution
    protected void map(Text key, Tuple inpTuple, Context context)
            throws IOException, InterruptedException {
        //...........
        // Attach the incoming tuple to every root of the map-side plan.
        for (PhysicalOperator root : roots) {
            if (inIllustrator) {
                if (root != null) {
                    root.attachInput(inpTuple);
                }
            } else {
                root.attachInput(tf.newTupleNoCopy(inpTuple.getAll()));
            }
        }
        // Pull results through the plan, starting from the leaf.
        runPipeline(leaf);
    }
Map Execution
    protected void runPipeline(PhysicalOperator leaf)
            throws IOException, InterruptedException {
        while (true) {
            // Each call pulls one result from the leaf operator.
            Result res = leaf.getNext(DUMMYTUPLE);
            if (res.returnStatus == POStatus.STATUS_OK) {
                collect(outputCollector, (Tuple) res.result);
                continue;
            }
            //........... (handling of the other statuses is elided on the slide)
        }
    }
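The attach-then-pull pattern of map() and runPipeline() can be tried out with a small stand-alone sketch. Everything in it is hypothetical (Status, Result, Operator, Root, UpperCaseNonEmpty); it is not Pig's PhysicalOperator or POStatus. How the loop treats non-OK statuses is also an assumption, since the slide elides that part: here NULL means "no output for this pull" and EOP ends the loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for illustration; not Pig's PhysicalOperator, Result or POStatus.
final class PullPipelineSketch {

    enum Status { OK, NULL, EOP }

    static final class Result {
        final Status status;
        final Object value;
        Result(Status status, Object value) { this.status = status; this.value = value; }
    }

    interface Operator { Result getNext(); }

    /** Root: hands out the attached input once, then reports end of processing. */
    static final class Root implements Operator {
        private Object attached;
        void attachInput(Object input) { attached = input; }
        @Override public Result getNext() {
            if (attached == null) return new Result(Status.EOP, null);
            Object out = attached;
            attached = null;
            return new Result(Status.OK, out);
        }
    }

    /** Leaf: pulls from its input, drops empty strings (NULL status), upper-cases the rest. */
    static final class UpperCaseNonEmpty implements Operator {
        private final Operator input;
        UpperCaseNonEmpty(Operator input) { this.input = input; }
        @Override public Result getNext() {
            Result in = input.getNext();
            if (in.status != Status.OK) return in;
            String s = (String) in.value;
            if (s.isEmpty()) return new Result(Status.NULL, null);
            return new Result(Status.OK, s.toUpperCase());
        }
    }

    /** Pulls from the leaf until it reports EOP, collecting every OK result. */
    static void runPipeline(Operator leaf, List<Object> collected) {
        while (true) {
            Result res = leaf.getNext();
            if (res.status == Status.OK) { collected.add(res.value); continue; }
            if (res.status == Status.EOP) return;
            // Status.NULL: no output for this pull, so pull again.
        }
    }

    /** Plays the role of map(): attach one record to the root, then drain the pipeline. */
    static List<Object> mapAll(List<String> records) {
        Root root = new Root();
        Operator leaf = new UpperCaseNonEmpty(root);
        List<Object> collected = new ArrayList<>();
        for (String record : records) {
            root.attachInput(record);
            runPipeline(leaf, collected);
        }
        return collected;
    }

    public static void main(String[] args) {
        System.out.println(mapAll(Arrays.asList("a", "", "b")));
    }
}
```

The driver only ever talks to the leaf; each operator pulls from its own input, which is why attaching a tuple at the roots is enough to start the flow.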
Reduce Execution
    protected void reduce(PigNullableWritable key, Iterable<NullableTuple> tupIter, Context context)
            throws IOException, InterruptedException {
        //...........
        if (pack instanceof POJoinPackage) {
            // A join package is asked for output repeatedly until it reports the end.
            pack.attachInput(key, tupIter.iterator());
            while (true) {
                if (processOnePackageOutput(context))
                    break;
            }
        } else {
            // Any other package is asked for output once per key.
            pack.attachInput(key, tupIter.iterator());
            processOnePackageOutput(context);
        }
    }
Reduce Execution
    public boolean processOnePackageOutput(Context oc)
            throws IOException, InterruptedException {
        Result res = pack.getNext(DUMMYTUPLE);
        if (res.returnStatus == POStatus.STATUS_OK) {
            Tuple packRes = (Tuple) res.result;
            //...........
            // Feed the packaged tuple to the reduce-side plan and drain it.
            for (int i = 0; i < roots.length; i++) {
                roots[i].attachInput(packRes);
            }
            runPipeline(leaf);
        }
        if (res.returnStatus == POStatus.STATUS_NULL) {
            return false;
        }
        //...........
        // Only end-of-processing tells the caller to stop.
        if (res.returnStatus == POStatus.STATUS_EOP) {
            return true;
        }
        return false;
    }
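The two branches of reduce() can be illustrated with a stand-alone sketch. The names are hypothetical: GroupPack stands in for a package that emits one output per key, and StreamingPack stands in for a package that keeps emitting until it reports EOP. The slide does not show POJoinPackage's logic, so StreamingPack's one-output-per-value behaviour is purely an assumption, and outputs are appended to a list instead of being run through a downstream plan.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins for illustration; not Pig's package operators.
final class ReduceSketch {

    enum Status { OK, EOP }

    static final class Result {
        final Status status;
        final Object value;
        Result(Status status, Object value) { this.status = status; this.value = value; }
    }

    interface Pack {
        void attachInput(String key, Iterator<String> values);
        Result getNext();
    }

    /** Emits a single "key=[values]" output per key, then EOP. */
    static final class GroupPack implements Pack {
        private String key;
        private Iterator<String> values;
        @Override public void attachInput(String key, Iterator<String> values) {
            this.key = key;
            this.values = values;
        }
        @Override public Result getNext() {
            if (values == null) return new Result(Status.EOP, null);
            List<String> bag = new ArrayList<>();
            while (values.hasNext()) bag.add(values.next());
            values = null;
            return new Result(Status.OK, key + "=" + bag);
        }
    }

    /** Emits one "key=value" output per value, then EOP. */
    static final class StreamingPack implements Pack {
        private String key;
        private Iterator<String> values;
        @Override public void attachInput(String key, Iterator<String> values) {
            this.key = key;
            this.values = values;
        }
        @Override public Result getNext() {
            if (values == null || !values.hasNext()) return new Result(Status.EOP, null);
            return new Result(Status.OK, key + "=" + values.next());
        }
    }

    /** Returns true only when the package reports end of processing. */
    static boolean processOnePackageOutput(Pack pack, List<String> out) {
        Result res = pack.getNext();
        if (res.status == Status.OK) out.add((String) res.value);
        return res.status == Status.EOP;
    }

    static void reduce(Pack pack, String key, Iterable<String> values, List<String> out) {
        pack.attachInput(key, values.iterator());
        if (pack instanceof StreamingPack) {
            // Keep asking until the package says it is done.
            while (!processOnePackageOutput(pack, out)) { }
        } else {
            processOnePackageOutput(pack, out);
        }
    }

    public static void main(String[] args) {
        List<String> grouped = new ArrayList<>();
        reduce(new GroupPack(), "k", Arrays.asList("a", "b"), grouped);
        List<String> streamed = new ArrayList<>();
        reduce(new StreamingPack(), "k", Arrays.asList("a", "b"), streamed);
        System.out.println(grouped + " " + streamed);
    }
}
```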
Physical Plan Execution
  • PhysicalPlan extends OperatorPlan<PhysicalOperator>
  • Execution is an operation on a graph, with each PhysicalOperator as a vertex.
  • Each vertex has a group of getNext() methods, which call processInput() if necessary.
Physical Plan Execution
    // getNext() of a load operator: pulls the next tuple from the loader.
    public Result getNext(Tuple t) throws ExecException {
        //...........
        Result res = new Result();
        try {
            res.result = loader.getNext();
            if (res.result == null) {
                // No more input: report end of processing and release the loader.
                res.returnStatus = POStatus.STATUS_EOP;
                tearDown();
            } else
                res.returnStatus = POStatus.STATUS_OK;

            if (res.returnStatus == POStatus.STATUS_OK)
                res.result = illustratorMarkup(res, res.result, 0);
        } catch (IOException e) {
            log.error("Received error from loader function: " + e);
            return res;
        }
        return res;
    }
Physical Plan Execution
    // getNext() of a limit operator: passes tuples through until mLimit is reached.
    public Result getNext(Tuple t) throws ExecException {
        Result res = null;
        Result inp = null;
        while (true) {
            inp = processInput();
            if (inp.returnStatus == POStatus.STATUS_EOP
                    || inp.returnStatus == POStatus.STATUS_ERR)
                break;

            illustratorMarkup(inp.result, null, 0);
            // The illustrator ignores LIMIT before the post-processing.
            if ((illustrator == null || illustrator.getOriginalLimit() != -1)
                    && soFar >= mLimit)
                inp.returnStatus = POStatus.STATUS_EOP;

            soFar++;
            break;
        }
        return inp;
    }
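The limit logic above can be exercised in isolation with a small sketch. The types are hypothetical (Status, Result, Operator, ListSource, Limit) and the illustrator handling is left out; the point is only that the operator pulls from its input first and then turns the result into EOP once soFar has reached the limit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins for illustration; not Pig's operators.
final class LimitSketch {

    enum Status { OK, EOP }

    static final class Result {
        final Status status;
        final Object value;
        Result(Status status, Object value) { this.status = status; this.value = value; }
    }

    interface Operator { Result getNext(); }

    /** Source: returns the list's elements one by one, then EOP. */
    static final class ListSource implements Operator {
        private final Iterator<?> it;
        ListSource(List<?> items) { this.it = items.iterator(); }
        @Override public Result getNext() {
            return it.hasNext() ? new Result(Status.OK, it.next()) : new Result(Status.EOP, null);
        }
    }

    /** Limit: passes results through until `limit` of them have been returned. */
    static final class Limit implements Operator {
        private final Operator input;
        private final long limit;
        private long soFar = 0;
        Limit(Operator input, long limit) { this.input = input; this.limit = limit; }
        @Override public Result getNext() {
            Result in = input.getNext();                 // pull from upstream first
            if (in.status == Status.EOP) return in;      // upstream is exhausted
            if (soFar >= limit) return new Result(Status.EOP, null);
            soFar++;
            return in;
        }
    }

    /** Pulls from the leaf until EOP and returns everything it produced. */
    static List<Object> drain(Operator leaf) {
        List<Object> out = new ArrayList<>();
        for (Result r = leaf.getNext(); r.status == Status.OK; r = leaf.getNext()) {
            out.add(r.value);
        }
        return out;
    }

    public static void main(String[] args) {
        Operator plan = new Limit(new ListSource(Arrays.asList(1, 2, 3, 4)), 2);
        System.out.println(drain(plan));
    }
}
```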
UDF/Built-In Invocation
  • POUserFunc
