2. Agenda
  - Data type
  - Data structure
  - Pig-Latin to Map-Reduce job compilation
  - Physical Plan Execution
  - UDF Invocation
3. Data Type
  - Tuple
      - An ordered list of data
      - DefaultTuple has List<Object> mFields
  - DataBag
      - A collection of Tuples
      - Memory Manager calls spill() to spill to disk
  - Map: Java type
  - Integer, Double, etc.: Java types
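The tuple and bag described on this slide can be sketched in a few lines. This is a minimal illustration, not Pig's implementation: SimpleTuple and SpillableBag are hypothetical names, and the spill file format (one toString() line per tuple) is an assumption made only to keep the example short.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class DataTypeSketch {

    /** Hypothetical stand-in for a tuple: an ordered list of fields. */
    static final class SimpleTuple {
        private final List<Object> fields = new ArrayList<>();

        void append(Object value) { fields.add(value); }
        Object get(int index) { return fields.get(index); }
        int size() { return fields.size(); }

        @Override
        public String toString() { return fields.toString(); }
    }

    /** Hypothetical stand-in for a bag: a collection of tuples that can spill to disk. */
    static final class SpillableBag {
        private final List<SimpleTuple> inMemory = new ArrayList<>();
        private final List<Path> spillFiles = new ArrayList<>();
        private long totalSize = 0;

        void add(SimpleTuple t) { inMemory.add(t); totalSize++; }
        long size() { return totalSize; }
        int inMemoryCount() { return inMemory.size(); }
        int spillFileCount() { return spillFiles.size(); }

        /** Writes the in-memory tuples to a temp file, frees them, returns the count spilled. */
        long spill() {
            if (inMemory.isEmpty()) return 0;
            try {
                Path file = Files.createTempFile("bag-spill", ".txt");
                file.toFile().deleteOnExit();
                try (BufferedWriter out = Files.newBufferedWriter(file)) {
                    for (SimpleTuple t : inMemory) {
                        out.write(t.toString());
                        out.newLine();
                    }
                }
                spillFiles.add(file);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            long spilled = inMemory.size();
            inMemory.clear(); // release references so the memory can be reclaimed
            return spilled;
        }
    }

    public static void main(String[] args) {
        SimpleTuple t = new SimpleTuple();
        t.append("alice");
        t.append(30);

        SpillableBag bag = new SpillableBag();
        bag.add(t);
        // A memory manager would call spill() when the heap runs low.
        System.out.println("spilled " + bag.spill() + " tuple(s), bag size " + bag.size());
    }
}
```

The bag keeps its logical size after spilling, so callers still see every tuple it has accepted even though none of them remain on the heap.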
5. Map-Reduce Compilation
  - Pig-Latin to Logical Plan
      - The parser invokes logicalPlanBuilder
  - Logical Plan to Physical Plan
      - LogToPhyTranslationVisitor
      - group, distinct: LR-GR-Pack (LR: LocalRearrange, GR: GlobalRearrange, Pack: Package)
      - Join: LR-GR-JoinPack (with inner foreach)
6. Map-Reduce Compilation
  - Physical Plan to Map-Reduce Plan
      - An MROperator stands for one MR job
      - Traverse the physical plan in topological order
      - On a POLoad or GlobalRearrange, create a new MR operator/job
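The traversal on this slide can be illustrated with a small graph walk. The sketch below is a literal reading of the slide's rule, not Pig's compiler: Plan, Kind, joinPlan and jobTriggers are hypothetical names, and the example plan (two loads feeding one global rearrange) is invented for illustration. It orders the operators with Kahn's algorithm and then lists the operators at which the rule would open a new MR operator.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PlanWalkSketch {

    enum Kind { LOAD, LOCAL_REARRANGE, GLOBAL_REARRANGE, PACKAGE, STORE }

    /** Hypothetical physical plan: named operators connected as a DAG. */
    static final class Plan {
        private final Map<String, Kind> kinds = new LinkedHashMap<>();
        private final Map<String, List<String>> successors = new LinkedHashMap<>();

        void add(String name, Kind kind) {
            kinds.put(name, kind);
            successors.put(name, new ArrayList<>());
        }

        void connect(String from, String to) { successors.get(from).add(to); }

        /** Kahn's algorithm: repeatedly emit operators whose inputs are all emitted. */
        List<String> topologicalOrder() {
            Map<String, Integer> inDegree = new LinkedHashMap<>();
            for (String name : kinds.keySet()) inDegree.put(name, 0);
            for (List<String> outs : successors.values())
                for (String to : outs) inDegree.merge(to, 1, Integer::sum);

            Deque<String> ready = new ArrayDeque<>();
            for (Map.Entry<String, Integer> e : inDegree.entrySet())
                if (e.getValue() == 0) ready.add(e.getKey());

            List<String> order = new ArrayList<>();
            while (!ready.isEmpty()) {
                String name = ready.remove();
                order.add(name);
                for (String to : successors.get(name))
                    if (inDegree.merge(to, -1, Integer::sum) == 0) ready.add(to);
            }
            return order;
        }

        /** Operators at which the slide's rule starts a new MR operator/job. */
        List<String> jobTriggers() {
            List<String> triggers = new ArrayList<>();
            for (String name : topologicalOrder()) {
                Kind kind = kinds.get(name);
                if (kind == Kind.LOAD || kind == Kind.GLOBAL_REARRANGE) triggers.add(name);
            }
            return triggers;
        }
    }

    /** Invented example: two inputs joined through one global rearrange. */
    static Plan joinPlan() {
        Plan p = new Plan();
        p.add("loadA", Kind.LOAD);
        p.add("lrA", Kind.LOCAL_REARRANGE);
        p.add("loadB", Kind.LOAD);
        p.add("lrB", Kind.LOCAL_REARRANGE);
        p.add("gr", Kind.GLOBAL_REARRANGE);
        p.add("pack", Kind.PACKAGE);
        p.add("store", Kind.STORE);
        p.connect("loadA", "lrA");
        p.connect("loadB", "lrB");
        p.connect("lrA", "gr");
        p.connect("lrB", "gr");
        p.connect("gr", "pack");
        p.connect("pack", "store");
        return p;
    }

    public static void main(String[] args) {
        Plan plan = joinPlan();
        System.out.println("order:    " + plan.topologicalOrder());
        System.out.println("triggers: " + plan.jobTriggers());
    }
}
```

Visiting in topological order guarantees that every input of an operator has been handled before the operator itself, which is what lets a compiler decide per operator whether to extend the current job or open a new one.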
12. Reduce Execution
    public boolean processOnePackageOutput(Context oc) throws IOException, InterruptedException {
        Result res = pack.getNext(DUMMYTUPLE);
        if (res.returnStatus == POStatus.STATUS_OK) {
            Tuple packRes = (Tuple) res.result;
            //...........
            for (int i = 0; i < roots.length; i++) {
                roots[i].attachInput(packRes);
            }
            runPipeline(leaf);
        }
        if (res.returnStatus == POStatus.STATUS_NULL) {
            return false;
        }
        //...........
        if (res.returnStatus == POStatus.STATUS_EOP) {
            return true;
        }
        return false;
    }
13. Physical Plan Execution
  - PhysicalPlan extends OperatorPlan<PhysicalOperator>
      - Operations on a graph
      - PhysicalOperator as vertex
  - Each vertex has a group of getNext() methods
      - Calls processInput() if necessary
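The pull model on this slide, where each vertex exposes getNext() and fetches from its input through processInput(), can be sketched as follows. This is an illustration under assumptions, not Pig's classes: Operator, Source, Filter, Status and run are hypothetical names, and using a NULL status for "no output on this pull" is a simplification chosen for the sketch.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

public class PullPipelineSketch {

    enum Status { OK, NULL, EOP }

    /** A value plus a status code, returned by every pull. */
    static final class Result {
        final Status status;
        final Object value;

        Result(Status status, Object value) {
            this.status = status;
            this.value = value;
        }
    }

    /** Hypothetical operator vertex: produces output by pulling from its input. */
    abstract static class Operator {
        protected final Operator input; // upstream vertex, null for a source

        Operator(Operator input) { this.input = input; }

        /** Asks the upstream operator for its next result. */
        protected Result processInput() { return input.getNext(); }

        abstract Result getNext();
    }

    /** Leaf of the graph: hands out list elements, then signals end of processing. */
    static final class Source extends Operator {
        private final Iterator<Integer> it;

        Source(List<Integer> data) {
            super(null);
            this.it = data.iterator();
        }

        @Override
        Result getNext() {
            return it.hasNext() ? new Result(Status.OK, it.next()) : new Result(Status.EOP, null);
        }
    }

    /** Passes through values that match the predicate, reports NULL for the rest. */
    static final class Filter extends Operator {
        private final Predicate<Integer> keep;

        Filter(Operator input, Predicate<Integer> keep) {
            super(input);
            this.keep = keep;
        }

        @Override
        Result getNext() {
            Result in = processInput();
            if (in.status != Status.OK) return in; // propagate EOP or NULL unchanged
            return keep.test((Integer) in.value) ? in : new Result(Status.NULL, null);
        }
    }

    /** Drives the pipeline from the last operator until it reports EOP. */
    static List<Object> run(Operator leaf) {
        List<Object> out = new ArrayList<>();
        while (true) {
            Result r = leaf.getNext();
            if (r.status == Status.EOP) break;
            if (r.status == Status.OK) out.add(r.value);
        }
        return out;
    }

    public static void main(String[] args) {
        Operator plan = new Filter(new Source(List.of(1, 2, 3, 4, 5, 6)), x -> x % 2 == 0);
        System.out.println(run(plan));
    }
}
```

Because the driver only ever calls getNext() on the last operator, each upstream operator does work only when something downstream asks for it.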
14. Physical Plan Execution
    public Result getNext(Tuple t) throws ExecException {
        //...........
        Result res = new Result();
        try {
            res.result = loader.getNext();
            if (res.result == null) {
                res.returnStatus = POStatus.STATUS_EOP;
                tearDown();
            } else
                res.returnStatus = POStatus.STATUS_OK;
            if (res.returnStatus == POStatus.STATUS_OK)
                res.result = illustratorMarkup(res, res.result, 0);
        } catch (IOException e) {
            log.error("Received error from loader function: " + e);
            return res;
        }
        return res;
    }
15. Physical Plan Execution
    public Result getNext(Tuple t) throws ExecException {
        Result res = null;
        Result inp = null;
        while (true) {
            inp = processInput();
            if (inp.returnStatus == POStatus.STATUS_EOP || inp.returnStatus == POStatus.STATUS_ERR)
                break;
            illustratorMarkup(inp.result, null, 0);
            // illustrator ignore LIMIT before the post processing
            if ((illustrator == null || illustrator.getOriginalLimit() != -1) && soFar >= mLimit)
                inp.returnStatus = POStatus.STATUS_EOP;
            soFar++;
            break;
        }
        return inp;
    }
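The limit logic on this slide pulls from its input first and only then compares soFar with mLimit, so the upstream operator is asked for one more tuple than the limit before end of processing is reported. The sketch below reproduces just that counting behaviour. CountingSource, Limit and drain are hypothetical names, and the illustrator check and error status are left out as a simplification.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class LimitSketch {

    enum Status { OK, EOP }

    static final class Result {
        Status status;
        Object value;
    }

    /** Hypothetical upstream: hands out list elements and counts how often it is pulled. */
    static final class CountingSource {
        private final Iterator<?> it;
        int pulls = 0;

        CountingSource(List<?> data) { this.it = data.iterator(); }

        Result next() {
            pulls++;
            Result r = new Result();
            if (it.hasNext()) {
                r.status = Status.OK;
                r.value = it.next();
            } else {
                r.status = Status.EOP;
            }
            return r;
        }
    }

    /** Simplified limit: same pull-then-check order as the slide's getNext(). */
    static final class Limit {
        private final CountingSource input;
        private final long limit;
        private long soFar = 0;

        Limit(CountingSource input, long limit) {
            this.input = input;
            this.limit = limit;
        }

        Result getNext() {
            Result inp = input.next();
            if (inp.status == Status.EOP) return inp;
            // The tuple has already been pulled; turn it into EOP once the limit is reached.
            if (soFar >= limit) inp.status = Status.EOP;
            soFar++;
            return inp;
        }
    }

    /** Collects values until the limit operator reports end of processing. */
    static List<Object> drain(Limit limit) {
        List<Object> out = new ArrayList<>();
        for (Result r = limit.getNext(); r.status == Status.OK; r = limit.getNext()) {
            out.add(r.value);
        }
        return out;
    }

    public static void main(String[] args) {
        CountingSource source = new CountingSource(List.of("a", "b", "c", "d", "e"));
        List<Object> kept = drain(new Limit(source, 2));
        System.out.println(kept + " after " + source.pulls + " pulls");
    }
}
```

With a limit of 2 over five inputs, two values are kept and the source is pulled three times: the third pull is the one that gets relabelled as end of processing.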