5. Apache Pig Data Processing
PIG Supports
1. Schema on Read policy
2. No Meta Store
3. Processing control
4. Set Mappers
5. Integrations
6. File Formats
7. User defined functions
8. Script execution plan
9. Data set splitting
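A few of the listed features can be sketched in Pig Latin. This is a minimal illustration, not a reference script: the field types (id:int and the chararray fields), the alias names, and the PARALLEL value are assumptions added for the example; only the file path and field names come from the scripts in this deck.

```pig
-- Schema on read / no meta store: names and types are declared at LOAD time
-- and live only in this script (the types shown here are assumed).
emp = LOAD '/user/mapr/training/pig/emp.csv' USING PigStorage(',')
      AS (id:int, firstname:chararray, lastname:chararray,
          designation:chararray, city:chararray);

-- Data set splitting: route rows into separate relations by condition.
SPLIT emp INTO managers IF designation == 'Manager', others OTHERWISE;

-- Processing control: PARALLEL requests a number of reduce tasks
-- for this operator (2 is an arbitrary illustrative value).
by_city = GROUP emp BY city PARALLEL 2;

-- Script execution plan: show the schema, then the logical, physical
-- and MapReduce plans Pig would use to compute by_city.
DESCRIBE emp;
EXPLAIN by_city;
```

DESCRIBE and EXPLAIN only inspect the script; nothing is written until a STORE or DUMP statement runs.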
9. Apache Pig First Script
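-- Load the comma-separated employee file and name its fields.
A = LOAD '/user/mapr/training/pig/emp.csv' USING PigStorage(',')
    AS (id, firstname, lastname, designation, city);
-- Write the relation to an output directory.
STORE A INTO '/user/mapr/training/pig/output';
-- Print the relation to the console (DUMP takes only an alias, no INTO clause).
DUMP A;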
10. Apache Pig Example Scripts
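-- Load two employee files that share the same layout.
X = LOAD '/user/mapr/training/pig/emp_pig1.csv' USING PigStorage(',') AS
(id, firstname, lastname, designation, city);
Y = LOAD '/user/mapr/training/pig/emp_pig2.csv' USING PigStorage(',') AS
(id, firstname, lastname, designation, city);
-- Inner join on designation; joined fields are prefixed X:: and Y::.
Z = JOIN X BY (designation), Y BY (designation);
-- Keep manager rows; MATCHES tests the whole field against a regular expression.
final = FILTER Z BY X::designation MATCHES 'Manager';
-- Group the first file's rows by city.
A = GROUP X BY city;
-- Project two columns from the first file.
B = FOREACH X GENERATE id, designation;
-- Only the filtered join result is written out; A and B are defined but not stored.
STORE final INTO '/user/mapr/training/pig/output';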
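The grouped relation A above is built but never consumed. As a hedged sketch of how such a group is typically used, the lines below count employees per city; the alias city_counts and the output field name employees are hypothetical names introduced for this example, and the LOAD statement is repeated so the fragment runs on its own.

```pig
-- Reload the first employee file so this fragment is self-contained.
X = LOAD '/user/mapr/training/pig/emp_pig1.csv' USING PigStorage(',') AS
    (id, firstname, lastname, designation, city);

-- GROUP yields one row per city: (group, bag of matching X rows).
A = GROUP X BY city;

-- Aggregate each bag; 'group' holds the grouping key (the city).
city_counts = FOREACH A GENERATE group AS city, COUNT(X) AS employees;

-- Print the per-city counts to the console.
DUMP city_counts;
```

COUNT skips rows whose first field is null; COUNT_STAR could be used instead if such rows should be included.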
PIG – More Samples