PIG STATEMENTS IN HADOOP
What is Pig in Hadoop?
Pig is a platform for analyzing large data sets. It consists of a high-level
language for expressing data analysis programs.
Pig was originally created at Yahoo! to meet an in-house data analysis
requirement.
Pig is a dataflow language:
•The language is called Pig Latin.
•Its syntax is relatively simple.
•It is easy for SQL developers to learn and understand.
•Under the covers, Pig Latin scripts are converted into MapReduce jobs
and executed on the Hadoop cluster.
Example tuple with three fields, field1 (a number), field2 (a bag) and field3 (a map):
data = (1, {(2,3),(4,5),(6,7)}, [key#value])

Method        Example                       Result
Position      $0                            1
Name          field2                        {(2,3),(4,5),(6,7)}
Projection    field2.$1                     {(3),(5),(7)}
Function      AVG(field2.$0)                (2+4+6)/3 = 4
Conditional   field1 == 1 ? 'yes' : 'no'    yes
Lookup        field3#'key'                  value
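• A Pig Latin script is a collection of statements.
• Statements are built from operators and expressions; most take a
relation as input and produce a new relation.
• Data in relations:
• Atom (a single field value), tuple (an ordered set of fields),
bag (a collection of tuples) and map (a set of key#value pairs).
A relation is itself a bag of tuples.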
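To make the data model and the expression table concrete, here is a small Python model. It is an illustration only, not Pig's implementation: a tuple is a Python tuple, a bag is a list of tuples, a map is a dict, and Pig's null is None. The helper names `project` and `avg` are hypothetical.

```python
# Illustrative Python model of Pig's data model (not Pig's own classes):
#   tuple -> Python tuple, bag -> list of tuples, map -> dict, null -> None
data = (1, [(2, 3), (4, 5), (6, 7)], {"key": "value"})
field1, field2, field3 = data                   # fields referenced by name


def project(bag, pos):
    """Bag projection such as field2.$1: keep one position from every tuple."""
    return [(t[pos],) for t in bag]


def avg(bag):
    """AVG over a bag of one-field tuples; like Pig's AVG it skips nulls."""
    values = [t[0] for t in bag if t[0] is not None]
    return sum(values) / len(values) if values else None


position = data[0]                              # $0
projection = project(field2, 1)                 # field2.$1
average = avg(project(field2, 0))               # AVG(field2.$0)
conditional = "yes" if field1 == 1 else "no"    # field1 == 1 ? 'yes' : 'no'
lookup = field3.get("key")                      # field3#'key' (missing key -> None)
```

Each variable corresponds to one row of the table, so the Python results can be compared with the Result column.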
Common Pig Latin operations:

DATA      PROCESS    COMBINE    VIEW
LOAD      FILTER     JOIN       ORDER
DUMP      FOREACH    GROUP      LIMIT
STORE     DISTINCT   COGROUP    UNION
          SAMPLE     CROSS      SPLIT
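Let's start with Pig: type pig into a terminal to open the interactive Grunt shell
(pig -x local runs against the local file system instead of HDFS).
LOAD:
-- A: the resulting relation (bag); 'sample.txt': path to the input file (HDFS or local)
-- PigStorage(','): field delimiter; AS (...): column names with data types; ';' ends the statement
A = LOAD 'sample.txt' USING PigStorage(',')
    AS (id:int, Name:chararray, Addr:chararray);
LOAD reads data from HDFS or the local file system into a Pig relation (bag).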
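The following Python sketch models what the LOAD statement above does to each input line: split on the delimiter, then convert each field to the declared type. It is a simplified model under stated assumptions, not PigStorage itself: a value that cannot be converted becomes None (Pig produces null in that case), a short line is padded with None, and extra fields are dropped. The sample lines are made up.

```python
# Simplified model of:
#   A = LOAD 'sample.txt' USING PigStorage(',') AS (id:int, Name:chararray, Addr:chararray);
# Assumptions for illustration: failed casts -> None, short lines padded with None,
# extra fields dropped.
SCHEMA = [("id", int), ("Name", str), ("Addr", str)]


def cast(value, typ):
    """Convert one raw field to the declared type, or None if that fails."""
    try:
        return typ(value)
    except ValueError:
        return None


def load(lines, delimiter=","):
    relation = []
    for line in lines:
        parts = line.rstrip("\n").split(delimiter)
        parts += [None] * (len(SCHEMA) - len(parts))        # pad missing fields
        relation.append(tuple(
            None if raw is None else cast(raw, typ)
            for raw, (_, typ) in zip(parts, SCHEMA)))         # zip drops extra fields
    return relation


A = load(["1,Ram,PUNE\n", "x,Sita,MUMBAI\n", "3,Amit\n"])
```

The second line shows a bad id turning into None rather than aborting the load, which mirrors the null-on-failed-cast behaviour described above.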
DUMP:
-- A: name of the relation to display
DUMP A;
DUMP displays the contents of a relation on the screen. It also triggers execution of the statements that produce that relation.
STORE:
-- A: name of the relation; 'hdfs:/data/result': output path (HDFS or local); fields separated by ':'
STORE A INTO 'hdfs:/data/result' USING PigStorage(':');
STORE writes a relation to HDFS or the local file system.
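A rough Python model of how DUMP and STORE render the same tuples: DUMP prints each tuple in parentheses with commas, while STORE ... USING PigStorage(':') writes the fields joined by ':'. Writing a null as an empty string is an assumption made for this sketch, and the helper names and data are hypothetical.

```python
# Rough model of output formatting (assumption: null is rendered as an empty string).


def render(field):
    return "" if field is None else str(field)


def dump_line(t):
    """Like DUMP A; : one tuple per line, e.g. (1,Ram,PUNE)."""
    return "(" + ",".join(render(f) for f in t) + ")"


def store_line(t, delimiter=":"):
    """Like STORE A INTO ... USING PigStorage(':'); : fields joined by ':'."""
    return delimiter.join(render(f) for f in t)


A = [(1, "Ram", "PUNE"), (2, "Sita", None)]
dumped = [dump_line(t) for t in A]
stored = [store_line(t) for t in A]
```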
FILTER:
-- keep only the tuples whose Addr column is 'PUNE'
B = FILTER A BY Addr == 'PUNE';
FILTER is like the WHERE clause in SQL: it selects the tuples of a relation that satisfy a given condition.
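A Python model of the FILTER above, with one detail worth knowing: in Pig, comparing a null with a constant gives null, and FILTER keeps only tuples whose condition is true, so tuples with a null Addr are dropped as well. The three-valued predicate below imitates that; the data and helper names are made up.

```python
# Model of: B = FILTER A BY Addr == 'PUNE';
A = [(1, "Ram", "PUNE"), (2, "Sita", "MUMBAI"), (3, "Amit", None)]
ADDR = 2  # position of Addr in the schema (id, Name, Addr)


def addr_is_pune(t):
    """Three-valued comparison: None stands for Pig's null result."""
    return None if t[ADDR] is None else t[ADDR] == "PUNE"


def filter_relation(relation, predicate):
    """Keep a tuple only when the predicate is True (not False, not null)."""
    return [t for t in relation if predicate(t) is True]


B = filter_relation(A, addr_is_pune)
```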
FOREACH:
-- for each tuple in A, keep only the Name and Addr fields
C = FOREACH A GENERATE Name, Addr;
FOREACH ... GENERATE processes every tuple of a relation. It is used to
select (project) fields, drop fields, or compute new ones.
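In Python terms, FOREACH ... GENERATE is a per-tuple mapping. The sketch below models the projection above and a second, computed field in the style of FOREACH A GENERATE Name, SIZE(Name); the data is hypothetical and this is a model of the semantics, not Pig code.

```python
# Model of: C = FOREACH A GENERATE Name, Addr;
A = [(1, "Ram", "PUNE"), (2, "Sita", "MUMBAI")]

# Projection: keep Name and Addr, drop id.
C = [(name, addr) for (_id, name, addr) in A]

# Computed field, like FOREACH A GENERATE Name, SIZE(Name);
# SIZE of a chararray is its number of characters.
C2 = [(name, len(name)) for (_id, name, _addr) in A]
```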
DISTINCT:
D = DISTINCT A;
DISTINCT removes duplicate tuples. It works only on entire tuples, not on individual
fields.
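The "entire tuples only" rule is easiest to see with two tuples that share Name and Addr but differ in id: both survive. This Python model keeps first occurrences in input order for readability; Pig itself does not guarantee output order. The data is made up.

```python
# Model of: D = DISTINCT A;  (duplicates detected on the whole tuple)
A = [(1, "Ram", "PUNE"), (1, "Ram", "PUNE"), (2, "Ram", "PUNE")]


def distinct(relation):
    seen = set()
    out = []
    for t in relation:
        if t not in seen:          # compare all fields, not just some
            seen.add(t)
            out.append(t)
    return out


D = distinct(A)
```

To remove duplicates on only some fields, project those fields with FOREACH first and then apply DISTINCT.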
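SAMPLE:
-- take a random sample of about 10% (fraction 0.1) of relation D
E = SAMPLE D 0.1;
SAMPLE returns a random sample of your data. It reads through all of the data
but returns only approximately the given fraction of rows, so the exact count can vary between runs.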
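Why the sample size is only approximate: a simple way to model SAMPLE D 0.1 is to keep each tuple independently with probability 0.1. This Bernoulli model is an assumption for illustration; the seeded generator just makes the sketch repeatable.

```python
import random

# Model of: E = SAMPLE D 0.1;
# Each tuple is kept independently with probability 0.1 (about 10%, not exactly 10%).


def sample(relation, fraction, rng):
    return [t for t in relation if rng.random() < fraction]


D = [(i,) for i in range(10_000)]
E = sample(D, 0.1, random.Random(42))
```

With 10,000 input tuples the result is close to 1,000 tuples, but a different seed gives a slightly different count.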
JOIN:
-- join A and C where A's Name equals C's Name
F = JOIN A BY Name, C BY Name;
JOIN combines relations on the given fields. By default it is an inner join: only tuples whose keys match appear in the result.
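A Python model of the inner join above. Each output tuple is the A fields followed by the C fields (Pig names them A::id, A::Name, ..., C::Name, C::Addr), and tuples with a null key never match. The hash-index approach is just a convenient way to write the sketch; it is not a description of how Pig executes JOIN on a cluster. Data is hypothetical.

```python
from collections import defaultdict

# Model of: F = JOIN A BY Name, C BY Name;  (inner equi-join)
A = [(1, "Ram", "PUNE"), (2, "Sita", "MUMBAI"), (3, None, "PUNE")]
C = [("Ram", "PUNE"), ("Ravi", "NASHIK")]


def join(left, lkey, right, rkey):
    """Index the right input by key, then probe it with each left tuple."""
    index = defaultdict(list)
    for rt in right:
        if rt[rkey] is not None:          # null keys never match
            index[rt[rkey]].append(rt)
    return [lt + rt
            for lt in left if lt[lkey] is not None
            for rt in index.get(lt[lkey], [])]


F = join(A, 1, C, 0)   # Name is position 1 in A, position 0 in C
```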
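GROUP:
-- group the tuples of A by the combination of Name and Addr
G = GROUP A BY (Name, Addr);
GROUP collects tuples that share the same key into one group; the key can consist of
multiple fields. Each output tuple has a field named group (the key) and a bag named after the input relation (here A) that holds the matching tuples.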
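The (group, bag) shape of GROUP's output can be modelled in Python as a list of key/bag pairs. With two grouping fields the key is itself a tuple. The helper name and data below are hypothetical.

```python
from collections import defaultdict

# Model of: G = GROUP A BY (Name, Addr);
A = [(1, "Ram", "PUNE"), (2, "Ram", "PUNE"), (3, "Sita", "MUMBAI")]


def group_by(relation, key_positions):
    """Return (group, bag) pairs: one pair per distinct key."""
    groups = defaultdict(list)
    for t in relation:
        key = tuple(t[p] for p in key_positions)
        groups[key].append(t)
    return list(groups.items())


G = group_by(A, (1, 2))   # Name is position 1, Addr is position 2
```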
COGROUP:
-- group A and C together, using Name as the key in both
H = COGROUP A BY Name, C BY Name;
COGROUP is a generalization of GROUP. Instead of collecting the tuples of
one input by key, it collects the tuples of n inputs by key. Each result
tuple contains the key and one bag for each input.
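The difference between COGROUP and an inner JOIN shows up when a key exists in only one input: COGROUP still emits it, with an empty bag for the input that lacks it. This Python model (hypothetical data, deliberately simple quadratic code) produces one (key, bag of A, bag of C) tuple per key.

```python
# Model of: H = COGROUP A BY Name, C BY Name;
A = [(1, "Ram", "PUNE"), (2, "Sita", "MUMBAI")]
C = [("Ram", "PUNE"), ("Ravi", "NASHIK")]


def cogroup(left, lkey, right, rkey):
    """One (key, left_bag, right_bag) tuple per key seen in either input."""
    keys = list(dict.fromkeys([t[lkey] for t in left] + [t[rkey] for t in right]))
    return [(k,
             [t for t in left if t[lkey] == k],
             [t for t in right if t[rkey] == k])
            for k in keys]


H = cogroup(A, 1, C, 0)
```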
CROSS:
-- A: first relation; C: second relation
I = CROSS A, C;
CROSS computes the cross (Cartesian) product of two or more relations: every tuple of A is paired with every tuple of C.
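A Python model of CROSS using `itertools.product`. It makes the cost visible: the output has len(A) × len(C) tuples, which grows quickly on large inputs. The data is made up, and `crossed` stands in for the alias I.

```python
from itertools import product

# Model of: I = CROSS A, C;
A = [(1, "Ram", "PUNE"), (2, "Sita", "MUMBAI")]
C = [("Ram", "PUNE"), ("Ravi", "NASHIK"), ("Amit", "PUNE")]

crossed = [a + c for a, c in product(A, C)]   # every A tuple paired with every C tuple
```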
ORDER:
-- sort A by its second column ($1, i.e. Name) in descending order
J = ORDER A BY $1 DESC;
ORDER BY sorts a relation by one or more fields.
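A Python model of ORDER. Positional field $1 maps to index 1 of each tuple. The second line shows how a multi-field sort such as ORDER A BY Addr ASC, Name DESC can be modelled with two stable sorts (secondary key first, then primary key). Data is hypothetical.

```python
# Model of: J = ORDER A BY $1 DESC;   ($1 is the second field, Name)
A = [(1, "Ram", "PUNE"), (2, "Sita", "MUMBAI"), (3, "Amit", "PUNE")]

J = sorted(A, key=lambda t: t[1], reverse=True)

# Model of ORDER A BY Addr ASC, Name DESC: Python's sort is stable, so sort by the
# secondary key first and then by the primary key.
J2 = sorted(sorted(A, key=lambda t: t[1], reverse=True), key=lambda t: t[2])
```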
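LIMIT:
-- keep at most 10 tuples of A
K = LIMIT A 10;
LIMIT restricts a relation to at most the given number of tuples. Unless the relation
has been sorted with ORDER first, there is no guarantee which tuples are returned.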
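Because LIMIT alone does not promise which tuples come back, a common pattern is ORDER followed by LIMIT to get a deterministic top-N. The Python model below assumes a hypothetical third field holding marks; it is not part of the A relation loaded earlier.

```python
# Model of ORDER ... DESC followed by LIMIT 2 (top-N).
# Hypothetical schema for this sketch: (id, Name, marks).
A = [(1, "Ram", 70), (2, "Sita", 95), (3, "Amit", 88), (4, "Ravi", 60)]


def limit(relation, n):
    """Keep at most n tuples."""
    return relation[:n]


top2 = limit(sorted(A, key=lambda t: t[2], reverse=True), 2)
```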
UNION:
-- A, B, C, D: the relations to combine
L = UNION A, B, C, D;
UNION combines two or more relations into one.
Sometimes you want to put data sets together by
concatenating them instead of joining them; Pig Latin provides
UNION for this purpose. It does not remove duplicates and does not preserve tuple order.
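In Python terms, UNION is plain concatenation. The model below shows that duplicates are kept (apply DISTINCT afterwards if they should go) and that inputs with different shapes, like the two-field C produced by FOREACH, can be combined. Data is made up, and ordering in the model is only for readability since Pig does not guarantee it.

```python
# Model of: L = UNION A, B, C;  (concatenation, duplicates kept)
A = [(1, "Ram", "PUNE")]
B = [(1, "Ram", "PUNE")]      # e.g. a filtered copy of A
C = [("Ram", "PUNE")]         # e.g. a projection of A with two fields


def union(*relations):
    return [t for r in relations for t in r]


L = union(A, B, C)
```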
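SPLIT:
-- DOB is stored as yyyyMMdd, e.g. '20140126'; SUBSTRING(DOB, 4, 6) extracts the month
M = LOAD 'sample1.txt' AS (ID:int, NAME:chararray, DOB:chararray);
SPLIT M INTO
    Month1 IF SUBSTRING(DOB, 4, 6) == '01',
    Month2 IF SUBSTRING(DOB, 4, 6) == '02',
    Month3 IF SUBSTRING(DOB, 4, 6) == '03',
    RestMonths IF SUBSTRING(DOB, 4, 6) != '01' AND SUBSTRING(DOB, 4, 6) != '02'
        AND SUBSTRING(DOB, 4, 6) != '03';
SPLIT partitions a relation into two or more relations based on conditions. Each tuple
goes to every output relation whose condition it satisfies, so it can land in several
relations or in none. Pig also accepts a final clause such as RestMonths OTHERWISE to collect the tuples that match no other condition.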
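The SPLIT example can be modelled in Python as evaluating every condition against every tuple. Python slicing `dob[4:6]` matches SUBSTRING(DOB, 4, 6), which takes characters from index 4 up to (but not including) 6. The dates and names are hypothetical, and the helpers are illustrative only.

```python
# Model of the SPLIT M INTO Month1 IF ..., Month2 IF ..., Month3 IF ..., RestMonths IF ...;
M = [(1, "Ram", "20140126"), (2, "Sita", "20140214"),
     (3, "Amit", "20140305"), (4, "Ravi", "20141120")]


def month(dob):
    """SUBSTRING(DOB, 4, 6): characters at index 4 and 5 of 'yyyyMMdd'."""
    return dob[4:6]


def split(relation, conditions):
    """Evaluate every condition for every tuple; a tuple may land in 0..n outputs."""
    outputs = {name: [] for name in conditions}
    for t in relation:
        for name, cond in conditions.items():
            if cond(t):
                outputs[name].append(t)
    return outputs


parts = split(M, {
    "Month1": lambda t: month(t[2]) == "01",
    "Month2": lambda t: month(t[2]) == "02",
    "Month3": lambda t: month(t[2]) == "03",
    "RestMonths": lambda t: month(t[2]) not in ("01", "02", "03"),
})
```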
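PIG FUNCTIONS
-- sample data for the examples below; marks must be numeric for AVG, MAX, MIN and SUM
A = LOAD 'sample2' AS (id:int, Fname:chararray, Lname:chararray, marks:int);
-- aggregate functions operate on bags, so group the tuples first
G = GROUP A BY Fname;
AVG:
B = FOREACH G GENERATE group AS Fname, AVG(A.marks);
CONCAT:
C = FOREACH A GENERATE CONCAT(Fname, Lname);
COUNT:
D = FOREACH G GENERATE group, COUNT(A);
IsEmpty:
-- keeps groups whose bag is empty (most useful after COGROUP, where a key can be missing from one input)
E = FILTER G BY IsEmpty(A);
MAX:
Fmax = FOREACH G GENERATE group, MAX(A.marks);
MIN:
Fmin = FOREACH G GENERATE group, MIN(A.marks);
SUM:
Fsum = FOREACH G GENERATE group, SUM(A.marks);
TOKENIZE: splits a string into words and outputs a bag of words.
Ftok = FOREACH A GENERATE TOKENIZE(Fname);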
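A Python model of these functions applied to one group's bag, for example all tuples with the same Fname. It reflects the documented null handling as understood here: AVG, MAX, MIN and SUM skip nulls, and COUNT skips tuples whose first field is null. The TOKENIZE helper only approximates Pig's default delimiters (whitespace, double quote, comma, parentheses, star). All data and helper names are hypothetical.

```python
import re

# One group's bag: schema (id, Fname, Lname, marks); one mark is null (None).
bag = [(1, "Ram", "Patil", 70), (2, "Ram", "Joshi", None), (3, "Ram", "Sane", 90)]
marks = [t[3] for t in bag if t[3] is not None]        # nulls are skipped

aggregates = {
    "AVG": sum(marks) / len(marks),
    "COUNT": sum(1 for t in bag if t[0] is not None),  # counts tuples with non-null first field
    "MAX": max(marks),
    "MIN": min(marks),
    "SUM": sum(marks),
}

full_name = bag[0][1] + bag[0][2]                      # CONCAT(Fname, Lname)


def tokenize(text):
    """Approximate TOKENIZE: a bag of one-field tuples, one per word."""
    return [(w,) for w in re.split(r'[\s",()*]+', text) if w]


words = tokenize("Ram Kumar, Patil")
```

Note that AVG divides by the two non-null marks, not by the three tuples in the bag.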
Ganesh L. Sanap
connectoganesh@gmail.com

Weitere ähnliche Inhalte

Was ist angesagt?

Perl names values and variables
Perl names values and variablesPerl names values and variables
Perl names values and variablessana mateen
 
Genomic Selection with Bayesian Generalized Linear Regression model using R
Genomic Selection with Bayesian Generalized Linear Regression model using RGenomic Selection with Bayesian Generalized Linear Regression model using R
Genomic Selection with Bayesian Generalized Linear Regression model using RAvjinder (Avi) Kaler
 
Data Structure Lecture 6
Data Structure Lecture 6Data Structure Lecture 6
Data Structure Lecture 6Teksify
 
Dbms fundamentals
Dbms fundamentalsDbms fundamentals
Dbms fundamentalsvenkatme83
 
Getting Started with Regular Expressions In MarcEdit
Getting Started with Regular Expressions In MarcEditGetting Started with Regular Expressions In MarcEdit
Getting Started with Regular Expressions In MarcEditTerry Reese
 
Basic Tutorial of Association Mapping by Avjinder Kaler
Basic Tutorial of Association Mapping by Avjinder KalerBasic Tutorial of Association Mapping by Avjinder Kaler
Basic Tutorial of Association Mapping by Avjinder KalerAvjinder (Avi) Kaler
 
Linked List, Types of Linked LIst, Various Operations, Applications of Linked...
Linked List, Types of Linked LIst, Various Operations, Applications of Linked...Linked List, Types of Linked LIst, Various Operations, Applications of Linked...
Linked List, Types of Linked LIst, Various Operations, Applications of Linked...Balwant Gorad
 
Import Data using R
Import Data using R Import Data using R
Import Data using R Rupak Roy
 
Manipulating Data using DPLYR in R Studio
Manipulating Data using DPLYR in R StudioManipulating Data using DPLYR in R Studio
Manipulating Data using DPLYR in R StudioRupak Roy
 
Hashing In Data Structure
Hashing In Data Structure Hashing In Data Structure
Hashing In Data Structure Meghaj Mallick
 
Data structure & its types
Data structure & its typesData structure & its types
Data structure & its typesRameesha Sadaqat
 
Regular Expressions in JavaScript and Command Line
Regular Expressions in JavaScript and Command LineRegular Expressions in JavaScript and Command Line
Regular Expressions in JavaScript and Command LineMandi Grant
 
Tutorial for Estimating Broad and Narrow Sense Heritability using R
Tutorial for Estimating Broad and Narrow Sense Heritability using RTutorial for Estimating Broad and Narrow Sense Heritability using R
Tutorial for Estimating Broad and Narrow Sense Heritability using RAvjinder (Avi) Kaler
 

Was ist angesagt? (19)

R code for data manipulation
R code for data manipulationR code for data manipulation
R code for data manipulation
 
Perl names values and variables
Perl names values and variablesPerl names values and variables
Perl names values and variables
 
Hashing
HashingHashing
Hashing
 
Genomic Selection with Bayesian Generalized Linear Regression model using R
Genomic Selection with Bayesian Generalized Linear Regression model using RGenomic Selection with Bayesian Generalized Linear Regression model using R
Genomic Selection with Bayesian Generalized Linear Regression model using R
 
Data Structure Lecture 6
Data Structure Lecture 6Data Structure Lecture 6
Data Structure Lecture 6
 
Dbms fundamentals
Dbms fundamentalsDbms fundamentals
Dbms fundamentals
 
Db fund
Db fundDb fund
Db fund
 
Getting Started with Regular Expressions In MarcEdit
Getting Started with Regular Expressions In MarcEditGetting Started with Regular Expressions In MarcEdit
Getting Started with Regular Expressions In MarcEdit
 
Basic Tutorial of Association Mapping by Avjinder Kaler
Basic Tutorial of Association Mapping by Avjinder KalerBasic Tutorial of Association Mapping by Avjinder Kaler
Basic Tutorial of Association Mapping by Avjinder Kaler
 
Array
ArrayArray
Array
 
Linked List, Types of Linked LIst, Various Operations, Applications of Linked...
Linked List, Types of Linked LIst, Various Operations, Applications of Linked...Linked List, Types of Linked LIst, Various Operations, Applications of Linked...
Linked List, Types of Linked LIst, Various Operations, Applications of Linked...
 
Import Data using R
Import Data using R Import Data using R
Import Data using R
 
Manipulating Data using DPLYR in R Studio
Manipulating Data using DPLYR in R StudioManipulating Data using DPLYR in R Studio
Manipulating Data using DPLYR in R Studio
 
Singly & Circular Linked list
Singly & Circular Linked listSingly & Circular Linked list
Singly & Circular Linked list
 
Hashing In Data Structure
Hashing In Data Structure Hashing In Data Structure
Hashing In Data Structure
 
Data structure & its types
Data structure & its typesData structure & its types
Data structure & its types
 
Hashing
HashingHashing
Hashing
 
Regular Expressions in JavaScript and Command Line
Regular Expressions in JavaScript and Command LineRegular Expressions in JavaScript and Command Line
Regular Expressions in JavaScript and Command Line
 
Tutorial for Estimating Broad and Narrow Sense Heritability using R
Tutorial for Estimating Broad and Narrow Sense Heritability using RTutorial for Estimating Broad and Narrow Sense Heritability using R
Tutorial for Estimating Broad and Narrow Sense Heritability using R
 

Andere mochten auch

High-level Programming Languages: Apache Pig and Pig Latin
High-level Programming Languages: Apache Pig and Pig LatinHigh-level Programming Languages: Apache Pig and Pig Latin
High-level Programming Languages: Apache Pig and Pig LatinPietro Michiardi
 
Flume and Flive Introduction
Flume and Flive IntroductionFlume and Flive Introduction
Flume and Flive IntroductionHanborq Inc.
 
Hadoop and Pig at Twitter__HadoopSummit2010
Hadoop and Pig at Twitter__HadoopSummit2010Hadoop and Pig at Twitter__HadoopSummit2010
Hadoop and Pig at Twitter__HadoopSummit2010Yahoo Developer Network
 
Hadoop, Pig, and Python (PyData NYC 2012)
Hadoop, Pig, and Python (PyData NYC 2012)Hadoop, Pig, and Python (PyData NYC 2012)
Hadoop, Pig, and Python (PyData NYC 2012)mortardata
 
Practical Hadoop using Pig
Practical Hadoop using PigPractical Hadoop using Pig
Practical Hadoop using PigDavid Wellman
 
Hive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReadingHive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReadingMitsuharu Hamba
 
Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!Nathan Bijnens
 
Hadoop Ecosystem at Twitter - Kevin Weil - Hadoop World 2010
Hadoop Ecosystem at Twitter - Kevin Weil - Hadoop World 2010Hadoop Ecosystem at Twitter - Kevin Weil - Hadoop World 2010
Hadoop Ecosystem at Twitter - Kevin Weil - Hadoop World 2010Cloudera, Inc.
 
Hadoop and pig at twitter (oscon 2010)
Hadoop and pig at twitter (oscon 2010)Hadoop and pig at twitter (oscon 2010)
Hadoop and pig at twitter (oscon 2010)Kevin Weil
 
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...Hadoop User Group
 
Pig Tutorial | Twitter Case Study | Apache Pig Script and Commands | Edureka
Pig Tutorial | Twitter Case Study | Apache Pig Script and Commands | EdurekaPig Tutorial | Twitter Case Study | Apache Pig Script and Commands | Edureka
Pig Tutorial | Twitter Case Study | Apache Pig Script and Commands | EdurekaEdureka!
 
introduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and Pigintroduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and PigRicardo Varela
 
Hadoop et son écosystème
Hadoop et son écosystèmeHadoop et son écosystème
Hadoop et son écosystèmeKhanh Maudoux
 
Big Data: Concepts, techniques et démonstration de Apache Hadoop
Big Data: Concepts, techniques et démonstration de Apache HadoopBig Data: Concepts, techniques et démonstration de Apache Hadoop
Big Data: Concepts, techniques et démonstration de Apache Hadoophajlaoui jaleleddine
 
Pig and Python to Process Big Data
Pig and Python to Process Big DataPig and Python to Process Big Data
Pig and Python to Process Big DataShawn Hermans
 
Apache Cassandra - Concepts et fonctionnalités
Apache Cassandra - Concepts et fonctionnalitésApache Cassandra - Concepts et fonctionnalités
Apache Cassandra - Concepts et fonctionnalitésRomain Hardouin
 

Andere mochten auch (18)

High-level Programming Languages: Apache Pig and Pig Latin
High-level Programming Languages: Apache Pig and Pig LatinHigh-level Programming Languages: Apache Pig and Pig Latin
High-level Programming Languages: Apache Pig and Pig Latin
 
Flume and Flive Introduction
Flume and Flive IntroductionFlume and Flive Introduction
Flume and Flive Introduction
 
Apache pig
Apache pigApache pig
Apache pig
 
Hadoop and Pig at Twitter__HadoopSummit2010
Hadoop and Pig at Twitter__HadoopSummit2010Hadoop and Pig at Twitter__HadoopSummit2010
Hadoop and Pig at Twitter__HadoopSummit2010
 
Hadoop, Pig, and Python (PyData NYC 2012)
Hadoop, Pig, and Python (PyData NYC 2012)Hadoop, Pig, and Python (PyData NYC 2012)
Hadoop, Pig, and Python (PyData NYC 2012)
 
Practical Hadoop using Pig
Practical Hadoop using PigPractical Hadoop using Pig
Practical Hadoop using Pig
 
Hive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReadingHive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReading
 
Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!
 
Hadoop Ecosystem at Twitter - Kevin Weil - Hadoop World 2010
Hadoop Ecosystem at Twitter - Kevin Weil - Hadoop World 2010Hadoop Ecosystem at Twitter - Kevin Weil - Hadoop World 2010
Hadoop Ecosystem at Twitter - Kevin Weil - Hadoop World 2010
 
Hadoop and pig at twitter (oscon 2010)
Hadoop and pig at twitter (oscon 2010)Hadoop and pig at twitter (oscon 2010)
Hadoop and pig at twitter (oscon 2010)
 
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
 
Pig Tutorial | Twitter Case Study | Apache Pig Script and Commands | Edureka
Pig Tutorial | Twitter Case Study | Apache Pig Script and Commands | EdurekaPig Tutorial | Twitter Case Study | Apache Pig Script and Commands | Edureka
Pig Tutorial | Twitter Case Study | Apache Pig Script and Commands | Edureka
 
introduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and Pigintroduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and Pig
 
Hadoop et son écosystème
Hadoop et son écosystèmeHadoop et son écosystème
Hadoop et son écosystème
 
Big Data: Concepts, techniques et démonstration de Apache Hadoop
Big Data: Concepts, techniques et démonstration de Apache HadoopBig Data: Concepts, techniques et démonstration de Apache Hadoop
Big Data: Concepts, techniques et démonstration de Apache Hadoop
 
Pig and Python to Process Big Data
Pig and Python to Process Big DataPig and Python to Process Big Data
Pig and Python to Process Big Data
 
Un introduction à Pig
Un introduction à PigUn introduction à Pig
Un introduction à Pig
 
Apache Cassandra - Concepts et fonctionnalités
Apache Cassandra - Concepts et fonctionnalitésApache Cassandra - Concepts et fonctionnalités
Apache Cassandra - Concepts et fonctionnalités
 

Ähnlich wie Pig statements

Big Data Mining in Indian Economic Survey 2017
Big Data Mining in Indian Economic Survey 2017Big Data Mining in Indian Economic Survey 2017
Big Data Mining in Indian Economic Survey 2017Parth Khare
 
Coclustering Base Classification For Out Of Domain Documents
Coclustering Base Classification For Out Of Domain DocumentsCoclustering Base Classification For Out Of Domain Documents
Coclustering Base Classification For Out Of Domain Documentslau
 
Database Management Lab -SQL Queries
Database Management Lab -SQL Queries Database Management Lab -SQL Queries
Database Management Lab -SQL Queries shamim hossain
 
SAS cheat sheet
SAS cheat sheetSAS cheat sheet
SAS cheat sheetAli Ajouz
 
Relaxing global-as-view in mediated data integration from linked data
Relaxing global-as-view in mediated data integration from linked dataRelaxing global-as-view in mediated data integration from linked data
Relaxing global-as-view in mediated data integration from linked dataAlessandro Adamou
 
Lecture 06 relational algebra and calculus
Lecture 06 relational algebra and calculusLecture 06 relational algebra and calculus
Lecture 06 relational algebra and calculusemailharmeet
 
January 2016 Meetup: Speeding up (big) data manipulation with data.table package
January 2016 Meetup: Speeding up (big) data manipulation with data.table packageJanuary 2016 Meetup: Speeding up (big) data manipulation with data.table package
January 2016 Meetup: Speeding up (big) data manipulation with data.table packageZurich_R_User_Group
 
Extended relational algebra
Extended relational algebraExtended relational algebra
Extended relational algebra1Arun_Pandey
 
Introduction to R r.nabati - iausdj.ac.ir
Introduction to R   r.nabati - iausdj.ac.irIntroduction to R   r.nabati - iausdj.ac.ir
Introduction to R r.nabati - iausdj.ac.irnabati
 

Ähnlich wie Pig statements (20)

Pig
PigPig
Pig
 
Apache pig
Apache pigApache pig
Apache pig
 
Hadoop Pig
Hadoop PigHadoop Pig
Hadoop Pig
 
Data Management in R
Data Management in RData Management in R
Data Management in R
 
4.1-Pig.pptx
4.1-Pig.pptx4.1-Pig.pptx
4.1-Pig.pptx
 
Lec_4_1_IntrotoPIG.pptx
Lec_4_1_IntrotoPIG.pptxLec_4_1_IntrotoPIG.pptx
Lec_4_1_IntrotoPIG.pptx
 
Big Data Mining in Indian Economic Survey 2017
Big Data Mining in Indian Economic Survey 2017Big Data Mining in Indian Economic Survey 2017
Big Data Mining in Indian Economic Survey 2017
 
Coclustering Base Classification For Out Of Domain Documents
Coclustering Base Classification For Out Of Domain DocumentsCoclustering Base Classification For Out Of Domain Documents
Coclustering Base Classification For Out Of Domain Documents
 
Database Management Lab -SQL Queries
Database Management Lab -SQL Queries Database Management Lab -SQL Queries
Database Management Lab -SQL Queries
 
ADVANCE ITT BY PRASAD
ADVANCE ITT BY PRASADADVANCE ITT BY PRASAD
ADVANCE ITT BY PRASAD
 
SAS cheat sheet
SAS cheat sheetSAS cheat sheet
SAS cheat sheet
 
ch2
ch2ch2
ch2
 
Relaxing global-as-view in mediated data integration from linked data
Relaxing global-as-view in mediated data integration from linked dataRelaxing global-as-view in mediated data integration from linked data
Relaxing global-as-view in mediated data integration from linked data
 
Cs341
Cs341Cs341
Cs341
 
Lecture 06 relational algebra and calculus
Lecture 06 relational algebra and calculusLecture 06 relational algebra and calculus
Lecture 06 relational algebra and calculus
 
January 2016 Meetup: Speeding up (big) data manipulation with data.table package
January 2016 Meetup: Speeding up (big) data manipulation with data.table packageJanuary 2016 Meetup: Speeding up (big) data manipulation with data.table package
January 2016 Meetup: Speeding up (big) data manipulation with data.table package
 
Chapter 4 Structured Query Language
Chapter 4 Structured Query LanguageChapter 4 Structured Query Language
Chapter 4 Structured Query Language
 
Extended relational algebra
Extended relational algebraExtended relational algebra
Extended relational algebra
 
Excel formulas-a-quick-list
Excel formulas-a-quick-listExcel formulas-a-quick-list
Excel formulas-a-quick-list
 
Introduction to R r.nabati - iausdj.ac.ir
Introduction to R   r.nabati - iausdj.ac.irIntroduction to R   r.nabati - iausdj.ac.ir
Introduction to R r.nabati - iausdj.ac.ir
 

Kürzlich hochgeladen

Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfRTS corp
 
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jNeo4j
 
Amazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilitiesAmazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilitiesKrzysztofKkol1
 
2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shardsChristopher Curtin
 
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + KobitonLeveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + KobitonApplitools
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLionel Briand
 
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptxThe Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptxRTS corp
 
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingShane Coughlan
 
Mastering Project Planning with Microsoft Project 2016.pptx
Mastering Project Planning with Microsoft Project 2016.pptxMastering Project Planning with Microsoft Project 2016.pptx
Mastering Project Planning with Microsoft Project 2016.pptxAS Design & AST.
 
Strategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsStrategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsJean Silva
 
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdfSteve Caron
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecturerahul_net
 
What’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesWhat’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesVictoriaMetrics
 
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...Bert Jan Schrijver
 
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics
 
Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptxVinzoCenzo
 
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...OnePlan Solutions
 
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdfAndrey Devyatkin
 
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdfPros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdfkalichargn70th171
 
Understanding Plagiarism: Causes, Consequences and Prevention.pptx
Understanding Plagiarism: Causes, Consequences and Prevention.pptxUnderstanding Plagiarism: Causes, Consequences and Prevention.pptx
Understanding Plagiarism: Causes, Consequences and Prevention.pptxSasikiranMarri
 

Kürzlich hochgeladen (20)

Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
 
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
 
Amazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilitiesAmazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilities
 
2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards
 
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + KobitonLeveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and Repair
 
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptxThe Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
 
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
 
Mastering Project Planning with Microsoft Project 2016.pptx
Mastering Project Planning with Microsoft Project 2016.pptxMastering Project Planning with Microsoft Project 2016.pptx
Mastering Project Planning with Microsoft Project 2016.pptx
 
Strategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsStrategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero results
 
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecture
 
What’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesWhat’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 Updates
 
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
 
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
 
Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptx
 
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
 
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
 
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdfPros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
 
Understanding Plagiarism: Causes, Consequences and Prevention.pptx
Understanding Plagiarism: Causes, Consequences and Prevention.pptxUnderstanding Plagiarism: Causes, Consequences and Prevention.pptx
Understanding Plagiarism: Causes, Consequences and Prevention.pptx
 

Pig statements

  • 2. What is Pig in hadoop ? Pig is a platform for analyzing large dataset that consist of high-level language for expressing data analysis programs. Originally Created by Yahoo! to answer an in-house data analysis requirement.  Pig is a Dataflow language •Language is called Pig Latin. •Relatively simple syntax •Very easy for SQL developers to learn and understand the language. •Under the cover, Pig Latin Scripts are converted into Map- Reduce job and executed on the cluster.
  • 3. Pig Latin expressions. Given the tuple data = (1, {(2,3),(4,5),(6,7)}, ['key'#'value']) with fields field1, field2 and field3:
      Method        Example                          Result
      Position      $0                               1
      Name          field2                           {(2,3),(4,5),(6,7)}
      Projection    field2.$1                        {(3),(5),(7)}
      Function      AVG(field2.$0)                   (2+4+6)/3 = 4
      Conditional   (field1 == 1 ? 'yes' : 'no')     'yes'
      Lookup        field3#'key'                     'value'
    Pig Latin:
      • A Pig Latin program is a collection of statements.
      • Statements are built from operators and expressions, and they return relations.
      • Data in relations is made of atoms (single values, also called fields), tuples, bags and maps.
    Common operations:
      DATA     PROCESS    COMBINE    VIEW
      LOAD     FILTER     JOIN       ORDER
      DUMP     FOREACH    GROUP      LIMIT
      STORE    DISTINCT   COGROUP    UNION
               SAMPLE     CROSS      SPLIT
  • 4. Let's start with Pig: type pig in the terminal to open the Grunt shell.
    LOAD:
      A = LOAD 'sample.txt' USING PigStorage(',') AS (id:int, Name:chararray, Addr:chararray);
    A is the relation (bag), 'sample.txt' is the path to the input file (HDFS or local), ',' is the field delimiter, the AS clause lists the column names with their data types, and the semicolon completes the statement. LOAD is used to load data from HDFS or the local file system into a Pig relation.
  • 5. DUMP:
      DUMP A;
    A is the name of the relation to display. DUMP sends the result to the screen.
    STORE:
      STORE A INTO 'hdfs:/data/result' USING PigStorage(':');
    A is the name of the relation and 'hdfs:/data/result' is the path to the output (HDFS or local). PigStorage(':') writes the fields separated by ":". STORE saves a relation to HDFS or the local file system.
  • 6. FILTER:
      B = FILTER A BY Addr == 'PUNE';
    This keeps only the records whose Addr is PUNE. FILTER is like the WHERE clause in SQL: it selects the tuples of a relation that satisfy the given condition.
    FOREACH:
      C = FOREACH A GENERATE Name, Addr;
    For each record in the bag, this keeps only Name and Addr. FOREACH ... GENERATE is used to add or remove fields from a relation.
  • 7. DISTINCT:
      D = DISTINCT A;
    DISTINCT removes duplicate records. It works only on entire records, not on individual fields.
    SAMPLE:
      E = SAMPLE D 0.1;
    This samples about 10% of relation D. The size is a fraction between 0 and 1, so 0.1 means 10%, not 0.1%. SAMPLE reads through all of your data but returns only a percentage of the rows, and the exact number returned can vary from run to run.
  • 8. JOIN:
      F = JOIN A BY Name, C BY Name;
    Name is the join column of the first relation (A) and of the second relation (C). JOIN joins relations on the given fields.
    GROUP:
      G = GROUP A BY (Name, Addr);
    GROUP collects related records into one group per key. The key can be made of several fields, as it is here.
  • 9. COGROUP:
      H = COGROUP A BY Name, C BY Name;
    Name is the grouping column of the first relation (A) and of the second relation (C). COGROUP is a generalization of GROUP. Instead of collecting the records of one input by key, it collects the records of n inputs by key. Each result record holds the key and one bag per input.
    CROSS:
      I = CROSS A, C;
    CROSS computes the cross product of the first relation (A) and the second relation (C), matching the mathematical set operation of the same name. Every tuple of A is paired with every tuple of C.
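  • 10. ORDER:
      J = ORDER A BY $1 DESC;
    This sorts A by its second column in descending order ($1, because positions start at $0). ORDER sorts a relation by one or more fields.
    LIMIT:
      K = LIMIT A 10;
    This takes 10 records from relation A. LIMIT restricts a relation to at most the given number of tuples. Without a preceding ORDER, Pig does not guarantee which tuples are returned.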
  • 11. UNION:
      L = UNION A, B, C, D;
    UNION combines two or more relations into one. Sometimes you want to put data sets together by concatenating them instead of joining them, and Pig Latin provides UNION for this purpose. UNION does not remove duplicate tuples and does not preserve their order.
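  • 12. SPLIT:
      M = LOAD 'sample1.txt' AS (ID:int, NAME:chararray, DOB:chararray); -- DOB format is like '20140126'
      SPLIT M INTO Month1 IF SUBSTRING(DOB, 4, 6) == '01',
                   Month2 IF SUBSTRING(DOB, 4, 6) == '02',
                   Month3 IF SUBSTRING(DOB, 4, 6) == '03',
                   RestMonths OTHERWISE;
    Pig Latin also supports splitting a relation and creating multiple new relations from it. SPLIT divides the relation into two or more relations.
    SPLIT is a statement on its own and is not assigned to an alias, because it defines the named output relations itself. SUBSTRING(DOB, 4, 6) extracts the month, with a 0-based start and an exclusive end. OTHERWISE collects the records that match none of the conditions. A record goes to every output relation whose condition is true, so overlapping conditions can place it in more than one relation.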
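  • 13. PIG FUNCTIONS
    AVG:
      A = LOAD 'sample2' AS (id:int, Fname:chararray, Lname:chararray, marks:int);
      G = GROUP A BY Fname;
      B = FOREACH G GENERATE group AS Fname, AVG(A.marks);
    CONCAT:
      C = FOREACH A GENERATE CONCAT(Fname, Lname);
    COUNT:
      D = FOREACH G GENERATE group, COUNT(A);
    AVG and COUNT are aggregate functions. They take a bag, so the relation is grouped first and the function is applied to the bag of each group. AVG also needs a numeric column, so marks is loaded as int.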
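  • 14. IsEmpty:
      E = FILTER H BY IsEmpty(C);
    IsEmpty checks whether a bag or map is empty. Here it keeps the groups of the COGROUP result H (slide 9) that have no matching records in C.
    MAX, MIN and SUM are aggregate functions. With marks loaded as a numeric type, group A by first name first:
      G = GROUP A BY Fname;
    MAX:
      F = FOREACH G GENERATE group, MAX(A.marks);
    MIN:
      F = FOREACH G GENERATE group, MIN(A.marks);
    SUM:
      F = FOREACH G GENERATE group, SUM(A.marks);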
  • 15. TOKENIZE: splits a string and outputs a bag of words.
      F = FOREACH A GENERATE TOKENIZE(Fname);
    Ganesh L. Sanap, connectoganesh@gmail.com
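You can check the expression table on slide 3 against a small Python model. This is a sketch under stated assumptions, not Pig itself:

- Pig tuples are modeled as Python tuples, bags as lists of tuples, and maps as dicts.
- The helper names `by_position`, `by_name` and `project` are hypothetical.

```python
# Hypothetical Python model of slide 3 (not Pig's API):
# tuple -> Python tuple, bag -> list of tuples, map -> dict.
FIELDS = ("field1", "field2", "field3")

# data = (1, {(2,3),(4,5),(6,7)}, ['key'#'value'])
data = (1, [(2, 3), (4, 5), (6, 7)], {"key": "value"})


def by_position(t, i):
    """$i: positional reference, counting from 0."""
    return t[i]


def by_name(t, name):
    """Named reference, resolved through the schema."""
    return t[FIELDS.index(name)]


def project(bag, i):
    """bag.$i: a new bag holding one-field tuples."""
    return [(row[i],) for row in bag]


position = by_position(data, 0)                    # $0
named = by_name(data, "field2")                    # field2
projection = project(named, 1)                     # field2.$1
firsts = [row[0] for row in project(named, 0)]     # values of field2.$0
average = sum(firsts) / len(firsts)                # AVG(field2.$0); Pig's AVG returns a double
conditional = "yes" if by_name(data, "field1") == 1 else "no"
lookup = by_name(data, "field3")["key"]            # field3#'key'

print(position, named, projection, average, conditional, lookup)
```

The model also shows why, in SQL terms, a tuple behaves like a row and a bag like a table of rows.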
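Slides 4 to 6 (LOAD, STORE, FILTER, FOREACH) can be sketched as plain list operations. This is a hedged, in-memory model, not how Pig runs on a cluster:

- The sample rows are made up.
- `load`, `store` and `col` are hypothetical helpers.
- Failed casts and missing fields becoming null (None) reflect Pig's documented behaviour as I understand it.

```python
# Hypothetical in-memory model of slides 4-6 (not Pig itself).
# Schema from slide 4: (id:int, Name:chararray, Addr:chararray).
SCHEMA = [("id", int), ("Name", str), ("Addr", str)]
NAMES = [name for name, _ in SCHEMA]


def cast_or_null(cast, text):
    """Cast one field; a failed conversion becomes null (None)."""
    try:
        return cast(text)
    except ValueError:
        return None


def load(lines, delimiter=","):
    """LOAD ... USING PigStorage(delimiter) AS (...): split each line, cast per schema."""
    relation = []
    for line in lines:
        parts = line.rstrip("\n").split(delimiter)
        parts += [None] * (len(SCHEMA) - len(parts))  # missing fields become null
        relation.append(tuple(
            None if text is None else cast_or_null(cast, text)
            for (_, cast), text in zip(SCHEMA, parts)))
    return relation


def store(relation, delimiter=":"):
    """STORE ... USING PigStorage(':'): one delimited line per tuple, null written as empty."""
    return [delimiter.join("" if v is None else str(v) for v in t) for t in relation]


def col(name):
    return NAMES.index(name)


# Made-up rows standing in for sample.txt.
A = load(["1,Asha,PUNE", "2,Amit,MUMBAI", "3,Ravi,PUNE"])
B = [t for t in A if t[col("Addr")] == "PUNE"]      # FILTER A BY Addr == 'PUNE'
C = [(t[col("Name")], t[col("Addr")]) for t in A]   # FOREACH A GENERATE Name, Addr
print(store(B))
```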
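DISTINCT and SAMPLE from slide 7 can be modeled as follows:

- DISTINCT compares whole tuples, which is why it cannot deduplicate on a single field.
- SAMPLE is modeled as a per-tuple random filter, which is an assumption about how Pig evaluates it. The model does show why `SAMPLE D 0.1` keeps roughly 10% of D and why the count varies between runs.

The function names are hypothetical.

```python
import random

# Hypothetical model of slide 7 (not Pig's implementation).


def distinct(relation):
    """DISTINCT: drop tuples equal to an earlier tuple; equality is on the whole tuple.
    Pig does not guarantee output order; this model keeps first-seen order."""
    seen = set()
    result = []
    for t in relation:
        if t not in seen:
            seen.add(t)
            result.append(t)
    return result


def sample(relation, fraction, rng=random):
    """SAMPLE rel fraction: keep each tuple independently with probability `fraction`,
    so 0.1 keeps roughly 10% of the tuples and the exact count varies."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return [t for t in relation if rng.random() < fraction]


D = distinct([(1, "Asha", "PUNE"), (1, "Asha", "PUNE"), (1, "Asha", "MUMBAI")])
E = sample([(i,) for i in range(10_000)], 0.1, random.Random(42))
print(len(D), len(E))
```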
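The difference between GROUP, COGROUP, JOIN and CROSS on slides 8 and 9 is easiest to see side by side. This sketch makes the following assumptions:

- Relations are lists of tuples, and key functions stand in for `BY Name`.
- Pig's null-key rules are ignored.
- P is a hypothetical second relation, used instead of C because every name in A also appears in C.
- The inner join is written as a COGROUP followed by flattening both bags, which is the conceptual relationship between the two operators, not Pig's physical plan.

```python
from itertools import product

# Hypothetical model of slides 8-9 (not Pig's implementation).


def group(relation, key):
    """GROUP rel BY key -> [(group, bag)], one bag per distinct key."""
    bags = {}
    for t in relation:
        bags.setdefault(key(t), []).append(t)
    return list(bags.items())


def cogroup(left, left_key, right, right_key):
    """COGROUP -> [(group, left_bag, right_bag)]; a key missing from one input gets an empty bag."""
    bags = {}
    for t in left:
        bags.setdefault(left_key(t), ([], []))[0].append(t)
    for t in right:
        bags.setdefault(right_key(t), ([], []))[1].append(t)
    return [(k, lbag, rbag) for k, (lbag, rbag) in bags.items()]


def join(left, left_key, right, right_key):
    """Inner JOIN: conceptually a COGROUP followed by flattening both bags."""
    return [l + r
            for _, lbag, rbag in cogroup(left, left_key, right, right_key)
            for l, r in product(lbag, rbag)]


def cross(left, right):
    """CROSS: every tuple of `left` paired with every tuple of `right`."""
    return [l + r for l, r in product(left, right)]


A = [(1, "Asha", "PUNE"), (2, "Amit", "MUMBAI")]
P = [("Asha", "111"), ("Neha", "222")]  # hypothetical (Name, phone) relation

G = group(A, lambda t: (t[1], t[2]))                # GROUP A BY (Name, Addr)
H = cogroup(A, lambda t: t[1], P, lambda t: t[0])   # COGROUP A BY Name, P BY Name
F = join(A, lambda t: t[1], P, lambda t: t[0])      # JOIN A BY Name, P BY Name
I = cross(A, P)                                     # CROSS A, P
print(H)
```

Note that "Amit" and "Neha" survive the COGROUP with one empty bag each, but disappear from the inner JOIN.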
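ORDER, LIMIT and UNION from slides 10 and 11 reduce to sorting, slicing and concatenation in this hedged model:

- The rows and helper names are made up for illustration.
- The model cannot reproduce Pig's lack of ordering guarantees for LIMIT and UNION. It only documents them in comments.

```python
# Hypothetical model of slides 10-11 (not Pig's implementation).


def order_by(relation, position, descending=False):
    """ORDER rel BY $position [DESC]."""
    return sorted(relation, key=lambda t: t[position], reverse=descending)


def limit(relation, n):
    """LIMIT rel n: at most n tuples. Without a preceding ORDER, Pig does not
    guarantee which tuples are returned; this model simply takes the first n."""
    return relation[:n]


def union(*relations):
    """UNION: concatenate inputs; duplicates are kept (Pig also does not preserve order)."""
    result = []
    for relation in relations:
        result.extend(relation)
    return result


A = [(1, "Asha", "PUNE"), (2, "Ravi", "PUNE"), (3, "Amit", "MUMBAI")]
J = order_by(A, 1, descending=True)   # ORDER A BY $1 DESC
K = limit(J, 2)                       # LIMIT J 2
L = union(A, A[:1])                   # UNION of A with a one-tuple relation
print(J, K, len(L))
```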
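The month-based SPLIT on slide 12 can be modeled as routing each tuple through a list of conditions. The model has these properties:

- A tuple lands in every branch whose condition holds.
- An OTHERWISE branch receives the tuples that matched none.
- `SUBSTRING(DOB, 4, 6)` maps to the Python slice `DOB[4:6]`.

The DOB rows and helper names (`split`, `month_is`) are hypothetical.

```python
# Hypothetical model of slide 12's SPLIT (not Pig's implementation).


def substring(s, start, stop):
    """SUBSTRING(s, start, stop): 0-based start, exclusive stop."""
    return s[start:stop]


def split(relation, branches, otherwise=None):
    """SPLIT rel INTO name IF cond, ... [, other OTHERWISE].
    A tuple goes to every branch whose condition holds (possibly several);
    the OTHERWISE relation receives tuples that matched no condition."""
    outputs = {name: [] for name, _ in branches}
    if otherwise is not None:
        outputs[otherwise] = []
    for t in relation:
        matched = False
        for name, cond in branches:
            if cond(t):
                outputs[name].append(t)
                matched = True
        if otherwise is not None and not matched:
            outputs[otherwise].append(t)
    return outputs


# Made-up rows with the slide's schema (ID, NAME, DOB), DOB like '20140126'.
M = [(1, "Asha", "20140126"), (2, "Ravi", "20140215"),
     (3, "Amit", "20140310"), (4, "Neha", "20141105")]


def month_is(mm):
    return lambda t: substring(t[2], 4, 6) == mm


parts = split(M, [("Month1", month_is("01")), ("Month2", month_is("02")),
                  ("Month3", month_is("03"))], otherwise="RestMonths")
print({name: [t[0] for t in rows] for name, rows in parts.items()})
```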
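The aggregate functions on slides 13 and 14 operate on the bag of each group. That is why the statements group A by Fname before calling AVG, COUNT, MAX, MIN or SUM. This sketch rests on the following assumptions:

- None stands for null. AVG, MAX, MIN and SUM skip nulls, and COUNT(bag) counts tuples whose first field is not null, which is my reading of Pig's documented semantics.
- IsEmpty is a plain emptiness test on a bag, as in the COGROUP example.
- The uppercase function names only mirror Pig's built-ins and are hypothetical.
- The rows are made up.

```python
# Hypothetical model of the aggregate functions on slides 13-14 (not Pig's code).


def group(relation, pos):
    """GROUP rel BY $pos -> {group: bag}."""
    bags = {}
    for t in relation:
        bags.setdefault(t[pos], []).append(t)
    return bags


def values(bag, pos):
    """Non-null values of field $pos across a bag."""
    return [t[pos] for t in bag if t[pos] is not None]


def AVG(bag, pos):
    vals = values(bag, pos)
    return sum(vals) / len(vals) if vals else None


def SUM(bag, pos):
    vals = values(bag, pos)
    return sum(vals) if vals else None


def MAX(bag, pos):
    vals = values(bag, pos)
    return max(vals) if vals else None


def MIN(bag, pos):
    vals = values(bag, pos)
    return min(vals) if vals else None


def COUNT(bag):
    """COUNT(bag): tuples whose first field is not null."""
    return sum(1 for t in bag if t[0] is not None)


def IsEmpty(bag):
    return len(bag) == 0


# Made-up rows: (id, Fname, Lname, marks)
A = [(1, "Asha", "Patil", 70), (2, "Asha", "Patil", 90), (3, "Ravi", "Joshi", None)]
G = group(A, 1)                                          # GROUP A BY Fname
B = {name: AVG(bag, 3) for name, bag in G.items()}       # AVG(A.marks)
D = {name: COUNT(bag) for name, bag in G.items()}        # COUNT(A)
F = {name: (MAX(bag, 3), MIN(bag, 3), SUM(bag, 3)) for name, bag in G.items()}
print(B, D, F)
```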
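CONCAT (slide 13) and TOKENIZE (slide 15) are scalar functions, so they need no GROUP. This sketch assumes the following, based on my reading of the Pig function reference; treat it as a hedged model:

- CONCAT returns null when any argument is null.
- TOKENIZE splits on space, double quote, comma, parentheses and asterisk by default.
- TOKENIZE returns a bag of one-field tuples.

The function names are hypothetical.

```python
import re

# Hypothetical model of CONCAT (slide 13) and TOKENIZE (slide 15); not Pig's code.
# The default delimiter set is an assumption taken from the TOKENIZE documentation.
DEFAULT_DELIMITERS = ' ",()*'


def concat(*values):
    """CONCAT: null if any argument is null, otherwise the joined string."""
    if any(v is None for v in values):
        return None
    return "".join(values)


def tokenize(text, delimiters=DEFAULT_DELIMITERS):
    """TOKENIZE: a bag of one-field tuples, one per word."""
    if text is None:
        return None
    pattern = "[" + re.escape(delimiters) + "]+"
    return [(word,) for word in re.split(pattern, text) if word]


print(concat("Asha", "Patil"), tokenize('big data,(pig) "latin"'))
```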

Editor's Notes

  1. Pig is very helpful for data analysts, BI developers and SQL developers who have little or no knowledge of Java.
  2. In SQL terms: atom = field (a single value), tuple = row, bag = table.
  3. Note: all relations (bags) in Pig are temporary. If you close the Grunt shell or the terminal, you lose your relations.
  4. When you use DUMP, the result is not stored; it is only displayed on your screen. With a STORE statement, the result is written to the given location. Make sure the output directory does not already exist in your file system, because STORE creates it automatically. Pig evaluates statements lazily: STORE and DUMP are what trigger execution, which may launch one or more MapReduce jobs.
  5. Data values in Pig are case-sensitive (for example, 'PUNE' does not match 'Pune'), and so are relation names, field names and function names. Pig Latin keywords such as LOAD or FOREACH are not case-sensitive. Instead of the names, you can also use $1 and $2 for the Name and Addr fields. Relation C stores only the name and address; it does not keep id.