UNIT-4
Apache Pig
• Pig is a high-level programming language for analyzing large data sets. It grew out of
a development effort at Yahoo!
• In the MapReduce framework, a program must be expressed as a series of Map and Reduce
stages. This is not a programming model that most data analysts are familiar with, so
Pig was built on top of Hadoop as an abstraction to bridge the gap.
• Apache Pig lets people focus on analyzing bulk data sets and spend less time writing
MapReduce programs. Just as pigs eat almost anything, the Pig language is designed to
work on any kind of data, hence the name.
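To make the gap concrete, here is a minimal sketch of a word count written as explicit map, shuffle, and reduce stages. It is plain Python rather than Hadoop code, and the function names (`map_stage`, `shuffle_stage`, `reduce_stage`) and the sample lines are assumptions chosen for illustration only.

```python
from collections import defaultdict

def map_stage(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_stage(pairs):
    """Shuffle: group all emitted values by key, between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_stage(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    pairs = [pair for line in lines for pair in map_stage(line)]
    grouped = shuffle_stage(pairs)
    return dict(reduce_stage(key, values) for key, values in grouped.items())

counts = word_count(["pig eats anything", "Pig runs on Hadoop"])
print(counts)
```

Even this small task has to be split into three hand-written stages, which is the kind of bookkeeping Pig is meant to hide.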
Pig Architecture
The architecture of Pig consists of two components:
• Pig Latin, a language
• A runtime environment for running Pig Latin programs
A Pig Latin program consists of a series of operations, or transformations, that are
applied to the input data to produce output. These operations describe a data flow, which
the Pig execution environment translates into an executable representation. Under the
hood, the transformations run as a series of MapReduce jobs that the programmer does not
have to write or manage. In this way, Pig lets the programmer focus on the data rather
than on the mechanics of execution.
Pig Latin is a relatively rigid, structured language that uses familiar data-processing
keywords such as JOIN, GROUP, and FILTER.
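The idea of a data flow built from a series of transformations can be sketched in plain Python. This is an analogy, not Pig Latin: the helper names (`filter_rows`, `group_rows`, `total_per_group`) and the sample records are hypothetical, and the code runs in memory rather than as MapReduce jobs.

```python
from collections import defaultdict

# Hypothetical input records: (name, department, salary).
records = [
    ("Raja", "sales", 30),
    ("Mohammad", "sales", 45),
    ("Asha", "support", 20),
    ("Lee", "support", 50),
]

def filter_rows(rows, predicate):
    """Keep only the rows for which the predicate holds (similar to FILTER)."""
    return [row for row in rows if predicate(row)]

def group_rows(rows, key):
    """Collect rows that share a key into one group (similar to GROUP)."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    return dict(groups)

def total_per_group(groups, value):
    """Sum one field over every group."""
    return {name: sum(value(row) for row in rows) for name, rows in groups.items()}

# Each step consumes the output of the previous one, forming a data flow.
well_paid = filter_rows(records, lambda row: row[2] >= 30)
by_department = group_rows(well_paid, key=lambda row: row[1])
totals = total_per_group(by_department, value=lambda row: row[2])
print(totals)
```

The script only states what should happen to the data at each step; how the steps are scheduled is left to the runtime, which is the division of labor the slide describes.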
Apache Pig Architecture in Hadoop
• The Apache Pig architecture centers on a Pig Latin interpreter that runs Pig Latin
scripts to process and analyze massive datasets in the Hadoop environment. Apache Pig
provides a rich set of operators for data operations such as join, filter, sort, load,
and group.
• Programmers write a Pig script in Pig Latin to perform a specific task, and Pig converts
the script into a series of MapReduce jobs so that they do not have to write those jobs
by hand. Pig Latin programs can be run from the Grunt shell, from a script file, or
embedded in a host program, and they can be extended with user-defined functions (UDFs).

The Apache Pig architecture consists of the following major components:
• Parser
• Optimizer
• Compiler
• Execution Engine
• Execution Mode
Pig Latin Scripts
• Pig scripts are submitted to the Pig execution environment to produce the desired
results. You can run a Pig script in one of the following ways:
  • Grunt shell
  • Script file
  • Embedded script
Parser
The parser handles all Pig Latin statements or commands. It performs several checks on the
statements, such as syntax and type checks, and generates a DAG (directed acyclic graph) as output.
The DAG represents the logical operators of the script as nodes and the data flows as edges.
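A DAG of this kind can be sketched with Python's standard `graphlib` module. The operator names below form a hypothetical logical plan, not the output of the real Pig parser; each node lists the operators whose output it consumes.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical logical plan: each key is an operator (node), and the set it
# maps to holds the operators whose output it consumes (incoming edges).
logical_plan = {
    "LOAD users": set(),
    "LOAD orders": set(),
    "FILTER orders": {"LOAD orders"},
    "JOIN": {"LOAD users", "FILTER orders"},
    "GROUP": {"JOIN"},
    "STORE": {"GROUP"},
}

# static_order() yields every operator after all of its inputs and raises
# graphlib.CycleError if the plan contains a cycle.
order = list(TopologicalSorter(logical_plan).static_order())
print(order)
```

Because the graph is acyclic, there is always at least one valid order in which every operator runs after its inputs, which is what allows the later stages to schedule the work.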
Optimizer
Once parsing is complete, the DAG is passed to the optimizer, which applies optimizations such as
split, merge, projection, pushdown, transform, and reorder. Through projection and pushdown, the
optimizer drops unnecessary data and columns early, which improves query performance.
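The effect of pushing a projection down can be illustrated with a small Python sketch. It is a conceptual model only: the functions `project`, `filter_age`, `naive_plan`, and `optimized_plan` are hypothetical, the sample rows are invented for illustration, and the real optimizer works on the logical plan rather than on in-memory lists.

```python
# Hypothetical rows with four columns; the query below only needs two of them.
rows = [
    {"name": "Raja", "age": 30, "phone": "9848022338", "email": "raja@gmail.com"},
    {"name": "Mohammad", "age": 45, "phone": "0000000000", "email": "m@example.com"},
]

def project(rows, columns):
    """Keep only the named columns of every row."""
    return [{column: row[column] for column in columns} for row in rows]

def filter_age(rows, minimum):
    """Keep rows whose age is at least the given minimum."""
    return [row for row in rows if row["age"] >= minimum]

def naive_plan(rows):
    # Carry every column through the filter and project at the very end.
    return project(filter_age(rows, 40), ["name"])

def optimized_plan(rows):
    # Push the projection down: drop unused columns before filtering.
    narrow = project(rows, ["name", "age"])
    return project(filter_age(narrow, 40), ["name"])

# Both plans give the same answer; the optimized one moves less data.
assert naive_plan(rows) == optimized_plan(rows)
print(optimized_plan(rows))
```

The result is unchanged, but the optimized plan never carries the `phone` and `email` columns past the first step, which is where the performance gain comes from on large inputs.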
Compiler
The compiler compiles the optimizer's output into a series of MapReduce jobs. This conversion is
automatic, and the compiler can rearrange the execution order to improve performance.
Execution Engine
Finally, the MapReduce jobs are submitted to the execution engine and run on the Hadoop platform
to produce the desired results. You can then use the DUMP statement to display the results on
screen or the STORE statement to save them in HDFS (Hadoop Distributed File System).
Execution Mode
Apache Pig runs in one of two execution modes: local and MapReduce. The choice depends on where
the data is stored and where you want to run the Pig script: on a single machine or on a
distributed Hadoop cluster.
Local Mode: Use local mode when your dataset is small. Pig runs in a single JVM using the local
host and the local file system, so parallel mapper execution is not possible. Specify this mode
with the pig -x local command.
MapReduce Mode: This is the default mode. Pig Latin statements run on data that is already stored
in HDFS (Hadoop Distributed File System). Specify this mode explicitly with the
pig -x mapreduce command.
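As a small sketch, the two commands can be assembled programmatically. The helper `pig_command` and the script name `wordcount.pig` are hypothetical, and the helper deliberately accepts only the two modes named in the text.

```python
import shlex

# The two modes named in the text; this sketch treats them as the only valid values.
SUPPORTED_MODES = ("local", "mapreduce")

def pig_command(script_path, mode="mapreduce"):
    """Return the argument list for running a Pig script in the given mode."""
    if mode not in SUPPORTED_MODES:
        raise ValueError(f"unsupported execution mode: {mode!r}")
    return ["pig", "-x", mode, script_path]

# "wordcount.pig" is a hypothetical script name.
print(shlex.join(pig_command("wordcount.pig", mode="local")))
```

On a machine where Pig is installed, the returned list could be passed to `subprocess.run`; that step is left out here so the sketch runs without Pig.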
Apache Pig Components
Parser
Pig scripts are first handled by the parser. It checks the syntax
of the script, performs type checking, and carries out other miscellaneous checks.
The output of the parser is a DAG (directed acyclic graph)
that represents the Pig Latin statements and logical operators.
In the DAG, the logical operators of the script are the
nodes and the data flows are the edges.
Optimizer
The logical plan (DAG) is passed to the logical optimizer, which carries
out the logical optimizations such as projection and pushdown.
Compiler
The compiler compiles the optimized logical plan into a series of
MapReduce jobs.
Execution engine
Finally, the MapReduce jobs are submitted to Hadoop in sorted order
and executed there, producing the desired results.
Pig Latin Data Model
• The data model of Pig Latin is fully nested and allows complex non-atomic data types
such as map and tuple. Given below is the diagrammatical representation of Pig Latin's
data model.
• Atom
Any single value in Pig Latin, irrespective of its data type, is known as an atom. It is stored as a
string and can be used as a string or a number. int, long, float, double, chararray, and bytearray are
the atomic types of Pig. A piece of data or a simple atomic value is known as a field.
Example: 'raja' or '30'
• Tuple
A record formed by an ordered set of fields is known as a tuple; the fields can be of any type. A
tuple is similar to a row in an RDBMS table.
Example: (Raja, 30)
• Bag
A bag is an unordered collection of tuples, which need not be unique. Each tuple can have any
number of fields (flexible schema). A bag is represented by '{}'. It is similar to a table in an
RDBMS, but unlike such a table, it does not require every tuple to contain the same number of
fields, or the fields in the same position (column) to have the same type.
Example: {(Raja, 30), (Mohammad, 45)}
A bag can be a field in a relation; in that context, it is known as an inner bag.
Example: (Raja, 30, {(9848022338, raja@gmail.com)})
• Map
A map (or data map) is a set of key-value pairs. The key must be of type chararray and should be
unique. The value can be of any type. A map is represented by '[]'.
Example: [name#Raja, age#30]
• Relation
A relation is a bag of tuples. Relations in Pig Latin are unordered: there is no guarantee that
tuples are processed in any particular order.
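The nesting of these types can be mirrored with built-in Python containers. This is only an analogy under stated assumptions: Python tuples, lists, and dicts stand in for Pig tuples, bags, and maps, and the helper `field_counts` and the extra sample tuples are hypothetical.

```python
# Atoms: single values such as a chararray or an int.
name = "Raja"
age = 30

# Tuple: an ordered set of fields, like a row.
person = (name, age)

# Bag: a collection of tuples. Duplicates are allowed and tuples may differ
# in length, so a list (rather than a set) is the closer analogy.
people = [("Raja", 30), ("Mohammad", 45), ("Raja", 30), ("Asha",)]

# Map: chararray keys mapped to values of any type.
profile = {"name": "Raja", "age": 30}

# Inner bag: a bag used as a field inside a tuple.
contact = ("Raja", 30, [("9848022338", "raja@gmail.com")])

def field_counts(bag):
    """Return the number of fields in each tuple of a bag."""
    return [len(record) for record in bag]

print(field_counts(people))
```

The last tuple in `people` has a single field, which shows the flexible schema: unlike rows in an RDBMS table, tuples in one bag do not have to line up column by column.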
MapReduce vs. Apache Pig
Apache Pig | MapReduce
Scripting language | Compiled language
Provides a higher level of abstraction | Provides a low level of abstraction
Requires few lines of code (10 lines can summarize 200 lines of MapReduce code) | Requires more extensive code (more lines of code)
Requires less development time and effort | Requires more development time and effort
Lower code efficiency | Higher code efficiency in comparison to Apache Pig
Apache Pig Features
• Allows programmers to write fewer lines of code: roughly 200 lines of Java can be
expressed in about ten lines of Pig Latin.
• The multi-query approach of Apache Pig reduces development time.
• Apache Pig has a rich set of operators for operations such as join, filter, sort,
load, and group.
• Pig Latin is similar to SQL, so programmers with good SQL knowledge find it easy
to write Pig scripts.
• Apache Pig handles the analysis of both structured and unstructured data.
Apache Pig Applications
• Processes large volumes of data
• Supports quick prototyping and ad-hoc queries across large datasets
• Performs data processing in search platforms
• Processes time-sensitive data loads
• Used by telecom companies to de-identify user call data
• https://www.geeksforgeeks.org/apache-pig-installation-on-windows-and-case-study/

Apache Pig: Analyze Big Data Using High-Level Language

  • 2.  Pig is a high-level programming language useful for analyzing large data sets. Pig was a result of development effort at Yahoo!  In a MapReduce framework, programs need to be translated into a series of Map and Reduce stages. However, this is not a programming model which data analysts are familiar with. So, in order to bridge this gap, an abstraction called Pig was built on top of Hadoop.
  • 3.  Apache Pig enables people to focus more on analyzing bulk data sets and to spend less time writing Map-Reduce programs. Similar to Pigs, who eat anything, the Apache Pig programming language is designed to work upon any kind of data. That's why the name, Pig!
  • 4. Pig Architecture The Architecture of Pig consists of two components:  Pig Latin, which is a language  A runtime environment, for running PigLatin programs. A Pig Latin program consists of a series of operations or transformations which are applied to the input data to produce output. These operations describe a data flow which is translated into an executable representation, by Hadoop Pig execution environment. Underneath, results of these transformations are series of MapReduce jobs which a programmer is unaware of. So, in a way, Pig in Hadoop allows the programmer to focus on data rather than the nature of execution. PigLatin is a relatively stiffened language which uses familiar keywords from data processing e.g., Join, Group and Filter.
  • 5. Apache Pig Architecture in Hadoop  Apache Pig architecture consists of a Pig Latin interpreter that uses Pig Latin scripts to process and analyze massive datasets. Programmers use Pig Latin language to analyze large datasets in the Hadoop environment. Apache pig has a rich set of datasets for performing different data operations like join, filter, sort, load, group, etc.  Programmers must use Pig Latin language to write a Pig script to perform a specific task. Pig converts these Pig scripts into a series of Map-Reduce jobs to ease programmers’ work. Pig Latin programs are executed via various mechanisms such as UDFs, embedded, and Grunt shells. 
  • 6. Apache Pig architecture is consisting of the following major components:  Parser  Optimizer  Compiler  Execution Engine  Execution Mode
  • 7. Pig Latin Scripts  Pig scripts are submitted to the Pig execution environment to produce the desired results. You can execute the Pig scripts by using one of the methods:  Grunt Shell  Script file  Embedded script
  • 8. Parser
The parser handles all Pig Latin statements and commands. It performs several checks on the statements, such as syntax and type checks, and generates a DAG (Directed Acyclic Graph) as output. The DAG represents the logical operators of the script as nodes and the data flows as edges.
Optimizer
Once parsing is complete, the DAG is passed to the optimizer. The optimizer applies optimizations such as split, merge, projection, pushdown, transform, and reorder. Through projection and pushdown it drops unnecessary data and columns early, which improves query performance.
Compiler
The compiler compiles the optimizer's output into a series of Map Reduce jobs. It converts Pig jobs into Map Reduce jobs automatically and optimizes performance by rearranging the execution order.
Execution Engine
The Map Reduce jobs are then submitted to the execution engine and run on the Hadoop platform to produce the desired results. You can use the DUMP statement to display the results on screen, or the STORE statement to store them in HDFS (Hadoop Distributed File System).
Execution Mode
Apache Pig runs in two execution modes: local and Map Reduce. The choice depends on where the data is stored and where you want to run the Pig script. Data can be stored locally (on a single machine) or in a distributed Hadoop cluster.
Local Mode: Use local mode if your dataset is small. In local mode, Pig runs in a single JVM using the local host and file system. Mappers cannot run in parallel across a cluster in this mode, because everything is installed and run on the localhost. Use the pig -x local command to select local mode.
Map Reduce Mode: Apache Pig uses Map Reduce mode by default. In this mode, the Pig Latin statements are executed on data that is already stored in HDFS. Use the pig -x mapreduce command to select Map Reduce mode.
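The projection step described under Optimizer can be illustrated with a small Python sketch. This is only an analogy, not Pig's optimizer: the helper functions, the sample records, and the field names are all hypothetical. It shows that dropping unused columns before later stages gives the same answer while carrying fewer fields forward.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def project(rows: List[Row], fields: List[str]) -> List[Row]:
    """Keep only the named fields of each row (projection)."""
    return [{f: r[f] for f in fields} for r in rows]


def select(rows: List[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    """Keep only the rows that satisfy the predicate (filter)."""
    return [r for r in rows if predicate(r)]


def group_count(rows: List[Row], key: str) -> Dict[object, int]:
    """Count rows per key value, standing in for a group followed by a count."""
    counts: Dict[object, int] = {}
    for r in rows:
        counts[r[key]] = counts.get(r[key], 0) + 1
    return counts


def fields_carried(rows: List[Row]) -> int:
    """Total number of field values handed to the next stage."""
    return sum(len(r) for r in rows)


def is_adult(row: Row) -> bool:
    return row["age"] >= 18


# Hypothetical sample data, invented for this illustration.
records: List[Row] = [
    {"name": "Raja", "age": 30, "city": "Delhi", "email": "raja@example.com"},
    {"name": "Mohammad", "age": 45, "city": "Pune", "email": "m@example.com"},
    {"name": "Asha", "age": 17, "city": "Delhi", "email": "a@example.com"},
]

# Plan as written: every column travels all the way to the grouping stage.
naive_input = select(records, is_adult)
naive = group_count(naive_input, "city")

# Plan after projection: only the columns the query needs are kept.
optimized_input = select(project(records, ["age", "city"]), is_adult)
optimized = group_count(optimized_input, "city")

naive_width = fields_carried(naive_input)
optimized_width = fields_carried(optimized_input)
print(naive == optimized, naive_width, optimized_width)
```

Both plans produce the same counts per city; the second simply hands less data to the grouping stage, which is the effect the optimizer aims for on a real cluster.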
  • 9.
  • 10. Apache Pig Components
Parser
Pig scripts are first handled by the parser. It checks the syntax of the script, performs type checking, and runs other miscellaneous checks. The output of the parser is a DAG (directed acyclic graph) that represents the Pig Latin statements and logical operators. In the DAG, the logical operators of the script are the nodes and the data flows are the edges.
Optimizer
The logical plan (DAG) is passed to the logical optimizer, which carries out logical optimizations such as projection and pushdown.
Compiler
The compiler compiles the optimized logical plan into a series of MapReduce jobs.
Execution engine
Finally, the MapReduce jobs are submitted to Hadoop in sorted order and executed there, producing the desired results.
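To make the DAG idea concrete, here is a minimal Python sketch of a logical plan with operators as nodes and data flows as edges. The operator names and the plan itself are assumptions chosen for illustration, and the ordering routine is a generic topological sort (Kahn's algorithm), not Pig's own planner.

```python
from collections import deque
from typing import Dict, List

# Hypothetical logical plan: each operator maps to the operators
# that consume its output (the edges are the data flows).
plan: Dict[str, List[str]] = {
    "LOAD": ["FILTER"],
    "FILTER": ["GROUP"],
    "GROUP": ["FOREACH"],
    "FOREACH": ["STORE"],
    "STORE": [],
}


def execution_order(dag: Dict[str, List[str]]) -> List[str]:
    """Return the operators so that each one appears after its inputs."""
    indegree = {node: 0 for node in dag}
    for consumers in dag.values():
        for consumer in consumers:
            indegree[consumer] += 1

    ready = deque(node for node, degree in indegree.items() if degree == 0)
    order: List[str] = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for consumer in dag[node]:
            indegree[consumer] -= 1
            if indegree[consumer] == 0:
                ready.append(consumer)

    if len(order) != len(dag):
        raise ValueError("plan contains a cycle, so it is not a DAG")
    return order


order = execution_order(plan)
print(" -> ".join(order))  # LOAD -> FILTER -> GROUP -> FOREACH -> STORE
```

The cycle check reflects why the plan must be acyclic: an operator that depended on its own output could never be scheduled.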
  • 11. Pig Latin Data Model  The data model of Pig Latin is fully nested and it allows complex non-atomic datatypes such as map and tuple. Given below is the diagrammatical representation of Pig Latin’s data model.
  • 12.
 Atom
Any single value in Pig Latin, irrespective of its data type, is known as an atom. It is stored as a string and can be used as a string or a number. int, long, float, double, chararray, and bytearray are the atomic types of Pig. A piece of data or a simple atomic value is known as a field. Example: ‘raja’ or ‘30’
 Tuple
A record formed by an ordered set of fields is known as a tuple; the fields can be of any type. A tuple is similar to a row in an RDBMS table. Example: (Raja, 30)
 Bag
A bag is an unordered collection of tuples, which need not be unique. Each tuple can have any number of fields (flexible schema). A bag is represented by ‘{}’. It is similar to a table in an RDBMS, but unlike an RDBMS table, the tuples need not contain the same number of fields, and fields in the same position (column) need not have the same type. Example: {(Raja, 30), (Mohammad, 45)}
A bag can be a field in a relation; in that context, it is known as an inner bag. Example: (Raja, 30, {(9848022338, raja@gmail.com)})
 Map
A map (or data map) is a set of key-value pairs. The key must be of type chararray and should be unique. The value can be of any type. A map is represented by ‘[]’. Example: [name#Raja, age#30]
 Relation
A relation is a bag of tuples. Relations in Pig Latin are unordered (there is no guarantee that tuples are processed in any particular order).
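As a rough analogy, the data model above can be mirrored with built-in Python types. This mapping is an assumption made for illustration only; Pig has its own implementations of tuples, bags, and maps. The extra tuple ("Asha",) is a hypothetical record added to show the flexible schema.

```python
from collections import Counter

# Atom: a single value such as a chararray or an int.
name = "Raja"
age = 30

# Tuple: an ordered set of fields.
person = ("Raja", 30)

# Bag: a collection of tuples. Duplicates are allowed, and tuples may
# have different numbers of fields. A list is used here only as a
# container; its order carries no meaning.
bag = [("Raja", 30), ("Mohammad", 45), ("Raja", 30), ("Asha",)]

# Map: key-value pairs with unique string keys and values of any type.
info = {"name": "Raja", "age": 30}

# Inner bag: a bag used as a field inside a tuple.
person_with_contacts = ("Raja", 30, [("9848022338", "raja@gmail.com")])


def same_bag(a, b):
    """Compare two bags while ignoring order but respecting duplicates."""
    return Counter(a) == Counter(b)


print(same_bag(bag, list(reversed(bag))))
print(person_with_contacts[2][0][1])
```

Comparing with Counter rather than list equality reflects the point that a bag, and therefore a relation, has no defined order.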
  • 13. Map Reduce vs. Apache Pig
Apache Pig | Map Reduce
Scripting language | Compiled language
Provides a higher level of abstraction | Provides a low level of abstraction
Requires a few lines of code (10 lines of code can summarize 200 lines of Map Reduce code) | Requires more extensive code (more lines of code)
Requires less development time and effort | Requires more development time and effort
Lesser code efficiency | Higher efficiency of code in comparison to Apache Pig
  • 14. Apache Pig Features
 Allows programmers to write fewer lines of code. Roughly 200 lines of Java code can be expressed in about ten lines of Pig Latin.
 Apache Pig's multi-query approach reduces development time.
 Apache Pig has a rich set of operators for performing operations like join, filter, sort, load, group, etc.
 Pig Latin is very similar to SQL. Programmers with good SQL knowledge find it easy to write Pig scripts.
 Apache Pig handles both structured and unstructured data analysis.
  • 15. Apache Pig Applications
 Processes large volumes of data
 Supports quick prototyping and ad-hoc queries across large datasets
 Performs data processing in search platforms
 Processes time-sensitive data loads
 Used by telecom companies to de-identify user call data