What’s in it for you?
1. Why Pig?
2. What is Pig?
3. MapReduce vs Hive vs Pig
4. Pig architecture
5. Working of Pig
6. Pig Latin data model
7. Pig Execution modes
8. Use case – Twitter
9. Features of Pig
Let’s get started with Pig!
Why Pig?
Hadoop uses MapReduce to analyze and process big data.
Before MapReduce: processing big data consumed more time.
After MapReduce: processing big data became faster.
Then what is the problem with MapReduce?
Prior to 2006, all MapReduce programs were written in Java.
• Non-programmers found it difficult to write lengthy Java code.
• They struggled to incorporate the map, sort, and reduce fundamentals of MapReduce when creating a program.
• Eventually, the code became difficult to maintain and optimize, which increased processing time.
MapReduce stages: map phase, then shuffle and sort, then reduce phase.
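To make the three stages concrete, here is a minimal single-machine sketch in Python. It is only an illustration of the map, shuffle and sort, and reduce idea using a word count; the function names are hypothetical and this is not how Hadoop itself is implemented.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_and_sort(pairs):
    """Shuffle and sort: collect the values of each key, then order the keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(grouped):
    """Reduce: collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in grouped}


def word_count(lines):
    """Run the three stages in order on an in-memory list of lines."""
    return reduce_phase(shuffle_and_sort(map_phase(lines)))
```

Even this toy version needs three separate functions for one simple count, which hints at why hand-written MapReduce jobs grow long.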
Problem: Yahoo struggled to process and analyze large datasets using Java because the code was complex and lengthy.
Necessity: An easier way was needed to analyze large datasets without writing time-consuming, complex Java code.
Solution:
• Apache Pig was developed by Yahoo researchers.
• It was developed with a vision to analyze and process large datasets without using complex Java code. Pig was developed especially for non-programmers.
• Pig uses simple steps to analyze datasets, which saves time.
What is Pig?
Pig is a scripting platform that runs on Hadoop clusters, designed to process and analyze large datasets.
• Uses SQL-like queries to analyze data
• Operates on structured, semi-structured, and unstructured data
MapReduce vs Hive vs Pig
MapReduce (compiled language)
• Need to write long, complex code
• Lower level of abstraction
• Can process structured, semi-structured, and unstructured data
Hive (SQL-like query language)
• No need to write complex code
• Higher level of abstraction
• Can process only structured data
Pig (scripting language)
• No need to write complex code
• Higher level of abstraction
• Can process structured, semi-structured, and unstructured data. This is the advantage Pig has over Hive.
MapReduce vs Hive vs Pig
MapReduce
• MapReduce programs are written in Java, or in Python through Hadoop Streaming
• Code performance is good
• Used by programmers
• Supports partitioning
Hive
• Uses a SQL-like query language known as HiveQL
• Code performance is lower than that of MapReduce and Pig
• Used by data analysts
• Supports partitioning
Pig
• Uses Pig Latin, a procedural data flow language
• Code performance is lower than that of MapReduce but better than that of Hive
• Used by researchers and programmers
• No concept of partitioning
Components of Pig
Pig has two components:
Pig Latin
• Pig Latin is the procedural data flow language used in Pig to analyze data.
• It is easy to program in Pig Latin because it is similar to SQL.
Runtime engine
• The runtime engine is the execution environment created to run Pig Latin programs.
• It is also a compiler that produces MapReduce programs.
• It uses HDFS for storing and retrieving data.
Pig architecture
• Pig Latin Scripts: Programmers write a script in Pig Latin to analyze data using Pig. There are 3 ways to execute the written Pig script.
• Grunt Shell: Pig’s interactive shell, which is used to execute Pig scripts.
• Pig Server: If the Pig script is written in a script file, the execution is done by the Pig Server.
• Parser: Checks the syntax of the Pig script. Its output is a DAG (Directed Acyclic Graph) that represents the logical plan.
• Optimizer: The DAG (logical plan) is passed to the logical optimizer, where optimizations take place.
• Compiler: Converts the optimized DAG into MapReduce jobs.
• Execution Engine: Executes the MapReduce jobs. The results are displayed using the “DUMP” statement and stored in HDFS using the “STORE” statement.
Flow: Pig Latin Scripts, then Grunt Shell or Pig Server, then Parser, Optimizer, Compiler, and Execution Engine, which runs on MapReduce and HDFS.
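The idea of a logical plan as a DAG can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the operator names and the `LOGICAL_PLAN` dictionary are hypothetical, and Pig's real parser and optimizer are far more involved. The sketch uses the standard-library `graphlib` module (Python 3.9+) to order operators so that each one runs after its inputs.

```python
from graphlib import TopologicalSorter

# Hypothetical logical plan: each operator maps to the operators it reads from.
LOGICAL_PLAN = {
    "load_users": set(),
    "load_tweets": set(),
    "cogroup": {"load_users", "load_tweets"},
    "count": {"cogroup"},
    "store": {"count"},
}


def execution_order(plan):
    """Return the operators in an order where every input comes first."""
    return list(TopologicalSorter(plan).static_order())


order = execution_order(LOGICAL_PLAN)
```

Because the graph is acyclic, such an order always exists; a compiler can then walk it and turn the operators into jobs.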
Working of Pig
1. Load data and write Pig script: the Pig Latin script is written by the user.
2. Pig operations: in this step, the parser, optimizer, and compiler process the script.
3. Execution of the plan: in this stage, the results are shown on the screen or stored in HDFS, as the code specifies.
Pig Latin data model
The data model of Pig Latin helps Pig handle various types of data.
• Atom: any single value of a primitive data type in Pig Latin, such as int, float, or string. It is stored as a string. Examples: ‘Rob’ or 50
• Tuple: a sequence of fields that can be of any data type. It is the same as a row in an RDBMS. Example: (Rob,5)
• Bag: a collection of tuples. It is the same as a table in an RDBMS and is represented by ‘{}’. Example: {(Rob,5),(Mike,10)}
• Map: a set of key-value pairs. The key is of chararray type and the value can be of any type. It is represented by ‘[]’. Example: [name#Mike,age#10]
Pig Latin has a fully nestable data model, which means one data type can be nested within another.
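As a rough analogy only, the four types can be mirrored with Python built-ins. This is a sketch to show the nesting idea, not Pig's actual representation; the variable names are made up for illustration.

```python
# Atom: a single value.
name = "Rob"
age = 50

# Tuple: an ordered sequence of fields, like one row.
row = ("Rob", 5)

# Bag: a collection of tuples. A list is used here because a bag
# may contain duplicate tuples.
bag = [("Rob", 5), ("Mike", 10)]

# Map: key-value pairs with string (chararray) keys and values of any type.
person = {"name": "Mike", "age": 10}

# Nesting: a tuple whose fields are an atom, a bag, and a map.
nested = ("Rob", bag, person)


def sum_second_field(tuples):
    """Add up the second field of every tuple in a bag."""
    return sum(t[1] for t in tuples)
```

For instance, `nested[1]` is the whole bag and `nested[2]["age"]` reaches a value inside the nested map.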
Pig Latin data model
Here is a diagrammatic representation of the Pig Latin data model:
Sl. no   Name   Age   Place
01       Jack   23    Goa
02       Bob    25    London
03       Joe    29    California
• Field: a single value in a row
• Tuple: a single row
• Bag: the whole set of rows
Pig Execution modes
Pig works in two execution modes, depending on where the data resides and where the Pig script is going to run.
Local Mode
• The Pig engine takes input from the local (Linux) file system, and the output is stored in the same file system.
• Local mode is useful for analyzing small datasets using Pig.
MapReduce Mode
• The Pig engine directly interacts with and executes in HDFS and MapReduce.
• Queries written in Pig Latin are translated into MapReduce jobs and run on a Hadoop cluster. Pig runs in this mode by default.
Pig Execution modes
There are three modes in Pig, depending on how Pig Latin code is written:
• Interactive Mode: the script is coded and executed line by line.
• Batch Mode: all statements are written in a file with the extension .pig, and the file is executed directly.
• Embedded Mode: Pig lets its users define their own functions (UDFs) in programming languages such as Java.
Use case – Twitter
Users on Twitter generate about 500 million tweets on a daily basis.
Hadoop MapReduce was used to process and analyze this data. For example, counting the number of tweets created by a user in the tweet table was done using MapReduce in Java.
It was difficult to perform MapReduce operations because users were not well versed in writing complex Java code.
The problems Twitter faced while analyzing datasets using MapReduce were:
• Joining datasets
• Sorting datasets
• Grouping datasets
These operations took more time on MapReduce because the Java code was lengthy and complex.
Twitter used Apache Pig to overcome these problems. Let’s see how.
Problem statement
Analyze the user table and tweet table and find out how many tweets are created by a person.
User Table
ID   Name
1    Alice
2    Tim
3    John
Tweet Table
ID   Tweet
1    Google...
2    Tennis...
1    Spacecraft...
3    Oscar...
1    Politics...
2    Olympics...
The following operations were performed for analyzing the given data.
First, the Twitter data is loaded into Pig using the LOAD command.
The remaining operations performed are shown below.
Join and group: the tweet and user tables are joined and grouped using the COGROUP command.
ID   Name    Tweet
1    Alice   Google...
1    Alice   Spacecraft...
1    Alice   Politics...
2    Tim     Tennis...
2    Tim     Olympics...
3    John    Oscar...
Aggregation: the tweets are counted per user. The command used is COUNT.
ID   Count
1    3
2    2
3    1
Join: the result of the count operation is joined with the user table to find out the user name.
ID   Name    Count
1    Alice   3
2    Tim     2
3    John    1
Pig reduces the complexity of operations that would have been lengthier using MapReduce.
Finally, we could find out the number of tweets created by a user in a simple way.
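The same steps can be traced on the sample tables with a small Python sketch. It only imitates the group, count, and join logic in memory; the function names are hypothetical, and in Pig these steps would be expressed with LOAD, COGROUP, and COUNT rather than Python code.

```python
from collections import defaultdict

# Sample data copied from the user table and tweet table.
users = [(1, "Alice"), (2, "Tim"), (3, "John")]
tweets = [
    (1, "Google..."), (2, "Tennis..."), (1, "Spacecraft..."),
    (3, "Oscar..."), (1, "Politics..."), (2, "Olympics..."),
]


def cogroup(left, right):
    """Group both relations by ID, keeping one bag of rows per input."""
    groups = defaultdict(lambda: ([], []))
    for row in left:
        groups[row[0]][0].append(row)
    for row in right:
        groups[row[0]][1].append(row)
    return dict(groups)


def tweets_per_user(user_rows, tweet_rows):
    """Count the tweets in each group and attach the user name."""
    result = []
    grouped = cogroup(user_rows, tweet_rows)
    for user_id in sorted(grouped):
        names, user_tweets = grouped[user_id]
        for _, name in names:
            result.append((user_id, name, len(user_tweets)))
    return result
```

Calling `tweets_per_user(users, tweets)` gives one row per user with the ID, name, and tweet count, matching the final table.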
Features of Pig
• Ease of programming: Pig Latin is similar to SQL, so fewer lines of code need to be written.
• Short development time, as the code is simpler.
• Handles all kinds of data: structured, semi-structured, and unstructured.
• Pig offers a large set of operators such as join, filter, and so on.
• Allows multiple queries to be processed in parallel.
• Optimization and compilation are easy, as they are done automatically and internally.
• Pig lets us create user-defined functions (UDFs).
Demo
Key Differences Between MapReduce, Hive and Pig

Hadoop at Yahoo! -- University TalksHadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University Talksyhadoop
 
Converging Big Data and Application Infrastructure by Steven Poutsy
Converging Big Data and Application Infrastructure by Steven PoutsyConverging Big Data and Application Infrastructure by Steven Poutsy
Converging Big Data and Application Infrastructure by Steven PoutsyBig Data Spain
 
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune Big-Data Hadoop Tutorials - MindScripts Technologies, Pune
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune amrutupre
 

Ähnlich wie Key Differences Between MapReduce, Hive and Pig (20)

Hadoop Online training from www. Imaginelife.in
Hadoop Online training from www. Imaginelife.inHadoop Online training from www. Imaginelife.in
Hadoop Online training from www. Imaginelife.in
 
Apache pig
Apache pigApache pig
Apache pig
 
PyData Frankfurt - (Efficient) Data Exchange with "Foreign" Ecosystems
PyData Frankfurt - (Efficient) Data Exchange with "Foreign" EcosystemsPyData Frankfurt - (Efficient) Data Exchange with "Foreign" Ecosystems
PyData Frankfurt - (Efficient) Data Exchange with "Foreign" Ecosystems
 
Apache PIG
Apache PIGApache PIG
Apache PIG
 
Introduction to pig.
Introduction to pig.Introduction to pig.
Introduction to pig.
 
Best hadoop-online-training
Best hadoop-online-trainingBest hadoop-online-training
Best hadoop-online-training
 
Data Science at Scale: Using Apache Spark for Data Science at Bitly
Data Science at Scale: Using Apache Spark for Data Science at BitlyData Science at Scale: Using Apache Spark for Data Science at Bitly
Data Science at Scale: Using Apache Spark for Data Science at Bitly
 
Ruby - The Hard Bits
Ruby - The Hard BitsRuby - The Hard Bits
Ruby - The Hard Bits
 
Unit V.pdf
Unit V.pdfUnit V.pdf
Unit V.pdf
 
FEC2017-Introduction-to-programming
FEC2017-Introduction-to-programmingFEC2017-Introduction-to-programming
FEC2017-Introduction-to-programming
 
Hadoop online training
Hadoop online trainingHadoop online training
Hadoop online training
 
JustEnoughDevOpsForDataScientists
JustEnoughDevOpsForDataScientistsJustEnoughDevOpsForDataScientists
JustEnoughDevOpsForDataScientists
 
Getting Started with Big Data in the Cloud
Getting Started with Big Data in the CloudGetting Started with Big Data in the Cloud
Getting Started with Big Data in the Cloud
 
From a student to an apache committer practice of apache io tdb
From a student to an apache committer  practice of apache io tdbFrom a student to an apache committer  practice of apache io tdb
From a student to an apache committer practice of apache io tdb
 
Hadoop online training in india
Hadoop online training  in indiaHadoop online training  in india
Hadoop online training in india
 
Hadoop @ Yahoo! - Internet Scale Data Processing
Hadoop @ Yahoo! - Internet Scale Data ProcessingHadoop @ Yahoo! - Internet Scale Data Processing
Hadoop @ Yahoo! - Internet Scale Data Processing
 
Elastic Data Analytics Platform @Datadog
Elastic Data Analytics Platform @DatadogElastic Data Analytics Platform @Datadog
Elastic Data Analytics Platform @Datadog
 
Hadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University TalksHadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University Talks
 
Converging Big Data and Application Infrastructure by Steven Poutsy
Converging Big Data and Application Infrastructure by Steven PoutsyConverging Big Data and Application Infrastructure by Steven Poutsy
Converging Big Data and Application Infrastructure by Steven Poutsy
 
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune Big-Data Hadoop Tutorials - MindScripts Technologies, Pune
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune
 

Mehr von Simplilearn

ChatGPT in Cybersecurity
ChatGPT in CybersecurityChatGPT in Cybersecurity
ChatGPT in CybersecuritySimplilearn
 
Whatis SQL Injection.pptx
Whatis SQL Injection.pptxWhatis SQL Injection.pptx
Whatis SQL Injection.pptxSimplilearn
 
Top 5 High Paying Cloud Computing Jobs in 2023
 Top 5 High Paying Cloud Computing Jobs in 2023  Top 5 High Paying Cloud Computing Jobs in 2023
Top 5 High Paying Cloud Computing Jobs in 2023 Simplilearn
 
Types Of Cloud Jobs In 2024
Types Of Cloud Jobs In 2024Types Of Cloud Jobs In 2024
Types Of Cloud Jobs In 2024Simplilearn
 
Top 12 AI Technologies To Learn 2024 | Top AI Technologies in 2024 | AI Trend...
Top 12 AI Technologies To Learn 2024 | Top AI Technologies in 2024 | AI Trend...Top 12 AI Technologies To Learn 2024 | Top AI Technologies in 2024 | AI Trend...
Top 12 AI Technologies To Learn 2024 | Top AI Technologies in 2024 | AI Trend...Simplilearn
 
What is LSTM ?| Long Short Term Memory Explained with Example | Deep Learning...
What is LSTM ?| Long Short Term Memory Explained with Example | Deep Learning...What is LSTM ?| Long Short Term Memory Explained with Example | Deep Learning...
What is LSTM ?| Long Short Term Memory Explained with Example | Deep Learning...Simplilearn
 
Top 10 Chat GPT Use Cases | ChatGPT Applications | ChatGPT Tutorial For Begin...
Top 10 Chat GPT Use Cases | ChatGPT Applications | ChatGPT Tutorial For Begin...Top 10 Chat GPT Use Cases | ChatGPT Applications | ChatGPT Tutorial For Begin...
Top 10 Chat GPT Use Cases | ChatGPT Applications | ChatGPT Tutorial For Begin...Simplilearn
 
React JS Vs Next JS - What's The Difference | Next JS Tutorial For Beginners ...
React JS Vs Next JS - What's The Difference | Next JS Tutorial For Beginners ...React JS Vs Next JS - What's The Difference | Next JS Tutorial For Beginners ...
React JS Vs Next JS - What's The Difference | Next JS Tutorial For Beginners ...Simplilearn
 
Backpropagation in Neural Networks | Back Propagation Algorithm with Examples...
Backpropagation in Neural Networks | Back Propagation Algorithm with Examples...Backpropagation in Neural Networks | Back Propagation Algorithm with Examples...
Backpropagation in Neural Networks | Back Propagation Algorithm with Examples...Simplilearn
 
How to Become a Business Analyst ?| Roadmap to Become Business Analyst | Simp...
How to Become a Business Analyst ?| Roadmap to Become Business Analyst | Simp...How to Become a Business Analyst ?| Roadmap to Become Business Analyst | Simp...
How to Become a Business Analyst ?| Roadmap to Become Business Analyst | Simp...Simplilearn
 
Career Opportunities In Artificial Intelligence 2023 | AI Job Opportunities |...
Career Opportunities In Artificial Intelligence 2023 | AI Job Opportunities |...Career Opportunities In Artificial Intelligence 2023 | AI Job Opportunities |...
Career Opportunities In Artificial Intelligence 2023 | AI Job Opportunities |...Simplilearn
 
Programming for Beginners | How to Start Coding in 2023? | Introduction to Pr...
Programming for Beginners | How to Start Coding in 2023? | Introduction to Pr...Programming for Beginners | How to Start Coding in 2023? | Introduction to Pr...
Programming for Beginners | How to Start Coding in 2023? | Introduction to Pr...Simplilearn
 
Best IDE for Programming in 2023 | Top 8 Programming IDE You Should Know | Si...
Best IDE for Programming in 2023 | Top 8 Programming IDE You Should Know | Si...Best IDE for Programming in 2023 | Top 8 Programming IDE You Should Know | Si...
Best IDE for Programming in 2023 | Top 8 Programming IDE You Should Know | Si...Simplilearn
 
React 18 Overview | React 18 New Features and Changes | React 18 Tutorial 202...
React 18 Overview | React 18 New Features and Changes | React 18 Tutorial 202...React 18 Overview | React 18 New Features and Changes | React 18 Tutorial 202...
React 18 Overview | React 18 New Features and Changes | React 18 Tutorial 202...Simplilearn
 
What Is Next JS ? | Introduction to Next JS | Basics of Next JS | Next JS Tut...
What Is Next JS ? | Introduction to Next JS | Basics of Next JS | Next JS Tut...What Is Next JS ? | Introduction to Next JS | Basics of Next JS | Next JS Tut...
What Is Next JS ? | Introduction to Next JS | Basics of Next JS | Next JS Tut...Simplilearn
 
How To Become an SEO Expert In 2023 | SEO Expert Tutorial | SEO For Beginners...
How To Become an SEO Expert In 2023 | SEO Expert Tutorial | SEO For Beginners...How To Become an SEO Expert In 2023 | SEO Expert Tutorial | SEO For Beginners...
How To Become an SEO Expert In 2023 | SEO Expert Tutorial | SEO For Beginners...Simplilearn
 
WordPress Tutorial for Beginners 2023 | What Is WordPress and How Does It Wor...
WordPress Tutorial for Beginners 2023 | What Is WordPress and How Does It Wor...WordPress Tutorial for Beginners 2023 | What Is WordPress and How Does It Wor...
WordPress Tutorial for Beginners 2023 | What Is WordPress and How Does It Wor...Simplilearn
 
Blogging For Beginners 2023 | How To Create A Blog | Blogging Tutorial | Simp...
Blogging For Beginners 2023 | How To Create A Blog | Blogging Tutorial | Simp...Blogging For Beginners 2023 | How To Create A Blog | Blogging Tutorial | Simp...
Blogging For Beginners 2023 | How To Create A Blog | Blogging Tutorial | Simp...Simplilearn
 
How To Start A Blog In 2023 | Pros And Cons Of Blogging | Blogging Tutorial |...
How To Start A Blog In 2023 | Pros And Cons Of Blogging | Blogging Tutorial |...How To Start A Blog In 2023 | Pros And Cons Of Blogging | Blogging Tutorial |...
How To Start A Blog In 2023 | Pros And Cons Of Blogging | Blogging Tutorial |...Simplilearn
 
How to Increase Website Traffic ? | 10 Ways To Increase Website Traffic in 20...
How to Increase Website Traffic ? | 10 Ways To Increase Website Traffic in 20...How to Increase Website Traffic ? | 10 Ways To Increase Website Traffic in 20...
How to Increase Website Traffic ? | 10 Ways To Increase Website Traffic in 20...Simplilearn
 

Mehr von Simplilearn (20)

ChatGPT in Cybersecurity
ChatGPT in CybersecurityChatGPT in Cybersecurity
ChatGPT in Cybersecurity
 
Whatis SQL Injection.pptx
Whatis SQL Injection.pptxWhatis SQL Injection.pptx
Whatis SQL Injection.pptx
 
Top 5 High Paying Cloud Computing Jobs in 2023
 Top 5 High Paying Cloud Computing Jobs in 2023  Top 5 High Paying Cloud Computing Jobs in 2023
Top 5 High Paying Cloud Computing Jobs in 2023
 
Types Of Cloud Jobs In 2024
Types Of Cloud Jobs In 2024Types Of Cloud Jobs In 2024
Types Of Cloud Jobs In 2024
 
Top 12 AI Technologies To Learn 2024 | Top AI Technologies in 2024 | AI Trend...
Top 12 AI Technologies To Learn 2024 | Top AI Technologies in 2024 | AI Trend...Top 12 AI Technologies To Learn 2024 | Top AI Technologies in 2024 | AI Trend...
Top 12 AI Technologies To Learn 2024 | Top AI Technologies in 2024 | AI Trend...
 
What is LSTM ?| Long Short Term Memory Explained with Example | Deep Learning...
What is LSTM ?| Long Short Term Memory Explained with Example | Deep Learning...What is LSTM ?| Long Short Term Memory Explained with Example | Deep Learning...
What is LSTM ?| Long Short Term Memory Explained with Example | Deep Learning...
 
Top 10 Chat GPT Use Cases | ChatGPT Applications | ChatGPT Tutorial For Begin...
Top 10 Chat GPT Use Cases | ChatGPT Applications | ChatGPT Tutorial For Begin...Top 10 Chat GPT Use Cases | ChatGPT Applications | ChatGPT Tutorial For Begin...
Top 10 Chat GPT Use Cases | ChatGPT Applications | ChatGPT Tutorial For Begin...
 
React JS Vs Next JS - What's The Difference | Next JS Tutorial For Beginners ...
React JS Vs Next JS - What's The Difference | Next JS Tutorial For Beginners ...React JS Vs Next JS - What's The Difference | Next JS Tutorial For Beginners ...
React JS Vs Next JS - What's The Difference | Next JS Tutorial For Beginners ...
 
Backpropagation in Neural Networks | Back Propagation Algorithm with Examples...
Backpropagation in Neural Networks | Back Propagation Algorithm with Examples...Backpropagation in Neural Networks | Back Propagation Algorithm with Examples...
Backpropagation in Neural Networks | Back Propagation Algorithm with Examples...
 
How to Become a Business Analyst ?| Roadmap to Become Business Analyst | Simp...
How to Become a Business Analyst ?| Roadmap to Become Business Analyst | Simp...How to Become a Business Analyst ?| Roadmap to Become Business Analyst | Simp...
How to Become a Business Analyst ?| Roadmap to Become Business Analyst | Simp...
 
Career Opportunities In Artificial Intelligence 2023 | AI Job Opportunities |...
Career Opportunities In Artificial Intelligence 2023 | AI Job Opportunities |...Career Opportunities In Artificial Intelligence 2023 | AI Job Opportunities |...
Career Opportunities In Artificial Intelligence 2023 | AI Job Opportunities |...
 
Programming for Beginners | How to Start Coding in 2023? | Introduction to Pr...
Programming for Beginners | How to Start Coding in 2023? | Introduction to Pr...Programming for Beginners | How to Start Coding in 2023? | Introduction to Pr...
Programming for Beginners | How to Start Coding in 2023? | Introduction to Pr...
 
Best IDE for Programming in 2023 | Top 8 Programming IDE You Should Know | Si...
Best IDE for Programming in 2023 | Top 8 Programming IDE You Should Know | Si...Best IDE for Programming in 2023 | Top 8 Programming IDE You Should Know | Si...
Best IDE for Programming in 2023 | Top 8 Programming IDE You Should Know | Si...
 
React 18 Overview | React 18 New Features and Changes | React 18 Tutorial 202...
React 18 Overview | React 18 New Features and Changes | React 18 Tutorial 202...React 18 Overview | React 18 New Features and Changes | React 18 Tutorial 202...
React 18 Overview | React 18 New Features and Changes | React 18 Tutorial 202...
 
What Is Next JS ? | Introduction to Next JS | Basics of Next JS | Next JS Tut...
What Is Next JS ? | Introduction to Next JS | Basics of Next JS | Next JS Tut...What Is Next JS ? | Introduction to Next JS | Basics of Next JS | Next JS Tut...
What Is Next JS ? | Introduction to Next JS | Basics of Next JS | Next JS Tut...
 
How To Become an SEO Expert In 2023 | SEO Expert Tutorial | SEO For Beginners...
How To Become an SEO Expert In 2023 | SEO Expert Tutorial | SEO For Beginners...How To Become an SEO Expert In 2023 | SEO Expert Tutorial | SEO For Beginners...
How To Become an SEO Expert In 2023 | SEO Expert Tutorial | SEO For Beginners...
 
WordPress Tutorial for Beginners 2023 | What Is WordPress and How Does It Wor...
WordPress Tutorial for Beginners 2023 | What Is WordPress and How Does It Wor...WordPress Tutorial for Beginners 2023 | What Is WordPress and How Does It Wor...
WordPress Tutorial for Beginners 2023 | What Is WordPress and How Does It Wor...
 
Blogging For Beginners 2023 | How To Create A Blog | Blogging Tutorial | Simp...
Blogging For Beginners 2023 | How To Create A Blog | Blogging Tutorial | Simp...Blogging For Beginners 2023 | How To Create A Blog | Blogging Tutorial | Simp...
Blogging For Beginners 2023 | How To Create A Blog | Blogging Tutorial | Simp...
 
How To Start A Blog In 2023 | Pros And Cons Of Blogging | Blogging Tutorial |...
How To Start A Blog In 2023 | Pros And Cons Of Blogging | Blogging Tutorial |...How To Start A Blog In 2023 | Pros And Cons Of Blogging | Blogging Tutorial |...
How To Start A Blog In 2023 | Pros And Cons Of Blogging | Blogging Tutorial |...
 
How to Increase Website Traffic ? | 10 Ways To Increase Website Traffic in 20...
How to Increase Website Traffic ? | 10 Ways To Increase Website Traffic in 20...How to Increase Website Traffic ? | 10 Ways To Increase Website Traffic in 20...
How to Increase Website Traffic ? | 10 Ways To Increase Website Traffic in 20...
 

Kürzlich hochgeladen

Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajanpragatimahajan3
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...PsychoTech Services
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhikauryashika82
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024Janet Corral
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Disha Kariya
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 

Kürzlich hochgeladen (20)

Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 

Key Differences Between MapReduce, Hive and Pig

  • 3. What’s in it for you? 1. Why Pig? 2. What is Pig? 3. MapReduce vs Hive vs Pig 4. Pig architecture 5. Working of Pig 6. Pig Latin data model 7. Pig Execution modes 8. Use case – Twitter 9. Features of Pig. Let’s get started with Pig!
  • 13. Why Pig? As we all know, Hadoop uses MapReduce to analyze and process big data. Before MapReduce, processing big data consumed more time. After MapReduce, processing big data was faster. Then what is the problem with MapReduce?
  • 18. Why Pig? Prior to 2006, all MapReduce programs were written in Java. Non-programmers found it difficult to write lengthy Java code. They had trouble fitting the three MapReduce fundamentals (map phase, shuffle and sort, reduce phase) into a program. Eventually the code became difficult to maintain and optimize, which increased processing time.
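The three phases named on this slide (map, shuffle and sort, reduce) can be sketched on a single machine in plain Python. This is only an illustration of the idea under simple assumptions, not Hadoop's Java API: the function names `map_phase`, `shuffle_and_sort`, `reduce_phase` and `word_count` are hypothetical, and the task (counting words) is chosen for the example rather than taken from the slides.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_and_sort(pairs):
    """Shuffle and sort: group values by key, then order the keys."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


def reduce_phase(grouped):
    """Reduce: collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in grouped}


def word_count(lines):
    """Chain the three phases, as a MapReduce job would."""
    return reduce_phase(shuffle_and_sort(map_phase(lines)))


if __name__ == "__main__":
    # Hypothetical sample input, for illustration only.
    sample = ["pig runs on hadoop", "hadoop uses mapreduce"]
    print(word_count(sample))
```

Even in this toy form, the programmer has to spell out every phase by hand. That is the burden the slide describes, and the Java equivalent on a real cluster is considerably longer.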
  • 22. Why Pig? Problem: Yahoo had trouble processing and analyzing large datasets in Java because the code was complex and lengthy. Necessity: an easier way was needed to analyze large datasets without time-consuming, complex Java code. Solution: Apache Pig was developed by Yahoo researchers, with the vision of analyzing and processing large datasets without complex Java code. Pig was developed especially for non-programmers, and it analyzes datasets in simple steps, which saves time.
  • 26. What is Pig? Pig is a scripting platform that runs on Hadoop clusters and is designed to process and analyze large datasets. It uses SQL-like queries to analyze data, and it operates on structured, semi-structured and unstructured data.
  • 30. MapReduce vs Hive vs Pig
      MapReduce: compiled language; lower level of abstraction; requires long, complex code; can process structured, semi-structured, and unstructured data.
      Hive: SQL-like query language; higher level of abstraction; no need to write complex code; can process only structured data.
      Pig: scripting language; higher level of abstraction; no need to write complex code; can process structured, semi-structured, and unstructured data.
  • 35. MapReduce vs Hive vs Pig: The ability to process semi-structured and unstructured data as well as structured data is the advantage Pig has over Hive.
  • 36. MapReduce vs Hive vs Pig
      MapReduce: uses Java and Python; code performance is good; used by programmers; supports partitioning.
      Hive: uses a SQL-like query language known as HiveQL; code performance is lower than that of MapReduce and Pig; used by data analysts; supports partitioning.
      Pig: uses Pig Latin, a procedural data flow language; code performance is lower than that of MapReduce but better than that of Hive; used by researchers and programmers; has no concept of partitioning.
  • 41. Components of Pig: Pig has two components, Pig Latin and the runtime engine.
  • 42. Components of Pig
      Pig Latin: the procedural data flow language used in Pig to analyze data. It is easy to program in Pig Latin because it is similar to SQL.
      Runtime engine: the execution environment created to run Pig Latin programs. It is also a compiler that produces MapReduce programs, and it uses HDFS for storing and retrieving data.
  • 45. Pig architecture: There are 3 ways to execute a written Pig script. Pig Latin Scripts: programmers write a script in Pig Latin to analyze data using Pig.
  • 46. Pig architecture, Grunt Shell: Grunt Shell is Pig’s interactive shell, which is used to execute Pig scripts.
  • 47. Pig architecture, Pig Server: If the Pig script is written in a script file, the execution is done by the Pig Server.
  • 48. Pig architecture, Parser: The Parser checks the syntax of the Pig script. Its output is a DAG (Directed Acyclic Graph) that represents the logical plan.
  • 49. Pig architecture, Optimizer: The DAG (logical plan) is passed to the logical Optimizer, where optimizations take place.
  • 50. Pig architecture, Compiler: The Compiler converts the DAG into MapReduce jobs.
  • 51. Pig architecture, Execution Engine: The MapReduce jobs are run by the Execution Engine. The results are displayed with the DUMP statement or stored in HDFS with the STORE statement.
  • 52. Pig architecture, overall flow: Pig Latin Scripts → Grunt Shell / Pig Server → Parser → Optimizer → Compiler → Execution Engine → MapReduce → HDFS.
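The parser, optimizer, and compiler stages described above can be illustrated with a deliberately tiny Python model. Everything here is an assumption made for illustration: the one-line `alias = OP source` grammar is not real Pig Latin, the only "optimization" is dropping aliases that the final result never uses, and the rule for splitting work into map and reduce phases is a simplification, not Pig's actual planner.

```python
SHUFFLE_OPS = {"GROUP", "ORDER"}  # toy assumption: these operators need a shuffle


def parse(script):
    """Parser: check each statement and build a DAG of alias -> (operator, source)."""
    dag = {}
    for raw in script.strip().splitlines():
        parts = raw.strip().rstrip(";").split()
        if len(parts) != 4 or parts[1] != "=":
            raise SyntaxError(f"expected 'alias = OP source', got {raw!r}")
        alias, _, op, source = parts
        op = op.upper()
        if alias in dag:
            # Forbidding redefinition keeps the graph acyclic in this toy model.
            raise SyntaxError(f"alias {alias!r} is already defined")
        if op == "LOAD":
            dag[alias] = (op, None)  # LOAD reads a file, so it has no parent alias
        elif source in dag:
            dag[alias] = (op, source)
        else:
            raise SyntaxError(f"unknown alias {source!r} in {raw!r}")
    return dag


def optimize(dag, target):
    """Optimizer: keep only the aliases that the target actually depends on."""
    plan, alias = {}, target
    while alias is not None:
        plan[alias] = dag[alias]
        alias = dag[alias][1]
    return plan


def compile_plan(plan, target):
    """Compiler: turn the chain of operators into a list of map/reduce jobs."""
    chain, alias = [], target
    while alias is not None:
        op, alias = plan[alias]
        chain.append(op)
    chain.reverse()
    jobs, current = [], {"map": [], "reduce": []}
    for op in chain:
        if op in SHUFFLE_OPS and current["reduce"]:
            jobs.append(current)  # a second shuffle starts a new job
            current = {"map": [], "reduce": []}
        phase = "reduce" if op in SHUFFLE_OPS or current["reduce"] else "map"
        current[phase].append(op)
    jobs.append(current)
    return jobs


SCRIPT = """
tweets = LOAD tweets.txt;
unused = FILTER tweets;
recent = FILTER tweets;
by_user = GROUP recent;
counts = FOREACH by_user;
"""

if __name__ == "__main__":
    plan = optimize(parse(SCRIPT), "counts")
    print(compile_plan(plan, "counts"))
```

The point of the sketch is the division of labour: syntax errors surface in `parse`, unused work disappears in `optimize`, and only `compile_plan` knows anything about map and reduce phases.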
  • 54. Working of Pig, step 1: Load data and write the Pig script. The Pig Latin script is written by the user.
  • 55. Working of Pig, step 2: Pig operations. In this step, the parser, optimizer, and compiler process the script.
  • 56. Working of Pig, step 3: Execution of the plan. In this stage, the results are shown on the screen or stored in HDFS, as the script specifies.
  • 58. Pig Latin data model: The data model of Pig Latin helps Pig handle various types of data.
  • 59. Pig Latin data model, Atom: An atom is any single value of a primitive data type in Pig Latin, such as int, float, or string. It is stored as a string. Examples: ‘Rob’ or 50.
  • 60. Pig Latin data model, Tuple: A tuple is a sequence of fields that can be of any data type. It is the same as a row in an RDBMS, that is, a set of data from a single row. Example: (Rob,5).
  • 61. Pig Latin data model, Bag: A bag is a collection of tuples, comparable to a table in an RDBMS. It is represented by ‘{}’. Example: {(Rob,5),(Mike,10)}.
  • 62. Pig Latin data model, Map: A map is a set of key-value pairs. The key is of chararray type and the value can be of any type. It is represented by ‘[]’. Example: [name#Mike, age#10].
  • 63. Pig Latin data model: Pig Latin has a fully nestable data model, which means one data type can be nested within another.
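As a rough analogy, the four kinds of value can be mirrored with built-in Python types. The mapping below (tuple for tuple, list for bag, dict for map, anything else for atom) and the helper name `pig_kind` are assumptions made for this sketch, not part of Pig, and the nested example values are invented.

```python
def pig_kind(value):
    """Name the Pig Latin kind that a Python value stands in for in this sketch."""
    if isinstance(value, dict):
        return "map"    # key-value pairs, written [key#value] in Pig Latin
    if isinstance(value, tuple):
        return "tuple"  # ordered fields, written (a,b)
    if isinstance(value, list):
        return "bag"    # collection of tuples, written {(a,b),(c,d)}
    return "atom"       # any single primitive value


atom = "Rob"
record = ("Rob", 5)
bag = [("Rob", 5), ("Mike", 10)]
mapping = {"name": "Mike", "age": 10}

# Nesting: a tuple whose fields are an atom, a bag, and a map.
nested = ("Mike", [("tennis", 2), ("chess", 1)], {"city": "Goa"})

if __name__ == "__main__":
    for value in (atom, record, bag, mapping, nested):
        print(pig_kind(value), value)
    print([pig_kind(field) for field in nested])
```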
  • 64. Pig Latin data model: Here is a diagrammatic representation of the Pig Latin data model.
      Sl. no   Name   Age   Place
      01       Jack   23    Goa
      02       Bob    25    London
      03       Joe    29    California
      Field: a single value in the table. Tuple: one row of the table. Bag: the collection of all rows.
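The same table can be written out in Python to show where field, tuple, and bag sit relative to each other. The class name `Person` and its attribute names are illustrative choices for this sketch; only the data values come from the slide.

```python
from typing import NamedTuple


class Person(NamedTuple):
    """One row of the slide's table; the attribute names are illustrative."""
    sl_no: str
    name: str
    age: int
    place: str


# Bag: the whole collection of rows.
people = [
    Person("01", "Jack", 23, "Goa"),
    Person("02", "Bob", 25, "London"),
    Person("03", "Joe", 29, "California"),
]

# Tuple: a single row taken from the bag.
first_row = people[0]

# Field: a single value taken from that row.
one_field = first_row.name

# Reading one field from every tuple in the bag.
ages = [person.age for person in people]

if __name__ == "__main__":
    print(first_row)
    print(one_field)
    print(ages)
```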
  • 69. Pig Execution modes: Pig works in two execution modes, depending on where the data resides and where the Pig script is going to run: Local Mode and MapReduce Mode.
  • 71. Pig Execution modes, Local Mode: The Pig engine takes its input from the Linux file system and stores the output in the same file system. Local Mode is useful for analyzing small datasets.
  • 72. Pig Execution modes, MapReduce Mode: The Pig engine interacts directly with HDFS and MapReduce. Queries written in Pig Latin are translated into MapReduce jobs and run on a Hadoop cluster. This is Pig’s default mode.
  • 73. Pig Execution modes: There are three modes in Pig, depending on how the Pig Latin code is written: Interactive Mode, Batch Mode, and Embedded Mode.
  • 75. Pig Execution modes, Interactive Mode: The script is coded and executed line by line.
  • 76. Pig Execution modes, Batch Mode: All statements are written in a file with the extension .pig, and the file is executed directly.
  • 77. Pig Execution modes, Embedded Mode: Pig lets its users define their own functions (UDFs) in programming languages such as Java.
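On the command line, the execution mode is chosen with the `pig` launcher's `-x` option (`-x local` or `-x mapreduce`); starting `pig` without a script file opens the Grunt shell for interactive use, and passing a script file runs it in batch. The small Python helper below only builds the argument list and does not launch anything; the helper name `pig_command` and the script name `tweet_count.pig` are hypothetical.

```python
def pig_command(exec_mode="mapreduce", script=None):
    """Build the argument list for the pig launcher.

    exec_mode: "local" (local file system) or "mapreduce" (Hadoop cluster).
    script:    path to a Pig Latin script for batch mode, or None to start
               the interactive Grunt shell.
    """
    if exec_mode not in ("local", "mapreduce"):
        raise ValueError(f"unsupported execution mode: {exec_mode!r}")
    argv = ["pig", "-x", exec_mode]
    if script is not None:
        argv.append(script)
    return argv


if __name__ == "__main__":
    # Interactive mode on local files: opens the Grunt shell.
    print(pig_command("local"))
    # Batch mode on the cluster, using a hypothetical script name.
    print(pig_command(script="tweet_count.pig"))
```

A list built this way could be handed to `subprocess.run` on a machine where Pig is installed.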
  • 79. Use case – Twitter: Users on Twitter generate about 500 million tweets a day.
  • 80. Use case – Twitter: Hadoop MapReduce was used to process and analyze this data. For example, counting the tweets created by each user in the tweet table was done with MapReduce programs written in Java.
  • 81. Use case – Twitter: These MapReduce operations were difficult to perform because users were not well versed in writing complex Java code.
  • 82. Use case – Twitter: The problems Twitter faced while analyzing datasets with MapReduce were joining datasets, sorting datasets, and grouping datasets. These operations took more time in MapReduce because the Java code was lengthy and complex. Twitter used Apache Pig to overcome these problems. Let’s see how.
  • 83. Use case – Twitter, problem statement: Analyze the user table and the tweet table and find out how many tweets each person has created.
  • 84. Use case – Twitter
      User table (ID, Name): (1, Alice), (2, Tim), (3, John)
      Tweet table (ID, Tweet): (1, Google...), (2, Tennis...), (1, Spacecraft...), (3, Oscar...), (1, Politics...), (2, Olympics...)
  • 85. Use case – Twitter: The following operations were performed to analyze the given data.
  • 86. Use case – Twitter: First, the Twitter data is loaded into Pig using the LOAD command.
  • 88. Use case – Twitter: In the join and group operation, the tweet and user tables are joined and grouped using the COGROUP command.
      Result (ID, Name, Tweet): (1, Alice, Google...), (1, Alice, Spacecraft...), (1, Alice, Politics...), (2, Tim, Tennis...), (2, Tim, Olympics...), (3, John, Oscar...)
  • 89. Use case – Twitter: The next operation is aggregation. The tweets are counted per user ID with the COUNT function.
      Result (ID, Count): (1, 3), (2, 2), (3, 1)
  • 90. Use case – Twitter: The result of the count operation is joined with the user table to find the user names.
      Result (ID, Name, Count): (1, Alice, 3), (2, Tim, 2), (3, John, 1)
  • 91. Use case – Twitter: Pig reduces the complexity of these operations, which would have been lengthier in MapReduce.
  • 92. Use case – Twitter: Finally, the number of tweets created by each user is found in a simple way.
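The sequence of operations in this use case (load, cogroup, count, join) can be traced in plain Python on the slide's own sample data. This is an in-memory imitation meant to show what each step produces; it is not Pig Latin, and the function names are chosen for this sketch.

```python
from collections import defaultdict

# Step 1 (LOAD): the two tables from the slides, as lists of tuples.
users = [(1, "Alice"), (2, "Tim"), (3, "John")]
tweets = [
    (1, "Google..."), (2, "Tennis..."), (1, "Spacecraft..."),
    (3, "Oscar..."), (1, "Politics..."), (2, "Olympics..."),
]


def cogroup(tweet_rows, user_rows):
    """Step 2 (COGROUP): for each ID, collect its tweet rows and its user rows."""
    groups = defaultdict(lambda: ([], []))
    for row in tweet_rows:
        groups[row[0]][0].append(row)
    for row in user_rows:
        groups[row[0]][1].append(row)
    return dict(sorted(groups.items()))


def count_tweets(grouped):
    """Step 3 (COUNT): number of tweet rows in each group."""
    return {uid: len(tweet_rows) for uid, (tweet_rows, _) in grouped.items()}


def join_with_users(counts, user_rows):
    """Step 4 (JOIN): attach the user name to each count."""
    return [(uid, name, counts[uid]) for uid, name in user_rows if uid in counts]


if __name__ == "__main__":
    grouped = cogroup(tweets, users)
    counts = count_tweets(grouped)
    print(counts)
    print(join_with_users(counts, users))
```

Running the steps on the sample data reproduces the counts shown on the slides: three tweets for Alice, two for Tim, and one for John.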
  • 93. Features of Pig
      Ease of programming: Pig Latin is similar to SQL, so fewer lines of code need to be written.
      Short development time, as the code is simpler.
      Handles all kinds of data: structured, semi-structured, and unstructured.
      Pig lets us create user-defined functions (UDFs).
      Pig offers a large set of operators such as join, filter, and so on.
      Allows multiple queries to be processed in parallel.
      Optimization and compilation are easy, as they are done automatically and internally.
  • 100. Demo
