Apache Pig

• Pig is part of the Hadoop ecosystem and is used to process large datasets.
• It is used to automate ETL for unstructured data.
• It is a procedural language.
• It is used by both data analysts and developers.
• The language used here is called Pig Latin.
• Pig relations do not persist across sessions.

• Local Mode:
• In local mode, Pig reads all input data from the local file system and writes its output back to the local file system.
• This mode is suitable only for small datasets and for trying out Pig.
• To start Pig in local mode, use the following command: pig -x local

• MapReduce Mode:
• In this mode, Pig takes its input from HDFS paths only and, after processing the data, writes the output files to HDFS.
• In MapReduce mode, Pig translates queries into MapReduce jobs and runs them on a Hadoop cluster.
• This is the default mode of execution.
• To start Pig in MapReduce mode, use the following command: pig -x mapreduce

• Tez Mode:
• Tez mode is more flexible and faster than MapReduce mode, and avoids some of its performance issues.
• To run Pig in Tez mode, you need access to a Hadoop cluster and an HDFS installation.
• To start Pig in Tez mode, use the following command: pig -x tez

• Use the LOAD operator to load data into a relation.
• Use the STORE operator to store data in another location.
• Use the DUMP operator to display results on your terminal screen.
• Use the DESCRIBE operator to review the schema of a relation.
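
A minimal Pig Latin sketch of the four operators above. The file name students.txt, its tab-delimited layout, the field names, and the output directory students_out are all hypothetical, chosen only for illustration.

```pig
-- LOAD: read a hypothetical tab-delimited file into a relation and declare a schema.
students = LOAD 'students.txt' USING PigStorage('\t')
           AS (name:chararray, age:int, city:chararray);

-- DESCRIBE: print the schema of the relation.
DESCRIBE students;

-- DUMP: run the pipeline and print the tuples to the terminal.
DUMP students;

-- STORE: run the pipeline and write the tuples, comma-separated,
-- to a hypothetical output directory.
STORE students INTO 'students_out' USING PigStorage(',');
```

In local mode the paths refer to the local file system, and in MapReduce mode to HDFS. STORE fails if the output directory already exists, so the target should be a new location.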

• If column names are not specified when loading data, the columns can be referenced by position, starting from $0: $0, $1, $2, ….
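
Positional references might be used as in the sketch below. The file students.txt and the names given with AS are hypothetical, and FOREACH ... GENERATE is used here only as one way to show the positions in action.

```pig
-- No AS clause on LOAD, so the fields have no names (and default to bytearray).
raw = LOAD 'students.txt' USING PigStorage('\t');

-- Refer to the first two fields by position and name them in the output.
named = FOREACH raw GENERATE $0 AS name, $1 AS age;

DUMP named;
```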
