PIG
Mike Unwin
Twitter: @mjunwin
Why are we talking about Pig?

Originally developed at Yahoo!, now an Apache project

Engine for executing data flows in parallel on Hadoop

Includes a language called Pig Latin for expressing data flows

Easy to learn and extensible

Open source
What is a data flow language?

Allows us to describe how data should be loaded, read, processed and stored

Can be simple linear flows, e.g. word count

Or complex workflows that include joins
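
As a rough illustration of a simple linear flow, the word count mentioned above can be sketched as a chain of load, process and store steps. This is plain Python standing in for Pig Latin, and the step names and the sample input are hypothetical; it only mimics the shape of a data flow and is not how Pig itself is implemented.

```python
from collections import Counter

def load(lines):
    """Load step: yield one record (a line of text) at a time."""
    for line in lines:
        yield line.rstrip("\n")

def tokenize(records):
    """Process step: split each record into lower-cased words."""
    for record in records:
        for word in record.lower().split():
            yield word

def count_words(words):
    """Group-and-count step: one total per distinct word."""
    return Counter(words)

def store(counts):
    """Store step: return sorted (word, count) rows ready to be written out."""
    return sorted(counts.items())

# Hypothetical input, standing in for a file on HDFS.
sample = ["the pig eats", "the pig sleeps"]

# Each step feeds the next, so the code reads in the order the data moves.
word_counts = store(count_words(tokenize(load(sample))))
print(word_counts)
```

Each function consumes the output of the previous one, which is the property a data flow language makes explicit.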
Is it like SQL?

Pig Latin does look a bit like SQL, e.g. JOIN, GROUP BY

But SQL is declarative: you state what result you want, not how to compute it

In Pig you describe how the data flows, step by step

In SQL you end up writing an inside-out query, whereas with Pig you describe a pipeline
SQL example
SELECT c.CustomerName, t.TotalOrders, c.PostCode
FROM Customers c
INNER JOIN
(
    SELECT CustomerId, COUNT(OrderId) AS TotalOrders
    FROM Orders
    GROUP BY CustomerId
) AS t ON t.CustomerId = c.CustomerId;
Same Query in Pig
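-- Load the orders and count them per customer
orders = load 'Orders' as (CustomerId, OrderId);
grouped = group orders by CustomerId;
total = foreach grouped generate group as CustomerId, COUNT(orders.OrderId) as TotalOrders;

-- Load the customers and join them to their order counts
customer = load 'Customers' as (CustomerId, CustomerName);
result = join total by CustomerId, customer by CustomerId;
dump result;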
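
To see what each step of this pipeline produces, the same flow can be emulated on a few in-memory rows. This is a sketch in plain Python, not Pig; the sample rows and customer names are made up for illustration, and the variable names simply mirror the aliases in the script.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for the 'Orders' and 'Customers' inputs.
orders = [(1, 100), (1, 101), (2, 102)]               # (CustomerId, OrderId)
customers = [(1, "Alice"), (2, "Bob"), (3, "Carol")]  # (CustomerId, CustomerName)

# group: collect the order ids that share a CustomerId.
grouped = defaultdict(list)
for customer_id, order_id in orders:
    grouped[customer_id].append(order_id)

# foreach ... generate: one (CustomerId, order count) pair per group.
total = {customer_id: len(order_ids) for customer_id, order_ids in grouped.items()}

# join: inner join of the totals with the customers on the customer id.
result = [
    (customer_id, total[customer_id], name)
    for customer_id, name in customers
    if customer_id in total
]

# dump: print the final relation.
print(result)
```

Carol has no orders, so the inner join drops her, just as the SQL INNER JOIN would.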
Installing Pig


http://pig.apache.org/docs/r0.11.1/



Requires Java

Requires Hadoop (Pig ships with a built-in version of Hadoop, currently v0.20.2)

Requires Cygwin on Windows
What do you get?

Pig

Grunt Shell

Piggy Bank
Basic Pig Operators


FOREACH



FILTER



GROUP BY



ORDER BY



UNION



CROSS
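
As a rough guide to what these operators compute, each one is mimicked below on small in-memory lists. This is plain Python rather than Pig Latin, the sample relations are hypothetical, and real Pig runs these operations in parallel on Hadoop instead of in a single process.

```python
from itertools import product

# Hypothetical relations of (name, age) tuples.
people = [("ann", 31), ("bob", 17), ("cat", 45)]
more_people = [("dan", 22)]

# FOREACH ... GENERATE: apply an expression to every row.
names = [name.upper() for name, _age in people]

# FILTER ... BY: keep only the rows that match a predicate.
adults = [row for row in people if row[1] >= 18]

# GROUP ... BY: collect the rows that share a key.
by_adult = {}
for row in people:
    by_adult.setdefault(row[1] >= 18, []).append(row)

# ORDER ... BY: sort the rows on a field.
by_age = sorted(people, key=lambda row: row[1])

# UNION: combine two relations (duplicates are kept).
everyone = people + more_people

# CROSS: every pairing of rows from two relations.
pairs = list(product(more_people, adults))

print(names, adults, by_adult, by_age, everyone, pairs, sep="\n")
```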
Same Query in Pig
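-- Load the orders and count them per customer
orders = load 'Orders' as (CustomerId, OrderId);
grouped = group orders by CustomerId;
total = foreach grouped generate group as CustomerId, COUNT(orders.OrderId) as TotalOrders;

-- Load the customers and join them to their order counts
customer = load 'Customers' as (CustomerId, CustomerName);
result = join total by CustomerId, customer by CustomerId;
dump result;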
Debugging

DESCRIBE: prints the schema of a relation

EXPLAIN: prints the logical, physical and MapReduce execution plans
How does Pig become a MR job?
Advantages of Pig


Easy to learn



Can achieve a lot with a small amount of code, e.g. the join example

Well-written scripts can be easy to read and easy to maintain



Has a local mode for testing scripts



Has a unit testing framework
Limitations of Pig


Unit testing



High level: you often need to drop down into custom UDFs

If you are proficient in C# or F#, writing that logic there can sometimes be easier to test, e.g. code run through streaming can be unit tested

Still doesn't play nicely in a Windows environment
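
The point about streaming code being easier to unit test can be sketched as follows. Python stands in here for C# or F# purely for illustration, and the function names and the tab-separated record layout are assumptions rather than anything the slides prescribe. The idea is to keep the per-record logic in a pure function, so it can be tested without a cluster.

```python
import io

def transform_line(line):
    """Upper-case the second tab-separated field of one record."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) > 1:
        fields[1] = fields[1].upper()
    return "\t".join(fields)

def run(stream_in, stream_out):
    """Streaming loop: read records line by line, write transformed records."""
    for line in stream_in:
        stream_out.write(transform_line(line) + "\n")

# In a real streaming job these would be sys.stdin and sys.stdout;
# in-memory streams make the same code testable on a developer machine.
fake_in = io.StringIO("1\talice\n2\tbob\n")
fake_out = io.StringIO()
run(fake_in, fake_out)
print(fake_out.getvalue(), end="")
```

Because `transform_line` has no dependency on Hadoop or Pig, an ordinary unit test can call it directly.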
http://elastastorage.blob.core.windows.net/hdinsight/PigOnHDInsight.pdf

Introduction to Pig