2. Accumulator
A = load 'clicks';
B = group A by user;
C = foreach B { C1 = order A by timestamp; generate user, sessionize(C1); }
Many aggregate operations cannot use the combiner, but they also do not need all records for a single key in memory at once.
New in 0.6: the Accumulator interface, which can be implemented by UDFs. Pig calls accumulate() multiple times with partial lists of tuples, then calls getValue() when the key changes.
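The accumulate-per-batch, getValue-on-key-change contract can be sketched in plain Java. This is an illustrative stand-in only: the real interface is org.apache.pig.Accumulator and operates on Pig Tuples, whereas the types below are simplified assumptions.

```java
import java.util.List;

// Simplified stand-in for Pig 0.6's Accumulator contract (the real interface
// is org.apache.pig.Accumulator and takes a Tuple wrapping a partial bag).
interface Accumulator<T> {
    void accumulate(List<Long> partialBatch); // called repeatedly for one key
    T getValue();                             // called when the key changes
    void cleanup();                           // reset state for the next key
}

// Example: a SUM that never needs the whole bag for a key in memory at once.
class LongSum implements Accumulator<Long> {
    private long sum = 0;

    public void accumulate(List<Long> partialBatch) {
        for (long v : partialBatch) sum += v;
    }
    public Long getValue() { return sum; }
    public void cleanup() { sum = 0; }
}
```

The point of the interface is the calling pattern: several accumulate() calls per key, one getValue(), then cleanup() before the next key.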
3. Also in 0.6: UDFContext, which allows UDFs to pass information from the frontend to the backend and to access the JobConf. There was also a lot of work on the memory manager to reduce the number of GC overhead and out-of-memory errors.
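The idea behind UDFContext is a per-UDF-class property bag that Pig serializes into the job configuration, so values set on the frontend are visible on the backend. Here is a minimal plain-Java sketch of that idea; the class and method names mirror the real org.apache.pig.impl.util.UDFContext, but the JobConf serialization step is omitted and the implementation is an assumption for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Illustrative stand-in for Pig's UDFContext: one Properties bag per UDF
// class. In real Pig the bag is serialized into the job configuration on
// the frontend and deserialized on the backend; that step is omitted here.
class MiniUDFContext {
    private static final MiniUDFContext INSTANCE = new MiniUDFContext();
    private final Map<Class<?>, Properties> props = new HashMap<>();

    static MiniUDFContext getUDFContext() { return INSTANCE; }

    // Each UDF keys its settings by its own class, so multiple UDFs in one
    // script do not clobber each other's properties.
    Properties getUDFProperties(Class<?> udfClass) {
        return props.computeIfAbsent(udfClass, k -> new Properties());
    }
}
```

A UDF would store settings in its frontend methods and read them back in its backend (getNext/exec) methods through the same lookup.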
4. New Load and Store Interfaces
0.6 and before:
- Want to write a LoadFunc that works on files and uses standard splits? Easy.
- Want to write a LoadFunc that works on something other than files, or uses non-standard splits? Hard; you have to write a Slicer (which mostly duplicates Hadoop's InputFormat).
- Want to write a StoreFunc that works on something other than files? Sorry.
0.7:
- LoadFunc now sits atop InputFormat, so if you have an InputFormat for your data, writing a LoadFunc is easy.
- StoreFunc now sits atop OutputFormat, …
Not backward compatible; will require a rewrite of custom LoadFuncs and StoreFuncs.
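The benefit of the 0.7 design is that a LoadFunc mostly wires an InputFormat's record reader to Pig's getNext(), instead of re-implementing splitting in a Slicer. The toy types below are simplified assumptions, not the real Hadoop or Pig interfaces; they only show the delegation shape.

```java
import java.util.Iterator;
import java.util.List;

// Toy stand-ins for Hadoop's RecordReader/InputFormat (heavily simplified).
interface MiniRecordReader { boolean nextKeyValue(); String getCurrentValue(); }
interface MiniInputFormat { MiniRecordReader createRecordReader(); }

// In the 0.7 model, a LoadFunc delegates record iteration to the reader
// supplied by an InputFormat rather than doing its own split handling.
class MiniLoadFunc {
    private MiniRecordReader reader;

    void prepareToRead(MiniInputFormat inputFormat) {
        this.reader = inputFormat.createRecordReader();
    }
    // Pig calls getNext() until it returns null (end of the split).
    String getNext() {
        return reader.nextKeyValue() ? reader.getCurrentValue() : null;
    }
}

// An in-memory "InputFormat" so the sketch runs without Hadoop.
class ListInputFormat implements MiniInputFormat {
    private final List<String> lines;
    ListInputFormat(List<String> lines) { this.lines = lines; }
    public MiniRecordReader createRecordReader() {
        Iterator<String> it = lines.iterator();
        return new MiniRecordReader() {
            private String current;
            public boolean nextKeyValue() {
                if (!it.hasNext()) return false;
                current = it.next();
                return true;
            }
            public String getCurrentValue() { return current; }
        };
    }
}
```

With this split of responsibilities, any data source that already has an InputFormat gets a LoadFunc almost for free, which is the slide's point.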
5. Also in 0.7
- Moved local mode to Hadoop's LocalJobRunner; this means the debugging environment is much closer to the runtime environment.
- More aggressive use of Hadoop's distributed cache for features such as replicated join and order by.
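The distributed-cache use for replicated (fragment-replicate) join can be sketched as follows: the small relation is shipped to every task and loaded into an in-memory map, and the large relation is streamed against it in the map phase, so no reduce is needed. All names and types here are illustrative assumptions, not Pig's implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the replicated-join idea: hash the small (cached) relation in
// memory, then probe it once per record of the streamed large relation.
class ReplicatedJoin {
    static List<String> join(List<String[]> big, List<String[]> small) {
        // Build the lookup table from the small relation (key -> value),
        // as if it had been read from the distributed cache on this task.
        Map<String, String> cache = new HashMap<>();
        for (String[] row : small) cache.put(row[0], row[1]);

        // Stream the big relation, emitting joined rows on key matches.
        List<String> out = new ArrayList<>();
        for (String[] row : big) {
            String match = cache.get(row[0]);
            if (match != null) out.add(row[0] + "," + row[1] + "," + match);
        }
        return out;
    }
}
```

This only works when one side fits in task memory, which is exactly the case replicated join targets.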
6. What We Are Working On Now
- Runtime statistics: track what features your script used, how many records it processed, etc. Results are stored in Pig logs and job history files.
- Adding UDFs in scripting languages (Python initially) - PIG-928
- Allow users to set a custom partitioner in some cases - PIG-282
- Make Pig available in Maven repositories - PIG-1334
- Label interfaces for audience and stability - PIG-1311; part of Hadoop's compatibility plan, see the following blog post: http://bit.ly/9yRDlH