SlideShare a Scribd company logo
1 of 32
Pig at Linkedin
Chris Riccomini
9/29/10
Who?
What?
LinkedIn Analytics
Pig at LinkedIn
Why?
Production Quality
Streaming
Serialization
VoldemortStorage ~ Avro
views = LOAD '/data/awesome' USING VoldemortStorage();
Voldemort ♥ Pig
Partitioning
YYYY/MM/DD
Last N days?
views = LOAD '/data/etl/tracking/extracted/profile-view' USING
VoldemortStorage('date.range', 'num.days=90;days.ago=1’)
Some-file-YYYY-MM-DD
member_position = LOAD '/data/etl/replicated/member/member_position/#LATEST'
USING VoldemortStorage()
Scheduling
Azkaban
type=pig
pig.script=myscript.pig
Ad hoc?
Future at LinkedIn
Wishes
Dates
Fix Data Types
JSON
Cross Platform
Questions?
• criccomini@linkedin.com
• http://www.riccomini.name
• http://www.sna-projects.com
• http://www.project-voldemort.com
• @criccomini
• LinkedIn is Hiring! Email me!

More Related Content

Similar to Pig at Linkedin

Economies of Scaling Software
Economies of Scaling SoftwareEconomies of Scaling Software
Economies of Scaling SoftwareJoshua Long
 
Solving performance issues in Django ORM
Solving performance issues in Django ORMSolving performance issues in Django ORM
Solving performance issues in Django ORMSian Lerk Lau
 
Brandfolder - JSON + Postgres
Brandfolder - JSON + PostgresBrandfolder - JSON + Postgres
Brandfolder - JSON + PostgresBrandfolder
 
hackcon2013-Dirty Little Secrets They Didn't Teach You In Pentesting Class v2
hackcon2013-Dirty Little Secrets They Didn't Teach You In Pentesting Class v2hackcon2013-Dirty Little Secrets They Didn't Teach You In Pentesting Class v2
hackcon2013-Dirty Little Secrets They Didn't Teach You In Pentesting Class v2Chris Gates
 
Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AW...
Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AW...Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AW...
Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AW...Amazon Web Services
 
Building Awesome API with Spring
Building Awesome API with SpringBuilding Awesome API with Spring
Building Awesome API with SpringVladimir Tsukur
 
D3.js: Data Visualization for the Web
D3.js: Data Visualization for the Web D3.js: Data Visualization for the Web
D3.js: Data Visualization for the Web Outliers Collective
 
Why Visibility into Your Stack Matters
Why Visibility into Your Stack MattersWhy Visibility into Your Stack Matters
Why Visibility into Your Stack MattersAmazon Web Services
 
Scalable data structures for data science
Scalable data structures for data scienceScalable data structures for data science
Scalable data structures for data scienceTuri, Inc.
 
apidays LIVE Australia 2021 - API Horror Stories from an Unnamed Coworking Co...
apidays LIVE Australia 2021 - API Horror Stories from an Unnamed Coworking Co...apidays LIVE Australia 2021 - API Horror Stories from an Unnamed Coworking Co...
apidays LIVE Australia 2021 - API Horror Stories from an Unnamed Coworking Co...apidays
 
(BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale...
(BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale...(BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale...
(BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale...Amazon Web Services
 
GAB 2019 - Graph as a data store
GAB 2019 - Graph as a data storeGAB 2019 - Graph as a data store
GAB 2019 - Graph as a data storeAlberto Diaz Martin
 
Empowering red and blue teams with osint c0c0n 2017
Empowering red and blue teams with osint   c0c0n 2017Empowering red and blue teams with osint   c0c0n 2017
Empowering red and blue teams with osint c0c0n 2017reconvillage
 
Bsides 2019 - Intelligent Threat Hunting
Bsides 2019 - Intelligent Threat HuntingBsides 2019 - Intelligent Threat Hunting
Bsides 2019 - Intelligent Threat HuntingDhruv Majumdar
 
Application Security for Rich Internet Applicationss (Jfokus 2012)
Application Security for Rich Internet Applicationss (Jfokus 2012)Application Security for Rich Internet Applicationss (Jfokus 2012)
Application Security for Rich Internet Applicationss (Jfokus 2012)johnwilander
 
BrazilJS Perf Doctor Talk
BrazilJS Perf Doctor TalkBrazilJS Perf Doctor Talk
BrazilJS Perf Doctor TalkJosh Holmes
 

Similar to Pig at Linkedin (20)

Economies of Scaling Software
Economies of Scaling SoftwareEconomies of Scaling Software
Economies of Scaling Software
 
Solving performance issues in Django ORM
Solving performance issues in Django ORMSolving performance issues in Django ORM
Solving performance issues in Django ORM
 
Brandfolder - JSON + Postgres
Brandfolder - JSON + PostgresBrandfolder - JSON + Postgres
Brandfolder - JSON + Postgres
 
hackcon2013-Dirty Little Secrets They Didn't Teach You In Pentesting Class v2
hackcon2013-Dirty Little Secrets They Didn't Teach You In Pentesting Class v2hackcon2013-Dirty Little Secrets They Didn't Teach You In Pentesting Class v2
hackcon2013-Dirty Little Secrets They Didn't Teach You In Pentesting Class v2
 
Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AW...
Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AW...Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AW...
Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AW...
 
Building Awesome API with Spring
Building Awesome API with SpringBuilding Awesome API with Spring
Building Awesome API with Spring
 
D3.js: Data Visualization for the Web
D3.js: Data Visualization for the Web D3.js: Data Visualization for the Web
D3.js: Data Visualization for the Web
 
Tactical Information Gathering
Tactical Information GatheringTactical Information Gathering
Tactical Information Gathering
 
Threat Modeling Using STRIDE
Threat Modeling Using STRIDEThreat Modeling Using STRIDE
Threat Modeling Using STRIDE
 
Why Visibility into Your Stack Matters
Why Visibility into Your Stack MattersWhy Visibility into Your Stack Matters
Why Visibility into Your Stack Matters
 
Scalable data structures for data science
Scalable data structures for data scienceScalable data structures for data science
Scalable data structures for data science
 
apidays LIVE Australia 2021 - API Horror Stories from an Unnamed Coworking Co...
apidays LIVE Australia 2021 - API Horror Stories from an Unnamed Coworking Co...apidays LIVE Australia 2021 - API Horror Stories from an Unnamed Coworking Co...
apidays LIVE Australia 2021 - API Horror Stories from an Unnamed Coworking Co...
 
(BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale...
(BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale...(BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale...
(BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale...
 
GAB 2019 - Graph as a data store
GAB 2019 - Graph as a data storeGAB 2019 - Graph as a data store
GAB 2019 - Graph as a data store
 
Empowering red and blue teams with osint c0c0n 2017
Empowering red and blue teams with osint   c0c0n 2017Empowering red and blue teams with osint   c0c0n 2017
Empowering red and blue teams with osint c0c0n 2017
 
Intro GraphQL
Intro GraphQLIntro GraphQL
Intro GraphQL
 
Bsides 2019 - Intelligent Threat Hunting
Bsides 2019 - Intelligent Threat HuntingBsides 2019 - Intelligent Threat Hunting
Bsides 2019 - Intelligent Threat Hunting
 
Application Security for Rich Internet Applicationss (Jfokus 2012)
Application Security for Rich Internet Applicationss (Jfokus 2012)Application Security for Rich Internet Applicationss (Jfokus 2012)
Application Security for Rich Internet Applicationss (Jfokus 2012)
 
BrazilJS Perf Doctor Talk
BrazilJS Perf Doctor TalkBrazilJS Perf Doctor Talk
BrazilJS Perf Doctor Talk
 
Threatcrowd
ThreatcrowdThreatcrowd
Threatcrowd
 

More from Hadoop User Group

Karmasphere hadoop-productivity-tools
Karmasphere hadoop-productivity-toolsKarmasphere hadoop-productivity-tools
Karmasphere hadoop-productivity-toolsHadoop User Group
 
Building a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with HadoopBuilding a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with HadoopHadoop User Group
 
HUG August 2010: Best practices
HUG August 2010: Best practicesHUG August 2010: Best practices
HUG August 2010: Best practicesHadoop User Group
 
2 hadoop@e bay-hug-2010-07-21
2 hadoop@e bay-hug-2010-07-212 hadoop@e bay-hug-2010-07-21
2 hadoop@e bay-hug-2010-07-21Hadoop User Group
 
1 content optimization-hug-2010-07-21
1 content optimization-hug-2010-07-211 content optimization-hug-2010-07-21
1 content optimization-hug-2010-07-21Hadoop User Group
 
1 hadoop security_in_details_hadoop_summit2010
1 hadoop security_in_details_hadoop_summit20101 hadoop security_in_details_hadoop_summit2010
1 hadoop security_in_details_hadoop_summit2010Hadoop User Group
 
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...Hadoop User Group
 
Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...
Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...
Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...Hadoop User Group
 
Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...
Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...
Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...Hadoop User Group
 
Public Terabyte Dataset Project: Web crawling with Amazon Elastic MapReduce
Public Terabyte Dataset Project: Web crawling with Amazon Elastic MapReducePublic Terabyte Dataset Project: Web crawling with Amazon Elastic MapReduce
Public Terabyte Dataset Project: Web crawling with Amazon Elastic MapReduceHadoop User Group
 
Hadoop, Hbase and Hive- Bay area Hadoop User Group
Hadoop, Hbase and Hive- Bay area Hadoop User GroupHadoop, Hbase and Hive- Bay area Hadoop User Group
Hadoop, Hbase and Hive- Bay area Hadoop User GroupHadoop User Group
 
Yahoo! Mail antispam - Bay area Hadoop user group
Yahoo! Mail antispam - Bay area Hadoop user groupYahoo! Mail antispam - Bay area Hadoop user group
Yahoo! Mail antispam - Bay area Hadoop user groupHadoop User Group
 
Flightcaster Presentation Hadoop
Flightcaster  Presentation  HadoopFlightcaster  Presentation  Hadoop
Flightcaster Presentation HadoopHadoop User Group
 

More from Hadoop User Group (20)

Cascalog internal dsl_preso
Cascalog internal dsl_presoCascalog internal dsl_preso
Cascalog internal dsl_preso
 
Karmasphere hadoop-productivity-tools
Karmasphere hadoop-productivity-toolsKarmasphere hadoop-productivity-tools
Karmasphere hadoop-productivity-tools
 
Building a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with HadoopBuilding a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with Hadoop
 
Hdfs high availability
Hdfs high availabilityHdfs high availability
Hdfs high availability
 
HUG August 2010: Best practices
HUG August 2010: Best practicesHUG August 2010: Best practices
HUG August 2010: Best practices
 
2 hadoop@e bay-hug-2010-07-21
2 hadoop@e bay-hug-2010-07-212 hadoop@e bay-hug-2010-07-21
2 hadoop@e bay-hug-2010-07-21
 
1 content optimization-hug-2010-07-21
1 content optimization-hug-2010-07-211 content optimization-hug-2010-07-21
1 content optimization-hug-2010-07-21
 
3 avro hug-2010-07-21
3 avro hug-2010-07-213 avro hug-2010-07-21
3 avro hug-2010-07-21
 
1 hadoop security_in_details_hadoop_summit2010
1 hadoop security_in_details_hadoop_summit20101 hadoop security_in_details_hadoop_summit2010
1 hadoop security_in_details_hadoop_summit2010
 
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
 
Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...
Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...
Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...
 
Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...
Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...
Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...
 
Public Terabyte Dataset Project: Web crawling with Amazon Elastic MapReduce
Public Terabyte Dataset Project: Web crawling with Amazon Elastic MapReducePublic Terabyte Dataset Project: Web crawling with Amazon Elastic MapReduce
Public Terabyte Dataset Project: Web crawling with Amazon Elastic MapReduce
 
Hadoop, Hbase and Hive- Bay area Hadoop User Group
Hadoop, Hbase and Hive- Bay area Hadoop User GroupHadoop, Hbase and Hive- Bay area Hadoop User Group
Hadoop, Hbase and Hive- Bay area Hadoop User Group
 
Yahoo! Mail antispam - Bay area Hadoop user group
Yahoo! Mail antispam - Bay area Hadoop user groupYahoo! Mail antispam - Bay area Hadoop user group
Yahoo! Mail antispam - Bay area Hadoop user group
 
Hadoop Security Preview
Hadoop Security PreviewHadoop Security Preview
Hadoop Security Preview
 
Flightcaster Presentation Hadoop
Flightcaster  Presentation  HadoopFlightcaster  Presentation  Hadoop
Flightcaster Presentation Hadoop
 
Map Reduce Online
Map Reduce OnlineMap Reduce Online
Map Reduce Online
 
Hadoop Security Preview
Hadoop Security PreviewHadoop Security Preview
Hadoop Security Preview
 
Hadoop Security Preview
Hadoop Security PreviewHadoop Security Preview
Hadoop Security Preview
 

Recently uploaded

Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfAyushMahapatra5
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...Sapna Thakur
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room servicediscovermytutordmt
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajanpragatimahajan3
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 

Recently uploaded (20)

Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 

Pig at Linkedin

Editor's Notes

  1. Chris Riccomini Senior Data Scientist at LinkedIn Involved in People you may know, Who’s viewed my profile, Avatara, and Distributed Computing at LinkedIn Worked in PayPal’s anti-fraud team as a data visualization engineer
  2. talking about linkedin’s analytic environment, motivation for pig at linkedin, how we integrated it, and pig in the future
  3. aster, hadoop, voldemort, azkaban, pig
  4. aster, hadoop, voldemort, azkaban, pig
  5. 40% of jobs run are pig # of production products that use pig pymk, ads, profile stats, jobs for you, talent match, groups you might like, browse maps, experimentation platform
  6. in early 2009 we were working on converting PYMK from Aster to Hadoop everything was java based tired of writing joins, filters, etc (glue) built and deployed pig on laptop while at a conference wrote serializer in a few days significantly sped up delivery time for PYMK
  7. motivation was not ad-hoc/sql/business analytics. motivation was product analytics, and PRODUCTION products. stability was key. reproducability was key. simplicity/understandability was key. (both the scripts and the system itself) "if it runs now, it will always run"
  8. as streaming became more popular, pig is still used as glue, but complex jobs are now just python instead of java.
  9. we use "voldemort" serialization (binary json) .. basically the same as avro not much csv (pigstorage) used some pain was involved in writing/updating the serializer (0.3 interface was insufficient)
  10. we use "voldemort" serialization (binary json) .. basically the same as avro not much csv (pigstorage) used some pain was involved in writing/updating the serializer (0.3 interface was insufficient)
  11. we use "voldemort" serialization (binary json) .. basically the same as avro not much csv (pigstorage) used some pain was involved in writing/updating the serializer (0.3 interface was insufficient)
  12. we use pig to read from and write to voldemort all writes are currently done with read-only stores reads are done using roshan's voldemort loader func can also use roshan's voldemort store func to write directly to read-write stores
  13. one problem that we had with pig was how to handle folders partitioned by date (yyyy/mm/dd) people were querying the root directory, and filtering out only the days they needed other people were writing custom jobs that would add only the sub folders that they were interested in as input paths our solution was to add a filter parameter to voldemort views = LOAD '/data/etl/tracking/extracted/profile-view' USING VoldemortStorage('date.range', 'num.days=90;days.ago=1'); member_position = LOAD '/data/etl/replicated/member/member_position/#LATEST' USING VoldemortStorage();
  14. one problem that we had with pig was how to handle folders partitioned by date (yyyy/mm/dd) people were querying the root directory, and filtering out only the days they needed other people were writing custom jobs that would add only the sub folders that they were interested in as input paths our solution was to add a filter parameter to voldemort views = LOAD '/data/etl/tracking/extracted/profile-view' USING VoldemortStorage('date.range', 'num.days=90;days.ago=1'); member_position = LOAD '/data/etl/replicated/member/member_position/#LATEST' USING VoldemortStorage();
  15. one problem that we had with pig was how to handle folders partitioned by date (yyyy/mm/dd) people were querying the root directory, and filtering out only the days they needed other people were writing custom jobs that would add only the sub folders that they were interested in as input paths our solution was to add a filter parameter to voldemort views = LOAD '/data/etl/tracking/extracted/profile-view' USING VoldemortStorage('date.range', 'num.days=90;days.ago=1'); member_position = LOAD '/data/etl/replicated/member/member_position/#LATEST' USING VoldemortStorage();
  16. one problem that we had with pig was how to handle folders partitioned by date (yyyy/mm/dd) people were querying the root directory, and filtering out only the days they needed other people were writing custom jobs that would add only the sub folders that they were interested in as input paths our solution was to add a filter parameter to voldemort views = LOAD '/data/etl/tracking/extracted/profile-view' USING VoldemortStorage('date.range', 'num.days=90;days.ago=1'); member_position = LOAD '/data/etl/replicated/member/member_position/#LATEST' USING VoldemortStorage();
  17. one problem that we had with pig was how to handle folders partitioned by date (yyyy/mm/dd) people were querying the root directory, and filtering out only the days they needed other people were writing custom jobs that would add only the sub folders that they were interested in as input paths our solution was to add a filter parameter to voldemort views = LOAD '/data/etl/tracking/extracted/profile-view' USING VoldemortStorage('date.range', 'num.days=90;days.ago=1'); member_position = LOAD '/data/etl/replicated/member/member_position/#LATEST' USING VoldemortStorage();
  18. one problem that we had with pig was how to handle folders partitioned by date (yyyy/mm/dd) people were querying the root directory, and filtering out only the days they needed other people were writing custom jobs that would add only the sub folders that they were interested in as input paths our solution was to add a filter parameter to voldemort views = LOAD '/data/etl/tracking/extracted/profile-view' USING VoldemortStorage('date.range', 'num.days=90;days.ago=1'); member_position = LOAD '/data/etl/replicated/member/member_position/#LATEST' USING VoldemortStorage();
  19. we use azkaban (like a very simple version of oozie) azkaban contains a "pig" job. specify type=pig, pig.script=path/to/pig/script.pig supports parameter passing between azkaban properties and pig parameters azkaban also provides resource locking and dependencies and scheduling makes it very easy to write a production pig job write pig file write job file throw pig and job file into a zip upload the zip
  20. we use azkaban (like a very simple version of oozie) azkaban contains a "pig" job. specify type=pig, pig.script=path/to/pig/script.pig supports parameter passing between azkaban properties and pig parameters azkaban also provides resource locking and dependencies and scheduling makes it very easy to write a production pig job write pig file write job file throw pig and job file into a zip upload the zip
  21. we use azkaban (like a very simple version of oozie) azkaban contains a "pig" job. specify type=pig, pig.script=path/to/pig/script.pig supports parameter passing between azkaban properties and pig parameters azkaban also provides resource locking and dependencies and scheduling makes it very easy to write a production pig job write pig file write job file throw pig and job file into a zip upload the zip
  22. we use azkaban (like a very simple version of oozie) azkaban contains a "pig" job. specify type=pig, pig.script=path/to/pig/script.pig supports parameter passing between azkaban properties and pig parameters azkaban also provides resource locking and dependencies and scheduling makes it very easy to write a production pig job write pig file write job file throw pig and job file into a zip upload the zip
  23. we use azkaban (like a very simple version of oozie) azkaban contains a "pig" job. specify type=pig, pig.script=path/to/pig/script.pig supports parameter passing between azkaban properties and pig parameters azkaban also provides resource locking and dependencies and scheduling makes it very easy to write a production pig job write pig file write job file throw pig and job file into a zip upload the zip
  24. just starting to use pig for ad hoc analysis mostly engineers using it now some business analytics are starting to use it also looking at hive
  25. pig 0.8, avro, hive, UDFs
  26. dates the promise of pig as a generic map reduce language (not just hadoop) fix the data structures more json
  27. dates the promise of pig as a generic map reduce language (not just hadoop) fix the data structures more json
  28. dates the promise of pig as a generic map reduce language (not just hadoop) fix the data structures more json
  29. dates the promise of pig as a generic map reduce language (not just hadoop) fix the data structures more json
  30. dates the promise of pig as a generic map reduce language (not just hadoop) fix the data structures more json