Putting Lipstick on Apache Pig
Big Data Gurus Meetup
August 14, 2013
Data should be accessible, easy to discover, and
easy to process for everyone.
Motivation
Big Data Users at Netflix
Analysts Engineers
Desires
Self Service
Easy
Rich Toolset Rich APIs
A Single Platform / Data Architecture that Serves Both Groups
Netflix Data Warehouse - Storage
S3 is the source of truth
Decouples storage from processing.
Persistent data; multiple/transient Hadoop clusters
Data sources
Event data from cloud services via Ursula/Honu
Dimension data from Cassandra via Aegisthus
~100 billion events processed / day
Petabytes of data persisted and available to queries on S3.
Netflix Data Platform - Processing
Long running clusters
SLA and ad hoc
Supplemental nightly bonus clusters
For high priority ETL jobs
2,000+ instances in aggregate across the clusters
Netflix Hadoop Platform as a Service
S3
https://github.com/Netflix/genie
Netflix Data Platform – Primitive Service Layer
Primitive, decoupled services
Building blocks for more complicated tools/services/apps
Serves 1000s of MapReduce jobs / day
100+ jobs concurrently
Netflix Data Platform – Tools
Sting (Adhoc Visualization)
Looper (Backloading)
Forklift (Data Movement)
Ignite (A/B Test Analytics)
Lipstick (Workflow Visualization)
Spock (Data Auditing)
Heavily utilize services in the primitive layer.
Follow the same design philosophy as primitive apps:
RESTful API
Decoupled JavaScript interfaces
Pig and Hive at Netflix
• Hive
– AdHoc queries
– Lightweight aggregation
• Pig
– Complex Dataflows / ETL
– Data movement “glue” between complex operations
What is Pig?
• A data flow language
• Simple to learn
– Very few reserved words
– Comparable to a SQL logical query plan
• Easy to extend and optimize
• Extendable via UDFs written in multiple languages
– Java, Python, Ruby, Groovy, JavaScript
Sample Pig Script* (Word Count)
input_lines = LOAD '/tmp/my-copy-of-all-pages-on-internet' AS (line:chararray);
-- Extract words from each line and put them into a pig bag
-- datatype, then flatten the bag to get one word on each row
words = FOREACH input_lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
-- filter out any words that are just white spaces
filtered_words = FILTER words BY word MATCHES '\\w+';
-- create a group for each word
word_groups = GROUP filtered_words BY word;
-- count the entries in each group
word_count = FOREACH word_groups GENERATE COUNT(filtered_words) AS count, group AS word;
-- order the records by count
ordered_word_count = ORDER word_count BY count DESC;
STORE ordered_word_count INTO '/tmp/number-of-words-on-internet';
* http://en.wikipedia.org/wiki/Pig_(programming_tool)#Example
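For readers who do not know Pig Latin, the same data flow can be sketched in plain Python. This is only an illustration of what each relation holds, not how Pig executes it: the function name is made up, and tokenizing is simplified to a whitespace split, whereas Pig's TOKENIZE also splits on some punctuation.

```python
import re
from collections import Counter


def word_count(lines):
    """Mirror the Pig script's steps: tokenize, filter, group, count, order."""
    # FOREACH ... FLATTEN(TOKENIZE(line)): one word per row.
    # Simplification: whitespace split only.
    words = [w for line in lines for w in line.split()]
    # FILTER ... MATCHES: keep tokens made up entirely of word characters.
    # Pig's MATCHES must match the whole string, hence fullmatch.
    filtered_words = [w for w in words if re.fullmatch(r"\w+", w)]
    # GROUP BY word, then COUNT the entries in each group.
    counts = Counter(filtered_words)
    # ORDER BY count DESC (ties broken by word so the result is deterministic).
    return sorted(((c, w) for w, c in counts.items()),
                  key=lambda pair: (-pair[0], pair[1]))
```

Each intermediate list corresponds to one alias in the script, which is the same one-to-one view that a logical plan graph gives.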
A Typical Pig Script
Pig…
• Data flows are easy & flexible to express in text
– Facilitates code reuse via UDFs and macros
– Allows logical grouping of operations vs grouping by order of execution.
– But errors are easy to make and overlook.
• Scripts can quickly get complicated
• Visualization quickly draws attention to:
– Common errors
– Execution order / logical flow
– Optimization opportunities
Lipstick
• Generates graphical representations of Pig data flows.
• Compatible with Apache Pig 0.11+
• Has been used to monitor more than 25,000 Pig jobs at Netflix
Lipstick
[Screenshots of the Lipstick UI, annotated with: overall job progress; logical plan; logical operator (map side); logical operator (reduce side); Map/Reduce job; intermediate row count; records loaded; Hadoop counters]
Lipstick for Fast Development
• During development:
– Keep track of data flow
– Spot common errors
• Omitted (hanging) operators
• Data type issues
– Easily estimate and optimize complexity
• Number of MR jobs generated
• Map only vs full Map/Reduce jobs
• Opportunities to rejigger logic to:
– Combine multiple jobs into a single job
– Manipulate execution order to achieve better parallelism (e.g. less blocking)
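The "omitted (hanging) operators" check above can be illustrated with a small sketch. It assumes a script has already been reduced to a mapping from each alias to the aliases it reads from, plus the set of aliases that are stored; both structures and the function name are hypothetical and are not Lipstick's internal representation.

```python
def dangling_aliases(inputs_of, stored):
    """Return aliases that are defined but never read and never stored.

    inputs_of: dict mapping each alias to the list of aliases it reads from.
    stored:    set of aliases written out by a STORE (or similar) statement.
    """
    # Every alias that some other statement consumes.
    consumed = {src for sources in inputs_of.values() for src in sources}
    return sorted(a for a in inputs_of if a not in consumed and a not in stored)


# The word count flow, plus a hypothetical leftover alias "debug_sample".
plan = {
    "input_lines": [],
    "words": ["input_lines"],
    "filtered_words": ["words"],
    "debug_sample": ["words"],
    "word_groups": ["filtered_words"],
    "word_count": ["word_groups"],
    "ordered_word_count": ["word_count"],
}
hanging = dangling_aliases(plan, stored={"ordered_word_count"})
```

In a rendered graph the same condition shows up as a node with no outgoing edge that is not a store, which is why it is easy to spot visually.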
Lipstick for Job Monitoring
• During execution:
– Graphically monitor execution status from a single console
– Spot optimization opportunities
• Map vs reduce side joins
• Data skew
• Better parallelism settings
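As a rough illustration of how data skew could be read off per-task record counts, the sketch below compares the busiest reducer with the median one. The metric, the function name, and the sample numbers are assumptions for illustration; the slides do not say how Lipstick surfaces skew.

```python
from statistics import median


def skew_ratio(records_per_reducer):
    """Ratio of the largest reducer input to the median reducer input.

    A value near 1.0 suggests an even spread; a large value suggests that
    one key (or a few) dominates. Expects a non-empty list of counts.
    """
    med = median(records_per_reducer)
    largest = max(records_per_reducer)
    if med == 0:
        # Avoid dividing by zero when most reducers received nothing.
        return float("inf") if largest > 0 else 1.0
    return largest / med


# Hypothetical counts: three reducers are balanced, one is overloaded.
ratio = skew_ratio([90, 100, 110, 1000])
```

A high ratio is a hint to look at the join or group key, or at the parallelism setting for that job.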
Lipstick for Support
• Empowers users to support themselves
– Better operational visibility
• What is my script currently doing?
• Why is my script slow?
– Examine intermediate output of jobs
– All execution information in one place
• Facilitates communication between infrastructure / support teams and end users
– Lipstick link contains all information needed to provide support.
Lipstick Architecture
Pig 0.11+ with lipstick-console.jar
Lipstick Server (RESTful Grails app)
JavaScript Client (frontend GUI)
RDS (persistence)
Lipstick Architecture - Console
• Implements PigProgressNotificationListener interface
• Listens for:
1. New statements to be registered (unoptimized plan)
2. Script launched event (optimized, physical, M/R plan)
3. MR Job completion/failure event
4. Heartbeat progress (during execution)
• Pig Plans and Progress -> Lipstick objects
• Communicates with Lipstick Server
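The four events listed above can be modelled with a small Python sketch. The real console is Java code implementing Pig's PigProgressNotificationListener; the class, method, and field names below are hypothetical and only show how each event could update one record that is then sent to the server.

```python
from dataclasses import dataclass, field


@dataclass
class ScriptRecord:
    """Hypothetical stand-in for the objects the console sends to the server."""
    plans: dict = field(default_factory=dict)       # plan name -> plan payload
    job_states: dict = field(default_factory=dict)  # MR job id -> final state
    progress: int = 0                               # overall percent complete


class ConsoleListener:
    """Turns the four event types into updates on a ScriptRecord."""

    def __init__(self):
        self.record = ScriptRecord()

    def on_statements_registered(self, unoptimized_plan):
        # 1. New statements registered: only the unoptimized plan exists yet.
        self.record.plans["unoptimized"] = unoptimized_plan

    def on_script_launched(self, optimized_plan, physical_plan, mr_plan):
        # 2. Script launched: the remaining plans become available.
        self.record.plans.update(
            optimized=optimized_plan, physical=physical_plan, mapreduce=mr_plan
        )

    def on_job_done(self, job_id, succeeded):
        # 3. An MR job completed or failed.
        self.record.job_states[job_id] = "finished" if succeeded else "failed"

    def on_heartbeat(self, percent):
        # 4. Heartbeat: clamp the reported value to the 0-100 range.
        self.record.progress = min(100, max(0, percent))
```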
Pig Compilation Plans
Pig Script
Unoptimized Logical Plan (~1:1 logical operator / line of Pig)
Optimized Logical Plan
Physical Plan
MapReduce Plan (grouping of Physical Operators into map or reduce jobs)
Lipstick associates Logical Operators with MapReduce jobs by inferring relationships between Logical and Physical Operations.
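One way to picture that association is a two-step lookup: logical operator to physical operators, then physical operators to the job and phase that contains them. The sketch below assumes the physical-to-logical mapping is already known, which is a simplification; the slides say only that the relationship is inferred, not how. All names and operator ids are illustrative.

```python
def assign_logical_ops(physical_to_logical, jobs):
    """Map each logical operator to the (job id, phase) pairs that run it.

    physical_to_logical: dict of physical operator id -> logical operator id
                         (assumed to be known for this sketch).
    jobs: dict of job id -> {"map": [physical ids], "reduce": [physical ids]}.
    """
    placement = {}
    for job_id, phases in jobs.items():
        for phase, physical_ops in phases.items():
            for physical_op in physical_ops:
                logical_op = physical_to_logical.get(physical_op)
                if logical_op is None:
                    continue  # no logical counterpart known for this operator
                entry = (job_id, phase)
                if entry not in placement.setdefault(logical_op, []):
                    placement[logical_op].append(entry)
    return placement


# Hypothetical plan: a GROUP is split across the map and reduce phases.
physical_to_logical = {
    "load_1": "LOAD", "filter_2": "FILTER",
    "rearrange_3": "GROUP", "package_4": "GROUP", "foreach_5": "FOREACH",
}
jobs = {"job_1": {"map": ["load_1", "filter_2", "rearrange_3"],
                  "reduce": ["package_4", "foreach_5"]}}
placement = assign_logical_ops(physical_to_logical, jobs)
```

This is what lets a graph of logical operators be drawn inside boxes that represent Map/Reduce jobs, with map-side and reduce-side operators distinguished.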
Lipstick Architecture - Server
• Simple REST interface
• It’s a Grails app!
• Pig client posts plans and puts progress
• JavaScript client
– Gets plans and progress
– Searches jobs by job name and user name
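The interactions above (post a plan, put progress, get both, search) can be sketched as an in-memory model. This is not the Grails server or its URL scheme, neither of which the slides spell out; the class and method names are hypothetical and only mirror the verbs in the bullets.

```python
class PlanStore:
    """In-memory model of the server's role: store plans and progress by job."""

    def __init__(self):
        self._jobs = {}
        self._next_id = 1

    def post_plan(self, user, job_name, plan):
        # POST: create a new job record and hand back its id.
        job_id = str(self._next_id)
        self._next_id += 1
        self._jobs[job_id] = {
            "user": user, "job_name": job_name, "plan": plan, "progress": {},
        }
        return job_id

    def put_progress(self, job_id, progress):
        # PUT: replace the progress stored for an existing job.
        self._jobs[job_id]["progress"] = progress

    def get(self, job_id):
        # GET: return the plan and progress for one job.
        return self._jobs[job_id]

    def search(self, text):
        # Case-insensitive substring match on job name or user name.
        needle = text.lower()
        return sorted(
            job_id for job_id, job in self._jobs.items()
            if needle in job["job_name"].lower() or needle in job["user"].lower()
        )
```

Keeping the server this thin is what allows the Pig client and the JavaScript client to evolve independently of each other.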
Lipstick Architecture – JS Client
• Displays and annotates graphs with status / progress
• Completely decoupled from Server
• Event based design
• Periodically polls Server for job progress
• Usability is a key focus
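The polling behaviour can be sketched as a small loop. The actual client is JavaScript; this Python version is only a model, and the status fields ("state", "progress"), the default interval, and the function names are assumptions. The fetch and sleep functions are passed in so the loop does not depend on any particular HTTP library.

```python
import time


def poll_progress(fetch, on_update, interval_s=5.0, sleep=time.sleep, max_polls=None):
    """Poll fetch() until the job ends, calling on_update only when status changes.

    fetch:     callable returning the current status as a dict.
    on_update: callable invoked with each status that differs from the last one.
    Returns the last status seen.
    """
    last = None
    polls = 0
    while max_polls is None or polls < max_polls:
        status = fetch()
        polls += 1
        if status != last:
            on_update(status)  # event-style: redraw only on change
            last = status
        if status.get("state") in ("finished", "failed"):
            break
        sleep(interval_s)
    return last
```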
Solving Problems with Lipstick
Common Problem #1: My job has stalled.
[Screenshot labels: Unoptimized/Optimized Logical Plan toggle; dangling operator]
Common Problem #2: I didn’t get the data I was expecting.
Common Problem #3: I don’t understand why my job failed.
[Screenshot labels: failed job (light red background); successful job (light blue background)]
Future of Lipstick
• Annotate common errors and inefficiencies on the graph
– Skew / map side join opportunities / scalar issues
– E.g. Warnings / error dashboard
• Provide better details of runtime performance
– Timings annotated on graph
– Min / median / max mapper and reducer times
– Map / reduce completion over time
• Search through execution history
– Examine trends in runtime and data volumes
– History of failure / success
• Search jobs for commonalities
– Common datasets loaded / saved
– Better grasp data lineage
– Common uses of UDFs and macros
Lipstick on Hive
Honey?
A closer look…
Wrapping up
• Lipstick is part of Netflix OSS.
• Clone it on github at
http://github.com/Netflix/Lipstick
• Check out the quickstart guide
– https://github.com/Netflix/Lipstick/wiki/Getting-Started#1-quick-start
– Get started playing with Lipstick in under 5 minutes!
• We happily welcome your feedback and
contributions!
Jeff Magnusson: jmagnusson@netflix.com | http://www.linkedin.com/in/jmagnuss | @jeffmagnusson
Thank you!
Jobs: http://jobs.netflix.com
Netflix OSS: http://netflix.github.io
Tech Blog: http://techblog.netflix.com/

New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 

Kürzlich hochgeladen (20)

E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 

Putting Lipstick on Apache Pig at Netflix

  • 1. Putting Lipstick on Apache Pig. Big Data Gurus Meetup, August 14, 2013
  • 2. Motivation: Data should be accessible, easy to discover, and easy to process for everyone.
  • 3. Big Data Users at Netflix: Analysts, Engineers. Desires: Self Service, Easy, Rich Toolset, Rich APIs. A Single Platform / Data Architecture that Serves Both Groups
  • 4. Netflix Data Warehouse - Storage. S3 is the source of truth: decouples storage from processing; persistent data; multiple/transient Hadoop clusters. Data sources: event data from cloud services via Ursula/Honu; dimension data from Cassandra via Aegisthus. ~100 billion events processed / day. Petabytes of data persisted and available to queries on S3.
  • 5. Netflix Data Platform - Processing. Long running clusters: SLA and ad-hoc. Supplemental nightly bonus clusters: for high priority ETL jobs. 2,000+ instances in aggregate across the clusters
  • 6. Netflix Hadoop Platform as a Service. S3. https://github.com/Netflix/genie
  • 7. Netflix Data Platform – Primitive Service Layer. Primitive, decoupled services. Building blocks for more complicated tools/services/apps. Serves 1000s of MapReduce jobs / day, 100+ jobs concurrently
  • 8. Netflix Data Platform – Tools: Sting (Adhoc Visualization), Looper (Backloading), Forklift (Data Movement), Ignite (A/B Test Analytics), Lipstick (Workflow Visualization), Spock (Data Auditing). Heavily utilize services in the primitive layer. Follow the same design philosophy as primitive apps: RESTful API, decoupled Javascript interfaces
  • 9. Pig and Hive at Netflix
      • Hive
        – AdHoc queries
        – Lightweight aggregation
      • Pig
        – Complex Dataflows / ETL
        – Data movement “glue” between complex operations
  • 10. What is Pig?
      • A data flow language
      • Simple to learn
        – Very few reserved words
        – Comparable to a SQL logical query plan
      • Easy to extend and optimize
      • Extendable via UDFs written in multiple languages
        – Java, Python, Ruby, Groovy, Javascript
  • 11. Sample Pig Script* (Word Count)

        input_lines = LOAD '/tmp/my-copy-of-all-pages-on-internet' AS (line:chararray);

        -- Extract words from each line and put them into a pig bag
        -- datatype, then flatten the bag to get one word on each row
        words = FOREACH input_lines GENERATE FLATTEN(TOKENIZE(line)) AS word;

        -- filter out any words that are just white spaces
        filtered_words = FILTER words BY word MATCHES '\\w+';

        -- create a group for each word
        word_groups = GROUP filtered_words BY word;

        -- count the entries in each group
        word_count = FOREACH word_groups GENERATE COUNT(filtered_words) AS count, group AS word;

        -- order the records by count
        ordered_word_count = ORDER word_count BY count DESC;
        STORE ordered_word_count INTO '/tmp/number-of-words-on-internet';

        * http://en.wikipedia.org/wiki/Pig_(programming_tool)#Example
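To make the data flow in the word-count script easier to follow, here is a minimal Python sketch that mirrors each Pig relation on an in-memory list of lines. It is an illustration only, not how Pig executes the script: the function name `word_count` is made up for this example, and `str.split()` only approximates Pig's `TOKENIZE`, which also splits on a few punctuation characters.

```python
import re
from collections import Counter


def word_count(lines):
    """Mirror the Pig word-count data flow on an in-memory list of lines."""
    # words: FOREACH ... GENERATE FLATTEN(TOKENIZE(line)) -> one word per row
    # (whitespace splitting is an approximation of TOKENIZE)
    words = [w for line in lines for w in line.split()]

    # filtered_words: FILTER ... BY word MATCHES '\\w+'
    # MATCHES must match the whole value, hence fullmatch
    filtered_words = [w for w in words if re.fullmatch(r"\w+", w)]

    # word_groups + word_count: GROUP BY word, then COUNT per group
    counts = Counter(filtered_words)

    # ordered_word_count: ORDER ... BY count DESC, rows are (count, word)
    return sorted(((n, w) for w, n in counts.items()), key=lambda row: -row[0])


if __name__ == "__main__":
    for count, word in word_count(["the cat the dog", "the cat , !"]):
        print(count, word)
```

Each intermediate list corresponds to one named relation in the script, which is roughly the granularity at which Lipstick draws logical operators.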
  • 12. A Typical Pig Script
  • 13. Pig…
      • Data flows are easy & flexible to express in text
        – Facilitates code reuse via UDFs and macros
        – Allows logical grouping of operations vs grouping by order of execution.
        – But errors are easy to make and overlook.
      • Scripts can quickly get complicated
      • Visualization quickly draws attention to:
        – Common errors
        – Execution order / logical flow
        – Optimization opportunities
  • 14. Lipstick
      • Generates graphical representations of Pig data flows.
      • Compatible with Apache Pig v11+
      • Has been used to monitor more than 25,000 Pig jobs at Netflix
  • 18. [Annotated Lipstick graph. Labels: Logical Operator (reduce side); Logical Operator (map side); Map/Reduce Job; Intermediate Row Count; Records Loaded]
  • 20. Lipstick for Fast Development
      • During development:
        – Keep track of data flow
        – Spot common errors
          • Omitted (hanging) operators
          • Data type issues
        – Easily estimate and optimize complexity
          • Number of MR jobs generated
          • Map only vs full Map/Reduce jobs
          • Opportunities to rejigger logic to:
            – Combine multiple jobs into a single job
            – Manipulate execution order to achieve better parallelism (e.g. less blocking)
  • 21. Lipstick for Job Monitoring
      • During execution:
        – Graphically monitor execution status from a single console
        – Spot optimization opportunities
          • Map vs reduce side joins
          • Data skew
          • Better parallelism settings
  • 22. Lipstick for Support
      • Empowers users to support themselves
        – Better operational visibility
          • What is my script currently doing?
          • Why is my script slow?
        – Examine intermediate output of jobs
        – All execution information in one place
      • Facilitates communication between infrastructure / support teams and end users
        – Lipstick link contains all information needed to provide support.
  • 23. Lipstick Architecture: Pig v11+, lipstick-console.jar; Lipstick Server (RESTful Grails app); Javascript Client (Frontend GUI); RDS Persistence
  • 24. Lipstick Architecture - Console
      • Implements PigProgressNotificationListener interface
      • Listens for:
        1. New statements to be registered (unoptimized plan)
        2. Script launched event (optimized, physical, M/R plan)
        3. MR Job completion/failure event
        4. Heartbeat progress (during execution)
      • Pig Plans and Progress → Lipstick objects
      • Communicates with Lipstick Server
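The console's role, as described on this slide, is to turn Pig events into objects and forward them to the server. The Python sketch below illustrates that listener pattern under stated assumptions: `ConsoleListener`, its method names, and the payload fields are all hypothetical and do not reproduce the actual PigProgressNotificationListener interface or Lipstick's object model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ConsoleListener:
    """Hypothetical listener: converts events to payloads and forwards them."""
    send: Callable[[dict], None]                  # delivers a payload to the server
    jobs: Dict[str, dict] = field(default_factory=dict)

    def on_statements_registered(self, script_id, plan):
        # Event 1: new statements registered (unoptimized plan)
        self.send({"type": "unoptimized_plan", "script": script_id, "plan": plan})

    def on_script_launched(self, script_id, plans):
        # Event 2: script launched (optimized, physical, M/R plans)
        self.send({"type": "launch", "script": script_id, "plans": plans})

    def on_heartbeat(self, job_id, percent):
        # Event 4: heartbeat progress during execution
        self.jobs[job_id] = {"state": "running", "progress": percent}
        self.send({"type": "progress", "job": job_id, **self.jobs[job_id]})

    def on_job_finished(self, job_id, failed=False):
        # Event 3: MR job completion or failure
        last = self.jobs.get(job_id, {}).get("progress", 0)
        self.jobs[job_id] = {
            "state": "failed" if failed else "finished",
            "progress": last if failed else 100,
        }
        self.send({"type": "progress", "job": job_id, **self.jobs[job_id]})


if __name__ == "__main__":
    listener = ConsoleListener(send=print)
    listener.on_heartbeat("job_1", 40)
    listener.on_job_finished("job_1")
```

Passing `send` in as a function keeps the event translation separate from the transport, so the same listener can be exercised without a running server.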
  • 25. Pig Compilation Plans: Pig Script → Unoptimized Logical Plan (~1:1 logical operator / line of Pig) → Optimized Logical Plan → Physical Plan → MapReduce Plan (grouping of Physical Operators into map or reduce jobs). Lipstick associates Logical Operators with MapReduce jobs by inferring relationships between Logical and Physical Operations.
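The slide does not say how Lipstick infers the relationship between logical and physical operators. As a rough illustration of the final step only, the Python sketch below assumes that a logical-to-physical mapping has already been inferred and that the MapReduce plan lists the physical operators in each job; the function name and the sample operator and job identifiers are hypothetical.

```python
def assign_logical_ops_to_jobs(logical_to_physical, job_to_physical):
    """Map each logical operator to the MapReduce jobs that run its physical operators."""
    # Invert the MapReduce plan: physical operator -> job that contains it
    physical_to_job = {
        op: job for job, ops in job_to_physical.items() for op in ops
    }
    assignment = {}
    for logical, physicals in logical_to_physical.items():
        # A logical operator can span more than one job (e.g. a GROUP whose
        # physical operators are split across two jobs in this made-up plan)
        jobs = {physical_to_job[p] for p in physicals if p in physical_to_job}
        assignment[logical] = sorted(jobs)
    return assignment


if __name__ == "__main__":
    logical_to_physical = {"LOAD a": ["p1"], "GROUP a": ["p2", "p3"], "STORE": ["p4"]}
    job_to_physical = {"job_1": ["p1", "p2"], "job_2": ["p3", "p4"]}
    print(assign_logical_ops_to_jobs(logical_to_physical, job_to_physical))
```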
  • 26. Lipstick Architecture - Server
      • Simple REST interface
      • It’s a Grails app!
      • Pig client posts plans and puts progress
      • Javascript client
        – Gets plans and progress
        – Searches jobs by job name and user name
  • 27. Lipstick Architecture – JS Client
      • Displays and annotates graphs with status / progress
      • Completely decoupled from Server
      • Event based design
      • Periodically polls Server for job progress
      • Usability is a key focus
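The periodic polling mentioned on this slide can be sketched as a small loop. The Python version below is illustrative only (the real client is Javascript): `poll_until_done`, the `state` field, and its values are assumptions, and `fetch_status` stands in for whatever request the client makes to the server.

```python
import time


def poll_until_done(fetch_status, interval_s=5.0, sleep=time.sleep, max_polls=100):
    """Poll a status source until the job reaches a terminal state.

    fetch_status: zero-argument callable returning a dict with a 'state' key
                  (hypothetical shape; stands in for a GET against the server).
    Returns every status seen, in order, so a UI could redraw after each one.
    """
    history = []
    for _ in range(max_polls):
        status = fetch_status()
        history.append(status)
        if status["state"] in ("finished", "failed"):
            break                      # terminal state: stop polling
        sleep(interval_s)              # wait before asking the server again
    return history


if __name__ == "__main__":
    fake = iter([
        {"state": "running", "progress": 10},
        {"state": "running", "progress": 70},
        {"state": "finished", "progress": 100},
    ])
    print(poll_until_done(lambda: next(fake), interval_s=0.0))
```

Because the client only reads state through polling, it needs no connection back from the server, which fits the slide's point that the two are completely decoupled.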
  • 28. Solving Problems with Lipstick - Common Problem #1: My job has stalled.
  • 31. Common Problem #2: I didn’t get the data I was expecting.
  • 34. Common Problem #3: I don’t understand why my job failed.
  • 35. Failed Job (light red background); Successful Job (light blue background)
  • 36. Future of Lipstick
      • Annotate common errors and inefficiencies on the graph
        – Skew / map side join opportunities / scalar issues
        – E.g. Warnings / error dashboard
      • Provide better details of runtime performance
        – Timings annotated on graph
        – Min / median / max mapper and reducer times
        – Map / reduce completion over time
      • Search through execution history
        – Examine trends in runtime and data volumes
        – History of failure / success
      • Search jobs for commonalities
        – Common datasets loaded / saved
        – Better grasp data lineage
        – Common uses of UDFs and macros
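As a small illustration of the planned "min / median / max mapper and reducer times" annotation, the Python sketch below summarizes a list of task durations. The function name and the `max_over_median` ratio are assumptions made for this example; the ratio is just one simple way a large gap between the slowest and the typical task could hint at skew.

```python
import statistics


def summarize_task_times(seconds):
    """Return min / median / max of task durations, plus a simple skew hint."""
    if not seconds:
        raise ValueError("need at least one task duration")
    median = statistics.median(seconds)
    return {
        "min": min(seconds),
        "median": median,
        "max": max(seconds),
        # Hypothetical skew indicator: how much slower the slowest task is
        # than the typical one (None when the median is zero)
        "max_over_median": max(seconds) / median if median else None,
    }


if __name__ == "__main__":
    # One straggler among otherwise similar reducers
    print(summarize_task_times([12, 15, 14, 90]))
```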
  • 39. Wrapping up
      • Lipstick is part of Netflix OSS.
      • Clone it on github at http://github.com/Netflix/Lipstick
      • Check out the quickstart guide
        – https://github.com/Netflix/Lipstick/wiki/Getting-Started#1-quick-start
        – Get started playing with Lipstick in under 5 minutes!
      • We happily welcome your feedback and contributions!
  • 40. Thank you! Jeff Magnusson: jmagnusson@netflix.com | http://www.linkedin.com/in/jmagnuss | @jeffmagnusson. Jobs: http://jobs.netflix.com. Netflix OSS: http://netflix.github.io. Tech Blog: http://techblog.netflix.com/