LinkedIn Segmentation & Targeting
Platform: A Big Data Application
Hadoop Summit, June 2013
Hien Luu, Sid Anand
©2013 LinkedIn Corporation. All Rights Reserved.
About Us
Hien Luu, Sid Anand
Our mission
Connect the world’s professionals to make
them more productive and successful
Over 200M members and counting
[Chart: LinkedIn Members (Millions), 2004 to 2012: 2, 4, 8, 17, 32, 55, 90, 145, 200+]
The world’s largest professional network
Growing at more than 2 members/sec
Source: http://press.linkedin.com/about
• >88% of Fortune 100 companies use LinkedIn Talent Solutions to hire
• >2.9M Company Pages
• >5.7B professional searches in 2012
• 19 languages
• >30M students and NCGs, the fastest-growing demographic
The world’s largest professional network
Over 64% of members are now international
Source: http://press.linkedin.com/about
Other Company Facts
• Headquartered in Mountain View, Calif., with offices around the world
• As of June 1, 2013, LinkedIn has ~3,700 full-time employees located around the world
Source: http://press.linkedin.com/about
Agenda
• Company Overview
• Big Data @ LinkedIn
• The Segmentation & Targeting Problem
• Solution : LinkedIn Segmentation & Targeting Platform
• Q & A
Big Data @ LinkedIn
LinkedIn : Big Data Story
Our Big Data Story depends on Infrastructure!
• On-line Data Infrastructure
• Near-line Data Infrastructure
• Offline Data Infrastructure
[Diagram: On-line, Near-line, and Off-line tiers. Labels: Updates, Oracle or Espresso, Web Serving, Data Streams, Teradata]
Big Data Story : On-line Data
On-line Data Infrastructure
• Supports typical OLTP requirements
• Highly concurrent R/W access
• Transactional guarantees
• Back-up & Recovery
• Supports a central LinkedIn Data Principle!
• “All data everywhere”
• All OLTP databases need to provide a time-line consistent change stream
• For this, we developed and open-sourced Databus!
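One way to picture a time-line consistent change stream is as a log of change events that every consumer applies in commit order. The sketch below is illustrative only: the `ChangeEvent` fields, the `scn` sequence number, and the `Consumer` class are assumptions made for this example and are not Databus's actual API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ChangeEvent:
    scn: int      # hypothetical commit sequence number
    table: str
    key: int
    value: dict


@dataclass
class Consumer:
    """Applies events strictly in commit order and remembers its checkpoint."""
    state: dict = field(default_factory=dict)
    checkpoint: int = 0  # highest scn applied so far

    def apply(self, events):
        for event in sorted(events, key=lambda e: e.scn):
            if event.scn <= self.checkpoint:
                continue  # already applied, so replaying a batch is safe
            self.state[(event.table, event.key)] = event.value
            self.checkpoint = event.scn


# Events may arrive out of order; the consumer still applies them by scn.
events = [
    ChangeEvent(2, "profile", 42, {"title": "Staff Engineer"}),
    ChangeEvent(1, "profile", 42, {"title": "Engineer"}),
]
replica = Consumer()
replica.apply(events)
replica.apply(events)  # replaying the same batch changes nothing
```

Because each consumer keeps its own checkpoint, a new or recovering consumer can replay the stream from an earlier point and converge on the same state.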
[Diagram: On-line tier. Labels: Updates, Oracle or Espresso, Web Serving]
Big Data Story : On-line Data
[Diagram: Updates are written to Oracle or Espresso; Data Change Events flow to the Search Index, Graph Index, Read Replicas, and Standardization]
A user updates the company, title, and school on his profile. He also accepts a connection.
The write is made to an Oracle or Espresso master, and Databus replicates it:
• The profile change is applied to the Standardization service
– E.g. the many forms of "IBM" are canonicalized for search-friendliness
• ... and to the Search Index
– Recruiters can find you immediately by new keywords
• The connection change is applied to the Graph Index service
– The user can now start receiving feed updates from his new connections
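The fan-out described above can be sketched as a small dispatcher that routes each change event to the services subscribed to it. Everything named here (the `CANONICAL_COMPANY` table, the handler functions, the in-memory indexes) is hypothetical and only stands in for the real Standardization, Search Index, and Graph Index services.

```python
# Hypothetical canonical-name table; real standardization is far richer.
CANONICAL_COMPANY = {
    "ibm": "IBM",
    "i.b.m.": "IBM",
    "international business machines": "IBM",
}


def standardize_company(raw):
    """Map a free-text company name to a canonical form when one is known."""
    cleaned = raw.strip()
    return CANONICAL_COMPANY.get(cleaned.lower(), cleaned)


search_index = {}  # member_id -> searchable profile fields
graph_index = {}   # member_id -> set of connected member_ids


def on_profile_change(member_id, profile):
    # Standardize first, then make the profile searchable.
    profile = dict(profile, company=standardize_company(profile["company"]))
    search_index[member_id] = profile


def on_connection_change(member_id, other_id):
    # Connections are symmetric, so record both directions.
    graph_index.setdefault(member_id, set()).add(other_id)
    graph_index.setdefault(other_id, set()).add(member_id)


# Route each change event to the handler subscribed to its type.
HANDLERS = {"profile": on_profile_change, "connection": on_connection_change}

events = [
    ("profile", (7, {"company": "I.B.M.", "title": "Recruiter"})),
    ("connection", (7, 9)),
]
for kind, args in events:
    HANDLERS[kind](*args)
```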
Big Data Story : On-line Data
Databus streams also update Hadoop!
[Diagram: Updates are written to Oracle or Espresso; Data Change Events flow to the Search Index, Graph Index, Read Replica, and Standardization]
Big Data Story : Near-line & Off-line Data
2 Main Sources of Data @ LinkedIn
• User-provided data
• e.g. Member Profile data (e.g. employment, education history, endorsements)
• Tracking data via web site instrumentation
• e.g. pages viewed, email opened/sent, social gestures : posts/likes/shares
[Diagram labels: Updates, Oracle or Espresso, Databus, Web Servers, Teradata]
The Segmentation & Targeting Problem
Segmentation & Targeting
Segmentation & Targeting Attribute types
Bhaskar Ghosh
Segmentation & Targeting
Step 1 : Take some information about users
Member ID | Join Date  | Country | Responded to Promotion X1
1         | 01/01/2013 | FR      | F
2         | 01/02/2013 | BE      | F
3         | 01/03/2013 | FR      | F
4         | 02/01/2013 | FR      | T
Step 2 : Provide some targeting criteria for a new promotion
Pick members where
• Join Date between("01/01/2013", "01/31/2013") and
• Country="FR" and
• Responded to Promotion X1="F"
Result: Members 1 & 3
Step 3 : Target them for a different email campaign (promotion_X2)
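The three steps can be reproduced with a few lines of Python. This is a minimal sketch: the field names are made up for illustration, and the dates are assumed to be MM/DD/YYYY, which is the reading under which member 4 (02/01/2013) falls outside January.

```python
from datetime import date

# Step 1: attributes per member (field names are illustrative).
members = [
    {"member_id": 1, "join_date": date(2013, 1, 1), "country": "FR", "responded_x1": False},
    {"member_id": 2, "join_date": date(2013, 1, 2), "country": "BE", "responded_x1": False},
    {"member_id": 3, "join_date": date(2013, 1, 3), "country": "FR", "responded_x1": False},
    {"member_id": 4, "join_date": date(2013, 2, 1), "country": "FR", "responded_x1": True},
]


# Step 2: the segment definition as a predicate over those attributes.
def in_segment(member):
    return (
        date(2013, 1, 1) <= member["join_date"] <= date(2013, 1, 31)
        and member["country"] == "FR"
        and not member["responded_x1"]
    )


# Step 3: the resulting segment is the target list for promotion_X2.
segment = [m["member_id"] for m in members if in_segment(m)]
print(segment)  # [1, 3]
```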
In this example:
• Attributes: the user information in Step 1
• Segment Definition: the targeting criteria in Step 2
• Segment: the members picked (Members 1 & 3)
Segmentation & Targeting
Problem Definition
• The business wants to launch new campaigns often
• The business wants to specify targeting criteria (segment definitions) using an arbitrary set of attributes
• The attributes often need to be computed to fulfill the targeting criteria
• This data resides on Hadoop or Teradata (TD)
• The business is most comfortable with SQL-like languages
Segmentation & Targeting Solution
Segmentation & Targeting
• Attribute Computation Engine
• Attribute Serving Engine
Segmentation & Targeting
Attribute Computation Engine
• Self-service
• Support various data sources
• Attribute consolidation
• Attribute availability
Segmentation & Targeting
[Figure: Attribute computation. Labels: ~225M, PB, TB, TB, ~240]
LinkedIn Segmentation & Targeting Platform
[Diagram: Attribute Portal Web Application, with Attribute & Definition Metadata]
LinkedIn Segmentation & Targeting Platform
[Diagram: Attribute & Definition Metadata, with three executors each reached over REST: TD Executor, Hive Executor, Pig Executor]
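A rough sketch of how attribute metadata might select an executor follows. It is a guess at the shape of such a dispatcher, not the platform's code: the `AttributeDefinition` fields, the `source` labels, and the `submit` method are all hypothetical, and the REST call is replaced by a stub that only reports what would run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeDefinition:
    name: str
    source: str  # "td", "hive", or "pig" (hypothetical labels)
    script: str  # query or script text supplied by the attribute's owner


class Executor:
    engine = "unknown"

    def submit(self, definition):
        # A real executor would be reached over REST and would run the job;
        # this stand-in only records what would be run.
        return f"{self.engine}: compute {definition.name}"


class TDExecutor(Executor):
    engine = "Teradata"


class HiveExecutor(Executor):
    engine = "Hive"


class PigExecutor(Executor):
    engine = "Pig"


# One executor per data source, looked up from the attribute's metadata.
EXECUTORS = {"td": TDExecutor(), "hive": HiveExecutor(), "pig": PigExecutor()}


def compute(definition):
    return EXECUTORS[definition.source].submit(definition)


result = compute(
    AttributeDefinition("page_views_30d", "hive", "-- owner-supplied HiveQL")
)
```

Keeping the source type in metadata means a new attribute can be added by registering a definition, without touching the dispatcher.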
LinkedIn Segmentation & Targeting Platform
[Diagram: an M/R Stitcher combines /path/dataset1, /path/dataset2, /path/dataset3, and /path/dataset4 into /path/lnkd_big_table, alongside a Data Loader]
Attribute consolidation & availability
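Attribute consolidation amounts to joining several per-attribute datasets on member ID into one wide row per member. The in-memory `stitch` function below is a hypothetical stand-in for the M/R Stitcher job and assumes each input row carries a `member_id` column.

```python
from collections import defaultdict


def stitch(datasets):
    """Merge rows from every dataset into one wide row per member_id."""
    big_table = defaultdict(dict)
    for rows in datasets:
        for row in rows:
            member_id = row["member_id"]
            for column, value in row.items():
                if column != "member_id":
                    big_table[member_id][column] = value
    return dict(big_table)


dataset1 = [{"member_id": 1, "country": "FR"}, {"member_id": 2, "country": "BE"}]
dataset2 = [{"member_id": 1, "responded_x1": False}]

big_table = stitch([dataset1, dataset2])
```

Members missing from a dataset simply lack that column, which matches an outer join on member ID.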
LinkedIn Segmentation & Targeting Platform
LinkedIn big table, the most sought-after data
• Segmentation
• Propensity Model
• Ad hoc analysis
Segmentation & Targeting
Attribute Serving Engine
• Self-service
• Attribute predicate expression
• Build segments
• Build lists
Segmentation & Targeting
Serving Engine
• count, filter, sum
• complex expressions
[Figure: LinkedIn big table. Labels: ~225M, ~240]
LinkedIn Segmentation & Targeting Platform
[Diagram: an M/R Indexer builds three Inverted Indexes from the LinkedIn big table, with Attribute & Definition Metadata]
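To illustrate why an inverted index suits this workload, the sketch below maps each (attribute, value) pair to the set of member IDs that have it, so a filter becomes a set intersection and a count becomes a set size. The slides name Lucene as the query layer; the plain dictionary here is a simplified stand-in, and the sample rows are invented.

```python
from collections import defaultdict

# A tiny stand-in for the big table: member_id -> attribute values.
big_table = {
    1: {"country": "FR", "responded_x1": "F"},
    2: {"country": "BE", "responded_x1": "F"},
    3: {"country": "FR", "responded_x1": "F"},
    4: {"country": "FR", "responded_x1": "T"},
}


def build_index(table):
    """Map each (attribute, value) pair to the members that have it."""
    index = defaultdict(set)
    for member_id, attributes in table.items():
        for name, value in attributes.items():
            index[(name, value)].add(member_id)
    return index


index = build_index(big_table)

# filter: AND of two predicates is a set intersection
matches = index[("country", "FR")] & index[("responded_x1", "F")]
# count: size of the matching set
count = len(matches)
```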
LinkedIn Segmentation & Targeting Platform
• Who are the North American recruiters who don't work for a competitor?
• Who are the LinkedIn Talent Solutions prospects in Europe?
• Who are the job seekers?
LinkedIn Segmentation & Targeting Platform
[Diagram: a JSON Predicate Expression goes through the JSON Lucene Query Parser, is run against the Inverted Indexes, and yields a Segment & List]
LinkedIn Segmentation & Targeting Platform
Complex tree-like attribute predicate expressions
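A tree-like predicate expression can be represented as nested JSON and evaluated recursively. The JSON shape below (`and`, `or`, `not`, and `attr`/`eq` leaves) is invented for illustration and is not the platform's actual format; the sketch also evaluates the tree directly against rows, whereas the slides describe translating it into a Lucene query.

```python
import json

# Hypothetical encoding of "North American recruiters who don't work
# for a competitor".
expression = json.loads("""
{"and": [
  {"attr": "region", "eq": "NA"},
  {"attr": "is_recruiter", "eq": true},
  {"not": {"attr": "works_for_competitor", "eq": true}}
]}
""")


def evaluate(node, member):
    """Recursively evaluate a predicate tree against one member's attributes."""
    if "and" in node:
        return all(evaluate(child, member) for child in node["and"])
    if "or" in node:
        return any(evaluate(child, member) for child in node["or"])
    if "not" in node:
        return not evaluate(node["not"], member)
    # Leaf: compare a single attribute to a value.
    return member.get(node["attr"]) == node["eq"]


members = {
    10: {"region": "NA", "is_recruiter": True, "works_for_competitor": False},
    11: {"region": "NA", "is_recruiter": True, "works_for_competitor": True},
    12: {"region": "EU", "is_recruiter": True, "works_for_competitor": False},
}

segment = [mid for mid, attrs in members.items() if evaluate(expression, attrs)]
```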
LinkedIn Segmentation & Targeting Platform
A marketing campaign is represented by a list
Conclusion
Move at business speed and scale at LinkedIn scale
• Segmentation & Targeting Platform
– Self-service
– Multiple data sources & massive data volume
– Support complex expression evaluation in seconds
– Attribute availability at business speed
Engineering Team
• Jessica Ho
• Swetha Karthik
• Raj Rangaswamy
• Tony Tong
• Ajinkya Harkare
• Hien Luu
• Sid Anand
Questions?
More info: data.linkedin.com

The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 

LinkedIn Segmentation & Targeting Platform: A Big Data Application

  • 1. LinkedIn Segmentation & Targeting Platform: A Big Data Application Hadoop Summit, June 2013 Hien Luu, Sid Anand ©2013 LinkedIn Corporation. All Rights Reserved.
  • 3. Our mission: Connect the world’s professionals to make them more productive and successful
  • 4. Over 200M members and counting. The world’s largest professional network, growing at more than 2 members/sec. LinkedIn Members (Millions): 2004: 2, 2005: 4, 2006: 8, 2007: 17, 2008: 32, 2009: 55, 2010: 90, 2011: 145, 2012: 200+. Source: http://press.linkedin.com/about
  • 5. The world’s largest professional network. Over 64% of members are now international. >88% of Fortune 100 companies use LinkedIn Talent Solutions to hire • Company Pages: >2.9M • Professional searches in 2012: >5.7B • Languages: 19 • Fastest growing demographic: students and NCGs (>30M). Source: http://press.linkedin.com/about
  • 6. Other Company Facts • Headquartered in Mountain View, Calif., with offices around the world! • As of June 1, 2013, LinkedIn has ~3,700 full-time employees located around the world. Source: http://press.linkedin.com/about
  • 7. Agenda • Company Overview • Big Data @ LinkedIn • The Segmentation & Targeting Problem • Solution: LinkedIn Segmentation & Targeting Platform • Q & A
  • 8. Big Data @ LinkedIn
  • 9. LinkedIn: Big Data Story. Our Big Data Story depends on Infrastructure! • On-line Data Infrastructure • Near-line Data Infrastructure • Off-line Data Infrastructure [Diagram labels: Oracle or Espresso, Updates, Web Serving, Teradata, Data Streams; On-line, Near-line, Off-line]
  • 10. Big Data Story: On-line Data. On-line Data Infrastructure • Supports typical OLTP requirements • Highly concurrent R/W access • Transactional guarantees • Back-up & Recovery • Supports a central LinkedIn Data Principle! • “All data everywhere” • All OLTP databases need to provide a time-line consistent change stream • For this, we developed and open-sourced Databus! [Diagram labels: Oracle or Espresso, Updates, Web Serving, On-line]
  • 11. Big Data Story: On-line Data. A user updates the company, title, & school on his profile. He also accepts a connection. The write is made to an Oracle or Espresso master, and Databus replicates it: • the profile change is applied to the Standardization service (e.g. the many forms of IBM were canonicalized for search-friendliness) • … and to the Search Index (recruiters can find you immediately by new keywords) • the connection change is applied to the Graph Index service (the user can now start receiving feed updates from his new connections) [Diagram labels: Oracle or Espresso, Updates, Data Change Events, Standardization, Search Index, Graph Index, Read Replicas]
  • 12. Big Data Story: On-line Data. Databus streams also update Hadoop! [Diagram labels: Oracle or Espresso, Updates, Data Change Events, Standardization, Search Index, Graph Index, Read Replica]
  • 13. Big Data Story: Near-line & Off-line Data. 2 Main Sources of Data @ LinkedIn • User-provided data, e.g. Member Profile data (employment, education history, endorsements) • Tracking data via web site instrumentation, e.g. pages viewed, email opened/sent, social gestures: posts/likes/shares [Diagram labels: Oracle or Espresso, Updates, Databus, Web Servers, Teradata]
  • 14. The Segmentation & Targeting Problem
  • 16. Segmentation & Targeting Attribute types Bhaskar Ghosh
  • 17. Segmentation & Targeting
    Step 1: Take some information about users

    Member ID | Join Date  | Country | Responded to Promotion X1
    1         | 01/01/2013 | FR      | F
    2         | 01/02/2013 | BE      | F
    3         | 01/03/2013 | FR      | F
    4         | 02/01/2013 | FR      | T

    Step 2: Provide some targeting criteria for a new promotion. Pick members where • Join Date between('01/01/2013', '01/31/2013') and • Country="FR" and • Responded to Promotion X1="F". Result: Members 1 & 3
    Step 3: Target them for a different email campaign (promotion_X2)
  • 18. Segmentation & Targeting. Same content as slide 17, with each part labeled: the user information in Step 1 is the Attributes, the targeting criteria in Step 2 are the Segment Definition, and the resulting members (1 & 3) are the Segment.
  • 19. Segmentation & Targeting. Problem Definition • The business wants to launch new campaigns often • The business wants to specify targeting criteria (segment definitions) using an arbitrary set of attributes • The attributes often need to be computed to fulfill the targeting criteria • This data resides on Hadoop or TD • The business is most comfortable with SQL-like languages
  • 20. Segmentation & Targeting Solution
  • 21. Segmentation & Targeting • Attribute Computation Engine • Attribute Serving Engine
  • 22. Segmentation & Targeting. Attribute Computation Engine • Self-service • Support various data sources • Attribute consolidation • Attribute availability
  • 23. Segmentation & Targeting. Attribute computation [Figure labels: ~225M, PB, TB, TB, ~240]
  • 24. LinkedIn Segmentation & Targeting Platform. Attribute Portal • Web Application • Attribute & Definition Metadata
  • 25. LinkedIn Segmentation & Targeting Platform. Attribute & Definition Metadata • TD Executor (REST) • Hive Executor (REST) • Pig Executor (REST)
  • 26. LinkedIn Segmentation & Targeting Platform. Attribute consolidation & availability: /path/dataset1, /path/dataset2, /path/dataset3, /path/dataset4 • M/R Stitcher • /path/lnkd_big_table • Data Loader
  • 27. LinkedIn Segmentation & Targeting Platform. LinkedIn big table, the most sought-after data • Segmentation • Propensity Model • Ad hoc analysis
  • 28. Segmentation & Targeting. Attribute Serving Engine • Self-service • Attribute predicate expression • Build segments • Build lists
  • 29. Segmentation & Targeting. Serving Engine • count • filter • sum • complex expressions [Figure labels: LinkedIn big table, ~225M, ~240]
  • 30. LinkedIn Segmentation & Targeting Platform. LinkedIn big table • M/R Indexer • Inverted Index (×3) • Attribute & Definition Metadata
  • 31. LinkedIn Segmentation & Targeting Platform • Who are North American recruiters that don’t work for a competitor? • Who are the LinkedIn Talent Solution prospects in Europe? • Who are the job seekers?
  • 32. LinkedIn Segmentation & Targeting Platform. JSON Predicate Expression • JSON Lucene Query Parser • Inverted Index (×3) • Segment & List
  • 33. LinkedIn Segmentation & Targeting Platform. Complex tree-like attribute predicate expressions
  • 34. LinkedIn Segmentation & Targeting Platform. A marketing campaign is represented by a list
  • 35. Conclusion. Move at business speed and scale at LinkedIn scale. Segmentation & Targeting Platform: • Self-service • Multiple data sources & massive data volume • Support complex expression evaluation in seconds • Attribute availability at business speed
  • 36. Engineering Team • Jessica Ho • Swetha Karthik • Raj Rangaswamy • Tony Tong • Ajinkya Harkare • Hien Luu • Sid Anand
  • 37. Questions? More info: data.linkedin.com
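
The worked example on slides 17 and 18 can be sketched in a few lines of Python. This is only an illustration of the attributes / segment definition / segment idea, not the platform's implementation: the field names (`member_id`, `join_date`, `country`, `responded_x1`) are made up for this sketch, and the dates are read as MM/DD/YYYY.

```python
from datetime import date

# Attribute rows copied from slide 17 (field names are illustrative).
members = [
    {"member_id": 1, "join_date": date(2013, 1, 1), "country": "FR", "responded_x1": False},
    {"member_id": 2, "join_date": date(2013, 1, 2), "country": "BE", "responded_x1": False},
    {"member_id": 3, "join_date": date(2013, 1, 3), "country": "FR", "responded_x1": False},
    {"member_id": 4, "join_date": date(2013, 2, 1), "country": "FR", "responded_x1": True},
]


def in_segment(member):
    """Segment definition from Step 2: joined in January 2013, in France,
    and did not respond to promotion X1."""
    return (
        date(2013, 1, 1) <= member["join_date"] <= date(2013, 1, 31)
        and member["country"] == "FR"
        and not member["responded_x1"]
    )


# The segment is the set of members that satisfy the definition.
segment = [m["member_id"] for m in members if in_segment(m)]
print(segment)  # [1, 3]
```

The result matches the slide: members 1 and 3 become the target for promotion_X2.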

Editor's Notes

  1. We’re making great strides toward our mission: LinkedIn has over 225 million members, and we’re now adding more than two members per second. This is the fastest rate of absolute member growth in the company’s history. Sixty-four percent of LinkedIn members are currently located outside of the United States. LinkedIn counts executives from all 2012 Fortune 500 companies as members; its corporate talent solutions are used by 88 of the Fortune 100 companies. More than 2.9 million companies have LinkedIn Company Pages. LinkedIn members did over 5.7 billion professionally-oriented searches on the platform in 2012. [See http://press.linkedin.com/about for a complete list of LinkedIn facts and stats]
  2. Email campaign & ad targeting: • Acquire new paid customers • Retain and engage existing customers • Promote new products • Training and other important announcements. * Talk about the speed of changing segmentation and targeting criteria
  3. Professional identity • Social data • Behavioral
  4. Given the business problem that Sid outlined, the solution we came up with has two parts. The first part is about computing attributes based on the attribute definitions. The second part is about serving the attribute values to define segments, effectively performing user segmentation.
  5. The attribute computation engine needs to support these 4 high-level requirements:
    • Self-service: there needs to be an easy way for someone on the business team to express the computational logic that computes a set of attributes for their marketing campaigns. The engine takes care of the complexity of executing that logic: when and how to run it, and where to store the result.
    • Support for various data sources: data is in multiple places, TD (Teradata) and Hadoop, and we need to support both. Fortunately, SQL and HiveQL are very similar.
    • Attribute consolidation: once all the attributes are computed, they need to be consolidated into a single dataset to make it easy for everyone to consume and analyze.
    • Data availability: register the dataset with Hive and copy the data onto the TD system for business folks to consume.
  6. At a high level, the attribute computation engine needs to be able to compute attributes that come from different data sets, and some of these data sets are huge. This presents all kinds of interesting challenges, as you can imagine. The output of the computation engine is this big table: 225M rows, one for each member, and ~240 columns, one for each attribute.
    • Behavioral data: site engagement, OL transactions, searches, comments, discussions, …
    • Social data: connections, follows, endorsements
    • Demographic data (this comes from the member profile): location, gender, title, function, seniority, education
  7. Self-service way to manage attributes. The attribute portal is a web application that a member of the marketing operations or business analyst team can use to express the computation logic in the form of a SQL SELECT statement. We call that an attribute definition.
    • The SQL statement is either a Teradata SQL statement or a HiveQL statement.
    • The web application validates the SQL statements and extracts the attribute names and their types, which are useful for various purposes.
    • The metadata about the attribute definitions and attributes is captured in a MySQL database. For HiveQL queries, we support Hive hints as well as general tuning parameters like split size.
    • Once an attribute definition passes the validation step, it goes through an approval process, which is designed to make sure there are no duplicate attributes and that the query is properly tuned.
    One of the benefits of this attribute portal is that it centralizes attribute definitions and makes it easy to discover attributes, the logic behind them, and their data sources. When someone starts working on a marketing campaign, they first identify the targeting criteria based on the goals of the campaign. From the set of targeting criteria, they identify the member attributes that are needed.
  8. Attribute computing workhorse. These executors are scheduled to run on a regular basis. They contact the attribute definition metadata repository to retrieve which attribute definitions to execute, and they execute the queries in parallel using APIs.
    • TD executor: executes queries using JDBC and stores the results in temporary tables. We use an in-house library called LASSEN, an M/R library that leverages the MapReduce framework to quickly and efficiently download the data to HDFS.
    • Hive executor: programmatically executes the Hive queries. One of the classes in Hive is not thread-safe, so we can’t execute HiveQL statements in parallel using multiple threads; we use multiple Hive executors instead.
    • Pig executor: executes Pig script files and has the ability to rerun only the failed scripts.
    Interesting runtime details:
    • We have all kinds of queries, simple ones and complex ones. The complex ones may take hours to complete. However, we don’t want a query that takes 5 or 6 hours, because that would delay the attribute computing phase for all the queries. Our system has a built-in mechanism to kill a long-running query that exceeds a certain amount of time.
    • What about failed queries? Even though we validate them at attribute definition submission time, some of them will fail at runtime for various reasons. Our system is built to be resilient against these failed queries: only the attributes of the failed queries will be unavailable.
    • Our system collects accounting information about each query, so we know how many queries completed successfully, how many failed, and how long each took.
    • The output of each attribute definition is stored in a separate folder. So if we have 50 attribute definitions, the results of those queries are scattered across 50 places on Hadoop.
  9. Once the executors have finished executing and materializing the attributes, the job of the stitcher is to combine all these attributes into a single data set, which I call the LinkedIn big table.
    • It is a MapReduce job, and it acts as a gateway that performs some validations, e.g. a member ID must not be less than 0, or certain values can’t be longer than a certain length.
    • The output of the stitcher is a single data set in Avro format that contains one record for every single LinkedIn member.
    • This output is also registered in Hive for data scientists to consume.
    • To make the LinkedIn big table available for business analysts to generate more insights and do further analysis, the same data set is copied onto TD via the Data Loader component.
    • The process of executing these attribute definitions (SELECT statements), stitching the attributes together into a single dataset, and loading the data onto TD takes about 5 to 6 hours.
    • Not all attributes need to be refreshed daily, so we have the concepts of partial refresh and full refresh. In a partial refresh, only the needed subset of attribute definitions is executed, and it takes much less time: 2 to 3 hours vs. 5 to 6 hours.
  10. LinkedIn big table: 200GB. The LinkedIn big table is used for multiple purposes:
    • Propensity models: ranking models, where each member is assigned a score indicating how likely the member is to belong to a certain class of members or to take an action, e.g. being a job seeker, or upgrading to a paid subscription.
    • Business analysts and data scientists use it for their own analysis.
    The most sought-after data: a very rich data set that contains all kinds of interesting attributes about our members, all in a single place. Because the heavy lifting has been done and the data is available in a single place, others don’t have to hunt down data sets.
  11. Self-service: a web application for business analysts and the marketing team to use, including someone who is not familiar with SQL. The UI supports drag and drop.
    • Attribute predicate expression: basically a boolean expression that evaluates to true or false by comparing an attribute value to an expected value. For example, whether the value of the country attribute is United States, or whether a member has more than 30 connections. In order to build segments, we need a way of expressing attribute predicates (e.g. country is Canada or United States), saving the expression, and evaluating it at a later point.
    • Build segments: combining various attribute predicates into a segment.
    • Build lists: combining segments together to target a certain member population for a marketing campaign.
  12. Based on the requirements I talked about in the previous slide, the serving engine needs to support the following features/operations:
    • Count: how many members meet certain criteria.
    • Filter: members that meet certain criteria.
    • Sum: each member is assigned a lifetime value for a particular product, so we need the ability to compute the total dollar amount of a segment based on the members that meet the defined criteria.
    • Complex nested expressions with support for conjunction (and) and disjunction (or).
    The core problem that the serving engine needs to solve is to support arbitrary predicate expressions against any of the attributes and return the result in a reasonable amount of time. We basically think of this as an information retrieval problem, and we found Lucene to be pretty good at this kind of problem, so we leverage it here.
  13. The indexer is a MapReduce application:
    • It consumes data in Avro format and creates Lucene indexes.
    • It uses a custom Writable to wrap a Lucene document. Each Lucene document contains all the 240+ attributes for one member.
    • It uses a custom OutputFormat to build a Lucene index segment, which is stored on the local disk of the reducer task and copied onto HDFS at the end of the reduce task.
    LinkedIn big table: 200GB. Index: 175GB. * # of map and reduce tasks
  14. The job-seeker question requires only one attribute: job seeker status. The Talent Solution prospects question requires two attributes: whether the member is a Talent Solution prospect, and the country they work in. The recruiter question needs three attributes: whether a member is a recruiter, the country that member works in, and whether the company they work for is considered a competitor of LinkedIn.
  15. JSON predicate expression: we use JSON to define the format of the predicate expression. JSON is well suited for this purpose: it supports nested data structures, is fairly flexible, and is easy to parse.
    • Different data types are supported, and for each data type, certain operators are supported.
    • A JSON predicate expression consists of an attribute name, data type, operator, and one or more values.
    • The JSON predicate expression is the contract between the browser and the server.
    • The predicate expression is stored in MySQL and evaluated at run time.
  16. The web application has a UI for defining segments and lists. In the segment builder, users drag arbitrary attributes and build predicate expressions. With the click of a button, the marketing team can get a sense of how many members meet the criteria defined in the segment. This gives them a chance to change the criteria to increase or decrease the count. Segments are meant as building blocks.
  17. Segments are building blocks and are reusable. Each marketing campaign is represented by a list, which is a collection of segments. Each segment can be one of two types:
    • Inclusions: include members that meet the defined criteria of each of the selected segments.
    • Net count and raw count.
    • Exclusions: exclude those members.
  18. One of the things we are working on is improving the turnaround time for attributes: from the time an attribute is defined to the time it is available for building segments.
  19. * Give a shout-out to the engineering team that works on this platform.
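
Note 9 describes the stitcher as a MapReduce job that merges the per-definition outputs into one record per member and rejects invalid rows. A minimal in-memory Python sketch of that merge step follows. It is not the MapReduce job itself: the function name `stitch`, the `max_len` limit, and the sample attribute names are assumptions made for illustration, and only the "member ID must not be less than 0" and "value not too long" checks come from the notes.

```python
def stitch(attribute_outputs, max_len=256):
    """Merge rows from several attribute outputs into one record per member.

    attribute_outputs: list of datasets, each a list of dicts with a
    "member_id" key plus one or more attribute columns (hypothetical layout).
    """
    big_table = {}
    for output in attribute_outputs:
        for row in output:
            member_id = row["member_id"]
            if member_id < 0:
                continue  # validation from note 9: reject negative member IDs
            record = big_table.setdefault(member_id, {"member_id": member_id})
            for name, value in row.items():
                if name == "member_id":
                    continue
                if isinstance(value, str) and len(value) > max_len:
                    continue  # validation from note 9: drop over-long values
                record[name] = value
    # One record per member, ordered by member ID for readability.
    return [big_table[member_id] for member_id in sorted(big_table)]


# Two hypothetical attribute outputs, as if read from two separate folders.
profile = [
    {"member_id": 1, "country": "FR"},
    {"member_id": 2, "country": "BE"},
    {"member_id": -5, "country": "US"},  # invalid row, dropped by the stitcher
]
social = [
    {"member_id": 1, "connections": 42},
    {"member_id": 2, "connections": 7},
]

table = stitch([profile, social])
print(table)
```

In the real pipeline the grouping by member ID would happen in the shuffle phase and the output would be written as Avro; the dictionary here only stands in for that grouping.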
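
Notes 12 and 15 describe nested JSON predicate expressions and the count, filter, and sum operations built on them. The sketch below evaluates such an expression directly over a few in-memory rows. It deliberately swaps out Lucene for a plain Python loop, and the JSON layout (`op`, `children`, `attribute`, `type`, `operator`, `values`), the operator names, and the sample attributes are all assumptions: the notes only say an expression carries an attribute name, data type, operator, and one or more values.

```python
import json

# Hypothetical operator table; the real platform's operator set is not given.
OPERATORS = {
    "eq": lambda actual, values: actual == values[0],
    "gt": lambda actual, values: actual > values[0],
    "in": lambda actual, values: actual in values,
}


def matches(expr, member):
    """Recursively evaluate a nested and/or predicate tree against one member."""
    if "children" in expr:
        results = (matches(child, member) for child in expr["children"])
        return all(results) if expr["op"] == "and" else any(results)
    return OPERATORS[expr["operator"]](member[expr["attribute"]], expr["values"])


# "country is US or CA, and (more than 30 connections or job seeker)"
expression = json.loads("""
{"op": "and", "children": [
  {"attribute": "country", "type": "string", "operator": "in", "values": ["US", "CA"]},
  {"op": "or", "children": [
    {"attribute": "connections", "type": "int", "operator": "gt", "values": [30]},
    {"attribute": "job_seeker", "type": "boolean", "operator": "eq", "values": [true]}
  ]}
]}
""")

members = [
    {"member_id": 1, "country": "US", "connections": 45, "job_seeker": False, "lifetime_value": 120.0},
    {"member_id": 2, "country": "CA", "connections": 10, "job_seeker": True, "lifetime_value": 80.0},
    {"member_id": 3, "country": "FR", "connections": 90, "job_seeker": True, "lifetime_value": 200.0},
    {"member_id": 4, "country": "US", "connections": 5, "job_seeker": False, "lifetime_value": 50.0},
]

segment = [m for m in members if matches(expression, m)]       # filter
count = len(segment)                                           # count
total_value = sum(m["lifetime_value"] for m in segment)        # sum
print([m["member_id"] for m in segment], count, total_value)   # [1, 2] 2 200.0
```

A linear scan like this would not scale to ~225M members, which is the reason the notes give for indexing the big table with Lucene and translating the JSON expression into a Lucene query instead.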