LinkedIn’s STREAM EXPERIMENTATION FRAMEWORK
Joseph Adler, Bee-Chung Chen, and Xin Fu
O’Reilly Strata Conference
February 12 2014

©2014 LinkedIn Corporation. All Rights Reserved.
The LinkedIn Stream
Like many social networks, the
centerpiece of LinkedIn’s home
page is a news stream.
It contains

• Updates about users’ networks
• News stories and shares
• Recommendations

The LinkedIn Stream
We operate at a large scale.

• 277+ million members
• 75+ million monthly unique users
• 5000+ employees

The LinkedIn Stream
Today, we’ll tell you how we
experiment with new content in
the stream:

• Creating new content
• Maximizing relevance
• Managing tests

History of the LinkedIn Stream
Network updates were
introduced in 2006
Back then, LinkedIn had

• 5mm members
• 875k monthly uniques
• 70 employees

History of the LinkedIn Stream
In practice this meant:

• Slow-changing content, a small number of updates, and a weekly visit rate
  ‣ No ranking/optimization

• A small number of active tests and limited analytics resources
  ‣ Primitive resources for A/B tests

• Limited engineering resources
  ‣ Hacky solution for testing new content...

History of the LinkedIn Stream

We experimented with new
content using a system called
the Analytics Prototype Engine,
or APE. It was implemented as
an ad slot on the home page.
Big wins included:

• People You May Know
• Groups You Might Like
• Jobs You Might Be Interested In
History of the LinkedIn Stream
We added more content over
the next couple of years:

• Status updates
• Twitter content
• Group discussions
• OpenSocial content (TripIt, GitHub, and more...)

History of the LinkedIn Stream
By 2009, the stream looked
very similar to the stream
today.
LinkedIn was much bigger than
when we first added a news
stream...

• 55mm members
• 36mm monthly uniques
• 500 employees (end of year)

History of the LinkedIn Stream
… but the infrastructure hadn’t
changed much and we were
experiencing growing pains:

• No system for ranking and optimization
  ‣ Users were overwhelmed with low-relevance updates

• No system for A/B testing
  ‣ Overlapping A/B tests, poor experiment design, difficult analysis

• No system for rapid prototyping/testing
  ‣ APE was making the site slow and unstable, and was shut down

History of the Stream
In the rest of this talk, we’ll tell
you how we’ve addressed
these challenges (and used a
lot of data science to make this
happen).

Content Insertion
In the beginning (2006),
experiments happened outside
the stream through APE:

• Easy data uploads
• Management UI
• Templates

Content Insertion
Most new content experiments
boil down to one thing: creating
experimental data.
We wanted the data experts to
be able to create experiments
easily by focusing on data, not
on writing production code (and
wrestling with build systems,
deployment processes, etc).
We created a system that lets
data scientists push new
content into the stream by
writing scripts (in Pig, Hive, etc).
Content Insertion
Project Gorilla brought the spirit
of APE back to the home page,
inside the stream.

[Architecture diagram. Components shown: nhome, USCP, Federator, Gorilla First Pass Ranker, Gorilla Voldemort Store, Gorilla Batch, Gorilla jobs]

Content Insertion
What does this consist of?

• An Apache Pig UDF for pushing content
• A batch process that filters, consolidates, and ranks updates
• A process that pushes data from Hadoop into Voldemort (our NoSQL key/value store)
• An online system that fetches updates from the store and mixes them into the stream
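
As a rough illustration of the pieces listed above, the sketch below models the offline and online halves in plain Python. It is not Gorilla's code: the function names, the record fields (member_id, item_id, score), the fixed insertion slot, and the in-memory dict standing in for the Voldemort store are all assumptions made for this example, and the Pig UDF that emits the raw updates is not modeled.

```python
from collections import defaultdict


def batch_consolidate(raw_updates, max_per_member=2):
    """Offline step (hypothetical): filter, consolidate, and rank updates per member."""
    best = {}
    for update in raw_updates:
        if update["score"] <= 0:  # filter: drop updates with no positive score
            continue
        key = (update["member_id"], update["item_id"])
        # consolidate: keep only the highest-scoring copy of a duplicated item
        if key not in best or update["score"] > best[key]["score"]:
            best[key] = update
    per_member = defaultdict(list)
    for update in best.values():
        per_member[update["member_id"]].append(update)
    # rank: highest score first, truncated to a small per-member budget
    return {
        member: sorted(updates, key=lambda u: u["score"], reverse=True)[:max_per_member]
        for member, updates in per_member.items()
    }


def push_to_store(store, ranked):
    """Stand-in for the bulk load from Hadoop into the key/value store."""
    store.update(ranked)


def mix_into_stream(store, member_id, organic_items, slot=1):
    """Online step (hypothetical): fetch a member's updates and splice them into the stream."""
    experimental = [u["item_id"] for u in store.get(member_id, [])]
    return organic_items[:slot] + experimental + organic_items[slot:]


raw = [
    {"member_id": 1, "item_id": "job:9", "score": 0.4},
    {"member_id": 1, "item_id": "job:9", "score": 0.7},
    {"member_id": 1, "item_id": "group:3", "score": 0.9},
    {"member_id": 1, "item_id": "spam:1", "score": 0.0},
    {"member_id": 2, "item_id": "pymk:5", "score": 0.2},
]
store = {}
push_to_store(store, batch_consolidate(raw))
stream = mix_into_stream(store, 1, ["a", "b", "c"])
```

The point of the split is that everything experimental lives in the data keyed by member; the online side only does a lookup and a merge, so a new experiment does not need new serving code.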
Content Insertion
Our implementation is very simple:

• LinkedIn production systems use rest.li as an API (JSON data + schema)
• We create data offline on Hadoop, put it in Voldemort, and surface it through an API

This means that we can experiment easily using existing templates,
tracking, etc.; we just have to change the data that’s rendered.
(We’re also experimenting with a similar real-time system based on
Apache Samza.)
Relevance Optimization
Bring each individual user the most relevant items from different
sources to optimize for a single or multiple measurable
objectives

Relevance Optimization

• Maximize users’ clicks on items in the stream
• Rank items according to their click rates

• Probability that a user would click an item

• Predict the click rate based on

• User features: Profile, visit pattern, interests, …
• Item features: Type, topics, keywords, …
• User-item interaction features
• Context: Device, time of day, previous page …

Relevance Optimization
Large scale logistic regression

• Input: A set of past users’ responses to items

  Response | Feature Vector
  1        | (Gender=M, JobTitle=CEO, ItemType=JobChange, ...)
  0        | (Gender=F, JobTitle=Engineer, ItemType=Article, ...)
  …        | …

• Output: Model parameters
• Challenge: Data too large to fit on a single machine
• Solution: Train a model using MapReduce on Hadoop
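
To make the input and output concrete, here is a minimal single-machine sketch of logistic regression over sparse categorical features like those in the table above, followed by ranking items by predicted click probability. It is a toy: the two training rows are the illustrative rows from the slide, plain stochastic gradient ascent stands in for the Hadoop training job, and the names train, predict, and rank_items are assumptions of this example.

```python
import math
from collections import defaultdict


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, features):
    """Predicted click probability for a list of active (one-hot) features."""
    return sigmoid(sum(weights.get(f, 0.0) for f in features))


def train(examples, epochs=200, lr=0.5):
    """Fit one weight per feature by stochastic gradient ascent on the log-likelihood."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for response, features in examples:
            error = response - predict(weights, features)
            for f in features:
                weights[f] += lr * error
    return weights


def rank_items(weights, user_features, items):
    """Order (item_id, item_features) pairs by predicted click probability."""
    return sorted(
        items,
        key=lambda item: predict(weights, user_features + item[1]),
        reverse=True,
    )


examples = [
    (1, ["bias", "Gender=M", "JobTitle=CEO", "ItemType=JobChange"]),
    (0, ["bias", "Gender=F", "JobTitle=Engineer", "ItemType=Article"]),
]
weights = train(examples)
ranking = rank_items(
    weights,
    ["Gender=M", "JobTitle=CEO"],
    [("article-1", ["ItemType=Article"]), ("jobchange-1", ["ItemType=JobChange"])],
)
```

The model parameters are just the per-feature weights; once they exist, scoring an item for a user is a sum and a sigmoid, which is cheap enough to do at serving time.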

Relevance Optimization
Large scale Logistic Regression with ADMM

[Diagram: a large input data set is split into Partition 1, Partition 2, Partition 3, …, Partition K; a logistic regression is fit on each partition, and the per-partition results feed a consensus computation]
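
The partition-then-consensus pattern in the diagram can be sketched with the textbook consensus form of ADMM (alternating direction method of multipliers). The code below is a small pure-Python illustration, not LinkedIn's Hadoop implementation: the function names, the penalty rho, the inner gradient-descent solver, and the iteration counts are all assumptions chosen to keep the example short. Each partition fits its own coefficients x_k while being pulled toward the shared vector z; z is then reset to the average of x_k + u_k, and the dual variables u_k accumulate the remaining disagreement.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def local_update(rows, z, u, rho, steps=50, lr=0.1):
    """Approximately minimise local logistic loss + (rho/2) * ||x - z + u||^2."""
    dim = len(z)
    x = list(z)  # warm start from the current consensus
    for _ in range(steps):
        grad = [rho * (x[j] - z[j] + u[j]) for j in range(dim)]
        for label, feats in rows:
            p = sigmoid(sum(x[j] * feats[j] for j in range(dim)))
            for j in range(dim):
                grad[j] += (p - label) * feats[j]
        x = [x[j] - lr * grad[j] for j in range(dim)]
    return x


def admm_logistic(partitions, dim, rho=1.0, iters=30):
    """Consensus ADMM: per-partition fits (map) followed by averaging (reduce)."""
    k = len(partitions)
    z = [0.0] * dim
    us = [[0.0] * dim for _ in range(k)]
    for _ in range(iters):
        # "Logistic Regression" boxes: independent per-partition solves
        xs = [local_update(partitions[i], z, us[i], rho) for i in range(k)]
        # "Consensus Computation" box: average the local estimates
        z = [sum(xs[i][j] + us[i][j] for i in range(k)) / k for j in range(dim)]
        # dual update: remember how far each partition is from the consensus
        us = [[us[i][j] + xs[i][j] - z[j] for j in range(dim)] for i in range(k)]
    return z


# Toy data: each row is (response, [intercept, feature]).
partitions = [
    [(1, [1.0, 2.0]), (0, [1.0, -1.0]), (1, [1.0, 1.0]), (0, [1.0, 0.5])],
    [(0, [1.0, -2.0]), (1, [1.0, 1.5]), (0, [1.0, -0.5]), (1, [1.0, -0.2])],
]
consensus = admm_logistic(partitions, dim=2)
```

Only the small coefficient vectors move between the local solves and the averaging step, never the training rows, which is what makes the scheme a natural fit for a partitioned data set.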

Relevance Optimization
Diversity
Users get tired when seeing items of the same type many times in the
stream.
Example: Group discussions

  Consecutive discussions | Drop in Click Rate
  2                       | 21%
  3                       | 48%
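
One way such a measurement could feed back into ranking is a greedy re-ranker that discounts an item's score when it would extend a run of the same type. The sketch below is an assumption-laden illustration, not the method described in the talk: it applies the group-discussion figures above (21% and 48%) to every item type, reuses the 48% discount for runs longer than three, and uses hypothetical item records with type and score fields.

```python
# Multiplier applied when an item would follow 0, 1, or 2+ items of its own type.
CONSECUTIVE_DISCOUNT = {0: 1.0, 1: 0.79, 2: 0.52}


def trailing_run(ranked, item_type):
    """Count how many items of item_type sit at the end of the ranked list."""
    run = 0
    for item in reversed(ranked):
        if item["type"] != item_type:
            break
        run += 1
    return run


def diversify(items):
    """Greedily pick the item with the best run-discounted score at each position."""
    remaining = list(items)
    ranked = []
    while remaining:
        def adjusted(item):
            run = min(trailing_run(ranked, item["type"]), 2)
            return item["score"] * CONSECUTIVE_DISCOUNT[run]

        best = max(remaining, key=adjusted)
        ranked.append(best)
        remaining.remove(best)
    return ranked


candidates = [
    {"id": "d1", "type": "discussion", "score": 0.10},
    {"id": "d2", "type": "discussion", "score": 0.09},
    {"id": "d3", "type": "discussion", "score": 0.08},
    {"id": "a1", "type": "article", "score": 0.075},
]
order = [item["id"] for item in diversify(candidates)]
```

With these toy scores the article moves up to second place, because a second consecutive discussion is worth only 0.09 × 0.79 after discounting.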

Relevance Optimization
Multi-Objective Optimization

• Different items in the stream generate different kinds of value
• Click
• Social actions: Like, share, comment, …
• Revenue from sponsored items
• One approach:

Maximize revenue s.t. clicks and social actions are
still within ε% of optimal

• It requires extensive experiments!
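
As a small sketch of the constrained formulation above, suppose each candidate ranking configuration has already been measured in an experiment. The selection rule then reduces to a filter followed by a maximum. The configuration names, the metric fields, and the numbers below are invented for illustration; only the "within ε% of optimal" rule comes from the slide.

```python
def pick_config(candidates, epsilon_pct):
    """Return the highest-revenue candidate whose clicks and social actions
    are each within epsilon_pct percent of the best observed value."""
    best_clicks = max(c["clicks"] for c in candidates)
    best_social = max(c["social"] for c in candidates)
    floor = 1.0 - epsilon_pct / 100.0
    feasible = [
        c for c in candidates
        if c["clicks"] >= floor * best_clicks and c["social"] >= floor * best_social
    ]
    if not feasible:  # the click-optimal and social-optimal configs may differ
        return None
    return max(feasible, key=lambda c: c["revenue"])


# Hypothetical measured outcomes for three ranking configurations.
measured = [
    {"name": "A", "clicks": 100, "social": 50, "revenue": 10},
    {"name": "B", "clicks": 98, "social": 49, "revenue": 14},
    {"name": "C", "clicks": 90, "social": 40, "revenue": 20},
]
choice = pick_config(measured, epsilon_pct=5)
```

Loosening ε trades engagement for revenue: with these toy numbers a 5% tolerance selects B, while a 25% tolerance would admit C.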

Experimentation Framework
Stream experiments are carried
out on LinkedIn’s central
experimentation platform:

• A one-stop solution for feature A/B testing, ramping, and advanced targeting needs
• Built-in power calculation to aid experiment design
• Automated reporting and analysis capabilities

[Mock-up of UI]
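
The talk does not say how the platform's power calculation works. As a generic illustration of what such a calculation answers, the sketch below uses the standard normal-approximation formula for comparing two proportions (for example, click-through rates) to estimate how many members each arm needs. The function name and the default alpha and power values are assumptions of this example.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(p_control, p_treatment, alpha=0.05, power=0.8):
    """Members needed per arm to detect p_control vs p_treatment
    with a two-sided test at level alpha and the given power."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    n = (z_alpha + z_beta) ** 2 * variance / (p_control - p_treatment) ** 2
    return math.ceil(n)


# Detecting a lift from a 10% to an 11% click rate.
needed = sample_size_per_arm(0.10, 0.11)
```

Because the required sample grows with the inverse square of the effect size, halving the detectable lift roughly quadruples the number of members needed, which is why this check belongs in experiment design rather than in the analysis afterwards.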

Experimentation Framework

• History: assign members into test groups based on modulo of
Member IDs

• A very high likelihood of range overlaps between tests
• Just one experiment can negatively affect results of other tests
executed on the same page

• Now: deterministic pseudo-random algorithm for treatment
assignment computation

• Improved logging of treatment assignment
• Automated scorecards
• Record of historical experiments
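
The difference between the two assignment schemes can be sketched in a few lines. With a plain modulo, every experiment slices members the same way, so the same members land in the same bucket across tests. Hashing the member ID together with an experiment-specific key is one common way to get a deterministic but effectively independent split per experiment. This is an illustration only: the talk does not name the algorithm, and the use of SHA-256, the key format, and the function names are assumptions.

```python
import hashlib


def modulo_assignment(member_id, num_buckets=10):
    """Historical style: the bucket depends only on the member ID."""
    return member_id % num_buckets


def hashed_assignment(member_id, experiment_key, treatments):
    """Deterministic pseudo-random assignment.

    treatments is a list of (name, fraction) pairs whose fractions sum to 1.
    """
    digest = hashlib.sha256(f"{experiment_key}:{member_id}".encode()).hexdigest()
    point = int(digest[:8], 16) / 2 ** 32  # uniform-looking value in [0, 1)
    cumulative = 0.0
    for name, fraction in treatments:
        cumulative += fraction
        if point < cumulative:
            return name
    return treatments[-1][0]  # guard against floating-point rounding


split = [("control", 0.5), ("treatment", 0.5)]
members = range(10000)
in_a = [m for m in members if hashed_assignment(m, "exp-a", split) == "treatment"]
in_both = [m for m in in_a if hashed_assignment(m, "exp-b", split) == "treatment"]
overlap_rate = len(in_both) / len(in_a)
```

Under the modulo scheme, two tests that both treat bucket 0 share exactly the same members. With the keyed hash, about half of one experiment's treatment group falls into the other's treatment group, so neither test systematically biases the other.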

Experimentation Framework

• History: focus on product-specific metrics

• Stream relevance change ⇒ CTR
• Profile redesign ⇒ # of profile views

• Now: standardized, tiered metric system

• Sitewide Tier 1 metrics
• Product-specific Tier 2 / Tier 3 metrics
• Comprehensive understanding of feature impact


[Mock-up of UI]

Conclusions
LinkedIn has always experimented with site content. As we’ve
grown, we’ve had to rethink how we experiment.
Key lessons:

• Managing experimentation at scale is hard
• Scale means users, content volume, and employees
• Invest in platforms if they save time, money, or labor

Š2014 LinkedIn Corporation. All Rights Reserved.
Š2014 LinkedIn Corporation. All Rights Reserved.

Weitere ähnliche Inhalte

Was ist angesagt?

Sparkling Water 5 28-14
Sparkling Water 5 28-14Sparkling Water 5 28-14
Sparkling Water 5 28-14
Sri Ambati
 
A look inside pandas design and development
A look inside pandas design and developmentA look inside pandas design and development
A look inside pandas design and development
Wes McKinney
 
Writing Continuous Applications with Structured Streaming Python APIs in Apac...
Writing Continuous Applications with Structured Streaming Python APIs in Apac...Writing Continuous Applications with Structured Streaming Python APIs in Apac...
Writing Continuous Applications with Structured Streaming Python APIs in Apac...
Databricks
 

Was ist angesagt? (20)

Designing and Building a Graph Database Application – Architectural Choices, ...
Designing and Building a Graph Database Application – Architectural Choices, ...Designing and Building a Graph Database Application – Architectural Choices, ...
Designing and Building a Graph Database Application – Architectural Choices, ...
 
Machine Learning using Apache Spark MLlib
Machine Learning using Apache Spark MLlibMachine Learning using Apache Spark MLlib
Machine Learning using Apache Spark MLlib
 
Hadoop to spark-v2
Hadoop to spark-v2Hadoop to spark-v2
Hadoop to spark-v2
 
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...Finding the needles in the haystack. An Overview of Analyzing Big Data with H...
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...
 
Hadoop Ecosystem
Hadoop EcosystemHadoop Ecosystem
Hadoop Ecosystem
 
Big Data Science with H2O in R
Big Data Science with H2O in RBig Data Science with H2O in R
Big Data Science with H2O in R
 
Sparkling Water 5 28-14
Sparkling Water 5 28-14Sparkling Water 5 28-14
Sparkling Water 5 28-14
 
Spark Machine Learning: Adding Your Own Algorithms and Tools with Holden Kara...
Spark Machine Learning: Adding Your Own Algorithms and Tools with Holden Kara...Spark Machine Learning: Adding Your Own Algorithms and Tools with Holden Kara...
Spark Machine Learning: Adding Your Own Algorithms and Tools with Holden Kara...
 
Apache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code WorkshopApache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code Workshop
 
PySaprk
PySaprkPySaprk
PySaprk
 
Mapreduce in Search
Mapreduce in SearchMapreduce in Search
Mapreduce in Search
 
Introduction to the Hadoop Ecosystem (FrOSCon Edition)
Introduction to the Hadoop Ecosystem (FrOSCon Edition)Introduction to the Hadoop Ecosystem (FrOSCon Edition)
Introduction to the Hadoop Ecosystem (FrOSCon Edition)
 
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science LabScalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
 
How to Integrate Spark MLlib and Apache Solr to Build Real-Time Entity Type R...
How to Integrate Spark MLlib and Apache Solr to Build Real-Time Entity Type R...How to Integrate Spark MLlib and Apache Solr to Build Real-Time Entity Type R...
How to Integrate Spark MLlib and Apache Solr to Build Real-Time Entity Type R...
 
A look inside pandas design and development
A look inside pandas design and developmentA look inside pandas design and development
A look inside pandas design and development
 
Writing Continuous Applications with Structured Streaming Python APIs in Apac...
Writing Continuous Applications with Structured Streaming Python APIs in Apac...Writing Continuous Applications with Structured Streaming Python APIs in Apac...
Writing Continuous Applications with Structured Streaming Python APIs in Apac...
 
Strata NYC 2015 - What's coming for the Spark community
Strata NYC 2015 - What's coming for the Spark communityStrata NYC 2015 - What's coming for the Spark community
Strata NYC 2015 - What's coming for the Spark community
 
Unified Big Data Processing with Apache Spark (QCON 2014)
Unified Big Data Processing with Apache Spark (QCON 2014)Unified Big Data Processing with Apache Spark (QCON 2014)
Unified Big Data Processing with Apache Spark (QCON 2014)
 
Sparkling pandas Letting Pandas Roam - PyData Seattle 2015
Sparkling pandas Letting Pandas Roam - PyData Seattle 2015Sparkling pandas Letting Pandas Roam - PyData Seattle 2015
Sparkling pandas Letting Pandas Roam - PyData Seattle 2015
 
Sparkcamp @ Strata CA: Intro to Apache Spark with Hands-on Tutorials
Sparkcamp @ Strata CA: Intro to Apache Spark with Hands-on TutorialsSparkcamp @ Strata CA: Intro to Apache Spark with Hands-on Tutorials
Sparkcamp @ Strata CA: Intro to Apache Spark with Hands-on Tutorials
 

Andere mochten auch

Secrets Playa Mujeres
Secrets Playa MujeresSecrets Playa Mujeres
Secrets Playa Mujeres
chglat
 
Antigua Options
Antigua OptionsAntigua Options
Antigua Options
chglat
 
Dream work
Dream workDream work
Dream work
AB Design
 
AB Solut призентация
AB Solut призентацияAB Solut призентация
AB Solut призентация
AB Design
 
история компании для новых сотрудников
история компании для новых сотрудниковистория компании для новых сотрудников
история компании для новых сотрудников
AB Design
 
Mobile Trends in media (in Russian)
Mobile Trends in media (in Russian)Mobile Trends in media (in Russian)
Mobile Trends in media (in Russian)
Vsevolod Pulya
 
Secrets Royal Beach
Secrets Royal BeachSecrets Royal Beach
Secrets Royal Beach
chglat
 
Brenda Dominican Republic Option
Brenda Dominican Republic OptionBrenda Dominican Republic Option
Brenda Dominican Republic Option
chglat
 

Andere mochten auch (20)

Secrets Playa Mujeres
Secrets Playa MujeresSecrets Playa Mujeres
Secrets Playa Mujeres
 
Antigua Options
Antigua OptionsAntigua Options
Antigua Options
 
Tugas TIK 2
Tugas TIK 2Tugas TIK 2
Tugas TIK 2
 
Dream work
Dream workDream work
Dream work
 
Attract audience
Attract audienceAttract audience
Attract audience
 
Join The Solution Netwerkevent
Join The Solution NetwerkeventJoin The Solution Netwerkevent
Join The Solution Netwerkevent
 
AB Solut призентация
AB Solut призентацияAB Solut призентация
AB Solut призентация
 
Design sprint - Saas and Online Business Meetup
Design sprint  - Saas and Online Business MeetupDesign sprint  - Saas and Online Business Meetup
Design sprint - Saas and Online Business Meetup
 
история компании для новых сотрудников
история компании для новых сотрудниковистория компании для новых сотрудников
история компании для новых сотрудников
 
Dallas
Dallas Dallas
Dallas
 
Jamaica
JamaicaJamaica
Jamaica
 
Ray Cancun Options
Ray Cancun OptionsRay Cancun Options
Ray Cancun Options
 
Почему СМИ должны превращаться в IT-компании
Почему СМИ должны превращаться в IT-компанииПочему СМИ должны превращаться в IT-компании
Почему СМИ должны превращаться в IT-компании
 
Costa Rica
Costa RicaCosta Rica
Costa Rica
 
Mobile Trends in media (in Russian)
Mobile Trends in media (in Russian)Mobile Trends in media (in Russian)
Mobile Trends in media (in Russian)
 
Jessica Honeymoon Option
Jessica Honeymoon OptionJessica Honeymoon Option
Jessica Honeymoon Option
 
Secrets Royal Beach
Secrets Royal BeachSecrets Royal Beach
Secrets Royal Beach
 
Maddi
MaddiMaddi
Maddi
 
Livingwaterspresents
LivingwaterspresentsLivingwaterspresents
Livingwaterspresents
 
Brenda Dominican Republic Option
Brenda Dominican Republic OptionBrenda Dominican Republic Option
Brenda Dominican Republic Option
 

Ähnlich wie Linked in stream experimentation framework

Rethinking SharePoint WSS 2009
Rethinking SharePoint WSS 2009Rethinking SharePoint WSS 2009
Rethinking SharePoint WSS 2009
tobyspendiff
 
Transitioning to-lean-at-infochimps
Transitioning to-lean-at-infochimpsTransitioning to-lean-at-infochimps
Transitioning to-lean-at-infochimps
Ash Maurya
 

Ähnlich wie Linked in stream experimentation framework (20)

What’s the Impact of Open Source on the Future of Supply Chain? slide deck
What’s the Impact of Open Source on the Future of Supply Chain? slide deckWhat’s the Impact of Open Source on the Future of Supply Chain? slide deck
What’s the Impact of Open Source on the Future of Supply Chain? slide deck
 
Webinar: The Slippery Slope of Migrating to SharePoint Online or On-Premise
Webinar: The Slippery Slope of Migrating to SharePoint Online or On-PremiseWebinar: The Slippery Slope of Migrating to SharePoint Online or On-Premise
Webinar: The Slippery Slope of Migrating to SharePoint Online or On-Premise
 
How Celtra Optimizes its Advertising Platform with Databricks
How Celtra Optimizes its Advertising Platformwith DatabricksHow Celtra Optimizes its Advertising Platformwith Databricks
How Celtra Optimizes its Advertising Platform with Databricks
 
SMB100: The Best of Both Worlds: Service Management Powered by Ivanti
SMB100: The Best of Both Worlds: Service Management Powered by IvantiSMB100: The Best of Both Worlds: Service Management Powered by Ivanti
SMB100: The Best of Both Worlds: Service Management Powered by Ivanti
 
Atlassian Executive Business Forum - LinkedIn HQ
Atlassian Executive Business Forum - LinkedIn HQAtlassian Executive Business Forum - LinkedIn HQ
Atlassian Executive Business Forum - LinkedIn HQ
 
Webinar: Slippery Slope of SharePoint Migrations
Webinar: Slippery Slope of SharePoint Migrations Webinar: Slippery Slope of SharePoint Migrations
Webinar: Slippery Slope of SharePoint Migrations
 
LinkedIn Infrastructure (analytics@webscale, at fb 2013)
LinkedIn Infrastructure (analytics@webscale, at fb 2013)LinkedIn Infrastructure (analytics@webscale, at fb 2013)
LinkedIn Infrastructure (analytics@webscale, at fb 2013)
 
Rethinking SharePoint WSS 2009
Rethinking SharePoint WSS 2009Rethinking SharePoint WSS 2009
Rethinking SharePoint WSS 2009
 
Measuring Successful Sharepoint Installation
Measuring Successful Sharepoint InstallationMeasuring Successful Sharepoint Installation
Measuring Successful Sharepoint Installation
 
Big Data Ecosystem @ LinkedIn
Big Data Ecosystem @ LinkedInBig Data Ecosystem @ LinkedIn
Big Data Ecosystem @ LinkedIn
 
DataEngConf SF16 - Methods for Content Relevance at LinkedIn
DataEngConf SF16 - Methods for Content Relevance at LinkedInDataEngConf SF16 - Methods for Content Relevance at LinkedIn
DataEngConf SF16 - Methods for Content Relevance at LinkedIn
 
#SPFestSEA Introduction to #MicrosoftGraph
#SPFestSEA Introduction to #MicrosoftGraph#SPFestSEA Introduction to #MicrosoftGraph
#SPFestSEA Introduction to #MicrosoftGraph
 
SharePoint 2013 governance model
SharePoint 2013 governance modelSharePoint 2013 governance model
SharePoint 2013 governance model
 
Master IAM in the Cloud with SCIM v2.0
Master IAM in the Cloud with SCIM v2.0Master IAM in the Cloud with SCIM v2.0
Master IAM in the Cloud with SCIM v2.0
 
Splunk in Rakuten: Splunk as a Service for all
Splunk in Rakuten: Splunk as a Service for allSplunk in Rakuten: Splunk as a Service for all
Splunk in Rakuten: Splunk as a Service for all
 
Migrating Your Intranet to SharePoint Online
Migrating Your Intranet to SharePoint OnlineMigrating Your Intranet to SharePoint Online
Migrating Your Intranet to SharePoint Online
 
Clarisoft Software Development Process (Lunch & Learn Presentation)
Clarisoft Software Development Process (Lunch & Learn Presentation)Clarisoft Software Development Process (Lunch & Learn Presentation)
Clarisoft Software Development Process (Lunch & Learn Presentation)
 
Transitioning to-lean-at-infochimps
Transitioning to-lean-at-infochimpsTransitioning to-lean-at-infochimps
Transitioning to-lean-at-infochimps
 
Agile data science
Agile data scienceAgile data science
Agile data science
 
SEF2013 - Create a Business Solution, Step by Step, with No Managed Code
SEF2013 - Create a Business Solution, Step by Step, with No Managed CodeSEF2013 - Create a Business Solution, Step by Step, with No Managed Code
SEF2013 - Create a Business Solution, Step by Step, with No Managed Code
 

KĂźrzlich hochgeladen

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

KĂźrzlich hochgeladen (20)

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 

Linked in stream experimentation framework

  • 1. LinkedIn’s STREAM EXPERIMENTATION FRAMEWORK Joseph Adler, Bee-Chung Chen, and Xin Fu O’Reilly Strata Conference February 12 2014 Š2014 LinkedIn Corporation. All Rights Reserved.
  • 2. Š2014 LinkedIn Corporation. All Rights Reserved.
  • 3. The LinkedIn Stream Like many social networks, the centerpiece of LinkedIn’s home page is a news stream. It contains • Updates about users’ networks • News stories and shares • Recommendations Š2014 LinkedIn Corporation. All Rights Reserved.
  • 4. The LinkedIn Stream We operate at a large scale. • 277+ million members • 75+ million monthly unique • users 5000+ employees Š2014 LinkedIn Corporation. All Rights Reserved.
  • 5. The LinkedIn Stream Today, we’ll tell you how we experiment with new content in the stream: • Creating new content • Maximizing relevance • Managing tests Š2014 LinkedIn Corporation. All Rights Reserved.
  • 6. History of the LinkedIn Stream Network updates were introduced in 2006 Back then, LinkedIn had • 5mm members • 875k monthly uniques • 70 employees Š2014 LinkedIn Corporation. All Rights Reserved.
  • 7. History of the LinkedIn Stream In practice this meant: •Slow changing content, small number of updates, weekly visit rate ‣ No ranking/optimization •Small number of active tests, limited analytics resources ‣ Primitive resources for A/B tests •Limited engineering resources ‣ Hacky solution for testing new content... Š2014 LinkedIn Corporation. All Rights Reserved.
  • 8. History of the LinkedIn Stream We experimented with new content using a system called the Analytics Prototype Engine, or APE. It was implemented as an ad slot on the home page. Big wins included: • People You May Know • Groups You Might Like • Jobs You Might Be Interested In Š2014 LinkedIn Corporation. All Rights Reserved.
  • 9. History of the LinkedIn Stream We added more content over the next couple of years: •Status updates •Twitter content •Group discussions •OpenSocial content (TripIt, GitHub, and more...) Š2014 LinkedIn Corporation. All Rights Reserved.
  • 10. History of the LinkedIn Stream By 2009, the stream looked very similar to the stream today. LinkedIn was much bigger than when we first added a news stream... • 55mm members • 36mm monthly uniques • 500 employees (end of year) Š2014 LinkedIn Corporation. All Rights Reserved.
  • 11. History of the LinkedIn Stream … but the infrastructure hadn’t changed much and we were experiencing growing pains: •No system for ranking and optimization: ‣ Users were overwhelmed with low relevance updates •No system for A/B testing ‣ Overlapping A/B tests, poor experiment design, difficult analysis •No system for rapid prototyping/testing ‣ APE was making the site slow and unstable, and was shut down Š2014 LinkedIn Corporation. All Rights Reserved.
  • 12. History of the Stream In the rest of this talk, we’ll tell you how we’ve addressed these challenges (and used a lot of data science to make this happen). Š2014 LinkedIn Corporation. All Rights Reserved.
  • 13. Content Insertion In the beginning (2006), experiments happened outside the stream through APE: • Easy data uploads • Management UI • Templates Š2014 LinkedIn Corporation. All Rights Reserved.
  • 14. Content Insertion Most new content experiments boil down to one thing: creating experimental data. We wanted the data experts to be able to create experiments easily by focusing on data, not on writing production code (and wrestling with build systems, deployment processes, etc). We created a system that lets data scientists push new content into the stream by writing scripts (in Pig, Hive, etc). Š2014 LinkedIn Corporation. All Rights Reserved.
  • 15. Content Insertion Project Gorilla brought the spirit of APE back to the home page, inside the stream. nhome USCP Federator Gorilla First Pass Ranker Architecture diagram → Gorilla Voldemort Store Gorilla Batch Gorilla jobs Š2014 LinkedIn Corporation. All Rights Reserved.
  • 16. Content Insertion
    What does this consist of?
    • An Apache Pig UDF for pushing content
    • A batch process that filters, consolidates, and ranks updates
    • A process that pushes data from Hadoop into Voldemort (our NoSQL key/value store)
    • An online system that fetches updates from the store and mixes them into the stream
    [Architecture diagram. Components shown: nhome, USCP Federator, Gorilla First Pass Ranker, Gorilla Voldemort Store, Gorilla Batch, Gorilla jobs]
  • 17. Content Insertion
    Our implementation is very simple:
    • LinkedIn production systems use rest.li as an API (JSON data + schema)
    • We create data offline on Hadoop, put it in Voldemort, and surface it through an API
    This means that we can experiment easily using existing templates, tracking, etc.; we just have to change the data that’s rendered.
    (We’re also experimenting with a similar real-time system based on Apache Samza.)
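The offline-to-online flow described on the two slides above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Gorilla's actual code: a plain Python dict stands in for the Voldemort store, and the function names, record fields (`member_id`, `item_id`, `score`), and the "one experimental update after every few organic items" mixing rule are all hypothetical.

```python
from collections import defaultdict


def batch_consolidate(candidates, min_score=0.1, max_per_member=2):
    """Offline step: filter, de-duplicate, and rank candidate updates per member."""
    best = {}
    for c in candidates:
        if c["score"] < min_score:
            continue  # filter out low-scoring candidates
        key = (c["member_id"], c["item_id"])
        # consolidate duplicates by keeping the highest-scoring copy
        if key not in best or c["score"] > best[key]["score"]:
            best[key] = c
    per_member = defaultdict(list)
    for c in best.values():
        per_member[c["member_id"]].append(c)
    # rank each member's updates and keep only the top few
    return {
        member: sorted(items, key=lambda c: c["score"], reverse=True)[:max_per_member]
        for member, items in per_member.items()
    }


def push_to_store(store, ranked):
    """Bulk-load step: copy the batch output into the key/value store (a dict here)."""
    store.update(ranked)


def mix_into_stream(store, member_id, organic, every=3):
    """Online step: fetch a member's updates and insert one after every `every` organic items."""
    experimental = list(store.get(member_id, []))
    mixed = []
    for position, item in enumerate(organic, start=1):
        mixed.append(item)
        if position % every == 0 and experimental:
            mixed.append(experimental.pop(0)["item_id"])
    return mixed
```

Because the online step only reads pre-computed records by member key, changing an experiment means changing the batch output, not the serving code.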
  • 18. Relevance Optimization
    Bring each user the most relevant items from different sources, optimizing for one or more measurable objectives.
  • 19. Relevance Optimization
    • Maximize users’ clicks on items in the stream
      ‣ Rank items according to their click rates
      ‣ Click rate: the probability that a user would click an item
    • Predict the click rate based on
      ‣ User features: profile, visit pattern, interests, …
      ‣ Item features: type, topics, keywords, …
      ‣ User-item interaction features
      ‣ Context: device, time of day, previous page, …
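Ranking by predicted click rate can be sketched as below, assuming a logistic model over sparse indicator features. The feature naming scheme, the single user-item interaction feature, and the function names are illustrative assumptions; the slides do not specify how features are encoded.

```python
import math


def featurize(user, item, context):
    """Turn user, item, and context attributes into sparse indicator features."""
    feats = {"bias": 1.0}
    for prefix, attrs in (("user", user), ("item", item), ("ctx", context)):
        for name, value in attrs.items():
            feats[f"{prefix}.{name}={value}"] = 1.0
    # one hypothetical user-item interaction feature
    feats[f"user.JobTitle={user.get('JobTitle')}&item.ItemType={item.get('ItemType')}"] = 1.0
    return feats


def click_probability(weights, feats):
    """Logistic model: P(click) = sigmoid(sum of weight * feature value)."""
    score = sum(weights.get(name, 0.0) * value for name, value in feats.items())
    return 1.0 / (1.0 + math.exp(-score))


def rank_items(weights, user, items, context):
    """Order candidate items by predicted click probability, highest first."""
    return sorted(
        items,
        key=lambda item: click_probability(weights, featurize(user, item, context)),
        reverse=True,
    )
```

Features missing from `weights` contribute zero, so an item with no learned signal scores at the bias-only click rate.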
  • 20. Relevance Optimization
    Large scale logistic regression
    • Input: a set of past users’ responses to items
      Response | Feature vector
      1        | (Gender=M, JobTitle=CEO, ItemType=JobChange, ...)
      0        | (Gender=F, JobTitle=Engineer, ItemType=Article, ...)
      …        | …
    • Output: model parameters
    • Challenge: the data is too large to fit on a single machine
    • Solution: train a model using MapReduce on Hadoop
  • 21. Relevance Optimization
    Large scale logistic regression with ADMM (alternating direction method of multipliers)
    [Diagram: a large input data set is split into partitions 1 through K; a logistic regression is fit on each partition; the per-partition results feed a consensus computation.]
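The partition-then-consensus scheme in the diagram can be illustrated with a small single-process sketch of consensus ADMM for L2-regularized logistic regression. It is not LinkedIn's Hadoop implementation: the partitions are in-memory lists, each local subproblem is solved approximately with a few gradient steps, and the penalty `rho`, regularizer `lam`, step counts, and the synthetic data generator are assumptions chosen for readability.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def local_update(rows, z, u, rho, steps=20, lr=0.2):
    """Approximately minimize mean log-loss(rows; w) + (rho/2) * ||w - z + u||^2."""
    w = list(z)  # warm start from the current consensus vector
    n = len(rows)
    for _ in range(steps):
        grad = [rho * (wj - zj + uj) for wj, zj, uj in zip(w, z, u)]
        for x, y in rows:
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x))) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def admm_logistic(partitions, dim, rho=1.0, lam=0.01, iters=20):
    """Consensus ADMM: per-partition fits, then an averaging (consensus) step."""
    k = len(partitions)
    z = [0.0] * dim                        # consensus weights
    us = [[0.0] * dim for _ in range(k)]   # scaled dual variables, one per partition
    for _ in range(iters):
        # "map" side: each partition fits its own model, pulled toward z
        ws = [local_update(rows, z, u, rho) for rows, u in zip(partitions, us)]
        # "reduce" side: closed-form consensus update for the (lam/2)*||z||^2 regularizer
        z = [
            rho * sum(w[j] + u[j] for w, u in zip(ws, us)) / (lam + k * rho)
            for j in range(dim)
        ]
        # dual update: accumulate each partition's disagreement with the consensus
        us = [[uj + wj - zj for uj, wj, zj in zip(u, w, z)] for u, w in zip(us, ws)]
    return z


def make_partitions(true_w, rows_per_partition=200, k=3, seed=0):
    """Synthetic (features, click) rows drawn from a known logistic model."""
    rng = random.Random(seed)
    parts = []
    for _ in range(k):
        rows = []
        for _ in range(rows_per_partition):
            x = [1.0] + [rng.uniform(-1, 1) for _ in true_w[1:]]
            p = sigmoid(sum(wj * xj for wj, xj in zip(true_w, x)))
            rows.append((x, 1 if rng.random() < p else 0))
        parts.append(rows)
    return parts
```

Only the weight vectors cross partition boundaries, never the raw rows, which is what makes the per-partition fits a natural fit for mappers and the consensus step a natural fit for a reducer.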
  • 29. Relevance Optimization
    Diversity
    Users tire of seeing many items of the same type in the stream.
    Example: group discussions | Drop in click rate
    2 consecutive discussions  | 21%
    3 consecutive discussions  | 48%
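One way to act on this fatigue effect is a greedy re-ranker that discounts an item's predicted click rate when it would extend a run of same-type items. The sketch below is an assumption, not the method used in production: it reuses the 21% and 48% drops measured for group discussions as discounts for every item type, and the item fields `type` and `ctr` are hypothetical.

```python
# Multipliers by number of same-type items immediately preceding the slot.
# 0.79 and 0.52 come from the 21% and 48% drops on the slide; applying them
# to all item types is an assumption.
DISCOUNT = {0: 1.0, 1: 0.79, 2: 0.52}


def rerank_with_diversity(items):
    """Greedily fill each slot with the item whose discounted click rate is highest."""
    remaining = list(items)
    ranked = []
    run_type, run_len = None, 0
    while remaining:
        def adjusted(item):
            streak = run_len if item["type"] == run_type else 0
            return item["ctr"] * DISCOUNT[min(streak, 2)]

        best = max(remaining, key=adjusted)
        remaining.remove(best)
        if best["type"] == run_type:
            run_len += 1
        else:
            run_type, run_len = best["type"], 1
        ranked.append(best)
    return ranked
```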
  • 30. Relevance Optimization
    Multi-Objective Optimization
    • Different items in the stream generate different kinds of value
      ‣ Clicks
      ‣ Social actions: like, share, comment, …
      ‣ Revenue from sponsored items
    • One approach: maximize revenue subject to clicks and social actions staying within ε% of optimal
    • It requires extensive experiments!
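The constrained choice above can be sketched as a selection over experiment results. The example assumes each candidate configuration has already been measured in an experiment and summarized as a dict with hypothetical `clicks`, `social`, and `revenue` fields; "optimal" is taken to mean the best value observed among the candidates.

```python
def pick_config(configs, epsilon_pct):
    """Return the highest-revenue config whose clicks and social actions are
    within epsilon_pct percent of the best observed values, or None if none qualify."""
    best_clicks = max(c["clicks"] for c in configs)
    best_social = max(c["social"] for c in configs)
    floor = 1.0 - epsilon_pct / 100.0
    feasible = [
        c for c in configs
        if c["clicks"] >= floor * best_clicks and c["social"] >= floor * best_social
    ]
    if not feasible:
        return None
    return max(feasible, key=lambda c: c["revenue"])
```

Loosening ε admits more configurations and can only raise the achievable revenue, which is why picking ε is itself something to test.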
  • 31. Experimentation Framework
    Stream experiments are carried out on LinkedIn’s central experimentation platform:
    • A one-stop solution for feature A/B testing, ramping, and advanced targeting needs
    • Built-in power calculation to aid experiment design
    • Automated reporting and analysis capabilities
    [Mock-up of UI]
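As an illustration of what a power calculation answers, the sketch below estimates how many members each group needs to detect a given change in click rate. It uses the standard normal-approximation formula for comparing two proportions; the slides do not say which formula the platform uses, and the function name and defaults are assumptions.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_control, p_treatment, alpha=0.05, power=0.8):
    """Members needed per group for a two-sided test comparing two click rates."""
    if p_control == p_treatment:
        raise ValueError("the two rates must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_control - p_treatment
    return math.ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)
```

Halving the detectable effect roughly quadruples the required sample, which is why small relevance changes need large ramps.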
  • 32. Experimentation Framework
    • History: assign members to test groups based on modulo of member IDs
      ‣ A very high likelihood of range overlaps between tests
      ‣ Just one experiment can negatively affect results of other tests executed on the same page
    • Now: deterministic pseudo-random algorithm for treatment assignment computation
      ‣ Improved logging of treatment assignment
      ‣ Automated scorecards
      ‣ Record of historical experiments
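A common way to get deterministic pseudo-random assignment is to hash the member ID together with an experiment-specific key. The sketch below shows that general idea; the slides do not describe the platform's actual algorithm, so the choice of SHA-256, the key format, and the function name are assumptions.

```python
import hashlib


def assign_treatment(member_id, experiment_key, variants):
    """Map a member to a variant; `variants` is a list of (name, fraction) pairs
    whose fractions sum to 1. The same inputs always give the same variant."""
    digest = hashlib.sha256(f"{experiment_key}:{member_id}".encode("utf-8")).digest()
    # interpret the first 8 bytes as a number in [0, 1)
    point = int.from_bytes(digest[:8], "big") / 2 ** 64
    cumulative = 0.0
    for name, fraction in variants:
        cumulative += fraction
        if point < cumulative:
            return name
    return variants[-1][0]  # guard against floating-point rounding in the fractions
```

Because the experiment key is part of the hash input, two experiments split the same members in unrelated ways, unlike modulo ranges on the raw member ID, where the same members tend to land together in several tests.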
  • 33. Experimentation Framework
    • History: focus on product-specific metrics
      ‣ Stream relevance change ⇒ CTR
      ‣ Profile redesign ⇒ # of profile views
    • Now: standardized, tiered metric system
      ‣ Sitewide Tier 1 metrics
      ‣ Product-specific Tier 2 / Tier 3 metrics
      ‣ Comprehensive understanding of feature impact
    [Mock-up of UI]
  • 34. Conclusions
    LinkedIn has always experimented with site content. As we’ve grown, we’ve had to rethink how we experiment.
    Key lessons:
    • Managing experimentation at scale is hard
    • Scale means users, content volume, and employees
    • Invest in platforms if they save time, money, or labor