1. LinkedIn's STREAM EXPERIMENTATION FRAMEWORK
Joseph Adler, Bee-Chung Chen, and Xin Fu
O'Reilly Strata Conference
February 12, 2014
©2014 LinkedIn Corporation. All Rights Reserved.
3. The LinkedIn Stream
Like many social networks, the
centerpiece of LinkedIn's home
page is a news stream.
It contains:
• Updates about users' networks
• News stories and shares
• Recommendations
4. The LinkedIn Stream
We operate at a large scale.
• 277+ million members
• 75+ million monthly unique users
• 5000+ employees
5. The LinkedIn Stream
Today, we'll tell you how we
experiment with new content in
the stream:
• Creating new content
• Maximizing relevance
• Managing tests
6. History of the LinkedIn Stream
Network updates were
introduced in 2006.
Back then, LinkedIn had
• 5 million members
• 875k monthly uniques
• 70 employees
7. History of the LinkedIn Stream
In practice this meant:
• Slow-changing content, small number of updates, weekly visit rate
  ‣ No ranking/optimization
• Small number of active tests, limited analytics resources
  ‣ Primitive resources for A/B tests
• Limited engineering resources
  ‣ Hacky solution for testing new content...
8. History of the LinkedIn Stream
We experimented with new
content using a system called
the Analytics Prototype Engine,
or APE. It was implemented as
an ad slot on the home page.
Big wins included:
• People You May Know
• Groups You Might Like
• Jobs You Might Be Interested In
9. History of the LinkedIn Stream
We added more content over
the next couple of years:
• Status updates
• Twitter content
• Group discussions
• OpenSocial content (TripIt, GitHub, and more...)
10. History of the LinkedIn Stream
By 2009, the stream looked
very similar to the stream
today.
LinkedIn was much bigger than
when we first added a news
stream:
• 55 million members
• 36 million monthly uniques
• 500 employees (end of year)
11. History of the LinkedIn Stream
… but the infrastructure hadn't
changed much and we were
experiencing growing
pains:
• No system for ranking and optimization
  ‣ Users were overwhelmed with low-relevance updates
• No system for A/B testing
  ‣ Overlapping A/B tests, poor experiment design, difficult analysis
• No system for rapid prototyping/testing
  ‣ APE was making the site slow and unstable, and was shut down
12. History of the Stream
In the rest of this talk, we'll tell
you how we've addressed
these challenges (and used a
lot of data science to make this
happen).
13. Content Insertion
In the beginning (2006),
experiments happened outside
the stream through APE:
• Easy data uploads
• Management UI
• Templates
14. Content Insertion
Most new content experiments
boil down to one thing: creating
experimental data.
We wanted the data experts to
be able to create experiments
easily by focusing on data, not
on writing production code (and
wrestling with build systems,
deployment processes, etc.).
We created a system that lets
data scientists push new
content into the stream by
writing scripts (in Pig, Hive, etc.).
15. Content Insertion
Project Gorilla brought the spirit
of APE back to the home page,
inside the stream.
[Architecture diagram: nhome, USCP, Federator, Gorilla First Pass Ranker, Gorilla Voldemort Store, Gorilla Batch, Gorilla jobs]
16. Content Insertion
What does this consist of?
• An Apache Pig UDF for pushing content
• A batch process that filters, consolidates, and ranks updates
• A process that pushes data from Hadoop into Voldemort (our NoSQL key/value store)
• An online system that fetches updates from the store and mixes them into the stream
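The four pieces above can be sketched end to end in a few lines of Python. This is only an illustration of the data flow, not the Gorilla code: the tuple layout, the `min_score` and `max_per_member` parameters, and the function names are hypothetical, a plain dict stands in for the key/value store, and pushed updates are simply placed ahead of organic ones rather than blended by a ranker.

```python
from collections import defaultdict

# Hypothetical candidate updates, as a batch script might emit them:
# (member_id, update_id, score). All names and values are illustrative.
CANDIDATES = [
    (1, "pymk-17", 0.90),
    (1, "job-42", 0.40),
    (1, "pymk-17", 0.70),   # duplicate of the first row with a lower score
    (1, "group-3", 0.05),   # below the score threshold
    (2, "job-42", 0.60),
]

def batch_build(candidates, min_score=0.1, max_per_member=2):
    """Filter, consolidate, and rank candidate updates per member."""
    best = defaultdict(dict)
    for member_id, update_id, score in candidates:
        if score < min_score:                      # filter
            continue
        prev = best[member_id].get(update_id)
        if prev is None or score > prev:           # consolidate duplicates
            best[member_id][update_id] = score
    store = {}
    for member_id, updates in best.items():
        ranked = sorted(updates.items(), key=lambda kv: kv[1], reverse=True)
        store[member_id] = ranked[:max_per_member]  # rank and truncate
    return store

def fetch_stream(store, member_id, organic_updates):
    """Online step: look up pushed updates and mix them into the stream."""
    pushed = [update_id for update_id, _ in store.get(member_id, [])]
    return pushed + list(organic_updates)

# A plain dict stands in for the key/value store in this sketch.
kv_store = batch_build(CANDIDATES)
stream = fetch_stream(kv_store, 1, ["status-9"])
```

Keeping the batch output keyed by member ID means the online step is a single key lookup, which is the access pattern a key/value store serves well.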
17. Content Insertion
Our implementation is very simple:
• LinkedIn production systems use rest.li as an API (JSON data + schema)
• We create data offline on Hadoop, put it in Voldemort, and surface it through an API
This means that we can experiment
easily using existing templates,
tracking, etc.; we just have to change
the data that's rendered.
(We're also experimenting with a
similar real-time system based on
Apache Samza.)
18. Relevance Optimization
Bring each individual user the most relevant items from different
sources to optimize for one or more measurable
objectives.
19. Relevance Optimization
• Maximize users' clicks on items in the stream
• Rank items according to their click rates
  ‣ Probability that a user would click an item
• Predict the click rate based on
  ‣ User features: Profile, visit pattern, interests, …
  ‣ Item features: Type, topics, keywords, …
  ‣ User-item interaction features
  ‣ Context: Device, time of day, previous page, …
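As a rough illustration of ranking by predicted click rate, the sketch below scores each candidate item from user, item, interaction, and context features and sorts by the result. It assumes a logistic scoring function for concreteness; the weights, feature names, and the way interaction features are formed by crossing user and item features are all invented for the example.

```python
import math

# Hypothetical model weights keyed by "feature=value"; values are made up.
WEIGHTS = {
    "bias": -2.0,
    "ItemType=JobChange": 1.2,
    "ItemType=Article": 0.3,
    "JobTitle=Engineer&ItemType=Article": 1.0,  # user-item interaction
    "Device=mobile": -0.2,
}

def click_probability(features, weights=WEIGHTS):
    """Logistic score: sigmoid of the summed weights of active features."""
    z = weights["bias"] + sum(weights.get(f, 0.0) for f in features)
    return 1.0 / (1.0 + math.exp(-z))

def rank_items(user_features, context_features, items):
    """Sort candidate items by predicted click probability, highest first."""
    scored = []
    for item_id, item_features in items:
        features = list(user_features) + list(context_features) + list(item_features)
        # Interaction features: cross each user feature with each item feature.
        features += [f"{u}&{i}" for u in user_features for i in item_features]
        scored.append((click_probability(features), item_id))
    return [item_id for _, item_id in sorted(scored, reverse=True)]

ranking = rank_items(
    user_features=["JobTitle=Engineer"],
    context_features=["Device=mobile"],
    items=[("a1", ["ItemType=Article"]), ("j1", ["ItemType=JobChange"])],
)
```

With these made-up weights the job-change item has the larger item-only weight, but the interaction term lifts the article above it for an engineer, which is the kind of effect user-item interaction features are meant to capture.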
20. Relevance Optimization
Large scale logistic regression
• Input: A set of users' past responses to items
  Response   Feature Vector
  1          (Gender=M, JobTitle=CEO, ItemType=JobChange, ...)
  0          (Gender=F, JobTitle=Engineer, ItemType=Article, ...)
  …          …
• Output: Model parameters
• Challenge: Data too large to fit on a single machine
• Solution: Train a model using MapReduce on Hadoop
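To make the input and output concrete, here is a single-machine sketch that fits a logistic regression to a handful of (response, feature vector) rows with batch gradient descent. The rows, the one-hot `name=value` encoding, and the hyperparameters are assumptions chosen for the example; a production-scale fit would not look like this.

```python
import math

# Toy training set in the (response, feature vector) shape described above.
# The rows are invented for illustration.
DATA = [
    (1, {"JobTitle": "CEO", "ItemType": "JobChange"}),
    (0, {"JobTitle": "Engineer", "ItemType": "Article"}),
    (1, {"JobTitle": "Engineer", "ItemType": "JobChange"}),
    (0, {"JobTitle": "CEO", "ItemType": "Article"}),
]

def encode(features):
    """One-hot encode categorical features as 'name=value' keys, plus a bias."""
    return ["bias"] + [f"{k}={v}" for k, v in sorted(features.items())]

def predict(weights, keys):
    """Predicted click probability for one encoded feature vector."""
    z = sum(weights.get(k, 0.0) for k in keys)
    return 1.0 / (1.0 + math.exp(-z))

def train(data, epochs=200, learning_rate=0.5, l2=0.01):
    """Batch gradient descent on the L2-regularized logistic loss."""
    weights = {}
    for _ in range(epochs):
        gradient = {}
        for label, features in data:
            keys = encode(features)
            error = predict(weights, keys) - label
            for k in keys:
                gradient[k] = gradient.get(k, 0.0) + error
        for k, g in gradient.items():
            w = weights.get(k, 0.0)
            weights[k] = w - learning_rate * (g / len(data) + l2 * w)
    return weights

model = train(DATA)  # the "model parameters" output
p_job_change = predict(model, encode({"JobTitle": "CEO", "ItemType": "JobChange"}))
p_article = predict(model, encode({"JobTitle": "CEO", "ItemType": "Article"}))
```

In this toy data the item type alone determines the response, so the fitted model predicts a higher click probability for the job-change item than for the article.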
21. Relevance Optimization
Large scale Logistic Regression with ADMM
[Diagram: a large input data set is split into Partition 1, Partition 2, Partition 3, …, Partition K; a logistic regression is fit on each partition, and the results feed a consensus computation.]
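The partition-then-consensus picture can be sketched with the standard consensus form of ADMM: each partition refits its own model with a penalty pulling it toward a shared vector, the shared vector is recomputed as an average, and a dual variable per partition accumulates the remaining disagreement. The code below is a small in-memory illustration under several assumptions (no regularizer on the consensus vector, an inexact local solve by a few gradient steps, hand-picked `rho` and step sizes, synthetic data); it is not the production training job.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def local_gradient(w, rows):
    """Gradient of the mean logistic loss over one partition."""
    grad = [0.0] * len(w)
    for features, label in rows:
        error = sigmoid(sum(wi * xi for wi, xi in zip(w, features))) - label
        for j, xj in enumerate(features):
            grad[j] += error * xj / len(rows)
    return grad

def local_update(w, z, u, rows, rho, steps=20, learning_rate=0.1):
    """Approximately minimize local loss + (rho/2) * ||w - z + u||^2."""
    w = list(w)
    for _ in range(steps):
        grad = local_gradient(w, rows)
        w = [wi - learning_rate * (gi + rho * (wi - zi + ui))
             for wi, gi, zi, ui in zip(w, grad, z, u)]
    return w

def admm_logistic(partitions, dim, rho=1.0, iterations=50):
    """Consensus ADMM: per-partition fits tied together by a shared z."""
    k = len(partitions)
    ws = [[0.0] * dim for _ in range(k)]   # local models, one per partition
    us = [[0.0] * dim for _ in range(k)]   # scaled dual variables
    z = [0.0] * dim                        # consensus model
    for _ in range(iterations):
        # Per-partition step: each partition refits independently.
        ws = [local_update(w, z, u, rows, rho)
              for w, u, rows in zip(ws, us, partitions)]
        # Consensus step: average of w_k + u_k across partitions.
        z = [sum(w[j] + u[j] for w, u in zip(ws, us)) / k for j in range(dim)]
        # Dual update pushes each local model toward the consensus.
        us = [[uj + wj - zj for uj, wj, zj in zip(u, w, z)]
              for u, w in zip(us, ws)]
    return z

# Deterministic toy data: features are (bias, x); label is 1 when x > 0,
# with every tenth label flipped as noise.
rows = []
for i in range(60):
    x = (i % 6) - 2.5
    label = 1 if x > 0 else 0
    if i % 10 == 0:
        label = 1 - label
    rows.append(((1.0, x), label))

partitions = [rows[p::3] for p in range(3)]
consensus = admm_logistic(partitions, dim=2)
accuracy = sum((sigmoid(consensus[0] + consensus[1] * f[1]) > 0.5) == (y == 1)
               for f, y in rows) / len(rows)
```

The per-partition step needs only that partition's rows plus the current consensus vector, which is what makes the scheme a natural fit for a map phase, with the averaging done in a reduce phase.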
29. Relevance Optimization
Diversity
Users get tired of seeing items of the same type many times in the
stream.
Example: Group discussions
Consecutive discussions   Drop in click rate
2                         21%
3                         48%
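One way such a measurement could feed back into ranking is a greedy re-ranker that discounts an item's score when it would extend a run of same-type items. The sketch below is an assumption-laden illustration, not the approach used in the stream: it applies the group-discussion drops to every item type, treats each drop as a multiplier on the item's own score, and reuses the largest drop for runs longer than three.

```python
# Multipliers applied to an item's score when it would extend a run of
# same-type items. The 21% and 48% drops come from the group-discussion
# numbers above; reusing them for every item type is an assumption.
RUN_DISCOUNT = {1: 1.0, 2: 1.0 - 0.21, 3: 1.0 - 0.48}

def diversify(items):
    """Greedy re-rank: pick the best discounted score at each position.

    `items` is a list of (item_id, item_type, score) tuples.
    """
    remaining = list(items)
    ranked = []
    run_type, run_length = None, 0
    while remaining:
        def discounted(item):
            _, item_type, score = item
            position_in_run = run_length + 1 if item_type == run_type else 1
            # Runs longer than three reuse the largest observed drop.
            return score * RUN_DISCOUNT[min(position_in_run, 3)]
        best = max(remaining, key=discounted)
        remaining.remove(best)
        ranked.append(best[0])
        if best[1] == run_type:
            run_length += 1
        else:
            run_type, run_length = best[1], 1
    return ranked

candidates = [
    ("d1", "discussion", 0.10),
    ("d2", "discussion", 0.09),
    ("d3", "discussion", 0.08),
    ("a1", "article", 0.075),
]
order = diversify(candidates)
```

Here the article is promoted to second place because a second consecutive discussion would be discounted below it, even though every discussion has a higher raw score.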
30. Relevance Optimization
Multi-Objective Optimization
• Different items in the stream generate different kinds of value
  ‣ Clicks
  ‣ Social actions: Like, share, comment, …
  ‣ Revenue from sponsored items
• One approach: maximize revenue s.t. clicks and social actions are still within ε% of optimal
• It requires extensive experiments!
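The constrained formulation can be illustrated as a selection among candidate stream configurations whose metrics have already been measured. In this sketch the configuration names and metric values are hypothetical, and "optimal" is taken to mean the best value observed among the candidates.

```python
def pick_config(configs, epsilon_pct):
    """Return the highest-revenue config whose clicks and social actions
    are each within epsilon_pct percent of the best value observed."""
    best_clicks = max(c["clicks"] for c in configs)
    best_social = max(c["social"] for c in configs)
    floor = 1.0 - epsilon_pct / 100.0
    feasible = [
        c for c in configs
        if c["clicks"] >= floor * best_clicks
        and c["social"] >= floor * best_social
    ]
    return max(feasible, key=lambda c: c["revenue"])

# Hypothetical per-configuration metrics, e.g. measured in experiments.
CONFIGS = [
    {"name": "no_sponsored", "revenue": 0.0, "clicks": 1000, "social": 200},
    {"name": "light", "revenue": 50.0, "clicks": 990, "social": 197},
    {"name": "heavy", "revenue": 90.0, "clicks": 900, "social": 170},
]
choice = pick_config(CONFIGS, epsilon_pct=2.0)
```

Loosening ε admits configurations with more revenue and less engagement, so the choice of ε is itself a product decision that the experiments have to inform.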
31. Experimentation Framework
Stream experiments are carried
out on LinkedIn's central
experimentation platform:
• A one-stop solution for feature A/B testing, ramping, and advanced targeting needs
• Built-in power calculation to aid experiment design
• Automated reporting and analysis capabilities
[Mock-up of UI]
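As an example of what a power calculation involves, the sketch below estimates how many members each group needs to detect a given relative lift in a rate metric such as CTR. It uses the textbook normal-approximation formula for a two-sided two-proportion test; the function name, defaults, and example rates are assumptions, and the platform's own calculation may differ.

```python
import math
from statistics import NormalDist

def sample_size_per_group(baseline_rate, relative_lift, alpha=0.05, power=0.8):
    """Members needed per group for a two-sided two-proportion z-test."""
    p1 = baseline_rate
    p2 = baseline_rate * (1.0 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1.0 - p1) + p2 * (1.0 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p1 - p2) ** 2)

# Hypothetical 5% baseline click rate with a 2% and a 10% relative lift.
n_small_lift = sample_size_per_group(0.05, 0.02)
n_large_lift = sample_size_per_group(0.05, 0.10)
```

The required sample size grows with the inverse square of the difference to be detected, so small lifts need far more members per group than large ones.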
32. Experimentation Framework
• History: assign members to test groups based on modulo of Member IDs
  ‣ A very high likelihood of range overlaps between tests
  ‣ Just one experiment can negatively affect results of other tests executed on the same page
• Now: deterministic pseudo-random algorithm for treatment assignment computation
  ‣ Improved logging of treatment assignment
  ‣ Automated scorecards
  ‣ Record of historical experiments
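A common way to get deterministic pseudo-random assignment is to hash the experiment and member identifiers together and map the hash to a point in [0, 1). The sketch below shows that idea; the choice of SHA-256, the key format, and the experiment name are assumptions, since the slides do not say which algorithm the platform uses.

```python
import hashlib

def assign_treatment(experiment_id, member_id, treatments):
    """Deterministically map a member to a treatment for one experiment.

    `treatments` is a list of (name, weight) pairs whose weights sum to 1.
    """
    key = f"{experiment_id}:{member_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes as a uniform number in [0, 1).
    point = int.from_bytes(digest[:8], "big") / 2 ** 64
    cumulative = 0.0
    for name, weight in treatments:
        cumulative += weight
        if point < cumulative:
            return name
    return treatments[-1][0]  # guard against floating-point rounding

SPLIT = [("control", 0.5), ("treatment", 0.5)]
counts = {"control": 0, "treatment": 0}
for member_id in range(10_000):
    counts[assign_treatment("stream-ranker-v2", member_id, SPLIT)] += 1
```

Because the experiment ID is part of the hashed key, a member's bucket in one experiment is effectively independent of their bucket in another, which avoids the range overlaps that modulo-based assignment produces.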
33. Experimentation Framework
• History: focus on product-specific metrics
  ‣ Stream relevance change → CTR
  ‣ Profile redesign → # of profile views
• Now: standardized, tiered metric system
  ‣ Sitewide Tier 1 metrics
  ‣ Product-specific Tier 2 / Tier 3 metrics
  ‣ Comprehensive understanding of feature impact
[Mock-up of UI]
34. Conclusions
LinkedIn has always experimented with site content. As we've
grown, we've had to rethink how we experiment.
Key lessons:
• Managing experimentation at scale is hard
• Scale means users, content volume, and employees
• Invest in platforms if they save time, money, or labor