Generating a Billion Personal News Feeds: With the exponential growth of information and improved access, there is more and more data and not enough time to digest it. Facebook’s News Feed attempts to solve this by showing the most relevant content to each individual person. We create billions of personalized experiences by ranking stories for each person. Over the years, News Feed ranking has evolved to use large-scale machine learning techniques, striving to maximize the value created for each individual. Ranking and organizing content uniquely for a billion users poses unique challenges. Each time a person visits their News Feed, we need to find the best piece of content out of all the available stories and put it at the top of Feed, where people are most likely to see it. To accomplish this, we model each person, attempting to figure out which friends, pages, and topics they care most about, and pick the stories and ordering they will find most interesting. In addition to the machine learning problems we work on for directing those choices, another primary area of research is understanding the value we are creating for people. These joint problems of selection and evaluation are essential for delivering continued value in personalized Feeds, and they would not be possible at the huge scale of content and users that Facebook operates at without powerful machine learning and analytics.
4. What is Facebook News Feed?
Way to connect with stories that matter to you most
Connect, Inform and Entertain
News Feed is the constantly updating list of stories in the middle of your home
page or mobile app. News Feed includes status updates, photos, videos,
links, app activity and likes from people, pages and groups that you follow
on Facebook.
Feed ranking is just a tool: you are in control of what you see in your News
Feed and can adjust your settings.
5. Basic Stats
▪ Over 1,000,000,000 daily users
▪ Hundreds of billions of stories seen per day
▪ Trillions of stories ranked per day
▪ Publish -> In Feed < 1s
▪ Retrieval + rank time < 200ms
6. Deliver everything that matters to people
and nothing that doesn't
▪ Don’t miss any important stories
▪ New stories should show up within seconds
▪ Put the best content at the top
▪ People notice/interact with
content at the top first.
▪ Better content at the top means a
better experience, and less good
content is missed.
▪ It’s not just winner-takes-all:
the ordering of content matters
7. How News Feed Works
[Diagram: a mock feed of seven stories (photos, friend posts, videos, and link shares) in the order they arrived.]
▪ Goal is to put best content on top
▪ Solution: every time you visit, we
rank all new content and put it on top
▪ Anything you haven’t seen
is new to you
▪ A friend shares the same link you’ve seen
▪ Unseen old stories
▪ Seen stories with new
comments
▪ For frequent users ranking
is almost chronological
▪ Diversity of content matters
[Diagram: feed snapshots at 9:00 AM, 10:00 AM, 10:10 AM, and 12:00 PM. Each story carries a relevance score (e.g., 1.9 down to 0.3 at 9:00 AM); seen stories are pushed down, new stories and stories with new comments are re-scored, and the feed is re-ranked on every visit.]
9. Scoring-based Ranking
▪ Given a potential feed story, how
good is it?
▪ Express as probability of click, like,
comment, etc.
▪ Assign different weights to different
events, according to significance
▪ Example: close coworker feels
earthquake
▪ Highest chance of click
▪ Decent chance of like/comment
Event     Probability   Value*
Click     5.1%          1
Like      2.9%          5
Comment   0.55%         20
Share     0.00005%      40
Friend    0.00003%      50
Hide      0.00002%      -100
Total score: 0.306
*Example, not real values
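The total in the table is an expected value: each predicted event probability multiplied by its assigned weight, then summed. A minimal sketch using the illustrative numbers from the table (not real values):

```python
# Expected-value relevance score: sum of P(event) * value(event).
# Probabilities and values are the illustrative numbers from the slide.
EVENT_VALUES = {"click": 1, "like": 5, "comment": 20,
                "share": 40, "friend": 50, "hide": -100}

def relevance_score(probs: dict) -> float:
    """Weighted sum of predicted event probabilities."""
    return sum(p * EVENT_VALUES[event] for event, p in probs.items())

story = {"click": 0.051, "like": 0.029, "comment": 0.0055,
         "share": 5e-7, "friend": 3e-7, "hide": 2e-7}
print(round(relevance_score(story), 3))  # -> 0.306
```

Note how the rare but heavily weighted events (share, friend, hide) barely move this example's total; the weights only dominate when their probabilities are non-negligible.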
10. ▪ Why this structure:
▪ Uses Machine Learning to predict true, measurable behavior
▪ Models train on their own data
▪ Allows fast iteration
▪ Allows distributed development
▪ Allows for easy ranking of heterogeneous content
▪ Allows for value to be adjusted independently
▪ WORKS WELL IN PRACTICE
Learnings
11. Role of Network Structure
▪ News Feed delivers content from
friends along social network
▪ Understanding the network is
key to defining quality
▪ Who are your close friends?
▪ Whose photos do you always like?
▪ Whose links are the most interesting
to you?
        Click    Like     Comment   Weighted Sum
Joe     0.012    0.0042   0.00082   0.0494
Susan   0.023    0.02     0.0082    0.287
Li      0.012    0.0037   0.001     0.0505
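The weighted sums in the table are reproduced exactly by the per-event values from the scoring slide (click = 1, like = 5, comment = 20); that mapping is inferred from the numbers, not stated on the slide. A sketch of ranking friends by this affinity:

```python
# Per-friend engagement rates combined into a weighted affinity score.
# Event weights follow the earlier scoring slide (click=1, like=5, comment=20);
# this mapping is inferred from the table's totals, not stated explicitly.
WEIGHTS = {"click": 1, "like": 5, "comment": 20}

friends = {
    "Joe":   {"click": 0.012, "like": 0.0042, "comment": 0.00082},
    "Susan": {"click": 0.023, "like": 0.02,   "comment": 0.0082},
    "Li":    {"click": 0.012, "like": 0.0037, "comment": 0.001},
}

def affinity(rates: dict) -> float:
    """Weighted sum of per-event engagement rates."""
    return sum(rates[e] * w for e, w in WEIGHTS.items())

ranked = sorted(friends, key=lambda f: affinity(friends[f]), reverse=True)
print(ranked)  # -> ['Susan', 'Li', 'Joo'.replace('o', 'e', 1) and 'Joe']
```

Susan's affinity (0.287) dominates, so her stories rank higher than Li's (0.0505) and Joe's (0.0494).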
12. Feature Selection (BDTs)
▪ Start with >100K potential (dense) features and all historical activity
▪ First, prune these to top ~2K
▪ Training time is proportional to number of examples * number of features
▪ Under-sample negative examples (impressions, no action) to help with # of examples
▪ Start with 100K features, max rows, keep most important 10K, train 10x rows
▪ Do this for each feed event type: train many forests
▪ Historical counts and propensity are some of the strongest features
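Undersampling negatives skews the base rate, so probabilities learned on the sampled data need a correction to stay calibrated. A minimal sketch of the sampling step and the standard recalibration formula (the sampling rate here is a made-up example):

```python
import random

def undersample(examples, neg_rate=0.025, seed=0):
    """Keep all positives; keep each negative (impression, no action)
    with probability neg_rate. The rate is an illustrative choice."""
    rng = random.Random(seed)
    return [(x, y) for x, y in examples
            if y == 1 or rng.random() < neg_rate]

def recalibrate(p, neg_rate):
    """Map a probability learned on undersampled data back to the
    original base rate: q = p / (p + (1 - p) / neg_rate)."""
    return p / (p + (1 - p) / neg_rate)

# A model trained on heavily undersampled negatives that predicts 0.5
# corresponds to a much smaller true probability.
print(round(recalibrate(0.5, 0.025), 3))  # -> 0.024
```

Without the correction, downstream expected-value scoring would systematically overweight every story.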
13. Model Training (Logistic regression)
▪ We need to react quickly and incorporate new content - use a simple model
▪ Logistic regression is simple, fast and easy to distribute
▪ Treat the trees as feature transforms, each one turning the input features into
a set of categorical features, one per tree.
▪ Use logistic regression for online learning to quickly re-learn leaf weights
[Diagram: two small boosted trees splitting on features F1, F2, F3, with leaf weights such as -0.5, 0.2, and 0.3.]
Throw out the boosted tree weights; use only the transforms.
Input: (F1, F2, F3)
Output: (T1, T2), where T1 ∈ {leaves of tree 1}
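The two-stage setup can be sketched with scikit-learn (an illustrative assumption; the production system differs): fit boosted trees, map each input to its leaf index per tree, one-hot encode those indices, and fit a logistic regression on the resulting sparse features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Stage 1: boosted decision trees, used only as feature transforms.
gbt = GradientBoostingClassifier(n_estimators=30, max_depth=3, random_state=0)
gbt.fit(X, y)

# apply() maps each row to one leaf index per tree: one categorical
# feature per tree. The trees' own output weights are thrown out.
leaves = gbt.apply(X)[:, :, 0]

# Stage 2: logistic regression re-learns one weight per leaf; this
# stage is cheap to update online as new data arrives.
enc = OneHotEncoder(handle_unknown="ignore")
lr = LogisticRegression(max_iter=1000)
lr.fit(enc.fit_transform(leaves), y)
```

In an online setting only stage 2 is refreshed frequently; the trees can be retrained on a slower cadence.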
14. Stacking: Combined Tree + LR Model
▪ Motivation: tree application is computationally expensive and slow
▪ Reuse the click trees to predict likes, comments, etc.
▪ Only slightly more expensive than independent models, with better prediction
performance (transfer of learnings)
[Diagram: thousands of raw features feed thousands of tree transforms; the resulting sparse Boolean features, plus non-tree raw features, feed separate logistic regressions for click, like, comment, share, friend, outbound click, follow, and hide.]
15. Other models + sparse features
▪ Train Neural nets to predict events
▪ Discard final layer, use final layer outputs as features
▪ Add sparse features such as text or content ID
[Diagram: raw features feed both a forest and a neural network; the tree transforms, the network's final-layer outputs, and sparse features all feed logistic regressions for click, like, comment, share, hide, outbound click, fan/follow, and friend.]
16. ▪ Data freshness matters: simple models allow for online learning and
rapid response
▪ Feature generation is part of the modeling process
▪ Stacking
▪ supports plugging in new algorithms and features easily
▪ works very well in practice
▪ Use skewed sampling to manage high data volumes
▪ Historical counters provide highly predictive features that are easy
to update online
Learnings
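One reason historical counters are easy to update online is that an exponentially decayed count needs only the last value and timestamp per counter. A sketch under assumed parameters (the half-life is made up for illustration):

```python
import math

class DecayedCounter:
    """Exponentially decayed event counter: O(1) state per counter,
    updated online. The half-life is an illustrative choice."""
    def __init__(self, half_life_days=7.0):
        self.decay = math.log(2) / half_life_days
        self.value = 0.0
        self.last_t = 0.0  # time of last update, in days

    def add(self, t, amount=1.0):
        # Decay the running value forward to time t, then add the event.
        self.value *= math.exp(-self.decay * (t - self.last_t))
        self.value += amount
        self.last_t = t

    def read(self, t):
        return self.value * math.exp(-self.decay * (t - self.last_t))

c = DecayedCounter(half_life_days=7.0)
c.add(0.0)
c.add(0.0)
print(round(c.read(7.0), 2))  # -> 1.0 (two events halve after one half-life)
```

Running several such counters with different half-lives approximates the "over different time-ranges" counters mentioned in the notes.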
18. Measurement
▪ Selecting the right objective function
▪ Defining metrics
▪ Implicit: engagement, e.g. clicks
▪ Longitudinal metrics, e.g. abandonment
▪ Explicit: quality, e.g. survey score
19. Why are implicit metrics limited?
▪ Some important stories don’t
get much engagement
▪ E.g. sad stories and world news
▪ Some lower-quality stories get lots of
engagement due to social expectations
▪ Relative importance: is a comment
always more important than a like?
▪ Goal is to align ranking with
personalized relevance
▪ Solution -> ask users directly:
collect explicit signals from survey
data
20. Pairwise Comparison Survey
▪ Pairwise comparison
between two stories
from the same feed.
▪ Pro: real user preference
on two stories from the same
query.
▪ Con: we don’t know whether
they are comparing
two good stories, two
poor stories, or one of
each.
21. In or Out survey
▪ Single-story survey: “do
you want to see it or
not?”
▪ Pro: fun, simple, absolute
▪ Con: people do not really
know the consequence of
the action; limited
resolution does not help
with ranking
22. Rating Survey
▪ 5-pt rating scale: “how
much do you want to
see the story in your
feed?”
▪ Pro: absolute metric,
good participation rate.
▪ Con: out of context; raters
might not be truthful;
harder for users.
23. In-Context Survey
▪ 5-pt rating scale: “how
much do you want to
see the story in your
feed?”
▪ Pro: in context
▪ Con: can distract, lead to
abandonment, and takes up
valuable real estate in
the feed
24. Absolute vs. relative ratings
▪ Relative
▪ Easier
▪ Infinite precision
▪ More self-consistent
▪ More calibrated across people
▪ Absolute
▪ Gives amount of delta
▪ No intransitivity issues
▪ Clear definition of best
▪ Which one do we choose?
▪ Solution: Do both
25. Start to better understand what matters for each
individual
[Chart: distribution of survey ratings (0-100%), from “Definitely do NOT want to see this in Feed” to “Definitely want to see in Feed”, broken down by how much the respondent cares about the poster, from “someone you really don’t care about” to “someone you really care about”.]
26. Improving Feed based on rating results
▪ Not enough data for large-scale,
personalized machine
learning
▪ Look for other insights
▪ E.g. how much should we value a
comment vs. a like?
▪ Ranking by
α p(like) + β p(comment)
▪ Optimal α, β depend on content
type, content creator, context
▪ Passive consumption (dwell time)
prediction improves relevance
▪ A response means different things in
different contexts and for different
people
▪ E.g. a ‘like’ is harder to come by on public
content than on friend content, and hence tends
to indicate higher quality there.
[Chart: average rating vs. p(like) and p(outbound click).]
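The context-dependent blend α p(like) + β p(comment) can be sketched as a scoring function whose coefficients vary by content type. All coefficient and probability values below are made up for illustration:

```python
# Rank stories by alpha * p(like) + beta * p(comment), where optimal
# alpha/beta depend on context. All values here are made up.
COEFFS = {
    # (alpha, beta): e.g. comments valued relatively more on friend content
    "friend": (1.0, 4.0),
    "page":   (2.0, 2.0),
}

def blended_score(story: dict) -> float:
    alpha, beta = COEFFS[story["type"]]
    return alpha * story["p_like"] + beta * story["p_comment"]

stories = [
    {"id": "a", "type": "friend", "p_like": 0.02, "p_comment": 0.01},
    {"id": "b", "type": "page",   "p_like": 0.05, "p_comment": 0.001},
]
ranked = sorted(stories, key=blended_score, reverse=True)
print([s["id"] for s in ranked])  # -> ['b', 'a']
```

Fitting α and β per content type, creator, and context against explicit survey ratings is the step the slide describes.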
27. Learnings
▪ Both implicit and explicit signals are important and can be used
together
▪ Multiple survey types can be used simultaneously to get different
advantages
▪ Metrics are important - the right metrics are needed to define an
objective function for our models and to measure model performance
▪ ML and Metrics are tools that let us CONNECT, INFORM and
ENTERTAIN Facebook users.
28. We’re far from done…
Come join us to help solve the next big
challenge
Editor’s notes
Trade offs
What is Facebook News Feed?
Show you content you care about most
Constantly updating list of stories: status updates, videos, links, comments and likes from people, pages and groups that you follow.
Influenced by your connections and activities -> goal is to help connect, inform and entertain our users.
Tools for control for people – following & friending, Unfollowing, Hiding, See first
The stories that show in your News Feed are influenced by your connections and activity on Facebook. This helps you to see more stories that interest you from friends you interact with the most.
What are some basic stats?
Nothing gets scored until a user visits the feed; then we have less than 200ms to retrieve, score and rank all content
What is the value add?
Too much content and not enough time
The feed must be completely personalized but still highly engaging to Facebook’s users so they’ll keep coming back and seeing more
Feed helps organize all the content.
Better content on top
Ordering matters
Engagement is a proxy of what you might want to see:
Signals used: who posted, what kind of content, how many interactions, when the post was created
How does News Feed work?
Rank all new content and put on top
9am: 7 new stories, ranked specifically for you using a relevance score -> See 3, interact with 1
10am: what you’ve seen gets pushed down, 8 brand new, 3 has a new friend comment, 4-7 were unseen, scoring changes, now story 7 – friend post ranked higher, all scores adjusted based on context, see stories 8,3,7,4,6 and comment on 4
10:10am: (not much new content) 9 brand new, and 5 unseen (new scores)
12:00pm: 4 new stories, friend and link stories ranked higher, friend stories not adjacent
Since stories are ordered based on a score. How do we compute the relevance score?
Sum(Observed behaviors*Significance of each behavior)
Why structure the problem as ranking by sum of weighted, predicted behavior?
Well defined objective function, ML used to predict measurable behavior
Models are trained on their own data
Fast iteration
Distributed development
Heterogeneous content -> calibration is simplified
Value can be adjusted independently
-> Works well in practice
What is the role of the network structure?
How close you are to a person is an increasingly important metric, as judged by how often you like their posts, write on their Timeline, click through their photos.
Reflects explicit preferences made by the user (links are connections formed by friending a person, joining a group or following a page)
Links get strengthened through ongoing implicit actions: click, like, comment
Joe, Susan and Li are all Wei’s friends. Susan interacts with Wei’s posts a lot more, so her posts are more likely to be ranked higher in Wei’s feed.
Social network signals and historical engagement are some of the most predictive features
What data and features do we use to build a model?
How do we select the right features?
Start with >100K features, prune to top 2K for efficiency
Trillions of examples (train on the right data, subsample negative examples)
What model do we use for prediction?
Simple, fast to predict, fast to update, easy to distribute and debug -> Logistic regression
Why this structure?
Trees are expensive and slow but help select the best features
Each model is trained on its own data, but learning can be transferred by re-using trees
Works well with very large amount of data and large number of features
Why BDTs and Logistic regression?
Empirically we have found these two to work well, but this learning structure supports an easy way to plug in new algorithms
We are now also using Neural Nets and untransformed sparse features
What have we learned so far?
Online learning to have good predictions for new content < 1s
Using BDTs as a tool for feature selection
Stacking: able to use both simple and complex models on large data with high performance, and modular architecture allows for trying new things and quick iteration
Skewed sampling helps manage large data volumes
Historical counters (over different time-ranges) are highly predictive and easy to update
How do we know if we are doing a good job?
How do we define success?
Why does measurement matter?
Selecting the right objective function is hard and important
We look at what people do and what they say:
Two types of metrics: implicit and explicit
Online survey
Experimentation
We invest in controls for people
Why are implicit metrics limited?
They can be inconsistent, they vary by person and by content, how can we normalize?
How do we collect explicit signals?
We ask a representative sample of people through direct surveys.
Type of surveys?
RELATIVE comparison
ABSOLUTE comparison
Multi-scale absolute comparison
Most common
Multi-scale, absolute comparison in context
Which survey type is better?
Why is survey data useful?
HQ pages vs. meme pages… Are a lot of these ad-like?
What data do we have here that is useful?
How can this data be used to improve ranking?
Ranking by α p(like) + β p(comment)
What have we learned so far?
Both implicit and explicit signals are important and can be used together
ML and Metrics are tools that let us CONNECT, INFORM and ENTERTAIN Facebook users.