People You May Know
Fast Recommendations Over Massive Data
Jeff Weiner
Chief Executive Officer
Sumit Rangwala
Artificial Intelligence
Felix GV
Data Infrastructure
My Professional Network
Professional network in real world
Sumit Felix
Amol
Gaojie Peter
My Professional Network
Professional network in real world Professional network on LinkedIn
Sumit Felix
Peter
Amol
Gaojie
Sumit Felix
Peter
Amol
Gaojie
People You May Know
Predicting real world connections
• Helps grow member’s professional network
• Recommends people that one might know
• Enables many other LinkedIn services
Talk Outline
People You May Know
PYMK: Generating Recommendations
PYMK Architecture Evolution
PYMK Rebirth
Insights and Road Ahead
PYMK: Generating Recommendations
PYMK: Prediction Strategy
Data Mining
• LinkedIn’s Economic Graph
• Member’s activities and profile
LinkedIn Economic Graph
Felix
Peter
Amol
Gaojie
Microsoft
USC
Sumit
Recommendation System
Candidate Generation
Feature Generation
Scoring
PYMK: Candidate Generation
Using commonalities in economic graph
• Friends of my friends (triangle closing)
• Coworkers
• Personalized Page Rank
LinkedIn Economic Graph
Amol
Peter Gaojie
Felix
Microsoft
Sumit
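As a rough illustration of triangle closing, the sketch below walks two hops out from a member and counts how many direct connections lead to each candidate. The toy graph, its edges, and the function name `friends_of_friends` are assumptions made for this example; the deck does not show the production algorithm.

```python
from collections import Counter

def friends_of_friends(graph, member):
    """Return {candidate: number of common connections} for `member`.

    `graph` maps each member to the set of their direct connections.
    Candidates are second-degree connections that are not already
    connected to `member`.
    """
    direct = graph.get(member, set())
    counts = Counter()
    for friend in direct:
        for candidate in graph.get(friend, set()):
            if candidate != member and candidate not in direct:
                counts[candidate] += 1
    return dict(counts)

# Toy undirected graph with hypothetical edges.
graph = {
    "Sumit": {"Amol"},
    "Amol": {"Sumit", "Felix", "Peter"},
    "Felix": {"Amol", "Gaojie"},
    "Peter": {"Amol"},
    "Gaojie": {"Felix"},
}

candidates = friends_of_friends(graph, "Sumit")
print(candidates)
```

In this toy graph Sumit reaches Felix and Peter through Amol, so each is a candidate with one common connection.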
PYMK: Feature Generation
Using economic graph characteristics
• Number of common friends
Using member activities/profile
• Common work location
LinkedIn Economic Graph
Amol
Peter Gaojie
Felix
Microsoft
Sumit
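The two example features on this slide could be computed for a member pair roughly as follows. This is a minimal sketch: the profile schema (a `location` field), the feature names, and the toy data are all assumptions for illustration.

```python
def pair_features(graph, profiles, a, b):
    """Compute two illustrative features for the member pair (a, b)."""
    common = graph.get(a, set()) & graph.get(b, set())
    same_location = profiles[a]["location"] == profiles[b]["location"]
    return {
        "num_common_friends": len(common),         # graph characteristic
        "same_work_location": int(same_location),  # profile characteristic
    }

# Hypothetical toy data.
graph = {
    "Sumit": {"Amol"},
    "Amol": {"Sumit", "Felix"},
    "Felix": {"Amol"},
}
profiles = {
    "Sumit": {"location": "Bay Area"},
    "Amol": {"location": "Bay Area"},
    "Felix": {"location": "Bay Area"},
}

features = pair_features(graph, profiles, "Sumit", "Felix")
print(features)
```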
PYMK: Recommendation System
Candidate Generation
• Sumit might know Amol’s friend Felix
Feature Generation
• Sumit and Felix have one common friend
• Sumit and Felix both work in the Bay Area
Graph processing
Data processing
PYMK Architecture Evolution
Approach: Pre-compute recommendations
PYMK: The Beginning
Problem Space
• 10s of millions of members
Architecture
• Pre-compute using SQL
Shortcomings
• Staleness of 6 weeks to 6 months
• Extraneous computation
Oracle
PYMK Service
Online service request
PYMK: Keeping up with Growth
Problem space
• Low 100s of millions of members
Architecture
• Pre-compute using Hadoop MR
• Push to a key-value store
Shortcomings
• Staleness of 2-3 days
• Extraneous computation
Voldemort
PYMK
Service
PYMK: Pushing the Technology Limits
Problem Space
• Mid 100s of millions of members
Architecture
• Pre-compute using Spark [1]
• Push to a key-value store
Shortcomings
• Staleness of 1-2 days
• Excessive computation cost
Venice
[1] Managing Exploding Big Data
PYMK
Service
PYMK: Exploring Data Freshness
Problem Space
• Use up to date member data
Architecture
• Hybrid offline-online approach
Shortcomings
• Split-brain design
• Didn’t scale
Venice
Realtime signals
PYMK
Service
Key Realization
• Freshness matters
• Pre-computation is costly
PYMK Rebirth
Approach: Compute recommendations on demand
PYMK: Recommendation System
Candidate Generation
• Sumit might know Amol’s friend Felix
Feature Generation
• Sumit and Felix have one common friend
• Sumit and Felix both work in the Bay Area
Online Graph Traversal
Fast Data Access
Gaia: An online graph processing system
A generic service for executing complex graph algorithms
with low latency on massive graphs
Gaia: Overview
Gaia
• Any kind of graph: a snapshot on HDFS
• Updates to graph: via Kafka, etc.
• Graph algorithm code: using compute framework (e.g., triangle closing, random graph walks)
Design Choice
Gaia
• Single server architecture with replicas
• Full in-memory graph for fast execution
Gaia: Architecture
Gaia
• Servers (three shown), each running the algorithm code (Algo)
• Graph snapshot on disk
• Graph updates via Kafka, etc.
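The design of a full in-memory graph that is loaded from a snapshot and then kept current by a stream of updates can be sketched as below. The class name, the `(op, u, v)` event shape, and the method names are hypothetical; Gaia's actual interfaces are not shown in the deck.

```python
class InMemoryGraph:
    """Toy undirected graph held entirely in memory."""

    def __init__(self, snapshot_edges):
        # Load the initial state from a snapshot (a list of edges here,
        # standing in for a snapshot file on disk).
        self._adj = {}
        for u, v in snapshot_edges:
            self._add_edge(u, v)

    def _add_edge(self, u, v):
        self._adj.setdefault(u, set()).add(v)
        self._adj.setdefault(v, set()).add(u)

    def _remove_edge(self, u, v):
        self._adj.get(u, set()).discard(v)
        self._adj.get(v, set()).discard(u)

    def apply_update(self, event):
        """Apply one update event of the assumed form (op, u, v)."""
        op, u, v = event
        if op == "add":
            self._add_edge(u, v)
        elif op == "remove":
            self._remove_edge(u, v)
        else:
            raise ValueError(f"unknown operation: {op!r}")

    def neighbors(self, u):
        return frozenset(self._adj.get(u, ()))


graph = InMemoryGraph([("Sumit", "Amol"), ("Amol", "Felix")])
# Updates would arrive from a stream such as Kafka; a plain list stands in here.
for event in [("add", "Sumit", "Felix"), ("remove", "Amol", "Felix")]:
    graph.apply_update(event)

print(sorted(graph.neighbors("Sumit")))
```

Because every replica holds the whole graph, a traversal never leaves the process, which is consistent with the low-latency goal stated for Gaia.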
PYMK
Gaia
• Candidate generation using triangle closing and common connection count
• 10s of milliseconds (p90)
A key-value store with scoring capability
At a glance
Venice
• Tailored for serving ML jobs’ output
• High throughput ingestion
• Fast lookups
• Self-service onboarding
Supported Ingestion Modes in Venice
Source | Batch | Incremental
Hadoop | Push Job | Push Job
Samza | Reprocessing Job (Kappa Architecture) | Streaming Job
Hybrid | Any Batch Job + Streaming Job (Lambda Architecture)
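To make the hybrid (batch plus streaming) mode concrete, here is a toy key-value store in which both ingestion paths write into the same keyspace and the newest timestamp wins. This is only an illustration of the Lambda-style idea under that last-writer-wins assumption; it is not how Venice itself merges batch and streaming data, and all names are hypothetical.

```python
class HybridStore:
    """Toy store that accepts batch pushes and streaming updates."""

    def __init__(self):
        self._data = {}  # key -> (timestamp, value)

    def _put(self, key, value, ts):
        current = self._data.get(key)
        # Last-writer-wins on timestamp (an assumption of this sketch).
        if current is None or ts >= current[0]:
            self._data[key] = (ts, value)

    def push_batch(self, records, ts):
        """Ingest a full batch; every record carries the batch timestamp."""
        for key, value in records.items():
            self._put(key, value, ts)

    def stream_update(self, key, value, ts):
        """Ingest a single streaming record with its event timestamp."""
        self._put(key, value, ts)

    def get(self, key):
        entry = self._data.get(key)
        return None if entry is None else entry[1]


store = HybridStore()
store.push_batch({"a": 1, "b": 2}, ts=100)
store.stream_update("a", 10, ts=150)
# A later batch computed from older data must not clobber the fresher stream write.
store.push_batch({"a": 5, "b": 6}, ts=120)

print(store.get("a"), store.get("b"))
```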
Online Feature Retrieval
First PYMK Use Case
Requirements
Online Feature Retrieval
• Millions of lookups / sec at peak
• ~1000 keys / query
• Thousands of queries / sec
• ~80B / value
Before / After
Online Feature Retrieval
• Base latency: 4 seconds (p99)
• Changed storage engine to RocksDB: 60 ms (p99)
Embeddings
Second PYMK Use Case
Requirements
Embeddings
• Millions of lookups / sec at peak
• ~1000 keys / query
• Thousands of queries / sec
• ~800B / value
• 10x the previous size
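The requirement figures for the two use cases fit together arithmetically. The slides say only "thousands of queries / sec", so the 2,000 queries/sec used below is an assumed value chosen for illustration; the key count and value sizes come from the slides.

```python
KEYS_PER_QUERY = 1_000      # "~1000 keys / query"
QUERIES_PER_SEC = 2_000     # assumption: the slides say only "thousands"

def load(value_bytes):
    """Return approximate lookup rate and payload sizes for one use case."""
    bytes_per_query = KEYS_PER_QUERY * value_bytes
    return {
        "lookups_per_sec": KEYS_PER_QUERY * QUERIES_PER_SEC,
        "bytes_per_query": bytes_per_query,
        "bytes_per_sec": bytes_per_query * QUERIES_PER_SEC,
    }

features = load(80)     # online feature retrieval, ~80 B per value
embeddings = load(800)  # embeddings, ~800 B per value

print(features)
print(embeddings)
```

Under this assumption both use cases see about two million lookups per second, while the value payload per query grows from roughly 80 KB to roughly 800 KB for embeddings.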
Before / After
Embeddings
• Base latency: 275 ms (p99)
• Server-side computation: 60 ms (p99)
At a glance
Server-side Computation
• Simple vector operations
• Smaller response size
• Big input (vector)
• Small output (scalar)
• Declarative API
• No arbitrary code
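The "big input, small output" point can be illustrated with a dot product evaluated next to the stored vectors, so that only one scalar per key is returned. The function name, the dict standing in for the store, and the choice of dot product are assumptions for this sketch; it does not reproduce Venice's declarative API.

```python
def server_side_dot(store, keys, query_vector):
    """Return {key: dot(stored_vector, query_vector)} for keys that exist.

    Runs where the data lives, so the response carries one float per key
    instead of one full vector per key.
    """
    results = {}
    for key in keys:
        stored = store.get(key)
        if stored is None:
            continue  # missing keys are skipped in this sketch
        results[key] = sum(a * b for a, b in zip(stored, query_vector))
    return results


# Hypothetical stored embeddings.
store = {
    "Felix": [1.0, 2.0, 3.0],
    "Peter": [0.0, 1.0, 0.0],
}

scores = server_side_dot(store, ["Felix", "Peter", "Gaojie"], [1.0, 0.5, 2.0])
print(scores)
```

With ~800 B vectors, returning a single 8-byte float per key shrinks the response by about two orders of magnitude, which is consistent with the latency drop reported on the previous slide.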
More tuning
Fast Avro
• Online feature retrieval: 60 to 40 ms (p99)
• Embeddings w/ computation: 60 to 35 ms (p99)
• Now open-source: github.com/linkedin/avro-util
PYMK Today
Putting it all together
PYMK: Recommendation System
Candidate
Generation
Sumit might know Amol’s friend, Felix
Sumit and Felix have one common friend
Sumit and Felix both work in Bay Area
PYMK Service
Feature
Generation
Scoring: Sumit and Felix likely know each other
Venice
Gaia
PYMK: Today
Venice
PYMK
Service
Gaia
1. Ingest in Gaia & Venice
2. Candidate gen & graph features from Gaia
3. Member features & partial scoring from Venice
4. Final scoring by PYMK Service
Staleness
• Seconds to minutes
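The request-time part of this flow (steps 2 to 4) can be sketched end to end with stand-ins for the two backing systems. Everything here is hypothetical: the function names, the dot-product partial score, and the logistic weights are invented for illustration and are not the PYMK model.

```python
import math

def gaia_candidates(graph, member):
    """Step 2 stand-in: friends-of-friends with common-connection counts."""
    direct = graph.get(member, set())
    counts = {}
    for friend in direct:
        for cand in graph.get(friend, set()):
            if cand != member and cand not in direct:
                counts[cand] = counts.get(cand, 0) + 1
    return counts

def venice_partial_scores(embeddings, member, candidates):
    """Step 3 stand-in: one dot product per candidate, computed by the store."""
    query = embeddings[member]
    return {
        c: sum(a * b for a, b in zip(embeddings[c], query))
        for c in candidates
    }

def final_score(common_connections, partial_score):
    """Step 4 stand-in: logistic model with made-up weights."""
    z = 0.8 * common_connections + 0.5 * partial_score - 1.0
    return 1.0 / (1.0 + math.exp(-z))

def recommend(graph, embeddings, member):
    counts = gaia_candidates(graph, member)
    partial = venice_partial_scores(embeddings, member, counts)
    scored = [(c, final_score(counts[c], partial[c])) for c in counts]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy data.
graph = {
    "Sumit": {"Amol"},
    "Amol": {"Sumit", "Felix", "Peter"},
    "Felix": {"Amol"},
    "Peter": {"Amol"},
}
embeddings = {
    "Sumit": [1.0, 0.0],
    "Amol": [0.5, 0.5],
    "Felix": [1.0, 0.0],
    "Peter": [0.0, 1.0],
}

recs = recommend(graph, embeddings, "Sumit")
print(recs)
```

Felix and Peter each share one connection with Sumit, so the ranking in this toy run is decided by the partial score returned from the feature store stand-in.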
Key Learnings
• Pre-computation is viable for many products
• Scaling RT computation requires moving compute close to data
• Infra aware Machine Learning
Looking Ahead
• Further scale Gaia & Venice
• More candidates
• More features
• Larger features
• More complex computations
ML-Aware Infra
• Continue democratizing access
• Easier onboarding to Venice & Gaia
• Multi-tenancy for Venice Compute
• Integration with other frameworks
Productive ML
Contributors
Amol Ghoting, Gaojie Liu, Kevinjeet Gill, Peter Chng, Min Huang, Yao Chen, Hema Raghavan, Ashish Singhai, and many others
Thank You
[QCon.ai 2019] People You May Know: Fast Recommendations Over Massive Data
