Lei Yang, Senior Engineering Manager, Quora at MLconf NYC - 4/15/16

731 views


Sharing and Growing the World’s Knowledge with Machine Learning: At Quora our mission is to “share and grow the world’s knowledge”. To accomplish this, we need to build a complex ecosystem which requires us to understand and solve a variety of problems like content quality, demand, user engagement, personalization, and author reputation. In this talk, we will go over several exciting challenges of applying machine learning to these problems. We will give examples such as our ranking and recommendation approaches, as well as systems and tools we built to support experimentation and integration of machine learning models in the product.

Published in: Technology

  1. Sharing and growing the world's knowledge with machine learning
     Lei Yang (leiyang@quora.com)
     April 2016
  2. Our mission
     “To share and grow the world’s knowledge”
     ● Millions of questions & answers
     ● Millions of users
     ● Thousands of topics
     ● ...
  3. What we care about: Demand, Quality, Relevance
  4. Data @Quora
  5. Topic, Question, User, Answer, Actions
  6. Lots of data relations
  7. Complex network propagation effects
  8. Importance of topics & semantics
  9. Machine Learning @Quora
  10. Ranking - Answer ranking
      What is a good Quora answer?
      ● Truthful
      ● Reusable
      ● Provides explanation
      ● Well formatted
      ● ...
  11. Ranking - Answer ranking
      How are those criteria translated into features?
      ● Features that relate to the text quality itself
      ● Interaction features (upvotes/downvotes, clicks, comments…)
      ● User features (e.g. expertise in topic)
  12. Ranking - Feed
      Present most interesting stories for a user at a given time
      ● Interesting = topical relevance + social relevance + timeliness
      ● Stories = questions + answers
      ● Personalized learning-to-rank approach
      ● Relevance-ordered vs time-ordered = big gains in engagement
      ● Challenges
        ○ Potentially many candidate stories
        ○ Real-time ranking
        ○ Objective function
  13. Ranking - Feed
      ● Personalized LTR model
      ● Features
        ○ Quality of question/answer
        ○ Topics the user is interested in or knows about
        ○ Users the user is following
        ○ What is trending/popular
        ○ ...
      ● Different temporal windows
      ● Multi-stage solution with different “streams”
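
The slides give "interesting = topical relevance + social relevance + timeliness" but do not say how the terms are computed. The following is only a minimal sketch under stated assumptions: the `Story` fields, the hand-set weights, and the exponential half-life decay are all hypothetical. A real learning-to-rank model would learn the combination from engagement data rather than use fixed weights.

```python
from dataclasses import dataclass


@dataclass
class Story:
    story_id: str
    topics: set        # topic names attached to the question/answer
    author: str
    age_hours: float   # time since the story was created


def score_story(story, followed_topics, followed_users,
                weights=(1.0, 1.0, 1.0), half_life_hours=24.0):
    """Hand-weighted stand-in for a learned ranking function (assumption)."""
    w_topic, w_social, w_time = weights
    # Topical relevance: fraction of the story's topics the user follows.
    if story.topics:
        topical = len(story.topics & followed_topics) / len(story.topics)
    else:
        topical = 0.0
    # Social relevance: 1 if the author is someone the user follows.
    social = 1.0 if story.author in followed_users else 0.0
    # Timeliness: halves every `half_life_hours` (hypothetical decay).
    timeliness = 0.5 ** (story.age_hours / half_life_hours)
    return w_topic * topical + w_social * social + w_time * timeliness


def rank_feed(stories, followed_topics, followed_users):
    """Return stories ordered by descending score (relevance-ordered feed)."""
    return sorted(
        stories,
        key=lambda s: score_story(s, followed_topics, followed_users),
        reverse=True,
    )
```

In a multi-stage setup like the one the slide mentions, a cheap scorer of this kind could serve as a first-pass candidate filter before a heavier model ranks the survivors; that split is also an assumption here.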
  14. Recommendations - Topics
      Recommend new topics for the user to follow, based on:
      ● Topics you already follow
      ● Users you already follow
      ● Interactions with questions/answers
      ● Topic-related features
      ● ...
  15. Recommendations - Users
      Recommend new users for the user to follow, based on:
      ● Users you already follow
      ● Topics you already follow
      ● Interactions with users
      ● User-related features
      ● ...
  16. Related questions
      Given interest in a question, what other questions are interesting?
      ● Not only about similarity, but also “interestingness”
      ● Features such as:
        ○ Textual
        ○ Co-visit
        ○ Topics
        ○ …
      ● Important for logged-out use case
  17. Duplicate questions
      ● Important issue for Quora
        ○ Want to make sure we don’t disperse knowledge across duplicates of the same question
      ● Binary classifier trained with labelled data
      ● Features
        ○ Textual vector space models
        ○ Usage-based features
        ○ ...
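
As an illustration of a "textual vector space" feature, the sketch below computes the cosine similarity between bag-of-words vectors of two questions. The tokenizer and the choice of raw term counts are assumptions; the slides do not say which vector space model is used. In a setup like the one described, a value such as this would be one input to the binary classifier, not the decision itself.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (simplistic, assumed)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(question_a, question_b):
    """Cosine similarity between term-count vectors of two questions."""
    a = Counter(tokenize(question_a))
    b = Counter(tokenize(question_b))
    # Dot product over the terms the two questions share.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    # An empty question has no direction; treat similarity as zero.
    return dot / norm if norm else 0.0
```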
  18. User expertise inference
      Infer user’s trustworthiness in relation to a given topic
      ● We take into account:
        ○ Answers written on topic
        ○ Upvotes/downvotes received
        ○ Endorsements
        ○ ...
      ● Trust/expertise propagates through the network
      ● Useful as input/features in other models
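
One common way to let trust "propagate through the network" is a PageRank-style iteration over an upvote or endorsement graph. The sketch below uses that technique purely as an illustration; the slides do not state which propagation method is used, and the function name, the damping factor, and the input shapes are all hypothetical.

```python
def propagate_expertise(upvotes, seed_scores, damping=0.85, iterations=50):
    """PageRank-style propagation of per-topic expertise (illustrative only).

    upvotes:     dict mapping a voter to the list of authors they upvoted
                 on the topic.
    seed_scores: dict mapping users to prior expertise, e.g. derived from
                 answers written on the topic (assumed input).
    """
    total_seed = sum(seed_scores.values())
    if total_seed <= 0:
        raise ValueError("seed_scores must contain positive mass")
    users = set(seed_scores) | set(upvotes)
    for authors in upvotes.values():
        users.update(authors)
    # Normalised prior; users without a seed start at zero.
    base = {u: seed_scores.get(u, 0.0) / total_seed for u in users}
    scores = dict(base)
    for _ in range(iterations):
        nxt = {u: (1.0 - damping) * base[u] for u in users}
        for voter, authors in upvotes.items():
            if not authors:
                continue
            # A voter splits their current score among the authors upvoted.
            share = damping * scores[voter] / len(authors)
            for author in authors:
                nxt[author] += share
        scores = nxt
    return scores
```

An upvote from a user with high expertise then counts for more than one from a user with low expertise. Users with no outgoing votes simply do not pass their score on in this simplified version, so the scores are not renormalised to sum to one.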
  19. Spam detection and moderation
      ● Very important for Quora to keep quality of content
      ● Pure manual approaches do not scale
      ● Hard to get algorithms 100% right
      ● ML algorithms detect content/user issues
        ○ Output of the algorithms feeds manually curated moderation queues
  20. Content creation prediction
      ● Quora’s algorithms not only optimize for probability of reading
      ● Important to predict probability of a user answering a question
      ● Some product features completely rely on that prediction
        ○ E.g. A2A (ask to answer) suggestions
  21. Trending topics
      Highlight current events that are interesting to the user
      ● We take into account:
        ○ Global “Trendiness”
        ○ Social “Trendiness”
        ○ User’s interest
        ○ ...
      ● Trending topics are a great discovery mechanism
  22. Models & Experimentation
  23. Models
      ● Logistic Regression
      ● Elastic Nets
      ● Gradient Boosted Decision Trees
      ● Random Forests
      ● (Deep) Neural Networks
      ● LambdaMART
      ● Matrix Factorization
      ● LDA
      ● ...
  24. Open source project -- QMF
      Quora Matrix Factorization
      https://github.com/quora/qmf
      ● Currently BPR and WALS
      ● Multithreaded implementation in C++14
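
For readers unfamiliar with BPR (Bayesian Personalized Ranking), the sketch below shows a single stochastic gradient step of the standard BPR objective on one (user, preferred item, other item) triple. It is not QMF's code or API, which is a multithreaded C++14 implementation; the function name, learning rate, and regularization value are assumptions chosen for illustration.

```python
import math


def bpr_step(p_u, q_i, q_j, lr=0.05, reg=0.01):
    """One in-place SGD step of BPR for user u, preferred item i, other item j.

    p_u, q_i, q_j are equal-length lists of latent factors.
    Returns the score margin x = p_u . (q_i - q_j) before the update.
    """
    x = sum(pu * (qi - qj) for pu, qi, qj in zip(p_u, q_i, q_j))
    # Gradient of ln(sigmoid(x)) with respect to x is sigmoid(-x).
    g = 1.0 / (1.0 + math.exp(x))
    for f in range(len(p_u)):
        # Read the old values first so all three updates use the same state.
        pu, qi, qj = p_u[f], q_i[f], q_j[f]
        p_u[f] += lr * (g * (qi - qj) - reg * pu)
        q_i[f] += lr * (g * pu - reg * qi)
        q_j[f] += lr * (-g * pu - reg * qj)
    return x
```

Repeating the step on sampled triples pushes the preferred item's score above the other item's for that user, which is the pairwise ranking behaviour BPR optimizes for.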
  25. ML platform
      ● Allow ML Engineers and Data Scientists to collaborate within the same ML framework
      ● Easy integration with well known tools and open source libraries
      ● Offline evaluation and debugging
      ● User friendly Python frontend
      ● High performance and scalable C++/CUDA backend
      [Diagram labels: Redshift MySQL S3 Python User Interface Trainer Box Session CPU GPU Disk ... WALS BPR]
  26. Experimentation
      ● Extensive A/B testing, data-driven decision-making
      ● Separate, orthogonal “layers” for different parts of the system
      ● Experiment framework showing comparisons for various metrics
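
A common way to obtain orthogonal experiment layers is to hash the user ID together with a layer-specific salt, so that a user's bucket in one layer is effectively independent of their bucket in another. The sketch below illustrates that idea only; it is not Quora's experiment framework, and the layer names, bucket count, and function names are hypothetical.

```python
import hashlib


def assign_bucket(user_id, layer, num_buckets=100):
    """Deterministically map a user to a bucket within one layer."""
    # Salting the hash with the layer name decorrelates layers from each other.
    key = f"{layer}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    return int(digest, 16) % num_buckets


def assign_variant(user_id, layer, variants, num_buckets=100):
    """Pick a variant from (name, bucket_count) pairs covering the buckets.

    Returns None when the user's bucket falls outside every variant's range,
    i.e. the user is not enrolled in this layer's experiment.
    """
    bucket = assign_bucket(user_id, layer, num_buckets)
    upper = 0
    for name, count in variants:
        upper += count
        if bucket < upper:
            return name
    return None
```

For example, `assign_variant(42, "feed_ranking", [("control", 50), ("treatment", 50)])` and the same call with a different layer name such as `"answer_ranking"` are computed from different hashes, so experiments in the two layers can run on the same users without systematically overlapping.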
  27. Conclusions
  28. Conclusions
      ● At Quora we have not only Big, but also “rich” data
      ● Our algorithms need to understand and optimize complex aspects such as quality, interestingness, relevance, or user expertise
      ● We believe ML will be one of the keys to our success
      ● We have many interesting problems, and many unsolved challenges
  29. We are hiring!
      www.quora.com/careers
