北京大学计算机科学技术研究所
Institute of Computer Science & Technology Peking University

CIKM 2013
Exploiting Ranking Factorization
Machines for Microblog Retrieval
Runwei Qiang, Feng Liang, Jianwu Yang

Institute of Computer Science and Technology
Peking University

Problem Definition
Real-time search: "At time t, find tweets about topic X." (TREC 2011)

[Figure: queries Q1, Q2, ..., Qn, issued as (Q1, t1), (Q2, t2), ..., (Qn, tn), are run against the Tweet Collection. Labels: ranking, timestamp, relevance. One part of the figure is marked "Not Available !!".]

Motivations
IR for microblogs is a non-trivial problem
- Documents are very short: this causes a severe vocabulary-mismatch problem. How should query expansion be applied?
- Shortened URLs are abundant: they offer a way to expand the document, but how should they be used?
- There are large quantities of pointless babble: how can tweet quality be used to filter out non-informative messages?

Motivations
Learning-to-rank methods can make full use of the different models or factors in microblog retrieval
- different factors => different features
Many features have proved useful
- Semantic features between query and document
- Tweet quality features such as link, retweet, and mention (count or binary)

Limitations
Features are treated as independent
- Some features are closely related to each other.
- RT and @ symbols frequently occur in the same tweet.
Feature utilization
- Link feature: binary => semantic information
- Example: "Small plane crashes at big airport; no one notices - CNN.com"

Proposal
Employ a Ranking FM framework
- Adopt FM as the ranking function to model interactions between features
Utilize several effective features that are neglected in existing work
Optimize Ranking FM with two optimization methods
- Stochastic Gradient Descent
- Adaptive Regularization

Outline
Ranking FM for Microblog Retrieval
- Ranking FM Framework
- Optimization Methods
Feature Description
Experiments
Summary

Ranking FM Framework
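Pairwise approach

Two labeled instances (x_p, y_p) and (x_q, y_q) form one training pair ((x_p, x_q), z):

  $$\left( (x_p, y_p), (x_q, y_q) \right) \;\Rightarrow\; \left( (x_p, x_q), z \right), \qquad z = \begin{cases} +1 & y_p > y_q \\ -1 & y_q > y_p \end{cases}$$

Loss function

  $$\min_{\Theta} L(\Theta) = \sum_{t=1}^{l} \ell_t\left( f; (x_p^{(t)}, x_q^{(t)}), z^{(t)} \right) + \lambda \|\Theta\|^2$$

where f is the FM ranking function, \ell_t is the hinge loss function, and \lambda \|\Theta\|^2 is the regularization term.
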
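The pairwise construction and the hinge loss can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the names `make_pairs` and `hinge_loss` are hypothetical, ties are skipped by assumption, and the pair score is assumed to be the difference between the two item scores, which the slides do not state.

```python
from itertools import combinations


def make_pairs(items):
    """Turn labeled items (x, y) into pairwise examples ((x_p, x_q), z).

    z is +1 when y_p > y_q and -1 when y_q > y_p; ties are skipped
    (an assumption, since the slide only defines the two strict cases).
    """
    pairs = []
    for (x_p, y_p), (x_q, y_q) in combinations(items, 2):
        if y_p == y_q:
            continue  # equal relevance carries no ordering information
        pairs.append(((x_p, x_q), 1 if y_p > y_q else -1))
    return pairs


def hinge_loss(score_p, score_q, z):
    """Hinge loss on one pair, assuming the pair score is score_p - score_q."""
    return max(0.0, 1.0 - z * (score_p - score_q))


# Toy data: feature vectors with relevance labels (hypothetical values).
items = [([1.0, 0.0], 2), ([0.5, 0.5], 1), ([0.0, 1.0], 1)]
pairs = make_pairs(items)
```

A correctly ordered pair with a margin of at least 1 contributes no loss, while a misordered pair is penalized linearly in the size of the violation.
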
Factorization Machines Model
  $$\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j$$

The nested sums model pairwise feature interactions with factorized parameters:

  $$\langle v_i, v_j \rangle = \sum_{f=1}^{k} v_{i,f} \cdot v_{j,f}$$

where k is the factorization dimensionality.

The model can be rewritten so that it is computed in O(k · n):

  $$\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \frac{1}{2} \sum_{f=1}^{k} \left( \left( \sum_{i=1}^{n} v_{i,f} x_i \right)^2 - \sum_{i=1}^{n} v_{i,f}^2 x_i^2 \right)$$

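The equivalence between the nested-sum form of the FM model and its O(k · n) reformulation can be checked numerically. The sketch below is only an illustration: the function names are hypothetical, the parameters are random, and the sizes n = 17 and k = 3 are example values, not a claim about the authors' configuration.

```python
import random


def fm_score_naive(x, w0, w, V):
    """FM score with the explicit double sum over feature pairs, O(k * n^2)."""
    n = len(x)
    score = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(v_if * v_jf for v_if, v_jf in zip(V[i], V[j]))
            score += dot * x[i] * x[j]
    return score


def fm_score_linear(x, w0, w, V):
    """Same score via the reformulated sum, O(k * n)."""
    n, k = len(x), len(V[0])
    score = w0 + sum(w[i] * x[i] for i in range(n))
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))            # sum_i v_{i,f} x_i
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))  # sum_i v_{i,f}^2 x_i^2
        score += 0.5 * (s * s - s_sq)
    return score


random.seed(0)
n, k = 17, 3  # illustrative sizes only
x = [random.gauss(0.0, 1.0) for _ in range(n)]
w0 = 0.1
w = [random.gauss(0.0, 1.0) for _ in range(n)]
V = [[random.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n)]
naive = fm_score_naive(x, w0, w, V)
fast = fm_score_linear(x, w0, w, V)
```

Both functions return the same value up to floating-point rounding; only the second avoids the quadratic loop over feature pairs.
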
Learn Ranking FM
Stochastic Gradient Descent
- Grid search on the validation set to find the best λ (time-consuming)
Adaptive Regularization [2]
- Adapts the regularization automatically by alternating between two sets
- Training set S_T:

  $$\Theta^{(t+1)} \mid \lambda^{(t)} := \arg\min_{\Theta} \left( \sum_{(x,y) \in S_T} l\left( \hat{y}(x \mid \Theta), y \right) + \lambda^{(t)} \|\Theta\|^2 \right)$$

- Validation set S_V:

  $$\lambda^{(t+1)} \mid \Theta^{(t+1)} := \arg\min_{\lambda} \left( \sum_{(x,y) \in S_V} l\left( \hat{y}(x \mid \Theta^{(t+1)}), y \right) + \lambda^{(t)} \|\Theta\|^2 \right)$$

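A single stochastic gradient step for a Ranking FM can be sketched as follows. This is a hedged illustration, not the authors' implementation: it assumes the pair score is the difference of two FM scores (so the bias w0 cancels), a hinge loss with margin 1, and an L2 penalty lam * ||Theta||^2. The names `fm_score` and `sgd_step` and the values of `lr` and `lam` are hypothetical. With plain SGD, `lam` would be chosen by an outer grid search on the validation set.

```python
import random


def fm_score(x, w, V):
    """FM score without the bias w0, which cancels in pairwise differences."""
    n, k = len(x), len(V[0])
    score = sum(w[i] * x[i] for i in range(n))
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        score += 0.5 * (s * s - s_sq)
    return score


def sgd_step(x_p, x_q, z, w, V, lr=0.05, lam=0.01):
    """One in-place SGD update on hinge loss + lam * ||Theta||^2 for one pair.

    Returns the hinge loss measured before the update.
    """
    n, k = len(w), len(V[0])
    margin = z * (fm_score(x_p, w, V) - fm_score(x_q, w, V))
    active = margin < 1.0  # the hinge term has a non-zero gradient only here
    # Per-factor sums needed by the FM gradient, computed before any update.
    s_p = [sum(V[i][f] * x_p[i] for i in range(n)) for f in range(k)]
    s_q = [sum(V[i][f] * x_q[i] for i in range(n)) for f in range(k)]
    for i in range(n):
        grad_w = 2.0 * lam * w[i]
        if active:
            grad_w -= z * (x_p[i] - x_q[i])
        for f in range(k):
            grad_v = 2.0 * lam * V[i][f]
            if active:
                d_p = x_p[i] * s_p[f] - V[i][f] * x_p[i] ** 2
                d_q = x_q[i] * s_q[f] - V[i][f] * x_q[i] ** 2
                grad_v -= z * (d_p - d_q)
            V[i][f] -= lr * grad_v
        w[i] -= lr * grad_w
    return max(0.0, 1.0 - margin)


# Toy pair (hypothetical values): x_p should be ranked above x_q.
random.seed(0)
n, k = 4, 2
w = [0.0] * n
V = [[random.gauss(0.0, 0.01) for _ in range(k)] for _ in range(n)]
x_p, x_q, z = [1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0], 1
first_loss = sgd_step(x_p, x_q, z, w, V)
for _ in range(50):
    last_loss = sgd_step(x_p, x_q, z, w, V)
```

On this toy pair the loss falls from about 1 to near 0 within a few dozen updates. Adaptive regularization would replace the fixed `lam` with a value that is itself updated on held-out pairs.
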
Feature Description
Content Relevance Features (3)
- Query & tweet
- BM25, TFIDF, language model score
Semantic Expansion Features (3x3=9)
- Query & topic info; expanded query & tweet; expanded query & topic info
- BM25, TFIDF, language model score
Quality Features (5)
- mention, retweet, hashtag, link (binary features)
- tweet length

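The five quality features could be extracted from raw tweet text roughly as shown below. This is a sketch under stated assumptions: the regular expressions, the token-based length measure, the function name `quality_features`, and the sample tweet (including its user name and URL) are all hypothetical and are not taken from the paper.

```python
import re


def quality_features(tweet):
    """Return [mention, retweet, hashtag, link, length] for one tweet.

    The first four entries are binary indicators. The patterns are
    illustrative assumptions, not the ones used by the authors.
    """
    has_mention = 1 if re.search(r"@\w+", tweet) else 0
    has_retweet = 1 if re.search(r"\bRT\b", tweet) else 0
    has_hashtag = 1 if re.search(r"#\w+", tweet) else 0
    has_link = 1 if re.search(r"https?://\S+", tweet) else 0
    length = len(tweet.split())  # whitespace-separated tokens (assumption)
    return [has_mention, has_retweet, has_hashtag, has_link, length]


# Hypothetical tweet used only to exercise the function.
features = quality_features(
    "RT @user: Small plane crashes at big airport http://example.com/a #news"
)
```

Because RT and @ tend to appear together, the mention and retweet indicators are correlated, which is the kind of feature interaction the FM model is meant to capture.
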
Experimental Setup
Dataset
- TREC Tweet11 Corpus: about 2 weeks of Twitter data
- TopicInfo Corpus: title field of link pages
- TREC’11: 50 queries
- TREC’12: 60 queries
Evaluation Metrics
- P@30 & MAP

Summary statistics of Tweet11 Corpus

HTTP Code    Status       # of tweets
200          OK             8,084,724
302          Found            815,794
403          Forbidden        817,273
404          Not Found        868,667
Null         Null              67,011
Searchable                  8,900,518

Summary statistics of TopicInfo Corpus

HTTP Code    Status       # of tweets
200          OK             1,225,947
302          Found                688
403          Forbidden          5,050
404          Not Found         92,378
Null         Null             265,468
Searchable                  1,226,635

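The two evaluation metrics, P@30 and MAP, follow standard definitions that can be sketched briefly. The function names and the toy relevance list are hypothetical. Relevance is assumed to be binary, and ranks beyond the end of a short result list count as non-relevant.

```python
def precision_at_k(ranked_relevance, k):
    """Fraction of the top-k results that are relevant."""
    return sum(ranked_relevance[:k]) / k


def average_precision(ranked_relevance, num_relevant):
    """Mean of the precision values at each rank where a relevant result appears."""
    if num_relevant == 0:
        return 0.0
    hits, total = 0, 0.0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / num_relevant


def mean_average_precision(runs):
    """MAP over queries; each run is a (ranked_relevance, num_relevant) tuple."""
    return sum(average_precision(r, n) for r, n in runs) / len(runs)


ranking = [1, 0, 1, 0, 0]  # hypothetical binary judgments, best rank first
p_at_5 = precision_at_k(ranking, 5)
ap = average_precision(ranking, num_relevant=2)
map_score = mean_average_precision([(ranking, 2), ([0, 1], 1)])
```

P@30 is `precision_at_k(ranking, 30)` on a full result list, and MAP averages the per-query average precision over all queries in a topic set.
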
Baselines
KL2SFBLoc [3]
- Expanded language model with two-stage query expansion
- Performed very well in the TREC’11 real-time search task
hitURLrun3 [4]
- Uses a logistic regression model to learn a pairwise ranking for microblog retrieval
- Best-performing system in the TREC’12 real-time search task
RSVM_Full
- Ranking SVM with a linear kernel
- Same feature set as Ranking FM

Ranking FM Performance
Metric    KL2SFBLoc    RSVM_Full    hitURLrun3    RFM_FullSGD    RFM_FullAR
P@30      0.2441       0.2616       0.2701        0.2808         0.2746
MAP       0.2506       0.2597       0.2642        0.2694         0.2678

Callouts on the slide: "7% improve on P@30" and "4% improve on P@30". hitURLrun3 is marked "TREC’12 Best"; the RFM columns are marked "Ranking FM".

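The two improvement figures quoted on this slide can be related to the reported P@30 values with a short calculation. The slide text does not say which systems each percentage compares, so the pairing below (RFM_FullSGD against RSVM_Full and against hitURLrun3) is an assumption. It was chosen because it reproduces roughly 7% and 4%. The helper name `relative_gain` is hypothetical.

```python
# P@30 values as reported on the slide.
p30 = {
    "KL2SFBLoc": 0.2441,
    "RSVM_Full": 0.2616,
    "hitURLrun3": 0.2701,
    "RFM_FullSGD": 0.2808,
    "RFM_FullAR": 0.2746,
}


def relative_gain(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return 100.0 * (new - old) / old


# Assumed pairing: the slide does not name the systems being compared.
gain_vs_rsvm = relative_gain(p30["RFM_FullSGD"], p30["RSVM_Full"])
gain_vs_hit = relative_gain(p30["RFM_FullSGD"], p30["hitURLrun3"])
```

Under this reading, the gain over the Ranking SVM isolates the effect of modeling feature interactions, since both systems use the same feature set.
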
Feature Study
[Figure: P@N plotted against N (0 to 30) for Ranking FM with k=3 optimized by SGD. Curves: Full, -Quality, -Document Expansion, -Query Expansion, -Content Relevance, Only Content Relevance.]

Influence of the hyper-parameter k

[Figure: two panels plotting P@30 and MAP against k (0 to 15) for RFM_FullSGD. Caption: Ranking FM optimized by SGD.]

Stochastic Gradient Descent vs. Adaptive Regularization

[Figure: training time (s, scale x 10^4) plotted against k (0 to 15) for Stochastic Gradient Descent and Adaptive Regularization.]

Method         P@5       P@10      P@30      MAP
RFM_FullSGD    0.4068    0.3695    0.2808    0.2694
RFM_FullAR     0.4034    0.3678    0.2746    0.2678

Summary
Ranking FM Framework
- Pairwise approach
- Use Factorization Machines as the ranking function
Two optimization methods
- Stochastic Gradient Descent
- Adaptive Regularization
Three groups of features
- Content Relevance Features
- Semantic Expansion Features
- Quality Features

References
[1] I. Ounis, J. Lin, and I. Soboroff. Overview of the TREC-2011 Microblog Track. In Proceedings of TREC 2011, 2012.
[2] S. Rendle. Learning recommender systems with adaptive regularization. In Proceedings of the Fifth ACM International Conference on Web Search and Data Mining, WSDM ’12, pages 133–142. ACM, 2012.
[3] F. Liang, R. Qiang, and J. Yang. Exploiting real-time information retrieval in the microblogosphere. In JCDL ’12, pages 267–276. ACM, 2012.
[4] Z. Han, X. Li, M. Yang, H. Qi, S. Li, and T. Zhao. Hit at TREC 2012 Microblog Track. In Proceedings of TREC 2012, 2013.

