Mathematics in Internet Information Retrieval. Zhi-Ming Ma, April 24, 2009, Xiamen. Email: mazm@amt.ac.cn http://www.amt.ac.cn/member/mazhiming/index.html
 
How can Google rank 2,040,000 pages in 0.11 seconds?
A main task of Internet (Web) information retrieval is the design and analysis of search engine (SE) algorithms, which involves plenty of mathematics.
The Internet is a large-scale complex random network. The Earth is developing an electronic nervous system, a network with diverse nodes and links are …
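Search engine workflow (diagram). Components: Web; web crawler; page parser; pages; links & anchors; link map; web graph generator; web graph; link analysis; page ranks; index builder; inverted index; page & site database; cache; cached pages; query; user interface. The pipeline is split into an online part and an offline part and covers indexing and ranking.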
Static Rank
Dynamic Rank
Research on Complex Networks and Information Retrieval
 
 
Outline
 
 
 
 
 
HITS and PageRank (1998). Jon Kleinberg, Cornell University
Nevanlinna Prize (2006): Jon Kleinberg
PageRank, the ranking system used by the Google search engine.
 
Markov chain describing  surfing behavior
More generally, we may consider a personalized d: … PageRank is the unique positive eigenvector: … By the strong ergodic theorem: …
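The eigenvector characterization above can be illustrated with power iteration on the random-surfer Markov chain. This is a minimal sketch, not the formulation on the original slides (whose equations did not survive extraction): the function name `pagerank`, the parameter name `damping`, its default of 0.85, the uniform random jump, and the uniform treatment of dangling pages are all assumptions made for illustration.

```python
import numpy as np

def pagerank(adj, damping=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for PageRank on a dense adjacency matrix.

    adj[i, j] = 1 if page i links to page j (assumed input convention).
    With probability `damping` the surfer follows a random out-link;
    otherwise the surfer jumps to a uniformly random page.
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    out_deg = adj.sum(axis=1, keepdims=True)
    # Row-stochastic link matrix; pages without out-links jump uniformly.
    link = np.where(out_deg > 0, adj / np.maximum(out_deg, 1.0), 1.0 / n)
    # Transition matrix of the surfing chain (all entries positive).
    chain = damping * link + (1.0 - damping) / n
    pi = np.full(n, 1.0 / n)  # uniform starting distribution
    for _ in range(max_iter):
        new = pi @ chain
        if np.abs(new - pi).sum() < tol:
            return new
        pi = new
    return pi

# Pages 0 and 1 both link to page 2; page 2 links back to page 0.
ranks = pagerank([[0, 0, 1],
                  [0, 0, 1],
                  [1, 0, 0]])
```

Because every entry of the chain is positive, the iteration converges to the unique stationary distribution regardless of the starting vector; in this toy graph page 2 ranks highest and page 1, which has no in-links, ranks lowest.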
Problem:
 
 
PageRank as a Function of the Damping Factor. Paolo Boldi, Massimo Santini, Sebastiano Vigna (DSI, Università degli Studi di Milano). WWW 2005 paper. Sections shown: 3 General Behaviour; 3.1 Choosing the damping factor; 3.2 Getting close to 1.
is the limit distribution of  P  when the starting distribution is uniform, that is, Conjecture 1   :
Research results by our group:
Weak points of PageRank
 
Letting Web Users Vote for Page Importance. 06/09/09 Yuting Liu@SIGIR'08
 
Browsing Process
 
 
 
BrowseRank: User browsing graph
Vertex: web page
Edge: transition
Edge weight w_ij: the number of transitions from page i to page j
Staying time T_i: the time spent on page i
Reset probability: normalized frequency with which a page is the first page of a session
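The quantities listed for the user browsing graph can be collected from session logs roughly as follows. This is an illustrative sketch only: the session format (a list of `(page, seconds_on_page)` tuples per session) and the function name `build_browsing_graph` are assumptions, not the data format used by the authors.

```python
from collections import defaultdict

def build_browsing_graph(sessions):
    """Aggregate browsing sessions into the user browsing graph.

    sessions: list of sessions, each a list of (page, seconds_on_page)
    tuples in visiting order (hypothetical log format).
    Returns (transitions, staying, reset):
      transitions[(i, j)] = number of observed transitions i -> j (w_ij)
      staying[i]          = list of observed staying times on page i (T_i)
      reset[i]            = fraction of sessions that start at page i
    """
    transitions = defaultdict(int)
    staying = defaultdict(list)
    first_counts = defaultdict(int)
    for session in sessions:
        if not session:
            continue
        first_counts[session[0][0]] += 1
        for page, seconds in session:
            staying[page].append(seconds)
        # Consecutive pages in a session form one transition each.
        for (src, _), (dst, _) in zip(session, session[1:]):
            transitions[(src, dst)] += 1
    total = sum(first_counts.values())
    reset = {page: count / total for page, count in first_counts.items()}
    return dict(transitions), dict(staying), reset

sessions = [
    [("a", 10), ("b", 5)],
    [("a", 20), ("c", 1), ("b", 2)],
    [("b", 3)],
]
transitions, staying, reset = build_browsing_graph(sessions)
```

Here page "a" opens two of the three sessions, so its reset probability is 2/3, and each of the transitions a→b, a→c, and c→b is observed once.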
Mathematical Deduction: maximum likelihood estimation of staying time
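As a rough illustration of how a staying-time estimate can enter a page score, the sketch below assumes staying times are exponentially distributed, as in a continuous-time Markov model; that modelling assumption, the noise-free estimator, and all function names are assumptions for this example and are not taken from the slides. Under that assumption the maximum likelihood estimate of the rate is the reciprocal of the sample mean, and the stationary distribution of the continuous-time process is the jump-chain stationary distribution reweighted by mean staying time.

```python
import numpy as np

def exponential_rate_mle(times):
    """MLE of the rate of an exponential distribution: 1 / sample mean."""
    times = np.asarray(times, dtype=float)
    return 1.0 / times.mean()

def jump_chain_stationary(P, tol=1e-12, max_iter=10000):
    """Stationary distribution of a transition matrix by power iteration.

    Assumes P is row-stochastic, irreducible, and aperiodic.
    """
    P = np.asarray(P, dtype=float)
    pi = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(max_iter):
        new = pi @ P
        if np.abs(new - pi).sum() < tol:
            return new
        pi = new
    return pi

def time_weighted_scores(P, staying_times):
    """Weight each page's jump-chain probability by its mean staying time.

    Dividing by the estimated rate equals multiplying by the sample mean
    of the staying times; the result is normalized to sum to one.
    """
    rates = np.array([exponential_rate_mle(t) for t in staying_times])
    weighted = jump_chain_stationary(P) / rates
    return weighted / weighted.sum()

# Two pages visited equally often; users stay twice as long on page 1.
scores = time_weighted_scores([[0.5, 0.5], [0.5, 0.5]],
                              [[1.0, 3.0], [4.0, 4.0]])
```

With equal visit frequencies, the page with the longer mean staying time receives the higher score (1/3 versus 2/3 in this toy case). The sketch ignores measurement noise in the observed staying times.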
Mathematical Deduction: assume the noise follows a chi-square distribution with k degrees of freedom.
Mathematical Deduction: ideally we would have: … However, due to data sparseness, we encounter challenges.
Mathematical Deduction: to tackle this challenge, we turn it into an optimization problem:
 
 
Experiments
Website-level: Find good
Website-level: Fight spam
 
BrowseRank: Letting Web Users Vote for Page Importance. Yuting Liu, Bin Gao, Tie-Yan Liu, Ying Zhang, Zhiming Ma, Shuyuan He, and Hang Li. July 23, 2008, Singapore. The 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Best student paper!
Further Studies
Dynamic Rank
Outline
Learning to Rank (diagram). Components: Model, Learning System, Ranking System; objective: min Loss. Credit: Wei-Ying Ma, Microsoft Research Asia.
Learning to rank in IR is a two-layer statistical learning problem
Document level vs. query level
Microsoft Scholar Fellowship
The two-layer structure of the training data is not artificial; it arises from the real world, especially from learning to rank in information retrieval.
Two-Layer Statistical Learning Framework. …: instances; …: descriptions of instances. Instances are the objects we are concerned with.
a score (or label) of a document
an order on a pair of documents
a permutation (list) of documents
Training Process: i.i.d. For each i, the associated samples …, distribution …; the training data is denoted as …
Empirical object-level loss; loss function on …; expected object-level loss
Expected risk
Generalization Analysis based on Stability Theory
Definition: We say an algorithm possesses object-level uniform leave-one-out stability, abbreviated as object-level stability, if: … (formula annotations: function learned from training data; function learned from training data)
Generalization based on Object-level Stability. Bound annotations: object-level stability; the number of training objects; with probability at least …
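For orientation, stability-based generalization bounds typically take the following shape. This is a sketch in the standard uniform-stability form, offered as an assumption about the general structure of such a bound and not as a transcription of the slide's formula, which was lost. The symbols are introduced here for illustration: r is the number of training objects, τ(r) the stability coefficient, B an upper bound on the loss, R(f) the expected risk, and R̂(f) the empirical risk.

```latex
% Leave-one-out stability: removing the j-th training object changes the
% loss of the learned function by at most \tau(r), for every test point z.
\[
  \bigl|\, \ell(f_S; z) - \ell(f_{S^{\setminus j}}; z) \,\bigr| \le \tau(r)
  \qquad \text{for all } j \text{ and all } z .
\]
% Resulting bound, holding with probability at least 1 - \delta:
\[
  R(f_S) \;\le\; \hat{R}(f_S) + 2\,\tau(r)
  + \bigl(4\,r\,\tau(r) + B\bigr)\sqrt{\frac{\ln(1/\delta)}{2r}} .
\]
```

In a bound of this form the last term vanishes as r grows only if τ(r) decays faster than 1/√r, which is the kind of condition under which such a bound is informative.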
Note: if …, then the bound makes sense. This condition can be satisfied in many practical cases. As case studies, we investigate Ranking SVM and RankBoost. We show that after introducing query-level normalization to its objective function, Ranking SVM will have query-level stability. For RankBoost, query-level stability can be achieved if we introduce both query-level normalization and regularization to its objective function. These analyses agree largely with our experiments and the experiments in Y. Cao, J. Xu, T.-Y. Liu, H. Li, Y. Huang, and H.-W. Hon, 2006 [5] and [11].
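The effect of query-level normalization can be seen by comparing two ways of averaging a pairwise hinge loss. The sketch below is an illustration under stated assumptions, not the objective from the talk: the function names, the margin of 1, and the toy data are hypothetical, and the regularization term of Ranking SVM is omitted.

```python
def pairwise_hinge_losses(scores, labels):
    """Hinge loss for every document pair (a, b) with labels[a] > labels[b]."""
    losses = []
    for a in range(len(labels)):
        for b in range(len(labels)):
            if labels[a] > labels[b]:
                losses.append(max(0.0, 1.0 - (scores[a] - scores[b])))
    return losses

def pooled_pair_loss(queries):
    """Document-pair level: average over all pairs pooled across queries."""
    pooled = [loss for scores, labels in queries
              for loss in pairwise_hinge_losses(scores, labels)]
    return sum(pooled) / len(pooled)

def query_normalized_loss(queries):
    """Query level: average the pair losses within each query first,
    then average across queries, so each query has equal weight."""
    per_query = []
    for scores, labels in queries:
        losses = pairwise_hinge_losses(scores, labels)
        if losses:
            per_query.append(sum(losses) / len(losses))
    return sum(per_query) / len(per_query)

# Each query is (model scores, relevance labels) for its documents.
queries = [
    ([2.0, 0.0], [1, 0]),             # one pair, ranked correctly
    ([0.0, 0.0, 0.0], [1, 0, 0]),     # two pairs, both tied
]
pooled = pooled_pair_loss(queries)
normalized = query_normalized_loss(queries)
```

The pooled loss is dominated by queries with many document pairs, while the normalized loss weights every query equally; in the toy data the two averages differ (2/3 versus 1/2).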
Query-level empirical risk. Generalization bound: …
Generalization Bounds Comparison. Generalization bound: … Generalization bound (modified RSVM): …
RankBoost with Query-level Normalization and Regularization. Query-level normalization alone cannot make RankBoost have query-level stability.
Experimental Results (I)
Experimental Results (II)
Future Problems and Challenges
Outline
Thank you!

Weitere ähnliche Inhalte

Was ist angesagt?

Volume 2-issue-6-2016-2020
Volume 2-issue-6-2016-2020Volume 2-issue-6-2016-2020
Volume 2-issue-6-2016-2020Editor IJARCET
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networksDing Li
 
Computing semantic similarity measure between words using web search engine
Computing semantic similarity measure between words using web search engineComputing semantic similarity measure between words using web search engine
Computing semantic similarity measure between words using web search enginecsandit
 
AN EFFECTIVE RANKING METHOD OF WEBPAGE THROUGH TFIDF AND HYPERLINK CLASSIFIED...
AN EFFECTIVE RANKING METHOD OF WEBPAGE THROUGH TFIDF AND HYPERLINK CLASSIFIED...AN EFFECTIVE RANKING METHOD OF WEBPAGE THROUGH TFIDF AND HYPERLINK CLASSIFIED...
AN EFFECTIVE RANKING METHOD OF WEBPAGE THROUGH TFIDF AND HYPERLINK CLASSIFIED...IJDKP
 
Done reread deeperinsidepagerank
Done reread deeperinsidepagerankDone reread deeperinsidepagerank
Done reread deeperinsidepagerankJames Arnold
 
An Evolution of Deep Learning Models for AI2 Reasoning Challenge
An Evolution of Deep Learning Models for AI2 Reasoning ChallengeAn Evolution of Deep Learning Models for AI2 Reasoning Challenge
An Evolution of Deep Learning Models for AI2 Reasoning ChallengeTraian Rebedea
 
An Efficient Modified Common Neighbor Approach for Link Prediction in Social ...
An Efficient Modified Common Neighbor Approach for Link Prediction in Social ...An Efficient Modified Common Neighbor Approach for Link Prediction in Social ...
An Efficient Modified Common Neighbor Approach for Link Prediction in Social ...IOSR Journals
 
Text Analytics Presentation
Text Analytics PresentationText Analytics Presentation
Text Analytics PresentationSkylar Ritchie
 
Using Information Scent to Model Users in Web1.0 and Web2.0
Using Information Scent to Model Users in Web1.0 and Web2.0Using Information Scent to Model Users in Web1.0 and Web2.0
Using Information Scent to Model Users in Web1.0 and Web2.0Ed Chi
 
Entity linking with a knowledge base issues techniques and solutions
Entity linking with a knowledge base issues techniques and solutionsEntity linking with a knowledge base issues techniques and solutions
Entity linking with a knowledge base issues techniques and solutionsCloudTechnologies
 
Tutorial on Question Answering Systems
Tutorial on Question Answering Systems Tutorial on Question Answering Systems
Tutorial on Question Answering Systems Saeedeh Shekarpour
 
Approaches to automated metadata extraction : FixRep Project
Approaches to automated metadata extraction : FixRep ProjectApproaches to automated metadata extraction : FixRep Project
Approaches to automated metadata extraction : FixRep ProjectUKOLN (dev), University of Bath
 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)theijes
 
Ijarcet vol-2-issue-7-2252-2257
Ijarcet vol-2-issue-7-2252-2257Ijarcet vol-2-issue-7-2252-2257
Ijarcet vol-2-issue-7-2252-2257Editor IJARCET
 
Entity linking with a knowledge base issues techniques and solutions
Entity linking with a knowledge base issues techniques and solutionsEntity linking with a knowledge base issues techniques and solutions
Entity linking with a knowledge base issues techniques and solutionsPvrtechnologies Nellore
 
Optimizing Search User Interfaces and Interactions within Professional Social...
Optimizing Search User Interfaces and Interactions within Professional Social...Optimizing Search User Interfaces and Interactions within Professional Social...
Optimizing Search User Interfaces and Interactions within Professional Social...Nik Spirin
 

Was ist angesagt? (17)

Volume 2-issue-6-2016-2020
Volume 2-issue-6-2016-2020Volume 2-issue-6-2016-2020
Volume 2-issue-6-2016-2020
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networks
 
Computing semantic similarity measure between words using web search engine
Computing semantic similarity measure between words using web search engineComputing semantic similarity measure between words using web search engine
Computing semantic similarity measure between words using web search engine
 
AN EFFECTIVE RANKING METHOD OF WEBPAGE THROUGH TFIDF AND HYPERLINK CLASSIFIED...
AN EFFECTIVE RANKING METHOD OF WEBPAGE THROUGH TFIDF AND HYPERLINK CLASSIFIED...AN EFFECTIVE RANKING METHOD OF WEBPAGE THROUGH TFIDF AND HYPERLINK CLASSIFIED...
AN EFFECTIVE RANKING METHOD OF WEBPAGE THROUGH TFIDF AND HYPERLINK CLASSIFIED...
 
Done reread deeperinsidepagerank
Done reread deeperinsidepagerankDone reread deeperinsidepagerank
Done reread deeperinsidepagerank
 
Sub1579
Sub1579Sub1579
Sub1579
 
An Evolution of Deep Learning Models for AI2 Reasoning Challenge
An Evolution of Deep Learning Models for AI2 Reasoning ChallengeAn Evolution of Deep Learning Models for AI2 Reasoning Challenge
An Evolution of Deep Learning Models for AI2 Reasoning Challenge
 
An Efficient Modified Common Neighbor Approach for Link Prediction in Social ...
An Efficient Modified Common Neighbor Approach for Link Prediction in Social ...An Efficient Modified Common Neighbor Approach for Link Prediction in Social ...
An Efficient Modified Common Neighbor Approach for Link Prediction in Social ...
 
Text Analytics Presentation
Text Analytics PresentationText Analytics Presentation
Text Analytics Presentation
 
Using Information Scent to Model Users in Web1.0 and Web2.0
Using Information Scent to Model Users in Web1.0 and Web2.0Using Information Scent to Model Users in Web1.0 and Web2.0
Using Information Scent to Model Users in Web1.0 and Web2.0
 
Entity linking with a knowledge base issues techniques and solutions
Entity linking with a knowledge base issues techniques and solutionsEntity linking with a knowledge base issues techniques and solutions
Entity linking with a knowledge base issues techniques and solutions
 
Tutorial on Question Answering Systems
Tutorial on Question Answering Systems Tutorial on Question Answering Systems
Tutorial on Question Answering Systems
 
Approaches to automated metadata extraction : FixRep Project
Approaches to automated metadata extraction : FixRep ProjectApproaches to automated metadata extraction : FixRep Project
Approaches to automated metadata extraction : FixRep Project
 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)
 
Ijarcet vol-2-issue-7-2252-2257
Ijarcet vol-2-issue-7-2252-2257Ijarcet vol-2-issue-7-2252-2257
Ijarcet vol-2-issue-7-2252-2257
 
Entity linking with a knowledge base issues techniques and solutions
Entity linking with a knowledge base issues techniques and solutionsEntity linking with a knowledge base issues techniques and solutions
Entity linking with a knowledge base issues techniques and solutions
 
Optimizing Search User Interfaces and Interactions within Professional Social...
Optimizing Search User Interfaces and Interactions within Professional Social...Optimizing Search User Interfaces and Interactions within Professional Social...
Optimizing Search User Interfaces and Interactions within Professional Social...
 

Andere mochten auch

我的数学之路(丘成桐)
我的数学之路(丘成桐)我的数学之路(丘成桐)
我的数学之路(丘成桐)Xu jiakon
 
Time Management
Time ManagementTime Management
Time ManagementXu jiakon
 
李克正:精英教育的迫切性与中国教育危机
李克正:精英教育的迫切性与中国教育危机李克正:精英教育的迫切性与中国教育危机
李克正:精英教育的迫切性与中国教育危机Xu jiakon
 
数学竞赛与初等数学研究在中国
数学竞赛与初等数学研究在中国数学竞赛与初等数学研究在中国
数学竞赛与初等数学研究在中国Xu jiakon
 
杨师群老师的古代汉语一的课件
杨师群老师的古代汉语一的课件杨师群老师的古代汉语一的课件
杨师群老师的古代汉语一的课件Xu jiakon
 
高中数学知识
高中数学知识高中数学知识
高中数学知识Xu jiakon
 
33 《学会提问-掌握批判性思维》
33 《学会提问-掌握批判性思维》33 《学会提问-掌握批判性思维》
33 《学会提问-掌握批判性思维》Xu jiakon
 
我的数学之路(丘成桐)
我的数学之路(丘成桐)我的数学之路(丘成桐)
我的数学之路(丘成桐)Xu jiakon
 

Andere mochten auch (8)

我的数学之路(丘成桐)
我的数学之路(丘成桐)我的数学之路(丘成桐)
我的数学之路(丘成桐)
 
Time Management
Time ManagementTime Management
Time Management
 
李克正:精英教育的迫切性与中国教育危机
李克正:精英教育的迫切性与中国教育危机李克正:精英教育的迫切性与中国教育危机
李克正:精英教育的迫切性与中国教育危机
 
数学竞赛与初等数学研究在中国
数学竞赛与初等数学研究在中国数学竞赛与初等数学研究在中国
数学竞赛与初等数学研究在中国
 
杨师群老师的古代汉语一的课件
杨师群老师的古代汉语一的课件杨师群老师的古代汉语一的课件
杨师群老师的古代汉语一的课件
 
高中数学知识
高中数学知识高中数学知识
高中数学知识
 
33 《学会提问-掌握批判性思维》
33 《学会提问-掌握批判性思维》33 《学会提问-掌握批判性思维》
33 《学会提问-掌握批判性思维》
 
我的数学之路(丘成桐)
我的数学之路(丘成桐)我的数学之路(丘成桐)
我的数学之路(丘成桐)
 

Ähnlich wie Internet 信息检索中的数学

Web Page Ranking using Machine Learning
Web Page Ranking using Machine LearningWeb Page Ranking using Machine Learning
Web Page Ranking using Machine LearningPradip Rahul
 
IRJET-Multi -Stage Smart Deep Web Crawling Systems: A Review
IRJET-Multi -Stage Smart Deep Web Crawling Systems: A ReviewIRJET-Multi -Stage Smart Deep Web Crawling Systems: A Review
IRJET-Multi -Stage Smart Deep Web Crawling Systems: A ReviewIRJET Journal
 
Link analysis for web search
Link analysis for web searchLink analysis for web search
Link analysis for web searchEmrullah Delibas
 
CONTENT AND USER CLICK BASED PAGE RANKING FOR IMPROVED WEB INFORMATION RETRIEVAL
CONTENT AND USER CLICK BASED PAGE RANKING FOR IMPROVED WEB INFORMATION RETRIEVALCONTENT AND USER CLICK BASED PAGE RANKING FOR IMPROVED WEB INFORMATION RETRIEVAL
CONTENT AND USER CLICK BASED PAGE RANKING FOR IMPROVED WEB INFORMATION RETRIEVALijcsa
 
PageRank algorithm and its variations: A Survey report
PageRank algorithm and its variations: A Survey reportPageRank algorithm and its variations: A Survey report
PageRank algorithm and its variations: A Survey reportIOSR Journals
 
IDENTIFYING IMPORTANT FEATURES OF USERS TO IMPROVE PAGE RANKING ALGORITHMS
IDENTIFYING IMPORTANT FEATURES OF USERS TO IMPROVE PAGE RANKING ALGORITHMSIDENTIFYING IMPORTANT FEATURES OF USERS TO IMPROVE PAGE RANKING ALGORITHMS
IDENTIFYING IMPORTANT FEATURES OF USERS TO IMPROVE PAGE RANKING ALGORITHMSZac Darcy
 
IDENTIFYING IMPORTANT FEATURES OF USERS TO IMPROVE PAGE RANKING ALGORITHMS
IDENTIFYING IMPORTANT FEATURES OF USERS TO IMPROVE PAGE RANKING ALGORITHMSIDENTIFYING IMPORTANT FEATURES OF USERS TO IMPROVE PAGE RANKING ALGORITHMS
IDENTIFYING IMPORTANT FEATURES OF USERS TO IMPROVE PAGE RANKING ALGORITHMSIJwest
 
Identifying Important Features of Users to Improve Page Ranking Algorithms
Identifying Important Features of Users to Improve Page Ranking Algorithms Identifying Important Features of Users to Improve Page Ranking Algorithms
Identifying Important Features of Users to Improve Page Ranking Algorithms dannyijwest
 
LyonALMProposal20041018.doc
LyonALMProposal20041018.docLyonALMProposal20041018.doc
LyonALMProposal20041018.docbutest
 
LyonALMProposal20041018.doc
LyonALMProposal20041018.docLyonALMProposal20041018.doc
LyonALMProposal20041018.docbutest
 
Implemenation of Enhancing Information Retrieval Using Integration of Invisib...
Implemenation of Enhancing Information Retrieval Using Integration of Invisib...Implemenation of Enhancing Information Retrieval Using Integration of Invisib...
Implemenation of Enhancing Information Retrieval Using Integration of Invisib...iosrjce
 
A Trinity Construction for Web Extraction Using Efficient Algorithm
A Trinity Construction for Web Extraction Using Efficient AlgorithmA Trinity Construction for Web Extraction Using Efficient Algorithm
A Trinity Construction for Web Extraction Using Efficient AlgorithmIOSR Journals
 
HABIB FIGA GUYE {BULE HORA UNIVERSITY}(habibifiga@gmail.com
HABIB FIGA GUYE {BULE HORA UNIVERSITY}(habibifiga@gmail.comHABIB FIGA GUYE {BULE HORA UNIVERSITY}(habibifiga@gmail.com
HABIB FIGA GUYE {BULE HORA UNIVERSITY}(habibifiga@gmail.comHABIB FIGA GUYE
 
Enhanced Web Usage Mining Using Fuzzy Clustering and Collaborative Filtering ...
Enhanced Web Usage Mining Using Fuzzy Clustering and Collaborative Filtering ...Enhanced Web Usage Mining Using Fuzzy Clustering and Collaborative Filtering ...
Enhanced Web Usage Mining Using Fuzzy Clustering and Collaborative Filtering ...inventionjournals
 
Enhanced Retrieval of Web Pages using Improved Page Rank Algorithm
Enhanced Retrieval of Web Pages using Improved Page Rank AlgorithmEnhanced Retrieval of Web Pages using Improved Page Rank Algorithm
Enhanced Retrieval of Web Pages using Improved Page Rank Algorithmijnlc
 
Enhanced Retrieval of Web Pages using Improved Page Rank Algorithm
Enhanced Retrieval of Web Pages using Improved Page Rank AlgorithmEnhanced Retrieval of Web Pages using Improved Page Rank Algorithm
Enhanced Retrieval of Web Pages using Improved Page Rank Algorithmkevig
 

Ähnlich wie Internet 信息检索中的数学 (20)

Macran
MacranMacran
Macran
 
K1803057782
K1803057782K1803057782
K1803057782
 
Web Page Ranking using Machine Learning
Web Page Ranking using Machine LearningWeb Page Ranking using Machine Learning
Web Page Ranking using Machine Learning
 
IRJET-Multi -Stage Smart Deep Web Crawling Systems: A Review
IRJET-Multi -Stage Smart Deep Web Crawling Systems: A ReviewIRJET-Multi -Stage Smart Deep Web Crawling Systems: A Review
IRJET-Multi -Stage Smart Deep Web Crawling Systems: A Review
 
Link analysis for web search
Link analysis for web searchLink analysis for web search
Link analysis for web search
 
CONTENT AND USER CLICK BASED PAGE RANKING FOR IMPROVED WEB INFORMATION RETRIEVAL
CONTENT AND USER CLICK BASED PAGE RANKING FOR IMPROVED WEB INFORMATION RETRIEVALCONTENT AND USER CLICK BASED PAGE RANKING FOR IMPROVED WEB INFORMATION RETRIEVAL
CONTENT AND USER CLICK BASED PAGE RANKING FOR IMPROVED WEB INFORMATION RETRIEVAL
 
PageRank algorithm and its variations: A Survey report
PageRank algorithm and its variations: A Survey reportPageRank algorithm and its variations: A Survey report
PageRank algorithm and its variations: A Survey report
 
IDENTIFYING IMPORTANT FEATURES OF USERS TO IMPROVE PAGE RANKING ALGORITHMS
IDENTIFYING IMPORTANT FEATURES OF USERS TO IMPROVE PAGE RANKING ALGORITHMSIDENTIFYING IMPORTANT FEATURES OF USERS TO IMPROVE PAGE RANKING ALGORITHMS
IDENTIFYING IMPORTANT FEATURES OF USERS TO IMPROVE PAGE RANKING ALGORITHMS
 
IDENTIFYING IMPORTANT FEATURES OF USERS TO IMPROVE PAGE RANKING ALGORITHMS
IDENTIFYING IMPORTANT FEATURES OF USERS TO IMPROVE PAGE RANKING ALGORITHMSIDENTIFYING IMPORTANT FEATURES OF USERS TO IMPROVE PAGE RANKING ALGORITHMS
IDENTIFYING IMPORTANT FEATURES OF USERS TO IMPROVE PAGE RANKING ALGORITHMS
 
Identifying Important Features of Users to Improve Page Ranking Algorithms
Identifying Important Features of Users to Improve Page Ranking Algorithms Identifying Important Features of Users to Improve Page Ranking Algorithms
Identifying Important Features of Users to Improve Page Ranking Algorithms
 
LyonALMProposal20041018.doc
LyonALMProposal20041018.docLyonALMProposal20041018.doc
LyonALMProposal20041018.doc
 
LyonALMProposal20041018.doc
LyonALMProposal20041018.docLyonALMProposal20041018.doc
LyonALMProposal20041018.doc
 
Implemenation of Enhancing Information Retrieval Using Integration of Invisib...
Implemenation of Enhancing Information Retrieval Using Integration of Invisib...Implemenation of Enhancing Information Retrieval Using Integration of Invisib...
Implemenation of Enhancing Information Retrieval Using Integration of Invisib...
 
A017250106
A017250106A017250106
A017250106
 
H017124652
H017124652H017124652
H017124652
 
A Trinity Construction for Web Extraction Using Efficient Algorithm
A Trinity Construction for Web Extraction Using Efficient AlgorithmA Trinity Construction for Web Extraction Using Efficient Algorithm
A Trinity Construction for Web Extraction Using Efficient Algorithm
 
HABIB FIGA GUYE {BULE HORA UNIVERSITY}(habibifiga@gmail.com
HABIB FIGA GUYE {BULE HORA UNIVERSITY}(habibifiga@gmail.comHABIB FIGA GUYE {BULE HORA UNIVERSITY}(habibifiga@gmail.com
HABIB FIGA GUYE {BULE HORA UNIVERSITY}(habibifiga@gmail.com
 
Enhanced Web Usage Mining Using Fuzzy Clustering and Collaborative Filtering ...
Enhanced Web Usage Mining Using Fuzzy Clustering and Collaborative Filtering ...Enhanced Web Usage Mining Using Fuzzy Clustering and Collaborative Filtering ...
Enhanced Web Usage Mining Using Fuzzy Clustering and Collaborative Filtering ...
 
Enhanced Retrieval of Web Pages using Improved Page Rank Algorithm
Enhanced Retrieval of Web Pages using Improved Page Rank AlgorithmEnhanced Retrieval of Web Pages using Improved Page Rank Algorithm
Enhanced Retrieval of Web Pages using Improved Page Rank Algorithm
 
Enhanced Retrieval of Web Pages using Improved Page Rank Algorithm
Enhanced Retrieval of Web Pages using Improved Page Rank AlgorithmEnhanced Retrieval of Web Pages using Improved Page Rank Algorithm
Enhanced Retrieval of Web Pages using Improved Page Rank Algorithm
 

Mehr von Xu jiakon

动态金融风险度量,稳型中心极限定理和G-Brown运动
动态金融风险度量,稳型中心极限定理和G-Brown运动动态金融风险度量,稳型中心极限定理和G-Brown运动
动态金融风险度量,稳型中心极限定理和G-Brown运动Xu jiakon
 
Professional Development Of Chinese Mathematics Teachers Research A Review Of...
Professional Development Of Chinese Mathematics Teachers Research A Review Of...Professional Development Of Chinese Mathematics Teachers Research A Review Of...
Professional Development Of Chinese Mathematics Teachers Research A Review Of...Xu jiakon
 
艺术院线纳新宣传
艺术院线纳新宣传艺术院线纳新宣传
艺术院线纳新宣传Xu jiakon
 
信息采集与编辑培训(090421)
信息采集与编辑培训(090421)信息采集与编辑培训(090421)
信息采集与编辑培训(090421)Xu jiakon
 
On Mathematics Learning Perspective Among Teachers And Students In China
On Mathematics Learning  Perspective Among Teachers And Students In ChinaOn Mathematics Learning  Perspective Among Teachers And Students In China
On Mathematics Learning Perspective Among Teachers And Students In ChinaXu jiakon
 
The Teaching Of Mathematics At Senior High School In France
The Teaching Of Mathematics At Senior High School In FranceThe Teaching Of Mathematics At Senior High School In France
The Teaching Of Mathematics At Senior High School In FranceXu jiakon
 
An Experimental Study On Students Higher Level Mathematics Cognition
An Experimental Study On Students Higher Level Mathematics CognitionAn Experimental Study On Students Higher Level Mathematics Cognition
An Experimental Study On Students Higher Level Mathematics CognitionXu jiakon
 
☆中高考综合报告[英]
☆中高考综合报告[英]☆中高考综合报告[英]
☆中高考综合报告[英]Xu jiakon
 
福建省数学高考改革与高中数学教育的过去与现在 厦门讲稿
福建省数学高考改革与高中数学教育的过去与现在 厦门讲稿福建省数学高考改革与高中数学教育的过去与现在 厦门讲稿
福建省数学高考改革与高中数学教育的过去与现在 厦门讲稿Xu jiakon
 
数学教育文化与数学英才教育
数学教育文化与数学英才教育数学教育文化与数学英才教育
数学教育文化与数学英才教育Xu jiakon
 
2009年中国数学会学术年会与会者通讯录
2009年中国数学会学术年会与会者通讯录2009年中国数学会学术年会与会者通讯录
2009年中国数学会学术年会与会者通讯录Xu jiakon
 
报告摘要
报告摘要报告摘要
报告摘要Xu jiakon
 
吉林大学讲座信息网前卫南校区宣讲课件
吉林大学讲座信息网前卫南校区宣讲课件吉林大学讲座信息网前卫南校区宣讲课件
吉林大学讲座信息网前卫南校区宣讲课件Xu jiakon
 
王安石诗
王安石诗王安石诗
王安石诗Xu jiakon
 
宋诗研究
宋诗研究宋诗研究
宋诗研究Xu jiakon
 
欧阳修——宋诗的第一位
欧阳修——宋诗的第一位欧阳修——宋诗的第一位
欧阳修——宋诗的第一位Xu jiakon
 

Mehr von Xu jiakon (20)

动态金融风险度量,稳型中心极限定理和G-Brown运动
动态金融风险度量,稳型中心极限定理和G-Brown运动动态金融风险度量,稳型中心极限定理和G-Brown运动
动态金融风险度量,稳型中心极限定理和G-Brown运动
 
Professional Development Of Chinese Mathematics Teachers Research A Review Of...
Professional Development Of Chinese Mathematics Teachers Research A Review Of...Professional Development Of Chinese Mathematics Teachers Research A Review Of...
Professional Development Of Chinese Mathematics Teachers Research A Review Of...
 
艺术院线纳新宣传
艺术院线纳新宣传艺术院线纳新宣传
艺术院线纳新宣传
 
信息采集与编辑培训(090421)
信息采集与编辑培训(090421)信息采集与编辑培训(090421)
信息采集与编辑培训(090421)
 
On Mathematics Learning Perspective Among Teachers And Students In China
On Mathematics Learning  Perspective Among Teachers And Students In ChinaOn Mathematics Learning  Perspective Among Teachers And Students In China
On Mathematics Learning Perspective Among Teachers And Students In China
 
The Teaching Of Mathematics At Senior High School In France
The Teaching Of Mathematics At Senior High School In FranceThe Teaching Of Mathematics At Senior High School In France
The Teaching Of Mathematics At Senior High School In France
 
An Experimental Study On Students Higher Level Mathematics Cognition
An Experimental Study On Students Higher Level Mathematics CognitionAn Experimental Study On Students Higher Level Mathematics Cognition
An Experimental Study On Students Higher Level Mathematics Cognition
 
☆中高考综合报告[英]
☆中高考综合报告[英]☆中高考综合报告[英]
☆中高考综合报告[英]
 
福建省数学高考改革与高中数学教育的过去与现在 厦门讲稿
福建省数学高考改革与高中数学教育的过去与现在 厦门讲稿福建省数学高考改革与高中数学教育的过去与现在 厦门讲稿
福建省数学高考改革与高中数学教育的过去与现在 厦门讲稿
 
数学教育文化与数学英才教育
数学教育文化与数学英才教育数学教育文化与数学英才教育
数学教育文化与数学英才教育
 
2009年中国数学会学术年会与会者通讯录
2009年中国数学会学术年会与会者通讯录2009年中国数学会学术年会与会者通讯录
2009年中国数学会学术年会与会者通讯录
 
报告摘要
报告摘要报告摘要
报告摘要
 
吉林大学讲座信息网前卫南校区宣讲课件
吉林大学讲座信息网前卫南校区宣讲课件吉林大学讲座信息网前卫南校区宣讲课件
吉林大学讲座信息网前卫南校区宣讲课件
 
王安石诗
王安石诗王安石诗
王安石诗
 
苏轼
苏轼苏轼
苏轼
 
王安石
王安石王安石
王安石
 
宋诗研究
宋诗研究宋诗研究
宋诗研究
 
欧阳修——宋诗的第一位
欧阳修——宋诗的第一位欧阳修——宋诗的第一位
欧阳修——宋诗的第一位
 
苏作品
苏作品苏作品
苏作品
 
欧阳修
欧阳修欧阳修
欧阳修
 

Kürzlich hochgeladen

2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘


Mathematics in Internet Information Retrieval

  • 1. Mathematics in Internet Information Retrieval. Zhi-Ming Ma, April 24, 2009, Xiamen. Email: mazm@amt.ac.cn http://www.amt.ac.cn/member/mazhiming/index.html
  • 3. How can Google rank 2,040,000 pages in 0.11 seconds?
  • 4. A main task of Internet (Web) Information Retrieval = design and analysis of search engine (SE) algorithms, which involves plenty of mathematics
  • 5. The Internet is a large-scale complex random network. The Earth is developing an electronic nervous system, a network with diverse nodes and links are
  • 6. Search engine workflow (diagram labels): Web; Links & Anchors; Pages; Link Map; Query; Online part; Offline part; Link Analysis; Cache; Page parser; Inverted index; Page & Site; Database; Web graph; Web crawler; User interface; Cached pages; Index builder; Page Ranks; Web graph builder; Indexing and Ranking
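  • 7. Static Rank
  • 8. Dynamic Rank
  • 9. Research on Complex Networks and Information Retrieval
  • 12. Outlines
  • 19. HITS; PageRank; 1998; Jon Kleinberg, Cornell University
  • 20. Nevanlinna Prize (2006): Jon Kleinberg
  • 21. PageRank, the ranking system used by the Google search engine.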
  • 23. Markov chain describing surfing behavior
  • 24. Markov chain describing surfing behavior
  • 26. where
  • 27. More generally we may consider personalized d .: PageRank is the unique positive eigenvector: By the strong ergodic theorem:
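The formulas on slide 27 did not survive extraction. As a minimal sketch of the idea, assuming the standard damped random-surfer formulation, the following power iteration computes the stationary distribution of the surfing chain with a personalization vector `d`. The function name `pagerank`, the damping value `alpha=0.85`, and the choice to redistribute the mass of dangling pages according to `d` are illustrative assumptions, not details taken from the slides.

```python
import numpy as np

def pagerank(adj, alpha=0.85, d=None, tol=1e-12, max_iter=1000):
    """Power iteration for PageRank (illustrative sketch).

    adj[i][j] > 0 means page i links to page j.
    d is the personalization (teleport) distribution; uniform if None.
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    d = np.full(n, 1.0 / n) if d is None else np.asarray(d, dtype=float)

    out_deg = adj.sum(axis=1)
    dangling = out_deg == 0  # pages without out-links

    # Row-normalize the link matrix; dangling rows stay zero here
    # and are handled explicitly inside the loop.
    P = np.zeros_like(adj)
    P[~dangling] = adj[~dangling] / out_deg[~dangling][:, None]

    pi = np.full(n, 1.0 / n)  # start from the uniform distribution
    for _ in range(max_iter):
        # Follow a link with probability alpha (dangling mass jumps
        # according to d), otherwise teleport according to d.
        new = alpha * (pi @ P + pi[dangling].sum() * d) + (1 - alpha) * d
        if np.abs(new - pi).sum() < tol:
            return new
        pi = new
    return pi
```

Because the damped chain is irreducible and aperiodic whenever `d` is strictly positive, the iteration converges to the same vector from any starting distribution, which is the ergodic-theorem statement the slide alludes to.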
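  • 28. Problem:
  • 31. PageRank as a Function of the Damping Factor. Paolo Boldi, Massimo Santini, Sebastiano Vigna, DSI, Università degli Studi di Milano. WWW 2005 paper. 3 General Behaviour; 3.1 Choosing the damping factor; 3.2 Getting close to 1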
  • 32. is the limit distribution of P when the starting distribution is uniform, that is, Conjecture 1 :
  • 42. BrowseRank: User browsing graph. Vertex: Web page. Edge: Transition. Edge weight w_ij: the number of transitions. Staying time T_i: the time spent on page i. Reset probability: normalized frequencies as first page of session. (06/09/09, Yuting Liu @ SIGIR'08)
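The quantities listed on slide 42 can be collected from browsing logs in one pass. The sketch below assumes a hypothetical log format in which each session is a list of `(page, staying_time_in_seconds)` pairs in visit order; the function name `build_browsing_graph` and this format are illustrative, not taken from the talk.

```python
from collections import Counter, defaultdict

def build_browsing_graph(sessions):
    """Build the user browsing graph from sessions (illustrative sketch).

    sessions: list of sessions, each a list of (page, seconds) pairs.
    Returns (edge_weight, staying, reset):
      edge_weight[(i, j)] = number of observed transitions i -> j  (w_ij)
      staying[i]          = list of observed staying times on i    (samples of T_i)
      reset[i]            = normalized frequency of i as first page of a session
    """
    edge_weight = Counter()
    staying = defaultdict(list)
    first_pages = Counter()

    for session in sessions:
        if not session:
            continue
        first_pages[session[0][0]] += 1
        for page, seconds in session:
            staying[page].append(seconds)
        # Consecutive pages in a session form one transition each.
        for (src, _), (dst, _) in zip(session, session[1:]):
            edge_weight[(src, dst)] += 1

    total = sum(first_pages.values())
    reset = {page: count / total for page, count in first_pages.items()}
    return edge_weight, dict(staying), reset
```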
  • 43. Mathematical Deduction: maximum likelihood estimation of staying time
  • 46. Mathematical Deduction: assume the noise follows a chi-square distribution with k degrees of freedom
  • 47. Mathematical Deduction: ideally we would have: However, due to data sparseness, we encounter challenges...
  • 48. Mathematical Deduction: to tackle this challenge, we turn it into optimization problems:
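The equations on slides 43 to 48 did not survive extraction. As a hedged sketch of how such a deduction can go, assume (the symbols here are illustrative, not recovered from the slides) that the true staying time $T$ on a page is exponentially distributed with rate $\lambda > 0$, and that the observed time is $Z = T + U$, where the noise $U$ is independent of $T$ and follows a chi-square distribution with $k$ degrees of freedom. With $\bar{Z}$ and $S^2$ denoting the sample mean and sample variance of the observations, standard moment facts give:

```latex
\begin{align*}
% Noise-free case: maximum likelihood estimate from samples T_1, ..., T_m
\hat{\lambda}_{\mathrm{MLE}} &= \frac{m}{\sum_{j=1}^{m} T_j} = \frac{1}{\bar{T}} \\
% Noisy case: exponential mean 1/lambda, variance 1/lambda^2;
% chi-square mean k, variance 2k
\mathbb{E}[Z] &= \frac{1}{\lambda} + k, \qquad
\operatorname{Var}[Z] = \frac{1}{\lambda^{2}} + 2k \\
% Ideally, eliminating k between the two moment equations:
\bar{Z} - \frac{1}{\lambda} &= \frac{1}{2}\left(S^{2} - \frac{1}{\lambda^{2}}\right) \\
% With sparse data the equality may fail, so minimize the squared gap instead:
\hat{\lambda} &= \arg\min_{\lambda > 0}
\left[\left(\bar{Z} - \frac{1}{\lambda}\right)
 - \frac{1}{2}\left(S^{2} - \frac{1}{\lambda^{2}}\right)\right]^{2}
\end{align*}
```

Both sides of the third line are estimates of the same unknown $k$, which is why their difference is a natural objective when only a few noisy observations per page are available.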
  • 55. Website-level: Find good
  • 56. Website-level: Fight spam
  • 58. BrowseRank: Letting Web Users Vote for Page Importance. Yuting Liu, Bin Gao, Tie-Yan Liu, Ying Zhang, Zhiming Ma, Shuyuan He, and Hang Li. July 23, 2008, Singapore, the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Best student paper!
  • 63. Learning to Rank (diagram labels: Model; Learning System; Ranking System; min Loss). Source: Wei-Ying Ma, Microsoft Research Asia
  • 70. Training Process i.i.d. For each i, the associated samples , distribution the training data is denoted as
  • 72. empirical object level loss loss function on expected object level loss
  • 76. Definition: We say an algorithm possesses object-level uniform leave-one-out stability (abbreviated as object-level stability) if: (formula annotations: function learned from training data; function learned from training data)
  • 77. Generalization based on Object-level Stability (formula annotations: object-level stability; the number of training objects; with probability at least)
  • 78. Note: if , then the bound makes sense. This condition can be satisfied in many practical cases. As case studies, we investigate Ranking SVM and RankBoost. We show that after introducing query-level normalization to its objective function, Ranking SVM will have query-level stability. For RankBoost, query-level stability can be achieved if we introduce both query-level normalization and regularization to its objective function. These analyses agree largely with our experiments and the experiments in Y. Cao, J. Xu, T.-Y. Liu, H. Li, Y. Huang, and H.-W. Hon, 2006 [5] and [11].
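To make "query-level normalization" on slide 78 concrete, here is a minimal sketch assuming a linear scoring function and a pairwise hinge loss. Each query's pairwise loss is averaged over that query's own document pairs before queries are averaged, so a query with many documents cannot dominate the objective. The function name, the data layout, and the L2 regularization term are illustrative assumptions and not necessarily the exact objective analyzed in the talk.

```python
import numpy as np

def query_normalized_hinge_loss(w, queries, reg=0.0):
    """Pairwise hinge loss with per-query normalization (illustrative sketch).

    w:       weight vector of a linear scoring function, shape (d,)
    queries: list of (X, y) with features X of shape (m, d) and
             relevance labels y of shape (m,)
    reg:     coefficient of the L2 regularization term
    """
    w = np.asarray(w, dtype=float)
    total = 0.0
    for X, y in queries:
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        scores = X @ w
        # All ordered pairs (hi, lo) where document hi is more relevant than lo.
        hi, lo = np.nonzero(y[:, None] > y[None, :])
        if hi.size == 0:
            continue  # a query without preference pairs contributes zero loss
        margins = scores[hi] - scores[lo]
        # Query-level normalization: mean over this query's own pairs.
        total += np.maximum(0.0, 1.0 - margins).mean()
    return total / max(len(queries), 1) + reg * float(w @ w)
```

In the usage below, the second query has three violated pairs but still counts only once in the average:

```python
q1 = ([[2.0, 0.0], [0.0, 0.0]], [1, 0])
q2 = ([[0.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0]], [1, 0, 0, 0])
loss = query_normalized_hinge_loss([1.0, 0.0], [q1, q2])
```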