The document discusses several mathematical models and algorithms used in Internet information retrieval and search engines:
1. Markov chain methods can model a user's web surfing behavior as transitions between page visits.
2. BrowseRank models user browsing as a Markov process and computes page importance from observed user behavior rather than from artificial assumptions.
3. Learning to rank in information retrieval can be framed as a two-layer statistical learning problem in which queries form the first layer and document relevance judgments form the second layer.
4. Stability theory can provide generalization bounds for learning-to-rank algorithms under this two-layer framework. Modifying algorithms such as Ranking SVM and RankBoost so that they have query-level stability improves performance.
Mathematics in Internet Information Retrieval
1. Mathematics in Internet Information Retrieval. Zhi-Ming Ma, April 24, 2009, Xiamen. Email: mazm@amt.ac.cn http://www.amt.ac.cn/member/mazhiming/index.html
3. How can Google rank 2,040,000 pages in 0.11 seconds?
4. A main task of Internet (Web) information retrieval is the design and analysis of search engine (SE) algorithms, which involves plenty of mathematics.
5. The Internet is a large-scale complex random network. The Earth is developing an electronic nervous system, a network with diverse nodes and links are
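6. Search engine workflow (diagram). Labeled components: Web, Links & Anchors, Pages, Link Map, query, online part, offline part, Link Analysis, cache, page parser, inverted index, Page & Site database, web graph, web crawler, user interface, cached pages, index builder, Page Ranks, web graph generator, Indexing and Ranking.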
32. is the limit distribution of P when the starting distribution is uniform, that is, Conjecture 1 :
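The idea of a limit distribution reached from a uniform start can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the 3-page transition matrix `P`, the function name `limit_distribution`, and the stopping rule are all hypothetical and are not taken from the slides. It repeatedly applies a row-stochastic matrix to the uniform distribution until the result stops changing.

```python
import numpy as np

def limit_distribution(P, tol=1e-12, max_iter=10_000):
    """Iterate mu <- mu P from the uniform distribution until it stabilizes."""
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    mu = np.full(n, 1.0 / n)              # uniform starting distribution
    for _ in range(max_iter):
        nxt = mu @ P                      # one step of the Markov chain
        if np.abs(nxt - mu).sum() < tol:  # L1 change small enough: stop
            return nxt
        mu = nxt
    return mu

# Hypothetical 3-page chain; every row sums to 1.
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
pi = limit_distribution(P)
print(pi)
```

For this small chain the iteration settles on a vector that satisfies `pi @ P == pi` up to rounding, which is the defining property of a stationary distribution. Convergence from every start is only guaranteed when the chain is irreducible and aperiodic, as this toy example is.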
42. BrowseRank: User browsing graph (06/09/09, Yuting Liu @ SIGIR'08). Vertex: web page. Edge: transition. Edge weight $w_{ij}$: the number of transitions. Staying time $T_i$: the time spent on page $i$. Reset probability: normalized frequency of a page being the first page of a session.
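The quantities listed on this slide can be gathered from browsing logs in a single pass. The following is a minimal sketch, assuming a hypothetical log format in which each session is a list of `(page, seconds_on_page)` tuples; the function name `build_browsing_graph` and the toy sessions are illustrative and do not come from the BrowseRank paper.

```python
from collections import defaultdict

def build_browsing_graph(sessions):
    """Collect edge weights, mean staying times, and reset probabilities."""
    weights = defaultdict(int)   # w_ij: number of observed transitions i -> j
    stay = defaultdict(list)     # observed staying times T_i for each page i
    first = defaultdict(int)     # how often each page opens a session
    for session in sessions:
        if not session:
            continue
        first[session[0][0]] += 1
        for page, seconds in session:
            stay[page].append(seconds)
        # Consecutive pages in a session form one transition each.
        for (src, _), (dst, _) in zip(session, session[1:]):
            weights[(src, dst)] += 1
    total = sum(first.values())
    reset = {p: c / total for p, c in first.items()}  # normalized frequencies
    mean_stay = {p: sum(t) / len(t) for p, t in stay.items()}
    return dict(weights), mean_stay, reset

# Hypothetical sessions: (page, seconds spent on the page).
sessions = [
    [("a", 10.0), ("b", 30.0), ("c", 5.0)],
    [("a", 20.0), ("b", 10.0)],
    [("b", 40.0), ("a", 30.0)],
]
weights, mean_stay, reset = build_browsing_graph(sessions)
```

Averaging the staying times is a simplification chosen for this sketch; how the observed times are turned into an estimate for the underlying Markov process is a modeling decision that the slide text does not spell out.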
58. BrowseRank: Letting Web Users Vote for Page Importance. Yuting Liu, Bin Gao, Tie-Yan Liu, Ying Zhang, Zhiming Ma, Shuyuan He, and Hang Li. July 23, 2008, Singapore, the 31st Annual International ACM SIGIR Conference on Research & Development on Information Retrieval. Best student paper!
63. Learning to Rank (diagram). Labeled components: Model, Learning System, Ranking System, min Loss. Credit: Wei-Ying Ma, Microsoft Research Asia.
70. Training Process i.i.d. For each i, the associated samples , distribution the training data is denoted as
76. Definition: We say an algorithm possesses object-level uniform leave-one-out stability, abbreviated as object-level stability, if: Function learned from training data; function learned from training data
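As a point of reference, the classical uniform leave-one-out stability condition of Bousquet and Elisseeff, which an object-level definition of this kind presumably specializes, can be written as below. All notation here is assumed for illustration rather than taken from the slide: $S$ is the training set of $n$ objects, $S^{\setminus i}$ is $S$ with the $i$-th object removed, $f_S$ is the function learned from $S$, $\ell$ is the loss, and $\beta$ is the stability coefficient.

```latex
% Assumed notation: S = training set, S^{\setminus i} = S without object i,
% f_S = function learned from S, \ell = loss, \beta = stability coefficient.
\forall S,\ \forall i \in \{1, \dots, n\}:\quad
\sup_{z}\ \bigl|\, \ell(f_S, z) - \ell\bigl(f_{S^{\setminus i}}, z\bigr) \,\bigr| \;\le\; \beta
```

The two learned functions that are compared, $f_S$ and $f_{S^{\setminus i}}$, differ only in whether one training object was available to the algorithm.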
77. Generalization based on object-level stability (terms labeled on the slide: object-level stability; the number of training objects; "with probability at least")
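For orientation, the classical generalization bound for a uniformly stable algorithm (Bousquet and Elisseeff) has the form below. It is given as a template under assumed notation, not as the slide's exact statement: $n$ is the number of training objects, $\beta$ the stability coefficient, $M$ an upper bound on the loss, $R$ the expected risk, and $\hat{R}$ the empirical risk. The inequality holds with probability at least $1 - \delta$.

```latex
% Assumed notation: n training objects, loss bounded in [0, M],
% uniform stability \beta, confidence parameter \delta \in (0, 1).
R(f_S) \;\le\; \hat{R}(f_S) + 2\beta
  + \left(4 n \beta + M\right) \sqrt{\frac{\ln(1/\delta)}{2n}}
```

In this classical form the gap between expected and empirical risk shrinks as $n$ grows whenever $\beta = o(1/\sqrt{n})$, for instance when $\beta = O(1/n)$.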
78. Note: if , then the bound makes sense. This condition can be satisfied in many practical cases. As case studies, we investigate Ranking SVM and RankBoost. We show that Ranking SVM has query-level stability once query-level normalization is introduced into its objective function. For RankBoost, query-level stability can be achieved if both query-level normalization and regularization are introduced into its objective function. These analyses largely agree with our experiments and with the experiments in Y. Cao, J. Xu, T.-Y. Liu, H. Li, Y. Huang, and H.-W. Hon, 2006 [5] and in [11].
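The effect of query-level normalization on a Ranking SVM style objective can be sketched as follows. This is an illustrative reading, not the authors' implementation: it assumes a linear scorer `w`, a pairwise hinge loss, and normalization by dividing each query's summed pair loss by its number of pairs. The names `pairwise_hinge` and `ranking_svm_objective` and the toy data are hypothetical.

```python
import numpy as np

def pairwise_hinge(w, X, y):
    """Sum hinge losses over pairs (a, b) of one query with y[a] > y[b]."""
    scores = X @ w
    total, pairs = 0.0, 0
    for a in range(len(y)):
        for b in range(len(y)):
            if y[a] > y[b]:
                total += max(0.0, 1.0 - (scores[a] - scores[b]))
                pairs += 1
    return total, pairs

def ranking_svm_objective(w, queries, C=1.0, query_normalized=True):
    """0.5 * ||w||^2 + C * sum over queries of the (optionally averaged) pair loss."""
    loss = 0.0
    for X, y in queries:
        total, pairs = pairwise_hinge(w, X, y)
        if pairs == 0:
            continue  # a query without preference pairs contributes nothing
        loss += total / pairs if query_normalized else total
    return 0.5 * float(w @ w) + C * loss

# Hypothetical data: one query with 2 documents, one with 4 documents.
small = (np.zeros((2, 2)), np.array([1, 0]))        # 1 preference pair
large = (np.zeros((4, 2)), np.array([2, 1, 1, 0]))  # 5 preference pairs
w0 = np.zeros(2)

plain = ranking_svm_objective(w0, [small, large], query_normalized=False)
normalized = ranking_svm_objective(w0, [small, large], query_normalized=True)
print(plain, normalized)
```

Without normalization the larger query dominates the objective because it generates five times as many pairs. With normalization each query contributes equally, so removing or replacing one query changes the objective by a bounded amount regardless of how many documents it has.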