11. File Traverser [Diagram: XML files, PDF files, and other files → File Traverser → Document Processing → collections; the diagram also shows a DB.]
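The traversal step on this slide can be sketched in a few lines. This is a minimal illustration only, not the product's File Traverser: the function name `traverse` and the three-way grouping by file extension are assumptions made for the example.

```python
import os
from collections import defaultdict


def traverse(root):
    """Walk `root` recursively and group file paths by type.

    The xml / pdf / other grouping mirrors the slide; deciding the type
    from the file extension is an assumption of this sketch.
    """
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            if ext == ".xml":
                kind = "xml"
            elif ext == ".pdf":
                kind = "pdf"
            else:
                kind = "other"
            groups[kind].append(os.path.join(dirpath, name))
    return dict(groups)
```

Each group could then be handed to a type-specific parser before document processing.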
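13. JDBC Connector [Diagram: the JDBC Connector runs SQL against the DB and turns each row of the result set into one document; the documents pass through Document Processing and content distribution into the collections.]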
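The "one row, one document" mapping can be sketched as follows. Python's built-in `sqlite3` stands in for a JDBC data source here, and the helper `rows_to_documents`, the `articles` table, and the `collection` field are all hypothetical names chosen for the example.

```python
import sqlite3


def rows_to_documents(conn, sql, collection):
    """Run `sql` and yield one document (a dict) per result-set row."""
    cur = conn.execute(sql)
    columns = [col[0] for col in cur.description]  # column names of the result set
    for row in cur:
        doc = dict(zip(columns, row))
        doc["collection"] = collection  # hypothetical routing field
        yield doc


# Small in-memory demo; the table and its contents are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER, title TEXT)")
conn.executemany("INSERT INTO articles VALUES (?, ?)", [(1, "first"), (2, "second")])
docs = list(rows_to_documents(conn, "SELECT id, title FROM articles ORDER BY id", "news"))
```

With two rows in the table, `docs` holds two documents, each carrying the row's columns plus the target collection.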
15. Document Processing System [Diagram: Content API → Document Processing Engine → content distribution → collections and index; on the search side, Search, QR, and SFE read from the index.]
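17. Content Flow [Diagram: Content API → Document Processing Engine (with customer-extended processors plugged in through an API) → content distribution → Collection 1, Collection 2, ..., Collection n → index → searchApi → SFE.]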
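The idea of customer-extended processors can be sketched as a chain of stages applied to each document before it is distributed to its collection. All names below (`lowercase_title`, `add_word_count`, `run_pipeline`, `distribute`) and the document fields are hypothetical; the real engine's processor API is not shown on the slide.

```python
def lowercase_title(doc):
    """A built-in style stage: normalize the title."""
    doc = dict(doc)  # copy so the caller's document is not mutated
    doc["title"] = doc["title"].lower()
    return doc


def add_word_count(doc):
    """Stands in for a customer-supplied stage that enriches the document."""
    doc = dict(doc)
    doc["word_count"] = len(doc["body"].split())
    return doc


def run_pipeline(doc, stages):
    """Pass the document through each stage in order."""
    for stage in stages:
        doc = stage(doc)
    return doc


def distribute(docs, stages):
    """Process every document, then group the results by target collection."""
    collections = {}
    for doc in docs:
        processed = run_pipeline(doc, stages)
        collections.setdefault(processed["collection"], []).append(processed)
    return collections
```

Keeping each stage a plain function of one document makes it straightforward to insert a custom stage anywhere in the chain.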
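28. Search Processing System [Diagram: an HTTP client or API client sends a query and parameters to the SFE / Search API, which forwards them to the Query and Result Server; the query passes through the query-processing pipeline to the search engine and its index; results come back through the result-processing pipeline and are returned to the client as text/XML results or enhanced results.]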
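The request path on this slide (query pipeline, search engine, result pipeline) can be sketched roughly as below. This is a toy model under stated assumptions: the index is a plain dict of document id to text, matching is whole-word AND matching, and every function name is hypothetical.

```python
def normalize_query(query):
    """Query-pipeline stage: lowercase and collapse whitespace."""
    return " ".join(query.lower().split())


def search(index, query):
    """Toy search engine: return ids of documents containing every query term."""
    terms = query.split()
    return [
        doc_id
        for doc_id, text in index.items()
        if all(term in text.lower().split() for term in terms)
    ]


def add_snippets(index, hits):
    """Result-pipeline stage: attach a short snippet to each hit."""
    return [{"id": doc_id, "snippet": index[doc_id][:20]} for doc_id in hits]


def handle_request(index, raw_query):
    """Query pipeline, then search, then result pipeline."""
    query = normalize_query(raw_query)
    hits = search(index, query)
    return add_snippets(index, hits)
```

A client-facing layer would then serialize the enriched hits, for example as XML.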
35. Relevancy Terminology
    Term: Description
    Freshness: Age of a document compared to the time when the query is issued.
    Authority: Importance of a document determined by the links to it from other documents.
    Quality: Assigned importance of a document, independent of the query.
    Geo: Importance of the geographical distance between a document's associated latitude/longitude and a target location specified in a query.
    Context: Importance of matching a query in a given document field.
    Proximity: For multi-term queries, the shorter the distance between query terms in a document, the higher the document's rank value.
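As an illustration of the proximity idea, the sketch below finds the smallest gap between two query terms in a tokenized document and turns it into a score. The `1 / gap` scoring is an assumption made for the example; the slide does not say how proximity is actually weighted.

```python
def min_term_distance(tokens, term_a, term_b):
    """Smallest difference in token positions between term_a and term_b.

    Returns None if either term is missing from `tokens`.
    """
    best = None
    last_a = last_b = None
    for i, tok in enumerate(tokens):
        if tok == term_a:
            last_a = i
        if tok == term_b:
            last_b = i
        if last_a is not None and last_b is not None:
            gap = abs(last_a - last_b)
            if best is None or gap < best:
                best = gap
    return best


def proximity_score(tokens, term_a, term_b):
    """Hypothetical score: closer terms score higher, missing terms score 0."""
    gap = min_term_distance(tokens, term_a, term_b)
    if gap is None:
        return 0.0
    return 1.0 / max(gap, 1)  # guard against gap == 0 when both terms are equal
```

For the tokens of "fast enterprise search platform for enterprise data", the terms "enterprise" and "search" are adjacent and score higher than "fast" and "data", which sit six positions apart.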
36. Relevancy Terminology (continued)
    Additional statistics used when computing context and proximity:
    Term: Description
    Position: The earlier a query term occurs in a field, the higher the document's rank value.
    Frequency: The more frequently a query term occurs in the document (term frequency, TF) relative to the term's frequency in the index (inverse document frequency, IDF), the higher the document's rank value.
    Completeness: The greater the number of query terms present in the same field of a matching document, the higher the document's rank value.
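The frequency statistic can be illustrated with a small TF-IDF computation. The smoothed IDF used here, ln((1 + N) / (1 + df)) + 1, is one common variant chosen for the sketch; the slide does not specify which formula the engine uses.

```python
import math


def tf_idf(term, doc_tokens, corpus):
    """TF-IDF of `term` in one document, given the whole corpus.

    `corpus` is a list of token lists. Smoothed IDF is an assumption
    of this sketch, not a formula taken from the slides.
    """
    if not doc_tokens:
        return 0.0
    tf = doc_tokens.count(term) / len(doc_tokens)    # term frequency in the document
    df = sum(1 for doc in corpus if term in doc)     # number of documents containing the term
    idf = math.log((1 + len(corpus)) / (1 + df)) + 1.0
    return tf * idf
```

A term that appears in few documents of the index contributes more than an equally frequent term that appears almost everywhere.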
37. Relevancy Formula
    R(d, q) = S(d) + F(d, t) + D(d, q)
    R = rank value of document d for query q
    S = static rank value of document d, independent of the query
    F = freshness of document d at time t
    D = dynamic rank of document d for query q
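The additive structure of the formula can be sketched as follows. Only the sum S + F + D comes from the slide; the exponential-decay freshness model, the 30-day half-life, and the names `Doc`, `freshness`, and `rank` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    static_rank: float   # S(d): query-independent rank
    created_day: float   # document timestamp, in days (hypothetical unit)


def freshness(doc, query_day, half_life_days=30.0):
    """F(d, t): hypothetical decay that halves every `half_life_days`."""
    age = max(query_day - doc.created_day, 0.0)
    return 0.5 ** (age / half_life_days)


def rank(doc, query_day, dynamic_rank):
    """R(d, q) = S(d) + F(d, t) + D(d, q), with D(d, q) supplied by the caller."""
    return doc.static_rank + freshness(doc, query_day) + dynamic_rank
```

For example, a document with static rank 1.0 that is 30 days old at query time, with a dynamic rank of 2.0, gets 1.0 + 0.5 + 2.0 = 3.5 under these assumptions.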