11 Information Retrieval - 2

199 views

Lecture 11 - Information Service Engineering with foundations of Information Retrieval

Published in: Education

  5. θ (angle between the vectors), dot product, vector norm
  6. Term weights for terms t_i for a search query q; documents relevant for search query q
  8. How to judge the quality of a retrieval result?
  9. Monitors and measures effectiveness and efficiency (primarily offline)
  12. Search results (retrieved) versus ground truth (relevant):
                          retrieved: true    retrieved: false
      relevant: true      true positive      false negative
      relevant: false     false positive     true negative
  14. The F-measure (F1) is the harmonic mean of precision and recall.
  22. Recall and precision after each rank position for two result lists (slides 16 to 21 build up the same table one rank at a time):
      Rank                     1     2     3     4     5     6     7     8     9     10
      First list   Recall      0.17  0.17  0.33  0.5   0.67  0.83  0.83  0.83  0.83  1.0
                   Precision   1.0   0.5   0.67  0.75  0.8   0.83  0.71  0.63  0.56  0.6
      Second list  Recall      0.0   0.17  0.17  0.17  0.33  0.5   0.67  0.67  0.83  1.0
                   Precision   0.0   0.5   0.33  0.25  0.4   0.5   0.57  0.5   0.56  0.6
  24. Same recall and precision values as on slide 22. Emphasizes top ranked documents
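  26. Recall and precision after each rank position for two result lists:
      Rank                     1     2     3     4     5     6     7     8     9     10
      First list   Recall      0.2   0.2   0.4   0.4   0.4   0.6   0.6   0.6   0.8   1.0
                   Precision   1.0   0.5   0.67  0.5   0.4   0.5   0.43  0.38  0.44  0.5
      Second list  Recall      0.0   0.33  0.33  0.33  0.67  0.67  1.0   1.0   1.0   1.0
                   Precision   0.0   0.5   0.33  0.25  0.4   0.33  0.43  0.38  0.33  0.3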
  35. Web Crawler or Web Robot
  36. Web Server
  40. Text normalization & standard encoding
  42. Creation of index terms and implementation of data structures for fast access
  43. [Figure: index terms Ananas, Banana, Zucchini with number lists 1 5 16 22 31 / 1 2 67 71 / 6 9 12 15 33 / 5]
  47. Determines the order in which search results are presented
  52. Avoid the logarithm becoming zero. Normalization usually via cosine similarity, but there are more variants.
  54. Formula annotations: damping factor; for all incoming links; importance of incoming link; j ∈
  55. [Figure: link graph with nodes A, B, C, D, each starting at 1.0]
      Iteration   r(A)    r(B)     r(C)     r(D)
      1           1.0     1.0      1.0      1.0
      2           1.0     0.575    2.275    0.15
      3           2.083   0.575    1.1912   0.15
      …           …       …        …        …
      n           1.49    0.7833   1.577    0.15
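
The recall and precision values on slide 22 can be reproduced with a few lines of code. The following is a minimal Python sketch, not taken from the lecture. The relevance patterns `ranking_1` and `ranking_2` are inferred from the slide 22 values under the assumption of six relevant documents in total, and all function names are made up for illustration. The sketch also covers the F-measure as the harmonic mean of precision and recall (slide 14) and average precision, one common single-number summary that rewards relevant documents near the top of a ranking.

```python
def precision_recall_at_ranks(relevance, total_relevant):
    """Return a list of (recall, precision) pairs, one per rank position."""
    hits = 0
    rows = []
    for rank, is_relevant in enumerate(relevance, start=1):
        hits += is_relevant
        rows.append((hits / total_relevant, hits / rank))
    return rows


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def average_precision(relevance, total_relevant):
    """Mean of the precision values at the ranks holding a relevant document."""
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_relevant


# Assumed relevance patterns (1 = relevant, 0 = not relevant),
# inferred from the slide 22 values; not given in the transcript.
ranking_1 = [1, 0, 1, 1, 1, 1, 0, 0, 0, 1]
ranking_2 = [0, 1, 0, 0, 1, 1, 1, 0, 1, 1]
TOTAL_RELEVANT = 6

for name, ranking in (("first list", ranking_1), ("second list", ranking_2)):
    rows = precision_recall_at_ranks(ranking, TOTAL_RELEVANT)
    print(name)
    print("  recall:   ", " ".join(f"{r:.2f}" for r, _ in rows))
    print("  precision:", " ".join(f"{p:.2f}" for _, p in rows))
    recall, precision = rows[-1]
    print("  F-measure at rank 10:", round(f_measure(precision, recall), 2))
    print("  average precision:   ", round(average_precision(ranking, TOTAL_RELEVANT), 3))
```

Both lists reach the same recall and precision at rank 10, yet the first list places its relevant documents earlier and therefore gets the higher average precision (about 0.775 versus about 0.521). Printed values can differ from the slides in the last digit because of rounding, for example at 5/8 = 0.625.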

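Slides 5, 6 and 52 mention term weights, the dot product, vector norms and normalization via cosine similarity. A minimal Python sketch of how these pieces can fit together is given below. The toy documents, the function names and the specific weighting `tf * log(1 + N / df)` are assumptions for illustration. The `1 +` inside the logarithm is one way to keep the logarithm from becoming zero, and the lecture may use a different variant.

```python
import math
from collections import Counter


def idf(term, documents):
    """Inverse document frequency; the 1 + keeps the logarithm above zero."""
    df = sum(1 for doc in documents if term in doc)
    if df == 0:
        return 0.0  # term does not occur in the collection
    return math.log(1 + len(documents) / df)


def tf_idf_vector(tokens, documents):
    """Sparse vector {term: weight} with weight = term frequency * idf."""
    counts = Counter(tokens)
    return {term: count * idf(term, documents) for term, count in counts.items()}


def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their vector norms."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy collection (already tokenized and normalized).
documents = [
    ["ananas", "banana", "banana"],
    ["banana", "zucchini"],
    ["zucchini", "zucchini", "ananas"],
]
query = ["banana"]

query_vector = tf_idf_vector(query, documents)
scores = [
    (cosine_similarity(query_vector, tf_idf_vector(doc, documents)), index)
    for index, doc in enumerate(documents)
]
for score, index in sorted(scores, reverse=True):
    print(f"document {index}: {score:.3f}")
```

Because cosine similarity divides by both vector norms, a long document does not score higher merely for containing more words. In this toy collection the document without the query term receives a score of zero.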
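
The iteration table on slide 55 is consistent with the update r(i) = (1 - d) + d · Σ r(j) / out(j), summed over all pages j that link to page i, with damping factor d = 0.85 and out(j) the number of outgoing links of j. The link structure itself is not part of the transcript. The graph used below (A -> B, A -> C, B -> C, C -> A, D -> C) is an assumption, chosen because it reproduces the values in the table, and the function names are hypothetical. A minimal Python sketch:

```python
def pagerank_step(ranks, links, damping=0.85):
    """One synchronous update of all rank values from the previous ranks."""
    new_ranks = {}
    for page in ranks:
        # Each page linking here passes on its rank, split over its out-links.
        incoming = sum(
            ranks[source] / len(targets)
            for source, targets in links.items()
            if page in targets
        )
        new_ranks[page] = (1 - damping) + damping * incoming
    return new_ranks


def pagerank(links, damping=0.85, iterations=100):
    """Start every page at 1.0 and apply the update a fixed number of times."""
    ranks = {page: 1.0 for page in links}
    for _ in range(iterations):
        ranks = pagerank_step(ranks, links, damping)
    return ranks


# Assumed link graph: page -> pages it links to (not given in the transcript).
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

ranks = {page: 1.0 for page in links}  # row "Iteration 1" of the table
for iteration in (2, 3):
    ranks = pagerank_step(ranks, links)
    print(iteration, {page: round(value, 4) for page, value in ranks.items()})

print("n", {page: round(value, 4) for page, value in pagerank(links).items()})
```

Page D has no incoming links, so it stays at 1 - d = 0.15 in every iteration, which matches the r(D) column of the table.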