B-Tree Lexicon, Min-Heaps
Kira Radinsky
Min-Heap slides are courtesy of Aya Soffer and David Carmel,
IBM Haifa Research Lab
2 November 2010 236621 Search Engine Technology 2
The Lexicon as a B-Tree
• B-Tree: a balanced tree that is optimized for disk I/O, holding key/value
pairs
• Branching is defined by a min-degree parameter t, t > 1
– t is chosen according to the size of a disk block
• Any internal node other than the root has at least t and at most 2t
children; the root has either no children, or at least two and at most 2t
children
• Any internal node with k children also stores k-1 keys which serve as
separator values: separator j is larger than the keys of subtree j and
smaller than the keys of subtree j+1
• Leaf nodes, like all nodes, store at most 2t-1 key/value pairs
– When not the root, store at least t-1 key/value pairs
• Lookup, insertion and deletion operations on a B-Tree access a number of
nodes (disk blocks) linear in its height, which is logarithmic to base t in
the number of keys
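The node rules above can be sketched as a small per-node check. This is a minimal illustration, not the course's implementation: the `BTreeNode` layout and the `check_node` helper are hypothetical names introduced here, and the check covers a single node only (it does not verify separator ordering across subtrees or that all leaves sit at the same depth).

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class BTreeNode:
    # Hypothetical layout: sorted keys, one value per key, and either
    # no children (leaf) or len(keys) + 1 children (internal node).
    keys: List[str]
    values: List[Any]
    children: List["BTreeNode"] = field(default_factory=list)

    @property
    def is_leaf(self) -> bool:
        return not self.children


def check_node(node: BTreeNode, t: int, is_root: bool = False) -> bool:
    """Return True if this single node respects the min-degree-t rules."""
    n = len(node.keys)
    if n > 2 * t - 1:                  # any node: at most 2t-1 keys
        return False
    if not is_root and n < t - 1:      # non-root node: at least t-1 keys
        return False
    if len(node.values) != n:          # one value per key
        return False
    # keys must be strictly increasing
    if not all(a < b for a, b in zip(node.keys, node.keys[1:])):
        return False
    # an internal node with k children stores k-1 separator keys
    if not node.is_leaf and len(node.children) != n + 1:
        return False
    return True
```

With t=2 this accepts nodes holding 1 to 3 keys (2 to 4 children for internal nodes), which is the shape used in the example on the next slide.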
B-Tree Lexicon - Example
• t=2
• Each key is associated with a value that contains a DF and
a pointer to the postings list (dashed line)
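[Figure: B-Tree with t=2; each key is shown with its DF and a dashed pointer to its postings list]
Root: gets (DF 1), more (DF 2)
Leaf 1: and (DF 3), as (DF 1), bad (DF 2)
Leaf 2: good (DF 2), is (DF 1), it (DF 2)
Leaf 3: the (DF 1), ugly (DF 2)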
B-Tree Lookup
Looking up the value associated with key x:
1. current_node ← root
2. Let k1<k2<…<km be the keys of current_node
3. if x ∈ {k1,k2,…,km} – we’re done, return associated value
4. else, if current_node is a leaf node, return null
5. else, let j be the smallest index s.t. x<kj (j ← m+1 if x>km);
– current_node ← j’th subtree, and goto 2
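The lookup steps above might be written in Python roughly as follows. This is a sketch under assumptions: `Node`, `btree_lookup` and `build_example` are hypothetical names, nodes live in memory rather than in disk blocks, and the postings "pointers" are placeholder strings. The DF values are the ones read off the example figure. Python's `bisect_left` finds the first key that is not smaller than x, which covers both step 3 (key found) and step 5 (choice of subtree), using 0-based indices.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class Node:
    keys: List[str]                 # sorted keys k1 < k2 < ... < km
    values: List[Any]               # values[i] belongs to keys[i]
    children: List["Node"] = field(default_factory=list)  # empty for a leaf


def btree_lookup(root: Node, x: str) -> Optional[Any]:
    """Return the value stored under key x, or None if x is absent."""
    current = root
    while True:
        j = bisect_left(current.keys, x)      # number of keys smaller than x
        if j < len(current.keys) and current.keys[j] == x:
            return current.values[j]          # step 3: key found in this node
        if not current.children:
            return None                       # step 4: reached a leaf
        current = current.children[j]         # step 5: descend into subtree j


def build_example() -> Node:
    """The t=2 lexicon from the example slide; postings are placeholders."""
    def leaf(pairs):
        return Node([k for k, _ in pairs],
                    [(df, "postings:" + k) for k, df in pairs])
    root = leaf([("gets", 1), ("more", 2)])
    root.children = [
        leaf([("and", 3), ("as", 1), ("bad", 2)]),
        leaf([("good", 2), ("is", 1), ("it", 2)]),
        leaf([("the", 1), ("ugly", 2)]),
    ]
    return root
```

For instance, `btree_lookup(build_example(), "is")` descends from the root into the middle leaf and returns the (DF, postings) pair stored there, while a term missing from the lexicon yields `None`.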
Top-r Document Selection
Problem definition: Given a set A of scored documents,
select the r documents with the highest scores in A and
return them in decreasing relevance order
• Naïve method: sort the set A by score
– If |A|=M, time complexity is O(M logM)
• Better approach: since typically r<<M, selecting the r
top scores can be done in O(M+r log M) time using a
heap:
1. Heapify the set of M scores (about 2M comparisons) so that the
top score is at the root
2. Repeatedly extract the heap’s root (r times), each time fixing
the heap in O(logM)
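A minimal sketch of the heapify-then-extract approach, using Python's standard `heapq` module. Since `heapq` provides a min-heap, the scores are negated to obtain max-heap behaviour; the function name `top_r_max_heap` and the `(doc_id, score)` input format are assumptions made for illustration.

```python
import heapq
from typing import List, Tuple


def top_r_max_heap(scored: List[Tuple[str, float]], r: int) -> List[Tuple[str, float]]:
    """Return the r highest-scoring (doc_id, score) pairs, best first."""
    # heapq is a min-heap, so store negated scores to keep the top score at the root.
    heap = [(-score, doc_id) for doc_id, score in scored]
    heapq.heapify(heap)                        # step 1: O(M)
    result = []
    for _ in range(min(r, len(heap))):         # step 2: r extractions, O(log M) each
        neg_score, doc_id = heapq.heappop(heap)
        result.append((doc_id, -neg_score))
    return result
```

Note that the whole set of M scores has to be held in the heap here, which is the cost the min-heap variant later in these slides avoids.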
The Heap Data Structure - Reminder
• A binary heap is a nearly complete binary tree (every level is full except
possibly the last, which is filled from the left) with values
stored at all leaves and internal nodes, and an ordering
rule that requires values to be non-decreasing
(alternatively, non-increasing) along each path from a leaf
to the root
– Largest/smallest value is at the root
• Heap implemented in an Array:
– Root at index 1
– For node at index i, left child is at index 2i and right child at index
2i+1
– Thus the parent of the node at index i is at index ⌊i/2⌋
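The index arithmetic can be sketched directly. The helper names below are hypothetical, and slot 0 of the Python list is left unused so that indices match the 1-based convention of the slides. The sample array is the one shown on the "Binary Heap Stored in an Array" slide.

```python
from typing import List, Optional


def left(i: int) -> int:
    return 2 * i


def right(i: int) -> int:
    return 2 * i + 1


def parent(i: int) -> int:
    return i // 2          # integer division gives floor(i/2)


def is_max_heap(a: List[Optional[int]]) -> bool:
    """Check the max-heap rule on a 1-based array (a[0] is an unused slot)."""
    size = len(a) - 1
    # every non-root node must not exceed its parent
    return all(a[parent(i)] >= a[i] for i in range(2, size + 1))


# 1-based layout: index 0 is a dummy so the root sits at index 1
example = [None, 23, 17, 15, 17, 8, 2, 13, 4, 14, 5]
```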
Binary Heap Stored in an Array
[Figure: the max-heap below drawn as a binary tree, with 23 at the root]
Value: 23 17 15 17  8  2 13  4 14  5
Index:  1  2  3  4  5  6  7  8  9 10
Extracting the Top Element
• Remove the largest item r times
• Each time:
– Remove the largest item – the root of the heap
– Replace it with the last element of the heap
– Sift the new root down until restoring order
• Example
– Remove item 23 from the root
– Last item in array 5 (at location 10) replaces it
– Reinstate heap order: in the worst case 5 is sifted all the
way back down the tree, so the number of swaps is bounded
by log2(size of heap)
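The extraction step can be sketched as follows, again on a 1-based array with an unused slot 0. The names `sift_down` and `extract_max` are illustrative, not taken from the slides.

```python
from typing import List, Optional


def sift_down(a: List[Optional[int]], i: int, size: int) -> None:
    """Move a[i] down until neither child is larger (1-based max-heap)."""
    while 2 * i <= size:
        child = 2 * i                          # left child
        if child + 1 <= size and a[child + 1] > a[child]:
            child += 1                         # right child is the larger one
        if a[i] >= a[child]:
            break                              # heap order restored
        a[i], a[child] = a[child], a[i]
        i = child


def extract_max(a: List[Optional[int]]) -> int:
    """Remove and return the root of a 1-based max-heap stored in a."""
    size = len(a) - 1
    if size == 0:
        raise IndexError("extract_max from an empty heap")
    top = a[1]
    last = a.pop()                             # last element of the heap
    if size > 1:
        a[1] = last                            # it replaces the root...
        sift_down(a, 1, size - 1)              # ...and is sifted down
    return top
```

Running `extract_max` on the example array `[None, 23, 17, 15, 17, 8, 2, 13, 4, 14, 5]` returns 23 and moves 5 down along the left side of the tree, three swaps in total.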
Heap Example (cont.)
To restore order at the top level
of the tree, item 17, the larger of
the 2 children of the root, must be
swapped with 5.
This limits the order violation to
the left sub-tree.
[Figure: the heap with 5 at the root, before the swap; in array form: 5 17 15 17 8 2 13 4 14]
The process is repeated until heap order is restored
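Heap Example (cont.)
[Figure: four snapshots of sifting 5 down, shown here in array form]
5 17 15 17 8 2 13 4 14 (5 placed at the root)
17 5 15 17 8 2 13 4 14 (5 swapped with its larger child, 17)
17 17 15 5 8 2 13 4 14 (5 swapped with its larger child, 17)
17 17 15 14 8 2 13 4 5 (5 swapped with its larger child, 14; heap order restored)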
Top-r Selection Using a Min-Heap
• The selection problem can be solved by a heap that stores
the smallest item at the root: min-heap
• A min-heap of r items is held instead of a max-heap of M –
memory use drops from O(M) to O(r)
• Process the M scores, storing in the min-heap the r largest
values seen so far
– First r values are heapified in O(r) comparisons
– Replace the smallest value in the min-heap (the r-th largest seen so far)
whenever a larger value is found
• Sort the r highest values in descending order and return
the corresponding documents – O(r log r)
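The min-heap procedure could be sketched like this with the standard `heapq` module. The function name `top_r_min_heap` and the `(doc_id, score)` stream format are assumptions for illustration. `heapq.heapreplace` pops the root and pushes the new item in a single sift.

```python
import heapq
from itertools import islice
from typing import Iterable, List, Tuple


def top_r_min_heap(scored: Iterable[Tuple[str, float]], r: int) -> List[Tuple[str, float]]:
    """Return the r highest-scoring (doc_id, score) pairs, best first."""
    if r <= 0:
        return []
    stream = iter(scored)
    # First r values are heapified in O(r); the root is the smallest of them.
    heap = [(score, doc_id) for doc_id, score in islice(stream, r)]
    heapq.heapify(heap)
    for doc_id, score in stream:
        if score > heap[0][0]:                 # larger than the r-th largest so far
            heapq.heapreplace(heap, (score, doc_id))
    heap.sort(reverse=True)                    # final O(r log r) sort, descending
    return [(doc_id, score) for score, doc_id in heap]
```

Only r entries are ever held in memory, and the input can be any iterable, so scores may be streamed rather than materialised. Python's standard library also offers `heapq.nlargest` for the same task.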
Min-Heap Processing - Illustration
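[Figure: the score sequence split into a processed part and an unprocessed part; the processed part feeds a min-heap of the r largest items, and the smallest value is discarded whenever a larger one arrives]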
Top-r Selection Using a Min-Heap:
Complexity Analysis
• Worst case: the scores are already in increasing order
– Each of the M-r last values is inserted into the heap
– Furthermore, it percolates to the bottom of the heap
– Heap-update cost is O( (M-r)*log(r) ), on top of the O(r) initial heapify
• Average case – the scores arrive in uniformly random order
(a permutation of the M scores chosen uniformly at random)
– The expected number of times one of the M-r last values is
inserted into the heap is ~ r*ln(M/r)
– Each insertion costs O(log(r))
– Heap-update cost is O( r*log(r)*log(M/r) ), plus the O(M) scan
that compares every score with the heap's root
• Proof on the board
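One standard argument for the r*ln(M/r) estimate is sketched below. It is not necessarily the proof given on the board, and it assumes all scores are distinct. Here X_i is an indicator introduced for illustration and H_n denotes the n-th harmonic number. The i-th value enters the heap exactly when it is among the r largest of the first i values, which in a uniformly random order happens with probability r/i.

```latex
\begin{align*}
X_i &= \mathbf{1}\{\text{the $i$-th score is inserted into the heap}\},
      \qquad i = r+1, \dots, M \\
\Pr[X_i = 1] &= \frac{r}{i} \\
\mathbb{E}\Big[\sum_{i=r+1}^{M} X_i\Big]
  &= \sum_{i=r+1}^{M} \frac{r}{i}
   = r\,(H_M - H_r)
   \approx r \ln\frac{M}{r}
\end{align*}
```

Multiplying the expected number of insertions by the O(log(r)) cost of each one gives the O( r*log(r)*log(M/r) ) bound for the heap updates.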
