Overview of Adaptive Blocking for DDL Research Lab

A brief, high-level summary of Adaptive Blocking as described by Bilenko, Kamath, and Mooney (2006).

  1. Adaptive Blocking: key points
     • Reduce computation time
     • Apply across domains
     • Maximize recall
     • Limit false positives
     • Disjunctive / DNF blocking
     • Approx. Red-Blue Set Cover
     • Increased reduction ratio
     • Increased recall
     Bilenko, Kamath, Mooney. “Adaptive Blocking: Learning to Scale Up Record Linkage.” Proceedings of the 6th IEEE International Conference on Data Mining. Hong Kong, December 2006.
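
The two quality measures named on this slide can be illustrated with a minimal sketch. It assumes the commonly used definitions: reduction ratio is the share of all possible record pairs that blocking avoids comparing, and recall is the share of true matching pairs that survive blocking. The function names and toy data are hypothetical and are not taken from the slides.

```python
def reduction_ratio(num_records, candidate_pairs):
    """Fraction of all possible pairs that blocking avoids comparing."""
    total_pairs = num_records * (num_records - 1) // 2
    return 1.0 - len(candidate_pairs) / total_pairs


def recall(candidate_pairs, true_pairs):
    """Fraction of true matching pairs that are kept as candidates."""
    return len(candidate_pairs & true_pairs) / len(true_pairs)


# Toy data (assumed): 4 records, so 6 possible pairs in total.
candidates = {(0, 1), (2, 3)}   # pairs that blocking lets through
truth = {(0, 1), (1, 2)}        # pairs that really are duplicates

rr = reduction_ratio(4, candidates)   # 1 - 2/6
rec = recall(candidates, truth)       # 1 of 2 true pairs retained
```

Blocking trades these two quantities against each other: letting more pairs through raises recall but lowers the reduction ratio.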
  2. Blocking Predicates
     • Index function: generates keys based on field values (e.g., first three letters of name)
     • Equality function: returns True for a pair of records if any of their index keys match
     • Covered pairs: record pairs matched (found equal) by a given predicate
     • Blocking function: a set of blocking predicates with aggregate index and equality functions
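
A minimal sketch of a single blocking predicate may make these terms concrete. It assumes records are plain dictionaries and uses the slide's own example, the first three letters of a name, as the index function. All function and field names here are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def first_three_letters(value):
    """Index function (illustrative): one key built from the first three letters."""
    value = value.strip().lower()
    return {value[:3]} if value else set()


def is_covered(index_fn, field, rec_a, rec_b):
    """Equality function: True if the two records share at least one index key."""
    return bool(index_fn(rec_a[field]) & index_fn(rec_b[field]))


def covered_pairs(index_fn, field, records):
    """All record-id pairs covered by the predicate, found by bucketing on keys."""
    buckets = defaultdict(list)
    for rid, rec in enumerate(records):
        for key in index_fn(rec[field]):
            buckets[key].append(rid)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(ids, 2))  # ids are ascending, so pairs are (i, j) with i < j
    return pairs


records = [
    {"name": "Jonathan Smith"},
    {"name": "Jon Smith"},
    {"name": "Mary Jones"},
]
pairs = covered_pairs(first_three_letters, "name", records)
```

Bucketing by key is what saves computation: only records that land in the same bucket are ever paired, so the full set of pairs is never enumerated.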
  3. Goal: find the optimal blocking function, the one that minimizes false positives after covering most true positives (within some error)
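
One way to write this goal down, as a sketch in assumed notation that does not appear on the slides: let $\mathcal{P}$ be the set of available blocking predicates, $\mathrm{cov}(p)$ the record pairs covered by predicate $p$, $B$ the true matching pairs, $R$ the non-matching pairs, and $\varepsilon$ the number of true pairs allowed to remain uncovered.

```latex
\mathcal{S}^{*}
  = \arg\min_{\mathcal{S} \subseteq \mathcal{P}}
    \Bigl|\, R \cap \bigcup_{p \in \mathcal{S}} \mathrm{cov}(p) \Bigr|
  \quad \text{subject to} \quad
    \Bigl|\, B \setminus \bigcup_{p \in \mathcal{S}} \mathrm{cov}(p) \Bigr|
    \le \varepsilon
```

The objective counts false positives (non-matching pairs that pass blocking), and the constraint bounds the number of missed true positives.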
  4. Disjunctive and DNF blocking
     • Disjunctive: select pairs covered by at least one blocking predicate
     • Disjunctive Normal Form: select pairs covered by at least one conjunction of blocking predicates
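
The difference between the two forms can be sketched with set operations, assuming each predicate's covered pairs have already been computed. The predicate names and the `coverage` dictionary are hypothetical toy data.

```python
def disjunctive_pairs(coverage, predicates):
    """Pairs covered by at least one of the given predicates (set union)."""
    selected = set()
    for name in predicates:
        selected |= coverage[name]
    return selected


def dnf_pairs(coverage, conjunctions):
    """Pairs covered by at least one conjunction.

    A conjunction (a non-empty tuple of predicate names) covers a pair
    only if every predicate in it covers that pair (set intersection).
    """
    selected = set()
    for conjunction in conjunctions:
        selected |= set.intersection(*(coverage[name] for name in conjunction))
    return selected


# Toy coverage sets (assumed): predicate name -> record-id pairs it covers.
coverage = {
    "same_zip": {(0, 1), (0, 2), (3, 4)},
    "same_name_prefix": {(0, 1), (2, 3)},
    "same_birth_year": {(0, 2), (2, 3)},
}

by_disjunction = disjunctive_pairs(coverage, ["same_zip", "same_name_prefix"])
by_dnf = dnf_pairs(
    coverage,
    [("same_zip", "same_name_prefix"), ("same_birth_year",)],
)
```

A disjunction of single predicates is the special case of DNF in which every conjunction has length one.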
  5. [Figure panels: "Disjunctive: Red-blue set cover" and "DNF: Red-blue set cover"]
  6. Disjunctive Blocking
     • Remove predicates covering too many false pairs
     • Remove false pairs covered by too many predicates
     • Predicate cost = number of false pairs covered
     • Weighted set cover: greedily select predicates based on improvement, check the uncovered threshold, and repeat
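
The steps on this slide can be sketched as a simplified greedy procedure. This is an illustration under stated assumptions, not the authors' exact algorithm: the three threshold parameters are hypothetical names, the slides do not say how their values are chosen, and the `+ 1` in the score is an assumed smoothing term to avoid division by zero.

```python
def learn_disjunctive_blocking(coverage, true_pairs, false_pairs,
                               max_false_per_predicate,
                               max_predicates_per_false_pair,
                               max_uncovered):
    # Step 1: remove predicates covering too many false pairs.
    kept = {name: pairs for name, pairs in coverage.items()
            if len(pairs & false_pairs) <= max_false_per_predicate}

    # Step 2: remove false pairs covered by too many of the remaining predicates.
    counted_false = {
        pair for pair in false_pairs
        if sum(pair in pairs for pairs in kept.values())
        <= max_predicates_per_false_pair
    }

    # Step 3: predicate cost = number of remaining false pairs it covers.
    cost = {name: len(pairs & counted_false) for name, pairs in kept.items()}

    # Step 4: greedy weighted set cover over the true pairs.
    uncovered = set(true_pairs)
    chosen = []
    while len(uncovered) > max_uncovered:
        best, best_score = None, 0.0
        for name, pairs in kept.items():
            if name in chosen:
                continue
            gain = len(pairs & uncovered)
            if gain == 0:
                continue
            score = gain / (cost[name] + 1)  # assumed smoothing, see lead-in
            if score > best_score:
                best, best_score = name, score
        if best is None:  # nothing left that covers any uncovered true pair
            break
        chosen.append(best)
        uncovered -= kept[best]
    return chosen, uncovered


# Toy data (assumed).
coverage = {
    "same_zip": {(0, 1), (0, 2), (3, 4), (5, 6)},
    "same_name_prefix": {(0, 1), (2, 3)},
    "same_birth_year": {(0, 2), (2, 3), (4, 5)},
    "same_country": {(0, 1), (0, 2), (2, 3), (3, 4), (4, 5), (5, 6), (1, 6)},
}
true_pairs = {(0, 1), (2, 3), (4, 5)}
false_pairs = {(0, 2), (3, 4), (5, 6), (1, 6)}

chosen, uncovered = learn_disjunctive_blocking(
    coverage, true_pairs, false_pairs,
    max_false_per_predicate=3,
    max_predicates_per_false_pair=2,
    max_uncovered=0,
)
```

In this toy run `same_country` is discarded in step 1 because it covers all four false pairs, and the greedy loop then prefers predicates that cover many still-uncovered true pairs at low cost.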
  7. DNF Blocking
     • Remove predicates covering too many pairs
     • Construct predicate conjunctions, length <= k-1
     • Add conjunctions maximizing marginal true/false ratio to set
     • Apply Disjunctive Blocking with resulting predicates
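
The conjunction-building steps above can be sketched as follows. This is a simplified illustration, not the authors' exact procedure: `max_pairs_per_predicate` and `max_length` are hypothetical parameters (the slide's own length bound is left as stated), and the `+ 1` in the ratio is an assumed smoothing term. The function only produces the enlarged predicate set; any disjunctive selection procedure can then be run on the result.

```python
def build_conjunctions(coverage, true_pairs, false_pairs,
                       max_pairs_per_predicate, max_length):
    def ratio(pairs):
        # True/false ratio of a set of covered pairs (smoothed, an assumption).
        return len(pairs & true_pairs) / (len(pairs & false_pairs) + 1)

    # Remove predicates covering too many pairs overall.
    base = {name: pairs for name, pairs in coverage.items()
            if len(pairs) <= max_pairs_per_predicate}

    extended = dict(base)
    for start, start_pairs in base.items():
        names, pairs = (start,), start_pairs
        # Greedily extend the conjunction while the ratio improves.
        while len(names) < max_length:
            best_name, best_pairs, best_ratio = None, None, ratio(pairs)
            for other, other_pairs in base.items():
                if other in names:
                    continue
                joint = pairs & other_pairs  # pairs covered by both
                if ratio(joint) > best_ratio:
                    best_name, best_pairs, best_ratio = other, joint, ratio(joint)
            if best_name is None:
                break
            names, pairs = names + (best_name,), best_pairs
            extended[" AND ".join(sorted(names))] = pairs
    return extended


# Toy data (assumed).
coverage = {
    "same_zip": {(0, 1), (0, 2), (3, 4)},
    "same_birth_year": {(0, 1), (2, 3), (3, 4)},
    "same_country": {(0, 1), (0, 2), (2, 3), (3, 4), (1, 4), (1, 2)},
}
true_pairs = {(0, 1), (2, 3)}
false_pairs = {(0, 2), (3, 4), (1, 4), (1, 2)}

extended = build_conjunctions(
    coverage, true_pairs, false_pairs,
    max_pairs_per_predicate=5, max_length=2,
)
```

Here `same_country` is dropped for covering too many pairs, and `same_zip` is extended with `same_birth_year` because the conjunction has a better true/false ratio than `same_zip` alone.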
