2. Preamble
⢠Myself:
â Ph.D. @ HebrewU (DB uncertainty + search)
â IBM Almaden (DB theory, IR, Text Analytics)
â LogicBlox (ML in DB, Prob. Programming)
â Technion IL (Associate Prof., next year)
⢠This talk:
ď§ Infrastructure for text analytics
+ DB theory, formal languages, NLP, data mining,
computational complexity, âŚ
2
3. ⢠Text Analytics in the Big Data Era
⢠Information Extraction Systems & Formalism
⢠Foundational Research Challenges
⢠Conclusions and Outlook
Outline
3
4. Text Analytics Matters
Some important applications are based on the
analysis of text-centric data; for example:
Semantic Search
Semantic understanding & indexing of
content to better match user's intent
Life-Science Mining
Extract knowledge bases from
scientific publications
e-Commerce
Comparison Shopping extracts &
compares inventory from online sources
CRM / BI
Monitor customerâs social-media activity
for sentiment & business leads
Log Analysis
Summarize, visualize and analyze logs
produced by machines
4
5. Database Management Systems
⢠Old news: Data management is involved!
â Data semantics, query/analysis semantics, storage,
query evaluation, indices, consistency, transactions,
backup, privacy, recovery, âŚ
â From-scratch engineering is highly challenging
⢠Motivation to the concept of a general-purpose
Database Management System
â Most notably: relational model (pioneered by Edgar F.
Codd in 1969) and SQL
5
6. âBig Dataâ Phenomena
Proprietary data in orgs.
(enterprises, governments, âŚ)
Proliferation of publically open
data sources (Web, social, âŚ)
Past: Present:
Massive-data analyses incurred
high machinery/personnel cost
Business models (cloud, crowd,
opensource) facilitate analyses
Data structured/controlled by
admins, e-forms, software, âŚ
Uncontrolled data from humansâ
free text, heterogeneous kbs, âŚ
Analyses by specialized teams
of heavily trained experts
Analyses by a wide community
featuring a wide range of skills
6
7. âBy 2018, the United States alone could face a shortage of 140,000
to 190,000 people with deep analytical skills as well as 1.5 million
managers and analysts with the know-how to use the analysis of
big data to make effective decisions.â
âBig data: The next frontier for innovation, competition, and productivityâ
McKinsey Report, May 2011
We need dev. & management systems to
facilitate value extraction from Big Data
by a wide range of users / skills
7
8. Core Task: Information Extraction (IE)
âInformation Extraction (IE) is the name given to any process
which selectively structures and combines data which is found,
explicitly stated or implied, in one or more texts. The final
output of the extraction process varies; in every case, however, it
can be transformed so as to populate some type of database.â
J. Cowie and Y. Wilks., Handbook of
Natural Language Processing, 2000
âInformation extraction is the identification, and consequent or concurrent
classification and structuring into semantic classes, of specific
information found in unstructured data sources, such as natural language
text, making the information more suitable for information processing tasks.â
M. F. Moens, Information Extraction: Algorithms
and Prospects in a Retrieval Context, 2006
âdata-in-text
(unstructured)
data-in-db
(structured)
In short:
8
9. Popular Classes of IE Tasks
⢠Named Entity Recognition
From September 1936 to July 1938,
Turing spent most of his time studying
under Church at Princeton University.
In June 1938, he obtained his PhD
from Princeton.
person person organization
organization
9
10. Popular Classes of IE Tasks
AdvisedB
y
WorksIn
From September 1936 to July 1938,
Turing spent most of his time studying
under Church at Princeton University.
In June 1938, he obtained his PhD
from Princeton.
⢠Named Entity Recognition
⢠Relation Extraction
10
11. Popular Classes of IE Tasks
From September 1936 to July 1938,
Turing spent most of his time studying
under Church at Princeton University.
In June 1938, he obtained his PhD
from Princeton.
Graduation
Where?
Who?
⢠Named Entity Recognition
⢠Relation Extraction
⢠Event Extraction
11
12. Popular Classes of IE Tasks
From September 1936 to July 1938,
Turing spent most of his time studying
under Church at Princeton University.
In June 1938, he obtained his PhD
from Princeton.
Education
Start End
Graduation
When?
⢠Named Entity Recognition
⢠Relation Extraction
⢠Event Extraction
⢠Temporal IE
12
13. Popular Classes of IE Tasks
From September 1936 to July 1938,
Turing spent most of his time studying
under Church at Princeton University.
In June 1938, he obtained his PhD
from Princeton.
SameEntity
SameEntity
⢠Named Entity Recognition
⢠Relation Extraction
⢠Event Extraction
⢠Temporal IE
⢠Coreference Resolution
13
14. ariu
lmaden
A
m. com
Yunyao Li
IBM Research - Almaden
San Jose, CA
yunyaol i @us. i bm. com
Frederick R. Reiss
IBM Research - Almaden
San Jose, CA
f r r ei ss@us. i bm. com
stract
aâ analytics over unstruc-
enewed interest in infor-
E). We surveyed the land-
ies and identiďŹed amajor
industry and academia:
ominatesthecommercial
garded as dead-end tech-
mia. We believe the dis-
he way in which the two
ethebeneďŹts and costsof
miaâs perception that rule-
research challenges. We
mportance of rule-based
Commercial*Vendors*(2013)*
NLP*Papers*
(200392012)*
100%$
50%$
0%$
3.5%*
21%$
75%$
Rule,$
Based$
Hybrid$
Machine$
Learning$
Based$
45%*
22%$
33%$
Implementa@ons*of*En@ty*Extrac@on*
Large*Vendors*
67%*
17%$
17%$
All*Vendors*
IE Paradigms: Rules & Statistics
⢠Rules
⢠ML classification
⢠Probabilistic graphical models
⢠Soft logic
[Chiticariu, Li, Reiss, EMNLPâ13]
⢠EMNLP, ACL, NAACL, 2003-
2012
⢠54 industrial vendors (Whoâs
Who in Text Analytics, 2012)
â[âŚ] rules are effective,
interpretable, and are easy
to customize by non-experts
to cope with errors.â
Gupta & Manning, CONLLâ14
14
+
NLP
15. ⢠Text Analytics in the Big Data Era
⢠Information Extraction Systems & Formalism
⢠Foundational Research Challenges
⢠Conclusions and Outlook
Outline
15
16. Xlog: Datalog for IE
⢠Extension of (non-recursive) Datalog
⢠Use case: DBLife (db research kb: dblife.cs.wisc.edu)
⢠Data types: string, document, span
â Focus on single-document programs
⢠âProcedural predicatesâ (p-predicates) are user-defined
functions that produce relations over spans
â Example: sentence(doc, span)
⢠Query-plan optimization
[Shen, Doan, Naughton, Ramakrishnan, VLDB 2007]
Kaspersky Lab CEO Eugene Kaspersky said Intel CEO Paul
Otellini and the Intel board had no idea what they were in for when
the company announced it was acquiring McAfee on August 19,
2010.
Same string, different spans
Span [42,47)
16
17. Xlog Example
âDeclarative Information Extraction using Datalog with Embedded Extraction Predicatesâ
[Shen, Doan, Naughton, Ramakrishnan, VLDB 2007]
Regex.
(string)
Unary
regex
formula
Binary
regex
formula
17
18. ⢠Datalog syntax
â Types: string, span
⢠Built in collection of p-predicates
â Various types of built-in regex formulas
â Linguistic: deep parsing, coreference
resolution, named-entity extractor
Instaread: Datalog + NLP
Binary regex
formulas
Unary regex
formulas
[Hoffmann, 2012]
18
19. IBM SystemT: SQL for IE
⢠Engine for AQL: SQL-like declarative IE lang.
â AQL = Annotation Query Language
⢠SystemT = AQL + Runtime + Dev. Tooling
â [Chiticariu et al., ACL 2010]: position SystemT as a
high-quality and high-efficiency IE solution
â System and IDE demos in ACL 2011, SIGMOD 2011
⢠Commercial product, high academic presence
â Integration on public financial records [HernĂĄndez et al., EDBTâ13,
Balakrishnan et al. SIGMODâ10], NER [Chiticariu et al. EMNLPâ10,
ACLâ10, Nagesh et al. EMNLPâ12, Roy et al. SIGMODâ13], IR [Zhu et
al. WWWâ10, K et al. SIGIRâ12, CIKMâ12], sentiment analysis [Hu et
al., Interactâ13], social media [Sindhwani et al., IBM Journal 2011]
19
21. Formal Framework
⢠Repeated concept: Extend a relational query
language with text transducers (p-predicates,
usually regex formulas)
⢠Research challenge: theoretical underpinnings
of this combined document/relation model
⢠Expressive power
â Query-plan optimization: Can we rewrite an operator via âeasierâ
building blocks?
â System extensions: Can we express a new operation using
existing ones, or prove impossibility?
⢠Next: a formal framework
â With Fagin, Reiss, Vansummeren, PODSâ13, JACM
21
22. 22
Terminology
Kaspersky Lab CEO Eugene Kaspersky said Intel CEO Paul
Otellini and the Intel board had no idea what they were in for when
the company announced it was acquiring McAfee on August 19,
2010.
Company CEO CompanyCEO
[1,14)
(Kaspersky Lab)
[19,36)
(Eugene Kaspersky)
[1,36)
[42,47)
(Intel)
[52,65)
(Paul Otellini)
[42,65)
Relation over spans from the document
Document
Span [52,65)
23. Document Spanners
Document d Relation over the spans of d
Kaspersky Lab CEO Eugene
Kaspersky said Intel CEO Paul Otellini
and the Intel board had no idea what
they were in for when the company
announced it was acquiring McAfee
on August 19, 2010.
x y z
[1,14) [30,36) [1,36)
[42,47) [52,65) [42,65)
[102,110) [115,125) [102,125)
Document Spanner: a function that maps every
doc. (string) into a relation over the doc.âs spans
More formally:
⢠Finite alphabet of symbols
⢠A spanner maps each doc. d â * into a relation over the spans [i,j) of d
⢠The relation has a fixed signature (set of attributes)
â The attributes come from an infinite domain of variables x, y, z, âŚ
23
24. Spanners as Regex Formulas
⢠Regular expression with embedded variables
⢠Examples:
⢠Restriction: each âevaluationâ (parse tree) assigns
one span to each variable (see [Fagin et al., PODSâ13])
Ordinary regex Span variable
ďą .* x{dddd} .*
ďą .* in w{Alabama | Alaska | Arizona | âŚ} .*
ďą (.* z{[A-Z][a-z]*, y{[A-Z][a-z]*}} .*) | âŚ
Representation system for spanners
24
25. Spanners as Datalog w/ Regex
⢠Non-recursive Datalog (NR-Datalog)
⢠Operate over a document (not a relational db)
Token(x) := [ (Îľ | .*_) x{[a-zA-Z]+} ( ((,V_) .*) | Îľ) ]
State(x) := Token(x) , [.* x{Georgia|Virginia|Washington}.*]
Cap1st(x) := Token(x) , [.* x{[A-Z].*}.*]
CommaSp(x,y,z) := [.* z{x{.*} ,_ y{.*}}.*]
Loc(z) := CommaSp(x,y,z) , Cap1st(x) , State(y)
RETURN(x,z) := Cap1st(x) , [.*x{.*}_from_z{.*}.*}] , Loc(z)
Carter_from_Plains,_Georgia,_Washington
_from_Westmoreland,_Virginia
x z
[1,7)
Carter
[13,28)
Plains,_Georgia
[30,40)
Washington
[46,69)
Westmoreland,_Virginia
EDBs = Spanners!
Another representation
system for spanners
Quer
y goal
25
26. Spanners as Automata
0,1 0 1
Ordinary
NFA
1 0 0 1 1 1 0 1
Var-Stack
Automaton
1 0 0 1 1 1 0 1x{
y{
}
}
y{x{ } }
Var-Set
Automaton
1 0 0 1 1 1 0 1x{
}y
y{x{ }x }y
}x
0,1 0 1
0,1 0 1
⢠In an accepting run, each variable opens and later closes exactly once
â Each accepting run defines an assignment to the variables
⢠Nondeterministic â multiple accepting runs â multiple tuples
Close most recent
Close x
y
x
x
y
Another representation system for spanners
y{
26
28. Consequences
⢠Connections between Datalog+regex
spanners and other language formalisms
â Classic string relations [Berstel 79]
â Graph queries (CRPQs) [Cruz et al. 87]
⢠Extension with string equality & difference
â Expressiveness / closure properties
⢠Principles for cleaning inconsistencies
â Follow up work [PODSâ14]
â NextâŚ
28
29. ⢠Text Analytics in the Big Data Era
⢠Information Extraction Systems & Formalism
⢠Foundational Research Challenges
⢠Conclusions and Outlook
Outline
29
30. Next, highlight 3 lines of foundational research that
were motivated by our work on text analytics:
1. Database inconsistency w/ repair priorities
2. Frequent subgraph mining
3. Update propagation
30
31. ⢠Extractors may produce inconsistent results
â Data artifacts
â Developer limitations
⢠Rather than repairing the existing extractors,
common practice is to clean (intermediate) results
â SystemT âconsolidatorsâ [Chiticariu et al.10]
â GATE/JAPE âcontrolsâ [Cunningham 02]
â Implicit in other rule systems, e.g., WHISK [Soderland 99]
â POSIX regex disambiguation [Fowler 03]
Cleaning IE Inconsistencies
33 Martin Luther King Jr. Dr., SE, Atlanta, GA 30303
Person2
Person1
Address1
31
34. Cleaning via Prioritized Repairs
⢠Problem: existing policies are ad-hoc; how to
expose a language for user declaration?
⢠[Fagin, K, Reiss, Vansummeren 2014]: spanner
formalism for declarative cleaning
⢠Key: prioritized repairs [Staworko, et al. 12]
⢠Idea: Extend extraction programs with
â Denial constraints: which facts are in conflict?
â Priority declarations: preference between facts
⢠Captures SystemT, GATE, WHISK, POSIX, âŚ
⢠We are now trying to improve our understanding
of prioritized repairsâŚ
34
35. Prioritized Repairs: Definition
Database
Denial
Constraints
Collection of facts Which sets of facts
cannot co-exist?
Priority
Relation
Binary âis preferred toâ
relation
⢠[Arenas, Bertossi, Chomicki 99]: Inconsistent DB
represents a set of (equally likely) ârepairsâ
ď§ Then we can ask for the âpossibleâ or âconsistentâ query answers
⢠[Staworko, Chomicki, Marcinkowski 12] add priorities:
⢠Let A and B be two consistent subsets of the database
⢠Say that A improves B if we can obtain A from B by a
âprofitableâ exchange of facts (precision laterâŚ)
⢠A repair is a consistent subset that cannot be improved
Inconsistent Database Instance
35
36. Example
professor university city
Monica ubiobio ConcepciĂłn
Monica carleton Ottawa
Jorge uchile Santiago
Jorge ubiobio Santiago
Pablo uchile Santiago
Violated constraints (functional
dependencies):
⢠professor ď university, city
(âkey constraintâ)
⢠university ď city
professor university city
Monica ubiobio ConcepciĂłn
Monica carleton Ottawa
Jorge uchile Santiago
Jorge ubiobio Santiago
Pablo uchile Santiago
professor university city
Monica ubiobio ConcepciĂłn
Monica carleton Ottawa
Jorge uchile Santiago
Jorge ubiobio Santiago
Pablo uchile Santiago
âOrdinaryâ repairs [Arenas et al. 99]
Tuple priority ď some repairs can be discarded [Staworko et al.] 36
A improves B if we get A from B by removing tuples & adding
tuple; each removed preferred to by some added
37. Complexity of Testing Improvability
Theorem:
ď§ In the case of a single functional dependency
or two keys per relation, improvability can be
tested in polynomial time
ď§ In any other combination of FDs, the
problem is NP-complete!
university faculty dean
UChile Economics Agosin
Technion CS Yavneh
Stanford Law Magill
two keys
37
Can a consistent subset be improved?
Recent work (unpublished)
w/ Fagin & Kolaitis
38. IE with Recurring Patterns
I want to buy my advisor a gift.
I really want to buy a gift to my advisor.
I want to buy a gift to the secretary and to my advisor.
1. Apply
dependency
parsing
38
[Zhang, Baldwin, Ho, K, Li, ACL13]: Restoring grammar in social media, sms, etc.
39. IE with Recurring Patterns
I want to buy my advisor a gift.
I really want to buy a gift to my advisor.
I want to buy a gift to the secretary and to my advisor.
I
want
buy
gift advisor
1. Apply
dependency
parsing
2. Find freq.
recurring
patterns
39
[Zhang, Baldwin, Ho, K, Li, ACL13]: Restoring grammar in social media, sms, etc.
41. Complexity Study
⢠Naturally, there has been a lot of work on this problem
â SPIN [Huan et al. 04], MARGIN [Thomas et al. 10], âŚ
⢠But little was known about the computational complexity
⢠Studied: impact of assumptions on comp. complexity
â Graph properties (e.g., trees, treewidth, etc.)
â Label repeatability
â Bounded #results desired
â Bounded threshold
⢠This work led to novel complexity results and a new
methodologies for mining maximal subgraphs
â [K & Kolaitis, ACM PODSâ13, ACM TODS]
⢠Next, some complexity nuggets ď
41
42. Complexity Nuggets
⢠Good news: If labels do not repeat in each input
graph, then there are PTime solutions when
â The threshold is bounded; or
â Graphs are trees & few results are desired
⢠In general graphs w/o label repetition, you can
find 2 results in PTime
â Bad news: But finding 3rd is NP-hard!
â Bad news: And if labels repeat and graphs are
trees, then finding 2nd is already NP-hard!
⢠Even for a bounded threshold
42
43. Improving Dictionaries w/ Feedback
text fragments
(sentences, tables, rows, âŚ)
join
IBM , San Jose
company
occurrences
address
occurrencescompanies, countries, âŚ
Apple , CupertinoIBM , Armonk
IE IE
IE
auto. suggest a âgoodâ
fix to the IE program
Web data
âgoodâ = small effect
on other results
Yahoo! , Cupertino Goo
43
44. View Updates
⢠View-update problem: Translate an update on a view to
an update on the base relations
⢠Deletion propagation as a special case
â Update is delete(a set of view tuples)
⢠Motivation:
â Classic: database/view maintenance
⢠DB access only through views, hidden join keys, etc.
â Debugging
⢠[K&al.12]: deletion propagation for debugging text extractors
â Database causality [Meliou&al.10]
⢠Intuition: good propagation provides a good explanation of why we
have the tuples to begin with
⢠[Bertossi, Salimi 14]: âUnifying Causality, Diagnosis,
Repairs and View-Updates in Databasesâ
44
45. Example: File Access
GroupFile
group file
ai a.txt
ai b.txt
db a. txt
db b.txt
os a.txt
UserGroup
user group
Emma ai
Emma db
Olivia os
Olivia db
Jacob ai
Access(u,f) :â UserGroup(u,g), GroupFile(g,f)
Delete source rows, s.t. Emma wonât access a.txt.
But, maintain maximum access permissions!
[Cui&Widom01; Buneman&al.02]
Access
user file
Emma a.txt
Emma b.txt
Olivia a.txt
Olivia b.txt
Jacob a.txt
Jacob b.txt
= â
45
46. Example: File Access
= â
GroupFile
group file
ai a.txt
ai b.txt
db a. txt
db b.txt
os a.txt
UserGroup
user group
Emma ai
Emma db
Olivia os
Olivia db
Jacob ai
Access
user file
Emma a.txt
Emma b.txt
Olivia a.txt
Olivia b.txt
Jacob a.txt
Jacob b.txt
Access(u,f) :â UserGroup(u,g), GroupFile(g,f)
[Cui&Widom01; Buneman&al.02]
Delete source rows, s.t. Emma wonât access a.txt.
But, maintain maximum access permissions!
46
47. Example: File Access
GroupFile
group file
ai a.txt
ai b.txt
db a. txt
db b.txt
os a.txt
UserGroup
user group
Emma ai
Emma db
Olivia os
Olivia db
Jacob ai
Access
user file
Emma a.txt
Emma b.txt
Olivia a.txt
Olivia b.txt
Jacob a.txt
Jacob b.txt
= â
Access(u,f) :â UserGroup(u,g), GroupFile(g,f)
[Cui&Widom01; Buneman&al.02]
Delete source rows, s.t. Emma wonât access a.txt.
But, maintain maximum access permissions!
Decision variant is NP-complete [Buneman et al. 02]
47
48. Trichotomy in Complexity
We have established a precise (easily testable) criterion
that partition all cases into 3 categories:
1. The problem is solvable in PTime, and even via a
straightforward algorithm [Buneman et al. 2001]
2. The problem is NP-hard, but constant-ratio
approximable in PTime (ILP relaxation)
3. The problem is inapproximable for every ratio
Fix a schema (w/ fds) and a CQ w/o self joins
What is the complexity of finding a solution with a minimal side effect?
[K, Vondrak, Williams, Woodruff, PODS11, PODS12, TODS12, VLDB14]
48
49. ⢠Text Analytics in the Big Data Era
⢠Information Extraction Systems & Formalism
⢠Foundational Research Challenges
⢠Conclusions and Outlook
Outline
49
50. Summary
⢠Text analytics & IE
⢠Rule systems for IE
⢠A formal framework for rules, relating IE to
traditional DB concepts such as Datalog
⢠Research directions motivated by IE
â Prioritized repairs
â Graph mining
â Update propagation
50
51. Outlook: DB w/ Deep Text Support
⢠We need a uniform & elegant data/query model to
combine structured data & text; usefulness for querying
both text and relations
⢠We need a principled, simple & transparent probability
model + effective quality + practical execution cost
⢠We need to balance between automation and control:
from full specification by experts to feature generation for
nonexperienced
â Maximally realize the potential of every developer!
â LogicBlox is working on incorporating ML in Datalog!
51
53. Room for Both
Statistical
Solution
Rule
System
Feature Engineering
Model Space, Runtime
Cleaning + Post Proc.
Cleaning + Post Proc.
Building blocks
(e.g., dictionaries, NER)
âWhat doesnât work: Anything requiring high
precision and full automationâ
Feldman & Ungar, KDDâ08 tutorial on text mining
53
54. String DB, Spanners, Interval Algebra
Kaspersky Kaspersky
Intel Otellini
IBM Rometty
[10,20) [16,26)
[32,37) [50,58)
[105,108) [121,128)
[10,20) [16,26)
[32,37) [50,58)
[105,108) [121,128)
String Databases Interval AlgebraSpanners
Atomic value: string Atomic value: span
(pointing to doc)
Atomic value: interval
(no text)
Join by string conditions
(e.g., x is a substring of y)
Join by interval conditions
(e.g., x is a sub-interval of y)
Join by interval+string
conditions (e.g., x a
token in y)
Apps: text predicates in DBs
[Grahne & al. 99] [Benedikt &
al. 03], string manipulation
[Bonner & Mecca 98]
[Ginsburg and Wang 98]
App: IE Apps: temporal reasoning
[Allen 83] [Vilain & Kautz
86] [Nebel & BĂźrckert 95]
[Krokhin et al. 03]
54
55. 55
Imp. 1: Connection to Known Concepts
⢠Connection to Recognizable Relations [Berstel 79]
â These are unions of cross products of regular languages
â THM: The class of regular spanners is closed under
a string-selection predicate iff the predicate is a
recognizable relation
⢠Connection to CRPQs [Cruz et al. 87]
â Conjunctive Regular Path Queries have been studied as a
query language for labeled graphs
â THM: Regular spanners have the same expressive
power as unions of CRPQs on paths âwith marked
endpointsâ
⢠Up to some simple and necessary adaptation between the models
S I G M O D
Path with marked endpoints
56. Imp. 2: Adding String Equality
NR Datalog w/ regex formulas
Regular Spanners
Regularstr= Spanners
+ String-equality predicate
(+substring-of, prefix-of, âŚ)
âŚapplication from Jane Doe,
social 012-345-6789, on Mar
20th⌠identified as John Doe,
012-345-6789, ask us toâŚ
x1 x2
[117,125)
(Jane Doe)
[875,883)
(John Doe)
⎠âŽ
NameSSN(x,y) := âŚ
SameSSN(x1,x2) := NameSSN(x1,y1) , NameSSN(x2,y2) , str(y1)=str(y2)
Same string,
different spans
56
57. Difference with String Equality
⢠Are regularstr= spanners closed under difference?
â Why should they? Only positive operators are usedâŚ
â However, regex formulas (our EDBs) can introduce
ânegativeâ operations (NFAs closed under complement)
⢠THM: The class of regular spanners is closed under
difference
⢠PROP: The class of regularstr= spanners is closed
under string-inequality selection
⢠THM: The class of regularstr= spanners is closed
under string-containment selection, but then, not
under non-string-containment selection!
⢠COR: The class of regularstr= is not closed under
difference
57
58. Formal Optimization Problem
Fixed: ⢠Schema S w/ fun. dependencies
⢠Conjunctive query Q
Input: ⢠Database instance I over S
⢠Set Aâ Q(I) of answers to delete
Output: J â I s.t. Q(J) ⊠A = â
Goal: Minimize |(Q(I) â A) â Q(J)|
Side Effect
58