An Answer Set Programming based framework for High-Utility Pattern Mining extended with Facets and Advanced Utility Functions
1. The 5th International Joint
conference on Rules and Reasoning
Sept. 8–15, 2021
(virtually in) Leuven, Belgium
An Answer Set Programming based framework for High-Utility
Pattern Mining extended with Facets and Advanced Utility
Functions
Francesco Cauteruccio and Giorgio Terracina
DEMACS, University of Calabria, Italy
{cauteruccio, terracina}@mat.unical.it
2. Outline
• Context and Motivation
• Proposed Framework
• ASP Approach
• Experimental Evaluation
• Conclusion
1/17
3. Context and Motivation
• Pattern Mining is one of the most studied data mining branches
• Find interesting patterns (set of items) in a database of transactions
• Frequent pattern mining, sequential pattern mining, etc…
• High-Utility Pattern Mining (HUPM)
• Find patterns having a high-utility (w.r.t. some utility measure)
• Example: in a sales database, the utility of a pattern may be represented by the profit of some items bought
together.
• Basic assumption: each item is associated with one, static utility .
• However…
• The utility of an item can be defined from very different point of views,
• Transactions are not only flat lists of items but they can provide different level of abstractions.
Pattern Mining and High-Utility Pattern Mining
2/17
4. Context and Motivation
• We present a framework for HUPM extending basic notions and introducing:
• A higher level of abstraction with transaction set representation
• For each transaction, an Object, a Container and a Database level of aggregation can be defined.
• The notion of facet
• A facet is an attribute which can be associated with an item, a transaction, an object or a container;
• Each element may be characterized by more than one facet.
• A taxonomy of extended utility functions
• Based on the database structure and facets,
• They can be combined in several ways to fit different notions of utility.
An extended framework for HUPM
3/17
5. Proposed Framework
• The database representation is multi-layer
• 𝐷𝑎𝑡𝑎𝑏𝑎𝑠𝑒 → 𝐶𝑜𝑛𝑡𝑎𝑖𝑛𝑒𝑟 → 𝑂𝑏𝑗𝑒𝑐𝑡 → 𝑇𝑟𝑎𝑛𝑠𝑎𝑐𝑡𝑖𝑜𝑛
• Given a database 𝐷 and a set of transactions {𝑇!, … , 𝑇"}
• 𝐷 is organized as a set of containers 𝐶 = {𝐶!, … , 𝐶"}
• 𝐶# can be associated with a set of objects 𝑂 = {𝑂!, … , 𝑂$}
• 𝑂% contains a set of transactions {𝑇!, … , 𝑇&}
Extending the database structure
Running example depicting a sales database
5/17
6. Proposed Framework
• The utility of 𝑖 may be defined from
different perspectives: facets.
• Each item 𝑖 can be associated with
one or more facets.
• Facets can also be defined for
transactions, objects and containers.
• To describe facets, we use the notion
of utility vector.
Facets
6/17
7. Proposed Framework
• Item utility vector for an item 𝑖
• 𝐼𝑈! = [𝑖𝑢", 𝑖𝑢#, … , 𝑖𝑢$], and 𝑖𝑢% describes a certain facet of 𝑖
• Transaction utility vector for a transaction 𝑇#
• 𝑇𝑈&!
= [𝑡𝑢", 𝑡𝑢#, … , 𝑡𝑢'], and 𝑡𝑢% describes a certain facet of 𝑇(
• Internal utility of 𝑖 in 𝑇( is available and denoted as 𝑞(𝑖, 𝑇()
• Object utility vector
• 𝑂𝑈) = 𝑜𝑢", 𝑜𝑢#, … , 𝑜𝑢* , and 𝑜𝑢% describes a certain facet of 𝑂
• Container utility vector
• 𝐶𝑈+ = [𝑐𝑢", 𝑐𝑢#, … , 𝑐𝑢,], and 𝑐𝑢% describes a certain facet of 𝐶
Facets and utility vectors
facets: price, weight
7/17
8. Proposed Framework
• Intra-pattern utility function
• Let 𝑃 be a pattern with 𝑟 items, and 𝑇( be a transaction containing it,
• Let 𝐼𝑈𝑆 = {𝐼𝑈", … , 𝐼𝑈-} be the set of item utility vectors of 𝑃
• The intra-pattern utility function 𝐼𝑈&!
= 𝑖𝑝(𝑃, 𝑇(, 𝐼𝑈𝑆) generates an
unique item utility vector for the pattern occurrence.
• 𝑖𝑝 ⋅ can be any function combining the utilities across the facets,
such as SUM, MAX, MIN, AVG, etc…
• Pattern utility function
• The occurrence utility vector for a pattern 𝑃 is 𝑂𝑐𝑐𝑈&!
=
[𝐼𝑈&!
, 𝑇𝑈&!
, 𝑂𝑈&!
, 𝐶𝑈&!
]
• The pattern utility vector 𝑈. is the collection of all the occurrence
utility vectors of 𝑃: 𝑈. = ⋃&!∈&"
𝑂𝑐𝑐𝑈&!
• 𝑈. is a matrix (occurrence utility vectors × facets).
Advanced utility functions
In this example, 𝑖𝑝 = ∑ 𝑓𝑎𝑐𝑒𝑡 × 𝑖𝑛𝑡𝑒𝑟𝑛𝑎𝑙𝑈𝑡𝑖𝑙𝑖𝑡𝑦
8/17
9. Proposed Framework
• The utility 𝑢 of a pattern 𝑃 can be obtained as an arbitrary
combination of the values of 𝑈! using a function 𝑢(𝑃)
• 𝑢(𝑃) can be classified as
• Horizontal first 𝑢 𝑃 = 𝑓?(𝑓@(𝑢.)) combines by row (facets), then by
column (occurrences)
• Vertical first 𝑢 𝑃 = 𝑓@(𝑓?(𝑢.)) combines by column (occurrences),
then by row (facets)
• Mixed 𝑢 𝑃 = 𝑓 𝑢. combines the values at once
• All of these classifications can be further classified in
• Inter-transaction utility
• Pattern-vs-object utility
• Pattern-vs-container utility
• These can be exploited to define utility measures relating
item/transaction facets and one of the object/container facets,
such as Pearson and Multiple correlation.
Advanced utility functions
As an example, suppose we select 𝑓# =
𝑓𝑖𝑙𝑡𝑒𝑟 ⋅ and 𝑓$ = max ⋅ , that is we first
filter for a single facet, and then we take
the maximum across all the occurrences.
9/17
10. Proposed Framework
• Given a pattern 𝑃, we say that 𝑃 is an extended high-utility pattern if its utility 𝑢(𝑃) is greater than
a minimum threshold 𝑡ℎ$ and it occurs in at least 𝑡ℎ% transactions.
• The problem of extended high-utility pattern mining (e-HUPM) is to discover all the extended
high-utility patterns in a given database 𝐷.
The e–HUPM problem
10/17
11. ASP Approach
• We want to provide as much flexibility as possible in the definition of what is a useful pattern.
• Encoding the problem in Answer Set Programming (ASP) helps achieving the desidered flexibility
and modularity.
• Classic guess-and-check scenario
• We generate one answer set for each valid pattern,
• Complex utility functions are executed by means of external functions (e.g., in DLVHEX, WASP, clingo)
• Pattern validity criteria and filter can be easily applied by encoding them as rules.
The why and the how
11/17
13. Experimental Evaluation
• Aspect-based sentiment analysis of scientific reviews [1]
• Each review is annotated with 8 different aspects: appropriataness, clarity, originality,
empirical/theoretical soudness, meaningful comparison, substance, impact and recommendation
• Each aspect has assigned one sentiment value from {𝑝𝑜𝑠𝑖𝑡𝑖𝑣𝑒, 𝑛𝑒𝑔𝑎𝑡𝑖𝑣𝑒, 𝑛𝑒𝑢𝑡𝑟𝑎𝑙, 𝑎𝑏𝑠𝑒𝑛𝑡}.
• 814 papers and a total of 1148 reviews.
• 2230 annotated lines with a total of 15124 distinct words.
Dataset
[1] Chakraborty, S., Goyal, P., Mukherjee, A.: Aspect-based sentiment analysis of scientific reviews. In: JCDL ’20: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020, Virtual Event, China, August 1-5,
2020. pp. 207–216 (2020), ACM
13/17
14. Experimental Evaluation
• Average running time
• 𝑢(𝑃): disagreement between originality and decision (% of
pattern occurrences showing a positive originality and a
reject decision, thresholds: [15, 50, 75, 100])
Quantitative analysis
• Comparison with HUPM systems
• No analysis involving more than one
elements can be devised with such
systems
• Here, the utility of an item is the absolute
value of the appropriateness aspect in
the sentence it first appears in.
14/17
15. Experimental Evaluation
• 𝑢 𝑃 : Pearson correlation between an aspect 𝑋
and the final decision on the corresponding
paper.
• «The set of review sentences containing 𝑃 are
characterized by a direct correlation between the
sentiment on 𝑋 and the final decision on the
corresponding paper»
• 𝑢(𝑃): Multiple correlation between two aspects
𝑋 and 𝑌 and the final decision on the
corresponding paper.
Qualitative analysis
15/17
16. Conclusion
• We introduced a general framework for HUPM with several extensions.
• The framework allows to work with multi-dimensional data and with different utility measures.
• A versatile and modular ASP encoding has been developed.
• We employed a real use case on paper review to carry out both quantitative and qualitative
analyses.
• Facets and advanced utility functions help reducing the amount of relevant patterns
• Useful in providing deep insights on the data.
• Not an ending point!
• Apply the framework to new contexts,
• Derive ad-hoc algorithms for the extended utility functions.
16/17
17. Thanks for your attention!
These slides available at https://bit.ly/ehupm-ruleml21
Francesco Cauteruccio
Research Fellow @ DEMACS, University of Calabria
cauteruccio@mat.unical.it
francescocauteruccio.info
@finalfire