2. Agenda
• Motivation
• Summary of existing approaches
• Support computations
• Comparison and Evaluation
3. Background
• Frequent subgraph mining
– Graph-transaction setting (for graph datasets)
• Many small graphs
– Single-graph setting
• One big graph
• New problem for single-graph setting
– Definition of support
4. Challenge
• Difficulty of defining the support in a large
graph
– The anti-monotone property is required for pruning
the search space
• Anti-monotone
– A ⊂ B ⇒ sup(A) ≥ sup(B)
5. Subgraph Support
• The most intuitive definition
– Count of embeddings in input graph
• Not anti-monotone
[Figure: example patterns with their counts of embeddings: 1, 2, 2]
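The failure of anti-monotonicity can be reproduced on a tiny example. The sketch below is an illustration, not the code from the talk: the function `count_embeddings`, the toy graph, and the node labels are all assumptions made here. It counts label- and edge-preserving injective mappings by brute force, which equals the number of embeddings for these two patterns because neither has a non-trivial symmetry.

```python
from itertools import permutations

def count_embeddings(p_labels, p_edges, g_labels, g_edges):
    """Count injective mappings of pattern nodes to graph nodes that
    preserve node labels and map every pattern edge onto a graph edge."""
    g_edge_set = {frozenset(e) for e in g_edges}
    p_nodes = list(p_labels)
    count = 0
    for image in permutations(list(g_labels), len(p_nodes)):
        m = dict(zip(p_nodes, image))
        labels_ok = all(p_labels[u] == g_labels[m[u]] for u in p_nodes)
        edges_ok = all(frozenset((m[u], m[v])) in g_edge_set for u, v in p_edges)
        if labels_ok and edges_ok:
            count += 1
    return count

# Hypothetical input graph: one "a" node connected to two "b" nodes.
g_labels = {1: "a", 2: "b", 3: "b"}
g_edges = [(1, 2), (1, 3)]

# Pattern A: a single "a" node. Pattern B: an a-b edge, so A is a subgraph of B.
single_a = count_embeddings({0: "a"}, [], g_labels, g_edges)
edge_ab = count_embeddings({0: "a", 1: "b"}, [(0, 1)], g_labels, g_edges)
print(single_a, edge_ab)  # 1 2
```

The smaller pattern has fewer embeddings than the larger one that contains it, so the plain embedding count violates sup(A) ≥ sup(B).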
6. Motivation
• Suggest a new definition of support for
subgraphs such that
– The resulting support is anti-monotone
– The support can be computed efficiently
• Three support computation algorithms
– Overlap based (2)
– Minimum image based (1)
7. Agenda
• Motivation
• Summary of existing approaches
• Support computations
– Simple overlap (overlap based method)
– Harmful overlap (overlap based method)
– Minimum image
• Comparison and Evaluation
8. Overlap based support
• Support = size of the maximum independent set (MIS) of the overlap graph
– Find overlaps between embeddings
– Find the size of the maximum independent node set
9. Overlap
• Two embeddings overlap if they share at least one node
• V1 ∩ V2 ≠ ∅
(V1, V2: node sets of the two embeddings)
An embedding is an occurrence of the pattern
10. Overlap Graph
• O = (V_O, E_O)
– V_O: the set of embeddings, used as the node set
– E_O = { (f1, f2) | f1, f2 ∈ V_O ∧ f1 ≢ f2 ∧ V1 ∩ V2 ≠ ∅ }
(V1, V2: node sets of f1 and f2)
– If two embeddings share at least one node,
the corresponding nodes of the overlap graph are connected
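The definition above can be sketched in a few lines. This is a minimal illustration, not the implementation from the talk: it assumes each embedding is given as the set of input-graph nodes it covers, and the name `overlap_graph` and the sample embeddings are made up for the example.

```python
from itertools import combinations

def overlap_graph(embeddings):
    """Build the overlap graph of a list of embeddings.

    Each embedding is a set of input-graph nodes. The overlap graph has
    one node per embedding (its index in the list) and an edge between
    two distinct embeddings whenever their node sets intersect.
    """
    nodes = list(range(len(embeddings)))
    edges = {
        (i, j)
        for i, j in combinations(nodes, 2)
        if set(embeddings[i]) & set(embeddings[j])
    }
    return nodes, edges

# Hypothetical embeddings: the first two share node 2, the third is disjoint.
example_embeddings = [{1, 2}, {2, 3}, {4, 5}]
example_nodes, example_edges = overlap_graph(example_embeddings)
print(example_nodes, example_edges)  # [0, 1, 2] {(0, 1)}
```

Testing all pairs takes a quadratic number of set intersections in the number of embeddings.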
11. Maximum Independent Set Support
• Independent node set of a graph G = (V, E)
– I ⊆ V with ∀u, v ∈ I: (u, v) ∉ E
– A maximum independent node set need not be
unique
[Figure: two example graphs whose maximum independent node sets have size 1 and size 2]
• MIS-support = size of maximum independent
node set
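As a rough illustration of MIS-support, the sketch below finds the size of a maximum independent node set by exhaustive search. It is an assumption-laden toy, not the algorithm used in the talk: the function name `mis_support` and the two example graphs are invented here, and the search is exponential, so it only suits very small overlap graphs.

```python
from itertools import combinations

def mis_support(num_nodes, edges):
    """Return the size of a maximum independent node set.

    Nodes are 0 .. num_nodes - 1 and edges are pairs of nodes.
    Subsets are tried from largest to smallest; the first subset
    with no internal edge gives the answer.
    """
    edge_set = {frozenset(e) for e in edges}
    for size in range(num_nodes, 0, -1):
        for subset in combinations(range(num_nodes), size):
            if all(frozenset(pair) not in edge_set
                   for pair in combinations(subset, 2)):
                return size
    return 0

# Hypothetical overlap graphs: a triangle and a path on three nodes.
triangle = mis_support(3, [(0, 1), (1, 2), (0, 2)])
path = mis_support(3, [(0, 1), (1, 2)])
print(triangle, path)  # 1 2
```

In the triangle every pair of embeddings overlaps, so only one can be counted. In the path the two end nodes are independent.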
12. Harmful Overlap Support (1/3)
• MIS-support
– Considers every overlap as harmful
• Overlap is not necessarily harmful
– The anti-monotone property is what matters
15. Note
• Harmful overlap is a weaker concept than
simple overlap
– HO-support is never lower than MIS-support
16. Experiment
• Support computation implemented as part of the
MoSS (Molecular Substructure Miner) program
– IC93 dataset [7]
• 1283 molecules form a connected component
– Tic-Tac-Toe win dataset
• This consists of 626 connected components
17. Result
• Vertical axis: Number of frequent subgraphs
whose support exceeds the threshold
• Horizontal axis: Number of nodes (of pattern)?
• In the case of IC93
– Up to 30% more
• Due to heavy overlap
among carbon atoms
• In the case of Tic-Tac-Toe
– Around 5% more
18. Agenda
• Motivation
• Summary of existing approaches
• Support computations
– Simple overlap
– Harmful overlap
– Minimum image
• Comparison and Evaluation
19. Minimum image based definition
• Minimum image based support of p in g
– For each node of p, count the unique nodes of g it is mapped to
– The support is the minimum of these counts over all nodes of p
[Figure: example graph with a table of embeddings (column "Embeddings") and unique image counts (column "Unique")]
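The minimum image based support can be sketched as follows. This is an illustrative reading of the definition, not the code behind the talk: it assumes every embedding is given as a dict from pattern node to input-graph node, and the name `min_image_support` and the sample embeddings are hypothetical.

```python
def min_image_support(embeddings):
    """Minimum image based support of a pattern.

    embeddings: list of dicts, each mapping every pattern node to the
    input-graph node it is embedded on. For each pattern node, collect
    the distinct graph nodes it is mapped to, then take the minimum of
    these counts over all pattern nodes.
    """
    if not embeddings:
        return 0
    pattern_nodes = embeddings[0].keys()
    return min(len({emb[v] for emb in embeddings}) for v in pattern_nodes)

# Hypothetical pattern with nodes "x" and "y" and two embeddings.
# "x" always lands on graph node 1, "y" lands on graph nodes 2 and 3.
example = [{"x": 1, "y": 2}, {"x": 1, "y": 3}]
support = min_image_support(example)
print(support)  # 1
```

Only one set of images per pattern node is kept here, and no pairwise overlap test or independent set search is involved.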
20. Benefits
I. O(N) data instead of O(N²) overlaps
II. No NP-complete MIS problem
III. Not necessary to compute all occurrences,
only the images of all nodes
21. Agenda
• Motivation
• Summary of existing approaches
• Support computations
– Simple overlap
– Harmful overlap
– Minimum image
• Comparison and Evaluation
25. Experimental Setting
• Comparison of image-based and overlap-based
algorithms
• Dataset
– WebKB dataset (4 large graphs representing the structure of web
pages)
27. Conclusion
• Conclusion
– Overlap based support measure that is
anti-monotone
– Minimum image based algorithm that is more
efficient than previous ones