A Framework for Online Clustering
Based on Evolving Semi-Supervision
Guilherme Alves, Maria Camila N. Barioni and Elaine Faria
Universidade Federal de Uberlândia
Outline
■ Introduction
■ The CABESS Framework
■ Pointwise CABESS
■ Experiments and Results
■ Conclusion
6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 2
Introduction
Motivation
■ The desired organization for the data may change over time.
Semi-supervised approaches may be useful for guiding clustering
algorithms in the adaptation process.
■ Additional information (semi-supervision) may change over time.
■ It may cause clustering transitions: birth, split, or merge.
Introduction
Objective and Research Questions
Goal: to provide a framework that is able to use and maintain
semi-supervision correctly to enable efficient and effective online
clustering processes.
Q1. How much does semi-supervision aid clustering effectiveness when
there are external clustering transitions over time?
Q2. Are there major differences between clustering effectiveness when
using semi-supervised clustering approaches based on feedback and
labels?
Q3. How does varying the feedback window size affect semi-supervision
information and clustering effectiveness?
Q4. How efficient is our approach compared to existing semi-supervised
approaches?
Introduction
Example
[Figure: four snapshots along a time axis starting at 𝑡0, shown in three rows: I. Clustering transition (a split is marked), II. Partition set, III. Ground truth. The numbers in each panel are the cluster labels of individual instances.]
Introduction
Main Contributions
■ The introduction of CABESS (Cluster Adaptation Based on
Evolving Semi-Supervision)
■ A framework for online clustering using semi-supervision in the form
of feedback.
■ A strategy that extracts semi-supervision information from
feedback given in the form of labels.
■ An approach to keep the labels consistent over time.
The CABESS Framework
[Figure: the CABESS pipeline at time 𝑡. Five numbered modules, (1) Summarization, (2) Clustering, (3) Semi-Supervision Deduction, (4) Semi-supervised Clustering and (5) Transition detection, are connected to a Partition Set Storage through two decision nodes, A and B, each with true (T) and false (F) branches. The diagram also carries the time-indexed symbols 𝒟, ℱ, ℛ and 𝒮.]
■ Decision A: verify if new instances have been generated or the user has been satisfied with the cluster quality.
■ Decision B: verify if it is the first clustering performed.
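The control flow suggested by the diagram can be sketched as follows. This is a minimal sketch, not the authors' implementation: the function name `cabess_step`, the five callables, and the reading of decision B (plain clustering on the first run, semi-supervised clustering with transition detection afterwards) are all assumptions inferred from the module numbering.

```python
from typing import Callable, Dict, List, Optional


def cabess_step(
    data: List[tuple],
    feedback: List[dict],
    previous_partition: Optional[Dict[int, int]],
    summarize: Callable,
    cluster: Callable,
    deduce_supervision: Callable,
    semi_supervised_cluster: Callable,
    detect_transitions: Callable,
):
    """One iteration of a CABESS-like loop (all names are hypothetical)."""
    summary = summarize(data)                                # module 1
    if previous_partition is None:                           # decision B: first clustering?
        partition = cluster(summary)                         # module 2
        transitions = []
    else:
        # Assumed order of modules 3, 4 and 5 on later iterations.
        labels = deduce_supervision(summary, feedback, previous_partition)
        partition = semi_supervised_cluster(summary, labels)
        transitions = detect_transitions(previous_partition, partition)
    return partition, transitions
```

Passing the modules in as callables keeps the skeleton independent of any particular summarization or clustering algorithm, which matches the framework-versus-instantiation split used in the slides.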
Pointwise CABESS
[Figure: the CABESS pipeline instantiated with concrete algorithms.]
■ (1) Summarization: BIRCH [Zhang et al. 1996]
■ (2) Clustering: DBScan [Ester et al. 1996]
■ (3) Semi-Supervision Deduction
■ (4) Semi-supervised Clustering: SSDBScan [Lelis and Sander 2009]
■ (5) Transition detection: MONIC [Spiliopoulou et al. 2006]
The CABESS Framework » Pointwise CABESS
Extracting labels from feedback
1. Feedback → (extraction) → Instance-level labels
■ A neighborhood 𝑁 is defined as a set of instances that must be in the same cluster.
■ Same previous cluster AND received positive feedback → same label.
■ Received displacement feedback → assign the label of the neighborhood of the destination rule.
[Figure, panels (a) to (c): instances marked with positive feedback (✓) and the instance-level labels 1 and 2 extracted from them.]
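The two extraction rules above can be sketched as follows. This is a minimal sketch under stated assumptions: the feedback encoding (`("positive", instance)` and `("move", instance, destination)`), the function name `extract_labels`, and the choice to use the previous cluster id as the neighborhood label are illustrative and do not come from the slides.

```python
from typing import Dict, Iterable, Tuple, Union

# Hypothetical feedback encoding: ("positive", instance) or ("move", instance, destination).
Feedback = Union[Tuple[str, int], Tuple[str, int, int]]


def extract_labels(previous: Dict[int, int], feedback: Iterable[Feedback]) -> Dict[int, int]:
    """Map instance id -> label from positive and displacement feedback.

    Assumption: a neighborhood is identified by the id of the cluster it
    came from, so that id doubles as the neighborhood's label.
    """
    labels: Dict[int, int] = {}
    for item in feedback:
        kind, instance = item[0], item[1]
        if kind == "positive":
            # Confirmed instances of the same previous cluster share a label.
            labels[instance] = previous[instance]
        elif kind == "move":
            # Displaced instances take the label of the destination neighborhood.
            labels[instance] = item[2]
        else:
            raise ValueError(f"unknown feedback kind: {kind!r}")
    return labels


previous = {0: 1, 1: 1, 2: 1, 3: 2, 4: 2}
feedback = [("positive", 0), ("positive", 1), ("move", 2, 2), ("positive", 3)]
labels = extract_labels(previous, feedback)
```

Instance 4 received no feedback, so it stays unlabeled and is left to the clustering algorithm.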
The CABESS Framework » Pointwise CABESS
Extracting labels from feedback
2. Instance-level labels → (deduction) → Summarized-level labels
■ Performed as a propagation task.
■ If a summarized instance contains labeled instances with different labels, split it to obtain purified summarized instances.
■ Purified summarized instances: summarized instances whose labeled instances all share the same label.
[Figure, panels (c) and (d): instance-level labels 1 and 2 before and after deduction to the summarized level.]
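The purification step can be illustrated with a small sketch. The slides do not say how unlabeled members are distributed when a summarized instance is split, so the nearest-labeled-member rule below is an assumption, as are the function name `purify` and the data layout.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def purify(members: Sequence[int], points: Dict[int, Point],
           labels: Dict[int, int]) -> Dict[Optional[int], List[int]]:
    """Split one summarized instance into label-pure groups.

    Returns {label: member ids}. A summarized instance without any
    labeled member is returned unchanged under the key None.
    """
    labeled = [m for m in members if m in labels]
    if not labeled:
        return {None: list(members)}
    groups: Dict[Optional[int], List[int]] = defaultdict(list)
    for m in members:
        if m in labels:
            groups[labels[m]].append(m)
        else:
            # Assumption: propagate the label of the closest labeled member.
            nearest = min(labeled, key=lambda other: math.dist(points[m], points[other]))
            groups[labels[nearest]].append(m)
    return dict(groups)


points = {0: (0.0, 0.0), 1: (0.1, 0.0), 2: (1.0, 1.0), 3: (0.9, 1.0)}
groups = purify([0, 1, 2, 3], points, {0: 1, 2: 2})
```

A summarized instance whose labeled members already agree comes back as a single group, so the function can be applied to every summarized instance without a separate purity check.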
The CABESS Framework » Pointwise CABESS
Dealing with obsolete labels
■ Obsolete labels: labels assigned to instances whose clusters no longer exist.
■ Minimizing the problem: adoption of a transition detector.
■ Better neighborhood management.
■ When a cluster survives, both its neighborhood and its associated labels are preserved.
[Figure: after a split, the original cluster does not exist at 𝑡, so label 2 becomes obsolete.]
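A minimal sketch of how a transition detector can be used to drop obsolete labels is given below. It is loosely modeled on overlap-based detection in the spirit of MONIC, but it is an unweighted simplification: the function names, the thresholds `tau` and `tau_split`, and the assumption that a label id equals the id of the cluster it refers to are all illustrative.

```python
from typing import Dict, Set


def detect_transitions(old: Dict[int, Set[int]], new: Dict[int, Set[int]],
                       tau: float = 0.5, tau_split: float = 0.2) -> Dict[int, str]:
    """Classify each old cluster as 'survives', 'split' or 'disappears'."""
    result: Dict[int, str] = {}
    for cid, members in old.items():
        if not members:
            result[cid] = "disappears"
            continue
        # Fraction of the old cluster found in each new cluster.
        overlaps = [len(members & other) / len(members) for other in new.values()]
        if any(o > tau for o in overlaps):
            result[cid] = "survives"
        elif sum(o for o in overlaps if o >= tau_split) > tau:
            result[cid] = "split"
        else:
            result[cid] = "disappears"
    return result


def drop_obsolete(labels: Dict[int, int], transitions: Dict[int, str]) -> Dict[int, int]:
    """Keep only the labels whose cluster survived (label id == old cluster id)."""
    return {i: lab for i, lab in labels.items() if transitions.get(lab) == "survives"}


old = {1: {0, 1, 2, 3}, 2: {4, 5, 6, 7}}
new = {1: {0, 1, 2, 3}, 3: {4, 5}, 4: {6, 7}}
transitions = detect_transitions(old, new)
kept = drop_obsolete({0: 1, 4: 2, 6: 2}, transitions)
```

In this toy run cluster 1 survives and cluster 2 is split in half, so every label 2 is discarded while label 1 is preserved, mirroring the case shown on the slide.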
Experiments
Datasets
Name      # instances   d    # classes   Reference               Type
DB7       9,050         2    8           [Silva et al. 2015]     Synthetic
SYN3      5,000         2    3           streamMOA               Synthetic
SYN4      10,000        3    5           streamMOA               Synthetic
FROGS     1,484         8    4           [Colonna et al. 2016]   Real
IPEA      5,564         5    27          IPEA                    Real
KDD’995   24,692        19   11          UCI                     Real
Experiments
Comparison Methods
■ Unsupervised (DBScan)
  ■ Periodically executes a clustering algorithm without any semi-supervision.
■ Semi-supervised (SSDBScan)
  ■ Static approach
    ■ Periodically applies a semi-supervised clustering algorithm.
    ■ Never discards a label over time.
  ■ Window-based approach
    ■ A variation of the static approach.
    ■ Instead of running the clustering algorithm over the whole label set, old labels are removed first.
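The difference between the static and window-based baselines comes down to which labels are handed to the semi-supervised algorithm. A minimal sketch, assuming each label is stored with the timestamp at which it was given (the function name `window_labels` and this storage layout are hypothetical):

```python
from typing import Dict, Tuple


def window_labels(labels: Dict[int, Tuple[int, int]], now: int, window: int) -> Dict[int, int]:
    """Keep only the labels issued within the last `window` timestamps.

    `labels` maps instance id -> (label, timestamp at which it was given).
    """
    return {i: lab for i, (lab, ts) in labels.items() if now - ts < window}


stamped = {0: (1, 0), 1: (1, 3), 2: (2, 5)}
recent = window_labels(stamped, now=5, window=3)
```

The static approach corresponds to a window large enough that no label is ever dropped.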
Experiments
Protocol and Evaluation
■ Adjusted Rand Index (ARI)
■ Prequential protocol
  ■ For each timestamp 𝑡, only one label is considered valid for a data instance, according to the grouping tree.
■ Online arrival of instances and feedback
  ■ The arrival of data instances and of user feedback follows a uniform distribution.
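For reference, the Adjusted Rand Index can be computed from the contingency counts of two labelings. The sketch below uses only the standard library; the function name and the convention of returning 1.0 for degenerate inputs are choices made here, not taken from the slides.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(truth: Sequence[Hashable], pred: Sequence[Hashable]) -> float:
    """Adjusted Rand Index between two labelings of the same instances."""
    if len(truth) != len(pred):
        raise ValueError("label sequences must have the same length")
    # Pairs of instances that are together in both labelings, in truth, and in pred.
    sum_cells = sum(comb(n, 2) for n in Counter(zip(truth, pred)).values())
    sum_rows = sum(comb(n, 2) for n in Counter(truth).values())
    sum_cols = sum(comb(n, 2) for n in Counter(pred).values())
    total = comb(len(truth), 2)
    if total == 0:
        return 1.0  # fewer than two instances
    expected = sum_rows * sum_cols / total
    maximum = (sum_rows + sum_cols) / 2
    if maximum == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_cells - expected) / (maximum - expected)


same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

The index ignores the actual label values and compares only how instances are grouped, which is why it suits a setting where cluster identifiers change after splits and merges.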
Experimental Results
Semi-supervised vs. Unsupervised
Q1. How much does semi-supervision aid clustering effectiveness when
there are external clustering transitions over time?
Experimental Results
Feedback vs. Labels
Q2. Are there major differences between clustering effectiveness when
using semi-supervised clustering approaches based on feedback and labels?
Experimental Results
Feedback Window Size Variation
Q3. How does varying the feedback window size affect semi-supervision
information and clustering effectiveness?
Experimental Results
Efficiency
Q4. How efficient is our approach compared to existing semi-supervised
approaches?
Conclusion
■ Higher efficiency than other semi-supervised approaches
  ■ while keeping equivalent effectiveness.
■ Future work:
  ■ Exploring other types of semi-supervision information, such as instance-level constraints.
  ■ Investigating other strategies for detecting transitions.
References
Colonna, J. G., Gama, J., and Nakamura, E. F. (2016). Recognizing Family, Genus, and Species of Anuran Using a Hierarchical Classification Approach. Pages 198–212. Springer, Cham.
Ester, M., Kriegel, H.-P., Sander, J., and Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD, pages 226–231. AAAI Press.
Lai, H. P., Visani, M., Boucher, A., and Ogier, J.-M. (2014). A new interactive semi-supervised clustering model for large image database indexing. Pattern Recognition Letters, 37(1):94–106.
Lelis, L. and Sander, J. (2009). Semi-supervised Density-Based Clustering. In IEEE ICDM, pages 842–847.
Spiliopoulou, M., Ntoutsi, I., Theodoridis, Y., and Schult, R. (2006). MONIC: Modeling and Monitoring Cluster Transitions. In ACM SIGKDD, page 706, New York, NY, USA. ACM Press.
Zhang, T., Ramakrishnan, R., and Livny, M. (1996). BIRCH: An Efficient Data Clustering Method for Very Large Databases. ACM SIGMOD Record, 25(2):103–114.
A Framework for Online
Clustering Based on
Evolving Semi-Supervision
Guilherme Alves guilhermealves@ufu.br
Maria Camila N. Barioni camila.barioni@ufu.br
Elaine Faria elaine@ufu.br
Acknowledgments

Weitere ähnliche Inhalte

Ähnlich wie A Framework for Online Clustering Based on Evolving Semi-supervision

A Hybrid Memory Data Cube Approach for High Dimension Relations
A Hybrid Memory Data Cube Approach for High Dimension RelationsA Hybrid Memory Data Cube Approach for High Dimension Relations
A Hybrid Memory Data Cube Approach for High Dimension RelationsRodrigo Rocha Silva
 
UKSG Conference 2015 - E-resources: ezPAARSE helps you discover who is readin...
UKSG Conference 2015 - E-resources: ezPAARSE helps you discover who is readin...UKSG Conference 2015 - E-resources: ezPAARSE helps you discover who is readin...
UKSG Conference 2015 - E-resources: ezPAARSE helps you discover who is readin...UKSG: connecting the knowledge community
 
IGUANA: A Generic Framework for Benchmarking the Read-Write Performance of Tr...
IGUANA: A Generic Framework for Benchmarking the Read-Write Performance of Tr...IGUANA: A Generic Framework for Benchmarking the Read-Write Performance of Tr...
IGUANA: A Generic Framework for Benchmarking the Read-Write Performance of Tr...Lixi Conrads
 
Identity & Access Management Briefing
Identity & Access Management BriefingIdentity & Access Management Briefing
Identity & Access Management BriefingCharise Arrowood
 
Multiple Sequence Alignment Tool Using NCBI COBALT
Multiple Sequence Alignment Tool Using NCBI COBALTMultiple Sequence Alignment Tool Using NCBI COBALT
Multiple Sequence Alignment Tool Using NCBI COBALTMohsin Raza
 
From Cardinal(ity) Sins to Cost-Efficient Metrics Aggregation
From Cardinal(ity) Sins to Cost-Efficient Metrics AggregationFrom Cardinal(ity) Sins to Cost-Efficient Metrics Aggregation
From Cardinal(ity) Sins to Cost-Efficient Metrics AggregationPaige Cruz
 
A solution to align corporate reporting frameworks: The case of GRI and CDP
A solution to align corporate reporting frameworks: The case of GRI and CDPA solution to align corporate reporting frameworks: The case of GRI and CDP
A solution to align corporate reporting frameworks: The case of GRI and CDPMaria Mora
 
Investigating Geographic Information System Technologies A Global Positioning...
Investigating Geographic Information System Technologies A Global Positioning...Investigating Geographic Information System Technologies A Global Positioning...
Investigating Geographic Information System Technologies A Global Positioning...Simon Sweeney
 
Adaptive Semantic Data Management Techniques for Federations of Endpoints
Adaptive Semantic Data Management Techniques for Federations of EndpointsAdaptive Semantic Data Management Techniques for Federations of Endpoints
Adaptive Semantic Data Management Techniques for Federations of EndpointsMaribel Acosta Deibe
 
Why Distributed Tracing is Essential for Performance and Reliability
Why Distributed Tracing is Essential for Performance and ReliabilityWhy Distributed Tracing is Essential for Performance and Reliability
Why Distributed Tracing is Essential for Performance and ReliabilityDevOps.com
 
Survey on Software Data Reduction Techniques Accomplishing Bug Triage
Survey on Software Data Reduction Techniques Accomplishing Bug TriageSurvey on Software Data Reduction Techniques Accomplishing Bug Triage
Survey on Software Data Reduction Techniques Accomplishing Bug TriageIRJET Journal
 
Sql portfolio admin_practicals
Sql portfolio admin_practicalsSql portfolio admin_practicals
Sql portfolio admin_practicalsShelli Ciaschini
 
Multimodal graph-based analysis over the DBLP repository: critical discoverie...
Multimodal graph-based analysis over the DBLP repository: critical discoverie...Multimodal graph-based analysis over the DBLP repository: critical discoverie...
Multimodal graph-based analysis over the DBLP repository: critical discoverie...Universidade de São Paulo
 
IRJET- Sampling Selection Strategy for Large Scale Deduplication of Synthetic...
IRJET- Sampling Selection Strategy for Large Scale Deduplication of Synthetic...IRJET- Sampling Selection Strategy for Large Scale Deduplication of Synthetic...
IRJET- Sampling Selection Strategy for Large Scale Deduplication of Synthetic...IRJET Journal
 
Impact of Asymmetry of Internet Traffic for Heuristic Based Classification
Impact of Asymmetry of Internet Traffic for Heuristic Based ClassificationImpact of Asymmetry of Internet Traffic for Heuristic Based Classification
Impact of Asymmetry of Internet Traffic for Heuristic Based ClassificationCSCJournals
 
Learning Software Performance Models for Dynamic and Uncertain Environments
Learning Software Performance Models for Dynamic and Uncertain EnvironmentsLearning Software Performance Models for Dynamic and Uncertain Environments
Learning Software Performance Models for Dynamic and Uncertain EnvironmentsPooyan Jamshidi
 
SDPM in Brazil - 2007
SDPM in Brazil - 2007SDPM in Brazil - 2007
SDPM in Brazil - 2007Peter Mello
 
Susanne Bartel | Breaking the Plateau with KMM
Susanne Bartel | Breaking the Plateau with KMM Susanne Bartel | Breaking the Plateau with KMM
Susanne Bartel | Breaking the Plateau with KMM Kanban Conferences
 

Ähnlich wie A Framework for Online Clustering Based on Evolving Semi-supervision (20)

A Hybrid Memory Data Cube Approach for High Dimension Relations
A Hybrid Memory Data Cube Approach for High Dimension RelationsA Hybrid Memory Data Cube Approach for High Dimension Relations
A Hybrid Memory Data Cube Approach for High Dimension Relations
 
UKSG Conference 2015 - E-resources: ezPAARSE helps you discover who is readin...
UKSG Conference 2015 - E-resources: ezPAARSE helps you discover who is readin...UKSG Conference 2015 - E-resources: ezPAARSE helps you discover who is readin...
UKSG Conference 2015 - E-resources: ezPAARSE helps you discover who is readin...
 
IGUANA: A Generic Framework for Benchmarking the Read-Write Performance of Tr...
IGUANA: A Generic Framework for Benchmarking the Read-Write Performance of Tr...IGUANA: A Generic Framework for Benchmarking the Read-Write Performance of Tr...
IGUANA: A Generic Framework for Benchmarking the Read-Write Performance of Tr...
 
Identity & Access Management Briefing
Identity & Access Management BriefingIdentity & Access Management Briefing
Identity & Access Management Briefing
 
Multiple Sequence Alignment Tool Using NCBI COBALT
Multiple Sequence Alignment Tool Using NCBI COBALTMultiple Sequence Alignment Tool Using NCBI COBALT
Multiple Sequence Alignment Tool Using NCBI COBALT
 
From Cardinal(ity) Sins to Cost-Efficient Metrics Aggregation
From Cardinal(ity) Sins to Cost-Efficient Metrics AggregationFrom Cardinal(ity) Sins to Cost-Efficient Metrics Aggregation
From Cardinal(ity) Sins to Cost-Efficient Metrics Aggregation
 
A solution to align corporate reporting frameworks: The case of GRI and CDP
A solution to align corporate reporting frameworks: The case of GRI and CDPA solution to align corporate reporting frameworks: The case of GRI and CDP
A solution to align corporate reporting frameworks: The case of GRI and CDP
 
Investigating Geographic Information System Technologies A Global Positioning...
Investigating Geographic Information System Technologies A Global Positioning...Investigating Geographic Information System Technologies A Global Positioning...
Investigating Geographic Information System Technologies A Global Positioning...
 
Adaptive Semantic Data Management Techniques for Federations of Endpoints
Adaptive Semantic Data Management Techniques for Federations of EndpointsAdaptive Semantic Data Management Techniques for Federations of Endpoints
Adaptive Semantic Data Management Techniques for Federations of Endpoints
 
Why Distributed Tracing is Essential for Performance and Reliability
Why Distributed Tracing is Essential for Performance and ReliabilityWhy Distributed Tracing is Essential for Performance and Reliability
Why Distributed Tracing is Essential for Performance and Reliability
 
Survey on Software Data Reduction Techniques Accomplishing Bug Triage
Survey on Software Data Reduction Techniques Accomplishing Bug TriageSurvey on Software Data Reduction Techniques Accomplishing Bug Triage
Survey on Software Data Reduction Techniques Accomplishing Bug Triage
 
Sql portfolio admin_practicals
Sql portfolio admin_practicalsSql portfolio admin_practicals
Sql portfolio admin_practicals
 
Multimodal graph-based analysis over the DBLP repository: critical discoverie...
Multimodal graph-based analysis over the DBLP repository: critical discoverie...Multimodal graph-based analysis over the DBLP repository: critical discoverie...
Multimodal graph-based analysis over the DBLP repository: critical discoverie...
 
IRJET- Sampling Selection Strategy for Large Scale Deduplication of Synthetic...
IRJET- Sampling Selection Strategy for Large Scale Deduplication of Synthetic...IRJET- Sampling Selection Strategy for Large Scale Deduplication of Synthetic...
IRJET- Sampling Selection Strategy for Large Scale Deduplication of Synthetic...
 
Bicod2017
Bicod2017Bicod2017
Bicod2017
 
BICOD-2017
BICOD-2017BICOD-2017
BICOD-2017
 
Impact of Asymmetry of Internet Traffic for Heuristic Based Classification
Impact of Asymmetry of Internet Traffic for Heuristic Based ClassificationImpact of Asymmetry of Internet Traffic for Heuristic Based Classification
Impact of Asymmetry of Internet Traffic for Heuristic Based Classification
 
Learning Software Performance Models for Dynamic and Uncertain Environments
Learning Software Performance Models for Dynamic and Uncertain EnvironmentsLearning Software Performance Models for Dynamic and Uncertain Environments
Learning Software Performance Models for Dynamic and Uncertain Environments
 
SDPM in Brazil - 2007
SDPM in Brazil - 2007SDPM in Brazil - 2007
SDPM in Brazil - 2007
 
Susanne Bartel | Breaking the Plateau with KMM
Susanne Bartel | Breaking the Plateau with KMM Susanne Bartel | Breaking the Plateau with KMM
Susanne Bartel | Breaking the Plateau with KMM
 

Kürzlich hochgeladen

Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfSumit Kumar yadav
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksSérgio Sacani
 
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...ssuser79fe74
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPirithiRaju
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)Areesha Ahmad
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learninglevieagacer
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPirithiRaju
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.Nitya salvi
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000Sapana Sha
 
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)Joonhun Lee
 
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...chandars293
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfrohankumarsinghrore1
 
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts ServiceJustdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Servicemonikaservice1
 
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLkantirani197
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bSérgio Sacani
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Lokesh Kothari
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .Poonam Aher Patil
 
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Bookingroncy bisnoi
 
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...Lokesh Kothari
 

Kürzlich hochgeladen (20)

Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
 
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
 
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
A Framework for Online Clustering Based on Evolving Semi-supervision

  • 1. A Framework for Online Clustering Based on Evolving Semi-Supervision. Guilherme Alves, Maria Camila N. Barioni and Elaine Faria, Universidade Federal de Uberlândia
  • 2. Outline (6-Oct-17, 32nd Brazilian Symposium on Databases, SBBD 2017)
      ■ Introduction
      ■ The CABESS Framework
      ■ Pointwise CABESS
      ■ Experiments and Results
      ■ Conclusion
  • 3. Introduction: Motivation
      ■ The desired organization of the data may change over time. Semi-supervised approaches may be useful for guiding clustering algorithms in the adaptation process.
      ■ The additional information (semi-supervision) may also change over time.
      ■ These changes may cause clustering transitions: birth, split, or merge.
  • 4-7. Introduction: Objective and Research Questions
      Goal: to provide a framework that uses and maintains semi-supervision correctly, enabling efficient and effective online clustering.
      ■ Q1. How well does semi-supervision aid clustering effectiveness when there are external clustering transitions over time?
      ■ Q2. Are there major differences in clustering effectiveness between semi-supervised clustering approaches based on feedback and those based on labels?
      ■ Q3. How does varying the feedback window size affect the semi-supervision information and clustering effectiveness?
      ■ Q4. How efficient is our approach compared to existing semi-supervised approaches?
  • 8-15. Introduction: Example. [Figure, built up over eight slides: a timeline starting at 𝑡0 with three stacked panels, (I) clustering transition, (II) partition set, and (III) ground truth. Each panel shows the cluster label (1 to 4) of every instance at successive timestamps; the later slides annotate a split transition.]
  • 16. Introduction: Main Contributions
      ■ CABESS (Cluster Adaptation Based on Evolving Semi-Supervision), a framework for online clustering that uses semi-supervision in the form of feedback.
      ■ A strategy that extracts semi-supervision information, in the form of labels, from feedback.
      ■ An approach to keep the labels consistent over time.
  • 17. The CABESS Framework. [Diagram: the CABESS pipeline receives data 𝒟 and feedback ℱ at each timestamp 𝑡 and runs five numbered steps: (1) Summarization, (2) Clustering, (3) Semi-Supervision Deduction, (4) Semi-supervised Clustering, (5) Transition detection. A Partition Set Storage keeps the partition sets across timestamps. Decision A: verify whether new instances have been generated or the user is satisfied with the cluster quality. Decision B: verify whether this is the first clustering performed.]
  • 18. The CABESS Framework » Pointwise CABESS. [Diagram: the same pipeline with each step instantiated: (1) BIRCH [Zhang et al. 1996], (2) DBScan [Ester et al. 1996], (3) Semi-Supervision Deduction, (4) SSDBScan [Lelis and Sander 2009], (5) MONIC [Spiliopoulou et al. 2006].]
  • 19-21. The CABESS Framework » Pointwise CABESS: Extracting labels from feedback
      1. Feedback → (extraction) → instance-level labels
      ■ A neighborhood 𝑁 is a set of instances that must be in the same cluster.
      ■ Instances that were in the same previous cluster and received positive feedback get the same label.
      ■ An instance that received displacement feedback is assigned the label of the neighborhood of its destination.
      [Figure panels (a) to (c): instances marked with positive feedback (✓), the neighborhoods they form, and the resulting instance-level labels 1 and 2.]
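The two extraction rules on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `extract_labels`, the input shapes, the choice to reuse the previous cluster id as the label, and the decision to leave an instance unlabeled when its destination has no neighborhood are all assumptions.

```python
from collections import defaultdict


def extract_labels(prev_cluster, positive, displaced):
    """Turn user feedback into instance-level labels (hypothetical sketch).

    prev_cluster: dict instance -> cluster id in the previous partition
    positive:     set of instances that received positive feedback
    displaced:    dict instance -> destination cluster id (displacement feedback)
    """
    # Rule 1: instances from the same previous cluster that received
    # positive feedback form one neighborhood and share one label.
    neighborhoods = defaultdict(set)
    for inst in positive:
        neighborhoods[prev_cluster[inst]].add(inst)

    # Assumption: the previous cluster id is reused as the label.
    labels = {}
    for cluster_id, members in neighborhoods.items():
        for inst in members:
            labels[inst] = cluster_id

    # Rule 2: a displaced instance takes the label of the neighborhood
    # of its destination. Assumption: if the destination has no
    # neighborhood yet, the instance stays unlabeled.
    for inst, dest in displaced.items():
        if dest in neighborhoods:
            neighborhoods[dest].add(inst)
            labels[inst] = dest
    return labels, dict(neighborhoods)


prev = {"a": 1, "b": 1, "c": 2, "d": 2, "e": 1}
labels, hoods = extract_labels(prev, positive={"a", "b", "c"},
                               displaced={"e": 2, "d": 3})
print(labels)
```

In the usage example, `e` is moved into the neighborhood of cluster 2, while `d` stays unlabeled because nothing confirms its destination.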
  • 22. The CABESS Framework » Pointwise CABESS: Extracting labels from feedback
      2. Instance-level labels → (deduction) → summarized-level labels
      ■ Deduction is performed as a propagation task.
      ■ If a summarized instance contains labeled instances with different labels, it is split to obtain purified summarized instances.
      ■ A purified summarized instance is one whose labeled instances all share the same label.
      [Figure panels (c) and (d): instance-level labels 1 and 2, and the summarized instances after purification.]
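A short Python sketch of the purification step follows. The slide only says that a mixed summarized instance is split; how the unlabeled members are divided is not stated, so assigning each one to the group of its nearest labeled member is an assumption made here for illustration, as are the name `purify` and the data layout.

```python
import math


def purify(members, points, labels):
    """Split one summarized instance until it is pure (hypothetical sketch).

    members: list of instance ids grouped in one summarized instance
    points:  dict instance id -> coordinate tuple
    labels:  dict instance id -> label (labeled instances only)
    Returns a list of (label or None, member ids) pairs.
    """
    labeled = [m for m in members if m in labels]
    distinct = {labels[m] for m in labeled}

    # Already pure (or entirely unlabeled): propagate the single label, if any.
    if len(distinct) <= 1:
        return [(next(iter(distinct), None), list(members))]

    # Mixed labels: one purified summarized instance per label.
    groups = {lab: [] for lab in distinct}
    for m in members:
        if m in labels:
            groups[labels[m]].append(m)
        else:
            # Assumption: an unlabeled member follows its nearest labeled one.
            nearest = min(labeled, key=lambda l: math.dist(points[m], points[l]))
            groups[labels[nearest]].append(m)
    return sorted(groups.items())


points = {"a": (0, 0), "b": (0, 1), "c": (5, 5), "d": (5, 6), "e": (1, 0)}
labels = {"a": 1, "c": 2}
print(purify(["a", "b", "c", "d", "e"], points, labels))
```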
  • 23. The CABESS Framework » Pointwise CABESS: Dealing with obsolete labels
      ■ Obsolete labels: labels assigned to instances whose clusters no longer exist.
      ■ Minimizing the problem: adopt a transition detector.
      ■ Better neighborhood management: when a cluster survives, both its neighborhood and the associated labels are preserved.
      [Figure: a cluster is split between two timestamps; it no longer exists at 𝑡, so label 2 becomes obsolete.]
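The idea of discarding obsolete labels can be illustrated with a simplified survival test. The sketch below is not MONIC itself: it treats a cluster as surviving when a single new cluster contains at least a fraction `tau` of its members, and `tau`, the function names, and the label-to-cluster mapping are assumptions introduced for the example.

```python
def surviving_clusters(old, new, tau=0.6):
    """Map each surviving old cluster to its best-matching new cluster.

    old, new: dict cluster id -> set of instance ids
    tau:      hypothetical overlap threshold (fraction of old members)
    """
    survivors = {}
    for old_id, old_members in old.items():
        if not old_members:
            continue
        best_id, best_overlap = None, 0.0
        for new_id, new_members in new.items():
            overlap = len(old_members & new_members) / len(old_members)
            if overlap > best_overlap:
                best_id, best_overlap = new_id, overlap
        if best_overlap >= tau:
            survivors[old_id] = best_id
    return survivors


def drop_obsolete(labels, label_to_cluster, survivors):
    """Keep only labels whose associated old cluster still survives."""
    return {inst: lab for inst, lab in labels.items()
            if label_to_cluster[lab] in survivors}


old = {1: {"a", "b", "c", "d"}, 2: {"e", "f", "g", "h"}}
new = {10: {"a", "b", "c", "d"}, 20: {"e", "f"}, 30: {"g", "h"}}
survivors = surviving_clusters(old, new)
kept = drop_obsolete({"a": 1, "e": 2}, {1: 1, 2: 2}, survivors)
print(survivors, kept)
```

In this example cluster 2 is split into two halves, so no single new cluster reaches the threshold and label 2 is dropped, which mirrors the situation shown on the slide.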
  • 24. Experiments: Datasets

      Name      # instances   d    # classes   Reference               Type
      DB7       9,050         2    8           [Silva et al. 2015]     Synthetic
      SYN3      5,000         2    3           streamMOA               Synthetic
      SYN4      10,000        3    5           streamMOA               Synthetic
      FROGS     1,484         8    4           [Colonna et al. 2016]   Real
      IPEA      5,564         5    27          IPEA                    Real
      KDD’995   24,692        19   11          UCI                     Real
  • 25. Experiments: Comparison Methods
      ■ Unsupervised (DBScan): periodically executes a clustering algorithm without any semi-supervision.
      ■ Semi-supervised (SSDBScan), static approach: periodically applies a semi-supervised clustering algorithm and never discards a label.
      ■ Semi-supervised (SSDBScan), window-based approach: a variation of the static approach that removes old labels instead of running the clustering algorithm over the whole label set.
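The difference between the static and window-based baselines comes down to which labels are handed to the semi-supervised algorithm at each run. A minimal sketch, assuming each label carries the timestamp at which it was given and that the window is measured in the same time units (the tuple layout and function name are hypothetical):

```python
def labels_in_window(labeled, now, window):
    """Keep only labels given fewer than `window` time units before `now`.

    labeled: list of (instance, label, timestamp) tuples
    """
    return [(inst, lab, ts) for inst, lab, ts in labeled if now - ts < window]


labeled = [("a", 1, 0), ("b", 2, 5), ("c", 1, 9)]

static_set = list(labeled)                                  # static: keep every label
window_set = labels_in_window(labeled, now=10, window=6)    # window-based: drop old ones
print(window_set)
```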
  • 26. Experiments: Protocol and Evaluation
      ■ Effectiveness measure: Adjusted Rand Index (ARI).
      ■ Prequential protocol: for each timestamp 𝑡, only one label is considered valid for a data instance, according to the grouping tree.
      ■ Online arrival of instances and feedback: data instances and user feedback arrive according to a uniform distribution.
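For reference, the Adjusted Rand Index compares a predicted partition with the ground truth by counting agreeing pairs and correcting for chance. The sketch below is a plain-Python version of the standard formula, shown only to make the measure concrete; returning 1.0 in the degenerate cases is a convention chosen here, and the slides do not say which implementation the authors used.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(truth, pred):
    """ARI between two labelings of the same instances (standard formula)."""
    n = len(truth)
    if n < 2:
        return 1.0  # convention for trivially small inputs

    # Pairs that fall in the same cell of the contingency table.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    # Pairs that fall in the same class / the same cluster.
    sum_truth = sum(comb(c, 2) for c in Counter(truth).values())
    sum_pred = sum(comb(c, 2) for c in Counter(pred).values())

    expected = sum_truth * sum_pred / comb(n, 2)
    maximum = (sum_truth + sum_pred) / 2
    if maximum == expected:
        return 1.0  # both partitions are trivial and identical in structure
    return (sum_cells - expected) / (maximum - expected)


same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])   # same grouping, renamed ids
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(same, crossed)
```

The measure ignores the cluster ids themselves, which matters in an online setting where ids can change between timestamps.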
  • 27-28. Experimental Results: Semi-supervised vs. Unsupervised. Q1. How well does semi-supervision aid clustering effectiveness when there are external clustering transitions over time? [Result plots]
  • 29-30. Experimental Results: Feedback vs. Labels. Q2. Are there major differences in clustering effectiveness between semi-supervised clustering approaches based on feedback and those based on labels? [Result plots]
  • 31. Experimental Results: Feedback Window Size Variation. Q3. How does varying the feedback window size affect the semi-supervision information and clustering effectiveness? [Result plots]
  • 32. Experimental Results: Efficiency. Q4. How efficient is our approach compared to existing semi-supervised approaches? [Result plots]
  • 33. Conclusion
      ■ Higher efficiency than other semi-supervised approaches, while keeping equivalent effectiveness.
      ■ Future work: exploring other types of semi-supervision information, such as instance-level constraints, and tackling other strategies for detecting transitions.
  • 34. References
      ■ Colonna, J. G., Gama, J., and Nakamura, E. F. (2016). Recognizing Family, Genus, and Species of Anuran Using a Hierarchical Classification Approach. Pages 198–212. Springer, Cham.
      ■ Ester, M., Kriegel, H.-P., Sander, J., and Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD, pages 226–231. AAAI Press.
      ■ Lai, H. P., Visani, M., Boucher, A., and Ogier, J.-M. (2014). A new interactive semi-supervised clustering model for large image database indexing. Pattern Recognition Letters, 37(1):94–106.
      ■ Lelis, L. and Sander, J. (2009). Semi-supervised Density-Based Clustering. In IEEE ICDM, pages 842–847.
      ■ Spiliopoulou, M., Ntoutsi, I., Theodoridis, Y., and Schult, R. (2006). MONIC: Modeling and Monitoring Cluster Transitions. In ACM SIGKDD, page 706, New York, NY, USA. ACM Press.
      ■ Zhang, T., Ramakrishnan, R., and Livny, M. (1996). BIRCH: An Efficient Data Clustering Method for Very Large Databases. ACM SIGMOD Record, 25(2):103–114.
  • 35. A Framework for Online Clustering Based on Evolving Semi-Supervision. Guilherme Alves (guilhermealves@ufu.br), Maria Camila N. Barioni (camila.barioni@ufu.br), Elaine Faria (elaine@ufu.br). Acknowledgments.