Information-theoretic co-clustering
Authors: Inderjit S. Dhillon, Subramanyam Mallela and Dharmendra S. Modha
Conference: ACM SIGKDD ’03, August 24-27, 2003, Washington, DC
Presenter: Meng-Lun Wu

Outline
  • Introduction
  • Problem Formulation
  • Co-Clustering Algorithm
  • Experimental Results
  • Conclusions and Future Work

Introduction
Clustering is a fundamental tool in unsupervised learning. Most clustering algorithms focus on one-way clustering: they cluster one dimension of a data table (for example, the rows) based on similarity along the other dimension.

It is often desirable to co-cluster, that is, to cluster both dimensions of the table simultaneously. A non-negative contingency table, once normalized, can be viewed as a joint probability distribution between two discrete random variables. The optimal co-clustering is the one that retains the largest mutual information between the clustered random variables.

Equivalently, the optimal co-clustering is the one that minimizes the loss in mutual information. The mutual information of two random variables measures their mutual dependence. Formally, it is defined as:
$$I(X;Y) = \sum_{x}\sum_{y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}$$
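
The definition of mutual information can be checked numerically. The following is a minimal sketch, assuming the joint distribution is stored as a list of rows; the function name `mutual_information` and the two toy distributions are illustrative and do not come from the slides.

```python
import math

def mutual_information(p, base=2.0):
    """I(X;Y) for a joint distribution p given as a list of rows."""
    px = [sum(row) for row in p]          # row marginals p(x)
    py = [sum(col) for col in zip(*p)]    # column marginals p(y)
    total = 0.0
    for i, row in enumerate(p):
        for j, pxy in enumerate(row):
            if pxy > 0.0:                 # terms with p(x,y) = 0 contribute 0
                total += pxy * math.log(pxy / (px[i] * py[j]), base)
    return total

# Hypothetical toy distributions: independent vs. perfectly dependent bits.
independent = [[0.25, 0.25], [0.25, 0.25]]
dependent = [[0.5, 0.0], [0.0, 0.5]]
mi_independent = mutual_information(independent)
mi_dependent = mutual_information(dependent)
```

With base 2 the result is in bits: independent variables share no information, while two perfectly coupled fair bits share one bit.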

The Kullback-Leibler (K-L) divergence measures the difference between two probability distributions. Given the true probability distribution p(x,y) and another distribution q(x,y), it is defined as:
$$D(p\,\|\,q) = \sum_{x}\sum_{y} p(x,y)\,\log\frac{p(x,y)}{q(x,y)}$$
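
A short sketch of the K-L divergence for two joint distributions of the same shape may help make the asymmetry concrete. The function name `kl_divergence` and the example distributions are assumptions chosen for illustration only.

```python
import math

def kl_divergence(p, q, base=2.0):
    """D(p || q) for two joint distributions given as lists of rows."""
    total = 0.0
    for p_row, q_row in zip(p, q):
        for pxy, qxy in zip(p_row, q_row):
            if pxy > 0.0:
                if qxy <= 0.0:
                    return math.inf       # q assigns zero mass where p does not
                total += pxy * math.log(pxy / qxy, base)
    return total

# Hypothetical example: a diagonal distribution against a uniform one.
diagonal = [[0.5, 0.0], [0.0, 0.5]]
uniform = [[0.25, 0.25], [0.25, 0.25]]
d_diag_uniform = kl_divergence(diagonal, uniform)
d_uniform_diag = kl_divergence(uniform, diagonal)
d_self = kl_divergence(diagonal, diagonal)
```

The divergence is zero only when the two distributions agree, and it is not symmetric: here one direction is finite and the other is infinite.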

Problem formulation
Let X and Y be discrete random variables taking values in {x1,…,xm} and {y1,…,yn}, and let p(X, Y) denote their joint probability distribution. Let the k clusters of X be {x̂1, x̂2, . . . , x̂k} and the l clusters of Y be {ŷ1, ŷ2, . . . , ŷl}.

Definition: an optimal co-clustering minimizes
$$I(X;Y) - I(\hat{X};\hat{Y})$$
subject to constraints on the number of row and column clusters. For a fixed co-clustering (C_X, C_Y), the loss in mutual information can be written as
$$I(X;Y) - I(\hat{X};\hat{Y}) = D\big(p(X,Y)\,\|\,q(X,Y)\big)$$
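
q(X,Y) is a distribution of the form
$$q(x,y) = p(\hat{x},\hat{y})\,p(x\mid\hat{x})\,p(y\mid\hat{y}), \qquad x\in\hat{x},\ y\in\hat{y}$$
Example (figure omitted; the values below are the marginals recovered from the slide):
p(y) = (0.18, 0.18, 0.14, 0.14, 0.18, 0.18), p(ŷ) = (0.5, 0.5)
p(x) = (0.15, 0.15, 0.15, 0.15, 0.2, 0.2), p(x̂) = (0.3, 0.3, 0.4)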

Co-clustering algorithm
Input: the joint probability distribution p(X,Y), the desired number of row clusters k, and the desired number of column clusters l.
Output: the partition functions C_X† and C_Y†.
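
The slide text gives only the input and output of the algorithm. The following is a minimal sketch of one alternating scheme that fits that interface: each row is moved to the row cluster whose prototype q(Y|x̂) is closest in K-L divergence, then the same is done for columns, and the two half-steps repeat until the loss stops decreasing. The update rule, the function names, the tie-breaking tolerance and the 6×6 example matrix are all assumptions for illustration, not details taken from the slides.

```python
import math

def cluster_stats(p, own, other, k_own, k_other):
    """Row/column marginals of p plus cluster-level marginals and joint."""
    pr = [sum(r) for r in p]
    pc = [sum(c) for c in zip(*p)]
    joint = [[0.0] * k_other for _ in range(k_own)]
    for i, r in enumerate(p):
        for j, v in enumerate(r):
            joint[own[i]][other[j]] += v
    prh = [sum(r) for r in joint]
    pch = [sum(c) for c in zip(*joint)]
    return pr, pc, prh, pch, joint

def loss(p, rows, cols, k, l):
    """D(p || q) in bits, with q(x,y) = p(xh,yh) p(x|xh) p(y|yh)."""
    pr, pc, prh, pch, joint = cluster_stats(p, rows, cols, k, l)
    d = 0.0
    for i, r in enumerate(p):
        for j, v in enumerate(r):
            if v > 0.0:
                a, b = rows[i], cols[j]
                q = joint[a][b] * (pr[i] / prh[a]) * (pc[j] / pch[b])
                d += v * math.log2(v / q)
    return d

def reassign(p, own, other, k_own, k_other):
    """Move each row of p to the cluster with the closest prototype."""
    pr, pc, prh, pch, joint = cluster_stats(p, own, other, k_own, k_other)
    new = list(own)
    for i, r in enumerate(p):
        if pr[i] == 0.0:
            continue                      # nothing to compare for empty rows
        costs = []
        for a in range(k_own):
            cost = 0.0 if prh[a] > 0.0 else math.inf   # skip empty clusters
            for j, v in enumerate(r):
                if v > 0.0 and cost != math.inf:
                    b = other[j]
                    q = (joint[a][b] / prh[a]) * (pc[j] / pch[b])
                    cond = v / pr[i]      # p(y | x)
                    cost = cost + cond * math.log2(cond / q) if q > 0.0 else math.inf
            costs.append(cost)
        if min(costs) < costs[own[i]] - 1e-12:   # move only on strict improvement
            new[i] = costs.index(min(costs))
    return new

def co_cluster(p, rows, cols, k, l, max_iter=20):
    """Alternate row and column reassignment; return labels and loss history."""
    rows, cols = list(rows), list(cols)
    pt = [list(c) for c in zip(*p)]       # transpose, so columns act as rows
    history = [loss(p, rows, cols, k, l)]
    for _ in range(max_iter):
        rows = reassign(p, rows, cols, k, l)
        cols = reassign(pt, cols, rows, l, k)
        history.append(loss(p, rows, cols, k, l))
        if history[-2] - history[-1] < 1e-12:
            break
    return rows, cols, history

# Assumed 6x6 example distribution (not reproduced in the slide text).
P = [
    [0.05, 0.05, 0.05, 0.00, 0.00, 0.00],
    [0.05, 0.05, 0.05, 0.00, 0.00, 0.00],
    [0.00, 0.00, 0.00, 0.05, 0.05, 0.05],
    [0.00, 0.00, 0.00, 0.05, 0.05, 0.05],
    [0.04, 0.04, 0.00, 0.04, 0.04, 0.04],
    [0.04, 0.04, 0.04, 0.00, 0.04, 0.04],
]
fixed_rows, fixed_cols, fixed_hist = co_cluster(P, [0, 0, 1, 1, 2, 2], [0, 0, 0, 1, 1, 1], 3, 2)
_, _, mixed_hist = co_cluster(P, [0, 1, 2, 0, 1, 2], [0, 1, 0, 1, 0, 1], 3, 2)
```

Starting from the block-structured partition, no row or column moves, so that partition is a fixed point of the sketch. From an arbitrary start the loss history never increases, but the result depends on the initialization and may be only a local minimum.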

Worked example (figures omitted). The figures step through the row-cluster labels (x̂1, x̂2, x̂3) and the column-cluster labels (ŷ1, ŷ2); the last figure reports D(p||q) = 0.02881.
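
The reported divergence can be reproduced with a short sketch. The 6×6 joint distribution and the cluster assignments below are assumptions: they are not in the slide text, but they are consistent with the marginals p(x), p(y), p(x̂) and p(ŷ) listed earlier. The sketch builds q(x,y) = p(x̂,ŷ) p(x|x̂) p(y|ŷ), computes D(p||q) with base-10 logarithms, and compares it with the loss in mutual information.

```python
import math

# Assumed joint distribution p(X, Y); it matches the marginals on the slide.
P = [
    [0.05, 0.05, 0.05, 0.00, 0.00, 0.00],
    [0.05, 0.05, 0.05, 0.00, 0.00, 0.00],
    [0.00, 0.00, 0.00, 0.05, 0.05, 0.05],
    [0.00, 0.00, 0.00, 0.05, 0.05, 0.05],
    [0.04, 0.04, 0.00, 0.04, 0.04, 0.04],
    [0.04, 0.04, 0.04, 0.00, 0.04, 0.04],
]
ROWS = [0, 0, 1, 1, 2, 2]    # assumed row-cluster index of each x
COLS = [0, 0, 0, 1, 1, 1]    # assumed column-cluster index of each y
K, L = 3, 2

def mutual_information(joint, base):
    """Mutual information of a joint distribution given as a list of rows."""
    pa = [sum(r) for r in joint]
    pb = [sum(c) for c in zip(*joint)]
    return sum(v * math.log(v / (pa[i] * pb[j]), base)
               for i, r in enumerate(joint)
               for j, v in enumerate(r) if v > 0.0)

px = [sum(r) for r in P]                  # p(x)
py = [sum(c) for c in zip(*P)]            # p(y)

# Cluster-level joint p(x_hat, y_hat) and its marginals.
p_hat = [[0.0] * L for _ in range(K)]
for i, r in enumerate(P):
    for j, v in enumerate(r):
        p_hat[ROWS[i]][COLS[j]] += v
pxh = [sum(r) for r in p_hat]             # p(x_hat)
pyh = [sum(c) for c in zip(*p_hat)]       # p(y_hat)

# q(x, y) = p(x_hat, y_hat) * p(x | x_hat) * p(y | y_hat)
Q = [[p_hat[ROWS[i]][COLS[j]] * (px[i] / pxh[ROWS[i]]) * (py[j] / pyh[COLS[j]])
      for j in range(6)] for i in range(6)]

kl_base10 = sum(v * math.log10(v / Q[i][j])
                for i, r in enumerate(P)
                for j, v in enumerate(r) if v > 0.0)
mi_loss_base10 = mutual_information(P, 10) - mutual_information(p_hat, 10)
```

Under these assumptions the base-10 divergence comes out at about 0.0288, in line with the value on the slide, and it equals the loss in mutual information up to rounding error. The same quantity expressed in bits is about 0.0957.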

Experimental results
The experiments use various subsets of the 20-Newsgroup data (NG20). "1D-clustering" denotes document clustering without any word clustering.
Evaluation measures:
  • Micro-averaged precision
  • Micro-averaged recall
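
As a rough illustration of the first measure, the sketch below computes micro-averaged precision for a clustering of single-labeled documents. It assumes each cluster is labeled with its majority class, which is a common convention but is not stated on the slide; the function name and the toy labels are hypothetical.

```python
from collections import Counter

def micro_averaged_precision(true_labels, cluster_ids):
    """Fraction of documents whose class matches their cluster's majority class."""
    by_cluster = {}
    for label, cid in zip(true_labels, cluster_ids):
        by_cluster.setdefault(cid, Counter())[label] += 1
    # Documents agreeing with the majority class of their cluster count as correct.
    correct = sum(counts.most_common(1)[0][1] for counts in by_cluster.values())
    return correct / len(true_labels)

# Hypothetical toy data: six documents, three classes, three clusters.
labels = ["a", "a", "a", "b", "b", "c"]
clusters = [0, 0, 1, 1, 1, 2]
precision = micro_averaged_precision(labels, clusters)
```

When every document has exactly one label and is assigned to exactly one cluster, micro-averaged precision and micro-averaged recall coincide, so a single number summarizes both.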

Conclusions and future work
The information-theoretic co-clustering algorithm is guaranteed to reach a local minimum of its objective in a finite number of steps. The formulation applies to the joint distribution of two random variables. In this paper, the numbers of row and column clusters are pre-specified. We hope that an information-theoretic regularization procedure may allow the number of clusters to be selected automatically.
