IOSR Journal of Computer Engineering (IOSR-JCE)
e-ISSN: 2278-0661, p-ISSN: 2278-8727, Volume 13, Issue 1 (Jul. - Aug. 2013), PP 27-30
www.iosrjournals.org
Correlation Preserving Indexing Based Text Clustering
Venkata Gopala Rao .S¹, A. Bhanu Prasad²
¹(M.Tech, Software Engineering, Vardhaman College of Engineering/ JNTU-Hyderabad, India)
²(Associate Professor, Department of IT, Vardhaman College of Engineering/ JNTU-Hyderabad, India)
Abstract: A document clustering method based on correlation preserving indexing (CPI) was presented
previously. It simultaneously maximizes the correlation between the documents in the local patches and
minimizes the correlation between the documents outside these patches. Consequently, a low-dimensional
semantic subspace is derived in which the documents corresponding to the same semantics are close to each
other. This paper extends the CPI method with a learning-level parsing procedure. The proposed method finds
the correlation between related documents based on the accuracy of sentences, which avoids a large number of
unknown clusters that do not help in finding the exact correlation between documents. The proposed CPI method
with the learning-level parsing procedure doubles the accuracy of the previous correlation coefficient in
document clustering. The behavior of the proposed hierarchical clustering algorithm is compared with that of
CPI in terms of normalized mutual information (NMI) and accuracy.
Index Terms—Document clustering, correlation measure, correlation latent semantic indexing, dimensionality
reduction.
I. Introduction
The aim of document clustering is to automatically group related documents into clusters. Document
clustering plays a vital role in machine learning and artificial intelligence and has received much attention
in recent years. A number of methods based on different measures have been proposed to handle document
clustering [4],[5],[6],[7],[8],[9]. Among these measures, the most frequently used is the Euclidean distance.
One method that uses the Euclidean distance is the k-means method, which minimizes the sum of the squared
Euclidean distances between the data points and their corresponding cluster centers.
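For reference, the quantity that k-means minimizes can be written as follows. The symbols are notation introduced here for illustration and are not taken from the text: x_i is a document vector, C_j is the j-th cluster, and \mu_j is its center.

$$
J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
$$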
Spectral clustering methods achieve a low computation cost: documents are first projected into a
low-dimensional semantic subspace, and a traditional clustering algorithm is then applied in that subspace.
Latent semantic indexing (LSI) [8] is one such method. It aims at finding the best subspace approximation
to the original document space by minimizing the global reconstruction error (Euclidean distance).
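One common way to state this criterion is as a low-rank approximation problem. In the sketch below, X (the term-document matrix), B (a candidate approximation), and the truncated singular value decomposition U_k \Sigma_k V_k^T are symbols assumed here for illustration; the text itself does not fix a notation.

$$
X_k = \arg\min_{\operatorname{rank}(B) \le k} \lVert X - B \rVert_F,
\qquad
X_k = U_k \Sigma_k V_k^{T}
$$

Here U_k \Sigma_k V_k^T keeps only the k largest singular values of X, and the documents are represented by their coordinates in the resulting k-dimensional subspace.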
The Euclidean distance is a dissimilarity measure, which describes the dissimilarities rather than the
similarities between documents. Hence, it is not able to capture the nonlinear manifold structure embedded in
the similarities between them. Locality preserving indexing (LPI) is a different clustering method based on
graph partitioning theory. LPI applies a weighting function to each pairwise distance in order to capture the
similarity structure rather than the dissimilarity structure of the documents. However, it does not overcome
the limitation of the Euclidean distance. Furthermore, the selection of the weighting function is often a
difficult task.
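To make the role of the weighting function concrete, the following minimal sketch turns pairwise Euclidean distances into similarity weights. The heat kernel exp(-d^2 / t) and the bandwidth parameter `t` are assumptions chosen for illustration; the text does not say which weighting function is used, and the function name `heat_kernel_weights` is hypothetical.

```python
import math

def heat_kernel_weights(docs, t=1.0):
    """Pairwise weights W[i][j] = exp(-||x_i - x_j||^2 / t).

    The heat kernel and the bandwidth t are assumptions for illustration,
    not a weighting function prescribed by the text.
    """
    n = len(docs)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(docs[i], docs[j]))
            weights[i][j] = math.exp(-sq_dist / t)
    return weights

# Three toy document vectors: the first two are close, the third is far away.
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
W = heat_kernel_weights(docs, t=1.0)
```

Changing `t` changes how quickly the weight decays with distance, which is one way to see why choosing the weighting function is difficult.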
A document clustering method based on correlation preserving indexing (CPI) was presented previously [1].
It simultaneously maximizes the correlation between the documents in the local patches and minimizes the
correlation between the documents outside these patches. Consequently, a low-dimensional semantic subspace is
derived in which the documents corresponding to the same semantics are close to each other. The previously
proposed algorithm finds the correlation between documents based on words only. We propose a CPI method with
a learning-level parsing procedure that finds the correlation between related documents based on the accuracy
of sentences, which avoids a large number of unknown clusters that do not help in finding the exact
correlation between documents. This roughly doubles the accuracy, since the previous correlation coefficient
offers only one way to find the correlation between documents, namely through words. In this paper, we find
the correlation between two documents with the help of parsers to gain accuracy at the learning level, and
then apply the correlation of the CPI method.
II. Related Work
Correlation preserving indexing: The semantic structure is usually implicit in the high-dimensional
document space. It is necessary to find a low-dimensional semantic subspace in which the semantic structure
can become clear. Hence, discovering the intrinsic structure of the document space is often an important task
of document clustering. Correlation as a similarity measure is suitable for capturing the manifold structure
embedded in the high-dimensional document space, because the manifold structure is often embedded in the
similarities between the documents. The correlation between two vectors (column vectors) u and v is defined as
$$\mathrm{Corr}(u, v) = \frac{u^{T} v}{\sqrt{u^{T} u}\,\sqrt{v^{T} v}}$$
The correlation corresponds to an angle θ such that
$$\cos\theta = \mathrm{Corr}(u, v).$$
The association between the vectors u and v is stronger when the value of Corr(u, v) is larger.
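The relation cos θ = Corr(u, v) suggests computing the correlation as the cosine of the angle between the two vectors. The sketch below does this for plain Python lists; returning 0 for an all-zero vector is a convention assumed here, and the toy vectors are made up for illustration.

```python
import math

def corr(u, v):
    """Correlation between two vectors, computed as the cosine of their angle."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumed convention for empty (all-zero) documents
    return dot / (norm_u * norm_v)

doc_a = [2.0, 1.0, 0.0]
doc_b = [4.0, 2.0, 0.0]  # same direction as doc_a, only longer
doc_c = [0.0, 0.0, 3.0]  # shares no terms with doc_a

strong = corr(doc_a, doc_b)
weak = corr(doc_a, doc_c)
```

Because the measure depends only on direction, `doc_a` and `doc_b` are maximally correlated even though their lengths differ, while `doc_a` and `doc_c` are uncorrelated.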
Online document clustering aims to group documents into clusters, which belongs to unsupervised
learning. It can be transformed into semi-supervised learning by using the following side information:
1. If two documents are close to each other in the original document space, then they tend to be grouped into
the same cluster [8].
2. If two documents are far away from each other in the original document space, they tend to be grouped
into different clusters.
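The two pieces of side information can be turned into neighbor sets (local patches) as in the following sketch. Measuring closeness by correlation, the parameter `n_neighbors`, and the helper name `local_patches` are assumptions made for illustration.

```python
import math

def corr(u, v):
    """Cosine of the angle between u and v (0 if either vector is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def local_patches(docs, n_neighbors=1):
    """For each document, return the indices of its most correlated neighbors."""
    patches = []
    for i, u in enumerate(docs):
        scored = [(corr(u, v), j) for j, v in enumerate(docs) if j != i]
        # Highest correlation first; ties are broken by the smaller index.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        patches.append([j for _, j in scored[:n_neighbors]])
    return patches

# Toy data: documents 0 and 1 are close, and documents 2 and 3 are close.
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
patches = local_patches(docs, n_neighbors=1)
```

Documents inside a patch are candidates for the same cluster, and documents outside it are candidates for different clusters.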
Document preprocessing: In document preprocessing, a set of documents is given as input to the
database. A particular document is then chosen at random from the database. From the randomly selected
documents, all unique words are identified and stop words are removed, in preparation for finding the
similarity between documents. Stemming is the process of reducing derived words to their stem, base, or root
form, generally a written word form. A stemming algorithm is a process in which the various forms of a word
are reduced to a common form, for example:
• Suffix removal to generate the word stem
• Grouping words
• Increasing relevance
Finally, term weighting is applied to support information retrieval and text categorization. Document
clustering groups together conceptually related documents, thus enabling the identification of duplicate words.
Fig 1. Document preprocessing
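The preprocessing steps described above (stop-word removal, stemming, term weighting) can be sketched as follows. The tiny stop-word list, the naive suffix rules (which are not the Porter stemmer), and the tf * log(N / df) weighting are all assumptions chosen for illustration; the text does not specify which stemmer or weighting scheme is used.

```python
import math
import re
from collections import Counter

STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}  # tiny illustrative list
SUFFIXES = ("ing", "ed", "es", "s")  # naive suffix rules, not a full stemmer

def stem(word):
    """Strip the first matching suffix if at least three letters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Lowercase, tokenize, drop stop words, and stem the remaining words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(tok) for tok in tokens if tok not in STOP_WORDS]

def tf_idf(token_lists):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    weighted = []
    for tokens in token_lists:
        counts = Counter(tokens)
        weighted.append({
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weighted

corpus = ["The clusters are grouped", "Grouping of the documents", "Indexing documents"]
tokens = [preprocess(doc) for doc in corpus]
weights = tf_idf(tokens)
```

In this toy corpus, "grouped" and "Grouping" are reduced to the same stem, and a term that occurs in only one document receives a larger weight than a term shared by two documents.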
III. Actual Work
Preprocessing: The document clustering method based on correlation preserving indexing (CPI)
explicitly considers the manifold structure embedded in the similarities between the documents. Its goal is to
find an optimal semantic subspace by simultaneously maximizing the correlations between the documents in the
local patches and minimizing the correlations between the documents outside these patches. This differs from
LSI and LPI, which are based on a dissimilarity measure (Euclidean distance) and focus on detecting the
intrinsic structure between widely separated documents rather than between nearby documents. The
similarity-measure-based CPI method, in contrast, focuses on detecting the intrinsic structure between nearby
documents rather than between widely separated documents. Since the intrinsic semantic structure of the
document space is often embedded in the similarities between the documents, CPI can effectively detect the
intrinsic semantic structure of the high-dimensional document space.
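The two quantities that CPI trades off can be evaluated for a given low-dimensional representation as in the sketch below. It only measures the summed correlations inside and outside the local patches; it does not solve the optimization problem, and the names `patch_correlations`, `embedded`, and `patches` as well as the toy data are assumptions for illustration.

```python
import math

def corr(u, v):
    """Cosine of the angle between u and v (0 if either vector is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def patch_correlations(embedded, patches):
    """Return (inside, outside): summed correlations of ordered document
    pairs that lie within, respectively outside, the local patches."""
    inside = 0.0
    outside = 0.0
    for i, y_i in enumerate(embedded):
        for j, y_j in enumerate(embedded):
            if i == j:
                continue
            if j in patches[i]:
                inside += corr(y_i, y_j)
            else:
                outside += corr(y_i, y_j)
    return inside, outside

# Hypothetical 2-D representations and neighbor sets for four documents.
embedded = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
patches = [[1], [0], [3], [2]]
inside, outside = patch_correlations(embedded, patches)
```

A representation that suits the CPI goal makes `inside` large and `outside` small, as is the case for this toy example.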
Correlation Preserving Indexing based document clustering: The semantic structure is
usually implicit in high-dimensional document space. It is desirable to find a low dimensional semantic
subspace in which the semantic structure can become clear. Hence, discovering the intrinsic structure of the
document space is often a primary task of document clustering. Since the manifold structure is often embedded
in the similarities between the documents, correlation as a similarity measure is suitable for capturing the
manifold structure embedded in the high-dimensional document space.
K-means on document sets: The k-means method is one of the methods that use the Euclidean distance.
It minimizes the sum of the squared Euclidean distances between the data points and their corresponding
cluster centers. Since the document space is always of high dimensionality, it is preferable to find a
low-dimensional representation of the documents to reduce the computational complexity.
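A minimal sketch of k-means (Lloyd's algorithm) on low-dimensional document vectors is given below. Passing the initial centers explicitly and running a fixed number of iterations are simplifications assumed here to keep the example deterministic; they are not prescribed by the text.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def k_means(points, centers, n_iter=10):
    """Lloyd's algorithm starting from the given initial centers."""
    centers = [list(c) for c in centers]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Toy 2-D points standing in for documents projected into a subspace.
points = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
labels, centers = k_means(points, centers=[points[0], points[2]])
```

Each iteration costs time proportional to the number of points, clusters, and dimensions, which is why a low-dimensional representation reduces the computation.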
Document classification into clusters: The aim of online document clustering is to group
documents into clusters, which belongs to unsupervised learning. Further, it can be transformed into
semi-supervised learning by using the following side information:
1. If two documents are close to each other in the original document space, then they tend to be grouped into
the same cluster.
2. If two documents are far away from each other in the original document space, they tend to be grouped into
different clusters.
Hierarchical clustering method: Hierarchical clustering methods group the data instances into a
tree of clusters. There are two major kinds of hierarchical clustering methods:
1. Agglomerative method
2. Divisive method
The agglomerative method forms the clusters in a bottom-up fashion. The divisive method splits the data
into smaller clusters in a top-down fashion. These hierarchical methods can be represented by using
dendrograms. These methods are known for their quick termination.
Agglomerative (bottom-up): In the agglomerative method, data comparison starts with single points
(singletons) and recursively merges two or more appropriate clusters. The comparison stops when k clusters
have been obtained.
Divisive (top-down): In the divisive method, data comparison starts with one big cluster and recursively
divides it into smaller clusters. The comparison stops when k clusters have been obtained.
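The agglomerative (bottom-up) variant can be sketched as follows. Single linkage and the squared Euclidean distance are assumptions chosen for illustration, since the text does not say which linkage criterion or distance is used; the function name `agglomerative` is hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def agglomerative(points, k):
    """Merge the two closest clusters (single linkage) until k clusters remain."""
    clusters = [[i] for i in range(len(points))]  # start from singletons
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                dist = min(
                    squared_distance(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# One-dimensional toy points; each cluster is a list of point indices.
points = [[0.0], [0.2], [5.0], [5.1], [9.0]]
clusters = agglomerative(points, k=2)
```

The sequence of merges corresponds to reading a dendrogram from the leaves upward and cutting it once k clusters remain.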
Semantic-based document mining: Fig. 2 illustrates semantic-understanding-based document mining,
which satisfies user needs that are acquired through mining processes such as document clustering, document
classification, and information retrieval. Semantic-understanding-based document mining undergoes a parsing
procedure, and the parsing step comprises semantic analysis to extract systematic structure descriptions. In
this paper, we propose a CPI-based learning-level parsing procedure, which improves the accuracy compared to
the CPI clustering algorithm.
Fig 2. Semantic based document mining
IV. Performance Analysis
This section evaluates the performance of the proposed correlation preserving indexing approach
against the existing approaches. Our analysis indicates that the proposed scheme works better than the
existing systems (LSI, LPI) in terms of memory, storage, generalization error, and performance. The
hierarchical clustering algorithm and the learning-level parsing enhance the accuracy and performance.
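The abstract names NMI and accuracy as the evaluation measures. The sketch below shows one way to compute both from true and predicted labels. Normalizing the mutual information by the larger of the two entropies, and finding the best cluster-to-class mapping by brute force (feasible only for a few clusters, and only when there are no more predicted clusters than true classes), are assumptions made here; the toy labels are invented and are not results from the paper.

```python
import math
from collections import Counter
from itertools import permutations

def entropy(labels):
    """Shannon entropy (in nats) of a labeling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def mutual_information(true_labels, pred_labels):
    """Mutual information (in nats) between two labelings."""
    n = len(true_labels)
    joint = Counter(zip(true_labels, pred_labels))
    true_counts = Counter(true_labels)
    pred_counts = Counter(pred_labels)
    return sum(
        (c / n) * math.log(c * n / (true_counts[t] * pred_counts[p]))
        for (t, p), c in joint.items()
    )

def nmi(true_labels, pred_labels):
    """Mutual information divided by the larger entropy (assumed normalization)."""
    denom = max(entropy(true_labels), entropy(pred_labels))
    if denom == 0.0:
        return 1.0  # assumed convention: both labelings are a single cluster
    return mutual_information(true_labels, pred_labels) / denom

def clustering_accuracy(true_labels, pred_labels):
    """Best fraction of matches over all mappings of predicted to true labels."""
    true_ids = sorted(set(true_labels))
    pred_ids = sorted(set(pred_labels))
    best = 0
    for perm in permutations(true_ids, len(pred_ids)):
        mapping = dict(zip(pred_ids, perm))
        hits = sum(mapping[p] == t for t, p in zip(true_labels, pred_labels))
        best = max(best, hits)
    return best / len(true_labels)

# Invented toy labels: one document of class 0 is placed in the wrong cluster.
true_labels = [0, 0, 0, 1, 1, 1]
pred_labels = [1, 1, 0, 0, 0, 0]
score_nmi = nmi(true_labels, pred_labels)
score_acc = clustering_accuracy(true_labels, pred_labels)
```

Both measures are invariant to how the clusters are numbered, so a clustering that matches the classes up to a renaming of labels scores 1 on each.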
V. Conclusion
The proposed system is a document clustering method based on correlation preserving indexing, which
simultaneously maximizes the correlation between the documents in the local patches and minimizes the
correlation between the documents outside these patches. The CPI method with the learning-level parsing
procedure finds the correlation between related documents based on the accuracy of sentences, which avoids a
large number of unknown clusters that do not help in finding the exact correlation between documents. The CPI
method has good generalization capability and can effectively deal with very large data sets. The proposed
CPI method with the learning-level parsing procedure doubles the accuracy of the previous correlation
coefficient in document clustering.
References
[1] Taiping Zhang, Yuan Yan Tang, Bin Fang, and Yong Xiang, “Document Clustering in Correlation Similarity Measure Space,” IEEE Trans. Knowledge and Data Eng., vol. 24, no. 6, June 2012.
[2] R.T. Ng and J. Han, “Efficient and Effective Clustering Methods for Spatial Data Mining,” Proc. 20th Int’l Conf. Very Large Data Bases (VLDB), pp. 144-155, 1994.
[3] A.K. Jain, M.N. Murty, and P.J. Flynn, “Data Clustering: A Review,” ACM Computing Surveys, vol. 31, no. 3, pp. 264-323, 1999.
[4] P. Pintelas and S. Kotsiantis, “Recent Advances in Clustering: A Brief Survey,” WSEAS Trans. Information Science and Applications, vol. 1, no. 1, pp. 73-81, 2004.
[5] J.B. MacQueen, “Some Methods for Classification and Analysis of Multivariate Observations,” Proc. Fifth Berkeley Symp. Math. Statistics and Probability, vol. 1, pp. 281-297, 1967.
[6] A.K. McCallum and L.D. Baker, “Distributional Clustering of Words for Text Classification,” Proc. 21st Ann. Int’l ACM SIGIR Conf. Research and Development in Information Retrieval, pp. 96-103, 1998.
[7] X. Liu, Y. Gong, W. Xu, and S. Zhu, “Document Clustering with Cluster Refinement and Model Selection Capabilities,” Proc. 25th Ann. Int’l ACM SIGIR Conf. Research and Development in Information Retrieval (SIGIR ’02), pp. 191-198, 2002.
[8] S.C. Deerwester, S.T. Dumais, T.K. Landauer, G.W. Furnas, and R.A. Harshman, “Indexing by Latent Semantic Analysis,” J. Am. Soc. Information Science, vol. 41, no. 6, pp. 391-407, 1990.
[9] J. Han, D. Cai, and X. He, “Document Clustering Using Locality Preserving Indexing,” IEEE Trans. Knowledge and Data Eng., vol. 17, no. 12, pp. 1624-1637, Dec. 2005.
[10] Y. Gong, W. Xu, and X. Liu, “Document Clustering Based on Non-Negative Matrix Factorization,” Proc. 26th Ann. Int’l ACM SIGIR Conf. Research and Development in Information Retrieval (SIGIR ’03), pp. 267-273, 2003.
[11] P. Achananuparp, X.-J. Shen, and X.-H. Hu, “The Evaluation of Sentence Similarity Measures,” Proc. 10th International Conference on Data Warehousing and Knowledge Discovery (DaWaK), pp. 305-316, 2008.
[12] J.-P. Bao, Q.-B. Song, J.-Y. Shen, and X.-D. Liu, “A New Text Feature Extraction Model and Its Application in Document Copy Detection,” Proc. 2nd International Conference on Machine Learning and Cybernetics, pp. 82-87, 2003.
[13] S. Manandhar and M. D. Boni, “An Analysis of Clarification Dialogue for Question Answering,” Proc. HLT-NAACL, pp. 48-55, 2003.
[14] R. Mihalcea and C. Corley, “Measuring the Semantic Similarity of Texts,” Proc. ACL Workshop on Empirical Modeling of Semantic Equivalence and Entailment, pp. 13-18, 2005.
[15] J. Feng, Y.-M. Zhou, and T. Martin, “Sentence Similarity based on Relevance,” Proc. IPMU, pp. 832-839, 2008.
[16] C. Ho, M. A. A. Murad, S. C. Doraisamy, and R. A. Kadir, “Word Sense Disambiguation-based Sentence Similarity,” 23rd International Conference on Computational Linguistics (COLING), 2010, in press.

Automatically converting tabular data toIJwest
 
Co-Clustering For Cross-Domain Text Classification
Co-Clustering For Cross-Domain Text ClassificationCo-Clustering For Cross-Domain Text Classification
Co-Clustering For Cross-Domain Text Classificationpaperpublications3
 
AN EFFICIENT APPROACH FOR SEMANTICALLYENHANCED DOCUMENT CLUSTERING BY USING W...
AN EFFICIENT APPROACH FOR SEMANTICALLYENHANCED DOCUMENT CLUSTERING BY USING W...AN EFFICIENT APPROACH FOR SEMANTICALLYENHANCED DOCUMENT CLUSTERING BY USING W...
AN EFFICIENT APPROACH FOR SEMANTICALLYENHANCED DOCUMENT CLUSTERING BY USING W...ijaia
 
Computational Intelligence Methods for Clustering of Sense Tagged Nepali Docu...
Computational Intelligence Methods for Clustering of Sense Tagged Nepali Docu...Computational Intelligence Methods for Clustering of Sense Tagged Nepali Docu...
Computational Intelligence Methods for Clustering of Sense Tagged Nepali Docu...IOSR Journals
 
Distribution Similarity based Data Partition and Nearest Neighbor Search on U...
Distribution Similarity based Data Partition and Nearest Neighbor Search on U...Distribution Similarity based Data Partition and Nearest Neighbor Search on U...
Distribution Similarity based Data Partition and Nearest Neighbor Search on U...Editor IJMTER
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER) International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER) ijceronline
 

Ähnlich wie Correlation Preserving Indexing for Text Clustering (20)

Volume 2-issue-6-1969-1973
Volume 2-issue-6-1969-1973Volume 2-issue-6-1969-1973
Volume 2-issue-6-1969-1973
 
F017243241
F017243241F017243241
F017243241
 
Scaling Down Dimensions and Feature Extraction in Document Repository Classif...
Scaling Down Dimensions and Feature Extraction in Document Repository Classif...Scaling Down Dimensions and Feature Extraction in Document Repository Classif...
Scaling Down Dimensions and Feature Extraction in Document Repository Classif...
 
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
 
E1062530
E1062530E1062530
E1062530
 
L0261075078
L0261075078L0261075078
L0261075078
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)
 
A Novel Multi- Viewpoint based Similarity Measure for Document Clustering
A Novel Multi- Viewpoint based Similarity Measure for Document ClusteringA Novel Multi- Viewpoint based Similarity Measure for Document Clustering
A Novel Multi- Viewpoint based Similarity Measure for Document Clustering
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
J017145559
J017145559J017145559
J017145559
 
Challenging Issues and Similarity Measures for Web Document Clustering
Challenging Issues and Similarity Measures for Web Document ClusteringChallenging Issues and Similarity Measures for Web Document Clustering
Challenging Issues and Similarity Measures for Web Document Clustering
 
An efficient approach for semantically enhanced document clustering by using ...
An efficient approach for semantically enhanced document clustering by using ...An efficient approach for semantically enhanced document clustering by using ...
An efficient approach for semantically enhanced document clustering by using ...
 
Automatically converting tabular data to
Automatically converting tabular data toAutomatically converting tabular data to
Automatically converting tabular data to
 
Co-Clustering For Cross-Domain Text Classification
Co-Clustering For Cross-Domain Text ClassificationCo-Clustering For Cross-Domain Text Classification
Co-Clustering For Cross-Domain Text Classification
 
AN EFFICIENT APPROACH FOR SEMANTICALLYENHANCED DOCUMENT CLUSTERING BY USING W...
AN EFFICIENT APPROACH FOR SEMANTICALLYENHANCED DOCUMENT CLUSTERING BY USING W...AN EFFICIENT APPROACH FOR SEMANTICALLYENHANCED DOCUMENT CLUSTERING BY USING W...
AN EFFICIENT APPROACH FOR SEMANTICALLYENHANCED DOCUMENT CLUSTERING BY USING W...
 
Computational Intelligence Methods for Clustering of Sense Tagged Nepali Docu...
Computational Intelligence Methods for Clustering of Sense Tagged Nepali Docu...Computational Intelligence Methods for Clustering of Sense Tagged Nepali Docu...
Computational Intelligence Methods for Clustering of Sense Tagged Nepali Docu...
 
L017158389
L017158389L017158389
L017158389
 
Distribution Similarity based Data Partition and Nearest Neighbor Search on U...
Distribution Similarity based Data Partition and Nearest Neighbor Search on U...Distribution Similarity based Data Partition and Nearest Neighbor Search on U...
Distribution Similarity based Data Partition and Nearest Neighbor Search on U...
 
AN IMPROVED TECHNIQUE FOR DOCUMENT CLUSTERING
AN IMPROVED TECHNIQUE FOR DOCUMENT CLUSTERINGAN IMPROVED TECHNIQUE FOR DOCUMENT CLUSTERING
AN IMPROVED TECHNIQUE FOR DOCUMENT CLUSTERING
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER) International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
 

Mehr von IOSR Journals (20)

A011140104
A011140104A011140104
A011140104
 
M0111397100
M0111397100M0111397100
M0111397100
 
L011138596
L011138596L011138596
L011138596
 
K011138084
K011138084K011138084
K011138084
 
J011137479
J011137479J011137479
J011137479
 
I011136673
I011136673I011136673
I011136673
 
G011134454
G011134454G011134454
G011134454
 
H011135565
H011135565H011135565
H011135565
 
F011134043
F011134043F011134043
F011134043
 
E011133639
E011133639E011133639
E011133639
 
D011132635
D011132635D011132635
D011132635
 
C011131925
C011131925C011131925
C011131925
 
B011130918
B011130918B011130918
B011130918
 
A011130108
A011130108A011130108
A011130108
 
I011125160
I011125160I011125160
I011125160
 
H011124050
H011124050H011124050
H011124050
 
G011123539
G011123539G011123539
G011123539
 
F011123134
F011123134F011123134
F011123134
 
E011122530
E011122530E011122530
E011122530
 
D011121524
D011121524D011121524
D011121524
 

Kürzlich hochgeladen

(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college projectTonystark477637
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlysanyuktamishra911
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...ranjana rawat
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordAsst.prof M.Gokilavani
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Christo Ananth
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performancesivaprakash250
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 

Kürzlich hochgeladen (20)

(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 

Correlation Preserving Indexing for Text Clustering

IOSR Journal of Computer Engineering (IOSR-JCE)
e-ISSN: 2278-0661, p-ISSN: 2278-8727, Volume 13, Issue 1 (Jul. - Aug. 2013), PP 27-30
www.iosrjournals.org

Correlation Preserving Indexing Based Text Clustering
Venkata Gopala Rao S.1, A. Bhanu Prasad2
1(M.Tech, Software Engineering, Vardhaman College of Engineering / JNTU-Hyderabad, India)
2(Associate Professor, Department of IT, Vardhaman College of Engineering / JNTU-Hyderabad, India)

Abstract: A document clustering method based on correlation preserving indexing (CPI) was presented previously. It simultaneously maximizes the correlation between the documents in the local patches and minimizes the correlation between the documents outside these patches. Consequently, a low-dimensional semantic subspace is derived in which the documents corresponding to the same semantics are close to each other. This paper proposes a CPI method with a learning-level parsing procedure. It finds the correlation between related documents based on the accuracy of sentences, in order to avoid, as far as possible, unknown clusters that do not help in finding the exact correlation between documents. The proposed CPI method with learning-level parsing procedure doubles the accuracy of the previous correlation coefficient in document clustering. The behavior of the proposed hierarchical clustering algorithm differs from that of CPI in terms of NMI and accuracy.
Index Terms: Document clustering, correlation measure, correlation latent semantic indexing, dimensionality reduction.

I. Introduction
The aim of document clustering is to automatically group related documents into clusters. Document clustering plays a vital role in machine learning and artificial intelligence and has received much attention in recent years. A number of methods based on different existing measures have been proposed to handle document clustering [4],[5],[6],[7],[8],[9].
Among the existing measures, the most frequently used is Euclidean distance. One method that uses Euclidean distance is the k-means method, which minimizes the sum of squared Euclidean distances between the data points and their corresponding cluster centers. Spectral clustering methods achieve low computation cost: documents are first projected into a low-dimensional semantic subspace, and a traditional clustering algorithm is then applied. Latent semantic indexing (LSI) [8] is another spectral clustering method, which aims at finding the best subspace approximation to the original document space by reducing the global reconstruction error (Euclidean distance). Euclidean distance is a dissimilarity measure: it describes the dissimilarities rather than the similarities between documents. Hence, it is not able to capture the nonlinear manifold structure embedded in the similarities between them.
Locality preserving indexing (LPI) is a different clustering method, based on graph partitioning theory. LPI applies a weighting function to each pairwise distance in order to capture the similarity structure rather than the dissimilarity structure of the documents. However, it does not overcome the limitation of Euclidean distance. Furthermore, the selection of the weighting function is often a difficult task.
A document clustering method based on correlation preserving indexing (CPI) was presented previously. It simultaneously maximizes the correlation between the documents in the local patches and minimizes the correlation between the documents outside these patches. Consequently, a low-dimensional semantic subspace is derived in which the documents corresponding to the same semantics are close to each other. We propose a CPI method with a learning-level parsing procedure, because the previously proposed algorithm finds the correlation between documents based on words only.
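The contrast drawn in this introduction between Euclidean distance (a dissimilarity measure) and correlation (a similarity measure) can be made concrete with a small sketch. It assumes that the correlation of two term vectors is the cosine of the angle between them, as Section II states (Cos Ө = Corr(u, v)). The three toy term-count vectors and the function names are hypothetical and chosen only for illustration; they are not taken from the paper's experiments.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length term vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def corr(u, v):
    """Correlation taken as the cosine of the angle between u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention for empty documents (an assumption)
    return dot / (norm_u * norm_v)

# Hypothetical term counts over a three-word vocabulary.
short_doc = [2.0, 1.0, 0.0]    # short text on topic A
long_doc = [20.0, 10.0, 0.0]   # long text on topic A, same word proportions
other_doc = [0.0, 1.0, 2.0]    # short text on a different topic

print(euclidean(short_doc, long_doc), euclidean(short_doc, other_doc))
print(corr(short_doc, long_doc), corr(short_doc, other_doc))
```

In this toy case the Euclidean distance places the short document closer to the off-topic document (about 2.8) than to the long document on the same topic (about 20.1), whereas the correlation is 1.0 for the same-topic pair and 0.2 for the off-topic pair.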
We now propose to find the correlation between related documents based on the accuracy of sentences, so as to avoid, as far as possible, unknown clusters that do not help in finding the exact correlation between documents. This roughly doubles what the previous correlation coefficient offers, which is only one way of finding the correlation between documents, namely through words. In this paper, we propose to find the correlation between two documents using parsers, to gain accuracy at the learning level, and then to apply the correlation of the CPI method.

II. Related Work
Correlation preserving indexing: The semantic structure is usually implicit in the high-dimensional document space. It is necessary to find a low-dimensional semantic subspace in which the semantic structure becomes clear. Hence, discovering the intrinsic structure of the document space is often an important task of document clustering. Correlation as a similarity measure is suitable for capturing the manifold structure embedded in the high-dimensional document space, because the manifold structure is often embedded in the similarities between the documents. The correlation between two vectors (column vectors) u and v is defined as

$$\mathrm{Corr}(u, v) = \frac{u^{T} v}{\sqrt{u^{T} u}\,\sqrt{v^{T} v}}$$

The correlation corresponds to an angle Ө such that Cos Ө = Corr(u, v). The association between the vectors u and v is stronger when the value of Corr(u, v) is larger.
Online document clustering aims to group documents into clusters. It belongs to unsupervised learning, and it can be transformed into semi-supervised learning by using the following information:
1. If two documents are close to each other in the original document space, then they tend to be grouped into the same cluster [8].
2. If two documents are far away from each other in the original document space, they tend to be grouped into different clusters.
Document preprocessing: In document preprocessing, a set of documents is given as input to the database. One particular document is then chosen at random from the database. From the randomly selected documents, all unique words are identified and stop words are removed, in order to find the similarity between documents. Stemming is the process of reducing derived words to their stem, base, or root form, generally a written word form. A stemming algorithm is a process in which the various forms of a word are reduced to a common form, for example:
- Suffix removal to generate the word stem
- Grouping words
- Increasing relevance
Finally, term weighting supports information retrieval and text categorization. Document clustering groups together conceptually related documents, thus enabling the identification of duplicate words.
Fig 1. Document preprocessing

III. Actual Work
Preprocessing: The document clustering method is based on correlation preserving indexing (CPI), which explicitly considers the manifold structure embedded in the similarities between the documents.
Its goal is to find an optimal semantic subspace by simultaneously maximizing the correlations between the documents in the local patches and minimizing the correlations between the documents outside these patches. This differs from LSI and LPI, which are based on a dissimilarity measure (Euclidean distance) and focus on detecting the intrinsic structure between widely separated documents rather than between nearby documents. The similarity-measure-based CPI method instead aims at detecting the intrinsic structure between nearby documents rather than between widely separated documents. Since the intrinsic semantic structure of the document space is often embedded in the similarities between the documents, CPI can effectively detect the intrinsic semantic structure of the high-dimensional document space.
Correlation preserving indexing based document clustering: The semantic structure is usually implicit in the high-dimensional document space. It is desirable to find a low-dimensional semantic subspace in which the semantic structure becomes clear. Hence, discovering the intrinsic structure of the document space is often a primary task of document clustering. Since the manifold structure is often embedded in the similarities between the documents, correlation as a similarity measure is suitable for capturing the manifold structure embedded in the high-dimensional document space.
K-means on document sets: The k-means method is one of the methods that use Euclidean distance. It minimizes the sum of the squared Euclidean distances between the data points and their corresponding cluster centers. Since the document space is always of high dimensionality, it is preferable to find a low-dimensional representation of the documents to reduce the computational complexity.
Classification of documents into clusters: The aim of online document clustering is to group documents into clusters, which belongs to unsupervised learning. It can further be transformed into semi-supervised learning by using the following side information:
1. If two documents are close to each other in the original document space, then they tend to be grouped into the same cluster.
2. If two documents are far away from each other in the original document space, they tend to be grouped into different clusters.
Hierarchical clustering method: Hierarchical clustering methods group the data instances into a tree of clusters. There are two major hierarchical clustering methods:
1. Agglomerative method
2. Divisive method
The agglomerative method forms the clusters in a bottom-up fashion. The divisive method splits the data into smaller clusters in a top-down fashion. These hierarchical methods can be represented using dendrograms, and they are known for their quick termination.
Agglomerative (bottom-up): The comparison starts with each data point as a singleton cluster and recursively merges the two or more most appropriate clusters. The procedure stops when k clusters have been obtained.
Divisive (top-down): The comparison starts with one big cluster and recursively divides it into smaller clusters. The procedure stops when k clusters have been obtained.
Semantic-based document mining: Fig. 2 illustrates semantic-understanding-based document mining, which satisfies user needs that are acquired through mining processes such as document clustering, document classification, and information retrieval. Semantic-understanding-based document mining undergoes a parsing procedure, and the parsing step comprises semantic analysis to extract systematic structure descriptions. This paper proposes a CPI-based learning-level parsing procedure, which improves the accuracy compared to the CPI clustering algorithm.
Fig 2. Semantic based document mining
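The preprocessing, correlation, and agglomerative steps described in Sections II and III can be sketched together as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the stop-word list, the example documents, the use of raw term counts, and the average-linkage merge rule are all hypothetical choices, since the text does not specify them. Stemming, term weighting, the learning-level parsing procedure, and the CPI subspace projection are omitted.

```python
import math
from collections import Counter

# Toy stop-word list (an assumption; the paper does not give one).
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to"}

def tokens(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def term_vector(text, vocabulary):
    """Raw term counts of `text` over a fixed vocabulary."""
    counts = Counter(tokens(text))
    return [float(counts[t]) for t in vocabulary]

def corr(u, v):
    """Correlation as the cosine of the angle between u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def agglomerative(vectors, k):
    """Bottom-up clustering: start from singletons and repeatedly merge
    the pair of clusters with the highest average pairwise correlation
    (average linkage, an assumed rule) until k clusters remain."""
    if k < 1:
        raise ValueError("k must be at least 1")
    clusters = [[i] for i in range(len(vectors))]

    def linkage(a, b):
        total = sum(corr(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                score = linkage(clusters[x], clusters[y])
                if best is None or score > best[0]:
                    best = (score, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters

# Hypothetical example documents.
docs = [
    "the cat chased a mouse",
    "a cat and a mouse",
    "stock markets fell",
    "markets and stock prices fell",
]
vocabulary = sorted({w for d in docs for w in tokens(d)})
vectors = [term_vector(d, vocabulary) for d in docs]
print(agglomerative(vectors, 2))
```

With these four documents and k = 2, the two animal-related texts end up in one cluster and the two market-related texts in the other. A divisive variant would start from a single cluster containing all documents and split it recursively until k clusters are reached.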

IV. Performance Analysis
This section evaluates the performance of the proposed correlation preserving indexing approach against the other approaches. Our analysis indicates that the proposed scheme works better than the existing systems (LSI, LPI) in terms of memory, storage, generalization error, and performance. The hierarchical clustering algorithm and learning-level parsing enhance the accuracy and performance.

V. Conclusion
The proposed system is a document clustering method based on correlation preserving indexing. It simultaneously maximizes the correlation between the documents in the local patches and minimizes the correlation between the documents outside these patches. The CPI method with learning-level parsing procedure finds the correlation between related documents based on the accuracy of sentences, in order to avoid, as far as possible, unknown clusters that do not help in finding the exact correlation between documents. The CPI method has good generalization capability and can effectively deal with very large data sets. The proposed CPI method with learning-level parsing procedure doubles the accuracy of the previous correlation coefficient in document clustering.

References
[1] Taiping Zhang, Yuan Yan Tang, Bin Fang, and Yong Xiang, "Document Clustering in Correlation Similarity Measure Space," IEEE Trans. Knowledge and Data Eng., vol. 24, no. 6, June 2012.
[2] R.T. Ng and J. Han, "Efficient and Effective Clustering Methods for Spatial Data Mining," Proc. 20th Int'l Conf. Very Large Data Bases (VLDB), pp. 144-155, 1994.
[3] A.K. Jain, M.N. Murty, and P.J. Flynn, "Data Clustering: A Review," ACM Computing Surveys, vol. 31, no. 3, pp. 264-323, 1999.
[4] P. Pintelas and S. Kotsiantis, "Recent Advances in Clustering: A Brief Survey," WSEAS Trans. Information Science and Applications, vol. 1, no. 1, pp. 73-81, 2004.
[5] J.B. MacQueen, "Some Methods for Classification and Analysis of Multivariate Observations," Proc. Fifth Berkeley Symp. Math. Statistics and Probability, vol. 1, pp. 281-297, 1967.
[6] A.K. McCallum and L.D. Baker, "Distributional Clustering of Words for Text Classification," Proc. 21st Ann. Int'l ACM SIGIR Conf. Research and Development in Information Retrieval, pp. 96-103, 1998.
[7] X. Liu, Y. Gong, W. Xu, and S. Zhu, "Document Clustering with Cluster Refinement and Model Selection Capabilities," Proc. 25th Ann. Int'l ACM SIGIR Conf. Research and Development in Information Retrieval (SIGIR '02), pp. 191-198, 2002.
[8] S.C. Deerwester, S.T. Dumais, T.K. Landauer, G.W. Furnas, and R.A. Harshman, "Indexing by Latent Semantic Analysis," J. Am. Soc. Information Science, vol. 41, no. 6, pp. 391-407, 1990.
[9] J. Han, D. Cai, and X. He, "Document Clustering Using Locality Preserving Indexing," IEEE Trans. Knowledge and Data Eng., vol. 17, no. 12, pp. 1624-1637, Dec. 2005.
[10] Y. Gong, W. Xu, and X. Liu, "Document Clustering Based on Non-Negative Matrix Factorization," Proc. 26th Ann. Int'l ACM SIGIR Conf. Research and Development in Information Retrieval (SIGIR '03), pp. 267-273, 2003.
[11] P. Achananuparp, X.-J. Shen, and X.-H. Hu, "The Evaluation of Sentence Similarity Measures," Proc. 10th International Conference on Data Warehousing and Knowledge Discovery (DaWaK), pp. 305-316, 2008.
[12] J.-P. Bao, Q.-B. Song, J.-Y. Shen, and X.-D. Liu, "A New Text Feature Extraction Model and Its Application in Document Copy Detection," Proc. 2nd International Conference on Machine Learning and Cybernetics, pp. 82-87, 2003.
[13] S. Manandhar and M. D. Boni, "An Analysis of Clarification Dialogue for Question Answering," Proc. HLT-NAACL, pp. 48-55, 2003.
[14] R. Mihalcea and C. Corley, "Measuring the Semantic Similarity of Texts," Proc. ACL Workshop on Empirical Modeling of Semantic Equivalence and Entailment, pp. 13-18, 2005.
[15] J. Feng, Y.-M. Zhou, and T. Martin, "Sentence Similarity based on Relevance," Proc. IPMU, pp. 832-839, 2008.
[16] C. Ho, M. A. A. Murad, S. C. Doraisamy, and R. A. Kadir, "Word Sense Disambiguation-based Sentence Similarity," 23rd International Conference on Computational Linguistics (COLING), 2010, in press.