International Journal of Engineering Research and Development
e-ISSN: 2278-067X, p-ISSN: 2278-800X, www.ijerd.com
Volume 9, Issue 3 (December 2013), PP. 28-33

Concept and Term Based Similarity Measure for Text
Classification and Clustering
B. Sindhiya¹, Dr. N. Tajunisha²
¹M.Phil Scholar, Department of Computer Science, Sri Ramakrishna College of Arts and Science for Women,
Coimbatore, India.
²M.Phil, Ph.D, Associate Professor, Department of Computer Science, Sri Ramakrishna College of Arts and
Science for Women, Coimbatore, India.

Abstract:- The exploitation of syntactic structures and semantic background knowledge has always been an
appealing subject in the context of data mining, text retrieval and information management. The usefulness of
this kind of information has been shown most prominently in highly specialized tasks, such as text
categorization scenarios. So far, however, additional syntactic or semantic information has been used only
individually. In this paper, a new approach, the concept and term based similarity measure, is proposed. It
incorporates linguistic and semantic structures, using syntactic dependencies and semantic background
knowledge. This method represents the meaning of texts in a high-dimensional space of
concepts derived from WordNet. A number of case studies have been included in the research to demonstrate
the various aspects of this framework.
Index Terms:- Document classification, document clustering, similarity measure, accuracy, classifiers,
clustering algorithms.

I. INTRODUCTION

Clustering maps data items into clusters, where clusters are natural groupings of data items based on
similarity or probability density methods. Unlike classification and prediction, which analyze class-labeled data
objects, clustering analyzes data objects without class labels and tries to generate such labels. A similarity
measure or similarity function is a real-valued function that quantifies the similarity between two objects.
The similarity measure reflects the degree of closeness or separation of the target objects and should
correspond to the characteristics that are believed to distinguish the clusters embedded in the data. Before
clustering, a similarity/distance measure must be determined [1]. Choosing an appropriate similarity measure is
also crucial for cluster analysis, especially for particular types of clustering algorithms.
Text Categorization (TC) is the classification of documents with respect to a set of one or more preexisting
categories [2]. The classification phase consists of generating a weighted vector for all categories, then using a
similarity measure to find the closest category. The similarity measure is used to determine the degree of
resemblance between two vectors. To achieve reasonable classification results, a similarity measure should
generally respond with larger values to documents that belong to the same class and with smaller values
otherwise. During the last decades, a large number of methods proposed for text categorization were typically
based on the classical Bag-of-Words model where each term or term stem is an independent feature.
Existing similarity measures were most often used to assess the similarity between
words. Although the results of the information-theoretic similarity measure are statistically significant, it does not
reduce the dimension of the vector model [3]. Metric distances such as Euclidean distance are not appropriate for
high-dimensional and sparse domains. Because relations between words are ignored, the learning
algorithms are restricted to detecting patterns in the terminology used, while conceptual patterns remain
ignored.
Existing approaches require performing an optimization over an entire collection of documents. Most
of these techniques are computationally expensive.

II. RELATED WORKS

Similarity measures have been extensively used in text classification and clustering algorithms. The
spherical k-means algorithm [4] adopted the cosine similarity measure for document clustering. The method is
motivated by the fact that unlabeled document collections are becoming increasingly common and available. Using
words as features, text documents are often represented as high-dimensional and sparse vectors. The algorithm
outputs k disjoint clusters, each with a concept vector that is the centroid of the cluster normalized to have
unit Euclidean norm.
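As a minimal sketch of the two quantities described above, the cosine similarity between document vectors and the unit-norm concept vector of a cluster can be computed as follows. The function names and the toy term-count vectors are illustrative assumptions and are not taken from [4].

```python
import math

def normalize(v):
    """Scale a vector to unit Euclidean norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

def concept_vector(cluster):
    """Centroid of the vectors in a cluster, normalized to unit Euclidean norm."""
    dim = len(cluster[0])
    centroid = [sum(v[i] for v in cluster) / len(cluster) for i in range(dim)]
    return normalize(centroid)

# Hypothetical term-count vectors for three documents over a three-term vocabulary
docs = [[1.0, 0.0, 2.0], [2.0, 0.0, 4.0], [0.0, 3.0, 0.0]]
cv = concept_vector(docs[:2])  # concept vector of a cluster holding the first two documents
```

Because the vectors are normalized first, documents that differ only in length (such as the first two above) are treated as identical in direction, which is the property spherical k-means relies on.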
Derrick Higgins [5] adopted a cosine-based pairwise-adaptive similarity measure for document
clustering. For large, high-dimensional document data sets, the pairwise-adaptive similarity measure improves the
unsupervised clustering quality and speed compared to the original cosine similarity measure. Zoulficar Younes
[6] reported results of clustering experiments with clustering algorithms and 12 different text data sets, and
concluded that the objective function based on cosine similarity “leads to the best solutions irrespective of the
number of clusters for most of the data sets.”
Daphne Koller [7] proposed a divisive information-theoretic feature clustering algorithm for text
classification using the Kullback-Leibler divergence. The high dimensionality of text can be a deterrent to applying
complex learners such as Support Vector Machines to the task of text classification.
Kullback [8] combined squared Euclidean distance with relative entropy in a k-means-like clustering
algorithm. A recently introduced k-means algorithm is specifically designed to handle unit-length document
vectors.
Craven [9] performed document clustering based on a proposed phrase-based similarity measure. The
phrase-based document similarity computes the pairwise similarities of documents based on the Suffix Tree
Document (STD) model. By mapping each node in the suffix tree of the STD model onto a unique feature term in
the Vector Space Document (VSD) model, the phrase-based document similarity naturally inherits the tf-idf term
weighting scheme in computing the document similarity with phrases.

III. PROPOSED METHODOLOGY

In this section, the proposed concept and term based similarity between documents is illustrated.
The proposed system also measures the semantic similarity between terms and concepts with the use of the WordNet
tool and the TreeTagger tool.
3.1 CSMTP (CONCEPT BASED SIMILARITY MEASURE FOR TEXT PROCESSING) ALGORITHM
The CSMTP algorithm generates the terms from the testing documents, selects the appropriate
features, and calculates the similarity measure based on each term and its
respective concepts.
CSMTP ALGORITHM
1. Let D1 and D2 be the testing documents.
2. Let T1 and T2 be the terms from the documents D1 and D2.
3. Remove the stopwords ST1 and ST2 from the documents D1 and D2.
4. Let C1 and C2 be two concepts from T1 and T2 respectively (where T1 denotes the first thesaurus and T2
the second).
5. Compute the similarity measure between two concepts, with,
\[ SIM(c_i, c_k) = 1 - \frac{\log(\max(|A|, |B|)) - \log(|A \cap B|)}{\log(|W|) - \log(\min(|A|, |B|))} \]
where A and B are the sets of all articles that link to concepts c_i and c_k respectively, and W is the set of
all articles. max(|A|, |B|) and min(|A|, |B|) are the larger and the smaller of the two set sizes, and
|A ∩ B| is the number of articles that link to both concepts.
6. Compute the semantic relatedness between term and its candidate concepts in a given document according
to the context information
\[ Rel(t, c_i \mid d_j) = \frac{1}{|T| - 1} \sum_{t_l \in T,\ t_l \neq t} \frac{1}{|CS_l|} \sum_{c_k \in CS_l} SIM(c_i, c_k) \]
where T is the term set of the jth document d_j, t_l is a term in d_j other than t, and CS_l is the candidate
concept set related to term t_l.
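Steps 5 and 6 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sets of linking articles are plain Python sets of article ids, max and min are read as the larger and smaller set sizes, the result of SIM is clamped at zero, and a pair of concepts with no common linking article gets similarity 0 because log 0 is undefined. The names `sim`, `rel`, `in_links`, and `candidates`, and the toy data, are hypothetical.

```python
import math

def sim(links_a, links_b, total_articles):
    """SIM(c_i, c_k) of step 5; links_a and links_b are sets of ids of linking articles."""
    common = len(links_a & links_b)
    if common == 0:
        return 0.0  # assumption: no shared linking article means no similarity
    larger = max(len(links_a), len(links_b))
    smaller = min(len(links_a), len(links_b))
    denom = math.log(total_articles) - math.log(smaller)
    if denom == 0:
        return 1.0  # assumption: both concepts are linked from every article
    value = 1 - (math.log(larger) - math.log(common)) / denom
    return max(0.0, value)  # assumption: negative values are clamped to 0

def rel(term, concept, doc_terms, candidates, in_links, total_articles):
    """Rel(t, c_i | d_j) of step 6: average SIM to the candidate concepts of the other terms."""
    others = [t for t in doc_terms if t != term]
    if not others:
        return 0.0
    total = 0.0
    for t in others:
        cs = candidates[t]
        if cs:  # assumption: a term without candidate concepts contributes 0
            total += sum(sim(in_links[concept], in_links[c], total_articles) for c in cs) / len(cs)
    return total / len(others)

# Hypothetical toy data: 100 articles, one ambiguous term "bank" with two candidate concepts
in_links = {
    "bank_finance": {1, 2, 3, 4},
    "bank_river": {6, 7, 8},
    "money": {1, 2, 3, 5},
}
candidates = {"bank": ["bank_finance", "bank_river"], "money": ["money"]}
doc_terms = {"bank", "money"}

rel_finance = rel("bank", "bank_finance", doc_terms, candidates, in_links, 100)
rel_river = rel("bank", "bank_river", doc_terms, candidates, in_links, 100)
```

In this toy document the financial sense of "bank" shares linking articles with "money" and therefore receives the higher relatedness score, which is how the context information selects among candidate concepts.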
3.2 SYNTACTIC REPRESENTATION
The tf-idf weighting scheme is used at the syntactic level to record the syntactic information. Tf–idf, term
frequency–inverse document frequency, is a numerical statistic which reflects how important a word is to a
document in a collection or corpus. It is often used as a weighting factor in information retrieval and text
mining. The tf-idf value increases proportionally to the number of times a word appears in the document, but is
offset by the frequency of the word in the corpus, which helps to control for the fact that some words are
generally more common than others. Tf–idf is the product of two statistics, term frequency and inverse
document frequency. Various ways for determining the exact values of both statistics exist. In the case of the
term frequency tf(t,d), the simplest choice is to use the raw frequency of a term in a document, i.e. the number
of times that term t occurs in document d. If we denote the raw frequency of t by f(t,d), then the simple tf
scheme is tf(t,d) = f(t,d).
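The weighting can be illustrated with a short sketch. The raw-frequency tf follows the definition above; the text does not fix a particular idf formula, so the common choice idf(t) = log(N / df(t)), with N documents and df(t) documents containing t, is used here as an assumption. The corpus is a hypothetical toy example.

```python
import math
from collections import Counter

def tf(term, doc):
    """Raw term frequency f(t, d): number of times the term occurs in the document."""
    return Counter(doc)[term]

def idf(term, corpus):
    """Assumed idf variant: log(N / df), or 0 when the term occurs in no document."""
    df = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / df) if df else 0.0

def tfidf(term, doc, corpus):
    """Product of term frequency and inverse document frequency."""
    return tf(term, doc) * idf(term, corpus)

# Hypothetical corpus of three tokenized documents
corpus = [
    ["text", "clustering", "text"],
    ["text", "classification"],
    ["concept", "similarity"],
]
w_text = tfidf("text", corpus[0], corpus)
w_clustering = tfidf("clustering", corpus[0], corpus)
```

Although "text" occurs twice in the first document, it also appears in two of the three documents, so its weight is lower than that of "clustering", which occurs once but only in this document.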

3.3 SEMANTIC SIMILARITY
The semantic level consists of concepts related to the terms in the syntactic level. These two levels are
connected via the semantic correlation between terms and their relevant concepts.
WordNet is used to ascertain connections among four types of parts of speech (POS): noun, verb, adjective,
and adverb. The minimum unit in WordNet is the synset, which represents an exact meaning
of a word. It includes the word, its explanation, and its synonyms.

IV. EXPERIMENTAL RESULTS

In this section, the effectiveness of the proposed similarity measure CSMTP is investigated. The
investigation is done by applying the CSMTP measure in several text applications, including k-NN based
single-label classification (SL-kNN), k-NN based multi-label classification (ML-kNN), k-means clustering (k-means),
and hierarchical agglomerative clustering (HAC). Two data sets, WebKB and Reuters-8, are
used in the experiments presented below.
1) WebKB
The documents in the WebKB data set are webpages collected by the World Wide Knowledge Base
(Web→KB) project of the CMU text learning group. The documents were manually classified into several
different classes. The documents of this data set were not predesignated as training or testing patterns, so the
data set can be randomly divided into training and testing subsets.
2) Reuters-8
The Reuters-21578 ModApté Split Text Categorization Test Collection contains thousands of documents
collected from the Reuters newswire in 1987. The most widely used version is Reuters-21578 ModApté, which
contains 90 categories and 12902 documents.
4.1 CLASSIFICATION PERFORMANCE
For the WebKB data set, the randomly selected training documents are used for training/validation and the
testing documents are used for testing. For the Reuters-8 data set, the predesignated training data are used for
training/validation and the predesignated testing data are used for testing. Note that the data for
training/validation are separate from the data for testing in each case.
4.1.1 Single-Label Document Classification
In this experiment, we compare the performance of our measure and the others in single-label document
classification. The performance is evaluated by the classification accuracy, AC, which compares the predicted
label of each document with that provided by the document corpus:
\[ AC = \frac{\sum_{i=1}^{n} E(c_i, c_i')}{n} \]
where n is the number of testing documents, and c_i and c_i' are the target label and the predicted label,
respectively, of the ith document. E(c_i, c_i') = 1 if c_i = c_i', and E(c_i, c_i') = 0 otherwise.
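The accuracy definition above amounts to counting matching labels. A minimal sketch follows; the function name and the example labels are hypothetical and are not results from the experiments.

```python
def classification_accuracy(targets, predictions):
    """AC: fraction of documents whose predicted label equals the target label."""
    if len(targets) != len(predictions):
        raise ValueError("targets and predictions must have the same length")
    if not targets:
        raise ValueError("at least one document is required")
    # E(c_i, c_i') is 1 when the labels agree and 0 otherwise
    matches = sum(1 for c, p in zip(targets, predictions) if c == p)
    return matches / len(targets)

# Hypothetical labels for four testing documents
targets = ["course", "faculty", "student", "project"]
predicted = ["course", "student", "student", "project"]
ac = classification_accuracy(targets, predicted)  # three of the four labels agree
```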
Figure 4.1 shows the classification accuracy obtained by SL-kNN with the SMTP and CSMTP similarity measures
using different k settings, i.e., k=4,8,12,16,20, on the training/validation data of
WebKB. The figure shows that the single-label document classification accuracy obtained using the
proposed CSMTP measure is higher than that obtained using the SMTP measure.

FIGURE 4.1 Classification Accuracy by SL-kNN on Training/Validation Data with SMTP & CSMTP Using
Different K Settings.

Table 4.1 shows the classification accuracy (AC) obtained by single-label classification on the testing data of
WebKB.
TABLE 4.1
Classification Accuracies by SL-kNN with Different Measures on Testing Data of WebKB
Measure   K=1     K=3     K=5     K=7     K=9
SMTP      0.9013  0.9191  0.9242  0.9223  0.9233
CSMTP     0.9338  0.9411  0.9420  0.9447  0.9461
4.1.2 MULTI LABEL CLASSIFICATION
Figure 4.2 shows the classification accuracy obtained by ML-kNN with the SMTP and CSMTP similarity
measures using different k settings, i.e., k=4,8,12,16,20, on the training/validation data
of WebKB. Figure 4.2 shows that the multi-label document classification accuracy obtained using
the proposed CSMTP measure is higher than that obtained using the SMTP measure.

Figure 4.2 Classification Accuracy by ML-kNN on Training/Validation Data with SMTP & CSMTP Using Different K
Settings.
Table 4.2 shows the classification accuracy (AC) obtained by multi-label classification on the testing
data of WebKB.
TABLE 4.2
Classification Accuracies by ML-kNN with Different Measures on Testing Data of WebKB
Measure   K=1     K=3     K=5     K=7     K=9
SMTP      0.6910  0.6932  0.6965  0.6990  0.7009
CSMTP     0.7130  0.7111  0.7114  0.7092  0.7083
4.2 CLUSTERING PERFORMANCE
For a document corpus with p classes and n documents, the class labels are removed and one-third of the
documents are randomly selected for training/validation, with the remaining documents used for testing. Note
that the data for training/validation are separate from the data for testing.
4.2.1 K-MEANS CLUSTERING
In this experiment, the performance of the CSMTP measure in clustering is compared with that of the
SMTP measure. The performance is evaluated by the clustering accuracy, AC, which compares the
predicted label of each document with that provided by the document corpus:
\[ AC = \frac{\sum_{i=1}^{K} most_i}{n} \]
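The clustering accuracy can be sketched as follows. The text does not define most_i, so this example assumes that most_i is the number of documents in cluster i that carry the cluster's most frequent corpus label, and that n is the total number of documents. The function name and the toy data are hypothetical.

```python
from collections import Counter

def clustering_accuracy(cluster_ids, true_labels):
    """AC under the assumption that most_i is the majority-label count of cluster i."""
    if len(cluster_ids) != len(true_labels) or not true_labels:
        raise ValueError("inputs must be non-empty and of equal length")
    members = {}
    for cid, label in zip(cluster_ids, true_labels):
        members.setdefault(cid, []).append(label)
    # Sum, over clusters, the size of the most frequent corpus label in each cluster
    majority_total = sum(Counter(labels).most_common(1)[0][1] for labels in members.values())
    return majority_total / len(true_labels)

# Hypothetical assignment of six documents to three clusters
cluster_ids = [0, 0, 0, 1, 1, 2]
true_labels = ["a", "a", "b", "b", "b", "c"]
ac = clustering_accuracy(cluster_ids, true_labels)  # majority counts 2 + 2 + 1 out of 6
```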
Figure 4.3 shows the clustering accuracy obtained by k-means with the SMTP and CSMTP similarity
measures using different k settings, i.e., k=4,8,12,16,20, on the training/validation data of WebKB. The
figure shows that the k-means clustering accuracy obtained using the proposed CSMTP measure is
higher than that obtained using the SMTP measure.
Figure 4.3 Clustering Accuracy by K-Means on Training/Validation Data with SMTP & CSMTP Using
Different K Settings.


Table 4.3 shows the clustering accuracy (AC) obtained by k-means clustering on the testing data of WebKB.
TABLE 4.3
AC Values by K-Means with Different Measures on Testing Data of WebKB
Measure   K=8     K=16    K=24    K=32
SMTP      0.7906  0.8584  0.8692  0.8728
CSMTP     0.8450  0.8702  0.8796  0.8964
4.2.2 HIERARCHICAL AGGLOMERATIVE DOCUMENT CLUSTERING PERFORMANCE
Figure 4.4 shows the clustering accuracy obtained by hierarchical clustering with the SMTP and CSMTP
similarity measures using different k settings, i.e., k=4,8,12,16,20, on the training/validation data of
WebKB. The figure shows that the hierarchical agglomerative clustering accuracy obtained using the
proposed CSMTP measure is higher than that obtained using the SMTP measure.

Figure 4.4 Clustering Accuracy by HAC on Training/Validation Data with SMTP & CSMTP Using Different K
Settings.
Table 4.4 shows the clustering accuracy (AC) obtained by hierarchical agglomerative clustering on the testing
data of WebKB.
TABLE 4.4
Clustering Accuracies by HAC with Different Measures on Testing Data of WebKB
Measure   K=1     K=3     K=5     K=7     K=9
SMTP      0.5960  0.5483  0.5367  0.5091  0.5000
CSMTP     0.6770  0.6343  0.5552  0.5244  0.5200

V. CONCLUSION

The concept and term based model represents a document as a two-way model with the aid of WordNet.
In the two-way representation model, the term information is represented first and the concept information is
represented second, and these levels are connected by the semantic relatedness between terms and concepts.
Experimental results on real data sets have shown that the proposed model and classification framework
significantly improve the classification and clustering performance compared with the existing
SMTP (similarity measure for text processing) model. The experiments show that CSMTP (concept and term based
similarity measure for text processing) takes less time when running in parallel and less space when running in
series, and that its categorization accuracy is high.

VI. FUTURE WORK
In the future, the work can focus on concept mapping and weighting technology to find a better
concept vector space for documents, because a better concept-based representation can help to further improve
the performance of the text classification and clustering framework. A new semantic-based vector space model
utilizing the category information can also be exploited. Afterwards, the two-way representation model can be
extended to a three-way model containing term, concept and category information respectively. CSMTP will
also be improved to fit the three-way model and achieve better text classification and clustering
performance.
REFERENCES
[1]. A. Huang. Similarity Measures for Text Document Clustering. 2008.
[2]. F. Sebastiani. Machine Learning in Automated Text Categorization. ACM Computing Surveys,
34(1):1–47, 2002.
[3]. S. Clinchant and E. Gaussier. Information-Based Models for Ad Hoc IR. Proceedings of the 33rd Annual
International ACM SIGIR Conference on Research and Development in Information Retrieval, pages
234–241, 2010.
[4]. D. Higgins and J. Burstein. Sentence Similarity Measures for Essay Coherence. 2007.
[5]. D. Metzler, S. Dumais, and C. Meek. Similarity Measures for Short Segments of Text. 2007.
[6]. Z. Younes, F. Abdallah, and T. Denoeux. Multi-Label Classification Algorithm Derived from K-Nearest
Neighbor Rule with Label Dependencies. 2003.
[7]. D. Koller and M. Sahami. Hierarchically Classifying Documents Using Very Few Words.
Proceedings of the 14th International Conference on Machine Learning (ML), Nashville, Tennessee,
July 1997, pages 170–178.
[8]. S. Kullback and R. A. Leibler. On Information and Sufficiency. Annals of Mathematical Statistics,
22(1):79–86, 1951.
[9]. H. Chim and X. Deng. Efficient Phrase-Based Document Similarity for Clustering. IEEE Transactions
on Knowledge and Data Engineering, 20(9):1217–1229, 2008.
[10]. http://web.ist.utl.pt/ acardoso/datasets/
[11]. http://www.cs.technion.ac.il/ ronb/thesis.html
[12]. http://www.daviddlewis.com/resources/testcollections/reuters21578/
[13]. http://www.dmoz.org/

33

Weitere ähnliche Inhalte

Was ist angesagt?

TOPIC EXTRACTION OF CRAWLED DOCUMENTS COLLECTION USING CORRELATED TOPIC MODEL...
TOPIC EXTRACTION OF CRAWLED DOCUMENTS COLLECTION USING CORRELATED TOPIC MODEL...TOPIC EXTRACTION OF CRAWLED DOCUMENTS COLLECTION USING CORRELATED TOPIC MODEL...
TOPIC EXTRACTION OF CRAWLED DOCUMENTS COLLECTION USING CORRELATED TOPIC MODEL...ijnlc
 
O NTOLOGY B ASED D OCUMENT C LUSTERING U SING M AP R EDUCE
O NTOLOGY B ASED D OCUMENT C LUSTERING U SING M AP R EDUCE O NTOLOGY B ASED D OCUMENT C LUSTERING U SING M AP R EDUCE
O NTOLOGY B ASED D OCUMENT C LUSTERING U SING M AP R EDUCE ijdms
 
Search Engines
Search EnginesSearch Engines
Search Enginesbutest
 
A rough set based hybrid method to text categorization
A rough set based hybrid method to text categorizationA rough set based hybrid method to text categorization
A rough set based hybrid method to text categorizationNinad Samel
 
ONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATION
ONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATIONONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATION
ONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATIONIJDKP
 
An Improved Similarity Matching based Clustering Framework for Short and Sent...
An Improved Similarity Matching based Clustering Framework for Short and Sent...An Improved Similarity Matching based Clustering Framework for Short and Sent...
An Improved Similarity Matching based Clustering Framework for Short and Sent...IJECEIAES
 
Topic models
Topic modelsTopic models
Topic modelsAjay Ohri
 
Adversarial and reinforcement learning-based approaches to information retrieval
Adversarial and reinforcement learning-based approaches to information retrievalAdversarial and reinforcement learning-based approaches to information retrieval
Adversarial and reinforcement learning-based approaches to information retrievalBhaskar Mitra
 
An Enhanced Suffix Tree Approach to Measure Semantic Similarity between Multi...
An Enhanced Suffix Tree Approach to Measure Semantic Similarity between Multi...An Enhanced Suffix Tree Approach to Measure Semantic Similarity between Multi...
An Enhanced Suffix Tree Approach to Measure Semantic Similarity between Multi...iosrjce
 
Probabilistic retrieval model
Probabilistic retrieval modelProbabilistic retrieval model
Probabilistic retrieval modelbaradhimarch81
 
TopicModels_BleiPaper_Summary.pptx
TopicModels_BleiPaper_Summary.pptxTopicModels_BleiPaper_Summary.pptx
TopicModels_BleiPaper_Summary.pptxKalpit Desai
 
Cooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from TextCooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from TextUniversity of Bari (Italy)
 

Was ist angesagt? (17)

TOPIC EXTRACTION OF CRAWLED DOCUMENTS COLLECTION USING CORRELATED TOPIC MODEL...
TOPIC EXTRACTION OF CRAWLED DOCUMENTS COLLECTION USING CORRELATED TOPIC MODEL...TOPIC EXTRACTION OF CRAWLED DOCUMENTS COLLECTION USING CORRELATED TOPIC MODEL...
TOPIC EXTRACTION OF CRAWLED DOCUMENTS COLLECTION USING CORRELATED TOPIC MODEL...
 
Ir 09
Ir   09Ir   09
Ir 09
 
Topicmodels
TopicmodelsTopicmodels
Topicmodels
 
O NTOLOGY B ASED D OCUMENT C LUSTERING U SING M AP R EDUCE
O NTOLOGY B ASED D OCUMENT C LUSTERING U SING M AP R EDUCE O NTOLOGY B ASED D OCUMENT C LUSTERING U SING M AP R EDUCE
O NTOLOGY B ASED D OCUMENT C LUSTERING U SING M AP R EDUCE
 
Search Engines
Search EnginesSearch Engines
Search Engines
 
A rough set based hybrid method to text categorization
A rough set based hybrid method to text categorizationA rough set based hybrid method to text categorization
A rough set based hybrid method to text categorization
 
Ju3517011704
Ju3517011704Ju3517011704
Ju3517011704
 
ONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATION
ONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATIONONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATION
ONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATION
 
An Improved Similarity Matching based Clustering Framework for Short and Sent...
An Improved Similarity Matching based Clustering Framework for Short and Sent...An Improved Similarity Matching based Clustering Framework for Short and Sent...
An Improved Similarity Matching based Clustering Framework for Short and Sent...
 
Topic models
Topic modelsTopic models
Topic models
 
Canini09a
Canini09aCanini09a
Canini09a
 
Adversarial and reinforcement learning-based approaches to information retrieval
Adversarial and reinforcement learning-based approaches to information retrievalAdversarial and reinforcement learning-based approaches to information retrieval
Adversarial and reinforcement learning-based approaches to information retrieval
 
An Enhanced Suffix Tree Approach to Measure Semantic Similarity between Multi...
An Enhanced Suffix Tree Approach to Measure Semantic Similarity between Multi...An Enhanced Suffix Tree Approach to Measure Semantic Similarity between Multi...
An Enhanced Suffix Tree Approach to Measure Semantic Similarity between Multi...
 
Probabilistic retrieval model
Probabilistic retrieval modelProbabilistic retrieval model
Probabilistic retrieval model
 
TopicModels_BleiPaper_Summary.pptx
TopicModels_BleiPaper_Summary.pptxTopicModels_BleiPaper_Summary.pptx
TopicModels_BleiPaper_Summary.pptx
 
Topic Modeling
Topic ModelingTopic Modeling
Topic Modeling
 
Cooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from TextCooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from Text
 

Andere mochten auch

International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
Kịch bản sư phạm
Kịch bản sư phạmKịch bản sư phạm
Kịch bản sư phạmThanhHoai216
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
Stepping Up To The Challenge - CMO Insignt From The Global C-Suite Study
Stepping Up To The Challenge - CMO Insignt From The Global C-Suite StudyStepping Up To The Challenge - CMO Insignt From The Global C-Suite Study
Stepping Up To The Challenge - CMO Insignt From The Global C-Suite StudyEvgeny Tsarkov
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)IJERD Editor
 

Andere mochten auch (7)

International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
Kịch bản sư phạm
Kịch bản sư phạmKịch bản sư phạm
Kịch bản sư phạm
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
Stepping Up To The Challenge - CMO Insignt From The Global C-Suite Study
Stepping Up To The Challenge - CMO Insignt From The Global C-Suite StudyStepping Up To The Challenge - CMO Insignt From The Global C-Suite Study
Stepping Up To The Challenge - CMO Insignt From The Global C-Suite Study
 
2
22
2
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 

Ähnlich wie International Journal of Engineering Research and Development (IJERD)

International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)inventionjournals
 
IRS-Lecture-Notes irsirs IRS-Lecture-Notes irsirs IRS-Lecture-Notes irsi...
IRS-Lecture-Notes irsirs    IRS-Lecture-Notes irsirs   IRS-Lecture-Notes irsi...IRS-Lecture-Notes irsirs    IRS-Lecture-Notes irsirs   IRS-Lecture-Notes irsi...
IRS-Lecture-Notes irsirs IRS-Lecture-Notes irsirs IRS-Lecture-Notes irsi...onlmcq
 
Application of rhetorical
Application of rhetoricalApplication of rhetorical
Application of rhetoricalcsandit
 
Cooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from TextCooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from TextFulvio Rotella
 
A Novel Multi- Viewpoint based Similarity Measure for Document Clustering
A Novel Multi- Viewpoint based Similarity Measure for Document ClusteringA Novel Multi- Viewpoint based Similarity Measure for Document Clustering
A Novel Multi- Viewpoint based Similarity Measure for Document ClusteringIJMER
 
A Novel Approach for Keyword extraction in learning objects using text mining
A Novel Approach for Keyword extraction in learning objects using text miningA Novel Approach for Keyword extraction in learning objects using text mining
A Novel Approach for Keyword extraction in learning objects using text miningIJSRD
 
A comparative analysis of particle swarm optimization and k means algorithm f...
A comparative analysis of particle swarm optimization and k means algorithm f...A comparative analysis of particle swarm optimization and k means algorithm f...
A comparative analysis of particle swarm optimization and k means algorithm f...ijnlc
 
Scaling Down Dimensions and Feature Extraction in Document Repository Classif...
Scaling Down Dimensions and Feature Extraction in Document Repository Classif...Scaling Down Dimensions and Feature Extraction in Document Repository Classif...
Scaling Down Dimensions and Feature Extraction in Document Repository Classif...ijdmtaiir
 
Farthest Neighbor Approach for Finding Initial Centroids in K- Means
Farthest Neighbor Approach for Finding Initial Centroids in K- MeansFarthest Neighbor Approach for Finding Initial Centroids in K- Means
Farthest Neighbor Approach for Finding Initial Centroids in K- MeansWaqas Tariq
 
Clustering Algorithm with a Novel Similarity Measure
Clustering Algorithm with a Novel Similarity MeasureClustering Algorithm with a Novel Similarity Measure
Clustering Algorithm with a Novel Similarity MeasureIOSR Journals
 
NAMED ENTITY RECOGNITION IN TURKISH USING ASSOCIATION MEASURES
NAMED ENTITY RECOGNITION IN TURKISH USING ASSOCIATION MEASURESNAMED ENTITY RECOGNITION IN TURKISH USING ASSOCIATION MEASURES
NAMED ENTITY RECOGNITION IN TURKISH USING ASSOCIATION MEASURESacijjournal
 
A SURVEY ON SIMILARITY MEASURES IN TEXT MINING
A SURVEY ON SIMILARITY MEASURES IN TEXT MINING A SURVEY ON SIMILARITY MEASURES IN TEXT MINING
A SURVEY ON SIMILARITY MEASURES IN TEXT MINING mlaij
 
Volume 2-issue-6-1969-1973
Volume 2-issue-6-1969-1973Volume 2-issue-6-1969-1973
Volume 2-issue-6-1969-1973Editor IJARCET
 
Volume 2-issue-6-1969-1973
Volume 2-issue-6-1969-1973Volume 2-issue-6-1969-1973
Volume 2-issue-6-1969-1973Editor IJARCET
 

International Journal of Engineering Research and Development (IJERD)

International Journal of Engineering Research and Development
e-ISSN: 2278-067X, p-ISSN: 2278-800X, www.ijerd.com
Volume 9, Issue 3 (December 2013), PP. 28-33

Concept and Term Based Similarity Measure for Text Classification and Clustering
B. Sindhiya1, Dr. N. Tajunisha2
1 M.Phil Scholar, Department of Computer Science, Sri Ramakrishna College of Arts and Science for Women, Coimbatore, India.
2 M.Phil, Ph.D, Associate Professor, Department of Computer Science, Sri Ramakrishna College of Arts and Science for Women, Coimbatore, India.

Abstract:- The exploitation of syntactic structures and semantic background knowledge has always been an appealing subject in the context of data mining, text retrieval and information management. The usefulness of this kind of information has been shown most prominently in highly specialized tasks, such as text categorization scenarios. So far, however, additional syntactic or semantic information has been used only individually. In this paper, a new principled approach is proposed: the concept and term based similarity measure, which incorporates linguistic and semantic structures using syntactic dependencies and semantic background knowledge. This novel method represents the meaning of texts in a high-dimensional space of concepts derived from WordNet. A number of case studies have been included in the research to demonstrate the various aspects of this framework.

Index Terms:- Document classification, document clustering, similarity measure, accuracy, classifiers, clustering algorithms.

I. INTRODUCTION

Clustering maps data items into clusters, where clusters are natural groupings of data items based on similarity or probability density methods. Unlike classification and prediction, which analyze class-labeled data objects, clustering analyzes data objects without class labels and tries to generate such labels. A similarity measure or similarity function is a real-valued function that quantifies the similarity between two objects.
The similarity measure reflects the degree of closeness or separation of the target objects and should correspond to the characteristics that are believed to distinguish the clusters embedded in the data. Before clustering, a similarity/distance measure must be determined [1]. Choosing an appropriate similarity measure is also crucial for cluster analysis, especially for particular types of clustering algorithms.

Text Categorization (TC) is the classification of documents with respect to a set of one or more preexisting categories [2]. The classification phase consists of generating a weighted vector for all categories, then using a similarity measure to find the closest category. The similarity measure is used to determine the degree of resemblance between two vectors. To achieve reasonable classification results, a similarity measure should generally respond with larger values to documents that belong to the same class and with smaller values otherwise.

During the last decades, a large number of the methods proposed for text categorization were based on the classical Bag-of-Words model, where each term or term stem is an independent feature. The existing similarity measures were most frequently used to assess the similarity between words. Although the results of the information-theoretic similarity measure are statistically significant, it does not reduce the dimension of the vector model [3]. Metric distances such as Euclidean distance are not appropriate for high-dimensional and sparse domains. Because relations between words are ignored, the learning algorithms are restricted to detecting patterns in the terminology used, while conceptual patterns remain ignored. Existing approaches require an optimization over an entire collection of documents, and most of these techniques are computationally expensive.

II. RELATED WORKS

Similarity measures have been extensively used in text classification and clustering algorithms.
The spherical k-means algorithm [4] adopted the cosine similarity measure for document clustering. Unlabeled document collections are becoming increasingly common and available. Using words as features, text documents are often represented as high-dimensional and sparse vectors. The algorithm outputs k disjoint clusters, each with a concept vector that is the centroid of the cluster normalized to have unit Euclidean norm. Derrick Higgins [5] adopted a cosine-based pairwise-adaptive similarity measure for document clustering. For large high-dimensional document datasets, the pairwise-adaptive similarity measure improves unsupervised clustering quality and speed compared to the original cosine similarity measure. Zoulficar Younes [6] reported results of clustering experiments with clustering algorithms on 12 different text data sets, and concluded that the objective function based on cosine similarity "leads to the best solutions irrespective of the number of clusters for most of the data sets."

Daphne Koller [7] proposed a divisive information-theoretic feature clustering algorithm for text classification using the Kullback-Leibler divergence. High dimensionality of text can be a deterrent in applying complex learners such as Support Vector Machines to the task of text classification. Kullback [8] combined squared Euclidean distance with relative entropy in a k-means-like clustering algorithm. The k-means algorithm introduced recently is specifically designed to handle unit-length document vectors.

Craven [9] performed document clustering based on a proposed phrase-based similarity measure. The phrase-based document similarity computes the pairwise similarities of documents based on the Suffix Tree Document (STD) model. By mapping each node in the suffix tree of the STD model to a unique feature term in the Vector Space Document (VSD) model, the phrase-based document similarity naturally inherits the term tf-idf weighting scheme when computing document similarity with phrases.

III. PROPOSED METHODOLOGY

In this section the proposed concept and term based similarity between documents is illustrated. The proposed system also measures the semantic similarity between terms and concepts using the WordNet tool and the TreeTagger tool.
3.1 CSMTP (CONCEPT BASED SIMILARITY MEASURE FOR TEXT PROCESSING) ALGORITHM

The CSMTP algorithm extracts the terms from the testing documents, selects the appropriate features, and calculates the similarity measure based on each term and its respective concepts.

CSMTP ALGORITHM
1. Let D1 and D2 be the testing documents.
2. Let T1 and T2 be the terms from the documents D1 and D2.
3. Remove the stopwords ST1 and ST2 from the documents D1 and D2.
4. Let C1 and C2 be two concepts from T1 and T2 respectively (where T1 denotes the first thesaurus and T2 the second).
5. Compute the similarity measure between two concepts with

$$SIM(c_i, c_k) = 1 - \frac{\log(\max(|A|, |B|)) - \log(|A \cap B|)}{\log(|W|) - \log(\min(|A|, |B|))}$$

where A and B are the sets of all articles that link to concepts c_i and c_k respectively, and W is the set of all articles. max(|A|, |B|) and min(|A|, |B|) are the sizes of the larger and the smaller of the two sets, and |A ∩ B| is the number of articles that link to both concepts.
6. Compute the semantic relatedness between a term and its candidate concepts in a given document according to the context information:

$$Rel(t, c_i \mid d_j) = \frac{1}{|T| - 1} \sum_{t_l \in T,\; t_l \neq t} \frac{1}{|CS_l|} \sum_{c_k \in CS_l} SIM(c_i, c_k)$$

where T is the term set of the j-th document d_j, t_l is a term in d_j other than t, and CS_l is the candidate concept set related to term t_l.

3.2 SYNTACTIC REPRESENTATION

The tf-idf weighting scheme is used at the syntactic level to record the syntactic information. Tf-idf, term frequency-inverse document frequency, is a numerical statistic that reflects how important a word is to a document in a collection or corpus. It is often used as a weighting factor in information retrieval and text mining. The tf-idf value increases proportionally to the number of times a word appears in the document, but is offset by the frequency of the word in the corpus, which helps to control for the fact that some words are generally more common than others.
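Returning to steps 5 and 6 of the CSMTP algorithm in Section 3.1, the two formulas can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the dictionary layout (`in_links`, `doc_candidates`), returning 0 when two concepts share no linking articles, and clamping negative values to 0 are all choices made here for the example.

```python
import math


def concept_similarity(links_a, links_b, total_articles):
    """SIM(c_i, c_k) from step 5.

    links_a, links_b: sets of ids of the articles that link to each concept.
    total_articles:   |W|, the number of all articles.
    """
    common = len(links_a & links_b)
    if common == 0:
        # Assumption: log(0) is undefined, so treat disjoint sets as unrelated.
        return 0.0
    larger = max(len(links_a), len(links_b))
    smaller = min(len(links_a), len(links_b))
    denom = math.log(total_articles) - math.log(smaller)
    if denom == 0:
        # Both sets cover every article, so the concepts are indistinguishable.
        return 1.0
    value = 1.0 - (math.log(larger) - math.log(common)) / denom
    # Assumption: clamp to 0 so that weakly overlapping sets never score negative.
    return max(0.0, value)


def relatedness(term, concept, doc_candidates, in_links, total_articles):
    """Rel(t, c_i | d_j) from step 6.

    doc_candidates: maps every term of document d_j to its candidate concepts CS_l.
    in_links:       maps every concept to the set of articles that link to it.
    """
    others = [t for t in doc_candidates if t != term]
    if not others:
        return 0.0
    total = 0.0
    for t_l in others:
        cs_l = doc_candidates[t_l]
        if not cs_l:
            continue
        # Average similarity between the concept and the candidates of t_l.
        total += sum(
            concept_similarity(in_links[concept], in_links[c_k], total_articles)
            for c_k in cs_l
        ) / len(cs_l)
    # len(others) equals |T| - 1 when the term itself belongs to T.
    return total / len(others)
```

Under these assumptions, the candidate concept with the highest `relatedness` value for a term would be the natural choice for that term in the given document.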
Tf–idf is the product of two statistics, term frequency and inverse document frequency. Various ways for determining the exact values of both statistics exist. In the case of the term frequency tf(t,d), the simplest choice is to use the raw frequency of a term in a document, i.e. the number of times that term t occurs in document d. If we denote the raw frequency of t by f(t,d), then the simple tf scheme is tf(t,d) = f(t,d).
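The raw-frequency scheme can be sketched in a few lines of Python. The text does not say which idf variant is used, so the common choice idf(t) = log(N / df(t)) is assumed here, where N is the number of documents and df(t) is the number of documents that contain t. The function names and the toy corpus are illustrative only.

```python
import math
from collections import Counter


def term_frequency(tokens):
    """Raw term frequency: tf(t, d) = f(t, d)."""
    return Counter(tokens)


def inverse_document_frequency(corpus):
    """Assumed variant: idf(t) = log(N / df(t))."""
    n_docs = len(corpus)
    # Count each term once per document it appears in.
    doc_freq = Counter(term for doc in corpus for term in set(doc))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tf_idf(corpus):
    """Return one {term: weight} dictionary per tokenized document."""
    idf = inverse_document_frequency(corpus)
    return [
        {term: freq * idf[term] for term, freq in term_frequency(doc).items()}
        for doc in corpus
    ]


# Hypothetical two-document corpus, already tokenized.
corpus = [["text", "mining", "text"], ["text", "clustering"]]
weights = tf_idf(corpus)
print(weights)
```

A term that occurs in every document, such as "text" here, gets weight 0 under this idf variant, which is the offsetting effect described above.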
3.3 SEMANTIC SIMILARITY

The semantic level consists of concepts related to the terms in the syntactic level. The two levels are connected via the semantic correlation between terms and their relevant concepts. WordNet is used to ascertain connections among four parts of speech (POS): noun, verb, adjective, and adverb. The minimum unit in WordNet is the synset, which represents one exact meaning of a word. It includes the word, its explanation, and its synonyms.

IV. EXPERIMENTAL RESULTS

In this section, the effectiveness of the proposed similarity measure CSMTP is investigated. The investigation is done by applying the CSMTP measure in several text applications, including k-NN based single-label classification (SL-kNN), k-NN based multi-label classification (ML-kNN), k-means clustering (k-means), and hierarchical agglomerative clustering (HAC). Two data sets, WebKB and Reuters-8, are used in the experiments presented below.

1) WebKB. The documents in the WebKB data set are webpages collected by the World Wide Knowledge Base (Web→KB) project of the CMU text learning group. The documents were manually classified into several different classes. The documents of this data set were not predesignated as training or testing patterns, so the data set can be randomly divided into training and testing subsets.

2) Reuters-8. The Reuters-21578 ModApté Split Text Categorization Test Collection contains thousands of documents collected from the Reuters newswire in 1987. The most widely used version is Reuters-21578 ModApté, which contains 90 categories and 12902 documents.

4.1 CLASSIFICATION PERFORMANCE

For the WebKB data set, the randomly selected training documents are used for training/validation and the testing documents are used for testing.
For the Reuters-8 data set, the predesignated training data are used for training/validation and the predesignated testing data are used for testing. Note that the data for training/validation are separate from the data for testing in each case.

4.1.1 Single-Label Document Classification

In this experiment, we compare the performance of our measure and the others in single-label document classification. The performance is evaluated by the classification accuracy, AC, which compares the predicted label of each document with that provided by the document corpus:

$$AC = \frac{\sum_{i=1}^{n} E(c_i, c'_i)}{n}$$

where n is the number of testing documents, and c_i and c'_i are the target label and the predicted label, respectively, of the ith document. E(c_i, c'_i) = 1 if c_i = c'_i, and E(c_i, c'_i) = 0 otherwise.

Figure 4.1 shows the classification accuracy obtained by SL-kNN with the SMTP and CSMTP similarity measures using different k settings, i.e., k = 4, 8, 12, 16, 20, on the training/validation data of WebKB. The figure shows that the single-label classification accuracy obtained with the proposed CSMTP measure is higher than that obtained with the SMTP measure.

FIGURE 4.1 Classification Accuracy by SL-kNN on Training/Validation Data with SMTP and CSMTP Using Different k Settings.
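The single-label evaluation can be sketched as follows: a k-NN classifier that takes a pluggable similarity function, followed by the accuracy AC. This is a hedged illustration only. The Jaccard similarity on token sets is a stand-in chosen for brevity and is neither SMTP nor CSMTP, and the function names and toy documents are hypothetical.

```python
from collections import Counter


def jaccard(a, b):
    """Stand-in similarity on token sets (NOT SMTP or CSMTP)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def knn_predict(query, train_docs, train_labels, k, similarity=jaccard):
    """Label `query` by majority vote among its k most similar training docs."""
    ranked = sorted(
        range(len(train_docs)),
        key=lambda i: similarity(query, train_docs[i]),
        reverse=True,
    )
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def classification_accuracy(targets, predictions):
    """AC: the fraction of documents whose predicted label equals the target."""
    if not targets or len(targets) != len(predictions):
        raise ValueError("targets and predictions must be non-empty and equal in length")
    matches = sum(1 for c, p in zip(targets, predictions) if c == p)
    return matches / len(targets)


# Hypothetical toy data in the spirit of WebKB page categories.
train_docs = [
    ["course", "homework"],
    ["course", "exam"],
    ["faculty", "research"],
    ["faculty", "professor"],
]
train_labels = ["course", "course", "faculty", "faculty"]
test_docs = [["exam", "course", "syllabus"], ["research", "faculty"]]
test_labels = ["course", "faculty"]

predicted = [knn_predict(d, train_docs, train_labels, k=3) for d in test_docs]
print(predicted, classification_accuracy(test_labels, predicted))
```

Passing the similarity as a parameter mirrors the experimental setup, where the classifier stays fixed and only the similarity measure changes between runs.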
Table 4.1 shows the classification accuracy (AC) obtained by single-label classification on the testing data of WebKB.

TABLE 4.1 Classification Accuracies by SL-kNN with Different Measures on Testing Data of WebKB

Measure   k=1      k=3      k=5      k=7      k=9
SMTP      0.9013   0.9191   0.9242   0.9223   0.9233
CSMTP     0.9338   0.9411   0.9420   0.9447   0.9461

4.1.2 MULTI-LABEL CLASSIFICATION

Figure 4.2 shows the classification accuracy obtained by ML-kNN with the SMTP and CSMTP similarity measures using different k settings, i.e., k = 4, 8, 12, 16, 20, on the training/validation data of WebKB. Figure 4.2 shows that the multi-label classification accuracy obtained with the proposed CSMTP measure is higher than that obtained with the SMTP measure.

FIGURE 4.2 Classification Accuracy by ML-kNN on Training/Validation Data with SMTP and CSMTP Using Different k Settings.

Table 4.2 shows the classification accuracy (AC) obtained by multi-label classification on the testing data of WebKB.

TABLE 4.2 Classification Accuracies by ML-kNN with Different Measures on Testing Data of WebKB

Measure   k=1      k=3      k=5      k=7      k=9
SMTP      0.6910   0.6932   0.6965   0.6990   0.7009
CSMTP     0.7130   0.7111   0.7114   0.7092   0.7083

4.2 CLUSTERING PERFORMANCE

For a document corpus with p classes and n documents, the class labels are removed, one-third of the documents are randomly selected for training/validation, and the remaining documents are used for testing. Note that the data for training/validation are separate from the data for testing.

4.2.1 K-MEANS CLUSTERING

In this experiment, the performance of the CSMTP measure in clustering is compared with that of the SMTP measure.
The performance is evaluated by the clustering accuracy, AC, which compares the predicted label of each document with that provided by the document corpus:

$$AC = \frac{\sum_{i=1}^{K} \mathit{most}_i}{n}$$

where K is the number of clusters, n is the number of documents, and most_i is the number of documents in cluster i that belong to the most frequent class of that cluster.

Figure 4.3 shows the clustering accuracy obtained by k-means with the SMTP and CSMTP similarity measures using different k (cluster) settings, i.e., k = 4, 8, 12, 16, 20, on the training/validation data of WebKB. The figure shows that the k-means clustering accuracy obtained with the proposed CSMTP measure is higher than that obtained with the SMTP measure.

FIGURE 4.3 Clustering Accuracy by K-Means on Training/Validation Data with SMTP and CSMTP Using Different k Settings.
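The clustering accuracy can be computed as in the sketch below. It assumes that most_i is the number of documents in cluster i that share the cluster's most frequent class, which the text does not spell out, so the measure is computed as a majority-class count per cluster divided by n. The function name and the toy assignment are hypothetical.

```python
from collections import Counter, defaultdict


def clustering_accuracy(cluster_ids, class_labels):
    """AC = (sum over clusters of the majority-class count) / n."""
    if not class_labels or len(cluster_ids) != len(class_labels):
        raise ValueError("inputs must be non-empty and equal in length")
    # Group the true class labels by the cluster each document was assigned to.
    members = defaultdict(list)
    for cluster_id, label in zip(cluster_ids, class_labels):
        members[cluster_id].append(label)
    # most_i: size of the largest class inside cluster i (assumed definition).
    majority_total = sum(
        Counter(labels).most_common(1)[0][1] for labels in members.values()
    )
    return majority_total / len(class_labels)


# Hypothetical assignment of six documents to two clusters.
cluster_ids = [0, 0, 0, 1, 1, 1]
class_labels = ["course", "course", "faculty", "faculty", "faculty", "course"]
print(clustering_accuracy(cluster_ids, class_labels))
```

Because each cluster is credited with its own majority class, the measure does not require cluster ids to match class names, which suits unsupervised output such as k-means or HAC assignments.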
Table 4.3 shows the clustering accuracy (AC) obtained by k-means clustering on the testing data of WebKB.

TABLE 4.3 AC Values by K-Means with Different Measures on Testing Data of WebKB

Measure   k=8      k=16     k=24     k=32
SMTP      0.7906   0.8584   0.8692   0.8728
CSMTP     0.8450   0.8702   0.8796   0.8964

4.2.2 HIERARCHICAL AGGLOMERATIVE DOCUMENT CLUSTERING PERFORMANCE

Figure 4.4 shows the clustering accuracy obtained by hierarchical agglomerative clustering with the SMTP and CSMTP similarity measures using different k (cluster) settings, i.e., k = 4, 8, 12, 16, 20, on the training/validation data of WebKB. The figure shows that the hierarchical agglomerative clustering accuracy obtained with the proposed CSMTP measure is higher than that obtained with the SMTP measure.

FIGURE 4.4 Clustering Accuracy by HAC on Training/Validation Data with SMTP and CSMTP Using Different k Settings.

Table 4.4 shows the clustering accuracy (AC) obtained by hierarchical agglomerative clustering on the testing data of WebKB.

TABLE 4.4 Clustering Accuracies by HAC with Different Measures on Testing Data of WebKB

Measure   k=1      k=3      k=5      k=7      k=9
SMTP      0.5960   0.5483   0.5367   0.5091   0.5000
CSMTP     0.6770   0.6343   0.5552   0.5244   0.5200

V. CONCLUSION

The concept and term based model represents a document as a two-way model with the aid of WordNet. In the two-way representation model, the term information is represented first and the concept information second, and the two levels are connected by the semantic relatedness between terms and concepts.
Experimental results on real data sets show that the proposed model and classification framework significantly improve classification and clustering performance compared with the existing SMTP (similarity measure for text processing) model. The experiments also show that CSMTP (concept and term based similarity measure for text processing) takes less time when running in parallel and less space when running in series, and that its categorization accuracy is high.
VI. FUTURE WORK

Future work can focus on concept mapping and weighting techniques to find a better concept vector space for documents, because a better concept-based representation can further improve the performance of the text classification and clustering framework. A new semantic-based vector space model that utilizes category information can also be explored. Afterwards, the two-way representation model can be extended to a three-way model containing term, concept, and category information. CSMTP will also be adapted to fit the three-way model and achieve better text classification and clustering performance.

REFERENCES

[1]. Anna Huang. Similarity Measures for Text Document Clustering. 2008.
[2]. F. Sebastiani. Machine Learning in Automated Text Categorization. ACM Computing Surveys, 34(1):1–47, 2002.
[3]. S. Clinchant and E. Gaussier. Information-Based Models for Ad Hoc IR. Proceedings of the 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 234–241, 2010.
[4]. Derrick Higgins and Jill Burstein. Sentence Similarity Measures for Essay Coherence. 2007.
[5]. Donald Metzler, Susan Dumais, and Christopher Meek. Similarity Measures for Short Segments of Text. 2007.
[6]. Zoulficar Younes, Fahed Abdallah, and Thierry Denoeux. Multi-Label Classification Algorithm Derived from K-Nearest Neighbor Rule with Label Dependencies. 2003.
[7]. Daphne Koller and Mehran Sahami. Hierarchically Classifying Documents Using Very Few Words. Proceedings of the 14th International Conference on Machine Learning (ML), Nashville, Tennessee, July 1997, pages 170–178.
[8]. S. Kullback and R. A. Leibler. On Information and Sufficiency. Annals of Mathematical Statistics, 22(1):79–86, 1951.
[9]. H. Chim and X. Deng. Efficient Phrase-Based Document Similarity for Clustering. IEEE Transactions on Knowledge and Data Engineering, 20(9):1217–1229, 2008.
[10]. http://web.ist.utl.pt/~acardoso/datasets/
[11]. http://www.cs.technion.ac.il/~ronb/thesis.html
[12]. http://www.daviddlewis.com/resources/testcollections/reuters21578/
[13]. http://www.dmoz.org/