Control Theory and Informatics
ISSN 2224-5774 (Paper) ISSN 2225-0492 (Online)
Vol.3, No.4, 2013

www.iiste.org

Comparative Analysis of Various Data Stream Mining Procedures
and Various Dimension Reduction Techniques
Diksha Upadhyay, Susheel Jain, Anurag Jain
Department of Computer Science, RITS, Bhopal, India
Email: {diksha.du31@gmail.com, jain_susheel65@yahoo.co.in, anurag.akjain@gmail.com}
Abstract
In recent years data mining has become a major research area. Data mining is the process of extracting useful
information from a given set of data for further use, whether commercial or scientific. When extracting this
information (the mined data), sound methodologies with good approximations have to be used. In this survey we
study various data stream clustering techniques and various dimension reduction techniques, together with the
characteristics that help improve the quality of clustering. We also present our own proposal for clustering
streamed data: a dimension reduction technique is applied first, and the Fuzzy C-means algorithm is then applied
to the reduced data to improve the quality of clustering.
Keywords: Data Stream, Dimension Reduction, Clustering
1. Introduction
With the tremendous growth in information technology, data repositories have grown as well, and they may take
any form. From the perspective of the commercial sector, for example, each firm stores several years of data
about its turnover, customer statistics, challenges to the enterprise, profit areas, and so on. Data mining is the
research area concerned with digging important information out of such a repository. The information produced is
then used by the business leaders (business analysts) of an enterprise to take strategic decisions about future
policies. This is only the commercial application of data mining; it is also widely used in scientific research.
In this study our intention is to present a comparison between several data stream mining techniques. Data stream
mining is currently an active research area, in which a main task is to form clusters (here, of a fuzzy nature).
Data streams [1] are generated continuously by many sources, such as sensor networks, web user click events, and
the flow of internet traffic. Streamed data from these sources exhibit similar characteristics: they arrive without
end and at an unpredictable speed, and they have to be scanned in a single pass. Mining therefore has to be
performed under constraints such as limited memory space and limited time, so more efficient mining algorithms
have to be devised.
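The single-pass, limited-memory constraint can be illustrated with a small sketch. It is not taken from any of the surveyed papers: it uses Welford's method to keep a running mean and variance, and the `sensor_readings` generator is a hypothetical stand-in for an unbounded source.

```python
class RunningStats:
    """Single-pass mean and variance over a stream (Welford's method)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        # Constant memory: each value is seen once and then discarded.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Population variance of everything seen so far.
        return self._m2 / self.count if self.count else 0.0


def sensor_readings():
    # Hypothetical stand-in for an unbounded source such as a sensor network.
    for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
        yield value


stats = RunningStats()
for reading in sensor_readings():
    stats.update(reading)

print(stats.count, stats.mean, stats.variance)
```

Only three numbers are stored regardless of how many readings arrive, which is the property a stream mining algorithm needs.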
In mining, clusters are found using clustering, which is the method of partitioning the input data into several
chunks (clusters), each composed of elements (data objects) with similar characteristics. Here we give a snapshot
of various data stream clustering algorithms. In [2] the STREAM procedure is presented for clustering data
streams. In [8] a procedure is presented for clustering binary data streams, in which modifications of K-means
are proposed for better results. A time-series clustering mechanism that generates a hierarchy of clusters is
presented in [4], and there are many more.
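To make the idea of clustering in one pass concrete, the following sketch shows a generic sequential K-means update. It is an illustration only, not the variant proposed in [8] or the STREAM procedure of [2]; the function names and the sample points are assumptions.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def online_kmeans(stream, initial_centers):
    """One pass over the stream; memory holds only the k centers and counts."""
    centers = [list(c) for c in initial_centers]
    counts = [0] * len(centers)
    for point in stream:
        # Assign the arriving point to its nearest center.
        j = min(range(len(centers)),
                key=lambda i: squared_distance(point, centers[i]))
        counts[j] += 1
        # Move that center toward the point by a shrinking step 1/count,
        # so each center is the running mean of the points assigned to it.
        step = 1.0 / counts[j]
        centers[j] = [c + step * (p - c) for c, p in zip(centers[j], point)]
    return centers, counts


# Hypothetical two-dimensional stream with two obvious groups.
points = [(0.0, 0.0), (10.0, 10.0), (1.0, 0.0),
          (9.0, 10.0), (0.0, 1.0), (10.0, 9.0)]
centers, counts = online_kmeans(iter(points),
                                initial_centers=[(0.0, 0.0), (10.0, 10.0)])
print(centers, counts)
```

Each point is examined once and then dropped, so the memory used does not grow with the length of the stream.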
1.1 Dimension reduction
Streamed data arriving from various sources can have many dimensions from a geometric point of view, possibly
dozens. Clustering such a data stream with conventional procedures such as the K-means algorithm is therefore
difficult, because clustering has to be done within limited memory capacity and time.
The survey shows that clustering algorithms are less efficient on high-dimensional data sets. To improve their
efficiency, the data sets are first passed through a dimension reduction procedure [16] such as SVD, PCA, or SVM,
and the clustering algorithm is then applied to the reduced data set for better clustering. We also provide a
comparative analysis of various dimension reduction techniques.
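As an illustration of reducing a data set before clustering, the sketch below projects rows onto their two leading principal directions (PCA) using power iteration with deflation. It is a minimal example under stated assumptions, not the procedure of [16]: the helper names and the sample rows are hypothetical, and it assumes the two leading eigenvalues are well separated.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def center(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / n for j in range(d)]
            for i in range(d)]


def top_direction(matrix, iterations=200):
    """Power iteration: dominant eigenvector of a symmetric matrix."""
    d = len(matrix)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [dot(row, v) for row in matrix]
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def pca_2d(rows):
    """Project rows onto their two leading principal directions."""
    x = center(rows)
    cov = covariance(x)
    d = len(cov)
    first = top_direction(cov)
    eigenvalue = dot(first, [dot(row, first) for row in cov])
    # Deflation removes the first direction so the next one can be found.
    deflated = [[cov[i][j] - eigenvalue * first[i] * first[j]
                 for j in range(d)] for i in range(d)]
    second = top_direction(deflated)
    return [(dot(r, first), dot(r, second)) for r in x]


# Hypothetical four-dimensional rows whose variance lies in two directions.
rows = [
    [2.0, 0.0, 1.0, 0.1],
    [4.0, 1.0, 2.0, 0.0],
    [6.0, 0.0, 3.0, 0.1],
    [8.0, 1.0, 4.0, 0.0],
    [10.0, 0.0, 5.0, 0.1],
    [12.0, 1.0, 6.0, 0.0],
]
reduced = pca_2d(rows)
print(reduced[0])
```

The sign of each projected coordinate depends on where the power iteration starts, so only the relative positions of the reduced points are meaningful.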
1.2 Objective of our work
In this study various data stream mining techniques and dimension reduction techniques are evaluated on the
basis of their usage statistics, application parameters, working mechanism, etc.

1.3 Structure of This paper
After the overview of concepts in Section 1, Section 2 presents the comparison between various data stream
mining algorithms and dimension reduction techniques. Section 3 presents our proposal together with its working
model. Finally, Section 4 concludes the paper with possible future enhancements.
2. Comparative analysis
2.1 Overview of various Fuzzy and other clustering methods (algorithms)
| S. No | Name of technique | Type of data on which it operates | Working mechanism |
|---|---|---|---|
| 1 | Istrap [6] | Stream data | Static analysis is performed, and methods are proposed to remove outliers. |
| 2 | WFCM [5] | Stream data | The data stream is divided into chunks as they arrive. The main idea of WFCM is to renew the weighted clustering centers by iteration until the cost function reaches a satisfactory result. |
| 3 | Improved FCM [7] | Synthetic and real images | A trade-off fuzzy factor and a kernel method are used. |
| 4 | DSCLU [9] | Stream data (real and synthetic data) | The process is separated into an online phase and an offline phase. |
| 5 | IFCM [10] | Interval-valued symbolic data (with real and synthetic symbolic intervals) | An objective-function-based calculation technique. |
| 6 | DCWFP-miner [11] | Data stream (continuous, unbounded and high-speed data streams) | Uses a divide-and-conquer strategy and a bottom-up, depth-first recursive method to mine the closed weighted frequent patterns of the CWFP tree in a sliding window. |
| 7 | VLFCM [12] | Very large data | Methods are used to extend the fuzzy C-means algorithm. |
| 8 | HSW stream [13] | High-dimensional stream data | A projected clustering technique is used. |
| 9 | Modified fuzzy C-means [14] | Stream data | The Mahalanobis distance is used. |
| 10 | Density WFCM [15] | Various feature and relational data sets | A weighted relational algorithm (WNERFCM) is proposed. |

Implemented on (tool used), entries as listed: C++; C language; Matlab (five entries); JAVA 1.6; Microsoft Visual C++; Fuzzy corrected rank index.
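Several of the techniques listed above build on the fuzzy C-means update rules. As a reminder, and assuming the standard formulation with fuzzifier $m > 1$, data points $x_k$, cluster centers $v_i$ and memberships $u_{ik}$, they can be written as follows. The per-point weight $w_k$ in the last two lines is a generic weighting introduced here for illustration; it is not necessarily the scheme used in [5] or [15].

```latex
\begin{align}
J_m &= \sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}^{m}\, \lVert x_k - v_i \rVert^{2},
  \qquad \sum_{i=1}^{c} u_{ik} = 1 \ \text{for every } k, \\
u_{ik} &= \left[ \sum_{j=1}^{c}
  \left( \frac{\lVert x_k - v_i \rVert}{\lVert x_k - v_j \rVert} \right)^{\frac{2}{m-1}}
  \right]^{-1}, \\
v_i &= \frac{\sum_{k=1}^{n} u_{ik}^{m}\, x_k}{\sum_{k=1}^{n} u_{ik}^{m}}, \\
J_m^{w} &= \sum_{i=1}^{c} \sum_{k=1}^{n} w_k\, u_{ik}^{m}\, \lVert x_k - v_i \rVert^{2}, \\
v_i &= \frac{\sum_{k=1}^{n} w_k\, u_{ik}^{m}\, x_k}{\sum_{k=1}^{n} w_k\, u_{ik}^{m}}.
\end{align}
```

The membership and center updates are alternated until the cost function stops improving, which corresponds to the iteration described for WFCM.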
2.2 Comparative overview of various Dimension reduction techniques
| S.No | Name of technique | Nature | Usage statistics | Working mechanism |
|---|---|---|---|---|
| 1 | Singular value decomposition (SVD) | unsupervised | The primary objectives of most of these algorithms are the identification and extraction of structure within the data. | A gene selection procedure is used (it is a factorization method). |
| 2 | Principal component analysis (PCA) | unsupervised | Most of these algorithms have feature extraction objectives. | The main focus is to convert the values into a set of linear combinations. |
| 3 | Linear discriminant analysis (LDA) | supervised | Most of these algorithms are used in small-sample-size problems (for feature selection and feature transformation). | Classification-based approach. |
| 4 | Support vector machines (SVM) | supervised | Most of these algorithms are specifically appropriate for data sets in which the number of samples is much smaller than the number of features (genes). | Its main purpose is the analysis of gene expression data. |
| 5 | Independent component analysis (ICA) | unsupervised | Most ICA algorithms require whitened data, i.e. data with an identity covariance matrix. | It works on the principle of additive subcomponents, separating individual units from a large multivariate source. |
| 6 | Canonical correlation analysis (CCA) | unsupervised | Most CCA algorithms are used with DNA microarrays, which assess the expression of several thousand genes. | Its main focus is on discovering linear combinations from two sets of variables and then estimating the correlations among these variables. |
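Several of the techniques above reduce to linear algebra on the centered data matrix. A short sketch of how SVD, PCA and whitening relate is given here, assuming $X$ is an $n \times d$ matrix whose columns have zero mean and which has full column rank. The symbols are introduced for illustration and do not appear in the surveyed sources.

```latex
\begin{align}
X &= U \Sigma V^{\top}
  && \text{(thin singular value decomposition)} \\
C &= \tfrac{1}{n} X^{\top} X = V \Lambda V^{\top},
  \quad \Lambda = \tfrac{1}{n} \Sigma^{2}
  && \text{(covariance matrix; PCA directions are the columns of } V\text{)} \\
Y &= X V_{2}, \quad V_{2} = [\, v_1 \;\; v_2 \,] \in \mathbb{R}^{d \times 2}
  && \text{(reduction to two dimensions)} \\
Z &= X V \Lambda^{-1/2},
  \quad \tfrac{1}{n} Z^{\top} Z = I
  && \text{(whitening, as required by most ICA algorithms)}
\end{align}
```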

3. Our proposal
3.1 Problem statement
To increase the efficiency of data stream clustering algorithms (in our case, a dimension reduced weighted fuzzy
C-means algorithm) [5], and for visualization purposes, a dimension reduction technique [3] is used to project
the input data into two dimensions for better clustering quality. We will apply the proposed technique to data
that have streaming behavior and high dimensionality.
3.2 Proposed approach
In this study we propose a dimension reduced weighted fuzzy C-means algorithm. The algorithm is applicable to
high-dimensional data sets that have streaming behavior; an example is live high-definition video on the
internet. Such data have two properties that separate them from other data sets: a) they have streaming behavior,
and b) they have high dimensionality. First, we apply the dimension reduction technique [3] to convert the
high-dimensional data stream to fewer dimensions (two dimensions); the weighted fuzzy C-means algorithm [5] is
then applied for better data stream clustering.
3.3 Algorithm
STEP 1: [Input] Input the data set components (o1, o2, …, ok), which have both high dimensionality and streaming behavior.
STEP 2: [Reduction] Apply the proposed dimension reduction technique [3] to the input data.
STEP 3: [Clustering] Apply the Fuzzy C-means algorithm to the data obtained in Step 2.
STEP 4: [Output] Obtain the data set components (o1, o2, …, ok) with fewer dimensions, grouped into clusters.
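The four steps can be sketched for a single chunk of the stream as follows. This is an illustration under explicit assumptions, not the authors' implementation. The technique of [3] is not described in this paper, so `reduce_to_two` is a simple stand-in that keeps the two highest-variance attributes. The clustering step is plain fuzzy C-means rather than the weighted variant of [5], and the sample chunk and initial centers are hypothetical.

```python
def reduce_to_two(rows):
    """Stand-in for the technique of [3]: keep the two highest-variance columns."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    variances = [sum((r[j] - means[j]) ** 2 for r in rows) / n for j in range(d)]
    keep = sorted(sorted(range(d), key=lambda j: variances[j], reverse=True)[:2])
    return [[r[j] for j in keep] for r in rows]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fuzzy_c_means(points, c=2, m=2.0, iterations=50):
    """Standard fuzzy C-means; returns (centers, memberships)."""
    # Deterministic, spread-out initial centers (an assumption made here;
    # random initialisation is more common).
    centers = [list(points[(k * len(points)) // c]) for k in range(c)]
    memberships = []
    for _ in range(iterations):
        # Membership update: closer centers receive higher membership.
        memberships = []
        for p in points:
            dists = [max(sq_dist(p, v), 1e-12) for v in centers]
            inv = [dist ** (-1.0 / (m - 1.0)) for dist in dists]
            total = sum(inv)
            memberships.append([w / total for w in inv])
        # Center update: mean of the points weighted by membership^m.
        for i in range(c):
            weights = [u[i] ** m for u in memberships]
            total = sum(weights)
            centers[i] = [sum(w * p[j] for w, p in zip(weights, points)) / total
                          for j in range(len(points[0]))]
    return centers, memberships


# STEP 1: one hypothetical chunk of four-dimensional elements o1..o6.
chunk = [
    [1.0, 5.0, 0.2, 1.1],
    [1.2, 5.1, 0.1, 0.9],
    [0.8, 4.9, 0.3, 1.0],
    [9.0, 5.0, 0.2, 8.8],
    [9.2, 5.1, 0.1, 9.1],
    [8.8, 4.9, 0.3, 9.0],
]
reduced = reduce_to_two(chunk)                          # STEP 2
centers, memberships = fuzzy_c_means(reduced, c=2)      # STEP 3
labels = [max(range(2), key=lambda i: u[i]) for u in memberships]
print(reduced, labels)                                  # STEP 4
```

In a streaming setting the same two calls would be repeated for each arriving chunk, with the centers from the previous chunk carried forward.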
3.4 Proposed Technology Used
We will implement our proposed algorithm in Java and test its efficiency on real high-dimensional data sets
that have streaming behavior.
[Figure: Simple working model of our proposal. K data elements (I1, I2, I3, …, IK), each with n dimensions
(Dimension 1, Dimension 2, …, Dimension n), pass through the proposed dimension reduction technique, giving
elements with reduced dimensions (two dimensions). The fuzzy C-means clustering algorithm is then applied,
giving reduced and clustered data.]
4. Conclusion
In this paper we have presented various data stream mining techniques, a popular research area at present, and
have discussed the characteristics of stream data and its possible sources. Because streamed data can be high
dimensional, dimension reduction techniques are applied to it prior to clustering. Our survey provides a
comparative analysis of various stream mining procedures and dimension reduction techniques, and the proposal
section discusses a working model for mining data streams. This work gives the reader a quick overview of stream
data mining and the dimension reduction techniques available.
References
[1] Brian Babcock, Shivnath Babu, Mayur Datar, Rajeev Motwani, Jennifer Widom (2002). "Models and issues
in data stream systems". Proceedings of the 21st ACM SIGACT-SIGMOD-SIGART Symposium on Principles
of Database Systems. pp. 1–16.

[2] Liadan O'Callaghan, Nina Mishra, Adam Meyerson, Sudipto Guha, Rajeev Motwani (2002). "Streaming data
algorithms for high quality clustering". Proceedings of the 18th International Conference on Data Engineering.
pp. 685–694.
[3] P.S. Bishnu, V. Bhattacherjee. "A Dimension Reduction Technique for K-Means Clustering".
[4] Pedro Rodrigues, Joao Gama, Joao Pedro Pedroso (2004). "Hierarchical time-series clustering for data
streams". Proceedings of the First International Workshop on Knowledge Discovery in Data Streams. pp. 22–31.
[5] Renxia Wan, Xiaoya Yan, Xiaoke Su (2008). "A Weighted Fuzzy Clustering Algorithm for Data Stream". ISECS
International Colloquium on Computing, Communication, Control, and Management.
[6] Lingjuan Li, Xiong Li (2012). "Improved Online Stream Data Clustering Algorithm". Second International
Conference on Business Computing and Global Informatization.
[7] Maoguo Gong, Yan Liang, Jiao Shi, Wenping Ma, Jingjing Ma (2013). "Fuzzy C-Means Clustering With Local
Information and Kernel Metric for Image Segmentation". IEEE Transactions on Image Processing, Vol. 22, No. 2,
February 2013.
[8] Carlos Ordonez (2003). "Clustering binary data streams with k-means". Proceedings of the 8th ACM
SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery. pp. 12–19.
[9] Amin Namadchian, Gholamreza Esfandani (2012). "DSCLU: a new Data Stream CLUstring algorithm for multi
density environments". 13th ACIS International Conference on Software Engineering, Artificial Intelligence,
Networking and Parallel/Distributed Computing.
[10] Arthur F. M. Alvim, Renata C.M.R de Souza. "A Fuzzy Weighted Clustering Method for Symbolic Interval
Data". IEEE.
[11] Wang Jie, Zeng Yu (2012). "DCWFP-Miner: Mining Closed Weighted Frequent Patterns over Data Streams".
9th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2012).
[12] Timothy C. Havens, James C. Bezdek, Christopher Leckie, Lawrence O. Hall, Marimuthu Palaniswami (2012).
"Fuzzy c-Means Algorithms for Very Large Data". IEEE Transactions on Fuzzy Systems, Vol. 20, No. 6,
December 2012.
[13] Weiguo Liu, Jia OuYang (2011). "Clustering Algorithm for High Dimensional Data Stream over Sliding
Windows". International Joint Conference of IEEE TrustCom-11/IEEE ICESS-11/FCST-11.
[14] Shao-Hong Yin, Min Li (2010). "Study on a modified Fuzzy C-Means Clustering Algorithm". International
Conference on Computer Design and Applications (ICCDA 2010).
[15] Richard J. Hathaway, Yingkang Hu (2009). "Density-Weighted Fuzzy c-Means Clustering". IEEE
Transactions on Fuzzy Systems, Vol. 17, No. 1, February 2009.
[16] Nebu Varghese, Vinay Verghese, Gayathri P., N. Jaisankar (2012). "A Survey of Dimensionality Reduction
and Classification Methods". International Journal of Computer Science & Engineering Survey (IJCSES),
Vol. 3, No. 3, June 2012.

This academic article was published by The International Institute for Science,
Technology and Education (IISTE). The IISTE is a pioneer in the Open Access
Publishing service based in the U.S. and Europe. The aim of the institute is
Accelerating Global Knowledge Sharing.
More information about the publisher can be found in the IISTE’s homepage:
http://www.iiste.org
CALL FOR JOURNAL PAPERS
The IISTE is currently hosting more than 30 peer-reviewed academic journals and
collaborating with academic institutions around the world. There’s no deadline for
submission. Prospective authors of IISTE journals can find the submission
instruction on the following page: http://www.iiste.org/journals/
The IISTE
editorial team promises to the review and publish all the qualified submissions in a
fast manner. All the journals articles are available online to the readers all over the
world without financial, legal, or technical barriers other than those inseparable from
gaining access to the internet itself. Printed version of the journals is also available
upon request of readers and authors.
MORE RESOURCES
Book publication information: http://www.iiste.org/book/
Recent conferences: http://www.iiste.org/conference/
IISTE Knowledge Sharing Partners
EBSCO, Index Copernicus, Ulrich's Periodicals Directory, JournalTOCS, PKP Open
Archives Harvester, Bielefeld Academic Search Engine, Elektronische
Zeitschriftenbibliothek EZB, Open J-Gate, OCLC WorldCat, Universe Digtial
Library , NewJour, Google Scholar

Weitere ähnliche Inhalte

Was ist angesagt?

MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...acijjournal
 
DEVELOPING A NOVEL MULTIDIMENSIONAL MULTIGRANULARITY DATA MINING APPROACH FOR...
DEVELOPING A NOVEL MULTIDIMENSIONAL MULTIGRANULARITY DATA MINING APPROACH FOR...DEVELOPING A NOVEL MULTIDIMENSIONAL MULTIGRANULARITY DATA MINING APPROACH FOR...
DEVELOPING A NOVEL MULTIDIMENSIONAL MULTIGRANULARITY DATA MINING APPROACH FOR...cscpconf
 
Iss 6
Iss 6Iss 6
Iss 6ijgca
 
Experimental study of Data clustering using k- Means and modified algorithms
Experimental study of Data clustering using k- Means and modified algorithmsExperimental study of Data clustering using k- Means and modified algorithms
Experimental study of Data clustering using k- Means and modified algorithmsIJDKP
 
Spectral Clustering and Vantage Point Indexing for Efficient Data Retrieval
Spectral Clustering and Vantage Point Indexing for Efficient Data Retrieval Spectral Clustering and Vantage Point Indexing for Efficient Data Retrieval
Spectral Clustering and Vantage Point Indexing for Efficient Data Retrieval IJECEIAES
 
Parallel Evolutionary Algorithms for Feature Selection in High Dimensional Da...
Parallel Evolutionary Algorithms for Feature Selection in High Dimensional Da...Parallel Evolutionary Algorithms for Feature Selection in High Dimensional Da...
Parallel Evolutionary Algorithms for Feature Selection in High Dimensional Da...IJCSIS Research Publications
 
Data characterization towards modeling frequent pattern mining algorithms
Data characterization towards modeling frequent pattern mining algorithmsData characterization towards modeling frequent pattern mining algorithms
Data characterization towards modeling frequent pattern mining algorithmscsandit
 
A Novel Multi- Viewpoint based Similarity Measure for Document Clustering
A Novel Multi- Viewpoint based Similarity Measure for Document ClusteringA Novel Multi- Viewpoint based Similarity Measure for Document Clustering
A Novel Multi- Viewpoint based Similarity Measure for Document ClusteringIJMER
 
Data Imputation by Soft Computing
Data Imputation by Soft ComputingData Imputation by Soft Computing
Data Imputation by Soft Computingijtsrd
 
Data performance characterization of frequent pattern mining algorithms
Data performance characterization of frequent pattern mining algorithmsData performance characterization of frequent pattern mining algorithms
Data performance characterization of frequent pattern mining algorithmsIJDKP
 
Extended pso algorithm for improvement problems k means clustering algorithm
Extended pso algorithm for improvement problems k means clustering algorithmExtended pso algorithm for improvement problems k means clustering algorithm
Extended pso algorithm for improvement problems k means clustering algorithmIJMIT JOURNAL
 
Data stream mining techniques: a review
Data stream mining techniques: a reviewData stream mining techniques: a review
Data stream mining techniques: a reviewTELKOMNIKA JOURNAL
 
Particle Swarm Optimization based K-Prototype Clustering Algorithm
Particle Swarm Optimization based K-Prototype Clustering Algorithm Particle Swarm Optimization based K-Prototype Clustering Algorithm
Particle Swarm Optimization based K-Prototype Clustering Algorithm iosrjce
 
QUERY OPTIMIZATION IN OODBMS: IDENTIFYING SUBQUERY FOR COMPLEX QUERY MANAGEMENT
QUERY OPTIMIZATION IN OODBMS: IDENTIFYING SUBQUERY FOR COMPLEX QUERY MANAGEMENTQUERY OPTIMIZATION IN OODBMS: IDENTIFYING SUBQUERY FOR COMPLEX QUERY MANAGEMENT
QUERY OPTIMIZATION IN OODBMS: IDENTIFYING SUBQUERY FOR COMPLEX QUERY MANAGEMENTcsandit
 
IRJET- Customer Segmentation from Massive Customer Transaction Data
IRJET- Customer Segmentation from Massive Customer Transaction DataIRJET- Customer Segmentation from Massive Customer Transaction Data
IRJET- Customer Segmentation from Massive Customer Transaction DataIRJET Journal
 

Was ist angesagt? (19)

MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
 
DEVELOPING A NOVEL MULTIDIMENSIONAL MULTIGRANULARITY DATA MINING APPROACH FOR...
DEVELOPING A NOVEL MULTIDIMENSIONAL MULTIGRANULARITY DATA MINING APPROACH FOR...DEVELOPING A NOVEL MULTIDIMENSIONAL MULTIGRANULARITY DATA MINING APPROACH FOR...
DEVELOPING A NOVEL MULTIDIMENSIONAL MULTIGRANULARITY DATA MINING APPROACH FOR...
 
Aa31163168
Aa31163168Aa31163168
Aa31163168
 
Iss 6
Iss 6Iss 6
Iss 6
 
Experimental study of Data clustering using k- Means and modified algorithms
Experimental study of Data clustering using k- Means and modified algorithmsExperimental study of Data clustering using k- Means and modified algorithms
Experimental study of Data clustering using k- Means and modified algorithms
 
Spectral Clustering and Vantage Point Indexing for Efficient Data Retrieval
Spectral Clustering and Vantage Point Indexing for Efficient Data Retrieval Spectral Clustering and Vantage Point Indexing for Efficient Data Retrieval
Spectral Clustering and Vantage Point Indexing for Efficient Data Retrieval
 
Parallel Evolutionary Algorithms for Feature Selection in High Dimensional Da...
Parallel Evolutionary Algorithms for Feature Selection in High Dimensional Da...Parallel Evolutionary Algorithms for Feature Selection in High Dimensional Da...
Parallel Evolutionary Algorithms for Feature Selection in High Dimensional Da...
 
Data characterization towards modeling frequent pattern mining algorithms
Data characterization towards modeling frequent pattern mining algorithmsData characterization towards modeling frequent pattern mining algorithms
Data characterization towards modeling frequent pattern mining algorithms
 
A Novel Multi- Viewpoint based Similarity Measure for Document Clustering
A Novel Multi- Viewpoint based Similarity Measure for Document ClusteringA Novel Multi- Viewpoint based Similarity Measure for Document Clustering
A Novel Multi- Viewpoint based Similarity Measure for Document Clustering
 
Data Imputation by Soft Computing
Data Imputation by Soft ComputingData Imputation by Soft Computing
Data Imputation by Soft Computing
 
Data performance characterization of frequent pattern mining algorithms
Data performance characterization of frequent pattern mining algorithmsData performance characterization of frequent pattern mining algorithms
Data performance characterization of frequent pattern mining algorithms
 
Extended pso algorithm for improvement problems k means clustering algorithm
Extended pso algorithm for improvement problems k means clustering algorithmExtended pso algorithm for improvement problems k means clustering algorithm
Extended pso algorithm for improvement problems k means clustering algorithm
 
Data stream mining techniques: a review
Data stream mining techniques: a reviewData stream mining techniques: a review
Data stream mining techniques: a review
 
Fn3110961103
Fn3110961103Fn3110961103
Fn3110961103
 
Particle Swarm Optimization based K-Prototype Clustering Algorithm
Particle Swarm Optimization based K-Prototype Clustering Algorithm Particle Swarm Optimization based K-Prototype Clustering Algorithm
Particle Swarm Optimization based K-Prototype Clustering Algorithm
 
IJCSIT
IJCSITIJCSIT
IJCSIT
 
50120140505013
5012014050501350120140505013
50120140505013
 
QUERY OPTIMIZATION IN OODBMS: IDENTIFYING SUBQUERY FOR COMPLEX QUERY MANAGEMENT
QUERY OPTIMIZATION IN OODBMS: IDENTIFYING SUBQUERY FOR COMPLEX QUERY MANAGEMENTQUERY OPTIMIZATION IN OODBMS: IDENTIFYING SUBQUERY FOR COMPLEX QUERY MANAGEMENT
QUERY OPTIMIZATION IN OODBMS: IDENTIFYING SUBQUERY FOR COMPLEX QUERY MANAGEMENT
 
IRJET- Customer Segmentation from Massive Customer Transaction Data
IRJET- Customer Segmentation from Massive Customer Transaction DataIRJET- Customer Segmentation from Massive Customer Transaction Data
IRJET- Customer Segmentation from Massive Customer Transaction Data
 

Andere mochten auch

QUALITY OF CLUSTER INDEX BASED ON STUDY OF DECISION TREE
QUALITY OF CLUSTER INDEX BASED ON STUDY OF DECISION TREE QUALITY OF CLUSTER INDEX BASED ON STUDY OF DECISION TREE
QUALITY OF CLUSTER INDEX BASED ON STUDY OF DECISION TREE IJORCS
 
WHAT DOES MEAN “GOD”?... (A New theory on “RELIGION”)
WHAT DOES MEAN “GOD”?... (A New theory on “RELIGION”)WHAT DOES MEAN “GOD”?... (A New theory on “RELIGION”)
WHAT DOES MEAN “GOD”?... (A New theory on “RELIGION”)IJERD Editor
 
Improved Performance of Unsupervised Method by Renovated K-Means
Improved Performance of Unsupervised Method by Renovated K-MeansImproved Performance of Unsupervised Method by Renovated K-Means
Improved Performance of Unsupervised Method by Renovated K-MeansIJASCSE
 
Trabajo de musica 3 diver adrian naranjo hernandez ajajjajajajja
Trabajo de musica 3 diver adrian naranjo hernandez ajajjajajajjaTrabajo de musica 3 diver adrian naranjo hernandez ajajjajajajja
Trabajo de musica 3 diver adrian naranjo hernandez ajajjajajajjaadriannaranjo3
 

Andere mochten auch (6)

50120130406008
5012013040600850120130406008
50120130406008
 
QUALITY OF CLUSTER INDEX BASED ON STUDY OF DECISION TREE
QUALITY OF CLUSTER INDEX BASED ON STUDY OF DECISION TREE QUALITY OF CLUSTER INDEX BASED ON STUDY OF DECISION TREE
QUALITY OF CLUSTER INDEX BASED ON STUDY OF DECISION TREE
 
WHAT DOES MEAN “GOD”?... (A New theory on “RELIGION”)
WHAT DOES MEAN “GOD”?... (A New theory on “RELIGION”)WHAT DOES MEAN “GOD”?... (A New theory on “RELIGION”)
WHAT DOES MEAN “GOD”?... (A New theory on “RELIGION”)
 
Improved Performance of Unsupervised Method by Renovated K-Means
Improved Performance of Unsupervised Method by Renovated K-MeansImproved Performance of Unsupervised Method by Renovated K-Means
Improved Performance of Unsupervised Method by Renovated K-Means
 
By
ByBy
By
 
Trabajo de musica 3 diver adrian naranjo hernandez ajajjajajajja
Trabajo de musica 3 diver adrian naranjo hernandez ajajjajajajjaTrabajo de musica 3 diver adrian naranjo hernandez ajajjajajajja
Trabajo de musica 3 diver adrian naranjo hernandez ajajjajajajja
 

Ähnlich wie Comparative analysis of various data stream mining procedures and various dimension reduction techniques

Parametric comparison based on split criterion on classification algorithm
Parametric comparison based on split criterion on classification algorithmParametric comparison based on split criterion on classification algorithm
Parametric comparison based on split criterion on classification algorithmIAEME Publication
 
A Threshold fuzzy entropy based feature selection method applied in various b...
A Threshold fuzzy entropy based feature selection method applied in various b...A Threshold fuzzy entropy based feature selection method applied in various b...
A Threshold fuzzy entropy based feature selection method applied in various b...IJMER
 
Drsp dimension reduction for similarity matching and pruning of time series ...
Drsp  dimension reduction for similarity matching and pruning of time series ...Drsp  dimension reduction for similarity matching and pruning of time series ...
Drsp dimension reduction for similarity matching and pruning of time series ...IJDKP
 
Cross Domain Data Fusion
Cross Domain Data FusionCross Domain Data Fusion
Cross Domain Data FusionIRJET Journal
 
T AXONOMY OF O PTIMIZATION A PPROACHES OF R ESOURCE B ROKERS IN D ATA G RIDS
T AXONOMY OF  O PTIMIZATION  A PPROACHES OF R ESOURCE B ROKERS IN  D ATA  G RIDST AXONOMY OF  O PTIMIZATION  A PPROACHES OF R ESOURCE B ROKERS IN  D ATA  G RIDS
T AXONOMY OF O PTIMIZATION A PPROACHES OF R ESOURCE B ROKERS IN D ATA G RIDSijcsit
 
APPLICATION OF DYNAMIC CLUSTERING ALGORITHM IN MEDICAL SURVEILLANCE
APPLICATION OF DYNAMIC CLUSTERING ALGORITHM IN MEDICAL SURVEILLANCE APPLICATION OF DYNAMIC CLUSTERING ALGORITHM IN MEDICAL SURVEILLANCE
APPLICATION OF DYNAMIC CLUSTERING ALGORITHM IN MEDICAL SURVEILLANCE cscpconf
 
CLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEY
CLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEYCLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEY
CLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEYEditor IJMTER
 
A Firefly based improved clustering algorithm
A Firefly based improved clustering algorithmA Firefly based improved clustering algorithm
A Firefly based improved clustering algorithmIRJET Journal
 
A study and survey on various progressive duplicate detection mechanisms
A study and survey on various progressive duplicate detection mechanismsA study and survey on various progressive duplicate detection mechanisms
A study and survey on various progressive duplicate detection mechanismseSAT Journals
 
Ijricit 01-002 enhanced replica detection in short time for large data sets
Ijricit 01-002 enhanced replica detection in  short time for large data setsIjricit 01-002 enhanced replica detection in  short time for large data sets
Ijricit 01-002 enhanced replica detection in short time for large data setsIjripublishers Ijri
 
Smart E-Logistics for SCM Spend Analysis
Smart E-Logistics for SCM Spend AnalysisSmart E-Logistics for SCM Spend Analysis
Smart E-Logistics for SCM Spend AnalysisIRJET Journal
 
Development of pattern knowledge discovery framework using
Development of pattern knowledge discovery framework usingDevelopment of pattern knowledge discovery framework using
Development of pattern knowledge discovery framework usingIAEME Publication
 
Study of Density Based Clustering Techniques on Data Streams
Study of Density Based Clustering Techniques on Data StreamsStudy of Density Based Clustering Techniques on Data Streams
Study of Density Based Clustering Techniques on Data StreamsIJERA Editor
 
A STUDY OF TRADITIONAL DATA ANALYSIS AND SENSOR DATA ANALYTICS
A STUDY OF TRADITIONAL DATA ANALYSIS AND SENSOR DATA ANALYTICSA STUDY OF TRADITIONAL DATA ANALYSIS AND SENSOR DATA ANALYTICS
A STUDY OF TRADITIONAL DATA ANALYSIS AND SENSOR DATA ANALYTICSijistjournal
 
TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...
TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...
TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...cscpconf
 
A frame work for clustering time evolving data
A frame work for clustering time evolving dataA frame work for clustering time evolving data
A frame work for clustering time evolving dataiaemedu
 
REVIEW: Frequent Pattern Mining Techniques
REVIEW: Frequent Pattern Mining TechniquesREVIEW: Frequent Pattern Mining Techniques
REVIEW: Frequent Pattern Mining TechniquesEditor IJMTER
 
IRJET- Expert Independent Bayesian Data Fusion and Decision Making Model for ...
IRJET- Expert Independent Bayesian Data Fusion and Decision Making Model for ...IRJET- Expert Independent Bayesian Data Fusion and Decision Making Model for ...
IRJET- Expert Independent Bayesian Data Fusion and Decision Making Model for ...IRJET Journal
 

Comparative analysis of various data stream mining procedures and various dimension reduction techniques

Control Theory and Informatics
ISSN 2224-5774 (Paper) ISSN 2225-0492 (Online)
Vol.3, No.4, 2013
www.iiste.org

Comparative Analysis of Various Data Stream Mining Procedures and Various Dimension Reduction Techniques
Diksha Upadhyay, Susheel Jain, Anurag Jain
Department of Computer Science, RITS, Bhopal, India
Email: {diksha.du31@gmail.com, jain_susheel65@yahoo.co.in, anurag.akjain@gmail.com}

Abstract
In recent years data mining has become a major research area. Data mining is the process of extracting useful information from a given set of data for later use, whether commercial or scientific. When extracting this information (the mined data), suitable methodologies with good approximations have to be used. In this survey we study various data stream clustering techniques and various dimension reduction techniques, together with their characteristics, with the aim of improving the quality of clustering. We also present our own proposal for clustering streamed data: a dimension reduction technique is applied first, and the Fuzzy C-means algorithm is then applied to the reduced data to improve the quality of clustering.
Keywords: Data Stream, Dimension Reduction, Clustering

1. Introduction
With the tremendous growth of information technology in a growing economy, data repositories have grown as well, and they can take any form. For example, from the perspective of the commercial sector, each firm stores several years of information in its data repositories: turnover, customer statistics, challenges facing the enterprise, profit areas and so on. Data mining is the research area concerned with digging important information out of such a repository. The information produced is then used by the business leaders (business analysts) of an enterprise to take strategic decisions about future policies. This is only the commercial application of data mining; it is also widely used in scientific research.

In this study our intention is to present a comparison of several data stream mining techniques. Data stream mining is currently an active research area in which the main task is to form clusters (of a fuzzy nature). Data streams [1] are generated continuously by many sources, such as sensor networks, web user click events and the flow of internet traffic. Streamed data from these sources share similar characteristics: they arrive without end, their arrival speed is uncertain, and they have to be scanned in a single pass. Mining must be performed on the stream under constraints such as limited memory space and limited time, so more efficient mining algorithms have to be devised. Clusters are found by clustering, which partitions the input data into several chunks (clusters), each composed of elements (data objects) with similar characteristics. Here we provide a snapshot of various data stream clustering algorithms.

In [2] the STREAM procedure is presented for clustering data streams. In [8] a procedure for clustering binary stream data is presented, in which a modification of K-means is proposed for better results. A time-series clustering mechanism that generates a hierarchy of clusters is presented in [4], and there are many more.

1.1 Dimension reduction
Streamed data arriving from various sources can have many dimensions from a geometric point of view, possibly dozens. Clustering such a data stream with conventional procedures such as the K-means algorithm is difficult, because clustering has to be done within limited memory capacity and limited time. The survey indicates that clustering algorithms are less efficient on high-dimensional data sets. To improve their efficiency, the data sets are first passed through a dimension reduction procedure [16] such as SVD, PCA or SVM, and the clustering algorithm is then applied to the reduced data set for better clustering. We also provide a comparative analysis of various dimension reduction techniques.

1.2 Objective of our work
In this study various data stream mining techniques and dimension reduction techniques are evaluated on the basis of their usage statistics, application parameters, working mechanism, etc.
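To make the reduce-then-cluster idea of Section 1.1 concrete, the following is a minimal sketch of reducing data to two dimensions with PCA. It is an illustration only: the paper does not state which reduction technique it will finally use, so PCA is a stand-in here, and the function names (`center`, `covariance`, `top_eigenvector`, `pca_reduce`) and the synthetic data are assumptions made for this example. The principal directions are found by power iteration with deflation so that only the Python standard library is needed.

```python
import math
import random


def center(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix (d x d) of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenvector(matrix, iters=500, seed=0):
    """Power iteration: dominant eigenvector and eigenvalue of a symmetric matrix."""
    rng = random.Random(seed)
    d = len(matrix)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient v^T M v (v has unit length after the loop).
    eigenvalue = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(d))
                     for i in range(d))
    return v, eigenvalue


def pca_reduce(rows, k=2):
    """Project rows onto their first k principal components."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    components = []
    for _ in range(k):
        v, lam = top_eigenvector(cov)
        components.append(v)
        # Deflation: remove the direction just found so the next pass finds the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return [[sum(r[j] * c[j] for j in range(d)) for c in components]
            for r in centered]


# Hypothetical data: 5-dimensional points that really vary along two directions.
rng = random.Random(42)
data = []
for _ in range(200):
    t, s = rng.gauss(0, 5), rng.gauss(0, 2)
    data.append([t, 2 * t, s, -s, rng.gauss(0, 0.1)])

reduced = pca_reduce(data, k=2)
print(len(reduced), len(reduced[0]))  # 200 2
```

This batch version holds all rows in memory; for a genuine stream the projection would have to be fitted on an initial chunk or updated incrementally, which is outside the scope of this sketch.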
1.3 Structure of this paper
After the overview of various concepts in Section 1, a comparison of various data stream mining algorithms and dimension reduction techniques is presented in Section 2. Our proposal, with its working model, is presented in Section 3. Finally, Section 4 concludes the paper with possible future enhancements.

2. Comparative analysis
2.1 Overview of various fuzzy and other clustering methods (algorithms)

| S.No | Name of technique | Type of data on which it operates | Working mechanism |
|---|---|---|---|
| 1 | Istrap [6] | Stream data | Static analysis is performed, and methods are proposed to remove outliers. |
| 2 | WFCM [5] | Stream data | The data stream is divided into chunks in order of arrival. The main idea of WFCM is to renew the weighted cluster centers iteratively until the cost function reaches a satisfactory result. |
| 3 | Improved FCM [7] | Synthetic and real images | A trade-off fuzzy factor and a kernel method are used. |
| 4 | DSCLU [9] | Stream data (real and synthetic data) | The process is separated into an online phase and an offline phase. |
| 5 | IFCM [10] | Interval-valued symbolic data (real and synthetic symbolic intervals) | An objective-function-based technique; fuzzy corrected rank index. |
| 6 | DCWFPminer [11] | Data streams (continuous, unbounded and high-speed) | Uses a divide-and-conquer strategy and a bottom-up, depth-first recursive method to mine the closed weighted frequent patterns of the CWFP tree in a sliding window. |
| 7 | VLFCM [12] | Very large data | Methods are used to extend the fuzzy C-means algorithm. |
| 8 | HSW stream [13] | High-dimensional stream data | A projected clustering technique is used. |
| 9 | Modified fuzzy C-means [14] | Stream data | The Mahalanobis distance is used. |
| 10 | Density WFCM [15] | Various feature and relational data sets | A weighted relational algorithm (WNERFCM) is proposed. |

Implemented on (tool used), across the techniques above: Matlab, C++, C language, Java 1.6, Microsoft Visual C++.
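The WFCM row above describes dividing the stream into chunks and renewing weighted cluster centers iteratively. A minimal sketch of that idea follows, under stated assumptions: it uses the standard fuzzy C-means updates with per-point weights, and it carries the centers of each chunk into the next chunk as weighted points (a common single-pass scheme, not necessarily the exact procedure of [5]). The function names, the fuzzifier `m = 2`, the naive initialization from the first `c` points, and the synthetic data are all hypothetical choices for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def memberships(points, centers, m=2.0):
    """Fuzzy membership of every point in every cluster; each row sums to 1."""
    out = []
    for p in points:
        d2 = [sq_dist(p, c) for c in centers]
        if min(d2) == 0.0:
            # The point coincides with a center: full membership there.
            hits = [1.0 if d == 0.0 else 0.0 for d in d2]
            out.append([h / sum(hits) for h in hits])
            continue
        inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
        total = sum(inv)
        out.append([v / total for v in inv])
    return out


def weighted_fcm(points, weights, centers, m=2.0, iters=50, tol=1e-6):
    """Alternate membership updates and weighted center updates until stable."""
    dim = len(points[0])
    for _ in range(iters):
        u = memberships(points, centers, m)
        new_centers = []
        for i in range(len(centers)):
            coef = [w * (row[i] ** m) for w, row in zip(weights, u)]
            total = sum(coef)
            if total == 0.0:
                new_centers.append(centers[i])  # empty cluster: keep old center
                continue
            new_centers.append([sum(c * p[j] for c, p in zip(coef, points)) / total
                                for j in range(dim)])
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships(points, centers, m)


def cluster_stream(chunks, c, m=2.0):
    """Single pass over the chunks; old centers re-enter as weighted points."""
    centers, center_weights = None, []
    for chunk in chunks:
        if centers is None:
            points, weights = list(chunk), [1.0] * len(chunk)
            # Naive initialization (assumption): first c points, taken to be distinct.
            centers = [list(p) for p in chunk[:c]]
        else:
            points = list(chunk) + centers
            weights = [1.0] * len(chunk) + center_weights
        centers, u = weighted_fcm(points, weights, centers, m)
        # A center's weight is the total weighted membership it absorbed.
        center_weights = [sum(w * row[i] for w, row in zip(weights, u))
                          for i in range(c)]
    return centers


# Hypothetical stream: three chunks drawn from two well-separated 2-D groups.
rng = random.Random(1)
stream = []
for _ in range(3):
    chunk = []
    for _ in range(20):
        chunk.append([rng.gauss(0, 0.5), rng.gauss(0, 0.5)])
        chunk.append([rng.gauss(10, 0.5), rng.gauss(10, 0.5)])
    stream.append(chunk)

final_centers = cluster_stream(stream, c=2)
print(sorted(final_centers))
```

Only the current chunk plus `c` weighted centers is held in memory at any time, which is what makes this kind of scheme compatible with the single-pass, limited-memory constraints described in the Introduction.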
2.2 Comparative overview of various dimension reduction techniques

| S.No | Name of technique | Nature | Usage statistics | Working mechanism |
|---|---|---|---|---|
| 1 | Singular value decomposition (SVD) | Unsupervised | Primarily applied to identify and extract structure within the data. | A factorization method; a gene selection procedure has been used. |
| 2 | Principal component analysis (PCA) | Unsupervised | Mostly applied for feature extraction. | Converts the values into a set of linear combinations. |
| 3 | Linear discriminant analysis (LDA) | Supervised | Mostly used in small-sample-size problems (for feature selection and feature transformation). | Classification-based approach. |
| 4 | Support vector machines (SVM) | Supervised | Particularly appropriate for data sets in which the number of samples is much smaller than the number of features (genes). | Its main purpose is the analysis of gene expression data. |
| 5 | Independent component analysis (ICA) | Unsupervised | Most ICA algorithms require whitened data, that is, data with an identity covariance matrix. | Works on the principle of additive subcomponents, separating individual units from a large multivariate source. |
| 6 | Canonical correlation analysis (CCA) | Unsupervised | Mostly used with DNA microarrays, which assess the expression of many thousands of genes. | Discovers linear combinations of two sets of variables and then estimates the correlations among these variables. |

3. Our proposal
3.1 Problem statement
To increase the efficiency of data stream clustering algorithms (the reduced weighted fuzzy C-means algorithm in our case) [5], and for visualization purposes, a dimension reduction technique [3] has to be used that resolves the input data into two dimensions for better clustering quality. We will apply our proposed technique to data that have streaming behavior and high dimensionality.
3.2 Proposed approach
In this study we propose a dimension-reduced weighted fuzzy C-means algorithm. The algorithm is applicable to high-dimensional data sets that have streaming behavior. An example of such data is live high-definition video on the internet. These data have two special properties that separate them from other data sets: a) they have streaming behavior, and b) they have high dimensionality. First, we apply the dimension reduction technique [3] to convert the high-dimensional data stream into fewer dimensions (two dimensions); the weighted fuzzy C-means algorithm [5] is then applied for better data stream clustering.
3.3 Algorithm
STEP 1: [Input] Input the data set components (o1, o2, …, ok), which have both high dimensionality and streaming behavior.
STEP 2: [Reduction] Apply the proposed dimension reduction technique [3] to the input data.
STEP 3: [Clustering] Apply the Fuzzy C-means algorithm to the data obtained in Step 2.
STEP 4: [Output] Obtain the data set components (o1, o2, …, ok) with fewer dimensions, grouped into clusters.
3.4 Proposed technology used
We will implement our proposed algorithm in Java and test its efficiency on some real high-dimensional
  • 4. Control Theory and Informatics ISSN 2224-5774 (Paper) ISSN 2225-0492 (Online) Vol.3, No.4, 2013 data sets which have the streaming behavior. Dimension 1 www.iiste.org simple working model of our work proposal I1 I2 (With two dimensions) Elements with reduced Dimensions I3 : Dimension 2 : IK : : I1 Dimension n I1 I2 I3 data : Proposed Dimension Reduction Technique : I2 I3 : Applying Fuzzy CReduced and clustered means clustering Algorithm : IK IK I1 I2 I3 : : IK In the above figure there are K data elements (I1,I2,I3…IK) with n Dimensions each Various dimensions of element I1 Conclusion In this paper we have presented various data stream mining techniques, as it is the popular research area now a days ,we have also discussed various characteristics of stream data and its possible sources .As the streamed data can be high dimensional various dimension reduction techniques were applied on it prior to it clustering .In our survey we have provided the comparative analysis of various stream mining procedures and dimension reduction techniques ,also the working model for mining data stream have been discussed in our proposal section of this paper ,after reading this work one can get the quick overview about stream data mining and various dimension reduction techniques available. References [1] Brian Babcock, Shivnath Babu, Mayur Datar, Rajeev Motwani, Jennifer Widom (2002). ”Models and issues in data stream systems”. Proceedings of the 21th ACM SIGACT-SIGMOD SIGART Symposium on Principles of Database Systems. pp. 1–16. 63
[2] Liadan O’Callaghan, Nina Mishra, Adam Meyerson, Sudipto Guha, Rajeev Motwani (2002). “Streaming data algorithms for high quality clustering”. Proceedings of the 18th International Conference on Data Engineering. pp. 685–694.
[3] P.S. Bishnu, V. Bhattacherjee. “A Dimension Reduction Technique for K-Means Clustering”.
[4] Pedro Rodrigues, Joao Gama, Joao Pedro Pedroso (2004). “Hierarchical time-series clustering for data streams”. Proceedings of the First International Workshop on Knowledge Discovery in Data Streams. pp. 22–31.
[5] Renxia Wan, Xiaoya Yan, Xiaoke Su (2008). “A Weighted Fuzzy Clustering Algorithm for Data Stream”. 2008 ISECS International Colloquium on Computing, Communication, Control, and Management.
[6] Lingjuan Li, Xiong Li (2012). “Improved Online Stream Data Clustering Algorithm”. 2012 Second International Conference on Business Computing and Global Informatization.
[7] Maoguo Gong, Yan Liang, Jiao Shi, Wenping Ma, Jingjing Ma (2013). “Fuzzy C-Means Clustering With Local Information and Kernel Metric for Image Segmentation”. IEEE Transactions on Image Processing, Vol. 22, No. 2, February 2013.
[8] Carlos Ordonez (2003). “Clustering binary data streams with k-means”. Proceedings of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery. pp. 12–19.
[9] Amin Namadchian, Gholamreza Esfandani (2012). “DSCLU: a new Data Stream CLUstring algorithm for multi density environments”. 2012 13th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing.
[10] Arthur F. M. Alvim, Renata C. M. R. de Souza. “A Fuzzy Weighted Clustering Method for Symbolic Interval Data”. IEEE.
[11] Wang Jie, Zeng Yu (2012). “DCWFP-Miner: Mining Closed Weighted Frequent Patterns over Data Streams”. 2012 9th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2012).
[12] Timothy C. Havens, James C. Bezdek, Christopher Leckie, Lawrence O. Hall, Marimuthu Palaniswami (2012). “Fuzzy c-Means Algorithms for Very Large Data”. IEEE Transactions on Fuzzy Systems, Vol. 20, No. 6, December 2012.
[13] Weiguo Liu, Jia OuYang (2011). “Clustering Algorithm for High Dimensional Data Stream over Sliding Windows”. 2011 International Joint Conference of IEEE TrustCom-11/IEEE ICESS-11/FCST-11.
[14] Shao-Hong Yin, Min Li (2010). “Study on a modified Fuzzy C-Means Clustering Algorithm”. 2010 International Conference on Computer Design and Applications (ICCDA 2010).
[15] Richard J. Hathaway, Yingkang Hu (2009). “Density-Weighted Fuzzy c-Means Clustering”. IEEE Transactions on Fuzzy Systems, Vol. 17, No. 1, February 2009.
[16] Nebu Varghese, Vinay Verghese, Gayathri P., N. Jaisankar (2012). “A Survey of Dimensionality Reduction and Classification Methods”. International Journal of Computer Science & Engineering Survey (IJCSES), Vol. 3, No. 3, June 2012.
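The four steps of the algorithm in Section 3.3 can be illustrated with a short Python sketch. It is a sketch under stated assumptions, not the authors' implementation, which the paper plans in Java. The dimension reduction technique of [3] is not reproduced here, so a projection onto the top two principal directions stands in for it. The function names (`reduce_to_2d`, `fuzzy_c_means`, `reduce_then_cluster`) and the center initialisation are hypothetical.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def reduce_to_2d(rows, iters=200):
    """Stand-in for the technique of [3]: project onto two principal directions."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(d)]
           for a in range(d)]
    components = []
    for k in range(2):
        # Power iteration from a fixed, normalised start vector.
        v = [1.0 / (j + 1 + k) for j in range(d)]
        scale = math.sqrt(sum(x * x for x in v))
        v = [x / scale for x in v]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        # Deflate so the next pass finds the next direction.
        lam = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)]
               for a in range(d)]
        components.append(v)
    return [tuple(sum(c[j] * comp[j] for j in range(d)) for comp in components)
            for c in centered]


def fuzzy_c_means(points, c, m=2.0, tol=1e-6, max_iter=100):
    """Plain fuzzy c-means; returns (centers, memberships)."""
    # Hypothetical initialisation: centers spread along the first coordinate.
    ordered = sorted(points)
    centers = [ordered[i * (len(ordered) - 1) // max(c - 1, 1)]
               for i in range(c)]
    prev_cost = math.inf
    memberships = []
    for _ in range(max_iter):
        memberships = []
        for x in points:
            inv = [max(sq_dist(x, ctr), 1e-12) ** (-1.0 / (m - 1.0))
                   for ctr in centers]
            total = sum(inv)
            memberships.append([v / total for v in inv])
        centers = [
            tuple(sum(u[i] ** m * x[j] for u, x in zip(memberships, points))
                  / sum(u[i] ** m for u in memberships)
                  for j in range(len(points[0])))
            for i in range(c)]
        cost = sum(u[i] ** m * sq_dist(x, centers[i])
                   for u, x in zip(memberships, points) for i in range(c))
        if abs(prev_cost - cost) < tol:
            break
        prev_cost = cost
    return centers, memberships


def reduce_then_cluster(chunk, c):
    """STEP 1 takes the input chunk; STEPs 2 to 4 follow."""
    reduced = reduce_to_2d(chunk)                     # STEP 2: reduction
    centers, memberships = fuzzy_c_means(reduced, c)  # STEP 3: clustering
    return reduced, centers, memberships              # STEP 4: output
```

Calling `reduce_then_cluster(chunk, c)` on a list of equal-length numeric tuples returns the two-dimensional points, the cluster centers and the membership matrix. Step 3 uses plain fuzzy C-means, as written in Section 3.3; a weighted variant as in [5] could be substituted at the same point.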
This academic article was published by The International Institute for Science, Technology and Education (IISTE). More information about the publisher can be found on the IISTE homepage: http://www.iiste.org