International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056
Volume: 04 Issue: 07 | July -2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 743
Mining Big Data using Genetic Algorithm
Surbhi Jain
Assistant Professor, Department of Computer Science, India
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract – The amount of data available in the world is
growing rapidly because of the use of the internet, smart
phones, social networks, etc. Such collections of large and
complex data sets are referred to as Big Data. Traditional
database systems are unable to capture, store and analyse
this volume of data. Text processing must therefore be
improved so that relevant, previously unknown knowledge can
be mined from the text. This paper argues the need for an
algorithm for the big data clustering problem that combines
the genetic algorithm with some of the known clustering
algorithms. The main idea is to combine the advantages of
genetic algorithms and clustering to process large amounts
of data. A genetic algorithm is a search algorithm used to
optimize results. This paper gives an overview of concepts
such as data mining, genetic algorithms and big data.
Key Words: Genetic Algorithms, Big Data, Clustering,
Chromosomes, Mining
1. INTRODUCTION
In the current Big Data age, data is becoming more and more
available owing to advances in information and
communication technology, and enterprises are gaining
meaningful information, relevant knowledge and insight from
this huge volume of data to support decision making. Big
data mining is the ability to extract valuable information
from huge and complex data sets or data streams, i.e. Big
Data. One of the important data mining techniques for big
data analysis is clustering. Applying clustering techniques
to big data is difficult because an enormous amount of data
is added on a daily basis. Many clustering techniques are
available, the most common of which is the K-means
algorithm, which groups the objects of a dataset into
clusters. With the plethora of data that big data brings,
however, the available clustering algorithms are not very
efficient. As Big Data refers to terabytes and petabytes of
data, clustering algorithms incur high computational costs
at this scale. We can think of designing an algorithm that
combines the features of some of the clustering algorithms
with the genetic algorithm to process big data.
Mining is the process of extracting meaningful information
from source data. It is a set of computerized techniques
used to extract previously unknown or buried information
from large databases. Successful data mining makes it
possible to uncover patterns and relationships, and then to
use this “new” information for making proactive,
knowledge-driven business decisions. Many algorithms are
used for mining information from plain text. Genetic
Algorithms are algorithms used to solve optimization
problems. They are search-based, and they eventually
generate useful solutions for such problems.
2. GENETIC ALGORITHMS
Genetic Algorithms (GAs) are a family of computational
models inspired by Darwin's theory of evolution. According
to Darwin, the species that are fittest and can adapt to
changing surroundings survive; the rest tend to die away. In
a GA, this survival is modelled through the operations of
reproduction, crossover and mutation. A GA's basic working
mechanism is as follows: the algorithm starts with a set of
solutions (represented by chromosomes) called a population.
Solutions from one population are taken and used to form a
new population (reproduction). This is driven by the hope
that the new population will be superior to the old one,
which is why GAs are often termed optimistic search
algorithms. The reproductive opportunities are distributed
in such a way that chromosomes which represent a better
solution to the target problem are given more chances to
reproduce than those which represent inferior solutions.
GAs search through a huge number of parameter combinations
to find the best match. For example, they can search through
different combinations of materials and designs to find the
combination that results in a stronger, lighter and overall
better final product.
As an example, consider “Face Recognition Systems” that
are used for drawing sketches based on descriptions. Such a
system is mainly used for investigation purposes, where a
sketch of a criminal is to be made on the basis of the
description given by an eyewitness to the crime. The
initial population is a collection of facial features that
are already stored in the system. The features may include
many varieties of noses, ears, lips, eyes, etc., which may
differ in color, size or any other attribute. As the
witness gives descriptions, the features that are most
likely to match are selected (selection). The selected
features then go through crossover and mutation to produce
more likely features. For instance, the eyes of one face and
the lips of another can be chosen for crossover to produce a
new individual in which both features match the criminal.
The process continues until the witness recognizes the final
face as the desired one.
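The working mechanism described in this section (population, fitness-based reproduction, crossover and mutation) can be illustrated with a minimal sketch. It is not taken from the paper: the toy objective (counting 1-bits in a bit string), the function names and all parameter values are assumptions chosen only to keep the example small.

```python
import random

def fitness(chromosome):
    # Toy objective (an assumption for illustration): number of 1-bits.
    return sum(chromosome)

def roulette_select(population, scores, rng):
    # Fitness-proportionate selection: better chromosomes get more
    # chances to reproduce.
    total = sum(scores)
    if total == 0:
        return rng.choice(population)
    pick = rng.uniform(0, total)
    running = 0.0
    for individual, score in zip(population, scores):
        running += score
        if running > pick:
            return individual
    return population[-1]

def evolve(n_bits=16, pop_size=20, generations=30, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    # Initial population: random bit strings.
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        scores = [fitness(c) for c in population]
        offspring = []
        while len(offspring) < pop_size:
            mother = roulette_select(population, scores, rng)
            father = roulette_select(population, scores, rng)
            cut = rng.randint(1, n_bits - 1)          # single-point crossover
            child = mother[:cut] + father[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - bit if rng.random() < mutation_rate else bit
                     for bit in child]
            offspring.append(child)
        population = offspring                        # new generation
        best = max(population + [best], key=fitness)  # remember best so far
    return best
```

Calling `evolve()` returns the fittest bit string seen during the run; other encodings and objectives would need their own `fitness` function and operators.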
3. BIG DATA
Big data is a term for data sets that are so large or complex
that traditional data processing application software is
inadequate to deal with them. Big data represents a new
era in data analysis and utilization. Big Data platforms
typically leverage open source technology to provide a
robust, secure, highly available, enterprise-class
environment. Challenges include capture, storage, analysis,
querying, and updating data safely and securely. While the
term “big data” is relatively new, the practice of
collecting and storing large amounts of information for
eventual analysis is ages old.
The significance of big data lies not in how much data we
have, but in how we use it. We can take data from any source
and analyze it to find answers that enable us to produce
results at reduced cost and time through smart decision
making. In this paper we try to combine big data with
genetic algorithms to analyze data efficiently. Genetic
algorithms are of interest because they are very powerful
and broadly applicable search techniques. As noted earlier,
Big Data refers to large-volume, complex, growing data sets
with numerous, autonomous sources. With the fast development
of networking, data storage and data collection capacity,
Big Data is now rapidly expanding in all fields of science
and engineering, including the physical, biological and
biomedical sciences.
With Big Data technology, computations can be sped up. In
the usual case, if a system becomes overloaded because the
data has grown too big for it to manage, we add RAM or free
some resources by terminating certain processes. Big data
systems, on the contrary, add more machines to the pool and
thereby promote parallelism. This, however, makes fault
tolerance a concern: the greater the number of machines, the
higher the probability of a machine failure. Fortunately,
big data platforms handle this automatically by replicating
data across machines, so that if one machine fails, requests
for its data can be redirected to another machine.
4. DATA MINING
Knowledge is extracted from data sets using Data Mining
technology, which is used to search and analyze data. The
data to be mined ranges from a small data set to an
enormous data set, i.e. big data. In Data Mining, the source
data is kept in databases, i.e. in the form of tables in the
case of relational databases. Algorithms are then applied to
extract information from these databases. The Data Mining
environment produces voluminous data. The information
retrieved in the data mining step is transformed into a
structure that is easily understood by users. Once data has
been extracted and transformed, it is loaded into systems
from which it can be read. Data mining includes various
methods, such as genetic algorithms, support vector
machines, decision trees, neural networks and cluster
analysis, for disclosing the hidden patterns inside huge
data sets.
Handling such large data sets requires algorithms,
structures and approaches designed for Big Data, as well as
tools developed for analyzing it. Data Mining and Text
Mining are often used synonymously, which is not correct.
Although both are mining techniques, there is a thin line of
difference between the two. Data mining refers to the
extraction of useful, previously unknown information from
databases, while text mining refers to the extraction of
useful knowledge from plain text, i.e. naturally occurring
text. Unlike in data mining, this text need not be
transformed into any other format.
5. CLUSTERING
Clustering refers to grouping similar kinds of objects. It is
a method of exploring the data, a technique for finding
patterns in the dataset. It falls in the category of
unsupervised learning, i.e. we do not know in advance how
the data objects (of similar types) should be grouped
together. It is one of the most vital research fields in
data mining. In clustering we aim at forming collections of
objects in such a manner that objects having the same attributes
belong to the same group and objects with different behaviors
fall in different groups. With the formation of groups, we
can easily identify areas where the object space is dense
and where it is sparsely filled, and hence determine the
distribution patterns. We can find interesting patterns
directly from the data sets without needing much background
knowledge. One of the popular approaches to clustering is
partitioning. Starting from an initial partition,
partitioning works by iteratively moving objects from one
cluster to another. The number of clusters must be
pre-defined for this technique (as in the k-means
algorithm).
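As an illustration of partitioning with a pre-defined number of clusters, the following is a minimal k-means sketch. It is an assumption-laden example rather than the paper's implementation: points are plain tuples, the initial centroids are supplied by the caller, and the names `closest`, `mean` and `k_means` are hypothetical.

```python
import math

def closest(point, centroids):
    # Index of the centroid nearest to `point` (Euclidean distance).
    return min(range(len(centroids)),
               key=lambda j: math.dist(point, centroids[j]))

def mean(points):
    # Component-wise mean of a non-empty list of points.
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def k_means(data, initial_centroids, max_iter=100):
    # The number of clusters k is fixed by the number of initial centroids.
    centroids = [tuple(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        # Move every object to the cluster with the closest centroid.
        new_labels = [closest(p, centroids) for p in data]
        if new_labels == labels:      # no object changed cluster: converged
            break
        labels = new_labels
        # Recompute each centroid as the mean of its members
        # (an empty cluster keeps its previous centroid).
        for j in range(len(centroids)):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:
                centroids[j] = mean(members)
    return centroids, labels
```

The result depends on the initial centroids, which is one reason the paper looks at combining k-means with a genetic search.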
6. GENETIC ALGORITHM FOR CLUSTERING
The voluminous data that is available to us can be divided
into small groups, where each group can be considered a
population. By applying genetic operators iteratively on the
population we can find the optimum solution for the
current scenario. Search is a problem-solving method in
which the sequence of steps leading to the solution cannot
be determined in advance. Its success depends on how well
the search operators are applied. An ideal search should be
capable of searching locally as well as in a random manner.
Random search explores the entire solution space and is
proficient at avoiding local optima, while local search
explores the local possibilities around a candidate and
helps reach the best solution among them.
As discussed earlier, a genetic algorithm is capable of
effectively searching the problem domain and solving
complex problems by simulating natural evolution. It
performs a search and provides near-optimal solutions for
the objective function of an optimization problem. A set of
chromosomes is referred to as a population, wherein a
chromosome (represented as a string) refers to the
parameters in the search space, here encoded as a
combination of cluster centroids.
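One way to picture a chromosome that encodes a combination of cluster centroids is as a flat sequence of coordinates. The sketch below assumes a real-valued encoding and hypothetical helper names `encode` and `decode`; the paper does not prescribe a concrete layout.

```python
def encode(centroids):
    # Flatten k centroids of dimension d into one chromosome of k*d genes.
    return [value for centroid in centroids for value in centroid]

def decode(chromosome, dim):
    # Cut the chromosome back into centroids of `dim` coordinates each.
    return [tuple(chromosome[i:i + dim])
            for i in range(0, len(chromosome), dim)]
```

For example, two 2-D centroids `(1.0, 2.0)` and `(3.0, 4.0)` become the four-gene chromosome `[1.0, 2.0, 3.0, 4.0]`.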
The first step is to create a random population, which
represents different solutions in the search space. Next, a
few chromosomes are selected according to the principle of
survival of the fittest, and each is carried over into the
next generation. Chromosomes are binary encoded strings
which represent candidate solutions to the optimization
problem. Each string is evaluated with the fitness function
(objective function), giving a measure of solution quality
called the fitness value. A new population of candidate
solutions is created by applying recombination (crossover
and mutation) to the selected candidate solutions.
Individual representation and population initialization,
fitness computation, selection, crossover and mutation are
thus the basic steps of a genetic algorithm for data
clustering. The algorithm is given below:
Input:
  k: the number of clusters
  d: the data set containing n objects
  p: population size
  Tmax: maximum number of iterations
Output:
  A set of k clusters
1) Initialize every chromosome to have k random centroids
   selected from the data set.
2) For T = 1 to Tmax
   (i) For every chromosome i
       a. Allocate each data object to the cluster
          with the closest centroid.
       b. Recompute the k cluster centroids of
          chromosome i as the mean of their data objects.
       c. Compute the fitness of chromosome i.
   (ii) Generate the new group of chromosomes using
        GA selection, crossover and mutation.
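The steps listed above can be sketched in Python as follows. Several details are not specified in the text and are assumptions here: the fitness is taken as 1 / (1 + SSE), where SSE is the within-cluster sum of squared errors; selection is a two-way tournament; crossover is single-point on the centroid list; mutation replaces one centroid with a random data object. All function names are hypothetical.

```python
import math
import random

def assign(data, centroids):
    # Step 2(i)a: label each object with the index of its closest centroid.
    return [min(range(len(centroids)),
                key=lambda j: math.dist(p, centroids[j])) for p in data]

def recompute(data, labels, centroids):
    # Step 2(i)b: each centroid becomes the mean of its members
    # (an empty cluster keeps its old centroid).
    updated = []
    for j, old in enumerate(centroids):
        members = [p for p, lab in zip(data, labels) if lab == j]
        updated.append(tuple(sum(c) / len(members) for c in zip(*members))
                       if members else old)
    return updated

def sse(data, labels, centroids):
    return sum(math.dist(p, centroids[lab]) ** 2
               for p, lab in zip(data, labels))

def ga_cluster(data, k, pop_size=10, t_max=20, mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    # Step 1: every chromosome holds k random centroids from the data set.
    population = [rng.sample(data, k) for _ in range(pop_size)]
    best = None  # (sse, centroids, labels) of the best chromosome seen
    for _ in range(t_max):
        scored = []
        for chromosome in population:
            labels = assign(data, chromosome)
            chromosome = recompute(data, labels, chromosome)
            error = sse(data, labels, chromosome)
            scored.append((1.0 / (1.0 + error), chromosome))  # step 2(i)c
            if best is None or error < best[0]:
                best = (error, chromosome, labels)

        def pick():
            # Two-way tournament: the fitter of two random chromosomes.
            a, b = rng.sample(scored, 2)
            return a[1] if a[0] >= b[0] else b[1]

        # Step 2(ii): build the next generation.
        population = []
        while len(population) < pop_size:
            mother, father = pick(), pick()
            cut = rng.randint(1, k - 1) if k > 1 else 0
            child = mother[:cut] + father[cut:]
            if rng.random() < mutation_rate:
                child[rng.randrange(k)] = rng.choice(data)
            population.append(child)
    return best
```

`ga_cluster(data, k)` returns the lowest-error clustering found as a tuple `(sse, centroids, labels)`; `data` is expected to be a list of equal-length tuples.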
The backbone of a Genetic Algorithm is the fitness function
F(x). The purpose of this function is to rate the candidate
solutions produced in successive generations of the GA.
It is derived from the objective function and then used to
guide the successive genetic operations such as crossover
and mutation. Fitness is a quality value that measures the
reproductive efficiency of an individual string
(chromosome). A score is given to each individual
chromosome with the help of the fitness function. The
proposal is to develop a Genetic Algorithm based clustering
algorithm which is expected to provide a near-optimal
clustering, better than that of the K-Means approach. This
may, however, come at the cost of a little more time
complexity.
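The text does not give a concrete form for F(x). One common choice for clustering, stated here purely as an assumption, derives the fitness from the within-cluster sum of squared errors (SSE), so that a lower clustering error yields a higher fitness. Here x is a chromosome encoding the centroids c_1, ..., c_k, and C_j is the set of data objects assigned to centroid c_j:

```latex
\mathrm{SSE}(x) = \sum_{j=1}^{k} \sum_{o \in C_j} \lVert o - c_j \rVert^{2},
\qquad
F(x) = \frac{1}{1 + \mathrm{SSE}(x)}
```

With this form F(x) lies in (0, 1], and F(x) = 1 only when every object coincides with its centroid.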
The major benefit of using genetic algorithms is that they
are easily parallelized. Parallel implementation of GAs is
realized using two commonly used models, namely:
• Coarse-grained parallel GA
• Fine-grained parallel GA
In the first model every node is given a population split to
process, while in the second model each individual is
provided with a separate node for fitness evaluation.
Adjoining nodes communicate with each other for selection
and remaining operations.
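The difference between the two models can be sketched as follows. This is only an illustration: worker threads stand in for nodes, and the helper names `split_population` and `evaluate_in_parallel` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def split_population(population, n_nodes):
    # Coarse-grained model: each node receives one split of the
    # population (round-robin here) and processes it on its own.
    return [population[i::n_nodes] for i in range(n_nodes)]

def evaluate_in_parallel(population, fitness_fn, max_workers=4):
    # Fine-grained model: the fitness of each individual is evaluated
    # as a separate task; results come back in population order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fitness_fn, population))
```

For instance, `split_population(list(range(7)), 3)` yields three sub-populations of sizes 3, 2 and 2. The communication between adjoining nodes for selection is not modelled here.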
6.1 PARALLEL IMPLEMENTATION FOR CLUSTERING USING GAs
At first, the input data set is split into fragments
according to the block size by the input format. Each
fragment is then given to a mapper to perform the first
phase clustering, the results of which are passed on to a
single reducer to perform the second phase clustering.
Step 1: Population initialization
Each mapper forms the initial population of individuals after
receiving its input fragment. Each individual is a
chromosome of size N. Every segment of the chromosome is
a centroid. Centroids are randomly selected data points from
the received data split. For each chromosome, clustering is
performed: every data point is assigned to the cluster of
the closest centroid. Then the fitness is evaluated.
Step 2: Mating & Selection
Crossover and mutation techniques are used for mating.
For crossover, arithmetic crossover is generally used, which
generates one offspring from two parents. Each centroid of
the offspring is the arithmetic average of the corresponding
centroids of the parents. The swap mutation technique is
used for mutation; in this, the 9's complement of the data
points is taken. The offspring of the older population are
selected to produce a new population. For selection, an
approach known as tournament selection is used, wherein an
individual is selected by holding a tournament based on
fitness evaluation among several individuals chosen at
random from the population.
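Arithmetic crossover and tournament selection as described in this step can be sketched as below. The representation (a chromosome as a list of centroid tuples) and the function names are assumptions. The mutation operator is left out because the text does not describe it in enough detail to implement faithfully.

```python
import random

def arithmetic_crossover(parent_a, parent_b):
    # One offspring from two parents: each offspring centroid is the
    # arithmetic average of the corresponding parent centroids.
    return [tuple((x + y) / 2 for x, y in zip(ca, cb))
            for ca, cb in zip(parent_a, parent_b)]

def tournament_select(population, fitnesses, tournament_size, rng):
    # Pick `tournament_size` individuals at random and return the fittest.
    contenders = rng.sample(range(len(population)), tournament_size)
    winner = max(contenders, key=lambda i: fitnesses[i])
    return population[winner]
```

A larger tournament size increases the selection pressure; when it equals the population size, the fittest individual always wins.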
Step 3: Termination
The new population thus generated replaces the older
population, and in turn forms a newer population
using the mating and selection procedure. This whole
procedure is repeated until the termination
condition is met. The termination condition can be, for
example, reaching a specified number of iterations or
reaching a solution of a particular quality. The fittest
individual of the final population of each mapper is handed
on as the result to the reducer. The reducer then performs
the second phase clustering on the results of all mappers.
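The two-phase flow (mappers cluster their fragments, a single reducer clusters the mappers' results) can be sketched in plain Python. To keep the example short and deterministic, a simple k-means seeded with the first k points is swapped in for the GA-based clustering inside each phase; this substitution, the sequential execution of the mappers and all function names are assumptions.

```python
import math

def local_cluster(points, k, max_iter=50):
    # Stand-in for the GA clustering of one phase (an assumption):
    # plain k-means seeded with the first k points.
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members)
                                     for c in zip(*members))
    return centroids

def mapper(fragment, k):
    # First phase: cluster one input fragment, emit its centroids.
    return local_cluster(fragment, k)

def reducer(mapper_outputs, k):
    # Second phase: cluster the centroids produced by all mappers.
    pooled = [c for centroids in mapper_outputs for c in centroids]
    return local_cluster(pooled, k)

def two_phase_clustering(fragments, k):
    return reducer([mapper(f, k) for f in fragments], k)
```

In a real deployment the mappers would run in parallel on separate nodes, and each would hand on the fittest individual of its final GA population instead of a k-means result.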
6.2 GENETIC K-MEANS ALGORITHM
Apart from the parallel implementation using Genetic
Algorithms, we can also have an algorithm that combines the
advantages of the genetic algorithm and the K-means
algorithm for clustering. It is expected to provide a
near-optimal clustering, better than that of the K-Means
approach, but probably with a little more time complexity.
The major steps of the GK-means algorithm are:
1) Set the population.
2) Compute the fitness of every individual by the following
equation:
$\mathrm{Fitness}(i) = \dfrac{2\,(p_i - 1)}{Q - 1}$
where i = individual, p_i = position of individual i, Q = total
number of individuals.
3) If the fitness condition is satisfied, then assign the
solution; else continue with step 4.
4) Calculate the sub-populations and migrate.
5) The count of the i-th individual depends on the rate s_i,
which is proportional to its level of fitness, that is
$s_i = \mathrm{Fitness}(i) \,/\, \sum_{j=1}^{Q} \mathrm{Fitness}(j)$
6) Translate the population and assess individual fitness.
7) Perform crossover and mutation on each sub-population.
8) If the termination condition is satisfied, stop; else go to step 5.
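The fitness and rate formulas of steps 2 and 5 can be worked through with a small sketch. It assumes that the position p of an individual is its rank when the population is sorted by a raw score, with the worst individual at position 1; the text only says "position", so this reading and the function names are assumptions.

```python
def rank_fitness(raw_scores):
    # Fitness(i) = 2 * (p_i - 1) / (Q - 1), where p_i is the position of
    # individual i after sorting by raw score (worst first, assumed).
    q = len(raw_scores)
    if q == 1:
        return [1.0]          # a single individual: avoid division by zero
    order = sorted(range(q), key=lambda i: raw_scores[i])
    fitness = [0.0] * q
    for position, i in enumerate(order, start=1):
        fitness[i] = 2.0 * (position - 1) / (q - 1)
    return fitness

def selection_rates(fitness):
    # s_i = Fitness(i) / sum of all fitness values.
    total = sum(fitness)
    return [f / total for f in fitness]
```

With raw scores `[3.0, 1.0, 2.0]` the positions are 3, 1 and 2, giving fitness values 2.0, 0.0 and 1.0 and rates 2/3, 0 and 1/3. Under this formula the fitness values always sum to Q, so s_i reduces to Fitness(i) / Q.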
The major drawback of the k-means algorithm is that it does
not scale well to large amounts of data. With a small amount
of data k-means is easy to run, but for large amounts of
data it may not give the desired results. Since we are
dealing with Big Data, plain k-means is not the solution to
our problem. GK-means, in contrast, is expected to take
less memory and time to process big data while still giving
the desired results. Genetic k-means gradually converges
towards the global optimum.
7. DISADVANTAGES OF GA
A major difficulty in applying Genetic Algorithms is how to
handle constraints. Genetic operators often produce
infeasible offspring while manipulating chromosomes. A
penalty technique is used to keep the number of infeasible
solutions produced in each generation in check. This helps
in steering the genetic search towards an optimal solution.
Apart from this, a few other disadvantages are:
1) They are challenging to understand and to describe to
end users.
2) The problem abstraction and the means of representing
individuals are quite difficult.
3) Determining the best fitness function is difficult.
4) Deciding how to perform crossover and mutation is another
difficulty.
5) The large over-production of individuals and the random
character of the search process are another drawback.
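The penalty technique mentioned at the start of this section can be illustrated with a small sketch. The constraint used here (a clustering chromosome is infeasible if some cluster is left empty), the linear penalty and the function names are assumptions made for the example.

```python
def count_empty_clusters(labels, k):
    # Number of the k clusters that received no data object
    # (labels are assumed to lie in 0..k-1).
    return k - len(set(labels))

def penalized_fitness(raw_fitness, violations, weight=1.0):
    # Infeasible solutions stay in the population but are scored lower,
    # which steers the search back towards feasible regions.
    return raw_fitness - weight * violations
```

A feasible solution keeps its raw fitness, while each violation costs `weight` units; choosing the weight is itself problem-dependent.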
8. FUTURE SCOPE
The paper compares and reviews the methods available for
clustering data based on genetic algorithms. A more robust
and time-saving algorithm can be designed so that big data
can be mined effectively, overcoming the challenges
faced by Genetic Algorithms.
9. CONCLUSION
This paper provides the reader with a review of the
terminology related to analysing big data. Concepts such as
Text Mining, Big Data and Genetic Algorithms, along with
their scope, methods, advantages and challenges, are
discussed. The paper reviews various methods that are
available for text mining. It concludes that, since the
prime focus is on mining big data, the algorithm followed
has to be efficient in both space and time. The paper
presents the need for an algorithm that addresses the
characteristic features of the Big Data revolution, and
proposes a Big Data processing model from the data mining
perspective.
10. REFERENCES
[1] Senthilnath, J., S. N. Omkar, and V. Mani. "Clusteringusing
firefly algorithm: performance study." Swarm and
Evolutionary Computation 1, no. 3 (2011).
[2] Ahmed and Saeed. A Survey of Big Data CloudComputing
Security. International Journal of Computer Science and
Software Engineering (IJCSSE), Volume 3, Issue 1, December
2014.
[3] Arora, Deepali, Varshney, Analysis of K-Means and K-
Medoids Algorithm For Big Data, InternationalConference on
Information Security & Privacy (ICISP2015), 2015.
[4] Dash and Dash, Comparative Analysis of K-means and
Genetic Algorithm Based Data Clustering. International
Journal of Advanced Computer and Mathematical
SciencesISSN 2230-9624. Vol 3, Issue 2, 2012.
[5] Gaddam, Securing your Big Data Environment, Black Hat
USA 2015.
[6] http://www.sas.com/en_us/insights/big-data/internet-
of-things.html
[7] Inukollu, Arsi and Ravuri, Security Issues Associated
With Big Data in Cloud Computing. International Journal of
Network Security & Its Applications (IJNSA), Vol.6, No.3, May
2014.
[8] Jiawei Han and MichelineKamber,“Data MiningConcepts
& Techniques”, Second Edition, Morgan Kaufmann
Publishers
[9] “Text Mining Technique using Genetic Algorithm”,
International Journal of Computer Applications (0975 –
8887) Volume #. 63, February 2013
[10] McAfee, Andrew, and Erik Brynjolfsson. "Big data: the
management revolution." Harvard business review 2012
[11] Deepankar Bharadwaj, Dr. Arvind Shukla, Text Mining
Technique on Big data using Genetic Algorithm,
International Journal of Computer Engineering and
Applications, Volume X, Issue IX, Sep. 16
[12] Mitsuo Gen, Runwei Cheng, Genetic Algorithms and
Engineering Optimization, John Wiley and Sons, 2000

The Survey of Data Mining Applications  And Feature Scope The Survey of Data Mining Applications  And Feature Scope
The Survey of Data Mining Applications And Feature Scope
 
Big Data Analytics: Recent Achievements and New Challenges
Big Data Analytics: Recent Achievements and New ChallengesBig Data Analytics: Recent Achievements and New Challenges
Big Data Analytics: Recent Achievements and New Challenges
 
An Efficient Approach for Clustering High Dimensional Data
An Efficient Approach for Clustering High Dimensional DataAn Efficient Approach for Clustering High Dimensional Data
An Efficient Approach for Clustering High Dimensional Data
 
data.2.pptx
data.2.pptxdata.2.pptx
data.2.pptx
 
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...
 
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...
 
Big data upload
Big data uploadBig data upload
Big data upload
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
PERFORMING DATA MINING IN (SRMS) THROUGH VERTICAL APPROACH WITH ASSOCIATION R...
PERFORMING DATA MINING IN (SRMS) THROUGH VERTICAL APPROACH WITH ASSOCIATION R...PERFORMING DATA MINING IN (SRMS) THROUGH VERTICAL APPROACH WITH ASSOCIATION R...
PERFORMING DATA MINING IN (SRMS) THROUGH VERTICAL APPROACH WITH ASSOCIATION R...
 
A Comparative Study of Various Data Mining Techniques: Statistics, Decision T...
A Comparative Study of Various Data Mining Techniques: Statistics, Decision T...A Comparative Study of Various Data Mining Techniques: Statistics, Decision T...
A Comparative Study of Various Data Mining Techniques: Statistics, Decision T...
 
Nikita rajbhoj(a 50)
Nikita rajbhoj(a 50)Nikita rajbhoj(a 50)
Nikita rajbhoj(a 50)
 
Big data (word file)
Big data  (word file)Big data  (word file)
Big data (word file)
 
elgendy2014.pdf
elgendy2014.pdfelgendy2014.pdf
elgendy2014.pdf
 
Data mining with big data
Data mining with big dataData mining with big data
Data mining with big data
 

Mehr von IRJET Journal

TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...IRJET Journal
 
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURESTUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTUREIRJET Journal
 
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...IRJET Journal
 
Effect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsEffect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsIRJET Journal
 
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...IRJET Journal
 
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...IRJET Journal
 
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...IRJET Journal
 
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...IRJET Journal
 
A REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASA REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASIRJET Journal
 
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...IRJET Journal
 
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProP.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProIRJET Journal
 
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...IRJET Journal
 
Survey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemSurvey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemIRJET Journal
 
Review on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesReview on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesIRJET Journal
 
React based fullstack edtech web application
React based fullstack edtech web applicationReact based fullstack edtech web application
React based fullstack edtech web applicationIRJET Journal
 
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...IRJET Journal
 
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.IRJET Journal
 
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...IRJET Journal
 
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignMultistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignIRJET Journal
 
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...IRJET Journal
 

Mehr von IRJET Journal (20)

TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
 
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURESTUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
 
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
 
Effect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsEffect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil Characteristics
 
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
 
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
 
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
 
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
 
A REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASA REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADAS
 
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
 
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProP.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
 
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
 
Survey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemSurvey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare System
 
Review on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesReview on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridges
 
React based fullstack edtech web application
React based fullstack edtech web applicationReact based fullstack edtech web application
React based fullstack edtech web application
 
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
 
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
 
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
 
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignMultistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
 
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
 

KĂŒrzlich hochgeladen

Main Memory Management in Operating System
Main Memory Management in Operating SystemMain Memory Management in Operating System
Main Memory Management in Operating SystemRashmi Bhat
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxKartikeyaDwivedi3
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHC Sai Kiran
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...Chandu841456
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...asadnawaz62
 
Class 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm SystemClass 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm Systemirfanmechengr
 
System Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event SchedulingSystem Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event SchedulingBootNeck1
 
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONTHE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONjhunlian
 
Vishratwadi & Ghorpadi Bridge Tender documents
Vishratwadi & Ghorpadi Bridge Tender documentsVishratwadi & Ghorpadi Bridge Tender documents
Vishratwadi & Ghorpadi Bridge Tender documentsSachinPawar510423
 
Research Methodology for Engineering pdf
Research Methodology for Engineering pdfResearch Methodology for Engineering pdf
Research Methodology for Engineering pdfCaalaaAbdulkerim
 
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfgUnit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfgsaravananr517913
 
Transport layer issues and challenges - Guide
Transport layer issues and challenges - GuideTransport layer issues and challenges - Guide
Transport layer issues and challenges - GuideGOPINATHS437943
 
TechTACÂź CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTACÂź CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTACÂź CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTACÂź CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catcherssdickerson1
 
Industrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.pptIndustrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.pptNarmatha D
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...VICTOR MAESTRE RAMIREZ
 
Indian Dairy Industry Present Status and.ppt
Indian Dairy Industry Present Status and.pptIndian Dairy Industry Present Status and.ppt
Indian Dairy Industry Present Status and.pptMadan Karki
 
National Level Hackathon Participation Certificate.pdf
National Level Hackathon Participation Certificate.pdfNational Level Hackathon Participation Certificate.pdf
National Level Hackathon Participation Certificate.pdfRajuKanojiya4
 
Gurgaon âœĄïž9711147426✹Call In girls Gurgaon Sector 51 escort service
Gurgaon âœĄïž9711147426✹Call In girls Gurgaon Sector 51 escort serviceGurgaon âœĄïž9711147426✹Call In girls Gurgaon Sector 51 escort service
Gurgaon âœĄïž9711147426✹Call In girls Gurgaon Sector 51 escort servicejennyeacort
 
Industrial Safety Unit-I SAFETY TERMINOLOGIES
Industrial Safety Unit-I SAFETY TERMINOLOGIESIndustrial Safety Unit-I SAFETY TERMINOLOGIES
Industrial Safety Unit-I SAFETY TERMINOLOGIESNarmatha D
 

KĂŒrzlich hochgeladen (20)

Main Memory Management in Operating System
Main Memory Management in Operating SystemMain Memory Management in Operating System
Main Memory Management in Operating System
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptx
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECH
 
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...
 
Class 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm SystemClass 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm System
 
System Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event SchedulingSystem Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event Scheduling
 
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONTHE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
 
Vishratwadi & Ghorpadi Bridge Tender documents
Vishratwadi & Ghorpadi Bridge Tender documentsVishratwadi & Ghorpadi Bridge Tender documents
Vishratwadi & Ghorpadi Bridge Tender documents
 
Research Methodology for Engineering pdf
Research Methodology for Engineering pdfResearch Methodology for Engineering pdf
Research Methodology for Engineering pdf
 
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfgUnit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
 
Transport layer issues and challenges - Guide
Transport layer issues and challenges - GuideTransport layer issues and challenges - Guide
Transport layer issues and challenges - Guide
 
TechTACÂź CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTACÂź CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTACÂź CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTACÂź CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
 
Industrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.pptIndustrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.ppt
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...
 
Indian Dairy Industry Present Status and.ppt
Indian Dairy Industry Present Status and.pptIndian Dairy Industry Present Status and.ppt
Indian Dairy Industry Present Status and.ppt
 
National Level Hackathon Participation Certificate.pdf
National Level Hackathon Participation Certificate.pdfNational Level Hackathon Participation Certificate.pdf
National Level Hackathon Participation Certificate.pdf
 
Gurgaon âœĄïž9711147426✹Call In girls Gurgaon Sector 51 escort service
Gurgaon âœĄïž9711147426✹Call In girls Gurgaon Sector 51 escort serviceGurgaon âœĄïž9711147426✹Call In girls Gurgaon Sector 51 escort service
Gurgaon âœĄïž9711147426✹Call In girls Gurgaon Sector 51 escort service
 
Industrial Safety Unit-I SAFETY TERMINOLOGIES
Industrial Safety Unit-I SAFETY TERMINOLOGIESIndustrial Safety Unit-I SAFETY TERMINOLOGIES
Industrial Safety Unit-I SAFETY TERMINOLOGIES
 

Mining Big Data using Genetic Algorithm

International Research Journal of Engineering and Technology (IRJET) | e-ISSN: 2395-0056 | p-ISSN: 2395-0072 | Volume: 04 Issue: 07 | July 2017 | www.irjet.net
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 743

Surbhi Jain
Assistant Professor, Department of Computer Science, India

Abstract – Today, the amount of data available in the world is growing rapidly because of the use of the internet, smartphones, social networks, etc. Such collections of large and complex data sets are referred to as Big Data. Traditional database systems are unable to capture, store and analyse this volume of data. Text processing must improve so that previously unknown information or relevant knowledge can be mined from text. This paper argues the need for an algorithm that addresses the clustering problem for big data by combining the genetic algorithm with some of the known clustering algorithms. The main idea is to combine the advantages of genetic algorithms and clustering to process large amounts of data. A genetic algorithm is an algorithm used to optimize results. This paper gives an overview of concepts such as data mining, genetic algorithms and big data.

Key Words: Genetic Algorithms, Big Data, Clustering, Chromosomes, Mining

1. INTRODUCTION

In the current Big Data age, data is becoming more and more available owing to advances in information and communication technology, and enterprises are gaining meaningful information, relevant knowledge and insight from this data to support decision making. Big data mining is the ability to extract valuable information from huge and complex data sets or data streams, i.e. Big Data.
One of the important data mining techniques for big data analysis is clustering. Applying clustering techniques to big data is difficult because of the enormous amount of data arriving daily. Many clustering techniques are available, the most common of which is the K-means algorithm, which is used to analyze information from a dataset. Because big data provides such a plethora of data, however, the available clustering algorithms are not very efficient. As Big Data refers to terabytes and petabytes of data, we need clustering algorithms that can cope with the high computational cost this implies. One option is to design an algorithm that combines features of some of the clustering algorithms with the genetic algorithm to process big data.

Mining is the process of extracting meaningful information from source data. It is a set of computerized techniques used to extract previously unknown or buried information from large sets of databases. Successful data mining makes it possible to uncover patterns and relationships, and then to use this "new" information for making proactive, knowledge-driven business decisions. Many algorithms are used for mining information from plain text. Genetic algorithms are algorithms used to solve optimization problems. They are search-based and eventually generate useful solutions to such problems.

2. GENETIC ALGORITHMS

Genetic algorithms (GAs) are a family of computational models inspired by Darwin's theory of evolution. According to Darwin, the species that are fittest and can adapt to changing surroundings survive; the rest tend to die out. Darwin also stated that "the survival of an organism can be maintained through the process of reproduction, crossover and mutation". The basic working mechanism of a GA is as follows: the algorithm starts with a set of solutions (represented by chromosomes) called the population.
Solutions from one population are taken and used to form a new population (reproduction). This is driven by the hope that the new population will be better than the old one, which is why genetic algorithms are often termed optimistic search algorithms. Reproductive opportunities are distributed so that chromosomes representing a better solution to the target problem get more chances to reproduce than those representing inferior solutions. Genetic algorithms search through a huge number of parameter combinations to find the best match. For example, they can search through different combinations of materials and designs to find the combination that results in a stronger, lighter and overall better final product.
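
The mechanism described above (a population of chromosomes, fitness-biased reproduction, crossover and mutation) can be illustrated with a minimal sketch. The paper gives no implementation, so everything here is an assumption made for illustration: the bit-string encoding, the toy fitness function that simply counts 1-bits, the parameter values, and the helper names `fitness`, `select`, `crossover`, `mutate` and `run_ga`. Keeping the best chromosome unchanged in each generation (elitism) is also an added design choice, not something the text prescribes.

```python
import random

def fitness(chromosome):
    # Toy objective (assumed for illustration): the number of 1-bits.
    return sum(chromosome)

def select(population, rng):
    # Fitter chromosomes get proportionally more chances to reproduce.
    # The +1 keeps every weight positive, even for an all-zero chromosome.
    weights = [fitness(c) + 1 for c in population]
    return rng.choices(population, weights=weights, k=2)

def crossover(a, b, rng):
    # Single-point crossover: head of one parent, tail of the other.
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:]

def mutate(chromosome, rate, rng):
    # Flip each gene independently with a small probability.
    return [1 - g if rng.random() < rate else g for g in chromosome]

def run_ga(length=20, pop_size=30, generations=40, rate=0.02, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    initial_best = max(fitness(c) for c in population)
    for _ in range(generations):
        # Elitism (an added choice): carry the best chromosome over unchanged,
        # so the best fitness never decreases between generations.
        children = [max(population, key=fitness)]
        while len(children) < pop_size:
            a, b = select(population, rng)
            children.append(mutate(crossover(a, b, rng), rate, rng))
        population = children
    return initial_best, max(population, key=fitness)

if __name__ == "__main__":
    start, best = run_ga()
    print("best initial fitness:", start)
    print("best final fitness:", fitness(best))
```

For a clustering task, the same loop would apply with a different encoding, for example a chromosome holding candidate cluster centres and a fitness based on within-cluster distance; that mapping is likewise only a suggestion.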
As an example, consider "Face Recognition Systems", which are used for drawing sketches based on visual descriptions. Such a system is mainly used for investigation purposes, where a sketch of a criminal has to be made on the basis of the description given by an eyewitness to the crime. The initial population is simply the collection of facial features already stored in the system. The features may include many varieties of noses, ears, lips, eyes, etc., which may differ in color, size or any other respect. As the witness gives descriptions, the features most likely to match are selected (selection). The selected features then go through crossover and mutation to produce more likely features. For instance, the eyes of one face and the lips of another can be chosen for crossover to produce a new individual in which both features match the criminal. The process continues until the witness recognizes the final face as the desired one.

3. BIG DATA

Big data is a term for data sets that are so large or complex that traditional data processing application software is inadequate to deal with them. Big data represents a new era in data exploration and utilization. Big Data platforms leverage open source technology to provide a robust, secure, highly available, enterprise-class environment. Challenges include capturing, storing, analyzing, querying and updating data safely and securely. While the term "big data" is relatively new, the practice of collecting and storing large amounts of information for eventual analysis is ages old. The significance of big data lies not in how much data we have, but in how we use that data.
We can take data from any source and analyze it to find answers that enable us to reduce cost and time and to make smart decisions. In this paper we try to combine big data with genetic algorithms to analyze data efficiently. The reason for the interest in genetic algorithms is that they are very powerful and broadly applicable search techniques. As stated earlier, Big Data refers to large-volume, complex, growing data sets with numerous, autonomous sources. With the fast development of networking, data storage, and data collection capacity, Big Data is now rapidly expanding in all fields of science and engineering, including the physical, biological and biomedical sciences. With Big Data technology, computations can be sped up. Usually, when a system slows down because the data has become too large for it to manage, we add RAM or free some resources by terminating certain processes. Big data platforms, on the contrary, add more systems to the pool and thereby promote parallelism. This, however, makes fault tolerance a concern: the more systems there are, the higher the probability of system failures. Fortunately, big data platforms handle this automatically by duplicating data across systems so that if one system fails, requests for its data can be redirected to another system.

4. DATA MINING
Knowledge is extracted from data sets using Data Mining technology, which is used to search and analyze data. The data to be mined varies from a small data set to an enormous data set, i.e. big data. In Data Mining, the source data is kept in databases, i.e. in the form of tables if we consider relational databases. We only have to apply the algorithms to extract data from the databases. The Data Mining environment produces voluminous data. The information retrieved in the data mining step is transformed into a structure that is easily understood by users.
Once data has been extracted and transformed, it is loaded into systems from which it can be read. Data mining includes various methods, such as genetic algorithms, support vector machines, decision trees, neural networks and cluster analysis, that disclose the hidden patterns inside huge data sets. Handling such large data sets requires algorithms that define the structures and approaches used to handle Big Data, as well as the tools developed for analyzing them. Data mining and text mining are often used synonymously, which is not right. Although both are mining techniques, there is a thin line of difference between the two. Data mining refers to the extraction of useful, previously unknown information from databases, while text mining refers to the extraction of useful knowledge from plain text, i.e. naturally occurring text. Unlike in data mining, this text need not be transformed into any other format.

5. CLUSTERING
Clustering refers to grouping similar kinds of objects. It is a method of exploring the data, a technique for finding patterns in the data set. It falls in the category of unsupervised learning, i.e. we do not know in advance how the data objects (of similar types) should be grouped together. It is one of the most vital research fields in data mining. In clustering we aim at making collections of objects in such a manner that the objects having the same attributes
belong to the same group and objects with different behaviors belong to different groups. With the formation of groups, we can easily identify areas where the object space is dense and where it is sparsely filled, and hence determine the distribution patterns. We can find interesting patterns directly from the data sets without needing much background knowledge. One of the popular approaches to clustering is partitioning. Partitioning works by moving objects from one cluster to another, starting from an initial partition. The number of clusters must be defined in advance for this technique (as in the k-means algorithm).

6. GENETIC ALGORITHM FOR CLUSTERING
The voluminous data available to us can be divided into small groups, where each group can be considered a population. By applying genetic operators iteratively to the population we can find the optimum solution for the current scenario. Search is a problem-solving method in which the sequence of steps leading to the solution cannot be determined in advance. Its success depends on how well the search operators are applied. An ideal search should be capable of searching locally as well as in a random manner. Random search explores the entire solution space and is proficient at avoiding local optima, while local search helps in exploring the local possibilities and reaching the best solution among them. As discussed earlier, a genetic algorithm is capable of effectively searching the problem domain and solving complex problems by simulating natural evolution. It performs a search and provides near-optimal solutions for the objective function of an optimization problem.
A set of chromosomes is referred to as a population, wherein a chromosome (represented as a string) encodes the parameters in the search space, here a combination of cluster centroids.
The first step is to create a random population, which represents different solutions in the search space. Next, a few of the chromosomes are selected according to the principle of survival of the fittest and passed on to the next generation. In the classical formulation, chromosomes are binary-encoded strings that represent candidate solutions to the optimization problem. Each string is evaluated with the fitness function (objective function), giving a measure of solution quality called the fitness value. A new population of candidate solutions is created by applying recombination (crossover and mutation) to the selected candidates. Individual representation and population initialization, fitness computation, selection, crossover and mutation are thus the basic steps of a genetic algorithm for data clustering. The algorithm is given below:
Input:
   k: the number of clusters
   d: the data set containing n objects
   p: population size
   Tmax: maximum number of iterations
Output: a set of k clusters
1) Initialize every chromosome to have k random centroids selected from the data set.
2) For T = 1 to Tmax
   (i) For every chromosome i
      a. Assign each data object to the cluster with the closest centroid.
      b. Recompute the k cluster centroids of chromosome i as the mean of their data objects.
      c. Compute the fitness of chromosome i.
   (ii) Generate the new group of chromosomes using GA selection, crossover and mutation.
The backbone of a Genetic Algorithm is the fitness function F(x). The prime purpose of this function is to drive the successive results of the GA. It is derived from the objective function and then used in the successive genetic operations such as crossover and mutation. Fitness is a quality value that measures the reproductive efficiency of an individual string (chromosome).
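The algorithm listed above can be sketched in a few lines of Python. This is a minimal illustration under several assumptions that the paper does not fix: fitness is taken as the inverse of the within-cluster sum of squared distances, selection is fitness-proportionate, crossover is single-point over the list of centroids, and mutation replaces one centroid with a random data object. All function names (`ga_cluster`, `assign`, `recompute`, `fitness`) and the `mutation_rate` parameter are hypothetical.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(data, centroids):
    """Step 2(i)a: index of the closest centroid for every data object."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(obj, centroids[c]))
            for obj in data]


def recompute(data, labels, centroids):
    """Step 2(i)b: each centroid becomes the mean of its data objects."""
    updated = []
    for c, old in enumerate(centroids):
        members = [obj for obj, lab in zip(data, labels) if lab == c]
        if members:
            updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
        else:
            updated.append(old)  # an empty cluster keeps its previous centroid
    return updated


def fitness(data, centroids):
    """Step 2(i)c. Assumed fitness: 1 / (1 + sum of squared distances)."""
    labels = assign(data, centroids)
    sse = sum(sq_dist(obj, centroids[lab]) for obj, lab in zip(data, labels))
    return 1.0 / (1.0 + sse)


def ga_cluster(d, k, p=10, t_max=20, mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    # Step 1: every chromosome holds k centroids drawn from the data set.
    population = [rng.sample(d, k) for _ in range(p)]
    best, best_fit = None, -1.0
    for _ in range(t_max):
        chroms, fits = [], []
        for chrom in population:
            chrom = recompute(d, assign(d, chrom), chrom)
            fit = fitness(d, chrom)
            chroms.append(chrom)
            fits.append(fit)
            if fit > best_fit:
                best, best_fit = chrom, fit
        # Step 2(ii): selection, crossover and mutation (assumed operators).
        population = []
        while len(population) < p:
            pa, pb = rng.choices(chroms, weights=fits, k=2)
            cut = rng.randrange(1, k) if k > 1 else 0
            child = list(pa[:cut]) + list(pb[cut:])
            if rng.random() < mutation_rate:
                child[rng.randrange(k)] = rng.choice(d)
            population.append(child)
    return best, assign(d, best)


# Usage on a toy data set with two well-separated groups of points.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 5.2), (5.2, 5.1)]
centroids, labels = ga_cluster(points, k=2, p=8, t_max=15, seed=1)
```

Keeping the best chromosome seen so far (elitism by bookkeeping rather than by reinsertion) is a design choice made here for brevity; the listing itself does not say how the final clustering is extracted.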
A score is given to each individual chromosome with the help of the fitness function. The proposal is to develop a Genetic Algorithm based clustering algorithm that is expected to provide a better clustering than the K-Means approach. This may, however, add a little more time complexity. The major benefit of using genetic algorithms is that they are easily parallelized. Parallel implementation of a GA is realized using two commonly used models, namely:
• Coarse-grained parallel GA
• Fine-grained parallel GA
In the first model every node is given a split of the population to process, while in the second model each individual is
provided with a separate node for fitness evaluation. Adjoining nodes communicate with each other for selection and the remaining operations.

6.1 PARALLEL Implementation for Clustering using GAs
At first, the input data set is split into fragments according to the block size by the input format. Each fragment is then given to a mapper to perform the first-phase clustering, the results of which are passed on to a single reducer to perform the second-phase clustering.
Step 1: Population initialization
Each mapper forms the initial population of individuals after receiving its input fragment. Each individual is a chromosome of size 𝑁. Every segment of the chromosome is a centroid. Centroids are randomly selected data points from the received data split. For each chromosome, clustering is performed by assigning every data point to the cluster of the closest centroid. Then the fitness is evaluated.
Step 2: Mating & Selection
Cross-over and mutation techniques are used for mating. For cross-over, we generally use arithmetic cross-over, which generates one offspring from two parents. Each centroid of the offspring is the arithmetic average of the corresponding centroids of the parents. The swap mutation technique is used for mutation; in this, the 9’s complement of the data points is taken. Offspring from the older population are selected to produce a new population. For selection, an approach known as tournament selection is used, wherein an individual is selected by holding a tournament, based on fitness evaluation, among several individuals chosen at random from the population.
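The arithmetic cross-over and tournament selection described in Step 2 can be sketched as follows. This is a hedged illustration, not the implementation used in any particular system: chromosomes are assumed to be lists of centroid tuples, and the names `arithmetic_crossover`, `tournament_select` and `tournament_size` are hypothetical.

```python
import random


def arithmetic_crossover(parent_a, parent_b):
    """Return one offspring whose centroids are the averages of the
    corresponding centroids of the two parents."""
    return [tuple((x + y) / 2 for x, y in zip(ca, cb))
            for ca, cb in zip(parent_a, parent_b)]


def tournament_select(population, fitnesses, tournament_size=3, rng=None):
    """Draw tournament_size distinct individuals at random and return
    the one with the highest fitness."""
    rng = rng or random.Random()
    contenders = rng.sample(range(len(population)), tournament_size)
    winner = max(contenders, key=lambda i: fitnesses[i])
    return population[winner]


# Usage: two parents with two 2-D centroids each.
child = arithmetic_crossover([(0.0, 0.0), (4.0, 2.0)],
                             [(2.0, 2.0), (6.0, 4.0)])
# child == [(1.0, 1.0), (5.0, 3.0)]
```

A larger `tournament_size` raises the selection pressure, because the fittest individual in the population is more likely to be among the contenders.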
Step 3: Termination
The new population thus generated replaces the older population and in turn forms a newer population using the mating and selection procedure. This whole procedure is repeated until the termination condition is met. The termination condition can be, for example, reaching a specified number of iterations or reaching a particular solution quality. The fittest individual of the final population of each mapper is passed on as the result to the reducer. The second-phase clustering on the results of all mappers is then performed by the reducer.

6.2 GENETIC K-MEANS ALGORITHM
Apart from the parallel implementation using Genetic Algorithms, we can also have an algorithm that combines the advantages of the Genetic Algorithm and the K-means algorithm for clustering. It is expected to provide a better clustering than the K-Means approach, but probably with a little more time complexity. The major steps of the GK-means algorithm are:
1) Set the population.
2) Compute the fitness of every individual using the equation $\mathrm{Fitness}(i) = 2\,(p_i - 1)/(Q - 1)$, where $i$ is the individual, $p_i$ is its position and $Q$ is the total number of individuals.
3) If the fitness condition is satisfied, then assign the solution; else continue with step 4.
4) Calculate the sub-populations and migrate.
5) The count of the $i$-th individual depends on the rate $S_i$, which is relative to its level of fitness: $S_i = \mathrm{Fitness}(i) / \sum_{j=1}^{Q} \mathrm{Fitness}(j)$.
6) Translate the population and assess individual fitness.
7) Perform crossover and mutation on each sub-population.
8) If the termination condition is satisfied, stop; else go to step 5.
The major drawback of the k-means algorithm is that it cannot process large amounts of data. With a small amount of data k-means is easy to apply, but for a large amount of data it will not give the desired results. Since we are dealing with Big Data, k-means alone is not the solution to our problem. GK-means, on the contrary, is expected to take less memory and time to process big data and to give the desired results as well.
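The position-based fitness of step 2 and the selection rate of step 5 in the GK-means listing can be worked through with a small sketch. The paper does not say how positions are assigned, so the code assumes that individuals are ranked by an objective value to be minimised (for example a clustering error) and that the worst individual receives position 1; the names `rank_fitness` and `selection_rates` are hypothetical.

```python
def rank_fitness(objective_values):
    """Fitness(i) = 2 * (p_i - 1) / (Q - 1), with p_i the position of
    individual i and Q the total number of individuals.

    Assumption: a lower objective value is better, and the worst
    individual is given position 1 (fitness 0), the best position Q
    (fitness 2).
    """
    q = len(objective_values)
    if q < 2:
        raise ValueError("at least two individuals are needed")
    # Indices sorted from worst (largest objective) to best (smallest).
    order = sorted(range(q), key=lambda i: objective_values[i], reverse=True)
    fitness = [0.0] * q
    for position, i in enumerate(order, start=1):
        fitness[i] = 2.0 * (position - 1) / (q - 1)
    return fitness


def selection_rates(fitness):
    """S_i = Fitness(i) / sum of all fitness values."""
    total = sum(fitness)
    return [f / total for f in fitness]


# Usage: three individuals with clustering errors 3.0, 1.0 and 2.0.
fit = rank_fitness([3.0, 1.0, 2.0])   # [0.0, 2.0, 1.0]
rates = selection_rates(fit)          # proportions 0, 2/3 and 1/3
```

Under this reading the fitness values always average to 1, so the worst individual is never selected and the best one is selected at twice the average rate.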
The Genetic k-means algorithm gradually converges to the global optimum as desired.

7. DISADVANTAGES OF GA
A major difficulty in applying Genetic Algorithms is how to handle constraints. Genetic operators often produce infeasible offspring while manipulating chromosomes. A penalty technique is used to keep a check on the number of infeasible solutions produced in each generation. This helps in steering the genetic search towards an optimal solution. Apart from this, a few other disadvantages are:
1) They are challenging to understand and to describe to end users.
2) The problem abstraction and the means of representing individuals are quite difficult.
3) Determining the best fitness function is difficult.
4) Deciding how to perform crossover and mutation is another difficulty.
5) The large over-production of individuals and the random character of the search process are another drawback.

8. FUTURE SCOPE
The paper compares and reviews the methods available for clustering data based on genetic algorithms. A more robust and time-saving algorithm can be designed so that big data can be mined effectively, overcoming the challenges faced by Genetic Algorithms.

9. CONCLUSION
This paper provides the reader with a review of the terminology related to analysing big data. Concepts such as Text Mining, Big Data and Genetic Algorithms, along with their samples, scope, methods, advantages and challenges, are discussed here. The paper reviews various methods that are available for text mining. The paper concludes that, since the prime focus is on mining big data, the algorithm followed has to be space efficient and time efficient. The paper presents the need for an algorithm that characterizes the features of the Big Data revolution, and proposes a Big Data processing model from the data mining perspective.