Abhijit Kumar Behera 
M.Tech (CSE) 
Roll No. 1350001 
School of Computer Engineering 
Guided By : Dr. Laxman Sahoo
Contents
• Introduction
• Apache Hadoop related projects
• Application of Mahout
• Literature Survey
• Plan of Action
• Conclusion
• References
Introduction
•K-means is one of the best-known clustering algorithms and has been applied to a wide variety of problems.
•MapReduce is a widely used parallel framework in cloud computing that handles massive data effectively, so K-means clustering algorithms built on MapReduce have become a focus of research.
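Components of Hadoop
HDFS
•NameNode
•DataNode
•Secondary NameNode
MapReduce
•Map()
•Combine()
•Reduce()
YARN
•ResourceManager (takes over cluster-wide scheduling from the MapReduce 1 JobTracker)
•NodeManager (takes over per-node task execution from the MapReduce 1 TaskTracker)
HBase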
MapReduce Word count process
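The word count flow named on this slide can be sketched in plain Python. This is a minimal single-process illustration of the Map(), Combine() and Reduce() stages listed earlier, not Hadoop code: the function names (`map_phase`, `combine_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and each input line is assumed to stand in for one input split.

```python
from collections import defaultdict


def map_phase(line):
    """Map(): emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def combine_phase(pairs):
    """Combine(): pre-aggregate counts locally, on one mapper's output."""
    local = defaultdict(int)
    for word, count in pairs:
        local[word] += count
    return list(local.items())


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(word, counts):
    """Reduce(): sum the partial counts received for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run map -> combine -> shuffle -> reduce over all lines."""
    intermediate = []
    for line in lines:  # assumption: one line plays the role of one split
        intermediate.extend(combine_phase(map_phase(line)))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    print(word_count(["the cat sat", "the cat ran the race"]))
```

The combiner is optional for correctness here; it only reduces how many pairs reach the shuffle step, which is why it can reuse the same summing logic as the reducer.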
Apache Hadoop Projects
•Hadoop (HDFS and MapReduce)
•HBase
•Mahout
•Spark
•Hive
•ZooKeeper
•Sqoop
•Pig
Application of Mahout
• Collaborative Filtering
   • Matrix factorization based recommenders
   • A user based recommender
• Clustering
   • Canopy Clustering
   • K-Means Clustering
   • Fuzzy K-Means
   • Affinity Propagation Clustering
• Classification
   • Naive Bayes
   • Random forest classifier
Literature Survey
An Improved Parallel K-means Clustering Algorithm with MapReduce
Authors: Qing Liao, Fan Yang, Jingming Zhao
Published in: Communication Technology (ICCT), IEEE
Year of Publication: 2014
Parallel K-means Algorithm
1) Initial
2) Mapper
3) Reducer
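The three steps listed above (Initial, Mapper, Reducer) can be sketched as follows. This is a generic single-process illustration of K-means expressed in MapReduce style, not the improved algorithm from the surveyed paper. All function names are hypothetical, the combiner step is an added assumption, and the data splits are plain Python lists standing in for HDFS blocks.

```python
import math
from collections import defaultdict


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans_map(split, centroids):
    """Mapper: emit (cluster_id, (point, 1)) for every point in one split."""
    return [(nearest(p, centroids), (p, 1)) for p in split]


def kmeans_combine(pairs):
    """Combiner (assumed): partial vector sums and counts per cluster."""
    partial = {}
    for cid, (vec, n) in pairs:
        if cid in partial:
            acc, count = partial[cid]
            partial[cid] = ([a + b for a, b in zip(acc, vec)], count + n)
        else:
            partial[cid] = (list(vec), n)
    return list(partial.items())


def kmeans_reduce(values):
    """Reducer: merge partial sums for one cluster into its new centroid."""
    total, count = None, 0
    for vec, n in values:
        total = list(vec) if total is None else [a + b for a, b in zip(total, vec)]
        count += n
    return tuple(x / count for x in total)


def kmeans_iteration(splits, centroids):
    """One MapReduce job: map and combine per split, then reduce per cluster."""
    grouped = defaultdict(list)
    for split in splits:
        for cid, value in kmeans_combine(kmeans_map(split, centroids)):
            grouped[cid].append(value)
    updated = list(centroids)  # clusters that received no points keep their centroid
    for cid, values in grouped.items():
        updated[cid] = kmeans_reduce(values)
    return updated


def parallel_kmeans(splits, initial, max_iter=20, tol=1e-9):
    """Initial step plus repeated jobs until the centroids stop moving."""
    centroids = [tuple(c) for c in initial]
    for _ in range(max_iter):
        updated = kmeans_iteration(splits, centroids)
        shift = max(math.dist(a, b) for a, b in zip(centroids, updated))
        centroids = updated
        if shift <= tol:
            break
    return centroids


if __name__ == "__main__":
    splits = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 10.0), (10.0, 12.0)]]
    print(parallel_kmeans(splits, [(0.0, 0.0), (10.0, 10.0)]))
```

Each iteration corresponds to one MapReduce job, so the current centroids have to be redistributed to every mapper before each pass. The sketch models this by passing `centroids` as an argument.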
Literature Survey
Clouds for Scalable Big Data Analytics
Author: Domenico Talia
Published in: IEEE Computer Society
Year of Publication: 2013
In this paper, the author describes how cloud computing enhances the development and functionality of big data analytics when the analytics is deployed in the cloud.

Cloud service models (features and users):
•Data analytics software as a service
   Features: A single and complete data mining application or task (including data sources) offered as a service
   Users: End users, analytics managers, data analysts
•Data analytics platform as a service
   Features: A data analysis suite or framework for programming or developing high-level applications, hiding the cloud infrastructure and data storage
   Users: Data mining application developers, data scientists
•Data analytics infrastructure as a service
   Features: A set of virtualized resources provided to a programmer or data mining researcher for developing, configuring, and running data analysis frameworks or applications
   Users: Data mining programmers, data management developers, data mining researchers
Plan of Action
•August - October 2014: Literature survey (done).
•November 2014: Problem definition formulated (done); problem-solving outline yet to be done.
•December 2014 - January 2015: Find an appropriate solution to the problem (yet to be formulated).
•February - May 2015: Final implementation of the solution, with results (yet to be done).
Conclusion
Large-scale data mining has become a major challenge in recent years.
The MapReduce framework makes big data analytics at this scale feasible.
K-means is one of the best-known clustering algorithms, but its
processing performance usually hits a bottleneck when it is applied to
massive data. A parallel K-means algorithm implemented with MapReduce
shows a clear advantage in handling massive data.
References
[1] Walisa Romsaiyud, Wichian Premchaiswadi, "An Adaptive Machine Learning on Map-Reduce Framework for Improving Performance of Large-Scale Data Analysis on EC", Eleventh IEEE Int'l Conf. on ICT and Knowledge Engineering, 2014
[2] Domenico Talia, "Clouds for Scalable Big Data Analytics", IEEE Computer Society, 2013
[3] Feng Ye, Zhijan Wang, "Cloud-based Big Data Mining & Analyzing Services Platform integrating R", IEEE International Conference on Advance Cloud and Big Data, 2013
[4] "Apache Hadoop", http://hadoop.apache.org/#What+Is+Apache+Hadoop%3F
MACHINE LEARNING ON MAPREDUCE FRAMEWORK
