Application of Clustering in Data Science using Real-life Examples
www.edureka.in/data-science
Data Science Webinar Series:
Applications of Clustering in Real Life
View Data Science Courses at: www.edureka.in/data_science
Twitter @edurekaIN, Facebook /edurekaIN, use #AskEdureka for Questions
Slide 2
Meet Your Instructor
Mr. Kumaran Ponnambalam
• Director, Data Engineering & PS, Transera Inc., San Francisco Bay Area
Slide 4
Objectives
At the end of this session, you will be able to:
• Understand Data Science applications and prospects
• Get an overview of Machine Learning
• Understand the difference between Supervised and Unsupervised Learning
• Learn Clustering and K-means Clustering
• Implement K-means clustering in R
Slide 5
Data Science Applications: Wine Recommendation
Slide 6
Data Science Applications: Pizza Hut
Slide 7
Data Science Applications: Netflix
Slide 8
Data Science Applications: Summarize News
Slide 9
How about this?
Slide 10
What’s Common in These Applications?
According to Wikipedia, data science is the study of the generalizable extraction of knowledge from data, yet the key word is science.
These scenarios involve:
• Storing, organizing and integrating huge amounts of unstructured data
• Processing and analyzing the data
• Extracting knowledge and insights from the data and predicting the future
Big data is commonly stored in Hadoop. For more details on Hadoop, please refer to the Big Data and Hadoop blog: http://www.edureka.in/blog/category/big-data-and-hadoop/
Processing, analyzing, and extracting knowledge and insights are done through machine learning.
Slide 11
Data Science: Demand-Supply Gap
Filled vs. unfilled jobs in big data (%)
Role | Filled (%) | Unfilled (%)
Big Data Analyst | 50 | 50
Big Data Architect | 43 | 57
Big Data Engineer | 44 | 56
Big Data Research Analyst | 31 | 69
Big Data Visualizer | 23 | 77
Data Scientist | 18 | 82
Gartner Says Big Data Creates Big Jobs: 4.4 Million IT Jobs Globally to Support Big Data By 2015 (http://www.gartner.com/newsroom/id/2207915)
Slide 12
Data Science: Job Trends
Slide 13
Machine Learning Categories
Types of Learning
• Supervised Learning: inferring a function from labelled training data.
• Unsupervised Learning: trying to find hidden structure in unlabelled data.
Slide 14
Machine Learning Categories
What category do the applications below fall into?
[Figure: four example applications, labelled Supervised Learning, Supervised Learning, Unsupervised Learning and Unsupervised Learning]
Slide 15
Common Machine Learning Algorithms
Supervised Learning algorithms:
• Naïve Bayes
• Support Vector Machines
• Random Forests
• Decision Trees
Unsupervised Learning algorithms:
• K-means
• Fuzzy Clustering
• Hierarchical Clustering
Slide 16
Clustering
Slide 17
Clustering: Scenarios
The following scenarios can be solved with clustering:
• A telephone company needs to build its network by placing towers in a region it has acquired. A clustering algorithm can find tower locations such that all its users receive optimum signal strength.
• The Miami DEA wants to make its law enforcement more stringent and has therefore decided to station patrol vans across the area so that areas with high crime rates are close to a patrol van.
• A hospital chain wants to open a series of emergency-care wards, placing them close to the most accident-prone areas in a region.
Slide 18
Some More Use Cases of Clustering
• Organizing data into clusters reveals the internal structure of the data
  Ex. Clusty and clustering genes
• Sometimes the partitioning itself is the goal
  Ex. Market segmentation
• Preparing data for other AI techniques
  Ex. Summarizing news (cluster, then find the centroid)
• Discovery in data
  Ex. Underlying rules, recurring patterns, topics, etc.
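The news-summarization use case clusters articles and then looks at the centroid. Below is a minimal Python sketch of that last step, assuming each article in a cluster has already been turned into a numeric vector (for example with TF-IDF): compute the cluster centroid and pick the article closest to it as the cluster's representative. The function names and sample vectors are hypothetical, chosen only for illustration; `math.dist` needs Python 3.8+.

```python
import math

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def representative(vectors):
    """Index of the vector closest (Euclidean distance) to the cluster centroid."""
    c = centroid(vectors)
    return min(range(len(vectors)), key=lambda i: math.dist(vectors[i], c))

# Hypothetical 2-D vectors for four articles that ended up in the same cluster
cluster = [[1.0, 0.0], [0.9, 0.2], [0.8, 0.1], [0.2, 0.9]]
print(representative(cluster))  # → 1
```

Returning the nearest real article rather than the centroid itself matters here: the centroid is an average vector and does not correspond to any actual article.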
Slide 19
What is Clustering?
Organizing data into clusters such that there is:
• High intra-cluster similarity
• Low inter-cluster similarity
Informally, clustering means finding natural groupings among objects.
http://en.wikipedia.org/wiki/Cluster_analysis
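One way to check the two criteria on this slide is to treat distance as dissimilarity: a good clustering has a small average distance between points in the same cluster and a large average distance between points in different clusters. The Python sketch below computes both averages for clusters given as lists of coordinate tuples. The helper names and sample points are illustrative assumptions, not part of the slides; `math.dist` needs Python 3.8+.

```python
import math
from itertools import combinations

def mean_intra_distance(clusters):
    """Average distance between pairs of points in the same cluster.

    Assumes at least one cluster has two or more points.
    """
    dists = [math.dist(a, b) for c in clusters for a, b in combinations(c, 2)]
    return sum(dists) / len(dists)

def mean_inter_distance(clusters):
    """Average distance between pairs of points taken from different clusters."""
    dists = [math.dist(a, b)
             for c1, c2 in combinations(clusters, 2)
             for a in c1 for b in c2]
    return sum(dists) / len(dists)

# Two hypothetical, well-separated groups of points
clusters = [[(0, 0), (0, 1), (1, 0)], [(10, 10), (10, 11), (11, 10)]]
print(mean_intra_distance(clusters) < mean_inter_distance(clusters))  # → True
```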
Slide 20
K-Means Clustering
Slide 21
K-Means Clustering
K-means clustering is a process by which objects are classified into a number of groups (k) so that objects in different groups are as dissimilar as possible, while objects within the same group are as similar as possible.
• The objects in group 1 should be as similar to each other as possible.
• There should be a large difference between an object in group 1 and an object in group 2.
• The attributes of the objects determine which objects are grouped together.
[Figure: total population divided into Group 1, Group 2, Group 3 and Group 4]
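The slide states the goal informally. Standard K-means makes it precise: for a chosen number of clusters k, it looks for clusters C_1, ..., C_k that minimize the within-cluster sum of squared Euclidean distances J, where μ_j is the mean of the points in cluster C_j. This J is the objective function value (distortion) that appears on the elbow curve later in the deck. The symbols below are introduced here for illustration.

```latex
J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{\lvert C_j \rvert} \sum_{x_i \in C_j} x_i
```

Each round of the Pizza Hut walkthrough can only lower J or leave it unchanged: reassigning every point to its nearest centre cannot increase any point's squared distance, and moving each centre to the mean of its points minimizes that cluster's sum of squares.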
Slide 22
K-Means: Pizza Hut Clustering Example
Slide 23
K-Means: Pizza Hut Clustering Example
Suppose the following points are pizza delivery locations.
Slide 24
K-Means: Pizza Hut Clustering Example
Let’s place three cluster centres at random.
[Figure: delivery locations with the initial cluster centres C1, C2 and C3]
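The slide places the three centres at random. A minimal Python sketch of one common way to do this, assuming the delivery locations are (x, y) tuples: pick k distinct data points as the starting centres (often called Forgy initialization). The function name, the `seed` parameter (added so runs are repeatable) and the coordinates are hypothetical, not taken from the slides.

```python
import random

def init_centres(points, k, seed=None):
    """Choose k distinct data points at random to serve as starting centres."""
    rng = random.Random(seed)  # a seeded generator makes runs repeatable
    return [tuple(p) for p in rng.sample(points, k)]

# Hypothetical (x, y) delivery locations
deliveries = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8), (1, 8), (2, 8), (1, 9)]
print(init_centres(deliveries, 3, seed=42))
```

Purely random starts can lead to a poor final clustering; smarter seeding schemes such as k-means++ exist, but this sketch keeps the slide's "random" choice.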
Slide 25
K-Means: Pizza Hut Clustering Example
Calculate the distance from each point to each cluster centre.
[Figure: distances from the delivery locations to C1, C2 and C3]
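The slides do not say which distance is used; standard K-means uses straight-line (Euclidean) distance, which this Python sketch assumes. It builds a table with one row per point and one column per centre. The function names are illustrative; for real delivery routes, road distance could of course differ from straight-line distance.

```python
import math

def euclidean(p, q):
    """Straight-line distance between two points with the same number of coordinates."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def distance_table(points, centres):
    """Row i holds the distances from points[i] to every centre."""
    return [[euclidean(p, c) for c in centres] for p in points]

print(euclidean((0, 0), (3, 4)))  # → 5.0
print(distance_table([(0, 0), (6, 8)], [(0, 0), (3, 4)]))  # → [[0.0, 5.0], [10.0, 5.0]]
```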
Slide 26
K-Means: Pizza Hut Clustering Example
Assign each point to its nearest cluster centre, based on the distance between the point and each centre.
[Figure: points grouped around C1, C2 and C3]
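The assignment step on this slide can be sketched in a few lines of Python, assuming points and centres are coordinate tuples and Euclidean distance (`math.dist`, Python 3.8+). The function name `assign` and the sample data are hypothetical. Ties are broken in favour of the lower-numbered centre, because `min` returns the first smallest item it finds.

```python
import math

def assign(points, centres):
    """For each point, return the index of its nearest centre (ties go to the lower index)."""
    return [min(range(len(centres)), key=lambda j: math.dist(p, centres[j]))
            for p in points]

points = [(0, 0), (1, 0), (9, 9), (10, 10)]
centres = [(0, 0), (10, 10)]
print(assign(points, centres))  # → [0, 0, 1, 1]
```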
Slide 27
K-Means: Pizza Hut Clustering Example
Reposition each cluster centre at the mean of the points assigned to it, then find the points nearest to each new centre.
[Figure: updated centres C1, C2 and C3]
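In standard K-means, "re-assigning" a centre means moving it to the centroid (coordinate-wise mean) of the points currently assigned to it. A minimal Python sketch, assuming `labels[i]` holds the cluster index of `points[i]`; the function name is hypothetical, and keeping the old centre for a cluster that ends up empty is one common convention, not something the slides specify.

```python
def update_centres(points, labels, old_centres):
    """Move each centre to the mean of its assigned points.

    A centre with no assigned points keeps its previous position.
    """
    k = len(old_centres)
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, j in zip(points, labels):
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return [tuple(s / counts[j] for s in sums[j]) if counts[j] else tuple(old_centres[j])
            for j in range(k)]

points = [(0, 0), (2, 0), (10, 10), (10, 12)]
print(update_centres(points, [0, 0, 1, 1], [(0, 0), (9, 9), (5, 5)]))
# → [(1.0, 0.0), (10.0, 11.0), (5, 5)]
```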
Slide 28
K-Means: Pizza Hut Clustering Example
Repeat: reposition the cluster centres, recalculate the distances and reassign each point to its nearest centre.
[Figure: updated centres C1, C2 and C3]
Slide 29
K-Means: Pizza Hut Clustering Example
Form the three clusters once the assignments no longer change.
[Figure: final clusters around C1, C2 and C3]
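Slides 24 to 29 repeat the assign and update steps until the clusters settle. Below is a minimal Python sketch of the whole loop (Lloyd-style K-means), assuming points are coordinate tuples and Euclidean distance (`math.dist`, Python 3.8+). The `max_iter` cap, the `seed` parameter and the sample points are illustrative assumptions. R's `kmeans()`, used in the demo, defaults to the Hartigan-Wong algorithm, so its results can differ in detail from this sketch.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=None):
    """Minimal Lloyd-style K-means on a list of coordinate tuples.

    Returns (centres, labels), where labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    # Initialise: k distinct data points chosen at random.
    centres = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centre.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centres[j]))
                      for p in points]
        if new_labels == labels:
            break  # no point changed cluster, so the clusters are final
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centre
                centres[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centres, labels

# Hypothetical delivery locations forming three loose groups
deliveries = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8), (1, 8), (2, 8), (1, 9)]
centres, labels = kmeans(deliveries, 3, seed=1)
print(centres)
print(labels)
```

Because the starting centres are random, different seeds can converge to different clusterings. What the loop does guarantee on convergence is that every point is assigned to its nearest final centre.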
Slide 30
The Elbow Curve
[Figure: The Elbow Curve, objective function value (i.e., distortion) plotted against the number of clusters k]
Elbow method: the best achievable distortion never increases as k grows and reaches zero when every point is its own cluster, so it does not level off to a constant. Instead, choose k at the "elbow" of the curve: the point beyond which increasing k reduces the distortion only marginally. This is the ideal value of k for the clusters created.
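The elbow curve can be produced by running K-means for k = 1, 2, 3, ... and recording the distortion (the within-cluster sum of squared distances) each time. The Python sketch below repeats a compact Lloyd-style K-means so it runs on its own. The function names, the single random start per k (`seed=0`) and the four sample points are assumptions for illustration only.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Compact Lloyd-style K-means; returns (centres, labels)."""
    centres = [tuple(p) for p in random.Random(seed).sample(points, k)]
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda j: math.dist(p, centres[j])) for p in points]
        new_centres = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centres.append(
                tuple(sum(c) / len(members) for c in zip(*members)) if members else centres[j])
        if new_centres == centres:
            break  # centres stopped moving
        centres = new_centres
    return centres, labels

def distortion(points, centres, labels):
    """Sum of squared distances from each point to its own centre."""
    return sum(math.dist(p, centres[lab]) ** 2 for p, lab in zip(points, labels))

def elbow_curve(points, max_k):
    """Map each k in 1..max_k to the distortion of one K-means run with k clusters."""
    curve = {}
    for k in range(1, max_k + 1):
        centres, labels = kmeans(points, k)
        curve[k] = distortion(points, centres, labels)
    return curve

points = [(0, 0), (0, 2), (10, 0), (10, 2)]
for k, d in elbow_curve(points, 4).items():
    print(k, round(d, 2))
```

With these four points, k = 1 always gives a distortion of 104 and k = 4 gives 0. For k = 2, the result depends on the starting centres: starting from (0, 0) and (10, 2) reaches the natural left/right split with distortion 4, while starting from (0, 0) and (0, 2) gets stuck on a top/bottom split with distortion 100. This is why practical tools run several random starts per k; R's `kmeans()` exposes this through its `nstart` argument.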
Slide 31
Let’s Look at Another Scenario
Now let us consider another clustering scenario: the data from “Google page rank”.
Notice that the data given here are sentences, not vectors. Can we apply K-means clustering to it?
To analyze this type of data, we use the TF-IDF (term frequency-inverse document frequency) algorithm, which converts the sentences into vectors.
We will take a deep dive into TF-IDF in Module 3 of this course.
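To make the idea concrete, here is a minimal Python sketch that turns sentences into TF-IDF vectors over a shared vocabulary. It assumes naive whitespace tokenization, non-empty sentences, term frequency = word count / sentence length, and idf = log(N / df), where N is the number of sentences and df the number of sentences containing the word. TF-IDF has several weighting variants, and this is not necessarily the variant covered in Module 3; the function name and example sentences are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(sentences):
    """Return (vocabulary, vectors): one TF-IDF vector per sentence."""
    docs = [s.lower().split() for s in sentences]  # naive whitespace tokenisation
    vocab = sorted({word for doc in docs for word in doc})
    n_docs = len(docs)
    # Document frequency: in how many sentences does each word appear?
    df = Counter(word for doc in docs for word in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([(counts[w] / len(doc)) * math.log(n_docs / df[w]) for w in vocab])
    return vocab, vectors

vocab, vectors = tfidf_vectors(["the cat sat", "the dog sat", "the cat ran"])
print(vocab)  # → ['cat', 'dog', 'ran', 'sat', 'the']
```

A word such as "the" that appears in every sentence gets weight 0, while rarer words like "dog" get positive weight, so the vectors emphasize what distinguishes one sentence from another. The resulting vectors can then be clustered with K-means like any other numeric points.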
Slide 32
Demo
More information on R setup and applications:
http://www.edureka.in/blog/category/business-analytics-with-r/
Slide 33
Course Topics
• Module 1
» Introduction to Data Science
• Module 2
» Basic Data Manipulation using R
• Module 3
» Machine Learning Techniques using R Part 1
- Clustering
- TF-IDF and Cosine Similarity
- Association Rule Mining
• Module 4
» Machine Learning Techniques using R Part 2
- Supervised and Unsupervised Learning
- Decision Tree Classifier
• Module 5
» Machine Learning Techniques using R Part 3
- Random Forest Classifier
- Naïve Bayes Classifier
• Module 6
» Introduction to Hadoop Architecture
• Module 7
» Integrating R with Hadoop
• Module 8
» Mahout Introduction and Algorithm Implementation
• Module 9
» Additional Mahout Algorithms and Parallel Processing in R
• Module 10
» Project
Slide 34
Questions?
Enroll for the complete course at: www.edureka.in/data_science
Please don’t forget to fill in the survey.
The class recording and presentation will be available within 24 hours at:
http://www.edureka.in/blog/application-of-clustering-in-data-science-using-real-life-examples/

Weitere ähnliche Inhalte

Was ist angesagt?

Chapter 8 distributed file systems
Chapter 8 distributed file systemsChapter 8 distributed file systems
Chapter 8 distributed file systems
AbDul ThaYyal
 
Introduction on Data Science
Introduction on Data ScienceIntroduction on Data Science
Introduction on Data Science
Edureka!
 

Was ist angesagt? (20)

2.3 bayesian classification
2.3 bayesian classification2.3 bayesian classification
2.3 bayesian classification
 
Cure, Clustering Algorithm
Cure, Clustering AlgorithmCure, Clustering Algorithm
Cure, Clustering Algorithm
 
Data Mining: Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
Data Mining:  Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...Data Mining:  Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
Data Mining: Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
 
Classification in data mining
Classification in data mining Classification in data mining
Classification in data mining
 
Hierarchical clustering
Hierarchical clustering Hierarchical clustering
Hierarchical clustering
 
3.5 model based clustering
3.5 model based clustering3.5 model based clustering
3.5 model based clustering
 
Terminologies Used In Big data Environments,G.Sumithra,II-M.sc(computer scien...
Terminologies Used In Big data Environments,G.Sumithra,II-M.sc(computer scien...Terminologies Used In Big data Environments,G.Sumithra,II-M.sc(computer scien...
Terminologies Used In Big data Environments,G.Sumithra,II-M.sc(computer scien...
 
Distributed System ppt
Distributed System pptDistributed System ppt
Distributed System ppt
 
introduction to data mining tutorial
introduction to data mining tutorial introduction to data mining tutorial
introduction to data mining tutorial
 
Decision Tree Learning
Decision Tree LearningDecision Tree Learning
Decision Tree Learning
 
Outlier analysis,Chapter-12, Data Mining: Concepts and Techniques
Outlier analysis,Chapter-12, Data Mining: Concepts and TechniquesOutlier analysis,Chapter-12, Data Mining: Concepts and Techniques
Outlier analysis,Chapter-12, Data Mining: Concepts and Techniques
 
Chapter 8 distributed file systems
Chapter 8 distributed file systemsChapter 8 distributed file systems
Chapter 8 distributed file systems
 
K mean-clustering algorithm
K mean-clustering algorithmK mean-clustering algorithm
K mean-clustering algorithm
 
Chapter - 5 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 5 Data Mining Concepts and Techniques 2nd Ed slides Han & KamberChapter - 5 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 5 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
 
Process synchronization in Operating Systems
Process synchronization in Operating SystemsProcess synchronization in Operating Systems
Process synchronization in Operating Systems
 
Apriori Algorithm
Apriori AlgorithmApriori Algorithm
Apriori Algorithm
 
Classification and prediction
Classification and predictionClassification and prediction
Classification and prediction
 
Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality Reduction
 
Introduction on Data Science
Introduction on Data ScienceIntroduction on Data Science
Introduction on Data Science
 
I. AO* SEARCH ALGORITHM
I. AO* SEARCH ALGORITHMI. AO* SEARCH ALGORITHM
I. AO* SEARCH ALGORITHM
 

Ähnlich wie Application of Clustering in Data Science using Real-life Examples

"Introduction to R Programming and Machine Learning"
"Introduction to R Programming and Machine Learning""Introduction to R Programming and Machine Learning"
"Introduction to R Programming and Machine Learning"
Edureka!
 
Introduction to Apache Mahout
Introduction to Apache MahoutIntroduction to Apache Mahout
Introduction to Apache Mahout
Edureka!
 
Data Strategy Best Practices
Data Strategy Best PracticesData Strategy Best Practices
Data Strategy Best Practices
DATAVERSITY
 

Ähnlich wie Application of Clustering in Data Science using Real-life Examples (20)

Business Analytics Decision Tree in R
Business Analytics Decision Tree in RBusiness Analytics Decision Tree in R
Business Analytics Decision Tree in R
 
Data Science : Make Smarter Business Decisions
Data Science : Make Smarter Business DecisionsData Science : Make Smarter Business Decisions
Data Science : Make Smarter Business Decisions
 
Sentiment Analysis In Retail Domain
Sentiment Analysis In Retail DomainSentiment Analysis In Retail Domain
Sentiment Analysis In Retail Domain
 
Business Analytics with R
Business Analytics with RBusiness Analytics with R
Business Analytics with R
 
"Introduction to R Programming and Machine Learning"
"Introduction to R Programming and Machine Learning""Introduction to R Programming and Machine Learning"
"Introduction to R Programming and Machine Learning"
 
Webinar : Introduction to R Programming and Machine Learning
Webinar : Introduction to R Programming and Machine LearningWebinar : Introduction to R Programming and Machine Learning
Webinar : Introduction to R Programming and Machine Learning
 
K-Means Clustering Algorithm - Cluster Analysis | Machine Learning Algorithm ...
K-Means Clustering Algorithm - Cluster Analysis | Machine Learning Algorithm ...K-Means Clustering Algorithm - Cluster Analysis | Machine Learning Algorithm ...
K-Means Clustering Algorithm - Cluster Analysis | Machine Learning Algorithm ...
 
Introduction to Apache Mahout
Introduction to Apache MahoutIntroduction to Apache Mahout
Introduction to Apache Mahout
 
Is Data Scientist the Sexiest Job of the 21st century?
Is Data Scientist the Sexiest Job of the 21st century?Is Data Scientist the Sexiest Job of the 21st century?
Is Data Scientist the Sexiest Job of the 21st century?
 
Logistic Regression In Data Science
Logistic Regression In Data ScienceLogistic Regression In Data Science
Logistic Regression In Data Science
 
Data Strategy Best Practices
Data Strategy Best PracticesData Strategy Best Practices
Data Strategy Best Practices
 
Linear Regression With R
Linear Regression With RLinear Regression With R
Linear Regression With R
 
How it works- Data Science
How it works- Data ScienceHow it works- Data Science
How it works- Data Science
 
Data Science Tutorial | What is Data Science? | Data Science For Beginners | ...
Data Science Tutorial | What is Data Science? | Data Science For Beginners | ...Data Science Tutorial | What is Data Science? | Data Science For Beginners | ...
Data Science Tutorial | What is Data Science? | Data Science For Beginners | ...
 
Ds webinar-30july
Ds webinar-30julyDs webinar-30july
Ds webinar-30july
 
BigMLSchool: Customer Segmentation
BigMLSchool: Customer SegmentationBigMLSchool: Customer Segmentation
BigMLSchool: Customer Segmentation
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Data Science Demystified
Data Science DemystifiedData Science Demystified
Data Science Demystified
 
Applied AI Workshop - Presentation - Connect Day GDL
Applied AI Workshop - Presentation - Connect Day GDLApplied AI Workshop - Presentation - Connect Day GDL
Applied AI Workshop - Presentation - Connect Day GDL
 
Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...
Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...
Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...
 

Mehr von Edureka!

Mehr von Edureka! (20)

What to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | EdurekaWhat to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | Edureka
 
Top 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | EdurekaTop 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | Edureka
 
Top 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | EdurekaTop 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | Edureka
 
Tableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | EdurekaTableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | Edureka
 
Python Programming Tutorial | Edureka
Python Programming Tutorial | EdurekaPython Programming Tutorial | Edureka
Python Programming Tutorial | Edureka
 
Top 5 PMP Certifications | Edureka
Top 5 PMP Certifications | EdurekaTop 5 PMP Certifications | Edureka
Top 5 PMP Certifications | Edureka
 
Top Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | EdurekaTop Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | Edureka
 
Linux Mint Tutorial | Edureka
Linux Mint Tutorial | EdurekaLinux Mint Tutorial | Edureka
Linux Mint Tutorial | Edureka
 
How to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| EdurekaHow to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| Edureka
 
Importance of Digital Marketing | Edureka
Importance of Digital Marketing | EdurekaImportance of Digital Marketing | Edureka
Importance of Digital Marketing | Edureka
 
RPA in 2020 | Edureka
RPA in 2020 | EdurekaRPA in 2020 | Edureka
RPA in 2020 | Edureka
 
Email Notifications in Jenkins | Edureka
Email Notifications in Jenkins | EdurekaEmail Notifications in Jenkins | Edureka
Email Notifications in Jenkins | Edureka
 
EA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | EdurekaEA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | Edureka
 
Cognitive AI Tutorial | Edureka
Cognitive AI Tutorial | EdurekaCognitive AI Tutorial | Edureka
Cognitive AI Tutorial | Edureka
 
AWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | EdurekaAWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | Edureka
 
Blue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | EdurekaBlue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | Edureka
 
Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka
 
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | EdurekaA star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
 
Kubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | EdurekaKubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | Edureka
 
Introduction to DevOps | Edureka
Introduction to DevOps | EdurekaIntroduction to DevOps | Edureka
Introduction to DevOps | Edureka
 

Kürzlich hochgeladen

Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
heathfieldcps1
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please Practise
AnaAcapella
 

Kürzlich hochgeladen (20)

General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please Practise
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 

Application of Clustering in Data Science using Real-life Examples

  • 1. www.edureka.in/data-science Data Science Webinar Series: Applications of Clustering in Real Life View Data Science Courses at : www.edureka.in/data_science * Twitter @edurekaIN, Facebook /edurekaIN, use #AskEdureka for Questions View Data Science Courses at : www.edureka.in/data_science *
  • 2. www.edureka.in/data-scienceSlide 2 Meet Your Instructor Twitter @edurekaIN, Facebook /edurekaIN, use #AskEdureka for Questions Mr. Kumaran Ponnambalam • Director, Data Engineering & PS, Transera Inc, San Francisco Bay Area
  • 3. www.edureka.in/data-scienceSlide 3 Meet Your Instructor  Understand Data Science Applications and Prospects  Get an overview of Machine Learning  Understand the difference between Supervised and Unsupervised Learning  Learn Clustering and K-means Clustering  Implement K-means clustering in R At the end of this session, you will be able to Twitter @edurekaIN, Facebook /edurekaIN, use #AskEdureka for Questions
  • 4. www.edureka.in/data-scienceSlide 4 Objectives  Understand Data Science Applications and Prospects  Get an overview of Machine Learning  Understand the difference between Supervised and Unsupervised Learning  Learn Clustering and K-means Clustering  Implement K-means clustering in R At the end of this session, you will be able to Twitter @edurekaIN, Facebook /edurekaIN, use #AskEdureka for Questions
  • 5. www.edureka.in/data-scienceSlide 5 Data Science Applications: Wine Recommendation Twitter @edurekaIN, Facebook /edurekaIN, use #AskEdureka for Questions
  • 6. www.edureka.in/data-scienceSlide 6 Data Science Applications: Pizza Hut Twitter @edurekaIN, Facebook /edurekaIN, use #AskEdureka for Questions
  • 7. www.edureka.in/data-scienceSlide 7 Data Science Applications: NetFlix Twitter @edurekaIN, Facebook /edurekaIN, use #AskEdureka for Questions
  • 8. www.edureka.in/data-scienceSlide 8 Data Science Applications: Summarize News
  • 9. www.edureka.in/data-scienceSlide 9 How about this? Twitter @edurekaIN, Facebook /edurekaIN, use #AskEdureka for Questions
  • 10. www.edureka.in/data-scienceSlide 10 What’s Common in these Applications? According to Wikipedia: Data science is the study of the generalizable extraction of knowledge from data, yet the key word is science. These scenarios involve:  Storing, organizing and integrating huge amount of unstructured data  Processing and analyzing the data  Extracting knowledge, insights and predict future from the data Storage of big data is done in Hadoop. For more details on Hadoop please refer Big data and Hadoop blog http://www.edureka.in/blog/category/big-data-and-hadoop/ Processing, Analyzing, extracting knowledge and insights are done through Machine Learning Twitter @edurekaIN, Facebook /edurekaIN, use #AskEdureka for Questions
  • 11. Slide 11 www.edureka.in/data-science Data Science: Demand Supply Gap Big Data Analyst Big Data Architect Big Data Engineer Big Data Research Analyst Big Data Visualizer Data Scientist 50 43 44 31 23 18 50 57 56 69 77 82 Filled job vs unfilled jobs in big data Filled Unfilled Vacancy/Filled(%) Gartner Says Big Data Creates Big Jobs: 4.4 Million IT Jobs Globally to Support Big Data By 2015http://www.gartner.com/newsroom/id/2207915
  • 13. www.edureka.in/data-scienceSlide 13 Machine Learning Categories Types of Learning Supervised Learning Unsupervised Learning Inferring a function from labelled training data. Trying to find hidden structure in unlabelled data. Twitter @edurekaIN, Facebook /edurekaIN, use #AskEdureka for Questions
  • 14. www.edureka.in/data-scienceSlide 14 Machine Learning Categories What category do the applications below fall into? Supervised Learning Supervised Learning Unsupervised Learning Unsupervised Learning Twitter @edurekaIN, Facebook /edurekaIN, use #AskEdureka for Questions
  • 15. www.edureka.in/data-scienceSlide 15 Common Machine Learning Algorithms Types of Learning Supervised Learning Unsupervised Learning Algorithms  Naïve Bayes  Support Vector Machines  Random Forests  Decision Trees Algorithms  K-means  Fuzzy Clustering  Hierarchical Clustering Twitter @edurekaIN, Facebook /edurekaIN, use #AskEdureka for Questions
  • 16. www.edureka.in/data-scienceSlide 16 Clustering Twitter @edurekaIN, Facebook /edurekaIN, use #AskEdureka for Questions
  • 17. www.edureka.in/data-scienceSlide 17 Clustering: Scenarios The following scenarios implement Clustering:  A telephone company needs to establish its network by putting its towers in a particular region it has acquired. The location of putting these towers can be found by clustering algorithm so that all its users receive optimum signal strength.  The Miami DEA wants to make its law enforcement more stringent and hence have decided to make their patrol vans stationed across the area so that the areas of high crime rates are in vicinity to the patrol vans.  A Hospital Care chain wants to open a series of Emergency-Care wards, keeping in mind the factor of maximum accident prone areas in a region. Twitter @edurekaIN, Facebook /edurekaIN, use #AskEdureka for Questions
  • 18. www.edureka.in/data-scienceSlide 18 Some More Use-Cases of Clustering Slide 18  Organizing data into clusters shows internal structure of the data Ex. Clusty and clustering genes  Sometimes the partitioning is the goal Ex. Market segmentation  Prepare for other AI techniques Ex. Summarize news (cluster and then find centroid)  Discovery in data Ex. Underlying rules, reoccurring patterns, topics, etc.
  • 19. What is Clustering? Organizing data into clusters such that there is:
    - high intra-cluster similarity
    - low inter-cluster similarity
  Informally, finding natural groupings among objects. http://en.wikipedia.org/wiki/Cluster_analysis
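To make "high intra-cluster similarity, low inter-cluster similarity" concrete, here is a minimal Python sketch that assumes Euclidean distance as the dissimilarity measure: a good grouping has small average distances between points in the same cluster and larger average distances between points in different clusters. The points and the helper names `mean_intra_distance` and `mean_inter_distance` are hypothetical, chosen only for illustration.

```python
import math
from itertools import combinations

def mean_intra_distance(clusters):
    """Average Euclidean distance between pairs of points in the same cluster."""
    dists = [math.dist(p, q)
             for cluster in clusters
             for p, q in combinations(cluster, 2)]
    return sum(dists) / len(dists)

def mean_inter_distance(clusters):
    """Average Euclidean distance between pairs of points in different clusters."""
    dists = [math.dist(p, q)
             for a, b in combinations(clusters, 2)
             for p in a
             for q in b]
    return sum(dists) / len(dists)

# Hypothetical 2-D points already split into two groups.
clusters = [
    [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5)],
    [(8.0, 8.0), (8.5, 9.0), (9.0, 8.5)],
]

intra = mean_intra_distance(clusters)
inter = mean_inter_distance(clusters)
print(f"intra = {intra:.2f}, inter = {inter:.2f}")
```

Small distance means high similarity, so a good clustering shows `intra` well below `inter`.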
  • 20. K-Means Clustering
  • 21. K-Means Clustering. The process by which objects are classified into a number of groups so that they are as dissimilar as possible from one group to another, but as similar as possible within each group. The objects in group 1 should be as similar to each other as possible, while an object in group 1 should differ considerably from an object in group 2. The attributes of the objects determine which objects are grouped together. [Figure: the total population divided into Group 1, Group 2, Group 3 and Group 4.]
  • 22. K-Means: Pizza Hut Clustering Example
  • 23. Let us suppose the following points are pizza delivery locations.
  • 24. Let's place three cluster centres (C1, C2, C3) at random.
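The slide places the three centres "randomly" without saying how. One common choice, assumed here, is to pick k of the data points at random as the starting centres (Forgy initialization). A minimal Python sketch; the delivery locations and the `init_centres` helper are hypothetical, since the slide's actual points are not recoverable.

```python
import random

def init_centres(points, k, seed=None):
    """Pick k distinct points at random to serve as the starting centres."""
    rng = random.Random(seed)  # seeded generator so the run is reproducible
    return rng.sample(points, k)

# Hypothetical delivery locations (x, y); the slide's actual points are not given.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8), (1, 8), (1, 9), (2, 8)]
centres = init_centres(points, k=3, seed=42)
print("initial centres:", centres)
```

Starting from existing points keeps every centre inside the data; other schemes, such as k-means++, spread the initial centres out to reduce the chance of a poor start.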
  • 25. Find the distance from each point to each of the centres C1, C2 and C3, as shown.
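The distance step can be sketched as a table with one row per point and one column per centre. This minimal Python version assumes straight-line (Euclidean) distance, which the slide does not name explicitly; the points, centres and the `distance_table` helper are hypothetical.

```python
import math

def distance_table(points, centres):
    """Euclidean distance from every point (rows) to every centre (columns)."""
    return [[math.dist(p, c) for c in centres] for p in points]

# Hypothetical points and centres, for illustration only.
points = [(1, 2), (8, 8), (1, 9)]
centres = [(0, 0), (10, 10), (0, 10)]
for p, row in zip(points, distance_table(points, centres)):
    print(p, [round(d, 2) for d in row])
```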
  • 26. Assign each point to its nearest cluster centre (C1, C2 or C3), based on the distance between each centre and the point.
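The assignment step picks, for every point, the centre with the smallest distance. A minimal Python sketch, again assuming Euclidean distance; the data and the `assign` helper are hypothetical.

```python
import math

def assign(points, centres):
    """Return, for each point, the index of its nearest centre (Euclidean distance)."""
    return [
        min(range(len(centres)), key=lambda i: math.dist(p, centres[i]))
        for p in points
    ]

# Hypothetical data: centre 0 = (0, 0), centre 1 = (10, 10), centre 2 = (0, 10).
points = [(1, 2), (2, 1), (8, 8), (9, 8), (1, 9)]
centres = [(0, 0), (10, 10), (0, 10)]
labels = assign(points, centres)
print(labels)  # [0, 0, 1, 1, 2]
```

If a point is equally far from two centres, `min` keeps the lower index; any consistent tie-break works.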
  • 27. Recompute each cluster centre (C1, C2, C3) as the mean of the points assigned to it, and find the points nearest to the new centres.
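In standard K-means, "re-assigning" a centre means moving it to the mean (centroid) of the points currently assigned to it. A minimal Python sketch of that update step; the data and the `update_centres` helper are hypothetical, and keeping the old centre when a cluster ends up empty is one common convention, assumed here.

```python
def update_centres(points, labels, old_centres):
    """Move each centre to the mean (centroid) of the points assigned to it."""
    new_centres = []
    for i, old in enumerate(old_centres):
        members = [p for p, label in zip(points, labels) if label == i]
        if members:
            # Average each coordinate over the cluster's members.
            new_centres.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
        else:
            # No points assigned: keep the previous centre.
            new_centres.append(old)
    return new_centres

# Hypothetical assignment, for illustration only.
points = [(1, 2), (2, 1), (8, 8), (9, 8), (1, 9)]
labels = [0, 0, 1, 1, 2]
old_centres = [(0, 0), (10, 10), (0, 10)]
print(update_centres(points, labels, old_centres))  # [(1.5, 1.5), (8.5, 8.0), (1.0, 9.0)]
```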
  • 28. Again recompute the cluster centres, calculate the distances and reassign each point to its nearest centre; repeat until the assignments no longer change.
  • 29. Form the three clusters (C1, C2, C3).
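Putting the Pizza Hut steps together gives the usual alternating loop (often called Lloyd's algorithm): pick random centres, assign points, move centres to the means, and stop once the centres stop moving. This is a minimal Python sketch under the same assumptions as the individual steps (Euclidean distance, random data points as starting centres); the delivery locations and the names `nearest` and `kmeans` are hypothetical, and the webinar's own R demo is not reproduced here.

```python
import math
import random

def nearest(p, centres):
    """Index of the centre closest to point p."""
    return min(range(len(centres)), key=lambda i: math.dist(p, centres[i]))

def kmeans(points, k, max_iter=100, seed=0):
    """Alternate assign and update steps from a random start until centres stop moving."""
    centres = random.Random(seed).sample(points, k)
    for _ in range(max_iter):
        labels = [nearest(p, centres) for p in points]
        new_centres = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                new_centres.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centres.append(centres[i])  # empty cluster keeps its centre
        if new_centres == centres:  # centres stopped moving: converged
            break
        centres = new_centres
    labels = [nearest(p, centres) for p in points]  # labels for the final centres
    return centres, labels

# Hypothetical delivery locations.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8), (1, 8), (1, 9), (2, 8)]
centres, labels = kmeans(points, k=3)
for i, c in enumerate(centres):
    print(f"C{i + 1} at {c}: {[p for p, label in zip(points, labels) if label == i]}")
```

The result depends on the random start: an unlucky start can settle on a worse grouping, which is why implementations commonly run several restarts and keep the one with the lowest distortion.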
  • 30. The Elbow Curve. [Figure: objective function value, i.e. distortion, plotted against k.] Elbow method: choose k at the point beyond which increasing k further reduces the distortion only marginally, so the curve flattens out. This "elbow" is taken as the ideal value of k for the clusters created. (The distortion never becomes exactly constant: it keeps falling until it reaches zero when every point is its own cluster.)
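An elbow curve can be produced by running K-means for several values of k and recording the distortion, taken here as the sum of squared distances from each point to its nearest centre (an assumption; the slide does not define it precisely). This minimal Python sketch keeps the best of a few random restarts per k; the points and the `kmeans_distortion` helper are hypothetical. For three tight, well-separated groups like these, one would expect a steep drop up to k = 3 and a much flatter curve afterwards.

```python
import math
import random

def kmeans_distortion(points, k, n_init=10, max_iter=100, seed=0):
    """Lowest K-means distortion (sum of squared distances to the nearest centre)
    over n_init random restarts."""
    rng = random.Random(seed)
    best = math.inf
    for _ in range(n_init):
        centres = rng.sample(points, k)
        for _ in range(max_iter):
            labels = [min(range(k), key=lambda i: math.dist(p, centres[i])) for p in points]
            new_centres = []
            for i in range(k):
                members = [p for p, label in zip(points, labels) if label == i]
                new_centres.append(
                    tuple(sum(c) / len(members) for c in zip(*members)) if members else centres[i]
                )
            if new_centres == centres:
                break
            centres = new_centres
        distortion = sum(min(math.dist(p, c) ** 2 for c in centres) for p in points)
        best = min(best, distortion)
    return best

# Hypothetical data: three tight groups of three points.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8), (1, 8), (1, 9), (2, 8)]
distortions = [kmeans_distortion(points, k) for k in range(1, 7)]
for k, d in enumerate(distortions, start=1):
    print(f"k={k}: distortion={d:.2f}")
```

Plotting these values against k gives the elbow curve; the k at the bend is the candidate number of clusters.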
  • 31. Let's Look at Another Scenario. Now let us consider another clustering scenario: the data from “Google page rank”. Notice that the data given here are sentences, not vectors. Can we apply K-means clustering to it? To analyze this type of data we use TF-IDF, which converts the sentences into vectors. We will take a deep dive into TF-IDF in Module 3 of this course.
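The slide does not say which TF-IDF variant the course uses. A common form weights term t in document d as tf(t, d) · log(N / df(t)), where tf is the term's relative frequency in d, N is the number of documents and df(t) is the number of documents containing t. A minimal Python sketch of that form, assuming naive lower-case whitespace tokenization and using made-up sentences (the slide's “Google page rank” data is not reproduced); the `tf_idf` helper is hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Map each document to a TF-IDF vector over a shared, sorted vocabulary."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vocab = sorted(df)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([
            (counts[term] / len(tokens)) * math.log(n_docs / df[term])
            for term in vocab
        ])
    return vocab, vectors

# Made-up sentences standing in for the slide's data.
docs = [
    "page rank orders web pages",
    "web pages link to other pages",
    "k means groups similar points",
]
vocab, vectors = tf_idf(docs)
for doc, vec in zip(docs, vectors):
    print(doc, "->", [round(w, 3) for w in vec])
```

Each sentence becomes a fixed-length numeric vector, which is the kind of input the K-means steps above need. A term that appears in every document gets weight zero, since log(N / N) = 0.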
  • 32. Demo. More information on R setup and applications at: http://www.edureka.in/blog/category/business-analytics-with-r/
  • 33. Course Topics:
    - Module 1 » Introduction to Data Science
    - Module 2 » Basic Data Manipulation using R
    - Module 3 » Machine Learning Techniques using R, Part 1: Clustering; TF-IDF and Cosine Similarity; Association Rule Mining
    - Module 4 » Machine Learning Techniques using R, Part 2: Supervised and Unsupervised Learning; Decision Tree Classifier
    - Module 5 » Machine Learning Techniques using R, Part 3: Random Forest Classifier; Naïve Bayes Classifier
    - Module 6 » Introduction to Hadoop Architecture
    - Module 7 » Integrating R with Hadoop
    - Module 8 » Mahout Introduction and Algorithm Implementation
    - Module 9 » Additional Mahout Algorithms and Parallel Processing in R
    - Module 10 » Project
  • 34. Questions? Enroll for the complete course at: www.edureka.in/data_science. Please don't forget to fill in the survey. The class recording and presentation will be available within 24 hours at: http://www.edureka.in/blog/application-of-clustering-in-data-science-using-real-life-examples/

Editor's Notes

  1. Netflix uses 1 petabyte of storage for its streaming videos. BitTorrent Sync has transferred over 30 petabytes of data since its pre-alpha release in January 2013. The 2009 movie Avatar is reported to have taken over 1 petabyte of local storage at Weta Digital for rendering its 3D CGI effects. One petabyte of average MP3-encoded songs (for mobile, roughly one megabyte per minute) would take about 2000 years to play.
  2. News groups as clusters
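The first note's "2000 years" figure can be checked with back-of-the-envelope arithmetic. This sketch assumes decimal units (1 PB = 10^15 bytes, 1 MB = 10^6 bytes) and 365-day years; with binary units (2^50 and 2^20 bytes) the result is about 2,043 years instead. Either way it rounds to the note's "about 2000 years".

```python
PETABYTE = 10**15            # bytes, decimal (SI) petabyte
BYTES_PER_MINUTE = 10**6     # ~1 MB of MP3 audio per minute, as the note assumes
MINUTES_PER_YEAR = 60 * 24 * 365

minutes = PETABYTE / BYTES_PER_MINUTE
years = minutes / MINUTES_PER_YEAR
print(f"{minutes:,.0f} minutes of audio, about {years:,.0f} years of playback")
# prints: 1,000,000,000 minutes of audio, about 1,903 years of playback
```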