SlideShare a Scribd company logo
1 of 18
Introduction on Data Mining
What is Data Mining Non-trivial extraction of implicit, previously unknown and potentially useful information from data Exploration & analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns Data mining is the process of automatically discovering useful information in large data repositories 	--
Simple Examples for Data Mining ,[object Object]
Group together similar documents returned by search engine according to their context (e.g. Amazon rainforest, Amazon.com,),[object Object]
Origins of Data Mining Draws ideas from machine learning/AI, pattern recognition, statistics, and database systems Traditional Techniquesmay be unsuitable due to  Enormity of data High dimensionality of data Heterogeneous, distributed nature of data
Data Mining Tasks Prediction Methods Use some variables to predict unknown or future values of other variables Description Methods Find human-interpretable patterns that describe the data.
Data Mining Tasks Classification [Predictive] Clustering [Descriptive] Association Rule Discovery [Descriptive] Sequential Pattern Discovery [Descriptive] Regression [Predictive] Deviation Detection [Predictive]
Classification: Definition It is used for discrete target variables Ex: predicting whether a Web user will make a purchase at  an online store is an classification tasks because the target variabe is binary-valued.
Clustering: Definition -	Clustering  analysis  seeks to find groups of closely related observations that belong to the same cluster are more similar to each other than observations  that observations that belong s to other clusters.  Ex:           -to find areas of ocean that have aq significant impact on the earth’s climate.
Association Rule Discovery: Definition 	Given a set of records each of which contain some number of items from a given collection; Produce dependency rules which will predict occurrence of an item based on occurrences of other items.
Contd… Rules Discovered: {Milk} --> {Coke}     {Diaper, Milk} --> {Beer}
Sequential Pattern Discovery: Definition 	Given is a set of objects, with each object associated with its own timeline of events, find rules that predict strong sequential dependencies among different events. (A   B)     (C)  --->   (D   E)
Contd… 	Rules are formed by first disovering patterns. Event occurrences in the patterns are governed by timing constraints. (A   B)     (C)    (D   E) <= xg  >ng <= ws <= ms
Sequential Pattern Discovery: Example 	 In telecommunications alarm logs,  (Inverter_ProblemExcessive_Line_Current)          (Rectifier_Alarm) --> (Fire_Alarm)
Regression 	Predict a value of a given continuous valued variable based on the values of other variables, assuming a linear or nonlinear model of dependency. Greatly studied in statistics, neural network fields.
Regression-examples 	Predicting sales amounts of new product based on advertising expenditure. Predicting wind velocities as a function of temperature, humidity, air pressure, etc. Time series prediction of stock market indices.
Deviation/Anomaly Detection Detect significant deviations from normal behavior Applications: Credit Card Fraud Detection Network Intrusion Detection
Visit more self help tutorials Pick a tutorial of your choice and browse through it at your own pace. The tutorials section is free, self-guiding and will not involve any additional support. Visit us at www.dataminingtools.net

More Related Content

Viewers also liked

certificate_QA_Course
certificate_QA_Coursecertificate_QA_Course
certificate_QA_Course
Ivan Atanasov
 
Improving Artificial Intelligence by Studying the Brain
Improving Artificial Intelligence by Studying the BrainImproving Artificial Intelligence by Studying the Brain
Improving Artificial Intelligence by Studying the Brain
Tahoe Silicon Mountain
 

Viewers also liked (19)

The Girlfriends' Guide of Financial Savvy
The Girlfriends' Guide of Financial SavvyThe Girlfriends' Guide of Financial Savvy
The Girlfriends' Guide of Financial Savvy
 
List 44
List 44List 44
List 44
 
Bài giảng tmđt75
Bài giảng tmđt75Bài giảng tmđt75
Bài giảng tmđt75
 
Voys Samen - Assen Onderneemt
Voys Samen - Assen OnderneemtVoys Samen - Assen Onderneemt
Voys Samen - Assen Onderneemt
 
Data
DataData
Data
 
Nuove frontiere della bilateralità lombarda
Nuove frontiere della bilateralità lombardaNuove frontiere della bilateralità lombarda
Nuove frontiere della bilateralità lombarda
 
Apresentação i9life
Apresentação i9lifeApresentação i9life
Apresentação i9life
 
Best Designer Perfumes for Women
Best Designer Perfumes for WomenBest Designer Perfumes for Women
Best Designer Perfumes for Women
 
certificate_QA_Course
certificate_QA_Coursecertificate_QA_Course
certificate_QA_Course
 
PR3 Types and Styles of Music Based Programming
PR3 Types and Styles of Music Based ProgrammingPR3 Types and Styles of Music Based Programming
PR3 Types and Styles of Music Based Programming
 
San jose island panama 1945
San jose island panama 1945San jose island panama 1945
San jose island panama 1945
 
Data Mining Techniques In Computer Aided Cancer Diagnosis
Data Mining Techniques In Computer Aided Cancer DiagnosisData Mining Techniques In Computer Aided Cancer Diagnosis
Data Mining Techniques In Computer Aided Cancer Diagnosis
 
Jenifer Carter Resume
Jenifer Carter ResumeJenifer Carter Resume
Jenifer Carter Resume
 
Improving Artificial Intelligence by Studying the Brain
Improving Artificial Intelligence by Studying the BrainImproving Artificial Intelligence by Studying the Brain
Improving Artificial Intelligence by Studying the Brain
 
Role of media in crisis and Disaster
Role of media in crisis and Disaster Role of media in crisis and Disaster
Role of media in crisis and Disaster
 
E tips educational resources online
E tips educational resources onlineE tips educational resources online
E tips educational resources online
 
Managing your supervisor
Managing your supervisorManaging your supervisor
Managing your supervisor
 
Báo cáo bài tập lớn
Báo cáo bài tập lớnBáo cáo bài tập lớn
Báo cáo bài tập lớn
 
Staying Ahead of the Game - The Steps to Effective Crisis Communications Plan...
Staying Ahead of the Game - The Steps to Effective Crisis Communications Plan...Staying Ahead of the Game - The Steps to Effective Crisis Communications Plan...
Staying Ahead of the Game - The Steps to Effective Crisis Communications Plan...
 

Similar to Introduction to data mining

chapter1_Introduction.pdf data mining ppt
chapter1_Introduction.pdf data mining pptchapter1_Introduction.pdf data mining ppt
chapter1_Introduction.pdf data mining ppt
GyanaKarn
 
Business analytics and data mining
Business analytics and data miningBusiness analytics and data mining
Business analytics and data mining
Tony Nguyen
 
Business analytics and data mining
Business analytics and data miningBusiness analytics and data mining
Business analytics and data mining
Luis Goldster
 
Business analytics and data mining
Business analytics and data miningBusiness analytics and data mining
Business analytics and data mining
James Wong
 
Business analytics and data mining
Business analytics and data miningBusiness analytics and data mining
Business analytics and data mining
Harry Potter
 
Business analytics and data mining
Business analytics and data miningBusiness analytics and data mining
Business analytics and data mining
Young Alista
 
Business analytics and data mining
Business analytics and data miningBusiness analytics and data mining
Business analytics and data mining
Fraboni Ec
 
Data mining techniques and dss
Data mining techniques and dssData mining techniques and dss
Data mining techniques and dss
Niyitegekabilly
 

Similar to Introduction to data mining (20)

Introduction to Data Mining
Introduction to Data MiningIntroduction to Data Mining
Introduction to Data Mining
 
chapter1_Introduction.pdf data mining ppt
chapter1_Introduction.pdf data mining pptchapter1_Introduction.pdf data mining ppt
chapter1_Introduction.pdf data mining ppt
 
Business analytics and data mining
Business analytics and data miningBusiness analytics and data mining
Business analytics and data mining
 
Business analytics and data mining
Business analytics and data miningBusiness analytics and data mining
Business analytics and data mining
 
Business analytics and data mining
Business analytics and data miningBusiness analytics and data mining
Business analytics and data mining
 
Business analytics and data mining
Business analytics and data miningBusiness analytics and data mining
Business analytics and data mining
 
Business analytics and data mining
Business analytics and data miningBusiness analytics and data mining
Business analytics and data mining
 
Business analytics and data mining
Business analytics and data miningBusiness analytics and data mining
Business analytics and data mining
 
Business analytics and data mining
Business analytics and data miningBusiness analytics and data mining
Business analytics and data mining
 
Dwdm ppt for the btech student contain basis
Dwdm ppt for the btech student contain basisDwdm ppt for the btech student contain basis
Dwdm ppt for the btech student contain basis
 
Data mining
Data miningData mining
Data mining
 
Data mining
Data miningData mining
Data mining
 
Dwd mdatamining intro-iep
Dwd mdatamining intro-iepDwd mdatamining intro-iep
Dwd mdatamining intro-iep
 
Data mining
Data miningData mining
Data mining
 
Datamining intro-iep
Datamining intro-iepDatamining intro-iep
Datamining intro-iep
 
Data mining , Knowledge Discovery Process, Classification
Data mining , Knowledge Discovery Process, ClassificationData mining , Knowledge Discovery Process, Classification
Data mining , Knowledge Discovery Process, Classification
 
Data mining
Data miningData mining
Data mining
 
Data mining
Data miningData mining
Data mining
 
Data mining techniques and dss
Data mining techniques and dssData mining techniques and dss
Data mining techniques and dss
 
Data mining
Data miningData mining
Data mining
 

More from Datamining Tools

More from Datamining Tools (20)

Data Mining: Text and web mining
Data Mining: Text and web miningData Mining: Text and web mining
Data Mining: Text and web mining
 
Data Mining: Outlier analysis
Data Mining: Outlier analysisData Mining: Outlier analysis
Data Mining: Outlier analysis
 
Data Mining: Mining stream time series and sequence data
Data Mining: Mining stream time series and sequence dataData Mining: Mining stream time series and sequence data
Data Mining: Mining stream time series and sequence data
 
Data Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlationsData Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlations
 
Data Mining: Graph mining and social network analysis
Data Mining: Graph mining and social network analysisData Mining: Graph mining and social network analysis
Data Mining: Graph mining and social network analysis
 
Data Mining: Data warehouse and olap technology
Data Mining: Data warehouse and olap technologyData Mining: Data warehouse and olap technology
Data Mining: Data warehouse and olap technology
 
Data MIning: Data processing
Data MIning: Data processingData MIning: Data processing
Data MIning: Data processing
 
Data Mining: clustering and analysis
Data Mining: clustering and analysisData Mining: clustering and analysis
Data Mining: clustering and analysis
 
Data mining: Classification and Prediction
Data mining: Classification and PredictionData mining: Classification and Prediction
Data mining: Classification and Prediction
 
Data Mining: Data mining classification and analysis
Data Mining: Data mining classification and analysisData Mining: Data mining classification and analysis
Data Mining: Data mining classification and analysis
 
Data Mining: Data mining and key definitions
Data Mining: Data mining and key definitionsData Mining: Data mining and key definitions
Data Mining: Data mining and key definitions
 
Data Mining: Data cube computation and data generalization
Data Mining: Data cube computation and data generalizationData Mining: Data cube computation and data generalization
Data Mining: Data cube computation and data generalization
 
Data Mining: Applying data mining
Data Mining: Applying data miningData Mining: Applying data mining
Data Mining: Applying data mining
 
Data Mining: Application and trends in data mining
Data Mining: Application and trends in data miningData Mining: Application and trends in data mining
Data Mining: Application and trends in data mining
 
AI: Planning and AI
AI: Planning and AIAI: Planning and AI
AI: Planning and AI
 
AI: Logic in AI 2
AI: Logic in AI 2AI: Logic in AI 2
AI: Logic in AI 2
 
AI: Logic in AI
AI: Logic in AIAI: Logic in AI
AI: Logic in AI
 
AI: Learning in AI 2
AI: Learning in AI  2AI: Learning in AI  2
AI: Learning in AI 2
 
AI: Learning in AI
AI: Learning in AI AI: Learning in AI
AI: Learning in AI
 
AI: Introduction to artificial intelligence
AI: Introduction to artificial intelligenceAI: Introduction to artificial intelligence
AI: Introduction to artificial intelligence
 

Recently uploaded

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Recently uploaded (20)

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 

Introduction to data mining

  • 2. What is Data Mining Non-trivial extraction of implicit, previously unknown and potentially useful information from data Exploration & analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns Data mining is the process of automatically discovering useful information in large data repositories --
  • 3.
  • 4.
  • 5. Origins of Data Mining Draws ideas from machine learning/AI, pattern recognition, statistics, and database systems Traditional Techniquesmay be unsuitable due to Enormity of data High dimensionality of data Heterogeneous, distributed nature of data
  • 6. Data Mining Tasks Prediction Methods Use some variables to predict unknown or future values of other variables Description Methods Find human-interpretable patterns that describe the data.
  • 7. Data Mining Tasks Classification [Predictive] Clustering [Descriptive] Association Rule Discovery [Descriptive] Sequential Pattern Discovery [Descriptive] Regression [Predictive] Deviation Detection [Predictive]
  • 8. Classification: Definition It is used for discrete target variables Ex: predicting whether a Web user will make a purchase at an online store is an classification tasks because the target variabe is binary-valued.
  • 9. Clustering: Definition - Clustering analysis seeks to find groups of closely related observations that belong to the same cluster are more similar to each other than observations that observations that belong s to other clusters. Ex: -to find areas of ocean that have aq significant impact on the earth’s climate.
  • 10. Association Rule Discovery: Definition Given a set of records each of which contain some number of items from a given collection; Produce dependency rules which will predict occurrence of an item based on occurrences of other items.
  • 11. Contd… Rules Discovered: {Milk} --> {Coke} {Diaper, Milk} --> {Beer}
  • 12. Sequential Pattern Discovery: Definition Given is a set of objects, with each object associated with its own timeline of events, find rules that predict strong sequential dependencies among different events. (A B) (C) ---> (D E)
  • 13. Contd… Rules are formed by first disovering patterns. Event occurrences in the patterns are governed by timing constraints. (A B) (C) (D E) <= xg >ng <= ws <= ms
  • 14. Sequential Pattern Discovery: Example In telecommunications alarm logs, (Inverter_ProblemExcessive_Line_Current) (Rectifier_Alarm) --> (Fire_Alarm)
  • 15. Regression Predict a value of a given continuous valued variable based on the values of other variables, assuming a linear or nonlinear model of dependency. Greatly studied in statistics, neural network fields.
  • 16. Regression-examples Predicting sales amounts of new product based on advertising expenditure. Predicting wind velocities as a function of temperature, humidity, air pressure, etc. Time series prediction of stock market indices.
  • 17. Deviation/Anomaly Detection Detect significant deviations from normal behavior Applications: Credit Card Fraud Detection Network Intrusion Detection
  • 18. Visit more self help tutorials Pick a tutorial of your choice and browse through it at your own pace. The tutorials section is free, self-guiding and will not involve any additional support. Visit us at www.dataminingtools.net