This document provides an overview of data mining. It defines data mining as extracting knowledge from large amounts of data similar to gold mining. It discusses why data mining is needed due to the data explosion problem and how it can extract knowledge from data. The document outlines several data mining tools and techniques such as classification, clustering, regression, and association rules. It also discusses the KDD process and data mining architecture. Applications of data mining discussed include communications, insurance, education, banking, supermarkets, crime investigation, and bioinformatics. Advantages include predicting trends, understanding customer habits, and increasing revenue while disadvantages relate to privacy and security issues.
2. Data Mining
DATA MINING
• Data mining refers to extracting knowledge from large
amounts of data.
• Similar to “Gold Mining”
• But the name is a misnomer; “knowledge mining from data” is more accurate
3. Data Mining
WHY DATA MINING
• Data explosion problem
• Advanced data collection tools and
database technology lead to tremendous
amounts of data stored in databases.
4. CONT.
• We are drowning in data, but starving
for knowledge!
• Solution: Data warehousing and Data mining.
• Data warehousing and on-line analytical processing
• Extraction of interesting knowledge using data mining
5. Data Mining Tools
TOOLS FOR DATA MINING
o It is a multidisciplinary skill that uses machine learning,
artificial intelligence, and database technology.
o It is all about explaining the past and predicting the future
through analysis.
• RapidMiner
• Orange
• Weka
• KNIME
• Python
• SAS data mining, etc.
6. KDD
• Data mining is also called knowledge discovery, knowledge
extraction, data/pattern analysis, information harvesting, etc.
• Gregory Piatetsky-Shapiro coined the term "knowledge
discovery in databases" for the first workshop on the topic (1989).
• Currently, the terms data mining and knowledge discovery are
used interchangeably.
KDD PROCESS (knowledge discovery in databases)
10. Techniques
1. Classification:
• This technique assigns data items to predefined classes and is used to
retrieve important and relevant information about data and metadata.
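A minimal sketch of classification, assuming a k-nearest-neighbour classifier (one of many possible classifiers, not one named in these slides). The function name `knn_classify` and the toy credit-risk data are hypothetical and only illustrate the idea of assigning a label based on labelled examples.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Return the majority label among the k training points nearest to query.

    train is a list of (features, label) pairs; features are numeric tuples.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: (monthly income, existing debt) in thousands -> label
train = [
    ((5.0, 0.5), "low_risk"),
    ((6.0, 1.0), "low_risk"),
    ((5.5, 0.8), "low_risk"),
    ((2.0, 3.0), "high_risk"),
    ((1.5, 2.5), "high_risk"),
    ((2.5, 3.5), "high_risk"),
]

print(knn_classify(train, (5.2, 0.7)))
print(knn_classify(train, (2.1, 2.9)))
```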
2. Clustering:
• Clustering is a technique to identify data items that are similar to each other.
• This process helps to understand the differences and similarities between
the data.
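A minimal sketch of clustering, assuming the k-means algorithm with fixed starting centroids (an illustrative choice, not a method named in these slides). The function name `kmeans`, the points, and the starting centroids are hypothetical.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Group points around the nearest centroid, then move each centroid
    to the mean of its group; repeat for a fixed number of iterations."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical 2-D points forming two visibly separate groups
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(clusters)
```

Points that end up in the same cluster are the ones most similar to each other under Euclidean distance.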
3. Regression:
• Regression is a technique for identifying the relationship between
variables.
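A minimal sketch of regression, assuming simple linear regression fitted by ordinary least squares on one input variable. The function name `fit_line` and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: advertising spend (x) versus sales (y)
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

The fitted slope describes how much y changes, on average, for each unit change in x.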
11. Techniques
4. Association rules
• This technique is used to identify interesting relations between different
variables in the database.
• It is also used to uncover hidden patterns in the data.
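A minimal sketch of how an association rule can be scored, assuming the usual support and confidence measures. The function names and the shopping transactions are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the transactions containing antecedent, the fraction that
    also contain consequent (the rule antecedent -> consequent)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical shopping baskets
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]

print(support(transactions, {"bread", "milk"}))
print(confidence(transactions, {"bread"}, {"milk"}))
```

A rule such as bread -> milk is usually considered interesting only when both its support and its confidence exceed chosen thresholds.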
5. Outlier detection
• This type of data mining technique refers to the observation of data items in
the dataset that do not match an expected pattern or expected
behavior.
• This technique can be used in a variety of domains, such as fraud
detection or fault detection.
• Outlier detection is also called outlier analysis or outlier mining.
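A minimal sketch of outlier detection, assuming a z-score rule: a value is flagged when it lies more than a chosen number of standard deviations from the mean. The function name `find_outliers`, the threshold of 2.0, and the transaction amounts are hypothetical.

```python
import statistics

def find_outliers(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds threshold."""
    mean = statistics.fmean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        return []  # all values identical, so nothing stands out
    return [v for v in values if abs(v - mean) / spread > threshold]

# Hypothetical transaction amounts with one unusually large value
amounts = [52, 48, 50, 51, 49, 50, 47, 53, 500]
print(find_outliers(amounts))
```

The z-score rule is simple but sensitive to the outliers themselves, since they inflate the mean and the standard deviation; median-based rules are a common alternative.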
12. Techniques
6. Sequential patterns:
• This data mining technique helps to discover or identify
similar patterns or trends in transaction data over a certain
period.
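A minimal sketch of sequential pattern discovery, assuming the simplest case: counting ordered pairs of items (a bought before b) that appear in at least a minimum number of customer sequences. The function name `frequent_ordered_pairs` and the purchase histories are hypothetical; full sequential pattern mining algorithms handle longer patterns.

```python
from collections import Counter
from itertools import combinations

def frequent_ordered_pairs(sequences, min_support):
    """Count pairs (a, b) where a occurs before b in a sequence, and keep
    the pairs found in at least min_support sequences."""
    counts = Counter()
    for seq in sequences:
        # combinations() preserves order, so (a, b) means a came before b.
        # A set ensures each pair is counted once per sequence.
        pairs = {(a, b) for a, b in combinations(seq, 2) if a != b}
        counts.update(pairs)
    return {pair: n for pair, n in counts.items() if n >= min_support}

# Hypothetical purchase histories, one list per customer, in time order
sequences = [
    ["phone", "case", "charger"],
    ["phone", "charger"],
    ["case", "phone", "charger"],
    ["laptop", "mouse"],
]

patterns = frequent_ordered_pairs(sequences, min_support=2)
print(patterns)
```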
7. Prediction:
• Prediction uses a combination of the other data
mining techniques like trends, sequential patterns,
clustering, classification, etc.
• It analyzes past events or instances in the right
sequence to predict a future event.
CONT.
13. Applications
APPLICATIONS
1) Communications
Data mining techniques are used in the communications sector to predict
customer behavior and offer highly targeted and relevant campaigns.
2) Insurance
Risk management: Data mining helps insurance companies to price their
products profitably and promote new offers to their new or existing
customers.
3) Education
Data mining helps educators to access student data, predict achievement
levels, and find students or groups of students who need extra attention.
For example, students who are weak in math.
14. Application
CONT.
4) Banking
It helps banks to identify probable defaulters when deciding whether to issue credit cards, loans,
etc.
5) Super Markets
Market basket analysis
Data mining allows supermarkets to develop rules that predict whether their shoppers are
likely to be expecting a baby. By evaluating buying patterns, they can also find women
customers who are mostly interested in products like clothes, shoes, jewelry, and
so on.
6) Crime Investigation
Data mining helps crime investigation agencies to deploy the police workforce (where
is a crime most likely to happen, and when?), decide whom to search at a border crossing,
etc.
15. Application
Cont.
7) Bioinformatics
Data mining helps to mine biological data from massive
datasets gathered in biology and medicine.
8) Fraud Detection:
Fraud is a kind of crime that is increasing day after day.
Fraud detection is mainly used in credit card services
and telecommunications.
With the help of these services, important information
such as the duration of a call, location, time, and
day can be acquired, which helps greatly in detecting fraud.
16. Advantages
ADVANTAGES OF DATA MINING:
1. It is helpful to predict future trends.
2. It reveals customer habits.
3. It helps in decision making.
4. It increases company revenue.
5. It supports market-based analysis.
6. Quick fraud detection.
17. Disadvantages
DISADVANTAGES
• Information collected through data mining for
ethical purposes can be misused.
• This information may be exploited by unethical people or
businesses to take advantage of vulnerable people or discriminate
against a group of people.
• Security is a big issue. Businesses hold information about their
employees and customers, including social security numbers,
birthdays, payroll, etc.
• Privacy is another issue. People are afraid that their personal
information is collected and used in an unethical way that
could cause them a lot of trouble.