3. What is Data Mining?
•Data mining refers to extracting or “mining” knowledge from
large amounts of data.
•The data mining field brings together techniques from machine learning,
pattern recognition, statistics, databases and visualization to
address the problem of extracting information from large
databases.
•Data mining is applied in market analysis and
management, for example customer relationship management,
cross selling and market segmentation.
4. ARCHITECTURE OF DATA MINING
Architecture of a typical data mining system may have the following major
components:
1) Database, Data Warehouse, World Wide Web:
- This is one or a set of databases, data warehouses, spreadsheets or other kinds
of information repositories. Data cleaning and data integration techniques
may be performed on the data.
2) Database or Data Warehouse Server:
- It is responsible for fetching the relevant data, based on the user’s
data mining request.
3) Knowledge Base:
- This is domain knowledge that is used to guide the search or to evaluate how
interesting the resulting patterns are. Such knowledge can include concept
hierarchies, used to organize attributes or attribute values into different levels of
abstraction.
-Knowledge such as user beliefs, which can be used to assess a pattern’s
interestingness based on its unexpectedness, may also be included.
-Other examples are constraints, thresholds & metadata.
4) Data Mining Engine:
- This is essential to the data mining system & ideally consists of a set of
functional modules for tasks such as characterization, association & correlation
analysis, classification, prediction, cluster analysis, outlier analysis & evolution
analysis.
5) Pattern Evaluation Module:
- It interacts with the mining modules so that the search focuses
only on the interesting patterns.
6) Graphical User Interface:
- Used to communicate between users and the data mining
system, allowing the users to interact with the system by
specifying a data mining query or task, & performing exploratory
data mining based on the intermediate data mining results.
-This component allows the user to browse database or data
warehouse schemas or data structures, evaluate mined patterns,
& visualize the patterns in different forms.
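As a rough illustration of how the knowledge base, the mining engine and the pattern evaluation module might fit together, the Python sketch below mines item-pair association rules from a few made-up transactions. The names (`KNOWLEDGE_BASE`, `generalize`, `mine_pair_rules`, `evaluate_patterns`), the data and the thresholds are all assumptions chosen for illustration; they do not come from any particular data mining system.

```python
from collections import Counter
from itertools import combinations

# Hypothetical knowledge base: thresholds plus a small concept hierarchy.
KNOWLEDGE_BASE = {
    "min_support": 0.5,
    "min_confidence": 0.7,
    "concept_hierarchy": {"milk": "dairy", "butter": "dairy", "bread": "bakery"},
}

# Toy transactions standing in for data fetched by the database server.
TRANSACTIONS = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk", "bread"},
]


def generalize(transactions, hierarchy):
    """Roll items up one level of abstraction using the concept hierarchy."""
    return [{hierarchy.get(item, item) for item in t} for t in transactions]


def mine_pair_rules(transactions):
    """Mining engine: compute support and confidence for every rule A -> B."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        for lhs, rhs in ((a, b), (b, a)):
            rules.append({
                "rule": (lhs, rhs),
                "support": count / n,                    # P(A and B)
                "confidence": count / item_counts[lhs],  # P(B | A)
            })
    return rules


def evaluate_patterns(rules, kb):
    """Pattern evaluation: keep only rules that pass both thresholds."""
    return [
        r for r in rules
        if r["support"] >= kb["min_support"]
        and r["confidence"] >= kb["min_confidence"]
    ]


interesting = evaluate_patterns(mine_pair_rules(TRANSACTIONS), KNOWLEDGE_BASE)
for r in sorted(interesting, key=lambda r: r["rule"]):
    print(r["rule"], round(r["support"], 2), round(r["confidence"], 2))
```

Keeping the thresholds and the concept hierarchy in a separate structure mirrors the idea that the knowledge base guides the search without being hard-coded into the mining engine. Calling `generalize` before mining would look for rules at the "dairy"/"bakery" level instead of the item level.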
8. Knowledge Discovery in Databases (KDD)
•The unifying goal of the KDD process is to extract knowledge from
data in the context of large databases.
•It consists of an iterative sequence of the following steps:
1) Data Cleaning:
-To remove noise and inconsistent data.
2) Data Integration:
-Combining multiple data sources.
3) Data Selection:
-Data relevant to the analysis task are retrieved from the
database.
4) Data Transformation:
- Data are transformed into forms appropriate for mining by
performing summary or aggregation operations, for instance.
5) Data Mining:
- An essential process where intelligent methods are applied in
order to extract data patterns.
6) Pattern Evaluation:
-To identify the truly interesting patterns representing knowledge,
based on some interestingness measures.
7) Knowledge Presentation:
- Visualization and knowledge representation techniques are used
to present the mined knowledge to the user.
Steps 1 to 4 are different forms of data preprocessing, where the
data are prepared for mining.
-The data mining step may interact with the user or a knowledge
base.
-The interesting patterns are presented to the user & may be
stored as new knowledge in the knowledge base.
-Data mining is only one step in the whole process, but an essential one because it
uncovers hidden patterns for evaluation.
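The seven KDD steps can be sketched as a chain of small functions. The Python example below is only a minimal illustration under stated assumptions: the two sources, the record fields, the function names and the threshold are all hypothetical, and averaging per region stands in for a real mining algorithm.

```python
from statistics import mean

# Two hypothetical sources with toy sales records (None marks a missing value).
SOURCE_A = [
    {"customer": "c1", "region": "north", "amount": 120.0},
    {"customer": "c2", "region": "south", "amount": None},
]
SOURCE_B = [
    {"customer": "c3", "region": "north", "amount": 80.0},
    {"customer": "c4", "region": "south", "amount": 40.0},
    {"customer": "c5", "region": "south", "amount": 60.0},
]


def clean(records):
    """Step 1: drop records with a missing amount (one simple cleaning rule)."""
    return [r for r in records if r["amount"] is not None]


def integrate(*sources):
    """Step 2: combine multiple data sources into one collection."""
    return [r for source in sources for r in source]


def select(records, fields):
    """Step 3: keep only the fields relevant to the analysis task."""
    return [{f: r[f] for f in fields} for r in records]


def transform(records):
    """Step 4: group amounts by region, ready for aggregation."""
    grouped = {}
    for r in records:
        grouped.setdefault(r["region"], []).append(r["amount"])
    return grouped


def mine(grouped):
    """Step 5: stand-in 'mining' method that summarizes each region."""
    return {region: mean(amounts) for region, amounts in grouped.items()}


def evaluate(patterns, threshold):
    """Step 6: keep patterns that pass an interestingness threshold."""
    return {k: v for k, v in patterns.items() if v >= threshold}


def present(patterns):
    """Step 7: render the mined knowledge for the user."""
    return [f"{region}: average amount {avg:.1f}"
            for region, avg in sorted(patterns.items())]


data = integrate(clean(SOURCE_A), clean(SOURCE_B))
data = select(data, ["region", "amount"])
patterns = evaluate(mine(transform(data)), threshold=75.0)
report = present(patterns)
print(report)
```

The first four functions only prepare the data; patterns are produced in `mine` and filtered in `evaluate`, which reflects the split between preprocessing and the mining step described above.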
12. How does KDD differ from Data Mining?
› KDD and Data Mining are not the same thing.
› KDD is the overall process of discovering useful
knowledge from data, whereas Data Mining is only one
step in the KDD process.
› KDD is the nontrivial process of identifying valid,
potentially useful and ultimately understandable
patterns in data, and Data Mining is the application of
specific algorithms for extracting patterns from data.