SlideShare ist ein Scribd-Unternehmen logo
1 von 52
Data Mining:
Concepts and
Techniques
(3 rd ed.)

— Chapter 13 —
Jiawei Han, Micheline Kamber, and Jian Pei
University of Illinois at Urbana-Champaign &
Simon Fraser University
©2011 Han, Kamber & Pei. All rights reserved.
Chapter 13: Data Mining Trends
and Research Frontiers


Mining Complex Types of Data



Other Methodologies of Data Mining



Data Mining Applications



Data Mining and Society



Data Mining Trends



Summary
3
Mining Complex Types of Data


Mining Sequence Data


Mining Time Series



Mining Symbolic Sequences



Mining Biological Sequences



Mining Graphs and Networks



Mining Other Kinds of Data
4
Mining Sequence Data


Similarity Search in Time Series Data




Regression and Trend Analysis in Time-Series Data




Feature-based vs. sequence-distance-based vs. model-based

Alignment of Biological Sequences




GSP, PrefixSpan, constraint-based sequential pattern mining

Sequence Classification




long term + cyclic + seasonal variation + random movements

Sequential Pattern Mining in Symbolic Sequences




Subsequence match, dimensionality reduction, query-based similarity
search, motif-based similarity search

Pair-wise vs. multi-sequence alignment, substitution matirces, BLAST

Hidden Markov Model for Biological Sequence Analysis


Markov chain vs. hidden Markov models, forward vs. Viterbi vs. BaumWelch algorithms
5
Mining Graphs and Networks


Graph Pattern Mining




Statistical Modeling of Networks




Frequent subgraph patterns, closed graph patterns, gSpan vs. CloseGraph
Small world phenomenon, power law (log-tail) distribution, densification

Clustering and Classification of Graphs and Homogeneous Networks





Clustering: Fast Modularity vs. SCAN
Classification: model vs. pattern-based mining

Clustering, Ranking and Classification of Heterogeneous Networks




RankClus, RankClass, and meta path-based, user-guided methodology

Role Discovery and Link Prediction in Information Networks


PathPredict



Similarity Search and OLAP in Information Networks: PathSim, GraphCube



Evolution of Social and Information Networks: EvoNetClus
6
Mining Other Kinds of Data


Mining Spatial Data




Mining Spatiotemporal and Moving Object Data




Topic modeling, i-topic model, integration with geo- and networked data

Mining Web Data




Social media data, geo-tagged spatial clustering, periodicity analysis, …

Mining Text Data




Applications: healthcare, air-traffic control, flood simulation

Mining Multimedia Data




Spatiotemporal data mining, trajectory mining, periodica, swarm, …

Mining Cyber-Physical System Data




Spatial frequent/co-located patterns, spatial clustering and classification

Web content, web structure, and web usage mining

Mining Data Streams


Dynamics, one-pass, patterns, clustering, classification, outlier detection

7
Chapter 13: Data Mining Trends
and Research Frontiers


Mining Complex Types of Data



Other Methodologies of Data Mining



Data Mining Applications



Data Mining and Society



Data Mining Trends



Summary
8
Other Methodologies of Data
Mining


Statistical Data Mining



Views on Data Mining Foundations



Visual and Audio Data Mining

9
Major Statistical Data Mining
Methods


Regression



Generalized Linear Model



Analysis of Variance



Mixed-Effect Models



Factor Analysis



Discriminant Analysis



Survival Analysis
10
Statistical Data Mining (1)


There are many well-established statistical techniques for data
analysis, particularly for numeric data
 applied extensively to data from scientific experiments and data
from economics and the social sciences
Regression



predict the value of a response
(dependent) variable from one or more
predictor (independent) variables
where the variables are numeric


forms of regression: linear, multiple,
weighted, polynomial, nonparametric,
and robust


11
Scientific and Statistical Data Mining (2)





Generalized linear models
 allow a categorical response variable (or
some transformation of it) to be related
to a set of predictor variables
 similar to the modeling of a numeric
response variable using linear
regression
 include logistic regression and Poisson
regression
Mixed-effect models
For analyzing grouped data, i.e. data that can be classified
according to one or more grouping variables


Typically describe relationships between a response variable and
some covariates in data grouped according to one or more factors


12
Scientific and Statistical Data Mining (3)


Regression trees




Binary trees used for classification
and prediction
Similar to decision trees:Tests are
performed at the internal nodes

In a regression tree the mean of the
objective attribute is computed and
used as the predicted value
Analysis of variance






Analyze experimental data for two or
more populations described by a
numeric response variable and one
or more categorical variables
(factors)
13
Statistical Data Mining (4)




Factor analysis
 determine which variables are
combined to generate a given factor
 e.g., for many psychiatric data, one
can indirectly measure other
quantities (such as test scores) that
reflect the factor of interest
Discriminant analysis
 predict a categorical response
variable, commonly used in social
science
 Attempts to determine several
discriminant functions (linear
combinations of the independent
variables) that discriminate among
the groups defined by the response
variable

www.spss.com/datamine/factor.htm

14
Statistical Data Mining (5)


Time series: many methods such as autoregression,
ARIMA (Autoregressive integrated moving-average
modeling), long memory time-series modeling



Quality control: displays group summary charts



Survival analysis

Predicts the probability
that a patient undergoing a
medical treatment would
survive at least to time t
(life span prediction)


15
Other Methodologies of Data
Mining


Statistical Data Mining



Views on Data Mining Foundations



Visual and Audio Data Mining

16
Views on Data Mining Foundations (I)


Data reduction





Basis of data mining: Reduce data representation
Trades accuracy for speed in response

Data compression




Basis of data mining: Compress the given data by
encoding in terms of bits, association rules, decision
trees, clusters, etc.

Probability and statistical theory


Basis of data mining: Discover joint probability
distributions of random variables
17
Views on Data Mining Foundations (II)


Microeconomic view




A view of utility: Finding patterns that are interesting only to the
extent in that they can be used in the decision-making process of
some enterprise

Pattern Discovery and Inductive databases








Basis of data mining: Discover patterns occurring in the database,
such as associations, classification models, sequential patterns,
etc.
Data mining is the problem of performing inductive logic on
databases
The task is to query the data and the theory (i.e., patterns) of the
database
Popular among many researchers in database systems
18
Other Methodologies of Data
Mining


Statistical Data Mining



Views on Data Mining Foundations



Visual and Audio Data Mining

19
Visual Data Mining




Visualization: Use of computer graphics to create visual
images which aid in the understanding of complex, often
massive representations of data
Visual Data Mining: discovering implicit but useful
knowledge from large data sets using visualization
techniques
Multimedia
Human
Computer
Systems
Computer
Graphics
Interfaces
Visual Data
Mining
High
Pattern
Performance
Recognitio
Computing
n

20
Visualization


Purpose of Visualization









Gain insight into an information space by mapping data
onto graphical primitives
Provide qualitative overview of large data sets
Search for patterns, trends, structure, irregularities,
relationships among data.
Help find interesting regions and suitable parameters
for further quantitative analysis.
Provide a visual proof of computer representations
derived

21
Visual Data Mining & Data Visualization




Integration of visualization and data mining
 data visualization
 data mining result visualization
 data mining process visualization
 interactive visual data mining
Data visualization
 Data in a database or data warehouse can be viewed
 at different levels of abstraction
 as different combinations of attributes or
dimensions
 Data can be presented in various visual forms
22
Data Mining Result Visualization




Presentation of the results or knowledge obtained from
data mining in visual forms
Examples


Scatter plots and boxplots (obtained from descriptive
data mining)



Decision trees



Association rules



Clusters



Outliers



Generalized rules
23
Boxplots from Statsoft: Multiple
Variable Combinations

24
Visualization of Data Mining Results in
SAS Enterprise Miner: Scatter Plots

25
Visualization of Association Rules
in SGI/MineSet 3.0

26
Visualization of a Decision Tree in
SGI/MineSet 3.0

27
Visualization of Cluster Grouping in IBM
Intelligent Miner

28
Data Mining Process Visualization


Presentation of the various processes of data mining in
visual forms so that users can see


Data extraction process



Where the data is extracted



How the data is cleaned, integrated, preprocessed,
and mined



Method selected for data mining



Where the results are stored



How they may be viewed
29
Visualization of Data Mining
Processes by Clementine
See your solution
discovery
process clearly

Understand
variations with
visualized data

30
Interactive Visual Data Mining




Using visualization tools in the data mining process to
help users make smart data mining decisions
Example




Display the data distribution in a set of attributes using
colored sectors or columns (depending on whether the
whole space is represented by either a circle or a set
of columns)
Use the display to which sector should first be
selected for classification and where a good split point
for this sector may be
31
Perception-Based Classification
(PBC)

32
Audio Data Mining









Uses audio signals to indicate the patterns of data or
the features of data mining results
An interesting alternative to visual mining
An inverse task of mining audio (such as music)
databases which is to find patterns from audio data
Visual data mining may disclose interesting patterns
using graphical displays, but requires users to
concentrate on watching patterns
Instead, transform patterns into sound and music and
listen to pitches, rhythms, tune, and melody in order to
identify anything interesting or unusual
33
Chapter 13: Data Mining Trends
and Research Frontiers


Mining Complex Types of Data



Other Methodologies of Data Mining



Data Mining Applications



Data Mining and Society



Data Mining Trends



Summary
34
Data Mining Applications




Data mining: A young discipline with broad and diverse
applications
 There still exists a nontrivial gap between generic data
mining methods and effective and scalable data mining
tools for domain-specific applications
Some application domains (briefly discussed here)
 Data Mining for Financial data analysis
 Data Mining for Retail and Telecommunication
Industries
 Data Mining in Science and Engineering
 Data Mining for Intrusion Detection and Prevention
 Data Mining and Recommender Systems
35
Data Mining for Financial Data Analysis
(I)




Financial data collected in banks and financial institutions
are often relatively complete, reliable, and of high quality
Design and construction of data warehouses for
multidimensional data analysis and data mining


View the debt and revenue changes by month, by
region, by sector, and by other factors

Access statistical information such as max, min, total,
average, trend, etc.
Loan payment prediction/consumer credit policy analysis
 feature selection and attribute relevance ranking
 Loan payment performance
 Consumer credit rating




36
Data Mining for Financial Data Analysis
(II)




Classification and clustering of customers for targeted
marketing
 multidimensional segmentation by nearest-neighbor,
classification, decision trees, etc. to identify customer
groups or associate a new customer to an appropriate
customer group
Detection of money laundering and other financial crimes
 integration of from multiple DBs (e.g., bank
transactions, federal/state crime history DBs)
 Tools: data visualization, linkage analysis,
classification, clustering tools, outlier analysis, and
sequential pattern analysis tools (find unusual access
sequences)
37
Data Mining for Retail & Telcomm. Industries (I)




Retail industry: huge amounts of data on sales, customer
shopping history, e-commerce, etc.
Applications of retail data mining


Identify customer buying behaviors



Discover customer shopping patterns and trends



Improve the quality of customer service



Achieve better customer retention and satisfaction



Enhance goods consumption ratios





Design more effective goods transportation and
distribution policies

Telcomm. and many other industries: Share many similar
goals and expectations of retail data mining

38
Data Mining Practice for Retail Industry



Design and construction of data warehouses
Multidimensional analysis of sales, customers, products, time, and
region



Analysis of the effectiveness of sales campaigns



Customer retention: Analysis of customer loyalty






Use customer loyalty card information to register sequences of
purchases of particular customers
Use sequential pattern mining to investigate changes in customer
consumption or loyalty
Suggest adjustments on the pricing and variety of goods



Product recommendation and cross-reference of items



Fraudulent analysis and the identification of usual patterns



Use of visualization tools in data analysis
39
Data Mining in Science and Engineering


Data warehouses and data preprocessing




Mining complex data types




Resolving inconsistencies or incompatible data collected in
diverse environments and different periods (e.g. eco-system
studies)
Spatiotemporal, biological, diverse semantics and relationships

Graph-based and network-based mining


Links, relationships, data flow, etc.



Visualization tools and domain-specific knowledge



Other issues




Data mining in social sciences and social studies: text and social
media
Data mining in computer science: monitoring systems, software
bugs, network intrusion

40
Data Mining for Intrusion Detection and
Prevention


Majority of intrusion detection and prevention systems use






Signature-based detection: use signatures, attack patterns that are
preconfigured and predetermined by domain experts
Anomaly-based detection: build profiles (models of normal
behavior) and detect those that are substantially deviate from the
profiles

What data mining can help



New data mining algorithms for intrusion detection
Association, correlation, and discriminative pattern analysis help
select and build discriminative classifiers



Analysis of stream data: outlier detection, clustering, model shifting



Distributed data mining



Visualization and querying tools
41
Data Mining and Recommender Systems






Recommender systems: Personalization, making product
recommendations that are likely to be of interest to a user
Approaches: Content-based, collaborative, or their hybrid
 Content-based: Recommends items that are similar to items the
user preferred or queried in the past
 Collaborative filtering: Consider a user's social environment,
opinions of other customers who have similar tastes or preferences
Data mining and recommender systems
 Users C × items S: extract from known to unknown ratings to
predict user-item combinations
 Memory-based method often uses k-nearest neighbor approach
 Model-based method uses a collection of ratings to learn a model
(e.g., probabilistic models, clustering, Bayesian networks, etc.)
 Hybrid approaches integrate both to improve performance (e.g.,
using ensemble)
42
Chapter 13: Data Mining Trends
and Research Frontiers


Mining Complex Types of Data



Other Methodologies of Data Mining



Data Mining Applications



Data Mining and Society



Data Mining Trends



Summary
43
Ubiquitous and Invisible Data Mining


Ubiquitous Data Mining





Data mining is used everywhere, e.g., online shopping
Ex. Customer relationship management (CRM)

Invisible Data Mining








Invisible: Data mining functions are built in daily life operations
Ex. Google search: Users may be unaware that they are
examining results returned by data
Invisible data mining is highly desirable
Invisible mining needs to consider efficiency and scalability, user
interaction, incorporation of background knowledge and
visualization techniques, finding interesting patterns, real-time, …
Further work: Integration of data mining into existing business
and scientific technologies to provide domain-specific data mining
tools
44
Privacy, Security and Social Impacts of
Data Mining


Many data mining applications do not touch personal data






E.g., meteorology, astronomy, geography, geology, biology, and
other scientific and engineering data

Many DM studies are on developing scalable algorithms to find general
or statistically significant patterns, not touching individuals
The real privacy concern: unconstrained access of individual records,
especially privacy-sensitive information



Method 1: Removing sensitive IDs associated with the data



Method 2: Data security-enhancing methods





Multi-level security model: permit to access to only authorized level
Encryption: e.g., blind signatures, biometric encryption, and
anonymous databases (personal information is encrypted and stored
at different locations)

Method 3: Privacy-preserving data mining methods
45
Privacy-Preserving Data Mining




Privacy-preserving (privacy-enhanced or privacy-sensitive) mining:
 Obtaining valid mining results without disclosing the underlying
sensitive data values
 Often needs trade-off between information loss and privacy
Privacy-preserving data mining methods:
 Randomization (e.g., perturbation): Add noise to the data in order
to mask some attribute values of records
 K-anonymity and l-diversity: Alter individual records so that they
cannot be uniquely identified







k-anonymity: Any given record maps onto at least k other records
l-diversity: enforcing intra-group diversity of sensitive values

Distributed privacy preservation: Data partitioned and distributed
either horizontally, vertically, or a combination of both
Downgrading the effectiveness of data mining: The output of data
mining may violate privacy


Modify data or mining results, e.g., hiding some association rules or slightly
distorting some classification models

46
Chapter 13: Data Mining Trends
and Research Frontiers


Mining Complex Types of Data



Other Methodologies of Data Mining



Data Mining Applications



Data Mining and Society



Data Mining Trends



Summary
47
Trends of Data Mining


Application exploration: Dealing with application-specific problems



Scalable and interactive data mining methods



Integration of data mining with Web search engines, database
systems, data warehouse systems and cloud computing systems



Mining social and information networks



Mining spatiotemporal, moving objects and cyber-physical systems



Mining multimedia, text and web data



Mining biological and biomedical data



Data mining with software engineering and system engineering



Visual and audio data mining



Distributed data mining and real-time data stream mining



Privacy protection and information security in data mining
48
Chapter 13: Data Mining Trends
and Research Frontiers


Mining Complex Types of Data



Other Methodologies of Data Mining



Data Mining Applications



Data Mining and Society



Data Mining Trends



Summary
49
Summary



We present a high-level overview of mining complex data types
Statistical data mining methods, such as regression, generalized linear
models, analysis of variance, etc., are popularly adopted



Researchers also try to build theoretical foundations for data mining



Visual/audio data mining has been popular and effective









Application-based mining integrates domain-specific knowledge with
data analysis techniques and provide mission-specific solutions
Ubiquitous data mining and invisible data mining are penetrating our
data lives
Privacy and data security are importance issues in data mining, and
privacy-preserving data mining has been developed recently
Our discussion on trends in data mining shows that data mining is a
promising, young field, with great, strategic importance
50
References and Further Reading


The books lists a lot of references for further reading. Here we only list a few books



E. Alpaydin. Introduction to Machine Learning, 2nd ed., MIT Press, 2011













S. Chakrabarti. Mining the Web: Statistical Analysis of Hypertex and Semi-Structured Data . Morgan
Kaufmann, 2002
R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification, 2ed., Wiley-Interscience, 2000
D. Easley and J. Kleinberg. Networks, Crowds, and Markets: Reasoning about a Highly Connected
World. Cambridge University Press, 2010.
U. Fayyad, G. Grinstein, and A. Wierse (eds.), Information Visualization in Data Mining and
Knowledge Discovery, Morgan Kaufmann, 2001
J. Han, M. Kamber, J. Pei. Data Mining: Concepts and Techniques . Morgan Kaufmann, 3rd ed. 2011
T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining,
Inference, and Prediction, 2nd ed., Springer-Verlag, 2009
D. Koller and N. Friedman. Probabilistic Graphical Models: Principles and Techniques . MIT Press,
2009.



B. Liu. Web Data Mining, Springer 2006.



T. M. Mitchell. Machine Learning, McGraw Hill, 1997



M. Newman. Networks: An Introduction. Oxford University Press, 2010.



P.-N. Tan, M. Steinbach and V. Kumar, Introduction to Data Mining, Wiley, 2005



I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java
Implementations, Morgan Kaufmann, 2nd ed. 2005

51
52

Weitere ähnliche Inhalte

Was ist angesagt?

Chapter - 8.2 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 8.2 Data Mining Concepts and Techniques 2nd Ed slides Han & KamberChapter - 8.2 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 8.2 Data Mining Concepts and Techniques 2nd Ed slides Han & Kambererror007
 
Data Mining: Applying data mining
Data Mining: Applying data miningData Mining: Applying data mining
Data Mining: Applying data miningDataminingTools Inc
 
Data mining-2
Data mining-2Data mining-2
Data mining-2Nit Hik
 
Introduction to Data Mining
Introduction to Data MiningIntroduction to Data Mining
Introduction to Data MiningAbcdDcba12
 
Data miningppt378
Data miningppt378Data miningppt378
Data miningppt378nitttin
 
Data Mining based on Hashing Technique
Data Mining based on Hashing TechniqueData Mining based on Hashing Technique
Data Mining based on Hashing Techniqueijtsrd
 
Data mining techniques
Data mining techniquesData mining techniques
Data mining techniquesHatem Magdy
 

Was ist angesagt? (18)

Chapter - 8.2 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 8.2 Data Mining Concepts and Techniques 2nd Ed slides Han & KamberChapter - 8.2 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 8.2 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
 
Data Mining: Applying data mining
Data Mining: Applying data miningData Mining: Applying data mining
Data Mining: Applying data mining
 
Data mining-2
Data mining-2Data mining-2
Data mining-2
 
Introduction to Data Mining
Introduction to Data MiningIntroduction to Data Mining
Introduction to Data Mining
 
data mining
data miningdata mining
data mining
 
Data miningppt378
Data miningppt378Data miningppt378
Data miningppt378
 
Introduction to data warehouse
Introduction to data warehouseIntroduction to data warehouse
Introduction to data warehouse
 
Data mining
Data miningData mining
Data mining
 
Data mining
Data miningData mining
Data mining
 
Data Mining based on Hashing Technique
Data Mining based on Hashing TechniqueData Mining based on Hashing Technique
Data Mining based on Hashing Technique
 
Introduction to DataMining
Introduction to DataMiningIntroduction to DataMining
Introduction to DataMining
 
17 cs002
17 cs00217 cs002
17 cs002
 
G045033841
G045033841G045033841
G045033841
 
Data Mining
Data MiningData Mining
Data Mining
 
Project 0th Review
Project 0th ReviewProject 0th Review
Project 0th Review
 
Ghhh
GhhhGhhh
Ghhh
 
Data mining techniques
Data mining techniquesData mining techniques
Data mining techniques
 
Data mining
Data miningData mining
Data mining
 

Andere mochten auch

Edital de licitação da barragem fronteiras, em crateús ce
Edital de licitação da barragem fronteiras, em crateús ceEdital de licitação da barragem fronteiras, em crateús ce
Edital de licitação da barragem fronteiras, em crateús ceJosé Ripardo
 
Scocsy,Rotterdam,Groep4,Freeke,Finaldoc
Scocsy,Rotterdam,Groep4,Freeke,FinaldocScocsy,Rotterdam,Groep4,Freeke,Finaldoc
Scocsy,Rotterdam,Groep4,Freeke,Finaldocmatthiasfreeke
 
Guia de_derecho_romano
Guia  de_derecho_romanoGuia  de_derecho_romano
Guia de_derecho_romanoMariana Muñoz
 
Forensic science 2011
Forensic science 2011Forensic science 2011
Forensic science 2011Bruno Mmassy
 

Andere mochten auch (6)

Edital de licitação da barragem fronteiras, em crateús ce
Edital de licitação da barragem fronteiras, em crateús ceEdital de licitação da barragem fronteiras, em crateús ce
Edital de licitação da barragem fronteiras, em crateús ce
 
Scocsy,Rotterdam,Groep4,Freeke,Finaldoc
Scocsy,Rotterdam,Groep4,Freeke,FinaldocScocsy,Rotterdam,Groep4,Freeke,Finaldoc
Scocsy,Rotterdam,Groep4,Freeke,Finaldoc
 
Guia de_derecho_romano
Guia  de_derecho_romanoGuia  de_derecho_romano
Guia de_derecho_romano
 
Forensic science 2011
Forensic science 2011Forensic science 2011
Forensic science 2011
 
Clase Savando a quien mas quieres
Clase Savando a quien mas quieresClase Savando a quien mas quieres
Clase Savando a quien mas quieres
 
Matthewpoland
MatthewpolandMatthewpoland
Matthewpoland
 

Ähnlich wie Chaper 13 trend

Chaper 13 trend, Han & Kamber
Chaper 13 trend, Han & KamberChaper 13 trend, Han & Kamber
Chaper 13 trend, Han & KamberHouw Liong The
 
Chapter 13. Trends and Research Frontiers in Data Mining.ppt
Chapter 13. Trends and Research Frontiers in Data Mining.pptChapter 13. Trends and Research Frontiers in Data Mining.ppt
Chapter 13. Trends and Research Frontiers in Data Mining.pptSubrata Kumer Paul
 
Data Mining Application and Trends
Data Mining Application and TrendsData Mining Application and Trends
Data Mining Application and TrendsVijayasankariS
 
1.3 applications, issues
1.3 applications, issues1.3 applications, issues
1.3 applications, issuesRajendran
 
1.3 applications, issues
1.3 applications, issues1.3 applications, issues
1.3 applications, issuesKrish_ver2
 
DM UNIT_5 ppt for btech final year students
DM UNIT_5 ppt for btech final year studentsDM UNIT_5 ppt for btech final year students
DM UNIT_5 ppt for btech final year studentssriharipatilin
 
Introduction.ppt
Introduction.pptIntroduction.ppt
Introduction.pptbommaiah
 
Knowledge discovery thru data mining
Knowledge discovery thru data miningKnowledge discovery thru data mining
Knowledge discovery thru data miningDevakumar Jain
 
Unit 1 (Chapter-1) on data mining concepts.ppt
Unit 1 (Chapter-1) on data mining concepts.pptUnit 1 (Chapter-1) on data mining concepts.ppt
Unit 1 (Chapter-1) on data mining concepts.pptPadmajaLaksh
 
Upstate CSCI 525 Data Mining Chapter 1
Upstate CSCI 525 Data Mining Chapter 1Upstate CSCI 525 Data Mining Chapter 1
Upstate CSCI 525 Data Mining Chapter 1DanWooster1
 
Data Mining: Application and trends in data mining
Data Mining: Application and trends in data miningData Mining: Application and trends in data mining
Data Mining: Application and trends in data miningDatamining Tools
 
Data Mining: Application and trends in data mining
Data Mining: Application and trends in data miningData Mining: Application and trends in data mining
Data Mining: Application and trends in data miningDataminingTools Inc
 
01Introduction to data mining chapter 1.ppt
01Introduction to data mining chapter 1.ppt01Introduction to data mining chapter 1.ppt
01Introduction to data mining chapter 1.pptadmsoyadm4
 

Ähnlich wie Chaper 13 trend (20)

Chaper 13 trend, Han & Kamber
Chaper 13 trend, Han & KamberChaper 13 trend, Han & Kamber
Chaper 13 trend, Han & Kamber
 
Data mininng trends
Data mininng trendsData mininng trends
Data mininng trends
 
Chapter 13. Trends and Research Frontiers in Data Mining.ppt
Chapter 13. Trends and Research Frontiers in Data Mining.pptChapter 13. Trends and Research Frontiers in Data Mining.ppt
Chapter 13. Trends and Research Frontiers in Data Mining.ppt
 
Data Mining Application and Trends
Data Mining Application and TrendsData Mining Application and Trends
Data Mining Application and Trends
 
1.3 applications, issues
1.3 applications, issues1.3 applications, issues
1.3 applications, issues
 
1.3 applications, issues
1.3 applications, issues1.3 applications, issues
1.3 applications, issues
 
DM UNIT_5 ppt for btech final year students
DM UNIT_5 ppt for btech final year studentsDM UNIT_5 ppt for btech final year students
DM UNIT_5 ppt for btech final year students
 
Chapter 1. Introduction.ppt
Chapter 1. Introduction.pptChapter 1. Introduction.ppt
Chapter 1. Introduction.ppt
 
Introduction.ppt
Introduction.pptIntroduction.ppt
Introduction.ppt
 
Introduction
IntroductionIntroduction
Introduction
 
Knowledge discovery thru data mining
Knowledge discovery thru data miningKnowledge discovery thru data mining
Knowledge discovery thru data mining
 
Unit 1 (Chapter-1) on data mining concepts.ppt
Unit 1 (Chapter-1) on data mining concepts.pptUnit 1 (Chapter-1) on data mining concepts.ppt
Unit 1 (Chapter-1) on data mining concepts.ppt
 
Upstate CSCI 525 Data Mining Chapter 1
Upstate CSCI 525 Data Mining Chapter 1Upstate CSCI 525 Data Mining Chapter 1
Upstate CSCI 525 Data Mining Chapter 1
 
Data Mining: Application and trends in data mining
Data Mining: Application and trends in data miningData Mining: Application and trends in data mining
Data Mining: Application and trends in data mining
 
Data Mining: Application and trends in data mining
Data Mining: Application and trends in data miningData Mining: Application and trends in data mining
Data Mining: Application and trends in data mining
 
Data Mining Intro
Data Mining IntroData Mining Intro
Data Mining Intro
 
data mining
data miningdata mining
data mining
 
01Intro.ppt
01Intro.ppt01Intro.ppt
01Intro.ppt
 
01Introduction to data mining chapter 1.ppt
01Introduction to data mining chapter 1.ppt01Introduction to data mining chapter 1.ppt
01Introduction to data mining chapter 1.ppt
 
01Intro.ppt
01Intro.ppt01Intro.ppt
01Intro.ppt
 

Mehr von Houw Liong The

Museumgeologi 130427165857-phpapp02
Museumgeologi 130427165857-phpapp02Museumgeologi 130427165857-phpapp02
Museumgeologi 130427165857-phpapp02Houw Liong The
 
Research in data mining
Research in data miningResearch in data mining
Research in data miningHouw Liong The
 
Fisika & komputasi cerdas
Fisika & komputasi cerdasFisika & komputasi cerdas
Fisika & komputasi cerdasHouw Liong The
 
Sharma : social networks
Sharma : social networksSharma : social networks
Sharma : social networksHouw Liong The
 
Introduction to-graph-theory-1204617648178088-2
Introduction to-graph-theory-1204617648178088-2Introduction to-graph-theory-1204617648178088-2
Introduction to-graph-theory-1204617648178088-2Houw Liong The
 
Capter10 cluster basic : Han & Kamber
Capter10 cluster basic : Han & KamberCapter10 cluster basic : Han & Kamber
Capter10 cluster basic : Han & KamberHouw Liong The
 
Chapter 11 cluster advanced, Han & Kamber
Chapter 11 cluster advanced, Han & KamberChapter 11 cluster advanced, Han & Kamber
Chapter 11 cluster advanced, Han & KamberHouw Liong The
 
Web & text mining lecture10
Web & text mining lecture10Web & text mining lecture10
Web & text mining lecture10Houw Liong The
 
Graph mining seminar_2009
Graph mining seminar_2009Graph mining seminar_2009
Graph mining seminar_2009Houw Liong The
 
Chapter 11 cluster advanced : web and text mining
Chapter 11 cluster advanced : web and text miningChapter 11 cluster advanced : web and text mining
Chapter 11 cluster advanced : web and text miningHouw Liong The
 
Chapter 09 classification advanced
Chapter 09 classification advancedChapter 09 classification advanced
Chapter 09 classification advancedHouw Liong The
 
System dynamics prof nagurney
System dynamics prof nagurneySystem dynamics prof nagurney
System dynamics prof nagurneyHouw Liong The
 
System dynamics math representation
System dynamics math representationSystem dynamics math representation
System dynamics math representationHouw Liong The
 

Mehr von Houw Liong The (20)

Museumgeologi 130427165857-phpapp02
Museumgeologi 130427165857-phpapp02Museumgeologi 130427165857-phpapp02
Museumgeologi 130427165857-phpapp02
 
Space weather
Space weather Space weather
Space weather
 
Research in data mining
Research in data miningResearch in data mining
Research in data mining
 
Indonesia
IndonesiaIndonesia
Indonesia
 
Canfis
CanfisCanfis
Canfis
 
Climate Change
Climate ChangeClimate Change
Climate Change
 
Space Weather
Space Weather Space Weather
Space Weather
 
Fisika komputasi
Fisika komputasiFisika komputasi
Fisika komputasi
 
Fisika & komputasi cerdas
Fisika & komputasi cerdasFisika & komputasi cerdas
Fisika & komputasi cerdas
 
Climate model
Climate modelClimate model
Climate model
 
Sharma : social networks
Sharma : social networksSharma : social networks
Sharma : social networks
 
Introduction to-graph-theory-1204617648178088-2
Introduction to-graph-theory-1204617648178088-2Introduction to-graph-theory-1204617648178088-2
Introduction to-graph-theory-1204617648178088-2
 
Capter10 cluster basic : Han & Kamber
Capter10 cluster basic : Han & KamberCapter10 cluster basic : Han & Kamber
Capter10 cluster basic : Han & Kamber
 
Chapter 11 cluster advanced, Han & Kamber
Chapter 11 cluster advanced, Han & KamberChapter 11 cluster advanced, Han & Kamber
Chapter 11 cluster advanced, Han & Kamber
 
Web & text mining lecture10
Web & text mining lecture10Web & text mining lecture10
Web & text mining lecture10
 
Graph mining seminar_2009
Graph mining seminar_2009Graph mining seminar_2009
Graph mining seminar_2009
 
Chapter 11 cluster advanced : web and text mining
Chapter 11 cluster advanced : web and text miningChapter 11 cluster advanced : web and text mining
Chapter 11 cluster advanced : web and text mining
 
Chapter 09 classification advanced
Chapter 09 classification advancedChapter 09 classification advanced
Chapter 09 classification advanced
 
System dynamics prof nagurney
System dynamics prof nagurneySystem dynamics prof nagurney
System dynamics prof nagurney
 
System dynamics math representation
System dynamics math representationSystem dynamics math representation
System dynamics math representation
 

Kürzlich hochgeladen

Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...pradhanghanshyam7136
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxAreebaZafar22
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701bronxfugly43
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseAnaAcapella
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.pptRamjanShidvankar
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxDenish Jangid
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17Celine George
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfNirmal Dwivedi
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxEsquimalt MFRC
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxVishalSingh1417
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structuredhanjurrannsibayan2
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxDr. Sarita Anand
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024Elizabeth Walsh
 

Kürzlich hochgeladen (20)

Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please Practise
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structure
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
 
Spatium Project Simulation student brief
Spatium Project Simulation student briefSpatium Project Simulation student brief
Spatium Project Simulation student brief
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 

Chaper 13 trend

  • 1. Data Mining: Concepts and Techniques (3 rd ed.) — Chapter 13 — Jiawei Han, Micheline Kamber, and Jian Pei University of Illinois at Urbana-Champaign & Simon Fraser University ©2011 Han, Kamber & Pei. All rights reserved.
  • 2.
  • 3. Chapter 13: Data Mining Trends and Research Frontiers  Mining Complex Types of Data  Other Methodologies of Data Mining  Data Mining Applications  Data Mining and Society  Data Mining Trends  Summary 3
  • 4. Mining Complex Types of Data  Mining Sequence Data  Mining Time Series  Mining Symbolic Sequences  Mining Biological Sequences  Mining Graphs and Networks  Mining Other Kinds of Data 4
  • 5. Mining Sequence Data  Similarity Search in Time Series Data   Regression and Trend Analysis in Time-Series Data   Feature-based vs. sequence-distance-based vs. model-based Alignment of Biological Sequences   GSP, PrefixSpan, constraint-based sequential pattern mining Sequence Classification   long term + cyclic + seasonal variation + random movements Sequential Pattern Mining in Symbolic Sequences   Subsequence match, dimensionality reduction, query-based similarity search, motif-based similarity search Pair-wise vs. multi-sequence alignment, substitution matirces, BLAST Hidden Markov Model for Biological Sequence Analysis  Markov chain vs. hidden Markov models, forward vs. Viterbi vs. BaumWelch algorithms 5
  • 6. Mining Graphs and Networks  Graph Pattern Mining   Statistical Modeling of Networks   Frequent subgraph patterns, closed graph patterns, gSpan vs. CloseGraph Small world phenomenon, power law (log-tail) distribution, densification Clustering and Classification of Graphs and Homogeneous Networks    Clustering: Fast Modularity vs. SCAN Classification: model vs. pattern-based mining Clustering, Ranking and Classification of Heterogeneous Networks   RankClus, RankClass, and meta path-based, user-guided methodology Role Discovery and Link Prediction in Information Networks  PathPredict  Similarity Search and OLAP in Information Networks: PathSim, GraphCube  Evolution of Social and Information Networks: EvoNetClus 6
  • 7. Mining Other Kinds of Data  Mining Spatial Data   Mining Spatiotemporal and Moving Object Data   Topic modeling, i-topic model, integration with geo- and networked data Mining Web Data   Social media data, geo-tagged spatial clustering, periodicity analysis, … Mining Text Data   Applications: healthcare, air-traffic control, flood simulation Mining Multimedia Data   Spatiotemporal data mining, trajectory mining, periodica, swarm, … Mining Cyber-Physical System Data   Spatial frequent/co-located patterns, spatial clustering and classification Web content, web structure, and web usage mining Mining Data Streams  Dynamics, one-pass, patterns, clustering, classification, outlier detection 7
  • 8. Chapter 13: Data Mining Trends and Research Frontiers  Mining Complex Types of Data  Other Methodologies of Data Mining  Data Mining Applications  Data Mining and Society  Data Mining Trends  Summary 8
  • 9. Other Methodologies of Data Mining  Statistical Data Mining  Views on Data Mining Foundations  Visual and Audio Data Mining 9
  • 10. Major Statistical Data Mining Methods  Regression  Generalized Linear Model  Analysis of Variance  Mixed-Effect Models  Factor Analysis  Discriminant Analysis  Survival Analysis 10
  • 11. Statistical Data Mining (1)  There are many well-established statistical techniques for data analysis, particularly for numeric data  applied extensively to data from scientific experiments and data from economics and the social sciences Regression  predict the value of a response (dependent) variable from one or more predictor (independent) variables where the variables are numeric  forms of regression: linear, multiple, weighted, polynomial, nonparametric, and robust  11
  • 12. Scientific and Statistical Data Mining (2)   Generalized linear models  allow a categorical response variable (or some transformation of it) to be related to a set of predictor variables  similar to the modeling of a numeric response variable using linear regression  include logistic regression and Poisson regression Mixed-effect models For analyzing grouped data, i.e. data that can be classified according to one or more grouping variables  Typically describe relationships between a response variable and some covariates in data grouped according to one or more factors  12
  • 13. Scientific and Statistical Data Mining (3)  Regression trees   Binary trees used for classification and prediction Similar to decision trees:Tests are performed at the internal nodes In a regression tree the mean of the objective attribute is computed and used as the predicted value Analysis of variance    Analyze experimental data for two or more populations described by a numeric response variable and one or more categorical variables (factors) 13
  • 14. Statistical Data Mining (4)   Factor analysis  determine which variables are combined to generate a given factor  e.g., for many psychiatric data, one can indirectly measure other quantities (such as test scores) that reflect the factor of interest Discriminant analysis  predict a categorical response variable, commonly used in social science  Attempts to determine several discriminant functions (linear combinations of the independent variables) that discriminate among the groups defined by the response variable www.spss.com/datamine/factor.htm 14
  • 15. Statistical Data Mining (5)  Time series: many methods such as autoregression, ARIMA (Autoregressive integrated moving-average modeling), long memory time-series modeling  Quality control: displays group summary charts  Survival analysis Predicts the probability that a patient undergoing a medical treatment would survive at least to time t (life span prediction)  15
  • 16. Other Methodologies of Data Mining  Statistical Data Mining  Views on Data Mining Foundations  Visual and Audio Data Mining 16
  • 17. Views on Data Mining Foundations (I)  Data reduction    Basis of data mining: Reduce data representation Trades accuracy for speed in response Data compression   Basis of data mining: Compress the given data by encoding in terms of bits, association rules, decision trees, clusters, etc. Probability and statistical theory  Basis of data mining: Discover joint probability distributions of random variables 17
  • 18. Views on Data Mining Foundations (II)  Microeconomic view   A view of utility: Finding patterns that are interesting only to the extent in that they can be used in the decision-making process of some enterprise Pattern Discovery and Inductive databases     Basis of data mining: Discover patterns occurring in the database, such as associations, classification models, sequential patterns, etc. Data mining is the problem of performing inductive logic on databases The task is to query the data and the theory (i.e., patterns) of the database Popular among many researchers in database systems 18
  • 19. Other Methodologies of Data Mining  Statistical Data Mining  Views on Data Mining Foundations  Visual and Audio Data Mining 19
  • 20. Visual Data Mining   Visualization: Use of computer graphics to create visual images which aid in the understanding of complex, often massive representations of data Visual Data Mining: discovering implicit but useful knowledge from large data sets using visualization techniques Multimedia Human Computer Systems Computer Graphics Interfaces Visual Data Mining High Pattern Performance Recognitio Computing n 20
  • 21. Visualization  Purpose of Visualization      Gain insight into an information space by mapping data onto graphical primitives Provide qualitative overview of large data sets Search for patterns, trends, structure, irregularities, relationships among data. Help find interesting regions and suitable parameters for further quantitative analysis. Provide a visual proof of computer representations derived 21
  • 22. Visual Data Mining & Data Visualization   Integration of visualization and data mining  data visualization  data mining result visualization  data mining process visualization  interactive visual data mining Data visualization  Data in a database or data warehouse can be viewed  at different levels of abstraction  as different combinations of attributes or dimensions  Data can be presented in various visual forms 22
  • 23. Data Mining Result Visualization   Presentation of the results or knowledge obtained from data mining in visual forms Examples  Scatter plots and boxplots (obtained from descriptive data mining)  Decision trees  Association rules  Clusters  Outliers  Generalized rules 23
  • 24. Boxplots from Statsoft: Multiple Variable Combinations 24
  • 25. Visualization of Data Mining Results in SAS Enterprise Miner: Scatter Plots 25
  • 26. Visualization of Association Rules in SGI/MineSet 3.0 26
  • 27. Visualization of a Decision Tree in SGI/MineSet 3.0 27
  • 28. Visualization of Cluster Grouping in IBM Intelligent Miner 28
  • 29. Data Mining Process Visualization  Presentation of the various processes of data mining in visual forms so that users can see  Data extraction process  Where the data is extracted  How the data is cleaned, integrated, preprocessed, and mined  Method selected for data mining  Where the results are stored  How they may be viewed 29
  • 30. Visualization of Data Mining Processes by Clementine See your solution discovery process clearly Understand variations with visualized data 30
  • 31. Interactive Visual Data Mining   Using visualization tools in the data mining process to help users make smart data mining decisions Example   Display the data distribution in a set of attributes using colored sectors or columns (depending on whether the whole space is represented by either a circle or a set of columns) Use the display to which sector should first be selected for classification and where a good split point for this sector may be 31
  • 33. Audio Data Mining      Uses audio signals to indicate the patterns of data or the features of data mining results An interesting alternative to visual mining An inverse task of mining audio (such as music) databases which is to find patterns from audio data Visual data mining may disclose interesting patterns using graphical displays, but requires users to concentrate on watching patterns Instead, transform patterns into sound and music and listen to pitches, rhythms, tune, and melody in order to identify anything interesting or unusual 33
  • 34. Chapter 13: Data Mining Trends and Research Frontiers  Mining Complex Types of Data  Other Methodologies of Data Mining  Data Mining Applications  Data Mining and Society  Data Mining Trends  Summary 34
  • 35. Data Mining Applications   Data mining: A young discipline with broad and diverse applications  There still exists a nontrivial gap between generic data mining methods and effective and scalable data mining tools for domain-specific applications Some application domains (briefly discussed here)  Data Mining for Financial data analysis  Data Mining for Retail and Telecommunication Industries  Data Mining in Science and Engineering  Data Mining for Intrusion Detection and Prevention  Data Mining and Recommender Systems 35
  • 36. Data Mining for Financial Data Analysis (I)   Financial data collected in banks and financial institutions are often relatively complete, reliable, and of high quality Design and construction of data warehouses for multidimensional data analysis and data mining  View the debt and revenue changes by month, by region, by sector, and by other factors Access statistical information such as max, min, total, average, trend, etc. Loan payment prediction/consumer credit policy analysis  feature selection and attribute relevance ranking  Loan payment performance  Consumer credit rating   36
  • 37. Data Mining for Financial Data Analysis (II)   Classification and clustering of customers for targeted marketing  multidimensional segmentation by nearest-neighbor, classification, decision trees, etc. to identify customer groups or associate a new customer to an appropriate customer group Detection of money laundering and other financial crimes  integration of from multiple DBs (e.g., bank transactions, federal/state crime history DBs)  Tools: data visualization, linkage analysis, classification, clustering tools, outlier analysis, and sequential pattern analysis tools (find unusual access sequences) 37
  • 38. Data Mining for Retail & Telcomm. Industries (I)   Retail industry: huge amounts of data on sales, customer shopping history, e-commerce, etc. Applications of retail data mining  Identify customer buying behaviors  Discover customer shopping patterns and trends  Improve the quality of customer service  Achieve better customer retention and satisfaction  Enhance goods consumption ratios   Design more effective goods transportation and distribution policies Telcomm. and many other industries: Share many similar goals and expectations of retail data mining 38
  • 39. Data Mining Practice for Retail Industry   Design and construction of data warehouses Multidimensional analysis of sales, customers, products, time, and region  Analysis of the effectiveness of sales campaigns  Customer retention: Analysis of customer loyalty    Use customer loyalty card information to register sequences of purchases of particular customers Use sequential pattern mining to investigate changes in customer consumption or loyalty Suggest adjustments on the pricing and variety of goods  Product recommendation and cross-reference of items  Fraudulent analysis and the identification of usual patterns  Use of visualization tools in data analysis 39
  • 40. Data Mining in Science and Engineering  Data warehouses and data preprocessing   Mining complex data types   Resolving inconsistencies or incompatible data collected in diverse environments and different periods (e.g. eco-system studies) Spatiotemporal, biological, diverse semantics and relationships Graph-based and network-based mining  Links, relationships, data flow, etc.  Visualization tools and domain-specific knowledge  Other issues   Data mining in social sciences and social studies: text and social media Data mining in computer science: monitoring systems, software bugs, network intrusion 40
  • 41. Data Mining for Intrusion Detection and Prevention  Majority of intrusion detection and prevention systems use    Signature-based detection: use signatures, attack patterns that are preconfigured and predetermined by domain experts Anomaly-based detection: build profiles (models of normal behavior) and detect those that are substantially deviate from the profiles What data mining can help   New data mining algorithms for intrusion detection Association, correlation, and discriminative pattern analysis help select and build discriminative classifiers  Analysis of stream data: outlier detection, clustering, model shifting  Distributed data mining  Visualization and querying tools 41
  • 42. Data Mining and Recommender Systems    Recommender systems: Personalization, making product recommendations that are likely to be of interest to a user Approaches: Content-based, collaborative, or their hybrid  Content-based: Recommends items that are similar to items the user preferred or queried in the past  Collaborative filtering: Consider a user's social environment, opinions of other customers who have similar tastes or preferences Data mining and recommender systems  Users C × items S: extract from known to unknown ratings to predict user-item combinations  Memory-based method often uses k-nearest neighbor approach  Model-based method uses a collection of ratings to learn a model (e.g., probabilistic models, clustering, Bayesian networks, etc.)  Hybrid approaches integrate both to improve performance (e.g., using ensemble) 42
  • 43. Chapter 13: Data Mining Trends and Research Frontiers  Mining Complex Types of Data  Other Methodologies of Data Mining  Data Mining Applications  Data Mining and Society  Data Mining Trends  Summary 43
  • 44. Ubiquitous and Invisible Data Mining  Ubiquitous Data Mining    Data mining is used everywhere, e.g., online shopping Ex. Customer relationship management (CRM) Invisible Data Mining      Invisible: Data mining functions are built in daily life operations Ex. Google search: Users may be unaware that they are examining results returned by data Invisible data mining is highly desirable Invisible mining needs to consider efficiency and scalability, user interaction, incorporation of background knowledge and visualization techniques, finding interesting patterns, real-time, … Further work: Integration of data mining into existing business and scientific technologies to provide domain-specific data mining tools 44
  • 45. Privacy, Security and Social Impacts of Data Mining  Many data mining applications do not touch personal data    E.g., meteorology, astronomy, geography, geology, biology, and other scientific and engineering data Many DM studies are on developing scalable algorithms to find general or statistically significant patterns, not touching individuals The real privacy concern: unconstrained access of individual records, especially privacy-sensitive information  Method 1: Removing sensitive IDs associated with the data  Method 2: Data security-enhancing methods    Multi-level security model: permit to access to only authorized level Encryption: e.g., blind signatures, biometric encryption, and anonymous databases (personal information is encrypted and stored at different locations) Method 3: Privacy-preserving data mining methods 45
  • 46. Privacy-Preserving Data Mining   Privacy-preserving (privacy-enhanced or privacy-sensitive) mining:  Obtaining valid mining results without disclosing the underlying sensitive data values  Often needs trade-off between information loss and privacy Privacy-preserving data mining methods:  Randomization (e.g., perturbation): Add noise to the data in order to mask some attribute values of records  K-anonymity and l-diversity: Alter individual records so that they cannot be uniquely identified     k-anonymity: Any given record maps onto at least k other records l-diversity: enforcing intra-group diversity of sensitive values Distributed privacy preservation: Data partitioned and distributed either horizontally, vertically, or a combination of both Downgrading the effectiveness of data mining: The output of data mining may violate privacy  Modify data or mining results, e.g., hiding some association rules or slightly distorting some classification models 46
  • 47. Chapter 13: Data Mining Trends and Research Frontiers  Mining Complex Types of Data  Other Methodologies of Data Mining  Data Mining Applications  Data Mining and Society  Data Mining Trends  Summary 47
  • 48. Trends of Data Mining  Application exploration: Dealing with application-specific problems  Scalable and interactive data mining methods  Integration of data mining with Web search engines, database systems, data warehouse systems and cloud computing systems  Mining social and information networks  Mining spatiotemporal, moving objects and cyber-physical systems  Mining multimedia, text and web data  Mining biological and biomedical data  Data mining with software engineering and system engineering  Visual and audio data mining  Distributed data mining and real-time data stream mining  Privacy protection and information security in data mining 48
  • 49. Chapter 13: Data Mining Trends and Research Frontiers  Mining Complex Types of Data  Other Methodologies of Data Mining  Data Mining Applications  Data Mining and Society  Data Mining Trends  Summary 49
  • 50. Summary   We present a high-level overview of mining complex data types Statistical data mining methods, such as regression, generalized linear models, analysis of variance, etc., are popularly adopted  Researchers also try to build theoretical foundations for data mining  Visual/audio data mining has been popular and effective     Application-based mining integrates domain-specific knowledge with data analysis techniques and provide mission-specific solutions Ubiquitous data mining and invisible data mining are penetrating our data lives Privacy and data security are importance issues in data mining, and privacy-preserving data mining has been developed recently Our discussion on trends in data mining shows that data mining is a promising, young field, with great, strategic importance 50
  • 51. References and Further Reading  The books lists a lot of references for further reading. Here we only list a few books  E. Alpaydin. Introduction to Machine Learning, 2nd ed., MIT Press, 2011        S. Chakrabarti. Mining the Web: Statistical Analysis of Hypertex and Semi-Structured Data . Morgan Kaufmann, 2002 R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification, 2ed., Wiley-Interscience, 2000 D. Easley and J. Kleinberg. Networks, Crowds, and Markets: Reasoning about a Highly Connected World. Cambridge University Press, 2010. U. Fayyad, G. Grinstein, and A. Wierse (eds.), Information Visualization in Data Mining and Knowledge Discovery, Morgan Kaufmann, 2001 J. Han, M. Kamber, J. Pei. Data Mining: Concepts and Techniques . Morgan Kaufmann, 3rd ed. 2011 T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed., Springer-Verlag, 2009 D. Koller and N. Friedman. Probabilistic Graphical Models: Principles and Techniques . MIT Press, 2009.  B. Liu. Web Data Mining, Springer 2006.  T. M. Mitchell. Machine Learning, McGraw Hill, 1997  M. Newman. Networks: An Introduction. Oxford University Press, 2010.  P.-N. Tan, M. Steinbach and V. Kumar, Introduction to Data Mining, Wiley, 2005  I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann, 2nd ed. 2005 51
  • 52. 52

Hinweis der Redaktion

  1. Mixed-effects models provide a powerful and flexible tool for the analysis of balanced and unbalanced grouped data. These data arise in several areas of investigation and are characterized by the presence of correlation between observations within the same group. Some examples are repeated measures data, longitudinal studies, and nested designs. Classical modeling techniques which assume independence of the observations are not appropriate for grouped data.
  2. How the interactive Clementine knowledge discovery process works See your solution discovery process clearlyThe interactive stream approach to data mining is the key to Clementine's power. Using icons that represent steps in the data mining process, you mine your data by building a stream - a visual map of the process your data flows through. Start by simply dragging a source icon from the object palette onto the Clementine desktop to access your data flow. Then, explore your data visually with graphs. Apply several types of algorithms to build your model by simply placing the appropriate icons onto the desktop to form a stream. Discover knowledge interactivelyData mining with Clementine is a "discovery-driven" process. Work toward a solution by applying your business expertise to select the next step in your stream, based on the discoveries made in the previous step. You can continually adapt or extend initial streams as you work through the solution to your business problem. Easily build and test modelsAll of Clementine's advanced techniques work together to quickly give you the best answer to your business problems. You can build and test numerous models to immediately see which model produces the best result. Or you can even combine models by using the results of one model as input into another model. These "meta-models" consider the initial model's decisions and can improve results substantially. Understand variations in your business with visualized dataPowerful data visualization techniques help you understand key relationships in your data and guide the way to the best results. Spot characteristics and patterns at a glance with Clementine's interactive graphs. Then "query by mouse" to explore these patterns by selecting subsets of data or deriving new variables on the fly from discoveries made within the graph. How Clementine scales to the size of the challengeThe Clementine approach to scaling is unique in the way it aims to scale the complete data mining process to the size of large, challenging datasets. Clementine executes common operations used throughout the data mining process in the database through SQL queries. This process leverages the power of the database for faster processing, enabling you to get better results with large datasets.
  3. Buying patterns, targeted marketing