Competitive advantage from Data
Mining: some lessons learnt in the
Information Systems field
Mykola Pechenizkiy, Seppo Puuronen
Department of Computer Science
University of Jyväskylä
Finland
Alexey Tsymbal
Department of Computer Science
Trinity College Dublin
Ireland
PMKD’05 Copenhagen, Denmark August 22-26, 2005
Competitive advantage from DM: lessons learnt in the IS field by M.
Outline
• Introduction and What is our message?
• Part I: Existing frameworks for DM
– Theory-oriented: Databases; Statistics; Machine learning; etc
– Process-oriented: Fayyad’s, CRISP, Reinartz’s
• Part II: Where are we? – rigor vs. relevance in DM
• Part III: Towards the new framework for DM
research
– DM System as adaptive Information System (IS)
– DM research as IS Development: DM system as artefact
– DM success model: success factors
– KM Challenges in KDD
– One possible reference for new DM research framework
• Further plans and Discussion
What is Data Mining
Data mining or Knowledge discovery is the process of
finding previously unknown and potentially interesting
patterns and relations in large databases (Fayyad, KDD’96)
Data mining is the emerging science and industry of
applying modern statistical and computational
technologies to the problem of finding useful patterns
hidden within large databases (John 1997)
Intersection of many fields: statistics, AI, machine
learning, databases, neural networks, pattern recognition,
econometrics, etc.
H. Information Systems
▪ H.0 GENERAL
▪ H.1 MODELS AND PRINCIPLES
▪ H.2 DATABASE MANAGEMENT
• H.2.0 General
– Security, integrity, and protection
• H.2.8 Database Applications
– Data mining
– Image databases
– Scientific databases
– Spatial databases and GIS
– Statistical databases
• H.2.m Miscellaneous
http://www.acm.org/class/1998/ valid in 2003
I. Computing Methodologies
▪ I.5 PATTERN RECOGNITION
• I.5.0 General
• I.5.1 Models
– Deterministic
– Fuzzy set
– Geometric
– Neural nets
– Statistical
– Structural
• I.5.2 Design Methodology
– Classifier design &
evaluation
– Feature evaluation &
selection
– Pattern analysis
• I.5.3 Clustering
– Algorithms
– Similarity measures
• I.5.4 Applications
– Computer vision
– Signal processing
– Text processing
– Waveform analysis
▪ I.2 ARTIFICIAL INTELLIGENCE
• I.2.0 General
– Cognitive simulation
– Philosophical foundations
• I.2.1 Applications and Expert Systems
• I.2.2 Automatic Programming
• I.2.3 Deduction and Theorem Proving
• I.2.4 Knowledge Representation
Formalisms and Methods
• I.2.5 Programming Languages and
Software
• I.2.6 Learning
– Analogies
– Concept learning
– Connectionism and neural nets
– Induction
– Knowledge acquisition
– Language acquisition
– Parameter learning
• I.2.7 Natural Language Processing
• I.2.m Miscellaneous
G. Mathematics of Computing
▪ G.3 PROBABILITY AND STATISTICS
• Correlation and regression analysis
• Distribution functions
• Experimental design
• Markov processes
• Multivariate statistics
• Nonparametric statistics
• Probabilistic algorithms (including Monte Carlo)
• Statistical computing
Our Message
• DM is still a technology of which much is expected: it should
enable organizations to benefit more from their huge
databases.
• There are some success stories in which
organizations have managed to gain competitive
advantage from DM.
• Still, the strong focus of most DM researchers on
technology-oriented topics does not support
expanding the scope to less rigorous but
practically very relevant sub-areas.
• Research in the IS discipline has a strong tradition
of taking into account the human and organizational
aspects of systems besides the technical ones.
Our Message
• DM-supporting processes that take human and organizational
aspects into account are still in their infancy.
• The DM community might benefit, at least from a practical point of
view, from looking at older sub-areas of IT that have a tradition of
considering solution-driven concepts with a focus also on human and
organizational aspects.
• The DM community by becoming more amenable to research results
of the IS community might be able to increase its collective
understanding of
– how DM artifacts are developed – conceived, constructed, and
implemented,
– how DM artifacts are used, supported and evolved,
– how DM artifacts impact and are impacted by the contexts in
which they are embedded.
Part I
• Existing Frameworks for DM
– Theory-oriented
• Databases;
• Statistics;
• Machine learning;
• Data compression
– Process-oriented
• Fayyad’s
• CRISP-DM
• Reinartz’s
Theory-Oriented Frameworks
Database Perspective
• DM as application to DBs
– “In the same way business applications are currently supported
using SQL-based API, the KDD applications need to be provided
with application development support.”
– query KDD objects, support for finding NNs, clustering, or
discretization and aggregate operations.
• Inductive databases approach
– query concept should be applied also to data mining and
knowledge discovery tasks
• “there is no such thing as discovery, it is all in the power of the
query language”
– contain not only the data but the theory of the data as well
Imielinski, T., and Mannila, H. 1996, A database perspective on knowledge discovery.
Communications of the ACM, 39(11), 58-64.
Boulicaut, J., Klemettinen, M., and Mannila, H. 1999, Modeling KDD processes within
the inductive database framework. In Proceedings of the First International
Conference on Data Warehousing and Knowledge Discovery, Springer-Verlag,
London, 293-302
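The query-centred view above can be illustrated with a minimal sketch. It assumes a hypothetical `transactions(tid, item)` table in an in-memory SQLite database; the table, the item names and the helpers `build_demo_db` and `frequent_pairs` are illustrative assumptions, not part of the cited work. A simple pattern (item pairs with their support counts) is obtained purely through a SQL query:

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a few hypothetical transactions."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transactions (tid INTEGER, item TEXT)")
    rows = [
        (1, "bread"), (1, "milk"),
        (2, "bread"), (2, "milk"), (2, "beer"),
        (3, "milk"), (3, "beer"),
        (4, "bread"), (4, "milk"),
    ]
    conn.executemany("INSERT INTO transactions VALUES (?, ?)", rows)
    return conn


def frequent_pairs(conn, min_support):
    """Return (item_a, item_b, support) for item pairs that occur together
    in at least min_support transactions.

    Assumes each (tid, item) combination is stored at most once.
    """
    query = """
        SELECT a.item, b.item, COUNT(*) AS support
        FROM transactions AS a
        JOIN transactions AS b
          ON a.tid = b.tid AND a.item < b.item
        GROUP BY a.item, b.item
        HAVING COUNT(*) >= ?
        ORDER BY support DESC, a.item, b.item
    """
    return conn.execute(query, (min_support,)).fetchall()


if __name__ == "__main__":
    print(frequent_pairs(build_demo_db(), min_support=2))
```

The self-join on `tid` enumerates co-occurring items, and the `HAVING` clause plays the role of a minimum-support threshold; a full inductive database would go further and let such patterns be stored and queried themselves.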
Reductionism Approach
• Two basic Statistical Paradigms
– “Statistical Experiment”
• Fisher’s version, inductive principle of maximum likelihood
• Neyman and Pearson-Wald’s version, inductive behaviour
• Bayesian version, maximum posterior probability
• “Statistical learning from empirical process”
– “Structural Data Analysis”
• SVD
• Data mining ≠ statistics: the issue of computational feasibility has a
much clearer role in data mining than in statistics
– data mining approaches emphasize database integration,
simplicity of use, and the understandability of results
– the theoretical framework of statistics is not much concerned with data
analysis as a process that includes several steps
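The difference between the maximum likelihood and the maximum posterior probability principles listed above can be seen in a small sketch. It assumes the simplest possible setting, a Bernoulli success probability with a Beta prior; the function names and the default prior Beta(2, 2) are illustrative choices, not taken from the slides.

```python
def ml_estimate(successes, trials):
    """Maximum likelihood estimate of a Bernoulli success probability."""
    return successes / trials


def map_estimate(successes, trials, alpha=2.0, beta=2.0):
    """Maximum a posteriori estimate under a Beta(alpha, beta) prior.

    The posterior is Beta(successes + alpha, trials - successes + beta);
    the value returned is its mode, which is well defined when both
    posterior parameters exceed 1.
    """
    return (successes + alpha - 1) / (trials + alpha + beta - 2)


if __name__ == "__main__":
    # Three successes in three trials: ML commits fully to the data,
    # while the prior pulls the MAP estimate towards 0.5.
    print(ml_estimate(3, 3), map_estimate(3, 3))
```

With a uniform prior (`alpha=1`, `beta=1`) the two estimates coincide, which is one way to see maximum likelihood as a special case of the Bayesian version.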
Machine Learning Approach
• “let the data suggest a model” can be seen as a
practical alternative to the statistical paradigm “fit a
model to the data”
• Constructive Induction – a learning process with two
intertwined phases: construction of the “best”
representation space and generation of hypotheses in the
space found (Michalski & Wnek, 1993).
– Feature transformation (PCA, SVD, Random
Projection)
– Feature selection
– LSI
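As a rough illustration of the feature transformation step, the following sketch approximates the first principal component (PCA) of a small data set. It uses plain power iteration on the sample covariance matrix rather than a library SVD routine; that choice, the function names and the toy data are assumptions made only to keep the example self-contained.

```python
import math


def first_principal_component(rows, iterations=200):
    """Approximate the leading principal component by power iteration
    on the sample covariance matrix (pure Python, small data only)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # A fixed start vector keeps the run deterministic; it would fail only
    # if it happened to be orthogonal to the leading component.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance along the current direction: keep v
        v = [x / norm for x in w]
    return means, v


def project(rows, means, component):
    """Map each row to its one-dimensional coordinate along the component."""
    return [sum((r[j] - means[j]) * component[j] for j in range(len(means)))
            for r in rows]


if __name__ == "__main__":
    data = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
    means, pc = first_principal_component(data)
    print(pc, project(data, means, pc))
```

The new one-dimensional representation is the "constructed" space; a learner would then generate hypotheses in it instead of in the original feature space.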
Data Compression Approach
– Compress the data set by finding some structure or
knowledge in it, where knowledge is interpreted as a
representation that allows the data to be coded using fewer
bits.
– Theories should not be ad hoc, that is, they should not
overfit the examples used to build them.
– Occam’s razor principle, 14th century.
• "when you have two competing models which
make exactly the same predictions, the one that
is simpler is the better".
Mehta, M., Rissanen, J., and Agrawal, R. 1995, MDL-based decision tree pruning.
In U.M. Fayyad, R. Uthurusamy (Eds.) Proceedings of the KDD 1995, AAAI Press,
Montreal, Canada, 216-221.
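The idea of preferring the representation that codes the data in fewer bits can be sketched with a two-part description length. This is a toy illustration under stated assumptions, not the decision tree pruning method of the cited paper: the data are a binary sequence, the two candidate models are a "fair" and a "biased" coin, and the parameter cost of 0.5 · log2(n) bits is a commonly used approximation.

```python
import math


def entropy_bits(p):
    """Binary entropy H(p) in bits, with H(0) = H(1) = 0 by convention."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def description_lengths(ones, n):
    """Two-part code lengths in bits for a binary sequence of length n
    that contains `ones` ones, under two candidate models."""
    fair = float(n)  # no parameter to encode; one bit per symbol
    p_hat = ones / n
    # Model cost (approximate bits for the fitted parameter) plus data cost.
    biased = 0.5 * math.log2(n) + n * entropy_bits(p_hat)
    return {"fair": fair, "biased": biased}


def mdl_choice(ones, n):
    """Pick the model with the shorter total description length."""
    lengths = description_lengths(ones, n)
    return min(lengths, key=lengths.get)


if __name__ == "__main__":
    print(mdl_choice(90, 100))  # strong structure: the extra parameter pays off
    print(mdl_choice(52, 100))  # near-random data: the simpler model wins
```

The second case shows the guard against overfitting: the biased model fits the sample slightly better, but not by enough to pay for its extra parameter.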
Other Theoretical frameworks for DM
• Microeconomic view
– the key point is that data mining is about finding actionable
patterns: the only interest is in patterns that can somehow be used
to increase utility;
– a decision-theoretic formulation of this principle: the goal can be
formulated as finding a decision x that maximises a utility
function f(x).
Kleinberg, J., Papadimitriou, C., and Raghavan, P. 1998, A microeconomic view of data
mining, Data Mining and Knowledge Discovery 2(4), 311-324
• Philosophy of Science
– logical empiricism, critical rationalism, systems theory
– formism, mechanism, contextualism
– dispersive vs. integrative, analytical vs. synthetic theories
– subjectivist vs. objectivist, nomothetic vs. idiographic,
nominalism vs. realism, voluntarism vs. determinism,
epistemological assumptions
– Explanation, prediction, understanding
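The decision-theoretic formulation of the microeconomic view above (find a decision x that maximises a utility function f(x)) can be sketched as follows. The offers, the customer response probabilities and all function names are hypothetical, and summing a per-customer utility is only one simple way to define f(x).

```python
def best_decision(decisions, customers, utility):
    """Return the decision x that maximises f(x), where f(x) is the sum
    of the per-customer utilities, together with all scores."""
    def f(x):
        return sum(utility(x, c) for c in customers)

    scores = {x: f(x) for x in decisions}
    return max(scores, key=scores.get), scores


# Hypothetical data: each offer has a margin and a mailing cost, and each
# customer has an estimated probability of responding to each offer.
OFFERS = {"discount": (5.0, 1.0), "premium": (20.0, 1.0)}  # (margin, cost)
CUSTOMERS = [
    {"discount": 0.5, "premium": 0.1},
    {"discount": 0.6, "premium": 0.2},
    {"discount": 0.4, "premium": 0.05},
]


def expected_profit(offer, customer):
    """Expected profit of sending one offer to one customer."""
    margin, cost = OFFERS[offer]
    return customer[offer] * margin - cost


if __name__ == "__main__":
    print(best_decision(list(OFFERS), CUSTOMERS, expected_profit))
```

In this reading a mined pattern (here, the response probabilities) is interesting only to the extent that it changes which decision comes out best.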
Process-Oriented Frameworks
Knowledge discovery as a process
Fayyad, U., Piatetsky-Shapiro, G., Smyth, P., Uthurusamy, R.,
Advances in Knowledge Discovery and Data Mining, AAAI/MIT Press, 1997.
CRISP-DM
http://www.crisp-dm.org/
KDD: “Vertical Solutions”
[Figure: Business Understanding → Data Understanding → Data Preparation → Data Exploration → Data Mining → Evaluation & Interpretation → Deployment, with Experience accumulation]
Reinartz, T. 1999, Focusing Solutions for Data Mining.
LNAI 1623, Berlin Heidelberg.
Conclusion on different frameworks
– The reductionist approach of viewing data mining as statistics has
the advantages of a strong theoretical background and easily formulated
problems.
– Data mining tasks such as clustering,
regression and classification fit easily into these approaches.
– More recent (process-oriented) frameworks address the issues
related to a view of data mining as a process, and its iterative
and interactive nature.
Part II
Where are we?
Rigor and Relevance in DM Research
So, where are we?
• Lin (in Wu et al.) notes that a successful new
industry (such as DM) can go through consecutive phases:
1. discovering a new idea,
2. ensuring its applicability,
3. producing small-scale systems to test the market,
4. better understanding of new technology and
5. producing a fully scaled system.
• At present there are several dozen
DM systems, none of which can be compared
in scale to a DBMS.
– This indicates that we are still in the 3rd phase in
the DM area!
Rigor vs Relevance in DM Research
[Figure: relevance vs. rigor]
Where is the focus?
• Still on speeding up, scaling up, and increasing the accuracy of
DM techniques.
• Piatetsky-Shapiro: “we see many papers proposing incremental
refinements in association rules algorithms, but very few papers
describing how the discovered association rules are used”
• Lin claims that the research and development goals of DM are quite
different, since research is knowledge-oriented while development is
profit-oriented:
– DM research concentrates on the development of new
algorithms or their enhancements,
– whereas DM developers in domain areas are aware of cost
considerations: investment in research, product development,
marketing, and product support.
• However, we believe that the study of the DM development and DM
use processes is as important as the technological aspects, and
therefore such research activities are likely to emerge within the DM
field.
Part III
Towards the new framework for
DM research
DMS in the Kernel of an Organization
[Figure labels: DM Task(s); DMS (Artifact); Organization; Environment]
• DM is a fundamentally application-oriented area motivated by business
and scientific needs to make sense of mountains of data.
• A DMS is generally used to support or perform some task(s) of human
beings in an organizational environment; both the people and the
organization have their own expectations of the DMS.
• Further, the organization has its own environment, which has its own
interests related to the DMS, e.g. that the privacy of people is not violated.
The ISs-based paradigm for DM
Ives B., Hamilton S., Davis G. (1980). “A Framework for Research in Computer-based MIS”
Management Science, 26(9), 910-934.
“Information systems are powerful instruments for organizational
problem solving through formal information processing”
Lyytinen, K., 1987, “Different perspectives on ISs: problems and solutions.” ACM Computing Surveys, 19(1), 5-46.
[Figure labels: User Environment; IS Development Environment; IS Operations Environment; The Use Process; The Development Process; The Operation Process; The Information Subsystem (ISS); The Organizational Environment; The External Environment]
DM Artifact Development
[Figure labels: DM Artifact Development; Experimentation; Theory Building; Observation]
Adapted from: Nunamaker, W., Chen, M., and Purdin, T. 1990-91, Systems
development in information systems research, Journal of Management
Information Systems, 7(3), 89-106.
A multimethodological approach to the construction of an artefact for DM
Research methods in a paper on DM
– Theoretical approach: theory creating
• Hypothesis, new algorithm, etc.
– Constructive approach
• Prototype of a DM tool
– Theoretical approach: theory testing and evaluation
• Artificial, benchmark, real-world data
• Evaluation techniques
– Conclusion on theory
The Action Research and Design Science Approach to Artifact Creation
[Figure labels: Design Knowledge; Awareness of business problem; Action planning; Action taking; Conclusion; Business Knowledge; Artifact Development; Artifact Evaluation; Contextual Knowledge]
DM Artifact Use: Success Model 1 of 3
[Figure labels: System Quality; Information Quality; Service Quality; Use; User Satisfaction; Individual Impact; Organizational Impact]
Adapted from D&M IS Success Models
DM Artifact Use: Success Model 2 of 3
• What are the key factors of successful use and impact of a
DMS at both the individual and organizational levels?
1. how the system is used, and also supported and
evolved, and
2. how the system impacts and is impacted by the
contexts in which it is embedded.
Coppock on the failure factors of DM-related projects:
• they have nothing to do with the skill of the modeler or the
quality of data,
• but they do include:
1. persons in charge of the project did not formulate
actionable insights,
2. the sponsors of the work did not communicate the
insights derived to key constituents,
3. the results do not agree with institutional truths.
Leadership, communication skills and
understanding of the culture of the organization are
no less important than the traditionally emphasized
technological job of turning data into insights.
DM Artifact Use: Success Model 3 of 3
• Hermiz argues that there are
four critical success factors for DM projects:
• (1) having a clearly articulated business problem that needs
to be solved and for which DM is a proper tool;
• (2) ensuring that the problem being pursued is supported by
the right type of data of sufficient quality and in sufficient
quantity for DM;
• (3) recognizing that DM is a process with many components
and dependencies – the entire project cannot be "managed"
in the traditional sense of the business word;
• (4) planning to learn from the DM process regardless of the
outcome, and clearly understanding that there is no
guarantee that any given DM project will be successful.
KM Perspective
• A knowledge-driven approach to enhance the
dynamic integration of DM strategies in knowledge
discovery systems.
• Focus here is on knowledge management aimed to
organise a systematic process of (meta-)knowledge
capture and refinement over time.
– knowledge extracted from data
– the higher-level knowledge required for managing DM
techniques’ selection, combination and application
• Basic knowledge management processes of
– knowledge creation and identification,
representation, collection and organization,
sharing, adaptation, and application
• DEXA’05: TAKMA WS paper & presentation are available
[Figure labels: Knowledge Creation & Acquisition; Knowledge Organization & Storage; Knowledge Distribution & Integration; Knowledge Adaptation & Application; Knowledge Evaluation, Validation and Refinement]
New Research Framework for DM Research
[Figure: Environment, DM Research and Knowledge Base, linked by Relevance and Rigor]
Environment
– People: Roles, Capabilities, Characteristics
– Organizations: Strategy, Structure & Culture, Processes
– Technology: Infrastructure, Applications, Communications Architecture, Development Capabilities
DM Research
– Develop/Build: Theories, Artifacts
– Justify/Evaluate: Analytical, Case Study, Experimental, Field Study, Simulation
– Assess, Refine
Knowledge Base
– Foundations: Base-level theories, Frameworks, Models, Instantiation, Validation Criteria
– Design knowledge: Methodologies, Validation Criteria (not instantiations of models but KDD processes, services, systems)
Relevance: Business Needs; (Un-)Successful Applications in the appropriate environment
Rigor: Applicable Knowledge; Contribution to Knowledge Base
Further Work
• Definition of the Relevance concept in DM research
• The revision of the book chapter
• Further work on the new framework for DM
research
• Organization of a Workshop, Special Track or
Working conference on
– more social directions in DM research, likely with one
focus on IS as a sister discipline.
A few options:
– IRIS Scandinavian Conference on IS
– Next PMKD
– Workshop in Jyväskylä
Thank You!
Book chapter draft is available on request from
Mykola Pechenizkiy
Department of Computer Science and Information Systems,
University of Jyväskylä, FINLAND
E-mail: mpechen@cs.jyu.fi
Tel.: +358 14 2602472 Fax: +358 14 260 3011
http://www.cs.jyu.fi/~mpechen
Feedback is very welcome:
• Questions
• Suggestions
• Collaboration

Weitere ähnliche Inhalte

Andere mochten auch

Research Papers Recommender based on Digital Repositories Metadata
Research Papers Recommender based on Digital Repositories MetadataResearch Papers Recommender based on Digital Repositories Metadata
Research Papers Recommender based on Digital Repositories MetadataRicard de la Vega
 
Data mining paper survey for Health Care Support System
Data mining paper survey for Health Care Support SystemData mining paper survey for Health Care Support System
Data mining paper survey for Health Care Support System鴻鈞 王
 
Teaching with Google Books: research, copyright, and data mining
Teaching with Google Books: research, copyright, and data miningTeaching with Google Books: research, copyright, and data mining
Teaching with Google Books: research, copyright, and data miningNathan Rinne
 
Preservaçao digital de tese e dissertaçoes
Preservaçao digital de tese e dissertaçoesPreservaçao digital de tese e dissertaçoes
Preservaçao digital de tese e dissertaçoesRicard de la Vega
 
Social Targeting: Understanding Social Media Data Mining & Analysis
Social Targeting: Understanding Social Media Data Mining & AnalysisSocial Targeting: Understanding Social Media Data Mining & Analysis
Social Targeting: Understanding Social Media Data Mining & AnalysisInfini Graph
 
Google File System
Google File SystemGoogle File System
Google File Systemnadikari123
 
Preprocessing of Academic Data for Mining Association Rule, Presentation @WAD...
Preprocessing of Academic Data for Mining Association Rule, Presentation @WAD...Preprocessing of Academic Data for Mining Association Rule, Presentation @WAD...
Preprocessing of Academic Data for Mining Association Rule, Presentation @WAD...shibbirtanvin
 
Mining the Social Web to Analyze the Impact of Social Media on Socialization,...
Mining the Social Web to Analyze the Impact of Social Media on Socialization,...Mining the Social Web to Analyze the Impact of Social Media on Socialization,...
Mining the Social Web to Analyze the Impact of Social Media on Socialization,...shibbirtanvin
 
Medical Informatics: Computational Analytics in Healthcare
Medical Informatics: Computational Analytics in HealthcareMedical Informatics: Computational Analytics in Healthcare
Medical Informatics: Computational Analytics in HealthcareNUS-ISS
 
Emotion detection from text using data mining and text mining
Emotion detection from text using data mining and text miningEmotion detection from text using data mining and text mining
Emotion detection from text using data mining and text miningSakthi Dasans
 
Clinical Decision Support Systems - Sunil Nair Health Informatics Dalhousie U...
Clinical Decision Support Systems - Sunil Nair Health Informatics Dalhousie U...Clinical Decision Support Systems - Sunil Nair Health Informatics Dalhousie U...
Clinical Decision Support Systems - Sunil Nair Health Informatics Dalhousie U...Sunil Nair
 
Recommender Systems - A Review and Recent Research Trends
Recommender Systems  -  A Review and Recent Research TrendsRecommender Systems  -  A Review and Recent Research Trends
Recommender Systems - A Review and Recent Research TrendsSujoy Bag
 
Hadoop HDFS Architeture and Design
Hadoop HDFS Architeture and DesignHadoop HDFS Architeture and Design
Hadoop HDFS Architeture and Designsudhakara st
 
Amazon Item-to-Item Recommendations
Amazon Item-to-Item RecommendationsAmazon Item-to-Item Recommendations
Amazon Item-to-Item RecommendationsRoger Chen
 
Data mining by_ashok
Data mining by_ashokData mining by_ashok
Data mining by_ashokAshok Kumar
 

Andere mochten auch (19)

Research Papers Recommender based on Digital Repositories Metadata
Research Papers Recommender based on Digital Repositories MetadataResearch Papers Recommender based on Digital Repositories Metadata
Research Papers Recommender based on Digital Repositories Metadata
 
Panel at AMIA 2013 Conference on big data - The Exposome and the quantified s...
Panel at AMIA 2013 Conference on big data - The Exposome and the quantified s...Panel at AMIA 2013 Conference on big data - The Exposome and the quantified s...
Panel at AMIA 2013 Conference on big data - The Exposome and the quantified s...
 
Data mining paper survey for Health Care Support System
Data mining paper survey for Health Care Support SystemData mining paper survey for Health Care Support System
Data mining paper survey for Health Care Support System
 
Teaching with Google Books: research, copyright, and data mining
Teaching with Google Books: research, copyright, and data miningTeaching with Google Books: research, copyright, and data mining
Teaching with Google Books: research, copyright, and data mining
 
Preservaçao digital de tese e dissertaçoes
Preservaçao digital de tese e dissertaçoesPreservaçao digital de tese e dissertaçoes
Preservaçao digital de tese e dissertaçoes
 
Social Targeting: Understanding Social Media Data Mining & Analysis
Social Targeting: Understanding Social Media Data Mining & AnalysisSocial Targeting: Understanding Social Media Data Mining & Analysis
Social Targeting: Understanding Social Media Data Mining & Analysis
 
Google File System
Google File SystemGoogle File System
Google File System
 
Ymag56 hr
Ymag56 hrYmag56 hr
Ymag56 hr
 
Preprocessing of Academic Data for Mining Association Rule, Presentation @WAD...
Preprocessing of Academic Data for Mining Association Rule, Presentation @WAD...Preprocessing of Academic Data for Mining Association Rule, Presentation @WAD...
Preprocessing of Academic Data for Mining Association Rule, Presentation @WAD...
 
Mining the Social Web to Analyze the Impact of Social Media on Socialization,...
Mining the Social Web to Analyze the Impact of Social Media on Socialization,...Mining the Social Web to Analyze the Impact of Social Media on Socialization,...
Mining the Social Web to Analyze the Impact of Social Media on Socialization,...
 
Data mining
Data mining Data mining
Data mining
 
Medical Informatics: Computational Analytics in Healthcare
Medical Informatics: Computational Analytics in HealthcareMedical Informatics: Computational Analytics in Healthcare
Medical Informatics: Computational Analytics in Healthcare
 
Data mining notes
Data mining notesData mining notes
Data mining notes
 
Emotion detection from text using data mining and text mining
Emotion detection from text using data mining and text miningEmotion detection from text using data mining and text mining
Emotion detection from text using data mining and text mining
 
Clinical Decision Support Systems - Sunil Nair Health Informatics Dalhousie U...
Clinical Decision Support Systems - Sunil Nair Health Informatics Dalhousie U...Clinical Decision Support Systems - Sunil Nair Health Informatics Dalhousie U...
Clinical Decision Support Systems - Sunil Nair Health Informatics Dalhousie U...
 
Recommender Systems - A Review and Recent Research Trends
Recommender Systems  -  A Review and Recent Research TrendsRecommender Systems  -  A Review and Recent Research Trends
Recommender Systems - A Review and Recent Research Trends
 
Hadoop HDFS Architeture and Design
Hadoop HDFS Architeture and DesignHadoop HDFS Architeture and Design
Hadoop HDFS Architeture and Design
 
Amazon Item-to-Item Recommendations
Amazon Item-to-Item RecommendationsAmazon Item-to-Item Recommendations
Amazon Item-to-Item Recommendations
 
Data mining by_ashok
Data mining by_ashokData mining by_ashok
Data mining by_ashok
 

Ähnlich wie Research in data mining

Competitive advantage from Data Mining: some lessons learnt ...
Competitive advantage from Data Mining: some lessons learnt ...Competitive advantage from Data Mining: some lessons learnt ...
Competitive advantage from Data Mining: some lessons learnt ...butest
 
Competitive advantage from Data Mining: some lessons learnt ...
Competitive advantage from Data Mining: some lessons learnt ...Competitive advantage from Data Mining: some lessons learnt ...
Competitive advantage from Data Mining: some lessons learnt ...butest
 
Ci2004-10.doc
Ci2004-10.docCi2004-10.doc
Ci2004-10.docbutest
 
Machine Learning: Applications, Process and Techniques
Machine Learning: Applications, Process and TechniquesMachine Learning: Applications, Process and Techniques
Machine Learning: Applications, Process and TechniquesRui Pedro Paiva
 
6 ijaems sept-2015-6-a review of data security primitives in data mining
6 ijaems sept-2015-6-a review of data security primitives in data mining6 ijaems sept-2015-6-a review of data security primitives in data mining
6 ijaems sept-2015-6-a review of data security primitives in data miningINFOGAIN PUBLICATION
 
DATA MINING IN EDUCATION : A REVIEW ON THE KNOWLEDGE DISCOVERY PERSPECTIVE
DATA MINING IN EDUCATION : A REVIEW ON THE KNOWLEDGE DISCOVERY PERSPECTIVEDATA MINING IN EDUCATION : A REVIEW ON THE KNOWLEDGE DISCOVERY PERSPECTIVE
DATA MINING IN EDUCATION : A REVIEW ON THE KNOWLEDGE DISCOVERY PERSPECTIVEIJDKP
 
Data Mining Xuequn Shang NorthWestern Polytechnical University
Data Mining Xuequn Shang NorthWestern Polytechnical UniversityData Mining Xuequn Shang NorthWestern Polytechnical University
Data Mining Xuequn Shang NorthWestern Polytechnical Universitybutest
 
Dm sei-tutorial-v7
Dm sei-tutorial-v7Dm sei-tutorial-v7
Dm sei-tutorial-v7CS, NcState
 
Automating Data Science over a Human Genomics Knowledge Base
Automating Data Science over a Human Genomics Knowledge BaseAutomating Data Science over a Human Genomics Knowledge Base
Automating Data Science over a Human Genomics Knowledge BaseVaticle
 
The Art and Science of Analyzing Software Data
The Art and Science of Analyzing Software DataThe Art and Science of Analyzing Software Data
The Art and Science of Analyzing Software DataCS, NcState
 
Unit 1 (Chapter-1) on data mining concepts.ppt
Unit 1 (Chapter-1) on data mining concepts.pptUnit 1 (Chapter-1) on data mining concepts.ppt
Unit 1 (Chapter-1) on data mining concepts.pptPadmajaLaksh
 
2016 03-16 digital energy luncheon
2016 03-16 digital energy luncheon2016 03-16 digital energy luncheon
2016 03-16 digital energy luncheonMark Reynolds
 
Ug 3 1 r19 cse syllabus
Ug 3 1 r19 cse syllabusUg 3 1 r19 cse syllabus
Ug 3 1 r19 cse syllabusSubbuBuddu
 
Network f ountain-cib-w78-2019 v2
Network f ountain-cib-w78-2019 v2Network f ountain-cib-w78-2019 v2
Network f ountain-cib-w78-2019 v2pdemian
 
The Analytics and Data Science Landscape
The Analytics and Data Science LandscapeThe Analytics and Data Science Landscape
The Analytics and Data Science LandscapePhilip Bourne
 

Research in data mining

  • 1. Competitive advantage from Data Mining: some lessons learnt in the Information Systems field
      Mykola Pechenizkiy, Seppo Puuronen, Department of Computer Science, University of Jyväskylä, Finland
      Alexey Tsymbal, Department of Computer Science, Trinity College Dublin, Ireland
      PMKD’05, Copenhagen, Denmark, August 22-26, 2005
  • 2. Outline
      • Introduction and "What is our message?"
      • Part I: Existing frameworks for DM
          – Theory-oriented: databases, statistics, machine learning, etc.
          – Process-oriented: Fayyad’s, CRISP-DM, Reinartz’s
      • Part II: Where are we? Rigor vs. relevance in DM
      • Part III: Towards a new framework for DM research
          – DM system as an adaptive information system (IS)
          – DM research as IS development: DM system as artefact
          – DM success model: success factors
          – KM challenges in KDD
          – One possible reference for a new DM research framework
      • Further plans and discussion
  • 3. What is Data Mining
      • Data mining or knowledge discovery is the process of finding previously unknown and potentially interesting patterns and relations in large databases (Fayyad, KDD’96).
      • Data mining is the emerging science and industry of applying modern statistical and computational technologies to the problem of finding useful patterns hidden within large databases (John 1997).
      • It lies at the intersection of many fields: statistics, AI, machine learning, databases, neural networks, pattern recognition, econometrics, etc.
  • 4. H. Information Systems
      • H.0 GENERAL
      • H.1 MODELS AND PRINCIPLES
      • H.2 DATABASE MANAGEMENT
          – H.2.0 General: Security, integrity, and protection
          – H.2.8 Database Applications: Data mining; Image databases; Scientific databases; Spatial databases and GIS; Statistical databases
          – H.2.m Miscellaneous
      Source: http://www.acm.org/class/1998/ (valid in 2003)
  • 5. I. Computing Methodologies
      • I.5 PATTERN RECOGNITION
          – I.5.0 General
          – I.5.1 Models: Deterministic; Fuzzy set; Geometric; Neural nets; Statistical; Structural
          – I.5.2 Design Methodology: Classifier design & evaluation; Feature evaluation & selection; Pattern analysis
          – I.5.3 Clustering: Algorithms; Similarity measures
          – I.5.4 Applications: Computer vision; Signal processing; Text processing; Waveform analysis
      • I.2 ARTIFICIAL INTELLIGENCE
          – I.2.0 General: Cognitive simulation; Philosophical foundations
          – I.2.1 Applications and Expert Systems
          – I.2.2 Automatic Programming
          – I.2.3 Deduction and Theorem Proving
          – I.2.4 Knowledge Representation Formalisms and Methods
          – I.2.5 Programming Languages and Software
          – I.2.6 Learning: Analogies; Concept learning; Connectionism and neural nets; Induction; Knowledge acquisition; Language acquisition; Parameter learning
          – I.2.7 Natural Language Processing
          – I.2.m Miscellaneous
  • 6. G. Mathematics of Computing
      • G.3 PROBABILITY AND STATISTICS
          – Correlation and regression analysis
          – Distribution functions
          – Experimental design
          – Markov processes
          – Multivariate statistics
          – Nonparametric statistics
          – Probabilistic algorithms (including Monte Carlo)
          – Statistical computing
  • 7. Our Message
      • DM is still a technology that carries great expectations of helping organizations benefit more from their huge databases.
      • There are some success stories in which organizations have gained competitive advantage from DM.
      • Still, the strong focus of most DM researchers on technology-oriented topics does not support expanding the scope into less rigorous but practically very relevant sub-areas.
      • Research in the IS discipline has a strong tradition of taking into account the human and organizational aspects of systems alongside the technical ones.
  • 8. Our Message (continued)
      • The development of DM-supporting processes that take human and organizational aspects into account is still in its infancy.
      • The DM community might benefit, at least from a practical point of view, from looking at older sub-areas of IT that have a tradition of considering solution-driven concepts with a focus on human and organizational aspects as well.
      • By becoming more receptive to the research results of the IS community, the DM community might be able to increase its collective understanding of
          – how DM artifacts are developed: conceived, constructed, and implemented,
          – how DM artifacts are used, supported and evolved,
          – how DM artifacts impact and are impacted by the contexts in which they are embedded.
  • 9. Part I: Existing Frameworks for DM
      • Theory-oriented
          – Databases
          – Statistics
          – Machine learning
          – Data compression
      • Process-oriented
          – Fayyad’s
          – CRISP-DM
          – Reinartz’s
  • 11. Database Perspective
      • DM as an application to DBs
          – “In the same way business applications are currently supported using SQL-based API, the KDD applications need to be provided with application development support.”
          – Query KDD objects; support for finding nearest neighbours (NNs), clustering, or discretization and aggregate operations.
      • Inductive databases approach
          – The query concept should also be applied to data mining and knowledge discovery tasks: “there is no such thing as discovery, it is all in the power of the query language”.
          – Inductive databases contain not only the data but the theory of the data as well.
      References:
      Imielinski, T., and Mannila, H. 1996, A database perspective on knowledge discovery. Communications of the ACM, 39(11), 58-64.
      Boulicaut, J., Klemettinen, M., and Mannila, H. 1999, Modeling KDD processes within the inductive database framework. In Proceedings of the First International Conference on Data Warehousing and Knowledge Discovery, Springer-Verlag, London, 293-302.
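
The idea that discovery can be phrased as a query may be easier to see with a small example. The sketch below is only an illustration, not the query language of the cited papers: the function name `frequent_itemsets`, its parameters, and the brute-force enumeration are all assumptions made here. It treats "give me every itemset with support above a threshold" as a single query over a list of transactions.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: support} for itemsets meeting min_support.

    Hypothetical helper: a brute-force stand-in for a pattern query
    in an inductive database, suitable only for tiny examples.
    """
    if not transactions:
        return {}
    n = len(transactions)
    # All distinct items, sorted so that itemsets have a stable order.
    items = sorted({item for t in transactions for item in t})
    result = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            # Count the transactions that contain every item of the candidate.
            count = sum(1 for t in transactions if set(candidate) <= set(t))
            support = count / n
            if support >= min_support:
                result[candidate] = support
    return result
```

Calling `frequent_itemsets(baskets, 0.5)` on a list of sets of items returns the patterns together with their support, so the "theory of the data" is obtained by the same kind of call that would retrieve the data itself.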
  • 12. Reductionism Approach
      • Two basic statistical paradigms
          – “Statistical experiment”
              • Fisher’s version: inductive principle of maximum likelihood
              • Neyman and Pearson-Wald’s version: inductive behaviour
              • Bayesian version: maximum posterior probability
              • “Statistical learning from empirical process”
          – “Structural data analysis”
              • SVD
      • Data mining ≠ statistics: the issue of computational feasibility has a much clearer role in data mining than in statistics
          – the data mining area includes approaches that emphasize database integration, simplicity of use, and the understandability of results
          – the theoretical framework of statistics is not much concerned with data analysis as a process that includes several steps
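
The difference between the maximum-likelihood principle and the Bayesian maximum-posterior principle can be illustrated with a coin-flip (Bernoulli) parameter. This is a minimal sketch under assumptions made here: the Beta(2, 2) prior and both function names are illustrative choices, not taken from the slides.

```python
def mle_bernoulli(successes, trials):
    """Maximum-likelihood estimate of a Bernoulli success probability."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    return successes / trials


def map_bernoulli(successes, trials, alpha=2.0, beta=2.0):
    """Maximum a posteriori estimate under a Beta(alpha, beta) prior.

    The posterior is Beta(alpha + successes, beta + failures); its mode
    is (a - 1) / (a + b - 2) when both posterior parameters exceed 1.
    The default prior Beta(2, 2) is an assumption for illustration.
    """
    a = alpha + successes
    b = beta + (trials - successes)
    if a <= 1 or b <= 1:
        raise ValueError("posterior mode is not interior for these parameters")
    return (a - 1) / (a + b - 2)
```

With three successes in three trials, the maximum-likelihood estimate is 1.0, while the maximum-posterior estimate under the assumed prior is 0.8: the prior pulls the estimate away from the extreme suggested by a very small sample.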
  • 13. Machine Learning Approach
      • “Let the data suggest a model” can be seen as a practical alternative to the statistical paradigm “fit a model to the data”.
      • Constructive induction: a learning process with two intertwined phases, construction of the “best” representation space and generation of hypotheses in the space found (Michalski & Wnek, 1993).
          – Feature transformation (PCA, SVD, Random Projection)
          – Feature selection
          – LSI
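
As a concrete instance of the first phase of constructive induction (building a representation space), the following sketch performs a simple filter-style feature selection. It is an assumption-laden illustration: ranking features by absolute Pearson correlation with the target is one common filter criterion chosen here for brevity, and the names `pearson` and `select_features` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no linear information
    return cov / math.sqrt(vx * vy)


def select_features(rows, target, k):
    """Return the indices of the k features most correlated with target."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        scores.append((abs(pearson(column, target)), j))
    # Highest score first; ties are broken by the lower feature index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores[:k]]
```

A learner would then generate hypotheses only in the reduced space, which corresponds to the second phase named on the slide.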
  • 14. Data Compression Approach
      • Compress the data set by finding some structure or knowledge in it, where knowledge is interpreted as a representation that allows the data to be coded using fewer bits.
      • Theories should not be ad hoc; that is, they should not overfit the examples used to build them.
      • Occam’s razor principle, 14th century: "when you have two competing models which make exactly the same predictions, the one that is simpler is the better".
      Reference:
      Mehta, M., Rissanen, J., and Agrawal, R. 1995, MDL-based decision tree pruning. In U.M. Fayyad, R. Uthurusamy (Eds.) Proceedings of the KDD 1995, AAAI Press, Montreal, Canada, 216-221.
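
The compression view can be made concrete with a two-part code: the total description length is the cost of stating the model plus the cost of coding the data given the model. The sketch below compares two models of a binary sequence. It is a simplified illustration, not the procedure of Mehta et al.: the function names are hypothetical, and charging 0.5·log2(n) bits for one estimated parameter is an assumed, commonly used approximation.

```python
import math


def entropy_bits(p):
    """Binary entropy in bits; 0 for a deterministic source."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def code_length_fair(bits):
    """Fair-coin model: no parameters, exactly one bit per symbol."""
    return float(len(bits))


def code_length_biased(bits):
    """Biased-coin model: parameter cost plus data cost given p-hat."""
    n = len(bits)
    p_hat = sum(bits) / n
    model_cost = 0.5 * math.log2(n)  # assumed cost of one real parameter
    data_cost = n * entropy_bits(p_hat)
    return model_cost + data_cost


def choose_model(bits):
    """Pick the model with the shorter total description length."""
    if not bits:
        raise ValueError("need at least one observation")
    fair = code_length_fair(bits)
    biased = code_length_biased(bits)
    return ("biased", biased) if biased < fair else ("fair", fair)
```

On a balanced sequence the extra parameter does not pay for itself and the simpler fair-coin model wins; on a strongly skewed sequence the biased model compresses better despite its parameter cost, which is the trade-off behind the warning against overfitting.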
  • 15. Other Theoretical Frameworks for DM
      • Microeconomic view
          – the key point is that data mining is about finding actionable patterns: the only interest is in patterns that can somehow be used to increase utility;
          – a decision-theoretic formulation of this principle: the goal can be formulated as finding a decision x that maximises a utility function f(x).
          Kleinberg, J., Papadimitriou, C., and Raghavan, P. 1998, A microeconomic view of data mining, Data Mining and Knowledge Discovery 2(4), 311-324.
      • Philosophy of science
          – logical empiricism, critical rationalism, systems theory
          – formism, mechanism, contextualism
          – dispersive vs. integrative, analytical vs. synthetic theories
          – subjectivist vs. objectivist, nomothetic vs. ideographic, nominalism vs. realism, voluntarism vs. determinism, epistemological assumptions
          – explanation, prediction, understanding
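
The decision-theoretic formulation in the microeconomic view can be sketched as a search over candidate decisions for the one with the highest total utility. Everything concrete below is an assumption for illustration: the helper names, the pricing scenario, and the idea of summing a per-customer utility are stand-ins, not details given on the slide.

```python
def total_utility(decision, customers, utility):
    """f(x): sum of the per-customer utility of taking decision x."""
    return sum(utility(decision, c) for c in customers)


def best_decision(decisions, customers, utility):
    """Return the candidate decision x that maximises f(x)."""
    return max(decisions, key=lambda x: total_utility(x, customers, utility))


def revenue(price, willingness_to_pay):
    """Hypothetical utility: a customer buys only if the price is affordable."""
    return price if price <= willingness_to_pay else 0
```

For customers willing to pay 5, 8, 10 and 12, `best_decision([5, 8, 10, 12], [5, 8, 10, 12], revenue)` selects the price 8, because three buyers at 8 yield more than four buyers at 5 or two buyers at 10. A mined pattern is "actionable" in this sense only if it changes which decision maximises the utility.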
  • 17. Knowledge discovery as a process
      [Figure: the knowledge discovery process]
      Fayyad, U., Piatetsky-Shapiro, G., Smyth, P., Uthurusamy, R., Advances in Knowledge Discovery and Data Mining, AAAI/MIT Press, 1997.
  • 18. CRISP-DM
      http://www.crisp-dm.org/
  • 19. KDD: “Vertical Solutions”
      [Figure: process phases] Business Understanding; Data Understanding; Data Preparation; Data Exploration; Data Mining; Evaluation & Interpretation; Deployment; Experience accumulation
      Reinartz, T. 1999, Focusing Solutions for Data Mining. LNAI 1623, Berlin Heidelberg.
  • 20. Conclusion on different frameworks
      – The reductionist approach of viewing data mining as statistics has the advantages of a strong theoretical background and easily formulated problems.
      – Data mining tasks such as clustering, regression and classification fit easily into these approaches.
      – More recent (process-oriented) frameworks address the issues related to a view of data mining as a process, and its iterative and interactive nature.
  • 21. Part II: Where are we? Rigor and Relevance in DM Research
  • 22. So, where are we?
      • Lin (in Wu et al.) notes that a successful new industry, such as DM, can pass through consecutive phases:
          1. discovering a new idea,
          2. ensuring its applicability,
          3. producing small-scale systems to test the market,
          4. better understanding of the new technology, and
          5. producing a fully scaled system.
      • At present there are several dozen DM systems, none of which is comparable in scale to a DBMS.
          – This indicates that the DM area is still in the 3rd phase!
  • 23. Rigor vs Relevance in DM Research
      [Figure: rigor vs. relevance]
  • 24. Where is the focus?
      • Still on speeding up, scaling up, and increasing the accuracy of DM techniques.
      • Piatetsky-Shapiro: “we see many papers proposing incremental refinements in association rules algorithms, but very few papers describing how the discovered association rules are used”
      • Lin claims that the research and development goals of DM are quite different, since research is knowledge-oriented while development is profit-oriented.
          – Thus, DM research concentrates on the development of new algorithms or their enhancements,
          – but DM developers in domain areas are aware of cost considerations: investment in research, product development, marketing, and product support.
      • However, we believe that the study of the DM development and DM use processes is as important as the technological aspects, and therefore such research activities are likely to emerge within the DM field.
  • 25. Part III: Towards the new framework for DM research
  • 26. DMS in the Kernel of an Organization
      [Figure: nested layers] DM Task(s); DMS (Artifact); Organization; Environment
      • DM is a fundamentally application-oriented area motivated by business and scientific needs to make sense of mountains of data.
      • A DMS is generally used by human beings to support or perform some task(s) in an organizational environment; both the people and the organization have their own desires regarding the DMS.
      • Further, the organization has its own environment, which has its own interests related to the DMS, e.g. that the privacy of people is not violated.
  • 27. The ISs-based paradigm for DM
      [Figure: framework components] The External Environment; The Organizational Environment; User Environment; IS Development Environment; IS Operations Environment; The Use Process; The Development Process; The Operation Process; The Information Subsystem (ISS)
      Ives B., Hamilton S., Davis G. (1980). “A Framework for Research in Computer-based MIS” Management Science, 26(9), 910-934.
      “Information systems are powerful instruments for organizational problem solving through formal information processing”
      Lyytinen, K., 1987, “Different perspectives on ISs: problems and solutions.” ACM Computing Surveys, 19(1), 5-46.
  • 28. DM Artifact Development
      A multimethodological approach to the construction of an artefact for DM
      [Figure: research activities] DM Artifact Development; Experimentation; Theory Building; Observation
      Adapted from: Nunamaker, W., Chen, M., and Purdin, T. 1990-91, Systems development in information systems research, Journal of Management Information Systems, 7(3), 89-106.
  • 29. Research methods in a paper on DM – Theoretical approach: theory creating • Hypothesis, new algorithm, etc. – Constructive approach • Prototype of a DM tool – Theoretical approach: theory testing and evaluation • Artificial, benchmark, real-world data • Evaluation techniques – Conclusion on theory
  • 30. The Action Research and Design Science Approach to Artifact Creation [Diagram labels: Design Knowledge, Awareness of business problem, Action planning, Action taking, Conclusion, Business Knowledge, Artifact Development, Artifact Evaluation, Contextual Knowledge]
  • 31. DM Artifact Use: Success Model 1 of 3 [Diagram labels: System Quality, Information Quality, Use, User Satisfaction, Individual Impact, Organizational Impact, Service Quality] Adapted from D&M IS Success Models
  • 32. DM Artifact Use: Success Model 2 of 3 • What are the key factors of successful use and impact of a DMS at both the individual and organizational levels? 1. how the system is used, and also supported and evolved, and 2. how the system impacts and is impacted by the contexts in which it is embedded. • Coppock: the failure factors of DM-related projects have nothing to do with the skill of the modeler or the quality of data, but do include: 1. persons in charge of the project did not formulate actionable insights, 2. the sponsors of the work did not communicate the insights derived to key constituents, 3. the results did not agree with institutional truths. • Leadership, communication skills and an understanding of the culture of the organization are no less important than the traditionally emphasized technological job of turning data into insights.
  • 33. DM Artifact Use: Success Model 3 of 3 • Hermiz stated his belief that there are four critical success factors for DM projects: • (1) having a clearly articulated business problem that needs to be solved and for which DM is a proper tool; • (2) ensuring that the problem being pursued is supported by the right type of data, of sufficient quality and in sufficient quantity for DM; • (3) recognizing that DM is a process with many components and dependencies; the entire project cannot be "managed" in the traditional sense of the business word; • (4) planning to learn from the DM process regardless of the outcome, and clearly understanding that there is no guarantee that any given DM project will be successful.
  • 34. KM Perspective • A knowledge-driven approach to enhance the dynamic integration of DM strategies in knowledge discovery systems. • The focus here is on knowledge management aimed at organising a systematic process of (meta-)knowledge capture and refinement over time. – knowledge extracted from data – the higher-level knowledge required for managing DM techniques’ selection, combination and application • Basic knowledge management processes of – knowledge creation and identification, representation, collection and organization, sharing, adaptation, and application • DEXA’05: the TAKMA WS paper and presentation are available [Diagram labels: Knowledge Creation & Acquisition, Knowledge Organization & Storage, Knowledge Distribution & Integration, Knowledge Adaptation & Application, Knowledge Evaluation, Validation and Refinement]
  • 35. New Research Framework for DM Research [Diagram labels: Environment: People (Roles, Capabilities, Characteristics); Organizations (Strategy, Structure & Culture, Processes); Technology (Infrastructure, Applications, Communications Architecture, Development Capabilities). Knowledge Base: Foundations (Base-level theories, Frameworks, Models, Instantiation, Validation Criteria); Design knowledge (Methodologies, Validation Criteria; not instantiations of models but KDD processes, services, systems). DM Research: Develop/Build (Theories, Artifacts); Justify/Evaluate (Analytical, Case Study, Experimental, Field Study, Simulation); Assess; Refine. Flows: Business Needs (Relevance); Applicable Knowledge (Rigor); (Un-)Successful Applications in the appropriate environment; Contribution to Knowledge Base]
  • 36. Further Work • Definition of the Relevance concept in DM research • Revision of the book chapter • Further work on the new framework for DM research • Organization of a Workshop, Special Track or Working Conference on more social directions in DM research, likely with one of the focuses on IS as a sister discipline. A few options: – IRIS Scandinavian Conference on IS – Next PMKD – Workshop in Jyväskylä
  • 37. Thank You! A book chapter draft is available on request from Mykola Pechenizkiy, Department of Computer Science and Information Systems, University of Jyväskylä, FINLAND. E-mail: mpechen@cs.jyu.fi Tel.: +358 14 2602472 Fax: +358 14 260 3011 http://www.cs.jyu.fi/~mpechen Feedback is very welcome: • Questions • Suggestions • Collaboration

Editor's notes

  2. ACM classification system for the computing field: DM is a subject of database applications (H.2.8), within database management (H.2), within the information systems field (H.)
  3. The SPSS whitepaper [4] states that “Unless there’s a method, there’s madness”. It is accepted that nobody should expect useful results to appear just by pushing a button. CRISP-DM, an industry standard for DM projects, is a good initiative and a starting point towards the development of a DM meta-artifact (a methodology to produce DM artifacts). However, in our opinion it is just one guideline, too general in level, that every DM developer follows with or without success.
  4. In fact, the study of development and use processes was recognized as important in the IS field many years ago, and it has therefore been introduced into different IS frameworks.
  5. Nevertheless, so far in the DM community there exist too few research activities directed towards the study of a DM system as an artifact aimed to enable certain DM tasks in a certain context (Figure 1). In the IS discipline two research paradigms – the behavioral-science paradigm and design-science paradigm – have
  6. The first efforts in that direction are the ones presented in the DM Review magazine [9, 21], referred to below. We believe that such efforts should be encouraged in DM research and followed by research-based reports.
  7. Lin, in Wu et al. [43], notes that in fact no major impacts of DM on the business world have been reported. However, even reporting existing success stories is important. Giraud-Carrier [18] reported 136 success stories of DM, covering 9 business areas and referring to 30 DM tools or DM vendors. Unfortunately, no deep analysis was provided that would summarize or discover the main success factors, so this research should be continued.
  8. To distinguish between the knowledge extracted from data and the higher-level knowledge (from the KDS perspective) required for managing techniques’ selection, combination and application, we will refer to the latter as meta-knowledge.