SlideShare ist ein Scribd-Unternehmen logo
1 von 31
Downloaden Sie, um offline zu lesen
Global
innovative
Leadership Module
Disclaimer> The information and views set out in this publication are those of the author(s) and do not
necessarily reflect the official opinion of the European Union. Neither the European Union institutions
and bodies nor any person acting on their behalf may be held responsible for the use which may be
made of the information contained therein.
Data Mining
The process of analyzing data to discover hidden
patterns and relationships that can help you manage
and improve your business.
What is data mining?
There are two new types of mining:
• Text mining
• Web mining.
They increase both the accuracy and depth of the
insights uncovered through your data mining
efforts.
What types of data are
used in data mining?
Categorical variables:
• Nominal: categories with no ranking (e.g. gender,
race/ethnicity, place of birth, etc.);
• Ordinal: categories with a ranking (e.g.
educational level, income categories, Likert scales
(strongly agree, agree, disagree, strongly
disagree) etc.);
• Continuous: A zero point and equal distance
between values (e.g. age, height, weight, # of
hours studying a day, etc.).
Data types
• Increasing revenues from customers
• Understanding customer segments and
preferences
• Identifying profitable customers and acquiring
new ones
• Improving cross-selling and up-selling
• Retaining customers and increasing loyalty
What business problems
does data mining solve?
• Increasing ROI and reducing marketing campaign
costs
• Detecting fraud, waste, and abuse
• Determining credit risks
• Increasing Web site profitability
• Increasing retail store traffic and optimizing
layouts for increased sales
• Monitoring business performance
What business problems
does data mining solve?
• SPSS data mining products and services ensure
timely, reliable results by supporting the CRoss-
Industry Standard Process for Data Mining
(CRISP-DM).* CRISP-DM provides step-by-step
guidelines, tasks, and objectives for every stage
of the data mining process.
How does the data mining
process work?
• Business understanding: Achieve a clear
understanding of your business challenges
• Data understanding: Determine what data are
available to mine for answers
• Data preparation: Prepare the data in the
appropriate format to answer your business
questions
• Modeling: Design data models to meet your
requirements
• Evaluation: Test your results against the goals of
your project
• Deployment: Make the results of the project
available to decision makers
Six phases in CRISP-DM
• Make sure project stakeholders know that data
mining is not a silver bullet that magically solves
business problems.
• As with any business problem, stakeholders need
to find a solvable problem and work on the
solution.
Set expectations
• Know “who, what, when, where, why, and how”
from a business perspective
• Develop a thorough understanding of the project
parameters: the current business situation, the
primary business objective of the project, the
criteria for success, and who will determine the
success of the project.
Business understanding
Make sure to go over every aspect of the project in
advance to ensure you have what you need for
success:
• Personnel (project sponsor, business, and
technical experts)
• Data sources (access to warehouse or operational
data)
• Computing resources (hardware, platforms)
• Software (data mining and other relevant
software)
Assess the situation and
inventory resources
• List and clarify all of the assumptions you have
made about:
• Data quality (accuracy, availability)
• External factors (economic issues, competition,
technical advances)
• Internal factors (the business problem)
• Models (Is it necessary to understand, describe,
or explain the models to senior management?)
What assumptions are being
made about the project?
• Gather all of the data you will need for your
project.
• A web mining tool will add a deeper level of
insight to the project.
• Up to 80 percent of your data may be hidden in
text documents. A text mining tool to efficiently
search these sources for valuable information.
Make sure the data are
available
Select your data
Decide what data to use for analysis and list the
reasons for your decisions. This involves:
• Performing significance and correlation tests
to determine which fields to include
• Selecting data subsets
• Using sampling techniques to review small
chunks of data for appropriateness
Data preparation
Integrate Data
Joining multiple data tables
Summarization/aggregation of data
Deriving new variables
Data Preparation Phase
Select Data
Attribute subset selection
Rationale for Inclusion/Exclusion
Data sampling
Training/Validation and Test sets
Data Transformation
Using functions such as log
Factor/Principal Components analysis
Normalization/ Discretisation /Binarisation
Clean Data
Handling missing values/Outliers
Data Preparation Phase
• Testing is crucial beforehand after building a
model.
• To create a model, run your modeling tool on the
dataset you have prepared.
• Create a detailed model report that lists the rules
produced, the parameter settings used, the
model’s behavior and interpretation, and any
conclusions about patterns revealed in the data.
Build your model
Select of the appropriate modeling technique
Data pre-processing implications
Attribute independence
Data types/Normalisation/Distributions
Dependent on
Develop a testing regime
Sampling
Verify samples have similar characteristics
and are representative of the population
The Modeling Phase
Build Model
Choose initial parameter settings
Study model behaviour
Sensitivity analysis
Assess the model
Beware of over-fitting
Investigate the error distribution
Identify segments of the state space where
the model is less effective
Iteratively adjust parameter settings
Document reasons of these changes
The Modeling Phase
Validate Model
Human evaluation of results by domain experts
Evaluate usefulness of results from business
perspective
Define control groups
Calculate lift curves
Expected Return on Investment
Review Process
Determine next steps
Potential for deployment
Deployment architecture
Metrics for success of deployment
The Evaluation Phase
• Summarize deployable models or software
results
• Develop and evaluate alternative deployment
plans
• Confirm how the results will be distributed to
recipients
• Determine how to monitor the use of the results
and measure the benefits
• Identify possible problems and pitfalls of
deployment
Deployment: Create a
deployment plan
• To create your final report, first:
• Identify which reports are needed (slides,
management summary, etc.)
• Analyze how well the data mining goals were met
• Identify report recipients
• Outline the structure and content of the report
• Select which discoveries to include
Deployment: create a final
report
• Interview all significant project members about
their experiences
• Interview any end users of your data mining
results about their experiences
• Document and analyze the specific data mining
steps that you took
Review the project
• Knowledge Deployment is specific to objectives
Knowledge Presentation
Deployment within Scoring Engines and
Integration with the current IT infrastructure
XML interfaces to 3rd party tools
Generation of a report
Monitoring and evaluation of effectiveness
• Process deployment/production
• Produce final project report
Document everything along the way
The Deployment Phase
• Look for a tool with a proven record of solving
the business problems your project addresses.
• Choose a tool that you know to be useful in
solving problems within your industry and that
has a successful track record with the types of
applications you’re planning.
Selecting A Data Mining
Tool
Classification
• The process of identifying the group to which an
object belongs by examining characteristics of
the object. In classification, the groups are
defined by an external criterion (contrast with
clustering).
Data Mining Tasks
Clustering
• The process of grouping records based on similarity.
Clustering divides a dataset so that records with
similar content are in the same group, and groups
are as different as possible from each other
(contrast with classification).
Segmentation
Why Segmentation
• Used by e.g. retail and consumer product
companies trying to learn about and describe
their customers' buying habits, gender, age,
income level, etc.
• A valuable approach in Market Research, and
SPSS offers some useful tools to facilitate this
commercial process.
• Factor Analysis - to find patterns within variables
• Categories - use if data doesn’t fit assumptions
for Factor Analysis
• Cluster Analysis - to find patterns between
individuals
• Two-Step Cluster – To use with both categorical
and continuous variables
• Discriminant Analysis - to look for differences
between groups, try to predict target variable
• Answer Tree (decision tree) - combinations of
data, to predict target
Which Test to use?
Thanks for your attention

Weitere ähnliche Inhalte

Was ist angesagt?

Business Policy & Strategy
Business Policy & Strategy Business Policy & Strategy
Business Policy & Strategy
Talha Jalal
 
The Nature of Strategic Management
The Nature of Strategic ManagementThe Nature of Strategic Management
The Nature of Strategic Management
Noel Buensuceso
 

Was ist angesagt? (20)

Marketing assignment presentation
Marketing assignment presentationMarketing assignment presentation
Marketing assignment presentation
 
Business analysis | Gabrielle Rusignuolo
Business analysis | Gabrielle RusignuoloBusiness analysis | Gabrielle Rusignuolo
Business analysis | Gabrielle Rusignuolo
 
Strategic marketing-chapter-1
Strategic marketing-chapter-1Strategic marketing-chapter-1
Strategic marketing-chapter-1
 
hierarchy of strategic intent
hierarchy of strategic intent hierarchy of strategic intent
hierarchy of strategic intent
 
Business policy & strategic management
Business policy & strategic managementBusiness policy & strategic management
Business policy & strategic management
 
Business Policy & Strategy
Business Policy & Strategy Business Policy & Strategy
Business Policy & Strategy
 
Strategic Management: The Ultimate Goal of Strategic Planning
Strategic Management: The Ultimate Goal of Strategic Planning Strategic Management: The Ultimate Goal of Strategic Planning
Strategic Management: The Ultimate Goal of Strategic Planning
 
CSf presentation 5
CSf presentation 5CSf presentation 5
CSf presentation 5
 
Functional strategies
Functional strategiesFunctional strategies
Functional strategies
 
Understanding local networks
Understanding local networksUnderstanding local networks
Understanding local networks
 
SWOT
SWOTSWOT
SWOT
 
Week 3 swot-analysis
Week 3 swot-analysisWeek 3 swot-analysis
Week 3 swot-analysis
 
Strategy formulation
Strategy formulationStrategy formulation
Strategy formulation
 
Presentation on Strategic Management (SM)- MBA
Presentation on Strategic Management (SM)- MBAPresentation on Strategic Management (SM)- MBA
Presentation on Strategic Management (SM)- MBA
 
The Nature of Strategic Management
The Nature of Strategic ManagementThe Nature of Strategic Management
The Nature of Strategic Management
 
Strategic Analysis
Strategic AnalysisStrategic Analysis
Strategic Analysis
 
Strategic intent
Strategic intentStrategic intent
Strategic intent
 
Business strategy formulation
Business strategy formulationBusiness strategy formulation
Business strategy formulation
 
Strategy formulation and SOWT analysis
Strategy formulation and SOWT analysisStrategy formulation and SOWT analysis
Strategy formulation and SOWT analysis
 
competitor-analysis
competitor-analysiscompetitor-analysis
competitor-analysis
 

Ähnlich wie Data mining

Data Refinement: The missing link between data collection and decisions
Data Refinement: The missing link between data collection and decisionsData Refinement: The missing link between data collection and decisions
Data Refinement: The missing link between data collection and decisions
Vivastream
 

Ähnlich wie Data mining (20)

Group 1 Report CRISP - DM METHODOLOGY.pptx
Group 1 Report CRISP - DM METHODOLOGY.pptxGroup 1 Report CRISP - DM METHODOLOGY.pptx
Group 1 Report CRISP - DM METHODOLOGY.pptx
 
Introduction to Data mining
Introduction to Data miningIntroduction to Data mining
Introduction to Data mining
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
 
BI Knowledge Sharing Session 1
BI Knowledge Sharing Session 1BI Knowledge Sharing Session 1
BI Knowledge Sharing Session 1
 
Data Science in Python.pptx
Data Science in Python.pptxData Science in Python.pptx
Data Science in Python.pptx
 
Big data@work
Big data@workBig data@work
Big data@work
 
KIT601 Unit I.pptx
KIT601 Unit I.pptxKIT601 Unit I.pptx
KIT601 Unit I.pptx
 
Measurement Roadmap
Measurement RoadmapMeasurement Roadmap
Measurement Roadmap
 
MEGA Solution Footprint V5.pptx
MEGA Solution Footprint V5.pptxMEGA Solution Footprint V5.pptx
MEGA Solution Footprint V5.pptx
 
Analytics
AnalyticsAnalytics
Analytics
 
WHAT IS BUSINESS ANALYTICS um hj mnjh nit 1 ppt only kjjn
WHAT IS BUSINESS ANALYTICS um hj mnjh nit 1 ppt only kjjnWHAT IS BUSINESS ANALYTICS um hj mnjh nit 1 ppt only kjjn
WHAT IS BUSINESS ANALYTICS um hj mnjh nit 1 ppt only kjjn
 
Introductions to Business Analytics
Introductions to Business Analytics Introductions to Business Analytics
Introductions to Business Analytics
 
Introduction to Data Analytics - PPM.pptx
Introduction to Data Analytics - PPM.pptxIntroduction to Data Analytics - PPM.pptx
Introduction to Data Analytics - PPM.pptx
 
KPMG_Task2.pptx
KPMG_Task2.pptxKPMG_Task2.pptx
KPMG_Task2.pptx
 
An Introduction to Advanced analytics and data mining
An Introduction to Advanced analytics and data miningAn Introduction to Advanced analytics and data mining
An Introduction to Advanced analytics and data mining
 
Data Analyst Occupational Brief
Data Analyst Occupational BriefData Analyst Occupational Brief
Data Analyst Occupational Brief
 
1120 track1 grossman
1120 track1 grossman1120 track1 grossman
1120 track1 grossman
 
Data Refinement: The missing link between data collection and decisions
Data Refinement: The missing link between data collection and decisionsData Refinement: The missing link between data collection and decisions
Data Refinement: The missing link between data collection and decisions
 
Data Mining Implementation process.pptx
Data Mining Implementation process.pptxData Mining Implementation process.pptx
Data Mining Implementation process.pptx
 
Data architecture around risk management
Data architecture around risk managementData architecture around risk management
Data architecture around risk management
 

Kürzlich hochgeladen

Beyond the Codes_Repositioning towards sustainable development
Beyond the Codes_Repositioning towards sustainable developmentBeyond the Codes_Repositioning towards sustainable development
Beyond the Codes_Repositioning towards sustainable development
Nimot Muili
 
Abortion pills in Jeddah |• +966572737505 ] GET CYTOTEC
Abortion pills in Jeddah |• +966572737505 ] GET CYTOTECAbortion pills in Jeddah |• +966572737505 ] GET CYTOTEC
Abortion pills in Jeddah |• +966572737505 ] GET CYTOTEC
Abortion pills in Riyadh +966572737505 get cytotec
 
Agile Coaching Change Management Framework.pptx
Agile Coaching Change Management Framework.pptxAgile Coaching Change Management Framework.pptx
Agile Coaching Change Management Framework.pptx
alinstan901
 

Kürzlich hochgeladen (15)

Safety T fire missions army field Artillery
Safety T fire missions army field ArtillerySafety T fire missions army field Artillery
Safety T fire missions army field Artillery
 
Strategic Management, Vision Mission, Internal Analsysis
Strategic Management, Vision Mission, Internal AnalsysisStrategic Management, Vision Mission, Internal Analsysis
Strategic Management, Vision Mission, Internal Analsysis
 
Beyond the Codes_Repositioning towards sustainable development
Beyond the Codes_Repositioning towards sustainable developmentBeyond the Codes_Repositioning towards sustainable development
Beyond the Codes_Repositioning towards sustainable development
 
Abortion pills in Jeddah |• +966572737505 ] GET CYTOTEC
Abortion pills in Jeddah |• +966572737505 ] GET CYTOTECAbortion pills in Jeddah |• +966572737505 ] GET CYTOTEC
Abortion pills in Jeddah |• +966572737505 ] GET CYTOTEC
 
Intro_University_Ranking_Introduction.pptx
Intro_University_Ranking_Introduction.pptxIntro_University_Ranking_Introduction.pptx
Intro_University_Ranking_Introduction.pptx
 
BDSM⚡Call Girls in Sector 99 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 99 Noida Escorts >༒8448380779 Escort ServiceBDSM⚡Call Girls in Sector 99 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 99 Noida Escorts >༒8448380779 Escort Service
 
International Ocean Transportation p.pdf
International Ocean Transportation p.pdfInternational Ocean Transportation p.pdf
International Ocean Transportation p.pdf
 
internal analysis on strategic management
internal analysis on strategic managementinternal analysis on strategic management
internal analysis on strategic management
 
Dealing with Poor Performance - get the full picture from 3C Performance Mana...
Dealing with Poor Performance - get the full picture from 3C Performance Mana...Dealing with Poor Performance - get the full picture from 3C Performance Mana...
Dealing with Poor Performance - get the full picture from 3C Performance Mana...
 
Day 0- Bootcamp Roadmap for PLC Bootcamp
Day 0- Bootcamp Roadmap for PLC BootcampDay 0- Bootcamp Roadmap for PLC Bootcamp
Day 0- Bootcamp Roadmap for PLC Bootcamp
 
Call Now Pooja Mehta : 7738631006 Door Step Call Girls Rate 100% Satisfactio...
Call Now Pooja Mehta :  7738631006 Door Step Call Girls Rate 100% Satisfactio...Call Now Pooja Mehta :  7738631006 Door Step Call Girls Rate 100% Satisfactio...
Call Now Pooja Mehta : 7738631006 Door Step Call Girls Rate 100% Satisfactio...
 
Agile Coaching Change Management Framework.pptx
Agile Coaching Change Management Framework.pptxAgile Coaching Change Management Framework.pptx
Agile Coaching Change Management Framework.pptx
 
GENUINE Babe,Call Girls IN Baderpur Delhi | +91-8377087607
GENUINE Babe,Call Girls IN Baderpur  Delhi | +91-8377087607GENUINE Babe,Call Girls IN Baderpur  Delhi | +91-8377087607
GENUINE Babe,Call Girls IN Baderpur Delhi | +91-8377087607
 
Call now : 9892124323 Nalasopara Beautiful Call Girls Vasai virar Best Call G...
Call now : 9892124323 Nalasopara Beautiful Call Girls Vasai virar Best Call G...Call now : 9892124323 Nalasopara Beautiful Call Girls Vasai virar Best Call G...
Call now : 9892124323 Nalasopara Beautiful Call Girls Vasai virar Best Call G...
 
Reviewing and summarization of university ranking system to.pptx
Reviewing and summarization of university ranking system  to.pptxReviewing and summarization of university ranking system  to.pptx
Reviewing and summarization of university ranking system to.pptx
 

Data mining

  • 1. Global innovative Leadership Module Disclaimer> The information and views set out in this publication are those of the author(s) and do not necessarily reflect the official opinion of the European Union. Neither the European Union institutions and bodies nor any person acting on their behalf may be held responsible for the use which may be made of the information contained therein.
  • 3. The process of analyzing data to discover hidden patterns and relationships that can help you manage and improve your business. What is data mining?
  • 4. There are two new types of mining: • Text mining • Web mining. They increase both the accuracy and depth of the insights uncovered through your data mining efforts. What types of data are used in data mining?
  • 5. Categorical variables: • Nominal: categories with no ranking (e.g. gender, race/ethnicity, place of birth, etc.); • Ordinal: categories with a ranking (e.g. educational level, income categories, Likert scales (strongly agree, agree, disagree, strongly disagree) etc.); • Continuous: A zero point and equal distance between values (e.g. age, height, weight, # of hours studying a day, etc.). Data types
  • 6. • Increasing revenues from customers • Understanding customer segments and preferences • Identifying profitable customers and acquiring new ones • Improving cross-selling and up-selling • Retaining customers and increasing loyalty What business problems does data mining solve?
  • 7. • Increasing ROI and reducing marketing campaign costs • Detecting fraud, waste, and abuse • Determining credit risks • Increasing Web site profitability • Increasing retail store traffic and optimizing layouts for increased sales • Monitoring business performance What business problems does data mining solve?
  • 8. • SPSS data mining products and services ensure timely, reliable results by supporting the CRoss- Industry Standard Process for Data Mining (CRISP-DM).* CRISP-DM provides step-by-step guidelines, tasks, and objectives for every stage of the data mining process. How does the data mining process work?
  • 9. • Business understanding: Achieve a clear understanding of your business challenges • Data understanding: Determine what data are available to mine for answers • Data preparation: Prepare the data in the appropriate format to answer your business questions • Modeling: Design data models to meet your requirements • Evaluation: Test your results against the goals of your project • Deployment: Make the results of the project available to decision makers Six phases in CRISP-DM
  • 10. • Make sure project stakeholders know that data mining is not a silver bullet that magically solves business problems. • As with any business problem, stakeholders need to find a solvable problem and work on the solution. Set expectations
  • 11. • Know “who, what, when, where, why, and how” from a business perspective • Develop a thorough understanding of the project parameters: the current business situation, the primary business objective of the project, the criteria for success, and who will determine the success of the project. Business understanding
  • 12. Make sure to go over every aspect of the project in advance to ensure you have what you need for success: • Personnel (project sponsor, business, and technical experts) • Data sources (access to warehouse or operational data) • Computing resources (hardware, platforms) • Software (data mining and other relevant software) Assess the situation and inventory resources
  • 13. • List and clarify all of the assumptions you have made about: • Data quality (accuracy, availability) • External factors (economic issues, competition, technical advances) • Internal factors (the business problem) • Models (Is it necessary to understand, describe, or explain the models to senior management?) What assumptions are being made about the project?
  • 14. • Gather all of the data you will need for your project. • A web mining tool will add a deeper level of insight to the project. • Up to 80 percent of your data may be hidden in text documents. A text mining tool to efficiently search these sources for valuable information. Make sure the data are available
  • 15. Select your data Decide what data to use for analysis and list the reasons for your decisions. This involves: • Performing significance and correlation tests to determine which fields to include • Selecting data subsets • Using sampling techniques to review small chunks of data for appropriateness Data preparation
  • 16. Integrate Data Joining multiple data tables Summarization/aggregation of data Deriving new variables Data Preparation Phase
  • 17. Select Data Attribute subset selection Rationale for Inclusion/Exclusion Data sampling Training/Validation and Test sets Data Transformation Using functions such as log Factor/Principal Components analysis Normalization/ Discretisation /Binarisation Clean Data Handling missing values/Outliers Data Preparation Phase
  • 18. • Testing is crucial beforehand after building a model. • To create a model, run your modeling tool on the dataset you have prepared. • Create a detailed model report that lists the rules produced, the parameter settings used, the model’s behavior and interpretation, and any conclusions about patterns revealed in the data. Build your model
  • 19. Select of the appropriate modeling technique Data pre-processing implications Attribute independence Data types/Normalisation/Distributions Dependent on Develop a testing regime Sampling Verify samples have similar characteristics and are representative of the population The Modeling Phase
  • 20. Build Model Choose initial parameter settings Study model behaviour Sensitivity analysis Assess the model Beware of over-fitting Investigate the error distribution Identify segments of the state space where the model is less effective Iteratively adjust parameter settings Document reasons of these changes The Modeling Phase
  • 21. Validate Model Human evaluation of results by domain experts Evaluate usefulness of results from business perspective Define control groups Calculate lift curves Expected Return on Investment Review Process Determine next steps Potential for deployment Deployment architecture Metrics for success of deployment The Evaluation Phase
  • 22. • Summarize deployable models or software results • Develop and evaluate alternative deployment plans • Confirm how the results will be distributed to recipients • Determine how to monitor the use of the results and measure the benefits • Identify possible problems and pitfalls of deployment Deployment: Create a deployment plan
  • 23. • To create your final report, first: • Identify which reports are needed (slides, management summary, etc.) • Analyze how well the data mining goals were met • Identify report recipients • Outline the structure and content of the report • Select which discoveries to include Deployment: create a final report
  • 24. • Interview all significant project members about their experiences • Interview any end users of your data mining results about their experiences • Document and analyze the specific data mining steps that you took Review the project
  • 25. • Knowledge Deployment is specific to objectives Knowledge Presentation Deployment within Scoring Engines and Integration with the current IT infrastructure XML interfaces to 3rd party tools Generation of a report Monitoring and evaluation of effectiveness • Process deployment/production • Produce final project report Document everything along the way The Deployment Phase
  • 26. • Look for a tool with a proven record of solving the business problems your project addresses. • Choose a tool that you know to be useful in solving problems within your industry and that has a successful track record with the types of applications you’re planning. Selecting A Data Mining Tool
  • 27. Classification • The process of identifying the group to which an object belongs by examining characteristics of the object. In classification, the groups are defined by an external criterion (contrast with clustering). Data Mining Tasks
  • 28. Clustering • The process of grouping records based on similarity. Clustering divides a dataset so that records with similar content are in the same group, and groups are as different as possible from each other (contrast with classification).
  • 29. Segmentation Why Segmentation • Used by e.g. retail and consumer product companies trying to learn about and describe their customers' buying habits, gender, age, income level, etc. • A valuable approach in Market Research, and SPSS offers some useful tools to facilitate this commercial process.
  • 30. • Factor Analysis - to find patterns within variables • Categories - use if data doesn’t fit assumptions for Factor Analysis • Cluster Analysis - to find patterns between individuals • Two-Step Cluster – To use with both categorical and continuous variables • Discriminant Analysis - to look for differences between groups, try to predict target variable • Answer Tree (decision tree) - combinations of data, to predict target Which Test to use?
  • 31. Thanks for your attention