Data Mining Overview
27/Sep/2008




Data Mining   July 16, 2009        1
Evolution of Database Technology

YEAR       PURPOSE
1960s      Network model, batch reports
1970s      Relational data model, executive information systems
1980s      Application-specific DBMS (spatial data, scientific data, image data, …)
1990s      Terabyte data warehouses, object-oriented, middleware and web technology
2000s      Business process
2010s      Sensor DB systems, DBs on embedded systems, large-scale pub/sub systems
Motivation: Necessity is the mother of invention
• Data explosion problem
    ◦ Automated data collection tools and mature database technology
      lead to tremendous amounts of data stored in databases, data
      warehouses and other information repositories
• We are drowning in data, but starving for knowledge!
• Solution: data warehousing and data mining
    ◦ Extraction of interesting knowledge (rules, regularities, patterns,
      constraints) from data in large databases



Why Data Mining?

• Data, data, data everywhere …
• I can’t find the data I need – the data is scattered over the network
• I can’t get the data I need
• I can’t understand the data I need
• I can’t use the data I found


An abundance of data
• Supermarket scanners, POS data
• Credit card transactions
• Call center records
• ATMs
• Demographic data
• Sensor networks
• Cameras
• Web server logs
• Customer web site trails
• Geographic information systems
• National medical records
• Weather images

This data occupies
• Terabytes - 10^12 bytes
• Petabytes - 10^15 bytes
• Exabytes - 10^18 bytes
• Zettabytes - 10^21 bytes
• Yottabytes - 10^24 bytes
• Walmart - 24 terabytes



• The process of sorting through large amounts of data and picking
  out relevant information
• The process of analyzing data from different perspectives and
  summarizing it into useful information
• Discovering hidden value in a database
• The non-trivial process of identifying valid, novel, useful and
  understandable patterns in data
• Extracting or mining knowledge from large amounts of data


History Notes – Many Names of Data Mining

YEAR    NAME                                USED BY
1960    Data Fishing, Data Dredging         Statisticians
1989    Knowledge Discovery in Databases    AI, machine learning community
1990    Data Mining                         DB community, business

Other names: Data Archaeology, Information Harvesting, Information Discovery,
Knowledge Extraction


Data Warehousing provides the Enterprise with a memory

Data Mining provides the Enterprise with intelligence

Why Data Mining? (Cont.)

• A data warehouse is a single, complete and consistent store of data from
  a variety of different sources, available to end users
• For example, AT&T handles billions of calls per day. Europe's Very
  Long Baseline Interferometer (VLBI) has 16 telescopes, each of which
  produces 1 gigabit/second of astronomical data over a 25-day
  observation session
• We need data mining for
    ◦ Transforming data into information that is useful to users
    ◦ Presenting data in a useful format
    ◦ Providing data access to business analysts and information technology
      professionals



Data Mining Process
• Data mining is the technique used to carry out KDD (knowledge discovery in databases).
• Data mining turns data into information and then into knowledge

[Figure: Data → Information → Knowledge]



Steps in Data Mining
1. Data cleaning
   Remove noise and inconsistent data
2. Data integration
   Integrate (compile) multiple data sources
3. Data selection
   Select the data relevant to the analysis
4. Data transformation
   Perform summary, normalization or aggregation operations
   (converting the data into two-dimensional form) to consolidate the data
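
A minimal Python sketch of steps 1 to 4 may make the preprocessing stages more concrete. It assumes two small in-memory sources; the record fields, the function names and the sample values are all hypothetical, and a real project would more likely use a database or a data-frame library.

```python
# Hypothetical records from two sources; None marks a missing value.
store_a = [
    {"customer": "c1", "age": 34, "spend": 120.0},
    {"customer": "c2", "age": None, "spend": 80.0},   # incomplete record
]
store_b = [
    {"customer": "c3", "age": 51, "spend": 200.0},
    {"customer": "c1", "age": 34, "spend": 120.0},    # duplicate of a store_a row
]

def clean(records):
    """Step 1: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

def integrate(*sources):
    """Step 2: merge the sources, keeping the first record seen per customer."""
    merged = {}
    for source in sources:
        for r in source:
            merged.setdefault(r["customer"], r)
    return list(merged.values())

def select(records, fields):
    """Step 3: keep only the attributes relevant to the analysis."""
    return [{f: r[f] for f in fields} for r in records]

def normalize(records, field):
    """Step 4: min-max scale one numeric field into [0, 1]."""
    values = [r[field] for r in records]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0   # avoid dividing by zero when all values are equal
    return [{**r, field: (r[field] - lo) / span} for r in records]

integrated = integrate(clean(store_a), clean(store_b))
prepared = normalize(select(integrated, ["customer", "spend"]), "spend")
```

Each function corresponds to one step, so the stages can be inspected or replaced separately.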



Steps in Data Mining (Cont.)
5. Data mining
   Intelligent methods are applied to the data to discover
   knowledge or patterns

6. Pattern evaluation
   Interesting patterns are identified by thresholding

7. Knowledge presentation
   Visualization and presentation methods are used to present
   the mined knowledge to the user


◦ Data mining: the core of the knowledge discovery process.

[Figure: Databases → Data Cleaning / Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation]
Data Mining Tasks
1. Classification
•   Classification maps data into predefined groups or classes.
•   A classifier may be represented by methods such as decision trees.

Decision tree
•   Flow-chart-like tree structure
•   Each internal node denotes a test of an attribute value
•   Each branch represents an outcome of the test
•   Leaves represent classes or class distributions
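
As a rough illustration of the decision tree structure described above, the sketch below hand-builds a tiny tree and walks it to classify a record. The loan scenario, the attribute names and the tree itself are invented for the example; in practice the tree would be learned from training data rather than written by hand.

```python
# A hand-built tree for a hypothetical "approve a loan?" decision.
# Internal nodes test one attribute; leaves carry the class label.
tree = {
    "attribute": "income",
    "branches": {
        "high": {"label": "approve"},
        "low": {
            "attribute": "has_guarantor",
            "branches": {
                True: {"label": "approve"},
                False: {"label": "reject"},
            },
        },
    },
}

def classify(node, record):
    """Walk from the root to a leaf, following the branch for each test outcome."""
    while "label" not in node:
        outcome = record[node["attribute"]]
        node = node["branches"][outcome]
    return node["label"]

decision = classify(tree, {"income": "low", "has_guarantor": False})
```

Here each dictionary with an "attribute" key plays the role of a test node, each entry in "branches" is one outcome of the test, and each dictionary with a "label" key is a leaf.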


2. Regression
Regression maps a data item to a real-valued prediction variable.
Example: A manager wants to reach a certain level of savings before his
  retirement. Periodically he predicts his retirement savings from the current value
  and several past values, using a simple linear regression formula to
  predict future savings.
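
The savings example can be sketched with an ordinary least-squares line fit. The yearly figures below are made up and deliberately lie on a straight line so the result is easy to check by hand; `fit_line` is a hypothetical helper name, not part of any library.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

years = [1, 2, 3, 4, 5]
savings = [10.0, 12.0, 14.0, 16.0, 18.0]   # made-up values, in thousands

slope, intercept = fit_line(years, savings)
predicted_year_10 = slope * 10 + intercept   # extrapolate to year 10
```

With real data the points would not be perfectly linear, and the fitted line would only approximate the trend.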


3. Prediction
Many real-world applications can be seen as
predicting future data states based on
past and current data.
Example: Predicting flooding is a difficult problem.


4. Clustering
Clustering is similar to classification
except that the groups are not predefined.
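
One common way to form groups that are not predefined is k-means clustering. The sketch below is a simplified one-dimensional version, assuming made-up customer ages and hand-picked starting centroids; it is meant to show the assign-then-update loop, not to be a production implementation.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Assign each value to its nearest centroid, then move each centroid to its cluster mean."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

ages = [21, 23, 25, 58, 60, 62]            # hypothetical customer ages
centroids, clusters = kmeans_1d(ages, centroids=[20.0, 30.0])
```

No class labels are supplied: the two groups emerge from the data alone, which is the difference from classification.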
5. Association Rule
Association refers to uncovering relationships among data.
It is used in the retail sales community to identify the items
(products) that are frequently purchased together.

[Figure: cartoon dated 1998 with the caption "Bread and Jam sell together!"]
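
Association rules are usually scored by support (how often the items occur together) and confidence (how often the consequent appears when the antecedent does). The sketch below computes both for a handful of invented shopping baskets; the function names and the 0.5 support threshold are assumptions chosen for illustration.

```python
from itertools import combinations

# Hypothetical shopping baskets.
baskets = [
    {"bread", "jam", "milk"},
    {"bread", "jam"},
    {"bread", "butter"},
    {"milk", "butter"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    return sum(1 for b in baskets if itemset <= b) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Estimated probability of the consequent given the antecedent."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)

def frequent_pairs(baskets, min_support):
    """All item pairs whose support reaches the threshold."""
    items = sorted(set().union(*baskets))
    return [
        (a, b) for a, b in combinations(items, 2)
        if support({a, b}, baskets) >= min_support
    ]

pairs = frequent_pairs(baskets, min_support=0.5)
jam_implies_bread = confidence({"jam"}, {"bread"}, baskets)
```

In this toy data every basket with jam also contains bread, so the rule "jam implies bread" has the highest possible confidence, while the reverse rule is weaker.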




6. Summarization
Summarization of the general characteristics or features of a target class of
  data.
Data characterization can be presented in various forms: pie charts, bar
  charts, curves.
Data discrimination is a comparison of the general features of a target class of
  data objects with the general features of objects from one or a set of
  contrasting classes.
7. Outlier Analysis
A database may contain data objects that do not comply with the general
  behavior or model of the data. These data objects are called outliers.
Many data mining methods discard outliers as noise or exceptions.
In applications such as fraud detection, however, rare events may be more
  interesting than regularly occurring events.
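
A simple way to flag outliers instead of discarding them is a z-score test: values that lie more than a chosen number of standard deviations from the mean are reported. The transaction amounts and the threshold of 2.0 below are assumptions for the sketch; other detection methods exist and may suit skewed data better.

```python
from statistics import mean, pstdev

def z_score_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []          # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]

amounts = [20, 22, 19, 21, 23, 20, 500]    # hypothetical card transactions
suspicious = z_score_outliers(amounts)
```

In a fraud-detection setting the flagged values would be passed on for review rather than removed as noise.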
Data Mining: Types of Data

• Relational data and transactional data
• Text
• Images, video
• Mixtures of data




Data Mining Products

• DataMind -- neurOagent
• Information Discovery -- IDIS
• SAS Institute -- SAS/Neuronets




Data Mining Software
• RapidMiner and Weka – Defining data mining process

• Top 8 data mining software packages in 2008
    ◦ Angoss software
    ◦ Infor CRM Epiphany
    ◦ Portrait Software
    ◦ SAS
    ◦ SPSS
    ◦ ThinkAnalytics
    ◦ Unica
    ◦ Viscovery


Application Areas


       Industry            Application
       Finance             Credit Card Analysis
       Insurance           Fraud Analysis
       Telecommunication   Call record analysis




Applications
• Financial industry, banks, businesses, e-commerce
    ◦ Stock and investment analysis
    ◦ Identifying loyal customers and risky customers
    ◦ Predicting customer spending

• Database analysis and decision support
    ◦ Market analysis and management:
      target marketing, customer relationship management, market basket
      analysis
    ◦ Risk analysis and management:
      forecasting, quality control, competitive analysis
    ◦ Fraud detection and management

Data Mining in Usage

1.   Intelligent Miner
     • An IBM data mining product
     • Its distinctive features include the scalability of its mining algorithms and tight
       integration with IBM's DB2 relational database system

2.   DBMiner
     • Developed by DBMiner Technologies Inc.
     • Its distinctive feature is data-cube-based online analytical
       mining



The Telecomm Slice

[Figure: data cube with three dimensions. Product: Household, Telecomm, Video, Audio. Regions: Europe, Far East, India. Sales Channel: Retail, Direct, Special.]
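
Taking a slice of a data cube means fixing one dimension to a single value and keeping the cells along the remaining dimensions. The sketch below models a cube as a Python dictionary keyed by (product, region, channel); the dimension values come from the figure, but the sales numbers and the `slice_cube` helper are invented for illustration.

```python
# (product, region, sales channel) -> sales; all numbers are made up.
cube = {
    ("Telecomm", "Europe", "Retail"): 120,
    ("Telecomm", "India", "Direct"): 80,
    ("Video", "Europe", "Retail"): 50,
    ("Audio", "Far East", "Special"): 30,
}

def slice_cube(cube, product):
    """Fix the product dimension and keep the remaining (region, channel) cells."""
    return {
        (region, channel): sales
        for (p, region, channel), sales in cube.items()
        if p == product
    }

telecomm_slice = slice_cube(cube, "Telecomm")
```

The result is a two-dimensional table over regions and sales channels for the Telecomm product only.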




Conclusion
• Data mining: discovering interesting patterns from large amounts of
  data
• A KDD process includes data cleaning, data integration, data
  selection, transformation, data mining, pattern evaluation, and
  knowledge presentation
• Mining can be performed in a variety of information repositories
• Data mining functionalities: characterization, discrimination,
  association, classification, clustering, outlier analysis, etc.




Thank you !!!
