Data Mining beyond
Adventure Works

Mark Tabladillo Ph.D.
http://marktab.net
October 3, 2009
Approach of this Presentation
• Emphasize
       – Conceptual value of data mining
       – Relationship of data mining to the real
         world
• Reserve
       – Specific procedures and mechanics
       – Specific mathematics
       – Production implementation


© 2009 Mark Tabladillo Ph.D.                       2
Outline
• Data Mining Fundamentals
• Interactive Demos
• Conclusion




© 2009 Mark Tabladillo Ph.D.   3
Interactive Demos
• Sports
• Government Forecasting




© 2009 Mark Tabladillo Ph.D.   4
Data Mining Definitions
• Data mining is the automatic or semi-
  automatic process of exploring data for
  meaningful or useful patterns.
• Data mining algorithms typically use
  estimation or optimization to achieve
  results (as opposed to only calculations).
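The contrast between a calculation and an estimation can be sketched in a few lines of Python. This is only an illustration: the data points, the learning rate, and the step count are assumptions chosen for the example, and gradient descent stands in for whatever optimizer a real data mining algorithm would use.

```python
# Hypothetical observations that follow y = 1 + 2x exactly.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

# Calculation: a fixed formula applied once, with no model to fit.
mean_y = sum(ys) / len(ys)


def fit_line(xs, ys, learning_rate=0.05, steps=5000):
    """Estimation: search for the intercept a and slope b that minimize
    the mean squared error of a + b*x, using plain gradient descent."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [a + b * x - y for x, y in zip(xs, ys)]
        grad_a = 2 * sum(errors) / n
        grad_b = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        a -= learning_rate * grad_a
        b -= learning_rate * grad_b
    return a, b


intercept, slope = fit_line(xs, ys)
print(mean_y, round(intercept, 3), round(slope, 3))
```

The mean is computed directly, while the line is found by repeatedly improving a guess, which is the sense in which mining algorithms "estimate or optimize".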




Microsoft Data Mining
• Microsoft Data Mining refers to
  Microsoft’s specific implementation of
  certain common data mining algorithms,
  which are queried through the DMX
  (Data Mining Extensions) language.
• Also called SQL Server Data Mining, the
  technology is integrated into SQL Server
  rather than presented as an independent
  application.

Data Mining Tasks
• Supervised
       – Answer known, what is correlated?
• Unsupervised
       – Answer unknown (unspecified), what are the
         groups?
• Forecasting
       – Given a trend, what is next?




List the Data Mining Algorithms
• Ten Answers
• Each one is a field of academic focus




The Data Mining Algorithms
•    Microsoft Naive Bayes
•    Microsoft Linear Regression
•    Microsoft Decision Trees
•    Microsoft Time Series
•    Microsoft Clustering
•    Microsoft Sequence Clustering
•    Microsoft Association Rules
•    Microsoft Neural Network
•    Microsoft Logistic Regression
•    Text Mining
The Analyze Tab


            Menu Option                     Data Mining Algorithm
            Analyze Key Influencers         Naïve Bayes
            Detect Categories               Clustering
            Fill from Example               Logistic Regression
            Forecast                        Time Series
            Highlight Exceptions            Clustering
            Scenario Analysis (Goal Seek)   Logistic Regression
            Scenario Analysis (What If)     Logistic Regression
            Prediction Calculator           Logistic Regression
            Shopping Basket Analysis        Association Rules
Demo One:
National League Baseball
• Directions:
  You are on the management team for the
  Atlanta Braves. To better serve the team,
  you have been instructed by the owner to
  group the players by considering both their
  position and their salary.




Demo One:
National League Baseball
• The following rules apply:
       – You must make more than one group
       – Each group must have at least two players
       – Players at different positions may be in the
         same group
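One way to approach this exercise in code is a small k-means clustering, sketched below in Python. Everything specific here is an assumption made for illustration: the roster and salaries are invented, the position weight is arbitrary, and plain k-means is a stand-in rather than the Microsoft Clustering implementation.

```python
from math import dist

# Hypothetical roster: (name, position group, salary in millions of dollars).
PLAYERS = [
    ("A", "pitcher", 12.0),
    ("B", "pitcher", 10.0),
    ("C", "infield", 11.0),
    ("D", "infield", 1.0),
    ("E", "outfield", 0.5),
    ("F", "outfield", 1.5),
]
POSITIONS = ["pitcher", "infield", "outfield"]
POSITION_WEIGHT = 0.5  # assumed: how much position counts relative to salary


def to_features(player, max_salary):
    """One-hot position (scaled by the weight) plus salary scaled to 0..1."""
    _, position, salary = player
    one_hot = [POSITION_WEIGHT if position == p else 0.0 for p in POSITIONS]
    return one_hot + [salary / max_salary]


def k_means(points, centroids, max_iter=100):
    """Plain k-means: assign each point to its nearest centroid, then re-average."""
    centroids = [list(c) for c in centroids]
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no player changed group, so the grouping is stable
        assignment = new_assignment
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def group_players(players):
    max_salary = max(salary for _, _, salary in players)
    points = [to_features(p, max_salary) for p in players]
    # Deterministic start: seed the two groups with the lowest- and highest-paid players.
    low = min(range(len(players)), key=lambda i: players[i][2])
    high = max(range(len(players)), key=lambda i: players[i][2])
    assignment = k_means(points, [points[low], points[high]])
    groups = {}
    for (name, _, _), cluster in zip(players, assignment):
        groups.setdefault(cluster, []).append(name)
    return sorted(groups.values())


groups = group_players(PLAYERS)
# Check the demo's rules: more than one group, at least two players per group.
rules_ok = len(groups) > 1 and all(len(g) >= 2 for g in groups)
print(groups, rules_ok)
```

Changing `POSITION_WEIGHT` shifts the balance between grouping by position and grouping by salary, which is the judgment call the exercise asks the audience to make by hand.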




Demo One:
National League Baseball
• Individual attributes can be used to make
  groups
• Historical statistics can be used to group
  new players
• Both supervised and unsupervised
  algorithms can be applied to the same
  data
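Once groups exist, the same data can be treated as a supervised problem: the group becomes the known answer, and a new player is assigned to one of the groups. The Python sketch below uses a nearest-group-mean rule on salary alone. The salaries, the labels "high" and "low", and the rule itself are assumptions for illustration, not one of the Microsoft algorithms.

```python
# Hypothetical history: (salary in millions, group label already known).
HISTORY = [
    (12.0, "high"),
    (10.0, "high"),
    (11.0, "high"),
    (1.0, "low"),
    (0.5, "low"),
    (1.5, "low"),
]


def group_means(history):
    """Average salary per labeled group, learned from historical players."""
    totals = {}
    for salary, label in history:
        total, count = totals.get(label, (0.0, 0))
        totals[label] = (total + salary, count + 1)
    return {label: total / count for label, (total, count) in totals.items()}


def classify(salary, means):
    """Assign a new player to the group whose mean salary is closest."""
    return min(means, key=lambda label: abs(means[label] - salary))


means = group_means(HISTORY)
new_player_group = classify(8.0, means)
print(means, new_player_group)
```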



Demo Two:
Government Forecasting
• Directions:
  The President is asking your opinion on
  how the following numbers will increase
  over the next few months. Because this
  project is sensitive, you do not know what
  these numbers measure. However, based
  on the available history, make your best
  projection for the next six periods.


Demo Two:
Government Forecasting
[Chart: monthly values, January 2007 through December 2008; vertical axis from 0 to 8]
Demo Two:
Government Forecasting
[Chart: monthly values, September 2007 through August 2009; vertical axis from 0 to 12]
Demo Two:
Government Forecasting
• Rapid response is as useful as prediction
• Seek intelligent correlations among related
  metrics
• Projections depend on time frame –
  modeling is continual
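The dependence on time frame can be sketched with a straight-line trend fitted by least squares and projected six periods ahead. The readings below are invented, since the slides do not give the underlying numbers, and a linear trend is a deliberately simple stand-in for a real time series algorithm.

```python
def fit_trend(values):
    """Least-squares line through (0, v0), (1, v1), ...; returns (intercept, slope)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def project(values, periods=6):
    """Extend the fitted line over the next `periods` periods."""
    intercept, slope = fit_trend(values)
    n = len(values)
    return [intercept + slope * (n + k) for k in range(periods)]


# Hypothetical monthly readings.
early_history = [1.0, 1.5, 2.0, 2.5]
later_history = early_history + [4.0, 5.5]  # two more months arrive, and the rise steepens

early_projection = project(early_history)  # covers periods 4 through 9
later_projection = project(later_history)  # covers periods 6 through 11
print(early_projection)
print(later_projection)
```

Both projections cover period 6, yet the second one is higher because the model was refitted after the newer readings arrived.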




Forecasting Algorithms
• Microsoft Time Series




Supervised Algorithms
•    Microsoft Naive Bayes
•    Microsoft Linear Regression
•    Microsoft Decision Trees
•    Microsoft Neural Network
•    Microsoft Logistic Regression


Unsupervised Algorithms
•    Microsoft Clustering
•    Microsoft Sequence Clustering
•    Microsoft Association Rules
•    Text Mining



Resources
• MarkTab.NET
     Links, video resources and information for data mining

•    Data Mining with Microsoft SQL Server 2008
     by Jamie MacLennan, ZhaoHui Tang, and Bogdan Crivat

•    Smart Business Intelligence Solutions with Microsoft® SQL Server® 2008
     (PRO-Developer)
     by Lynn Langit and Matthew Roche




Regroup and Conclusion
• Main Points from this Presentation




Contact Information
• Mark Tabladillo
  Twitter @marktabnet

• Also on:
  LinkedIn
  Facebook




Bonus:
Sequence Clustering Ideas
•    Trading players in professional sports
•    Assigning players to certain positions
•    Moving from city to city
•    Store path at the mall
•    Cancer treatment path
•    Taking up a musical instrument
•    Taking up sports
•    Blogging
•    Viral news
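Each of these ideas involves ordered steps, and sequence clustering typically starts from how often one step follows another. The Python sketch below computes those step-to-step transition probabilities for the "store path at the mall" idea. The paths are invented, and the sketch stops at the transition table; it does not perform the clustering itself.

```python
from collections import Counter, defaultdict

# Hypothetical shopper paths through a mall, in visiting order.
PATHS = [
    ["entrance", "food court", "cinema"],
    ["entrance", "food court", "bookstore"],
    ["entrance", "bookstore", "food court"],
]


def transition_probabilities(paths):
    """For each stop, the share of times each next stop followed it."""
    counts = defaultdict(Counter)
    for path in paths:
        for current, following in zip(path, path[1:]):
            counts[current][following] += 1
    return {
        state: {nxt: n / sum(nexts.values()) for nxt, n in nexts.items()}
        for state, nexts in counts.items()
    }


probs = transition_probabilities(PATHS)
print(probs["entrance"])
```

Shoppers whose paths produce similar transition tables would be candidates for the same cluster.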




Data Mining Beyond Adventure Works (Redmond WA 10/3/2009)
