SlideShare ist ein Scribd-Unternehmen logo
1 von 25
Downloaden Sie, um offline zu lesen
Taking Your Application Design To The
     Next Level With Data Mining
                              Peter Myers
             Mentor – Solid Quality Mentors
   Silicon Valley SQL Server User Group – 21 July, 2009



           Copyright © 2009, Solid Quality Mentors. All rights reserved.
PRESENTER

• Peter Myers
• Mentor and Trainer, Solid Quality Mentors
• BBus, MCP, MCITP (DBA, Dev, BI), MCT, MVP
• 12 years’ experience designing, developing and
 supporting software solutions using Microsoft data and
 development platforms
• pmyers@solidq.com


             Copyright © 2009, Solid Quality Mentors. All rights reserved.
WHO WE ARE

• Industry experts:
  Growing, elite group of over 90 of the world’s best technical experts who, as
  reflected by the high concentration of Microsoft MVP’s and RD’s in our ranks,
  achieve excellence in their industry by maintaining the highest credentials.
• Published authors:
  Best technical reference books, Microsoft reference materials, industry white
  papers, technical magazine articles, and webcasts.
• Top technical speakers:
  PASS Community Summit, Microsoft TechEd, The Microsoft BI Conference,
  SQL Server DevConnections, countless user groups, international
  conferences and events.
• For more information visit www.solidq.com


                   Copyright © 2009, Solid Quality Mentors. All rights reserved.
WHAT WE DO

Provide advanced, world-class expertise across the entire
Microsoft relational data and development platforms and
              complimenting technologies.

 PRACTICE AREAS                                     SERVICES
 Relational Database Management                     Advanced, Public Training
 Business Intelligence                              Customized, Private Training
 Development Methodologies                          Solution Delivery & Tuning
 SharePoint Collaboration                           Enhanced, Mentoring Services



      For more information visit www.solidq.com
                  Copyright © 2009, Solid Quality Mentors. All rights reserved.
AGENDA

• Introducing Data Mining
• Describing the Data Mining Process
• SQL Server™ 2008 Data Mining
• Data Preparation
• Data Mining Visualization
• Demonstrations


             Copyright © 2009, Solid Quality Mentors. All rights reserved.
INTRODUCING
                                                                   DATA MINING
• Addresses the problem:
 “Too much data and not enough information”
• Enables data exploration, pattern discovery, and pattern
 prediction—which lead to knowledge discovery
• Forms a key part of a BI solution




              Copyright © 2009, Solid Quality Mentors. All rights reserved.
DATA MINING
                           ENABLES PREDICTIVE ANALYSIS
 Proactive                                                Data mining



                                    Predictive Analysis

                                            OLAP
Interactive



                      Ad-hoc reporting


                Canned reporting
  Passive
                                                                           Business
              Presentation            Exploration                 Discovery Insight
                 Copyright © 2009, Solid Quality Mentors. All rights reserved.
BUSINESS
                                                                         SCENARIOS
• Identifying responsive customers/unresponsive
 customers (also known as churn analysis)
• Targeting promotions
• Detecting and preventing fraud
• Correcting data during ETL
• Forecasting sales and inventory
• Cross-selling

             Copyright © 2009, Solid Quality Mentors. All rights reserved.
DESCRIBING THE
                                    DATA MINING PROCESS
                                                                       “Doing Data
                                                                         Mining”
  Business                                       Data
Understanding                                Understanding




                                                              Data
                                                           Preparation

                             Data
 Deployment

                                                             Modeling


                           Evaluation
 “Putting Data
Mining to Work”                                               www.crisp-dm.org

       Copyright © 2009, Solid Quality Mentors. All rights reserved.
DATA
                                                                        PREPARATION
• Often significant amounts of effort are required to prepare data
  for mining:
   • Transforming for cleaning and reformatting
   • Isolating and flagging abnormal data
   • Appropriately substituting missing values
   • Discretizing continuous values into ranges
   • Normalizing values between 0 and 1
• Of course, having the required data to begin with is important:
   • When designing systems, give consideration to attributes that may be
     required as inputs for classification
      o For example, demographic data: Age, Gender, Region, etc


                   Copyright © 2009, Solid Quality Mentors. All rights reserved.
MODELING


Design time
Process time
Query time                                                                     Mining Model




               Copyright © 2009, Solid Quality Mentors. All rights reserved.
MODELING


Design time
Process time
Query time                                                                     Mining Model




                                                    Data
                                                   Mining
                                                   Engine
                Training Data



               Copyright © 2009, Solid Quality Mentors. All rights reserved.
MODELING


Design time
Process time
Query time                                                                     Mining Model




                                                    Data
                                                   Mining
                                                   Engine


               Predicted Data                                                  Data to Predict

               Copyright © 2009, Solid Quality Mentors. All rights reserved.
MODEL
                                                                           VALIDATION
• It is important that the model makes sense
  • Accuracy
     o Does it correlate and predict correctly?
  • Reliability
     o Does it work similarly for different test data?
  • Usefulness
     o Does it provide insight or only obvious trivialities?
• Commonly a holdout set of data is used to test model
 accuracy


                  Copyright © 2009, Solid Quality Mentors. All rights reserved.
SQL SERVER™ 2008
                                                              DATA MINING
• Hides the complexity of an advanced technology
• Includes full suite of algorithms to automatically extract
 information from data
• Handles large volumes of data and complex data
• Data can be sourced from relational and OLAP databases
• Uses standard programming interfaces:
   • XMLA
   • DMX
• Delivers a complete framework for building and deploying
  intelligent applications

                Copyright © 2009, Solid Quality Mentors. All rights reserved.
INTEGRATED
                                   END-TO-END OFFERING
                             DELIVERY


                       SharePoint Server
  Reports               Excel
            Dashboards Workbooks     Analytic   Scorecards    Plans
                                      Views

END USER TOOLS & PERFORMANCE MANAGEMENT APPS

            Excel                     PerformancePoint Server

                          BI PLATFORM
       SQL Server                             SQL Server
    Reporting Services                      Analysis Services

                       SQL Server DBMS

               SQL Server Integration Services



     Copyright © 2009, Solid Quality Mentors. All rights reserved.
SQL SERVER™ 2008
                                               ALGORITHMS

     • Microsoft Naïve Bayes
         •    Quick and approachable algorithm
         •    Used for classification


     • Microsoft Decision Trees
         •    Popular data mining technique
         •    Used for classification, regression and association


     • Microsoft Linear Regression
         •    Finds the best possible straight line through a series of
              points
         •    Used for prediction analysis


Copyright © 2009, Solid Quality Mentors. All rights reserved.
SQL SERVER™ 2008
                                               ALGORITHMS

     • Microsoft Neural Network
         •    More sophisticated than Decision Trees and Naïve
              Bayes, this algorithm can explore extremely complex
              scenarios
         •    Used for classification and regression tasks


     • Microsoft Logistic Regression
         •    A particular case of the Neural Network algorithm


     • Microsoft Clustering
         •    Finds natural groupings inside data
         •    Supports segmentation and anomaly detection tasks


Copyright © 2009, Solid Quality Mentors. All rights reserved.
SQL SERVER™ 2008
                                               ALGORITHMS

     • Microsoft Sequence Clustering
         •    Groups a sequence of discrete events into natural
              groups based on similarity


     • Microsoft Time Series
         •    Used to predict future values from a time series
         •    Has been improved in SQL Server 2008 to produce
              more accurate long-term forecasts


     • Microsoft Association Rules
         •    Commonly supports market basket analysis to learn
              what products are purchased together


Copyright © 2009, Solid Quality Mentors. All rights reserved.
SQL SERVER™ 2008
                                                               ALGORITHMS


  Classify       Estimate                 Cluster                Forecast         Associate

• Decision     • Decision             • Clustering            • Time Series     • Association
  Trees          Trees                                                            Rules
• Logistic     • Linear                                                         • Decision
  Regression     Regression                                                       Trees
• Naïve        • Logistic
  Bayes          Regression
• Neural       • Neural
  Networks       Networks




                Copyright © 2009, Solid Quality Mentors. All rights reserved.
DATA MINING
                                                                  VISUALIZATION
• In contrast to OLTP and OLAP queries, data mining queries
 typically extract information that the user is not aware of
• Appreciate that end users do not typically query data mining
 models directly
• Visualizations can effectively present data discoveries
• SQL Server™ 2008 provides algorithm-specific visualizations that
  can:
   • Test and explore models in BIDS
   • Be embedded into Web and Windows Forms applications
• Developers can construct and plug-in custom data mining
  viewers
                Copyright © 2009, Solid Quality Mentors. All rights reserved.
DATA MINING
                                            PROGRAMMABILITY
C++ App           VB App         .NET App                Any App
                                                      Any Platform, Any
OLE DB             ADO       ADOMD.NET      AMO
                                                           Device




      XMLA                                  WAN
    Over TCP/IP
                                                      XMLA
                                                    Over HTTP

                           Analysis Server

               OLAP                         Data Mining

    Server ADOMD.NET                Data Mining Interfaces

         .NET Stored            Microsoft           Third-Party
          Procedures           Algorithms           Algorithms


   Copyright © 2009, Solid Quality Mentors. All rights reserved.
ANALYSIS SERVICES
                                                                  APIs
• AMO (Analysis Management Objects)
 • Administer database objects
 • Apply security
 • Manage processing
• ADOMD.NET
 • Connect to SSAS databases
 • Retrieve and manipulate data
• Server ADOMD.NET
 • Extend DMX by using .NET stored procedures
             Copyright © 2009, Solid Quality Mentors. All rights reserved.
DEMONSTRATIONS
1.   Creating, Training, Testing and Querying Mining Models with BIDS
2.   Embedding Visualizations Into a Windows Forms Application
3.   Embedding a Data Mining Report Into a Windows Forms Application
4.   Enhancing an E-Commerce Site with Targeted Marketing
5.   Enhancing an E-Commerce Site with Market Basket Analysis
6.   Extending DMX With a .NET Stored Procedures
7.   Automating Data Validation With Data Mining


                   Copyright © 2009, Solid Quality Mentors. All rights reserved.
                               2008,
RESOURCES

• www.microsoft.com/sql/technologies/dm
  • Links to technical resources, case studies, news, and reviews
• www.sqlserverdatamining.com
  • Site designed and maintained by the SQL Server Data Mining
      team
  •   Includes: Live samples, tutorials, webcasts, tips and tricks, and
      FAQ
• Data Mining for SQL Server 2008, by ZhaoHui Tang and
 Jamie MacLennan

                 Copyright © 2009, Solid Quality Mentors. All rights reserved.

Weitere ähnliche Inhalte

Was ist angesagt?

Using BrightWork for Project Management with SharePoint 2010 - from Atidan
Using BrightWork for Project Management with SharePoint 2010 - from AtidanUsing BrightWork for Project Management with SharePoint 2010 - from Atidan
Using BrightWork for Project Management with SharePoint 2010 - from Atidan
David J Rosenthal
 
Analyzing Multi-Structured Data
Analyzing Multi-Structured DataAnalyzing Multi-Structured Data
Analyzing Multi-Structured Data
DataWorks Summit
 
Expert Webinar Series: SharePoint Governance - Managing Content Sprawl
Expert Webinar Series:  SharePoint Governance - Managing Content SprawlExpert Webinar Series:  SharePoint Governance - Managing Content Sprawl
Expert Webinar Series: SharePoint Governance - Managing Content Sprawl
martingarland
 
Agile partners overview
Agile partners overviewAgile partners overview
Agile partners overview
acube07
 

Was ist angesagt? (17)

Using BrightWork for Project Management with SharePoint 2010 - from Atidan
Using BrightWork for Project Management with SharePoint 2010 - from AtidanUsing BrightWork for Project Management with SharePoint 2010 - from Atidan
Using BrightWork for Project Management with SharePoint 2010 - from Atidan
 
Self-Service Access and Exploration of Big Data
Self-Service Access and Exploration of Big DataSelf-Service Access and Exploration of Big Data
Self-Service Access and Exploration of Big Data
 
To Each Their Own: How to Solve Analytic Complexity
To Each Their Own: How to Solve Analytic ComplexityTo Each Their Own: How to Solve Analytic Complexity
To Each Their Own: How to Solve Analytic Complexity
 
Analyzing Multi-Structured Data
Analyzing Multi-Structured DataAnalyzing Multi-Structured Data
Analyzing Multi-Structured Data
 
Technically Speaking: How Self-Service Analytics Fosters Collaboration
Technically Speaking: How Self-Service Analytics Fosters CollaborationTechnically Speaking: How Self-Service Analytics Fosters Collaboration
Technically Speaking: How Self-Service Analytics Fosters Collaboration
 
Measure Data Quality
Measure Data QualityMeasure Data Quality
Measure Data Quality
 
2012 06 hortonworks paris hug
2012 06 hortonworks paris hug2012 06 hortonworks paris hug
2012 06 hortonworks paris hug
 
Expert Webinar Series: SharePoint Governance - Managing Content Sprawl
Expert Webinar Series:  SharePoint Governance - Managing Content SprawlExpert Webinar Series:  SharePoint Governance - Managing Content Sprawl
Expert Webinar Series: SharePoint Governance - Managing Content Sprawl
 
Investigative Analytics- What's in a Data Scientists Toolbox
Investigative Analytics- What's in a Data Scientists ToolboxInvestigative Analytics- What's in a Data Scientists Toolbox
Investigative Analytics- What's in a Data Scientists Toolbox
 
Why Mashups Matter
Why Mashups MatterWhy Mashups Matter
Why Mashups Matter
 
Couchbase Server and IBM BigInsights: One + One = Three
Couchbase Server and IBM BigInsights: One + One = ThreeCouchbase Server and IBM BigInsights: One + One = Three
Couchbase Server and IBM BigInsights: One + One = Three
 
The Big Picture: Big Data for the New Wave of Analytics
The Big Picture: Big Data for the New Wave of AnalyticsThe Big Picture: Big Data for the New Wave of Analytics
The Big Picture: Big Data for the New Wave of Analytics
 
Agile partners overview
Agile partners overviewAgile partners overview
Agile partners overview
 
Use of EMR for Marketing Segmentation
Use of EMR for Marketing SegmentationUse of EMR for Marketing Segmentation
Use of EMR for Marketing Segmentation
 
Cogent Company Overview.11292009
Cogent Company Overview.11292009Cogent Company Overview.11292009
Cogent Company Overview.11292009
 
Application architecture for cloud
Application architecture for cloudApplication architecture for cloud
Application architecture for cloud
 
Big Data Marketing in the AWS Cloud: Improving Cross-Media Effectiveness - We...
Big Data Marketing in the AWS Cloud: Improving Cross-Media Effectiveness - We...Big Data Marketing in the AWS Cloud: Improving Cross-Media Effectiveness - We...
Big Data Marketing in the AWS Cloud: Improving Cross-Media Effectiveness - We...
 

Ähnlich wie SQL Server Data Mining - Taking your Application Design to the Next Level

Metadata Use Cases You Can Use
Metadata Use Cases You Can UseMetadata Use Cases You Can Use
Metadata Use Cases You Can Use
dmurph4
 
Metadata Use Cases
Metadata Use CasesMetadata Use Cases
Metadata Use Cases
dmurph4
 
Dataiku r users group v2
Dataiku   r users group v2Dataiku   r users group v2
Dataiku r users group v2
Cdiscount
 
Manthan biim services and solutions
Manthan   biim services  and solutionsManthan   biim services  and solutions
Manthan biim services and solutions
Jaikumar Karuppannan
 
Getting Cloud Architecture Right the First Time Ver 2
Getting Cloud Architecture Right the First Time Ver 2Getting Cloud Architecture Right the First Time Ver 2
Getting Cloud Architecture Right the First Time Ver 2
David Linthicum
 
Right Space Brief
Right Space BriefRight Space Brief
Right Space Brief
jnassour
 
Agile Business Intelligence
Agile Business IntelligenceAgile Business Intelligence
Agile Business Intelligence
Don Jackson
 

Ähnlich wie SQL Server Data Mining - Taking your Application Design to the Next Level (20)

Zakipoint Introduction
Zakipoint IntroductionZakipoint Introduction
Zakipoint Introduction
 
2009/12 - Database Architechs - Presentation
2009/12 - Database Architechs - Presentation2009/12 - Database Architechs - Presentation
2009/12 - Database Architechs - Presentation
 
2009/12 Database Architechs Presentation
2009/12   Database Architechs Presentation2009/12   Database Architechs Presentation
2009/12 Database Architechs Presentation
 
CDM SIG: Fusion MDM for Customer Highlights [2010 OAUG Collaborate]
CDM SIG: Fusion MDM for Customer Highlights [2010 OAUG Collaborate]CDM SIG: Fusion MDM for Customer Highlights [2010 OAUG Collaborate]
CDM SIG: Fusion MDM for Customer Highlights [2010 OAUG Collaborate]
 
Rhapsody Technologies Introduction Deck 01 31 12
Rhapsody Technologies   Introduction Deck 01 31 12Rhapsody Technologies   Introduction Deck 01 31 12
Rhapsody Technologies Introduction Deck 01 31 12
 
Metadata Use Cases You Can Use
Metadata Use Cases You Can UseMetadata Use Cases You Can Use
Metadata Use Cases You Can Use
 
Metadata Use Cases
Metadata Use CasesMetadata Use Cases
Metadata Use Cases
 
Acuma Introduction
Acuma IntroductionAcuma Introduction
Acuma Introduction
 
OBIEE On Cloud
OBIEE On CloudOBIEE On Cloud
OBIEE On Cloud
 
Dataiku r users group v2
Dataiku   r users group v2Dataiku   r users group v2
Dataiku r users group v2
 
Improving Quality and Adoption: EIM SQL Server 2012
Improving Quality and Adoption: EIM SQL Server 2012Improving Quality and Adoption: EIM SQL Server 2012
Improving Quality and Adoption: EIM SQL Server 2012
 
Manthan biim services and solutions
Manthan   biim services  and solutionsManthan   biim services  and solutions
Manthan biim services and solutions
 
Common mistakes, pitfalls and misconceptions to avoid when launching your DAM...
Common mistakes, pitfalls and misconceptions to avoid when launching your DAM...Common mistakes, pitfalls and misconceptions to avoid when launching your DAM...
Common mistakes, pitfalls and misconceptions to avoid when launching your DAM...
 
Cogent overview
Cogent overviewCogent overview
Cogent overview
 
Getting Cloud Architecture Right the First Time Ver 2
Getting Cloud Architecture Right the First Time Ver 2Getting Cloud Architecture Right the First Time Ver 2
Getting Cloud Architecture Right the First Time Ver 2
 
PromptCloud Nasscom Emerge 50 Presentation
PromptCloud Nasscom Emerge 50 PresentationPromptCloud Nasscom Emerge 50 Presentation
PromptCloud Nasscom Emerge 50 Presentation
 
Sod Profile
Sod ProfileSod Profile
Sod Profile
 
Right Space Brief
Right Space BriefRight Space Brief
Right Space Brief
 
Agile Business Intelligence
Agile Business IntelligenceAgile Business Intelligence
Agile Business Intelligence
 
Energize 2013 slides
Energize 2013 slidesEnergize 2013 slides
Energize 2013 slides
 

Mehr von Mark Ginnebaugh

Mehr von Mark Ginnebaugh (20)

Automating Microsoft Power BI Creations 2015
Automating Microsoft Power BI Creations 2015Automating Microsoft Power BI Creations 2015
Automating Microsoft Power BI Creations 2015
 
Microsoft SQL Server Analysis Services (SSAS) - A Practical Introduction
Microsoft SQL Server Analysis Services (SSAS) - A Practical Introduction Microsoft SQL Server Analysis Services (SSAS) - A Practical Introduction
Microsoft SQL Server Analysis Services (SSAS) - A Practical Introduction
 
Platfora - An Analytics Sandbox In A World Of Big Data
Platfora - An Analytics Sandbox In A World Of Big DataPlatfora - An Analytics Sandbox In A World Of Big Data
Platfora - An Analytics Sandbox In A World Of Big Data
 
Microsoft SQL Server Relational Databases and Primary Keys
Microsoft SQL Server Relational Databases and Primary KeysMicrosoft SQL Server Relational Databases and Primary Keys
Microsoft SQL Server Relational Databases and Primary Keys
 
DesignMind Microsoft Business Intelligence SQL Server
DesignMind Microsoft Business Intelligence SQL ServerDesignMind Microsoft Business Intelligence SQL Server
DesignMind Microsoft Business Intelligence SQL Server
 
San Francisco Bay Area SQL Server July 2013 meetings
San Francisco Bay Area SQL Server July 2013 meetingsSan Francisco Bay Area SQL Server July 2013 meetings
San Francisco Bay Area SQL Server July 2013 meetings
 
Silicon Valley SQL Server User Group June 2013
Silicon Valley SQL Server User Group June 2013Silicon Valley SQL Server User Group June 2013
Silicon Valley SQL Server User Group June 2013
 
Microsoft SQL Server Continuous Integration
Microsoft SQL Server Continuous IntegrationMicrosoft SQL Server Continuous Integration
Microsoft SQL Server Continuous Integration
 
Hortonworks Big Data & Hadoop
Hortonworks Big Data & HadoopHortonworks Big Data & Hadoop
Hortonworks Big Data & Hadoop
 
Microsoft SQL Server Physical Join Operators
Microsoft SQL Server Physical Join OperatorsMicrosoft SQL Server Physical Join Operators
Microsoft SQL Server Physical Join Operators
 
Microsoft PowerPivot & Power View in Excel 2013
Microsoft PowerPivot & Power View in Excel 2013Microsoft PowerPivot & Power View in Excel 2013
Microsoft PowerPivot & Power View in Excel 2013
 
Microsoft Data Warehouse Business Intelligence Lifecycle - The Kimball Approach
Microsoft Data Warehouse Business Intelligence Lifecycle - The Kimball ApproachMicrosoft Data Warehouse Business Intelligence Lifecycle - The Kimball Approach
Microsoft Data Warehouse Business Intelligence Lifecycle - The Kimball Approach
 
Fusion-io Memory Flash for Microsoft SQL Server 2012
Fusion-io Memory Flash for Microsoft SQL Server 2012Fusion-io Memory Flash for Microsoft SQL Server 2012
Fusion-io Memory Flash for Microsoft SQL Server 2012
 
Microsoft SQL Server PASS News August 2012
Microsoft SQL Server PASS News August 2012Microsoft SQL Server PASS News August 2012
Microsoft SQL Server PASS News August 2012
 
Business Intelligence Dashboard Design Best Practices
Business Intelligence Dashboard Design Best PracticesBusiness Intelligence Dashboard Design Best Practices
Business Intelligence Dashboard Design Best Practices
 
Microsoft Mobile Business Intelligence
Microsoft Mobile Business Intelligence Microsoft Mobile Business Intelligence
Microsoft Mobile Business Intelligence
 
Microsoft SQL Server 2012 Cloud Ready
Microsoft SQL Server 2012 Cloud ReadyMicrosoft SQL Server 2012 Cloud Ready
Microsoft SQL Server 2012 Cloud Ready
 
Microsoft SQL Server 2012 Master Data Services
Microsoft SQL Server 2012 Master Data ServicesMicrosoft SQL Server 2012 Master Data Services
Microsoft SQL Server 2012 Master Data Services
 
Microsoft SQL Server PowerPivot
Microsoft SQL Server PowerPivotMicrosoft SQL Server PowerPivot
Microsoft SQL Server PowerPivot
 
Microsoft SQL Server Testing Frameworks
Microsoft SQL Server Testing FrameworksMicrosoft SQL Server Testing Frameworks
Microsoft SQL Server Testing Frameworks
 

Kürzlich hochgeladen

Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Kürzlich hochgeladen (20)

Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdf
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 

SQL Server Data Mining - Taking your Application Design to the Next Level

  • 1. Taking Your Application Design To The Next Level With Data Mining Peter Myers Mentor – Solid Quality Mentors Silicon Valley SQL Server User Group – 21 July, 2009 Copyright © 2009, Solid Quality Mentors. All rights reserved.
  • 2. PRESENTER • Peter Myers • Mentor and Trainer, Solid Quality Mentors • BBus, MCP, MCITP (DBA, Dev, BI), MCT, MVP • 12 years’ experience designing, developing and supporting software solutions using Microsoft data and development platforms • pmyers@solidq.com Copyright © 2009, Solid Quality Mentors. All rights reserved.
  • 3. WHO WE ARE • Industry experts: Growing, elite group of over 90 of the world’s best technical experts who, as reflected by the high concentration of Microsoft MVP’s and RD’s in our ranks, achieve excellence in their industry by maintaining the highest credentials. • Published authors: Best technical reference books, Microsoft reference materials, industry white papers, technical magazine articles, and webcasts. • Top technical speakers: PASS Community Summit, Microsoft TechEd, The Microsoft BI Conference, SQL Server DevConnections, countless user groups, international conferences and events. • For more information visit www.solidq.com Copyright © 2009, Solid Quality Mentors. All rights reserved.
  • 4. WHAT WE DO Provide advanced, world-class expertise across the entire Microsoft relational data and development platforms and complimenting technologies. PRACTICE AREAS SERVICES Relational Database Management Advanced, Public Training Business Intelligence Customized, Private Training Development Methodologies Solution Delivery & Tuning SharePoint Collaboration Enhanced, Mentoring Services For more information visit www.solidq.com Copyright © 2009, Solid Quality Mentors. All rights reserved.
  • 5. AGENDA • Introducing Data Mining • Describing the Data Mining Process • SQL Server™ 2008 Data Mining • Data Preparation • Data Mining Visualization • Demonstrations Copyright © 2009, Solid Quality Mentors. All rights reserved.
  • 6. INTRODUCING DATA MINING • Addresses the problem: “Too much data and not enough information” • Enables data exploration, pattern discovery, and pattern prediction—which lead to knowledge discovery • Forms a key part of a BI solution Copyright © 2009, Solid Quality Mentors. All rights reserved.
  • 7. DATA MINING ENABLES PREDICTIVE ANALYSIS Proactive Data mining Predictive Analysis OLAP Interactive Ad-hoc reporting Canned reporting Passive Business Presentation Exploration Discovery Insight Copyright © 2009, Solid Quality Mentors. All rights reserved.
  • 8. BUSINESS SCENARIOS • Identifying responsive customers/unresponsive customers (also known as churn analysis) • Targeting promotions • Detecting and preventing fraud • Correcting data during ETL • Forecasting sales and inventory • Cross-selling Copyright © 2009, Solid Quality Mentors. All rights reserved.
  • 9. DESCRIBING THE DATA MINING PROCESS “Doing Data Mining” Business Data Understanding Understanding Data Preparation Data Deployment Modeling Evaluation “Putting Data Mining to Work” www.crisp-dm.org Copyright © 2009, Solid Quality Mentors. All rights reserved.
  • 10. DATA PREPARATION • Often significant amounts of effort are required to prepare data for mining: • Transforming for cleaning and reformatting • Isolating and flagging abnormal data • Appropriately substituting missing values • Discretizing continuous values into ranges • Normalizing values between 0 and 1 • Of course, having the required data to begin with is important: • When designing systems, give consideration to attributes that may be required as inputs for classification o For example, demographic data: Age, Gender, Region, etc Copyright © 2009, Solid Quality Mentors. All rights reserved.
  • 11. MODELING Design time Process time Query time Mining Model Copyright © 2009, Solid Quality Mentors. All rights reserved.
  • 12. MODELING Design time Process time Query time Mining Model Data Mining Engine Training Data Copyright © 2009, Solid Quality Mentors. All rights reserved.
  • 13. MODELING Design time Process time Query time Mining Model Data Mining Engine Predicted Data Data to Predict Copyright © 2009, Solid Quality Mentors. All rights reserved.
  • 14. MODEL VALIDATION • It is important that the model makes sense • Accuracy o Does it correlate and predict correctly? • Reliability o Does it work similarly for different test data? • Usefulness o Does it provide insight or only obvious trivialities? • Commonly a holdout set of data is used to test model accuracy Copyright © 2009, Solid Quality Mentors. All rights reserved.
  • 15. SQL SERVER™ 2008 DATA MINING • Hides the complexity of an advanced technology • Includes full suite of algorithms to automatically extract information from data • Handles large volumes of data and complex data • Data can be sourced from relational and OLAP databases • Uses standard programming interfaces: • XMLA • DMX • Delivers a complete framework for building and deploying intelligent applications Copyright © 2009, Solid Quality Mentors. All rights reserved.
  • 16. INTEGRATED END-TO-END OFFERING DELIVERY SharePoint Server Reports Excel Dashboards Workbooks Analytic Scorecards Plans Views END USER TOOLS & PERFORMANCE MANAGEMENT APPS Excel PerformancePoint Server BI PLATFORM SQL Server SQL Server Reporting Services Analysis Services SQL Server DBMS SQL Server Integration Services Copyright © 2009, Solid Quality Mentors. All rights reserved.
  • 17. SQL SERVER™ 2008 ALGORITHMS • Microsoft Naïve Bayes • Quick and approachable algorithm • Used for classification • Microsoft Decision Trees • Popular data mining technique • Used for classification, regression and association • Microsoft Linear Regression • Finds the best possible straight line through a series of points • Used for prediction analysis Copyright © 2009, Solid Quality Mentors. All rights reserved.
  • 18. SQL SERVER™ 2008 ALGORITHMS • Microsoft Neural Network • More sophisticated than Decision Trees and Naïve Bayes, this algorithm can explore extremely complex scenarios • Used for classification and regression tasks • Microsoft Logistic Regression • A particular case of the Neural Network algorithm • Microsoft Clustering • Finds natural groupings inside data • Supports segmentation and anomaly detection tasks Copyright © 2009, Solid Quality Mentors. All rights reserved.
  • 19. SQL SERVER™ 2008 ALGORITHMS • Microsoft Sequence Clustering • Groups a sequence of discrete events into natural groups based on similarity • Microsoft Time Series • Used to predict future values from a time series • Has been improved in SQL Server 2008 to produce more accurate long-term forecasts • Microsoft Association Rules • Commonly supports market basket analysis to learn what products are purchased together Copyright © 2009, Solid Quality Mentors. All rights reserved.
  • 20. SQL SERVER™ 2008 ALGORITHMS Classify Estimate Cluster Forecast Associate • Decision • Decision • Clustering • Time Series • Association Trees Trees Rules • Logistic • Linear • Decision Regression Regression Trees • Naïve • Logistic Bayes Regression • Neural • Neural Networks Networks Copyright © 2009, Solid Quality Mentors. All rights reserved.
  • 21. DATA MINING VISUALIZATION • In contrast to OLTP and OLAP queries, data mining queries typically extract information that the user is not aware of • Appreciate that end users do not typically query data mining models directly • Visualizations can effectively present data discoveries • SQL Server™ 2008 provides algorithm-specific visualizations that can: • Test and explore models in BIDS • Be embedded into Web and Windows Forms applications • Developers can construct and plug-in custom data mining viewers Copyright © 2009, Solid Quality Mentors. All rights reserved.
  • 22. DATA MINING PROGRAMMABILITY C++ App VB App .NET App Any App Any Platform, Any OLE DB ADO ADOMD.NET AMO Device XMLA WAN Over TCP/IP XMLA Over HTTP Analysis Server OLAP Data Mining Server ADOMD.NET Data Mining Interfaces .NET Stored Microsoft Third-Party Procedures Algorithms Algorithms Copyright © 2009, Solid Quality Mentors. All rights reserved.
  • 23. ANALYSIS SERVICES APIs • AMO (Analysis Management Objects) • Administer database objects • Apply security • Manage processing • ADOMD.NET • Connect to SSAS databases • Retrieve and manipulate data • Server ADOMD.NET • Extend DMX by using .NET stored procedures Copyright © 2009, Solid Quality Mentors. All rights reserved.
  • 24. DEMONSTRATIONS 1. Creating, Training, Testing and Querying Mining Models with BIDS 2. Embedding Visualizations Into a Windows Forms Application 3. Embedding a Data Mining Report Into a Windows Forms Application 4. Enhancing an E-Commerce Site with Targeted Marketing 5. Enhancing an E-Commerce Site with Market Basket Analysis 6. Extending DMX With a .NET Stored Procedures 7. Automating Data Validation With Data Mining Copyright © 2009, Solid Quality Mentors. All rights reserved. 2008,
  • 25. RESOURCES • www.microsoft.com/sql/technologies/dm • Links to technical resources, case studies, news, and reviews • www.sqlserverdatamining.com • Site designed and maintained by the SQL Server Data Mining team • Includes: Live samples, tutorials, webcasts, tips and tricks, and FAQ • Data Mining for SQL Server 2008, by ZhaoHui Tang and Jamie MacLennan Copyright © 2009, Solid Quality Mentors. All rights reserved.