2. Types of analysis
• Ad-hoc query/Reporting/Analysis
– What is the purpose?
• Simple reports
• Key Performance Indicators
• OLAP cubes – Slice & Dice
– In real time: what happens now?
• Events/Triggers
• Data Mining
– How do we do it?
– What happens?
3. What does Data Mining Do?
• Explores your data
• Finds patterns
• Performs predictions
4. Data Mining Algorithms
• Classification
• Regression
• Segmentation
• Association
• Forecasting
• Text Analysis
• Advanced Data Exploration
6. Data Mining Process
[Figure: CRISP-DM process cycle (Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment), annotated with the SQL Server tools used along the way: SSAS (OLAP), DSV, SSIS, SSAS (Data Mining), SSRS, Flexible APIs. Source: www.crisp-dm.org]
7. Data Mining in SQL Server 2008
• New algorithms developed in conjunction
with Microsoft Research
• Data mining is made accessible and easy to
use through an integrated user interface,
cross-product integration, and familiar, standard APIs
• Complete framework for building and
deploying intelligent applications on the fly
• Integration into the cloud.
8. Top New Features in SQL Server 2008
• Test multiple data mining models simultaneously with statistical
scores of error and accuracy and confirm their stability with cross
validation
• Build multiple, incompatible mining models within a single
structure; apply model analysis over filtered data; query against
structure data to present complete information, all enabled by
enhanced mining structures
• Combine the best of both worlds by blending optimized near-term
predictions (ARTXP) and stable long-term predictions (ARIMA) with
Better Time Series Support
• Discover the relationship between items that are frequently
purchased together by using Shopping Basket Analysis; generate
interactive forms for scoring new cases with Predictive Calculator,
delivered with Microsoft SQL Server 2008 Data Mining Add-ins for
Office 2007
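The cross-validation mentioned in the first bullet can be illustrated outside SQL Server. The following is a minimal Python sketch, not the Analysis Services implementation: it splits the cases into k folds, trains on k-1 of them, scores the held-out fold, and reports one accuracy per fold. Similar scores across folds suggest a stable model. The majority-class "model", the function names, and the sample labels are all hypothetical stand-ins.

```python
from collections import Counter
from statistics import mean, pstdev


def k_fold_indices(n_cases, k):
    """Yield (train_idx, test_idx) pairs; fold i holds out every k-th case starting at i."""
    indices = list(range(n_cases))
    for fold in range(k):
        test_idx = indices[fold::k]
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        yield train_idx, test_idx


def train_majority(labels):
    """Toy stand-in for a mining model: always predict the most common label."""
    return Counter(labels).most_common(1)[0][0]


def cross_validate(labels, k):
    """Return one accuracy score per fold."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(labels), k):
        prediction = train_majority([labels[i] for i in train_idx])
        hits = sum(1 for i in test_idx if labels[i] == prediction)
        scores.append(hits / len(test_idx))
    return scores


# Hypothetical target column: does the student plan to attend college?
labels = ["yes", "no", "yes", "yes", "no", "yes", "yes", "no", "yes", "yes"]
scores = cross_validate(labels, k=5)
print("per-fold accuracy:", scores)
print("mean:", mean(scores), "spread:", pstdev(scores))
```

A large spread between folds would be a hint that the model depends heavily on which cases it happened to see during training.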
9. Rich and Innovative Algorithms
• Benefit from many rich and innovative data mining algorithms, most developed by Microsoft Research to
support common business problems promptly and accurately.
• Market Basket Analysis - Discover which items tend to be bought together to create recommendations
on-the-fly and to determine how product placement can directly contribute to your bottom line
• Churn Analysis - Anticipate customers who may be considering canceling their service and identify benefits
that will keep them from leaving
• Market Analysis - Define market segments by automatically grouping similar customers together. Use
these segments to seek profitable customers
• Forecasting - Predict sales and inventory amounts and learn how they are interrelated to foresee
bottlenecks and improve performance
• Data Exploration - Analyze profitability across customers, or compare customers who prefer different
brands of the same product to discover new opportunities
• Unsupervised Learning - Identify previously unknown relationships between various elements of your
business to better inform your decisions
• Web Site Analysis - Understand how people use your Web site and group similar usage patterns to offer a
better experience
• Campaign Analysis - Spend marketing dollars more effectively by targeting the customers most likely to
respond to a promotion
• Information Quality - Identify and handle anomalies during data entry or data loading to improve the
quality of information
• Text Analysis - Analyze feedback to find common themes and trends that concern your customers or
employees, informing decisions with unstructured input
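To make the Market Basket Analysis bullet concrete, here is a small Python sketch of the two measures usually attached to an association rule: support (how often the items appear together) and confidence (how often the consequent appears given the antecedent). This is a brute-force pair counter written for illustration, not the Microsoft Association Rules algorithm; the function name, threshold, and baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.3):
    """Return (antecedent, consequent, support, confidence) for frequent item pairs."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair yields two directed rules: a -> b and b -> a.
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    # Highest confidence first; names break ties so the order is deterministic.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


# Hypothetical transactions, one set of purchased items per basket.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]
rules = pair_rules(baskets)
for a, b, support, confidence in rules:
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

Rules with high confidence are the natural candidates for "customers who bought X also bought Y" recommendations.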
10. Value of Data Mining
[Figure: chart of business value against usability (simple to complex), plotting Reports (static), Reports (ad hoc), OLAP, and Data Mining; labeled "Business Knowledge" and "SQL Server 2008"]
11. Data Mining User Interface
• SQL Server BI Development Studio
– Environment for creation and data exploration
– Data Mining projects in Visual Studio solutions, tightly
integrated
– Source Control Integration
• SQL Server Management Studio
– One tool for all administrative tasks
– Manage, view and query mining models
12. BI Integration
• Integration Services
– Data Mining processing and results integrate
directly in the Integration Services pipeline
• OLAP
– Processing of mining models directly from
cubes
– Use of mining results as dimensions
• Reporting Services
– Embed Data Mining results directly in
Reporting Services Reports
13. Applied Data Mining
• Make Decisions without Coding
– Learn business rules directly from data
• Client Customization
– Learn logic customized for each client
• Automatic Update
– Data mining application logic updated by model
reprocessing
– Applications do not need to be rewritten, recompiled, or
redeployed
14. Server Mining Architecture
[Figure: BI Dev Studio (Visual Studio) deploys to the Analysis Services server, which hosts the data mining algorithm and the mining model and reads from the data source; your application (App) connects to the server through OLE DB / ADOMD / XMLA]
15. Data Mining Extensions (DMX)
• OLE DB for Data Mining specification
– Now part of XML/A specification
– See www.xmla.org for XML/A details
• Connect to Analysis Server
– OLE DB, ADO, ADO.NET, ADOMD.NET, XMLA
' conn is an already-open connection to the Analysis Server
Dim cmd As New ADOMD.Command
Dim reader As ADOMD.DataReader
cmd.Connection = conn
Set reader = cmd.ExecuteReader("SELECT Predict(Gender)…")
16. Typical DM Process Using DMX
Define a model:
CREATE MINING MODEL ….
Train a model:
INSERT INTO dmm ….
Prediction using a model:
SELECT …
FROM dmm PREDICTION JOIN …
[Figure labels: Data Mining Management System (DMMS), Training Data, Mining Model, Prediction Input Data]
17. DMX Commands
• Definition (DDL)
– CREATE – Make new model
– SELECT INTO – Create model by copying existing
– EXPORT – Save model as .abf file
– IMPORT – Retrieve model from .abf file
• Manipulation (DML)
– INSERT INTO – Train model
– UPDATE – Change content of model
– DELETE – Clear content
– SELECT – Browse model
18. DMX SELECT Elements
• SELECT [FLATTENED] [TOP] <columns>
• FROM <model>
• PREDICTION JOIN <table>
• ON <mapping>
• WHERE <filter>
• ORDER BY <sort expression>
– Use query builder to create SELECT statement
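The clause order above can be captured in a few lines of code. The sketch below is a hypothetical Python helper, not part of any Microsoft API, that assembles a DMX prediction query as a string with the clauses in the order listed. The model, column, and data source names (CPModel, MyDataSource, NewStudents) are placeholders, and the helper does no escaping or validation, so it is suitable for illustration only.

```python
def build_prediction_query(columns, model, source, mapping,
                           where=None, order_by=None, top=None, flattened=False):
    """Assemble a DMX prediction query with its clauses in the standard order."""
    head = ["SELECT"]
    if flattened:
        head.append("FLATTENED")
    if top is not None:
        head.append(f"TOP {int(top)}")
    head.append(", ".join(columns))

    # Map each model column to the matching column of the input rowset (alias t).
    on_clause = " AND ".join(
        f"{model}.{model_col} = t.{source_col}"
        for model_col, source_col in mapping.items()
    )
    lines = [
        " ".join(head),
        f"FROM {model}",
        f"PREDICTION JOIN {source} AS t",
        f"ON {on_clause}",
    ]
    if where:
        lines.append(f"WHERE {where}")
    if order_by:
        lines.append(f"ORDER BY {order_by}")
    return "\n".join(lines)


# Placeholder names throughout; MyDataSource is a hypothetical named data source.
query = build_prediction_query(
    columns=["t.ID", "CPModel.Plan"],
    model="CPModel",
    source="OPENQUERY(MyDataSource, 'SELECT * FROM NewStudents')",
    mapping={"Gender": "Gender", "IQ": "IQ"},
)
print(query)
```

In practice the query builder in the SQL Server tools produces these statements for you; the helper only shows how the pieces fit together.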
19. Training a DM Model: Simple
INSERT INTO CollegePlanModel
(StudentID, Gender, ParentIncome,
Encouragement, CollegePlans)
OPENROWSET('<provider>', '<connection>',
'SELECT StudentID,
Gender,
ParentIncome,
Encouragement,
CollegePlans
FROM CollegePlansTrainData')
20. Prediction Using a DM Model
• PREDICTION JOIN
SELECT t.ID, CPModel.Plan
FROM CPModel PREDICTION JOIN
OPENQUERY(…, 'SELECT * FROM NewStudents') AS t
ON CPModel.Gender = t.Gender AND
CPModel.IQ = t.IQ
21. Visit more self-help tutorials
• Pick a tutorial of your choice and browse
through it at your own pace.
• The tutorials section is free, self-guiding and
will not involve any additional support.
• Visit us at www.dataminingtools.net