BI Apps Data Mining- SQL Server Analysis Services 2008
DATA MINING
WITH
MICROSOFT SQL SERVER ANALYSIS
SERVICES
By
SUNNY OKORO
Contents
Introduction to SSAS Architecture
Entity Relationship Diagram
Description
Decision Tree Analysis
Business Case
Neural Network Analysis
Business Case
Logistic Regression Analysis
Business Case
Reference
Introduction to SSAS Architecture
Microsoft SQL Server Analysis Services (SSAS) is one of the components that make up
the Microsoft Business Intelligence suite, which also includes Microsoft SQL Server Reporting
Services (SSRS) and Microsoft SQL Server Integration Services (SSIS). SSAS projects can be
designed, deployed and browsed using Microsoft Business Intelligence Development
Studio (BIDS). SSAS can also be integrated with other Microsoft applications such as Excel and
Visio to create mining-related projects. For this project, BIDS is used for
design, deployment and browsing. Microsoft Excel is used to demonstrate the
last mining exercise.
Applications
1. Microsoft SQL Server 2008 R2
2. Microsoft Business Intelligence Development Studio (BIDS)
3. Microsoft Excel
4. Microsoft Analysis Server
5. Microsoft Data Mining for Excel (Add-in)
Datasets
1. Adventure Works Data Warehouse
Data Mining
1. Cube
2. Dimensions
3. Mining Structure
Designing Microsoft SSAS Project
Figure 1 Microsoft BIDS
1. Click the SQL Server Business Intelligence Development Studio icon to open BIDS.
2. Click File, then New, and select Project as illustrated in Figure 1 to open the New Project dialog
box illustrated in Figure 2 below.
3. Select Analysis Services Project and enter the file name along with its folder path. Click
OK to return to BIDS.
Figure 2
Figure 3
Data Source – contains the data source location. Make sure all services relating to the application or
database are started before connecting to a particular database or application.
Data Source View – contains a graphical representation or ERD of the data from the data source.
Cubes – multidimensional view of the data.
Dimensions –
Mining Structure – mining models, such as a decision tree, built on an existing cube or database to carry out
data mining.
4. Click Data Sources to add the data source connection, then click Next to enter the
credentials needed by SSAS to access the data source, as illustrated in Figure 4.
5. Click Data Source Views to add a new data source view containing the objects, such as tables, that
will be used for mining, as illustrated in Figure 5.
6. Click Cubes to create a new cube based on existing tables from the data source using
the Cube Wizard, which creates new dimensions. The design process is captured in
Figure 6. The cube is created to make data mining processing faster than retrieving
the data sets from the database.
7. Once the cube has been created, it needs to be processed as illustrated in Figure 7.
8. For this mining project, I created the dimensions relating to Product, Customer,
Geography, Sales Territory, Time and Currency and applied those dimensions to the cube,
which I created later.
9. To create dimensions, click Dimensions to open the Dimension Wizard as illustrated in
Figure 8.
10. To create a mining structure, click Mining Structures to open the Mining Structure Wizard
as illustrated in Figure 10.
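Step 6 states that the cube exists to make mining faster than reading the database directly. The idea behind that claim can be sketched in Python as pre-aggregation: totals are computed once per combination of dimension values and then reused. This is only an illustration of the concept, not how SSAS stores or processes a cube; the rows, the names `FACT_ROWS`, `build_cube` and `slice_total`, and all amounts are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, sales_territory, fiscal_year, sales_amount).
# The values are illustrative only and are not taken from Adventure Works.
FACT_ROWS = [
    ("Road-150 Red, 44", "Canada", 2003, 100.0),
    ("Road-150 Red, 44", "Australia", 2003, 100.0),
    ("Road-150 Red, 44", "Canada", 2004, 250.0),
    ("Mountain-200 Black, 38", "Canada", 2004, 75.0),
]


def build_cube(rows):
    """Pre-aggregate sales_amount for every (product, territory, year) cell."""
    cube = defaultdict(float)
    for product, territory, year, amount in rows:
        cube[(product, territory, year)] += amount
    return dict(cube)


def slice_total(cube, product=None, territory=None, year=None):
    """Sum the cells matching the given dimension values (None means any)."""
    total = 0.0
    for (p, t, y), amount in cube.items():
        if product in (None, p) and territory in (None, t) and year in (None, y):
            total += amount
    return total


# The cube is built once; later questions only touch the aggregated cells.
cube = build_cube(FACT_ROWS)
canada_total = slice_total(cube, territory="Canada")
road_150_2003 = slice_total(cube, product="Road-150 Red, 44", year=2003)
```

Processing the cube in step 7 plays the role of `build_cube` here: the aggregation work is done up front so that browsing and mining do not repeat it.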
Figure 4 SSAS Data Source
Figure 5 SSAS Data Source View
Entity Relationship Diagram
Figure 9 Data Source View
Description
The data source view shows the data warehouse schema of Adventure Works Outdoor Company. For the mining exercises, only the Sales
fact table and the DimProduct, DimCustomer, DimSalesTerritory, DimGeography and DimTime dimension
tables are used.
Decision Tree Analysis
Business Case
Managers from various sales regions at Adventure Works Outdoor Company want to view the
total amount spent, as recorded in the sales data warehouse, based on customer demographics:
gender, marital status, education and occupation, using a decision tree.
Demographic data is collected from customers when they register their profile online.
Other information collected during registration includes yearly income and number
of children. The goal of this mining activity is to determine the amount each demographic group
spends, based on the sales data in the data warehouse, to help decision makers determine
which promotions to create for each group.
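The quantity the business case asks for, the share of total spending attributable to each demographic value, can be sketched in a few lines of Python. This is only the aggregation that a tree node summarizes, not the Microsoft Decision Trees algorithm itself; the records in `PURCHASES`, the field names and the function `spend_share` are hypothetical and chosen for illustration.

```python
from collections import defaultdict

# Hypothetical customer purchases; the values are illustrative only.
PURCHASES = [
    {"gender": "M", "marital_status": "Married", "amount": 300.0},
    {"gender": "F", "marital_status": "Married", "amount": 260.0},
    {"gender": "M", "marital_status": "Single", "amount": 200.0},
    {"gender": "F", "marital_status": "Single", "amount": 240.0},
]


def spend_share(rows, attribute):
    """Return each attribute value's percentage of the total amount spent."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[attribute]] += row["amount"]
    grand_total = sum(totals.values())
    return {value: 100.0 * amount / grand_total for value, amount in totals.items()}


by_marital_status = spend_share(PURCHASES, "marital_status")
by_gender = spend_share(PURCHASES, "gender")
```

A decision tree repeats this kind of breakdown inside each branch, which is why the shares reported for a branch can differ from those of the whole population.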
Branch 3-A-1: Total Amount >=3381.470 and <4262.810
Figure 45
Branch 3-A-2: Total Amount <3381.470 or >4262.810
Figure 46
Branch 3-B: Total Amount >381.470 and <4115.920
Figure 47
Branch 4: Total Amount <1471.900
Figure 48
Branch 4-A: Total Amount >=737.450
Figure 49
Branch 4-B: Total Amount >=737.450
Figure 50
ANALYSIS
The mining models for the various decision trees revealed an interesting picture of the demographics
of the customers in the data warehouse and their spending behavior. By gender, male
customers outspend female customers by a small margin, 50% to 49%, as illustrated in Figure 7
of the Decision Trees Analysis Document. By marital status, married customers outspend
single customers 56% to 43%, and do so in every branch of the decision tree models with the exception
of branch 2-A, where the margin remained close at 50% to 49%, as illustrated in Figure 11 of the
Decision Tree Analysis Document. By occupation, professional and skilled manual
positions together represented the majority of the population with 2835 (30%) and 2344 (24%) customers. However, a
breakdown of the decision tree models reveals different dynamics when the population is
sliced into different nodes: the lead held by professional and skilled manual
positions decreases slightly or disappears, as illustrated in branch 3 and its corresponding nodes. The
same holds true for mining based on educational level. Bachelor's degree holders and
customers with partial college experience together represented the majority of the population with 29%
and 27%.
Neural Network Analysis
Business Case
Managers at Adventure Works Outdoor Company want to gain a better understanding of the salary
range of each occupation based on the educational levels collected from customers, such as
partial college, bachelor's degree, graduate degree and high school diploma. The educational demography
includes partial college. With the information gained from the mining activity, they will be able to
determine which credits to offer a customer based on the customer's educational and occupational
background.
Figure 51 Data Mining Wizard - Microsoft Neural Network
Figure 52 SSAS Data Mining Wizard - Microsoft Neural Network Cube Dimension Selection
Figure 53 SSAS Data Mining Wizard - Microsoft Neural Network Attribute and Measure Selection
Figure 54 SSAS Data Mining Wizard - Microsoft Neural Network Column Usage Selection
Figure 55 SSAS Data Mining Wizard - Microsoft Neural Network Test Set Creation
The percentage of data reserved for testing has to be set because SSAS throws numerous errors if the
percentage is above 50%. This is done to achieve a good result with the mining model.
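What the test percentage controls can be sketched as a simple holdout split: a share of the cases is set aside for testing and the rest is used to train the model. The Python below is a generic illustration under that assumption and does not reproduce how SSAS selects its test cases; the function name `holdout_split`, its parameters and the seed are hypothetical.

```python
import random


def holdout_split(cases, test_percentage, seed=0):
    """Shuffle the cases and reserve test_percentage percent of them for testing."""
    if not 0 <= test_percentage <= 100:
        raise ValueError("test_percentage must be between 0 and 100")
    shuffled = list(cases)
    # A fixed seed makes the split repeatable from run to run.
    random.Random(seed).shuffle(shuffled)
    test_size = round(len(shuffled) * test_percentage / 100)
    training_cases = shuffled[test_size:]
    test_cases = shuffled[:test_size]
    return training_cases, test_cases


# Example: 1000 hypothetical customer keys, 30 percent held out for testing.
training_cases, test_cases = holdout_split(range(1000), test_percentage=30)
```

The larger the test percentage, the fewer cases remain for training, which is one reason a value well below 50% is a common choice.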
Figure 56 Data Mining Model Processing-Dim Customer4.dmn
Figure 57 Dim Customer 4dmn Mining Model
The Gender and Marital Status attributes have been set to Ignore to make the model easier to read and
understand. In this section I compare the income levels of customers based on their
educational levels: Bachelor, Graduate, High School Diploma or Degree and Partial College.
Salary Range of Occupations based on Educational Levels of Customers Overview
Figure 58 Overview of the Model
Bachelor's Degree Salary Range of Occupations
Figure 59 Bachelor Degree Salary Range - Model 1
Salary Range: 10,000.000 ($10,000) - 35,541.537 ($35,541.54)
Salary Range: 35,541.537 ($35,541.54) - 57,321.817 ($57,321.82)
Figure 60 Bachelor Degree Salary Range - Model 2
Salary Range Value 1: 35,726.250 ($35,726.25) - 57,637.887 ($57,637.89)
Salary Range Value 2: 57,637.887 ($57,637.89) - 79,549.525 ($79,549.53)
Figure 61 Bachelor Degree Salary Range - Model 3
Salary Range Value 1: 57,637.887 ($57,637.89) - 79,549.525 ($79,549.53)
Salary Range Value 2: 79,549.525 ($79,549.53) - 155,096.614 ($155,096.61)
Graduate Degree Salary Range of Occupations
Figure 62 Graduate Degree Salary Range - Model 1
Salary Range: 10,000.000 ($10,000) - 35,726.250 ($35,726.25)
Salary Range: 35,726.250 ($35,726.25) - 57,637.887 ($57,637.89)
Figure 63 Graduate Degree Salary Range - Model 2
Salary Range Value 1: 35,726.250 ($35,726.25) - 57,637.887 ($57,637.89)
Salary Range Value 2: 57,637.887 ($57,637.89) - 79,549.525 ($79,549.53)
Figure 64 Graduate Degree Salary Range - Model 3
Salary Range Value 1: 57,637.887 ($57,637.89) - 79,549.525 ($79,549.53)
Salary Range Value 2: 79,549.525 ($79,549.53) - 155,096.614 ($155,096.61)
High School Diploma Salary Range of Occupations
Figure 65 High School Diploma Salary Range - Model 1
Salary Range: 10,000.000 ($10,000) - 35,726.250 ($35,726.25)
Salary Range: 35,726.250 ($35,726.25) - 57,637.887 ($57,637.89)
Figure 66 High School Diploma Salary Range - Model 2
Salary Range Value 1: 35,726.250 ($35,726.25) - 57,637.887 ($57,637.89)
Salary Range Value 2: 57,637.887 ($57,637.89) - 79,549.525 ($79,549.53)
Figure 67 High School Diploma Salary Range - Model 3
Salary Range Value 1: 57,637.887 ($57,637.89) - 79,549.525 ($79,549.53)
Salary Range Value 2: 79,549.525 ($79,549.53) - 155,096.614 ($155,096.61)
Partial College Salary Range of Occupations
Figure 68 Partial College Salary Range of Occupation - Model 1
Salary Range: 10,000.000 ($10,000) - 35,726.250 ($35,726.25)
Salary Range: 35,726.250 ($35,726.25) - 57,637.887 ($57,637.89)
Figure 69 Partial College Salary Range of Occupation - Model 2
Salary Range Value 1: 35,726.250 ($35,726.25) - 57,637.887 ($57,637.89)
Salary Range Value 2: 57,637.887 ($57,637.89) - 79,549.525 ($79,549.53)
Figure 70 Partial College Salary Range of Occupation - Model 3
Salary Range Value 1: 57,637.887 ($57,637.89) - 79,549.525 ($79,549.53)
Salary Range Value 2: 79,549.525 ($79,549.53) - 155,096.614 ($155,096.61)
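The dollar amounts in parentheses in the ranges above are the raw three-decimal boundaries rounded to cents. A small Python check of that conversion is sketched below, assuming half-up rounding (which is what turns 79,549.525 into $79,549.53); the helper name `to_currency` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def to_currency(raw_boundary):
    """Round a boundary given as a decimal string to cents (half up) and format it."""
    cents = Decimal(raw_boundary).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return f"${cents:,.2f}"


# Raw boundaries as they appear in the neural network model viewer output.
raw_boundaries = ["10000.000", "35726.250", "57637.887", "79549.525", "155096.614"]
formatted = [to_currency(value) for value in raw_boundaries]
```

Decimal strings are used instead of floats so that a value ending in 5 at the third decimal place is rounded exactly rather than according to its binary approximation.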
Analysis
The income level of the occupations varies based on educational background and career.
Clerical and manual labor positions, for example, are the careers with an average salary
range between $10,000 and $35,000 for customers with bachelor's degrees, graduate degrees, high school
diplomas and partial college experience, as illustrated in data mining model 1 of each
educational background. Only skilled manual careers have an average income between
$10,000 and $35,000 for customers with high school diplomas. A closer examination of the
mining models for each educational level reveals discrepancies between occupations depending
on the population used to create that specific mining model. For example, the average salary for
management positions in model 2 for bachelor's degree holders is between $57,637.89 and
$79,549.53, but in model 3 the average salary range is between $79,549.53 and $155,096.61.
Based on the mining evidence, the state of each mining model will change with the
population of customer records that are added to the data warehouse. The mining model
partially satisfies the business case, considering that a customer with a college degree or
college experience tends to earn more money. However, additional criteria such as payment history
can be used to qualify or disqualify customers from receiving a special coupon.
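The salary ranges quoted in this analysis are discretized buckets, so placing a customer's income into one of them is a lookup against the range boundaries. The Python sketch below uses the boundaries reported for models 2 and 3 and assumes that a value equal to a boundary belongs to the higher range; how SSAS itself treats boundary values is not stated in this document, and the names `BOUNDARIES` and `salary_range` are hypothetical.

```python
from bisect import bisect_right

# Range boundaries as reported in the salary range listings (models 2 and 3).
BOUNDARIES = [10000.000, 35726.250, 57637.887, 79549.525, 155096.614]


def salary_range(income):
    """Return the (low, high) range containing income, or None if out of range."""
    if income < BOUNDARIES[0] or income > BOUNDARIES[-1]:
        return None
    # bisect_right counts the boundaries <= income; cap it so the top
    # boundary itself falls into the last range.
    index = min(bisect_right(BOUNDARIES, income), len(BOUNDARIES) - 1)
    return BOUNDARIES[index - 1], BOUNDARIES[index]


management_example = salary_range(60000.0)
lowest_example = salary_range(10000.0)
```

Because the boundaries come from the population used to train a model, adding customer records to the data warehouse can shift them, which is the effect described in the analysis.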
Logistic Regression Analysis
Business Case
Managers at Adventure Works Outdoor Company want to gain an understanding of the total
amount spent by customers on a particular product across the various Sales Territory Countries,
which include France, United Kingdom, Canada, Germany, United States of America and
Australia, by contrasting sales from different fiscal years (2002-2005).
Figure 73 SSAS Data Mining Wizard - Regression Analysis Case Key Selection
Figure 74 SSAS Data Mining Wizard - Regression Analysis Column Usage Selection
Figure 75 SSAS Data Mining Wizard - Regression Analysis Data Type Setup
Figure 76 SSAS Data Mining Wizard - Regression Analysis Testing Setup
Figure 77 Sales2 dmn mining model
EXCEL AND DATA MINING
Figure 78 Excel Application
To use Excel as a data mining application, install the Microsoft SQL Server 2008 Data Mining
Add-ins.
1. Click the Project icon to set up the configuration, which opens the Analysis Services
Connection Wizard displayed in Figure 55.
Make sure to start the services relating to SQL Server and SSAS.
2. Click New to enter the credentials needed to access SSAS in the Connect to Analysis Services dialog
displayed in Figure 56.
3. Click Manage Models and select the applicable structures and models as in Figure 57. Process the
model.
4. Click Browse, select the model and click Next.
5. Select Attribute filter to filter the outputs and copy the data to Excel as illustrated in Figure 58.
Figure 79 Excel SSAS Connection Configuration
Snapshot of USA Sales2-US (2004-2005) Fiscal Year
Each graph bar contains numeric values associated with the fiscal year of each product
Analysis
The mining model satisfies the business case because each product's sales are broken down by
sales territory across the fiscal years from 2002 to 2005. For example, sales of the Road-150 Red, 44 product
were $100 in both the Canadian and Australian sales territories. Having these mining models allows
managers throughout the various sales territories to compare sales prices based on fiscal year.
Reference
Cameron, S. (2009). Microsoft SQL Server 2008 Analysis Services Step by Step. Retrieved from
http://proquestcombo.safaribooksonline.com.ezproxy.umuc.edu/book/databases/microsoft-sql-server/9780735626201?bookview=overview
Ben-Gan, I. (2008). Microsoft SQL Server 2008 T-SQL Fundamentals. Redmond, WA: Microsoft
Press.
Nielsen, P., Parui, U. & White, M. (2009). Microsoft SQL Server 2008 Bible. Indianapolis, IN:
Wiley Publishing, Inc.
Fouché, P. (2010). Pro SQL Server 2008 Analysis Services. Retrieved from
http://proquestcombo.safaribooksonline.com.ezproxy.umuc.edu/book/databases/microsoft-sql-server/9781430219958?bookview=overview
Langit, L., Goff, K., Mauri, D., Malik, S. & Welch, J. (2008). Smart Business Intelligence Solutions
with Microsoft SQL Server 2008. Retrieved from
http://proquestcombo.safaribooksonline.com.ezproxy.umuc.edu/book/databases/microsoft-sql-server/9780735625808
Vitt, E., Luckevich, M. & Misner, S. (2008). Business Intelligence. Retrieved from
http://proquestcombo.safaribooksonline.com.ezproxy.umuc.edu/book/databases/business-intelligence/9780735626607