SSAS R2 and SharePoint 2010 – Business Intelligence
1. MS SQL Server Analysis Services 2008 and Enterprise Data Warehousing (slide background terms: Analysis Services, SQL Server, Data Mining, Integration Services)
2. About Me Slava Kokaev Email: vkokaev@bostonbi.org Personal website: www.bipro.org Blog: www.bostonbi.org/blog.aspx
4. Drive Corporate Performance: Giving a purpose to business intelligence “You can’t manage what you can’t measure. You can’t measure what you can’t describe.” Robert Kaplan and David Norton, authors of “The Balanced Scorecard”
5. Bike Factory, Tires Factory, Still Factory, AdventureWorks Headquarter, Plastic Factory, Color Factory, Accessory Factory, Warehouse, Resellers
6. Understanding The Business System Microsoft BI Platform Business Intelligence System Management System Enterprise Data Warehouse System Business Analysis System Operational System
8. Microsoft’s BI platform COLLABORATION CONTENT MANAGEMENT SharePoint Server SEARCH Reports Dashboards Excel Workbooks Analytic Views Scorecards Plans END USER TOOLS & PERFORMANCE MANAGEMENT APPS Excel Power Pivot BI PLATFORM SQL Server Reporting Services SQL Server Analysis Services SQL Server DBMS SQL Server Integration Services
10. SQL Server has led innovation in the BI space for more than a decade
13. Abstract Functional Business Model IDEF0 Modeling Notation Feedback (Improvement) Plan Plans, Business Rule and KPI Input Data Process Output (Facts /Measures) Do Resources Check Result Data Act Data Mining Reporting Services SQL Server Analysis Services
14. Process Decomposition Tree Strategy Level Data Mining Plan Plan Do Do Do Technology Level Analysis Services Check Check Act Act Operations Level Reporting Services
27. Analysis System Multidimensional databases are also called online analytical processing (OLAP) databases, and they: contain structures optimized for rapid ad hoc information retrieval; pre-calculate and store aggregated values; include calculation engines for fast, flexible transformation of base data; are designed to reveal business trends and statistics not directly visible in the data retrieved from a data warehouse. Data mining models discover patterns in data, typically for predictive analysis. (Product Association, Sales, Finance, Production)
28. Data Visualization System Client access and distribution mechanisms can include: static report viewers and browsers; ad hoc query tools; report writers; modeling applications; scorecard applications; portals and dashboards. Delivering data is a process of continuous business improvement: Monitor, Analyze, Plan.
29. What is a dimensional model? A dimensional model is made up of a central fact table (or tables) and its associated dimensions. The dimensional model is also called a star schema because it looks like a star, with the fact table in the middle and the dimensions serving as the points of the star. From a relational data modeling perspective, the dimensional model consists of a normalized fact table with denormalized dimension tables.
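To make the star shape concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and sample rows are hypothetical and not taken from the deck; the point is only that one fact table joins outward to each dimension through a foreign key.

```python
import sqlite3

# Hypothetical star schema: one fact table in the middle, two dimensions around it.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, calendar_year INTEGER, month_of_year INTEGER);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key INTEGER REFERENCES dim_date(date_key),
    quantity INTEGER,
    sales_amount REAL
);
INSERT INTO dim_product VALUES (1, 'Road Bike', 'Bikes'), (2, 'Helmet', 'Accessories');
INSERT INTO dim_date VALUES (20080701, 2008, 7), (20080801, 2008, 8);
INSERT INTO fact_sales VALUES
    (1, 20080701, 2, 3000.0),
    (2, 20080701, 5, 150.0),
    (1, 20080801, 1, 1500.0);
""")

def sales_by_category(connection):
    """Join the fact table to a dimension and aggregate the measure."""
    return connection.execute("""
        SELECT p.category, SUM(f.sales_amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY p.category
        ORDER BY p.category
    """).fetchall()

result = sales_by_category(conn)
```

The dimension tables carry the descriptive attributes (category, year), while the fact table carries only keys and measures, which is what keeps the fact table narrow.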
31. Dimensions Dimensions are the foundation of the dimensional model, describing the objects of the business, such as employee, product, customer, and service. They describe the context surrounding measurement events: the business processes (facts), or actions of the business, in which the dimensions participate. Each dimension table links to all the business processes in which it participates. A single dimension that is shared across all these processes is called a conformed dimension.
32. Fact Tables Each fact table contains the measurements associated with a specific business process. A record in a fact table is a measurement, and a measurement event can always produce a fact table record. These events usually have numeric measurements that quantify the magnitude of the event, such as quantity ordered, sale amount, or call duration. These numbers are called facts (or measures in Analysis Services). The key to the fact table is a multi-part key made up of a subset of the foreign keys from each dimension table involved in the business event.
33. Sales Business Process Balance Scorecards Sales corrections and Improvement Plan Sales Sales Quota Stock Data Sale Orders (Facts /Measures) Resellers Sales Reseller (Dimension) Sales Result Monitor Sales Sales Summary Sales Transaction Analyze Sales SQL Server DB Sales Representative Sales Manager
36. Hierarchies A hierarchy is a collection of logically structured levels based on attributes. In some hierarchies, each member attribute uniquely implies the member attribute above it.
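One way to check whether "each member attribute uniquely implies the member attribute above it" is to scan the dimension rows level by level. The sketch below is illustrative only; the function name and the sample date rows are assumptions, not part of the deck.

```python
def is_natural_hierarchy(rows, levels):
    """Return True if every value at a level maps to exactly one value
    at the level above it. `levels` is ordered from lowest to highest."""
    for child, parent in zip(levels, levels[1:]):
        seen = {}  # child value -> the single parent value seen so far
        for row in rows:
            if seen.setdefault(row[child], row[parent]) != row[parent]:
                return False
    return True

# Hypothetical date dimension rows.
date_rows = [
    {"month_name": "July", "month_key": 200707, "year": 2007},
    {"month_name": "July", "month_key": 200807, "year": 2008},
]

by_name = is_natural_hierarchy(date_rows, ["month_name", "year"])
by_key = is_natural_hierarchy(date_rows, ["month_key", "year"])
```

Here the month name "July" belongs to two different years, so it does not imply its parent, whereas a month key that includes the year does.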
37. Surrogate Keys Primary key purpose: identifies uniqueness; relates to foreign keys in a fact table. Two candidates: the business key, which represents the source primary key; and the surrogate key, which consolidates multiple data sources, consolidates multi-value business keys, allows tracking of dimension history, and limits fact table width for optimization. Using a surrogate key is considered best practice.
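As a rough illustration of how a surrogate key can sit in front of business keys from several sources, consider the sketch below. The class name, the source system names, and the key values are all hypothetical; a real ETL process would normally persist this mapping in the dimension table and would also need matching rules to recognize the same customer across systems.

```python
from itertools import count

class SurrogateKeyGenerator:
    """Hand out one small integer per (source system, business key) pair."""

    def __init__(self):
        self._next = count(1)
        self._keys = {}  # (source_system, business_key) -> surrogate key

    def key_for(self, source_system, business_key):
        natural = (source_system, business_key)
        if natural not in self._keys:
            self._keys[natural] = next(self._next)
        return self._keys[natural]

gen = SurrogateKeyGenerator()
crm_key = gen.key_for("crm", "C-100")        # business key from one source
erp_key = gen.key_for("erp", "C-100")        # same text, different source
multi_key = gen.key_for("erp", ("US", 42))   # multi-value business key
crm_again = gen.key_for("crm", "C-100")      # lookups are repeatable
```

The fact table then stores only the single integer, regardless of how wide or how many-valued the source key is.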
39. Snowflaking Snowflaking is the practice of connecting lookup tables to fields in the dimension tables. Sometimes it's easier to maintain a dimension in the ETL process when it's been partially normalized or snowflaked.
46. MDX vs. T-SQL: calculate the YTD monthly average and compare it over several years for the same selected month
WITH
  MEMBER Measures.MyYTD AS
    SUM(YTD([Date].[Calendar]), [Measures].[Internet Sales Amount])
  MEMBER Measures.MyMonthCount AS
    SUM(YTD([Date].[Calendar]), (COUNT([Date].[Month of Year])))
  MEMBER Measures.MyYTDAVG AS
    Measures.MyYTD / Measures.MyMonthCount
SELECT
  {Measures.MyYTD, Measures.MyMonthCount, [Measures].[Internet Sales Amount], Measures.MyYTDAVG} ON 0,
  [Date].[Calendar].[Month] ON 1
FROM [Adventure Works]
WHERE ([Date].[Month of Year].&[7])
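For readers who do not have a cube at hand, the calculation this slide describes (year-to-date total, month count, and their ratio, for the same month in each year) can be sketched in plain Python. This is only a reading of the slide's stated goal, not a translation of the MDX engine's behavior; the function name and the sample figures are made up.

```python
def ytd_monthly_average(monthly_sales, selected_month):
    """Map each year to (ytd_total, month_count, ytd_average),
    using months from January through selected_month."""
    result = {}
    for year in sorted({y for y, _ in monthly_sales}):
        amounts = [amount for (y, m), amount in monthly_sales.items()
                   if y == year and m <= selected_month]
        if amounts:
            total = sum(amounts)
            result[year] = (total, len(amounts), total / len(amounts))
    return result

# Hypothetical monthly sales keyed by (year, month).
sales = {
    (2007, 1): 100.0, (2007, 2): 200.0, (2007, 3): 300.0,
    (2008, 1): 150.0, (2008, 3): 450.0,
}
ytd = ytd_monthly_average(sales, selected_month=3)
```

Note that this sketch counts only months that have data, which is one possible choice; dividing by the calendar month number is another.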
47. Slowly Changing Dimensions Support the primary role of the data warehouse: to describe the past accurately. Maintain historical context as new or changed data is loaded into dimension tables. Slowly Changing Dimension (SCD) types: Type 1: overwrite the existing dimension record; Type 2: insert a new ‘versioned’ dimension record; Type 3: track limited history with attributes. The concept of Slowly Changing Dimensions was introduced by Ralph Kimball.
48. Slowly Changing Dimensions Type 1 Existing record is updated History is not preserved LastName update to Valdez-Smythe
49. Slowly Changing Dimensions Type 2 Existing record is ‘expired’ and new record inserted History is preserved Most common form of Slowly Changing Dimension SalesTerritoryKey update to 10
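The "expire and insert" behavior of Type 2 can be sketched as follows. The row layout (start_date, end_date, is_current) is one common convention and is assumed here; the business key value and dates are hypothetical, with only the SalesTerritoryKey change to 10 taken from the slide.

```python
from datetime import date

def apply_type2_change(dimension, business_key, changes, effective):
    """Expire the current row for business_key and append a new versioned row."""
    current = next(r for r in dimension
                   if r["business_key"] == business_key and r["is_current"])
    current["end_date"] = effective
    current["is_current"] = False
    new_row = {
        **current,
        **changes,
        # A new surrogate key distinguishes the versions of the same entity.
        "surrogate_key": max(r["surrogate_key"] for r in dimension) + 1,
        "start_date": effective,
        "end_date": None,
        "is_current": True,
    }
    dimension.append(new_row)
    return new_row

employees = [{
    "surrogate_key": 1, "business_key": "E-272", "sales_territory_key": 4,
    "start_date": date(2005, 1, 1), "end_date": None, "is_current": True,
}]
new_version = apply_type2_change(
    employees, "E-272", {"sales_territory_key": 10}, date(2008, 7, 1))
```

Facts loaded before the change keep pointing at surrogate key 1, so historical reports still show the old territory.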
50. Slowly Changing Dimensions Type 3 Existing record is updated Limited history is preserved Implementation is rare SalesTerritoryKey update to 10
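The approach on this slide, updating the existing record while preserving limited history (commonly called Type 3), can be sketched by keeping the prior value in a separate column. The "previous_" column naming and the sample row are assumptions for illustration.

```python
def apply_type3_change(row, attribute, new_value):
    """Copy the current value into a 'previous_' column, then overwrite it."""
    updated = dict(row)
    updated["previous_" + attribute] = row[attribute]
    updated[attribute] = new_value
    return updated

employee = {"employee_key": 1, "sales_territory_key": 4,
            "previous_sales_territory_key": None}
first = apply_type3_change(employee, "sales_territory_key", 10)
second = apply_type3_change(first, "sales_territory_key", 11)
```

After the second change the original value 4 is gone, which is why the history is described as limited.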
51. Resources SQL Server 2008 Books Online, msdn2.microsoft.com/en-us/library/bb543165(sql.100).aspx; The Microsoft Data Warehouse Toolkit by Joy Mundy, Warren Thornthwaite, and Ralph Kimball; The Data Warehouse Lifecycle Toolkit by Ralph Kimball, et al.
Editor's notes
Data warehousing and business intelligence are fundamentally about providing business people with the information and tools they need to make both operational and strategic business decisions. Whether the decision making is strategic or operational, from a technical perspective, you need to provide the information necessary to make decisions. Any given decision will likely require a unique subset of information. You'll need to build an information infrastructure that pulls data from across the organization, and potentially from outside the organization, and then cleans, aligns, and restructures the data to make it as flexible and usable as possible. A DW/BI system requires technically sophisticated data gathering and management. Finally, you need to provide the business decision makers with the tools they need to make use of the data. In this context, "tools" means much more than just software. It means everything the business users need to understand what information is available, find the subsets they need, and structure the data to illuminate the underlying business dynamics. Therefore, "tools" means training, documentation, and support, along with ad hoc query tools, reports, and analytic applications.
Developing Project Data Sources and Package Connections. Because the main purpose of SSIS is to move data from sources to destinations, the next most important step is to add the pointers to these sources and destinations. These pointers are called data sources and connections. Data sources are stored at the project level and are found in Solution Explorer under the logical folder named Data Sources. Connections, on the other hand, are defined within packages and are found in the Connection Managers pane at the bottom of the Control Flow or Data Flow tab. Connections can be based on project data sources or can stand alone within packages. The next sections walk you through the uses and implementation of project data sources and package connections.
Business intelligence (BI) is a broad category of application programs and technologies for gathering, storing, analyzing, and providing access to data to help enterprise users make better business decisions. BI applications include the activities of decision support, query and reporting, online analytical processing (OLAP), statistical analysis, forecasting, and data mining.
Think about dimensions as tables in a database, because that's how they are implemented. Each table contains a list of homogeneous entities: products in a manufacturing company, patients in a hospital, vehicles on auto insurance policies, or customers in just about every organization. Usually, a dimension includes all instances of its entity, such as all the products the company sells. There is only one active row for each particular instance in the table at any time, and each row has a set of attributes that identify, describe, define, and classify the instance. A product will have a certain size and a standard weight, and belong to a product group. These sizes and groups have descriptions; a food product, for example, might come in Mini-Pak or Jumbo size. A vehicle is painted a certain color, like white, and has a certain option package, such as the Jungle Jim sports utility package (which includes side impact air bags, six-disc CD player, DVD system, and simulated leopard skin seats).
Most facts are numeric, and each fact value can vary widely depending on the business process being measured. Most facts are additive (such as dollar or unit sales), meaning they can be summed up across all dimensions. Additivity is important because DW/BI applications seldom retrieve a single fact table record. User queries generally select hundreds or thousands of records at a time and add them up. Other facts are semi-additive (such as market share or account balance), and still others are non-additive (such as unit price). Not all numeric data are facts. Exceptions include discrete descriptive information like package size or weight (describes a product) or customer age (describes a customer). Generally, these less volatile numeric values end up as descriptive attributes in dimension tables. Such descriptive information is more naturally used for constraining a query, rather than being summed in a computation. This distinction is helpful when deciding whether a data element is part of a dimension or fact. Some business processes track events without any real measures. If the event happens, we get an entry in the source system; if not, there is no row. Common examples of this kind of event include employment activities, such as hiring and firing, and event attendance, such as when a student attends a class. The fact tables that track these events typically do not have any actual fact measurements, so they're called factless fact tables. Actually, we usually add a column called something like EventCount that contains the number 1. This provides users with an easy way to count the number of events by summing the EventCount fact. Some facts are derived or computed from other facts, just as a Net Sale number is calculated from Gross Sales minus Sales Tax. Some semi-additive facts can be handled using a derived column that is based on the context of the query. Month End Balance would add up across accounts, but not across date, for example.
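The Month End Balance case can be illustrated with a small sketch: summing across accounts is meaningful, but summing across dates is not, so one common choice is to take the latest date in scope. The function name, the "latest date" rule, and the sample rows are assumptions made for illustration.

```python
# Hypothetical month-end balance fact rows.
balances = [
    {"account": "A", "month_end": "2008-06-30", "balance": 100.0},
    {"account": "B", "month_end": "2008-06-30", "balance": 50.0},
    {"account": "A", "month_end": "2008-07-31", "balance": 120.0},
    {"account": "B", "month_end": "2008-07-31", "balance": 70.0},
]

def total_balance(rows):
    """Sum across accounts, but only for the latest month end in the rows."""
    latest = max(r["month_end"] for r in rows)  # ISO dates sort as text
    return sum(r["balance"] for r in rows if r["month_end"] == latest)

semi_additive_total = total_balance(balances)
naive_total = sum(r["balance"] for r in balances)  # double counts across dates
```

The naive sum adds June and July balances together, which overstates what the accounts actually hold.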
The non-additive Unit Price example could be avoided by defining it as a computation done in the query, which is Total Amount divided by Total Quantity. There are several options for dealing with these derived or computed facts. You can calculate them as part of the ETL process and store them in the fact table, you can put them in the fact table view definition, or you can include them in the definition of the Analysis Services database. The only way we find unacceptable is to leave the calculation to the user.
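A short sketch shows why Unit Price is better computed at query time as Total Amount divided by Total Quantity rather than stored and summed. The sample rows and function name are hypothetical.

```python
# Hypothetical fact rows for one product.
fact_rows = [
    {"product": "Helmet", "quantity": 10, "total_amount": 350.0},
    {"product": "Helmet", "quantity": 30, "total_amount": 900.0},
]

def unit_price(rows):
    """Compute unit price from the additive facts: sum first, then divide."""
    total_amount = sum(r["total_amount"] for r in rows)
    total_quantity = sum(r["quantity"] for r in rows)
    return total_amount / total_quantity

price = unit_price(fact_rows)

# Summing per-row unit prices gives a number with no business meaning.
summed_prices = sum(r["total_amount"] / r["quantity"] for r in fact_rows)
```

Because both inputs are additive, the same function works at any level of aggregation.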
Dimensions are a structural attribute of cubes. They are organized hierarchies of categories (levels) that describe data in the fact table. These categories and levels describe similar sets of members upon which the user wants to base an analysis. Dimensions can also be based on OLAP data mining models. They can be used to store the results of a mining model analysis and can be browsed within the context of a virtual cube.
A surrogate key is a unique value, usually an integer, assigned to each row in the dimension. This surrogate key becomes the primary key of the dimension table and is used to join the dimension to the associated foreign key field in the fact table. Surrogate keys protect the DW/BI system from changes in the source system. Surrogate keys allow the DW/BI system to integrate data from multiple source systems. Different source systems might keep data on the same customers or products, but with different keys. Surrogate keys enable you to add rows to dimensions that do not exist in the source system. Surrogate keys provide the means for tracking changes in dimension attributes over time.
Slowly changing dimensions (SCDs) are dimensions that have changeable attribute values. There are three types of SCD. A Type 1 SCD overwrites the existing attribute value with the new value. The Type 1 change does not preserve the attribute value that was in place at the time a historical transaction occurred. Type 2 change tracking is a powerful technique for capturing the attribute values that were in effect at a point in time and relating them to the business events in which they participated. When a change to a Type 2 attribute occurs, the ETL process creates a new row in the dimension table to capture the new values of the changed item. Type 3 keeps separate columns for both the old and new attribute values. Type 3 is less common because it involves changing the physical tables and is not very scalable.