Introduction to Data Warehousing
What is a Data Warehouse?
• “A data warehouse is a subject-oriented, integrated,
time-variant, and nonvolatile collection of data in
support of management’s decision-making
process.”—W. H. Inmon
• A data warehouse is used for On-Line Analytical Processing (OLAP):
“Class of tools that enables the user to gain insight into data through interactive
access to a wide variety of possible views of the information”
• 3 billion market worldwide [1999 figure, olapreport.com]
o Retail industries: user profiling, inventory management
o Financial services: credit card analysis, fraud detection
o Telecommunications: call analysis, fraud detection
Data Warehouse Initiatives
• Organized around major subjects, such as customer, product,
sales
o integrate multiple, heterogeneous data sources
o exclude data that are not useful in the decision support process
• Focusing on the modeling and analysis of data for decision
makers, not on daily operations or transaction processing
o emphasis is on complex, exploratory analysis not day-to-day operations
• Large time horizon for trend analysis (current and past data)
• Non-Volatile store
o physically separate store from the operational environment
Data Warehouse Architecture
• Extract data from operational data sources
o clean, transform
• Bulk load/refresh
o warehouse is offline
• OLAP server provides multidimensional view
• Multidimensional OLAP (Essbase, Oracle Express)
• Relational OLAP (Redbrick, Informix, Sybase, SQL Server)
Why do we need all that?
• Operational databases are for On-Line Transaction Processing (OLTP)
o automate day-to-day operations (purchasing, banking, etc.)
o transactions access (and modify!) a few records at a time
o database design is application oriented
o metric: transactions/sec
• Data Warehouse is for On Line Analytical Processing (OLAP)
o complex queries that access millions of records
o need historical data for trend analysis
o long scans would interfere with normal operations
o synchronizing data-intensive queries among physically separated
databases would be a nightmare!
o metric: query response time
Examples of OLAP
• Comparisons (this period vs. last period)
o Show me the sales per region for this year and compare it to that of the
previous year to identify discrepancies
• Multidimensional ratios (percent to total)
o Show me the contribution to weekly profit made by all items sold in the
northeast stores between May 1 and May 7
• Ranking and statistical profiles (top N/bottom N)
o Show me sales, profit and average call volume per day for my 10 most
profitable salespeople
• Custom consolidation (market segments, ad hoc groups)
o Show me an abbreviated income statement by quarter for the last four
quarters for my northeast region operations
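Multidimensional Modeling
• Example: compute total sales volume per product
and store
Total sales by store (rows) and product (columns):
Store | Product 1 | Product 2 | Product 3 | Product 4
1     | $454      | -         | -         | $925
2     | $468      | $800      | -         | -
3     | $296      | -         | $240      | -
4     | $652      | -         | $540      | $745
[Figure: the same data drawn as a grid with Product and Store axes; one cell is labeled 800.]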
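The aggregation behind this table can be sketched in a few lines of Python. This is only an illustration: the detail records below are hypothetical, chosen so that their sums reproduce the totals shown in the table, and `total_sales` is a name introduced here.

```python
from collections import defaultdict

# Hypothetical detail records: (store, product, amount).
# The amounts are made up so that the sums match the table above.
transactions = [
    (1, 1, 400), (1, 1, 54), (1, 4, 925),
    (2, 1, 468), (2, 2, 800),
    (3, 1, 296), (3, 3, 240),
    (4, 1, 652), (4, 3, 540), (4, 4, 745),
]

def total_sales(records):
    """Sum the amount for every (store, product) combination."""
    totals = defaultdict(int)
    for store, product, amount in records:
        totals[(store, product)] += amount
    return dict(totals)

cells = total_sales(transactions)
print(cells[(1, 1)])      # 454
print(cells.get((1, 2)))  # None: an empty cell, shown as "-" in the table
```

Each key of `cells` is one cell of the two-dimensional view; combinations with no sales simply have no entry.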
Dimensions and Hierarchies
DIMENSIONS (each hierarchy listed from coarsest to finest level)
PRODUCT: category > product
LOCATION: region > country > city > store
TIME: year > quarter > month / week > day
[Figure: a cube with axes product, city and month; the highlighted cell (DVD, NY, August) holds the sales of DVDs in NY in August.]
• A cell in the cube may store values (measurements) relative to the
combination of the labeled dimensions
Common OLAP Operations
• Roll-up: move up the hierarchy
o e.g. given total sales per city, we can roll up to get sales per state
• Drill-down: move down the hierarchy
o more fine-grained aggregation
o lowest level can be the detail records (drill-through)
Dimension hierarchies (coarsest to finest level):
PRODUCT: category > product
LOCATION: region > country > state > city > store
TIME: year > quarter > month / week > day
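As a rough illustration of roll-up and drill-down along the LOCATION hierarchy, the sketch below aggregates per-city totals to per-state totals and then lists the cities behind one state. The city names, the city-to-state mapping and the amounts are hypothetical, as are the function names.

```python
from collections import defaultdict

# Hypothetical child -> parent mapping for one step of the hierarchy.
city_to_state = {"Boston": "MA", "Springfield": "MA",
                 "Albany": "NY", "Buffalo": "NY"}
# Hypothetical totals at the finer (city) level.
sales_per_city = {"Boston": 120, "Springfield": 30, "Albany": 50, "Buffalo": 70}

def roll_up(totals, parent_of):
    """Aggregate totals one level up a hierarchy using a child -> parent map."""
    rolled = defaultdict(int)
    for child, amount in totals.items():
        rolled[parent_of[child]] += amount
    return dict(rolled)

def drill_down(parent, totals, parent_of):
    """Return the finer-grained totals that make up one parent value."""
    return {c: a for c, a in totals.items() if parent_of[c] == parent}

sales_per_state = roll_up(sales_per_city, city_to_state)
print(sales_per_state)                                    # {'MA': 150, 'NY': 120}
print(drill_down("NY", sales_per_city, city_to_state))    # {'Albany': 50, 'Buffalo': 70}
```

Roll-up only needs the finer totals, whereas drill-down needs the finer data to still be available, which is why the lowest level is the detail records.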
Pivoting
• Pivoting: aggregate on selected dimensions
o usually 2 dimensions (cross-tabulation)
Sales by store (rows) and product (columns):
Store | 1    | 2   | 3   | 4    | ALL
1     | 454  | -   | -   | 925  | 1379
2     | 468  | 800 | -   | -    | 1268
3     | 296  | -   | 240 | -    | 536
4     | 652  | -   | 540 | 745  | 1937
ALL   | 1870 | 800 | 780 | 1670 | 5120
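A cross-tabulation like this one can be derived from the per-cell totals by adding an "ALL" entry along each dimension. The sketch below uses the cell values from the table; the function name `cross_tab` and the use of the string "ALL" as a marker are choices made here for illustration.

```python
# Cell values (store, product) -> sales, copied from the table above.
cells = {
    (1, 1): 454, (1, 4): 925,
    (2, 1): 468, (2, 2): 800,
    (3, 1): 296, (3, 3): 240,
    (4, 1): 652, (4, 3): 540, (4, 4): 745,
}

def cross_tab(cells):
    """Add per-store, per-product and grand totals under the key 'ALL'."""
    table = dict(cells)
    for (store, product), amount in cells.items():
        for key in ((store, "ALL"), ("ALL", product), ("ALL", "ALL")):
            table[key] = table.get(key, 0) + amount
    return table

table = cross_tab(cells)
print(table[(4, "ALL")])      # 1937
print(table[("ALL", 1)])      # 1870
print(table[("ALL", "ALL")])  # 5120
```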
Slice and Dice Queries
• Slice and Dice: select and project on one or
more dimensions
[Figure: a cube with axes product, customers and store; the slice customer = "Smith" is highlighted.]
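A minimal sketch of the two operations, assuming the common reading in which a slice fixes one dimension to a single value (and then drops it) and a dice restricts several dimensions to sets of values. The fact rows and the function names are hypothetical.

```python
# Hypothetical fact rows over three dimensions; amounts are illustrative only.
facts = [
    {"product": "DVD", "store": 1, "customer": "Smith", "amount": 20},
    {"product": "VCR", "store": 2, "customer": "Smith", "amount": 35},
    {"product": "DVD", "store": 1, "customer": "Jones", "amount": 15},
    {"product": "PC",  "store": 3, "customer": "Smith", "amount": 900},
]

def slice_cube(rows, **fixed):
    """Select rows matching the fixed values, then project those dimensions away."""
    result = []
    for row in rows:
        if all(row[dim] == value for dim, value in fixed.items()):
            result.append({k: v for k, v in row.items() if k not in fixed})
    return result

def dice_cube(rows, **allowed):
    """Keep rows whose dimension values fall inside the allowed sets."""
    return [row for row in rows
            if all(row[dim] in values for dim, values in allowed.items())]

smith = slice_cube(facts, customer="Smith")
small = dice_cube(facts, product={"DVD", "VCR"}, store={1, 2})
print(len(smith), len(small))  # 3 3
```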
Roadmap
• What is a data warehouse and what is it for
• What are the differences between OLTP and OLAP
• Multi-dimensional data modeling
• Data warehouse design
o the star schema, bitmap indexes
• The Data Cube operator
o semantics and computation
• Aggregate View Selection
• Other Issues
Data Warehouse Design
• Most data warehouses adopt a star schema to represent the
multidimensional model
• Each dimension is represented by a dimension table
o LOCATION(location_key, store, street_address, city, state, country, region)
o dimension tables are not normalized
• Transactions are described through a fact table
o each tuple consists of a pointer to each of the dimension
tables (foreign key) and a list of measures (e.g. sales $$$)
Star Schema Example
Fact table:
SALES(time_key, product_key, location_key, units_sold, amount)
o units_sold and amount are the measures
Dimension tables:
TIME(time_key, day, day_of_the_week, month, quarter, year)
LOCATION(location_key, store, street_address, city, state, country, region)
PRODUCT(product_key, product_name, category, brand, color, supplier_name)
Advantages of Star Schema
• Facts and dimensions are clearly depicted
o dimension tables are relatively static, data is loaded (append
mostly) into fact table(s)
o easy to comprehend (and write queries)
“Find total sales per product-category in our stores in Europe”
SELECT PRODUCT.category, SUM(SALES.amount)
FROM SALES, PRODUCT, LOCATION
WHERE SALES.product_key = PRODUCT.product_key
AND SALES.location_key = LOCATION.location_key
AND LOCATION.region = 'Europe'
GROUP BY PRODUCT.category
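To try the query, the sketch below builds a cut-down version of the star schema in an in-memory SQLite database through Python's standard `sqlite3` module. Only the columns the query needs are created, the rows are hypothetical, and an ORDER BY is added so the output order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Reduced dimension and fact tables; the data is made up for illustration.
CREATE TABLE PRODUCT  (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE LOCATION (location_key INTEGER PRIMARY KEY, store TEXT, region TEXT);
CREATE TABLE SALES    (product_key INTEGER REFERENCES PRODUCT,
                       location_key INTEGER REFERENCES LOCATION,
                       units_sold INTEGER, amount REAL);
INSERT INTO PRODUCT  VALUES (1, 'DVD player', 'Electronics'), (2, 'Novel', 'Books');
INSERT INTO LOCATION VALUES (10, 'Paris', 'Europe'), (20, 'Boston', 'America');
INSERT INTO SALES    VALUES (1, 10, 2, 200.0), (2, 10, 1, 15.0),
                            (1, 20, 1, 100.0), (1, 10, 1, 100.0);
""")

# Same query as on the slide, with ORDER BY added for a stable result order.
rows = conn.execute("""
SELECT PRODUCT.category, SUM(SALES.amount)
FROM SALES, PRODUCT, LOCATION
WHERE SALES.product_key = PRODUCT.product_key
  AND SALES.location_key = LOCATION.location_key
  AND LOCATION.region = 'Europe'
GROUP BY PRODUCT.category
ORDER BY PRODUCT.category
""").fetchall()

print(rows)  # [('Books', 15.0), ('Electronics', 300.0)]
```

The sale recorded in Boston is filtered out by the region predicate, so only the European rows contribute to the sums.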
Star Schema Query Processing
[Figure: the star schema (SALES fact table with the TIME, LOCATION and PRODUCT dimension tables) annotated with a query plan: selection σ region = "Europe" on LOCATION, JOIN of SALES with LOCATION, JOIN with PRODUCT, and projection π category; the measures come from SALES.]
Indexing OLAP Data: Bitmap Index
• Each value in the column has a bit vector:
o The i-th bit is set if the i-th row of the base table has the value
for the indexed column
o The length of the bit vector: # of records in the base table
• Mainly intended for small cardinality domains
LOCATION table (left two columns) and bitmap index on Region (right three columns):
location_key | Region  | Asia | Europe | America
L1           | Asia    | 1    | 0      | 0
L2           | Europe  | 0    | 1      | 0
L3           | Asia    | 1    | 0      | 0
L4           | America | 0    | 0      | 1
L5           | Europe  | 0    | 1      | 0
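The construction can be sketched as follows, using the five LOCATION rows from the example. Bit vectors are kept as plain Python lists for readability (a real system would pack and usually compress them); the function names are introduced here for illustration.

```python
# Region column of the LOCATION table, rows L1..L5 in order.
regions = ["Asia", "Europe", "Asia", "America", "Europe"]

def build_bitmap_index(column):
    """One bit vector per distinct value; bit i is set if row i has that value."""
    index = {}
    for i, value in enumerate(column):
        index.setdefault(value, [0] * len(column))[i] = 1
    return index

def rows_matching(index, *values):
    """OR the bit vectors of the given values and return matching row positions."""
    n = len(next(iter(index.values())))
    combined = [0] * n
    for value in values:
        combined = [a | b for a, b in zip(combined, index[value])]
    return [i for i, bit in enumerate(combined) if bit]

index = build_bitmap_index(regions)
print(index["Asia"])                             # [1, 0, 1, 0, 0]
print(rows_matching(index, "Asia", "America"))   # [0, 2, 3]
```

A predicate such as region IN ('Asia', 'America') becomes a bitwise OR of two vectors, which is why the technique suits columns with few distinct values.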
Join-Index
• Join index relates the values of the dimensions of a star schema to rows in the fact table.
o a join index on region maintains for each distinct region a list of ROW-IDs of the tuples recording the sales in the region
• Join indices can span multiple dimensions OR
o can be implemented as bitmap indexes (per dimension)
o use bit-ops for multiple joins
[Figure: LOCATION values (region = Africa, America, Asia, Europe) linked through bitmap entries to SALES rows R102, R117, R118 and R124.]
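A sketch of per-dimension bitmap join indexes, assuming a tiny fact table. The row IDs are borrowed from the figure, but which region or category each row belongs to is invented here, as are the table contents and function names. Each index maps a dimension attribute value to a bitmask over fact-table positions, so a query touching two dimensions reduces to one bitwise AND.

```python
# Hypothetical dimension tables: key -> attribute value.
location_region = {"L1": "Asia", "L2": "Europe", "L3": "Asia"}
product_category = {"P1": "Electronics", "P2": "Books"}

# Hypothetical fact rows: (row_id, location_key, product_key, amount).
sales = [
    ("R102", "L2", "P1", 50),
    ("R117", "L1", "P1", 20),
    ("R118", "L2", "P2", 10),
    ("R124", "L2", "P1", 30),
]

def build_join_index(fact_rows, key_position, attribute_of):
    """Map each dimension attribute value to a bitmask over fact-row positions."""
    index = {}
    for pos, row in enumerate(fact_rows):
        value = attribute_of[row[key_position]]
        index[value] = index.get(value, 0) | (1 << pos)
    return index

by_region = build_join_index(sales, 1, location_region)
by_category = build_join_index(sales, 2, product_category)

# Both joins are answered with a single bit operation.
mask = by_region["Europe"] & by_category["Electronics"]
hits = [row[0] for pos, row in enumerate(sales) if (mask >> pos) & 1]
print(hits)  # ['R102', 'R124']
```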
Problem Solved?
• “Find total sales per product-category in our stores in Europe”
o Join-index will prune ¾ of the data (uniform sales), but the remaining ¼ is
still large (several million transactions)
• Index is unclustered
• High-level aggregations are expensive!
o long scans to get the data
o hashing or sorting necessary for group-bys
[Figure: LOCATION hierarchy: region > country > state > city > store]
⇒ Long query response times
⇒ Pre-computation is necessary
Multiple Simultaneous Aggregates
Cross-tabulation (products/store):
Store | 1    | 2   | 3   | 4    | ALL
1     | 454  | -   | -   | 925  | 1379
2     | 468  | 800 | -   | -    | 1268
3     | 296  | -   | 240 | -    | 536
4     | 652  | -   | 540 | 745  | 1937
ALL   | 1870 | 800 | 780 | 1670 | 5120
• The ALL column holds the sub-totals per store
• The ALL row holds the sub-totals per product
• The (ALL, ALL) cell holds the total sales
4 group-bys here:
(store, product)
(store)
(product)
()
Need to write 4 queries!
The Data Cube Operator (Gray et al.)
• All previous aggregates in a single query:
SELECT LOCATION.store, SALES.product_key, SUM(amount)
FROM SALES, LOCATION
WHERE SALES.location_key = LOCATION.location_key
CUBE BY SALES.product_key, LOCATION.store
Challenge: Optimize Aggregate Computation
Store | Product_key | sum(amount)
1     | 1           | 454
1     | 4           | 925
2     | 1           | 468
2     | 2           | 800
3     | 1           | 296
3     | 3           | 240
4     | 1           | 652
4     | 3           | 540
4     | 4           | 745
1     | ALL         | 1379
2     | ALL         | 1268
3     | ALL         | 536
4     | ALL         | 1937
ALL   | 1           | 1870
ALL   | 2           | 800
ALL   | 3           | 780
ALL   | 4           | 1670
ALL   | ALL         | 5120
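The semantics of the operator can be sketched naively in Python: for every subset of the grouping dimensions, aggregate the rows and mark the dropped dimensions as ALL. This illustrates the result only, not how an engine would compute it efficiently; the input reuses the per-cell totals of the cross-tabulation, and `cube` is a name introduced here.

```python
from itertools import combinations

# Input rows (store, product_key, amount), taken from the cross-tabulation.
rows = [
    (1, 1, 454), (1, 4, 925),
    (2, 1, 468), (2, 2, 800),
    (3, 1, 296), (3, 3, 240),
    (4, 1, 652), (4, 3, 540), (4, 4, 745),
]

def cube(rows, n_dims):
    """Compute SUM for every subset of the first n_dims columns (2^n group-bys)."""
    result = {}
    for k in range(n_dims + 1):
        for kept in combinations(range(n_dims), k):
            for row in rows:
                key = tuple(row[d] if d in kept else "ALL" for d in range(n_dims))
                result[key] = result.get(key, 0) + row[-1]
    return result

result = cube(rows, 2)
print(len(result))             # 18 result rows
print(result[(4, "ALL")])      # 1937
print(result[("ALL", 3)])      # 780
print(result[("ALL", "ALL")])  # 5120
```

This version scans the input once per group-by, which is exactly the cost the optimization challenge is about.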
Relational View of Data Cube
Store | 1    | 2   | 3   | 4    | ALL
1     | 454  | -   | -   | 925  | 1379
2     | 468  | 800 | -   | -    | 1268
3     | 296  | -   | 240 | -    | 536
4     | 652  | -   | 540 | 745  | 1937
ALL   | 1870 | 800 | 780 | 1670 | 5120
SELECT LOCATION.store, SALES.product_key, SUM(amount)
FROM SALES, LOCATION
WHERE SALES.location_key = LOCATION.location_key
CUBE BY SALES.product_key, LOCATION.store
Data Cube: Multidimensional View
[Figure: a three-dimensional cube with axes Quarter (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), Product (DVD, VCR, PC, sum) and Region (America, Europe, Asia, sum); the highlighted cell is the total annual sales of DVDs in America.]
Other Extensions to SQL
• Complex aggregation at multiple granularities (Ross et al. 1998)
o Compute multiple dependent aggregates
• Other proposals: the MD-join operator (Chatziantoniou et al. 1999)
SELECT LOCATION.store, SALES.product_key, SUM(amount)
FROM SALES, LOCATION
WHERE SALES.location_key = LOCATION.location_key
CUBE BY SALES.product_key, LOCATION.store: R
SUCH THAT R.amount = max(amount)
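Data Cube Computation
• Model dependencies among the aggregates:
o (product, store, quarter) is the most detailed “view”
o e.g. view (product, store) can be computed from view (product, store, quarter) by summing up all quarterly sales
Lattice of views (each view can be computed from a view on the level above that contains its dimensions):
(product, store, quarter)
(product, quarter)   (store, quarter)   (product, store)
(product)   (store)   (quarter)
none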
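The dependency can be made concrete with a small sketch: a coarser view is obtained from a finer one by summing over the dimensions that are dropped. The contents of the (product, store, quarter) view below are hypothetical, and `aggregate_from` is a name introduced for this example.

```python
# Hypothetical (product, store, quarter) -> sales view.
DIMS = ("product", "store", "quarter")
psq = {
    ("DVD", 1, "Q1"): 10, ("DVD", 1, "Q2"): 15,
    ("DVD", 2, "Q1"): 7,  ("PC", 1, "Q1"): 100,
}

def aggregate_from(parent, parent_dims, child_dims):
    """Compute a coarser view by summing a finer one over the dropped dimensions."""
    keep = [parent_dims.index(d) for d in child_dims]
    child = {}
    for key, value in parent.items():
        sub_key = tuple(key[i] for i in keep)
        child[sub_key] = child.get(sub_key, 0) + value
    return child

# (product, store) from (product, store, quarter), then (product) from (product, store).
ps = aggregate_from(psq, DIMS, ("product", "store"))
p = aggregate_from(ps, ("product", "store"), ("product",))
print(ps)  # {('DVD', 1): 25, ('DVD', 2): 7, ('PC', 1): 100}
print(p)   # {('DVD',): 32, ('PC',): 100}
```

Computing (product) from the already aggregated (product, store) view touches three entries instead of four, which hints at why the choice of parent matters.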
Computation Directives
• Hash/sort based methods (Agrawal et al., VLDB’96)
1. Smallest-parent
2. Cache-results
3. Amortize-scans
4. Share-sorts
5. Share-partitions
[Figure: the lattice of views over product, store and quarter, from (product, store, quarter) down to none.]
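As an illustration of the first directive only, the sketch below picks, for a target group-by, the smallest already computed view that contains all of its dimensions. The row counts are hypothetical, and the function is a simplified reading of "smallest-parent", not the cited paper's algorithm.

```python
# Hypothetical row counts of views that have already been computed.
view_sizes = {
    frozenset({"product", "store", "quarter"}): 6000,
    frozenset({"product", "store"}): 2000,
    frozenset({"product", "quarter"}): 800,
    frozenset({"store", "quarter"}): 400,
}

def smallest_parent(target, sizes):
    """Among computed views that strictly contain the target, pick the smallest."""
    parents = [view for view in sizes if target < view]  # proper-subset test
    return min(parents, key=lambda view: sizes[view])

print(sorted(smallest_parent(frozenset({"product"}), view_sizes)))  # ['product', 'quarter']
print(sorted(smallest_parent(frozenset({"store"}), view_sizes)))    # ['quarter', 'store']
```

The (product) group-by could be computed from three different views; reading the 800-row one is cheaper than scanning 2000 or 6000 rows.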
Alternative Array-based Approach
• Model data as a sparse multidimensional array
o partition array into chunks (a chunk is a small sub-cube that fits in memory)
o fast addressing based on (chunk_id, offset)
• Compute aggregates in “multi-way” fashion by visiting cube cells in
the order which minimizes the # of times each cell is visited, and
reduces memory access and storage cost.
• What is the best traversing order to do multi-way aggregation?
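The (chunk_id, offset) addressing can be sketched as below, assuming row-major numbering both of the chunks within the array and of the cells within a chunk. That layout is an assumption made for this example; the slide does not specify one.

```python
def chunk_address(coords, dim_sizes, chunk_sizes):
    """Map cell coordinates to (chunk_id, offset), both numbered in row-major order."""
    chunk_id = 0
    offset = 0
    for c, n, s in zip(coords, dim_sizes, chunk_sizes):
        chunks_along_dim = -(-n // s)                      # ceiling division
        chunk_id = chunk_id * chunks_along_dim + c // s    # which chunk
        offset = offset * s + c % s                        # position inside it
    return chunk_id, offset

# An 8 x 8 array split into 4 x 4 chunks (four chunks of 16 cells each).
print(chunk_address((0, 0), (8, 8), (4, 4)))  # (0, 0)
print(chunk_address((5, 2), (8, 8), (4, 4)))  # (2, 6)
print(chunk_address((7, 7), (8, 8), (4, 4)))  # (3, 15)
```

Because the address is computed arithmetically from the coordinates, no per-cell index structure is needed to locate a value.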
Reality check: too many views!
• 2^n views for n dimensions (no hierarchies)
• Storage/update-time explosion
• More pre-computation doesn’t mean better performance!
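The count is easy to check by enumeration: every subset of the dimensions is one group-by. The second function goes beyond the slide and is an assumption of this sketch: with hierarchies, each dimension can be grouped at any one of its levels or dropped entirely, so the counts multiply.

```python
from itertools import combinations
from math import prod

def all_views(dims):
    """Every subset of the dimensions is one group-by (one view)."""
    return [combo for k in range(len(dims), -1, -1)
            for combo in combinations(dims, k)]

def count_views_with_hierarchies(levels_per_dim):
    """Each dimension is grouped at one of its levels or dropped (ALL)."""
    return prod(levels + 1 for levels in levels_per_dim)

views = all_views(("product", "store", "quarter"))
print(len(views))                               # 8
# e.g. 2 product levels, 5 location levels, 4 time levels (illustrative counts)
print(count_views_with_hierarchies([2, 5, 4]))  # 90
```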
How to choose among the views?
• Use some notion of benefit per view
• Limit: disk space or maintenance-time
[Figure: the lattice of views over product, store and quarter, from (product, store, quarter) down to none.]
Harinarayan et al., SIGMOD’96:
Pick views greedily until space is filled
$$B(v, S) = \sum_{u \,:\, u \le v,\; C_v(u) < C_S(u)} \bigl( C_S(u) - C_v(u) \bigr)$$
(B(v, S) is the benefit of adding view v to the selected set S; C_S(u) and C_v(u) presumably denote the cost of answering view u from S and from v, respectively.)
Catch: quadratic in the number of views, which is exponential!
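A simplified sketch of greedy selection on the three-dimensional lattice. Several things here are assumptions rather than the paper's exact algorithm: the view sizes are hypothetical, the cost of answering u from a view is taken to be that view's row count, the top view is assumed to be always stored, and the space budget covers only the additional views.

```python
def fs(*dims):
    return frozenset(dims)

# Hypothetical row counts for every view in the lattice.
sizes = {
    fs("product", "store", "quarter"): 6000,
    fs("product", "store"): 2000, fs("product", "quarter"): 5000,
    fs("store", "quarter"): 800,
    fs("product"): 100, fs("store"): 20, fs("quarter"): 4,
    fs(): 1,
}

def benefit(v, selected, sizes):
    """Total cost saved over all views u computable from v if v is added."""
    total = 0
    for u in sizes:
        if u <= v:  # u can be computed from v
            current = min(sizes[w] for w in selected if u <= w)
            total += max(current - sizes[v], 0)
    return total

def greedy_select(sizes, space):
    """Repeatedly add the view with the highest benefit that still fits."""
    top = max(sizes, key=len)
    selected = [top]
    while True:
        fits = [v for v in sizes if v not in selected and sizes[v] <= space]
        if not fits:
            break
        best = max(fits, key=lambda v: benefit(v, selected, sizes))
        if benefit(best, selected, sizes) <= 0:
            break
        selected.append(best)
        space -= sizes[best]
    return selected

chosen = greedy_select(sizes, 2800)
print([sorted(v) for v in chosen])
# [['product', 'quarter', 'store'], ['quarter', 'store'], ['product', 'store']]
```

Every round evaluates the benefit of each candidate against each view it can answer, which is where the quadratic cost in the number of views comes from.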
View Selection Problem
• Selection is based on a workload estimate (e.g. logs) and a
given constraint (disk space or update window)
• NP-hard, optimal selection cannot be computed in practice beyond 4-5
dimensions
o greedy algorithms (e.g. [Harinarayan96]) run at least in polynomial
time in the number of views, i.e. exponential in the number of
dimensions!
• Optimal selection cannot be approximated [Karloff99]
o greedy view selection can behave arbitrarily badly
• Lack of good models for a cost-based optimization!
Other Issues
• Fact+Dimension tables in the DW are views of tables stored in the
sources
• Lots of view maintenance problems
o correctly reflect asynchronous changes at the sources
o making views self-maintainable
• Interactive queries (on-line aggregation)
o e.g. show running-estimates + confidence intervals
• Computing Iceberg queries efficiently
• Approximation
o rough estimates for high-level aggregates are often good enough
o histogram, wavelet, sampling based techniques (e.g. AQUA)
Thank You!
For more information, see:
http://vibranttechnologies.co.in/datawarehousing-classes-in-mumbai.html

Weitere ähnliche Inhalte

Andere mochten auch (14)

Sql server select queries ppt 18
Sql server select queries ppt 18Sql server select queries ppt 18
Sql server select queries ppt 18
 
Datastage
DatastageDatastage
Datastage
 
Datastage Introduction To Data Warehousing
Datastage Introduction To Data WarehousingDatastage Introduction To Data Warehousing
Datastage Introduction To Data Warehousing
 
QTP Training by INFOTECH
QTP Training by INFOTECHQTP Training by INFOTECH
QTP Training by INFOTECH
 
SQL- Introduction to SQL database
SQL- Introduction to SQL database SQL- Introduction to SQL database
SQL- Introduction to SQL database
 
Salesforce - Introduction to Security & Access
Salesforce -  Introduction to Security & Access Salesforce -  Introduction to Security & Access
Salesforce - Introduction to Security & Access
 
Data ware housing - Introduction to data ware housing process.
Data ware housing - Introduction to data ware housing process.Data ware housing - Introduction to data ware housing process.
Data ware housing - Introduction to data ware housing process.
 
Teradata a z
Teradata a zTeradata a z
Teradata a z
 
Datastage database design and data modeling ppt 4
Datastage database design and data modeling ppt 4Datastage database design and data modeling ppt 4
Datastage database design and data modeling ppt 4
 
Buisness analyst business analysis overview ppt 5
Buisness analyst business analysis overview ppt 5Buisness analyst business analysis overview ppt 5
Buisness analyst business analysis overview ppt 5
 
datastage training | datastage online training | datastage training videos | ...
datastage training | datastage online training | datastage training videos | ...datastage training | datastage online training | datastage training videos | ...
datastage training | datastage online training | datastage training videos | ...
 
Data ware housing- Introduction to data ware housing
Data ware housing- Introduction to data ware housingData ware housing- Introduction to data ware housing
Data ware housing- Introduction to data ware housing
 
data stage-material
data stage-materialdata stage-material
data stage-material
 
Teradata
TeradataTeradata
Teradata
 

Ähnlich wie Data ware housing- Introduction to olap .

1 introductory slides (1)
1 introductory slides (1)1 introductory slides (1)
1 introductory slides (1)tafosepsdfasg
 
Dataware house introduction by InformaticaTrainingClasses
Dataware house introduction by InformaticaTrainingClassesDataware house introduction by InformaticaTrainingClasses
Dataware house introduction by InformaticaTrainingClassesInformaticaTrainingClasses
 
Olap fundamentals
Olap fundamentalsOlap fundamentals
Olap fundamentalsAmit Sharma
 
Become BI Architect with 1KEY Agile BI Suite - OLAP
Become BI Architect with 1KEY Agile BI Suite - OLAPBecome BI Architect with 1KEY Agile BI Suite - OLAP
Become BI Architect with 1KEY Agile BI Suite - OLAPDhiren Gala
 
Intro to datawarehouse dev 1.0
Intro to datawarehouse   dev 1.0Intro to datawarehouse   dev 1.0
Intro to datawarehouse dev 1.0Jannet Peetz
 
RetailSense - Multi-Retailer Point of Sale Analytics
RetailSense - Multi-Retailer Point of Sale AnalyticsRetailSense - Multi-Retailer Point of Sale Analytics
RetailSense - Multi-Retailer Point of Sale AnalyticsWinston Ledet
 
IT301-Datawarehousing (1) and its sub topics.pptx
IT301-Datawarehousing (1) and its sub topics.pptxIT301-Datawarehousing (1) and its sub topics.pptx
IT301-Datawarehousing (1) and its sub topics.pptxReneeClintGortifacio
 
Data Warehousing and Data Mining
Data Warehousing and Data MiningData Warehousing and Data Mining
Data Warehousing and Data Miningidnats
 
Mba 433 MIS - Data Warehouse
Mba 433 MIS - Data WarehouseMba 433 MIS - Data Warehouse
Mba 433 MIS - Data WarehouseVinita Prasad
 
dataWarehouse.pptx
dataWarehouse.pptxdataWarehouse.pptx
dataWarehouse.pptxhqlm1
 

Ähnlich wie Data ware housing- Introduction to olap . (20)

1 introductory slides (1)
1 introductory slides (1)1 introductory slides (1)
1 introductory slides (1)
 
3dw
3dw3dw
3dw
 
3dw
3dw3dw
3dw
 
ch19.ppt
ch19.pptch19.ppt
ch19.ppt
 
ch19.ppt
ch19.pptch19.ppt
ch19.ppt
 
Dataware house introduction by InformaticaTrainingClasses
Dataware house introduction by InformaticaTrainingClassesDataware house introduction by InformaticaTrainingClasses
Dataware house introduction by InformaticaTrainingClasses
 
Olap fundamentals
Olap fundamentalsOlap fundamentals
Olap fundamentals
 
Essbase intro
Essbase introEssbase intro
Essbase intro
 
OLAP
OLAPOLAP
OLAP
 
Become BI Architect with 1KEY Agile BI Suite - OLAP
Become BI Architect with 1KEY Agile BI Suite - OLAPBecome BI Architect with 1KEY Agile BI Suite - OLAP
Become BI Architect with 1KEY Agile BI Suite - OLAP
 
02 Essbase
02 Essbase02 Essbase
02 Essbase
 
Intro to datawarehouse dev 1.0
Intro to datawarehouse   dev 1.0Intro to datawarehouse   dev 1.0
Intro to datawarehouse dev 1.0
 
RetailSense - Multi-Retailer Point of Sale Analytics
RetailSense - Multi-Retailer Point of Sale AnalyticsRetailSense - Multi-Retailer Point of Sale Analytics
RetailSense - Multi-Retailer Point of Sale Analytics
 
IT301-Datawarehousing (1) and its sub topics.pptx
IT301-Datawarehousing (1) and its sub topics.pptxIT301-Datawarehousing (1) and its sub topics.pptx
IT301-Datawarehousing (1) and its sub topics.pptx
 
Data Warehousing and Data Mining
Data Warehousing and Data MiningData Warehousing and Data Mining
Data Warehousing and Data Mining
 
2. data warehouse 2nd unit
2. data warehouse 2nd unit2. data warehouse 2nd unit
2. data warehouse 2nd unit
 
Mba 433 MIS - Data Warehouse
Mba 433 MIS - Data WarehouseMba 433 MIS - Data Warehouse
Mba 433 MIS - Data Warehouse
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSING
 
Introduction to Data Warehousing
Introduction to Data WarehousingIntroduction to Data Warehousing
Introduction to Data Warehousing
 
dataWarehouse.pptx
dataWarehouse.pptxdataWarehouse.pptx
dataWarehouse.pptx
 

Mehr von Vibrant Technologies & Computers

Mehr von Vibrant Technologies & Computers (20)

SQL Introduction to displaying data from multiple tables
SQL Introduction to displaying data from multiple tables  SQL Introduction to displaying data from multiple tables
SQL Introduction to displaying data from multiple tables
 
SQL- Introduction to MySQL
SQL- Introduction to MySQLSQL- Introduction to MySQL
SQL- Introduction to MySQL
 
ITIL - introduction to ITIL
ITIL - introduction to ITILITIL - introduction to ITIL
ITIL - introduction to ITIL
 
Salesforce - classification of cloud computing
Salesforce - classification of cloud computingSalesforce - classification of cloud computing
Salesforce - classification of cloud computing
 
Salesforce - cloud computing fundamental
Salesforce - cloud computing fundamentalSalesforce - cloud computing fundamental
Salesforce - cloud computing fundamental
 
SQL- Introduction to PL/SQL
SQL- Introduction to  PL/SQLSQL- Introduction to  PL/SQL
SQL- Introduction to PL/SQL
 
SQL- Introduction to advanced sql concepts
SQL- Introduction to  advanced sql conceptsSQL- Introduction to  advanced sql concepts
SQL- Introduction to advanced sql concepts
 
SQL Inteoduction to SQL manipulating of data
SQL Inteoduction to SQL manipulating of data   SQL Inteoduction to SQL manipulating of data
SQL Inteoduction to SQL manipulating of data
 
SQL- Introduction to SQL Set Operations
SQL- Introduction to SQL Set OperationsSQL- Introduction to SQL Set Operations
SQL- Introduction to SQL Set Operations
 
Sas - Introduction to designing the data mart
Sas - Introduction to designing the data martSas - Introduction to designing the data mart
Sas - Introduction to designing the data mart
 
Sas - Introduction to working under change management
Sas - Introduction to working under change managementSas - Introduction to working under change management
Sas - Introduction to working under change management
 
SAS - overview of SAS
SAS - overview of SASSAS - overview of SAS
SAS - overview of SAS
 
Teradata - Architecture of Teradata
Teradata - Architecture of TeradataTeradata - Architecture of Teradata
Teradata - Architecture of Teradata
 
Teradata - Restoring Data
Teradata - Restoring Data Teradata - Restoring Data
Teradata - Restoring Data
 
Datastage Introduction To Data Warehousing
Datastage Introduction To Data Warehousing Datastage Introduction To Data Warehousing
Datastage Introduction To Data Warehousing
 
Sql server T-sql basics ppt-3
Sql server T-sql basics  ppt-3Sql server T-sql basics  ppt-3
Sql server T-sql basics ppt-3
 
Sql server building a database ppt 12
Sql server building a database ppt 12Sql server building a database ppt 12
Sql server building a database ppt 12
 
Sql server introduction to sql server
Sql server introduction to sql server Sql server introduction to sql server
Sql server introduction to sql server
 
websphere - Understanding web sphere performance
websphere - Understanding web sphere performancewebsphere - Understanding web sphere performance
websphere - Understanding web sphere performance
 
Websphere - Introduction to SSL part 1
Websphere  - Introduction to SSL part 1Websphere  - Introduction to SSL part 1
Websphere - Introduction to SSL part 1
 

Kürzlich hochgeladen

From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 

Kürzlich hochgeladen (20)

From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 

Data ware housing- Introduction to olap .

  • 1.
  • 3. What is Data Warehouse?What is Data Warehouse? • “A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision-making process.”—W. H. Inmon • A Data Warehouse is used for On-Line-Analytical- Processing: “Class of tools that enables the user to gain insight into data through interactive access to a wide variety of possible views of the information” • 3 Billion market worldwide [1999 figure, olapreport.com] o Retail industries: user profiling, inventory management o Financial services: credit card analysis, fraud detection o Telecommunications: call analysis, fraud detection
  • 4. Data Warehouse Initiatives
      • Organized around major subjects, such as customer, product, sales
          o integrate multiple, heterogeneous data sources
          o exclude data that are not useful in the decision support process
      • Focusing on the modeling and analysis of data for decision makers, not on daily operations or transaction processing
          o emphasis is on complex, exploratory analysis, not day-to-day operations
      • Large time horizon for trend analysis (current and past data)
      • Non-volatile store
          o physically separate store from the operational environment
  • 5. Data Warehouse Architecture
      • Extract data from operational data sources
          o clean, transform
      • Bulk load/refresh
          o warehouse is offline
      • OLAP server provides multidimensional view
      • Multidimensional OLAP (Essbase, Oracle Express)
      • Relational OLAP (Redbrick, Informix, Sybase, SQL Server)
  • 6. Why do we need all that?
      • Operational databases are for On-Line Transaction Processing (OLTP)
          o automate day-to-day operations (purchasing, banking etc.)
          o transactions access (and modify!) a few records at a time
          o database design is application oriented
          o metric: transactions/sec
      • Data Warehouse is for On-Line Analytical Processing (OLAP)
          o complex queries that access millions of records
          o need historical data for trend analysis
          o long scans would interfere with normal operations
          o synchronizing data-intensive queries among physically separated databases would be a nightmare!
          o metric: query response time
  • 7. Examples of OLAP
      • Comparisons (this period vs. last period)
          o Show me the sales per region for this year and compare them to those of the previous year to identify discrepancies
      • Multidimensional ratios (percent to total)
          o Show me the contribution to weekly profit made by all items sold in the northeast stores between May 1 and May 7
      • Ranking and statistical profiles (top N/bottom N)
          o Show me sales, profit and average call volume per day for my 10 most profitable salespeople
      • Custom consolidation (market segments, ad hoc groups)
          o Show me an abbreviated income statement by quarter for the last four quarters for my northeast region operations
  • 8. Multidimensional Modeling
      • Example: compute total sales volume per product and store
      Total Sales   Product 1   Product 2   Product 3   Product 4
      Store 1       $454        -           -           $925
      Store 2       $468        $800        -           -
      Store 3       $296        -           $240        -
      Store 4       $652        -           $540        $745
  • 9. Dimensions and Hierarchies
      • Dimensions and their hierarchies (coarsest level first):
          o PRODUCT: category, product
          o LOCATION: region, country, city, store
          o TIME: year, quarter, month, week, day
      • Example cell: sales of DVDs in NY in August (product = DVD, city = NY, month = August)
      • A cell in the cube may store values (measurements) relative to the combination of the labeled dimensions
  • 10. Common OLAP Operations
      • Roll-up: move up the hierarchy
          o e.g. given total sales per city, we can roll up to get sales per state
      • Drill-down: move down the hierarchy
          o more fine-grained aggregation
          o lowest level can be the detail records (drill-through)
      • Hierarchies (coarsest level first):
          o PRODUCT: category, product
          o LOCATION: region, country, state, city, store
          o TIME: year, quarter, month, week, day
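The roll-up from city to state can be sketched in a few lines of Python. The city totals and the city-to-state mapping below are made-up illustration data, not figures from the slides; `roll_up` is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical city-level totals and LOCATION hierarchy (city -> state).
SALES_PER_CITY = {"Boston": 120, "Springfield": 80, "Albany": 200, "New York": 500}
CITY_TO_STATE = {"Boston": "MA", "Springfield": "MA", "Albany": "NY", "New York": "NY"}

def roll_up(totals, parent_of):
    """Aggregate totals one level up the hierarchy by summing children per parent."""
    rolled = defaultdict(int)
    for member, value in totals.items():
        rolled[parent_of[member]] += value
    return dict(rolled)

sales_per_state = roll_up(SALES_PER_CITY, CITY_TO_STATE)
```

Drill-down is the reverse direction and cannot be computed from the state totals alone: it needs the finer-grained data (here `SALES_PER_CITY`) to be available.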
  • 11. Pivoting
      • Pivoting: aggregate on selected dimensions
          o usually 2 dims (cross-tabulation)
      Sales       Product 1   Product 2   Product 3   Product 4   ALL
      Store 1     454         -           -           925         1379
      Store 2     468         800         -           -           1268
      Store 3     296         -           240         -           536
      Store 4     652         -           540         745         1937
      ALL         1870        800         780         1670        5120
  • 12. Slice and Dice Queries
      • Slice and Dice: select and project on one or more dimensions
      • [Figure: cube with dimensions product, customers, store; the slice shown is customer = “Smith”]
  • 13. Roadmap
      • What is a data warehouse and what it is for
      • What are the differences between OLTP and OLAP
      • Multi-dimensional data modeling
      • Data warehouse design
          o the star schema, bitmap indexes
      • The Data Cube operator
          o semantics and computation
      • Aggregate View Selection
      • Other Issues
  • 14. Data Warehouse Design
      • Most data warehouses adopt a star schema to represent the multidimensional model
      • Each dimension is represented by a dimension table
          o LOCATION(location_key, store, street_address, city, state, country, region)
          o dimension tables are not normalized
      • Transactions are described through a fact table
          o each tuple consists of a pointer to each of the dimension tables (foreign key) and a list of measures (e.g. sales $$$)
  • 15. Star Schema Example
      • Dimension tables:
          o TIME(time_key, day, day_of_the_week, month, quarter, year)
          o LOCATION(location_key, store, street_address, city, state, country, region)
          o PRODUCT(product_key, product_name, category, brand, color, supplier_name)
      • Fact table:
          o SALES(time_key, product_key, location_key, units_sold, amount), where units_sold and amount are the measures
  • 16. Advantages of Star Schema
      • Facts and dimensions are clearly depicted
          o dimension tables are relatively static, data is loaded (append mostly) into fact table(s)
          o easy to comprehend (and write queries)
      • “Find total sales per product-category in our stores in Europe”
          SELECT PRODUCT.category, SUM(SALES.amount)
          FROM SALES, PRODUCT, LOCATION
          WHERE SALES.product_key = PRODUCT.product_key
            AND SALES.location_key = LOCATION.location_key
            AND LOCATION.region = 'Europe'
          GROUP BY PRODUCT.category
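To try the star-schema query end to end, the sketch below builds a cut-down version of the schema in an in-memory SQLite database. The rows are invented sample data, the tables carry only the columns the query needs, and the joins are written with explicit JOIN ... ON, which is equivalent to the comma-join form on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Reduced star schema: two dimension tables and one fact table.
CREATE TABLE PRODUCT  (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE LOCATION (location_key INTEGER PRIMARY KEY, store TEXT, region TEXT);
CREATE TABLE SALES    (product_key  INTEGER REFERENCES PRODUCT(product_key),
                       location_key INTEGER REFERENCES LOCATION(location_key),
                       units_sold INTEGER, amount REAL);
-- Hypothetical sample rows.
INSERT INTO PRODUCT  VALUES (1, 'Player X', 'DVD'), (2, 'Laptop Y', 'PC');
INSERT INTO LOCATION VALUES (10, 'Paris-1', 'Europe'), (20, 'Boston-1', 'America');
INSERT INTO SALES    VALUES (1, 10, 2, 300.0), (2, 10, 1, 900.0),
                            (1, 10, 1, 150.0), (2, 20, 1, 950.0);
""")

# Total sales per product category in European stores.
rows = conn.execute("""
SELECT PRODUCT.category, SUM(SALES.amount)
FROM SALES
JOIN PRODUCT  ON SALES.product_key  = PRODUCT.product_key
JOIN LOCATION ON SALES.location_key = LOCATION.location_key
WHERE LOCATION.region = 'Europe'
GROUP BY PRODUCT.category
ORDER BY PRODUCT.category
""").fetchall()
```

The ORDER BY is added only to make the result order deterministic for the example.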
  • 17. Star Schema Query Processing
      • [Figure: the star schema (TIME, LOCATION, PRODUCT around SALES) annotated with the query plan: selection σ region = “Europe” on LOCATION, projection π category on PRODUCT, and a JOIN of each of these dimension tables with SALES to reach the measures]
  • 18. Indexing OLAP Data: Bitmap Index
      • Each value in the column has a bit vector:
          o The i-th bit is set if the i-th row of the base table has the value for the indexed column
          o The length of the bit vector: # of records in the base table
      • Mainly intended for small cardinality domains
      LOCATION                      Index on Region
      location_key   Region         Asia   Europe   America
      L1             Asia           1      0        0
      L2             Europe         0      1        0
      L3             Asia           1      0        0
      L4             America        0      0        1
      L5             Europe         0      1        0
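A minimal sketch of the bitmap index above, using Python integers as bit vectors (bit i stands for row i of the base table, counting from 0). The function names are illustrative only; real systems use compressed bitmap formats.

```python
# Region column of LOCATION for rows L1..L5, as on the slide.
REGIONS = ["Asia", "Europe", "Asia", "America", "Europe"]

def build_bitmap_index(column):
    """Return {value: bitmap}; bit i is set when row i holds that value."""
    index = {}
    for i, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << i)
    return index

def rows_of(bitmap):
    """Decode a bitmap back into the list of matching row positions."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

index = build_bitmap_index(REGIONS)
# A predicate such as region IN ('Asia', 'Europe') becomes a bitwise OR.
asia_or_europe = index["Asia"] | index["Europe"]
```

One bitmap is kept per distinct value, which is why the technique suits small-cardinality columns.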
  • 19. Join-Index
      • Join index relates the values of the dimensions of a star schema to rows in the fact table.
          o a join index on region maintains for each distinct region a list of ROW-IDs of the tuples recording the sales in the region
      • Join indices can span multiple dimensions OR
          o can be implemented as bitmap indexes (per dimension)
          o use bit-ops for multiple joins
      • [Figure: each LOCATION region value (Africa, America, Asia, Europe) points to the matching SALES rows, e.g. R102, R117, R118, R124]
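The "bitmap per dimension plus bit-ops" variant can be sketched as follows. The tiny fact table and the dimension lookups are invented for illustration, and `join_index` is a hypothetical helper, not an API of any particular DBMS.

```python
# Hypothetical dimension attributes and fact rows (location_key, product_key, amount).
LOCATION_REGION = {"L1": "Asia", "L2": "Europe", "L3": "America"}
PRODUCT_CATEGORY = {"P1": "DVD", "P2": "PC"}
SALES = [("L1", "P1", 100), ("L2", "P1", 200), ("L2", "P2", 300),
         ("L3", "P2", 400), ("L2", "P1", 50)]

def join_index(fact_rows, key_pos, dim_attr):
    """Map each dimension attribute value to a bitmap over fact-table row ids."""
    index = {}
    for row_id, row in enumerate(fact_rows):
        value = dim_attr[row[key_pos]]
        index[value] = index.get(value, 0) | (1 << row_id)
    return index

by_region = join_index(SALES, 0, LOCATION_REGION)
by_category = join_index(SALES, 1, PRODUCT_CATEGORY)

# Two joins plus two selections collapse into one bitwise AND.
hits = by_region["Europe"] & by_category["DVD"]
total = sum(row[2] for i, row in enumerate(SALES) if (hits >> i) & 1)
```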
  • 20. Problem Solved?
      • “Find total sales per product-category in our stores in Europe”
          o Join index will prune ¾ of the data (assuming uniform sales), but the remaining ¼ is still large (several million transactions)
      • Index is unclustered
      • High-level aggregations are expensive!
          o long scans to get the data
          o hashing or sorting necessary for group-bys
      • LOCATION hierarchy: region, country, state, city, store
      ⇒ Long query response times
      ⇒ Pre-computation is necessary
  • 21. Multiple Simultaneous Aggregates
      • Cross-tabulation (products/store), with sub-totals per store, sub-totals per product, and total sales
      Sales       Product 1   Product 2   Product 3   Product 4   ALL
      Store 1     454         -           -           925         1379
      Store 2     468         800         -           -           1268
      Store 3     296         -           240         -           536
      Store 4     652         -           540         745         1937
      ALL         1870        800         780         1670        5120
      • 4 group-bys here: (store,product), (store), (product), ()
      • Need to write 4 queries!!!
  • 22. The Data Cube Operator (Gray et al.)
      • All previous aggregates in a single query:
          SELECT LOCATION.store, SALES.product_key, SUM(amount)
          FROM SALES, LOCATION
          WHERE SALES.location_key = LOCATION.location_key
          CUBE BY SALES.product_key, LOCATION.store
      • Challenge: optimize aggregate computation
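What the cube operator returns can be illustrated with a naive Python sketch that computes every group-by over the (store, product) sales figures used in the slides. This is only the semantics, one pass that updates all 2^n groupings per row, and not an optimized cube algorithm; `cube` is a name chosen here.

```python
from collections import defaultdict
from itertools import combinations

# (store, product_key, amount) rows from the cross-tab in the slides.
FACTS = [(1, 1, 454), (1, 4, 925), (2, 1, 468), (2, 2, 800), (3, 1, 296),
         (3, 3, 240), (4, 1, 652), (4, 3, 540), (4, 4, 745)]

def cube(rows, n_dims):
    """Sum the measure for every subset of dimensions; dropped dims become 'ALL'."""
    result = defaultdict(int)
    for *dims, measure in rows:
        for k in range(n_dims + 1):
            for kept in combinations(range(n_dims), k):
                key = tuple(dims[i] if i in kept else "ALL" for i in range(n_dims))
                result[key] += measure
    return dict(result)

sales_cube = cube(FACTS, 2)
```

Keys such as `(4, "ALL")` and `("ALL", 3)` hold the sub-totals, and `("ALL", "ALL")` holds the grand total.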
  • 23. Relational View of Data Cube
      Store   Product_key   sum(amount)
      1       1             454
      1       4             925
      2       1             468
      2       2             800
      3       1             296
      3       3             240
      4       1             652
      4       3             540
      4       4             745
      1       ALL           1379
      2       ALL           1268
      3       ALL           536
      4       ALL           1937
      ALL     1             1870
      ALL     2             800
      ALL     3             780
      ALL     4             1670
      ALL     ALL           5120
      • The same data as a cross-tab:
      Sales       Product 1   Product 2   Product 3   Product 4   ALL
      Store 1     454         -           -           925         1379
      Store 2     468         800         -           -           1268
      Store 3     296         -           240         -           536
      Store 4     652         -           540         745         1937
      ALL         1870        800         780         1670        5120
      • Query:
          SELECT LOCATION.store, SALES.product_key, SUM(amount)
          FROM SALES, LOCATION
          WHERE SALES.location_key = LOCATION.location_key
          CUBE BY SALES.product_key, LOCATION.store
  • 24. Data Cube: Multidimensional View
      • [Figure: cube with dimensions Quarter (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), Product (DVD, VCR, PC, sum) and Region (America, Europe, Asia, sum)]
      • Example cell: total annual sales of DVDs in America
  • 25. Other Extensions to SQL
      • Complex aggregation at multiple granularities (Ross et al. 1998)
          o Compute multiple dependent aggregates
          SELECT LOCATION.store, SALES.product_key, SUM(amount)
          FROM SALES, LOCATION
          WHERE SALES.location_key = LOCATION.location_key
          CUBE BY SALES.product_key, LOCATION.store: R
          SUCH THAT R.amount = max(amount)
      • Other proposals: the MD-join operator (Chatziantoniou et al. 1999)
  • 26. Data Cube Computation
      • Model dependencies among the aggregates: (product,store,quarter) is the most detailed “view”; a coarser view such as (product,store) can be computed from view (product,store,quarter) by summing up all quarterly sales
      • Lattice of views, from most to least detailed:
          o product,store,quarter
          o product,quarter; store,quarter; product,store
          o product; store; quarter
          o none
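The dependency between views can be sketched as a small projection-and-sum helper: any view is obtained from a more detailed one by dropping dimensions and adding up the measures. The sales figures below are hypothetical and `derive` is a name chosen for this sketch.

```python
from collections import defaultdict

DIMS = ("product", "store", "quarter")
# Hypothetical contents of the most detailed view (product, store, quarter).
PSQ = {("DVD", "S1", "Q1"): 10, ("DVD", "S1", "Q2"): 15,
       ("DVD", "S2", "Q1"): 7, ("PC", "S1", "Q1"): 20}

def derive(view, view_dims, target_dims):
    """Compute a coarser view by summing over the dimensions that are dropped."""
    keep = [view_dims.index(d) for d in target_dims]
    out = defaultdict(int)
    for key, value in view.items():
        out[tuple(key[i] for i in keep)] += value
    return dict(out)

product_store = derive(PSQ, DIMS, ("product", "store"))
# (product) can be derived from (product, store) instead of the larger base view.
product_only = derive(product_store, ("product", "store"), ("product",))
```

This works because SUM is distributive; other aggregates may need extra bookkeeping (for example, AVG needs both a sum and a count).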
  • 27. Computation Directives
      • Hash/sort based methods (Agrawal et al., VLDB’96)
          1. Smallest-parent
          2. Cache-results
          3. Amortize-scans
          4. Share-sorts
          5. Share-partitions
      • [Figure: the lattice of views, from (product,store,quarter) down to none]
  • 28. Alternative Array-based Approach
      • Model data as a sparse multidimensional array
          o partition the array into chunks (a chunk is a small sub-cube that fits in memory)
          o fast addressing based on (chunk_id, offset)
      • Compute aggregates in “multi-way” fashion by visiting cube cells in the order that minimizes the number of times each cell is visited, reducing memory access and storage cost
      • What is the best traversal order for multi-way aggregation?
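One way to picture (chunk_id, offset) addressing is the sketch below. It assumes row-major numbering of chunks and of cells inside a chunk, and array sizes that divide evenly into chunks; the slides do not specify a layout, so both are assumptions made here.

```python
from itertools import product

def chunk_address(coord, dims, chunk):
    """Map a cell coordinate to (chunk_id, offset), both numbered row-major.

    Assumes every entry of dims is a multiple of the matching entry of chunk.
    """
    chunks_per_dim = [d // c for d, c in zip(dims, chunk)]
    chunk_id = 0
    offset = 0
    for x, n_chunks, c in zip(coord, chunks_per_dim, chunk):
        chunk_id = chunk_id * n_chunks + x // c   # which chunk along this axis
        offset = offset * c + x % c               # position inside the chunk
    return chunk_id, offset

# A 4x4 array split into 2x2 chunks: every cell gets a distinct address.
addresses = {cell: chunk_address(cell, (4, 4), (2, 2))
             for cell in product(range(4), repeat=2)}
```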
  • 29. Reality check: too many views!
      • 2^n views for n dimensions (no hierarchies)
      • Storage/update-time explosion
      • More pre-computation doesn’t mean better performance!
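The 2^n count follows from each dimension being either kept or dropped in a group-by. A short enumeration makes the growth concrete (the dimension names are just the ones used in the lattice slides):

```python
from itertools import combinations

def all_views(dimensions):
    """Enumerate every subset of the dimensions, most detailed first."""
    return [combo
            for k in range(len(dimensions), -1, -1)
            for combo in combinations(dimensions, k)]

views = all_views(("product", "store", "quarter"))
# Number of views for 1..10 dimensions.
counts = [len(all_views(tuple(range(n)))) for n in range(1, 11)]
```

`views` starts with `("product", "store", "quarter")` and ends with the empty grouping `()`, and `counts` doubles with every added dimension.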
  • 30. How to choose among the views?
      • Use some notion of benefit per view
      • Limit: disk space or maintenance time
      • [Figure: the lattice of views, from (product,store,quarter) down to none]
      • Harinarayan et al., SIGMOD’96: pick views greedily until space is filled
          o $B(v,S) = \sum_{u:\, u \le v,\; C_v(u) < C_S(u)} \bigl( C_S(u) - C_v(u) \bigr)$, where $C_v(u)$ is the cost of answering view $u$ from $v$ and $C_S(u)$ is the cost of answering $u$ from the current selection $S$
      • Catch: quadratic in the number of views, which is exponential!!!
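A rough sketch of the greedy idea, under several assumptions that are not in the slides: the cost of answering a view from a materialized view is taken to be the row count of that materialized view, the row counts are invented, the top view is always materialized, and the stopping rule is a fixed number of picks k rather than a disk-space budget.

```python
# Hypothetical row counts per view; views are frozensets of dimension names.
SIZES = {
    frozenset({"product", "store", "quarter"}): 6000,
    frozenset({"product", "store"}): 6000,
    frozenset({"product", "quarter"}): 800,
    frozenset({"store", "quarter"}): 2000,
    frozenset({"product"}): 200,
    frozenset({"store"}): 50,
    frozenset({"quarter"}): 4,
    frozenset(): 1,
}
TOP = frozenset({"product", "store", "quarter"})

def cost(u, selected):
    """Cost of answering u: size of its smallest materialized ancestor."""
    return min(SIZES[v] for v in selected if u <= v)

def benefit(v, selected):
    """Total cost reduction over all views u answerable from v."""
    return sum(max(cost(u, selected) - SIZES[v], 0) for u in SIZES if u <= v)

def greedy(k):
    """Pick up to k extra views, each time the one with the largest benefit."""
    selected = [TOP]
    for _ in range(k):
        candidates = [v for v in SIZES if v not in selected]
        best = max(candidates, key=lambda v: benefit(v, selected))
        if benefit(best, selected) <= 0:
            break
        selected.append(best)
    return selected

chosen = greedy(2)
```

With these numbers the first pick is (product,quarter) and the second is (store,quarter). Each round evaluates every candidate against every view below it, which is where the quadratic cost in the number of views comes from.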
  • 31. View Selection Problem
      • Selection is based on a workload estimate (e.g. logs) and a given constraint (disk space or update window)
      • NP-hard: the optimal selection cannot be computed for more than 4-5 dimensions
          o greedy algorithms (e.g. [Harinarayan96]) run in time at least polynomial in the number of views, i.e. exponential in the number of dimensions!!!
      • Optimal selection cannot be approximated [Karloff99]
          o greedy view selection can behave arbitrarily badly
      • Lack of good models for cost-based optimization!
  • 32. Other Issues
      • Fact and dimension tables in the DW are views of tables stored in the sources
      • Lots of view maintenance problems
          o correctly reflect asynchronous changes at the sources
          o making views self-maintainable
      • Interactive queries (on-line aggregation)
          o e.g. show running estimates + confidence intervals
      • Computing iceberg queries efficiently
      • Approximation
          o rough estimates for high-level aggregates are often good enough
          o histogram, wavelet, sampling based techniques (e.g. AQUA)
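The "running estimates + confidence intervals" idea can be sketched as a generator that reports the mean seen so far together with an approximate 95% half-width. This assumes rows arrive in random order and uses a normal approximation (1.96 standard errors); it is an illustration, not the method of any specific on-line aggregation system.

```python
import math

def running_estimates(values, report_every):
    """Yield (rows_seen, running_mean, half_width) every report_every rows.

    Uses Welford's update for mean and variance; half_width is
    1.96 * standard error, an approximate 95% confidence interval.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # sum of squared deviations from the current mean
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n > 1 and n % report_every == 0:
            std_error = math.sqrt(m2 / (n - 1) / n)
            yield n, mean, 1.96 * std_error

estimates = list(running_estimates([1, 2, 3, 4], report_every=2))
```

A user watching the stream of estimates can stop the query as soon as the interval is narrow enough for the decision at hand.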
  • 33. Thank You!!! For more information, click the link below: http://vibranttechnologies.co.in/datawarehousing-classes-in-mumbai.html