SlideShare ist ein Scribd-Unternehmen logo
1 von 18
Data Warehousing
      and
     OLAP
Motivation
Aims of information technology:
  • To help workers in their everyday business activity
    and improve their productivity ie. clerical data
    processing tasks
  • To help knowledge workers (executives,
    managers, analysts) make faster and better
    decisions ie. decision support systems
• Two types of applications:
  – Operational applications
  – Analytical applications
Motivation
  • In most organizations, data about specific parts of
    business is there - lots and lots of data,
    somewhere, in some form.
On the other hand:
  Data is available but not information -- and not
   the right information at the right time.
Aim of data warehouse is to:
  • bring together information from multiple sources
    as to provide a consistent database source for
    decision support queries.
  • off-load decision support applications from the on-
    line transaction system.
Warehousing
• Growing industry: $ 8 billion in 1998
• Range from desktop to huge
  warehouses
  – Walmart: 900-CPU, 2,700 disks, 23TB
  – Teradata system
• Lots of new terms
  – ROLAP, MOLAP, HOLAP
  – rollup. drill-down, slice& dice
The Architecture of Data
                               What’s has been
                               learned from data

summaries by                      Logical model
                   Business
who, what,          rules            physical layout of
when,
                                     data
where,...         Metadata

               Database schema                     who,
                                                   what,
                Summary data
                                                   when,
               Operational data                    where,
Decision Support and OLAP
 • DSS: Information technology to help
   knowledge workers (executives, managers,
   analysts) make faster and better decisions:
       what were the sales volumes by region and by
        product category in the last year?
       how did the share price of computer manufacturers
        correlate with quarterly profits over the past 10
        years?
       will a 10% discount increase sales volume
        sufficiently?
 • OLAP is an element of decision support
   system
 • Data mining is a powerful, high-
   performance data analysis tool for decision
Data Processing Models
There are two basic data processing
 models:
  • OLTP – the main aim of OLTP is reliable
    and efficient processing of a large number
    of transactions and ensuring data
    consistency.
  • OLAP – the main aim of OLAP is efficient
    multidimensional processing of large data
    volumes.
Traditional OLTP
Traditionally, DBMS have been used for
on-line transaction processing (OLTP)
  • order entry: pull up order xx-yy-zz and update
    status field
  • banking: transfer $100 from account X to account
    Y
     •   clerical data processing tasks
     •   detailed up-to-date data
     •   structured, repetitive tasks
     •   short transactions are the unit of work
     •   read and/or update a few records
     •   isolation, recovery, and integrity are critical
OLTP vs. OLAP
• OLTP: On Line Transaction Processing
  – Describes processing at operational sites
• OLAP: On Line Analytical Processing
  – Describes processing at warehouse
OLTP vs. OLAP
            OLTP                         OLAP
users       Clerk, IT professional       Knowledge worker
function day to day operations           decision support
DB design application-oriented           subject-oriented
data       current, up-to-date           historical, summarized
           detailed, flat relational      multidimensional
           isolated                       integrated, consolidated
usage      repetitive                    ad-hoc
access     read/write,                    lots of scans
           index/hash on prim. key
unit of work short, simple transaction            complex query
# records accessed               tens             millions
#users thousands                         hundreds
DB size 100MB-GB                         100GB-TB
metric transaction throughput            query throughput, response
What is a Data Warehouse
• “A data warehouse is a subject-
  oriented, integrated, time-variant,
  and nonvolatile collection of data in
  support of management’s decision-
  making process.” --- W. H. Inmon
• Collection of data that is used primarily
  in organizational decision making
• A decision support database that is
  maintained     separately    from    the
  organization’s operational database
Data Warehouse - Subject
            Oriented
• Subject oriented: oriented to the major
  subject areas of the corporation that
  have been defined in the data model.
  • E.g. for an insurance company: customer,
    product, transaction or activity, policy,
    claim, account, and etc.
• Operational DB and applications may
  be organized differently
  • E.g. based on type of insurance's: auto,
    life, medical, fire, ...
Data Warehouse – Integrated

• There is no consistency in encoding,
  naming conventions, …, among
  different data sources
• Heterogeneous data sources
• When data is moved to the warehouse,
  it is converted.
Data Warehouse - Non-Volatile
• Operational data is regularly accessed
  and manipulated a record at a time, and
  update is done to data in the
  operational environment.
• Warehouse Data is loaded and
  accessed. Update of data does not
  occur    in   the   data     warehouse
  environment.
Data Warehouse - Time Variance
• The time horizon for the data warehouse is
  significantly longer than that of operational
  systems.
     • Operational database: current value data.
     • Data warehouse data : nothing more than a
       sophisticated series of snapshots, taken of at
       some moment in time.
• The key structure of operational data may or
  may not contain some element of time. The
  key structure of the data warehouse always
  contains some element of time.
Why Separate Data
          Warehouse?
• Performance
  • special data organization, access methods,
    and implementation methods are needed
    to support multidimensional views and
    operations typical of OLAP
  • Complex OLAP queries would degrade
    performance for operational transactions
  • Concurrency control and recovery modes
    of OLTP are not compatible with OLAP
    analysis
Why Separate Data
           Warehouse?
• Function
  • missing data: Decision support requires
    historical data which operational DBs do
    not typically maintain
  • data     consolidation:       DS     requires
    consolidation (aggregation, summarization)
    of data from heterogeneous sources:
    operational DBs, external sources
  • data quality: different sources typically use
    inconsistent data representations, codes
    and formats which have to be reconciled.
Advantages of Warehousing
•   High query performance
•   Local processing at sources unaffected
•   Can operate when sources unavailable
•   Extra information at warehouse
    – Modify, summarize (store aggregates)
    – Add historical information

Weitere ähnliche Inhalte

Was ist angesagt?

Manish tripathi-ea-dw-bi
Manish tripathi-ea-dw-biManish tripathi-ea-dw-bi
Manish tripathi-ea-dw-bi
A P
 
Gulabs Ppt On Data Warehousing And Mining
Gulabs Ppt On Data Warehousing And MiningGulabs Ppt On Data Warehousing And Mining
Gulabs Ppt On Data Warehousing And Mining
gulab sharma
 

Was ist angesagt? (20)

Traditional Data-warehousing / BI overview
Traditional Data-warehousing / BI overviewTraditional Data-warehousing / BI overview
Traditional Data-warehousing / BI overview
 
Bi presentation Designing and Implementing Business Intelligence Systems
Bi presentation   Designing and Implementing Business Intelligence SystemsBi presentation   Designing and Implementing Business Intelligence Systems
Bi presentation Designing and Implementing Business Intelligence Systems
 
Business Intelligence Architecture
Business Intelligence ArchitectureBusiness Intelligence Architecture
Business Intelligence Architecture
 
Cognos datawarehouse
Cognos datawarehouseCognos datawarehouse
Cognos datawarehouse
 
Data warehousing unit 1
Data warehousing unit 1Data warehousing unit 1
Data warehousing unit 1
 
Manish tripathi-ea-dw-bi
Manish tripathi-ea-dw-biManish tripathi-ea-dw-bi
Manish tripathi-ea-dw-bi
 
OLAP technology
OLAP technologyOLAP technology
OLAP technology
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 
Data Warehouse Design on Cloud ,A Big Data approach Part_One
Data Warehouse Design on Cloud ,A Big Data approach Part_OneData Warehouse Design on Cloud ,A Big Data approach Part_One
Data Warehouse Design on Cloud ,A Big Data approach Part_One
 
SAP HANA Architecture Overview | SAP HANA Tutorial
SAP HANA Architecture Overview | SAP HANA TutorialSAP HANA Architecture Overview | SAP HANA Tutorial
SAP HANA Architecture Overview | SAP HANA Tutorial
 
Kushal Data Warehousing PPT
Kushal Data Warehousing PPTKushal Data Warehousing PPT
Kushal Data Warehousing PPT
 
DATA MART APPROCHES TO ARCHITECTURE
DATA MART APPROCHES TO ARCHITECTUREDATA MART APPROCHES TO ARCHITECTURE
DATA MART APPROCHES TO ARCHITECTURE
 
Data warehouse inmon versus kimball 2
Data warehouse inmon versus kimball 2Data warehouse inmon versus kimball 2
Data warehouse inmon versus kimball 2
 
Data Warehousing
Data WarehousingData Warehousing
Data Warehousing
 
Introduction to Business Intelligence
Introduction to Business IntelligenceIntroduction to Business Intelligence
Introduction to Business Intelligence
 
SoftServe BI/BigData Workshop in Utah
SoftServe BI/BigData Workshop in UtahSoftServe BI/BigData Workshop in Utah
SoftServe BI/BigData Workshop in Utah
 
Gulabs Ppt On Data Warehousing And Mining
Gulabs Ppt On Data Warehousing And MiningGulabs Ppt On Data Warehousing And Mining
Gulabs Ppt On Data Warehousing And Mining
 
Oracle: Fundamental Of DW
Oracle: Fundamental Of DWOracle: Fundamental Of DW
Oracle: Fundamental Of DW
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Role of Database Management System in A Data Warehouse
Role of Database Management System in A Data Warehouse Role of Database Management System in A Data Warehouse
Role of Database Management System in A Data Warehouse
 

Andere mochten auch (6)

That vs. which
That vs. whichThat vs. which
That vs. which
 
Bab 3 tik presentasi
Bab 3 tik presentasiBab 3 tik presentasi
Bab 3 tik presentasi
 
M tech seminar on dem uncertainty
M tech seminar on dem uncertaintyM tech seminar on dem uncertainty
M tech seminar on dem uncertainty
 
Chompu
ChompuChompu
Chompu
 
Bab 2 tik presentasi
Bab 2 tik presentasiBab 2 tik presentasi
Bab 2 tik presentasi
 
Linguistik perbandingan
Linguistik perbandinganLinguistik perbandingan
Linguistik perbandingan
 

Ähnlich wie Lecture1

1. Introduction to the Course "Designing Data Bases with Advanced Data Models...
1. Introduction to the Course "Designing Data Bases with Advanced Data Models...1. Introduction to the Course "Designing Data Bases with Advanced Data Models...
1. Introduction to the Course "Designing Data Bases with Advanced Data Models...
Fabio Fumarola
 

Ähnlich wie Lecture1 (20)

Data warehouse introduction
Data warehouse introductionData warehouse introduction
Data warehouse introduction
 
Data ware housing - Introduction to data ware housing process.
Data ware housing - Introduction to data ware housing process.Data ware housing - Introduction to data ware housing process.
Data ware housing - Introduction to data ware housing process.
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSING
 
data warehousing
data warehousingdata warehousing
data warehousing
 
presentationofism-complete-1-100227093028-phpapp01.pptx
presentationofism-complete-1-100227093028-phpapp01.pptxpresentationofism-complete-1-100227093028-phpapp01.pptx
presentationofism-complete-1-100227093028-phpapp01.pptx
 
Introduction to data mining and data warehousing
Introduction to data mining and data warehousingIntroduction to data mining and data warehousing
Introduction to data mining and data warehousing
 
Online analytical processing
Online analytical processingOnline analytical processing
Online analytical processing
 
Datawarehousing
DatawarehousingDatawarehousing
Datawarehousing
 
DWDM Unit 1 (1).pptx
DWDM Unit 1 (1).pptxDWDM Unit 1 (1).pptx
DWDM Unit 1 (1).pptx
 
1. Introduction to the Course "Designing Data Bases with Advanced Data Models...
1. Introduction to the Course "Designing Data Bases with Advanced Data Models...1. Introduction to the Course "Designing Data Bases with Advanced Data Models...
1. Introduction to the Course "Designing Data Bases with Advanced Data Models...
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 
Data warehouse system and its concepts
Data warehouse system and its conceptsData warehouse system and its concepts
Data warehouse system and its concepts
 
5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game Changer5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game Changer
 
Data warehouse
Data warehouse Data warehouse
Data warehouse
 
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
 
Dw 07032018-dr pl pradhan
Dw 07032018-dr pl pradhanDw 07032018-dr pl pradhan
Dw 07032018-dr pl pradhan
 
Data ware housing- Introduction to data ware housing
Data ware housing- Introduction to data ware housingData ware housing- Introduction to data ware housing
Data ware housing- Introduction to data ware housing
 
05_Decision Support and OLAP.pdf
05_Decision Support and OLAP.pdf05_Decision Support and OLAP.pdf
05_Decision Support and OLAP.pdf
 

Lecture1

  • 1. Data Warehousing and OLAP
  • 2. Motivation Aims of information technology: • To help workers in their everyday business activity and improve their productivity ie. clerical data processing tasks • To help knowledge workers (executives, managers, analysts) make faster and better decisions ie. decision support systems • Two types of applications: – Operational applications – Analytical applications
  • 3. Motivation • In most organizations, data about specific parts of business is there - lots and lots of data, somewhere, in some form. On the other hand: Data is available but not information -- and not the right information at the right time. Aim of data warehouse is to: • bring together information from multiple sources as to provide a consistent database source for decision support queries. • off-load decision support applications from the on- line transaction system.
  • 4. Warehousing • Growing industry: $ 8 billion in 1998 • Range from desktop to huge warehouses – Walmart: 900-CPU, 2,700 disks, 23TB – Teradata system • Lots of new terms – ROLAP, MOLAP, HOLAP – rollup. drill-down, slice& dice
  • 5. The Architecture of Data What’s has been learned from data summaries by Logical model Business who, what, rules physical layout of when, data where,... Metadata Database schema who, what, Summary data when, Operational data where,
  • 6. Decision Support and OLAP • DSS: Information technology to help knowledge workers (executives, managers, analysts) make faster and better decisions:  what were the sales volumes by region and by product category in the last year?  how did the share price of computer manufacturers correlate with quarterly profits over the past 10 years?  will a 10% discount increase sales volume sufficiently? • OLAP is an element of decision support system • Data mining is a powerful, high- performance data analysis tool for decision
  • 7. Data Processing Models There are two basic data processing models: • OLTP – the main aim of OLTP is reliable and efficient processing of a large number of transactions and ensuring data consistency. • OLAP – the main aim of OLAP is efficient multidimensional processing of large data volumes.
  • 8. Traditional OLTP Traditionally, DBMS have been used for on-line transaction processing (OLTP) • order entry: pull up order xx-yy-zz and update status field • banking: transfer $100 from account X to account Y • clerical data processing tasks • detailed up-to-date data • structured, repetitive tasks • short transactions are the unit of work • read and/or update a few records • isolation, recovery, and integrity are critical
  • 9. OLTP vs. OLAP • OLTP: On Line Transaction Processing – Describes processing at operational sites • OLAP: On Line Analytical Processing – Describes processing at warehouse
  • 10. OLTP vs. OLAP OLTP OLAP users Clerk, IT professional Knowledge worker function day to day operations decision support DB design application-oriented subject-oriented data current, up-to-date historical, summarized detailed, flat relational multidimensional isolated integrated, consolidated usage repetitive ad-hoc access read/write, lots of scans index/hash on prim. key unit of work short, simple transaction complex query # records accessed tens millions #users thousands hundreds DB size 100MB-GB 100GB-TB metric transaction throughput query throughput, response
  • 11. What is a Data Warehouse • “A data warehouse is a subject- oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision- making process.” --- W. H. Inmon • Collection of data that is used primarily in organizational decision making • A decision support database that is maintained separately from the organization’s operational database
  • 12. Data Warehouse - Subject Oriented • Subject oriented: oriented to the major subject areas of the corporation that have been defined in the data model. • E.g. for an insurance company: customer, product, transaction or activity, policy, claim, account, and etc. • Operational DB and applications may be organized differently • E.g. based on type of insurance's: auto, life, medical, fire, ...
  • 13. Data Warehouse – Integrated • There is no consistency in encoding, naming conventions, …, among different data sources • Heterogeneous data sources • When data is moved to the warehouse, it is converted.
  • 14. Data Warehouse - Non-Volatile • Operational data is regularly accessed and manipulated a record at a time, and update is done to data in the operational environment. • Warehouse Data is loaded and accessed. Update of data does not occur in the data warehouse environment.
  • 15. Data Warehouse - Time Variance • The time horizon for the data warehouse is significantly longer than that of operational systems. • Operational database: current value data. • Data warehouse data : nothing more than a sophisticated series of snapshots, taken of at some moment in time. • The key structure of operational data may or may not contain some element of time. The key structure of the data warehouse always contains some element of time.
  • 16. Why Separate Data Warehouse? • Performance • special data organization, access methods, and implementation methods are needed to support multidimensional views and operations typical of OLAP • Complex OLAP queries would degrade performance for operational transactions • Concurrency control and recovery modes of OLTP are not compatible with OLAP analysis
  • 17. Why Separate Data Warehouse? • Function • missing data: Decision support requires historical data which operational DBs do not typically maintain • data consolidation: DS requires consolidation (aggregation, summarization) of data from heterogeneous sources: operational DBs, external sources • data quality: different sources typically use inconsistent data representations, codes and formats which have to be reconciled.
  • 18. Advantages of Warehousing • High query performance • Local processing at sources unaffected • Can operate when sources unavailable • Extra information at warehouse – Modify, summarize (store aggregates) – Add historical information