SlideShare ist ein Scribd-Unternehmen logo
1 von 20
Downloaden Sie, um offline zu lesen
Chapter-1
Data Warehousing
What is Data Warehouse?
• Data warehousing provides architectures and
tools for business executives to systematically
organize, understand, and use their data to
make strategic decision.
• A Data Warehouse refers to a database that is
maintained separately from an organization’s
operational databases.
• Data warehouse systems allow for the integration
of a variety of application systems. They
support information processing by providing a solid
platform of consolidated historical data
for analysis.
• According to William H. Inmon, a leading architect in the
construction of data warehouse systems,
“A data warehouse is a subject-oriented, integrated,
time-variant, and nonvolatile collection of data in
support of management’s decision making process”.
This short, but comprehensive definition presents the
major features of a data warehouse.
• The four keywords, subject-oriented, integrated, time-
variant, and non-volatile, distinguish data warehouses
from other data repository systems, such as relational
database systems, transaction processing systems,
and file systems.
Subject-oriented:
• A data warehouse is organized around major subjects,
such as customer, supplier, product and sales.
• Rather than concentrating on the day-to-day
operations and transaction processing of an
organization, a data warehouse focuses on the
modeling and analysis of data for decision
makers.
• Hence, data warehouse typically provide a simple and
concise view around particular subject issues by
excluding data that are not useful in the decision
support process.
• For example, to learn more about your company's
sales data, you can build a warehouse that
concentrates on sales.
• Using this warehouse, you can answer questions
like "Who was our best customer for this item last
year?"
• This ability to define a data warehouse by
subject matter, sales in this case, makes the
data warehouse subject oriented.
Integrated:
• A data warehouse is usually constructed by integrating
multiple heterogeneous sources, such as relational
databases, flat files, and on-line transaction records.
• Data cleaning and data integration techniques are
applied to ensure consistency in naming conventions,
encoding structures, attribute measures, and so on.
Time-variant:
• Data are stored to provide information from a historical
perspective (e.g. past 5-10 years).
• Every key structure in the data warehouse contains,
either implicitly or explicitly, an element of time.
Nonvolatile:
• A data warehouse is always a physically separate
store of data transformed from the application data
found in the operational environment.
• Due to this separation, a data warehouse does not
require transaction processing, recovery, and
concurrency control mechanisms.
• It usually requires only two operations in data
accessing: initial loading of data and access of
data.
• Nonvolatile means that, once entered into the
warehouse, data should not change.
• This is logical because the purpose of a warehouse is
to enable you to analyze what has occurred.
• In summary, a data warehouse is a
semantically consistent data store that serves
as a physical implementation of a decision
support data model and stores the
information on which an enterprise needs to
make strategic decisions.
• A data warehouse is often viewed as an
architecture, constructed by integrating data
from multiple heterogeneous sources to
support structured and/or ad hoc queries,
analytical reporting, and decision making.
• Based on this information, we view data warehousing
as the process of constructing and using data
warehouses.
• The construction of data warehouse requires data
cleaning, data integration and data consolidation.
• The utilization of a data warehouse often necessitates
a collection of decision support technologies.
• This allows “knowledge workers” (e.g. managers,
analysts, and executives) to use the warehouse to
quickly and conveniently obtain an overview of the
data, and to make sound decisions based on
information in the warehouse.
Differences between Operational Database
Systems and Data Warehouses
• The major task of on-line operational database systems is to
perform on-line transaction and query processing. These
systems are called on-line transaction processing (OLTP)
systems. They cover most of the day-to-day operations of an
organization, such as purchasing, inventory, manufacturing,
banking, payroll, registration, and accounting.
• Data warehouse systems, on the other hand, serve users or
knowledge workers in the role of data analysis and decision
making. Such systems can organize and present data in
various formats in order to accommodate the diverse needs of
the different users. These systems are known as on-line
analytical processing (OLAP) systems.
The major distinguishing features between
OLTP and OLAP
• Users and system orientation: An OLTP system is
customer-oriented and is used for transaction and query
processing by clerks, clients, and information technology
professionals. An OLAP system is market-oriented and is
used for data analysis by knowledge workers, including
managers, executives, and analysts.
• Data contents: An OLTP system manages current data that,
typically, are too detailed to be easily used for decision making.
An OLAP system manages large amounts of historical data,
provides facilities for summarization and aggregation, and
stores and manages information at different levels of
granularity. These features make the data easier to use in
informed decision making.
• Database design: An OLTP system usually adopts an entity-
relationship (ER) data model and application-oriented
database design. An OLAP system typically adopts either a
star or snowflake model and a subject-oriented database
design.
• View: An OLTP system focuses mainly on the current data
within an enterprise or department, without referring to
historical data or data in different organizations. In contrast, an
OLAP system often spans(pairs) multiple versions of a
database schema, due to the evolutionary process of an
organization. OLAP systems also deal with information that
originates from different organizations, integrating information
from many data stores.
• Access patterns: The access patterns of an OLTP system
consists mainly of short, atomic transactions. Such a system
requires concurrency control and recovery mechanisms.
However, accesses to OLAP systems are mostly read-only
operations (because most data warehouses store historical
data rather than up-to-date information), although many could
be complex queries.
Feature OLTP OLAP
Characteristic Operational processing Informational processing
Orientation Transaction Analysis
User Clerk, DBA, database professional Knowledge worker (e.g., manager, executive, analyst)
Function Day-to-day operations Decision support, long-term informational requirements
DB design Normalized (3NF), application oriented Star/snowflake, subject-oriented
Data Current, guaranteed up-to-date Historical; accuracy maintained over time
Summarization Primitive, highly detailed Summarized, consolidated
View Detailed, flat relational Summarized, multidimensional
Unit of work Short, simple transaction Complex query
Access Read/write Mostly read
Focus Data in Information out
Operations Index/hash on primary key Lots of scan
Number of records accessed Tens Millions
Number of users Thousands Hundreds
DB Size 1 GB to 100 GB 1 TB to 100 TB
Priority High performance, high availability High flexibility, end-user autonomy
Metric Transaction throughput Query throughput, response time
Why do you need to construct a separate
Data Warehouse?
1. A major reason for such a separation is to help promote the
high performance of both systems.
– Operational database is designed and tuned for known tasks and
workloads.
– Data Warehouse queries are often complex, involve computation of
large data and need special data organization. Processing these queries
on operational databases would substantially degrade the performance
of operational tasks.
2. OLTP supports concurrent processing of multiple transactions.
Locking and logging are required to ensure consistency and
robustness of transactions.
• OLAP query often needs read-only access of data records.(No
roll back) . OLAP requires historical data whereas OLTP do not
typically maintain historical data.

Weitere ähnliche Inhalte

Was ist angesagt?

Manish tripathi-ea-dw-bi
Manish tripathi-ea-dw-biManish tripathi-ea-dw-bi
Manish tripathi-ea-dw-biA P
 
Ch1 data-warehousing
Ch1 data-warehousingCh1 data-warehousing
Ch1 data-warehousingAhmad Shlool
 
Ch1 data-warehousing
Ch1 data-warehousingCh1 data-warehousing
Ch1 data-warehousingAhmad Shlool
 
Data warehouse system and its concepts
Data warehouse system and its conceptsData warehouse system and its concepts
Data warehouse system and its conceptsGaurav Garg
 
Data mining 2 - Data warehouse (cheat sheet - printable)
Data mining 2 - Data warehouse (cheat sheet - printable)Data mining 2 - Data warehouse (cheat sheet - printable)
Data mining 2 - Data warehouse (cheat sheet - printable)yesheeka
 
Data warehousing
Data warehousingData warehousing
Data warehousingVarun Jain
 
Data warehousing and data mart
Data warehousing and data martData warehousing and data mart
Data warehousing and data martAmit Sarkar
 
Data mining 1 - Introduction (cheat sheet - printable)
Data mining 1 - Introduction (cheat sheet - printable)Data mining 1 - Introduction (cheat sheet - printable)
Data mining 1 - Introduction (cheat sheet - printable)yesheeka
 
Data warehouse
Data warehouseData warehouse
Data warehouseMR Z
 
Role of Database Management System in A Data Warehouse
Role of Database Management System in A Data Warehouse Role of Database Management System in A Data Warehouse
Role of Database Management System in A Data Warehouse Lesa Cote
 
Data Warehousing and Mining
Data Warehousing and MiningData Warehousing and Mining
Data Warehousing and Miningethantelaviv
 
Data mining 3 - Data Models and Data Warehouse Design (cheat sheet - printable)
Data mining  3 - Data Models and Data Warehouse Design (cheat sheet - printable)Data mining  3 - Data Models and Data Warehouse Design (cheat sheet - printable)
Data mining 3 - Data Models and Data Warehouse Design (cheat sheet - printable)yesheeka
 

Was ist angesagt? (20)

Data warehouse
Data warehouseData warehouse
Data warehouse
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 
Manish tripathi-ea-dw-bi
Manish tripathi-ea-dw-biManish tripathi-ea-dw-bi
Manish tripathi-ea-dw-bi
 
Ch1 data-warehousing
Ch1 data-warehousingCh1 data-warehousing
Ch1 data-warehousing
 
Ch1 data-warehousing
Ch1 data-warehousingCh1 data-warehousing
Ch1 data-warehousing
 
Data warehouse system and its concepts
Data warehouse system and its conceptsData warehouse system and its concepts
Data warehouse system and its concepts
 
Data mining 2 - Data warehouse (cheat sheet - printable)
Data mining 2 - Data warehouse (cheat sheet - printable)Data mining 2 - Data warehouse (cheat sheet - printable)
Data mining 2 - Data warehouse (cheat sheet - printable)
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 
Data warehousing and data mart
Data warehousing and data martData warehousing and data mart
Data warehousing and data mart
 
Data mining 1 - Introduction (cheat sheet - printable)
Data mining 1 - Introduction (cheat sheet - printable)Data mining 1 - Introduction (cheat sheet - printable)
Data mining 1 - Introduction (cheat sheet - printable)
 
Unit 1
Unit 1Unit 1
Unit 1
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Role of Database Management System in A Data Warehouse
Role of Database Management System in A Data Warehouse Role of Database Management System in A Data Warehouse
Role of Database Management System in A Data Warehouse
 
Data Warehouse
Data WarehouseData Warehouse
Data Warehouse
 
Data warehouse proposal
Data warehouse proposalData warehouse proposal
Data warehouse proposal
 
Data Warehousing and Mining
Data Warehousing and MiningData Warehousing and Mining
Data Warehousing and Mining
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 
Data mining 3 - Data Models and Data Warehouse Design (cheat sheet - printable)
Data mining  3 - Data Models and Data Warehouse Design (cheat sheet - printable)Data mining  3 - Data Models and Data Warehouse Design (cheat sheet - printable)
Data mining 3 - Data Models and Data Warehouse Design (cheat sheet - printable)
 
Chapter 2
Chapter 2Chapter 2
Chapter 2
 

Ähnlich wie data warehousing

Data ware housing - Introduction to data ware housing process.
Data ware housing - Introduction to data ware housing process.Data ware housing - Introduction to data ware housing process.
Data ware housing - Introduction to data ware housing process.Vibrant Technologies & Computers
 
Data warehousing.pptx
Data warehousing.pptxData warehousing.pptx
Data warehousing.pptxAnusuya123
 
Module 1_Data Warehousing Fundamentals.pptx
Module 1_Data Warehousing Fundamentals.pptxModule 1_Data Warehousing Fundamentals.pptx
Module 1_Data Warehousing Fundamentals.pptxnikshaikh786
 
presentationofism-complete-1-100227093028-phpapp01.pptx
presentationofism-complete-1-100227093028-phpapp01.pptxpresentationofism-complete-1-100227093028-phpapp01.pptx
presentationofism-complete-1-100227093028-phpapp01.pptxvipush1
 
Data Mining & Data Warehousing
Data Mining & Data WarehousingData Mining & Data Warehousing
Data Mining & Data WarehousingAAKANKSHA JAIN
 
DWDM Unit 1 (1).pptx
DWDM Unit 1 (1).pptxDWDM Unit 1 (1).pptx
DWDM Unit 1 (1).pptxSalehaMariyam
 
Data warehouse - Nivetha Durganathan
Data warehouse - Nivetha DurganathanData warehouse - Nivetha Durganathan
Data warehouse - Nivetha DurganathanNivetha Durganathan
 
Data warehouse concepts
Data warehouse conceptsData warehouse concepts
Data warehouse conceptsobieefans
 
Informatica and datawarehouse Material
Informatica and datawarehouse MaterialInformatica and datawarehouse Material
Informatica and datawarehouse Materialobieefans
 
Introduction to Data Warehouse
Introduction to Data WarehouseIntroduction to Data Warehouse
Introduction to Data WarehouseSOMASUNDARAM T
 

Ähnlich wie data warehousing (20)

DATA WAREHOUSING.2.pptx
DATA WAREHOUSING.2.pptxDATA WAREHOUSING.2.pptx
DATA WAREHOUSING.2.pptx
 
Data ware housing - Introduction to data ware housing process.
Data ware housing - Introduction to data ware housing process.Data ware housing - Introduction to data ware housing process.
Data ware housing - Introduction to data ware housing process.
 
Data warehousing.pptx
Data warehousing.pptxData warehousing.pptx
Data warehousing.pptx
 
Module 1_Data Warehousing Fundamentals.pptx
Module 1_Data Warehousing Fundamentals.pptxModule 1_Data Warehousing Fundamentals.pptx
Module 1_Data Warehousing Fundamentals.pptx
 
presentationofism-complete-1-100227093028-phpapp01.pptx
presentationofism-complete-1-100227093028-phpapp01.pptxpresentationofism-complete-1-100227093028-phpapp01.pptx
presentationofism-complete-1-100227093028-phpapp01.pptx
 
Data Warehouse
Data Warehouse Data Warehouse
Data Warehouse
 
Data Mining & Data Warehousing
Data Mining & Data WarehousingData Mining & Data Warehousing
Data Mining & Data Warehousing
 
DWDM Unit 1 (1).pptx
DWDM Unit 1 (1).pptxDWDM Unit 1 (1).pptx
DWDM Unit 1 (1).pptx
 
Data warehouse - Nivetha Durganathan
Data warehouse - Nivetha DurganathanData warehouse - Nivetha Durganathan
Data warehouse - Nivetha Durganathan
 
Data warehouse
Data warehouse Data warehouse
Data warehouse
 
Data warehouse concepts
Data warehouse conceptsData warehouse concepts
Data warehouse concepts
 
Informatica and datawarehouse Material
Informatica and datawarehouse MaterialInformatica and datawarehouse Material
Informatica and datawarehouse Material
 
Oracle sql plsql & dw
Oracle sql plsql & dwOracle sql plsql & dw
Oracle sql plsql & dw
 
Introduction to Data Warehouse
Introduction to Data WarehouseIntroduction to Data Warehouse
Introduction to Data Warehouse
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSING
 
Lecture1
Lecture1Lecture1
Lecture1
 
Data Management
Data ManagementData Management
Data Management
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 

Kürzlich hochgeladen

POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Disha Kariya
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
The byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptxThe byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptxShobhayan Kirtania
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfchloefrazer622
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docxPoojaSen20
 

Kürzlich hochgeladen (20)

POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
The byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptxThe byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptx
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 

data warehousing

  • 2. What is Data Warehouse? • Data warehousing provides architectures and tools for business executives to systematically organize, understand, and use their data to make strategic decision. • A Data Warehouse refers to a database that is maintained separately from an organization’s operational databases. • Data warehouse systems allow for the integration of a variety of application systems. They support information processing by providing a solid platform of consolidated historical data for analysis.
  • 3. • According to William H. Inmon, a leading architect in the construction of data warehouse systems, “A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision making process”. This short, but comprehensive definition presents the major features of a data warehouse.
  • 4. • The four keywords, subject-oriented, integrated, time- variant, and non-volatile, distinguish data warehouses from other data repository systems, such as relational database systems, transaction processing systems, and file systems.
  • 5. Subject-oriented: • A data warehouse is organized around major subjects, such as customer, supplier, product and sales. • Rather than concentrating on the day-to-day operations and transaction processing of an organization, a data warehouse focuses on the modeling and analysis of data for decision makers. • Hence, data warehouse typically provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process.
  • 6. • For example, to learn more about your company's sales data, you can build a warehouse that concentrates on sales. • Using this warehouse, you can answer questions like "Who was our best customer for this item last year?" • This ability to define a data warehouse by subject matter, sales in this case, makes the data warehouse subject oriented.
  • 7.
  • 8. Integrated: • A data warehouse is usually constructed by integrating multiple heterogeneous sources, such as relational databases, flat files, and on-line transaction records. • Data cleaning and data integration techniques are applied to ensure consistency in naming conventions, encoding structures, attribute measures, and so on.
  • 9.
  • 10. Time-variant: • Data are stored to provide information from a historical perspective (e.g. past 5-10 years). • Every key structure in the data warehouse contains, either implicitly or explicitly, an element of time.
  • 11. Nonvolatile: • A data warehouse is always a physically separate store of data transformed from the application data found in the operational environment. • Due to this separation, a data warehouse does not require transaction processing, recovery, and concurrency control mechanisms. • It usually requires only two operations in data accessing: initial loading of data and access of data. • Nonvolatile means that, once entered into the warehouse, data should not change. • This is logical because the purpose of a warehouse is to enable you to analyze what has occurred.
  • 12.
  • 13. • In summary, a data warehouse is a semantically consistent data store that serves as a physical implementation of a decision support data model and stores the information on which an enterprise needs to make strategic decisions. • A data warehouse is often viewed as an architecture, constructed by integrating data from multiple heterogeneous sources to support structured and/or ad hoc queries, analytical reporting, and decision making.
  • 14. • Based on this information, we view data warehousing as the process of constructing and using data warehouses. • The construction of data warehouse requires data cleaning, data integration and data consolidation. • The utilization of a data warehouse often necessitates a collection of decision support technologies. • This allows “knowledge workers” (e.g. managers, analysts, and executives) to use the warehouse to quickly and conveniently obtain an overview of the data, and to make sound decisions based on information in the warehouse.
  • 15. Differences between Operational Database Systems and Data Warehouses • The major task of on-line operational database systems is to perform on-line transaction and query processing. These systems are called on-line transaction processing (OLTP) systems. They cover most of the day-to-day operations of an organization, such as purchasing, inventory, manufacturing, banking, payroll, registration, and accounting. • Data warehouse systems, on the other hand, serve users or knowledge workers in the role of data analysis and decision making. Such systems can organize and present data in various formats in order to accommodate the diverse needs of the different users. These systems are known as on-line analytical processing (OLAP) systems.
  • 16. The major distinguishing features between OLTP and OLAP • Users and system orientation: An OLTP system is customer-oriented and is used for transaction and query processing by clerks, clients, and information technology professionals. An OLAP system is market-oriented and is used for data analysis by knowledge workers, including managers, executives, and analysts. • Data contents: An OLTP system manages current data that, typically, are too detailed to be easily used for decision making. An OLAP system manages large amounts of historical data, provides facilities for summarization and aggregation, and stores and manages information at different levels of granularity. These features make the data easier to use in informed decision making.
  • 17. • Database design: An OLTP system usually adopts an entity- relationship (ER) data model and application-oriented database design. An OLAP system typically adopts either a star or snowflake model and a subject-oriented database design. • View: An OLTP system focuses mainly on the current data within an enterprise or department, without referring to historical data or data in different organizations. In contrast, an OLAP system often spans(pairs) multiple versions of a database schema, due to the evolutionary process of an organization. OLAP systems also deal with information that originates from different organizations, integrating information from many data stores.
  • 18. • Access patterns: The access patterns of an OLTP system consists mainly of short, atomic transactions. Such a system requires concurrency control and recovery mechanisms. However, accesses to OLAP systems are mostly read-only operations (because most data warehouses store historical data rather than up-to-date information), although many could be complex queries.
  • 19. Feature OLTP OLAP Characteristic Operational processing Informational processing Orientation Transaction Analysis User Clerk, DBA, database professional Knowledge worker (e.g., manager, executive, analyst) Function Day-to-day operations Decision support, long-term informational requirements DB design Normalized (3NF), application oriented Star/snowflake, subject-oriented Data Current, guaranteed up-to-date Historical; accuracy maintained over time Summarization Primitive, highly detailed Summarized, consolidated View Detailed, flat relational Summarized, multidimensional Unit of work Short, simple transaction Complex query Access Read/write Mostly read Focus Data in Information out Operations Index/hash on primary key Lots of scan Number of records accessed Tens Millions Number of users Thousands Hundreds DB Size 1 GB to 100 GB 1 TB to 100 TB Priority High performance, high availability High flexibility, end-user autonomy Metric Transaction throughput Query throughput, response time
  • 20. Why do you need to construct a separate Data Warehouse? 1. A major reason for such a separation is to help promote the high performance of both systems. – Operational database is designed and tuned for known tasks and workloads. – Data Warehouse queries are often complex, involve computation of large data and need special data organization. Processing these queries on operational databases would substantially degrade the performance of operational tasks. 2. OLTP supports concurrent processing of multiple transactions. Locking and logging are required to ensure consistency and robustness of transactions. • OLAP query often needs read-only access of data records.(No roll back) . OLAP requires historical data whereas OLTP do not typically maintain historical data.