Data Warehousing
Lectures based on material from Phil Trinder (HW)

Monica Farrow
Email: M.Farrow@hw.ac.uk

02/08/13                  Data Warehousing    1
Data Warehouse
- Two definitions:
    - “A data warehouse is a copy of transaction data specifically structured for querying and reporting.”
        - Data Warehousing Information Center, http://www.dwinfocenter.org/defined.html
    - A data warehouse is a specialised database to support strategic decision making
        - Decision making involves:
            - Analysing the problem, e.g.
                - Why are my sales not meeting my targets?
                - Which products are not meeting their targets?
                - What are the trends for the failing products?
            - Generating alternative solutions, evaluating them, and choosing the best

Decision Support Systems
- These are used by management to make strategic or policy decisions
- They have existed for a long time
- Characteristics:
    - Aimed at loosely specified problems
    - Combine models and analytical approaches with data retrieval
    - Good usability for non-specialist users
    - Flexible: they support multiple decision-making approaches

A wine club example
- 100,000 members, 2,000 wines, 150 suppliers, 750,000 orders per year
- Systems and their storage technology:
    - Member administration: indexed sequential files
    - Stock control: relational database
    - Order processing: relational database
    - Despatch: proprietary database

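Wine Club Operational Schema
[Diagram: entity-relationship schema of the operational system]
- Entities: Supplier, Wine, Stock, Member, MemberOrder, OrderItem
- Relationships shown: Supplier–Wine (supplies), Wine–Stock (in), Member–MemberOrder (places), MemberOrder–OrderItem (on), Stock–OrderItem (is for)
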
Wine Club Questions
- Competitors have moved in. Is our market share falling?
- Which products are increasing/decreasing in popularity?
- Which products are seasonal?
- Which members place regular orders?
- Are some products more popular in certain parts of the country?
- Which members concentrate on particular products?

Strategic vs Operational Issues
- Strategic*: planning and policy making; long term and broad brush; higher levels of management, e.g.
    - When should we launch a new product?
    - What would be the effect of closing the Edinburgh branch?
- Operational: day-to-day running of the business; detailed and immediate; lower levels of management, e.g.
    - Which items are out of stock?
    - What is the status of order 34522?
- * Here, ‘strategic’ is meant in the management context, not the executive one.

Motivation for data warehousing
- Operational data is not suitable for guiding strategic decisions:
    - Some of the data is not relevant
    - Data may be archived regularly once it is no longer regularly required
- We need to examine trends:
    - What is happening over time?
    - Queries over long time periods may significantly slow down operational processing
- Solution: record sales on a regular basis, separately from the operational system, and analyse them
- This is the start of a warehouse

Data warehouse characteristics
- Subject-oriented, e.g. sales
- Non-volatile: records are not altered once they have been added
    - Whereas in operational processing, records are frequently updated (e.g. changes to prices, quantities etc.)
- Integrated: data from multiple (operational) sources is accumulated in an integrated format
    - E.g. the wine club has more than one operational database
- Time variant: data is recorded against time to allow trend analysis

Data warehouse characteristics continued
- Records are extracted in a form that makes future querying easy. Therefore:
    - There is likely to be some data duplication, including storage of derived data (data obtained from calculations and aggregations)
    - There will be fewer joins and more indexes than in a well-designed operational database
    - The data warehouse will be larger than the corresponding operational database
        - Data in operational databases is archived periodically, whereas a data warehouse keeps data for years to allow trend analysis.

Warehouse construction
- We now look at each stage of warehouse construction:
[Diagram: warehouse construction stages: Source 1 … Source n, Extraction, Integration, DBMS, Aggregate Navigators, Presentation]

Extraction
- Retrieve data from all data sources: files, databases etc.
- The extraction process will be an add-on to the existing operational system, for example:
    - A day-end extraction run
    - A trigger: when a sale is recorded, this triggers extraction of the sale data

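The trigger-based option can be sketched with Python's standard `sqlite3` module. This is a minimal illustration under stated assumptions, not the wine club's actual system: the `orderitem` and `sales_extract` tables and their columns are hypothetical, and a production system would use the trigger facilities of its own operational DBMS.

```python
import sqlite3

# Hypothetical operational table (OrderItem) plus a staging table that the
# warehouse load process reads from. Names and columns are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orderitem (
    orderno    INTEGER,
    winecode   TEXT,
    membercode TEXT,
    qty        INTEGER,
    itemcost   REAL,
    orderdate  TEXT
);

CREATE TABLE sales_extract (
    winecode   TEXT,
    membercode TEXT,
    qty        INTEGER,
    itemcost   REAL,
    orderdate  TEXT
);

-- When a sale is recorded, copy the sale data into the staging table.
CREATE TRIGGER extract_sale AFTER INSERT ON orderitem
BEGIN
    INSERT INTO sales_extract (winecode, membercode, qty, itemcost, orderdate)
    VALUES (NEW.winecode, NEW.membercode, NEW.qty, NEW.itemcost, NEW.orderdate);
END;
""")

# The operational system records two sales as usual...
conn.execute("INSERT INTO orderitem VALUES (1, 'W01', 'M07', 6, 54.0, '2008-01-15')")
conn.execute("INSERT INTO orderitem VALUES (2, 'W02', 'M07', 12, 96.0, '2008-01-16')")

# ...and the trigger has already copied them for the warehouse.
extracted = conn.execute(
    "SELECT winecode, qty FROM sales_extract ORDER BY winecode").fetchall()
print(extracted)  # [('W01', 6), ('W02', 12)]
```

A day-end extraction run would instead be a scheduled batch job selecting the day's rows (for example `WHERE orderdate = ?`). The trigger spreads the extraction work across the day, at the cost of an extra write on every operational insert.
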
Integration
- When data is extracted from different sources, integration may be required:
    - Format integration, similar to a type mismatch. Example: gender may be stored as
        - ‘male’, ‘female’
        - ‘M’, ‘F’
        - 0 and 1
    - Semantic integration: does a word have the same meaning in all the data being integrated? Example: a ‘sale’ means
        - Order processing: order received
        - Stock control: items taken from the physical warehouse
        - Despatch: goods shipped

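Format integration amounts to mapping every source encoding onto one warehouse encoding. The sketch below assumes the warehouse stores 'M'/'F'; the 0/1 mapping in it is a hypothetical choice that would have to be checked against each source system. Semantic integration cannot be fixed by a lookup table like this: it needs an agreed definition of terms such as ‘sale’ before any code is written.

```python
# Canonical codes used in the warehouse (an assumption for this sketch).
CANONICAL = {"male": "M", "female": "F"}

# Hypothetical mapping for sources that store gender as 0/1; which integer
# means which gender must be confirmed with each source system.
INT_CODES = {0: "M", 1: "F"}


def normalise_gender(value):
    """Map a source-specific gender encoding to a single warehouse code."""
    if isinstance(value, int):
        return INT_CODES[value]
    text = str(value).strip().lower()
    if text in CANONICAL:
        return CANONICAL[text]
    if text in ("m", "f"):
        return text.upper()
    raise ValueError(f"unrecognised gender code: {value!r}")


# Values from three differently encoded sources, one integrated result.
sources = ["male", "F", 0, "Female", "m", 1]
print([normalise_gender(v) for v in sources])  # ['M', 'F', 'M', 'F', 'M', 'F']
```
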
Data Warehouse design: dimensional analysis
- Dimensional analysis is used to identify the requirements of the warehouse
- Which aspects of the data are strategically important? E.g.
    - Member
    - Product (wine)
    - Time (always)
- We don't know in advance exactly what the queries will be!

3 dimensions example
[Diagram: a three-dimensional data cube]
- MEMBER axis: Smith, Jones, Bloggs
- PRODUCT axis: Macon, Chablis, Merlot, Chardonnay
- TIME axis: Q3 2007, Q4 2007, Q1 2008

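One way to picture the cube in code is a mapping from (member, product, time) coordinates to a measure. The quantities below are invented for illustration, and only non-empty cells are stored, since most member/product/quarter combinations will have no sales.

```python
from collections import defaultdict

# Each cell is (member, product, quarter) -> quantity sold.
# The figures are invented purely to illustrate the structure.
cube = {
    ("Smith", "Chablis", "Q3 2007"): 6,
    ("Smith", "Merlot", "Q4 2007"): 12,
    ("Jones", "Chablis", "Q4 2007"): 3,
    ("Jones", "Macon", "Q1 2008"): 6,
    ("Bloggs", "Chablis", "Q1 2008"): 24,
}


def slice_by_product(cube, product):
    """Fix the PRODUCT dimension: return a 2-D member x time view."""
    return {(m, t): q for (m, p, t), q in cube.items() if p == product}


def total_by(cube, axis):
    """Roll up onto one dimension (0 = member, 1 = product, 2 = time)."""
    totals = defaultdict(int)
    for key, qty in cube.items():
        totals[key[axis]] += qty
    return dict(totals)


print(slice_by_product(cube, "Chablis"))
# {('Smith', 'Q3 2007'): 6, ('Jones', 'Q4 2007'): 3, ('Bloggs', 'Q1 2008'): 24}
print(total_by(cube, 1))
# {'Chablis': 33, 'Merlot': 12, 'Macon': 6}
```

Slicing fixes one dimension; rolling up sums over the others. A warehouse performs the same operations with GROUP BY over the star schema rather than in application code.
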
Star Schema
- A star schema is one of the simplest designs for a data warehouse:
    - A central fact table, containing the main information, is the centre of the star
    - Smaller dimension tables, containing look-up information for attributes in the fact table, are at the points
[Diagram: the SALES central fact table, with the Wine, Member and Time dimension tables around it]

Star Schema Design for DB
- SALES (central fact table): winecode, membercode, timecode, quantity, cost
- Wine: winecode, winename, vintage, description, price
- Member: membercode, membername, memberAddress
- Time: timecode, date, periodno, quarterno, year

Warehouse Database
- The centre of the star schema becomes a relation, the fact table, holding numeric facts and foreign keys:
    - Sales(membercode, winecode, timecode, qty, itemcost)
- Each dimension becomes a relation, a dimension table:
    - Member(membercode, membername, memberaddress)
    - Wine(winecode, winename, vintage, description, price)
- There is ALWAYS a time dimension table
    - It includes period and quarter details, since they are frequently used in queries:
    - Time(timecode, date, periodno, quarterno, year)

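The four relations above can be written as DDL. This sketch uses SQLite through Python's `sqlite3` module; the column types, NOT NULL constraints and REFERENCES clauses are assumptions, since the slides give only attribute names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one row per member, wine and time period.
CREATE TABLE member (
    membercode    TEXT PRIMARY KEY,
    membername    TEXT NOT NULL,
    memberaddress TEXT
);

CREATE TABLE wine (
    winecode    TEXT PRIMARY KEY,
    winename    TEXT NOT NULL,
    vintage     INTEGER,
    description TEXT,
    price       REAL
);

CREATE TABLE time (
    timecode  INTEGER PRIMARY KEY,
    date      TEXT NOT NULL,
    periodno  INTEGER NOT NULL,
    quarterno INTEGER NOT NULL,
    year      INTEGER NOT NULL
);

-- Fact table: foreign keys to every dimension plus the numeric facts.
CREATE TABLE sales (
    membercode TEXT    NOT NULL REFERENCES member(membercode),
    winecode   TEXT    NOT NULL REFERENCES wine(winecode),
    timecode   INTEGER NOT NULL REFERENCES time(timecode),
    qty        INTEGER NOT NULL,
    itemcost   REAL    NOT NULL
);
""")

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['member', 'sales', 'time', 'wine']
```

The fact table has no primary key in this sketch: at the detail level the same member can buy the same wine twice in one time period. Note also that SQLite only enforces REFERENCES constraints after `PRAGMA foreign_keys = ON`.
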
Using the Warehouse
- The strategic questions can now be investigated using data extracted by SQL queries
- For example, to discover which wines have increasing and decreasing sales, we can retrieve a table giving the total sales of each wine against time:

        SELECT w.winename, t.periodno, SUM(s.qty)
        FROM sales s, wine w, time t
        WHERE s.winecode = w.winecode
          AND s.timecode = t.timecode
        GROUP BY w.winename, t.periodno
        ORDER BY w.winename, t.periodno

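The trend query can be checked against a handful of invented rows in SQLite. Minimal versions of the tables are recreated so the sketch runs on its own, and it uses explicit `JOIN ... ON` syntax, which is equivalent to listing the tables with commas and joining in the WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE wine  (winecode TEXT PRIMARY KEY, winename TEXT);
CREATE TABLE time  (timecode INTEGER PRIMARY KEY, periodno INTEGER);
CREATE TABLE sales (membercode TEXT, winecode TEXT, timecode INTEGER,
                    qty INTEGER, itemcost REAL);
""")

# Invented sample rows, only to exercise the query.
conn.executemany("INSERT INTO wine VALUES (?, ?)",
                 [("W1", "Chablis"), ("W2", "Merlot")])
conn.executemany("INSERT INTO time VALUES (?, ?)", [(1, 1), (2, 2)])
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", [
    ("M1", "W1", 1, 6, 54.0),
    ("M2", "W1", 1, 6, 54.0),
    ("M1", "W1", 2, 3, 27.0),
    ("M2", "W2", 1, 4, 32.0),
    ("M3", "W2", 2, 10, 80.0),
])

TREND_QUERY = """
SELECT w.winename, t.periodno, SUM(s.qty) AS total_qty
FROM sales s
JOIN wine w ON s.winecode = w.winecode
JOIN time t ON s.timecode = t.timecode
GROUP BY w.winename, t.periodno
ORDER BY w.winename, t.periodno
"""

rows = conn.execute(TREND_QUERY).fetchall()
for row in rows:
    print(row)
# ('Chablis', 1, 12)
# ('Chablis', 2, 3)
# ('Merlot', 1, 4)
# ('Merlot', 2, 10)
```

Reading the output, Chablis sales fall from period 1 to period 2 while Merlot sales rise: exactly the kind of trend the query is meant to expose.
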
Indexes
- Usually many indexes are created, to make queries more efficient
    - An index speeds up retrieval
    - A column that is frequently referred to in WHERE clauses is a potential candidate for indexing
- The two kinds of index most commonly used in data warehousing are shown on the next slides:
    - Indexes based on the B-tree
    - Bitmap indexes, which are also widely used

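Creating an index and checking that the optimiser uses it can be tried in SQLite, whose ordinary indexes are B-trees. SQLite has no bitmap indexes, so this sketch only covers the B-tree case. The index name is hypothetical, and the exact wording of `EXPLAIN QUERY PLAN` output varies between SQLite versions, so only the index name should be relied on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (membercode TEXT, winecode TEXT, "
             "timecode INTEGER, qty INTEGER, itemcost REAL)")

# winecode appears in WHERE clauses and joins, so it is a candidate for
# indexing. SQLite stores this index as a B-tree.
conn.execute("CREATE INDEX idx_sales_winecode ON sales (winecode)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT SUM(qty) FROM sales WHERE winecode = ?",
    ("W1",),
).fetchall()

# The last column of each plan row is a human-readable description,
# which should mention idx_sales_winecode.
details = [row[-1] for row in plan]
print(details)
```

Dropping the index and rerunning the plan should show a full scan of `sales` instead.
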
B-tree index
[Diagram: B-tree index]

Bitmap indexes
- Bitmap indexes (an example is on the next slide):
    - For each value of a domain, there is a bitmap identifying the row IDs of the matching tuples
        - 1 if a match, 0 otherwise
    - Usually applied to attributes with a small domain (few distinct values)
        - In Oracle, fewer than 100 distinct values
        - E.g. one bitmap for all tuples with sex = male and one for sex = female
    - Updating a bitmap takes a lot of time, so use them for tables with hardly any updates, inserts or deletes
    - Ideal for data warehousing

Bitmap indexes example
- The first table is a table about Sailors
- The second table shows a bitmap index on the rating attribute, assuming ratings take only the values 1 to 3
- There is a row in the bitmap index for each row in the Sailors table
- The column headings in the index are the values in the rating column

    SAILORS                     Bitmap index (rating)
    Id   Rating   etc           1   2   3
    22   1        Other data    1   0   0
    23   2        Other data    0   1   0
    31   3        Other data    0   0   1
    35   1        Other data    1   0   0

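The bitmap index on the slide can be modelled with one Python integer per rating value, bit i standing for row i of the Sailors table. This is a teaching sketch, not how Oracle stores bitmaps (real implementations typically compress them); what it shows is that predicates combine with cheap bitwise operations.

```python
from collections import defaultdict

# Sailors table from the slide: (id, rating). Other columns omitted.
sailors = [(22, 1), (23, 2), (31, 3), (35, 1)]


def build_bitmap_index(rows, column):
    """One integer bitmap per distinct value; bit i is set when row i matches."""
    bitmaps = defaultdict(int)
    for position, row in enumerate(rows):
        bitmaps[row[column]] |= 1 << position
    return dict(bitmaps)


def matching_ids(rows, bitmap):
    """Translate set bits back into row Ids."""
    return [row[0] for position, row in enumerate(rows) if bitmap >> position & 1]


rating_index = build_bitmap_index(sailors, column=1)

# Show each bitmap in row order, i.e. one column of the slide's index.
for value in sorted(rating_index):
    bits = "".join(str(rating_index[value] >> i & 1) for i in range(len(sailors)))
    print(value, bits)
# 1 1001
# 2 0100
# 3 0010

# rating = 1 OR rating = 3 is a bitwise OR of two bitmaps.
print(matching_ids(sailors, rating_index[1] | rating_index[3]))  # [22, 31, 35]
```

A condition such as rating = 1 AND sex = 'male' would be answered the same way, with `&` on bitmaps taken from two different indexes.
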
Materialised views and Aggregation
- Data warehouses grow continuously, and may become very large
- Problems: the time to compute a query and the size of its result can both be very large
- Solution: materialised views and aggregation
- A materialised view is a stored, pre-computed table, used to avoid repeatedly performing time-consuming joins and calculations

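Some DBMSs, such as Oracle and PostgreSQL, provide `CREATE MATERIALIZED VIEW` directly. SQLite does not, so the sketch below emulates one with `CREATE TABLE ... AS SELECT` and a refresh function that rebuilds it; the `mv_wine_quarter_sales` name and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time  (timecode INTEGER PRIMARY KEY, quarterno INTEGER, year INTEGER);
CREATE TABLE sales (winecode TEXT, timecode INTEGER, qty INTEGER, itemcost REAL);
INSERT INTO time  VALUES (1, 1, 2008), (2, 1, 2008), (3, 2, 2008);
INSERT INTO sales VALUES ('W1', 1, 6, 54.0), ('W1', 2, 6, 54.0),
                         ('W1', 3, 3, 27.0), ('W2', 3, 4, 32.0);
""")


def refresh_wine_quarter_sales(conn):
    """(Re)build a stored, pre-computed table: a hand-rolled materialised view."""
    conn.executescript("""
    DROP TABLE IF EXISTS mv_wine_quarter_sales;
    CREATE TABLE mv_wine_quarter_sales AS
        SELECT s.winecode, t.year, t.quarterno,
               SUM(s.qty) AS total_qty, SUM(s.itemcost) AS total_cost
        FROM sales s JOIN time t ON s.timecode = t.timecode
        GROUP BY s.winecode, t.year, t.quarterno;
    """)


refresh_wine_quarter_sales(conn)

# Later queries read the small stored table instead of re-joining the facts.
rows = conn.execute(
    "SELECT winecode, quarterno, total_qty FROM mv_wine_quarter_sales "
    "ORDER BY winecode, quarterno").fetchall()
print(rows)  # [('W1', 1, 12), ('W1', 2, 3), ('W2', 2, 4)]
```

Rebuilding from scratch is the simplest refresh strategy. Because warehouse records are only ever added, an incremental refresh that processes just the new periods would be a possible alternative.
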
Aggregates
- Basic idea: sacrifice detail to reduce the size of the data
- Store precomputed tables at a useful level of detail, consisting of commonly used sums, counts etc.
- Aggregates must be carefully selected, e.g.
    - Sales of each wine to each member, summed for each quarter
    - Sales of each wine, summed for each quarter or for each month
- Levels of aggregation:
    - None (i.e. detail)
    - Light (e.g. monthly)
    - High (e.g. quarterly)

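The levels of aggregation can be illustrated in plain Python: a light (monthly) aggregate is built from the detail rows, and a high (quarterly) aggregate is built from the monthly one. The rows are invented; the point is that each level is smaller than the one below it.

```python
from collections import defaultdict

# Detail level: (winecode, "YYYY-MM-DD", qty). Invented sample rows.
detail = [
    ("W1", "2008-01-05", 6),
    ("W1", "2008-01-20", 6),
    ("W1", "2008-02-11", 3),
    ("W1", "2008-04-02", 12),
    ("W2", "2008-03-30", 4),
]


def monthly(rows):
    """Light aggregation: total qty per wine per (year, month)."""
    totals = defaultdict(int)
    for wine, date, qty in rows:
        year, month = int(date[:4]), int(date[5:7])
        totals[(wine, year, month)] += qty
    return dict(totals)


def quarterly(month_totals):
    """High aggregation, derived from the monthly aggregate, not the detail."""
    totals = defaultdict(int)
    for (wine, year, month), qty in month_totals.items():
        quarter = (month - 1) // 3 + 1
        totals[(wine, year, quarter)] += qty
    return dict(totals)


light = monthly(detail)
high = quarterly(light)
print(len(detail), len(light), len(high))  # 5 4 3
print(high)  # {('W1', 2008, 1): 15, ('W1', 2008, 2): 12, ('W2', 2008, 1): 4}
```

Deriving quarters from months works because SUM is additive. An average cannot be rolled up the same way unless the aggregate also stores counts.
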
Aggregate navigator
- An aggregate navigator uses information about the available aggregates to rewrite queries automatically so that they use them
- It also records aggregate usage, so that unused aggregates can be removed
- It can suggest useful new aggregates
    - E.g. if a frequent query is based on the number of wines sold per month in a range of price bands, this is suggested as a new aggregate

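A toy aggregate navigator can be written in a few lines. Everything here is hypothetical: the catalogue of aggregate tables, their row counts, and the flattened `sales_detail` table used as the fallback. The rule it applies is that an aggregate storing a SUM can answer any SUM query whose GROUP BY columns are a subset of its own; among those it picks the smallest table, and it counts how often each table is chosen.

```python
from collections import Counter

# Hypothetical catalogue: table name -> (grouping columns, approximate row count).
# Each aggregate stores SUM(qty) as total_qty; the detail table stores qty.
AGGREGATES = {
    "agg_wine_month":   ({"winecode", "year", "quarterno", "monthno"}, 48_000),
    "agg_wine_quarter": ({"winecode", "year", "quarterno"}, 16_000),
    "agg_member_year":  ({"membercode", "year"}, 300_000),
}
BASE_TABLE = "sales_detail"  # hypothetical flattened detail table
usage = Counter()


def choose_table(group_by):
    """Smallest aggregate whose grouping columns cover the query's; else the base table."""
    candidates = [(rows, name) for name, (cols, rows) in AGGREGATES.items()
                  if set(group_by) <= cols]
    return min(candidates)[1] if candidates else BASE_TABLE


def rewrite(group_by):
    """Rewrite a SUM(qty) query to read from the best available table."""
    table = choose_table(group_by)
    usage[table] += 1
    measure = "qty" if table == BASE_TABLE else "total_qty"
    cols = ", ".join(group_by)
    return f"SELECT {cols}, SUM({measure}) FROM {table} GROUP BY {cols}"


print(rewrite(["winecode", "quarterno"]))
# SELECT winecode, quarterno, SUM(total_qty) FROM agg_wine_quarter GROUP BY winecode, quarterno
print(rewrite(["winecode", "monthno"]))
# SELECT winecode, monthno, SUM(total_qty) FROM agg_wine_month GROUP BY winecode, monthno
print(rewrite(["membercode", "winecode"]))
# SELECT membercode, winecode, SUM(qty) FROM sales_detail GROUP BY membercode, winecode

# Aggregates that are never chosen are candidates for removal.
unused = [name for name in AGGREGATES if usage[name] == 0]
print(unused)  # ['agg_member_year']
```

Counting the queries that fall through to the detail table, keyed by their GROUP BY columns, would be one simple basis for suggesting new aggregates.
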
Presentation requirements
- Must be easy to use
- Visualise the results of queries in many ways, e.g. charts, graphs, scatter diagrams etc.
- Make good use of colour and of dimensions: 2D, 2.5D, 3D, animation
    [Figure: example of a 2.5D graph]
- Have analysis tools: statistical and curve fitting
- For example, the product sales trend table would be plotted as a graph

OLAP
- Online Analytical Processing (OLAP) uses multidimensional analysis of the data
- It allows users to get summaries and find answers to known questions, e.g.
    - What is the average profit month by month?
    - If we increased sales by 10%, what would the effect be?

Data mining
- Data mining is the extraction of hidden predictive information from large databases
    - E.g. what is likely to happen to sales next March, and why?
- The actual techniques for data mining are not covered in this course
- Data mining is usually based on the data in a data warehouse, and ideally data mining tools are integrated with the data warehouse
- Data mining provides the enterprise with intelligence, and data warehousing provides the enterprise with a memory

Summary
- A data warehouse is a specialised database that enables efficient and straightforward production of reports to support strategic decision making.
- It contains a copy of the operational data, often integrated from more than one source. Records, once added, are not altered. The central fact table in a star schema design will be very large.

Discussion/Exercise
- A company sells garden trees from several stores located around the country. People visit a store and buy trees. The names of the customers are always recorded, and many customers place repeat orders.
- The company would like to set up a data warehouse so that it can analyse details such as:
    - Frequency of sales per customer
    - Which store has the best sales, ranked by month
    - Top-selling tree by month
    - etc.
- Create a suitable star schema, inventing appropriate attributes
