I'm planning to give you a detailed introduction to the concepts of the data warehouse world.
We will also see why data mining and data warehouses are closely connected to each other.
1. STUDIEREN UND DURCHSTARTEN. Author I: Dip.-Inf. (FH) Johannes Hoppe Author II: M.Sc. Johannes Hofmeister Author III: Prof. Dr. Dieter Homeister Date: 18.03.2011
2. Data Warehouse Author I: Dip.-Inf. (FH) Johannes Hoppe Author II: M.Sc. Johannes Hofmeister Author III: Prof. Dr. Dieter Homeister Date: 18.03.2011
4. Definition DW “A data warehouse is a single source for key, corporate information needed to enable business decisions .” Dieter Homeister (his DM Script) 4
5. Data Warehouse OLTP and DSS Defined An application that updates is called an on-line transaction processing (OLTP) application An application that issues queries to the readonly database is called a decision support system (DSS) 5
6. Data Warehouse Stovepipe vs. Integration When systems stand by themselves they are often referred to as stovepipes Systems that easily share data are called well integrated systems 6
7. Data Warehouse Problems with Stovepipe Architecture (1/2) Problems Users who wish to access data must query several different DSS to find it Data may have fundamental conflicts between DSS a department code table in one DSS may differ in another DSS a measurement may be stored in meters in one DSS and yards in another 7
8. Data Warehouse Problems with Stovepipe Architecture (2/2) Solution: Use a data warehouse, where data is integrated from the several different stovepipe systems Data warehouse is really sharing-lite -- you don’t have toco-ordinate as much when applications are built and you still reap the benefits of data sharing 8
9. Data Warehouse Data Warehouse Solution A data warehouse is an attempt to integrate separate DSS so that users can query one place to find the answers to their questions A data warehouse has the key, corporate data in the organization A data warehouse tracks historical data 9
11. Data Warehouse Selling the Data Warehouse(1/2) A data warehouse project will fail without corporate sponsorship Preferably, the project should be sponsored by the CEO The CEO must be sold on the value to the business to improve competitive advantage by deploying a data warehouse 11
12. Data Warehouse Selling the Data Warehouse (2/2) If an active, corporate sponsor does not exist, data sources will be very difficult to identify Only add data to the warehouse that will answer key, corporate questions asked by the corporate sponsor. Otherwise, you will have a data dump 12
13. Data Warehouse Building a Useful Data Warehouse You really need: strong executive sponsorship good knowledge of the data sound software engineering stability from source systems users who want a success A 75 percent failure rate is often cited 13
14. Data Warehouse Enterprise Information System An EIS (Enterprise Information System) allows users to query data in a data warehouse Now users can access key, corporate data in the data warehouse 14
15. Data Warehouse Users of an Enterprise Information System (1/3) multiple EIS (or different graphical interfaces) are needed to satisfy different types of users General users want a tool that provides detailed data, but is easy to use Want access to the data warehouse to do routine tasks such as Find me Joes phone number, etc. Simple application, not focused on large reports 15
16. Data Warehouse Users of an Enterprise Information System (2/3) Executives Want a high-level, summary data (and a simple tool) Must be easy to use, users want to click a few buttons and get data they want Results must be graphs Users should be able to drill-down into key areas. 16
17. Data Warehouse Users of an Enterprise Information System (3/3) Analysts want a flexible, more detailed tool Often very knowledgeable about the data Willing to do more work to learn about the data Sometimes even learn SQL to issue their own ad-hoc queries 17
18. Data Warehouse Need for Data Warehouses Data warehouses provide a single place to store key corporate data users can go one place to find this key data using an enterprise information system (EIS) also a place to store and access historical data Users measure performance goals for their company over a period of time Company statistics are available Data not stored in the same place is difficult to locate and compare, easily lost Single query can be used to access key data 18
19. Data Warehouse Security & Data Warehouses Building a data warehouse does increase security risk because key, corporate information are all in one place Risk reduction: database system components can be used to protect the data warehouse. These include Views Access control Security Administration Encryption Audit (logging of all accesses) 19
20. Data Warehouse Moving Data into the Data Warehouse Moving data from source OLTP systems to the data warehouse is one of the hardest tasks in data warehousing Updates to the data warehouse are performed periodically weekly , nightly, monthly … Occasionally, real-time data is needed in a data warehouse, but this is not very common see the document about ETL, too! 20
21. Data Warehouse Data Mart A data mart is a subset of the data warehouse that may make it simpler for users to access key corporate data Sometimes, users only need a piece of data from the data warehouse The data mart is typically fed from the data warehouse 21
22. References Data Warehouse Books and References Ralph Kimball, MargyRossl: The Data Warehouse Toolkit, 2nd Ed., John Wiley & Sons 2002 (Lists ofpitfalls, verydetailedforseveralapplicationslike CRM, HR, Insurances)W. H. Inmon: Buildingthe Data Warehouse, 3rd Ed., John Wiley & Sons 2002 (DW design, migration, techicaldetails)Claudia Imhoff, Nicholas Galemmo, Jonathan G. Geiger: Mastering Data Warehouse Design, John Wiley & Sons 2003 (Technical andbusinessview, design, optimization)Donald K. Burleson, W. H. Inmon, Joseph Hudicka: The Data Warehouse eBusiness DBA Handbook, BMC Software andDBAzine/RampantTechpress 2003 (AvailableaseBook, technicaldetails, eBusiness, focus on Oracle, DB/DW administration, tools) 22
23. References Data Warehouse Books and References Maria Sueli Almeida, MissaoIshikawa, Joerg Reinschmidt, Torsten Roeber: GettingStartedwithDataWarehouseand Business Intelligence, www.redbooks.ibm.com, 1999 (eBookfrom IBM, focus on DB2, verytechnical)Mark W. Humphries, Michael W. Hawkins, Michelle C. Dy: Data Warehousing, Pearson Education, 1998 (Verytechnical, incl. projectmgmt., architecture, hardwareand parallel computing)Chris Todman: Designing a Data Warehouse, Prentice Hall 2000 (Introduction, not verydetailed) 23
25. References Data Warehouse Books and References David Grossman, Ophir Frieder: Introductionto Data Warehouse, Illinois Institute of Technology 2005Dr. Andreas Geppert, Credit Suisse: Data Warehousing - Data-Warehouse-Entwurf, 2006, http://arvo.ifi.unizh.ch/dbtg/Classes/DWH/Slides/dwh-04-sl.pdf (p31: Explainationofstar/snowflake/galaxyscheme, in German)Carmela R. Balassiano: Data Warehouse Design Feb. 2007, http://academic.brooklyn.cuny.edu/cis/cbalassiano/CIS717-2%20course%20documents/week2/Data%20Warehouse%20primer.ppt (p12, p18: Explainationofstar/snowflake/galaxyscheme, in English) 25