SlideShare ist ein Scribd-Unternehmen logo
1 von 46
2/12/2013   2.Data warehouse/D.S.Jagli   1
Topics to be covered
 1.    Getting data into the Data warehouse.
 2.    Extraction
 3.    Transformation
 4.    Cleansing
 5.    Loading
 6.    Summarization
 7.    Meta data
 8.    Data Marts
 9.    Data warehousing & ERP
 10.   Data warehousing & KM
 11.   Data warehousing & CRM

2/12/2013           2.Data warehouse/D.S.Jagli   2
A Sales Manager wants to know….
                                       Which are our
                                   lowest/highest margin
                                        customers ?
                                                             Who are my customers
      What is the most                                        and what products
    effective distribution                                     are they buying?
           channel?



  What product prom-                                               Which customers
-otions have the biggest                                         are most likely to go
   impact on revenue?                                            to the competition ?
                                       What impact will
                                     new products/services
                                       have on revenue
                                         and margins?
  2/12/2013                  2.Data warehouse/D.S.Jagli                                  3
Data, Data everywhere
yet ...          I can’t find the data I need
                                   data is scattered over the network.
                                   many versions, slight differences.

                             I can’t get the data I need
                                   need an expert to get the data.



                             I can’t understand the data I found
                                   available data poorly documented.


                             I can’t use the data I found
                                   results are unexpected.
                                   data needs to be transformed from one
                                     format to other.
2/12/2013      2.Data warehouse/D.S.Jagli                                   4
What is a Data Warehousing?

                                  A process of transforming data
                                  into information and making it
                                  available to users in a timely
                                  enough manner to make a
                                  difference

                                  [Forrester Research, April 1996]




2/12/2013   2.Data warehouse/D.S.Jagli                               5
What is a Data warehouse
   A data warehouse is a repository (collection of resources that
    can be accessed to retrieve information) of an organization's
    electronically stored data, designed to facilitate reporting and
    analysis.
   In simple form data warehouse is a collection of large amount
    of data.
  A single, complete and consistent store of data obtained from a
    variety of different sources made available to end users in
    what they can understand and use in a business context.
    [Barry Devlin]


2/12/2013         2.Data warehouse/D.S.Jagli                           6
What are the users saying...
  Data should be integrated across the enterprise
  Summary data has a real value to the
   organization
  Historical data holds the key to understanding
   data over time




2/12/2013            2.Data warehouse/D.S.Jagli      7
Data Warehouse

      A data warehouse is a
      1.    Subject-oriented
      2.    Integrated
      3.    Time-varying
      4.    Non-volatile
       collection of data that is used primarily in organizational
       decision making.
                   -- Bill Inmon, Building the Data Warehouse


2/12/2013                2.Data warehouse/D.S.Jagli                  8
Subject Oriented: Data is organized by business topic not
    by customer_id

              Data is Integrated and Loaded by Subject


             Cust             1996

                              1996
             Prod                             D/W
                                              Data
                              1997

              Order
                              1998


              payt



2/12/2013        2.Data warehouse/D.S.Jagli                  9
 Integrate: Integrated means that data are stored as a single
    unit not as a collection of files that may have different structures
   .
  Eg:
                           Operatio nal S ystems

              Order Pro cessing           Order ID = 10
                                                              D /W
              A cco unts Receivable Order ID = 12
                                                          Order ID = 16
              Pro duct Management Order ID = 8




              HR S y stem                 S ex = M/F
                                                             D /W
              Pay ro ll                   S ex = 1/2
                                                          S ex = M/F
              Pro duct Management S ex = 0/1




2/12/2013                 2.Data warehouse/D.S.Jagli                      10
 Time Variant: it means that a time dimension is explicitly
   included in the data so that trends and changes over time can
   be studied .

  Data elements won’ t change i.e a single ,complete and
    consistent store of data obtained from a variety of sources and
    made available to end users in a way they can understand and
    use in business context.

  Eg: Designated Time Frame (3 - 10 Years)


  Key Includes Date
2/12/2013         2.Data warehouse/D.S.Jagli                      11
Non-Volatile: It means that the data don’t keep changing, new
    data may be added on a scheduled basic but old data aren’t
    discarded

                    Data Warehouse

            • No Data Update



             Load                                     Read

                                                       Read
                                                       Read

                                                     Read




2/12/2013               2.Data warehouse/D.S.Jagli               12
     These definitions are quite different on the surface .yet
       several common threads emerge from reading them as
       group:
 1. A DWH containing a lot of data some of the data comes
       from operational source and outside.

 2. A DWH is organized so as to facilitate the use of this data for
       decision making purpose.

 3. The DWH provides tools by which end users can access this
       data



2/12/2013          2.Data warehouse/D.S.Jagli                     13
ODS Vs DW
               ODS                                    DATAWARE HOUSE
1. Stores current operational data              1. Stores historic and current data
2. Perform numerous quick and 2. Perform complex queries on
     simple queries       on       small            large amounts of data
     amounts of data
3. Contains      small amount           of 3. contains large amounts of static
     transactional data                       data
4. Day To Day Decisions,                        4. Long-Term Decisions,
5. Current operational results,                 5. Strategic reporting,
6. Tactical reporting,                          6. Trend detection
7. Near to Normal form                          7. Star Schema
8. Frequency      of Load:        Twice 8. Frequency of Load: Weekly,
     Daily , Daily, Weekly                 Monthly, Quarterly
                                                                                    14
2/12/2013          2.Data warehouse/D.S.Jagli                                     14
OLTP vs. Data Warehouse

  OLTP systems are tuned for known transactions and workloads
    while workload is not known a priori in a data warehouse.

  Special data organization, access methods and implementation
    methods are needed to support data warehouse queries.

       e.g. average amount spent on phone calls between 9AM-5PM in Pune
            during the month of December.




2/12/2013                2.Data warehouse/D.S.Jagli                    15
OLTP vs. Data Warehouse

  OLTP                                    Warehouse (DSS)
     Application Oriented                   Subject Oriented
     Used to run business                   Used to analyze business
     Detailed data                          Summarized and refined
     Current up to date                     Snapshot data
     Isolated Data                          Integrated Data
     Repetitive access                      Ad-hoc access
     Clerical User                          Knowledge User (Manager)




2/12/2013          2.Data warehouse/D.S.Jagli                            16
Who uses DWHs?
      Ideal DWH user is something like this:
 1.   This person’s job involves drawing conclusions from ,making decisions
      based on, large masses of data
 2. This person also doesn’t want to access a database in a highly technical
      fashion
 3. Any person who wants to analyze data and take a decision
 Eg: Tourists: Browse information harvested by farmers



               Farmers: Harvest information from known access paths




2/12/2013             2.Data warehouse/D.S.Jagli                               17
Data Warehouse Architecture
   Source Data
                                                                                 Data mining
                                         Data Storage
                 Data Staging
   Relational                                                     Information
   Databases                                                      Delivery

                 1.   EXTRACT           Optimized Loader
                 2.   TRANSFORM
                 3.   CLEAN
      ERP        4.   LOAD
                 5.   REFRESH                                                   Report/User Query
     Systems
                                       Data Warehouse
                                           Engine
                                                                  Analyze
   Purchased                                                       Query
      Data                                                                             OLAP




                                Meta Data
     Legacy
      Data                                  Metadata Repository

2/12/2013                 2.Data warehouse/D.S.Jagli                                           18
Overview of the Components
 1.     Source Data Component
 2.     Data Staging Component
 3.     Data Storage Component
 4.     Information Delivery Component
 5.     Metadata Component
 6.     Management and Control Component




2/12/2013          2.Data warehouse/D.S.Jagli   19
Overview of the Components
  1.        Source Data Component:
              1.   Production Data : data comes from operational systems of the
                   Enterprise.

              2.   Internal Data: in every organization keeps some private data.

              3.   Archived Data: it comes from back ups.




2/12/2013               2.Data warehouse/D.S.Jagli                                 20
Topics to be covered
 1.    Getting data into the Data warehouse.
 2.    Extraction
 3.    Transformation
 4.    Cleansing
 5.    Loading
 6.    Summarization
 7.    Meta data
 8.    Data Marts
 9.    Data warehousing & ERP
 10.   Data warehousing & KM
 11.   Data warehousing & CRM

2/12/2013           2.Data warehouse/D.S.Jagli   21
Overview of the Components: Getting data into
the DWH
 2.Data Staging Component
  5 steps of data staging
 1. Extraction
 2. Transformation
 3. Cleansing
 4. Loading
 5. Summarization



2/12/2013     2.Data warehouse/D.S.Jagli        22
Overview of the Components: Getting data
into the DWH
 Data staging component
 1. Extracting
  Capture of relevant data from operational source in “as is” status
  Sources for data generally in legacy mainframes in, IMS, DB2; more data
      today in relational databases on Unix.
 2.     Transformation:
       In the case Multiple input sources to a data warehouse, inconsistency
        can sometimes make data unusable.
       Transformation is the process of dealing with these inconsistencies
                Eg: Use of different names/formats
                 Mumbai, Bombay
                Cust_id, C_id
                dd/mm/yy, mm/dd/yy


2/12/2013               2.Data warehouse/D.S.Jagli                          23
Overview of the Components: Getting data into
  the DWH
   Data staging component
   3. Cleansing
               It is necessary to go through data entered into DWH and make it
                error free, this process is called Data cleansing.

               This include missing data , incorrect data in one source, inconsistent
                data and conflicting data when two or more sources are involved.

               Clean data is vital for the success of the warehouse
               Eg:Seshadri, Sheshadri, Sesadri, Seshadri S., Srinivasan Seshadri,
                etc. are the same person


2/12/2013                 2.Data warehouse/D.S.Jagli                                 24
Overview of the Components: Getting data into the
DWH
 Data staging component
 4. Loading
      •     It implies physical movement of data from the computers storing the
            source databases to that which will store the data warehouse.

      •     Most common channel for the data movement process is a high-speed
            communication link.

      •     It is always necessary to close off access to the DWH when the
            loading is taking place.



2/12/2013              2.Data warehouse/D.S.Jagli                             25
Overview of the Components: Getting data into the
DWH
 Data staging component
 5. Summarization :
  In which any desired summaries of the data warehouse data are
     precalculated for later use
  Once the DWH database has been loaded it is possible to create
     summaries.
  Eg: base customer (1985-87)
               custid, from date, to date, name, phone, dob
               base customer (1988-90)
               custid, from date, to date, name, credit rating, employer
               customer activity (1986-89) -- monthly summary
               customer activity detail (1987-89)
               custid, activity date, amount, clerk id, order no
2/12/2013                 2.Data warehouse/D.S.Jagli                        26
Overview of the Components: Getting data into the DWH
   Data staging component
            •   Summarized data stored : advantages
                1. reduce storage costs
                2. reduce CPU usage
                3. increases performance since smaller number of records to be
                   processed
                4. design around traditional high level reporting needs




2/12/2013                2.Data warehouse/D.S.Jagli                              27
Overview of the Components
   Data storage Component
   The data storage for data warehouse is a separate repository
   Propagate updates on source data to the warehouse
    periodically (e.g., every night, every week) or after
       significant events.

    refresh policy set by administrator based on user needs and
            traffic.

    possibly different policies for different sources.


    This function is time consuming.
2/12/2013              2.Data warehouse/D.S.Jagli             28
Overview of the Components
 Information delivery component
 •      In order to provide information to the wide community
        of data warehouse users the information delivery
        component includes different methods.
 •      E g: Online ,intranet, internet, email.




2/12/2013          2.Data warehouse/D.S.Jagli               29
Overview of the Components
 Meta data component
 Meta data are data that describe data in the data warehouse
  Administrative metadata
      1.    source databases and their contents
      2.    gateway descriptions
      3.    warehouse schema, view & derived data definitions
      4.    dimensions, hierarchies
      5.    pre-defined queries and reports
      6.    data mart locations and contents
      7.    data partitions
      8.    data extraction, cleansing, transformation rules
      9.    data refresh
      10.   user profiles, user groups
      11.   security: user authorization, access control


2/12/2013               2.Data warehouse/D.S.Jagli              30
Meta data component
  Business meta data
       business terms and definitions
       ownership of data
  operational metadata
       data lineage: history of migrated data and sequence of
        transformations applied
       currency of data: active, archived,
       monitoring information: warehouse usage statistics, error
        reports, audit trails.
  End- user meta data

2/12/2013           2.Data warehouse/D.S.Jagli                      31
Overview of the Components
 Management and control component
           This component of the data warehouse sits on the top of all
            other components. it coordinates services and activities with
            within the DWH


           It interacts with meta data component to perform the
            management and control functions




2/12/2013              2.Data warehouse/D.S.Jagli                           32
Content of DWH Data
 Operational data to warehouse data
 4 Data levels:
       1. Operational data                       Query processing
       2. Atomic data
       3. Summary data                            Summary data
       4. Query processing
                                                   Atomic data

                                                  Operational data

 Eg1: My checking account balance right now is “ 1200/-Rs”
 Eg2: My checking account balance at the end of January was “25000/-Rs”
 Eg3: At the end of January 2011 we have 12500 customers
 Eg4: Our customers in Mumbai zone grew by 10% during last three months



2/12/2013           2.Data warehouse/D.S.Jagli                            33
What comes first
Data mart
  Data mart
  Data mart a subset of a data warehouse that supports the requirements of
    particular department or business function.

  Data mart focuses on only the requirements of users associated with one
    department or business function.

  Data marts do not normally contain detailed operational data, unlike data
    warehouses.

  As data marts contain less data compared with data warehouses, data marts
    are more easily understood and navigated.

2/12/2013             2.Data warehouse/D.S.Jagli                           35
From the Data Warehouse to Data Marts
Information
 Individually                              Less
 Structured


  Departmentally                          History
  Structured                              Normalized
                                          Detailed


      Organizationally                      More
      Structured         Data Warehouse


Data               36
Reasons for creating a data mart
 1.    To give users access to the data they need to analyze most often
 2.    To provide data in a form that matches the collective view of the data by a
       group of users in a department or business function
 3.    To improve end-user response time due to the reduction in the volume of
       data to be accessed
 4.    To provide appropriately structured data as dictated by the requirements of
       end-user access tools
 5.    data cleansing, loading, transformation, and integration are far easier, and
       hence implementing and setting up a data mart is simpler.
 6.    The cost of implementing data marts is normally less
 7.    The potential users of a data mart are more clearly defined and can be
       more easily targeted to obtain support for a data mart project

2/12/2013              2.Data warehouse/D.S.Jagli                                 37
Data Warehouse and Data Marts
               OLAP
               Data Mart
               Lightly summarized
               Departmentally structured



                 Organizationally structured
                 Atomic
                 Detailed Data Warehouse Data

          38
Characteristics of the Departmental Data Mart
                     1. OLAP
                     2. Small
                     3. Flexible
                     4. Customized by Department
                     5. Source is departmentally
                        structured data warehouse




                                                    39
Techniques for Creating Departmental Data
Mart
                              OLAP
   Sales   Finance   Mktg.    Subset
                              Summarized
                              Superset
                              Indexed
                              Arrayed




                                            40
Data Mart Centric
                    Data Sources



                    Data Marts




                    Data Warehouse




           41
Problems with Data Mart Centric Solution




If you end up creating multiple warehouses, integrating them is a
problem


                   42
True Warehouse
Data Sources




Data Warehouse




Data Marts


                 43
Bill Inman Vs Ralph Kimball
 Top -down Vs Bottom -up w.r.t Data Mart




2/12/2013     2.Data warehouse/D.S.Jagli    44
A Comparison DM vs DWH
 DATA MART                                         DATAWARE HOUSE


 Limited subject areas (one or two)                Many subject areas

 One data mart for one business                    Enterprise repository with multiple
 theme/ subject area                               data marts
 Has limited dimensions and           Has all dimensions and measures
 measures depends on the subject area required

 Time to build is short                            Long time activity

 Can be built as smaller scale DW                  Large scale



2/12/2013             2.Data warehouse/D.S.Jagli                                         45
Continued growth in




2/12/2013   2.Data warehouse/D.S.Jagli                         46

Weitere ähnliche Inhalte

Was ist angesagt?

Data resource management
Data resource managementData resource management
Data resource managementNirajan Silwal
 
OODM-object oriented data model
OODM-object oriented data modelOODM-object oriented data model
OODM-object oriented data modelAnilPokhrel7
 
Lecture 1&2(rdbms-ii)
Lecture 1&2(rdbms-ii)Lecture 1&2(rdbms-ii)
Lecture 1&2(rdbms-ii)Ravinder Kamboj
 
Integrity Constraints
Integrity ConstraintsIntegrity Constraints
Integrity Constraintsmadhav bansal
 
Client server security threats
Client server security threatsClient server security threats
Client server security threatsrahul kundu
 
PATIENT MANAGEMENT SYSTEM project
PATIENT MANAGEMENT SYSTEM projectPATIENT MANAGEMENT SYSTEM project
PATIENT MANAGEMENT SYSTEM projectLaud Randy Amofah
 
Database design & Normalization (1NF, 2NF, 3NF)
Database design & Normalization (1NF, 2NF, 3NF)Database design & Normalization (1NF, 2NF, 3NF)
Database design & Normalization (1NF, 2NF, 3NF)Jargalsaikhan Alyeksandr
 
MANAGEMENT INFORMATION SYSTEM AND PAYROLL MANAGEMENT
MANAGEMENT  INFORMATION  SYSTEM AND PAYROLL  MANAGEMENT MANAGEMENT  INFORMATION  SYSTEM AND PAYROLL  MANAGEMENT
MANAGEMENT INFORMATION SYSTEM AND PAYROLL MANAGEMENT RV Devi Prasad
 
Physical database design(database)
Physical database design(database)Physical database design(database)
Physical database design(database)welcometofacebook
 
Architecture of-dbms-and-data-independence
Architecture of-dbms-and-data-independenceArchitecture of-dbms-and-data-independence
Architecture of-dbms-and-data-independenceAnuj Modi
 
Data and Message Security
Data and Message SecurityData and Message Security
Data and Message SecurityNrapesh Shah
 
E-commerce- Security & Encryption
E-commerce- Security & EncryptionE-commerce- Security & Encryption
E-commerce- Security & EncryptionBiroja
 

Was ist angesagt? (20)

Data resource management
Data resource managementData resource management
Data resource management
 
Rdbms
RdbmsRdbms
Rdbms
 
OODM-object oriented data model
OODM-object oriented data modelOODM-object oriented data model
OODM-object oriented data model
 
Databases: Normalisation
Databases: NormalisationDatabases: Normalisation
Databases: Normalisation
 
Lecture 1&2(rdbms-ii)
Lecture 1&2(rdbms-ii)Lecture 1&2(rdbms-ii)
Lecture 1&2(rdbms-ii)
 
Data warehouse physical design
Data warehouse physical designData warehouse physical design
Data warehouse physical design
 
Integrity Constraints
Integrity ConstraintsIntegrity Constraints
Integrity Constraints
 
Client server security threats
Client server security threatsClient server security threats
Client server security threats
 
PATIENT MANAGEMENT SYSTEM project
PATIENT MANAGEMENT SYSTEM projectPATIENT MANAGEMENT SYSTEM project
PATIENT MANAGEMENT SYSTEM project
 
Edi ppt
Edi pptEdi ppt
Edi ppt
 
DATA WAREHOUSING AND DATA MINING
DATA WAREHOUSING AND DATA MININGDATA WAREHOUSING AND DATA MINING
DATA WAREHOUSING AND DATA MINING
 
Database design & Normalization (1NF, 2NF, 3NF)
Database design & Normalization (1NF, 2NF, 3NF)Database design & Normalization (1NF, 2NF, 3NF)
Database design & Normalization (1NF, 2NF, 3NF)
 
MANAGEMENT INFORMATION SYSTEM AND PAYROLL MANAGEMENT
MANAGEMENT  INFORMATION  SYSTEM AND PAYROLL  MANAGEMENT MANAGEMENT  INFORMATION  SYSTEM AND PAYROLL  MANAGEMENT
MANAGEMENT INFORMATION SYSTEM AND PAYROLL MANAGEMENT
 
Physical database design(database)
Physical database design(database)Physical database design(database)
Physical database design(database)
 
Architecture of-dbms-and-data-independence
Architecture of-dbms-and-data-independenceArchitecture of-dbms-and-data-independence
Architecture of-dbms-and-data-independence
 
Dbms models
Dbms modelsDbms models
Dbms models
 
Parallel Database
Parallel DatabaseParallel Database
Parallel Database
 
Data and Message Security
Data and Message SecurityData and Message Security
Data and Message Security
 
E-commerce- Security & Encryption
E-commerce- Security & EncryptionE-commerce- Security & Encryption
E-commerce- Security & Encryption
 
Database Management System
Database Management SystemDatabase Management System
Database Management System
 

Ähnlich wie Data ware house

Yoyopresentasi 1225941108853502-8 2
Yoyopresentasi 1225941108853502-8 2Yoyopresentasi 1225941108853502-8 2
Yoyopresentasi 1225941108853502-8 2Venkatesh Reddy
 
Lecture 02 - The Data Warehouse Environment
Lecture 02 - The Data Warehouse Environment Lecture 02 - The Data Warehouse Environment
Lecture 02 - The Data Warehouse Environment phanleson
 
W 1&2 introduction to omt-ii
W 1&2 introduction to omt-iiW 1&2 introduction to omt-ii
W 1&2 introduction to omt-iiMajid Orakzai
 
Big Data-Survey
Big Data-SurveyBig Data-Survey
Big Data-Surveyijeei-iaes
 
Chapter 13The Data WarehouseBLCN-534 Fundamentals o.docx
Chapter 13The Data WarehouseBLCN-534 Fundamentals o.docxChapter 13The Data WarehouseBLCN-534 Fundamentals o.docx
Chapter 13The Data WarehouseBLCN-534 Fundamentals o.docxketurahhazelhurst
 
7 data warehouse & marts
7 data warehouse & marts7 data warehouse & marts
7 data warehouse & martsNymphea Saraf
 
Introduction to Data Warehouse
Introduction to Data WarehouseIntroduction to Data Warehouse
Introduction to Data WarehouseSOMASUNDARAM T
 
Data warehouse
Data warehouseData warehouse
Data warehouseRajThakuri
 
BI LECTURE 3- 2023.pptx
BI LECTURE 3- 2023.pptxBI LECTURE 3- 2023.pptx
BI LECTURE 3- 2023.pptxAmanyaLaban
 
Modern Integrated Data Environment - Whitepaper | Qubole
Modern Integrated Data Environment - Whitepaper | QuboleModern Integrated Data Environment - Whitepaper | Qubole
Modern Integrated Data Environment - Whitepaper | QuboleVasu S
 
Big data analytics - Introduction to Big Data and Hadoop
Big data analytics - Introduction to Big Data and HadoopBig data analytics - Introduction to Big Data and Hadoop
Big data analytics - Introduction to Big Data and HadoopSamiraChandan
 

Ähnlich wie Data ware house (20)

Data warehouse
Data warehouseData warehouse
Data warehouse
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Yoyopresentasi 1225941108853502-8 2
Yoyopresentasi 1225941108853502-8 2Yoyopresentasi 1225941108853502-8 2
Yoyopresentasi 1225941108853502-8 2
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 
Data lake ppt
Data lake pptData lake ppt
Data lake ppt
 
A19 amis
A19 amisA19 amis
A19 amis
 
Lecture 02 - The Data Warehouse Environment
Lecture 02 - The Data Warehouse Environment Lecture 02 - The Data Warehouse Environment
Lecture 02 - The Data Warehouse Environment
 
W 1&2 introduction to omt-ii
W 1&2 introduction to omt-iiW 1&2 introduction to omt-ii
W 1&2 introduction to omt-ii
 
Big Data-Survey
Big Data-SurveyBig Data-Survey
Big Data-Survey
 
Chapter 13The Data WarehouseBLCN-534 Fundamentals o.docx
Chapter 13The Data WarehouseBLCN-534 Fundamentals o.docxChapter 13The Data WarehouseBLCN-534 Fundamentals o.docx
Chapter 13The Data WarehouseBLCN-534 Fundamentals o.docx
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 
7 data warehouse & marts
7 data warehouse & marts7 data warehouse & marts
7 data warehouse & marts
 
Introduction to Data Warehouse
Introduction to Data WarehouseIntroduction to Data Warehouse
Introduction to Data Warehouse
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
BI LECTURE 3- 2023.pptx
BI LECTURE 3- 2023.pptxBI LECTURE 3- 2023.pptx
BI LECTURE 3- 2023.pptx
 
Abstract
AbstractAbstract
Abstract
 
Modern Integrated Data Environment - Whitepaper | Qubole
Modern Integrated Data Environment - Whitepaper | QuboleModern Integrated Data Environment - Whitepaper | Qubole
Modern Integrated Data Environment - Whitepaper | Qubole
 
Big Data.pptx
Big Data.pptxBig Data.pptx
Big Data.pptx
 
Big data analytics - Introduction to Big Data and Hadoop
Big data analytics - Introduction to Big Data and HadoopBig data analytics - Introduction to Big Data and Hadoop
Big data analytics - Introduction to Big Data and Hadoop
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 

Mehr von VESIT/University of Mumbai

Mehr von VESIT/University of Mumbai (9)

4.dynamic analysis
4.dynamic analysis4.dynamic analysis
4.dynamic analysis
 
3.static testing
3.static testing 3.static testing
3.static testing
 
2.testing in the software life cycle
2.testing in the software life cycle 2.testing in the software life cycle
2.testing in the software life cycle
 
1.basics of software testing
1.basics of software testing 1.basics of software testing
1.basics of software testing
 
Handy annotations-within-oracle-10g
Handy annotations-within-oracle-10gHandy annotations-within-oracle-10g
Handy annotations-within-oracle-10g
 
Rational Unified Treatment for Web Application Vulnerability Assessment
Rational Unified Treatment for Web Application Vulnerability AssessmentRational Unified Treatment for Web Application Vulnerability Assessment
Rational Unified Treatment for Web Application Vulnerability Assessment
 
THE APPLICATION OF CAUSE EFFECT GRAPH FOR THE COLLEGE PLACEMENT PROCESS
THE APPLICATION OF CAUSE EFFECT GRAPH FOR THE COLLEGE PLACEMENT PROCESSTHE APPLICATION OF CAUSE EFFECT GRAPH FOR THE COLLEGE PLACEMENT PROCESS
THE APPLICATION OF CAUSE EFFECT GRAPH FOR THE COLLEGE PLACEMENT PROCESS
 
Working with temperaments
Working with temperamentsWorking with temperaments
Working with temperaments
 
planning & project management for DWH
planning & project management for DWHplanning & project management for DWH
planning & project management for DWH
 

Data ware house

  • 1. 2/12/2013 2.Data warehouse/D.S.Jagli 1
  • 2. Topics to be covered 1. Getting data into the Data warehouse. 2. Extraction 3. Transformation 4. Cleansing 5. Loading 6. Summarization 7. Meta data 8. Data Marts 9. Data warehousing & ERP 10. Data warehousing & KM 11. Data warehousing & CRM 2/12/2013 2.Data warehouse/D.S.Jagli 2
  • 3. A Sales Manager wants to know…. Which are our lowest/highest margin customers ? Who are my customers What is the most and what products effective distribution are they buying? channel? What product prom- Which customers -otions have the biggest are most likely to go impact on revenue? to the competition ? What impact will new products/services have on revenue and margins? 2/12/2013 2.Data warehouse/D.S.Jagli 3
  • 4. Data, Data everywhere yet ...  I can’t find the data I need  data is scattered over the network.  many versions, slight differences.  I can’t get the data I need  need an expert to get the data.  I can’t understand the data I found  available data poorly documented.  I can’t use the data I found  results are unexpected.  data needs to be transformed from one format to other. 2/12/2013 2.Data warehouse/D.S.Jagli 4
  • 5. What is a Data Warehousing? A process of transforming data into information and making it available to users in a timely enough manner to make a difference [Forrester Research, April 1996] 2/12/2013 2.Data warehouse/D.S.Jagli 5
  • 6. What is a Data warehouse  A data warehouse is a repository (collection of resources that can be accessed to retrieve information) of an organization's electronically stored data, designed to facilitate reporting and analysis.  In simple form data warehouse is a collection of large amount of data. A single, complete and consistent store of data obtained from a variety of different sources made available to end users in what they can understand and use in a business context. [Barry Devlin] 2/12/2013 2.Data warehouse/D.S.Jagli 6
  • 7. What are the users saying...  Data should be integrated across the enterprise  Summary data has a real value to the organization  Historical data holds the key to understanding data over time 2/12/2013 2.Data warehouse/D.S.Jagli 7
  • 8. Data Warehouse  A data warehouse is a 1. Subject-oriented 2. Integrated 3. Time-varying 4. Non-volatile collection of data that is used primarily in organizational decision making. -- Bill Inmon, Building the Data Warehouse 2/12/2013 2.Data warehouse/D.S.Jagli 8
  • 9. Subject Oriented: Data is organized by business topic not by customer_id Data is Integrated and Loaded by Subject Cust 1996 1996 Prod D/W Data 1997 Order 1998 payt 2/12/2013 2.Data warehouse/D.S.Jagli 9
  • 10.  Integrate: Integrated means that data are stored as a single unit not as a collection of files that may have different structures .  Eg: Operatio nal S ystems Order Pro cessing Order ID = 10 D /W A cco unts Receivable Order ID = 12 Order ID = 16 Pro duct Management Order ID = 8 HR S y stem S ex = M/F D /W Pay ro ll S ex = 1/2 S ex = M/F Pro duct Management S ex = 0/1 2/12/2013 2.Data warehouse/D.S.Jagli 10
  • 11.  Time Variant: it means that a time dimension is explicitly included in the data so that trends and changes over time can be studied .  Data elements won’ t change i.e a single ,complete and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use in business context.  Eg: Designated Time Frame (3 - 10 Years)  Key Includes Date 2/12/2013 2.Data warehouse/D.S.Jagli 11
  • 12. Non-Volatile: It means that the data don’t keep changing, new data may be added on a scheduled basic but old data aren’t discarded Data Warehouse • No Data Update Load Read Read Read Read 2/12/2013 2.Data warehouse/D.S.Jagli 12
  • 13. These definitions are quite different on the surface .yet several common threads emerge from reading them as group: 1. A DWH containing a lot of data some of the data comes from operational source and outside. 2. A DWH is organized so as to facilitate the use of this data for decision making purpose. 3. The DWH provides tools by which end users can access this data 2/12/2013 2.Data warehouse/D.S.Jagli 13
  • 14. ODS Vs DW ODS DATAWARE HOUSE 1. Stores current operational data 1. Stores historic and current data 2. Perform numerous quick and 2. Perform complex queries on simple queries on small large amounts of data amounts of data 3. Contains small amount of 3. contains large amounts of static transactional data data 4. Day To Day Decisions, 4. Long-Term Decisions, 5. Current operational results, 5. Strategic reporting, 6. Tactical reporting, 6. Trend detection 7. Near to Normal form 7. Star Schema 8. Frequency of Load: Twice 8. Frequency of Load: Weekly, Daily , Daily, Weekly Monthly, Quarterly 14 2/12/2013 2.Data warehouse/D.S.Jagli 14
  • 15. OLTP vs. Data Warehouse  OLTP systems are tuned for known transactions and workloads while workload is not known a priori in a data warehouse.  Special data organization, access methods and implementation methods are needed to support data warehouse queries.  e.g. average amount spent on phone calls between 9AM-5PM in Pune during the month of December. 2/12/2013 2.Data warehouse/D.S.Jagli 15
  • 16. OLTP vs. Data Warehouse  OLTP  Warehouse (DSS)  Application Oriented  Subject Oriented  Used to run business  Used to analyze business  Detailed data  Summarized and refined  Current up to date  Snapshot data  Isolated Data  Integrated Data  Repetitive access  Ad-hoc access  Clerical User  Knowledge User (Manager) 2/12/2013 2.Data warehouse/D.S.Jagli 16
  • 17. Who uses DWHs?  Ideal DWH user is something like this: 1. This person’s job involves drawing conclusions from ,making decisions based on, large masses of data 2. This person also doesn’t want to access a database in a highly technical fashion 3. Any person who wants to analyze data and take a decision Eg: Tourists: Browse information harvested by farmers  Farmers: Harvest information from known access paths 2/12/2013 2.Data warehouse/D.S.Jagli 17
  • 18. Data Warehouse Architecture Source Data Data mining Data Storage Data Staging Relational Information Databases Delivery 1. EXTRACT Optimized Loader 2. TRANSFORM 3. CLEAN ERP 4. LOAD 5. REFRESH Report/User Query Systems Data Warehouse Engine Analyze Purchased Query Data OLAP Meta Data Legacy Data Metadata Repository 2/12/2013 2.Data warehouse/D.S.Jagli 18
  • 19. Overview of the Components 1. Source Data Component 2. Data Staging Component 3. Data Storage Component 4. Information Delivery Component 5. Metadata Component 6. Management and Control Component 2/12/2013 2.Data warehouse/D.S.Jagli 19
  • 20. Overview of the Components 1. Source Data Component: 1. Production Data : data comes from operational systems of the Enterprise. 2. Internal Data: in every organization keeps some private data. 3. Archived Data: it comes from back ups. 2/12/2013 2.Data warehouse/D.S.Jagli 20
  • 21. Topics to be covered 1. Getting data into the Data warehouse. 2. Extraction 3. Transformation 4. Cleansing 5. Loading 6. Summarization 7. Meta data 8. Data Marts 9. Data warehousing & ERP 10. Data warehousing & KM 11. Data warehousing & CRM 2/12/2013 2.Data warehouse/D.S.Jagli 21
  • 22. Overview of the Components: Getting data into the DWH 2.Data Staging Component  5 steps of data staging 1. Extraction 2. Transformation 3. Cleansing 4. Loading 5. Summarization 2/12/2013 2.Data warehouse/D.S.Jagli 22
  • 23. Overview of the Components: Getting data into the DWH Data staging component 1. Extracting  Capture of relevant data from operational source in “as is” status  Sources for data generally in legacy mainframes in, IMS, DB2; more data today in relational databases on Unix. 2. Transformation:  In the case Multiple input sources to a data warehouse, inconsistency can sometimes make data unusable.  Transformation is the process of dealing with these inconsistencies  Eg: Use of different names/formats  Mumbai, Bombay  Cust_id, C_id  dd/mm/yy, mm/dd/yy 2/12/2013 2.Data warehouse/D.S.Jagli 23
  • 24. Overview of the Components: Getting data into the DWH Data staging component 3. Cleansing  It is necessary to go through data entered into DWH and make it error free, this process is called Data cleansing.  This include missing data , incorrect data in one source, inconsistent data and conflicting data when two or more sources are involved.  Clean data is vital for the success of the warehouse  Eg:Seshadri, Sheshadri, Sesadri, Seshadri S., Srinivasan Seshadri, etc. are the same person 2/12/2013 2.Data warehouse/D.S.Jagli 24
  • 25. Overview of the Components: Getting data into the DWH Data staging component 4. Loading • It implies physical movement of data from the computers storing the source databases to that which will store the data warehouse. • Most common channel for the data movement process is a high-speed communication link. • It is always necessary to close off access to the DWH when the loading is taking place. 2/12/2013 2.Data warehouse/D.S.Jagli 25
  • 26. Overview of the Components: Getting data into the DWH Data staging component 5. Summarization :  In which any desired summaries of the data warehouse data are precalculated for later use  Once the DWH database has been loaded it is possible to create summaries.  Eg: base customer (1985-87)  custid, from date, to date, name, phone, dob  base customer (1988-90)  custid, from date, to date, name, credit rating, employer  customer activity (1986-89) -- monthly summary  customer activity detail (1987-89)  custid, activity date, amount, clerk id, order no 2/12/2013 2.Data warehouse/D.S.Jagli 26
  • 27. Overview of the Components: Getting data into the DWH Data staging component • Summarized data stored : advantages 1. reduce storage costs 2. reduce CPU usage 3. increases performance since smaller number of records to be processed 4. design around traditional high level reporting needs 2/12/2013 2.Data warehouse/D.S.Jagli 27
  • 28. Overview of the Components Data storage Component The data storage for data warehouse is a separate repository Propagate updates on source data to the warehouse  periodically (e.g., every night, every week) or after significant events.  refresh policy set by administrator based on user needs and traffic.  possibly different policies for different sources.  This function is time consuming. 2/12/2013 2.Data warehouse/D.S.Jagli 28
  • 29. Overview of the Components Information delivery component • In order to provide information to the wide community of data warehouse users the information delivery component includes different methods. • E g: Online ,intranet, internet, email. 2/12/2013 2.Data warehouse/D.S.Jagli 29
  • 30. Overview of the Components Meta data component Meta data are data that describe data in the data warehouse  Administrative metadata 1. source databases and their contents 2. gateway descriptions 3. warehouse schema, view & derived data definitions 4. dimensions, hierarchies 5. pre-defined queries and reports 6. data mart locations and contents 7. data partitions 8. data extraction, cleansing, transformation rules 9. data refresh 10. user profiles, user groups 11. security: user authorization, access control 2/12/2013 2.Data warehouse/D.S.Jagli 30
  • 31. Meta data component  Business meta data  business terms and definitions  ownership of data  operational metadata  data lineage: history of migrated data and sequence of transformations applied  currency of data: active, archived,  monitoring information: warehouse usage statistics, error reports, audit trails.  End- user meta data 2/12/2013 2.Data warehouse/D.S.Jagli 31
  • 32. Overview of the Components Management and control component  This component of the data warehouse sits on the top of all other components. it coordinates services and activities with within the DWH  It interacts with meta data component to perform the management and control functions 2/12/2013 2.Data warehouse/D.S.Jagli 32
  • 33. Content of DWH Data Operational data to warehouse data 4 Data levels: 1. Operational data Query processing 2. Atomic data 3. Summary data Summary data 4. Query processing Atomic data Operational data Eg1: My checking account balance right now is “ 1200/-Rs” Eg2: My checking account balance at the end of January was “25000/-Rs” Eg3: At the end of January 2011 we have 12500 customers Eg4: Our customers in Mumbai zone grew by 10% during last three months 2/12/2013 2.Data warehouse/D.S.Jagli 33
  • 35. Data mart  Data mart  Data mart a subset of a data warehouse that supports the requirements of particular department or business function.  Data mart focuses on only the requirements of users associated with one department or business function.  Data marts do not normally contain detailed operational data, unlike data warehouses.  As data marts contain less data compared with data warehouses, data marts are more easily understood and navigated. 2/12/2013 2.Data warehouse/D.S.Jagli 35
  • 36. From the Data Warehouse to Data Marts Information Individually Less Structured Departmentally History Structured Normalized Detailed Organizationally More Structured Data Warehouse Data 36
  • 37. Reasons for creating a data mart 1. To give users access to the data they need to analyze most often 2. To provide data in a form that matches the collective view of the data by a group of users in a department or business function 3. To improve end-user response time due to the reduction in the volume of data to be accessed 4. To provide appropriately structured data as dictated by the requirements of end-user access tools 5. data cleansing, loading, transformation, and integration are far easier, and hence implementing and setting up a data mart is simpler. 6. The cost of implementing data marts is normally less 7. The potential users of a data mart are more clearly defined and can be more easily targeted to obtain support for a data mart project 2/12/2013 2.Data warehouse/D.S.Jagli 37
  • 38. Data Warehouse and Data Marts OLAP Data Mart Lightly summarized Departmentally structured Organizationally structured Atomic Detailed Data Warehouse Data 38
  • 39. Characteristics of the Departmental Data Mart 1. OLAP 2. Small 3. Flexible 4. Customized by Department 5. Source is departmentally structured data warehouse 39
  • 40. Techniques for Creating Departmental Data Mart  OLAP Sales Finance Mktg.  Subset  Summarized  Superset  Indexed  Arrayed 40
  • 41. Data Mart Centric Data Sources Data Marts Data Warehouse 41
  • 42. Problems with Data Mart Centric Solution If you end up creating multiple warehouses, integrating them is a problem 42
  • 43. True Warehouse Data Sources Data Warehouse Data Marts 43
  • 44. Bill Inman Vs Ralph Kimball  Top -down Vs Bottom -up w.r.t Data Mart 2/12/2013 2.Data warehouse/D.S.Jagli 44
  • 45. A Comparison DM vs DWH DATA MART DATAWARE HOUSE Limited subject areas (one or two) Many subject areas One data mart for one business Enterprise repository with multiple theme/ subject area data marts Has limited dimensions and Has all dimensions and measures measures depends on the subject area required Time to build is short Long time activity Can be built as smaller scale DW Large scale 2/12/2013 2.Data warehouse/D.S.Jagli 45
  • 46. Continued growth in 2/12/2013 2.Data warehouse/D.S.Jagli 46