SlideShare ist ein Scribd-Unternehmen logo
1 von 63
INTRODUCTION
TO DATA
WAREHOUSING
INMON
Father of the data warehouse
Co-creator of the Corporate
Information Factory.
He has 35 years of
experience in database
technology management
and data warehouse design.
INMON-CONT’D
Bill has written about a variety
of topics on the building, usage,
& maintenance of the data warehouse
& the Corporate Information Factory.
He has written more than 650
articles (Datamation, ComputerWorld,
and Byte Magazine).
Inmon has published 45 books.
 Many of books has been translated to Chinese, Dutch, French, German, Japanese,
Korean, Portuguese, Russian, and Spanish.
INTRODUCTION
What is Data Warehouse?
A data warehouse is a collection of integrated
databases designed to support a DSS.
According to Inmon’s (father of data
warehousing) definition(Inmon,1992a,p.5):
 It is a collection of integrated, subject-oriented databases designed to support the DSS
function, where each unit of data is non-volatile and relevant to some moment in time.
INTRODUCTION-CONT’D.
Where is it used?
It is used for evaluating future strategy.
It needs a successful technician:
 Flexible.
 Team player.
 Good balance of business and technical understanding.
INTRODUCTION-CONT’D.
The ultimate use of data warehouse is Mass Customization.
 For example, it increased Capital One’s customers from
1 million to approximately 9 millions in 8 years.
Just like a muscle: DW increases in strength with active use.
 With each new test and product, valuable information is
added to the DW, allowing the analyst to learn from the
success and failure of the past.
The key to survival:
 Is the ability to analyze, plan, and react to changing
business conditions in a much more rapid fashion.
DATA WAREHOUSE
In order for data to be effective, DW must be:
 Consistent.
 Well integrated.
 Well defined.
 Time stamped.
DW environment:
 The data store, data mart & the metadata.
THE DATA STORE
An operational data store (ODS) stores data for a
specific application. It feeds the data warehouse a
stream of desired raw data.
Is the most common component of DW environment.
Data store is generally subject oriented, volatile, current
commonly focused on customers, products, orders,
policies, claims, etc…
DATA STORE & DATA
WAREHOUSEData store & Data warehouse, table 10-1 page 296
THE DATA STORE-CONT’D.
Its day-to-day function is to store the data for a single specific set of operational
application.
Its function is to feed the data warehouse data for the purpose of analysis.
THE DATA MART
It is lower-cost, scaled down version of the DW.
Data Mart offer a targeted and less costly method of gaining the advantages
associated with data warehousing and can be scaled up to a full DW
environment over time.
DATA WAREHOUSE AND
DATA MART
DATA MART
DATA SOURCE
DATA SOURCES, DATA MART
AND DATA WAREHOUSE
THE META DATA
Last component of DW environments.
It is information that is kept about the warehouse
rather than information kept within the warehouse.
Legacy systems generally don’t keep a record of
characteristics of the data (such as what pieces of
data exist and where they are located).
The metadata is simply data about data.
CONCLUSION
A Data Warehouse is a collection of integrated subject-
oriented databases designed to support a DSS.
 Each unit of data is non-volatile and relevant to some moment in time.
An operational data store (ODS) stores data for a specific
application. It feeds the data warehouse a stream of
desired raw data.
A data mart is a lower-cost, scaled-down version of a data
warehouse, usually designed to support a small group of
users (rather than the entire firm).
The metadata is information that is kept about the
warehouse.
DATA WAREHOUSE
Subject oriented
Data integrated
Time variant
Nonvolatile
CHARACTERISTICS OF DATA
WAREHOUSE
 Subject oriented. Data are organized
based on how the users refer to them.
 Integrated. All inconsistencies regarding
naming convention and value
representations are removed.
 Nonvolatile. Data are stored in read-only
format and do not change over time.
 Time variant. Data are not current but
normally time series.
CHARACTERISTICS OF DATA
WAREHOUSE
 Summarized Operational data are mapped
into a decision-usable format
 Large volume. Time series data sets are
normally quite large.
 Not normalized. DW data can be, and
often are, redundant.
 Metadata. Data about data are stored.
 Data sources. Data come from internal
and external unintegrated operational
systems.
A DATA WAREHOUSE IS SUBJECT
ORIENTED
SUBJECT ORIENTATION
Application EnvironmentApplication Environment Data warehouseData warehouse
EnvironmentEnvironment
Design activities must be equallyDesign activities must be equally
focused on both process and databasefocused on both process and database
designdesign
DW world is primarily void of processDW world is primarily void of process
design and tends to focus exclusively ondesign and tends to focus exclusively on
issues of data modeling and databaseissues of data modeling and database
designdesign
DATA INTEGRATED
Integration –consistency naming conventions and measurement
attributers, accuracy, and common aggregation.
Establishment of a common unit of measure for all synonymous data
elements from dissimilar database.
The data must be stored in the DW in an integrated, globally acceptable
manner
DATA INTEGRATED
TIME VARIANT
In an operational application system, the
expectation is that all data within the
database are accurate as of the moment
of access. In the DW data are simply
assumed to be accurate as of some
moment in time and not necessarily right
now.
One of the places where DW data display
time variance is in the structure of the
record key. Every primary key contained
within the DW must contain, either
implicitly or explicitly an element of
time( day, week, month, etc)
TIME VARIANT
Every piece of data contained within the warehouse must be associated
with a particular point in time if any useful analysis is to be
conducted with it.
Another aspect of time variance in DW data is that, once recorded, data
within the warehouse cannot be updated or changed.
NONVOLATILITY
Typical activities such as deletes, inserts, and changes that are
performed in an operational application environment are completely
nonexistent in a DW environment.
Only two data operations are ever performed in the DW: data loading
and data access
NONVOLATILITY
Application DW
The design issues must focus on data
integrity and update anomalies. Complex
processes must be coded to ensure that the
data update processes allow for high
integrity of the final product.
Such issues are no concern to in a DW
environment because data update is never
performed.
Data is placed in normalized form to
ensure a minimal redundancy (totals that
could be calculated would never be
stored)
Designers find it useful to store many of
such calculations or summarizations.
The technologies necessary to support
issues of transaction and data recovery,
roll back, and detection and remedy of
deadlock are quite complex.
Relative simplicity in technology
ISSUES OF DATA REDUNDANCY
BETWEEN DW AND OPERATIONAL
ENVIRONMENTS
The lack of relevancy of issues such as data
normalization in the DW environment may suggest
that existence of massive data redundancy within
the data warehouse and between the operational
and DW environments.
Inmon(1992) pointed out and proved that it is not
true.
ISSUES OF DATA REDUNDANCY BETWEEN DW
AND OPERATIONAL ENVIRONMENTS
The data being loaded into the DW are filtered and “cleansed” as
they pass from the operational database to the warehouse.
Because of this cleansing numerous data that exists in the
operational environment never pass to the data warehouse.
Only the data necessary for processing by the DSS or EIS are
ever actually loaded into the DW.
The time horizons for warehouse and operational data elements are
unique. Data in the operational environment are fresh,
whereas warehouse data are generally much older.(so there is
minimal opportunity of the data to overlap between two
environments )
The data loaded into the DW often undergo a radical transformation
as they pass form operational to the DW environment. So data
in DW are not the same.
Given this factors, Inmon suggests that data redundancy
between the two environments is a rare occurrence with a
THE DATA WAREHOUSE ARCHITECTURE
The architecture consists of various interconnected elements:
 Operational and external database layer – the source data for the DW
 Information access layer – the tools the end user access to extract and analyze the
data
 Data access layer – the interface between the operational and information access
layers
 Metadata layer – the data directory or repository of metadata information
COMPONENTS OF THE DATA
WAREHOUSE ARCHITECTURE
THE DATA WAREHOUSE ARCHITECTURE
Additional layers are:
 Process management layer – the scheduler or job controller
 Application messaging layer – the “middleware” that
transports information around the firm
 Physical data warehouse layer – where the actual data used
in the DSS are located
 Data staging layer – all of the processes necessary to select,
edit, summarize and load warehouse data from the
operational and external data bases
DATA WAREHOUSING TYPOLOGY
The virtual data warehouse – the end users
have direct access to the data stores,
using tools enabled at the data access
layer
The central data warehouse – a single
physical database contains all of the data
for a specific functional area
The distributed data warehouse – the
components are distributed across
several physical databases
THE METADATA
The name suggests some high-level
technological concept, but it really is
fairly simple. Metadata is “data about
data”.
With the emergence of the data warehouse
as a decision support structure, the
metadata are considered as much a
resource as the business data they
describe.
Metadata are abstractions -- they are high
level data that provide concise
descriptions of lower-level data.
THE METADATA
For example, a line in a sales database may
contain: 4056 KJ596 223.45
This is mostly meaningless until we consult the
metadata that tells us it was store number 4056,
product KJ596 and sales of $223.45
The metadata are essential ingredients in the
transformation of raw data into knowledge. They
are the “keys” that allow us to handle the raw
data.
GENERAL METADATA ISSUES
General metadata issues associated with
Data Warehouse use:
What tables, attributes and keys does the DW contain?
Where did each set of data come from?
What transformations were applied with cleansing?
How have the metadata changed over time?
How often do the data get reloaded?
Are there so many data elements that you need to be
careful what you ask for?
COMPONENTS OF THE METADATA
Transformation maps – records that show what transformations were
applied
Extraction & relationship history – records that show what data was
analyzed
Algorithms for summarization – methods available for aggregating and
summarizing
Data ownership – records that show origin
Patterns of access – records that show what data are accessed and
how often
TYPICAL MAPPING METADATA
Transformation mapping records include:
 Identification of original source
 Attribute conversions
 Physical characteristic conversions
 Encoding/reference table conversions
 Naming changes
 Key changes
 Values of default attributes
 Logic to choose from multiple sources
 Algorithmic changes
IMPLEMENTING THE DATA
WAREHOUSE
Kozar list of “seven deadly sins” of data
warehouse implementation:
1. “If you build it, they will come” – the DW needs to be
designed to meet people’s needs
2. Omission of an architectural framework – you need to
consider the number of users, volume of data, update
cycle, etc.
3. Underestimating the importance of documenting
assumptions – the assumptions and potential conflicts
must be included in the framework
“SEVEN DEADLY SINS”, CONTINUED
4. Failure to use the right tool – a DW project needs
different tools than those used to develop an
application
5. Life cycle abuse – in a DW, the life cycle really
never ends
6. Ignorance about data conflicts – resolving these
takes a lot more effort than most people realize
7. Failure to learn from mistakes – since one DW
project tends to beget another, learning from the
early mistakes will yield higher quality later
DATA WAREHOUSE TECHNOLOGIES
No one currently offers an end-to-end DW
solution. Organizations buy bits and
pieces from a number of vendors and
hopefully make them work together.
SAS, IBM, Software AG, Information Builders
and Platinum offer solutions that are at
least fairly comprehensive.
The market is very competitive. Table 10-6
in the text lists 90 firms that produce DW
products.
THE FUTURE OF DATA
WAREHOUSING
As the DW becomes a standard part of an
organization, there will be efforts to find
new ways to use the data. This will likely
bring with it several new challenges:
 Regulatory constraints may limit the ability to combine
sources of disparate data.
 These disparate sources are likely to contain unstructured
data, which is hard to store.
 The Internet makes it possible to access data from virtually
“anywhere”. Of course, this just increases the disparity.
Interesting Facts
Data Can be Used
To
Robust
Infrastructure
Success of Data
Warehouse
Projects
Implementing
Data
Warehouse
Real Time Alerts
& Integration
Identity Theft
What Can You
Do?
OBJECTIVE
INTERESTING FACTS
Harrah’s Entertainment’s Data Warehouse
holds 30 terabytes, or 30 trillion bytes of
data, roughly three times the number of
printed characters in the Library of
Congress
Casinos, retailers, airlines, and banks are
piling up data so vast, it would have been
unthinkable years ago; result from the
curse of cheap storage
INTERESTING FACTS
Storage Shipments as of 2004: 22 exabytes or 22 million trillion bytes
of hard disk space, double the amount in 2002.
Equivalent to 4x’s the space needed to store every word ever spoken by
every human being who has ever lived.
Should double again in 2006
DATA CAN BE USED TO
Quantify the volume impact of vehicles across the
marketing matrix
Account for decay and saturation factors in the
determination of investment choices and returns
Execute “what-if” simulations of pricing or promotional
scenarios before a proposed action is taken
Provide a continuous planning, measurement, analysis
and optimization cycle supported by a software
structure
Deliver robust data feeds into other systems supporting
supply chain, sales, and financial reporting and
endeavors
ROBUST INFRASTRUCTURE
Data Identification and Acquisition
Data Cleansing, Mapping, and Transformation
Production System Loading and Ongoing Update
SUCCESS OF DATA WAREHOUSE
PROJECTSOver half of Data Warehouse projects are
Doomed
 Fail due to lack of attention to Data Quality Issues
 More than half only have limited acceptance
 Consistency and Accuracy of Data
 Most businesses fail to use business intelligence (BI) strategically
 IT organizations build data warehouses with little to no business
involvement
“A
REAL-TIM
E
EN
TERPR
W
ITHOUT REAL-TIM
E
BUSIN
ESS
IN
TELLIGEN
CE
IS
A
REAL FAST,DUM
B
ORGANIZATION.”
STEPHEN BROBST
CHIEF TECHNOLOGY OFFICE
TERADATA
SUCCESS OF DATA WAREHOUSE
PROJECTS
Most challenging type of deployment for an
enterprise
 Large scale and complex system configurations
 Sophisticated data modeling and analysis tools
 High visibility in broad range of important business functions
within company
 Adoption of Linux-Based Platform
IMPLEMENTING DATA WAREHOUSE
Challenges:
 Identifying new processes
 Assuring there were of real use
 Implementing and ensuring cultural shifts
 Managing content and New communities towards a common benefit
 Linear models
 Standards, Governance, Controls, Valuation
TERADATA
Division of NCR in Dayton, Ohio
Competitor of IBM and Oracle
Multi-million Dollar Machines to run the world’s biggest data
warehouses
 Wal-Mart
 Bank of America
 Verizon Wireless
TERADATA’S SUCCESS
Conventional IBM or Sun Microsystems overload for a couple hours to
days on a few terabytes and/or data queries
IBM cannot return computation on certain complex requests
Equivalent to having data but not able to use it.
REAL TIME ALERTS & INTEGRATION
Teradata 8.0 Version released in Oct 2004
 Improves real-time alerts and integration
Businesses can analyze operational info
against historical info to identify events
in near real-time using the new table
design
Used by:
 Continental Airlines in the US: reroute passengers on
delayed flights, reissuing tickets, reserving a room in a hotel
booking system
 Southwest Airlines- savings between $1.2-$1.4 Million
IDENTITY THEFT
Government Regulation of Personal Data is Needed
(National Consumer Protection Standards)
ChoicePoint Folly
 Georgia-based data-collection company
 Founded in 1997 to analyze insurance claims information, but now
provides data to customers including finance companies, law
enforcement, and government
 Obtain personal information by perusing public records, or purchasing
the information from other companies
IDENTITY THEFT
Duped by scammers who set 150 phony
accounts to access personal data of as
many as 145,000 people nationwide
Scammers set user accounts by faxing in
phony business licenses, undetected for
one year
750 people had their identities stolen
Theft would have gone unnoticed without
California Identity theft law SB 1386
IDENTITY THEFT
MSN Event
Data Warehouse Information Gathering
Over the Phone Interviews
Trash Can Hunting
Gathered from Doctors, Internet
Transactions, Telephone Operators
(Overseas or Prisoners)
MSN EMAIL
WHAT CAN YOU DO?
Carefully monitor your credit card bills and
credit reports
Request a once a year free access credit
report via the three big credit agencies.
 Equifax, Experian, TransUnion
Victims: contact Federal Trade Commission
to report the theft and monitor credit
reports.
 1-800-IDTHEFT
REFERENCES
Decision Support Systems in the 21st
Century 2nd
Edition, by George M.
Marakas, Prentice Hall, Upper Saddle River, NJ, 2003
http://seattletimes.nwsource.com/html/editorialsopinion/2002191098_cr
Seattle times, plugging holes in data warehousing
Teradata warehouse improves real-time alerts and integration
Cliff Saran. Computer Weekly. Sutton: Oct 12, 2004. p. 22 (1
page)
ON THE MARK
Mark Hall. Computerworld. Framingham: Oct 18, 2004. Vol. 38,
Iss. 42; p. 6 (1 page)
Optimization: It's All About the Data Brandweek: Ellen Pederson,
Mark Anderson
THE NO-SACRIFICE, AFFORDABLE DATA WAREHOUSE APP Intelligent
Enterprises, Michael Gonzalez
REFERENCES
http://www.dmreview.com/article_sub.cfm?articleId=7071
Convergence- Beyond the Data Warehouse
http://www.computerworld.com/printthis/2001/0,4814,56969,00.
html Micro-segmentation – Computerworld
Too Much Information Forbes article on data warehouse
http://reviews.cnet.com/4520-3513_7-5690533-1.html When
identity thieves strike data warehouses
Over half of data warehouse projects doomed VNU Business
Publications Limited, Robert Jaques 25 February 2005
http://www.linuxworld.com/magazine/?issueid=571 Linux World
Article
QUESTIONS?

Weitere ähnliche Inhalte

Was ist angesagt?

Introduction to data warehousing
Introduction to data warehousing   Introduction to data warehousing
Introduction to data warehousing Girish Dhareshwar
 
Data Warehouse Basic Guide
Data Warehouse Basic GuideData Warehouse Basic Guide
Data Warehouse Basic Guidethomasmary607
 
Date warehousing concepts
Date warehousing conceptsDate warehousing concepts
Date warehousing conceptspcherukumalla
 
Data mining and data warehousing
Data mining and data warehousingData mining and data warehousing
Data mining and data warehousingumesh patil
 
Enterprise resource planning system & data warehousing implementation
Enterprise resource planning system & data warehousing implementationEnterprise resource planning system & data warehousing implementation
Enterprise resource planning system & data warehousing implementationSumya Abdelrazek
 
Data Warehousing and Data Mining
Data Warehousing and Data MiningData Warehousing and Data Mining
Data Warehousing and Data Miningidnats
 
DATA Warehousing & Data Mining
DATA Warehousing & Data MiningDATA Warehousing & Data Mining
DATA Warehousing & Data Miningcpjcollege
 
Project Presentation on Data WareHouse
Project Presentation on Data WareHouseProject Presentation on Data WareHouse
Project Presentation on Data WareHouseAbhi Bhardwaj
 
Data warehousing - Dr. Radhika Kotecha
Data warehousing - Dr. Radhika KotechaData warehousing - Dr. Radhika Kotecha
Data warehousing - Dr. Radhika KotechaRadhika Kotecha
 
Components of a Data-Warehouse
Components of a Data-WarehouseComponents of a Data-Warehouse
Components of a Data-WarehouseAbdul Aslam
 
Dwdm 2(data warehouse)
Dwdm 2(data warehouse)Dwdm 2(data warehouse)
Dwdm 2(data warehouse)Er Bansal
 
Basic Introduction of Data Warehousing from Adiva Consulting
Basic Introduction of  Data Warehousing from Adiva ConsultingBasic Introduction of  Data Warehousing from Adiva Consulting
Basic Introduction of Data Warehousing from Adiva Consultingadivasoft
 
Data warehouseconceptsandarchitecture
Data warehouseconceptsandarchitectureData warehouseconceptsandarchitecture
Data warehouseconceptsandarchitecturesamaksh1982
 
Data mining and data warehousing
Data mining and data warehousingData mining and data warehousing
Data mining and data warehousingSatya P. Joshi
 

Was ist angesagt? (20)

Introduction to data warehousing
Introduction to data warehousing   Introduction to data warehousing
Introduction to data warehousing
 
Data ware house
Data ware houseData ware house
Data ware house
 
Data Warehouse Basic Guide
Data Warehouse Basic GuideData Warehouse Basic Guide
Data Warehouse Basic Guide
 
Date warehousing concepts
Date warehousing conceptsDate warehousing concepts
Date warehousing concepts
 
Data mining and data warehousing
Data mining and data warehousingData mining and data warehousing
Data mining and data warehousing
 
Data warehouse
Data warehouse Data warehouse
Data warehouse
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Enterprise resource planning system & data warehousing implementation
Enterprise resource planning system & data warehousing implementationEnterprise resource planning system & data warehousing implementation
Enterprise resource planning system & data warehousing implementation
 
Data Warehousing and Data Mining
Data Warehousing and Data MiningData Warehousing and Data Mining
Data Warehousing and Data Mining
 
DATA Warehousing & Data Mining
DATA Warehousing & Data MiningDATA Warehousing & Data Mining
DATA Warehousing & Data Mining
 
Project Presentation on Data WareHouse
Project Presentation on Data WareHouseProject Presentation on Data WareHouse
Project Presentation on Data WareHouse
 
Data warehousing - Dr. Radhika Kotecha
Data warehousing - Dr. Radhika KotechaData warehousing - Dr. Radhika Kotecha
Data warehousing - Dr. Radhika Kotecha
 
Components of a Data-Warehouse
Components of a Data-WarehouseComponents of a Data-Warehouse
Components of a Data-Warehouse
 
Data Warehouse
Data Warehouse Data Warehouse
Data Warehouse
 
Dwdm 2(data warehouse)
Dwdm 2(data warehouse)Dwdm 2(data warehouse)
Dwdm 2(data warehouse)
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Ppt
PptPpt
Ppt
 
Basic Introduction of Data Warehousing from Adiva Consulting
Basic Introduction of  Data Warehousing from Adiva ConsultingBasic Introduction of  Data Warehousing from Adiva Consulting
Basic Introduction of Data Warehousing from Adiva Consulting
 
Data warehouseconceptsandarchitecture
Data warehouseconceptsandarchitectureData warehouseconceptsandarchitecture
Data warehouseconceptsandarchitecture
 
Data mining and data warehousing
Data mining and data warehousingData mining and data warehousing
Data mining and data warehousing
 

Andere mochten auch

DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSINGKing Julian
 
Introduction to Data Warehousing
Introduction to Data WarehousingIntroduction to Data Warehousing
Introduction to Data WarehousingJason S
 
Datawarehousing and Business Intelligence
Datawarehousing and Business IntelligenceDatawarehousing and Business Intelligence
Datawarehousing and Business IntelligencePrithwis Mukerjee
 
2013 OHSUG - Clinical Data Warehouse Implementation
2013 OHSUG - Clinical Data Warehouse Implementation2013 OHSUG - Clinical Data Warehouse Implementation
2013 OHSUG - Clinical Data Warehouse ImplementationPerficient
 
Bi presentation Designing and Implementing Business Intelligence Systems
Bi presentation   Designing and Implementing Business Intelligence SystemsBi presentation   Designing and Implementing Business Intelligence Systems
Bi presentation Designing and Implementing Business Intelligence SystemsVispi Munshi
 
The Data Warehouse Lifecycle
The Data Warehouse LifecycleThe Data Warehouse Lifecycle
The Data Warehouse Lifecyclebartlowe
 
Data warehouse concepts
Data warehouse conceptsData warehouse concepts
Data warehouse conceptsobieefans
 
Data Quality in Data Warehouse and Business Intelligence Environments - Disc...
Data Quality in  Data Warehouse and Business Intelligence Environments - Disc...Data Quality in  Data Warehouse and Business Intelligence Environments - Disc...
Data Quality in Data Warehouse and Business Intelligence Environments - Disc...Alan D. Duncan
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureJames Serra
 
Introduction to Business Intelligence
Introduction to Business IntelligenceIntroduction to Business Intelligence
Introduction to Business IntelligenceAlmog Ramrajkar
 

Andere mochten auch (13)

DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSING
 
Introduction to Data Warehousing
Introduction to Data WarehousingIntroduction to Data Warehousing
Introduction to Data Warehousing
 
Introduction to data warehouse
Introduction to data warehouseIntroduction to data warehouse
Introduction to data warehouse
 
Datawarehousing and Business Intelligence
Datawarehousing and Business IntelligenceDatawarehousing and Business Intelligence
Datawarehousing and Business Intelligence
 
2013 OHSUG - Clinical Data Warehouse Implementation
2013 OHSUG - Clinical Data Warehouse Implementation2013 OHSUG - Clinical Data Warehouse Implementation
2013 OHSUG - Clinical Data Warehouse Implementation
 
Bi presentation Designing and Implementing Business Intelligence Systems
Bi presentation   Designing and Implementing Business Intelligence SystemsBi presentation   Designing and Implementing Business Intelligence Systems
Bi presentation Designing and Implementing Business Intelligence Systems
 
The Data Warehouse Lifecycle
The Data Warehouse LifecycleThe Data Warehouse Lifecycle
The Data Warehouse Lifecycle
 
Data warehouse concepts
Data warehouse conceptsData warehouse concepts
Data warehouse concepts
 
Business intelligence
Business intelligenceBusiness intelligence
Business intelligence
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSING
 
Data Quality in Data Warehouse and Business Intelligence Environments - Disc...
Data Quality in  Data Warehouse and Business Intelligence Environments - Disc...Data Quality in  Data Warehouse and Business Intelligence Environments - Disc...
Data Quality in Data Warehouse and Business Intelligence Environments - Disc...
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse Architecture
 
Introduction to Business Intelligence
Introduction to Business IntelligenceIntroduction to Business Intelligence
Introduction to Business Intelligence
 

Ähnlich wie Introduction to data warehousing

DATAWAREHOUSE MAIn under data mining for
DATAWAREHOUSE MAIn under data mining forDATAWAREHOUSE MAIn under data mining for
DATAWAREHOUSE MAIn under data mining forAyushMeraki1
 
Introduction to Data Warehouse
Introduction to Data WarehouseIntroduction to Data Warehouse
Introduction to Data WarehouseSOMASUNDARAM T
 
UNIT-5 DATA WAREHOUSING.docx
UNIT-5 DATA WAREHOUSING.docxUNIT-5 DATA WAREHOUSING.docx
UNIT-5 DATA WAREHOUSING.docxDURGADEVIL
 
Rando Veizi: Data warehouse and Pentaho suite
Rando Veizi: Data warehouse and Pentaho suiteRando Veizi: Data warehouse and Pentaho suite
Rando Veizi: Data warehouse and Pentaho suiteCarlo Vaccari
 
Informatica and datawarehouse Material
Informatica and datawarehouse MaterialInformatica and datawarehouse Material
Informatica and datawarehouse Materialobieefans
 
UNIT 2 DATA WAREHOUSING AND DATA MINING PRESENTATION.pptx
UNIT 2 DATA WAREHOUSING AND DATA MINING PRESENTATION.pptxUNIT 2 DATA WAREHOUSING AND DATA MINING PRESENTATION.pptx
UNIT 2 DATA WAREHOUSING AND DATA MINING PRESENTATION.pptxshruthisweety4
 
Data warehouse
Data warehouseData warehouse
Data warehouseRajThakuri
 
Database management system.pptx
Database management system.pptxDatabase management system.pptx
Database management system.pptxRamyaGr4
 
Introduction-to-Databases.pptx
Introduction-to-Databases.pptxIntroduction-to-Databases.pptx
Introduction-to-Databases.pptxIvanDarrylLopez
 
Prague data management meetup 2017-02-28
Prague data management meetup 2017-02-28Prague data management meetup 2017-02-28
Prague data management meetup 2017-02-28Martin Bém
 
Data Warehouse By Piyush
Data Warehouse By PiyushData Warehouse By Piyush
Data Warehouse By Piyushastronish
 
1. Briefly describe the major components of a data warehouse archi.docx
1. Briefly describe the major components of a data warehouse archi.docx1. Briefly describe the major components of a data warehouse archi.docx
1. Briefly describe the major components of a data warehouse archi.docxmonicafrancis71118
 
Data warehouse and data mining.pptx
Data warehouse and data mining.pptxData warehouse and data mining.pptx
Data warehouse and data mining.pptxChristinaGayenMondal
 

Ähnlich wie Introduction to data warehousing (20)

Presentation
PresentationPresentation
Presentation
 
DATAWAREHOUSE MAIn under data mining for
DATAWAREHOUSE MAIn under data mining forDATAWAREHOUSE MAIn under data mining for
DATAWAREHOUSE MAIn under data mining for
 
Introduction to Data Warehouse
Introduction to Data WarehouseIntroduction to Data Warehouse
Introduction to Data Warehouse
 
DW 101
DW 101DW 101
DW 101
 
Data and Database
Data and DatabaseData and Database
Data and Database
 
Data Warehouse
Data WarehouseData Warehouse
Data Warehouse
 
UNIT-5 DATA WAREHOUSING.docx
UNIT-5 DATA WAREHOUSING.docxUNIT-5 DATA WAREHOUSING.docx
UNIT-5 DATA WAREHOUSING.docx
 
Rando Veizi: Data warehouse and Pentaho suite
Rando Veizi: Data warehouse and Pentaho suiteRando Veizi: Data warehouse and Pentaho suite
Rando Veizi: Data warehouse and Pentaho suite
 
Informatica and datawarehouse Material
Informatica and datawarehouse MaterialInformatica and datawarehouse Material
Informatica and datawarehouse Material
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSING
 
UNIT 2 DATA WAREHOUSING AND DATA MINING PRESENTATION.pptx
UNIT 2 DATA WAREHOUSING AND DATA MINING PRESENTATION.pptxUNIT 2 DATA WAREHOUSING AND DATA MINING PRESENTATION.pptx
UNIT 2 DATA WAREHOUSING AND DATA MINING PRESENTATION.pptx
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Database management system.pptx
Database management system.pptxDatabase management system.pptx
Database management system.pptx
 
Introduction-to-Databases.pptx
Introduction-to-Databases.pptxIntroduction-to-Databases.pptx
Introduction-to-Databases.pptx
 
Prague data management meetup 2017-02-28
Prague data management meetup 2017-02-28Prague data management meetup 2017-02-28
Prague data management meetup 2017-02-28
 
Unit 1
Unit 1Unit 1
Unit 1
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 
Data Warehouse By Piyush
Data Warehouse By PiyushData Warehouse By Piyush
Data Warehouse By Piyush
 
1. Briefly describe the major components of a data warehouse archi.docx
1. Briefly describe the major components of a data warehouse archi.docx1. Briefly describe the major components of a data warehouse archi.docx
1. Briefly describe the major components of a data warehouse archi.docx
 
Data warehouse and data mining.pptx
Data warehouse and data mining.pptxData warehouse and data mining.pptx
Data warehouse and data mining.pptx
 

Mehr von uncleRhyme

Performance appraisal
Performance appraisalPerformance appraisal
Performance appraisaluncleRhyme
 
Labor managementrelations
Labor managementrelationsLabor managementrelations
Labor managementrelationsuncleRhyme
 
Compensation and benefits
Compensation and benefitsCompensation and benefits
Compensation and benefitsuncleRhyme
 
Training and developing employees
Training and developing employeesTraining and developing employees
Training and developing employeesuncleRhyme
 
02 database oprimization - improving sql performance - ent-db
02  database oprimization - improving sql performance - ent-db02  database oprimization - improving sql performance - ent-db
02 database oprimization - improving sql performance - ent-dbuncleRhyme
 
01 database security ent-db
01  database security ent-db01  database security ent-db
01 database security ent-dbuncleRhyme
 
networkmedia presentation1
networkmedia presentation1networkmedia presentation1
networkmedia presentation1uncleRhyme
 
Data warehouse architecture
Data warehouse architectureData warehouse architecture
Data warehouse architectureuncleRhyme
 
Ictinfraosi7 layers tcpipmodel2016e
Ictinfraosi7 layers tcpipmodel2016eIctinfraosi7 layers tcpipmodel2016e
Ictinfraosi7 layers tcpipmodel2016euncleRhyme
 
Introduction to data warehousing
Introduction to data warehousingIntroduction to data warehousing
Introduction to data warehousinguncleRhyme
 
Ictinframodule1
Ictinframodule1Ictinframodule1
Ictinframodule1uncleRhyme
 
Revenue and profit
Revenue and profitRevenue and profit
Revenue and profituncleRhyme
 

Mehr von uncleRhyme (16)

Performance appraisal
Performance appraisalPerformance appraisal
Performance appraisal
 
Labor managementrelations
Labor managementrelationsLabor managementrelations
Labor managementrelations
 
Compensation and benefits
Compensation and benefitsCompensation and benefits
Compensation and benefits
 
Training and developing employees
Training and developing employeesTraining and developing employees
Training and developing employees
 
Chapter 4
Chapter 4Chapter 4
Chapter 4
 
Chapter 3
Chapter 3Chapter 3
Chapter 3
 
Chapter 2
Chapter 2Chapter 2
Chapter 2
 
Chapter 1
Chapter 1Chapter 1
Chapter 1
 
02 database oprimization - improving sql performance - ent-db
02  database oprimization - improving sql performance - ent-db02  database oprimization - improving sql performance - ent-db
02 database oprimization - improving sql performance - ent-db
 
01 database security ent-db
01  database security ent-db01  database security ent-db
01 database security ent-db
 
networkmedia presentation1
networkmedia presentation1networkmedia presentation1
networkmedia presentation1
 
Data warehouse architecture
Data warehouse architectureData warehouse architecture
Data warehouse architecture
 
Ictinfraosi7 layers tcpipmodel2016e
Ictinfraosi7 layers tcpipmodel2016eIctinfraosi7 layers tcpipmodel2016e
Ictinfraosi7 layers tcpipmodel2016e
 
Introduction to data warehousing
Introduction to data warehousingIntroduction to data warehousing
Introduction to data warehousing
 
Ictinframodule1
Ictinframodule1Ictinframodule1
Ictinframodule1
 
Revenue and profit
Revenue and profitRevenue and profit
Revenue and profit
 

Kürzlich hochgeladen

Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...Sapna Thakur
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajanpragatimahajan3
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 

Kürzlich hochgeladen (20)

Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 

Introduction to data warehousing

  • 2. INMON Father of the data warehouse Co-creator of the Corporate Information Factory. He has 35 years of experience in database technology management and data warehouse design.
  • 3. INMON-CONT’D Bill has written about a variety of topics on the building, usage, & maintenance of the data warehouse & the Corporate Information Factory. He has written more than 650 articles (Datamation, ComputerWorld, and Byte Magazine). Inmon has published 45 books.  Many of books has been translated to Chinese, Dutch, French, German, Japanese, Korean, Portuguese, Russian, and Spanish.
  • 4. INTRODUCTION What is Data Warehouse? A data warehouse is a collection of integrated databases designed to support a DSS. According to Inmon’s (father of data warehousing) definition(Inmon,1992a,p.5):  It is a collection of integrated, subject-oriented databases designed to support the DSS function, where each unit of data is non-volatile and relevant to some moment in time.
  • 5. INTRODUCTION-CONT’D. Where is it used? It is used for evaluating future strategy. It needs a successful technician:  Flexible.  Team player.  Good balance of business and technical understanding.
  • 6. INTRODUCTION-CONT’D. The ultimate use of data warehouse is Mass Customization.  For example, it increased Capital One’s customers from 1 million to approximately 9 millions in 8 years. Just like a muscle: DW increases in strength with active use.  With each new test and product, valuable information is added to the DW, allowing the analyst to learn from the success and failure of the past. The key to survival:  Is the ability to analyze, plan, and react to changing business conditions in a much more rapid fashion.
  • 7. DATA WAREHOUSE In order for data to be effective, DW must be:  Consistent.  Well integrated.  Well defined.  Time stamped. DW environment:  The data store, data mart & the metadata.
  • 8. THE DATA STORE An operational data store (ODS) stores data for a specific application. It feeds the data warehouse a stream of desired raw data. Is the most common component of DW environment. Data store is generally subject oriented, volatile, current commonly focused on customers, products, orders, policies, claims, etc…
  • 9. DATA STORE & DATA WAREHOUSEData store & Data warehouse, table 10-1 page 296
  • 10. THE DATA STORE-CONT’D. Its day-to-day function is to store the data for a single specific set of operational application. Its function is to feed the data warehouse data for the purpose of analysis.
  • 11. THE DATA MART It is lower-cost, scaled down version of the DW. Data Mart offer a targeted and less costly method of gaining the advantages associated with data warehousing and can be scaled up to a full DW environment over time.
  • 15. DATA SOURCES, DATA MART AND DATA WAREHOUSE
  • 16. THE META DATA Last component of DW environments. It is information that is kept about the warehouse rather than information kept within the warehouse. Legacy systems generally don’t keep a record of characteristics of the data (such as what pieces of data exist and where they are located). The metadata is simply data about data.
  • 17. CONCLUSION A Data Warehouse is a collection of integrated subject- oriented databases designed to support a DSS.  Each unit of data is non-volatile and relevant to some moment in time. An operational data store (ODS) stores data for a specific application. It feeds the data warehouse a stream of desired raw data. A data mart is a lower-cost, scaled-down version of a data warehouse, usually designed to support a small group of users (rather than the entire firm). The metadata is information that is kept about the warehouse.
  • 18. DATA WAREHOUSE Subject oriented Data integrated Time variant Nonvolatile
  • 19. CHARACTERISTICS OF DATA WAREHOUSE  Subject oriented. Data are organized based on how the users refer to them.  Integrated. All inconsistencies regarding naming convention and value representations are removed.  Nonvolatile. Data are stored in read-only format and do not change over time.  Time variant. Data are not current but normally time series.
  • 20. CHARACTERISTICS OF DATA WAREHOUSE  Summarized Operational data are mapped into a decision-usable format  Large volume. Time series data sets are normally quite large.  Not normalized. DW data can be, and often are, redundant.  Metadata. Data about data are stored.  Data sources. Data come from internal and external unintegrated operational systems.
  • 21. A DATA WAREHOUSE IS SUBJECT ORIENTED
  • 22. SUBJECT ORIENTATION Application EnvironmentApplication Environment Data warehouseData warehouse EnvironmentEnvironment Design activities must be equallyDesign activities must be equally focused on both process and databasefocused on both process and database designdesign DW world is primarily void of processDW world is primarily void of process design and tends to focus exclusively ondesign and tends to focus exclusively on issues of data modeling and databaseissues of data modeling and database designdesign
  • 23. DATA INTEGRATED Integration –consistency naming conventions and measurement attributers, accuracy, and common aggregation. Establishment of a common unit of measure for all synonymous data elements from dissimilar database. The data must be stored in the DW in an integrated, globally acceptable manner
  • 25. TIME VARIANT In an operational application system, the expectation is that all data within the database are accurate as of the moment of access. In the DW data are simply assumed to be accurate as of some moment in time and not necessarily right now. One of the places where DW data display time variance is in the structure of the record key. Every primary key contained within the DW must contain, either implicitly or explicitly an element of time( day, week, month, etc)
  • 26. TIME VARIANT Every piece of data contained within the warehouse must be associated with a particular point in time if any useful analysis is to be conducted with it. Another aspect of time variance in DW data is that, once recorded, data within the warehouse cannot be updated or changed.
  • 27. NONVOLATILITY Typical activities such as deletes, inserts, and changes that are performed in an operational application environment are completely nonexistent in a DW environment. Only two data operations are ever performed in the DW: data loading and data access
  • 28. NONVOLATILITY Application DW The design issues must focus on data integrity and update anomalies. Complex processes must be coded to ensure that the data update processes allow for high integrity of the final product. Such issues are no concern to in a DW environment because data update is never performed. Data is placed in normalized form to ensure a minimal redundancy (totals that could be calculated would never be stored) Designers find it useful to store many of such calculations or summarizations. The technologies necessary to support issues of transaction and data recovery, roll back, and detection and remedy of deadlock are quite complex. Relative simplicity in technology
  • 29. ISSUES OF DATA REDUNDANCY BETWEEN DW AND OPERATIONAL ENVIRONMENTS The lack of relevancy of issues such as data normalization in the DW environment may suggest that existence of massive data redundancy within the data warehouse and between the operational and DW environments. Inmon(1992) pointed out and proved that it is not true.
  • 30. ISSUES OF DATA REDUNDANCY BETWEEN DW AND OPERATIONAL ENVIRONMENTS The data being loaded into the DW are filtered and “cleansed” as they pass from the operational database to the warehouse. Because of this cleansing numerous data that exists in the operational environment never pass to the data warehouse. Only the data necessary for processing by the DSS or EIS are ever actually loaded into the DW. The time horizons for warehouse and operational data elements are unique. Data in the operational environment are fresh, whereas warehouse data are generally much older.(so there is minimal opportunity of the data to overlap between two environments ) The data loaded into the DW often undergo a radical transformation as they pass form operational to the DW environment. So data in DW are not the same. Given this factors, Inmon suggests that data redundancy between the two environments is a rare occurrence with a
  • 31. THE DATA WAREHOUSE ARCHITECTURE The architecture consists of various interconnected elements:  Operational and external database layer – the source data for the DW  Information access layer – the tools the end user access to extract and analyze the data  Data access layer – the interface between the operational and information access layers  Metadata layer – the data directory or repository of metadata information
  • 32. COMPONENTS OF THE DATA WAREHOUSE ARCHITECTURE
  • 33. THE DATA WAREHOUSE ARCHITECTURE Additional layers are:  Process management layer – the scheduler or job controller  Application messaging layer – the “middleware” that transports information around the firm  Physical data warehouse layer – where the actual data used in the DSS are located  Data staging layer – all of the processes necessary to select, edit, summarize and load warehouse data from the operational and external data bases
  • 34. DATA WAREHOUSING TYPOLOGY The virtual data warehouse – the end users have direct access to the data stores, using tools enabled at the data access layer The central data warehouse – a single physical database contains all of the data for a specific functional area The distributed data warehouse – the components are distributed across several physical databases
  • 35. THE METADATA The name suggests some high-level technological concept, but it really is fairly simple. Metadata is “data about data”. With the emergence of the data warehouse as a decision support structure, the metadata are considered as much a resource as the business data they describe. Metadata are abstractions -- they are high level data that provide concise descriptions of lower-level data.
  • 36. THE METADATA For example, a line in a sales database may contain: 4056 KJ596 223.45 This is mostly meaningless until we consult the metadata that tells us it was store number 4056, product KJ596 and sales of $223.45 The metadata are essential ingredients in the transformation of raw data into knowledge. They are the “keys” that allow us to handle the raw data.
  • 37. GENERAL METADATA ISSUES General metadata issues associated with Data Warehouse use: What tables, attributes and keys does the DW contain? Where did each set of data come from? What transformations were applied with cleansing? How have the metadata changed over time? How often do the data get reloaded? Are there so many data elements that you need to be careful what you ask for?
  • 38. COMPONENTS OF THE METADATA Transformation maps – records that show what transformations were applied Extraction & relationship history – records that show what data was analyzed Algorithms for summarization – methods available for aggregating and summarizing Data ownership – records that show origin Patterns of access – records that show what data are accessed and how often
  • 39. TYPICAL MAPPING METADATA Transformation mapping records include:  Identification of original source  Attribute conversions  Physical characteristic conversions  Encoding/reference table conversions  Naming changes  Key changes  Values of default attributes  Logic to choose from multiple sources  Algorithmic changes
  • 40. IMPLEMENTING THE DATA WAREHOUSE Kozar list of “seven deadly sins” of data warehouse implementation: 1. “If you build it, they will come” – the DW needs to be designed to meet people’s needs 2. Omission of an architectural framework – you need to consider the number of users, volume of data, update cycle, etc. 3. Underestimating the importance of documenting assumptions – the assumptions and potential conflicts must be included in the framework
  • 41. “SEVEN DEADLY SINS”, CONTINUED 4. Failure to use the right tool – a DW project needs different tools than those used to develop an application 5. Life cycle abuse – in a DW, the life cycle really never ends 6. Ignorance about data conflicts – resolving these takes a lot more effort than most people realize 7. Failure to learn from mistakes – since one DW project tends to beget another, learning from the early mistakes will yield higher quality later
  • 42. DATA WAREHOUSE TECHNOLOGIES No one currently offers an end-to-end DW solution. Organizations buy bits and pieces from a number of vendors and hopefully make them work together. SAS, IBM, Software AG, Information Builders and Platinum offer solutions that are at least fairly comprehensive. The market is very competitive. Table 10-6 in the text lists 90 firms that produce DW products.
  • 43. THE FUTURE OF DATA WAREHOUSING As the DW becomes a standard part of an organization, there will be efforts to find new ways to use the data. This will likely bring with it several new challenges:  Regulatory constraints may limit the ability to combine sources of disparate data.  These disparate sources are likely to contain unstructured data, which is hard to store.  The Internet makes it possible to access data from virtually “anywhere”. Of course, this just increases the disparity.
  • 44. Interesting Facts Data Can be Used To Robust Infrastructure Success of Data Warehouse Projects Implementing Data Warehouse Real Time Alerts & Integration Identity Theft What Can You Do? OBJECTIVE
  • 45. INTERESTING FACTS Harrah’s Entertainment’s Data Warehouse holds 30 terabytes, or 30 trillion bytes of data, roughly three times the number of printed characters in the Library of Congress Casinos, retailers, airlines, and banks are piling up data so vast, it would have been unthinkable years ago; result from the curse of cheap storage
  • 46. INTERESTING FACTS Storage Shipments as of 2004: 22 exabytes or 22 million trillion bytes of hard disk space, double the amount in 2002. Equivalent to 4x’s the space needed to store every word ever spoken by every human being who has ever lived. Should double again in 2006
  • 47. DATA CAN BE USED TO Quantify the volume impact of vehicles across the marketing matrix Account for decay and saturation factors in the determination of investment choices and returns Execute “what-if” simulations of pricing or promotional scenarios before a proposed action is taken Provide a continuous planning, measurement, analysis and optimization cycle supported by a software structure Deliver robust data feeds into other systems supporting supply chain, sales, and financial reporting and endeavors
  • 48. ROBUST INFRASTRUCTURE Data Identification and Acquisition Data Cleansing, Mapping, and Transformation Production System Loading and Ongoing Update
  • 49. SUCCESS OF DATA WAREHOUSE PROJECTSOver half of Data Warehouse projects are Doomed  Fail due to lack of attention to Data Quality Issues  More than half only have limited acceptance  Consistency and Accuracy of Data  Most businesses fail to use business intelligence (BI) strategically  IT organizations build data warehouses with little to no business involvement
  • 51. SUCCESS OF DATA WAREHOUSE PROJECTS Most challenging type of deployment for an enterprise  Large scale and complex system configurations  Sophisticated data modeling and analysis tools  High visibility in broad range of important business functions within company  Adoption of Linux-Based Platform
  • 52. IMPLEMENTING DATA WAREHOUSE Challenges:  Identifying new processes  Assuring there were of real use  Implementing and ensuring cultural shifts  Managing content and New communities towards a common benefit  Linear models  Standards, Governance, Controls, Valuation
  • 53. TERADATA Division of NCR in Dayton, Ohio Competitor of IBM and Oracle Multi-million Dollar Machines to run the world’s biggest data warehouses  Wal-Mart  Bank of America  Verizon Wireless
  • 54. TERADATA’S SUCCESS Conventional IBM or Sun Microsystems overload for a couple hours to days on a few terabytes and/or data queries IBM cannot return computation on certain complex requests Equivalent to having data but not able to use it.
  • 55. REAL TIME ALERTS & INTEGRATION Teradata 8.0 Version released in Oct 2004  Improves real-time alerts and integration Businesses can analyze operational info against historical info to identify events in near real-time using the new table design Used by:  Continental Airlines in the US: reroute passengers on delayed flights, reissuing tickets, reserving a room in a hotel booking system  Southwest Airlines- savings between $1.2-$1.4 Million
  • 56. IDENTITY THEFT Government Regulation of Personal Data is Needed (National Consumer Protection Standards) ChoicePoint Folly  Georgia-based data-collection company  Founded in 1997 to analyze insurance claims information, but now provides data to customers including finance companies, law enforcement, and government  Obtain personal information by perusing public records, or purchasing the information from other companies
  • 57. IDENTITY THEFT Duped by scammers who set 150 phony accounts to access personal data of as many as 145,000 people nationwide Scammers set user accounts by faxing in phony business licenses, undetected for one year 750 people had their identities stolen Theft would have gone unnoticed without California Identity theft law SB 1386
  • 58. IDENTITY THEFT MSN Event Data Warehouse Information Gathering Over the Phone Interviews Trash Can Hunting Gathered from Doctors, Internet Transactions, Telephone Operators (Overseas or Prisoners)
  • 60. WHAT CAN YOU DO? Carefully monitor your credit card bills and credit reports Request a once a year free access credit report via the three big credit agencies.  Equifax, Experian, TransUnion Victims: contact Federal Trade Commission to report the theft and monitor credit reports.  1-800-IDTHEFT
  • 61. REFERENCES Decision Support Systems in the 21st Century 2nd Edition, by George M. Marakas, Prentice Hall, Upper Saddle River, NJ, 2003 http://seattletimes.nwsource.com/html/editorialsopinion/2002191098_cr Seattle times, plugging holes in data warehousing Teradata warehouse improves real-time alerts and integration Cliff Saran. Computer Weekly. Sutton: Oct 12, 2004. p. 22 (1 page) ON THE MARK Mark Hall. Computerworld. Framingham: Oct 18, 2004. Vol. 38, Iss. 42; p. 6 (1 page) Optimization: It's All About the Data Brandweek: Ellen Pederson, Mark Anderson THE NO-SACRIFICE, AFFORDABLE DATA WAREHOUSE APP Intelligent Enterprises, Michael Gonzalez
  • 62. REFERENCES http://www.dmreview.com/article_sub.cfm?articleId=7071 Convergence- Beyond the Data Warehouse http://www.computerworld.com/printthis/2001/0,4814,56969,00. html Micro-segmentation – Computerworld Too Much Information Forbes article on data warehouse http://reviews.cnet.com/4520-3513_7-5690533-1.html When identity thieves strike data warehouses Over half of data warehouse projects doomed VNU Business Publications Limited, Robert Jaques 25 February 2005 http://www.linuxworld.com/magazine/?issueid=571 Linux World Article

Hinweis der Redaktion

  1. To maintain a robust infrastructure, you should follow these guidelines. Which data sources reflect the dynamics of the contemporary and historical market place. Validity Checks, Above loaded into the integrated data repository enabling a continuous cycle of near-time data availability
  2. These challenges are especially predominant in dot-com companies. The key word of Implementing Data Warehouse is Convergence! The inherent Linear relationship was either business has driven requirements for IT or IT has been an enabler of business. This was a practical need of separation. Structural changes in business as a result of endemic information technology (information power on all desktops and many businesspersons acting as their own information centers, automated information is a basic lubricant of all business processes.) Convergence means that info is no longer a separate commodity. For the Final Note: Information, as any other business commodity, deserves the same treatment as any other core component of business
  3. IBM and Oracle is designed more to handle online transaction processing
  4. Southwest is a low-cost infrastructure