SlideShare ist ein Scribd-Unternehmen logo
1 von 24
Welcome To Loginworks Softwares
A Quick Guide to Datawarehouse
What is Data Warehouse?
A Data Warehouse is a collective bunch
of data of corporate world and these
retrieve data from the external source
and other operational systems. It
handles the enterprise’s daily
transaction operations which helps in
making the business decision due to
reporting and analysis tools at different
levels. Furthermore, data is inserted into
the warehouse database through ETL
process (data extraction, data
transformation, and data loading ) by
using OLTP applications like external
providers and mainframe applications
Data Warehouse
The whole concept of DW was
established in the 1980s by IBM
employees “Paul Murphy and Barry
Devlin”. They formally enhanced
“Business Data Warehouse”. Therefore,
this concept is specially formed to
provide the data flow from external
operational programs to make the better
decision for organizations perspective.
Basically, Data warehouse provides
consolidated data, transactional data,
and historical data from different
sources. Data warehouse segregates
the workload analysis and enhances the
organization data from other external
Why the need of Data Warehouse?
A data warehouse is a domain system of OLTP (Online transaction processing). So, data
can be stored in the data warehouse through data warehouse appliance for months and
years to make the historical analysis. A warehouse appliance is a mixture of software and
hardware tools. Therefore, these tools store the data in the database of a data warehouse.
Basically, the storage capacity of data warehouse comes around in petabyte and terabyte
ranges.
So, let’s understand the practical meaning of data warehousing with an example. So, let’s
assume a big organization which stores data (information) related to its employees, their
date of joining, date of leaving, their salaries, product estimation, client and customer
information, invoices and sales etc. Furthermore, the head of an organization may want any
information related to the measurement of cost-reduction. As a result, the answer will rely on
historical analysis of the data. This is the main advantage of data warehousing.
Business Analysis Framework
This framework tells the actual working of a data warehouse. A (Business Analyst) BA
retrieves the valuable information from the DW (Data Warehouse). Further, it identifies the
performance and makes crucial decision to win with other market stakeholders. For this
reason, a data warehouse increases the organization’s productivity. It also provides the
Properties: Data warehouse
Integrated: It consolidates the data from
external sources like relational DBMS,
extended DBMS, and Flat files etc. It also
provides the effective data analysis in the
data warehouse.
Subject Oriented: This is the main feature of
the data warehouse. Data warehousing is
the subject-oriented system because that
provides the data of a subject instead of
company’s ongoing operations. These
subject can be the product, branch
information, sales, and revenue etc. A
simple data warehousing does not require
any ongoing operations. It also provides
data analysis and modeling for decision
Non-Volatile: In this situation, old data is not
deleted while the new record is added to the
database. Old data is merged with new data by
using SQL JOINS query. A data warehouse is
apart from the external operational sources that do
not reflect the in the database of the data
warehouse.
Time-Variant: Extracted or collected data is
analyzed for a particular period of time. Here, data
provides in historical form.
ETL Process in Data Warehouse
ETL is a combination structure of
extraction of data, the transformation of
data and loading of data. These ETL
process help in preparation of data in
the database management. It also
extracts data from external sources,
then transforms the data by using
concatenation, data aggregation, and
calculations rules. Later, it loads the
information in the data warehouse. As a
result, it simply analyzes the business
data to make the crucial business
decision. So, let us understand the ETL
process flow in brief as follows.
Extraction
Extract process reads the data from unknown external database and extracts in the desired
form to make the data easily accessible for business objectives. On another hand, the major
point to extract data is to fetch the relevant information from the system with minimum
resources. As a result, this step does not affect the data.
Transformation
This function work like filtration and data cleaning. This step is applied to some important
lookup rules and formulas on extracted data as per the business requirements. Then, it
converts the data into the desired table or a database. Therefore, this phase of ETL
contains verification, validation, and rejection of data. First of all, it analyzes the extracted
data through verification that defines the extracted data is in the correct format or not.
Secondly, It validates the extracted data means make sure that data is extracted correctly
what we required for. Here, you can use SQL during the validation process it will help you to
find the correct data. You can even find the duplicate data, sorting of data, data conversion,
and consistency of the data in the database.
After validation process, the query result will give you a final data. Now, apply set of rules to
get the duplicate and unwanted data. As a result, it automatically will reject the data. This is
Loading
This is the final stage of ETL process. This phase
defines the resultant of data extraction and
transformation. After two steps of EPL data then
loads into the database of the data warehouse.
You can also take help of SELECT command while
data loading into the database. As a result, once
you load the data then refresh the database.
Data Warehouse Architecture Model
(Three-Tier Architecture)
A data warehouse is an RDBMS
(Relational Database
management System), that is well
developed for query analysis. A
data warehouse maintains huge
data in the database to make the
changes in historical analysis. In
this session, I will discuss the
three-tier architecture model of
the data warehouse. In
conclusion, this three-tier
architecture contains three layers
Bottom-Tier Layer
This is the bottom tier layer contain all of the database servers in the data warehouse which
relates to the relational database system. Data is extracted from external and operational
databases due to some backends tool. Then, feed data into the database of the warehouse
by performing ETL process (Extract, Transform, and Loading). Basically, we use ETL
process to load data into the bottom layer of the data warehouse architecture model. In
addition, the bottom layer contains several entities like as
Data warehouse
Data Marts
Metadata Repository
Monitoring & Administration
1. Data Warehouse
A data warehouse is a sophisticated form of operational database that contains only
required information and allow to get fast access to the information.
2. Data Marts
Data Mart is an optimized version of the data warehouse which relies on a particular
subjected area. A data mart maintains access layer to collect the relevant information from
the organizational data store. Therefore, data marts only focus on specific business
architecture to build the warehouse in a large size that takes less time and generate
organization revenue. Basically, it divides the data into small portion like sales information,
customers information, and items information etc. In addition, some key points need to
remember of data mart as follows.
Data Marts are Windows, Unix and Linux based servers that create data marts. It also
contains low cost to create data marts.
Implementation of data marts cycles cannot be stored for a long time means it only valid for
weeks not for months and years.
Data marts are much smaller in size but more flexible. Data marts can customize the data
by their departments.
Because of the user understanding, data mart requires less implementation as compared to
the data warehouse. As a result, data mart can increase business process.
Types: Data Mart
• Data marts contain its three type as follows
• Dependent data mart directly sourced from
the database of a data warehouse.
• Independent data mart sourced with more
than one data warehouse.
• Hybrid data mart converts the data from data
warehouse to operational database.
So, let’s understand the data marts types in brief.
Dependent Data Mart
A dependent data mart access data source
directly from the single data warehouse
database. This type of data mart depends on
only enterprise data warehouse or centralized
data warehouse. You can access only one
database at a time. But, if you want more than
one then you have to individually configure them
is called dependent data mart.
Independent Data Mart
This type of data mart works without
the use of any centralized system or
data enterprise warehouse.
Independent data mart divides the
data into small portion within an
organization. Therefore, independent
data mart does not contain any
relation value with data enterprise
warehouse and other operational
data marts. Hence, it works
independently from the operational
system. For this reason, you need to
develop new centralized store where
an end-user can analyze the data
Hybrid Data Mart
A hybrid data mart is a mixture of
multiple inputs from the external
sources in the data warehouse.
Basically, it works on the ad-hoc
environment. It is designed to work
on more than a database and also
provide the fastest implementation
for any big organization. It requires
less cleaning process and supports
big database. Hybrid data mart
contains more flexibility than
dependent and independent data
marts for small data applications.
Implementation process of Data Mart
Data mart contains five implementation process as follows.
Designing
Constructing
Populating
Accessing
Managing
Designing
Designing is the 1st phase of implementation in the data mart, that covers all actions between the
initial request of data mart to collective data through the basic data mart requirements. It also
discovers both physical & logical model of the data mart. In addition, this phase contains some tasks
as follows.
Collect the required technical and business requirement.
It verifies the source of data and selects the proper subset of data.
It helps in making the physical and logical architecture of the data mart.
Constructing
Constructing is the 2nd phase of datamart, it creates the logical and physical architecture of the data
mart that provides fast access to data. In addition, this phase contains some tasks as follows.
An RDBMS stores the constructed data which contains the addition of the data & deletion of the data.
By using SQL query, you can perform several functions like indexes, views, and tables.
Constructed data remains secure by using SQL security libraries.
Populating
This is the third phase of data mart process, that covers all the steps which are used in above two
phase of data mart like getting the data from the external source, cleaning of the data and modifying of
the data. In addition, this phase contains some tasks as follows.
Select the target sources for getting the data.
Extraction of data from the sources.
Data cleaning and data transformation.
Loading and storing the data (metadata) into the database.
Accessing
This is the 4th phase of data mart implementation, this phase makes the data usable for data analysis,
querying the data, creating the chart, graphs, and reports to publish them into the external market for
business objectives. Furthermore, an end-user submits the set of queries and display the final result of
the queries.
Managing
This is the fifth last phase of data mart implementation, that covers all of the tasks which relate to
3. Metadata Repository
Metadata is another form of simple data. A simple data which represent different data format
call as metadata. Metadata segregate the especially relevant information of the data. It also
provides the detailed information of the data. So, let’s assume an index of the notebook
consumes as a metadata for the internal content of the notebook. It also defines the
warehouse objects and roadmap in the data warehouse. Now, understand the metadata
repository, which is an internal part of data warehouse system. As a result, it contains some
metadata categories as follows.
Business Metadata: This type of metadata covers business information like business goals,
owner information, changing policies information and business definition etc.
Operational Metadata: It contains lineage of data and data currency. Data currency defines
the active mode of data, purged and archived. Data lineage refers to the data transformation
and migration rules.
Mapping of data from operating system to Data warehouse: This type of metadata contains
sources database, the content of the data sources, extraction of data, data partition,
cleaning of data, the transformation of data, purging of data and refreshing of data rules etc.
Summarization Algorithms: It contains data aggregation, dimension algorithms, data
4. Monitoring & Administration
This element is basically used for data refreshment and disaster recovery. It set the
limitation in size of data warehouse and provide data in synchronization. Further, it controls
the ranges of query and numbers format. It helps to increase performance better of the
database.
Middle-Tier Layer
In the middle-tier, there is an OLAP server that works as a single processing combination of
ROLAP and MOLAP database systems. This OLAP server provides the multidimensional
way to get the data model. For this reason, analysts and managers can provide consistent
and fastest way to get the information.
ROLAP stands for Relational On-Line Analytical Process that manipulate the stored data in
the relational database. ROLAP model can easily maintain the large data. MOLAP stands
for Multidimensional OLAP system that maintains dynamic multidimensional operations and
the data. Data can be stored in the multidimensional cube. It provides fastest data retrieval
due to MOLAP cubes. The whole structure of middle layer is to represent the abstracted
view of the data warehouse. As a result, this layer contains common behavior between
database and end-user.
Components of OLAP Server
There are three components in OLAP system as follows.
Relational OLAP
Multidimensional OLAP
Hybrid OLAP
Relational OLAP
Basically, ROLAP works on the stored data in the relational database. ROLAP server defines the
relationship between front-end(client) tools and relational back-end tools. ROLAP server provides the
way to manage and store the data by using relational and extended DBMS. Therefore, this server can
easily handle a large number of data. But, it contains a drawback that it has some limited SQL
functionalities to maintain the database and aggregate tables.
Multidimensional OLAP
Multidimensional OLAP system handles the multidimensional data through array-based storage
systems. It also provides an optimal solution for dice and slice operations. MOLAP perform the fastest
access of the data in comparison with relational OLAP. It can also handle complex calculations. But, it
contains some drawbacks like data aggregation cannot perform while changing the dimension of
cubes. MOLAP maintains a small number of data.
Hybrid OLAP
Hybrid is a complete mixture of MOLAP & ROLAP servers. A hybrid
server provides the fastest accessibility than MOLAP & ROLAP
servers. It contains better scalability and computation. It also
maintains a big number of data (More than a terabyte).
Note: OLAP contains additional type as “Specialized SQL servers”,
that allows to execute advanced functions of SQL query and provide
better processing support than Snowflake and Star schemas.
So, let’s look at the difference between OLAP and OLTP. This is the
most important part in data warehouse, which could be asked in the
interviews for data warehousing profile.
Top-Tier Layer
Top-Tier is a front-end client-side layer in a three-tier model of the data warehouse. This
contains reporting & query tools, data mining & data analysis tools. Here, reporting tools
contain report writers & productive reporting tools. Analysis tools are also used to make
charts based on data mining result. Data mining tools define the especially relevant
information from the hidden patterns. Now, we have completed the three-architecture model
of the data warehouse.
Conclusion
I hope you guys enjoyed this tour of the data warehouse. A simple data warehouse is a hub where the
especially relevant information resides in the database. There are many databases for storing the
value but I explained the relational and extended database management system. Because they both
contain maximum data storage than other databases. This is a complete tour since definition to its
conclusion. In addition, this article will help to learn what exactly data warehouse is? And why is this
so important?
I also explained the whole concept of the data warehouse from the scratch which will help to learn for
freshers in their interviews and the exams. In the data warehouse architecture, metadata plays a vital
role as it species the values, the point of connection, source, features and usage of the data
warehouse. In addition, last but not the least the main topic of this article is the difference between
OLAP and OLTP. So, most of the companies who hire a candidate or employee for data warehousing
profile they must ask these differences.
Thanks For Watching
Connect With Source Url :- https://bit.ly/2OvTimI
Call Us:- +1-434-608-0184

Weitere ähnliche Inhalte

Kürzlich hochgeladen

Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Kürzlich hochgeladen (20)

"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 

Empfohlen

How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
ThinkNow
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
Kurio // The Social Media Age(ncy)
 

Empfohlen (20)

How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work
 
ChatGPT webinar slides
ChatGPT webinar slidesChatGPT webinar slides
ChatGPT webinar slides
 

A Quick Guide to Datawarehouse

  • 2. A Quick Guide to Datawarehouse What is Data Warehouse? A Data Warehouse is a collective bunch of data of corporate world and these retrieve data from the external source and other operational systems. It handles the enterprise’s daily transaction operations which helps in making the business decision due to reporting and analysis tools at different levels. Furthermore, data is inserted into the warehouse database through ETL process (data extraction, data transformation, and data loading ) by using OLTP applications like external providers and mainframe applications
  • 3. Data Warehouse The whole concept of DW was established in the 1980s by IBM employees “Paul Murphy and Barry Devlin”. They formally enhanced “Business Data Warehouse”. Therefore, this concept is specially formed to provide the data flow from external operational programs to make the better decision for organizations perspective. Basically, Data warehouse provides consolidated data, transactional data, and historical data from different sources. Data warehouse segregates the workload analysis and enhances the organization data from other external
  • 4. Why the need of Data Warehouse? A data warehouse is a domain system of OLTP (Online transaction processing). So, data can be stored in the data warehouse through data warehouse appliance for months and years to make the historical analysis. A warehouse appliance is a mixture of software and hardware tools. Therefore, these tools store the data in the database of a data warehouse. Basically, the storage capacity of data warehouse comes around in petabyte and terabyte ranges. So, let’s understand the practical meaning of data warehousing with an example. So, let’s assume a big organization which stores data (information) related to its employees, their date of joining, date of leaving, their salaries, product estimation, client and customer information, invoices and sales etc. Furthermore, the head of an organization may want any information related to the measurement of cost-reduction. As a result, the answer will rely on historical analysis of the data. This is the main advantage of data warehousing. Business Analysis Framework This framework tells the actual working of a data warehouse. A (Business Analyst) BA retrieves the valuable information from the DW (Data Warehouse). Further, it identifies the performance and makes crucial decision to win with other market stakeholders. For this reason, a data warehouse increases the organization’s productivity. It also provides the
  • 5. Properties: Data warehouse Integrated: It consolidates the data from external sources like relational DBMS, extended DBMS, and Flat files etc. It also provides the effective data analysis in the data warehouse. Subject Oriented: This is the main feature of the data warehouse. Data warehousing is the subject-oriented system because that provides the data of a subject instead of company’s ongoing operations. These subject can be the product, branch information, sales, and revenue etc. A simple data warehousing does not require any ongoing operations. It also provides data analysis and modeling for decision
  • 6. Non-Volatile: In this situation, old data is not deleted while the new record is added to the database. Old data is merged with new data by using SQL JOINS query. A data warehouse is apart from the external operational sources that do not reflect the in the database of the data warehouse. Time-Variant: Extracted or collected data is analyzed for a particular period of time. Here, data provides in historical form.
  • 7. ETL Process in Data Warehouse ETL is a combination structure of extraction of data, the transformation of data and loading of data. These ETL process help in preparation of data in the database management. It also extracts data from external sources, then transforms the data by using concatenation, data aggregation, and calculations rules. Later, it loads the information in the data warehouse. As a result, it simply analyzes the business data to make the crucial business decision. So, let us understand the ETL process flow in brief as follows.
  • 8. Extraction Extract process reads the data from unknown external database and extracts in the desired form to make the data easily accessible for business objectives. On another hand, the major point to extract data is to fetch the relevant information from the system with minimum resources. As a result, this step does not affect the data. Transformation This function work like filtration and data cleaning. This step is applied to some important lookup rules and formulas on extracted data as per the business requirements. Then, it converts the data into the desired table or a database. Therefore, this phase of ETL contains verification, validation, and rejection of data. First of all, it analyzes the extracted data through verification that defines the extracted data is in the correct format or not. Secondly, It validates the extracted data means make sure that data is extracted correctly what we required for. Here, you can use SQL during the validation process it will help you to find the correct data. You can even find the duplicate data, sorting of data, data conversion, and consistency of the data in the database. After validation process, the query result will give you a final data. Now, apply set of rules to get the duplicate and unwanted data. As a result, it automatically will reject the data. This is
  • 9. Loading This is the final stage of ETL process. This phase defines the resultant of data extraction and transformation. After two steps of EPL data then loads into the database of the data warehouse. You can also take help of SELECT command while data loading into the database. As a result, once you load the data then refresh the database.
  • 10. Data Warehouse Architecture Model (Three-Tier Architecture) A data warehouse is an RDBMS (Relational Database management System), that is well developed for query analysis. A data warehouse maintains huge data in the database to make the changes in historical analysis. In this session, I will discuss the three-tier architecture model of the data warehouse. In conclusion, this three-tier architecture contains three layers
  • 11. Bottom-Tier Layer This is the bottom tier layer contain all of the database servers in the data warehouse which relates to the relational database system. Data is extracted from external and operational databases due to some backends tool. Then, feed data into the database of the warehouse by performing ETL process (Extract, Transform, and Loading). Basically, we use ETL process to load data into the bottom layer of the data warehouse architecture model. In addition, the bottom layer contains several entities like as Data warehouse Data Marts Metadata Repository Monitoring & Administration 1. Data Warehouse A data warehouse is a sophisticated form of operational database that contains only required information and allow to get fast access to the information.
  • 12. 2. Data Marts Data Mart is an optimized version of the data warehouse which relies on a particular subjected area. A data mart maintains access layer to collect the relevant information from the organizational data store. Therefore, data marts only focus on specific business architecture to build the warehouse in a large size that takes less time and generate organization revenue. Basically, it divides the data into small portion like sales information, customers information, and items information etc. In addition, some key points need to remember of data mart as follows. Data Marts are Windows, Unix and Linux based servers that create data marts. It also contains low cost to create data marts. Implementation of data marts cycles cannot be stored for a long time means it only valid for weeks not for months and years. Data marts are much smaller in size but more flexible. Data marts can customize the data by their departments. Because of the user understanding, data mart requires less implementation as compared to the data warehouse. As a result, data mart can increase business process.
  • 13. Types: Data Mart • Data marts contain its three type as follows • Dependent data mart directly sourced from the database of a data warehouse. • Independent data mart sourced with more than one data warehouse. • Hybrid data mart converts the data from data warehouse to operational database. So, let’s understand the data marts types in brief. Dependent Data Mart A dependent data mart access data source directly from the single data warehouse database. This type of data mart depends on only enterprise data warehouse or centralized data warehouse. You can access only one database at a time. But, if you want more than one then you have to individually configure them is called dependent data mart.
  • 14. Independent Data Mart This type of data mart works without the use of any centralized system or data enterprise warehouse. Independent data mart divides the data into small portion within an organization. Therefore, independent data mart does not contain any relation value with data enterprise warehouse and other operational data marts. Hence, it works independently from the operational system. For this reason, you need to develop new centralized store where an end-user can analyze the data
  • 15. Hybrid Data Mart A hybrid data mart is a mixture of multiple inputs from the external sources in the data warehouse. Basically, it works on the ad-hoc environment. It is designed to work on more than a database and also provide the fastest implementation for any big organization. It requires less cleaning process and supports big database. Hybrid data mart contains more flexibility than dependent and independent data marts for small data applications.
  • 16. Implementation process of Data Mart Data mart contains five implementation process as follows. Designing Constructing Populating Accessing Managing
  • 17. Designing Designing is the 1st phase of implementation in the data mart, that covers all actions between the initial request of data mart to collective data through the basic data mart requirements. It also discovers both physical & logical model of the data mart. In addition, this phase contains some tasks as follows. Collect the required technical and business requirement. It verifies the source of data and selects the proper subset of data. It helps in making the physical and logical architecture of the data mart. Constructing Constructing is the 2nd phase of datamart, it creates the logical and physical architecture of the data mart that provides fast access to data. In addition, this phase contains some tasks as follows. An RDBMS stores the constructed data which contains the addition of the data & deletion of the data. By using SQL query, you can perform several functions like indexes, views, and tables. Constructed data remains secure by using SQL security libraries.
  • 18. Populating This is the third phase of data mart process, that covers all the steps which are used in above two phase of data mart like getting the data from the external source, cleaning of the data and modifying of the data. In addition, this phase contains some tasks as follows. Select the target sources for getting the data. Extraction of data from the sources. Data cleaning and data transformation. Loading and storing the data (metadata) into the database. Accessing This is the 4th phase of data mart implementation, this phase makes the data usable for data analysis, querying the data, creating the chart, graphs, and reports to publish them into the external market for business objectives. Furthermore, an end-user submits the set of queries and display the final result of the queries. Managing This is the fifth last phase of data mart implementation, that covers all of the tasks which relate to
  • 19. 3. Metadata Repository Metadata is another form of simple data. A simple data which represent different data format call as metadata. Metadata segregate the especially relevant information of the data. It also provides the detailed information of the data. So, let’s assume an index of the notebook consumes as a metadata for the internal content of the notebook. It also defines the warehouse objects and roadmap in the data warehouse. Now, understand the metadata repository, which is an internal part of data warehouse system. As a result, it contains some metadata categories as follows. Business Metadata: This type of metadata covers business information like business goals, owner information, changing policies information and business definition etc. Operational Metadata: It contains lineage of data and data currency. Data currency defines the active mode of data, purged and archived. Data lineage refers to the data transformation and migration rules. Mapping of data from operating system to Data warehouse: This type of metadata contains sources database, the content of the data sources, extraction of data, data partition, cleaning of data, the transformation of data, purging of data and refreshing of data rules etc. Summarization Algorithms: It contains data aggregation, dimension algorithms, data
  • 20. 4. Monitoring & Administration This element is basically used for data refreshment and disaster recovery. It set the limitation in size of data warehouse and provide data in synchronization. Further, it controls the ranges of query and numbers format. It helps to increase performance better of the database. Middle-Tier Layer In the middle-tier, there is an OLAP server that works as a single processing combination of ROLAP and MOLAP database systems. This OLAP server provides the multidimensional way to get the data model. For this reason, analysts and managers can provide consistent and fastest way to get the information. ROLAP stands for Relational On-Line Analytical Process that manipulate the stored data in the relational database. ROLAP model can easily maintain the large data. MOLAP stands for Multidimensional OLAP system that maintains dynamic multidimensional operations and the data. Data can be stored in the multidimensional cube. It provides fastest data retrieval due to MOLAP cubes. The whole structure of middle layer is to represent the abstracted view of the data warehouse. As a result, this layer contains common behavior between database and end-user.
  • 21. Components of OLAP Server There are three components in OLAP system as follows. Relational OLAP Multidimensional OLAP Hybrid OLAP Relational OLAP Basically, ROLAP works on the stored data in the relational database. ROLAP server defines the relationship between front-end(client) tools and relational back-end tools. ROLAP server provides the way to manage and store the data by using relational and extended DBMS. Therefore, this server can easily handle a large number of data. But, it contains a drawback that it has some limited SQL functionalities to maintain the database and aggregate tables. Multidimensional OLAP Multidimensional OLAP system handles the multidimensional data through array-based storage systems. It also provides an optimal solution for dice and slice operations. MOLAP perform the fastest access of the data in comparison with relational OLAP. It can also handle complex calculations. But, it contains some drawbacks like data aggregation cannot perform while changing the dimension of cubes. MOLAP maintains a small number of data.
  • 22. Hybrid OLAP Hybrid is a complete mixture of MOLAP & ROLAP servers. A hybrid server provides the fastest accessibility than MOLAP & ROLAP servers. It contains better scalability and computation. It also maintains a big number of data (More than a terabyte). Note: OLAP contains additional type as “Specialized SQL servers”, that allows to execute advanced functions of SQL query and provide better processing support than Snowflake and Star schemas. So, let’s look at the difference between OLAP and OLTP. This is the most important part in data warehouse, which could be asked in the interviews for data warehousing profile.
  • 23. Top-Tier Layer Top-Tier is a front-end client-side layer in a three-tier model of the data warehouse. This contains reporting & query tools, data mining & data analysis tools. Here, reporting tools contain report writers & productive reporting tools. Analysis tools are also used to make charts based on data mining result. Data mining tools define the especially relevant information from the hidden patterns. Now, we have completed the three-architecture model of the data warehouse. Conclusion I hope you guys enjoyed this tour of the data warehouse. A simple data warehouse is a hub where the especially relevant information resides in the database. There are many databases for storing the value but I explained the relational and extended database management system. Because they both contain maximum data storage than other databases. This is a complete tour since definition to its conclusion. In addition, this article will help to learn what exactly data warehouse is? And why is this so important? I also explained the whole concept of the data warehouse from the scratch which will help to learn for freshers in their interviews and the exams. In the data warehouse architecture, metadata plays a vital role as it species the values, the point of connection, source, features and usage of the data warehouse. In addition, last but not the least the main topic of this article is the difference between OLAP and OLTP. So, most of the companies who hire a candidate or employee for data warehousing profile they must ask these differences.
  • 24. Thanks For Watching Connect With Source Url :- https://bit.ly/2OvTimI Call Us:- +1-434-608-0184