We have a team of data analytics services. Loginworks Softwares is provid the Big Data Solutions and Data Analytics Services using the complete data analysis methods. Loginworks is the best Data Warehouse Services provides multiple data warehouse offerings including on premises on cloud and as an integrated appliance. We are All three platforms offer a common SQL engine to streamline queries and flexible licensing for your analytical workloads. This high performance and highly flexible data analytics foundation enable timely insights to support data-driven decision making.
https://www.loginworks.com/data-analytics/
2. A Quick Guide to Datawarehouse
What is Data Warehouse?
A Data Warehouse is a collective bunch
of data of corporate world and these
retrieve data from the external source
and other operational systems. It
handles the enterprise’s daily
transaction operations which helps in
making the business decision due to
reporting and analysis tools at different
levels. Furthermore, data is inserted into
the warehouse database through ETL
process (data extraction, data
transformation, and data loading ) by
using OLTP applications like external
providers and mainframe applications
3. Data Warehouse
The whole concept of DW was
established in the 1980s by IBM
employees “Paul Murphy and Barry
Devlin”. They formally enhanced
“Business Data Warehouse”. Therefore,
this concept is specially formed to
provide the data flow from external
operational programs to make the better
decision for organizations perspective.
Basically, Data warehouse provides
consolidated data, transactional data,
and historical data from different
sources. Data warehouse segregates
the workload analysis and enhances the
organization data from other external
4. Why the need of Data Warehouse?
A data warehouse is a domain system of OLTP (Online transaction processing). So, data
can be stored in the data warehouse through data warehouse appliance for months and
years to make the historical analysis. A warehouse appliance is a mixture of software and
hardware tools. Therefore, these tools store the data in the database of a data warehouse.
Basically, the storage capacity of data warehouse comes around in petabyte and terabyte
ranges.
So, let’s understand the practical meaning of data warehousing with an example. So, let’s
assume a big organization which stores data (information) related to its employees, their
date of joining, date of leaving, their salaries, product estimation, client and customer
information, invoices and sales etc. Furthermore, the head of an organization may want any
information related to the measurement of cost-reduction. As a result, the answer will rely on
historical analysis of the data. This is the main advantage of data warehousing.
Business Analysis Framework
This framework tells the actual working of a data warehouse. A (Business Analyst) BA
retrieves the valuable information from the DW (Data Warehouse). Further, it identifies the
performance and makes crucial decision to win with other market stakeholders. For this
reason, a data warehouse increases the organization’s productivity. It also provides the
5. Properties: Data warehouse
Integrated: It consolidates the data from
external sources like relational DBMS,
extended DBMS, and Flat files etc. It also
provides the effective data analysis in the
data warehouse.
Subject Oriented: This is the main feature of
the data warehouse. Data warehousing is
the subject-oriented system because that
provides the data of a subject instead of
company’s ongoing operations. These
subject can be the product, branch
information, sales, and revenue etc. A
simple data warehousing does not require
any ongoing operations. It also provides
data analysis and modeling for decision
6. Non-Volatile: In this situation, old data is not
deleted while the new record is added to the
database. Old data is merged with new data by
using SQL JOINS query. A data warehouse is
apart from the external operational sources that do
not reflect the in the database of the data
warehouse.
Time-Variant: Extracted or collected data is
analyzed for a particular period of time. Here, data
provides in historical form.
7. ETL Process in Data Warehouse
ETL is a combination structure of
extraction of data, the transformation of
data and loading of data. These ETL
process help in preparation of data in
the database management. It also
extracts data from external sources,
then transforms the data by using
concatenation, data aggregation, and
calculations rules. Later, it loads the
information in the data warehouse. As a
result, it simply analyzes the business
data to make the crucial business
decision. So, let us understand the ETL
process flow in brief as follows.
8. Extraction
Extract process reads the data from unknown external database and extracts in the desired
form to make the data easily accessible for business objectives. On another hand, the major
point to extract data is to fetch the relevant information from the system with minimum
resources. As a result, this step does not affect the data.
Transformation
This function work like filtration and data cleaning. This step is applied to some important
lookup rules and formulas on extracted data as per the business requirements. Then, it
converts the data into the desired table or a database. Therefore, this phase of ETL
contains verification, validation, and rejection of data. First of all, it analyzes the extracted
data through verification that defines the extracted data is in the correct format or not.
Secondly, It validates the extracted data means make sure that data is extracted correctly
what we required for. Here, you can use SQL during the validation process it will help you to
find the correct data. You can even find the duplicate data, sorting of data, data conversion,
and consistency of the data in the database.
After validation process, the query result will give you a final data. Now, apply set of rules to
get the duplicate and unwanted data. As a result, it automatically will reject the data. This is
9. Loading
This is the final stage of ETL process. This phase
defines the resultant of data extraction and
transformation. After two steps of EPL data then
loads into the database of the data warehouse.
You can also take help of SELECT command while
data loading into the database. As a result, once
you load the data then refresh the database.
10. Data Warehouse Architecture Model
(Three-Tier Architecture)
A data warehouse is an RDBMS
(Relational Database
management System), that is well
developed for query analysis. A
data warehouse maintains huge
data in the database to make the
changes in historical analysis. In
this session, I will discuss the
three-tier architecture model of
the data warehouse. In
conclusion, this three-tier
architecture contains three layers
11. Bottom-Tier Layer
This is the bottom tier layer contain all of the database servers in the data warehouse which
relates to the relational database system. Data is extracted from external and operational
databases due to some backends tool. Then, feed data into the database of the warehouse
by performing ETL process (Extract, Transform, and Loading). Basically, we use ETL
process to load data into the bottom layer of the data warehouse architecture model. In
addition, the bottom layer contains several entities like as
Data warehouse
Data Marts
Metadata Repository
Monitoring & Administration
1. Data Warehouse
A data warehouse is a sophisticated form of operational database that contains only
required information and allow to get fast access to the information.
12. 2. Data Marts
Data Mart is an optimized version of the data warehouse which relies on a particular
subjected area. A data mart maintains access layer to collect the relevant information from
the organizational data store. Therefore, data marts only focus on specific business
architecture to build the warehouse in a large size that takes less time and generate
organization revenue. Basically, it divides the data into small portion like sales information,
customers information, and items information etc. In addition, some key points need to
remember of data mart as follows.
Data Marts are Windows, Unix and Linux based servers that create data marts. It also
contains low cost to create data marts.
Implementation of data marts cycles cannot be stored for a long time means it only valid for
weeks not for months and years.
Data marts are much smaller in size but more flexible. Data marts can customize the data
by their departments.
Because of the user understanding, data mart requires less implementation as compared to
the data warehouse. As a result, data mart can increase business process.
13. Types: Data Mart
• Data marts contain its three type as follows
• Dependent data mart directly sourced from
the database of a data warehouse.
• Independent data mart sourced with more
than one data warehouse.
• Hybrid data mart converts the data from data
warehouse to operational database.
So, let’s understand the data marts types in brief.
Dependent Data Mart
A dependent data mart access data source
directly from the single data warehouse
database. This type of data mart depends on
only enterprise data warehouse or centralized
data warehouse. You can access only one
database at a time. But, if you want more than
one then you have to individually configure them
is called dependent data mart.
14. Independent Data Mart
This type of data mart works without
the use of any centralized system or
data enterprise warehouse.
Independent data mart divides the
data into small portion within an
organization. Therefore, independent
data mart does not contain any
relation value with data enterprise
warehouse and other operational
data marts. Hence, it works
independently from the operational
system. For this reason, you need to
develop new centralized store where
an end-user can analyze the data
15. Hybrid Data Mart
A hybrid data mart is a mixture of
multiple inputs from the external
sources in the data warehouse.
Basically, it works on the ad-hoc
environment. It is designed to work
on more than a database and also
provide the fastest implementation
for any big organization. It requires
less cleaning process and supports
big database. Hybrid data mart
contains more flexibility than
dependent and independent data
marts for small data applications.
16. Implementation process of Data Mart
Data mart contains five implementation process as follows.
Designing
Constructing
Populating
Accessing
Managing
17. Designing
Designing is the 1st phase of implementation in the data mart, that covers all actions between the
initial request of data mart to collective data through the basic data mart requirements. It also
discovers both physical & logical model of the data mart. In addition, this phase contains some tasks
as follows.
Collect the required technical and business requirement.
It verifies the source of data and selects the proper subset of data.
It helps in making the physical and logical architecture of the data mart.
Constructing
Constructing is the 2nd phase of datamart, it creates the logical and physical architecture of the data
mart that provides fast access to data. In addition, this phase contains some tasks as follows.
An RDBMS stores the constructed data which contains the addition of the data & deletion of the data.
By using SQL query, you can perform several functions like indexes, views, and tables.
Constructed data remains secure by using SQL security libraries.
18. Populating
This is the third phase of data mart process, that covers all the steps which are used in above two
phase of data mart like getting the data from the external source, cleaning of the data and modifying of
the data. In addition, this phase contains some tasks as follows.
Select the target sources for getting the data.
Extraction of data from the sources.
Data cleaning and data transformation.
Loading and storing the data (metadata) into the database.
Accessing
This is the 4th phase of data mart implementation, this phase makes the data usable for data analysis,
querying the data, creating the chart, graphs, and reports to publish them into the external market for
business objectives. Furthermore, an end-user submits the set of queries and display the final result of
the queries.
Managing
This is the fifth last phase of data mart implementation, that covers all of the tasks which relate to
19. 3. Metadata Repository
Metadata is another form of simple data. A simple data which represent different data format
call as metadata. Metadata segregate the especially relevant information of the data. It also
provides the detailed information of the data. So, let’s assume an index of the notebook
consumes as a metadata for the internal content of the notebook. It also defines the
warehouse objects and roadmap in the data warehouse. Now, understand the metadata
repository, which is an internal part of data warehouse system. As a result, it contains some
metadata categories as follows.
Business Metadata: This type of metadata covers business information like business goals,
owner information, changing policies information and business definition etc.
Operational Metadata: It contains lineage of data and data currency. Data currency defines
the active mode of data, purged and archived. Data lineage refers to the data transformation
and migration rules.
Mapping of data from operating system to Data warehouse: This type of metadata contains
sources database, the content of the data sources, extraction of data, data partition,
cleaning of data, the transformation of data, purging of data and refreshing of data rules etc.
Summarization Algorithms: It contains data aggregation, dimension algorithms, data
20. 4. Monitoring & Administration
This element is basically used for data refreshment and disaster recovery. It set the
limitation in size of data warehouse and provide data in synchronization. Further, it controls
the ranges of query and numbers format. It helps to increase performance better of the
database.
Middle-Tier Layer
In the middle-tier, there is an OLAP server that works as a single processing combination of
ROLAP and MOLAP database systems. This OLAP server provides the multidimensional
way to get the data model. For this reason, analysts and managers can provide consistent
and fastest way to get the information.
ROLAP stands for Relational On-Line Analytical Process that manipulate the stored data in
the relational database. ROLAP model can easily maintain the large data. MOLAP stands
for Multidimensional OLAP system that maintains dynamic multidimensional operations and
the data. Data can be stored in the multidimensional cube. It provides fastest data retrieval
due to MOLAP cubes. The whole structure of middle layer is to represent the abstracted
view of the data warehouse. As a result, this layer contains common behavior between
database and end-user.
21. Components of OLAP Server
There are three components in OLAP system as follows.
Relational OLAP
Multidimensional OLAP
Hybrid OLAP
Relational OLAP
Basically, ROLAP works on the stored data in the relational database. ROLAP server defines the
relationship between front-end(client) tools and relational back-end tools. ROLAP server provides the
way to manage and store the data by using relational and extended DBMS. Therefore, this server can
easily handle a large number of data. But, it contains a drawback that it has some limited SQL
functionalities to maintain the database and aggregate tables.
Multidimensional OLAP
Multidimensional OLAP system handles the multidimensional data through array-based storage
systems. It also provides an optimal solution for dice and slice operations. MOLAP perform the fastest
access of the data in comparison with relational OLAP. It can also handle complex calculations. But, it
contains some drawbacks like data aggregation cannot perform while changing the dimension of
cubes. MOLAP maintains a small number of data.
22. Hybrid OLAP
Hybrid is a complete mixture of MOLAP & ROLAP servers. A hybrid
server provides the fastest accessibility than MOLAP & ROLAP
servers. It contains better scalability and computation. It also
maintains a big number of data (More than a terabyte).
Note: OLAP contains additional type as “Specialized SQL servers”,
that allows to execute advanced functions of SQL query and provide
better processing support than Snowflake and Star schemas.
So, let’s look at the difference between OLAP and OLTP. This is the
most important part in data warehouse, which could be asked in the
interviews for data warehousing profile.
23. Top-Tier Layer
Top-Tier is a front-end client-side layer in a three-tier model of the data warehouse. This
contains reporting & query tools, data mining & data analysis tools. Here, reporting tools
contain report writers & productive reporting tools. Analysis tools are also used to make
charts based on data mining result. Data mining tools define the especially relevant
information from the hidden patterns. Now, we have completed the three-architecture model
of the data warehouse.
Conclusion
I hope you guys enjoyed this tour of the data warehouse. A simple data warehouse is a hub where the
especially relevant information resides in the database. There are many databases for storing the
value but I explained the relational and extended database management system. Because they both
contain maximum data storage than other databases. This is a complete tour since definition to its
conclusion. In addition, this article will help to learn what exactly data warehouse is? And why is this
so important?
I also explained the whole concept of the data warehouse from the scratch which will help to learn for
freshers in their interviews and the exams. In the data warehouse architecture, metadata plays a vital
role as it species the values, the point of connection, source, features and usage of the data
warehouse. In addition, last but not the least the main topic of this article is the difference between
OLAP and OLTP. So, most of the companies who hire a candidate or employee for data warehousing
profile they must ask these differences.