1. Data Architecture Process in a Business Intelligence Environment
WHAT DO DATA ARCHITECTS IN A BUSINESS INTELLIGENCE(BI) ENVIRONMENT DO?
AUTHOR: SASHA CITINO, SENIOR CONSULTANT (DATA ARCHITECTURE)
PUBLISHED ON: SEPTEMBER 29TH, 2016
2. About the Author: Sasha Citino
9/29/2016Written and Published by Sasha Citino (Data Architect)
Sasha Citino has 15+ years of experience in the Information Technology
industry. Sasha got her start in IT as a VB 6 Developer but quickly
moved into the world of “Data”.
Sasha has 12+ years of experience in designing, developing and
implementing Data Warehouses in SQL Server and Oracle
environments.
Sasha has experience in Business Intelligence (Architecture and
Development) in multiple industries, such as Real Estate (commercial
and industrial), Telecommunications, Retail, Fast Food, Casino
Gaming, Supply Chain distribution and logistics, Healthcare and
Supplementary Insurance.
Sasha has been the lead Data Architect for multiple multi-million
dollar BI projects for the last 8 years. She thoroughly enjoys BI and
all of its components.
Contact Sasha: sccitino@yahoo.com
3. Agenda
What is Business Intelligence?
Data Warehouse (DW) vs Business Intelligence (BI)
What is Data Architecture?
Visual representation of Data Warehouse Architecture
Components of a Data Warehouse
What do Data Architects need to know in a BI environment
Data Architect Relationships in a BI environment
Key Architecture Process Roles of a Data Architect in a BI environment
Note on Data Architecture Standards
Step by Step Data Architecture Process in a BI (traditional) environment
4. What is BI?
BI Encompasses:
Tools, applications, methodologies for data
collection and transformation from a
variety of internal and external data
sources
Providing data analysis tools to end users
so they can analyze data ad hoc and
report on/present important business KPIs
(key performance indicators) via
dashboards, reports and other data
visualization tools
Providing avenues for external consumers
of data to extract data from a single,
stable, robust and dimensional data
repository
Business Intelligence, or BI, is a
technology-based process or
mechanism for analyzing and
presenting data in a format
that allows business users,
including executives,
managers and other users to
make informed business
decisions.
5. Data Warehouse (DW) vs Business Intelligence (BI)
So what really is a Data Warehouse?
A Data Warehouse is a large store of
data that is collected from multiple data
sources including but not limited to,
operational systems, financial systems, the
internet, and flat files.
A Data Warehouse is frequently known as
the central repository for a company’s
data.
The data in a Data Warehouse is extracted
from multiple data sources in raw form,
aligned to mature business processes and
then goes through a transformation phase
that uses best-practice DW methodologies
to turn the raw data into a format that allows
for simple, high-performance consumption
via data visualization tools,
ad hoc analytical tools and external
consumers.
After 15 years working in Business
Intelligence (starting in custom
application development, then
moving to report development,
ETL development and database
management in a DW
environment, supporting
multiple Data Warehouse
environments in a variety of
industries, and eventually
architecting dimensional Data
Warehouses), my professional
opinion is that a Data Warehouse is an
integral, necessary
“Component” of Business
Intelligence.
6. What is Data Architecture?
Defines rules, structures and policies to support business objectives
Mechanism for how data is governed, defined, stored and managed in a
Data Warehouse
Integrates data from multiple source systems within an organization
Allows for consumption of data by reporting tools, data visualization tools,
ad hoc analysis and external consumers.
7. Data Warehouse Architecture
[Figure: Data Warehouse architecture diagram]
Note: Image used from the Oracle Data Warehousing Concepts whitepaper
8. Data Architecture Components
Data Sources
Data is extracted from multiple
data sources. Data Sources can
be:
• Operational Systems
• ERP Systems
• CRM Systems
• Financial Systems
• Flat Files
• Internet
Data Warehouse
Data Warehouse has multiple
components:
• Data Staging Database
• Persistent Staging Database
(stores raw data historically)
• Metadata
• Summary/Aggregated data in
dimensional form
(dimensions/facts)
• Data Marts
• Data Architecture Modeling Tools
(e.g. Erwin, Embarcadero, R)
Consumers
Data Warehouse data is
consumed by a variety of Users:
• Data Analysts
• Report/Data Visualization
Developers/Users
• Data mining
• External consumers such as
other business applications
Data is Extracted, Transformed and Loaded to Target Objects using ETL tools/processes.
9. Data Architects in a BI Environment
Data Architects in a BI Environment should:
Understand the end-to-end vision of the BI project
Get business buy-in (without support, the success of the project is at risk)
Understand legacy systems and how systems relate
Understand business processes and how they translate to one or more dimensional models
Address data migration, cleansing and storage requirements/issues
Work closely with, and develop strong relationships with, project SMEs (subject matter
experts) and project teams throughout the BI project
Architect for the business process at the lowest grain (which allows for aggregation), staying
acutely aware of how time affects metrics, attributes and KPIs
Architect for flexibility, robustness and re-usability
ALWAYS verify concepts prior to transferring development to other teams (ETL, Reporting)
10. Data Architect Relationships in a BI environment
[Diagram: the Data Architect in relation to the following teams]
Performance / DBA Team
Report Developers
External Customers
ETL Developers
Business Analysts
Project SMEs
Data Analysts
Quality Assurance Testers
11. Data Architect Role
Data Profiling
“Data Investigation”
Integration Design
Aligning data from multiple systems and
sources
Dimensional Modeling
Structures data in conformed format for
faster reporting on large data volumes
Organize data for effective and efficient
analysis according to business processes
Define Data Architecture Standards
See next slide for note on Standards
Key Architecture Process
Roles of a Data Architect in a
BI environment
12. Note on Data Architecture Standards
Data Architecture standards may vary by
company or architect but they should
always include:
Consistent naming conventions for
tables (staging, dimension, fact, helper, cross
reference)
Consistent naming convention for fields
Consistent strategy and naming convention
for Indexes/Partitions
Clear definition of how Nulls in dimensions
and facts are handled
The data modeling tool in use/to be used
Clear definition of the data types that can be
used
Metadata requirements for tables (e.g.
insert_date, update_date, current_flg, source
system, effective_from_dt and
effective_to_dt) that should be present on
each data warehouse dimension/fact table.
It is critical to any data
warehouse environment to
have well defined and
consistent standards
surrounding naming
conventions, handling of nulls,
dimension/fact design
strategies, types of data
architect artifacts required
and the data modeling tool(s)
used.
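Standards like these are easier to enforce when they can be checked automatically. The following is a minimal sketch of such a check, not a prescribed implementation. The table prefixes (stg_, dim_, fact_, helper_, xref_), the column name source_system and the function check_table are assumptions made for illustration; the remaining column names come from the metadata examples on this slide.

```python
import re

# Required metadata columns, based on the examples on this slide.
# "source_system" is an assumed spelling of the slide's "source system".
REQUIRED_METADATA = {
    "insert_date", "update_date", "current_flg",
    "source_system", "effective_from_dt", "effective_to_dt",
}

# Hypothetical naming convention: a table-type prefix followed by snake_case.
TABLE_NAME_PATTERN = re.compile(r"^(stg|dim|fact|helper|xref)_[a-z][a-z0-9_]*$")


def check_table(name, columns):
    """Return a list of readable standards violations for one table."""
    problems = []
    if not TABLE_NAME_PATTERN.match(name):
        problems.append(f"{name}: name does not follow the prefix convention")
    # Only dimension and fact tables are required to carry the metadata columns.
    if name.startswith(("dim_", "fact_")):
        missing = sorted(REQUIRED_METADATA - set(columns))
        if missing:
            problems.append(f"{name}: missing metadata columns {missing}")
    return problems


if __name__ == "__main__":
    print(check_table("fact_sales", ["sales_amt", "insert_date"]))
    print(check_table("CustomerDim", ["id"]))
```

A check of this kind could run against the DDL produced by the modeling tool before a model is handed to the ETL team.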
13. Data Architecture Process
The Data Architecture Process, once matured, is repeatable, dependable,
effective and efficient and aligns to business processes.
Components of Data Architecture in a BI Environment:
Step 1 – Receive/Understand Requirements
Step 2 – Data Profiling
Step 3 – Conceptual Model Design
Step 4 – Logical Model Design
Step 5 – Physical Model Design (also known as the ERD, or entity relationship diagram)
Step 6 – ETL Mapping
Step 7 – Data Model Reviews
Step 8 – Metadata/Data Validation post development
14. Data Architecture Process - Requirements
Business Requirements for the business process to be architected can be
delivered to a data architect in multiple formats:
Through Business and/or User Requirement specifications for a new business process
and/or enhancement to an existing business process/data model
Through Source System SMEs, typically when upgrades to source system(s) affect the
Data Warehouse (new fields, changed fields, changed logic)
Through self-examination (the data architect reviews existing data models and identifies
new metrics/attributes that can be added to enhance the robustness of a data
model and provide added business value).
Through listening! It is extremely important for a Data Architect to be an excellent
listener. You may notice repeated statements from, for example, the reporting team
on aggregations/calculations/groupings that a seasoned data architect can identify
as an opportunity for improvement of the existing data model. While this may not
provide added business value, it may help in performance of the environment
and/or simplification of the DW environment for reporting.
15. Data Architecture Process – Data Profiling
What are you profiling?
• Select Business Process
• Decide on grain of data
• Identify
dimensions/dimensional
attributes
• Identify facts/metrics
Understand Metadata
• Analyze tables pertaining to
business process subject area
• Data Sources
• Table sizes
• Row counts
• Fields/columns
• Relationships
• Natural/Primary Keys
Generate Profiling Outputs
Upon completion of the data
profiling process, the following
outputs can be generated:
• Summary analysis of Metadata
• Source queries that relate
tables and select attributes
and metrics according to
filter/aggregation business
process criteria
• These source queries can also
be used to validate landed
data post ETL development
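The metadata summary described above (row counts, columns, candidate natural keys) can be sketched with a few queries. This is an illustrative example using Python's built-in sqlite3 module, not a description of a specific profiling tool; the orders table, its columns and the function profile_table are hypothetical.

```python
import sqlite3


def profile_table(conn, table):
    """Return the row count plus per-column null and distinct counts."""
    cur = conn.cursor()
    row_count = cur.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    columns = [row[1] for row in cur.execute(f'PRAGMA table_info("{table}")')]
    profile = {"table": table, "row_count": row_count, "columns": {}}
    for col in columns:
        nulls, distinct = cur.execute(
            f'SELECT SUM("{col}" IS NULL), COUNT(DISTINCT "{col}") FROM "{table}"'
        ).fetchone()
        profile["columns"][col] = {
            "nulls": nulls or 0,
            "distinct": distinct,
            # No nulls and all values distinct: a candidate natural key.
            "candidate_key": row_count > 0 and not nulls and distinct == row_count,
        }
    return profile


# Hypothetical source table with a few rows, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, ship_date TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10, "2016-09-01"), (2, 10, None), (3, 11, "2016-09-03")],
)
result = profile_table(conn, "orders")

if __name__ == "__main__":
    for column, stats in result["columns"].items():
        print(column, stats)
```

The same counts, saved alongside the source queries, give a baseline to compare against once the ETL team has landed the data.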
16. Data Architecture Process – Conceptual Model Design
During the Conceptual Model Design phase, the
Data Architect:
Creates a conceptual schema which is a high level
visual description of the business process
informational needs.
Identifies dimensions that relate to the business
process
Identifies at a high level the metrics/facts that relate
to the business process
Output: The conceptual model (example shown in the
accompanying picture)
The conceptual model can be used to
communicate with the business without too much
technical information
The conceptual model can also be used to update
the Bus Matrix (pivot of business processes and
what dimensions are used by each)
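As a small illustration of the Bus Matrix idea, the sketch below pivots a list of business processes and the dimensions each one uses into a matrix. The process and dimension names, and the helper functions, are hypothetical examples rather than content from the original slides.

```python
def build_bus_matrix(process_dimensions):
    """Pivot {process: [dimensions]} into {process: {dimension: used?}}."""
    all_dims = sorted({dim for dims in process_dimensions.values() for dim in dims})
    matrix = {
        process: {dim: (dim in dims) for dim in all_dims}
        for process, dims in process_dimensions.items()
    }
    return all_dims, matrix


def shared_dimensions(all_dims, matrix):
    """Dimensions used by every business process in the matrix."""
    return [dim for dim in all_dims if all(row[dim] for row in matrix.values())]


# Hypothetical business processes and the dimensions each one uses.
PROCESSES = {
    "Sales Orders": ["Date", "Customer", "Product"],
    "Shipments": ["Date", "Customer", "Warehouse"],
}

all_dims, matrix = build_bus_matrix(PROCESSES)

if __name__ == "__main__":
    print(" " * 14, " ".join(dim[:4].ljust(4) for dim in all_dims))
    for process, row in matrix.items():
        print(process.ljust(14), " ".join(("X" if row[d] else ".").ljust(4) for d in all_dims))
```

Dimensions that appear in every row (here Customer and Date) are natural candidates for conformed dimensions shared across models.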
17. Data Architecture Process – Logical Model Design
During the Logical Model Design
Phase, a Data Architect:
Identifies Data Metrics (typically in raw
form) that support the subject area.
Documents the relationships between
metrics and dimensions.
Identifies all fields needed for subject
area and their metadata attributes
Output: The Logical Data Model
An example of a fact table logical
design can be seen in the picture shown.
18. Data Architecture Process – Physical Model Design
Select Modeling Tool
The physical data modeling for a
business process is typically
completed using a Data Modeling
tool.
Examples of Data modeling Tools:
• Erwin Data Modeler
• Embarcadero
• R
• Visio
**There are many tools; the choice depends
on your company’s preference.
Create Dimensional Model
• Create the Entity Relationship
Diagram for Dimensional Model;
• Create Dimension Tables/Fact
table(s)
• Define Physical properties for each
Dimension Attribute and Fact
metric. Physical properties are:
• Data Type,
• Data Length /Scale/Precision
• Relationships,
• Indexes,
• Storage Schemas
Output of Physical Model
Once the Physical model has
been created using a modeling
tool, the following artifacts are
produced:
• ERD (entity relationship
diagram)
• DDL (Data Definition
Language) for each
dimension/fact table
• The DDL statements are used to create the
physical tables on the
database
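To make the DDL artifact concrete, here is a minimal sketch of what generated DDL for one dimension and one fact table might look like, executed against an in-memory SQLite database. The table names, column names and data types are assumptions for illustration; a real modeling tool would emit DDL in the target platform's dialect (for example SQL Server or Oracle).

```python
import sqlite3

# Hypothetical star-schema DDL: one dimension, one fact table and one index.
DDL = """
CREATE TABLE dim_customer (
    customer_key      INTEGER PRIMARY KEY,   -- surrogate key
    customer_id       TEXT    NOT NULL,      -- natural key from the source system
    customer_name     TEXT    NOT NULL,
    effective_from_dt TEXT    NOT NULL,
    effective_to_dt   TEXT,
    current_flg       INTEGER NOT NULL DEFAULT 1
);

CREATE TABLE fact_sales (
    date_key     INTEGER NOT NULL,
    customer_key INTEGER NOT NULL REFERENCES dim_customer (customer_key),
    quantity     INTEGER NOT NULL,
    sales_amount NUMERIC NOT NULL
);

CREATE INDEX ix_fact_sales_customer_key ON fact_sales (customer_key);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)

# List what was physically created, as a quick sanity check of the DDL.
tables = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
indexes = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
)

if __name__ == "__main__":
    print(tables, indexes)
```

The physical properties listed above (data type, length, relationships, indexes) each map to a clause in statements like these.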
19. Data Architecture Process – ETL Mapping
What is ETL Mapping?
• ETL means Extract, Transform, Load.
This is the mechanism by which
data is extracted from source
systems, transformed according to
business requirements and then
loaded to target dimension and
fact tables in the Data Warehouse.
• During the ETL mapping phase, the
Data Architect identifies the
rules/business logic that ETL
Developers need to accurately Extract,
Transform and Load data to the
defined dimensions and facts.
• The ETL Mapping document is
critical to the ETL Team’s
ability to develop
the processes that populate data.
ETL Mapping Content
The Data Architect creates an ETL mapping
template to:
• Identify Source Systems, source tables,
source fields
• Identify Target Tables /fields
• Define the Type of DW Table (fact/
dimension)
• Define/Identify Grouping Logic, Filters,
Column Order/Type and Data
Type/Length/Precision/Scale
• Define Transformation Logic (rules)
• Define Default values for Null attributes,
keys, metrics
• Provide Source Queries that give ETL Developers
insight into the data they are working with.
ETL Mapping Outputs
• ETL Mapping Document
• Metadata for the Data
Warehouse environment
• Data Dictionary (Note: this is
not always done by the data
architect but rather a member
of the Data Governance
team)
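One way to picture an ETL mapping is as a list of source-to-target rules, each with an optional transformation and a default value for nulls. The sketch below is an assumption-laden illustration: the FieldMapping class, the apply_mapping function and all field names are hypothetical, and a real mapping document would usually live in a spreadsheet or the ETL tool itself.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class FieldMapping:
    source_field: str
    target_field: str
    transform: Optional[Callable[[Any], Any]] = None  # transformation rule
    default: Any = None                               # used when the source value is null


def apply_mapping(row, mappings):
    """Build one target row from one source row using the mapping rules."""
    target = {}
    for m in mappings:
        value = row.get(m.source_field)
        if value is None:
            value = m.default            # default value for null attributes
        elif m.transform is not None:
            value = m.transform(value)   # transformation logic
        target[m.target_field] = value
    return target


# Hypothetical mapping from a source customer table to a customer dimension.
MAPPINGS = [
    FieldMapping("cust_no", "customer_id", transform=str.strip, default="UNKNOWN"),
    FieldMapping("cust_nm", "customer_name", transform=str.title, default="Not Provided"),
    FieldMapping("credit_lim", "credit_limit_amt", transform=float, default=0.0),
]

loaded = apply_mapping(
    {"cust_no": " C001 ", "cust_nm": "acme corp", "credit_lim": None}, MAPPINGS
)

if __name__ == "__main__":
    print(loaded)
```

Writing the rules down in this explicit source, target, rule, default form is what lets ETL Developers implement them without guessing.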
20. Data Architecture Process – Data Model Review
In a mature BI environment, the Data
Architect conducts Data Model Reviews
with ETL Developers, Report Developers
and possibly Business Analysts to:
Ensure data model meets business
requirements
Provide ETL Developers with the overview
of the business process/subject area.
Review the ETL logic with Developers to
ensure they understand what needs to be
done
Provide Report Developers with an
overview of the data model giving them
insight into the data they will soon report
on
Prior to Hand-Off to the
Development Teams, the
Data Architect will perform
Data Model Review(s) to
ensure everyone is on the
same page and understands
the tasks that need to be
completed and/or what data
will become available for
consumption by the data
visualizers (reporting team).
21. Data Architecture Process – Metadata Validation
Metadata Validation by the Data Architect
involves:
Checking data for consistency and
completeness
Checking for duplicates;
Verifying row grain uniqueness/natural keys;
Verifying data formatting;
Verifying that landed row counts and data match the
expected row counts and data from the source
queries (data profiling step)
Verifying that there are no orphaned or
null surrogate keys
Note: If validation fails, the DA will work with
the ETL team to resolve the issues. If validation passes, the
DA will notify and hand off to the Reporting Team.
Upon completion of
development work by the ETL
Team, the Data Architect
reviews the data landed to
the target dimension/fact
table(s) to ensure that it
complies with the rules
defined in the ETL mapping
document
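The validation checks listed on this slide translate naturally into queries against the landed tables. The following is a minimal sketch using Python's built-in sqlite3 module; the tables fact_sales and dim_customer, the grain column order_id and the function validate_fact are hypothetical, and the sample data is deliberately seeded with one problem of each kind.

```python
import sqlite3


def validate_fact(conn, expected_row_count):
    """Run post-load checks on the hypothetical fact_sales table; return failures."""
    checks = {
        "row count matches source query": (
            "SELECT COUNT(*) FROM fact_sales", expected_row_count),
        "no duplicate rows at the declared grain": (
            "SELECT COUNT(*) FROM (SELECT order_id FROM fact_sales "
            "GROUP BY order_id HAVING COUNT(*) > 1)", 0),
        "no null surrogate keys": (
            "SELECT COUNT(*) FROM fact_sales WHERE customer_key IS NULL", 0),
        "no orphaned surrogate keys": (
            "SELECT COUNT(*) FROM fact_sales f LEFT JOIN dim_customer d "
            "ON f.customer_key = d.customer_key "
            "WHERE f.customer_key IS NOT NULL AND d.customer_key IS NULL", 0),
    }
    failures = []
    for name, (sql, expected) in checks.items():
        actual = conn.execute(sql).fetchone()[0]
        if actual != expected:
            failures.append(f"{name}: expected {expected}, got {actual}")
    return failures


# Sample data containing a duplicate, an orphaned key and a null key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY);
CREATE TABLE fact_sales (order_id INTEGER, customer_key INTEGER, sales_amount REAL);
INSERT INTO dim_customer VALUES (1), (2);
INSERT INTO fact_sales VALUES (100, 1, 10.0), (101, 2, 5.0), (101, 2, 5.0),
                              (102, 3, 7.5), (103, NULL, 1.0);
""")
failures = validate_fact(conn, expected_row_count=4)

if __name__ == "__main__":
    for failure in failures:
        print(failure)
```

In practice, the expected row count would come from the source queries written during data profiling, so the same queries serve both steps.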
22. Data Architecture Process – Wrap up
Upon completion of all data architecture process steps, ETL development
and successful Metadata Validation, the Data Architecture process is
complete for this business process/enhancement. The Data Architect will
continue to be a resource to the Reporting, Quality Assurance and
Performance teams as needed.