Data Architecture Process in a Business Intelligence Environment
WHAT DO DATA ARCHITECTS IN A BUSINESS INTELLIGENCE (BI) ENVIRONMENT DO?
AUTHOR: SASHA CITINO, SENIOR CONSULTANT (DATA ARCHITECTURE)
PUBLISHED ON: SEPTEMBER 29TH, 2016
About the Author: Sasha Citino
9/29/2016 Written and Published by Sasha Citino (Data Architect)
• Sasha Citino has 15+ years of experience in the Information Technology industry. Sasha got her start in IT as a VB 6 developer but quickly moved into the world of “Data”.
• Sasha has 12+ years of experience in designing, developing and implementing Data Warehouses in SQL Server and Oracle environments.
• Sasha has experience in Business Intelligence (architecture and development) in multiple industries, such as Real Estate (commercial and industrial), Telecommunications, Retail, Fast Food, Casino Gaming, Supply Chain distribution and logistics, Healthcare and Supplementary Insurance.
• Sasha has been the lead Data Architect for multiple multi-million dollar BI projects for the last 8 years. She thoroughly enjoys BI and all of its components.
• Contact Sasha: sccitino@yahoo.com
Agenda
• What is Business Intelligence?
• Data Warehouse (DW) vs Business Intelligence (BI)
• What is Data Architecture?
• Visual representation of Data Warehouse Architecture
• Components of a Data Warehouse
• What do Data Architects need to know in a BI environment
• Data Architect Relationships in a BI environment
• Key Architecture Process Roles of a Data Architect in a BI environment
• Note on Data Architecture Standards
• Step by Step Data Architecture Process in a BI (traditional) environment
What is BI?
• BI encompasses:
  • Tools, applications and methodologies for collecting and transforming data from a variety of internal and external data sources
  • Data analysis tools that let end users analyze data ad hoc and report on or present important business KPIs (key performance indicators) via dashboards, reports and other data visualization tools
  • Avenues for external consumers of data to extract data from a single, stable, robust and dimensional data repository
Business Intelligence, or BI, is a technology-based process or mechanism for analyzing and presenting data in a format that allows business users, including executives, managers and other users, to make informed business decisions.
Data Warehouse (DW) vs Business Intelligence (BI)
• So what really is a Data Warehouse?
  • A Data Warehouse is a large store of data collected from multiple data sources including, but not limited to, operational systems, financial systems, the internet and flat files.
  • A Data Warehouse is frequently known as the central repository for a company’s data.
  • The data in a Data Warehouse is extracted from multiple data sources in raw form, aligned to mature business processes, and then put through a transformation phase that uses best-practice DW methodologies to turn the raw data into a format that allows simple, high-performance consumption via data visualization tools, ad hoc analytical tools and external consumers.
After 15 years working in Business Intelligence, starting in custom application development, moving to report development, ETL development and database management in a DW environment, supporting multiple Data Warehouse environments in a variety of industries and eventually architecting dimensional Data Warehouses, my professional opinion is that a Data Warehouse is an integral, necessary “component” of Business Intelligence.
What is Data Architecture?
• Defines rules, structures and policies to support business objectives
• Provides the mechanism for how data is governed, defined, stored and managed in a Data Warehouse
• Integrates data from multiple source systems within an organization
• Allows for consumption of data by reporting tools, data visualization tools, ad hoc analysis and external consumers
Data Warehouse Architecture
Note: Image used from Oracle Data Warehousing Concepts whitepaper
Data Architecture Components
Data Sources
Data is extracted from multiple data sources. Data sources can be:
• Operational Systems
• ERP Systems
• CRM Systems
• Financial Systems
• Flat Files
• Internet
Data Warehouse
A Data Warehouse has multiple components:
• Data Staging Database
• Persistent Staging Database (stores raw data historically)
• Metadata
• Summary/Aggregated data in dimensional form (dimensions/facts)
• Data Marts
• Data Architecture Modeling Tools (e.g. Erwin, Embarcadero, R)
Consumers
Data Warehouse data is consumed by a variety of users:
• Data Analysts
• Report/Data Visualization Developers/Users
• Data mining
• External consumers such as other business applications
Data is Extracted, Transformed and Loaded to target objects using ETL tools/processes.
Data Architects in a BI Environment
• Data Architects in a BI environment should:
  • Understand the end-to-end vision of the BI project
  • Get business buy-in (without support, the success of the project is at risk)
  • Understand legacy systems and how systems relate
  • Understand business processes and how they translate to one or more dimensional models
  • Address data migration, cleansing and storage requirements/issues
  • Work closely with, and develop strong relationships with, project SMEs (subject matter experts) and project teams throughout the BI project
  • Architect for the business process at the lowest grain, allowing for aggregation, and stay acutely aware of how time affects metrics, attributes and KPIs
  • Architect for flexibility, robustness and re-usability
  • Always verify concepts before transferring development to other teams (ETL, Reporting)
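The advice to architect at the lowest grain can be illustrated with a small sketch. The table and column names (`fact_sales`, `sale_date`, `store`, `amount`) and the sample rows are hypothetical and not taken from the deck. The idea shown is only that a fact stored at one row per transaction can be rolled up to any coarser level, whereas a table stored already aggregated could not be drilled back down.

```python
import sqlite3

# Hypothetical fact table at the lowest grain: one row per sales transaction.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (sale_date TEXT, store TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [
        ("2016-09-01", "A", 10.0),
        ("2016-09-01", "B", 5.0),
        ("2016-09-02", "A", 7.5),
        ("2016-10-01", "A", 2.5),
    ],
)


def rollup(group_expr: str) -> dict:
    """Aggregate the transaction-grain fact to a coarser level."""
    # group_expr is a trusted, hard-coded SQL expression in this sketch.
    rows = conn.execute(
        f"SELECT {group_expr} AS k, SUM(amount) FROM fact_sales GROUP BY k ORDER BY k"
    )
    return dict(rows.fetchall())


by_day = rollup("sale_date")                  # daily totals
by_month = rollup("substr(sale_date, 1, 7)")  # monthly totals from the same rows
by_store = rollup("store")                    # totals per store
```

Because the date is kept on every row, the same fact answers daily, monthly and per-store questions without a separate load for each.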
Data Architect Relationships in a BI environment
(Diagram) The Data Architect works with:
• Performance / DBA Team
• Report Developers
• External Customers
• ETL Developers
• Business Analysts
• Project SMEs
• Data Analysts
• Quality Assurance Testers
Key Architecture Process Roles of a Data Architect in a BI environment
Data Architect Role
• Data Profiling
  • “Data Investigation”
• Integration Design
  • Aligning data from multiple systems and sources
• Dimensional Modeling
  • Structures data in a conformed format for faster reporting on large data volumes
  • Organizes data for effective and efficient analysis according to business processes
• Define Data Architecture Standards
  • See next slide for note on Standards
Note on Data Architecture Standards
• Data Architecture standards may vary by company or architect, but they should always include:
  • Consistent naming conventions for tables (staging, dimension, fact, helper, cross reference)
  • A consistent naming convention for fields
  • A consistent strategy and naming convention for indexes/partitions
  • A clear definition of how nulls in dimensions and facts are handled
  • The data modeling tool in use/to be used
  • A clear definition of the data types that can be used
  • Metadata requirements for tables (e.g. insert_date, update_date, current_flg, source system, effective_from_dt and effective_to_dt) that should be present on each data warehouse dimension/fact table
It is critical to any data warehouse environment to have well-defined and consistent standards surrounding naming conventions, handling of nulls, dimension/fact design strategies, types of data architect artifacts required and the data modeling tool(s) used.
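Standards like these are easier to keep consistent when they can be checked mechanically. The following is a minimal sketch under stated assumptions: the table prefixes (`stg_`, `dim_`, `fact_`, `helper_`, `xref_`), the lower snake_case rule and the column spelling `source_system` are hypothetical conventions chosen for illustration. Only the list of metadata columns comes from the slide.

```python
import re

# Metadata columns listed on the slide; "source_system" is an assumed spelling.
REQUIRED_METADATA = {
    "insert_date", "update_date", "current_flg",
    "source_system", "effective_from_dt", "effective_to_dt",
}
# Hypothetical naming convention: a table-type prefix plus lower snake_case.
TABLE_NAME = re.compile(r"^(stg|dim|fact|helper|xref)_[a-z][a-z0-9_]*$")
COLUMN_NAME = re.compile(r"^[a-z][a-z0-9_]*$")


def check_table(name: str, columns: list[str]) -> list[str]:
    """Return a list of standards violations for one table definition."""
    problems = []
    if not TABLE_NAME.match(name):
        problems.append(f"table name '{name}' breaks the naming convention")
    for col in columns:
        if not COLUMN_NAME.match(col):
            problems.append(f"column '{col}' breaks the naming convention")
    # The slide requires the metadata columns on dimension and fact tables.
    if name.startswith(("dim_", "fact_")):
        for missing in sorted(REQUIRED_METADATA - set(columns)):
            problems.append(f"missing metadata column '{missing}'")
    return problems


ok = check_table("dim_customer", ["customer_key", *sorted(REQUIRED_METADATA)])
bad = check_table("fact_sales", ["sales_key", "insert_date"])
```

Here `ok` is an empty list, while `bad` reports the five metadata columns that `fact_sales` lacks. A check of this kind could run against the modeling tool's exported DDL before a model review.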
Data Architecture Process
• The Data Architecture Process, once matured, is repeatable, dependable, effective and efficient, and aligns to business processes.
• Components of Data Architecture in a BI Environment:
  • Step 1 – Receive/Understand Requirements
  • Step 2 – Data Profiling
  • Step 3 – Conceptual Model Design
  • Step 4 – Logical Model Design
  • Step 5 – Physical Model Design (also known as the ERD, entity relationship diagram)
  • Step 6 – ETL Mapping
  • Step 7 – Data Model Reviews
  • Step 8 – Metadata/Data Validation post development
Data Architecture Process – Requirements
• Business requirements for the business process to be architected can reach a data architect in several ways:
  • Through Business and/or User Requirement specifications for a new business process and/or an enhancement to an existing business process/data model
  • Through source system SMEs, typically when upgrades to source system(s) affect the Data Warehouse (new fields, changed fields, changed logic)
  • Through self-examination (the data architect reviews existing data models and identifies new metrics/attributes that can be added to make a data model more robust and provide added business value)
  • Through listening! It is extremely important for a Data Architect to be an excellent listener. You may notice repeated statements from, for example, the reporting team about aggregations/calculations/groupings that a seasoned data architect can identify as an opportunity to improve the existing data model. While this may not add business value, it may improve the performance of the environment and/or simplify the DW environment for reporting.
Data Architecture Process – Data Profiling
What are you profiling?
• Select the business process
• Decide on the grain of the data
• Identify dimensions/dimensional attributes
• Identify facts/metrics
Understand Metadata
• Analyze tables pertaining to the business process subject area
• Data sources
• Table sizes
• Row counts
• Fields/columns
• Relationships
• Natural/primary keys
Generate Profiling Outputs
Upon completion of the data profiling process, the following outputs can be generated:
• Summary analysis of metadata
• Source queries that relate tables and select attributes and metrics according to the filter/aggregation criteria of the business process
• These source queries can also be used to validate landed data post ETL development
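A minimal sketch of the metadata side of profiling follows, assuming the source table is reachable through a SQL connection. It uses Python's built-in `sqlite3` module as a stand-in for a real source system; the `orders` table, its columns and the helper `profile_table` are hypothetical names introduced for illustration.

```python
import sqlite3


def profile_table(conn: sqlite3.Connection, table: str, key_columns: list[str]) -> dict:
    """Collect basic profiling metadata for one source table."""
    # Table and column names are interpolated directly, so only pass trusted names.
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    row_count = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    per_column = {}
    for col in columns:
        nulls, distinct = conn.execute(
            f"SELECT SUM({col} IS NULL), COUNT(DISTINCT {col}) FROM {table}"
        ).fetchone()
        per_column[col] = {"nulls": nulls or 0, "distinct": distinct}
    # A candidate natural key is unique when no key combination repeats.
    key_list = ", ".join(key_columns)
    duplicate_keys = conn.execute(
        f"SELECT COUNT(*) FROM (SELECT 1 FROM {table} "
        f"GROUP BY {key_list} HAVING COUNT(*) > 1)"
    ).fetchone()[0]
    return {
        "row_count": row_count,
        "columns": per_column,
        "key_is_unique": duplicate_keys == 0,
    }


# Hypothetical source table used only to exercise the function.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "acme", 10.0), (2, "acme", None), (3, None, 4.0)],
)
profile = profile_table(conn, "orders", ["order_id"])
```

The resulting dictionary covers row counts, per-column null and distinct counts, and whether the proposed natural key is unique. The same queries can be kept and rerun against landed data after ETL development.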
Data Architecture Process – Conceptual Model Design
• During the Conceptual Model Design phase, the Data Architect:
  • Creates a conceptual schema, which is a high-level visual description of the informational needs of the business process
  • Identifies dimensions that relate to the business process
  • Identifies at a high level the metrics/facts that relate to the business process
• Output: the conceptual model (an example appears as a picture on the original slide)
• The conceptual model can be used to communicate with the business without too much technical information
• The conceptual model can also be used to update the Bus Matrix (a pivot of business processes and the dimensions each one uses)
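The Bus Matrix mentioned above is simple to derive once each business process has its list of dimensions. A small sketch, with entirely hypothetical process and dimension names:

```python
# Hypothetical business processes and the dimensions each one uses.
process_dimensions = {
    "Sales": ["Date", "Customer", "Product", "Store"],
    "Inventory": ["Date", "Product", "Store"],
    "Returns": ["Date", "Customer", "Product"],
}


def bus_matrix(mapping: dict[str, list[str]]) -> list[list[str]]:
    """Pivot process -> dimensions into rows of 'X' marks, one row per process."""
    dimensions = sorted({d for dims in mapping.values() for d in dims})
    header = ["Process", *dimensions]
    rows = [
        [process, *("X" if d in dims else "" for d in dimensions)]
        for process, dims in mapping.items()
    ]
    return [header, *rows]


matrix = bus_matrix(process_dimensions)
for row in matrix:
    # Pad each cell so the columns line up when printed.
    print(" | ".join(cell.ljust(9) for cell in row))
```

Columns marked for several processes point to dimensions that need to be conformed across those processes.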
Data Architecture Process – Logical Model Design
• During the Logical Model Design phase, a Data Architect:
  • Identifies data metrics (typically in raw form) that support the subject area
  • Documents the relationships between metrics and dimensions
  • Identifies all fields needed for the subject area and their metadata attributes
• Output: the Logical Data Model
• An example of a fact table logical design is shown in the picture on the original slide
Data Architecture Process – Physical Model Design
Select Modeling Tool
The physical data modeling for a business process is typically completed using a data modeling tool.
Examples of data modeling tools:
• Erwin Data Modeler
• Embarcadero
• R
• Visio
Note: there are many tools; the choice depends on your company’s preference.
Create Dimensional Model
• Create the Entity Relationship Diagram for the dimensional model
• Create dimension tables/fact table(s)
• Define physical properties for each dimension attribute and fact metric. Physical properties are:
  • Data type
  • Data length/scale/precision
  • Relationships
  • Indexes
  • Storage schemas
Output of Physical Model
Once the physical model has been created using a modeling tool, the following artifacts are produced:
• ERD (entity relationship diagram)
• DDL (Data Definition Language) for each dimension/fact table
• The DDL is used to create the physical tables on the database
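To make the DDL artifact concrete, here is a sketch of what such output might look like for one dimension and one fact table, executed against an in-memory SQLite database through Python's `sqlite3` module. The table, column and index names are hypothetical. Real modeling tools generate DDL in the target platform's dialect (SQL Server, Oracle and so on), and SQLite itself accepts but does not enforce the declared lengths and precision.

```python
import sqlite3

# Hypothetical DDL of the kind a modeling tool might generate.
DDL = """
CREATE TABLE dim_product (
    product_key   INTEGER PRIMARY KEY,      -- surrogate key
    product_code  VARCHAR(20) NOT NULL,     -- natural key from the source system
    product_name  VARCHAR(100) NOT NULL
);
CREATE TABLE fact_sales (
    date_key      INTEGER NOT NULL,
    product_key   INTEGER NOT NULL REFERENCES dim_product (product_key),
    quantity      INTEGER NOT NULL,
    sales_amount  DECIMAL(12, 2) NOT NULL   -- precision 12, scale 2
);
CREATE INDEX ix_fact_sales_product ON fact_sales (product_key);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)

# List the physical objects the DDL created.
created = [
    (kind, name)
    for kind, name in conn.execute(
        "SELECT type, name FROM sqlite_master ORDER BY type, name"
    )
]
```

The script shows each physical property from the list above in one place: data types, length and precision, a relationship from fact to dimension, and an index on the fact's foreign key.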
Data Architecture Process – ETL Mapping
What is ETL Mapping?
• ETL means Extract, Transform, Load. This is the mechanism by which data is extracted from source systems, transformed according to business requirements and then loaded to target dimension and fact tables in the Data Warehouse.
• During the ETL mapping phase, the Data Architect identifies the rules/business logic that ETL Developers need in order to accurately extract, transform and load data to the defined dimensions and facts.
• The ETL Mapping document is absolutely critical to the ETL Team’s ability to develop the processes that populate data.
ETL Mapping Content
The Data Architect creates an ETL mapping template to:
• Identify source systems, source tables and source fields
• Identify target tables/fields
• Define the type of DW table (fact/dimension)
• Define/identify grouping logic, filters, column order/type and data type/length/precision/scale
• Define transformation logic (rules)
• Define default values for null attributes, keys and metrics
• Provide source queries that give ETL Developers insight into the data they are working with
ETL Mapping Outputs
• ETL Mapping Document
• Metadata for the Data Warehouse environment
• Data Dictionary (Note: this is not always done by the data architect but rather by a member of the Data Governance team)
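The mapping content described above can be pictured as rows of source field, target field, transformation rule and null default. The sketch below encodes a few such rows and applies them to one source record. The class `FieldMapping`, the field names and the rules are all hypothetical; an actual mapping document is usually a spreadsheet or tool export that ETL developers implement in their own tooling.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class FieldMapping:
    """One row of a hypothetical ETL mapping document."""
    source_field: str
    target_field: str
    transform: Callable[[Any], Any]  # transformation rule
    default: Any = None              # value used when the source is null


# Illustrative mapping from a source customer table to dim_customer.
DIM_CUSTOMER_MAPPING = [
    FieldMapping("CUST_NO", "customer_code", str.strip),
    FieldMapping("CUST_NAME", "customer_name", lambda v: v.strip().title(), "Unknown"),
    FieldMapping("REGION_CD", "region_code", str.upper, "N/A"),
]


def apply_mapping(source_row: dict, mapping: list[FieldMapping]) -> dict:
    """Build one target row by applying each mapping rule to a source row."""
    target = {}
    for rule in mapping:
        value = source_row.get(rule.source_field)
        # Null source values take the documented default instead of the transform.
        target[rule.target_field] = (
            rule.default if value is None else rule.transform(value)
        )
    return target


row = apply_mapping(
    {"CUST_NO": " 0042 ", "CUST_NAME": "  acme corp ", "REGION_CD": None},
    DIM_CUSTOMER_MAPPING,
)
```

Keeping the null default next to each rule mirrors the standards requirement that null handling be defined explicitly rather than left to each developer.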
Data Architecture Process – Data Model Review
• In a mature BI environment, the Data Architect conducts Data Model Reviews with ETL Developers, Report Developers and possibly Business Analysts to:
  • Ensure the data model meets business requirements
  • Provide ETL Developers with an overview of the business process/subject area
  • Review the ETL logic with developers to ensure they understand what needs to be done
  • Provide Report Developers with an overview of the data model, giving them insight into the data they will soon report on
Prior to hand-off to the development teams, the Data Architect will perform Data Model Review(s) to ensure everyone is on the same page and understands the tasks that need to be completed and/or what data will become available for consumption by the data visualizers (reporting team).
Data Architecture Process – Metadata Validation
• Metadata Validation by the Data Architect involves:
  • Checking data for consistency and completeness
  • Checking for duplicates
  • Verifying row grain uniqueness/natural keys
  • Verifying data formatting
  • Verifying that row counts and data match the expected row counts and data from the source queries (data profiling step)
  • Verifying that there are no orphaned or null surrogate keys and that landed data matches the expected source query results
• Note: If validation fails, the Data Architect (DA) will work with the ETL team to resolve the issues. If validation passes, the DA will notify and hand off to the Reporting Team.
Upon completion of development work by the ETL Team, the Data Architect reviews the data landed to the target dimension/fact table(s) to ensure that it complies with the rules defined in the ETL mapping document.
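Several of these checks reduce to short SQL queries. The following sketch runs them with Python's `sqlite3` module against a tiny, deliberately flawed sample load. The tables `dim_product` and `fact_sales`, the choice of `order_id` as the row grain and the expected row count are hypothetical values used only to make the checks fail visibly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_code TEXT);
CREATE TABLE fact_sales (order_id INTEGER, product_key INTEGER, amount REAL);
INSERT INTO dim_product VALUES (1, 'P-1'), (2, 'P-2');
INSERT INTO fact_sales VALUES
    (100, 1, 9.5), (101, 2, 3.0), (101, 2, 3.0), (102, 7, 1.0), (103, NULL, 4.0);
""")


def scalar(sql: str) -> int:
    """Run a query that returns a single number."""
    return conn.execute(sql).fetchone()[0]


def validate(expected_rows: int) -> dict:
    """Run hypothetical post-load checks on fact_sales and report each result."""
    return {
        # Row grain: order_id should identify exactly one fact row.
        "duplicate_grain_rows": scalar(
            "SELECT COUNT(*) FROM (SELECT order_id FROM fact_sales "
            "GROUP BY order_id HAVING COUNT(*) > 1)"
        ),
        # Orphans: fact keys with no matching dimension row.
        "orphaned_keys": scalar(
            "SELECT COUNT(*) FROM fact_sales f LEFT JOIN dim_product d "
            "ON f.product_key = d.product_key "
            "WHERE f.product_key IS NOT NULL AND d.product_key IS NULL"
        ),
        "null_keys": scalar(
            "SELECT COUNT(*) FROM fact_sales WHERE product_key IS NULL"
        ),
        # Compare landed rows with the count expected from the profiling source query.
        "row_count_matches": scalar("SELECT COUNT(*) FROM fact_sales") == expected_rows,
    }


report = validate(expected_rows=4)
```

In this sample every check flags a problem: one duplicated grain value, one orphaned key, one null key and a row count that differs from the expected four. In practice the expected count would come from the source queries saved during data profiling.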
Data Architecture Process – Wrap up
Upon completion of all data architecture process steps, ETL development and successful Metadata Validation, the Data Architecture process is complete for this business process/enhancement. The Data Architect will continue to be a resource to the Reporting, Quality Assurance and Performance teams as needed.
9/29/2016Written and Published by Sasha Citino (Data Architect)
22

Weitere ähnliche Inhalte

Was ist angesagt?

The Key to Big Data Modeling: Collaboration
The Key to Big Data Modeling: CollaborationThe Key to Big Data Modeling: Collaboration
The Key to Big Data Modeling: Collaboration
Embarcadero Technologies
 
IRMAC April 2015 - DMBOK2 DWBI New Content
IRMAC April 2015 - DMBOK2 DWBI New ContentIRMAC April 2015 - DMBOK2 DWBI New Content
IRMAC April 2015 - DMBOK2 DWBI New Content
Martin Sykora
 

Was ist angesagt? (20)

Slides: Enterprise Architecture vs. Data Architecture
Slides: Enterprise Architecture vs. Data ArchitectureSlides: Enterprise Architecture vs. Data Architecture
Slides: Enterprise Architecture vs. Data Architecture
 
Building a Collaborative Data Architecture
Building a Collaborative Data ArchitectureBuilding a Collaborative Data Architecture
Building a Collaborative Data Architecture
 
Data-Ed Online: Trends in Data Modeling
Data-Ed Online: Trends in Data ModelingData-Ed Online: Trends in Data Modeling
Data-Ed Online: Trends in Data Modeling
 
Data Systems Integration & Business Value Pt. 1: Metadata
Data Systems Integration & Business Value Pt. 1: MetadataData Systems Integration & Business Value Pt. 1: Metadata
Data Systems Integration & Business Value Pt. 1: Metadata
 
Enterprise Data Architect Job Description
Enterprise Data Architect Job DescriptionEnterprise Data Architect Job Description
Enterprise Data Architect Job Description
 
Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?
 
Emerging Trends in Data Architecture – What’s the Next Big Thing
Emerging Trends in Data Architecture – What’s the Next Big ThingEmerging Trends in Data Architecture – What’s the Next Big Thing
Emerging Trends in Data Architecture – What’s the Next Big Thing
 
Slides: How AI Makes Analytics More Human
Slides: How AI Makes Analytics More HumanSlides: How AI Makes Analytics More Human
Slides: How AI Makes Analytics More Human
 
HP Vertica Architecture Gives Massive Performance Boost to Toughest BI Querie...
HP Vertica Architecture Gives Massive Performance Boost to Toughest BI Querie...HP Vertica Architecture Gives Massive Performance Boost to Toughest BI Querie...
HP Vertica Architecture Gives Massive Performance Boost to Toughest BI Querie...
 
Data-Ed Slides: Data Modeling Strategies - Getting Your Data Ready for the Ca...
Data-Ed Slides: Data Modeling Strategies - Getting Your Data Ready for the Ca...Data-Ed Slides: Data Modeling Strategies - Getting Your Data Ready for the Ca...
Data-Ed Slides: Data Modeling Strategies - Getting Your Data Ready for the Ca...
 
Lean Modeling for Any Methodology
Lean Modeling for Any MethodologyLean Modeling for Any Methodology
Lean Modeling for Any Methodology
 
Enterprise Data Architecture Deliverables
Enterprise Data Architecture DeliverablesEnterprise Data Architecture Deliverables
Enterprise Data Architecture Deliverables
 
IDERA Slides: Managing Complex Data Environments
IDERA Slides: Managing Complex Data EnvironmentsIDERA Slides: Managing Complex Data Environments
IDERA Slides: Managing Complex Data Environments
 
Data Integration, Interoperability and Virtualization
Data Integration, Interoperability and VirtualizationData Integration, Interoperability and Virtualization
Data Integration, Interoperability and Virtualization
 
The Key to Big Data Modeling: Collaboration
The Key to Big Data Modeling: CollaborationThe Key to Big Data Modeling: Collaboration
The Key to Big Data Modeling: Collaboration
 
IRMAC April 2015 - DMBOK2 DWBI New Content
IRMAC April 2015 - DMBOK2 DWBI New ContentIRMAC April 2015 - DMBOK2 DWBI New Content
IRMAC April 2015 - DMBOK2 DWBI New Content
 
ADV Slides: Data Pipelines in the Enterprise and Comparison
ADV Slides: Data Pipelines in the Enterprise and ComparisonADV Slides: Data Pipelines in the Enterprise and Comparison
ADV Slides: Data Pipelines in the Enterprise and Comparison
 
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
 
Big Data Boom
Big Data BoomBig Data Boom
Big Data Boom
 
Applying reference models with archi mate
Applying reference models with archi mateApplying reference models with archi mate
Applying reference models with archi mate
 

Andere mochten auch

Respa Broker Training
Respa Broker TrainingRespa Broker Training
Respa Broker Training
jaccip
 
[Goldman Sachs] A mortgage product primer
[Goldman Sachs] A mortgage product primer[Goldman Sachs] A mortgage product primer
[Goldman Sachs] A mortgage product primer
3mag1
 
Loan Origination and Portfolio Risk
Loan Origination and Portfolio RiskLoan Origination and Portfolio Risk
Loan Origination and Portfolio Risk
Dave DeBonis
 
Bi Architecture And Conceptual Framework
Bi Architecture And Conceptual FrameworkBi Architecture And Conceptual Framework
Bi Architecture And Conceptual Framework
Slava Kokaev
 

Andere mochten auch (20)

BIS06 Physical Database Models
BIS06 Physical Database ModelsBIS06 Physical Database Models
BIS06 Physical Database Models
 
BIS03 Data Modelling - I
BIS03 Data Modelling - IBIS03 Data Modelling - I
BIS03 Data Modelling - I
 
Bi risk services 2013
Bi risk services 2013Bi risk services 2013
Bi risk services 2013
 
Stearns Lending, Inc. | People. Power. Possibilities.
Stearns Lending, Inc. | People. Power. Possibilities.Stearns Lending, Inc. | People. Power. Possibilities.
Stearns Lending, Inc. | People. Power. Possibilities.
 
Respa Broker Training
Respa Broker TrainingRespa Broker Training
Respa Broker Training
 
Data Federation
Data FederationData Federation
Data Federation
 
Mortgage LOS Implementation: A Roadmap for Sustainability
Mortgage LOS Implementation: A Roadmap for SustainabilityMortgage LOS Implementation: A Roadmap for Sustainability
Mortgage LOS Implementation: A Roadmap for Sustainability
 
[Goldman Sachs] A mortgage product primer
[Goldman Sachs] A mortgage product primer[Goldman Sachs] A mortgage product primer
[Goldman Sachs] A mortgage product primer
 
Real World Business Intelligence and Data Warehousing
Real World Business Intelligence and Data WarehousingReal World Business Intelligence and Data Warehousing
Real World Business Intelligence and Data Warehousing
 
Hadoop meets Cloud with Multi-Tenancy
Hadoop meets Cloud with Multi-TenancyHadoop meets Cloud with Multi-Tenancy
Hadoop meets Cloud with Multi-Tenancy
 
Data Warehousing - in the real world
Data Warehousing - in the real worldData Warehousing - in the real world
Data Warehousing - in the real world
 
Better architecture with semantic integration
Better architecture with semantic integrationBetter architecture with semantic integration
Better architecture with semantic integration
 
Loan Origination and Portfolio Risk
Loan Origination and Portfolio RiskLoan Origination and Portfolio Risk
Loan Origination and Portfolio Risk
 
Session9 E X A M I N A T I O N P R E P NMLS - Mortgage
Session9 E X A M I N A T I O N P R E P NMLS - MortgageSession9 E X A M I N A T I O N P R E P NMLS - Mortgage
Session9 E X A M I N A T I O N P R E P NMLS - Mortgage
 
Bi Architecture And Conceptual Framework
Bi Architecture And Conceptual FrameworkBi Architecture And Conceptual Framework
Bi Architecture And Conceptual Framework
 
Driving your BA Career: From Business Analyst to Business Architect
Driving your BA Career: From Business Analyst to Business ArchitectDriving your BA Career: From Business Analyst to Business Architect
Driving your BA Career: From Business Analyst to Business Architect
 
The architecture of data analytics PaaS on AWS
The architecture of data analytics PaaS on AWSThe architecture of data analytics PaaS on AWS
The architecture of data analytics PaaS on AWS
 
The Mortgage Process
The Mortgage ProcessThe Mortgage Process
The Mortgage Process
 
05. Physical Data Specification Template
05. Physical Data Specification Template05. Physical Data Specification Template
05. Physical Data Specification Template
 
Loan Management System in Oracle
Loan Management System in OracleLoan Management System in Oracle
Loan Management System in Oracle
 

Ähnlich wie Data Architecture Process in a BI environment

Microsoft SQL Server - BI Consolidation Presentation
Microsoft SQL Server - BI Consolidation PresentationMicrosoft SQL Server - BI Consolidation Presentation
Microsoft SQL Server - BI Consolidation Presentation
Microsoft Private Cloud
 
Real time insights for better products, customer experience and resilient pla...
Real time insights for better products, customer experience and resilient pla...Real time insights for better products, customer experience and resilient pla...
Real time insights for better products, customer experience and resilient pla...
Balvinder Hira
 

Ähnlich wie Data Architecture Process in a BI environment (20)

10 Best Big Data Management Tools
10 Best Big Data Management Tools10 Best Big Data Management Tools
10 Best Big Data Management Tools
 
Power BI Overview
Power BI OverviewPower BI Overview
Power BI Overview
 
Business intelligence
Business intelligenceBusiness intelligence
Business intelligence
 
Business intelligence
Business intelligenceBusiness intelligence
Business intelligence
 
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
 
Visionet Business Intelligence Solutions - Is your Business Intelligence real...
Visionet Business Intelligence Solutions - Is your Business Intelligence real...Visionet Business Intelligence Solutions - Is your Business Intelligence real...
Visionet Business Intelligence Solutions - Is your Business Intelligence real...
 
Building a Big Data Solution
Building a Big Data SolutionBuilding a Big Data Solution
Building a Big Data Solution
 
Database 2 External Schema
Database 2   External SchemaDatabase 2   External Schema
Database 2 External Schema
 
Business intelligence
Business intelligenceBusiness intelligence
Business intelligence
 
Microsoft SQL Server - BI Consolidation Presentation
Microsoft SQL Server - BI Consolidation PresentationMicrosoft SQL Server - BI Consolidation Presentation
Microsoft SQL Server - BI Consolidation Presentation
 
Overview of business intelligence
Overview of business intelligenceOverview of business intelligence
Overview of business intelligence
 
Business intelligence overview
Business intelligence overviewBusiness intelligence overview
Business intelligence overview
 
Intro of Key Features of SoftCAAT BI Software
Intro of Key Features of SoftCAAT BI SoftwareIntro of Key Features of SoftCAAT BI Software
Intro of Key Features of SoftCAAT BI Software
 
Agile Testing Days 2017 Intoducing AgileBI Sustainably - Excercises
Agile Testing Days 2017 Intoducing AgileBI Sustainably - ExcercisesAgile Testing Days 2017 Intoducing AgileBI Sustainably - Excercises
Agile Testing Days 2017 Intoducing AgileBI Sustainably - Excercises
 
Using a Semantic and Graph-based Data Catalog in a Modern Data Fabric
Using a Semantic and Graph-based Data Catalog in a Modern Data FabricUsing a Semantic and Graph-based Data Catalog in a Modern Data Fabric
Using a Semantic and Graph-based Data Catalog in a Modern Data Fabric
 
Data Virtualization: Introduction and Business Value (UK)
Data Virtualization: Introduction and Business Value (UK)Data Virtualization: Introduction and Business Value (UK)
Data Virtualization: Introduction and Business Value (UK)
 
CS8091_BDA_Unit_I_Analytical_Architecture
CS8091_BDA_Unit_I_Analytical_ArchitectureCS8091_BDA_Unit_I_Analytical_Architecture
CS8091_BDA_Unit_I_Analytical_Architecture
 
What is the Best Data Visualization Tool: Power BI or Tableau?
What is the Best Data Visualization Tool: Power BI or Tableau?What is the Best Data Visualization Tool: Power BI or Tableau?
What is the Best Data Visualization Tool: Power BI or Tableau?
 
Real time insights for better products, customer experience and resilient pla...
Real time insights for better products, customer experience and resilient pla...Real time insights for better products, customer experience and resilient pla...
Real time insights for better products, customer experience and resilient pla...
 
Intro to big data and applications - day 2
Intro to big data and applications - day 2Intro to big data and applications - day 2
Intro to big data and applications - day 2
 

Data Architecture Process in a BI environment

  • 1. Data Architecture Process in a Business Intelligence Environment WHAT DO DATA ARCHITECTS IN A BUSINESS INTELLIGENCE(BI) ENVIRONMENT DO? AUTHOR: SASHA CITINO, SENIOR CONSULTANT (DATA ARCHITECTURE) PUBLISHED ON: SEPTEMBER 29TH, 2016
  • 2. About the Author: Sasha Citino 9/29/2016Written and Published by Sasha Citino (Data Architect) 2  Sasha Citino has 15+ years of experience in Information Technology industry. Sasha got her start in IT as a VB 6 Developer but quickly moved into the world of “Data”.  Sasha has 12+ years of experience in designing, developing and implementing Data Warehouses in SQL Server and Oracle environments.  Sasha has experience in Business Intelligence (Architecture and Development) in multiple industries such as Real Estate(commercial and industrial), Telecommunications, Retail, Fast Food, Casino Gaming, Supply Chain distribution and logistics, Healthcare, Supplementary Insurance.  Sasha has been the lead Data Architect for multiple multi-million dollar BI projects for the last 8 years. She thoroughly enjoys BI and all of its components.  Contact Sasha: sccitino@yahoo.com
  • 3. Agenda  What is Business Intelligence?  Data Warehouse (DW) vs Business Intelligence (BI)  What is Data Architecture?  Visual representation of Data Warehouse Architecture  Components of a Data Warehouse  What do Data Architects need to know in a BI environment  Data Architect Relationships in a BI environment  Key Architecture Process Roles of a Data Architect in a BI environment  Note on Data Architecture Standards  Step by Step Data Architecture Process in a BI (traditional) environment 9/29/2016Written and Published by Sasha Citino (Data Architect) 3
  • 4. What is BI?  BI Encompasses:  Tools, applications, methodologies for data collection and transformation from a variety of internal and external data sources  Providing data analytical tools to end users to allow them to analyze data (adhoc), report on/present important business KPI’s (key performance indicators) via dashboards, reports as well as other data visualization tools  Providing avenues for external consumers of data to extract data from a single, stable, robust and dimensional data repository Business Intelligence or BI is a technology based process or mechanism for analyzing and presenting data in a format that allows business users, including executives, managers and other users to make informed business decisions. 9/29/2016Written and Published by Sasha Citino (Data Architect) 4
  • 5. Data Warehouse(DW) vs Business Intelligence(BI)  So what really is a Data Warehouse?:  A Data Warehouse is a large storage of data that is collected from multiple data sources including but not limited to, operational systems, financial systems, the internet, and flat files.  A Data Warehouse is frequently known as the central repository for a company’s data.  The data in a Data Warehouse is extracted from multiple data sources in raw form, aligned to mature business processes and then goes through transformation phase, utilizing best practice DW methodologies to turn raw data into a format that allows for simple, high performance consumption of the data via data visualization tools, adhoc analytical tools as well as external consumers. After 15 years working in Business Intelligence starting at custom application development, moving to report development, ETL development, Database management in a DW environment, supporting multiple Data Warehouse environments in a variety of industries and eventually architecting dimensional Data Warehouses, in my professional opinion, a Data Warehouse is an integral, necessary “Component” of Business Intelligence. 9/29/2016Written and Published by Sasha Citino (Data Architect) 5
  • 6. What is Data Architecture?
      • Defines rules, structures and policies to support business objectives
      • Provides the mechanism for how data is governed, defined, stored and managed in a Data Warehouse
      • Integrates data from multiple source systems within an organization
      • Allows data to be consumed by reporting tools, data visualization tools, ad hoc analysis and external consumers
  • 7. Data Warehouse Architecture Note: Image used from Oracle Data Warehousing Concepts whitepaper
  • 8. Data Architecture Components
      Data Sources: data is extracted from multiple data sources, which can be:
      • Operational Systems
      • ERP Systems
      • CRM Systems
      • Financial Systems
      • Flat Files
      • Internet
      Data Warehouse: the Data Warehouse has multiple components:
      • Data Staging Database
      • Persistent Staging Database (stores raw data historically)
      • Metadata
      • Summary/Aggregated data in dimensional form (dimensions/facts)
      • Data Marts
      • Data Architecture Modeling Tools (e.g. Erwin, Embarcadero, R)
      Consumers: Data Warehouse data is consumed by a variety of users:
      • Data Analysts
      • Report/Data Visualization Developers/Users
      • Data mining
      • External consumers such as other business applications
      Data is extracted, transformed and loaded to target objects using ETL tools/processes.
  • 9. Data Architects in a BI Environment
      Data Architects in a BI environment should:
      • Understand the end-to-end vision of the BI project
      • Get business buy-in (without business support, the success of the project is at risk)
      • Understand legacy systems and how systems relate
      • Understand business processes and how they translate to one or more dimensional models
      • Address data migration, cleansing and storage requirements/issues
      • Work closely with, and develop strong relationships with, project SMEs (subject matter experts) and project teams throughout the BI project
      • Architect for the business process at the lowest grain, allowing for aggregation, and stay acutely aware of how time affects metrics, attributes and KPIs
      • Architect for flexibility, robustness and re-usability
      • ALWAYS verify concepts prior to transferring development to other teams (ETL, Reporting)
  • 10. Data Architect Relationships in a BI environment
      [Diagram: the Data Architect and the groups the role works with]
      • Performance / DBA Team
      • Report Developers
      • External Customers
      • ETL Developers
      • Business Analysts
      • Project SMEs
      • Data Analysts
      • Quality Assurance Testers
  • 11. Data Architect Role
      Key Architecture Process Roles of a Data Architect in a BI environment:
      • Data Profiling
          • “Data investigation”
      • Integration Design
          • Aligning data from multiple systems and sources
      • Dimensional Modeling
          • Structures data in a conformed format for faster reporting on large data volumes
          • Organizes data for effective and efficient analysis according to business processes
      • Define Data Architecture Standards
          • See next slide for note on Standards
  • 12. Note on Data Architecture Standards
      It is critical for any data warehouse environment to have well-defined, consistent standards for naming conventions, handling of nulls, dimension/fact design strategies, the types of data architecture artifacts required and the data modeling tool(s) used.
      Data Architecture standards may vary by company or architect, but they should always include:
      • Consistent naming conventions for tables (staging, dimension, fact, helper, cross reference)
      • Consistent naming conventions for fields
      • Consistent strategy and naming convention for indexes/partitions
      • A clear definition of how nulls in dimensions and facts are handled
      • The data modeling tool in use/to be used
      • A clear definition of the data types that can be used
      • Metadata requirements for tables (e.g. insert_date, update_date, current_flg, source system, effective_from_dt and effective_to_dt) that should be present on each data warehouse dimension/fact table
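Standards like these are easiest to keep consistent when they can be checked automatically. The following is a minimal sketch only: the table prefixes (`stg_`, `dim_`, `fact_`, `hlp_`, `xref_`), the snake_case rule and the column name `source_system` are assumptions made for illustration, not standards prescribed by the slides.

```python
import re

# Assumed standard (illustrative): the audit columns named on the slide,
# with "source system" spelled as the hypothetical column source_system.
REQUIRED_COLUMNS = {
    "insert_date", "update_date", "current_flg",
    "source_system", "effective_from_dt", "effective_to_dt",
}

# Assumed naming rule (illustrative): a table-type prefix followed by snake_case.
TABLE_NAME_RULE = re.compile(r"^(stg|dim|fact|hlp|xref)_[a-z][a-z0-9_]*$")


def check_table(name, columns):
    """Return a list of human-readable standards violations for one table."""
    problems = []
    if not TABLE_NAME_RULE.match(name):
        problems.append(f"{name}: table name does not follow the naming convention")
    for column in sorted(REQUIRED_COLUMNS - set(columns)):
        problems.append(f"{name}: missing metadata column {column}")
    return problems


# A compliant table produces no findings; a non-compliant one lists each issue.
good = check_table("dim_customer", ["customer_key", "customer_name", *sorted(REQUIRED_COLUMNS)])
bad = check_table("CustomerDim", ["customer_key", "insert_date"])
print(good)
for problem in bad:
    print(problem)
```

A check of this kind could run against the modeling tool's exported table list before DDL is handed to other teams.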
  • 13. Data Architecture Process
      The Data Architecture Process, once matured, is repeatable, dependable, effective and efficient, and aligns to business processes.
      Components of Data Architecture in a BI environment:
      • Step 1 – Receive/Understand Requirements
      • Step 2 – Data Profiling
      • Step 3 – Conceptual Model Design
      • Step 4 – Logical Model Design
      • Step 5 – Physical Model Design (also known as an ERD, or entity relationship diagram)
      • Step 6 – ETL Mapping
      • Step 7 – Data Model Reviews
      • Step 8 – Metadata/Data Validation post development
  • 14. Data Architecture Process - Requirements
      Business requirements for the business process to be architected can reach a data architect in multiple ways:
      • Through Business and/or User Requirement specifications for a new business process and/or an enhancement to an existing business process/data model
      • Through source system SMEs, typically when upgrades to source system(s) affect the Data Warehouse (new fields, changed fields, changed logic)
      • Through self-examination (the data architect reviews existing data models and identifies new metrics/attributes that can be added to enhance the robustness of a data model and provide added business value)
      • Through listening! It is extremely important for a Data Architect to be an excellent listener. You may notice repeated statements from, for example, the reporting team about aggregations/calculations/groupings that a seasoned data architect can identify as an opportunity to improve the existing data model. While this may not provide added business value, it may improve the performance of the environment and/or simplify the DW environment for reporting.
  • 15. Data Architecture Process – Data Profiling
      What are you profiling?
      • Select the business process
      • Decide on the grain of the data
      • Identify dimensions/dimensional attributes
      • Identify facts/metrics
      Understand Metadata
      • Analyze the tables pertaining to the business process subject area
      • Data sources
      • Table sizes
      • Row counts
      • Fields/columns
      • Relationships
      • Natural/Primary keys
      Generate Profiling Outputs
      Upon completion of the data profiling process, the following outputs can be generated:
      • Summary analysis of metadata
      • Source queries that relate tables and select attributes and metrics according to the filter/aggregation criteria of the business process
      • These source queries can also be used to validate landed data post ETL development
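As a rough illustration of the metadata summary described in this step, the sketch below profiles one source table for row count, nulls per column, distinct values per column and candidate natural keys. It uses Python's built-in `sqlite3` module with an in-memory database purely as a stand-in for a real source system; the table `src_orders` and its columns are hypothetical.

```python
import sqlite3


def profile_table(conn, table):
    """Return the row count plus per-column null and distinct counts.

    `table` is interpolated into the SQL text, so pass trusted identifiers only.
    """
    row_count = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    stats = {}
    for col in columns:
        nulls, distinct = conn.execute(
            f"SELECT COUNT(*) - COUNT({col}), COUNT(DISTINCT {col}) FROM {table}"
        ).fetchone()
        stats[col] = {
            "nulls": nulls,
            "distinct": distinct,
            # A column with no nulls and one distinct value per row may be a natural key.
            "candidate_key": nulls == 0 and distinct == row_count,
        }
    return {"row_count": row_count, "columns": stats}


# Hypothetical source table, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src_orders (order_id INTEGER, customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO src_orders VALUES (?, ?, ?)",
    [(1, 10, 25.0), (2, 10, None), (3, 11, 40.0)],
)
profile = profile_table(conn, "src_orders")
print(profile["row_count"])
for column, column_stats in profile["columns"].items():
    print(column, column_stats)
```

The same counts can be saved as the profiling baseline and compared against the landed data after ETL development.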
  • 16. Data Architecture Process – Conceptual Model Design
      During the Conceptual Model Design phase, the Data Architect:
      • Creates a conceptual schema, which is a high-level visual description of the informational needs of the business process
      • Identifies the dimensions that relate to the business process
      • Identifies, at a high level, the metrics/facts that relate to the business process
      Output: the conceptual model (an example is shown in the slide image)
      • The conceptual model can be used to communicate with the business without too much technical information
      • The conceptual model can also be used to update the Bus Matrix (a pivot of business processes and the dimensions used by each)
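To make the Bus Matrix idea concrete, here is a small sketch that pivots business processes against the dimensions each one uses. The processes and dimensions (`Sales`, `Inventory`, `Date`, `Customer`, `Product`, `Warehouse`) are invented for illustration and do not come from the slides.

```python
def build_bus_matrix(process_dimensions):
    """Pivot {process: set of dimensions} into (sorted dimensions, {process: flags})."""
    dimensions = sorted({dim for dims in process_dimensions.values() for dim in dims})
    matrix = {
        process: [dim in dims for dim in dimensions]
        for process, dims in process_dimensions.items()
    }
    return dimensions, matrix


def format_bus_matrix(dimensions, matrix):
    """Render the matrix as plain text, marking each used dimension with an X."""
    width = max(len(process) for process in matrix)
    header = " " * width + " | " + " | ".join(dimensions)
    rows = [
        process.ljust(width) + " | " + " | ".join(
            ("X" if used else "").center(len(dim))
            for dim, used in zip(dimensions, flags)
        )
        for process, flags in matrix.items()
    ]
    return "\n".join([header, *rows])


# Hypothetical business processes and the dimensions each one uses.
processes = {
    "Sales": {"Date", "Customer", "Product"},
    "Inventory": {"Date", "Product", "Warehouse"},
}
dimensions, matrix = build_bus_matrix(processes)
print(format_bus_matrix(dimensions, matrix))

# Dimensions shared by more than one process are candidates for conformed dimensions.
shared = [dim for i, dim in enumerate(dimensions) if sum(flags[i] for flags in matrix.values()) > 1]
print(shared)
```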
  • 17. Data Architecture Process – Logical Model Design
      During the Logical Model Design phase, a Data Architect:
      • Identifies the data metrics (typically in raw form) that support the subject area
      • Documents the relationships between metrics and dimensions
      • Identifies all fields needed for the subject area and their metadata attributes
      Output: the Logical Data Model
      • An example of a fact table logical design is shown in the slide image
  • 18. Data Architecture Process – Physical Model Design
      Select Modeling Tool
      The physical data modeling for a business process is typically completed using a data modeling tool. Examples of data modeling tools:
      • Erwin Data Modeler
      • Embarcadero
      • R
      • Visio
      There are many tools; the choice depends on your company’s preference.
      Create Dimensional Model
      • Create the Entity Relationship Diagram for the dimensional model
      • Create the dimension tables/fact table(s)
      • Define the physical properties for each dimension attribute and fact metric. Physical properties are:
          • Data type
          • Data length/scale/precision
          • Relationships
          • Indexes
          • Storage schemas
      Output of Physical Model
      Once the physical model has been created using a modeling tool, the following artifacts are produced:
      • ERD (entity relationship diagram)
      • DDL (Data Definition Language) for each dimension/fact table
      • The DDL is used to create the physical tables in the database
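Modeling tools normally generate the DDL, but the relationship between physical properties and DDL can be sketched by hand. The example below renders a `CREATE TABLE` statement from a list of column specifications and runs it against an in-memory SQLite database to confirm it is valid. The table `dim_customer`, its columns and the chosen data types are assumptions for illustration; a real warehouse platform would use its own type names and storage options.

```python
import sqlite3


def build_ddl(table, columns, primary_key):
    """Render CREATE TABLE DDL from (name, data_type, nullable) column specs."""
    lines = [
        f"    {name} {data_type}" + ("" if nullable else " NOT NULL")
        for name, data_type, nullable in columns
    ]
    lines.append(f"    PRIMARY KEY ({primary_key})")
    return f"CREATE TABLE {table} (\n" + ",\n".join(lines) + "\n)"


# Hypothetical dimension: a surrogate key, a natural key, one attribute,
# and a few of the metadata columns mentioned in the standards slide.
dim_customer_columns = [
    ("customer_key", "INTEGER", False),
    ("customer_id", "VARCHAR(20)", False),
    ("customer_name", "VARCHAR(100)", True),
    ("effective_from_dt", "DATE", False),
    ("effective_to_dt", "DATE", True),
    ("current_flg", "CHAR(1)", False),
]
ddl = build_ddl("dim_customer", dim_customer_columns, "customer_key")
print(ddl)

# Executing the DDL creates the physical table; read the columns back to confirm.
conn = sqlite3.connect(":memory:")
conn.execute(ddl)
created = [row[1] for row in conn.execute("PRAGMA table_info(dim_customer)")]
print(created)
```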
  • 19. Data Architecture Process – ETL Mapping
      What is ETL Mapping?
      • ETL means Extract, Transform, Load. This is the mechanism by which data is extracted from source systems, transformed according to business requirements and then loaded to target dimension and fact tables in the Data Warehouse.
      • During the ETL mapping phase, the Data Architect identifies the rules/business logic that ETL Developers need in order to accurately extract, transform and load data to the defined dimensions and facts.
      • The ETL Mapping document is absolutely critical to the ETL Team’s ability to develop the processes that populate the data.
      ETL Mapping Content
      The Data Architect creates an ETL mapping template to:
      • Identify source systems, source tables and source fields
      • Identify target tables/fields
      • Define the type of DW table (fact/dimension)
      • Define/identify grouping logic, filters, column order/type and data type/length/precision/scale
      • Define transformation logic (rules)
      • Define default values for null attributes, keys and metrics
      • Provide source queries that give ETL Developers insight into the data they are working with
      ETL Mapping Outputs
      • ETL Mapping Document
      • Metadata for the Data Warehouse environment
      • Data Dictionary (Note: this is not always produced by the data architect, but rather by a member of the Data Governance team)
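An ETL mapping document is essentially structured data: one row per target field, with its source, transformation rule and null default. The sketch below shows one possible in-code representation and how a developer might apply it to a single source row. The class `FieldMapping`, the function `apply_mappings` and every field name are hypothetical; real mapping documents are usually kept in a spreadsheet or in the ETL tool itself.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class FieldMapping:
    """One row of a (hypothetical) ETL mapping document."""
    source_field: str
    target_field: str
    transform: Optional[Callable[[Any], Any]] = None  # transformation rule, if any
    default_if_null: Any = None                       # default value for null inputs


def apply_mappings(source_row, mappings):
    """Build a target row by applying each mapping rule to one source row."""
    target_row = {}
    for mapping in mappings:
        value = source_row.get(mapping.source_field)
        if value is None:
            # Null handling comes from the mapping document, not from the ETL code.
            value = mapping.default_if_null
        elif mapping.transform is not None:
            value = mapping.transform(value)
        target_row[mapping.target_field] = value
    return target_row


# Illustrative mapping: trim and upper-case the name, default missing amounts to 0.0.
customer_sales_mappings = [
    FieldMapping("cust_nm", "customer_name", lambda v: v.strip().upper(), "UNKNOWN"),
    FieldMapping("amt", "sales_amount", float, 0.0),
]
loaded = apply_mappings({"cust_nm": "  acme  ", "amt": None}, customer_sales_mappings)
print(loaded)
```

Keeping the rules as data rather than burying them in code makes it easier to review the mapping with ETL Developers and to validate landed data against it later.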
  • 20. Data Architecture Process – Data Model Review
      Prior to hand-off to the development teams, the Data Architect performs Data Model Review(s) to ensure everyone is on the same page and understands the tasks that need to be completed and/or what data will become available for consumption by the data visualizers (reporting team).
      In a mature BI environment, the Data Architect conducts Data Model Reviews with ETL Developers, Report Developers and possibly Business Analysts to:
      • Ensure the data model meets business requirements
      • Provide ETL Developers with an overview of the business process/subject area
      • Review the ETL logic with Developers to ensure they understand what needs to be done
      • Provide Report Developers with an overview of the data model, giving them insight into the data they will soon report on
  • 21. Data Architecture Process – Metadata Validation
      Upon completion of development work by the ETL Team, the Data Architect reviews the data landed in the target dimension/fact table(s) to ensure that it complies with the rules defined in the ETL mapping document.
      Metadata Validation by the Data Architect involves:
      • Checking data for consistency and completeness
      • Checking for duplicates
      • Verifying row grain uniqueness/natural keys
      • Verifying data formatting
      • Verifying that row counts and data match the expected row counts and data from the source queries (data profiling step)
      • Verifying that there are no orphaned or null surrogate keys and that landed data matches the expected source query results
      Note: If validation fails, the DA works with the ETL team to resolve the issues. If validation passes, the DA notifies and hands off to the Reporting Team.
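Several of these checks translate directly into queries against the landed tables. The following sketch counts duplicate grain rows, null surrogate keys and orphaned surrogate keys, again using an in-memory SQLite database as a stand-in for the warehouse. The tables `dim_customer` and `fact_sales`, the assumed grain (one row per `order_id`) and the sample rows are all hypothetical.

```python
import sqlite3


def validate_fact_sales(conn):
    """Return a count of offending rows (or groups) for each validation check."""
    checks = {}
    # Grain check: the assumed grain is one fact row per order_id.
    checks["duplicate_grain_keys"] = conn.execute(
        "SELECT COUNT(*) FROM ("
        "  SELECT order_id FROM fact_sales GROUP BY order_id HAVING COUNT(*) > 1"
        ")"
    ).fetchone()[0]
    # Surrogate keys should never be null in the fact table.
    checks["null_surrogate_keys"] = conn.execute(
        "SELECT COUNT(*) FROM fact_sales WHERE customer_key IS NULL"
    ).fetchone()[0]
    # Orphans: fact rows whose surrogate key has no matching dimension row.
    checks["orphaned_surrogate_keys"] = conn.execute(
        "SELECT COUNT(*) FROM fact_sales f "
        "LEFT JOIN dim_customer d ON d.customer_key = f.customer_key "
        "WHERE f.customer_key IS NOT NULL AND d.customer_key IS NULL"
    ).fetchone()[0]
    return checks


# Hypothetical landed data containing one example of each problem.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_name TEXT)")
conn.execute("CREATE TABLE fact_sales (order_id INTEGER, customer_key INTEGER, sales_amount REAL)")
conn.executemany("INSERT INTO dim_customer VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [(100, 1, 25.0), (101, 2, 40.0), (101, 2, 40.0), (102, 9, 10.0), (103, None, 5.0)],
)
results = validate_fact_sales(conn)
print(results)
validation_passed = all(count == 0 for count in results.values())
print(validation_passed)
```

Any non-zero count would go back to the ETL team; when every count is zero, the data is ready to hand off to the Reporting Team.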
  • 22. Data Architecture Process – Wrap up
      Upon completion of all data architecture process steps, ETL development and successful Metadata Validation, the Data Architecture process is complete for this business process/enhancement. The Data Architect will continue to be a resource to the Reporting, Quality Assurance and Performance teams as needed.