The Data Architect role is one of the most misunderstood roles in Information Technology. The role is usually split among several members of IT: DBAs, application architects and developers each perform parts of it in some fashion. But having a single resource or team own this role brings tremendous advantages in standardization, compliance, documentation and performance.
The Data Architect Manifesto
1. The Data Architect Manifesto
Session ID#: 10144
Prepared by:
Mahesh Vallampati
Practice Principal
Keste
@mvallamp
2. About the Presenter
■ Mahesh Vallampati
▪ Career
— Practice Leader for Business Intelligence and Oracle Financials at
Keste
— Sales and Consulting at Oracle for 9 years
▪ Education
— Courses in Business/Accounting at Houston Community College
— Master’s in EE from Texas A&M University
■ Career Focus
▪ Used to be a DBA
▪ Now Techno-Functional (Fechnical)
3. Keste is an AWARD-WINNING software solutions and development company headquartered in Plano, Texas.
We focus on the EXECUTION, DELIVERY and SUPPORT of enterprise software and systems for the high technology, communications, life sciences and industrial manufacturing industries, among others.
Keste – kest n. [old world language derivative]; A culture that is agile and adaptive
8. IT Architecture
■ The IEEE Definition
▪ Describes the fundamental organization of a system
▪ Embodies its components
▪ Describes the relationships between the components and the
environment
▪ Describes the principles governing the design and evolution
9. Data Architecture - Zachman
Layer | View | Data (What) | EA | DA | Bus | DBA
1 | Scope/Contextual | List of things and architectural standards important to the business | A | C | R | I
2 | Business Model/Conceptual | Semantic model or Conceptual/Enterprise Data Model | C | RA | I | I
3 | System Model/Logical | Enterprise/Logical Data Model | C | RA | I | I
4 | Technology Model/Physical | Physical Data Model | C | C | I | RA
5 | Detailed Representations | Actual databases | I | C | I | RA
RACI: R = Responsible, A = Accountable, C = Consulted, I = Informed. EA = Enterprise Architect, DA = Data Architect, Bus = Business, DBA = Database Administrator.
10. Data Architecture Drivers
Driver | Description
Enterprise Requirements | The requirements of a business system that processes data
Technology Drivers | Existing standards, software and resource knowledge
Economics | Business drivers, competitive advantage, business cycle
Business Policies | Compliance, policies and regulatory environment
Data Processing Needs | Type of data processing: transaction, data warehousing, mixed load
11. Conceptual, Logical and Physical
Feature | Conceptual | Logical | Physical
Entity Names | X | X |
Entity Relationships | X | X |
Attributes | | X |
Primary Keys | | X | X
Foreign Keys | | X | X
Table Names | | | X
Column Names | | | X
Column Data Types | | | X
14. Manifesto
■ A public declaration of policy and aims
■ Two of the most famous manifestos of all time
▪ The Declaration of Independence
▪ The Communist Manifesto, by Karl Marx and Friedrich Engels
16. In the beginning…
■ In the beginning there was Codd…
▪ We acknowledge the father of modern relational data theory
▪ He was British and served in the Royal Air Force during World War II
▪ He earned his Ph.D. from the University of Michigan
▪ Like many innovations, his work was initially ignored by his employer, IBM
▪ Larry Ellison recalled reading the paper and being inspired enough to go on to make several billion dollars
17. And then there was Date…
■ C. J. Date, an English computer scientist, popularized and taught relational data theory
■ His book on relational data theory is a classic that is still used today
■ The book is "An Introduction to Database Systems"
■ He later co-wrote, with Hugh Darwen, Databases, Types and the Relational Model, which is more popularly known as The Third Manifesto.
18. Use The keys
■ We promise to use the key, the whole key and nothing but the
key, so help me Codd.
▪ A mnemonic that helps in verifying the third normal form
▪ A tongue in cheek obeisance to the father of relational theory
■ Keys
▪ The key – 1st Normal Form
▪ The whole Key – 2nd Normal Form
▪ Nothing but the key – 3rd Normal Form
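The mnemonic can be checked mechanically against sample rows: a functional dependency X → Y holds in a set of rows when no two rows agree on X but differ on Y. A dependency on only part of a composite key breaks "the whole key" (2NF), and a dependency on a non-key column breaks "nothing but the key" (3NF). The minimal Python sketch below assumes rows are plain dicts; the ORDER_LINES table and its columns are hypothetical. Keep in mind that sample data can only refute a dependency, never prove that it holds in general.

```python
from typing import Iterable, Mapping, Sequence


def fd_holds(rows: Iterable[Mapping[str, object]],
             lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Return True if lhs -> rhs is not contradicted by the given rows."""
    seen: dict[tuple, tuple] = {}
    for row in rows:
        determinant = tuple(row[c] for c in lhs)
        dependent = tuple(row[c] for c in rhs)
        # Same determinant but a different dependent value: dependency violated.
        if seen.setdefault(determinant, dependent) != dependent:
            return False
    return True


# Hypothetical ORDER_LINES sample with composite key (ORDER_ID, LINE_NBR).
order_lines = [
    {"ORDER_ID": 1, "LINE_NBR": 1, "PRODUCT_ID": 10, "CUST_NBR": 500, "CUST_NAME": "Acme"},
    {"ORDER_ID": 1, "LINE_NBR": 2, "PRODUCT_ID": 11, "CUST_NBR": 500, "CUST_NAME": "Acme"},
    {"ORDER_ID": 2, "LINE_NBR": 1, "PRODUCT_ID": 10, "CUST_NBR": 600, "CUST_NAME": "Globex"},
]

# "The whole key": CUST_NBR depends on ORDER_ID alone (part of the key), a 2NF smell.
partial = fd_holds(order_lines, ["ORDER_ID"], ["CUST_NBR"])

# "Nothing but the key": CUST_NAME depends on the non-key CUST_NBR, a 3NF smell.
transitive = fd_holds(order_lines, ["CUST_NBR"], ["CUST_NAME"])

print(partial, transitive)  # True True
```

Both suspicious dependencies hold in this sample, which suggests moving customer data into its own table keyed by CUST_NBR.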
19. Have a functional perspective
■ While most data architects think in terms of data models, it is
beneficial to think in terms of business functions
■ Having a functional or logical data model that has a business
perspective puts things into focus
■ A functional perspective gives context and business purpose to a data model
20. Have a functional perspective
[Diagram: functional areas mapped to data, including Customers, Buying Users, Clients, Shopping Lists, Order Guide, External Products, Inventory, Products/Item Master, Buying Products, Vendors, Ordering Rules, Customer Product Tags, Customers X Products and Orders]
21. Feel free to comment
■ "Don't let it end like this. Tell them I said something" ~ last
words of Pancho Villa
■ Oracle offers a mechanism to store comments on
▪ Tables
▪ Columns
▪ Materialized views
▪ Indextypes
▪ User-defined operators
22. Comment on Tables
■ create table foo (bar number);
■ comment on table foo is 'This is a comment for foo';
■ select table_name, table_type, comments from user_tab_comments where table_name = 'FOO';
TABLE_NAME  TABLE_TYPE  COMMENTS
FOO         TABLE       This is a comment for foo
23. Comment on Columns
■ comment on column foo.bar is 'This is a comment for bar';
■ select table_name, column_name, comments from user_col_comments where comments is not null;
TABLE_NAME  COLUMN_NAME  COMMENTS
FOO         BAR          This is a comment for bar
24. He named names
■ Column names should be consistent across tables
■ A column that is used widely in several tables should have the same name in each of them
■ You will not believe how often this is not the case
■ Keep abbreviations and short names consistent across table names and column names
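One place where inconsistent naming shows up reliably is foreign keys whose column name differs from the column they reference. The Python sketch below uses SQLite's `PRAGMA foreign_key_list` as a stand-in for Oracle's dictionary views (in Oracle the same check would query the constraint dictionary instead); the three tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (cust_nbr INTEGER PRIMARY KEY, cust_name TEXT);
    -- Consistent: the FK column has the same name as the column it references.
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                         cust_nbr INTEGER REFERENCES customers(cust_nbr));
    -- Inconsistent: the same concept is called customer_no here.
    CREATE TABLE invoices (invoice_id INTEGER PRIMARY KEY,
                           customer_no INTEGER REFERENCES customers(cust_nbr));
""")


def inconsistent_fk_names(conn: sqlite3.Connection) -> list[tuple[str, str, str, str]]:
    """List (table, column, referenced_table, referenced_column) where the names differ."""
    findings = []
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        # Table names come from the catalog itself, so interpolation is safe here.
        # Each row: (id, seq, table, from, to, on_update, on_delete, match)
        for fk in conn.execute(f"PRAGMA foreign_key_list({table})"):
            ref_table, from_col, to_col = fk[2], fk[3], fk[4]
            if to_col is not None and from_col.lower() != to_col.lower():
                findings.append((table, from_col, ref_table, to_col))
    return findings


print(inconsistent_fk_names(conn))
# [('invoices', 'customer_no', 'customers', 'cust_nbr')]
```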
25. Always use Aliases
■ When referring to tables in queries, always use aliases
■ Also, when referring to columns in queries, always prefix them with their table alias
■ This helps reviewers, users and developers understand which table each column comes from
■ It is especially important for the columns being joined in outer joins
■ My favorite table alias is for FND_USER
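A small Python sketch of why aliases matter, using SQLite so it runs anywhere (Oracle raises ORA-00918, "column ambiguously defined", in the same situation). The `customers` and `orders` tables are hypothetical. The unqualified query fails because `cust_nbr` exists in both tables; the aliased version states where every column, including both sides of the outer join, comes from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (cust_nbr INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, cust_nbr INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO orders VALUES (100, 1, 250.0);
""")

# Unqualified: the database cannot tell which cust_nbr is meant.
error_message = None
try:
    conn.execute("SELECT cust_nbr FROM customers "
                 "LEFT JOIN orders ON customers.cust_nbr = orders.cust_nbr")
except sqlite3.OperationalError as exc:
    error_message = str(exc)
print(error_message)  # ambiguous column name: cust_nbr

# Aliased: every column names its table, including the outer join columns.
rows = conn.execute("""
    SELECT c.cust_nbr, c.name, o.order_id
    FROM customers c
    LEFT OUTER JOIN orders o
      ON o.cust_nbr = c.cust_nbr
    ORDER BY c.cust_nbr
""").fetchall()
print(rows)  # [(1, 'Acme', 100), (2, 'Globex', None)]
```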
26. It is OK to be ANSI and not (+)
■ ANSI SQL is the way to go from a data architecture
perspective
■ ANSI SQL is highly portable and can make applications
potentially database neutral
■ Yes, ANSI is verbose
■ Yes, it can be confusing
■ Yes, it is painful
■ But it is worth it
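A common source of that confusion is where the filter on the optional table goes. In an ANSI outer join, a condition on the outer-joined table placed in ON keeps unmatched rows, while the same condition in WHERE discards them and silently turns the outer join into an inner join. (In the older Oracle syntax the same distinction is whether the filter carries the `(+)` marker, as in `o.status(+) = 'OPEN'`.) The sketch below runs the ANSI form in SQLite; the tables and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (cust_nbr INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, cust_nbr INTEGER, status TEXT);
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
    INSERT INTO orders VALUES (100, 1, 'OPEN'), (101, 2, 'CLOSED');
""")

# Filter in ON: every customer is kept; only OPEN orders are matched.
in_on = conn.execute("""
    SELECT c.name, o.order_id
    FROM customers c
    LEFT OUTER JOIN orders o
      ON o.cust_nbr = c.cust_nbr
     AND o.status = 'OPEN'
    ORDER BY c.cust_nbr
""").fetchall()

# Filter in WHERE: rows with a NULL o.status are discarded afterwards,
# so the query behaves like an inner join.
in_where = conn.execute("""
    SELECT c.name, o.order_id
    FROM customers c
    LEFT OUTER JOIN orders o
      ON o.cust_nbr = c.cust_nbr
    WHERE o.status = 'OPEN'
    ORDER BY c.cust_nbr
""").fetchall()

print(in_on)     # [('Acme', 100), ('Globex', None), ('Initech', None)]
print(in_where)  # [('Acme', 100)]
```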
27. Know the Who
■ Every table should have the Who columns
▪ CREATED_BY – The user who created the record
▪ UPDATED_BY – The user who last updated the record
▪ CREATION_DATE – The date and time the record was created
▪ LAST_UPDATE_DATE – The date and time the record was last updated
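One way to keep the Who columns honest is to let the database fill in the timestamps while the application supplies the user. The sketch below does this in SQLite with column defaults and an AFTER UPDATE trigger; in Oracle, a BEFORE UPDATE ... FOR EACH ROW trigger that sets `:NEW.LAST_UPDATE_DATE := SYSDATE` is a common equivalent. The `products` table and trigger name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (
        product_id       INTEGER PRIMARY KEY,
        product_name     TEXT NOT NULL,
        created_by       TEXT NOT NULL,
        updated_by       TEXT NOT NULL,
        creation_date    TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        last_update_date TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
    -- Refresh LAST_UPDATE_DATE on every update. Recursive triggers are off by
    -- default in SQLite, so this trigger's own UPDATE does not fire it again.
    CREATE TRIGGER products_who_upd AFTER UPDATE ON products
    BEGIN
        UPDATE products SET last_update_date = CURRENT_TIMESTAMP
        WHERE product_id = NEW.product_id;
    END;
""")

conn.execute("INSERT INTO products (product_id, product_name, created_by, updated_by) "
             "VALUES (1, 'Widget', 'alice', 'alice')")
# The application records who made the change; the database records when.
conn.execute("UPDATE products SET product_name = 'Widget v2', updated_by = 'bob' "
             "WHERE product_id = 1")

row = conn.execute("SELECT created_by, updated_by, creation_date, last_update_date "
                   "FROM products").fetchone()
print(row)
```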
28. Master of his domain
■ Domains allow you to define and reuse a data type with
optional constraints or allowable values. You can use
domains in the Logical and Relational models.
■ The concept of domains should be adopted more by data
architects
■ Oracle SQL Data Modeler now provides domain features in
its modeling capability
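The domain idea can be sketched outside any modeling tool: define the type and its allowable values once, then render every column that uses it from that single definition. The `Domain` class and the two domains below are hypothetical illustrations, not Oracle SQL Data Modeler's internal representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    """A reusable data type with an optional tuple of allowable values."""
    name: str
    base_type: str
    allowed_values: tuple = ()

    def column_ddl(self, column: str) -> str:
        """Render a column definition, adding a CHECK constraint if values are restricted."""
        ddl = f"{column} {self.base_type}"
        if self.allowed_values:
            # repr() is adequate for numbers and plain strings only; it is not SQL escaping.
            values = ", ".join(repr(v) for v in self.allowed_values)
            ddl += f" CHECK ({column} IN ({values}))"
        return ddl


# Hypothetical domains, defined once and reused across tables.
STATUS_CODE = Domain("STATUS_CODE", "NUMBER(1)", (1, 2, 3, 4))
YES_NO_FLAG = Domain("YES_NO_FLAG", "VARCHAR2(1)", ("Y", "N"))

print(STATUS_CODE.column_ddl("STATUS_INDICATOR"))
# STATUS_INDICATOR NUMBER(1) CHECK (STATUS_INDICATOR IN (1, 2, 3, 4))
print(YES_NO_FLAG.column_ddl("DEPT_ACTV_IND"))
# DEPT_ACTV_IND VARCHAR2(1) CHECK (DEPT_ACTV_IND IN ('Y', 'N'))
```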
29. Know Attribute Domains
■ STATUS_INDICATOR – NUMBER
▪ 1
▪ 2
▪ 3
▪ 4
■ So what do these values mean?
■ A survey of architects produced different interpretations of their meaning
■ Instead, have a table structure that captures these attribute domains
30. FND_IT
■ Oracle’s Approach in EBS for domain values
▪ FND_LOOKUP_VALUES
■ Use a similar approach
▪ TAB_COL_DOMAIN_LOOKUPS
▪ For each distinct value in the column domain, store the value and its meaning
▪ Eliminate any ambiguity about what the few distinct values in the column mean
■ This lets queries derive the meaning of column values instead of relying on other, sub-optimal approaches
31. Documenting Attribute Domains
Table Name | Column Name | Column Value | Value Meaning
PRODUCT_MASTER | STATUS_INDICATOR | 1 | Org Product
PRODUCT_MASTER | STATUS_INDICATOR | 2 | Third Party
PRODUCT_MASTER | STATUS_INDICATOR | 3 | Government Product
PRODUCT_MASTER | STATUS_INDICATOR | 4 | Discontinued
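A minimal sketch of the lookup approach, run in SQLite. The table name TAB_COL_DOMAIN_LOOKUPS comes from the slide above, but its column names, the PRODUCT_ID column and the sample rows are assumptions for illustration. Values are stored as text so one table can describe domains of any column type, and queries decode meanings with a join instead of hard-coded CASE expressions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical layout: one row per (table, column, value) with its meaning.
    CREATE TABLE tab_col_domain_lookups (
        table_name    TEXT NOT NULL,
        column_name   TEXT NOT NULL,
        column_value  TEXT NOT NULL,
        value_meaning TEXT NOT NULL,
        PRIMARY KEY (table_name, column_name, column_value)
    );
    INSERT INTO tab_col_domain_lookups VALUES
        ('PRODUCT_MASTER', 'STATUS_INDICATOR', '1', 'Org Product'),
        ('PRODUCT_MASTER', 'STATUS_INDICATOR', '2', 'Third Party'),
        ('PRODUCT_MASTER', 'STATUS_INDICATOR', '3', 'Government Product'),
        ('PRODUCT_MASTER', 'STATUS_INDICATOR', '4', 'Discontinued');

    CREATE TABLE product_master (product_id INTEGER PRIMARY KEY,
                                 status_indicator INTEGER);
    INSERT INTO product_master VALUES (10, 1), (11, 4), (12, 3);
""")

# Derive the meaning in the query itself.
rows = conn.execute("""
    SELECT p.product_id, p.status_indicator, l.value_meaning
    FROM product_master p
    LEFT OUTER JOIN tab_col_domain_lookups l
      ON l.table_name = 'PRODUCT_MASTER'
     AND l.column_name = 'STATUS_INDICATOR'
     AND l.column_value = CAST(p.status_indicator AS TEXT)
    ORDER BY p.product_id
""").fetchall()
print(rows)
# [(10, 1, 'Org Product'), (11, 4, 'Discontinued'), (12, 3, 'Government Product')]
```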
32. CHECK_IT
■ For small domains (say, fewer than 10 distinct values in a column), use a check constraint
■ This prevents values outside the domain from being stored
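The effect of a check constraint can be shown in a few lines. The sketch runs in SQLite, but the `CONSTRAINT ... CHECK (... IN (...))` clause is standard SQL that Oracle accepts as well; the table and constraint names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE product_master (
        product_id       INTEGER PRIMARY KEY,
        status_indicator INTEGER NOT NULL
            CONSTRAINT product_status_chk CHECK (status_indicator IN (1, 2, 3, 4))
    )
""")

conn.execute("INSERT INTO product_master VALUES (10, 1)")  # inside the domain: accepted

rejected = False
try:
    conn.execute("INSERT INTO product_master VALUES (11, 7)")  # outside the domain
except sqlite3.IntegrityError as exc:
    rejected = True
    print(exc)  # a "CHECK constraint failed" message

count = conn.execute("SELECT COUNT(*) FROM product_master").fetchone()[0]
print(count)  # 1
```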
33. Design for the Analytic
■ A focus on mapping data to functionality should not blind us to the analytic
■ Make sure the data model is analytics-friendly
■ See if it can be modeled as a star or snowflake schema
■ Or use click-stream tables
■ Always ask the question: Can I mine this data?
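For readers who have not built one, a star schema is a central fact table of measures surrounded by dimension tables that describe them. The sketch below is a deliberately tiny, hypothetical example in SQLite, showing the kind of slice-and-aggregate query such a model makes easy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimensions describe the what and when; the fact table holds the measures.
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
    CREATE TABLE fact_sales  (product_key INTEGER REFERENCES dim_product(product_key),
                              date_key    INTEGER REFERENCES dim_date(date_key),
                              quantity    INTEGER,
                              amount      REAL);
    INSERT INTO dim_product VALUES (1, 'Hardware'), (2, 'Software');
    INSERT INTO dim_date    VALUES (20240115, 2024, 1), (20240410, 2024, 2);
    INSERT INTO fact_sales  VALUES (1, 20240115, 3, 300.0),
                                   (2, 20240115, 1, 150.0),
                                   (1, 20240410, 2, 200.0);
""")

# A typical analytic question: revenue by category and quarter.
rows = conn.execute("""
    SELECT p.category, d.quarter, SUM(f.amount) AS revenue
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_date    d ON d.date_key    = f.date_key
    GROUP BY p.category, d.quarter
    ORDER BY p.category, d.quarter
""").fetchall()
print(rows)  # [('Hardware', 1, 300.0), ('Hardware', 2, 200.0), ('Software', 1, 150.0)]
```

A snowflake schema would further normalize the dimensions (for example, splitting category into its own table) at the cost of extra joins.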
34. Know the business
■ The future demands people who know both technology and
business
■ Meet, talk and work with the users of the system
■ Live their life for a day and use the system like they do
■ Find the question behind the question
■ Design for the analytic (business insight) and the data
35. Know more…
■ As a Data Architect, know more
▪ Than the developer
▪ Than the user
▪ Than the business
▪ Than the business analyst
▪ Than the tester
▪ Than the PM
36. Data is now big
■ From a relational standpoint, Big Data is the converse of what we are used to
■ It can be counter-intuitive
■ There really is such a thing as NoSQL
■ It is a big deal
■ Much of it is unstructured
■ It is, however, learnable
37. Do the Math (Financial)
■ There are always business requirements that involve using large data sets
■ While that sounds exciting, it comes with significant costs
■ Large data sets impose significant overhead on IT services, whether infrastructure, DBA effort, licenses or development
■ We did a cost-benefit analysis for a customer who wanted to use Advanced Pricing and convinced them to use Simple Pricing instead
38. Do the Math
Item | Value | Notes
Probability | 50% |
Discount Rate | 5% |
Revenue Upside | $4,000,000 per year | Years 1 through 5
NPV | $17,317,907 | NPV of the 5 years of revenue
Probable Revenue | $8,658,953 | NPV times the probability
Investment Required | $15,000,000 | Capital investment required; depreciation not included
Profit | ($6,341,047) | Probable revenue minus investment
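The arithmetic in the table can be reproduced in a few lines, which makes it easy to rerun with different assumptions. The sketch treats the revenue as end-of-year cash flows in years 1 to 5 and the $15,000,000 investment as paid up front and undiscounted, which is what the table's figures imply.

```python
def npv(cash_flows: list[float], rate: float) -> float:
    """Net present value of end-of-year cash flows, with the first flow in year 1."""
    return sum(cf / (1 + rate) ** year for year, cf in enumerate(cash_flows, start=1))


probability = 0.50
discount_rate = 0.05
revenue_upside = [4_000_000] * 5  # Years 1-5
investment = 15_000_000           # assumed paid up front, not discounted

revenue_npv = npv(revenue_upside, discount_rate)
probable_revenue = revenue_npv * probability
profit = probable_revenue - investment

print(f"NPV:              {revenue_npv:,.0f}")       # 17,317,907
print(f"Probable revenue: {probable_revenue:,.0f}")  # 8,658,953
print(f"Profit:           {profit:,.0f}")            # -6,341,047
```

Even at full probability the probable revenue would be about $17.3 million, so the outcome is sensitive mainly to the probability estimate.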
39. Know the Stat
■ Every relational database uses some kind of statistical model
about the data
■ This data is used to determine query plans
■ Most of them assume a uniform distribution of the data
■ Any skewed distribution of the data has to be "taught" to the system, either through hints or by gathering special statistics such as histograms
■ Any data architect should be able to articulate the statistical distribution of a column's values
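A simplified illustration of why skew matters: without distribution information, an optimizer's estimate for `column = value` is roughly total rows divided by the number of distinct values (real optimizers refine this, and Oracle uses histograms gathered through DBMS_STATS to do so). The ORDER_STATUS data below is hypothetical.

```python
from collections import Counter

# Hypothetical ORDER_STATUS column, heavily skewed toward 'CLOSED'.
order_status = ["CLOSED"] * 9_500 + ["OPEN"] * 400 + ["HOLD"] * 100

total_rows = len(order_status)
frequencies = Counter(order_status)
distinct_values = len(frequencies)

# Uniform-distribution assumption: every value is equally common.
uniform_estimate = total_rows / distinct_values

for value, actual in frequencies.most_common():
    print(f"{value:<7} estimated {uniform_estimate:>7.0f}  actual {actual:>5}")
# CLOSED  estimated    3333  actual  9500
# OPEN    estimated    3333  actual   400
# HOLD    estimated    3333  actual   100
```

An estimate that is off by a factor of 8 to 33 in either direction is enough to push the optimizer toward the wrong access path, for example a full scan for the rare 'HOLD' rows.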
40. Know the Stat
■ Data Science or Big Data Analytics is all about statistics
■ A huge stream of data is mined to generate customer
preferences
■ These preferences are used to drive product placement and
other revenue and profit enhancing initiatives
41. Know the Stat
■ At a minimum, know the following
▪ Mean, Median and Mode
▪ Standard Deviation
▪ Quintile, Decile, Quartile and Percentile
▪ An awareness of Regression Analysis
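Python's standard `statistics` module covers most of this list. The sketch below uses a small hypothetical sample; the regression helper is a plain ordinary-least-squares fit written out by hand so the formula is visible.

```python
import statistics

# Hypothetical daily order counts.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(statistics.mean(data))    # 5
print(statistics.median(data))  # 4.5
print(statistics.mode(data))    # 4
print(statistics.pstdev(data))  # 2.0 (population standard deviation)

# Cut points: 3 for quartiles, 4 for quintiles, 9 for deciles, 99 for percentiles.
quartiles = statistics.quantiles(data, n=4)
quintiles = statistics.quantiles(data, n=5)
deciles = statistics.quantiles(data, n=10)
percentiles = statistics.quantiles(data, n=100)


def simple_linear_regression(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


print(simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9]))  # (2.0, 1.0)
```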
42. Write it down
■ For every table in the system, have a wiki-style page
■ Or a note-let
■ Have a one-pager or a one-paragraph description of the table and the business function it supports
■ For every column, have a short description of what it means
43. Write it Down (Example)
Table Name: HZ_CUST_DEPT
Description: Customers have departments; this table tracks them and is outer-joined from the customer table.
Column Name | Data Type | Comments
ORG_ID | NUMBER | Customer Organization
CUST_NBR | NUMBER | Customer Number
DEPT_NBR | NUMBER(38,0) | Customer Department
DEPT_NAME | VARCHAR2(25 BYTE) | Customer Department Name
DEPT_ACTV_IND | VARCHAR2(1 BYTE) | Indicates whether the department for the customer is active (Y/N)
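The table and column descriptions do not have to live only on a wiki: they can be pushed into the data dictionary with the COMMENT statements shown earlier, so they appear in USER_TAB_COMMENTS and USER_COL_COMMENTS. The Python sketch below generates those statements from a simple mapping; the helper names are hypothetical and the comments are loosely based on the HZ_CUST_DEPT example. The trailing semicolons suit a SQL*Plus script; drop them when executing through a driver.

```python
def quote_literal(text: str) -> str:
    """Quote a string as a SQL literal, doubling any embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def comment_ddl(table: str, table_comment: str, columns: dict[str, str]) -> list[str]:
    """Generate Oracle COMMENT statements for a table and its columns."""
    statements = [f"COMMENT ON TABLE {table} IS {quote_literal(table_comment)};"]
    for column, comment in columns.items():
        statements.append(
            f"COMMENT ON COLUMN {table}.{column} IS {quote_literal(comment)};")
    return statements


hz_cust_dept_columns = {
    "ORG_ID": "Customer Organization",
    "CUST_NBR": "Customer Number",
    "DEPT_NBR": "Customer Department",
    "DEPT_NAME": "Customer Department Name",
    "DEPT_ACTV_IND": "Indicates whether the customer's department is active (Y/N)",
}

statements = comment_ddl(
    "HZ_CUST_DEPT",
    "Customer departments; outer-joined from the customer table.",
    hz_cust_dept_columns)
for statement in statements:
    print(statement)
```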
44. Visualize It
■ Be comfortable in data visualization techniques
■ Be able to represent data in different formats in a way that
generates insight
■ Most BI Tools provide this and be able to provide innovative
perspectives on data, results and reports
■ Information Dashboard Design by Stephen Few is particularly■ Information Dashboard Design by Stephen Few is particularly
insightful
45. Be savvy about Algorithms
■ Algorithms provide a framework to think about complex
business requirements
■ Ask whether the required algorithm will be complex
■ If the answer is yes, costs will be high
■ You should be able to articulate complexity in terms of O(n), O(n log n), O(n²) and so on
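Joins are a familiar place to see these complexity classes. The in-memory Python sketch below contrasts a nested-loop join, which compares every pair of rows (O(n·m)), with a hash join, which builds a hash table on one input and probes it with the other (O(n + m) on average, plus the size of the output). It illustrates the algorithms only; it is not how any particular database implements them, and the data is hypothetical.

```python
from collections import defaultdict


def nested_loop_join(left: list[dict], right: list[dict], key: str) -> list[tuple]:
    """O(n * m): compare every left row with every right row."""
    return [(lrow, rrow) for lrow in left for rrow in right if lrow[key] == rrow[key]]


def hash_join(left: list[dict], right: list[dict], key: str) -> list[tuple]:
    """O(n + m) on average, plus output size: build a hash table, then probe it."""
    buckets: defaultdict = defaultdict(list)
    for rrow in right:  # build phase over the right input
        buckets[rrow[key]].append(rrow)
    # Probe phase: one hash lookup per left row.
    return [(lrow, rrow) for lrow in left for rrow in buckets.get(lrow[key], [])]


# Hypothetical data: 2,000 orders, each belonging to one of 500 customers.
customers = [{"cust_nbr": i, "name": f"C{i}"} for i in range(500)]
orders = [{"order_id": i, "cust_nbr": i % 500} for i in range(2_000)]

nl_result = nested_loop_join(orders, customers, "cust_nbr")  # 2,000 x 500 = 1,000,000 comparisons
hj_result = hash_join(orders, customers, "cust_nbr")         # 500 inserts + 2,000 probes
print(nl_result == hj_result, len(hj_result))  # True 2000
```

Doubling both inputs roughly quadruples the nested-loop work but only doubles the hash-join work, which is the practical meaning of O(n²) versus O(n).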
46. Mask the Data
■ As data security becomes increasingly important, masking data copied from PROD to DEV becomes an important task
■ Masking data in PROD from users of the system also becomes important
■ For example, salaries in Oracle HR tables are now masked; a few versions ago they were not
■ A savvy Oracle developer could pretty much see the salary of every employee in the company
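Three common masking techniques are sketched below in Python: deterministic pseudonymization with a keyed hash (so the same name maps to the same token in every table and joins still line up in DEV), partial masking, and banding of sensitive numbers such as salaries. These are generic illustrations, not Oracle's masking features; the key and helper names are hypothetical, and in practice the key would come from a secure store rather than source code.

```python
import hashlib
import hmac

# Hypothetical masking key; in practice keep it out of DEV and out of source control.
MASKING_KEY = b"replace-with-a-secret-from-a-vault"


def pseudonymize(value: str, key: bytes = MASKING_KEY) -> str:
    """Deterministically replace a value so joins still line up across tables."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "EMP_" + digest[:12]


def mask_partial(value: str, visible: int = 4, mask_char: str = "*") -> str:
    """Show only the last few characters, e.g. for account numbers."""
    if len(value) <= visible:
        return mask_char * len(value)
    return mask_char * (len(value) - visible) + value[-visible:]


def mask_salary(salary: float, band: int = 10_000) -> int:
    """Replace an exact salary with the lower bound of its band."""
    return int(salary // band) * band


print(pseudonymize("Jane Smith") == pseudonymize("Jane Smith"))  # True: deterministic
print(mask_partial("4111111111111111"))  # ************1111
print(mask_salary(87_450))               # 80000
```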
47. Secure the Data
■ As data architects, we need to be able to define secure methods to protect data from internal and external threats
■ Features like Oracle Database Vault and secure backups are key to making this possible
■ While there are security teams, as data architects we need to be able to identify data vulnerabilities
■ Become familiar with encryption technologies like RSA
48. Drive towards Master Data
■ Master data for key enterprise domains (customers, products) is becoming commonplace
■ We need to embrace this wave and lead from the front
■ Master Data Management is here to stay
49. Where do your users spend time?
What Users Do | How They Do It | Industry Standard (% of time)
Data Gathering | Users spend a lot of time gathering data | 35
Data Formatting | They then spend a lot of time formatting it | 20
Data Reconciliation | They then reconcile the data | 30
Data Analysis | They then analyze the data | 15
50. Get Certified
■ CDMP
▪ Certified Data Management Professional
■ Data Management Association International (DAMA)
■ Institute for Certification of Computing Professionals (ICCP)
■ Three ICCP exams:
▪ IS Core exam
▪ Data Management Core exam
▪ One elective
51. You will speak many tongues
■ Not just SQL or PL/SQL
▪ XML and XSLT
▪ NO SQL
▪ UML (Unified Modeling Language)
▪ Java is the COBOL of the 21st century
■ Not just ER data models
▪ Logical data models
▪ Process flows that give rise to the entities in these logical models
52. Be Responsible
■ Be Responsible for
▪ Organizing Data
▪ Treat Data as an Asset
▪ Leverage Data to achieve the strategic goals of the enterprise
▪ Data Quality
▪ Data Governance
▪ Data Security
54. The pledge
■ We, the data architects, hereby solemnly swear that we will safeguard the data assets of the enterprise by securing them from external threats, masking them from internal threats, documenting them to avoid secrecy, ensuring data quality and data governance, committing to ongoing learning and new approaches, and providing value to our stakeholders, so help me Codd.
55. at Collaborate
Questions to @mvallamp
Text
972-804-5511
Mahesh Vallampati
Practice Leader, BI and EBS
Mahesh.Vallampati@keste.com
972-804-5511