Many people are confused when it comes to data. Architecture, models, data: it can seem overwhelming. This webinar offers a clear explanation of Data Modeling as the primary means of achieving a better understanding of Data Architecture. Using a storytelling format, it presents an organization approaching the daunting process of trying to better leverage its data. The organization is not yet knowledgeable about these concepts and begins the process of understanding its current state as well as a desired future state. We join as the organization takes steps to better understand what it has and what it needs to accomplish to employ Data Modeling and Architecture to achieve its mission.
Data Architecture vs Data Modeling
1. Peter Aiken, Ph.D.
Data Architecture Versus Data Modeling
Copyright 2018 by Data Blueprint Slide # !5
Data mapping from two perspectives
• DAMA International President 2009-2013 / 2018
• DAMA International Achievement Award 2001 (with Dr. E. F. "Ted" Codd)
• DAMA International Community Award 2005
Peter Aiken, Ph.D.
• I've been doing this a long time
• My work is recognized as useful
• Associate Professor of IS (vcu.edu)
• Founder, Data Blueprint (datablueprint.com)
• DAMA International (dama.org)
• 10 books and dozens of articles
• Experienced w/ 500+ data
management practices worldwide
• Multi-year immersions
– US DoD (DISA/Army/Marines/DLA)
– Nokia
– Deutsche Bank
– Wells Fargo
– Walmart
– …
Book: Monetizing Data Management: Unlocking the Value in Your Organization’s Most Important Asset, by Peter Aiken with Juanita Billings, foreword by John Bottega
2. 4 Minute Architecture Lesson from Steve Jobs, Introducing iCloud
Typically Managed Architectures
• Business Architecture
– Goals, strategies, roles, organizational structure, location(s)
• Process Architecture
– Arrangement of inputs -> transformations = value -> outputs
– Typical elements: Functions, activities, workflow, events, cycles, products, procedures
• Systems Architecture
– Applications, software components, interfaces, projects
• Security Architecture
– Arrangement of security controls in relation to IT Architecture
• Technical Architecture/Tarchitecture
– Relation of software capabilities/technology stack
– Structure of the technology infrastructure of an enterprise, solution or system
– Typical elements: Networks, hardware, software platforms, standards/protocols
• Data/Information Architecture
– Arrangement of data assets supporting organizational strategy
– Typical elements: specifications expressed as entities, relationships, attributes, definitions, values, vocabularies
3. Architecture is about ...
• Things
– (components)
• The functions of the things
– (individually)
• How the things interact
– (as a system, towards a goal)
• Business
• Process
• Systems
• Security
• Technical
• Data/Information
Data Architecture Versus Data Modeling
• Data Maps->Models
– Why do we need them?
– How are they used?
– Challenges (social, political, economic)
• Architecture/Engineering
– Two sides of the same coin
– Must operate on standard, shared data of known quality
• View from the Top
– Means: Forward engineering
– Goal: Composition/Building
• View from the Bottom
– Means: Reverse engineering
– Goal: Understanding
• Working Together
– Functions required for effective data management
– Need for simplicity
• Take Aways/Q&A
4. Business Glossary Components
[Built on definitions from Dan Appleton 1983]
[Diagram labels: Fact, Meaning, Data, Request, Information, Strategic Use, Intelligence]
1. Each FACT combines with one or more MEANINGS.
2. Each specific FACT and MEANING combination is referred to as a DATUM.
3. An INFORMATION is one or more DATA that are returned in response to a specific REQUEST.
4. INFORMATION REUSE is enabled when one FACT is combined with more than one MEANING.
5. INTELLIGENCE is INFORMATION associated with its STRATEGIC USES.
6. DATA/INFORMATION must be formally arranged into an ARCHITECTURE.
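The definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Datum` class, the `answer_request` function, the bed-related sample values, and the substring matching used to answer a request are all hypothetical and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Datum:
    """A DATUM: one specific FACT combined with one MEANING."""
    fact: str      # the raw value
    meaning: str   # what the value signifies


def answer_request(request, data):
    """INFORMATION: the data returned in response to a specific REQUEST.

    Substring matching on the meaning is a stand-in for a real query.
    """
    return [d for d in data if request.lower() in d.meaning.lower()]


# Hypothetical sample data. The fact "101" appears with two meanings,
# which is what makes information reuse possible.
data = [
    Datum(fact="101", meaning="room number of bed B-7"),
    Datum(fact="101", meaning="count of beds in the facility"),
    Datum(fact="occupied", meaning="status of bed B-7"),
]

information = answer_request("bed B-7", data)

# INTELLIGENCE: information associated with a strategic use.
intelligence = {"information": information, "strategic_use": "plan bed capacity"}
```

Keeping the fact separate from its meaning is the point of the sketch: the same stored value can serve more than one request without being duplicated.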
Wisdom & knowledge are often used synonymously
Data ...
• As a subject is
– Complex and detailed
– Taught inconsistently, and
– Poorly understood
• Maps are necessary but insufficient prerequisites to data architectures
– Fully leveraging data assets
• Maps are incomplete without purpose statements
– More powerful than definitions
– Remedy
• Add purpose statements
• Validate resulting model
• Maps are required to share information about data
• Data architectures are comprised of data models
– Data modeling is an engineering activity required to produce data maps that are necessary but insufficient prerequisites to leveraging data assets
5. How are components expressed as architectures?
• Details are organized into larger components
• Larger components are organized into models
• Models are organized into architectures (comprised of architectural components)
[Figure: components A, B, C, and D shown in three arrangements. Labels: Intricate Dependencies, Purposefulness]
How are data structures expressed as architectures?
• Attributes are organized into entities/objects
– Attributes are characteristics of "things"
– Entities/objects are "things" whose information is managed in support of strategy
– Examples
• Entities/objects are organized into models
– Combinations of attributes and entities are structured to represent information requirements
– Poorly structured data constrains organizational information delivery capabilities
– Examples
• Models are organized into architectures
– When building new systems, architectures are used to plan development
– More often, data managers do not know what existing architectures are and, therefore, cannot make use of them in support of strategy implementation
– Why no examples?
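The attribute, entity, model, architecture layering described above can be sketched as nested Python dataclasses. This is a minimal illustration, not a method from the slides: the class names, the `shared_entities` helper, and the Facility and Admissions models (including the PATIENT entity) are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Attribute:
    name: str  # a characteristic of a "thing"


@dataclass
class Entity:
    name: str  # a "thing" whose information is managed
    attributes: list = field(default_factory=list)


@dataclass
class Model:
    name: str
    entities: list = field(default_factory=list)


@dataclass
class Architecture:
    name: str
    models: list = field(default_factory=list)

    def shared_entities(self):
        """Names of entities that appear in more than one model."""
        counts = Counter(
            name
            for model in self.models
            for name in {entity.name for entity in model.entities}
        )
        return sorted(name for name, count in counts.items() if count > 1)


bed = Entity("BED", [Attribute("Description"), Attribute("Status")])
room = Entity("ROOM", [Attribute("Name")])
patient = Entity("PATIENT", [Attribute("Name")])

facility = Model("Facility", [room, bed])
admissions = Model("Admissions", [patient, bed])
architecture = Architecture("Enterprise data architecture", [facility, admissions])
```

Here `architecture.shared_entities()` reports BED, the one entity both models depend on. Seeing that kind of overlap is only possible at the architecture level, where the models are viewed together.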
6. Data structures organized into an Architecture
• How do data structures support strategy?
• Consider the opposite question:
– Were your systems explicitly designed to be integrated or otherwise work together?
– If not, then what is the likelihood that they will work well together?
– In all likelihood your organization is spending between 20-40% of its IT budget compensating for poor data structure integration
– They cannot be helpful as long as their structure is unknown
• Two answers/two separate strategies
– Achieving efficiency and effectiveness goals
– Providing organizational dexterity for rapid implementation
Data architectures are comprised of data models
7. Data Architectures Determine Interoperability
• Required in order to enable self-correction/generation capabilities
• Permits governance of data as an asset
• Prerequisite to meaningful data exchanges
• Lowers the costs of organization-wide and extra-organizational data sharing
• Permits managed evolution: rapidly responding to changing needs, new partners, time criticalities
• Required for full implementation of role-based security
• Decreases the cost of maintaining various data inventories
Data Architectures:
– Capture the business meaning of the data required to run the organization
– Living document – constantly evolving to meet upcoming and discovered business requirements
– A potential entry point for architecture engagements
– Validated data architectural components can be used to populate a business glossary
– Major collection of metadata
Levels of Abstraction, Completeness and Utility
• Models are more downward facing: detail
• Architecture is a higher level of abstraction: integration
• In the past, architecture attempted to gain complete (perfect) understanding
– Not timely
– Not feasible
• Focus instead on architectural components
– Governed by a framework
– More immediate utility
• http://www.architecturalcomponentsinc.com
8. Data model focus is typically domain specific
[Figure: Programs A, B, and C. Labels: "Focus of a software engineering effort"; "Underutilized data modeling effort"]
Database Architecture Focus Can Vary
[Figure: Application domains 1, 2, and 3 containing Programs A through I. Labels: "Focus of a software engineering effort"; "Underutilized data modeling effort"; "Better utilized data modeling effort"; "ERPs and COTS are marketed as being similarly integrated!"]
9. Data Architecture Focus has Greater Potential Value
[Figure: Application domains 1, 2, and 3 containing Programs A through I, with multiple "Data" stores]
• Broader focus than either software architecture or database architecture
• Analysis scope is on the system-wide use of data
• Problems caused by data exchange or interface problems
• Architectural goals more strategic than operational
10. What do we teach knowledge workers about data?
What percentage of them deal with it daily?
Political
12. If the only tool you know is a hammer, you tend to see every problem as a nail (slightly reworded from Abraham Maslow)
Bad Data Decisions Spiral
• Bad data decisions
• Technical decision makers are not data knowledgeable
• Business decision makers are not data knowledgeable
• Poor organizational outcomes
• Poor treatment of organizational data assets
• Poor quality data
13. Tacoma Narrows Bridge/Gallopin' Gertie
• Slender, elegant and graceful
• World's 3rd longest suspension span
• Opened on July 1st, collapsed in a windstorm on November 7, 1940
• "The most dramatic failure in bridge engineering history"
• Changed forever how engineers design suspension bridges, leading to safer spans today.
Similarly, data failures cost organizations at least 20-40% of their IT budget
14. What is a data structure?
• "An organization of information, usually in memory, for better
algorithm efficiency, such as queue, stack, linked list, heap, dictionary,
and tree, or conceptual unity, such as the name and address of a
person. It may include redundant information, such as length of the
list or number of nodes in a subtree."
• Some data structure characteristics
– Grammar for data objects
• Grammar is the principles or rules of an art, science, or technique "a grammar of the theater"
– Constraints for data objects
– Sequential order
– Uniqueness
– Order
• Hierarchical, relational, network, other
– Balance
– Optimality
http://www.nist.gov/dads/HTML/datastructur.html
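Two points from this slide, redundant information such as a stored length and constraints such as sequential order and uniqueness, can be shown in a small Python class. This is a sketch under stated assumptions: the class name `UniqueOrderedList` and the bed identifiers are hypothetical and do not come from the slide or the NIST definition.

```python
class UniqueOrderedList:
    """Keeps items in sequential (insertion) order and rejects duplicates.

    The `length` attribute is deliberately redundant: it could always be
    recomputed from the stored items, but caching it mirrors the
    "length of the list" example in the definition above.
    """

    def __init__(self):
        self._items = []     # preserves sequential order
        self._seen = set()   # enforces the uniqueness constraint
        self.length = 0      # redundant, cached count

    def append(self, item):
        if item in self._seen:
            raise ValueError(f"duplicate item: {item!r}")
        self._items.append(item)
        self._seen.add(item)
        self.length += 1

    def __iter__(self):
        return iter(self._items)


beds = UniqueOrderedList()
beds.append("B-1")
beds.append("B-2")

try:
    beds.append("B-1")        # violates the uniqueness constraint
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```

The constraints live in the structure itself, so every program that uses it gets the same guarantees without re-implementing the checks.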
Data is a hidden IT Expense
• Organizations spend between 20-40% of their IT budget evolving their data, including:
– Data migration
• Changing the location from one place to another
– Data conversion
• Changing data into another form, state, or product
– Data improving
• Inspecting and manipulating, or re-keying data to prepare it for subsequent use
– Source: John Zachman
15. Doing a poor job with data
• Takes longer
• Costs more
• Delivers less
• Presents greater risk (with thanks to Tom DeMarco)
16. Architectures: here, whether you like it or not
[Image credit: deviantart.com]
• All organizations have architectures
– Some are better understood and documented (and therefore more useful to the organization) than others
Engineering/Architecting Relationship
• Architecting is used to create and build systems too complex to be treated by engineering analysis alone
– Require technical details as the exception
• Engineers develop the technical designs for implementation
– Engineering/Crafts-persons deliver work product components supervised by:
• Manufacturer
• Building Contractor
17. You cannot architect after implementation!
USS Midway & Pancakes
What is this?
• It is tall
• It has a clutch
• It was built in 1942
• It is still in regular use!
18. Definition of Bed
Q: What is the proper relationship for these entities?
BED, ROOM
19. Data Maps - Entity Level
BEDS are related to ROOMS
More precision: many BEDS are related to many ROOMS
Better information: many BEDS may be contained in each ROOM and each room may contain many beds
What if beds can be moved?
Possible Entity Relationships
Eventually One or Many (optional)
Eventually One (optional)
Exactly One (mandatory)
Zero or Many (optional)
One or Many (mandatory)
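The relationship types listed above can be read as minimum and maximum counts. The sketch below encodes three of them that way; it is an illustration only, and the `Cardinality` class and its `allows` method are hypothetical names. The two "Eventually" variants are left out because the slide does not define when they become mandatory.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cardinality:
    label: str
    minimum: int
    maximum: Optional[int]  # None stands for "many" (no upper bound)

    def allows(self, count: int) -> bool:
        """True if `count` related instances satisfy this cardinality."""
        if count < self.minimum:
            return False
        return self.maximum is None or count <= self.maximum


EXACTLY_ONE = Cardinality("Exactly One (mandatory)", 1, 1)
ZERO_OR_MANY = Cardinality("Zero or Many (optional)", 0, None)
ONE_OR_MANY = Cardinality("One or Many (mandatory)", 1, None)

# Example: how many beds may one room contain under each rule?
empty_room_ok = ZERO_OR_MANY.allows(0)        # an empty room is acceptable
empty_room_rejected = not ONE_OR_MANY.allows(0)  # a room must have a bed
```

Choosing between "Zero or Many" and "One or Many" is a business decision (can a room exist with no beds?), which is why the notation has to be settled with the people who know the domain.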
20. Families of Modeling Notation Variants
Eventually One, More
Eventually One
Exactly One
Zero or More
One or More
Information Engineering
Pick one!
What is a Relationship?
• Natural associations between two or more entities
22. Standard definition reporting does not provide conceptual context
BED
Something you sleep in
Purpose statement incorporates motivations
Entity: BED
Data Asset Type: Principal Data Entity
Purpose: This is a substructure within the Room substructure of the Facility Location. It contains information about beds within rooms.
Source: Maintenance Manual for File and Table Data (Software Version 3.0, Release 3.1)
Attributes: Bed.Description, Bed.Status, Bed.Sex.To.Be.Assigned, Bed.Reserve.Reason
Associations: >0-+ Room
Status: Validated
A purpose statement describing
– Why the organization is maintaining information about this business concept;
– Sources of information about it;
– A partial list of the attributes or characteristics of the entity; and
– Associations with other data items (read as "One room contains zero or many beds.")
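The four parts of a purpose statement listed above can be captured as a structured record, so that an incomplete definition is easy to detect. The sketch below uses the BED values from the slide; the `EntityDefinition` class, its field names, the `missing_parts` helper, and the "Draft" default status are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class EntityDefinition:
    entity: str
    data_asset_type: str
    purpose: str        # why the organization maintains this information
    source: str         # where information about the entity comes from
    attributes: list = field(default_factory=list)
    associations: list = field(default_factory=list)
    status: str = "Draft"  # assumed default until the definition is validated

    def missing_parts(self):
        """Names of the purpose-statement parts that are still empty."""
        required = {
            "purpose": self.purpose,
            "source": self.source,
            "attributes": self.attributes,
            "associations": self.associations,
        }
        return [name for name, value in required.items() if not value]


bed = EntityDefinition(
    entity="BED",
    data_asset_type="Principal Data Entity",
    purpose=(
        "This is a substructure within the Room substructure of the "
        "Facility Location. It contains information about beds within rooms."
    ),
    source=(
        "Maintenance Manual for File and Table Data "
        "(Software Version 3.0, Release 3.1)"
    ),
    attributes=[
        "Bed.Description",
        "Bed.Status",
        "Bed.Sex.To.Be.Assigned",
        "Bed.Reserve.Reason",
    ],
    associations=["One room contains zero or many beds."],
    status="Validated",
)

# A bare definition ("something you sleep in") leaves every part empty.
stub = EntityDefinition("BED", "Principal Data Entity", purpose="", source="")
```

Calling `stub.missing_parts()` lists everything a plain dictionary-style definition leaves out, which is the gap the purpose statement is meant to close.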
23. ANSI-SPARC 3-Layer Schema
1. Conceptual - Allows independent customized user views:
– Each user should be able to access the same data, but have a different customized view of the data.
2. Logical - This hides the physical storage details from users:
– Users should not have to deal with physical database storage details. They should be allowed to work with the data itself, without concern for how it is physically stored.
3. Physical - The database administrator should be able to change the database storage structures without affecting the users’ views:
– Changes to the structure of an organization's data will be required. The internal structure of the database should be unaffected by changes to the physical aspects of the storage. For example, a changeover to a new DBMS technology. The database administrator should be able to change the conceptual or global structure of the database without affecting the users.
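The idea that storage structures can change without affecting users' views can be demonstrated with a database view. The sketch below uses Python's standard `sqlite3` module as a stand-in for any DBMS; the table names, column names, the `free_beds` view, and the sample rows are all hypothetical.

```python
import sqlite3


def original_storage():
    """One flat table, with the user view defined on top of it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE bed (bed_id INTEGER PRIMARY KEY, room_name TEXT, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO bed VALUES (?, ?, ?)",
        [(1, "101", "occupied"), (2, "101", "free"), (3, "102", "free")],
    )
    conn.execute(
        "CREATE VIEW free_beds AS "
        "SELECT bed_id, room_name FROM bed WHERE status = 'free'"
    )
    return conn


def restructured_storage():
    """Same data split across two tables; the view hides the change."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE room (room_id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE bed_store "
        "(bed_id INTEGER PRIMARY KEY, room_id INTEGER, is_free INTEGER)"
    )
    conn.executemany("INSERT INTO room VALUES (?, ?)", [(10, "101"), (20, "102")])
    conn.executemany(
        "INSERT INTO bed_store VALUES (?, ?, ?)",
        [(1, 10, 0), (2, 10, 1), (3, 20, 1)],
    )
    conn.execute(
        "CREATE VIEW free_beds AS "
        "SELECT b.bed_id, r.name AS room_name "
        "FROM bed_store AS b JOIN room AS r ON r.room_id = b.room_id "
        "WHERE b.is_free = 1"
    )
    return conn


def user_query(conn):
    """The user's query is written once, against the view only."""
    return conn.execute(
        "SELECT bed_id, room_name FROM free_beds ORDER BY bed_id"
    ).fetchall()


before = user_query(original_storage())
after = user_query(restructured_storage())
```

Both calls return the same rows even though the underlying tables differ, because the user's query depends only on the view and not on how the data is stored.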
24. Forward Engineering
[Figure: New system. As-Is Requirements Assets (WHAT?), As-Is Design Assets (HOW?), As-Is Implementation Assets (AS BUILT)]
Building new stuff - in this case, new databases
25. Reverse Engineering
[Figure: Existing system. As-Is Requirements Assets (WHAT?), As-Is Design Assets (HOW?), As-Is Implementation Assets (AS BUILT)]
A structured technique aimed at recovering rigorous knowledge of the existing system to leverage enhancement efforts [Chikofsky & Cross 1990]
26. Reengineering
[Figure: Reengineering. Existing side: As-Is Requirements Assets (WHAT?), As-Is Design Assets (HOW?), As-Is Implementation Assets (AS BUILT), labeled Reverse Engineering. New side: To-Be Requirements Assets, To-Be Design Assets, To-Be Implementation Assets, labeled Forward engineering. A "Reimplement" path is also labeled.]
• First, reverse engineer the existing system to understand its strengths/weaknesses
• Next, use this information to inform the design of the new system
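The two reengineering steps above can be sketched against a small database: recover a model from what was actually built, then generate a new schema from that model. This is a simplified illustration using Python's standard `sqlite3` module; the function names and the legacy tables are hypothetical, and the sketch assumes single-column primary keys and ignores foreign keys and other constraints.

```python
import sqlite3


def reverse_engineer(conn):
    """Recover {table: [(column, type, is_primary_key), ...]} from a database."""
    model = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        # Each row is (cid, name, type, notnull, dflt_value, pk).
        model[table] = [
            (name, col_type, bool(pk))
            for _cid, name, col_type, _notnull, _default, pk in columns
        ]
    return model


def forward_engineer(model):
    """Generate CREATE TABLE statements from a recovered model."""
    statements = []
    for table, columns in model.items():
        parts = [
            f"{name} {col_type}" + (" PRIMARY KEY" if is_pk else "")
            for name, col_type, is_pk in columns
        ]
        statements.append(f"CREATE TABLE {table} ({', '.join(parts)})")
    return statements


# Hypothetical existing ("as built") system.
legacy = sqlite3.connect(":memory:")
legacy.execute("CREATE TABLE room (room_id INTEGER PRIMARY KEY, name TEXT)")
legacy.execute(
    "CREATE TABLE bed (bed_id INTEGER PRIMARY KEY, room_id INTEGER, status TEXT)"
)

recovered = reverse_engineer(legacy)

# In practice the recovered model would be reviewed and improved here
# before building the new system from it.
rebuilt = sqlite3.connect(":memory:")
for statement in forward_engineer(recovered):
    rebuilt.execute(statement)
```

In a real effort the recovered model is the valuable artifact: it is where strengths and weaknesses of the existing design become visible before anything new is built.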
Data Modeling Process
1. Identify entities
2. Identify key for each entity
3. Draw rough draft of entity relationship data model
4. Identify data attributes
5. Map data attributes to entities
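The five steps above can be walked through for the BED and ROOM example. This is a rough sketch, not a prescribed method: the key names, the `Room.Name` attribute, and the `map_attributes` helper are hypothetical, while the Bed attributes reuse names from the entity definition shown earlier in the deck.

```python
# Step 1: identify entities.
entities = ["ROOM", "BED"]

# Step 2: identify a key for each entity (hypothetical key names).
keys = {"ROOM": "room_id", "BED": "bed_id"}

# Step 3: rough draft of the entity relationship model.
relationships = [("ROOM", "contains", "BED", "zero or many")]

# Step 4: identify data attributes, written as Entity.Attribute.
attributes = [
    "Bed.Description",
    "Bed.Status",
    "Bed.Reserve.Reason",
    "Room.Name",
]


# Step 5: map data attributes to entities.
def map_attributes(entity_names, attribute_names):
    mapping = {entity: [] for entity in entity_names}
    for full_name in attribute_names:
        owner, _, attribute = full_name.partition(".")
        entity = owner.upper()
        if entity not in mapping:
            raise KeyError(f"attribute {full_name!r} has no matching entity")
        mapping[entity].append(attribute)
    return mapping


mapping = map_attributes(entities, attributes)
```

An attribute that cannot be mapped raises an error, which in practice signals either a missing entity (back to step 1) or a misunderstood attribute, so the steps tend to be repeated in cycles.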
27. Models Evolution is good, at first ...
[Figure: Relative use of time allocated to tasks during modeling. Phases: preliminary activities, modeling cycles, wrap-up activities. Tasks: evidence collection & analysis, project coordination requirements, target system analysis. Modeling cycle focus: collection, analysis, validation, refinement. Trends: declining coordination requirements; increasing amounts of target system analysis.]
28. Questions?
It’s your turn! Use the chat feature or Twitter (#dataed) to submit your questions now!
Upcoming Events
December Webinar:
The Seven Deadly Data Sins
December 11, 2018 @ 2:00 PM ET
January Webinar:
Data Strategy-Best Practices
January 8, 2019 @ 2:00 PM ET
Enterprise Data World
How I Learned to Stop Worrying & Love My Data Warehouse
Sunday, 3/17/2019 @ 1:30 PM ET
Data Management Brain Drain
Thursday, 3/21/2019 @ 8:30 AM ET
Sign up for webinars at: www.datablueprint.com/webinar-schedule
or at www.dataversity.net
Brought to you by:
29. 10124 W. Broad Street, Suite C
Glen Allen, Virginia 23060
804.521.4056