We are bombarded with stories of the latest products to hit the market – products that will change everything we do. This causes us to focus on the latest technology, building IT for the sake of building IT. Meanwhile, the world still seems to run on Excel.
The “big innovators” who have and use unimaginably large amounts of data are not the norm. Aspiring to use the same complex technologies and patterns they do leads to poor investments and tradeoffs. This is an age-old problem rooted in the over-emphasis of technology as the agent of change. Technology isn’t the answer – it’s the platform on which people build answers.
To emphasize technology is to ignore the way tools change people and practices. The design focus in our market has been on storing data and making it accessible. If we want to make progress, we need to step back from the details and look at data from the perspective of the organization. Our design focus then shifts to people learning and applying new insights, and to asking how an organization can be more resilient, more efficient, or faster to sense and respond to changing conditions.
In this talk you will learn how to put your data architecture into a human frame of reference. Drawing inspiration from the history of technology and urban planning, we will see that the services provided by the things we build are what drive success, not the latest shiny distraction.
6. Copyright Third Nature, Inc.
“Those who cannot remember the past are condemned to repeat it.”
George Santayana
If there’s one lesson we can take from history, it’s that nobody learns any lessons from history.
7.
Online
Realtime
For decision making
Today we’ll call it streaming
8.
Technology patterns
New “Data Bases” ™
Storage virtualization
Separation of storage and compute
9.
What year was it?
10.
Welcome to 1975
11.
New is BETTER
Our core beliefs in software are based on this. Progress is not a promise.
12.
New is BETTER?
This is fundamentally a belief in leading with technology to solve problems…
26.
“Always design a thing by considering it in its next larger context - a chair in a room, a room in a house, a house in an environment, an environment in a city plan.” – Eliel Saarinen
27.
[Diagram: Order Entry, Order Database, Customer Service, Integration Program, Inventory Database, Distribution, Integration Program, Receivables Database, Accounts Receivable, Data Warehouse, Analysts & users]
This is the simplistic view people have of IT, if they see even this level of detail
29.
[Figure: map of product and information flows across a cheese supply chain, from farms through the cheese processor (cheese plant, cheese stores, packing plant, national distribution centre, and HQ sales, forecasting, bulk planning, and milk purchasing teams) to the retailer (HQ, RDC, and 550 stores). Flows range from annual milk production predictions and buying plans to daily orders, call-offs, and delivery confirmations. Key: shaded boxes = product flow system; un-shaded boxes = information flow system.]
Source: IGD Food Chain Centre, February 2008
All companies operate in the context of an industry. The external data interchanges and market signals are today as important as the internal data, for both strategic and operational decision making.
Gray = companies in value chain
Red = information flows and systems
30.
[Figure: the cheese supply chain map from the previous slide, shown again.]
The real data context of the organization, as assembled by the data platforms, consists of subsets of all of these systems.
The complexity of a DW is a function of the complexity of the organization and all the integration points.
There’s more to it than just the systems and technologies…
Data
Warehouse
31. Data is transformed, cleaned, integrated, and new data is derived. This adds a level of temporal and semantic complexity to data management, and it’s always hidden.
Machine learning won’t “solve” data integration. It will help in some areas, mostly with augmenting simpler tasks.
Data flows – the dark matter of your architecture diagrams
33. This is a map of one organization’s analytic data, showing the dataset complexity inherent in a mid-sized organization.
Different views of
data complexity
34. Data complexity is not just based on the number of datasets, or the number of tables.
It is based on the number of connections. This is an order of magnitude higher than the number of objects.
Organizational complexity drives communication complexity, which in turn drives data complexity.
Different views of
data complexity
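The point about connections outnumbering objects can be illustrated with a small sketch. Everything in it is hypothetical: the dataset names, the key columns, and the rule that two datasets are "connected" when they share a key column are assumptions made for illustration, not taken from the organization shown on the slide. With n datasets the number of possible pairwise connections is bounded by n(n-1)/2, so it grows much faster than the object count.

```python
from itertools import combinations

# Hypothetical inventory: each dataset lists the key columns it exposes.
datasets = {
    "orders": {"order_id", "customer_id", "product_id"},
    "customers": {"customer_id", "region_id"},
    "products": {"product_id", "supplier_id"},
    "suppliers": {"supplier_id", "region_id"},
    "regions": {"region_id"},
    "shipments": {"order_id", "product_id", "region_id"},
}


def connections(inventory):
    """Return every pair of datasets that shares at least one key column."""
    links = []
    for (a, keys_a), (b, keys_b) in combinations(sorted(inventory.items()), 2):
        shared = keys_a & keys_b
        if shared:
            # Each shared key is a potential join that someone has to manage.
            links.append((a, b, sorted(shared)))
    return links


links = connections(datasets)
print(f"{len(datasets)} datasets, {len(links)} potential joins")
for a, b, keys in links:
    print(f"  {a} <-> {b} via {', '.join(keys)}")
```

Even in this toy inventory the joins outnumber the datasets. A real platform with hundreds of datasets would widen the gap considerably, which is the effect the slide describes.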
35. This view is only showing connections between objects in data sets based on data relationships. All these connections are joins you must take care with in a well-managed platform.
36. Different views of data complexity
A reverse gravity view, showing the mass of reused / replicated information at the center and the nodes where large interchanges occur.
These different views show how complex an organization’s data really is, rather than the abstract list of sources and terabytes stored.
This is why managing data is a difficult job.
37. Different views – data and use
The value of data is tied to its use. This shows relationships between people and data used.
This and the prior diagram show an important point: 70% of the data is used and reused constantly. 30% of the data is used by one or a few people, often new data with undetermined value.
This information can be used to determine where and how you should spend your limited resources and money.
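One way to find the heavily reused data and the narrowly used data is to count distinct users per dataset. The sketch below assumes a hypothetical access log of (user, dataset) pairs, for example parsed from query logs, and a hypothetical threshold of three distinct users for "widely reused". Neither the log format nor the threshold comes from the talk.

```python
from collections import defaultdict

# Hypothetical access log: (user, dataset) pairs, e.g. parsed from query logs.
access_log = [
    ("ana", "sales"), ("ben", "sales"), ("cho", "sales"), ("dev", "sales"),
    ("ana", "customers"), ("ben", "customers"), ("cho", "customers"),
    ("ana", "inventory"), ("dev", "inventory"), ("eli", "inventory"),
    ("eli", "web_clicks_trial"),
    ("cho", "survey_2024"),
]


def classify_by_use(log, shared_threshold=3):
    """Split datasets into widely reused and narrowly used, by distinct users."""
    users = defaultdict(set)
    for user, dataset in log:
        users[dataset].add(user)
    shared = sorted(d for d, u in users.items() if len(u) >= shared_threshold)
    narrow = sorted(d for d, u in users.items() if len(u) < shared_threshold)
    return shared, narrow


shared, narrow = classify_by_use(access_log)
print("widely reused:", shared)
print("one or a few users:", narrow)
```

The widely reused group is where management effort pays off for the most people. The narrow group is a candidate for lighter-weight handling until its value is clearer.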
39.
The reality of data availability is that it can only be a subset
The rest of the data is still here…
There will always be more data available than ability to analyze it. Some judgement must be applied to sort the more from the less important.
40.
In an expanded ecosystem of data, curation processes are needed to address quality, definition and structure
Management levels: closely managed data, loosely managed data, user managed data
Quality levels: high quality, well-known; directional quality; unknown / low quality
Curation is directly attached to your data architecture...
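A curation process like this could be modeled as explicit tiers with a quality expectation attached to each. The sketch below is illustrative only: the pairing of tiers to quality labels is an assumed reading of the slide, and the `Dataset` fields and the `promotion_gaps` rules are hypothetical, not something the talk specifies.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    CLOSELY_MANAGED = "closely managed"
    LOOSELY_MANAGED = "loosely managed"
    USER_MANAGED = "user managed"


# Assumed pairing of tiers to the quality labels on the slide.
EXPECTED_QUALITY = {
    Tier.CLOSELY_MANAGED: "high quality, well-known",
    Tier.LOOSELY_MANAGED: "directional quality",
    Tier.USER_MANAGED: "unknown / low quality",
}


@dataclass
class Dataset:
    name: str
    tier: Tier
    has_definitions: bool = False     # hypothetical curation check
    has_quality_checks: bool = False  # hypothetical curation check


def promotion_gaps(ds: Dataset) -> list[str]:
    """List the curation work still needed before moving up one tier."""
    if ds.tier is Tier.CLOSELY_MANAGED:
        return []  # already at the top tier
    gaps = []
    if not ds.has_definitions:
        gaps.append("agree on definitions")
    if ds.tier is Tier.LOOSELY_MANAGED and not ds.has_quality_checks:
        gaps.append("add quality checks")
    return gaps


scratch = Dataset("churn_scores_draft", Tier.USER_MANAGED)
print(EXPECTED_QUALITY[scratch.tier], promotion_gaps(scratch))
```

Making the tier a property of each dataset is one way to attach curation to the architecture: moving data between tiers becomes a visible step with stated requirements rather than an informal judgment.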
42.
The real purpose of this work is not to help IT be more productive. IT exists to help users be more productive.
Starting with technology is like getting excited by a new chisel.
54.
Value is not in the product, it’s in the practice
The poor carpenter blames his tools
55. You are a designer. You need to think like one.
“Everyone designs who devises courses of action aimed at changing existing situations into preferred ones.” ~ Herbert Simon
58.
Design tip: any time you deny a behavior or a request, ask yourself “how
will they do this on their own? What do they do instead?”
Bad policy causes more problems than bad technology
59.
Shape the architecture for the people, don’t try to shape the people
60.
Data should be governed by policy, e.g. zoning
http://welcometocup.org/file_columns/0000/0530/cup-whatiszoning-guidebook.pdf
61. We need to do today what we were doing 30 years ago
Back then we spent time understanding the users, what they wanted and needed, and found ways to justify the work to meet those needs.
We don’t do enough of this
We over-emphasize this
63. The primary focus should be on goals, specifically of the users
“The engineer, and more generally the designer, is concerned with how things ought to be - how they ought to be in order to attain goals, and to function.” ~ Herbert Simon
68. Starting with technology is starting in the solution space, not the problem space
https://indiyoung.com/about-problem-space/
69. Analysis and data science workflows are generally poorly understood
An analyst trying to answer a question has highs and lows along their workflow.
The environment is defined by independent, often mismatched tools, some fit for
purpose and others not, with no single product capable of meeting their needs.
Each usage model has several of these maps tied to different roles
Where is data? Can I access new data?
Why does IT have to be involved?
Why can’t I store data I’m working on?
How do I link new data to existing data?
How do I share information with others?
Green = solved
Yellow = gap, possible opportunity
Red = obstacle, opportunity
70.
User goals: more than accessing the data
Explore and Understand
Inform and Explain
Convince and Decide
Deliver
Process
71.
The real design criteria:
context and point of use
Information use is diverse and varies
based on context:
▪ Get a quick answer
▪ Solve a one-off problem
▪ Analyze causes
▪ Do experiments
▪ Make repetitive decisions
▪ Use data in routine processes
▪ Make complex decisions
▪ Choose a course of action
▪ Convince others to take action
One size doesn’t fit all.
72.
Data architecture requires understanding data use so we can build the right infrastructure
Monitor → Analyze Exceptions → Analyze Causes → Decide → Act
Exits: No problem, No idea, Do nothing
Understanding the details of uses, workflows, tasks, and activities allows us to look at the higher organizational level again
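The monitor, analyze, decide, act sequence can be sketched as a function with an early exit at each stage. This is a rough illustration under stated assumptions: which exit belongs to which stage ("no problem" after monitoring, "no idea" after cause analysis, "do nothing" after deciding) is a guess at the slide's layout, and the threshold, cause lookup, and action lookup are hypothetical stand-ins for real analysis work.

```python
def decision_cycle(reading, threshold, known_causes, actions):
    """Walk one reading through monitor -> analyze -> decide -> act.

    Returns (stage where the cycle ended, outcome at that stage).
    """
    # Monitor / analyze exceptions: is the reading out of bounds at all?
    if reading["value"] <= threshold:
        return ("monitor", "no problem")
    # Analyze causes: can the exception be explained?
    cause = known_causes.get(reading["metric"])
    if cause is None:
        return ("analyze causes", "no idea")
    # Decide: is there an action worth taking for this cause?
    action = actions.get(cause)
    if action is None:
        return ("decide", "do nothing")
    # Act: carry out the chosen action.
    return ("act", action)


# Hypothetical lookup tables standing in for analysis and decision work.
known_causes = {"late_deliveries": "carrier capacity"}
actions = {"carrier capacity": "book a second carrier"}

print(decision_cycle({"metric": "late_deliveries", "value": 12}, 5, known_causes, actions))
print(decision_cycle({"metric": "late_deliveries", "value": 3}, 5, known_causes, actions))
print(decision_cycle({"metric": "returns", "value": 9}, 5, known_causes, actions))
```

Each stage needs different data and tooling: monitoring needs current measurements, cause analysis needs history and related datasets, and acting needs a path back into an operational process. That is why the use has to be understood before the infrastructure is chosen.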
73.
This is part of a larger system. Feedback loops exist and operate at different frequencies.
Monitor → Analyze Exceptions → Analyze Causes → Decide → Act
Feedback loops: Collect new data; Act on the process; Act within the process
74.
Data platforms are the most complex in the organization, far
more complex than any web application or ERP system.
75.
Manage your data (or it will manage you)
Data management is where developers are weakest.
Modern engineering practices are where data management is weakest.
Users care about their tasks.
You need to bridge these groups and practices in the organization if you want to do meaningful work with data.
77.
Further Reading
Thinking in Systems, Donella Meadows
An Introduction to Systems Thinking, Gerald Weinberg
Contextual Design, Beyer & Holtzblatt
Badass: Making Users Awesome, Kathy Sierra
Information Design, Jacobson
Data: A Guide to Humans, Phil Harvey
In Search of Certainty, Burgess
https://indiyoung.com/about-problem-space/
http://welcometocup.org/file_columns/0000/0530/cup-whatiszoning-guidebook.pdf
78.
CC Image Attributions
Thanks to the people who supplied the creative commons licensed images used in this presentation:
well town hall.jpg - http://flickr.com/photos/tuinkabouter/1135560976/
seattle library 1 - http://www.flickr.com/photos/thomashawk/2671536366/
chicken_head2.jpg - http://www.flickr.com/photos/coycholla/4901760905
egg_face1.jpg - http://www.flickr.com/photos/sally_monster/3228248457
indonesian angry mask phone - Erik De Castro Reuters.jpg