Learn tips and tricks for handling Data Modelling in your Big Data environment. Mark will show how modelling adds value to the business and how to make your Big Data landscape transparent across the organization.
You will see the latest modelling techniques for Big Data and different types of modelling notation. You will also learn how to integrate Data Modelling into your BI environment.
2. Introduction
• My name is Mark Barringer (mark.Barringer@embarcadero.com)
• Product Manager (Data Architecture Tools), Embarcadero
• I live in the beautiful city of Winchester in southern England
3. Agenda and Objectives…
• Data Challenges: what are the modern-day challenges for the Data Architect?
• Big Data Landscape: how to understand and view the Big Data landscape.
• Adding Value to the Business: making the Big Data landscape transparent to the business, and how to add real value.
• Data Modelling Techniques: a look at some of the latest Data Modelling tips and tricks applied to the Big Data environment.
4. Challenges facing Data Architecture
Federation
Data Democratisation
Platform Fragmentation
Data Lineage
Latency
Delivery
Obfuscation
11. Challenges facing Data Architecture
Federation: The application of a single view over multiple repositories.
Data Democratisation: The expectation of the business to have more control over the data assets.
Platform Fragmentation: The proliferation of non-RDBMS solutions to store data.
Data Lineage: The expectation to understand actions performed on the data.
Latency: The trend towards lower end-to-end latency of data (from creation to reporting).
Delivery: Model development in step with development teams.
Obfuscation: The expectation that the business can view and understand data models.
12. Why Data Modelling is Essential…
• Modelling the Business
• Understand the Landscape
• Self-Documenting Business
• Heterogeneous & Big Data Physical Models
• Data Modelling Techniques
• Agile Development
13. Why Data Modelling is Essential…: Business Modelling
• Meaningful abstracted view of the business
• Data-centric perspective
• 'Anchor point' for other models
• Key to successful communication
• Develop credibility and relevance with the business
• Establish Business Glossaries with consistent definitions
• Build a solid foundation for Compliance, Data Governance and Master Data Management
• Improve visibility and collaboration with ER/Studio
15. Who are the Data customers and collaborators?
• DA (Data Architect)
• Technical Collaborators (DBA, ETL, SA): metadata consumers
• Data Analysts (Finance, Credit, Mktg.): data consumers
• Business Users: information producers
16. Benefit of Relating Metadata to Models
• Expand the depth of information by accessing the underlying framework
• Models and terms seamlessly integrate with one another
17. Why Data Modelling is Essential…: Understand the Landscape
• Create Landscape Inventories
• Reverse Engineer
• Overcome Information Obscurity
• Eliminate Data Silos
• Contain much of the detailed metadata
• Useful for impact and gap analysis
• Platform agnostic
• Map to concepts in the Conceptual Data Model and physical objects
19. Big Data & NoSQL: The Challenge
• Capture new data sources and increase the information management footprint
• Understanding semi-structured and unstructured data
• "Raw data is the single source of the truth"
• The 7 Vs:
  – Velocity, Volume, Variety
  – Veracity (conformity of facts: data in doubt)
  – Variable, Virtual, Value
• Reverse & Forward Engineering (JSON, BSON)
• Forward & Reverse Engineer DDL
• "We will hopefully find what we didn't know about that we didn't know that we didn't know about"
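The reverse-engineering bullet above (JSON, BSON) can be made concrete with a small sketch. This is not ER/Studio's actual algorithm, just a minimal illustration in Python of inferring a schema from one parsed JSON document; the `infer_schema` function and the sample document are invented for this example:

```python
import json

def infer_schema(value):
    """Recursively build a simple type descriptor for a parsed JSON value."""
    if isinstance(value, dict):
        # Nested object: describe each key in turn
        return {key: infer_schema(child) for key, child in value.items()}
    if isinstance(value, list):
        # Array: describe it by the schema of its first element
        return ["array", infer_schema(value[0])] if value else ["array", "empty"]
    return type(value).__name__  # 'str', 'int', 'float', 'bool', 'NoneType'

doc = json.loads('{"name": "Ada", "orders": [{"sku": "A1", "qty": 2}]}')
schema = infer_schema(doc)
# schema == {'name': 'str', 'orders': ['array', {'sku': 'str', 'qty': 'int'}]}
```

A real tool would merge schemas across many documents, since any single document may omit optional fields.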
20. Eliminate Data Silos: Inventory existing databases
• What type of data do you own, and where can it be found?
• Map your data landscape using data models as the foundation.
  – Each model represents a different database system
  – Link like data elements together for traceability
21. Why Data Modelling is Essential…: Physical Modelling for Big Data
• Accurately model all types of data at rest within the organization, not just RDBMS-resident data
• Document physical metadata (tablespaces, partitions, etc.)
• Introduce non-RDBMS data stores, e.g. NoSQL, JSON, Hive
• Build many physical models based on business decomposition
• Reverse Engineering
22. Traditional (RDBMS) Prescriptive Data Modelling
Flow: MODEL (and DESIGN) → LOAD → EXPLORE/QUERY the data.
"Schema on write": good for Known Unknowns (repetition).
23. Big Data (NoSQL/Hadoop) Descriptive Data Modelling
Flow: LOAD → QUERY → MODEL, exploring over the NoSQL store. But fast and agile!
"Schema on read": good for Unknown Unknowns (exploration).
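The contrast between these two slides can be sketched in a few lines of Python. The CSV data, function names and column layout here are invented for illustration; the point is only where the schema is applied:

```python
import csv, io

RAW = "2024-01-05,alice,42\n2024-01-06,bob,oops\n"  # second amount is malformed

# Schema on write (RDBMS style): types are enforced at load time,
# so bad rows are rejected before any data lands in the store.
def load_validated(raw):
    return [(date, user, int(amount))  # raises ValueError on 'oops'
            for date, user, amount in csv.reader(io.StringIO(raw))]

# Schema on read (NoSQL/Hadoop style): the raw text is stored as-is,
# and a schema is applied only when a query scans it.
def query_amounts(raw):
    for date, user, amount in csv.reader(io.StringIO(raw)):
        try:
            yield user, int(amount)
        except ValueError:
            pass  # malformed values only surface at read time

results = list(query_amounts(RAW))  # [('alice', 42)]
```

Loading the same raw text through `load_validated` fails up front, which is exactly the trade-off the slides describe: repetition favours schema on write, exploration favours schema on read.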
24. Data Modelling in Big Data
A NoSQL database holds Customer and Product documents and key-value pairs. Two levels of model describe it:
• Conceptual/business data model: understanding
• Logical/physical data model: architecture/design
The data may then transfer into a structured relational database (i.e., a data warehouse/data mart), using the models.
25. Why Data Modelling is Essential…: Timely Design
• Ensure changes to the physical data models are in step with, and relevant to, the development methodology used in the organization.
• Where modelling meets development.
• Create credibility and relevance with the development teams.
• User Stories, Tasks and Change Management.
26. Agile Change Management
• Enable the "Agile Data Modeler"
  – Incremental rather than waterfall
• Need more granularity than named versions of a model or submodel
• Change numbers at Repository check-in
• Can be associated with user stories and tasks
27. Why Data Modelling is Essential…: Data Modelling Techniques
• Sub-Modelling
• Business Decomposition
• Visual Data Lineage
• Impact Analysis / Where Used
• Naming Standards
• Data Source Mapping
• Universal Mapping
• Augmented Metadata
• Glossary Integration
28. Big Data Notation Enhancement
• Physical Model: objects instead of tables
• Nested Objects: an "is contained in" relationship type
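As a hedged sketch of what the "is contained in" notation captures, the following Python walks a document and emits a containment edge for each nested object or array of nested objects. The function and the sample `patron` document are invented for illustration:

```python
def containment_edges(obj, parent):
    """Yield (container, contained, is_array) for each nested object."""
    for key, value in obj.items():
        if isinstance(value, dict):
            yield (parent, key, False)           # single nested object
            yield from containment_edges(value, key)
        elif isinstance(value, list) and value and isinstance(value[0], dict):
            yield (parent, key, True)            # array of nested objects
            yield from containment_edges(value[0], key)

patron = {"name": "Ada", "address": [{"street": "High St", "city": "Winchester"}]}
edges = list(containment_edges(patron, "patron"))
# edges == [('patron', 'address', True)]
```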
34. Technique: Glossary Integration
• Associate Data Architect objects with Business Glossary terms:
  – Model, submodel
  – Entity, Table
  – Attribute, Column
  – Domain
  – View
• Push terms to the glossary
35. Why Data Modelling is Essential…: Business Metadata
• Provide the business with the ability to centrally manage its own metadata, in terms of definitions, rules and relationships, in a structured and curated manner.
• Facilitate the binding of business elements to technical elements within the models and other documentation.
• Data Dictionary
• Self-Service Discovery and Reporting
36. Providing Business Context
A taxonomy of searchable terms mapped to unique concepts. Glossaries (R&D, Clinical, Supply Chain) hold business terms such as "Patient Recruitment Data" (mapped to an entity) and "Batch Supply Data" (mapped to an attribute), which link to tables and columns in physical models and data sources, logical diagrams, conceptual & process diagrams, and discussion threads.
37. Why Data Modelling is Essential…
Holistic, integrated modelling can present the same metadata to different audiences in the most appropriate format.
The single most important challenge to overcome is that of communication and collaboration, and to build trust in data.
Without the ability to communicate effectively to a wide variety of audiences, even the most diligently documented organisation will be unable to benefit from its documentation.
39. Win a FitBit Charge HR
Leave a business card at the Barnsten / Embarcadero stand.
Raymond Horsten (r.horsten@barnsten.com)
Mark Barringer (mark.barringer@embarcadero.com)
41. Building Trust in Data: Collaboration
Syndication, governance and collaboration: technical metadata and business metadata are held in a central Metadata Repository (Team Server), serving data modelling, web, architecture and business audiences, with integrated tooling for SDLC & Information Management across the enterprise data landscape.
Editor's notes
Data Modelling as it was...
Conceptual Data Model: this is a concept that data modellers rave about but few others understand. Many organisations don't have one and are quite happy; some have one but don't use it and consider it a waste of money. This leaves a (proportionally) very small group of organisations that have a CDM and use it.
Most organisations have a large, monolithic Enterprise Data Warehouse (EDW); this acts as a focus for most of the modelling activity in an organisation.
It can be argued that the EDW is a de facto logical data model of the organisation.
Whilst the EDW serves a very useful purpose in most organisations, it does have limitations, many of which have been brought into sharp relief in recent years.
It is usually seen as an IT-function black hole in terms of resources and requirements.
DA = Data Architect
DBA = Database Administrator
ETL = Extract, Transform, Load developer
SA = System Analyst
As business analysts and data analysts seek to become stronger bridges between business and IT, they become power users of data management tools and would need access to business definitions in data management tools. ER/Studio Team Server helps to expand the circle of data comprehension.
When new versions of DBPS come out, they will be well integrated with ER/Studio Team Server. So DBAs would be interested in ER/Studio Team Server as well
Introduce the first benefit to the industry; pose pain points; introduce Rob to demo and then field questions/input.
So why can't organizations make more effective use of information? In short, it's information obscurity. You see, enterprise data isn't just BIG, it's complex. Our enterprise customers have hundreds of systems and hundreds of thousands of data elements. If Sales says a customer is anyone they're calling on, Finance says it's anyone who's paid us money, and Support says it's anyone with paid-up maintenance, who's really a customer? If customer data is in a hundred different systems, if it's escaped the data center on mobile devices or migrated to the cloud, where do I go to find the right data, and how do I interpret it? It's no surprise that most organizations can't leverage all of their data; in many cases, users can't even find it.
Transition: many organizations are turning to data governance as a way forward.
Big data & NoSQL provide huge potential to explore, understand and gain value from data sets that are either too large, complex or diverse to fit into traditional database management systems.
They enable you to capture new types of semi-structured or unstructured data sources in their raw format, with the goal of providing the raw data as the single source of the truth.
Another reason why Big Data & NoSQL platforms have become so popular is the low price and high performance they can provide in comparison with traditional databases, especially when having to store huge amounts of data.
However, this comes with the additional challenge of managing and understanding all of this data. You may be aware of the 3 Vs (Velocity, Volume and Variety), but recently 4 more have been added:
Veracity: regarding the certainty of the data.
Variable: data can have variable schemas, and variable ways of interpreting the same data, e.g. from the customer perspective or the vendor perspective. (This leads to schema on read, so that you have a different emphasis when reading.)
Virtual: virtualization of the data source.
Value: most important is extracting value from the data.
Managing this big data effectively will hopefully lead us to uncover the Unknown Unknowns about the data. Or, put differently, "We will hopefully find what we didn't know about that we didn't know that we didn't know about."
The Unknown Unknowns are potentially big opportunities that fall far outside the day-to-day of your business. These include things like:
Market segments you haven't discovered yet
Features that people love
Product innovations
The "why?" behind customer behaviours
But in order to manage all of this data, you will first need to understand the schema governing or describing the data. I would like to look briefly at some of the schema differences when modelling Big Data & NoSQL platforms vs traditional databases.
---------
Hadoop and MongoDB are the enterprise Big Data leaders:
Hadoop: cost-efficient staging and ETL of very large data sets.
MongoDB: customer/user profile management for large-scale web sites.
Greenplum: innovative MPP analytic database technology, now part of EMC Pivotal Labs.
In a big data environment, data is usually stored in a NoSQL database, meaning a non-relational data store in which no relationships exist. Instead, categories are built on top of the unstructured data, and the data is analyzed using various tools (for example, Hive in the Hadoop environment). NoSQL ("not only SQL") environments such as Hadoop are "schema on read" versus "schema on write". For example, Hadoop uses HDFS for its file structure, and it is "schema on read", meaning that we don't need to define the data structure before loading the data (which we would need to do if it were a traditional "schema on write" store such as a relational data warehouse).
The pros of this approach:
It is not necessary to define the structure up front, so there is great flexibility in how the data (structured and especially semi-structured) can be stored, queried and used.
It promotes experimentation.
Getting things "wrong" carries a very low cost (since it was only experimental).
It is an agile approach, since it speeds up the time to have the data available, versus having to first model, develop ETL, perform data quality checks, etc.
The cons of this approach:
Expensive, because compute resources need to be high.
It is not self-documenting.
You have to create the jobs that apply the schema on read.
Data modeling may take place after data is analyzed in order to:
Understand the data
Provide a design for transfer into a relational data store, if needed and decided upon
Investigative computing is an environment where we can experiment. Once we have decided that some of the information is useful and can be used in production, we may decide to transfer it into a structured environment (EDW), and this is where data modeling is needed.
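The transfer from the NoSQL store into a structured environment described above can be sketched briefly. The `flatten_orders` function and its field names are invented, not from any tool; the point is that child objects become rows keyed back to their parent:

```python
def flatten_orders(customer_doc):
    """Split a nested customer document into rows for two relational tables."""
    customer_row = {"customer_id": customer_doc["id"], "name": customer_doc["name"]}
    order_rows = [  # each nested order becomes a row referencing the customer
        {"customer_id": customer_doc["id"], "sku": o["sku"], "qty": o["qty"]}
        for o in customer_doc.get("orders", [])
    ]
    return customer_row, order_rows

doc = {"id": 7, "name": "Ada",
       "orders": [{"sku": "A1", "qty": 2}, {"sku": "B9", "qty": 1}]}
cust, orders = flatten_orders(doc)
```

The logical/physical model is what tells you which nested paths become tables and which become columns.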
Many organizations have struggled making the transition from traditional "waterfall" data modeling to more responsive and iterative agile approaches. An important aspect of this is granular change management, enabling checkout/checkin of only those objects required for a specific modification, rather than a full model or sub-model. Just as important is knowing "why" the changes were made. Therefore, object checkin/checkout can now be associated with a specific task or "user story", which is a practice agile developers have been using for years. Knowing the "why" is also extremely important from a data governance perspective.
Because of Big Data, we have had to enhance our notations to accommodate new types of physical models.
We are now using Objects instead of Tables in the Physical Models.
The big thing with the Big Data stores is that we can have nested objects in those structures.
We have introduced a new relationship type that only shows on Big Data platforms: the Is Contained In relationship type.
And we'll see shortly in the demo that we can handle nested objects, and nested arrays of objects, using these notations.
In the diagram, the "diamond on one end, with cardinality on the other end" corresponds to the Is Contained In relationship type. We have borrowed that notation convention from UML, so those who are familiar with that notation should have an easy transition to our tool.
This is the JSON code for a couple of the collections in MongoDB. You may be familiar with JSON: it contains objects, called collections in MongoDB, which usually contain nested objects. These objects typically contain key-value pairs.
++ Show SLIDE: Containment Relationship: Array of Nested Objects
* On the LEFT side you see PATRON: if you look down you'll see ADDRESS there. In the diagram, the relationship line from PATRON to ADDRESS has the star; we can see that there are 2 addresses referenced in there.
=> You will also notice that after ADDRESS there is a square bracket "[", and that kicks off the ARRAY of NESTED OBJECTS. When reverse engineering, that "[" indicates to the tool that this is an ARRAY of NESTED OBJECTS.
* On the RIGHT side you see BOOK: notice that CHECKOUT has the "[" after the colon (:). Although there is only 1 instance of checkout there, the syntax indicates that this is an ARRAY.
=> Moving to the next slide, on single nested objects.
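The PATRON and BOOK collections above can be reconstructed roughly as follows; the field values are invented, but the shape matches what the notes describe, with the "[" marking an array of nested objects even when it holds a single element:

```python
import json

patron = json.loads("""
{"name": "Ada", "address": [
    {"street": "1 High St", "city": "Winchester"},
    {"street": "2 Low Rd",  "city": "Andover"}
]}
""")
book = json.loads('{"title": "Moby Dick", "checkout": [{"by": "Ada", "date": "2016-01-05"}]}')

# Two addresses: clearly an array of nested objects.
assert isinstance(patron["address"], list) and len(patron["address"]) == 2
# One checkout, but the "[" still marks it as an array on reverse engineering.
assert isinstance(book["checkout"], list) and len(book["checkout"]) == 1
```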
In Hive you can reverse engineer and forward engineer the DDL; it looks like another DB platform. You just click on the DDL tab, and we'll see what this looks like in the tool in a few minutes.
That was a quick overview, but it is much more fun to see it in action, so let's move over to the demo.
---
Hive is a SQL-like query language in Hadoop. Even though the data is not stored in tables (it is normally text, e.g. comma-delimited), Hive applies a schema to it.
Note that attribute names in Hive would not make a data modeller happy (e.g. n1 int, mars2 int, prep int, etc.). It is almost like a non-materialized view on top of a text file.
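The "non-materialized view on top of a text file" idea can be sketched directly: the file stays as untyped, comma-delimited text, and the column names and types (taken from the n1/mars2/prep example above) are applied only when the lines are scanned. This illustrates the concept, not Hive's implementation:

```python
# Hive-style schema on read: the schema lives beside the data, not in it.
SCHEMA = [("n1", int), ("mars2", int), ("prep", int)]

def scan(lines, schema=SCHEMA):
    """Apply the column schema to raw comma-delimited lines at read time."""
    for line in lines:
        fields = line.strip().split(",")
        yield {name: cast(raw) for (name, cast), raw in zip(schema, fields)}

rows = list(scan(["1,2,3", "4,5,6"]))
# rows[0] == {'n1': 1, 'mars2': 2, 'prep': 3}
```

Changing SCHEMA re-types the same file without rewriting it, which is why exploration is cheap under schema on read.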
Naming Standards Automation: Currently we invoke our naming standards manually, applying them to one submodel at a time. Automatic naming standards will allow us to bind a naming-standards template to data model objects such as entities/tables and attributes/columns. The typical use case would be to have the physical name change in place as we are editing the logical name (that's how ERwin does it, in a far less elegant fashion, through their macro formulas). We will also be able to apply physical-to-logical mapping (the reverse direction) if that is desired. Standards can be attached to individual objects, or defaulted at the sub-model level.
We are adding the capability for real-time integration between the Data Architect tool and the glossaries/terms in Team Server. If a defined glossary term is used in naming or defining major model objects, those terms are automatically highlighted, with the capability to hyperlink and/or copy the term definition to the model object. Mousing over the highlighted term will show the definition from the Team Server glossary.
We will be extending this capability further later this year to include glossary integration with process artifacts in the Business Architect tool as well.
We have a broad portfolio of products spanning Data Modeling, Database Management and Application Development.
We often refer to the Design, Develop and Deliver areas.
On the left-hand side of the diagram we can see the Data Architecture products that we will look at today, together with the Database Administrator and Database Developer products at the top and right, which we will cover in the DB PowerStudio refresher next week. At the centre, we have the Team Server Core, which ties all of these together.
On the right of the slide, you can see some of the many database platforms that we support, and this list is constantly expanding.
Metadata Governance and Syndication
Applying the principles of governance to collaborative metadata authoring.
Unleashing metadata by delivering it into core SDLC and Information Management workflows.
Value Propositions:
Data Architect → Peer Architects: reuse, consistency, coordination.
Data Architect → Business Analyst, Developer: review and approval, coordination.
Steward, Data SME, Governance Team → Metadata: collaborative authoring, business review and approval.
Information Management Professionals → Metadata: discoverability, comprehension, quality, access to policy.
Data Analyst → Metadata: discoverability, definitions, lineage, security and sensitivity advisories.