Data
Preparation
Fundamentals
© Copyright 2022 by Peter Aiken Slide # 1
peter.aiken@anythingawesome.com +1.804.382.5957 Peter Aiken, PhD
[Chart: Data Analysis vs. Data Preparation, 0% to 100% scale; values shown: 0%, 0%, ?]
Peter Aiken, Ph.D.
• I've been doing this a long time
• My work is recognized as useful
• Associate Professor of IS (vcu.edu)
• Institute for Defense Analyses (ida.org)
• DAMA International (dama.org)
• MIT CDO Society (iscdo.org)
• Anything Awesome (anythingawesome.com)
• Experienced w/ 500+ data
management practices worldwide
• Multi-year immersions
– US DoD (DISA/Army/Marines/DLA)
– Nokia
– Deutsche Bank
– Wells Fargo
– Walmart
– HUD …
• 12 books and
dozens of articles
https://anythingawesome.com
• DAMA International President 2009-2013/2018/2020
• DAMA International Achievement Award 2001
(with Dr. E. F. "Ted" Codd)
• DAMA International Community Award 2005
Your
Sponsor
Today
Quest
Software
Solutions Consultant
Information Management Specialist
Quest Software
Gary Jerep
Where Next Meets Now.
Foglight
SharePlex
Kace
Toad
QoreStor
NetVault
erwin
Binary Tree
Change Auditor
Migration Manager
Metalogix
Quadrotech
SCALE & STREAMLINE
IT OPERATIONS
Migrate faster, strengthen
cyber security resilience
and stay in control to keep your
business running
Identity Manager
Safeguard
Active Roles
Quest: Helping Customers Achieve True IT Resilience NOW
IDENTITY
DATA EMPOWERMENT
& GOVERNANCE
Empower your business with the
visibility and context to better
manage and develop data pipelines
that deliver faster insights, while
safeguarding the data
and infrastructure
Information
& System
Management
One
Identity
Microsoft
Platform
Management
HARDENED CYBER
SECURITY
Using an identity-focused,
cloud-first, customer-centric approach
from the Cloud to the edge
making Zero Trust a reality now
Monitoring
Sensitive Data
Identification
SQL Tuning &
Optimization
Development
Administration
Backup
DevOps
Upgrades
Migration
Load Balancing
Diagnostics
Data Modeling
Systems Monitoring &
Diagnostics
Inventory &
Asset Management
Policy &
Access Management
Secondary Storage &
Cost Optimization
Backup & Recovery
Software Compliance
Helpdesk
Cloud Cost Optimization
erwin Data
Modeler
erwin Data
Catalog
erwin
Evolve
erwin Data
Literacy
erwin Data
Intelligence
Active Metadata
Management
Business Process
Modeling
Data
Stewardship
Regulatory
Compliance
Data Catalog
Data Lineage
Enterprise
Architecture
Data Architecture
Data Profiling
Impact Analysis
RapidRecovery®
Data
Protection
Data
Operations
Data
Governance Application
Modernization
Cloud Migration
Empower NoSQL
Sensitive Data Protection
Model-Driven
DevOps
SLA Performance
Cyber
Resilience
The Quest Data Empowerment Platform
Where Next Meets Now.
A Few Resources!
• www.Quest.com/Solutions/Data-empowerment
• eBook: The Four Roadblocks to Data
Preparation
• Toad Data Point Case Study:
Philadelphia Youth Network
• eBook: Enabling Agile Database
Development with Toad
• Tech Brief: Keep Using Toad for Oracle
with Databases in the Cloud
• Tech Brief: Accelerate and Secure your
SQL Server DevOps CI/CD Pipelines
Current approaches are not and have not been working
Driving Innovation with Data
Competing on data and analytics
Managing data as a business asset
Created a data-driven organization
Forged a data culture
[Bar chart: share answering Yes vs. No for each item above; values shown: 24%, 24%, 39%, 41%, 56%]
Source: Big Data and AI Executive Survey 2022 by Randy Bean & Thomas Davenport @ www.newvantage.com
[Bar chart, 2020: technology vs. people/process; values shown: 90%, 10%]
80% of data challenges are people/process based!
[Bar chart: Data Analysis 20%, Data Preparation 80%]
Everyone wants to do better data analysis …
• Some data preparation is inevitable
• What would a 'good' ratio be?
• "Everyone knows"
1. 80% of your data is redundant, trivial, or obsolete
2. 80% of your data is of unknown quality
3. 80% of your data is 'standards free'
4. Your highly paid data analytics capabilities spend
80% of their time working under these conditions
Pareto Data Realities
• IT thinks data is a business problem
– "If they can connect to the server, then my job is done!"
• The business thinks IT is managing data adequately
– "Who else would be taking care of it?"
Confusion as to data responsibility
You must address data debt proactively
https://www.merkleinc.com/blog/are-you-buried-alive-data-debt
https://johnladley.com/a-bit-more-on-data-debt/
https://uk.nttdataservices.com/en/blog/2020/february/how-to-get-rid-of-your-data-debt
Data debt:
• Slows progress
• Decreases quality
• Increases costs
• Presents greater risks
• Data debt
– The time and effort it will take to return your
shared data to a governed state from its
(likely) current ungoverned state
• Getting back to zero
– Involves undoing existing stuff
– Likely new skills are required
Bad Data Decisions Spiral
Bad data decisions
Poor organizational outcomes
Technical decision
makers are not data
knowledgeable
Business decision
makers are not data
knowledgeable
Poor treatment
of organizational
data assets
Poor
quality
data
Program
• Motivation
• Data Preparation Considerations
– No standard data curricula
– No standard audience
– Technology is a one-legged stool
• Data Problems are Different
– Dependence on high speed automation
– Hidden data factories sap resources
– Require a unified approach
• Reverse Engineering (Introducing Yourself to a Dataset)
– No measures (other than size)
– Hype Cycle
– Column Profiling
– Dependency Profiling
– Redundancy Profiling
• Take Aways/References/Q&A
[Bar chart: Data Analysis 80%, Data Preparation 20%]
Data is not broadly or widely understood
adapted from: http://www.dailymirror.lk/print/opinion/editorial-we-need-to-become-channels-of-peace/172-27164
It is like a fan!
It is like a snake!
It is like a wall!
It is like a rope!
It is like a tree! Blind Persons and the Elephant
It is like a story!
It is like a dashboard!
It is like pipes!
It is like a warehouse!
It is like statistics!
Unrefined
data management
definition
Sources
Uses
Data Management
More refined
data management
definition
Sources
Uses
Reuse
Data Management
➜
➜
Governance & Ethical Use Framework
Specialized
Data
Skills
Collection
Evaluation
Engineering
Evolution
Access
Storage
Preparation
Data
Science
Delivery
Presentation
Story
Telling
Exploitation
Better still data management definition
Sources
➜
Reuse
➜
Data Preparation 80% Data Exploitation/Analysis 20%
Formal Data Reuse Management
Data
Technologies, by themselves, are a 1–Legged Stool
Success Requires a 3–Legged Stool
People
Process
Technology
But not just a stool–these are interdependent
People
Process
Technology
Success is identifying a winning combination
People
Process
Technology
Defining Data Technology Architecture
• Data technology is part of the overall technology architecture
• It is also often considered part of the enterprise’s data architecture
• Data technology architecture addresses 3 questions:
– What technologies are
standard/required/preferred/acceptable?
– Which technologies apply to which
purposes and circumstances?
– In a distributed environment, which
technologies exist where, and
how does data move from one node to another?
from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
• Managing data technology should follow
the same principles and standards for
managing any technology
• Leading reference model for technology
management is the Information Technology
Infrastructure Library (ITIL):
– http://www.itil-officialsite.com/home/home.asp
from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
Data Management Technologies
Understanding Data Technology Requirements
• Need to understand:
– How the technology works
– How it provides value in the
context of your organization
– Requirements of a data technology before
determining what technical solution to choose for a particular situation
• Suggested questions:
– What problem does this data technology mean to solve?
– What sets this data technology apart from others?
– Are there specific hardware/software/operating systems/storage/network/
connectivity requirements?
– Does this technology include data security functionality?
from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
Data
Sandwich
Data supply
Data literacy
Standard data
Standard data
Leverage point - high performance automation
This cannot happen without investments in
data engineering and architecture!
Quality data engineering/
architecture work products
do not happen accidentally!
Tacoma Narrows Bridge/Gallopin' Gertie
• Slender, elegant and graceful
• World's 3rd longest suspension span
• Opened on July 1, 1940
• Collapsed in a windstorm on November 7, 1940
• "The most dramatic failure in
bridge engineering history"
• Changed forever how engineers
design suspension bridges leading
to safer spans today
Hidden Data Factories
https://hbr.org/2016/09/bad-data-costs-the-u-s-3-trillion-per-year
Work products are
delivered to
Customers
Customers
Knowledge Workers
80% looking for stuff
20% doing useful work
Department B
1. Check A's work
2. Make any corrections
3. Complete B's work
4. Deliver to Department C
Department A
https://en.wikipedia.org/wiki/Theory_of_constraints
Department C
1. Check B's work
2. Make any corrections
3. Complete C's work
4. Deliver to Customer
5. Deal with consequences
Poor data manifests as multifaceted organizational challenges
IT
System
Business
Challenge
Business
Process
Business
Challenge
IT
Process
Business
Challenge
Business
System
Business
Challenge
IT
Process
Business
Challenge
IT
System
Business
Challenge
Business
Process
Business
Challenge
Poor results
Root cause
analysis is
part of data
governance
Consistency Encourages Quality Analysis
Eliminating data debt
requires a team with
specialized skills
deployed to create a
repeatable process
and develop sustained
organizational
skillsets
As Is Requirements
Assets WHAT?
As Is Design Assets
HOW?
As Is Implementation
Assets AS BUILT
Forward Engineering
As Is Requirements
Assets WHAT?
As Is Design Assets
HOW?
As Is Implementation
Assets AS BUILT
Existing
Reverse Engineering
A structured technique aimed at recovering rigorous
knowledge of the existing system to leverage
enhancement efforts [Chikofsky & Cross 1990]
As Is Requirements
Assets WHAT?
As Is Design Assets
HOW?
As Is Implementation
Assets AS BUILT
Existing
New
Reengineering
Reverse Engineering
Forward engineering
Reimplement
To Be
Implementation
Assets
To Be
Design
Assets
To Be Requirements
Assets
• First, reverse engineering the existing system
to understand its strengths/weaknesses
• Next, use this information to inform the
design of the new system
Data Preparation Tools & Vendor Hype
• CIOs/CDOs feel pressure
• Vendor/project promise auditing
• No understanding of hype cycle
Who wrote this … ?
• In considering any new subject,
• there is frequently a tendency
first to overrate what we find to
be already interesting or
remarkable, and
• secondly - by a sort of natural
reaction - to undervalue the true
state of the case.
– Lady Augusta Ada King,
(1815 – 1852)
Countess of Lovelace
– (aka) Ada Lovelace,
daughter of Lord Byron
– Publisher of the first
computer program
Gartner Five-phase Hype Cycle
Technology Trigger: A potential technology breakthrough kicks things off. Early proof-of-concept stories and media interest
trigger significant publicity. Often no usable products exist and commercial viability is unproven.
Peak of Inflated Expectations: Early publicity produces a number of
success stories, often accompanied by scores of failures. Some
companies take action; many do not.
Trough of Disillusionment: Interest wanes as experiments and implementations fail to deliver. Producers of the
technology shake out or fail. Investments continue only if the surviving providers improve their products to the
satisfaction of early adopters.
Slope of Enlightenment: More instances of how the technology can benefit the
enterprise start to crystallize and become more widely understood. Second- and third-
generation products appear from technology providers. More enterprises fund pilots;
conservative companies remain cautious.
Plateau of Productivity: Mainstream adoption starts to
take off. Criteria for assessing provider viability are more
clearly defined. The technology’s broad market
applicability and relevance are clearly paying off.
http://www.gartner.com/technology/research/methodologies/hype-cycle.jsp
Gartner 2021 Hype Cycle for Data Management
Metadata
Management
Data Management Body of Knowledge (DM BoK V2) Practice Areas
from The DAMA Guide to the Data Management Body of Knowledge 2E © 2017 by DAMA International
Data Management Tools
from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
1. Data Governance
– Intranet Website
– E-Mail
– Metadata Tools
– Issue Mgt. Tools
2. Data Architecture
– Management
– Intranet Website
– E-Mail
– Meta-data Tools
– Issue Mgt. Tools
– Data Modeling Tools
– Model Mgt. Tool
– Metadata Repository
– Office Productivity
Tools
3. Data Development
– Data Modeling Tools
– Cloud
– DBMS
– Software Dev. Tools
– Testing Tools
– CASE/Model Tools
– Config. Mgt. Tools
– Office Tools
4. Data Ops Mgt.
– DBMS
– Data Dev. Tools
– DBA Tools
– ETL Tools
– Office Productivity
5. Data Security Mgt.
– DBMS
– BI Tools
– Application
Frameworks
– Identity Mgt. Tech.
– Change Control Sys.
6. Reference and
Master Data
– Management
– Reference DM Apps.
– Master DM Apps
– Data Modeling Tools
– Process Modeling
– Metadata Repositories
– Data Profiling Tools
– Data Cleansing Tools
– Data Integration Tools
– Rule Engines
– Change Mgt. Tools
7. Analytics
– Business Execs/Mgrs.
– DM Execs/IT Mgt.
– BI Program Manager
– SMEs/Info
Consumers
– Data Stewards
– Project Managers
– Data Architects/
Analysts
– Data Integration
Specs.
– BI Specialists
– DBAs
– Data Security
– Data Quality Analysts
8. Content
Management
– All Employees
– Data Stewards
– DM Professionals
– Records Mgt. Staff
– Other IT
Professionals
– Data Mgt. Executive
– CIO/CKO
9. Metadata Mgt.
– Metadata
Repositories
– Data Modeling Tools
– DBMS
– Data Integration Tools
– BI Tools
– System Mgt. Tools
– Object Modeling Tools
– Process Modeling
Tools
– Report Generators
– Data Quality Tools
– Data Dev. Tools
– Reference/Master
Data Tools
10. Data Quality Tools
– Data Profiling Tools
– Statistical Analysis
Tools
– Data Cleansing Tools
– Data Integration Tools
– Issue and Event
Management Tools
Data Management Tools
from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
• Cloud
• CASE/Model Tools
• ETL Tools
• Data Quality Tools
• Data Profiling Tools
Gartner Strategic Planning Assumptions
• By 2023
– 75% of all databases will be cloud, reducing the DBMS vendor landscape and
increasing complexity for data governance and integration.
https://www.gartner.com/document/3894971?ref=solrAll&refval=219836558&qid=de595a5685b6f86db0ec6
Transform
Problems with forklifting
1. no basis for decisions made
2. no inclusion of architecture/
engineering concepts
3. no idea that these concepts
are missing from the process
4. 80% of organizational
data is ROT
Cleaner
More shareable
Less …
… data
Successful Cloud Adoption
Gartner Cloud Vendor Offerings
Data, combined with easy access, becomes a key differentiator;
obvious combinations include:
• Google:
– Google Search data
– YouTube data
– Google Ads data
– Retailers.
• Azure
– LinkedIn
– Office 365 data
– Sales and customer-relationship-focused analytics
• Amazon
– Anything retail
https://www.gartner.com/document/3894971?ref=solrAll&refval=219836558&qid=de595a5685b6f86db0ec6
• Computer-aided software engineering (CASE) is the
scientific application of a set of tools and methods to a
software system which is meant to result in high-quality,
defect-free, and maintainable software products. It also
refers to methods for the development of information
systems together with automated tools that can be used in
the software development process.
• CASE functions include analysis, design, and
programming
Source: http://en.wikipedia.org/wiki/
Computer Aided Software/Systems Engineering (CASE) Tools
CASE-based Support
http://www.visible.com
This includes:
• Senders
– flows from the CASE effort that
can inform the re-architecting
effort.
• Receivers
– flows from the project that can
inform the CASE effort.
• Senders and receivers
– some elements, such as
restructuring and reengineering,
are both senders and receivers.
CASE Tool: "Taxonomy"
A variety of
CASE-based
methods and
technologies can
access and
update the
metadata
metadata
Integration
Additional metadata uses
accessible via: web; portal;
XML; RDBMS
Everything must "fit" into one
CASE technology
Changing Model of CASE Tool Usage
Limited access
from outside
the CASE
technology
environment
CASE
tool-specific
methods
and
technologies
Limited additional
metadata use
IBM's AD/Cycle Information Model
Implementing Metadata
Repository Functionality
• "The repository" does not have to be an integrated solution
– it must be an easily integrable solution
• Repository functionality does not equal a repository
– metadata must easily evolve to repository solution
• Multiple repositories are not necessarily bad
– as interim solutions, Excel has been working quite well
• Minimal functionality includes
– ability to create, read, update, delete, and evolve metadata items
• Remember the 1st law of data management
– In order to manage metadata, you need metadata repository functionality
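The minimal functionality listed above (create, read, update, delete, and evolve metadata items) can be sketched in a few lines of Python. This is only an illustration: the `MetadataRepository` class, its method names, and the choice to model "evolve" as appending a new version are assumptions made here, not something the slides prescribe.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataItem:
    """One metadata item plus its version history (hypothetical shape)."""
    name: str
    versions: list = field(default_factory=list)  # oldest first

    @property
    def current(self) -> dict:
        return self.versions[-1]


class MetadataRepository:
    """In-memory stand-in for 'repository functionality' (illustrative only)."""

    def __init__(self):
        self._items = {}

    def create(self, name, attributes):
        if name in self._items:
            raise KeyError(f"{name!r} already exists")
        self._items[name] = MetadataItem(name, [dict(attributes)])

    def read(self, name):
        return dict(self._items[name].current)

    def update(self, name, **changes):
        # Overwrite fields of the current version in place.
        self._items[name].current.update(changes)

    def evolve(self, name, **changes):
        # Assumption: "evolve" keeps the old definition and adds a new version.
        item = self._items[name]
        item.versions.append({**item.current, **changes})

    def delete(self, name):
        del self._items[name]

    def history(self, name):
        return [dict(v) for v in self._items[name].versions]


repo = MetadataRepository()
repo.create("customer_id", {"type": "CHAR(8)", "steward": "Sales"})
repo.update("customer_id", steward="Finance")
repo.evolve("customer_id", type="CHAR(12)")
```

A spreadsheet can offer the same five operations, which is consistent with the point that interim solutions need not be a dedicated repository product.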
Defining The "E" Spaces
• ETL: Extract, Transform, Load
– delivers aggregated data to a
new database
• EAI: Enterprise Application Integration
– connects applications to other applications in a
predictable manner using
pre-established connections
• EII: Enterprise Information Integration
– between ETL and EAI; delivers tailored views of
information to users at the time that it is required
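As a rough illustration of the ETL definition above (delivering aggregated data to a new database), the following Python sketch runs the three steps against an in-memory SQLite database. The source rows, the `sales_by_region` table, and the function names are hypothetical; a real pipeline would extract from an operational system instead of a list.

```python
import sqlite3

# Hypothetical source rows, standing in for an operational system extract.
SOURCE_ROWS = [
    {"region": "east", "amount": "100.50"},
    {"region": "EAST ", "amount": "49.50"},
    {"region": "west", "amount": "75.00"},
]


def extract():
    """Extract: read raw rows from the source (here, an in-memory list)."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Transform: standardize region names and aggregate amounts per region."""
    totals = {}
    for row in rows:
        region = row["region"].strip().lower()
        totals[region] = totals.get(region, 0.0) + float(row["amount"])
    return totals


def load(totals, conn):
    """Load: write the aggregated data into a new database table."""
    conn.execute("CREATE TABLE sales_by_region (region TEXT PRIMARY KEY, total REAL)")
    conn.executemany("INSERT INTO sales_by_region VALUES (?, ?)", sorted(totals.items()))
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract()), conn)
result = conn.execute(
    "SELECT region, total FROM sales_by_region ORDER BY region"
).fetchall()
```

Here the transform step is where data preparation happens: the two spellings of "east" collapse into one region before anything is loaded.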
Data Quality Engineering Tools
• 4 categories of activities:
– Analysis
– Cleansing
– Enhancement
– Monitoring
• Principal tools:
– Parsing and Standardization
– Data Transformation
– Identity Resolution and Matching
– Enhancement
– Reporting
– Data Profiling
from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
Data Preparation Activities
from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
Data Preparation Parsing & Standardization Tools
• Data parsing tools enable the
definition of patterns that feed into a
rules engine used to distinguish
between valid and invalid data
values
• Actions are triggered upon matching
a specific pattern
• When an invalid pattern is
recognized, the application may
attempt to transform the invalid value
into one that meets expectations
from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
https://www.youtube.com/watch?v=r9UhJxFT5rk
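The pattern-and-action behavior described for parsing and standardization tools can be sketched as a tiny rules engine. This is a minimal sketch, not how any particular product works: the phone-number domain, the target format NNN-NNN-NNNN, and the names `VALID`, `REPAIRABLE`, and `parse_phone` are all assumptions chosen for illustration.

```python
import re

# Assumed rule set: the valid pattern, then known invalid forms that can be repaired.
VALID = re.compile(r"\d{3}-\d{3}-\d{4}")
REPAIRABLE = [
    # (pattern for a known invalid form, replacement that standardizes it)
    (re.compile(r"\((\d{3})\)\s*(\d{3})[-.\s]?(\d{4})"), r"\1-\2-\3"),
    (re.compile(r"(\d{3})[.\s](\d{3})[.\s](\d{4})"), r"\1-\2-\3"),
    (re.compile(r"(\d{3})(\d{3})(\d{4})"), r"\1-\2-\3"),
]


def parse_phone(value):
    """Return (status, value) where status is 'valid', 'standardized', or 'invalid'."""
    text = value.strip()
    if VALID.fullmatch(text):
        return "valid", text
    for pattern, replacement in REPAIRABLE:
        if pattern.fullmatch(text):
            # A recognized invalid pattern triggers a transformation.
            return "standardized", pattern.sub(replacement, text)
    # No rule matched: leave the value untouched and flag it.
    return "invalid", value
```

Values that match no rule are returned unchanged and flagged, so a person can decide what to do with them.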
Data Preparation Identity Resolution & Matching Tools
Basic approaches to matching:
• Deterministic
– Relies on defined patterns and
rules for assigning weights and
scores to determine similarity
– Predictable
– Only as good as the anticipations of
the rules developers
• Probabilistic
– Uses statistical techniques to
assess probabilities that pairs of
records represent the same
entity
– Not reliant on rules
– Refined based on experience:
matchers can improve precision
as more data is analyzed
from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
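The two matching approaches above can be contrasted in a short Python sketch. The deterministic function adds fixed, hand-assigned weights for agreeing fields. The probabilistic function uses Fellegi-Sunter style agreement weights, a common statistical technique that the slides do not name and that is swapped in here purely as an example. The records, field names, weights, and the `m`/`u` probabilities are all made-up assumptions.

```python
import math


def deterministic_score(a, b, weights):
    """Rule-based: add a fixed weight for every field that agrees exactly."""
    return sum(w for f, w in weights.items() if a.get(f) == b.get(f))


def probabilistic_weight(a, b, m, u):
    """Fellegi-Sunter style log-likelihood match weight.

    m[f]: assumed probability that field f agrees when records truly match.
    u[f]: assumed probability that field f agrees when they do not match.
    """
    total = 0.0
    for f in m:
        if a.get(f) == b.get(f):
            total += math.log2(m[f] / u[f])              # agreement adds evidence
        else:
            total += math.log2((1 - m[f]) / (1 - u[f]))  # disagreement subtracts it
    return total


# Hypothetical records and parameters.
rec1 = {"name": "ada king", "zip": "23230", "dob": "1990-01-01"}
rec2 = {"name": "ada king", "zip": "23230", "dob": "1990-01-10"}

weights = {"name": 5, "zip": 2, "dob": 4}
m = {"name": 0.95, "zip": 0.90, "dob": 0.98}
u = {"name": 0.01, "zip": 0.10, "dob": 0.003}

score = deterministic_score(rec1, rec2, weights)
weight = probabilistic_weight(rec1, rec2, m, u)
```

In practice the `m` and `u` values would be estimated from data rather than typed in, which is one way a matcher can be refined as more data is analyzed.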
Data Preparation Enhancement Tools
• A method for adding value to information by accumulating
additional information about a base set of entities and then
merging all the sets of information to provide a focused view
• Examples:
– Time/date stamps
– Auditing information
– Contextual information
– Geographic information
– Demographic information
– Psychographic information
from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
Data Preparation Reporting Tools
• Good reporting supports:
– Inspection and monitoring of conformance to data quality expectations
– Monitoring performance of data stewards conforming to data quality SLAs
– Workflow processing for data quality incidents
– Manual oversight of data cleansing and correction
• Associate report results w/:
– Data quality measurement
– Metrics
– Activity
from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
Data Preparation Portals as Tools
Data Preparation Profiling Tools
• Need to be able to distinguish between good and bad data before
making any improvements
• Data profiling is a set of algorithms for 2 purposes:
– Statistical analysis and assessment of the data quality values within a data set
– Exploring relationships that exist between value collections within and across
data sets
from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
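The first purpose of data profiling named above, statistical analysis of the values within a data set, can be sketched with the Python standard library alone. This is an assumption-laden illustration: the statistics chosen, the treatment of empty strings as nulls, and the names `profile_column` and `profile_table` are not taken from any profiling product.

```python
from collections import Counter


def profile_column(values):
    """Summarize one column: counts, nulls, distinct values, range, top value."""
    non_null = [v for v in values if v not in (None, "")]
    counts = Counter(non_null)
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(counts),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "most_common": counts.most_common(1)[0] if counts else None,
    }


def profile_table(rows):
    """Profile every column of a table given as a list of dicts."""
    columns = {key for row in rows for key in row}
    return {c: profile_column([row.get(c) for row in rows]) for c in sorted(columns)}


# Hypothetical sample data with a casing inconsistency and a missing value.
rows = [
    {"state": "VA", "zip": "23230"},
    {"state": "VA", "zip": "23284"},
    {"state": "va", "zip": None},
    {"state": "MD", "zip": "21201"},
]
report = profile_table(rows)
```

Even this small summary surfaces questions for a subject matter expert: three distinct `state` values where two were expected, and a null in `zip`.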
Sample Existing Environment
VSAM, DB2, Oracle, Client Server, IDMS, Flat files
RDBMS 1
Finance
HR
RDBMS 2
Marketing
R&D #1
R&D #2
R&D #3
Network Database
BackOffice
Applications
Manufacturing
Systems Flat Files
Logistics
Systems Flat Files
Sample Existing Environment
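Technology | Logical/Virtual Databases | Subject Areas | Tables | Attributes | Unique Attributes | Records | Programs/Copybooks | Lines of Code
Data Management Technology Type 1 | 44 | 20 | - | 13,067 | 6,600 | - | - | -
Global schema | - | - | - | - | - | 1,049 | - | -
T01 | - | - | - | - | - | 5 | - | -
C01 | - | - | - | - | - | 4 | - | -
Data Management Technology Type 2 | - | - | - | - | - | - | - | -
R2 | - | - | 613 | - | - | - | - | -
R3 | - | - | 108 | - | - | - | - | -
R5 | - | - | 127 | - | - | - | - | -
R7 | - | - | 447 | - | - | - | - | -
R72 | - | - | 5,996 | - | - | - | - | -
R73 | - | - | 11,224 | - | - | - | - | -
Data Management Technology Type 3 | - | - | 1,227 | 9,000 | 2,514 | - | - | -
Application | - | - | - | - | - | - | 10,970 | 15,700,000
Copybooks | - | - | - | - | - | - | 5,518 | 498,966
Totals: | 44 | 20 | 19,742 | 22,067 | 9,114 | 1,058 | 16,488 | 16,198,966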
Profiling
Discovery
Analysis
Data Discovery Technologies
• Data analysis software technologies deliver up to 10X
productivity over manual approaches
• Based on a powerful computing technology that allows data
engineers to quickly form candidate hypotheses with respect
to the existing data structures
• Hypotheses are then presented to the SMEs (both business
and technical) who confirm, refine, or deny them
• Allows existing data structures to be inferred at a rate that is an
order of magnitude more effective than previous manual
approaches
• Semi-automated
How has this been done in the past?
• Old
– Manually
– Brute force
– Repository dependent
– Quality indifferent
– Not repeatable
• New
– Semi-automated
– Engineered
– Repository independent
– Integrated quality
– Repeatable
– Currency
– Accuracy
Semi-automating Reverse Engineering:
Column Profiling with Attribute Summary Report
Semi-automating Reverse Engineering:
Column Profiling, Compare Documented vs. Actual
Screen shots of Migration Architect used by permission of Evoke Software http://www.evokesoft.com
Semi-automating Reverse Engineering:
Column Profiling, Drilling Down on Column Values
Select an Attribute to get a list of values
Double-click a value to
see rows with that value
© Copyright 2022 by Peter Aiken Slide #
Column Profiling Demonstration
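The column profiling steps shown on the preceding slides (summarize each attribute, list its values, then drill down to the rows behind a value) can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the Migration Architect tool shown in the screenshots; the sample rows and the function names `profile_column` and `rows_with_value` are hypothetical.

```python
from collections import Counter


def profile_column(rows, column):
    """Summarize one column: row count, empty values, distinct values, top values."""
    values = [row.get(column) for row in rows]
    present = [v for v in values if v not in (None, "")]
    freq = Counter(present)
    return {
        "column": column,
        "rows": len(values),
        "nulls": len(values) - len(present),
        "distinct": len(freq),
        "top_values": freq.most_common(3),
    }


def rows_with_value(rows, column, value):
    """Drill down: return the rows that carry a given value in the column."""
    return [row for row in rows if row.get(column) == value]


# Hypothetical sample data, for illustration only.
rows = [
    {"part_id": "A1", "unit": "EA", "state": "VA"},
    {"part_id": "A2", "unit": "EA", "state": "VA"},
    {"part_id": "A3", "unit": "BX", "state": ""},
    {"part_id": "A4", "unit": "ea", "state": "MD"},
]

summary = profile_column(rows, "unit")          # attribute summary
suspects = rows_with_value(rows, "unit", "ea")  # rows behind an odd value
```

A value such as the lowercase "ea" is the kind of finding that would be taken to an SME to confirm, refine, or deny.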
Semi-automating Reverse Engineering:
Dependency Profiling, Candidate Dependencies
Semi-automating Reverse Engineering:
Dependency Profiling, Promoting Dependencies
Dependency Profiling Demonstration
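Dependency profiling can be illustrated as a search for candidate functional dependencies: column A is a candidate determinant of column B if every value of A maps to exactly one value of B in the data. The sketch below assumes that interpretation and checks only single-column pairs. The sample rows, the function names, and the `confirmed_by_sme` set are hypothetical; a commercial profiling tool will do considerably more.

```python
from itertools import permutations


def holds(rows, lhs, rhs):
    """True if every lhs value maps to exactly one rhs value in these rows."""
    seen = {}
    for row in rows:
        key, val = row[lhs], row[rhs]
        if seen.setdefault(key, val) != val:
            return False
    return True


def candidate_dependencies(rows, columns):
    """List every ordered column pair (lhs, rhs) for which lhs -> rhs holds."""
    return [(lhs, rhs) for lhs, rhs in permutations(columns, 2)
            if holds(rows, lhs, rhs)]


# Hypothetical sample data, for illustration only.
rows = [
    {"order_id": 1, "zip": "23220", "city": "Richmond"},
    {"order_id": 2, "zip": "23220", "city": "Richmond"},
    {"order_id": 3, "zip": "21201", "city": "Baltimore"},
    {"order_id": 4, "zip": "23284", "city": "Richmond"},
]

candidates = candidate_dependencies(rows, ["order_id", "zip", "city"])

# Promotion: keep only the candidates a subject matter expert confirms.
confirmed_by_sme = {("zip", "city")}  # hypothetical SME decision
promoted = [dep for dep in candidates if dep in confirmed_by_sme]
```

A dependency that holds in a sample is only a hypothesis; it may be a coincidence of the data, which is why candidates are reviewed before they are promoted.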
Semi-automating Reverse Engineering:
Redundancy Profiling, Domain Comparison Detail
Redundancy Profiling Demonstration
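One way to picture the domain comparison behind redundancy profiling is to compare the sets of values found in two columns and report how much they overlap. The sketch below assumes that reading; the table contents, column names, and the `domain_overlap` function are hypothetical.

```python
def domain(rows, column):
    """The set of non-empty values that actually occur in a column."""
    return {row[column] for row in rows if row[column] not in (None, "")}


def domain_overlap(rows_a, col_a, rows_b, col_b):
    """Compare two column domains and report shared and unshared values."""
    a, b = domain(rows_a, col_a), domain(rows_b, col_b)
    shared = a & b
    return {
        "shared": len(shared),
        "pct_of_a": 100.0 * len(shared) / len(a) if a else 0.0,
        "pct_of_b": 100.0 * len(shared) / len(b) if b else 0.0,
        "only_in_a": sorted(a - b),
        "only_in_b": sorted(b - a),
    }


# Hypothetical sample data, for illustration only.
customers = [{"cust_no": "C1"}, {"cust_no": "C2"}, {"cust_no": "C3"}]
invoices = [{"customer": "C1"}, {"customer": "C2"},
            {"customer": "C2"}, {"customer": "C9"}]

detail = domain_overlap(customers, "cust_no", invoices, "customer")
```

A high overlap between differently named columns suggests they may hold the same data; values found on only one side are candidates for follow-up with the SMEs.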
Comparing Weekly Progress (80/20)

Day        Morning                              Afternoon
MONDAY     Model preparation                    Model refinement/validation session
TUESDAY    Model refinement/validation session  Model refinement/validation session
WEDNESDAY  Model preparation                    Model refinement/validation session
THURSDAY   Model refinement/validation session  Model refinement/validation session
FRIDAY     Model preparation                    Model refinement/validation session

Day        Morning            Afternoon
MONDAY     Model preparation  Model preparation
TUESDAY    Model preparation  Model refinement/validation session
WEDNESDAY  Model preparation  Model preparation
THURSDAY   Model preparation  Model refinement/validation session
FRIDAY     Model preparation  Model preparation
Preliminary System Survey (PSS*)
[ Baseline: Relative Condition & Amount of Evidence ]
[ Confounding characteristics: Data Handling, Operating Environment & Language Factor (Factor >= 1) ]
[ Beneficial characteristics: Key End User Participation & Net Automation Impact (Impact <= 1) ]
= Project characteristics
[ Historical organizational reverse engineering performance data ]
[ Project characteristics ]
= Project Estimate
* The purpose of the Preliminary System Survey is to determine how long and how many resources will be required to reverse engineer the selected system components.
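The Preliminary System Survey figure does not show the operators between its bracketed terms. The sketch below assumes they are multiplied, which the "Factor >= 1" and "Impact <= 1" annotations suggest but do not confirm. All numbers, units, and function names are hypothetical.

```python
def project_characteristics(baseline, confounding_factor, beneficial_impact):
    """Scale a baseline up by a confounding factor and down by a beneficial impact.

    Assumption: the three terms are multiplied; the slide does not say so.
    """
    if confounding_factor < 1:
        raise ValueError("confounding factor is expected to be >= 1")
    if beneficial_impact > 1:
        raise ValueError("beneficial impact is expected to be <= 1")
    return baseline * confounding_factor * beneficial_impact


def project_estimate(historical_rate, characteristics):
    """Apply historical reverse engineering performance to the project size."""
    return historical_rate * characteristics


# Hypothetical inputs: 100 units of work, a 1.5x penalty for a difficult
# environment, a 0.8x benefit from user participation and automation,
# and a historical rate of 0.5 hours per unit.
size = project_characteristics(100, 1.5, 0.8)
hours = project_estimate(0.5, size)
```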
Ancillary Results
• 4.7 billion empty bytes in just three
data warehouse tables
– Reducing the need to upgrade
company infrastructure capacity
– Made a strong case for normalization
– Ovation for the team from the Data
Warehouse Board of Directors
• Preserved multi-million dollar US
Postal Service (and other) sort
discounts
• Accurate measurable views of how
effectively certain processes work
• Hundreds of files have been processed
by the domain profiling process
• Many of the files have been through the initial
stages of normalization
Program
• Motivation
• Data Preparation Considerations
– No standard data curricula
– No standard audience
– Technology is a one-legged stool
• Data Problems are Different
– Dependence on high speed automation
– Hidden data factories sap resources
– Require a unified approach
• Reverse Engineering (Introducing Yourself to a Dataset)
– No measures (other than size)
– Hype Cycle
– Column Profiling
– Dependency Profiling
– Redundancy Profiling
• Take Aways/References/Q&A
Chart: Data Analysis vs. Data Preparation (0% to 100% scale; values shown: 80%, 20%)
Supply/demand for data talent
https://www.logianalytics.com/bi-trends/3-keys-understanding-data/
Growth of Data vs. Growth of Data Analysts
• Stored data accumulating at
28% annual growth rate
• Data analysts in workforce
growing at 5.7% growth rate
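To see how quickly the two growth rates diverge, the sketch below compounds them annually and reports the volume of stored data per analyst relative to today. It assumes both rates stay constant, which the slide does not claim; the function name is hypothetical.

```python
def data_per_analyst(years, data_growth=0.28, analyst_growth=0.057):
    """Relative data volume per analyst after `years`, with today = 1.0.

    Assumption: both rates hold steady and compound once per year.
    """
    return ((1 + data_growth) / (1 + analyst_growth)) ** years


# Relative workload per analyst for the next five years.
projection = {year: round(data_per_analyst(year), 2) for year in range(1, 6)}
```

Under these assumptions the data volume per analyst more than doubles within five years.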
Determining Diminishing Returns
Week #   Unmatched Items (% Total)   Ignorable Items (% Total)   Items Matched (% Total)
1        31.47%                      1.34%                       N/A
2        21.22%                      6.97%                       N/A
3        20.66%                      7.49%                       N/A
4        32.48%                      11.99%                      55.53%
…        …                           …                           …
14       9.02%                       22.62%                      68.36%
15       9.06%                       22.62%                      68.33%
16       9.53%                       22.62%                      67.85%
17       9.50%                       22.62%                      67.88%
18       7.46%                       22.62%                      69.92%
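One simple way to read diminishing returns from these figures is to compare the average weekly gain in matched items early in the effort with the gain late in the effort. The sketch below uses only the matched percentages shown for weeks 4 and 14 through 18; the function name is hypothetical, and the elided weeks are not filled in.

```python
# Items matched (% of total) for the weeks shown on the slide.
matched = {4: 55.53, 14: 68.36, 15: 68.33, 16: 67.85, 17: 67.88, 18: 69.92}


def avg_weekly_gain(matched, start, end):
    """Average change in matched percentage points per week between two weeks."""
    return (matched[end] - matched[start]) / (end - start)


early_rate = avg_weekly_gain(matched, 4, 14)   # weeks 4 to 14
late_rate = avg_weekly_gain(matched, 14, 18)   # weeks 14 to 18
```

A late rate well below the early rate indicates that each additional week of matching effort yields less than before.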
Before
After
Quantifying Benefits: Original Plan
Time needed to review all NSNs once over the life of the project:
NSNs 2,000,000
Average time to review & cleanse (in minutes) 5
Total Time (in minutes) 10,000,000
Time available per resource over a one year period of time:
Work weeks in a year 48
Work days in a week 5
Work hours in a day 7.5
Work minutes in a day 450
Total work minutes/year 108,000
Person years required to cleanse each NSN once prior to migration:
Minutes needed 10,000,000
Minutes available person/year 108,000
Total Person-Years 92.6
Resource cost to cleanse NSNs prior to migration:
Avg salary for SME year (not including overhead) $60,000.00
Projected years required to cleanse/total DLA person years saved 93
Total cost to cleanse/Total DLA savings to cleanse NSNs: $5.5 million
Quantifying Benefits: Revised Plan
Time needed to review all NSNs once over the life of the project:
NSNs 150,000
Average time to review & cleanse (in minutes) 5
Total Time (in minutes) 750,000
Time available per resource over a one year period of time:
Work weeks in a year 48
Work days in a week 5
Work hours in a day 7.5
Work minutes in a day 450
Total work minutes/year 108,000
Person years required to cleanse each NSN once prior to migration:
Minutes needed 750,000
Minutes available person/year 108,000
Total Person-Years 7
Resource cost to cleanse NSNs prior to migration:
Avg salary for SME year (not including overhead) $60,000.00
Projected years required to cleanse/total DLA person years saved 7
Total cost to cleanse/Total DLA savings to cleanse NSNs: $420,000
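The arithmetic behind the original and revised plans can be reproduced with a short calculation. The inputs below are the slide's own figures; the function name is hypothetical. The sketch works with unrounded person-years, so its results differ slightly from the rounded figures on the slides.

```python
def cleansing_cost(nsn_count, minutes_per_nsn=5, weeks_per_year=48,
                   days_per_week=5, hours_per_day=7.5, salary=60_000):
    """Return (person_years, cost) to review and cleanse each NSN once."""
    total_minutes = nsn_count * minutes_per_nsn
    minutes_per_person_year = weeks_per_year * days_per_week * hours_per_day * 60
    person_years = total_minutes / minutes_per_person_year
    return person_years, person_years * salary


original_years, original_cost = cleansing_cost(2_000_000)  # original plan
revised_years, revised_cost = cleansing_cost(150_000)      # revised plan
savings = original_cost - revised_cost
```

Reducing the number of NSNs that need manual review is what drives the saving; every other input is unchanged between the two plans.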
Quantifying Benefits: Social Engineering
Time needed to review all NSNs once over the life of the project:
NSNs 2,000,000
Average time to review & cleanse (in minutes) 5
Total Time (in minutes) 10,000,000
Time available per resource over a one year period of time:
Work weeks in a year 48
Work days in a week 5
Work hours in a day 7.5
Work minutes in a day 450
Total work minutes/year 108,000
Person years required to cleanse each NSN once prior to migration:
Minutes needed 10,000,000
Minutes available person/year 108,000
Total Person-Years 92.6
Resource cost to cleanse NSNs prior to migration:
Avg salary for SME year (not including overhead) $60,000.00
Projected years required to cleanse/total DLA person years saved 93
Total cost to cleanse/Total DLA savings to cleanse NSNs: $5.5 million
Take Aways
• Too much investment is spent focused on
tools and technology at the expense of
problem understanding
• It is useful to understand data management
technologies and their use as part of a
people, process & technology (3-legged)
stool
• Value can be gained by profiling data
• Data volume is still increasing faster than we
are able to process it
• Data interchange overhead and other costs
of poor data practices are measurably
sapping organization and individual
resources–and therefore productivity
• Reliance on existing technology-based
approaches and education methods has not
materially addressed this gap between
creation and processing or reduced bottom
line costs
R. Buckminster Fuller
[ Clicking any webinar title will link directly to the registration page ]
Upcoming Events
Conceptual vs. Logical vs. Physical Data
12 July 2022
The Importance of Metadata
9 August 2022
Data Preparation Fundamentals
13 September 2022
Brought to you by:
Time: 19:00 UTC (2:00 PM NYC) | Presented by: Peter Aiken, PhD
Peter.Aiken@AnythingAwesome.com +1.804.382.5957
Thank You!
Book a call with Peter to discuss anything - https://anythingawesome.com/OfficeHours.html
Critical Design Review?
Hiring Assistance?
Reverse Engineering Expertise?
Executive Data
Literacy Training?
Mentoring?
Tool/automation evaluation?
Use your data more strategically?

Weitere ähnliche Inhalte

Was ist angesagt?

The Evolving Role of the Data Architect – What Does It Mean for Your Career?
The Evolving Role of the Data Architect – What Does It Mean for Your Career?The Evolving Role of the Data Architect – What Does It Mean for Your Career?
The Evolving Role of the Data Architect – What Does It Mean for Your Career?DATAVERSITY
 
Is Enterprise Data Literacy Possible?
Is Enterprise Data Literacy Possible?Is Enterprise Data Literacy Possible?
Is Enterprise Data Literacy Possible?DATAVERSITY
 
Data Governance
Data GovernanceData Governance
Data GovernanceRob Lux
 
Best Practices in Metadata Management
Best Practices in Metadata ManagementBest Practices in Metadata Management
Best Practices in Metadata ManagementDATAVERSITY
 
Introduction to Data Governance
Introduction to Data GovernanceIntroduction to Data Governance
Introduction to Data GovernanceJohn Bao Vuu
 
Data Governance Roles as the Backbone of Your Program
Data Governance Roles as the Backbone of Your ProgramData Governance Roles as the Backbone of Your Program
Data Governance Roles as the Backbone of Your ProgramDATAVERSITY
 
How to Realize Benefits from Data Management Maturity Models
How to Realize Benefits from Data Management Maturity ModelsHow to Realize Benefits from Data Management Maturity Models
How to Realize Benefits from Data Management Maturity ModelsKingland
 
The Importance of MDM - Eternal Management of the Data Mind
The Importance of MDM - Eternal Management of the Data MindThe Importance of MDM - Eternal Management of the Data Mind
The Importance of MDM - Eternal Management of the Data MindDATAVERSITY
 
LDM Webinar: Data Modeling & Metadata Management
LDM Webinar: Data Modeling & Metadata ManagementLDM Webinar: Data Modeling & Metadata Management
LDM Webinar: Data Modeling & Metadata ManagementDATAVERSITY
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsDATAVERSITY
 
RWDG Slides: Three Approaches to Data Stewardship
RWDG Slides: Three Approaches to Data StewardshipRWDG Slides: Three Approaches to Data Stewardship
RWDG Slides: Three Approaches to Data StewardshipDATAVERSITY
 
Data-Ed Webinar: Data Quality Engineering
Data-Ed Webinar: Data Quality EngineeringData-Ed Webinar: Data Quality Engineering
Data-Ed Webinar: Data Quality EngineeringDATAVERSITY
 
Lessons in Data Modeling: Why a Data Model is an Important Part of Your Data ...
Lessons in Data Modeling: Why a Data Model is an Important Part of Your Data ...Lessons in Data Modeling: Why a Data Model is an Important Part of Your Data ...
Lessons in Data Modeling: Why a Data Model is an Important Part of Your Data ...DATAVERSITY
 
RWDG Slides: Data Governance Roles and Responsibilities
RWDG Slides: Data Governance Roles and ResponsibilitiesRWDG Slides: Data Governance Roles and Responsibilities
RWDG Slides: Data Governance Roles and ResponsibilitiesDATAVERSITY
 
Selecting Data Management Tools - A practical approach
Selecting Data Management Tools - A practical approachSelecting Data Management Tools - A practical approach
Selecting Data Management Tools - A practical approachChristopher Bradley
 
Master Data Management
Master Data ManagementMaster Data Management
Master Data ManagementZahra Mansoori
 
Data Governance and Metadata Management
Data Governance and Metadata ManagementData Governance and Metadata Management
Data Governance and Metadata Management DATAVERSITY
 
Introduction to Data Management Maturity Models
Introduction to Data Management Maturity ModelsIntroduction to Data Management Maturity Models
Introduction to Data Management Maturity ModelsKingland
 

Was ist angesagt? (20)

Enterprise Data Management
Enterprise Data ManagementEnterprise Data Management
Enterprise Data Management
 
The Evolving Role of the Data Architect – What Does It Mean for Your Career?
The Evolving Role of the Data Architect – What Does It Mean for Your Career?The Evolving Role of the Data Architect – What Does It Mean for Your Career?
The Evolving Role of the Data Architect – What Does It Mean for Your Career?
 
Is Enterprise Data Literacy Possible?
Is Enterprise Data Literacy Possible?Is Enterprise Data Literacy Possible?
Is Enterprise Data Literacy Possible?
 
Data Governance
Data GovernanceData Governance
Data Governance
 
Best Practices in Metadata Management
Best Practices in Metadata ManagementBest Practices in Metadata Management
Best Practices in Metadata Management
 
Introduction to Data Governance
Introduction to Data GovernanceIntroduction to Data Governance
Introduction to Data Governance
 
Data Governance Roles as the Backbone of Your Program
Data Governance Roles as the Backbone of Your ProgramData Governance Roles as the Backbone of Your Program
Data Governance Roles as the Backbone of Your Program
 
How to Realize Benefits from Data Management Maturity Models
How to Realize Benefits from Data Management Maturity ModelsHow to Realize Benefits from Data Management Maturity Models
How to Realize Benefits from Data Management Maturity Models
 
The Importance of MDM - Eternal Management of the Data Mind
The Importance of MDM - Eternal Management of the Data MindThe Importance of MDM - Eternal Management of the Data Mind
The Importance of MDM - Eternal Management of the Data Mind
 
LDM Webinar: Data Modeling & Metadata Management
LDM Webinar: Data Modeling & Metadata ManagementLDM Webinar: Data Modeling & Metadata Management
LDM Webinar: Data Modeling & Metadata Management
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business Goals
 
RWDG Slides: Three Approaches to Data Stewardship
RWDG Slides: Three Approaches to Data StewardshipRWDG Slides: Three Approaches to Data Stewardship
RWDG Slides: Three Approaches to Data Stewardship
 
Data-Ed Webinar: Data Quality Engineering
Data-Ed Webinar: Data Quality EngineeringData-Ed Webinar: Data Quality Engineering
Data-Ed Webinar: Data Quality Engineering
 
Lessons in Data Modeling: Why a Data Model is an Important Part of Your Data ...
Lessons in Data Modeling: Why a Data Model is an Important Part of Your Data ...Lessons in Data Modeling: Why a Data Model is an Important Part of Your Data ...
Lessons in Data Modeling: Why a Data Model is an Important Part of Your Data ...
 
RWDG Slides: Data Governance Roles and Responsibilities
RWDG Slides: Data Governance Roles and ResponsibilitiesRWDG Slides: Data Governance Roles and Responsibilities
RWDG Slides: Data Governance Roles and Responsibilities
 
Selecting Data Management Tools - A practical approach
Selecting Data Management Tools - A practical approachSelecting Data Management Tools - A practical approach
Selecting Data Management Tools - A practical approach
 
Master Data Management
Master Data ManagementMaster Data Management
Master Data Management
 
Data Governance and Metadata Management
Data Governance and Metadata ManagementData Governance and Metadata Management
Data Governance and Metadata Management
 
DAMA International DMBOK V2 - Comparison with V1
DAMA International DMBOK V2 - Comparison with V1DAMA International DMBOK V2 - Comparison with V1
DAMA International DMBOK V2 - Comparison with V1
 
Introduction to Data Management Maturity Models
Introduction to Data Management Maturity ModelsIntroduction to Data Management Maturity Models
Introduction to Data Management Maturity Models
 

Ähnlich wie Data Preparation Fundamentals

Where Data Architecture and Data Governance Collide
Where Data Architecture and Data Governance CollideWhere Data Architecture and Data Governance Collide
Where Data Architecture and Data Governance CollideDATAVERSITY
 
Approaching Data Quality
Approaching Data QualityApproaching Data Quality
Approaching Data QualityDATAVERSITY
 
Getting Data Quality Right
Getting Data Quality RightGetting Data Quality Right
Getting Data Quality RightDATAVERSITY
 
Data-Ed Online: Approaching Data Quality
Data-Ed Online: Approaching Data QualityData-Ed Online: Approaching Data Quality
Data-Ed Online: Approaching Data QualityDATAVERSITY
 
Data Strategy Best Practices
Data Strategy Best PracticesData Strategy Best Practices
Data Strategy Best PracticesDATAVERSITY
 
Data Management vs. Data Governance Program
Data Management vs. Data Governance ProgramData Management vs. Data Governance Program
Data Management vs. Data Governance ProgramDATAVERSITY
 
Data Modeling Fundamentals
Data Modeling FundamentalsData Modeling Fundamentals
Data Modeling FundamentalsDATAVERSITY
 
Essential Reference and Master Data Management
Essential Reference and Master Data ManagementEssential Reference and Master Data Management
Essential Reference and Master Data ManagementDATAVERSITY
 
DataEd Slides: Leveraging Data Management Technologies
DataEd Slides: Leveraging Data Management TechnologiesDataEd Slides: Leveraging Data Management Technologies
DataEd Slides: Leveraging Data Management TechnologiesDATAVERSITY
 
Data-Ed Webinar: Data Architecture Requirements
Data-Ed Webinar: Data Architecture RequirementsData-Ed Webinar: Data Architecture Requirements
Data-Ed Webinar: Data Architecture RequirementsDATAVERSITY
 
Data-Ed: Data Architecture Requirements
Data-Ed: Data Architecture Requirements  Data-Ed: Data Architecture Requirements
Data-Ed: Data Architecture Requirements Data Blueprint
 
Why Data Modeling Is Fundamental
Why Data Modeling Is FundamentalWhy Data Modeling Is Fundamental
Why Data Modeling Is FundamentalDATAVERSITY
 
Data-Ed: Essential Metadata Strategies
Data-Ed: Essential Metadata StrategiesData-Ed: Essential Metadata Strategies
Data-Ed: Essential Metadata StrategiesDATAVERSITY
 
What’s in Your Data Warehouse?
What’s in Your Data Warehouse?What’s in Your Data Warehouse?
What’s in Your Data Warehouse?DATAVERSITY
 
Key Elements of a Successful Data Governance Program
Key Elements of a Successful Data Governance ProgramKey Elements of a Successful Data Governance Program
Key Elements of a Successful Data Governance ProgramDATAVERSITY
 
DataEd Slides: Growing Practical Data Governance Programs
DataEd Slides: Growing Practical Data Governance ProgramsDataEd Slides: Growing Practical Data Governance Programs
DataEd Slides: Growing Practical Data Governance ProgramsDATAVERSITY
 
The Importance of Metadata
The Importance of MetadataThe Importance of Metadata
The Importance of MetadataDATAVERSITY
 
Necessary Prerequisites to Data Success
Necessary Prerequisites to Data SuccessNecessary Prerequisites to Data Success
Necessary Prerequisites to Data SuccessDATAVERSITY
 
DataEd Slides: Data Management vs. Data Strategy
DataEd Slides: Data Management vs. Data StrategyDataEd Slides: Data Management vs. Data Strategy
DataEd Slides: Data Management vs. Data StrategyDATAVERSITY
 
DataEd Slides: Data Management + Data Strategy = Interoperability
DataEd Slides: Data Management + Data Strategy = InteroperabilityDataEd Slides: Data Management + Data Strategy = Interoperability
DataEd Slides: Data Management + Data Strategy = InteroperabilityDATAVERSITY
 

Ähnlich wie Data Preparation Fundamentals (20)

Where Data Architecture and Data Governance Collide
Where Data Architecture and Data Governance CollideWhere Data Architecture and Data Governance Collide
Where Data Architecture and Data Governance Collide
 
Approaching Data Quality
Approaching Data QualityApproaching Data Quality
Approaching Data Quality
 
Getting Data Quality Right
Getting Data Quality RightGetting Data Quality Right
Getting Data Quality Right
 
Data-Ed Online: Approaching Data Quality
Data-Ed Online: Approaching Data QualityData-Ed Online: Approaching Data Quality
Data-Ed Online: Approaching Data Quality
 
Data Strategy Best Practices
Data Strategy Best PracticesData Strategy Best Practices
Data Strategy Best Practices
 
Data Management vs. Data Governance Program
Data Management vs. Data Governance ProgramData Management vs. Data Governance Program
Data Management vs. Data Governance Program
 
Data Modeling Fundamentals
Data Modeling FundamentalsData Modeling Fundamentals
Data Modeling Fundamentals
 
Essential Reference and Master Data Management
Essential Reference and Master Data ManagementEssential Reference and Master Data Management
Essential Reference and Master Data Management
 
DataEd Slides: Leveraging Data Management Technologies
DataEd Slides: Leveraging Data Management TechnologiesDataEd Slides: Leveraging Data Management Technologies
DataEd Slides: Leveraging Data Management Technologies
 
Data-Ed Webinar: Data Architecture Requirements
Data-Ed Webinar: Data Architecture RequirementsData-Ed Webinar: Data Architecture Requirements
Data-Ed Webinar: Data Architecture Requirements
 
Data-Ed: Data Architecture Requirements
Data-Ed: Data Architecture Requirements  Data-Ed: Data Architecture Requirements
Data-Ed: Data Architecture Requirements
 
Why Data Modeling Is Fundamental
Why Data Modeling Is FundamentalWhy Data Modeling Is Fundamental
Why Data Modeling Is Fundamental
 
Data-Ed: Essential Metadata Strategies
Data-Ed: Essential Metadata StrategiesData-Ed: Essential Metadata Strategies
Data-Ed: Essential Metadata Strategies
 
What’s in Your Data Warehouse?
What’s in Your Data Warehouse?What’s in Your Data Warehouse?
What’s in Your Data Warehouse?
 
Key Elements of a Successful Data Governance Program
Key Elements of a Successful Data Governance ProgramKey Elements of a Successful Data Governance Program
Key Elements of a Successful Data Governance Program
 
DataEd Slides: Growing Practical Data Governance Programs
DataEd Slides: Growing Practical Data Governance ProgramsDataEd Slides: Growing Practical Data Governance Programs
DataEd Slides: Growing Practical Data Governance Programs
 
The Importance of Metadata
The Importance of MetadataThe Importance of Metadata
The Importance of Metadata
 
Necessary Prerequisites to Data Success
Necessary Prerequisites to Data SuccessNecessary Prerequisites to Data Success
Necessary Prerequisites to Data Success
 
DataEd Slides: Data Management vs. Data Strategy
DataEd Slides: Data Management vs. Data StrategyDataEd Slides: Data Management vs. Data Strategy
DataEd Slides: Data Management vs. Data Strategy
 
DataEd Slides: Data Management + Data Strategy = Interoperability
DataEd Slides: Data Management + Data Strategy = InteroperabilityDataEd Slides: Data Management + Data Strategy = Interoperability
DataEd Slides: Data Management + Data Strategy = Interoperability
 

Mehr von DATAVERSITY

Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...DATAVERSITY
 
Data at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and GovernanceData at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and GovernanceDATAVERSITY
 
Exploring Levels of Data Literacy
Exploring Levels of Data LiteracyExploring Levels of Data Literacy
Exploring Levels of Data LiteracyDATAVERSITY
 
Make Data Work for You
Make Data Work for YouMake Data Work for You
Make Data Work for YouDATAVERSITY
 
Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?DATAVERSITY
 
Data Catalogs Are the Answer – What Is the Question?
Data Catalogs Are the Answer – What Is the Question?Data Catalogs Are the Answer – What Is the Question?
Data Catalogs Are the Answer – What Is the Question?DATAVERSITY
 
Data Modeling Fundamentals
Data Modeling FundamentalsData Modeling Fundamentals
Data Modeling FundamentalsDATAVERSITY
 
Showing ROI for Your Analytic Project
Showing ROI for Your Analytic ProjectShowing ROI for Your Analytic Project
Showing ROI for Your Analytic ProjectDATAVERSITY
 
How a Semantic Layer Makes Data Mesh Work at Scale
How a Semantic Layer Makes  Data Mesh Work at ScaleHow a Semantic Layer Makes  Data Mesh Work at Scale
How a Semantic Layer Makes Data Mesh Work at ScaleDATAVERSITY
 
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...DATAVERSITY
 
Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?DATAVERSITY
 
Data Governance Trends - A Look Backwards and Forwards
Data Governance Trends - A Look Backwards and ForwardsData Governance Trends - A Look Backwards and Forwards
Data Governance Trends - A Look Backwards and ForwardsDATAVERSITY
 
Data Governance Trends and Best Practices To Implement Today
Data Governance Trends and Best Practices To Implement TodayData Governance Trends and Best Practices To Implement Today
Data Governance Trends and Best Practices To Implement TodayDATAVERSITY
 
2023 Trends in Enterprise Analytics
2023 Trends in Enterprise Analytics2023 Trends in Enterprise Analytics
2023 Trends in Enterprise AnalyticsDATAVERSITY
 
Data Strategy Best Practices
Data Strategy Best PracticesData Strategy Best Practices
Data Strategy Best PracticesDATAVERSITY
 
Data Management Best Practices
Data Management Best PracticesData Management Best Practices
Data Management Best PracticesDATAVERSITY
 
MLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive AdvantageMLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive AdvantageDATAVERSITY
 
Keeping the Pulse of Your Data – Why You Need Data Observability to Improve D...
Keeping the Pulse of Your Data – Why You Need Data Observability to Improve D...Keeping the Pulse of Your Data – Why You Need Data Observability to Improve D...
Keeping the Pulse of Your Data – Why You Need Data Observability to Improve D...DATAVERSITY
 
Empowering the Data Driven Business with Modern Business Intelligence
Empowering the Data Driven Business with Modern Business IntelligenceEmpowering the Data Driven Business with Modern Business Intelligence
Empowering the Data Driven Business with Modern Business IntelligenceDATAVERSITY
 
Enterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureEnterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureDATAVERSITY
 

Mehr von DATAVERSITY (20)

Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
 
Data at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and GovernanceData at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and Governance
 
Exploring Levels of Data Literacy
Exploring Levels of Data LiteracyExploring Levels of Data Literacy
Exploring Levels of Data Literacy
 
Make Data Work for You
Make Data Work for YouMake Data Work for You
Make Data Work for You
 
Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?
 
Data Catalogs Are the Answer – What Is the Question?
Data Catalogs Are the Answer – What Is the Question?Data Catalogs Are the Answer – What Is the Question?
Data Catalogs Are the Answer – What Is the Question?
 
Data Modeling Fundamentals
Data Modeling FundamentalsData Modeling Fundamentals
Data Modeling Fundamentals
 
Showing ROI for Your Analytic Project
Showing ROI for Your Analytic ProjectShowing ROI for Your Analytic Project
Showing ROI for Your Analytic Project
 
How a Semantic Layer Makes Data Mesh Work at Scale
How a Semantic Layer Makes  Data Mesh Work at ScaleHow a Semantic Layer Makes  Data Mesh Work at Scale
How a Semantic Layer Makes Data Mesh Work at Scale
 
Data Preparation Fundamentals

  • 1. Data Preparation Fundamentals © Copyright 2022 by Peter Aiken Slide # 1 peter.aiken@anythingawesome.com +1.804.382.5957 Peter Aiken, PhD [Chart: Data Analysis vs. Data Preparation, 0% to 100%] Peter Aiken, Ph.D. • I've been doing this a long time • My work is recognized as useful • Associate Professor of IS (vcu.edu) • Institute for Defense Analyses (ida.org) • DAMA International (dama.org) • MIT CDO Society (iscdo.org) • Anything Awesome (anythingawesome.com) • Experienced w/ 500+ data management practices worldwide • Multi-year immersions – US DoD (DISA/Army/Marines/DLA) – Nokia – Deutsche Bank – Wells Fargo – Walmart – HUD … • 12 books and dozens of articles • DAMA International President 2009-2013/2018/2020 • DAMA International Achievement Award 2001 (with Dr. E. F. "Ted" Codd) • DAMA International Community Award 2005
  • 2. Your Sponsor Today Quest Software Solutions Consultant Information Management Specialist Quest Software Gary Jerep Where Next Meets Now.
  • 3. Foglight SharePlex Kace Toad QoreStor NetVault erwin Binary Tree Change Auditor Migration Manager Metalogix Quadrotech SCALE & STREAMLINE IT OPERATIONS Migrate faster, strengthen cyber security resilience and stay in control to keep your business running Identity Manager Safeguard Active Roles Quest: Helping Customers Achieve True IT Resilience NOW IDENTITY DATA EMPOWERMENT & GOVERNANCE Empower your business with the visibility and context to better manage and develop data pipelines that deliver faster insights, while safeguarding the data and infrastructure Information & System Management One Identity Microsoft Platform Management HARDENED CYBER SECURITY Using an identity-focused, cloud-first, customer-centric approach from the Cloud to the edge making Zero Trust a reality now
  • 4. Monitoring Sensitive Data Identification SQL Tuning & Optimization Development Administration Backup DevOps Upgrades Migration Load Balancing Diagnostics Data Modeling Systems Monitoring & Diagnostics Inventory & Asset Management Policy & Access Management Secondary Storage & Cost Optimization Backup & Recovery Software Compliance Helpdesk Cloud Cost Optimization erwin Data Modeler erwin Data Catalog erwin Evolve erwin Data Literacy erwin Data Intelligence Active Metadata Management Business Process Modeling Data Stewardship Regulatory Compliance Data Catalog Data Lineage Enterprise Architecture Data Architecture Data Profiling Impact Analysis RapidRecovery® Data Protection Data Operations Data Governance Application Modernization Cloud Migration Empower NoSQL Sensitive Data Protection Model-Driven DevOps SLA Performance Cyber Resilience The Quest Data Empowerment Platform
  • 5. Where Next Meets Now. ® A Few Resources! • www.Quest.com/Solutions/Data- empowerment • eBook: The Four Roadblocks to Data Preparation • Toad Data Point Case Study: Philadelphia Youth Network • eBook: Enabling Agile Database Development with Toad • Tech Brief: Keep Using Toad for Oracle with Databases in the Cloud • Tech Brief: Accelerate and Secure your SQL Server DevOps CI/CD Pipelines
  • 6. Current approaches are not and have not been working © Copyright 2022 by Peter Aiken Slide # 3 https://anythingawesome.com Driving Innovation with Data Competing on data and analytics Managing data as a business asset Created a data-driven organization Forged a data culture 25% 50% 75% 100% 24% 24% 39% 41% 56% Yes No Source: Big Data and AI Executive Survey 2022 by Randy Bean & Thomas Davenport @ www.newvantage.com 2020 0% 25% 50% 75% 100% technology people/process 90% 10% 80% of data challenges are people/process based! © Copyright 2022 by Peter Aiken Slide # 4 https://anythingawesome.com 0% 25% 50% 75% 100% Data Analysis Data Preparation 20% 80% Everyone wants to do better data analysis … • Some data preparation is inevitable • What would a 'good' ratio be? • "Everyone knows"
  • 7. © Copyright 2022 by Peter Aiken Slide # 5 https://anythingawesome.com 1. 80% of your data is redundant, trivial, or obsolete 2. 80% of your data is of unknown quality 3. 80% of your data is 'standards free' 4. Your highly paid data analytics capabilities spend 80% of their time working under these conditions Pareto Data Realities • IT thinks data is a business problem – "If they can connect to the server, then my job is done!" • The business thinks IT is managing data adequately – "Who else would be taking care of it?" Confusion as to data responsibility © Copyright 2022 by Peter Aiken Slide # 6 https://anythingawesome.com
  • 8. You must address data debt proactively https://www.merkleinc.com/blog/are-you-buried-alive-data-debt https://johnladley.com/a-bit-more-on-data-debt/ https://uk.nttdataservices.com/en/blog/2020/february/how-to-get-rid-of-your-data-debt Data debt: • Slows progress • Decreases quality • Increases costs • Presents greater risks • Data debt – The time and effort it will take to return your shared data to a governed state from its (likely) current ungoverned state • Getting back to zero – Involves undoing existing stuff – Likely requires new skills Bad Data Decisions Spiral Bad data decisions Poor organizational outcomes Technical decision makers are not data knowledgeable Business decision makers are not data knowledgeable Poor treatment of organizational data assets Poor quality data
  • 9. © Copyright 2022 by Peter Aiken Slide # https://anythingawesome.com Program Program 9 • Motivation • Data Preparation Considerations – No standard data curricula – No standard audience – Technology is a one-legged stool • Data Problems are Different – Dependence on high speed automation – Hidden data factories sap resources – Require a unified approach • Reverse Engineering (Introducing Yourself to a Dataset) – No measures (other than size) – Hype Cycle – Column Profiling – Dependency Profiling – Redundancy Profiling • Take Aways/References/Q&A 0% 50% 100% Data Analysis Data Preparation 80% 20% Data is not broadly or widely understood © Copyright 2022 by Peter Aiken Slide # 10 https://anythingawesome.com adapted from: http://www.dailymirror.lk/print/opinion/editorial-we-need-to-become-channels-of-peace/172-27164 It is like a fan! It is like a snake! It is like a wall! It is like a rope! It is like a tree! Blind Persons and the Elephant It is like a story! It is like a dashboard! It is like pipes! It is like a warehouse! It is like statistics!
  • 10. © Copyright 2022 by Peter Aiken Slide # 11 https://anythingawesome.com Unrefined data management definition Sources Uses Data Management © Copyright 2022 by Peter Aiken Slide # 12 https://anythingawesome.com More refined data management definition Sources Uses Reuse Data Management ➜ ➜
  • 11. Governance & Ethical Use Framework Specialized Data Skills Collection Evaluation Engineering Evolution Access Storage Preparation Data Science Delivery Presentation Story Telling Exploitation Better still data management definition © Copyright 2022 by Peter Aiken Slide # 13 https://anythingawesome.com Sources ➜ Reuse ➜ Data Preparation 80% Data Exploitation/Analysis 20% Formal Data Reuse Management Data Technologies by themselves, are a 1–Legged Stool © Copyright 2022 by Peter Aiken Slide # 14 https://anythingawesome.com
  • 12. Success Requires a 3–Legged Stool © Copyright 2022 by Peter Aiken Slide # 15 https://anythingawesome.com P e o p l e Process T e c h n o l o g y But not just a stool–these are interdependent © Copyright 2022 by Peter Aiken Slide # 16 https://anythingawesome.com People Process Technology
  • 13. © Copyright 2022 by Peter Aiken Slide # 17 https://anythingawesome.com Success is identifying a winning combination People Process Technology Defining Data Technology Architecture • Data technology is part of the overall technology architecture • It is also often considered part of the enterprise’s data architecture • Data technology architecture addresses 3 questions: – What technologies are standard/required/preferred/acceptable? – Which technologies apply to which purposes and circumstances? – In a distributed environment, which technologies exist where, and how does data move from one node to another? © Copyright 2022 by Peter Aiken Slide # 18 https://anythingawesome.com from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
  • 14. • Managing data technology should follow the same principles and standards for managing any technology • Leading reference model for technology management is the Information Technology Infrastructure Library (ITIL): – http://www.itil-officialsite.com/home/home.asp © Copyright 2022 by Peter Aiken Slide # 19 https://anythingawesome.com from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International Data Management Technologies Understanding Data Technology Requirements • Need to understand: – How the technology works – How it provides value in the context of your organization – Requirements of a data technology before determining what technical solution to choose for a particular situation • Suggested questions: – What problem does this data technology mean to solve? – What sets this data technology apart from others? – Are there specific hardware/software/operating systems/storage/network/ connectivity requirements? – Does this technology include data security functionality? © Copyright 2022 by Peter Aiken Slide # 20 https://anythingawesome.com from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
  • 15. © Copyright 2022 by Peter Aiken Slide # https://anythingawesome.com Program Program 21 • Motivation • Data Preparation Considerations – No standard data curricula – No standard audience – Technology is a one-legged stool • Data Problems are Different – Dependence on high speed automation – Hidden data factories sap resources – Require a unified approach • Reverse Engineering (Introducing Yourself to a Dataset) – No measures (other than size) – Hype Cycle – Column Profiling – Dependency Profiling – Redundancy Profiling • Take Aways/References/Q&A 0% 50% 100% Data Analysis Data Preparation 80% 20% © Copyright 2022 by Peter Aiken Slide # 22 https://anythingawesome.com Data Sandwich
  • 16. Data supply Data literacy Standard data Standard data Leverage point - high performance automation © Copyright 2022 by Peter Aiken Slide # Data literacy 23 https://anythingawesome.com Data supply Leverage point - high performance automation © Copyright 2022 by Peter Aiken Slide # Standard data Data supply Data literacy 24 https://anythingawesome.com
  • 17. Leverage point - high performance automation © Copyright 2022 by Peter Aiken Slide # This cannot happen without investments in engineering and architecture! 25 https://anythingawesome.com Data supply Data literacy Standard data Quality engineering/ architecture work products do not happen accidentally! Quality data engineering/ architecture work products do not happen accidentally! Leverage point - high performance automation © Copyright 2022 by Peter Aiken Slide # This cannot happen without investments in data engineering and architecture! 26 https://anythingawesome.com Data supply Data literacy Standard data
  • 18. Tacoma Narrows Bridge/Gallopin' Gertie • Slender, elegant and graceful • World's 3rd longest suspension span • Opened on July 1, 1940 • Collapsed in a windstorm on November 7, 1940 • "The most dramatic failure in bridge engineering history" • Changed forever how engineers design suspension bridges, leading to safer spans today Hidden Data Factories https://hbr.org/2016/09/bad-data-costs-the-u-s-3-trillion-per-year Work products are delivered to Customers Knowledge Workers 80% looking for stuff 20% doing useful work Department A Department B 1. Check A's work 2. Make any corrections 3. Complete B's work 4. Deliver to Department C https://en.wikipedia.org/wiki/Theory_of_constraints Department C 1. Check B's work 2. Make any corrections 3. Complete C's work 4. Deliver to Customer 5. Deal with consequences
  • 19. © Copyright 2022 by Peter Aiken Slide # 29 https://anythingawesome.com Hidden Data Factories Poor data manifests as multifaceted organizational challenges © Copyright 2022 by Peter Aiken Slide # 30 https://anythingawesome.com
  • 20. Poor data manifests as multifaceted organizational challenges © Copyright 2022 by Peter Aiken Slide # 31 https://anythingawesome.com IT System Business Challenge Business Process Business Challenge IT Process Business Challenge Business System Business Challenge IT Process Business Challenge IT System Business Challenge Business Process Business Challenge Poor results Root cause analysis is part of data governance Consistency Encourages Quality Analysis © Copyright 2022 by Peter Aiken Slide # 32 https://anythingawesome.com IT System Business Challenge Business Process Business Challenge IT Process Business Challenge Business System Business Challenge IT Process Business Challenge IT System Business Challenge Business Process Business Challenge Eliminating data debt requires a team with specialized skills deployed to create a repeatable process and develop sustained organizational skillsets
  • 21. © Copyright 2022 by Peter Aiken Slide # https://anythingawesome.com Program Program 33 • Motivation • Data Preparation Considerations – No standard data curricula – No standard audience – Technology is a one-legged stool • Data Problems are Different – Dependence on high speed automation – Hidden data factories sap resources – Require a unified approach • Reverse Engineering (Introducing Yourself to a Dataset) – No measures (other than size) – Hype Cycle – Column Profiling – Dependency Profiling – Redundancy Profiling • Take Aways/References/Q&A 0% 50% 100% Data Analysis Data Preparation 80% 20% As Is Requirements Assets WHAT? As Is Design Assets HOW? As Is Implementation Assets AS BUILT Forward Engineering © Copyright 2022 by Peter Aiken Slide # 34 https://anythingawesome.com
  • 22. As Is Requirements Assets WHAT? As Is Design Assets HOW? As Is Implementation Assets AS BUILT Existing Reverse Engineering © Copyright 2022 by Peter Aiken Slide # 35 https://anythingawesome.com A structured technique aimed at recovering rigorous knowledge of the existing system to leverage enhancement efforts [Chikofsky & Cross 1990] As Is Requirements Assets WHAT? As Is Design Assets HOW? As Is Implementation Assets AS BUILT Existing New Reengineering Reverse Engineering Forward engineering Reimplement To Be Implementation Assets To Be Design Assets To Be Requirements Assets • First, reverse engineering the existing system to understand its strengths/weaknesses • Next, use this information to inform the design of the new system © Copyright 2022 by Peter Aiken Slide # 36 https://anythingawesome.com
  • 23. Data Preparation Tools & Vendor Hype • CIOs/CDOs feel pressure • Vendor/project promise auditing • No understanding of hype cycle © Copyright 2022 by Peter Aiken Slide # 37 https://anythingawesome.com Who wrote this … ? © Copyright 2022 by Peter Aiken Slide # 38 https://anythingawesome.com • In considering any new subject, • there is frequently a tendency first to overrate what we find to be already interesting or remarkable, and • secondly - by a sort of natural reaction - to undervalue the true state of the case. – Lady Augusta Ada King, (1815 – 1852) Countess of Lovelace – (aka) Ada Lovelace, daughter of Lord Byron – Publisher of the first computing program
  • 24. © Copyright 2022 by Peter Aiken Slide # https://anythingawesome.com Technology Trigger: A potential technology breakthrough kicks things off. Early proof-of-concept stories and media interest trigger significant publicity. Often no usable products exist and commercial viability is unproven. Trough of Disillusionment: Interest wanes as experiments and implementations fail to deliver. Producers of the technology shake out or fail. Investments continue only if the surviving providers improve their products to the satisfaction of early adopters. Peak of Inflated Expectations: Early publicity produces a number of success stories—often accompanied by scores of failures. Some companies take action; many do not. Slope of Enlightenment: More instances of how the technology can benefit the enterprise start to crystallize and become more widely understood. Second- and third- generation products appear from technology providers. More enterprises fund pilots; conservative companies remain cautious. Plateau of Productivity: Mainstream adoption starts to take off. Criteria for assessing provider viability are more clearly defined. The technology’s broad market applicability and relevance are clearly paying off. Gartner Five-phase Hype Cycle http://www.gartner.com/technology/research/methodologies/hype-cycle.jsp Gartner 2021 Hype Cycle for Data Management © Copyright 2022 by Peter Aiken Slide # 40 https://anythingawesome.com
  • 25. Metadata Management Data Management Body of Knowledge (DM BoK V2) Practice Areas from The DAMA Guide to the Data Management Body of Knowledge 2E © 2017 by DAMA International Data Management Tools from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International 1. Data Governance – Intranet Website – E-Mail – Metadata Tools – Issue Mgt. Tools 2. Data Architecture Management – Intranet Website – E-Mail – Metadata Tools – Issue Mgt. Tools – Data Modeling Tools – Model Mgt. Tool – Metadata Repository – Office Productivity Tools 3. Data Development – Data Modeling Tools – Cloud – DBMS – Software Dev. Tools – Testing Tools – CASE/Model Tools – Config. Mgt. Tools – Office Tools 4. Data Ops Mgt. – DBMS – Data Dev. Tools – DBA Tools – ETL Tools – Office Productivity 5. Data Security Mgt. – DBMS – BI Tools – Application Frameworks – Identity Mgt. Tech. – Change Control Sys. 6. Reference and Master Data Management – Reference DM Apps. – Master DM Apps – Data Modeling Tools – Process Modeling – Metadata Repositories – Data Profiling Tools – Data Cleansing Tools – Data Integration Tools – Rule Engines – Change Mgt. Tools 7. Analytics – Business Execs/Mgrs. – DM Execs/IT Mgt. – BI Program Manager – SMEs/Info Consumers – Data Stewards – Project Managers – Data Architects/Analysts – Data Integration Specs. – BI Specialists – DBAs – Data Security – Data Quality Analysts 8. Content Management – All Employees – Data Stewards – DM Professionals – Records Mgt. Staff – Other IT Professionals – Data Mgt. Executive – CIO/CKO 9. Metadata Mgt. – Metadata Repositories – Data Modeling Tools – DBMS – Data Integration Tools – BI Tools – System Mgt. Tools – Object Modeling Tools – Process Modeling Tools – Report Generators – Data Quality Tools – Data Dev. Tools – Reference/Master Data Tools 10. Data Quality Tools – Data Profiling Tools – Statistical Analysis Tools – Data Cleansing Tools – Data Integration Tools – Issue and Event Management Tools
  • 26. Data Management Tools © Copyright 2022 by Peter Aiken Slide # 43 https://anythingawesome.com from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International • Cloud • CASE/Model Tools • ETL Tools • Data Quality Tools • Data Profiling Tools Gartner Strategic Planning Assumptions • By 2023 – 75% of all databases will be cloud, reducing the DBMS vendor landscape and increasing complexity for data governance and integration. © Copyright 2022 by Peter Aiken Slide # 44 https://anythingawesome.com https://www.gartner.com/document/3894971?ref=solrAll&refval=219836558&qid=de595a5685b6f86db0ec6
  • 27. Transform Problems with forklifting 1. no basis for decisions made 2. no inclusion of architecture/ engineering concepts 3. no idea that these concepts are missing from the process 4. 80% of organizational data is ROT © Copyright 2022 by Peter Aiken Slide # 45 https://anythingawesome.com Cleaner More shareable Less … … data Successful Cloud Adoption Gartner Cloud Vendor Offerings Data combined with easy access, becomes a key differentiator - obvious combinations include: • Google: – Google Search data – YouTube data – Google Ads data – Retailers. • Azure – LinkedIn – Office 365 data – Sales and customer-relationship-focused analytics • Amazon – Anything retail © Copyright 2022 by Peter Aiken Slide # 46 https://anythingawesome.com https://www.gartner.com/document/3894971?ref=solrAll&refval=219836558&qid=de595a5685b6f86db0ec6
  • 28. Computer Aided Software/Systems Engineering (CASE) Tools • Computer-aided software engineering (CASE) is the scientific application of a set of tools and methods to a software system, meant to result in high-quality, defect-free, and maintainable software products • It also refers to methods for the development of information systems together with automated tools that can be used in the software development process • CASE functions include analysis, design, and programming Source: http://en.wikipedia.org/wiki/ CASE-based Support http://www.visible.com
  • 29. CASE-based Support [two further screenshots; source: http://www.visible.com]
  • 30. CASE Tool: "Taxonomy"
      This includes:
      • Senders: flows from the CASE effort that can inform the re-architecting effort
      • Receivers: flows from the project that can inform the CASE effort
      • Senders and receivers: some elements, such as restructuring and reengineering, are both senders and receivers
    Changing Model of CASE Tool Usage
      • From (everything must "fit" into one CASE technology):
        – CASE tool-specific methods and technologies
        – Limited access from outside the CASE technology environment
        – Limited additional metadata use
      • To (metadata integration):
        – A variety of CASE-based methods and technologies can access and update the metadata
        – Additional metadata uses accessible via web, portal, XML, RDBMS
  • 31. IBM's AD/Cycle Information Model [diagram]
    Implementing Metadata Repository Functionality
      • "The repository" does not have to be an integrated solution
        – It must be an easily integrable solution
      • Repository functionality does not equal a repository
        – Metadata must easily evolve to a repository solution
      • Multiple repositories are not necessarily bad
        – As interim solutions, Excel has been working quite well
      • Minimal functionality includes
        – The ability to create, read, update, delete, and evolve metadata items
      • Remember the 1st law of data management
        – In order to manage metadata, you need metadata repository functionality
  • 32. Defining The "E" Spaces
      • ETL (Extract, Transform, Load)
        – Delivers aggregated data to a new database
      • EAI (Enterprise Application Integration)
        – Connects applications to other applications in a predictable manner using pre-established connections
      • EII (Enterprise Information Integration)
        – Sits between ETL and EAI; delivers tailored views of information to users at the time they are required
    Data Quality Engineering Tools (from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International)
      • 4 categories of activities:
        – Analysis
        – Cleansing
        – Enhancement
        – Monitoring
      • Principal tools:
        – Parsing and Standardization
        – Data Transformation
        – Identity Resolution and Matching
        – Enhancement
        – Reporting
        – Data Profiling
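The ETL definition above ("delivers aggregated data to a new database") can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `run_etl` function, the `sales_by_region` table, and the sample rows are hypothetical, and an in-memory SQLite database stands in for the target system.

```python
import sqlite3
from collections import defaultdict


def run_etl(source_rows, conn):
    """Extract (region, amount) rows, aggregate them, and load the totals."""
    # Extract: source_rows stands in for rows read from a source system.
    # Transform: standardize region codes and aggregate amounts per region.
    totals = defaultdict(float)
    for region, amount in source_rows:
        totals[region.strip().upper()] += amount
    # Load: write the aggregated rows into a table in the target database.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_by_region "
        "(region TEXT PRIMARY KEY, total REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO sales_by_region VALUES (?, ?)",
        sorted(totals.items()),
    )
    conn.commit()
    return dict(totals)


# Hypothetical usage with an in-memory target database.
target = sqlite3.connect(":memory:")
run_etl([("east", 10.0), (" East ", 5.0), ("west", 2.5)], target)
loaded = target.execute(
    "SELECT region, total FROM sales_by_region ORDER BY region"
).fetchall()
```

EAI and EII differ mainly in where and when the integration happens, so they are not shown; the sketch only covers the batch, load-into-a-new-store pattern.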
  • 33. Data Preparation Activities [diagram] (from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International)
    Data Preparation: Parsing & Standardization Tools (from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International)
      • Data parsing tools enable the definition of patterns that feed into a rules engine used to distinguish between valid and invalid data values
      • Actions are triggered upon matching a specific pattern
      • When an invalid pattern is recognized, the application may attempt to transform the invalid value into one that meets expectations
      https://www.youtube.com/watch?v=r9UhJxFT5rk
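The parsing and standardization behavior described above (patterns feed a rules engine, a match triggers an action, and recognized invalid forms are transformed) can be sketched roughly as follows. The phone-number formats and the `standardize` function are illustrative assumptions, not rules taken from the slides or from any particular tool.

```python
import re

# Hypothetical rule set. The target format NNN-NNN-NNNN is an assumption.
VALID = re.compile(r"^\d{3}-\d{3}-\d{4}$")

# Each rule pairs a recognized invalid pattern with the action that repairs it.
REPAIRABLE = [
    (re.compile(r"^\((\d{3})\)\s*(\d{3})-(\d{4})$"), r"\1-\2-\3"),  # (555) 010-0199
    (re.compile(r"^(\d{3})\.(\d{3})\.(\d{4})$"), r"\1-\2-\3"),      # 555.010.0199
    (re.compile(r"^(\d{3})(\d{3})(\d{4})$"), r"\1-\2-\3"),          # 5550100199
]


def standardize(value):
    """Return (status, value) where status is 'valid', 'repaired', or 'rejected'."""
    value = value.strip()
    if VALID.match(value):
        return "valid", value
    for pattern, replacement in REPAIRABLE:
        if pattern.match(value):
            # A known invalid pattern was recognized: transform it.
            return "repaired", pattern.sub(replacement, value)
    # No rule applies, so the value is left for manual review.
    return "rejected", value
```

Keeping the rules in a data structure rather than in branching code means new patterns can be added without touching the engine, which is the property the slide attributes to parsing tools.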
  • 34. Data Preparation: Identity Resolution & Matching Tools (from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International)
      Basic approaches to matching:
      • Deterministic
        – Relies on defined patterns and rules for assigning weights and scores to determine similarity
        – Predictable
        – Only as good as the anticipations of the rules developers
      • Probabilistic
        – Uses statistical techniques to assess the probability that pairs of records represent the same entity
        – Not reliant on rules
        – Refined based on experience, so matchers can improve precision as more data is analyzed
    Data Preparation: Enhancement Tools (from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International)
      • A method for adding value to information by accumulating additional information about a base set of entities and then merging all the sets of information to provide a focused view
      • Examples:
        – Time/date stamps
        – Auditing information
        – Contextual information
        – Geographic information
        – Demographic information
        – Psychographic information
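The deterministic and probabilistic matching approaches from the Identity Resolution & Matching slide can be contrasted in a small sketch. The deterministic scorer sums hand-assigned weights; the probabilistic scorer uses a Fellegi-Sunter-style log-likelihood ratio, which is one common statistical technique and is swapped in here as an assumption, since the slides do not name a method. The field names, weights, and m/u probabilities are all hypothetical.

```python
import math


def deterministic_score(a, b, weights):
    """Sum the hand-assigned weight of every field on which the records agree."""
    return sum(w for field, w in weights.items() if a.get(field) == b.get(field))


def probabilistic_score(a, b, m, u):
    """Fellegi-Sunter-style match weight (sum of log2 likelihood ratios).

    m[f]: probability that field f agrees when both records are the same entity.
    u[f]: probability that field f agrees when the records are different entities.
    """
    total = 0.0
    for field in m:
        if a.get(field) == b.get(field):
            total += math.log2(m[field] / u[field])
        else:
            total += math.log2((1 - m[field]) / (1 - u[field]))
    return total


# Hypothetical records and parameters.
r1 = {"last": "smith", "zip": "23284", "dob": "1970-01-01"}
r2 = {"last": "smith", "zip": "23284", "dob": "1971-01-01"}
r3 = {"last": "jones", "zip": "10001", "dob": "1985-06-30"}

weights = {"last": 2, "zip": 1, "dob": 3}
m = {"last": 0.95, "zip": 0.90, "dob": 0.98}
u = {"last": 0.01, "zip": 0.05, "dob": 0.001}
```

In practice the m and u values would be estimated from data rather than typed in, which is how a probabilistic matcher is "refined based on experience".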
  • 35. Data Preparation: Reporting Tools (from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International)
      • Good reporting supports:
        – Inspection and monitoring of conformance to data quality expectations
        – Monitoring performance of data stewards conforming to data quality SLAs
        – Workflow processing for data quality incidents
        – Manual oversight of data cleansing and correction
      • Associate report results with:
        – Data quality measurement
        – Metrics
        – Activity
    Data Preparation: Portals as Tools [screenshot]
  • 36. Data Preparation: Profiling Tools (from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International)
      • Need to be able to distinguish between good and bad data before making any improvements
      • Data profiling is a set of algorithms for 2 purposes:
        – Statistical analysis and assessment of the data quality values within a data set
        – Exploring relationships that exist between value collections within and across data sets
    Sample Existing Environment [diagram label: ?]
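The first profiling purpose listed on the Profiling Tools slide, statistical assessment of the values within a data set, can be approximated for a single column with a few counts. This is a minimal sketch: the `profile_column` function and the choice of statistics are assumptions, not the output of any tool shown in the deck.

```python
from collections import Counter


def profile_column(values):
    """Summarize one column: row count, nulls, distinct values, lengths, top values."""
    # Treat None and the empty string as missing (an assumption for this sketch).
    non_null = [v for v in values if v not in (None, "")]
    counts = Counter(non_null)
    lengths = [len(str(v)) for v in non_null]
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(counts),
        "min_len": min(lengths) if lengths else None,
        "max_len": max(lengths) if lengths else None,
        "top": counts.most_common(3),
    }


# Hypothetical state-code column with inconsistent casing and missing values.
summary = profile_column(["VA", "VA", "va", None, ""])
```

Here the summary reports two distinct values for what is presumably one state, the kind of finding that separates good data from bad before any improvement is attempted.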
  • 37. Sample Existing Environment [diagram labels: VSAM; DB2; Oracle; Client Server; IDMS; Flat files]
    Sample Existing Environment [diagram labels: RDBMS 1; Finance; HR; RDBMS 2; Marketing; R&D #1; R&D #2; R&D #3; Network Database; BackOffice Applications; Manufacturing Systems; Flat Files; Logistics Systems; Flat Files]
  • 38. Sample Existing Environment
      Inventory table columns: Technology; Logical/Virtual Databases; Subject Areas; Tables; Attributes; Unique Attributes; Records; Programs/Copybooks; Lines of Code
      • Data Management Technology Type 1: 44 20 13,067 6,600; Global schema 1,049; T01 5; C01 4
      • Data Management Technology Type 2: R2 613; R3 108; R5 127; R7 447; R72 5,996; R73 11,224
      • Data Management Technology Type 3: 1,227 9,000 2,514
      • Application: 10,970 15,700,000
      • Copybooks: 5,518 498,966
      • Totals: 44 20 19,742 22,067 9,114 1,058 16,488 16,198,966
    Data Discovery Technologies (Profiling, Discovery, Analysis)
      • Data analysis software technologies deliver up to 10X productivity over manual approaches
      • Based on a powerful computing technology that allows data engineers to quickly form candidate hypotheses with respect to the existing data structures
      • Hypotheses are then presented to the SMEs (both business and technical), who confirm, refine, or deny them
      • Allows existing data structures to be inferred at a rate that is an order of magnitude more effective than previous manual approaches
      • Semi-automated
  • 39. How has this been done in the past?
      • Old
        – Manually
        – Brute force
        – Repository dependent
        – Quality indifferent
        – Not repeatable
      • New
        – Semi-automated
        – Engineered
        – Repository independent
        – Integrated quality
        – Repeatable
        – Currency
        – Accuracy
    Semi-automating Reverse Engineering: Column Profiling with Attribute Summary Report [screenshot]
  • 40. Semi-automating Reverse Engineering: Column Profiling, Compare Documented vs. Actual [screenshot]
    Semi-automating Reverse Engineering: Column Profiling, Drilling Down on Column Values [screenshot]
      Screen shots of Migration Architect used by permission of Evoke Software http://www.evokesoft.com
  • 41. [Screenshot] Select an Attribute to get a list of values. Double-click a value to see rows with that value.
    Column Profiling Demonstration
  • 42. Semi-automating Reverse Engineering: Dependency Profiling, Candidate Dependencies [screenshot]
    Semi-automating Reverse Engineering: Dependency Profiling, Promoting Dependencies [screenshot]
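Candidate dependencies of the kind named in the slide titles above can be proposed mechanically: a column A is a candidate determinant of column B when every value of A co-occurs with exactly one value of B in the sample. The sketch below assumes that simple single-column definition; the `candidate_dependencies` function and the sample rows are hypothetical, and the deck does not describe how the pictured tool computes its candidates.

```python
from itertools import permutations


def candidate_dependencies(rows, columns):
    """Return (determinant, dependent) pairs that hold for every row in the sample."""
    found = []
    for lhs, rhs in permutations(columns, 2):
        seen = {}  # determinant value -> the one dependent value observed so far
        holds = True
        for row in rows:
            # setdefault stores the first dependent value seen for this key;
            # any later, different value disproves the candidate.
            if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
                holds = False
                break
        if holds:
            found.append((lhs, rhs))
    return found


# Hypothetical sample rows.
sample = [
    {"zip": "23284", "city": "Richmond", "state": "VA"},
    {"zip": "23220", "city": "Richmond", "state": "VA"},
    {"zip": "10001", "city": "New York", "state": "NY"},
]
candidates = candidate_dependencies(sample, ["zip", "city", "state"])
```

On this small sample, state appears to determine city, which is an accident of the data. That is why candidates are hypotheses for subject matter experts to confirm, refine, or deny before they are promoted.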
  • 43. Dependency Profiling Demonstration
    Semi-automating Reverse Engineering: Redundancy Profiling, Domain Comparison Detail [screenshot]
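A domain comparison for redundancy profiling can be sketched as the overlap between the distinct values of two columns. This is an assumption about what such a comparison measures; the `domain_overlap` function and the sample columns are hypothetical.

```python
def domain_overlap(left, right):
    """Return the share of each column's distinct values found in the other column."""
    a, b = set(left), set(right)
    if not a or not b:
        return 0.0, 0.0
    shared = a & b
    return len(shared) / len(a), len(shared) / len(b)


# Hypothetical columns from two different tables.
left_share, right_share = domain_overlap(["A", "B", "C", "C"], ["B", "C", "D", "E"])
```

Two columns whose shares are both close to 1.0 are candidates for storing the same information twice; a high share in only one direction suggests one domain is a subset of the other.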
  • 44. Redundancy Profiling Demonstration
    Comparing Weekly Progress (80/20)
      First schedule:
      • MONDAY: Morning: Model preparation; Afternoon: Model refinement/validation session
      • TUESDAY: Afternoon: Model refinement/validation session
      • WEDNESDAY: Afternoon: Model refinement/validation session
      • THURSDAY: Afternoon: Model refinement/validation session
      • FRIDAY: Afternoon: Model refinement/validation session
      • Remaining morning entries, in slide order: Model refinement/validation session; Model preparation; Model refinement/validation session; Model preparation
      Second schedule:
      • MONDAY: Morning: Model preparation; Afternoon: Model preparation
      • TUESDAY: Morning: Model preparation; Afternoon: Model refinement/validation session
      • WEDNESDAY: Morning: Model preparation; Afternoon: Model preparation
      • THURSDAY: Morning: Model preparation; Afternoon: Model refinement/validation session
      • FRIDAY: Morning: Model preparation; Afternoon: Model preparation
  • 45. Preliminary System Survey (PSS*)
      • [Baseline: Relative Condition & Amount of Evidence], [Confounding characteristics: Data Handling, Operating Environment & Language Factor (Factor ≥ 1)], and [Beneficial characteristics: Key End User Participation & Net Automation Impact (Impact ≤ 1)] = Project characteristics
      • [Historical organizational reverse engineering performance data] and [Project characteristics] = Project Estimate
      * The purpose of the Preliminary System Survey is to determine how long and how many resources will be required to reverse engineer the selected system components.
    Ancillary Results
      • 4.7 billion empty bytes in just three data warehouse tables
        – Reduced the need to upgrade company infrastructure capacity
        – Made a strong case for normalization
        – Ovation for the team from the Data Warehouse Board of Directors
      • Preserved multi-million dollar US Postal Service (and other) sort discounts
      • Accurate, measurable views of how effectively certain processes work
      • Hundreds of files have been processed by the domain profiling process
      • Many of the files are through the initial stages of normalization
  • 46. Program
      • Motivation
      • Data Preparation Considerations
        – No standard data curricula
        – No standard audience
        – Technology is a one-legged stool
      • Data Problems are Different
        – Dependence on high speed automation
        – Hidden data factories sap resources
        – Require a unified approach
      • Reverse Engineering (Introducing Yourself to a Dataset)
        – No measures (other than size)
        – Hype Cycle
        – Column Profiling
        – Dependency Profiling
        – Redundancy Profiling
      • Take Aways/References/Q&A
      [Chart: Data Analysis 20%, Data Preparation 80%]
    Supply/demand for data talent: Growth of Data vs. Growth of Data Analysts
      • Stored data is accumulating at a 28% annual growth rate
      • Data analysts in the workforce are growing at a 5.7% annual growth rate
      Source: https://www.logianalytics.com/bi-trends/3-keys-understanding-data/
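The gap between the two growth rates above compounds quickly. As a rough worked example, assuming both rates stay constant and compound annually (the slide does not say so), the volume of stored data per analyst grows by a factor of (1.28 / 1.057) each year. The function name below is hypothetical.

```python
def data_per_analyst_ratio(years, data_growth=0.28, analyst_growth=0.057):
    """Relative growth in stored data per analyst after the given number of years."""
    # Both quantities are assumed to compound annually at constant rates.
    return ((1 + data_growth) / (1 + analyst_growth)) ** years


five_year_ratio = data_per_analyst_ratio(5)
```

Under these assumptions each analyst faces roughly 2.6 times as much stored data after five years, which is the supply/demand gap the slide points to.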
  • 47. Determining Diminishing Returns [chart labels: Before; After]
      Week #   Unmatched Items (% Total)   Ignorable Items (% Total)   Items Matched (% Total)
      1        31.47%                      1.34%                       N/A
      2        21.22%                      6.97%                       N/A
      3        20.66%                      7.49%                       N/A
      4        32.48%                      11.99%                      55.53%
      …        …                           …                           …
      14       9.02%                       22.62%                      68.36%
      15       9.06%                       22.62%                      68.33%
      16       9.53%                       22.62%                      67.85%
      17       9.5%                        22.62%                      67.88%
      18       7.46%                       22.62%                      69.92%
    Quantifying Benefits: Original Plan
      Time needed to review all NSNs once over the life of the project:
        – NSNs: 2,000,000
        – Average time to review & cleanse: 5 minutes
        – Total time: 10,000,000 minutes
      Time available per resource over a one-year period:
        – Work weeks in a year: 48
        – Work days in a week: 5
        – Work hours in a day: 7.5
        – Work minutes in a day: 450
        – Total work minutes/year: 108,000
      Person-years required to cleanse each NSN once prior to migration:
        – Minutes needed: 10,000,000
        – Minutes available per person-year: 108,000
        – Total person-years: 92.6
      Resource cost to cleanse NSNs prior to migration:
        – Average salary per SME year (not including overhead): $60,000.00
        – Projected years required to cleanse / total DLA person-years saved: 93
        – Total cost to cleanse / total DLA savings to cleanse NSNs: $5.5 million
  • 48. Quantifying Benefits: Revised Plan
      Shown alongside the Original Plan figures (2,000,000 NSNs; 10,000,000 minutes; 92.6 person-years; $5.5 million).
      Time needed to review all NSNs once over the life of the project:
        – NSNs: 150,000
        – Average time to review & cleanse: 5 minutes
        – Total time: 750,000 minutes
      Time available per resource over a one-year period:
        – Work weeks in a year: 48
        – Work days in a week: 5
        – Work hours in a day: 7.5
        – Work minutes in a day: 450
        – Total work minutes/year: 108,000
      Person-years required to cleanse each NSN once prior to migration:
        – Minutes needed: 750,000
        – Minutes available per person-year: 108,000
        – Total person-years: 7
      Resource cost to cleanse NSNs prior to migration:
        – Average salary per SME year (not including overhead): $60,000.00
        – Projected years required to cleanse / total DLA person-years saved: 7
        – Total cost to cleanse / total DLA savings to cleanse NSNs: $420,000
    Quantifying Benefits: Social Engineering
      Repeats the Original Plan figures: 2,000,000 NSNs at 5 minutes each = 10,000,000 minutes; 108,000 work minutes per person-year; 92.6 person-years (projected as 93); $60,000.00 average SME salary; $5.5 million total.
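The arithmetic behind the Original and Revised Plan figures can be reproduced in a few lines. This is a sketch of the calculation as laid out on the slides; the function and parameter names are hypothetical, with defaults taken from the slide values.

```python
def cleansing_estimate(items, minutes_per_item=5, weeks_per_year=48,
                       days_per_week=5, hours_per_day=7.5, salary=60_000):
    """Return (person_years, cost) to review and cleanse every item once."""
    minutes_needed = items * minutes_per_item
    # 48 weeks x 5 days x 7.5 hours x 60 minutes = 108,000 minutes per person-year.
    minutes_per_person_year = weeks_per_year * days_per_week * hours_per_day * 60
    person_years = minutes_needed / minutes_per_person_year
    return person_years, person_years * salary


original_years, original_cost = cleansing_estimate(2_000_000)
revised_years, revised_cost = cleansing_estimate(150_000)
```

The unrounded results are about 92.6 person-years and $5.56 million for the original plan, and about 6.9 person-years and $417,000 for the revised plan. The slides round person-years to 93 and 7; multiplying the rounded 7 by $60,000 gives the $420,000 shown.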
  • 49. Take Aways
      • Too much investment is focused on tools and technology at the expense of problem understanding
      • It is useful to understand data management technologies and their use as part of a people, process, and technology (3-legged) stool
      • Value can be gained by profiling data
      • Data volume is still increasing faster than we are able to process it
      • Data interchange overhead and other costs of poor data practices are measurably sapping organizational and individual resources, and therefore productivity
      • Reliance on existing technology-based approaches and education methods has not materially addressed this gap between creation and processing or reduced bottom-line costs
      [Quotation slide: R. Buckminster Fuller]
    Upcoming Events
      • Conceptual vs. Logical vs. Physical Data: 12 July 2022
      • The Importance of Metadata: 9 August 2022
      • Data Preparation Fundamentals: 13 September 2022
      Time: 19:00 UTC (2:00 PM NYC) | Presented by: Peter Aiken, PhD
  • 50. Thank You!
      Peter.Aiken@AnythingAwesome.com +1.804.382.5957
      Book a call with Peter to discuss anything: https://anythingawesome.com/OfficeHours.html
      • Critical Design Review?
      • Hiring Assistance?
      • Reverse Engineering Expertise?
      • Executive Data Literacy Training?
      • Mentoring?
      • Tool/automation evaluation?
      • Use your data more strategically?