© 2016 IBM Corporation
IBM Industry Models and the IBM Data Lake
January 2017
Pat O’Sullivan - IBM Analytics
Email : posulliv@ie.ibm.com
Twitter : @PatOSullivanIBM
© 2017 IBM Corporation
Disclaimer
▪ IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion.
▪ Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision.
▪ The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract.
▪ The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.
Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.
The broadening scope of analytics
Adding in a business desire for real-time analytics, self-service data and increasing regulations relating to individual privacy, it becomes necessary to have a well-defined, managed and governed approach to information architecture. We call this IBM’s Data Lake.
[Diagram labels: SOA; Master Data Management Hub; Applications; Data Warehouse; Operational Data Store; Pattern Discovery for Analytics; Sandboxes; Analyze Values; Search for Data; Reporting; Data Lake; Hadoop]
Big Data Lakes or Swamps?
▪ As we collect data
  • Can we preserve clarity?
  • Do we know what we are collecting?
  • Can we find the data we need?
▪ Are we creating a data swamp?
▪ How do we build trust in big data?
  • Do we know what data is being used for?
The Data Lake
Data Lake = Efficient Management, Governance, Protection and Access.
Data Lake
Information Management and Governance Fabric
Data Lake Services
Data Lake Repositories
Users supported by the Data Lake
[Diagram labels: Data Lake (System of Insight); Information Management and Governance Fabric; Data Lake Services; Data Lake Repositories; Analytics Teams; Governance, Risk and Compliance Team; Information Curator; Line of Business Teams; Data Lake Operations; Enterprise IT; Other Data Lakes; Systems of Engagement; Systems of Automation; Systems of Record; New Sources]
The Data Lake subsystems
[Diagram labels: Data Lake (System of Insight); Information Management and Governance Fabric; Catalogue; Self-Service Access (shown twice); Enterprise IT Data Exchange; Data Lake Repositories; Analytics Teams; Governance, Risk and Compliance Team; Information Curator; Line of Business Teams; Data Lake Operations; Enterprise IT; Other Data Lakes; Systems of Engagement; Systems of Automation; Systems of Record; New Sources]
Data lake repositories
[Diagram labels: Specialist Processing; Structured and Optimized; System-level Data (Landing Area); Accumulation of Context for Master and Reference Data; Self-managed Data; Metadata; Refined data formatted for particular consumers]
IBM Industry Data Models
IBM Industry Data Models provide pre-defined data structures which help accelerate data warehouse,
data lake and business intelligence projects.
▪ Industry-specific issues being addressed
▪ Integrated set of Models, from business requirements to low-level design
▪ Predefined and pretested deployment to RDBMS and HDFS environments
[Diagram labels: IBM Industry Data Models; Banking; Insurance; Fin Markets; Retail; Healthcare; Telecom; E&U; Customer Insight; Profitability; Risk; Regulatory Compliance; Business Models; Analysis Models; Design Models; Business Vocabulary; KPIs; Supportive Terms; Data Classifications; Atomic DW Models; Dimensional Models; Project Acceleration; Technical; Business; Data Warehouse; Operational Data Store; Big Data; Data Marts; Information Integration & Governance]
IBM Industry Models and main data lake deployment paths
▪ Business Vocabulary is deployed to the Data Lake Catalog via tools such as InfoSphere Information Governance Catalog (IGC)
▪ Atomic (Inmon) and Dimensional (Kimball) Data Models are deployed to the data lake via tools such as InfoSphere Data Architect (IDA) and ERwin
▪ Supporting collateral: model-specific white papers and best practice docs outlining the main deployment patterns and implementation considerations
Overall set of Models
[Diagram labels: Business Vocabulary (IGC); Business Terms/FSDM; Supportive Content; Analytical Requirements; Analysis level Models (IDA); Business Data Model; Design level Models (IDA); Atomic Warehouse Model; Dimensional Warehouse Models; Data Models]
Big Data Landscape - main components touched by the IBM Data Models
[Diagram labels: Data Lake; Data Reservoir Operations; Line of Business Applications; Simple, Ad Hoc Discovery and Analysis; Reporting; View-based Interaction; Information Service Calls; Search Requests; Report Requests; Understand Information Sources; Deploy Decision Models; Deploy Real-time Decision Models; Understand Compliance; Report Compliance; Data Access; Catalog Interfaces; Advertise Information Source; Enterprise IT Interaction; Curation Interaction; Management; Data Deposit; Raw Data Interaction; Information Integration & Governance; Repositories; Decision Model Management; Governance, Risk and Compliance Team; Information Curator; Enterprise IT; Events to Evaluate; Data Out; Data In; Other Systems of Insight; Notifications; System of Record Applications; Enterprise Service Bus; New Sources; Third Party Feeds; Third Party APIs; Systems of Engagement; Internal Sources; Published Data; Harvested Data; Historical Data; Descriptive Data; Information Warehouse; Deep Data; Catalog; Operational History; Reporting Data Marts; Sandboxes]
For full information on the IBM Data Lake Reference Architecture, see the IBM Redbook "Designing and Operating a Data Reservoir":
http://www.redbooks.ibm.com/Redbooks.nsf/RedpieceAbstracts/sg248274.html?Open
Options regarding common models/glossaries to encourage standardization and reuse
[Diagram labels: Enterprise IT; System of Record Applications; Enterprise Service Bus; New Sources; Third Party Feeds; Third Party APIs; Systems of Engagement; Internal Sources; Enterprise IT Interaction; Information Service Calls; Data Out; Data In; Publishing Feeds; Service Interfaces; Data Ingestion; Information Integration & Governance; Data Access; Data Deposit; Deploy Decision Models; Deploy Real-time Decision Models; View-based Interaction; Published; Object Cache; Repositories; Shared Operational Data; Asset Hub; Execution Engines; Workflow; Monitor; Search Requests; Report Requests; Curation Interaction; Management; Harvested Data; Historical Data; Deep Data; Operational History; Information Warehouse; Reporting Data Marts; Descriptive Data; Catalog; Sandboxes; Line of Business Applications; Consumers of Insight; Simple, ad hoc Discovery and Analysis; Reporting; Analytical Insight Applications; Data Analysts/Data Scientists; Analytics Tools; Data Management Operations]
▪ Shared set of term and physical asset definitions in the Catalog that underpin all queries by all users
▪ Data Scientists can make use of predefined catalogs and are likely to create new catalog entries during their daily activities
▪ Business Users use specific subsets of the same shared Catalog as other users to ensure consistency of language and meaning
▪ Any published structures required by the Business are based on the same standard definitions and structures as those used elsewhere
▪ Standardized set of Business Term and Data Model definitions used to enforce both the meaning and, where appropriate, the structure of stored data
▪ Data Management Operations use the same shared set of models and catalog entries to build the necessary production ETL assets
Catalog Deployment - Models in the Descriptive Data Zone
[Diagram labels: Business Vocabulary (IGC); Business Terms/FSDM; Supportive Content; Analytical Requirements; Analysis level Models (IDA); Business Data Model; Design level Models (IDA); Atomic Warehouse Model; Dimensional Warehouse Models]
Purpose
Provide a standard business language and information model that can be used when discussing business concepts and related technical components.
Steps
1. Business Vocabulary Models are deployed to the Catalog (IGC), where they are used and maintained by business analysts and data stewards.
2. The Logical Data Models (e.g. Business, Atomic and Dimensional Warehouse Models) are imported into the Catalog. However, they are mastered in a modelling tool such as InfoSphere Data Architect.
Considerations
▪ Evolving patterns/best practices for the overall management of enterprise and LOB glossaries
[Diagram labels: Descriptive Data Zone (callouts 1 and 2); Descriptive Data; Catalog; Business Users; Data Scientists; Repositories; Harvested Data; Historical Data; Shared Operational Data; Asset Hub; Deep Data; Operational History; Information Warehouse; Reporting Data Marts; Sandboxes; Enterprise IT Interaction; Information Service Calls; Data Out; Data In; Publishing Feeds; Service Interfaces; Data Ingestion; Enterprise IT; System of Record Applications; Enterprise Service Bus; New Sources; Third Party Feeds; Third Party APIs; Systems of Engagement; Internal Sources; Information Integration & Governance]
Warehouse and Marts - Models in Integrated Warehouse Zone
[Diagram labels: Business Vocabulary (IGC); Business Terms; Supportive Content; Analytical Requirements; Analysis level Models (IDA); Design level Models (IDA); Atomic Warehouse Model; Dimensional Warehouse Models]
Purpose
Provide data modellers with consistent data structures for deployment across the different aspects of an integrated Information Warehouse and Marts zone.
Steps
1. The Atomic Warehouse Model is used as the basis for the Inmon-style central relational Information Warehouse.
2. The Dimensional Warehouse Model is used as the basis for the Kimball-style Dimensional Information Warehouse.
3. The Dimensional Warehouse Model provides the business-issue-specific structures to enable the deployment of Reporting Data Marts.
[Diagram labels: Integrated Warehouse & Marts Zone (callouts 1, 2 and 3); Deep Data; Information Warehouse; Reporting Data Marts; Business Users; Repositories; Harvested Data; Historical Data; Shared Operational Data; Asset Hub; Operational History; Descriptive Data; Catalog; Enterprise IT Interaction; Information Service Calls; Data Out; Data In; Publishing Feeds; Service Interfaces; Data Ingestion; Enterprise IT; System of Record Applications; Enterprise Service Bus; New Sources; Third Party Feeds; Third Party APIs; Systems of Engagement; Internal Sources; Information Integration & Governance]
Big Data Deployment - Models in the Landing Area Zone
[Diagram labels: Business Vocabulary (IGC); Business Terms; Supportive Content; Analytical Requirements; Analysis level Models (IDA); Design level Models (IDA); Atomic Warehouse Model; Dimensional Warehouse Models]
Purpose
Provide the basis for a consistent and appropriate use of schemas in the different repositories in the Landing Area Zone.
Steps
1. The Atomic Warehouse Model is used as the basis for deploying both schema-at-write and schema-at-read Hadoop Deep Data structures.
2. The Atomic Warehouse Model may provide the basis for schema-at-read deployment of Operational History raw data structures.
Considerations
▪ Further investigation is needed into the potential role for DWM deployments to Hadoop-based technology
[Diagram labels: Landing Area Zone (callouts 1 and 2); Deep Data; Operational History; Reporting Data Marts; Sandboxes; Business Users; Data Scientists; Descriptive Data; Catalog; Repositories; Harvested Data; Historical Data; Shared Operational Data; Asset Hub; Information Warehouse; Enterprise IT Interaction; Information Service Calls; Data Out; Data In; Publishing Feeds; Service Interfaces; Data Ingestion; Enterprise IT; System of Record Applications; Enterprise Service Bus; New Sources; Third Party Feeds; Third Party APIs; Systems of Engagement; Internal Sources; Information Integration & Governance]
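The difference between schema-at-write and schema-at-read in the Landing Area Zone can be illustrated with a minimal Python sketch. It is not taken from the IBM models or any Hadoop API: the `SCHEMA` columns, the two store classes and the `coerce` helper are all hypothetical names chosen for illustration, standing in for a fragment of an Atomic Warehouse Model entity.

```python
# Hypothetical column definitions standing in for a fragment of an
# Atomic Warehouse Model entity; the names are illustrative only.
SCHEMA = {"patient_id": int, "family_name": str, "birth_year": int}


def coerce(record, schema):
    """Cast each expected field to its declared type; raise on bad data."""
    missing = [name for name in schema if name not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return {name: cast(record[name]) for name, cast in schema.items()}


class SchemaAtWriteStore:
    """Applies the model before storing, so bad records are rejected early."""

    def __init__(self, schema):
        self.schema = schema
        self.rows = []

    def write(self, record):
        self.rows.append(coerce(record, self.schema))

    def read(self):
        return list(self.rows)


class SchemaAtReadStore:
    """Stores raw records untouched; the model is applied only when queried."""

    def __init__(self):
        self.raw = []

    def write(self, record):
        self.raw.append(dict(record))

    def read(self, schema):
        conforming, rejected = [], []
        for record in self.raw:
            try:
                conforming.append(coerce(record, schema))
            except (ValueError, TypeError):
                rejected.append(record)
        return conforming, rejected
```

In this sketch the schema-at-write store refuses a non-conforming record at load time, while the schema-at-read store accepts everything and reports non-conforming records only when a schema is applied, which is the trade-off that makes schema-at-read a candidate for raw Operational History data.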
Summary Picture
[Diagram labels: Business Vocabulary; Logical Model Atomic; Logical Model Dimensional; Physical Model Hadoop; Physical Model RDBMS; Physical Model Dimensional; numbered callouts 1 to 10; Descriptive Data; Catalog; Repositories; Shared Operational Data; Asset Hub; Harvested Data; Historical Data; Deep Data; Operational History; Information Warehouse; Reporting Data Marts; Sandboxes; Business Users; Data Scientists; Enterprise IT Interaction; Information Service Calls; Data Out; Data In; Publishing Feeds; Service Interfaces; Data Ingestion; Enterprise IT; System of Record Applications; Enterprise Service Bus; New Sources; Third Party Feeds; Third Party APIs; Systems of Engagement; Internal Sources; Information Integration & Governance]
Legend
▪ Mappings to inform common Business Meaning using the Business Vocabulary in IGC
▪ Generation of Technical Structure using the ER Data Models in an ER tool (e.g. IDA)
▪ Use of Business Vocabulary to understand Business Meaning by Users
• The Business Vocabulary Terms in IGC can be used to enforce common business meaning throughout the Data Lake landscape
• The output of the various Logical Models can be used to define the technical structure of assets in the lake that need to be created where a predefined schema is required (e.g. schema-at-write)
Three different lifecycles relating to the evolution of the models with the Data Lake
▪ Maintenance of the Business Language (cycle: Analysis, Refine, Deploy, Review, Requirement; artifacts: AR, BT, SG): the use of the Industry Models Business Vocabularies to enable a common business meaning of language by all Data Lake users
▪ Development of the ER/UML Models (cycle: Analysis, Design, Generate, Review, Requirement; artifacts: BDM, AWM, DWM): the use of the ER and UML models to enforce a common structure of artifacts where required in the Data Lake
▪ Management of the runtime production environment (artifacts: BT, AWM (Physical), DWM (Physical); Data Lake Catalog, Data Lake Repositories, Data, Data Lake Users): the use of the Industry Models Business Vocabularies and derived physical assets in the creation and ongoing management of the Data Lake
Legend
BT - Business Terms
AR - Analytical Requirements
SG - Supportive Glossaries
BDM - Business Data Model
AWM - Atomic Warehouse Model
DWM - Dimensional Warehouse Model
The Repositories used by the Data Lake Lifecycles
[Diagram labels: Business Language Environment; IGC Dev Repository; IGC Browser; IGC Workflow; Modelling Environment; IDA; IGC for Eclipse; Collaboration/Versioning Repository (e.g. RTC); Physical Data Model; IMAM; IMAM IDA Import; Runtime Data Lake Environment; Data Lake Catalog; IGC Production Repository; IGC Anywhere/REST; IGC Browser; Data Lake Repositories; Data Repositories RDBMS; Data Repositories HDFS]
Lifecycle 1 - Maintaining the Business Language of the Data Lake
▪ Objective: The creation and ongoing maintenance of the common Business Language to be used by all users to describe the various components of, or underpinning, the Data Lake
▪ Roles Involved: Business user reps, Business SMEs, Business Language Stakeholders
▪ Considerations:
  • Determining the needs of the different users of the Data Lake (different uses, need for different dialects, amount of technical metadata in the Language)
  • Determining the approach to building the business language, and the overall flow for creation, promotion and maintenance of terms
  • Defining the specific glossary suitable for pure business users versus Business Analysts, Data Scientists, Data Modellers and IT staff
  • Determining the role of the IBM Industry Models in building out the Business Language
[Diagram labels: Maintenance of the Business Language; cycle: Analysis, Refine, Deploy, Review, Requirement; artifacts: AR, BT, SG]
Lifecycle 2 - Developing the technical Models
▪ Objective: The use of the ER and UML models to enforce a common structure of artifacts where required in the Data Lake
▪ Roles Involved: Modellers, Business SMEs
▪ Considerations:
  • Ensuring appropriate communication between the Data Modellers and the Business Users
  • Determining when to use, and when not to use, Data Models for the data lake repositories
  • Determining the ongoing use of a Canonical Platform-Independent Logical Model as a basis for the deployment of the different types of platform-specific physical Models required across the Data Lake Repositories
  • Determining the specific data modelling approaches and scenarios for deploying to the different Data Lake repositories
[Diagram labels: Development of the ER/UML Models; cycle: Analysis, Design, Generate, Review, Requirement; artifacts: BDM, AWM, DWM]
Lifecycle 3 - Deploying the Models into the runtime Data Lake environment
▪ Objective: The use of the Industry Models Business Vocabularies and derived physical assets in the creation and ongoing management of the Data Lake
▪ Roles Involved: Business user reps, Modellers, Data Lake Ops staff
▪ Considerations:
  • Determining how to deploy the Business Language for optimal use by the different Data Lake users (managing access to the different terms, handling of ongoing updates)
  • Determining the strategy for the ongoing association of the Business Terms with Data Assets (which users tag new data elements with the Business Language, and when)
  • What is the approach for the Data Lake ops staff to deploy the physical Data Models, and how is feedback to the Data Modellers handled?
  • How to incorporate the Data Model artifacts into the ongoing Data Lake governance aspects
[Diagram labels: Management of the runtime production environment; artifacts: BT, AWM (Physical), DWM (Physical); Data Lake Catalog; Data Lake Repositories; Data; Data Lake Users]
Industry Models Hadoop deployment example - low level
[Diagram labels: Sample Source Data: Claim File, Patient Information File; Data Transformation Process (Hive, Spark, Pig, ETL, ..), shown twice; HDFS: /data/udmh/patient/<date>/<version>/.. Data files.. and /data/udmh/claim/<date>/<version>/.. Data files..; Hive Metastore: Patient party ext Table, Claim ext Table; HIVE; Vendor SQL for Hadoop interface; Logical Data Model; Physical Data Model; Patient; Claim; Patient / Claim; Downstream Data Transformation processes; three possible deployment paths (1, 2, 3)]
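As a rough illustration of how a physical model entity could be turned into a Hive external table over the HDFS directories shown on this slide, the Python sketch below generates the DDL text. Everything beyond the `/data/udmh/<entity>/<date>/<version>` layout is an assumption made for the example: the type mapping, the column names, the table name, the comma-delimited text format and the date and version values are hypothetical and are not taken from the IBM models.

```python
# Hypothetical mapping from logical model data types to Hive column types.
LOGICAL_TO_HIVE = {
    "identifier": "BIGINT",
    "text": "STRING",
    "date": "DATE",
    "amount": "DECIMAL(18,2)",
}


def hive_external_table_ddl(table, attributes, entity, load_date, version,
                            root="/data/udmh"):
    """Build a CREATE EXTERNAL TABLE statement over one HDFS directory.

    attributes: list of (column_name, logical_type) pairs.
    load_date and version fill the <date>/<version> path segments.
    """
    columns = ",\n".join(
        f"  {name} {LOGICAL_TO_HIVE[logical_type]}"
        for name, logical_type in attributes
    )
    location = f"{root}/{entity}/{load_date}/{version}"
    # Assumes comma-delimited text files; adjust for the real file format.
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        f"{columns}\n"
        ")\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{location}'"
    )


if __name__ == "__main__":
    print(hive_external_table_ddl(
        table="patient_party_ext",
        attributes=[("patient_id", "identifier"),
                    ("family_name", "text"),
                    ("birth_date", "date")],
        entity="patient",
        load_date="2017-01-31",
        version="v1",
    ))
```

Because the table is external, dropping it in Hive leaves the underlying HDFS files in place, which suits a landing area where several transformation processes read the same data.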
Mapping of incoming new structures in the Data Lake
[Diagram labels: IGC Dev Repository; IGC Workflow; IDA; IGC for Eclipse; Physical Data Model; IMAM; IMAM IDA Import; Runtime Data Lake Environment; Data Lake Catalog; IGC Production Repository; IGC Anywhere/REST; IGC Browser; Data Lake Repositories; Data Repositories RDBMS; Data Repositories HDFS; New HDFS Structure; mapping paths 1, 2a, 2b, 2c]
Open question: what are the best practices for the “bottom-up” mapping of a new structure in the data lake that was not originally derived from a Data Model?
1. Direct mapping from the Physical Asset to the appropriate Term in the Catalog
2. Indirect mapping via a specifically created data model (actual mapping done either via BGE or in BG Browser)
   a. Reverse engineer a new model from the HDFS Structure
   b. Import the Data Model into the Catalog
   c. Import the mappings into the Catalog from IDA (is the mapping done in IDA via BGE?)
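The two mapping options above can be compared with a small Python sketch. It models the catalog, the reverse-engineered model and the physical fields as plain dictionaries; it does not call IGC, IDA or IMAM, and all function names, field names and terms are hypothetical.

```python
def map_direct(asset_fields, term_assignments):
    """Option 1: assign catalog terms straight to physical fields."""
    return {field: term_assignments.get(field) for field in asset_fields}


def map_via_model(asset_fields, field_to_attribute, attribute_to_term):
    """Option 2: field -> reverse-engineered model attribute -> catalog term."""
    mapping = {}
    for field in asset_fields:
        attribute = field_to_attribute.get(field)
        mapping[field] = attribute_to_term.get(attribute)
    return mapping


def unmapped(mapping):
    """Fields that still have no business term and need a steward's review."""
    return sorted(field for field, term in mapping.items() if term is None)


if __name__ == "__main__":
    fields = ["pat_id", "clm_amt", "src_sys_cd"]  # hypothetical HDFS fields
    direct = map_direct(fields, {"pat_id": "Patient Identifier",
                                 "clm_amt": "Claim Amount"})
    via_model = map_via_model(
        fields,
        {"pat_id": "Patient.Identifier", "clm_amt": "Claim.Amount"},
        {"Patient.Identifier": "Patient Identifier",
         "Claim.Amount": "Claim Amount"},
    )
    print(direct == via_model, unmapped(direct))
```

Both routes can end at the same field-to-term assignments; the indirect route additionally leaves a model artifact that can be versioned and reused, at the cost of the extra reverse-engineering and import steps.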
Model artifacts in the Data Lake Runtime environment - main usage patterns
There are three main ways in which the data model artifacts are used in or impact the Data Lake runtime environment:
• Industry Model artifacts are deployed into the Data Lake runtime environment
  • Most likely as an output from the two lifecycles “Maintaining the Business Language” and “Deploying the Technical Models”
• Industry Model artifacts deployed in the Data Lake are used by and affected by Data Lake users
  • For example, Data Lake users provide feedback on changes/corrections/additions to the model artifacts
• Industry Model artifacts deployed in the Data Lake are impacted by new or changed data coming into the Data Lake Repositories
  • The most obvious example is the need for new mappings to a new or changed Repository brought into the Data Lake
REFERENCE MATERIAL
New Information Architectures and Capabilities
Designing and Operating a Data Reservoir
▪ Description of the behaviour and processes that make up a data reservoir (IBM’s Data Lake)
▪ Blog
  • 5 things to know about a data reservoir
    https://www.ibm.com/developerworks/community/blogs/5things/entry/5_things_to_know_about_data_reservoir?lang=en
▪ Redbook
  • http://www.redbooks.ibm.com/Redbooks.nsf/RedpieceAbstracts/sg248274.html?Open
IBM Industry Models and Data lake publications so far :
http://www-01.ibm.com/common/ssi/cgi-bin/ssialias?htmlfid=IMW14877USEN
http://www-01.ibm.com/common/ssi/cgi-bin/ssialias?htmlfid=IMW14872USEN
https://www-01.ibm.com/common/ssi/cgi-bin/ssialias?htmlfid=IMW14911IEEN
Weitere Àhnliche Inhalte

Was ist angesagt?

Modern Data architecture Design
Modern Data architecture DesignModern Data architecture Design
Modern Data architecture DesignKujambu Murugesan
 
Data Modeling, Data Governance, & Data Quality
Data Modeling, Data Governance, & Data QualityData Modeling, Data Governance, & Data Quality
Data Modeling, Data Governance, & Data QualityDATAVERSITY
 
Effective Healthcare Data Governance Strategy Propels Data Transformation
Effective Healthcare Data Governance Strategy Propels Data TransformationEffective Healthcare Data Governance Strategy Propels Data Transformation
Effective Healthcare Data Governance Strategy Propels Data TransformationHealth Catalyst
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouseJames Serra
 
Data Governance for the Executive
Data Governance for the ExecutiveData Governance for the Executive
Data Governance for the ExecutiveDATAVERSITY
 
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...HostedbyConfluent
 
The Connected Consumer – Real-time Customer 360
The Connected Consumer – Real-time Customer 360The Connected Consumer – Real-time Customer 360
The Connected Consumer – Real-time Customer 360Capgemini
 
Architecting a datalake
Architecting a datalakeArchitecting a datalake
Architecting a datalakeLaurent Leturgez
 
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021Tristan Baker
 
Building the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsBuilding the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsKhalid Salama
 
Business Intelligence (BI) and Data Management Basics
Business Intelligence (BI) and Data Management  Basics Business Intelligence (BI) and Data Management  Basics
Business Intelligence (BI) and Data Management Basics amorshed
 
Introducing Databricks Delta
Introducing Databricks DeltaIntroducing Databricks Delta
Introducing Databricks DeltaDatabricks
 
Five Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data GovernanceFive Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data GovernanceDATAVERSITY
 
Data Modeling Best Practices - Business & Technical Approaches
Data Modeling Best Practices - Business & Technical ApproachesData Modeling Best Practices - Business & Technical Approaches
Data Modeling Best Practices - Business & Technical ApproachesDATAVERSITY
 
Should I move my database to the cloud?
Should I move my database to the cloud?Should I move my database to the cloud?
Should I move my database to the cloud?James Serra
 
The Data Driven University - Automating Data Governance and Stewardship in Au...
The Data Driven University - Automating Data Governance and Stewardship in Au...The Data Driven University - Automating Data Governance and Stewardship in Au...
The Data Driven University - Automating Data Governance and Stewardship in Au...Pieter De Leenheer
 
How to identify the correct Master Data subject areas & tooling for your MDM...
How to identify the correct Master Data subject areas & tooling for your MDM...How to identify the correct Master Data subject areas & tooling for your MDM...
How to identify the correct Master Data subject areas & tooling for your MDM...Christopher Bradley
 
Customer Event Hub - the modern Customer 360° view
Customer Event Hub - the modern Customer 360° viewCustomer Event Hub - the modern Customer 360° view
Customer Event Hub - the modern Customer 360° viewGuido Schmutz
 
The Importance of Metadata
The Importance of MetadataThe Importance of Metadata
The Importance of MetadataDATAVERSITY
 

Was ist angesagt? (20)

How to build a successful Data Lake
How to build a successful Data LakeHow to build a successful Data Lake
How to build a successful Data Lake
 
Modern Data architecture Design
Modern Data architecture DesignModern Data architecture Design
Modern Data architecture Design
 
Data Modeling, Data Governance, & Data Quality
Data Modeling, Data Governance, & Data QualityData Modeling, Data Governance, & Data Quality
Data Modeling, Data Governance, & Data Quality
 
Effective Healthcare Data Governance Strategy Propels Data Transformation
Effective Healthcare Data Governance Strategy Propels Data TransformationEffective Healthcare Data Governance Strategy Propels Data Transformation
Effective Healthcare Data Governance Strategy Propels Data Transformation
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouse
 
Data Governance for the Executive
Data Governance for the ExecutiveData Governance for the Executive
Data Governance for the Executive
 
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
 
The Connected Consumer – Real-time Customer 360
The Connected Consumer – Real-time Customer 360The Connected Consumer – Real-time Customer 360
The Connected Consumer – Real-time Customer 360
 
Architecting a datalake
Architecting a datalakeArchitecting a datalake
Architecting a datalake
 
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
 
Building the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsBuilding the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake Analytics
 
Business Intelligence (BI) and Data Management Basics
Business Intelligence (BI) and Data Management  Basics Business Intelligence (BI) and Data Management  Basics
Business Intelligence (BI) and Data Management Basics
 
Introducing Databricks Delta
Introducing Databricks DeltaIntroducing Databricks Delta
Introducing Databricks Delta
 
Five Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data GovernanceFive Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data Governance
 
Data Modeling Best Practices - Business & Technical Approaches
Data Modeling Best Practices - Business & Technical ApproachesData Modeling Best Practices - Business & Technical Approaches
Data Modeling Best Practices - Business & Technical Approaches
 
Should I move my database to the cloud?
Should I move my database to the cloud?Should I move my database to the cloud?
Should I move my database to the cloud?
 
The Data Driven University - Automating Data Governance and Stewardship in Au...
The Data Driven University - Automating Data Governance and Stewardship in Au...The Data Driven University - Automating Data Governance and Stewardship in Au...
The Data Driven University - Automating Data Governance and Stewardship in Au...
 
How to identify the correct Master Data subject areas & tooling for your MDM...
How to identify the correct Master Data subject areas & tooling for your MDM...How to identify the correct Master Data subject areas & tooling for your MDM...
How to identify the correct Master Data subject areas & tooling for your MDM...
 
Customer Event Hub - the modern Customer 360° view
Customer Event Hub - the modern Customer 360° viewCustomer Event Hub - the modern Customer 360° view
Customer Event Hub - the modern Customer 360° view
 
The Importance of Metadata
The Importance of MetadataThe Importance of Metadata
The Importance of Metadata
 

Andere mochten auch

IBM Industry Models and Data Lake

  • 1. © 2016 IBM Corporation. IBM Industry Models and the IBM Data Lake. January 2017. Pat O'Sullivan, IBM Analytics. Email: posulliv@ie.ibm.com. Twitter: @PatOSullivanIBM. © 2017 IBM Corporation
  • 2. Disclaimer. IBM's statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM's sole discretion. Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion. Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.
  • 3. The broadening scope of analytics. Add a business desire for real-time analytics and self-service data, plus increasing regulation of individual privacy, and a well-defined, managed and governed approach to information architecture becomes necessary. We call this IBM's Data Lake. [Diagram labels: SOA; Master Data Management Hub; Applications; Data Warehouse; Pattern Discovery for Analytics; Operational Data Store; Sandboxes; Analyze Values; Search for Data; Reporting; Data Lake; Hadoop]
  • 4. Big Data Lakes or Swamps? As we collect data: Can we preserve clarity? Do we know what we are collecting? Can we find the data we need? Are we creating a data swamp? How do we build trust in big data? Do we know what data is being used for?
  • 5. The Data Lake. Data Lake = Efficient Management, Governance, Protection and Access. [Diagram labels: Data Lake; Information Management and Governance Fabric; Data Lake Services; Data Lake Repositories]
  • 6. Users supported by the Data Lake. [Diagram labels: Data Lake (System of Insight); Information Management and Governance Fabric; Data Lake Services; Data Lake Repositories; Analytics Teams; Governance, Risk and Compliance Team; Information Curator; Line of Business Teams; Data Lake Operations; Enterprise IT; Other Data Lakes; Systems of Engagement; Systems of Automation; Systems of Record; New Sources]
  • 7. The Data Lake subsystems. [Diagram labels: Data Lake (System of Insight); Information Management and Governance Fabric; Catalogue; Self-Service Access; Enterprise IT Data Exchange; Data Lake Repositories; Analytics Teams; Governance, Risk and Compliance Team; Information Curator; Line of Business Teams; Data Lake Operations; Enterprise IT; Other Data Lakes; Systems of Engagement; Systems of Automation; Systems of Record; New Sources]
  • 8. Data lake repositories. [Diagram labels: Specialist Processing; Structured and Optimized; System-level Data (Landing Area); Accumulation of Context for Master and Reference Data; Self-managed Data; Metadata; Refined data formatted for particular consumers]
  • 9. IBM Industry Data Models. IBM Industry Data Models provide pre-defined data structures which help accelerate data warehouse, data lake and business intelligence projects. Industry-specific issues being addressed. Integrated set of Models from business requirements to low-level design. Predefined and pretested deployment to RDBMS and HDFS environments. [Diagram labels: IBM Industry Data Models; KPIs; Business Vocabulary; Atomic DW Models; Dimensional Models; Banking; Insurance; Fin Markets; Retail; Healthcare; Telecom; E&U; Customer Insight; Profitability; Risk; Regulatory Compliance; Project Acceleration; Technical; Business; Analysis Models; Data Classifications; Business Models; Design Models; Supportive Terms; Data Warehouse; Operational Data Store; Big Data; Data Marts; Information Integration & Governance]
  • 10. IBM Industry Models and main data lake deployment paths. The Business Vocabulary is deployed to the Data Lake Catalog via tools such as InfoSphere Information Governance Catalog (IGC). The Atomic (Inmon) and Dimensional (Kimball) Data Models are deployed to the data lake via tools such as InfoSphere Data Architect (IDA) and ERwin. Supporting collateral: Models-specific white papers and best practice docs outlining the main deployment patterns and implementation considerations.
  • 11. Overall set of Models. [Diagram labels: Business Vocabulary (IGC): Business Terms / FSDM, Supportive Content, Analytical Requirements; Analysis level Models (IDA): Business Data Model; Design level Models (IDA): Atomic Warehouse Model, Dimensional Warehouse Models; Data Models]
  • 12. Big Data Landscape: main components touched by the IBM Data Models. [Diagram labels: Data Lake; View-based Interaction; Raw Data Interaction; Enterprise IT Interaction; Curation Interaction; Catalog Interfaces; Data Reservoir Operations; Management; Decision Model Management; Information Integration & Governance; Line of Business Applications; Simple, Ad Hoc Discovery and Analysis; Reporting; Governance, Risk and Compliance Team; Information Curator; Enterprise IT; Information Service Calls; Search Requests; Report Requests; Understand Information Sources; Advertise Information Source; Understand Compliance; Report Compliance; Deploy Decision Models; Deploy Real-time Decision Models; Events to Evaluate; Notifications; Data Access; Data Deposit; Data In; Data Out; System of Record Applications; Enterprise Service Bus; New Sources; Third Party Feeds; Third Party APIs; Systems of Engagement; Internal Sources; Other Systems of Insight. Repositories: Published Data, Harvested Data, Historical Data, Descriptive Data, INFORMATION WAREHOUSE, DEEP DATA, CATALOG, OPERATIONAL HISTORY, REPORTING DATA MARTS, SANDBOXES] For full information on the IBM Data Lake Reference Architecture, see the IBM Redbook "Designing and Operating a Data Reservoir": http://www.redbooks.ibm.com/Redbooks.nsf/RedpieceAbstracts/sg248274.html?Open
  • 13. Options regarding common models/glossaries to encourage standardization and reuse. A shared set of term and physical asset definitions in the Catalog underpins all queries by all users. Data Scientists can make use of predefined catalogs and are likely to create new catalog entries during their daily activities. Business Users use specific subsets of the same shared Catalog as other users to ensure consistency of language and meaning. Any published structures required by the Business are based on the same standard definitions and structures as those used elsewhere. A standardized set of Business Term and Data Model definitions is used to enforce both the meaning and, where appropriate, the structure of stored data. Data Management Operations use the same shared set of models and catalog entries to build the necessary production ETL assets. [Diagram labels: Enterprise IT; System of Record Applications; Enterprise Service Bus; New Sources; Third Party Feeds; Third Party APIs; Systems of Engagement; Internal Sources; Enterprise IT Interaction; View-based Interaction; Curation Interaction; Management; Information Integration & Governance; Data Ingestion; Publishing Feeds; Service Interfaces; Information Service Calls; Search Requests; Report Requests; Deploy Decision Models; Deploy Real-time Decision Models; Data Access; Data Deposit; Data In; Data Out; Line of Business Applications; Consumers of Insight; Simple, ad hoc Discovery and Analysis; Reporting; Analytical Insight Applications; Data Analysts/Data Scientists; Analytics Tools; Data Management Operations. Repositories: Published, Shared Operational Data, Harvested Data, Historical Data, Descriptive Data, OBJECT CACHE, ASSET HUB, EXECUTION ENGINES, WORKFLOW, MONITOR, DEEP DATA, OPERATIONAL HISTORY, INFORMATION WAREHOUSE, REPORTING DATA MARTS, CATALOG, SANDBOXES]
  • 14. Catalog Deployment - Models in the Descriptive Data Zone
      Purpose: Provide a standard business language and information model that can be used when discussing business concepts and related technical components.
      Steps:
        1. Business Vocabulary models are deployed to the Catalog (IGC), where they are used and maintained by business analysts and data stewards.
        2. The logical data models (e.g. the Business Data Model and the Atomic and Dimensional Warehouse Models) are imported into the Catalog. However, they are mastered in a modelling tool such as InfoSphere Data Architect (IDA).
      Considerations:
        • Patterns and best practices for the overall management of enterprise and LOB glossaries are still evolving.
      [Diagram: the Data Lake architecture from slide 12 with the Descriptive Data Zone (Catalog) highlighted and callouts 1 and 2 for the steps above. Model labels: Business Vocabulary (IGC), Business Terms/FSDM, Supportive Content, Analytical Requirements, Analysis level Models (IDA), Design level Models (IDA), Business Data Model, Atomic Warehouse Model, Dimensional Warehouse Models. Users shown: Business Users, Data Scientists.]
  • 15. Warehouse and Marts – Models in Integrated Warehouse Zone
      Purpose: Provide data modellers with consistent data structures for deployment across the different aspects of an integrated Information Warehouse and Marts zone.
      Steps:
        1. The Atomic Warehouse Model is used as the basis for the Inmon-style central relational Information Warehouse.
        2. The Dimensional Warehouse Model is used as the basis for the Kimball-style Dimensional Information Warehouse.
        3. The Dimensional Warehouse Model provides the business-issue-specific structures to enable the deployment of Reporting Data Marts.
      [Diagram: the Data Lake architecture from slide 12 with the Integrated Warehouse & Marts Zone highlighted (labels: Deep Data, Information Warehouse, Reporting Data Marts) and callouts 1 to 3 for the steps above. Model labels: Business Vocabulary (IGC), Business Terms, Supportive Content, Analytical Requirements, Analysis level Models (IDA), Design level Models (IDA), Atomic Warehouse Model, Dimensional Warehouse Models. Users shown: Business Users.]
  • 16. Big Data Deployment – Models in the Landing Area Zone
      Purpose: Provide the basis for a consistent and appropriate use of schemas in the different repositories in the Landing Area Zone.
      Steps:
        1. The Atomic Warehouse Model is used as the basis for deploying both schema-at-write and schema-at-read Hadoop Deep Data structures.
        2. The Atomic Warehouse Model may provide the basis for deploying schema-at-read structures over Operational History raw data.
      Considerations:
        • Further investigation is needed into the potential role of DWM deployments to Hadoop-based technology.
      [Diagram: the Data Lake architecture from slide 12 with the Landing Area Zone highlighted (labels: Deep Data, Operational History) and callouts 1 and 2 for the steps above. Model labels: Business Vocabulary (IGC), Business Terms, Supportive Content, Analytical Requirements, Analysis level Models (IDA), Design level Models (IDA), Atomic Warehouse Model, Dimensional Warehouse Models. Users shown: Business Users, Data Scientists.]
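The difference between the two deployment styles named on slide 16 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not IBM's implementation: the `PATIENT_SCHEMA` columns, the function names and the CSV layout are all hypothetical stand-ins for a structure derived from the Atomic Warehouse Model.

```python
import csv
import io

# Hypothetical schema standing in for a structure derived from a logical
# model entity; column names and types are illustrative only.
PATIENT_SCHEMA = [("patient_id", int), ("family_name", str), ("birth_year", int)]


def conform(row, schema):
    """Cast one raw row (a list of strings) to the schema, raising on mismatch."""
    if len(row) != len(schema):
        raise ValueError(f"expected {len(schema)} fields, got {len(row)}")
    return {name: cast(value) for (name, cast), value in zip(schema, row)}


def write_with_schema(rows, schema, store):
    """Schema-at-write: every row must fit the schema before it is stored."""
    for row in rows:
        store.append(conform(row, schema))


def read_with_schema(raw_text, schema):
    """Schema-at-read: raw text is kept untouched; structure is applied on read.

    Rows that do not fit the schema are counted rather than rejected up front.
    """
    good, bad = [], 0
    for row in csv.reader(io.StringIO(raw_text)):
        try:
            good.append(conform(row, schema))
        except ValueError:
            bad += 1
    return good, bad
```

With schema-at-write, a malformed row stops the load; with schema-at-read, the raw file lands as-is and the same schema is applied (and can later be revised) whenever the data is queried.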
  • 17. Summary Picture
      • The Business Vocabulary Terms in IGC can be used to enforce common business meaning throughout the Data Lake landscape.
      • The output of the various Logical Models can be used to define the technical structure of the assets that need to be created in the lake, where a predefined schema is required (e.g. schema-at-write).
      Legend:
        • Mappings to inform common Business Meaning using the Business Vocabulary in IGC
        • Generation of Technical Structure using the ER Data Models in an ER tool (e.g. IDA)
        • Use of the Business Vocabulary by Users to understand Business Meaning
      [Diagram: the Data Lake architecture from slide 12 overlaid with the Business Vocabulary, two Logical Models (Atomic, Dimensional) and three Physical Models (Hadoop, RDBMS, Dimensional), connected by numbered links 1 to 10. Users shown: Business Users, Data Scientists.]
  • 18. Three different lifecycles relating to the evolution of the models with the Data Lake
      1. Maintenance of the Business Language (stages: Analysis, Refine, Deploy, Review, Requirement; artifacts: AR, BT, SG): the use of the Industry Models Business Vocabularies to enable a common business meaning of language for all Data Lake users.
      2. Development of the ER/UML Models (stages: Analysis, Design, Generate, Review, Requirement; artifacts: BDM, AWM, DWM): the use of the ER and UML models to enforce a common structure of artifacts where required in the Data Lake.
      3. Management of the runtime production environment (artifacts: BT, AWM (Physical), DWM (Physical); touches the Data Lake Repositories, Data Lake Catalog and Data Lake Users): the use of the Industry Models Business Vocabularies and derived physical assets in the creation and ongoing management of the Data Lake.
      Legend: BT - Business Terms; AR - Analytical Requirements; SG - Supportive Glossaries; BDM - Business Data Model; AWM - Atomic Warehouse Model; DWM - Dimensional Warehouse Model
  • 19. The Repositories used by the Data Lake Lifecycles
      [Diagram: three environments (Business Language Environment, Modelling Environment, Runtime Data Lake Environment). Labelled components: IGC Dev Repository, IGC Browser, IGC Workflow, IDA, IGC for Eclipse, Collaboration/Versioning Repository (e.g. RTC), IMAM, IDA Import, Physical Data Model, IGC Production Repository, IGC Anywhere/REST, Data Lake Catalog, Data Lake Repositories, Data Repositories (RDBMS), Data Repositories (HDFS).]
  • 20. Lifecycle 1 - Maintaining the Business Language of the Data Lake
      Objective: The creation and ongoing maintenance of the common Business Language that all users use to describe the various components of the Data Lake and that underpins the Data Lake.
      Roles involved: Business user reps, Business SMEs, Business Language stakeholders
      Considerations:
        • Determining the needs of the different users of the Data Lake (different uses, need for different dialects, amount of technical metadata in the Language)
        • Determining the approach to building the Business Language and the overall flow for the creation, promotion and maintenance of terms
        • Defining the specific glossary suitable for pure business users versus Business Analysts, Data Scientists, Data Modellers and IT staff
        • Determining the role of the IBM Industry Models in building out the Business Language
      [Diagram: Maintenance of the Business Language cycle (Analysis, Refine, Deploy, Review, Requirement) with artifacts AR, BT, SG.]
  • 21. Lifecycle 2 - Developing the technical Models
      Objective: The use of the ER and UML models to enforce a common structure of artifacts where required in the Data Lake.
      Roles involved: Modellers, Business SMEs
      Considerations:
        • Ensuring appropriate communication between the Data Modellers and the Business Users
        • Determining when to use, and when not to use, data models for the Data Lake repositories
        • Determining the ongoing use of a canonical, platform-independent Logical Model as the basis for deploying the different types of platform-specific Physical Models required across the Data Lake repositories
        • Determining the specific data modelling approaches and scenarios for deploying to the different Data Lake repositories
      [Diagram: Development of the ER/UML Models cycle (Analysis, Design, Generate, Review, Requirement) with artifacts BDM, AWM, DWM.]
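The idea of one platform-independent logical model feeding several platform-specific physical models (the third consideration on slide 21) can be sketched as follows. This is a hedged illustration only: the `TYPE_MAP` rules, the `CLAIM` entity and the function name are hypothetical, and a real modelling tool such as IDA applies its own, much richer transformation rules.

```python
# Hypothetical logical-to-physical type rules per target platform.
TYPE_MAP = {
    "rdbms": {"identifier": "BIGINT", "text": "VARCHAR(255)", "date": "DATE"},
    "hive": {"identifier": "BIGINT", "text": "STRING", "date": "DATE"},
}

# One platform-independent logical entity; attribute names are illustrative.
CLAIM = {
    "name": "claim",
    "attributes": [
        ("claim_id", "identifier"),
        ("status", "text"),
        ("filed_on", "date"),
    ],
}


def physical_columns(entity, platform):
    """Return (column, physical type) pairs for the chosen target platform."""
    types = TYPE_MAP[platform]
    return [(column, types[logical]) for column, logical in entity["attributes"]]
```

Because both physical variants come from the same logical entity, column names stay identical across repositories and only the platform-specific types differ.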
  • 22. Lifecycle 3 - Deploying the Models into the runtime Data Lake environment
      Objective: The use of the Industry Models Business Vocabularies and derived physical assets in the creation and ongoing management of the Data Lake.
      Roles involved: Business user reps, Modellers, Data Lake Ops staff
      Considerations:
        • Determining how to deploy the Business Language for optimal use by the different Data Lake users (managing access to the different terms, handling of ongoing updates)
        • Determining the strategy for the ongoing association of the Business Terms with Data Assets (which users tag new data elements with the Business Language, and when)
        • Determining the approach for the Data Lake Ops staff to deploy the physical Data Models, and how feedback to the Data Modellers is handled
        • Determining how to incorporate the Data Model artifacts into the ongoing governance of the Data Lake
      [Diagram: Management of the runtime production environment, with artifacts BT, AWM (Physical), DWM (Physical) and the Data Lake Repositories, Data Lake Catalog and Data Lake Users.]
  • 23. Industry Models Hadoop deployment example – low level HDFS
      Three possible deployment paths (numbered 1, 2 and 3 in the diagram).
      [Diagram: Sample Source Data (Claim File, Patient Information File) passes through a Data Transformation Process (Hive, Spark, Pig, ETL, ..) into HDFS data files under /data/udmh/patient/<date>/<version>/.. and /data/udmh/claim/<date>/<version>/.. ; the Hive Metastore holds external tables over these files (Patient party ext Table, Claim ext Table), accessed through Hive or a vendor SQL-for-Hadoop interface and by downstream data transformation processes. The Logical Data Model (Patient, Claim, Patient / Claim) and the Physical Data Model (Patient, Claim) are shown alongside.]
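The HDFS layout and external tables on slide 23 can be sketched as a small generator of Hive DDL. This is an assumption-laden sketch, not the deck's actual deployment tooling: the function names, the `claim_ext` table name, the column list, the comma-delimited text format and the sample date and version values are all hypothetical; only the `/data/udmh/<subject>/<date>/<version>` directory pattern comes from the slide.

```python
def hdfs_location(subject, date, version, root="/data/udmh"):
    """Build the directory for one load: <root>/<subject>/<date>/<version>."""
    return f"{root}/{subject}/{date}/{version}"


def external_table_ddl(table, columns, location):
    """Render a Hive CREATE EXTERNAL TABLE statement over existing HDFS files.

    The table only describes the files; dropping it leaves the data in place.
    """
    cols = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{location}'"
    )


if __name__ == "__main__":
    # Hypothetical load date and version, for illustration only.
    location = hdfs_location("claim", "2017-01-15", "v1")
    print(external_table_ddl(
        "claim_ext",
        [("claim_id", "BIGINT"), ("status", "STRING")],
        location,
    ))
```

Column definitions passed to `external_table_ddl` would come from the physical data model, so the files written by the transformation process and the table that reads them share one definition.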
  • 24. Mapping of incoming new structures in the Data Lake
      Open question: what are the best practices for the "bottom-up" mapping of a new structure in the Data Lake that was not originally derived from a Data Model?
        1. Direct mapping from the Physical Asset to the appropriate Term in the Catalog
        2. Indirect mapping via a specifically created data model (actual mapping done either via BGE or in BG Browser):
           a. Reverse engineer a new model from the HDFS structure
           b. Import the data model into the Catalog
           c. Import the mappings into the Catalog from IDA (is the mapping done in IDA via BGE?)
      [Diagram: the repositories from slide 19 (IGC Dev Repository, IGC Production Repository, IDA, IGC for Eclipse, IMAM, IDA Import, Physical Data Model, IGC Workflow, IGC Anywhere/REST, IGC Browser, Data Lake Catalog, Data Lake Repositories, Data Repositories RDBMS and HDFS) with a New HDFS Structure and callouts 1, 2a, 2b and 2c for the options above.]
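The two mapping options on slide 24 can be contrasted with plain Python data structures. This sketch does not call IGC, IDA or any other product API; the term names, field names and the naive naming rule in `reverse_engineer` are hypothetical and only illustrate the shape of each option.

```python
# Hypothetical catalog terms, for illustration only.
TERMS = {"Patient Identifier", "Claim Status"}


def map_direct(fields, field_to_term, terms):
    """Option 1: link each physical field straight to a catalog term."""
    return {f: field_to_term[f] for f in fields if field_to_term.get(f) in terms}


def reverse_engineer(fields):
    """Step 2a: derive a one-entity model with one attribute per physical field."""
    return {field: field.replace("_", " ").title() for field in fields}


def map_indirect(fields, attribute_to_term, terms):
    """Steps 2a to 2c: physical field -> model attribute -> catalog term."""
    mapping = {}
    for field, attribute in reverse_engineer(fields).items():
        term = attribute_to_term.get(attribute)
        if term in terms:
            mapping[field] = (attribute, term)
    return mapping
```

The direct option is shorter but leaves no model behind; the indirect option keeps an intermediate model attribute for each field, which is what later structural governance would work from.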
  • 25. Model artifacts in the Data Lake Runtime environment – main usage patterns
      There are three main ways in which the data model artifacts are used in, or impact, the Data Lake runtime environment:
      • Industry Model artifacts are deployed into the Data Lake runtime environment.
          • Most likely as an output from the two lifecycles "Maintaining the Business Language" and "Developing the Technical Models"
      • Industry Model artifacts deployed in the Data Lake are used by, and affected by, Data Lake users.
          • For example, Data Lake users provide feedback on changes, corrections and additions to the model artifacts.
      • Industry Model artifacts deployed in the Data Lake are impacted by new or changed data coming into the Data Lake Repositories.
          • The most obvious example is the need for new mappings to a new or changed Repository brought into the Data Lake.
  • 26. REFERENCE MATERIAL: New Information Architectures and Capabilities
  • 27. Designing and Operating a Data Reservoir
      Description of the behaviour and processes that make up a data reservoir (IBM's Data Lake)
      • Blog: 5 things to know about a data reservoir
        https://www.ibm.com/developerworks/community/blogs/5things/entry/5_things_to_know_about_data_reservoir?lang=en
      • Redbook:
        http://www.redbooks.ibm.com/Redbooks.nsf/RedpieceAbstracts/sg248274.html?Open
  • 28. IBM Industry Models and Data Lake publications so far:
      • http://www-01.ibm.com/common/ssi/cgi-bin/ssialias?htmlfid=IMW14877USEN
      • http://www-01.ibm.com/common/ssi/cgi-bin/ssialias?htmlfid=IMW14872USEN
      • https://www-01.ibm.com/common/ssi/cgi-bin/ssialias?htmlfid=IMW14911IEEN &