Specification-Driven Data Mesh
Sion Smith CTO - oso.sh
Neil Avery CTO - liquidlabs.com
Introducing
The Enterprise Guide to
Building a Data Mesh
About us
2
Introduction
Sion Smith
CTO, OSO
15 years consulting experience solving
complex problems with various cloud and
programming technologies
Neil Avery
CTO, Liquidlabs
Distributed systems, previously
Confluent, Luxoft, startups and
others
Emerging Technology
Agenda
3
Current State of Play
● The Gartner Hype Cycle
● Foundations of Data Mesh
● Evolution of central nervous system
The Spec Mesh Way
● Domain mapping out of the box
● Example specification
● Data Mesh lifecycle management
Developer Tooling
● Features of Spec Mesh
● Screenshots
● Developer roadmap
Our Data Mesh Journey
Is the Hype Real?
5
State of Play
Gartner, Hype Cycle for Data Management, 2022, Donald Feinberg, Philip Russom, Nina Showell, 30 June 2022
https://www.denodo.com/en/document/analyst-report/2022-gartner-hype-cycle-data-management
[Chart: Gartner Hype Cycle for Data Management, 2022. Expectations are plotted over time through the Innovation Trigger, Peak of Inflated Expectations, Trough of Disillusionment, Slope of Enlightenment and Plateau of Productivity phases, with Data Hub Strategy, Data Integration Tools, Data Lakes, Data Engineering, DataOps and Data Mesh positioned along the curve.]
Four pillars of Data Mesh
6
State of Play
Data as a Product
Self-serve data infrastructure as a platform
Federated computational governance
Domain-oriented decentralised data ownership & architecture
https://martinfowler.com/articles/data-mesh-principles.html
Is Data Mesh really new?
7
State of Play
+ Data Mesh is an incremental evolution of the style of architecture we have been building for several years for event streaming
+ A mature data streaming system adopts a central nervous system
+ Can we build a data mesh around event streaming principles?
+ A central nervous system models topics with a domain structure and federated computational governance
Introducing:
+ An agreement / contract for data mesh, captured as a specification
[Diagram: central data team, domain teams and stakeholders]
So where does Data Mesh fit?
8
State of Play
[Chart: value vs. investment & time across five stages. 1: Early Interest (identify a project, single solution); 2: Projects (scalable pipeline, pub/sub); 3: Platform (clusters of reuse, real-time analytics); 4: mission-critical but disparate LOBs (platform effect: reuse of data, efficiencies of scale); 5: Central Nervous System (mission-critical, connected LOBs; enterprise Data-as-a-Product, event-driven architecture). This is used throughout large enterprises in production.]
Confluent maturity model for event driven architecture
https://www.confluent.io/resources/5-stages-streaming-platform-adoption/
Data Mesh should not boil the ocean
9
Introducing SpecMesh
Specification-Driven Data Mesh
CNS Patterns applied to Data Mesh
1. Events and storage comprise data platform fundamentals required to build
almost anything
2. Events and storage already exist and that won’t change
3. Organise data resources hierarchically
4. Enforcing a domain model is needed to control complexity and cope with scale (see the sketch after this list)
5. Orgs have a data quality function
6. Orgs have a data catalogue function
7. Registry models are required to model data (and support data evolution etc)
8. Most organisations suffer from inconsistent mechanisms for data-as-an-api
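Patterns 3 and 4 lend themselves to automation. Below is a minimal sketch, in plain Java, of how a hierarchical domain model might be enforced over topic names; the dot-separated naming convention is an illustrative assumption, not SpecMesh's actual rule.

```java
// Sketch: enforcing a hierarchical domain model on resource names (patterns 3 and 4).
// The dot-separated convention is assumed for illustration.
import java.util.List;

public final class DomainModelCheck {

    /** A topic belongs to a domain if its name starts with the domain's dot-separated prefix. */
    static boolean ownedBy(String topic, String domainPrefix) {
        return topic.startsWith(domainPrefix + ".");
    }

    public static void main(String[] args) {
        String domain = "london.hammersmith.olympia.bigdatalondon";
        List<String> topics = List.of(
                "london.hammersmith.olympia.bigdatalondon.public.attendee", // fits the model
                "misc.randomteam.stuff");                                   // violates the model

        for (String t : topics) {
            System.out.printf("%-60s owned by domain: %s%n", t, ownedBy(t, domain));
        }
    }
}
```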
11
12
Spec Mesh
Supporting the pillars of Data Mesh
Features (pillars: Domain ownership | Self-serve | Data as a Product | Federated Computational Governance)
Spec driven (async-api spec) ✔ ✔
SDLC plugins (unit-test, integration-test) ✔
SDLC plugins (provision - terraform) ✔ ✔ ✔
3rd party data catalogue ✔ ✔ ✔
Big Data London is our data product
Spec Mesh
13
14
Data Model
Spec Mesh
/ london / borough / venue / event
/ retail /
/ transport /
/ accommodation /
/ services /
/ london / hammersmith / olympia / bigdatalondon / public / attendee
/ london / hammersmith / transport / public / tube
/ london / heathrow / transport / public / airport
/ london / hammersmith / olympia / bigdatalondon / vendor / terra / public / visitor
/ london / hammersmith / olympia / bigdatalondon / retailer / subway / public / purchase
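A minimal sketch of how such paths might be derived mechanically from a spec: the urn-style id becomes the domain prefix and the channel is appended. The urn-to-dot convention here is an illustrative assumption.

```java
// Sketch: deriving a fully-qualified topic name from the AsyncAPI spec id and a channel.
public final class TopicName {

    static String fromSpec(String specId, String channel) {
        // 'urn:london:hammersmith:olympia:bigdatalondon' -> 'london.hammersmith.olympia.bigdatalondon'
        String domainPrefix = specId.replaceFirst("^urn:", "").replace(':', '.');
        // channel 'public/attendee' -> '.public.attendee'
        return domainPrefix + "." + channel.replace('/', '.');
    }

    public static void main(String[] args) {
        System.out.println(fromSpec("urn:london:hammersmith:olympia:bigdatalondon", "public/attendee"));
        // -> london.hammersmith.olympia.bigdatalondon.public.attendee
    }
}
```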
asyncapi: '2.4.0'
id: 'urn:london:hammersmith:olympia:bigdatalondon'
info:
  title: BigDataLondon API
  version: '1.0.0'
  description: Simple model of BigDataLondon as a Data Product
servers:
  test:
    url: test.mykafkacluster.org:8092
    protocol: kafka-secure
channels:
  public/attendee:
    publish:
      summary: Humans arriving
      message:
        name: Human
        tags:
          - name: human
          - name: big data london
        payload:
          type: object
          properties:
            id:
              type: integer
              minimum: 0
              description: Id of the human
            age:
              type: integer
15
Spec Mesh
16
Spec Mesh
Data Mesh as Code: Spec Mesh
Capture >> Provision >> Serve
17
Spec Mesh
Capture
● Create repo from git template
● Async-api spec
● Apply the domain model/structure
● Specify data event streams & storage
● Model the data entities (schemas)
● Apply tags to data entities
● Permission data as public/protected/private
● Write tests and build your product
[Diagram: Capture >> Provision >> Serve; the Spec drives Spec Mesh, which updates the Data Catalogue]
What is Spec Mesh?
Provision
18
What is Spec Mesh?
Spec Mesh
[Diagram: Capture >> Provision >> Serve; the Spec drives Spec Mesh, which updates the Data Catalogue]
● Pipeline-driven provisioning of the data product based on the Specification (a minimal sketch follows below)
● Includes domain structure, governance and permissions
● Data catalogue automatically updated (including tags)
● Data schemas published to the registry
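As referenced above, a minimal sketch of the provisioning step using the Kafka AdminClient, assuming topic names have already been derived from the spec; the bootstrap address, partition and replication counts are placeholders.

```java
// Sketch: applying spec-derived topic names to a cluster with the Kafka AdminClient.
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.List;
import java.util.Properties;

public final class ProvisionTopics {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "test.mykafkacluster.org:8092");

        try (Admin admin = Admin.create(props)) {
            List<NewTopic> topics = List.of(
                    new NewTopic("london.hammersmith.olympia.bigdatalondon.public.attendee", 3, (short) 3));
            admin.createTopics(topics).all().get(); // blocks until the broker confirms
        }
    }
}
```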
19
What is Spec Mesh?
Spec Mesh
[Diagram: Capture >> Provision >> Serve; the Spec drives Spec Mesh, which updates the Data Catalogue; resources scoped Public / Private / Restricted]
Serve
● Governance integrated via the underlying data platform, e.g. Kafka ACLs (sketched below)
● Data catalogue supports discovery (via tags) and access requests
● Data resources are structured using public, private and protected scope
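A minimal sketch of the governance integration: a prefixed ACL lets any authenticated principal read everything under a domain's public prefix, while private topics stay unreadable by default. The principal name and prefix convention are illustrative assumptions.

```java
// Sketch: expressing public scope as a prefixed Kafka ACL.
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.common.acl.AccessControlEntry;
import org.apache.kafka.common.acl.AclBinding;
import org.apache.kafka.common.acl.AclOperation;
import org.apache.kafka.common.acl.AclPermissionType;
import org.apache.kafka.common.resource.PatternType;
import org.apache.kafka.common.resource.ResourcePattern;
import org.apache.kafka.common.resource.ResourceType;

import java.util.List;
import java.util.Properties;

public final class ServeAcls {

    /** Grants READ on every topic under '<domain>.public.' to all authenticated users. */
    static AclBinding publicRead(String domainPrefix) {
        ResourcePattern pattern = new ResourcePattern(
                ResourceType.TOPIC, domainPrefix + ".public.", PatternType.PREFIXED);
        AccessControlEntry entry = new AccessControlEntry(
                "User:*", "*", AclOperation.READ, AclPermissionType.ALLOW);
        return new AclBinding(pattern, entry);
    }

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "test.mykafkacluster.org:8092");
        try (Admin admin = Admin.create(props)) {
            admin.createAcls(List.of(publicRead("london.hammersmith.olympia.bigdatalondon")))
                 .all().get();
        }
    }
}
```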
20
Spec Mesh conceptual
Spec Mesh
[Diagram: Spec Mesh conceptual view. Data Product A and Data Product B are each defined by a Spec; private topics stay internal to each product while public topics are exposed through serve nodes; the Data Catalogue indexes resources across the Public, Private and Restricted scopes.]
Developer Tooling
22
Why use Spec Mesh?
Developer Tooling
Developer-focused tooling: this is not another platform
Abstract away complexities: leverage layers of abstraction to create repeatable patterns
Unit testing specifications: Java-based tooling supports the development lifecycle (see the sketch below)
Tooling supports flexibility: using configurable guardrails
Increased extensibility: interoperate with existing Data Management tooling and modules
Icons: www.streamlinehq.com
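As a sketch of the "unit testing specifications" point, assuming JUnit 5 and jackson-dataformat-yaml on the classpath (the file name and assertions are illustrative): the spec itself becomes a tested artifact, so guardrails run on every build.

```java
// Sketch: a unit test that treats the specification itself as a testable artifact.
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.yaml.YAMLFactory;
import org.junit.jupiter.api.Test;

import java.io.File;

import static org.junit.jupiter.api.Assertions.assertTrue;

class SpecGuardrailTest {

    @Test
    void specIdDeclaresTheOwningDomain() throws Exception {
        ObjectMapper yaml = new ObjectMapper(new YAMLFactory());
        JsonNode spec = yaml.readTree(new File("bigdatalondon-api.yml")); // illustrative path

        assertTrue(spec.path("id").asText().startsWith("urn:london:"),
                "spec id must sit inside the organisation's domain hierarchy");
        assertTrue(spec.path("channels").has("public/attendee"),
                "the public attendee stream must be declared");
    }
}
```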
23
What do you get out of the box?
Developer Tooling
[Diagram: the Data Product Owner authors the Spec (Specification as Code); a Provisioning Workflow (Build, Apply) turns the JSON spec into Data Mesh resources (Data Products A, B, Y and Z) on the Data Provider (e.g. Kafka) across environments (e.g. dev, test, prod).]
24
Demo
Developer Tooling
Provisioning summary: Specification output (JSON) >> Kafka infrastructure >> environment resources
Beyond
Community driven
Help us shape the roadmap by
suggesting and contributing to
features!
Q1 - 2023
Extensions
● More Kafka extensions
supported (incl provisioning
support): Quotas, Storage &
ACLs for Public, Private,
Protected resources (basic
governance)
Data Catalogue
● Initial integration to existing
products to support discovery
and basic governance
Storage
● Storage resource mapping
Governance
● Delegation to a Governance API
where protected topics are
specified and requested
Proof of Value - Q4 22
Developer Experience
● Local testing support (Java &
Python) - Test containers via
gradle plugin
● SpecMesh gradle plugin to use
HCL Terraform & state mgmt
Domain model
● Domain modelling built into
Spec.Id field
Build Pipeline
● Execution pushes the APISpec to apply orthogonal resource allocation
Governance
● Manual governance of ‘public’
resources through Credential
creation (API Keys + ACLs)
Extensions
● Kafka support for Async API
Q2 - 2023
Data As A Product
● Data Product Open Registry;
simple Data Catalogue with
Discovery and tagging (Spec
resources contain tags - show
example)
Observability
● Topology view (derived from
specs showing
product/consume data flows
and ownership visualisation)
Bolt-ons
● Multi-region
● Cross resource
● SMT
● DQ integration + other simple
additions
25
Developer Tooling
Roadmap
Thank you
github.com/specmesh
@osodevops @avery_neil
Get involved!
Scan & thumbs-up the GitHub issue
27
What do you get out of the box?
Developer Tooling
[Diagram: as slide 23, with the Provisioning Workflow expanded to Plan, Build, Apply; the Data Provider and Data Product Owner drive the JSON spec into Data Products A, B, Y and Z.]
Roadmap - THIS HAS BEEN MAPPED TO A SLIDE
Proof of Value (0.1) (current status of Spec Mesh OS):
- Developer support for local testing (Java and Python) - i.e. TestContainers resources are provisioned using gradle plugin. The
Spec Mesh Gradle plugin uses Terraform (making it very open to using existing operators, scripts, i.e. CDK)
- Domain modelling is built into the Spec.Id field - and upon provisioning via the Gradle/Terraform plugin applies correct domain
knowledge to resources (see example)
- Build pipeline execution pushes the APISpec to apply orthogonal resource allocation (state mgmt via terraform) to provision
pipe-line environment resources (build pipe example showing DEV, UAT and PROD)
- Manual governance of ‘public’ resources through Credential creation (API Keys + ACLs)
- Kafka extensions for the Async API spec are being built
Roadmap after PoV (local testing and build pipeline)
1. More Kafka extensions supported (incl provisioning support): Quotas, Storage & ACLs for Public, Private, Protected resources
(basic governance)
2. Storage resource mapping
3. Initial Catalogue integration to existing products to support discovery and basic governance
4. Delegation to a Governance API where protected topics are specified and requested
5. Data Product Open Registry: simple Data Catalogue with Discovery and tagging (Spec resources contain tags - show example)
6. Topology view (derived from specs showing product/consume data flows and ownership visualisation)
7. Bolt ons: multi-region, cross resource, SMT, DQ integration + other simple additions
28
29
Data Model
/ london / borough / venue / event
/ london / hammersmith / olympia / bigdataldn
Map the domain model into the spec
- Now that the specification is in my branch / git repo
- I need to build a test around this / init-test my data product (including the topic/channel name)
- The unit test in this example is Big Data LDN (people coming to the event)
- Test container example (see the sketch below)
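A minimal sketch of the test-container example described above, assuming JUnit 5 and the org.testcontainers:kafka module; the image tag and topic settings are illustrative.

```java
// Sketch: local testing of the data product against a throwaway broker via Testcontainers.
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.junit.jupiter.api.Test;
import org.testcontainers.containers.KafkaContainer;
import org.testcontainers.utility.DockerImageName;

import java.util.List;
import java.util.Properties;

import static org.junit.jupiter.api.Assertions.assertTrue;

class AttendeeStreamTest {

    @Test
    void provisionsAttendeeTopicLocally() throws Exception {
        try (KafkaContainer kafka = new KafkaContainer(DockerImageName.parse("confluentinc/cp-kafka:7.4.0"))) {
            kafka.start();

            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, kafka.getBootstrapServers());

            try (Admin admin = Admin.create(props)) {
                String topic = "london.hammersmith.olympia.bigdatalondon.public.attendee";
                admin.createTopics(List.of(new NewTopic(topic, 1, (short) 1))).all().get();
                assertTrue(admin.listTopics().names().get().contains(topic));
            }
        }
    }
}
```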
30
31
State of Play
Established patterns of Data Mesh and central nervous system
Spec Mesh provides the Data-as-a-Product properties: discoverability, addressability, trustworthiness, self-describing semantics, interoperability / standards, and security
Build pipeline
- Leverage @IntegrationTest - it runs first and calls the SpecMeshProvisioner (Java -> Terraform) to parse the spec, build topic names and execute against the target cluster location (a minimal stand-in is sketched below)
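The real SpecMeshProvisioner drives Terraform; as a stand-in, here is a minimal, self-contained sketch of the same flow (parse the spec, build topic names, execute against the target cluster), assuming jackson-dataformat-yaml and kafka-clients. The file name, environment variable and topic settings are illustrative.

```java
// Sketch: a minimal stand-in for the provisioning step of the build pipeline.
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.yaml.YAMLFactory;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public final class MiniProvisioner {
    public static void main(String[] args) throws Exception {
        ObjectMapper yaml = new ObjectMapper(new YAMLFactory());
        JsonNode spec = yaml.readTree(new File("bigdatalondon-api.yml"));

        // spec id 'urn:a:b:c' -> domain prefix 'a.b.c'
        String prefix = spec.path("id").asText().replaceFirst("^urn:", "").replace(':', '.');

        // every declared channel becomes a fully-qualified topic under the domain prefix
        List<NewTopic> topics = new ArrayList<>();
        spec.path("channels").fieldNames().forEachRemaining(channel ->
                topics.add(new NewTopic(prefix + "." + channel.replace('/', '.'), 3, (short) 3)));

        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG,
                System.getenv("KAFKA_BOOTSTRAP_SERVERS")); // target cluster location
        try (Admin admin = Admin.create(props)) {
            admin.createTopics(topics).all().get();
        }
    }
}
```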
32
DEMO (MAYBE)
- All in Java
- Drive Terraform from Java
- Parse the spec from its GitHub location
- Provision Kafka topics from what is in the specification
- LET'S SEE THIS BAD BOY IN ACTION
33
stuff
The Mission
To make Data Mesh simple, open source and available to all - without lock-in and without complex tooling, by using an approach centered around 'specifications', existing tools and baking in a 'domain' model.
<insert triangular image here>
35
Set it up
Line up with client
Get it in the diary
Agree the theme
and key questions
Arrange lunch
Run the session
How it works
Why we should build a mesh using specs
36
Organisation
+ Real life insights into
technology problems
+ Uncover new
opportunities
+ Builds trust and good
working relationships
+ Brand awareness
Domain
+ Insights into emerging
technologies
+ Access to technology
experts
+ Technical focus
+ Professional
development for staff
+ Builds team morale
+ Free lunch!
Benefits
37
Build using AsyncAPI specifications
38
Outline
Talk Structure (30 mins), 15 slides
● We believe Data Mesh should not be a large, complex undertaking lasting multiple years and costing millions
● Target audience: developers
● We believe Data Mesh should be usable and free for everyone. It's not about complex custom-built tooling
● The foundations are opinionated around a specification-driven approach
● Problem statement (Why? Most Kafka environments we work in are a mess: a graveyard of topics with no uniform way of provisioning, so we cannot scale)
● Why do we think Data Mesh has anything to do with this? These are the principles, and this is how we are using them to build Kafka at scale
● Data ownership by domain
● Your domain model is defined by people in your organisation; they should be able to look at it and know where they sit and where to discover the events they need
● Data governance / data as a product
● Why we think the AsyncAPI matters: data as a product is represented by the API / specification for data > that allows us to model the domain and therefore the mesh
● Modelling London: here are the specs for parts of this
● Data model: Fulham
● You arriving today is an event
● Here is the spec for Big Data LDN: this is what it looks like when we use it > we are using this to test our code
● Local test > we need to get away from the graveyard
● Domain model for London
● Private events: buying a kebab
● Public events: me moving from Fulham to central
39
Data mesh: self-service, ubiquitous access to data
Goal: To make Data Mesh simple, open source and available to all - without lock-in and without complex tooling, by using an approach centered around 'specifications', existing tools and baking in a 'domain' model. (see image)
Scope - running OTS infrastructure like connectors and all things web services, microservices or processors… they are the problem of the client, not the Mesh itself
Problem statement (what):
- Everyone is touting data mesh and it's getting confusing: many bespoke and many commercial solutions… it should be accessible & simple, without lock-in
- We see Data Mesh as a natural formalisation of building the central nervous system (everything via streams of topic data)
- We have been building the CNS for many years across some of the largest companies in the world
- The CNS focuses on event streams, but during implementation includes most of the guiding principles of Data Mesh (ownership by domain, as a product, self-service, governance)
- Avoid vendor lock-in (it doesn't need to be all-encompassing and conflict with existing data infra)
- Most data-mesh solutions don't provide SDLC/development tools for test and infrastructure
Why:
- A specification-based approach provides uniformity (part of the solution)
- Fixes the Kafka resource mess/graveyard
- Testing support (developer SDLC, unlike most tooling)
- Repeatability -> provides guardrails: a reusable, reliable, simple and consistent approach for all
- Not just Kafka, but eventually Kinesis, Google Pub/Sub, Event Hubs, S3; from on-prem to cloud, multiple envs, clusters etc
40
How (can we do this with minimal effort by leveraging existing tools and principles):
- Simple: data ubiquity should have two forms, streams of events (Kafka streams or others) + storage (think S3), because not everything fits in a stream. This simplifies everything - don't boil the ocean
- Simple: specification-based, using AsyncAPI specs
- Simple and existing: data as an API in the CNS maps into an AsyncAPI spec + schemas
- Existing: by using the AsyncAPI spec 'id' hierarchy we can express domain ownership (example)
Data Mesh principle mapping
- Domain ownership is captured using the ID of the spec (example)
- Data as a product is reflected by the spec itself; the spec is used to provision resources (example) (more later)
- Data available everywhere (discoverable) - built using existing tools such as AWS Glue Data Catalog, Collibra and others
- Data governed everywhere - using streams and storage we build an integration to the data catalogue, and integrate access requests through the use of private, protected and public topics & storage while automating restrictive controls (i.e. ACLs in Kafka, principal mapping etc); see the scope sketch below
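A minimal sketch of how public/protected/private scoping might be derived from a resource's path, which is the hook for automating the restrictive controls mentioned above; the path convention is the illustrative one used earlier.

```java
// Sketch: deriving a resource's governance scope from its place in the hierarchy.
public final class Scope {
    enum Visibility { PUBLIC, PROTECTED, PRIVATE }

    static Visibility of(String topic) {
        if (topic.contains(".public.")) return Visibility.PUBLIC;
        if (topic.contains(".protected.")) return Visibility.PROTECTED;
        return Visibility.PRIVATE; // default to the most restrictive scope
    }

    public static void main(String[] args) {
        System.out.println(of("london.hammersmith.olympia.bigdatalondon.public.attendee"));  // PUBLIC
        System.out.println(of("london.hammersmith.olympia.bigdatalondon.internal.billing")); // PRIVATE
    }
}
```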
Worked example of the City of London → BDL
PoV (current status of Spec Mesh OS):
1. Developer support for local testing (Java and Python) - i.e. TestContainers resources are provisioned using gradle plugin. The
Spec Mesh Gradle plugin uses Terraform (making it very open to using existing operators, scripts, i.e. CDK)
2. Domain modelling is built into the Spec.Id field - and upon provisioning via the Gradle/Terraform plugin applies correct domain
knowledge to resources (see example)
3. Build pipeline execution pushes the APISpec to apply orthogonal resource allocation (state mgmt via terraform) to provision
pipe-line environment resources (build pipe example showing DEV, UAT and PROD)
4. Manual governance of ‘public’ resources through Credential creation (API Keys + ACLs)
Challenges! No one size fits all
41
There is not one tool to solve all the problems; a framework or suite of tools is needed.
+ Every company is different, there is no prescribed solution
+ Scaling your data products as more is captured
+ How to separate signals from noise?
+ Complexity of data is growing exponentially
+ Maintaining data quality whilst keeping it consistent
State of Play
42
Roadmap after PoV (local testing and build pipeline)
1. More Kafka extensions supported (incl provisioning support): Quotas, Storage & ACLs for Public, Private, Protected resources
(basic governance)
2. Initial Catalogue integration to existing products to support discovery and basic governance
3. Delegation to a Governance API where protected topics are specified and requested
4. Data Product Open Registry: simple Data Catalogue with Discovery and tagging (Spec resources contain tags - show example)
5. Topology view (derived from specs showing product/consume data flows and ownership visualisation)
6. Bolt ons: multi-region, cross resource, SMT, DQ integration + other simple additions
Sion Notes
● People can work more collaboratively using a decentralised, standardised approach
○ different levels of expertise in using data in each domain
● How do you mentor and empower each domain to contribute to the mesh?
○ set some enterprise-level standards so teams can educate themselves on things like team structure etc.
● Decentralised doesn't mean free-for-all
● Federated governance sits at the domain level. Example: the finance team needs to know which privacy rules apply
● NOT building data silos; it's more about people and processes, not the technology
○ interoperability and easy to navigate
● Not every domain knows how to build and manage a scalable API.
Pillars of Modern Data Architecture
● Scalable on demand
● Purpose-built data services
● Seamless data movement
● Unified governance
● Performant and cost-effective
43
Sion notes
Why Spec Mesh
● Encourage data-driven agility
● Support domain-local governance through lightweight specifications
● Isolate data resources with clear contracts
● Consider data-as-a-specification which can exist in any system
44
Weitere ähnliche Inhalte

Was ist angesagt?

Databricks Fundamentals
Databricks FundamentalsDatabricks Fundamentals
Databricks FundamentalsDalibor Wijas
 
Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerDatabricks
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache icebergAlluxio, Inc.
 
Data Mesh at CMC Markets: Past, Present and Future
Data Mesh at CMC Markets: Past, Present and FutureData Mesh at CMC Markets: Past, Present and Future
Data Mesh at CMC Markets: Past, Present and FutureLorenzo Nicora
 
Free Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseFree Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseDatabricks
 
Time to Talk about Data Mesh
Time to Talk about Data MeshTime to Talk about Data Mesh
Time to Talk about Data MeshLibbySchulze
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta LakeDatabricks
 
Denodo: Enabling a Data Mesh Architecture and Data Sharing Culture at Landsba...
Denodo: Enabling a Data Mesh Architecture and Data Sharing Culture at Landsba...Denodo: Enabling a Data Mesh Architecture and Data Sharing Culture at Landsba...
Denodo: Enabling a Data Mesh Architecture and Data Sharing Culture at Landsba...Denodo
 
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...DATAVERSITY
 
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...HostedbyConfluent
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureDatabricks
 
Building the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsBuilding the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsKhalid Salama
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0Databricks
 
Databricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureDatabricks
 
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...Flink Forward
 
Databricks + Snowflake: Catalyzing Data and AI Initiatives
Databricks + Snowflake: Catalyzing Data and AI InitiativesDatabricks + Snowflake: Catalyzing Data and AI Initiatives
Databricks + Snowflake: Catalyzing Data and AI InitiativesDatabricks
 
Modern Data architecture Design
Modern Data architecture DesignModern Data architecture Design
Modern Data architecture DesignKujambu Murugesan
 

Was ist angesagt? (20)

Databricks Fundamentals
Databricks FundamentalsDatabricks Fundamentals
Databricks Fundamentals
 
Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics Primer
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache iceberg
 
Data Mesh at CMC Markets: Past, Present and Future
Data Mesh at CMC Markets: Past, Present and FutureData Mesh at CMC Markets: Past, Present and Future
Data Mesh at CMC Markets: Past, Present and Future
 
Free Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseFree Training: How to Build a Lakehouse
Free Training: How to Build a Lakehouse
 
Time to Talk about Data Mesh
Time to Talk about Data MeshTime to Talk about Data Mesh
Time to Talk about Data Mesh
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta Lake
 
Denodo: Enabling a Data Mesh Architecture and Data Sharing Culture at Landsba...
Denodo: Enabling a Data Mesh Architecture and Data Sharing Culture at Landsba...Denodo: Enabling a Data Mesh Architecture and Data Sharing Culture at Landsba...
Denodo: Enabling a Data Mesh Architecture and Data Sharing Culture at Landsba...
 
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
 
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse Architecture
 
Building the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsBuilding the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake Analytics
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0
 
Databricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks Delta Lake and Its Benefits
Databricks Delta Lake and Its Benefits
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
 
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
 
Databricks + Snowflake: Catalyzing Data and AI Initiatives
Databricks + Snowflake: Catalyzing Data and AI InitiativesDatabricks + Snowflake: Catalyzing Data and AI Initiatives
Databricks + Snowflake: Catalyzing Data and AI Initiatives
 
Modern Data architecture Design
Modern Data architecture DesignModern Data architecture Design
Modern Data architecture Design
 

Ähnlich wie Enterprise guide to building a Data Mesh

The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMeshThe Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMeshIanFurlong4
 
DEVNET-1140 InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...
DEVNET-1140	InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...DEVNET-1140	InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...
DEVNET-1140 InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...Cisco DevNet
 
DICE & Cloudify – Quality Big Data Made Easy
DICE & Cloudify – Quality Big Data Made EasyDICE & Cloudify – Quality Big Data Made Easy
DICE & Cloudify – Quality Big Data Made EasyCloudify Community
 
DataFinder: A Python Application for Scientific Data Management
DataFinder: A Python Application for Scientific Data ManagementDataFinder: A Python Application for Scientific Data Management
DataFinder: A Python Application for Scientific Data ManagementAndreas Schreiber
 
Organizing the Data Chaos of Scientists
Organizing the Data Chaos of ScientistsOrganizing the Data Chaos of Scientists
Organizing the Data Chaos of ScientistsAndreas Schreiber
 
SnapLogic- iPaaS (Elastic Integration Cloud and Data Integration)
SnapLogic- iPaaS (Elastic Integration Cloud and Data Integration) SnapLogic- iPaaS (Elastic Integration Cloud and Data Integration)
SnapLogic- iPaaS (Elastic Integration Cloud and Data Integration) Surendar S
 
Solving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache ArrowSolving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache ArrowWes McKinney
 
Apache Arrow: Open Source Standard Becomes an Enterprise Necessity
Apache Arrow: Open Source Standard Becomes an Enterprise NecessityApache Arrow: Open Source Standard Becomes an Enterprise Necessity
Apache Arrow: Open Source Standard Becomes an Enterprise NecessityWes McKinney
 
How to govern and secure a Data Mesh?
How to govern and secure a Data Mesh?How to govern and secure a Data Mesh?
How to govern and secure a Data Mesh?confluent
 
A Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate DataA Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate DataRobert Grossman
 
Feature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine LearningFeature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine LearningProvectus
 
C19013010 the tutorial to build shared ai services session 2
C19013010 the tutorial to build shared ai services session 2C19013010 the tutorial to build shared ai services session 2
C19013010 the tutorial to build shared ai services session 2Bill Liu
 
Session 8 - Creating Data Processing Services | Train the Trainers Program
Session 8 - Creating Data Processing Services | Train the Trainers ProgramSession 8 - Creating Data Processing Services | Train the Trainers Program
Session 8 - Creating Data Processing Services | Train the Trainers ProgramFIWARE
 
Maximizing Data Lake ROI with Data Virtualization: A Technical Demonstration
Maximizing Data Lake ROI with Data Virtualization: A Technical DemonstrationMaximizing Data Lake ROI with Data Virtualization: A Technical Demonstration
Maximizing Data Lake ROI with Data Virtualization: A Technical DemonstrationDenodo
 
The Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and StreamingThe Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and StreamingTimothy Spann
 
Technology Overview
Technology OverviewTechnology Overview
Technology OverviewLiran Zelkha
 
Data Science with the Help of Metadata
Data Science with the Help of MetadataData Science with the Help of Metadata
Data Science with the Help of MetadataJim Dowling
 

Ähnlich wie Enterprise guide to building a Data Mesh (20)

The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMeshThe Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
 
DEVNET-1140 InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...
DEVNET-1140	InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...DEVNET-1140	InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...
DEVNET-1140 InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...
 
DICE & Cloudify – Quality Big Data Made Easy
DICE & Cloudify – Quality Big Data Made EasyDICE & Cloudify – Quality Big Data Made Easy
DICE & Cloudify – Quality Big Data Made Easy
 
DataFinder: A Python Application for Scientific Data Management
DataFinder: A Python Application for Scientific Data ManagementDataFinder: A Python Application for Scientific Data Management
DataFinder: A Python Application for Scientific Data Management
 
Organizing the Data Chaos of Scientists
Organizing the Data Chaos of ScientistsOrganizing the Data Chaos of Scientists
Organizing the Data Chaos of Scientists
 
SnapLogic- iPaaS (Elastic Integration Cloud and Data Integration)
SnapLogic- iPaaS (Elastic Integration Cloud and Data Integration) SnapLogic- iPaaS (Elastic Integration Cloud and Data Integration)
SnapLogic- iPaaS (Elastic Integration Cloud and Data Integration)
 
NextGenML
NextGenML NextGenML
NextGenML
 
Solving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache ArrowSolving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache Arrow
 
Apache Arrow: Open Source Standard Becomes an Enterprise Necessity
Apache Arrow: Open Source Standard Becomes an Enterprise NecessityApache Arrow: Open Source Standard Becomes an Enterprise Necessity
Apache Arrow: Open Source Standard Becomes an Enterprise Necessity
 
How to govern and secure a Data Mesh?
How to govern and secure a Data Mesh?How to govern and secure a Data Mesh?
How to govern and secure a Data Mesh?
 
A Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate DataA Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate Data
 
Feature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine LearningFeature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine Learning
 
C19013010 the tutorial to build shared ai services session 2
C19013010 the tutorial to build shared ai services session 2C19013010 the tutorial to build shared ai services session 2
C19013010 the tutorial to build shared ai services session 2
 
Session 8 - Creating Data Processing Services | Train the Trainers Program
Session 8 - Creating Data Processing Services | Train the Trainers ProgramSession 8 - Creating Data Processing Services | Train the Trainers Program
Session 8 - Creating Data Processing Services | Train the Trainers Program
 
Maximizing Data Lake ROI with Data Virtualization: A Technical Demonstration
Maximizing Data Lake ROI with Data Virtualization: A Technical DemonstrationMaximizing Data Lake ROI with Data Virtualization: A Technical Demonstration
Maximizing Data Lake ROI with Data Virtualization: A Technical Demonstration
 
The Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and StreamingThe Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and Streaming
 
Scale By The Bay | 2020 | Gimel
Scale By The Bay | 2020 | GimelScale By The Bay | 2020 | Gimel
Scale By The Bay | 2020 | Gimel
 
Technology Overview
Technology OverviewTechnology Overview
Technology Overview
 
Data Science with the Help of Metadata
Data Science with the Help of MetadataData Science with the Help of Metadata
Data Science with the Help of Metadata
 
Apache Eagle: Secure Hadoop in Real Time
Apache Eagle: Secure Hadoop in Real TimeApache Eagle: Secure Hadoop in Real Time
Apache Eagle: Secure Hadoop in Real Time
 

Kürzlich hochgeladen

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 

Kürzlich hochgeladen (20)

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 

Enterprise guide to building a Data Mesh

  • 1. Specification-Driven Data Mesh Sion Smith CTO - oso.sh Neil Avery CTO - liquidlabs.com Introducing The Enterprise Guide to Building a Data Mesh
  • 2. About us 2 Introduction Sion Smith CTO, OSO 15 years consulting experience solving complex problems with various cloud and programming technologies Neil Avery CTO, Liquidlabs Distributed systems, previously Confluent, Luxoft, startups and others Emerging Technology
  • 3. Current State of Play The Spec Mesh Way Agenda 3 Developer Tooling ● The Gartner Hype Cycle ● Foundations of Data Mesh ● Evolution of central nervous system ● Domain mapping out of the box ● Example specification ● Data Mesh lifecycle management ● Features of Spec Mesh ● Screenshots ● Developer roadmap
  • 4. Our Data Mesh Journey
  • 5. Is the Hype Real? 5 State of Play Gartner, Hype Cycle for Data Management, 2022, Donald Feinberg, Philip Russom, Nina Showell, 30 June 2022 https://www.denodo.com/en/document/analyst-report/2022-gartner-hype-cycle-data-management Data Hub Strategy Data Integration Tools Data Lakes Data Engineering Data Ops Data Mesh Expectations Time Innovation trigger Peak of inflated expectations Trough of disillusionment Slope of enlightenment Plateau of productivity Gartner Hype Cycle for Data Management - 2022
  • 6. Four pillars of Data Mesh 6 State of Play Data as a Product Self-serve data infrastructure as a platform Federated computational Governance Domain-oriented decentralised data ownership & architecture https://martinfowler.com/articles/data-mesh-principles.html
  • 7. Is Data Mesh really new? 7 State of Play + Data mesh incremental evolution of style of architecture we have been building for several years for event streaming + A mature data streaming system adopts a central nervous system + Can we build a data mesh around event streaming principles? + A central nervous system models topics with a domain structure and federated computational governance Introducing: + An agreement / contract for data mesh using a specification Central data team Domain teams Stakeholders
  • 8. So where does Data Mesh fit? 8 State of Play Investment & Time Value 4 5 Early Interest Central Nervous System Mission critical, but disparate LOBs Identify a project Mission-critical, connected LOBs Projects Platform Single solution Scalable pipeline Pub/Sub Clusters of reuse, real-time analytics Platform effect: reuse of data, efficiencies of scale Enterprise Data-as-a Product. Event-driven architecture 3 2 1 This is used throughout large enterprise in production Confluent maturity model for event driven architecture https://www.confluent.io/resources/5-stages-streaming-platform-adoption/
  • 9. Data Mesh should not boil the ocean 9
  • 11. CNS Patterns applied to Data Mesh 1. Events and storage comprise data platform fundamentals required to build almost anything 2. Events and storage already exist and that won’t change 3. Organise data resources hierarchically 4. Enforcing a domain model is need to control complexity and cope with scale 5. Orgs have a data quality function 6. Orgs have a data catalogue function 7. Registry models are required to model data (and support data evolution etc) 8. Most organisations suffer from inconsistent mechanisms for data-as-an-api 11
  • 12. 12 Spec Mesh Supporting the pillars of Data Mesh Features Domain ownership Self-serve Data as a Product Federated Computational Governance Spec driven (async-api spec) ✔ ✔ SDLC plugins (unit-test, integration-test) ✔ SDLC plugins (provision - terraform) ✔ ✔ ✔ 3rd party data catalogue ✔ ✔ ✔
  • 13. Big Data London is our data product Spec Mesh 13
  • 14. 14 Data Model Spec Mesh / london / borough / venue / event / retail / / transport / / accommodation / / services / / london / hammersmith / olympia / bigdatalondon / public / attendee / london / hammersmith / transport / public / tube / london / heathrow / transport / public / airport / london / hammersmith / olympia / bigdatalondon / vendor / terra / public / visitor / london / hammersmith / olympia / bigdatalondon / retailer / subway / public / purchase
  • 15. asyncapi: '2.4.0' id: 'urn:london:hammersmith:olympia:bigdatalondon' Info: title: BigDataLondon API version: '1.0.0' description: Simple model of BigDataLondon as a Data Product Servers: test: url: test.mykafkacluster.org:8092 protocol: kafka-secure channels: public/attendee: Publish: summary: Humans arriving Message: name: Human "tags": [ "name": "human", "name": "big data london"] Payload: type: object Properties: Id: type: integer minimum: 0 description: Id of the human Age: type: integer 15 Spec Mesh
  • 16. 16 Spec Mesh Data Mesh as Code: Spec Mesh Capture Provision Serve >> >>
  • 17. 17 Spec Mesh Capture ● Create repo from git template ● Async-api spec ● Apply the domain model/structure ● Specify data event streams & storage ● Model the data entities (schemas) ● Apply tags to data entities ● Permission data as public/protected/private ● Write tests and build your product Provision Spec Data Catalogue Spec Mesh Capture Provision Serve What is Spec Mesh?
  • 18. Provision 18 What is Spec Mesh? Spec Mesh Provision Spec Data Catalogue Spec Mesh Capture Provision Serve ● Pipeline driven provisioning of data product based on Specification ● Includes domain structure, governance and permissions ● Data catalogue automatically updated (including tags) ● Data schema’s published to registry
  • 19. 19 What is Spec Mesh? Spec Mesh Provision Spec Data Catalogue Spec Mesh Capture Provision Serve Public Private Restricted Serve ● Governance integrated via underlying Data Platform. I.e. Kafka ACLs ● Data catalogue supports discovery (via tags), and access requests ● Data resources are structured using public, private and protected scope
  • 20. 20 Spec Mesh conceptual Spec Mesh Data Catalogue Data Product A Spec Data Product B Spec P r i v a t e t o p i c s Public topics Serve Nodes Public Private Restricted
  • 22. 22 Why use Spec Mesh? Developer Tooling Developer focused tooling:. This is not some platform Abstract away complexities: Leverage layers of abstraction to create repeatable patterns Unit testing specifications: Java based tooling supports development lifecycle Tooling support flexibility: Using configurable guardrails Increased extensibility: Interoperate with existing Data Management tooling and modules Icons: www.streamlinehq.com
  • 23. 23 What do you get out of the box? Developer Tooling Spec Specification as Code [ ] Data Provider Provisioning Workflow Data Mesh Data Product A Data Product B Data Product Y Data Product Z Build Apply JSON E.g. Kafka E.g. dev, test, prod Data Product Owner
  • 24. 24 Demo Developer Tooling Provisioning summary Specification output (JSON) Kafka Infrastructure Environment Resources
  • 25. Add title Beyond Community driven Help us shape the roadmap by suggesting and contributing to features! Q1 - 2023 Extensions ● More Kafka extensions supported (incl provisioning support): Quotas, Storage & ACLs for Public, Private, Protected resources (basic governance) Data Catalogue ● Initial integration to existing products to support discovery and basic governance Storage ● Storage resource mapping Governance ● Delegation to a Governance API where protected topics are specified and requested Proof of Value - Q4 22 Developer Experience ● Local testing support (Java & Python) - Test containers via gradle plugin ● SpecMesh gradle plugin to use HCL Terraform & state mgmt Domain model ● Domain modelling built into Spec.Id field Build Pipeline ● Executions pushes the APISpec to apply orthogonal resource allocation Governance ● Manual governance of ‘public’ resources through Credential creation (API Keys + ACLs) Extensions ● Kafka support for Async API Q2 - 2023 Data As A Product ● Data Product Open Registry; simple Data Catalogue with Discovery and tagging (Spec resources contain tags - show example) Observability ● Topology view (derived from specs showing product/consume data flows and ownership visualisation) Bolt-ons ● Multi-region ● Cross resource ● SMT ● DQ integration + other simple additions 25 Developer Tooling Roadmap
  • 26. Thank you github.com/specmesh @osodevops @avery_neil Get involved! Scan & thumbs ups on the Github issue
  • 27. 27 What do you get out of the box? Developer Tooling Spec Specification as Code Data Provider Data Product Owner Provisioning Workflow Data Mesh Data Product A Data Product B Data Product Y Data Product Z Plan Build Apply JSON
  • 28. Roadmap - THIS HAS BEEN MAPPED TO A SLIDE Proof of Value (0.1) (current status of Spec Mesh OS): - Developer support for local testing (Java and Python) - i.e. TestContainers resources are provisioned using gradle plugin. The Spec Mesh Gradle plugin uses Terraform (making it very open to using existing operators, scripts, i.e. CDK) - Domain modelling is built into the Spec.Id field - and upon provisioning via the Gradle/Terraform plugin applies correct domain knowledge to resources (see example) - Build pipeline execution pushes the APISpec to apply orthogonal resource allocation (state mgmt via terraform) to provision pipe-line environment resources (build pipe example showing DEV, UAT and PROD) - Manual governance of ‘public’ resources through Credential creation (API Keys + ACLs) - Kafka extensions for Async API spec is being built Roadmap after PoV (local testing and build pipeline) 1. More Kafka extensions supported (incl provisioning support): Quotas, Storage & ACLs for Public, Private, Protected resources (basic governance) 2. Storage resource mapping 3. Initial Catalogue integration to existing products to support discovery and basic governance 4. Delegation to a Governance API where protected topics are specified and requested 5. Data Product Open Registry: simple Data Catalogue with Discovery and tagging (Spec resources contain tags - show example) 6. Topology view (derived from specs showing product/consume data flows and ownership visualisation) 7. Bolt ons: multi-region, cross resource, SMT, DQ integration + other simple additions 28
  • 29. 29 Data Model Data Model / london / borough / venue / event / / / / london / hammersmith / olympia / bigdataldn
  • 30. Map domain model into spec - Now that specification in my branch / git repo - I need to build a test around this / init test my data product (include the topic/channel name) - The unit test in this example is Big Data LDN (People coming to the event) - Test container example 30
  • 31. 31 State of Play Established patterns of Data Mesh and central nervous system Discoverability Spec Mesh provides Addressability Trustworthy Self-describing semantics Interoperable / standards Security Data as a Product
  • 32. Build pipeline - Leverage @IntegrationTest - gets run first and calls on the SpecMeshProsivioner (Java-> terraform) to parse the spec, build topic names and execute against target cluster location 32
  • 33. DEMO MAYBE - ALL IN JAVA - Drive TF from Java - Parse the spec from github location - Provision kafka topics from what is in the specification - - LETS SEE THIS BAD BOY IN ACTION 33
  • 34. stuff
  • 35. The Mission To make Data Mesh simple, Open Source and available to all - without lockin, without complex tooling by using an approach centered around ‘specifications’, existing tools and baking in a ‘domain’ model. <insert triangular image here> 35
  • 36. Why we should build a mesh using specs 36
  • 37. 37 Build using AsyncAPI specifications
  • 38. 38 Outline Talk Structure (30 mins) 15 slides
● We believe data mesh should not be a large, complex undertaking lasting multiple years and costing millions
● Target audience: developers
● We believe data mesh should be usable and free for everyone; it's not about complex custom-built tooling
● The foundations are opinionated around a specification-driven approach
● Problem statement (Why? Most Kafka environments we work in are a mess - a graveyard of topics with no uniform way of provisioning, so we cannot scale)
● Why do we think data mesh has anything to do with this? These are the principles, and this is how we are using them to build Kafka at scale
● Data ownership by domain
● Your domain model is defined by people in your organisation; they should be able to look at it and know where they sit and where to discover the events they need
● Data governance / data as a product
● Why we think AsyncAPI matters: data as a product is represented by the API / specification for data, which allows us to model the domain and therefore the ownership
● Modelling London: here are the specs for parts of it
● Data model: Fulham
● You coming in today is an event
● Here is the spec for Big Data LDN: this is what it looks like when we use it, and we are using it to test our code
● Local test: we need to get away from the graveyard
● Domain model for London
● Private events: buying a kebab
● Public events: me moving from Fulham to central
  • 39. 39 Data mesh: self-service, ubiquitous access to data
Goal: To make Data Mesh simple, Open Source and available to all - without lock-in and without complex tooling - by using an approach centred around 'specifications', existing tools and baking in a 'domain' model.
Scope - running off-the-shelf infrastructure such as connectors, web services, microservices or processors is the problem of the client, not the Mesh itself
Problem statement (what):
- Everyone is touting data mesh and it's getting confusing; there are many bespoke and commercial solutions, but it should be accessible and simple, without lock-in
- We see Data Mesh as a natural formalisation of building the central nervous system (everything via streams of topic data)
- We have been building the CNS for many years across some of the largest companies in the world
- The CNS focuses on event streams, but in implementation it covers most of the guiding principles of Data Mesh (ownership by domain, data as a product, self-service, governance)
- Avoid vendor lock-in (it doesn't need to be all-encompassing or conflict with existing data infrastructure)
- Most data mesh solutions don't provide SDLC/development tools for testing and infrastructure
Why:
- A specification-based approach provides uniformity (part of the solution)
- Fixes the Kafka resource mess/graveyard
- Testing support (developer SDLC, unlike most tooling)
- Repeatability -> provides guard-rails; a reusable, reliable, simple and consistent approach for all
- Not just Kafka, but eventually Kinesis, Google Pub/Sub, Event Hubs, S3, from on-prem to cloud, multiple environments, clusters etc.
  • 40. 40 How (can we do this with minimal effort by leveraging existing tools and principles?):
- Simple: Data ubiquity should have 2 forms, streams of events (Kafka streams or others) + storage (think S3), because not everything fits in a stream. This simplifies everything - don't boil the ocean
- Simple: Specification-based, using AsyncAPI specs
- Simple and existing: Data as an API in the CNS maps onto an AsyncAPI spec + schemas
- Existing: By using the AsyncAPI spec 'id' hierarchy we can express domain ownership (example)
Data Mesh principle mapping
- Domain ownership is captured using the id of the spec (example)
- Data as a product is reflected by the spec itself; the spec is used to provision resources (example - more later)
- Data available everywhere (discoverable) - built using existing tools such as the AWS Glue Data Catalog, Collibra and others
- Data governed everywhere - using streams and storage we build an integration to the data catalogue, and integrate access requests through the use of private, protected and public topics & storage while automating restrictive controls (i.e. ACLs in Kafka, principal mapping etc.), as sketched below
Worked example: the City of London -> Big Data LDN
PoV (current status of Spec Mesh OS):
1. Developer support for local testing (Java and Python) - i.e. Testcontainers resources are provisioned using the Gradle plugin. The Spec Mesh Gradle plugin uses Terraform (making it very open to using existing operators, scripts, i.e. CDK)
2. Domain modelling is built into the spec id field - upon provisioning, the Gradle/Terraform plugin applies the correct domain knowledge to resources (see example)
3. Build pipeline execution pushes the API spec to apply orthogonal resource allocation (state management via Terraform) to provision pipeline environment resources (build pipe example showing DEV, UAT and PROD)
4. Manual governance of 'public' resources through credential creation (API keys + ACLs)
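To make the 'restrictive controls' point concrete: public topics under a domain prefix could be opened to all authenticated principals with a single prefixed ACL, while private and protected topics get narrower bindings. This is a sketch using Kafka's AdminClient under stated assumptions - the prefix, principal and wildcard host are illustrative, not SpecMesh's actual governance implementation.

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.common.acl.AccessControlEntry;
import org.apache.kafka.common.acl.AclBinding;
import org.apache.kafka.common.acl.AclOperation;
import org.apache.kafka.common.acl.AclPermissionType;
import org.apache.kafka.common.resource.PatternType;
import org.apache.kafka.common.resource.ResourcePattern;
import org.apache.kafka.common.resource.ResourceType;

import java.util.List;
import java.util.Map;

public final class PublicTopicAclSketch {
    public static void main(String[] args) throws Exception {
        try (AdminClient admin = AdminClient.create(
                Map.of(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, args[0]))) {
            // Everything under the domain's '_public' prefix becomes readable by any principal;
            // '_private' and '_protected' topics would receive narrower bindings instead
            AclBinding readPublic = new AclBinding(
                    new ResourcePattern(ResourceType.TOPIC,
                            "london.hammersmith.olympia.bigdataldn._public",
                            PatternType.PREFIXED),
                    new AccessControlEntry("User:*", "*",
                            AclOperation.READ, AclPermissionType.ALLOW));
            admin.createAcls(List.of(readPublic)).all().get();
        }
    }
}
```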
  • 41. Challenges! No one size fits all 41 There is not one tool that solves all the problems; a framework or suite of tools is needed. + Every company is different; there is no prescribed solution + Scaling your data products as more data is captured + How do you separate signal from noise? + The complexity of data is growing exponentially + Maintaining data quality while keeping it consistent State of Play
  • 43. Sion Notes ● People can work more collaboratively using a decentralised, standardised approach ○ there are different levels of expertise in using data in each domain ● How do you mentor and empower each domain to contribute to the mesh? ○ set some enterprise-level standards so teams can educate themselves on things like team structure etc. ● Decentralised doesn't mean a free-for-all ● Federated governance sits at the domain; for example, the finance team needs to know what privacy rules apply ● We are NOT building data silos; it's more about people and processes than the technology ○ interoperable and easy to navigate ● Not every domain knows how to build and manage a scalable API. Pillars of Modern Data Architecture ● Scalable on demand ● Purpose-built data services ● Seamless data movement ● Unified governance ● Performant and cost-effective 43
  • 44. Sion notes Why Spec Mesh ● Encourage data-driven agility ● Support domain-local governance through lightweight specifications ● Isolate data resources with clear contracts ● Consider data-as-a-specification which can exist in any system 44