Data Platform
Marquez: A Metadata Service for Data Abstraction, Data Lineage, and Event-based Triggers
DataEngConf NYC ‘18
Hey!
I’m Willy Lulciuc
Data Engineer
Marquez Team, Data Platform
@wslulciuc
01 Space
02 Community
03 Services
268,000 members globally
287 physical locations
72 cities
23 countries
AGENDA
01 Why metadata?
02 Room bookings pipeline (naïve)
03 Intro to Marquez
04 Room bookings pipeline (take 2)
05 Future work
01 Why metadata?
Why manage and utilize metadata?
Data lineage
● Add context to data
Democratize
● Self-service data culture
Data quality
● Build trust in data
… creating a healthy data ecosystem
A healthy data ecosystem
Freedom
● Experiment
● Flexible
● Self-sufficient
Accountability
● Cost
● Trust
Self-service
● Discover
● Explore
● Global context
Let’s get booking!
01 Location + floor
02 Open time slot
03 Duration
04 Confirm
Which location has the most bookings?
Set[RoomBooking] → LocationID
02 Room bookings pipeline (naïve)
Example: Room bookings pipeline (naïve)
Requirements
● Read room bookings
● Sum room bookings by location
● Write top location
● Run once an hour
Start → Read → Sum → Write
S3 (.csv) → Postgres
Input, S3 (.csv):
LOCATION,TS,ROOM
b648485,1541501885,9
b940314,1541624285,2
b648485,1541710685,4
Output, Postgres:
ID | LOCATION | TS | BOOKINGS
1 | b648485 | 1541721600 | 2
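The naïve job's Read → Sum → Write steps can be sketched in a few lines of Python. This is a minimal illustration, not the job from the talk: it reads the sample rows above from an in-memory string rather than from S3, and it returns the output row instead of writing it to Postgres (where the ID would presumably be assigned by the database). The helper names are hypothetical.

```python
import csv
import io
from collections import Counter

# Sample input copied from the slide: one row per booking (LOCATION, TS in epoch seconds, ROOM)
SAMPLE_CSV = """LOCATION,TS,ROOM
b648485,1541501885,9
b940314,1541624285,2
b648485,1541710685,4
"""


def read_bookings(csv_text):
    """Read: parse room bookings from CSV text into dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def sum_by_location(bookings):
    """Sum: count bookings per location."""
    return Counter(row["LOCATION"] for row in bookings)


def top_location(counts, hour_ts):
    """Build the output row for the location with the most bookings in this hourly run."""
    location, total = counts.most_common(1)[0]
    return {"location": location, "ts": hour_ts, "bookings": total}


# Hypothetical hourly run, stamped with the timestamp shown in the Postgres output
row = top_location(sum_by_location(read_bookings(SAMPLE_CSV)), 1541721600)
print(row)  # {'location': 'b648485', 'ts': 1541721600, 'bookings': 2}
```

Ties in `Counter.most_common` are broken by first-seen order, which a production job would probably want to make explicit.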
Scheduler → Room Bookings workflow (Job)
Upstream: S3 (Archival) → Downstream: Postgres (Top Locations)
We’re live!
Problems
● What’s our job’s input dataset?
● Does the dataset have an owner?
● How often is the dataset updated?
Curses, our job’s failing …
Oh, might be our input data!
S3 (.csv):
LOCATION,TS,ROOM
b648485,1541501885,9A
b940314,1541624285,2G
b648485,1541710685,4F
The ROOM field is now of type string, but our job expects an int!
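A job can catch this kind of breakage before it writes anything by checking each value against the types it expects. The sketch below hard-codes the expected types (LOCATION as a string, TS and ROOM as integers) instead of reading them from a schema registry, and assumes every expected column is present in the header; `schema_violations` is a hypothetical helper.

```python
import csv
import io

# Assumed expected types for the room bookings CSV (ROOM was an int before the change)
EXPECTED_SCHEMA = {"LOCATION": str, "TS": int, "ROOM": int}

CHANGED_CSV = """LOCATION,TS,ROOM
b648485,1541501885,9A
b940314,1541624285,2G
b648485,1541710685,4F
"""


def schema_violations(csv_text, schema):
    """Return (line_number, column, value) for each value that does not parse as its expected type."""
    violations = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for column, expected_type in schema.items():
            try:
                expected_type(row[column])
            except (ValueError, TypeError):  # TypeError covers a missing value (None)
                violations.append((line_no, column, row[column]))
    return violations


print(schema_violations(CHANGED_CSV, EXPECTED_SCHEMA))
# [(2, 'ROOM', '9A'), (3, 'ROOM', '2G'), (4, 'ROOM', '4F')]
```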
Problems
● What’s our job’s input dataset?
● Does the dataset have an owner?
● How often is the dataset updated?
● Coordinate changes
Ugh, gaps in output data
[Figure: hourly time partitions 00h to 09h, up to the latest partition]
Backfills!
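Finding what to backfill amounts to listing the hourly partitions between the first expected hour and the latest one that have no output yet. Which partitions exist below is made up for illustration; in practice that set would come from the output table or from run metadata.

```python
from datetime import datetime, timedelta


def missing_partitions(start, latest, present):
    """Return every hourly partition from start through latest (inclusive) that has no output yet."""
    missing = []
    hour = start
    while hour <= latest:
        if hour not in present:
            missing.append(hour)
        hour += timedelta(hours=1)
    return missing


# Hypothetical state: output exists for some hours between 00h and 09h, but not all of them
day = datetime(2018, 11, 8)
present = {day + timedelta(hours=h) for h in (0, 1, 2, 5, 6, 9)}
gaps = missing_partitions(day, day + timedelta(hours=9), present)
print([g.strftime("%Hh") for g in gaps])  # ['03h', '04h', '07h', '08h']
```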
Problems
● What’s our job’s input dataset?
● Does the dataset have an owner?
● How often is the dataset updated?
● Coordinate changes
● Figure out backfills
What we have so far …
Scheduler → Room Bookings workflow (Job): S3 → Postgres
… writing a job shouldn’t be this hard!
03 Intro to Marquez
Metadata Service
● Centralized metadata management
○ Jobs
○ Datasets
● Modular
○ Data discovery
○ Data health
○ Data triggers
Marquez: Design
Clients (JVM), Clients (Python) → REST API → Marquez (modules: Search, Health, Triggers)
Marquez: Data discovery
Module: Search
● Unified search
● Documentation
○ Owner
○ Schema
○ Datasource
Search: “room bo” (filters: All, Datasets, Tags)
● Room Bookings (SF) [S3]: All San Francisco room bookings (created: Jul. 8, 2018)
● Room Booking Metrics (GLBL): Global room booking metrics (created: Feb. 15, 2010)
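At its simplest, unified search is a case-insensitive match of the query against dataset names, descriptions, and tags. The catalog entries come from the search mock-up; the `DatasetEntry` type and the placement of the S3 tag are assumptions, and a real service would likely use a search index rather than a linear scan.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatasetEntry:
    name: str
    description: str
    created: str
    tags: List[str] = field(default_factory=list)


# Entries from the search mock-up; the "S3" tag placement is an assumption
CATALOG = [
    DatasetEntry("Room Bookings (SF)", "All San Francisco room bookings", "2018-07-08", ["S3"]),
    DatasetEntry("Room Booking Metrics (GLBL)", "Global room booking metrics", "2010-02-15"),
]


def search(catalog, query):
    """Return entries whose name, description, or any tag contains the query (case-insensitive)."""
    q = query.lower()
    return [
        entry
        for entry in catalog
        if q in entry.name.lower()
        or q in entry.description.lower()
        or any(q in tag.lower() for tag in entry.tags)
    ]


print([entry.name for entry in search(CATALOG, "room bo")])
# ['Room Bookings (SF)', 'Room Booking Metrics (GLBL)']
```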
Marquez: Data health
Module: Health
● Owner
○ Team / project
● Schema
● Location
● Description
● Size
○ Growth over time
○ Number of records
● Lineage
[Diagram: data graph of datasets and jobs, with the lineage between them]
Lineage queries!
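A lineage query over the data graph can be sketched as a breadth-first walk from a node back through everything that feeds it. The graph below, including the `weekly_report` job and dataset, is hypothetical; datasets and jobs are simply distinguished by a name prefix.

```python
from collections import deque

# Hypothetical data graph: each key feeds the nodes in its list ("ds:" = dataset, "job:" = job)
EDGES = {
    "ds:room_bookings_raw": ["job:room_bookings"],
    "job:room_bookings": ["ds:top_locations"],
    "ds:top_locations": ["job:weekly_report"],
    "job:weekly_report": ["ds:weekly_report"],
}


def upstream(edges, node):
    """Return every dataset and job that transitively feeds into `node` (breadth-first search)."""
    parents = {}
    for src, dsts in edges.items():
        for dst in dsts:
            parents.setdefault(dst, []).append(src)
    seen, queue = set(), deque([node])
    while queue:
        for parent in parents.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(upstream(EDGES, "ds:top_locations")))
# ['ds:room_bookings_raw', 'job:room_bookings']
```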
Marquez: Data triggers
Module: Triggers
● Timely processing of data
○ No polling!
● Reduce manual handling of backfills
● Reduce production of bad data
○ Incomplete data
○ Low-quality data
Upstream failure detection! (job failure in the data graph)
Affected paths! (job failure in the data graph)
Cascading triggers! (datasets, jobs, and triggers)
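Affected paths are the mirror image of a lineage query: walk downstream from the failed job. The graph and node names are hypothetical. `jobs_to_hold` lists the downstream jobs that a trigger system might hold back rather than run on incomplete data, and could re-trigger once the failure is fixed; that policy is an assumption, not a description of Marquez's implementation.

```python
from collections import deque

# Hypothetical data graph: each key feeds the nodes in its list ("ds:" = dataset, "job:" = job)
EDGES = {
    "ds:room_bookings_raw": ["job:room_bookings"],
    "job:room_bookings": ["ds:top_locations"],
    "ds:top_locations": ["job:weekly_report", "job:dashboard"],
    "job:weekly_report": ["ds:weekly_report"],
}


def affected_paths(edges, failed_job):
    """Breadth-first walk downstream of a failed job; every node reached may see incomplete data."""
    order, seen, queue = [], {failed_job}, deque([failed_job])
    while queue:
        for child in edges.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


affected = affected_paths(EDGES, "job:room_bookings")
jobs_to_hold = [node for node in affected if node.startswith("job:")]
print(affected)      # ['ds:top_locations', 'job:weekly_report', 'job:dashboard', 'ds:weekly_report']
print(jobs_to_hold)  # ['job:weekly_report', 'job:dashboard']
```

Breadth-first order is fine for this small graph; where a job has inputs at different depths, a topological sort would be the safer re-trigger order.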
Marquez: Core concepts
Job + Datasets
Input Dataset → Job → Output Dataset
Dataset versions!
A dataset version contains a complete snapshot of data as of some point in time.
[Diagram: a Job between two datasets, one with versions v1 and v2, the other with versions v1, v2, and v3]
Deltas “diffs”!
Δv2→v3:
INSERT INTO room_bookings (location, bookings)
VALUES ('b648485', 2)
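The difference between a snapshot and a delta can be shown with Python's built-in `sqlite3` module standing in for the real store. The v2 row is invented for illustration; the delta is the INSERT from the slide, with the location written as a string literal.

```python
import sqlite3

# In-memory SQLite stands in for the real dataset store
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE room_bookings (location TEXT, bookings INTEGER)")

# Dataset version v2 as a complete snapshot (this row is hypothetical)
conn.execute("INSERT INTO room_bookings (location, bookings) VALUES ('b940314', 1)")
SNAPSHOT_QUERY = "SELECT location, bookings FROM room_bookings ORDER BY location"
v2 = conn.execute(SNAPSHOT_QUERY).fetchall()

# Delta v2 -> v3: only the change, as on the slide
delta_v2_to_v3 = "INSERT INTO room_bookings (location, bookings) VALUES ('b648485', 2)"
conn.execute(delta_v2_to_v3)
v3 = conn.execute(SNAPSHOT_QUERY).fetchall()

print(v2)  # [('b940314', 1)]
print(v3)  # [('b648485', 2), ('b940314', 1)]
```

Storing the delta alone is enough to rebuild v3 from v2, while a snapshot lets a consumer read v3 without replaying history.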
Job versions!
A job version is created when the job’s business logic changes.
[Diagram: Job v1 and Job v2 alongside the dataset versions]
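One way to make "a new job version when business logic changes" concrete is to derive the version from a hash of the job's code and its declared inputs and outputs, so identical logic always maps to the same version. This hashing scheme and the SQL strings are assumptions for illustration, not necessarily how Marquez computes job versions.

```python
import hashlib
import json


def job_version(code, inputs, outputs):
    """Derive a deterministic version ID from the job's logic and its declared inputs/outputs."""
    payload = json.dumps(
        {"code": code, "inputs": sorted(inputs), "outputs": sorted(outputs)},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


# Hypothetical job logic before and after a business-logic change
sql_v1 = "SELECT location, COUNT(*) AS bookings FROM room_bookings GROUP BY location"
sql_v2 = sql_v1 + " ORDER BY bookings DESC LIMIT 1"

v1 = job_version(sql_v1, ["room_bookings"], ["top_locations"])
v2 = job_version(sql_v2, ["room_bookings"], ["top_locations"])
print(v1 != v2)  # True: changed logic, new version
print(v1 == job_version(sql_v1, ["room_bookings"], ["top_locations"]))  # True: same logic, same version
```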
Job runs!
[Diagram: a new run of the job starts, finishes, and updates the dataset to version v4]
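A job run can be modeled as a small state machine whose completion records the dataset versions it produced. STARTED and COMPLETED follow the run states named in the metadata collection workflow; NEW, FAILED, the allowed transitions, and the `dataset@version` naming are assumptions.

```python
from enum import Enum


class RunState(Enum):
    NEW = "NEW"
    STARTED = "STARTED"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


# Assumed transitions: a run starts once, then either completes or fails
TRANSITIONS = {
    RunState.NEW: {RunState.STARTED},
    RunState.STARTED: {RunState.COMPLETED, RunState.FAILED},
    RunState.COMPLETED: set(),
    RunState.FAILED: set(),
}


class JobRun:
    def __init__(self, job_version):
        self.job_version = job_version
        self.state = RunState.NEW
        self.output_versions = []

    def transition(self, new_state):
        """Move to `new_state`, rejecting transitions the lifecycle does not allow."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {new_state.name}")
        self.state = new_state

    def finish(self, dataset_versions):
        """Complete the run and record the dataset versions it produced."""
        self.transition(RunState.COMPLETED)
        self.output_versions.extend(dataset_versions)


run = JobRun("v2")
run.transition(RunState.STARTED)
run.finish(["room_bookings@v4"])  # the finished run updates the dataset to v4
print(run.state.name, run.output_versions)  # COMPLETED ['room_bookings@v4']
```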
Data triggers!
[Diagram: a finished run updates the dataset to v4, which triggers downstream jobs (Job v7, Job v10)]
Job failures!
[Diagram: a new run of the job ends in failure]
Delayed datasets!
[Diagram: a failed run delays the next dataset version]
Design benefits
● Early upstream failure detection
● Debugging
○ What job version(s) produced / consumed dataset version X?
● Recoverability
○ Full / incremental processing
● Coordination
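With runs recorded against job versions and dataset versions, the debugging question "what job version(s) produced / consumed dataset version X?" becomes a filter over run records. The run history and the `dataset@version` naming below are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RunRecord:
    job: str
    job_version: str
    inputs: List[str] = field(default_factory=list)   # dataset versions read
    outputs: List[str] = field(default_factory=list)  # dataset versions written


# Hypothetical run history
RUNS = [
    RunRecord("room_bookings", "v1", ["room_bookings_raw@v2"], ["top_locations@v3"]),
    RunRecord("room_bookings", "v2", ["room_bookings_raw@v3"], ["top_locations@v4"]),
    RunRecord("weekly_report", "v7", ["top_locations@v4"], ["weekly_report@v1"]),
]


def producers(runs, dataset_version):
    """Job versions whose runs wrote `dataset_version`."""
    return sorted({(r.job, r.job_version) for r in runs if dataset_version in r.outputs})


def consumers(runs, dataset_version):
    """Job versions whose runs read `dataset_version`."""
    return sorted({(r.job, r.job_version) for r in runs if dataset_version in r.inputs})


print(producers(RUNS, "top_locations@v4"))  # [('room_bookings', 'v2')]
print(consumers(RUNS, "top_locations@v4"))  # [('weekly_report', 'v7')]
```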
Marquez: Data model
[Diagram: entities Job, JobVersion, JobRun, Dataset, and DatasetVersion, connected by one-to-many (1 to *) relationships]
[Diagram: the same entities, plus a Datasource and the types DbTable, Filesystem, and Stream]
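The entities in the data model could be sketched as Python dataclasses. The field names, the treatment of S3 as a Filesystem-type dataset, and the reading of the diagram's one-to-many links (a job has many versions, a job version has many runs, a dataset has many versions) are assumptions based on the concepts described earlier, not Marquez's actual schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class DatasetType(Enum):
    DB_TABLE = "DbTable"
    FILESYSTEM = "Filesystem"
    STREAM = "Stream"


@dataclass
class Datasource:
    name: str
    url: str


@dataclass
class DatasetVersion:
    version: int


@dataclass
class Dataset:
    name: str
    kind: DatasetType
    datasource: Datasource
    versions: List[DatasetVersion] = field(default_factory=list)  # 1 dataset -> * versions


@dataclass
class JobRun:
    run_id: str
    inputs: List[DatasetVersion] = field(default_factory=list)   # dataset versions read
    outputs: List[DatasetVersion] = field(default_factory=list)  # dataset versions written


@dataclass
class JobVersion:
    version: str
    runs: List[JobRun] = field(default_factory=list)  # 1 job version -> * runs


@dataclass
class Job:
    name: str
    versions: List[JobVersion] = field(default_factory=list)  # 1 job -> * job versions


# Example wiring using the S3 location from the deck; the names are hypothetical
s3 = Datasource("s3", "s3://room_bookings/raw/")
raw = Dataset("room_bookings_raw", DatasetType.FILESYSTEM, s3, [DatasetVersion(1), DatasetVersion(2)])
job = Job("room_bookings", [JobVersion("v1", [JobRun("run-1", inputs=[raw.versions[-1]])])])
print(job.versions[0].runs[0].inputs[0].version)  # 2
```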
Marquez: Metadata collection
How is metadata collected?
● Marquez API
● Language-specific SDKs
○ Java
○ Python
[Diagram: a Job records metadata in Marquez]
Workflow
Register Job
● Job version
● Inputs / outputs (logical names)
● Owner
● Description
Register Job Run
Start
● Update job run state to STARTED
Complete
● Update job run state to COMPLETED
Register Job Run Outputs
● Outputs (physical locations)
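The collection workflow above maps onto five client calls, made in the order the slides list them. `InMemoryMarquezClient`, its method names, the job description, and the Postgres URL are hypothetical; the class only stores metadata in dictionaries and is not the API of the Java or Python SDKs, which a real job would call instead.

```python
import uuid


class InMemoryMarquezClient:
    """Hypothetical stand-in for a Marquez client; it only records metadata in dicts."""

    def __init__(self):
        self.jobs = {}
        self.runs = {}

    def register_job(self, name, version, inputs, outputs, owner, description):
        # Inputs / outputs are logical dataset names at this stage
        self.jobs[name] = {"version": version, "inputs": inputs, "outputs": outputs,
                           "owner": owner, "description": description}

    def register_run(self, job_name):
        run_id = str(uuid.uuid4())
        self.runs[run_id] = {"job": job_name, "state": "NEW", "outputs": []}
        return run_id

    def mark_started(self, run_id):
        self.runs[run_id]["state"] = "STARTED"

    def mark_completed(self, run_id):
        self.runs[run_id]["state"] = "COMPLETED"

    def register_run_outputs(self, run_id, locations):
        # Outputs are physical locations written by this particular run
        self.runs[run_id]["outputs"].extend(locations)


client = InMemoryMarquezClient()
client.register_job("room_bookings", "v1", inputs=["room_bookings_raw"], outputs=["top_locations"],
                    owner="Data Engineering", description="Top location by room bookings")
run_id = client.register_run("room_bookings")
client.mark_started(run_id)
# The job's actual work (read, sum, write) would happen here
client.mark_completed(run_id)
client.register_run_outputs(run_id, ["postgres://db.example/top_locations"])
print(client.runs[run_id]["state"])  # COMPLETED
```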
04 Room bookings pipeline (take 2)
Example: Room bookings pipeline (take 2)
Recall, we are tasked with analyzing room booking trends …
Scheduler → Room Bookings workflow (Job): S3 → Postgres (Top Locations)
Problems
● What’s our job’s input dataset?
● Does the dataset have an owner?
● How often is the dataset updated?
● Coordinate changes
● Figure out backfills
Enter Marquez
Search: “room bo” (filter: All)
● Room Bookings (ALL) [S3]: All room bookings since the beginning of time (created: Feb. 15, 2010)
● Room Bookings (SF) [S3]: All San Francisco room bookings (created: Jul. 8, 2018)
Well, that was easy!
Room Bookings (ALL) [S3], created: Feb. 15, 2010
Info
Owner: Data Engineering
Location: s3://room_bookings/raw/
Schema: https://registry.wework.com/schemas/ids/1
Updated: Hourly
Description: All room bookings since the beginning of time
Bonus!
We also had to coordinate changes to our input data.
Scheduler → Room Bookings workflow (Job): S3 → Postgres (Top Locations)
Our view: [Diagram: a job failure in the room bookings workflow]
Global view! [Diagram: the same job failure, with the room bookings workflow linked to the top locations dataset]
Room Bookings (ALL) [S3], created: Feb. 15, 2010
Info
Owner: Data Engineering
Location: s3://room_bookings/raw/
Schema: https://registry.wework.com/schemas/ids/2
Updated: Hourly
Description: All room bookings since the beginning of time
Oh, version bumped!
Patch, deploy, trigger!
[Diagram: a trigger reruns the room bookings workflow, which updates the top locations dataset]
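The "version bumped" check can be automated by comparing the schema ID a job was built against with the one Marquez currently reports for its input. The two registry URLs come from the slides; parsing the ID from the end of the URL and the returned action strings are assumptions.

```python
PINNED = "https://registry.wework.com/schemas/ids/1"   # schema the job was written against
CURRENT = "https://registry.wework.com/schemas/ids/2"  # schema Marquez now reports


def schema_id(url):
    """Extract the trailing numeric ID from a schema registry URL."""
    return int(url.rstrip("/").rsplit("/", 1)[-1])


def check_input_schema(pinned_url, current_url):
    """Decide whether the job can run, or must wait for a patch because its input schema changed."""
    pinned, current = schema_id(pinned_url), schema_id(current_url)
    if current == pinned:
        return "run"
    return f"hold: schema bumped {pinned} -> {current}; patch, deploy, then trigger"


print(check_input_schema(PINNED, CURRENT))
# hold: schema bumped 1 -> 2; patch, deploy, then trigger
```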
RECAP
● Make it trivial to discover datasets
● Global context when debugging
● Easily handle backfills
○ Datasets as dependencies
05 Future work
Marquez: Future work
WeWork + Marquez
● Data platform built around Marquez
● Internal integrations
○ Scheduling
○ Batching
○ Streaming
Roadmap
● Short-term
○ Release Marquez 0.1.0
○ Docs
● Long-term
○ Marquez UI
github.com/MarquezProject
@MarquezProject
Thanks!
We’re hiring!
contact: willy.lulciuc@wework.com
Questions?

Verification of thevenin's theorem for BEEE Lab (1).pptxchumtiyababu
 
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKARHAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKARKOUSTAV SARKAR
 
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...Amil baba
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Arindam Chakraborty, Ph.D., P.E. (CA, TX)
 
Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdfKamal Acharya
 
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...Call Girls Mumbai
 
kiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal loadkiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal loadhamedmustafa094
 

Kürzlich hochgeladen (20)

Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxHOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
 
Work-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptxWork-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptx
 
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
Hostel management system project report..pdf
Hostel management system project report..pdfHostel management system project report..pdf
Hostel management system project report..pdf
 
Online electricity billing project report..pdf
Online electricity billing project report..pdfOnline electricity billing project report..pdf
Online electricity billing project report..pdf
 
Computer Lecture 01.pptxIntroduction to Computers
Computer Lecture 01.pptxIntroduction to ComputersComputer Lecture 01.pptxIntroduction to Computers
Computer Lecture 01.pptxIntroduction to Computers
 
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced LoadsFEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
 
Engineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planesEngineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planes
 
Verification of thevenin's theorem for BEEE Lab (1).pptx
Verification of thevenin's theorem for BEEE Lab (1).pptxVerification of thevenin's theorem for BEEE Lab (1).pptx
Verification of thevenin's theorem for BEEE Lab (1).pptx
 
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKARHAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
 
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
 
Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdf
 
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
 
kiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal loadkiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal load
 

Marquez: A Metadata Service for Data Abstraction, Data Lineage, and Event-based Triggers