SlideShare ist ein Scribd-Unternehmen logo
1 von 16
Downloaden Sie, um offline zu lesen
Keep your Metadata Repository Current
with Event-Driven Updates using
CDC and Kafka Connect
Barbara Eckman, Ph.D.
Senior Principal Software Architect
Comcast
What Makes Your Life Hard?
Powerful business insights come from models integrating data that spans silos.
• Where is the data?
• What does it mean?
• Who produced it?
• Who touched it during its journey?
• What is its quality?
Industry-Leading Metadata Management
A unified metadata repository
– Heterogeneity, extensibility, scalability, accessibility
– Emerging industry standard around K-V store, free-text
search, graphdb
• Comcast has had it since 2017!
– Apache Atlas with many Comcast-contributed extensions
Keynote
Portal: Human Metadata Input, Search
Metadata management and storage
Metadata,
lineage
sources
On-prem
Hadoop
AWS Public Cloud RDBMS Systems
Current State
Automated capture
Event-driven
pushes
RDMBS
Connector
Nightly pulls
Apache
Kafka
Legacy Architecture for Metadata Capture
Pull bulk metadata
on regular schedule
Identify deltas
Java connector
Comcast code
OS code
metadata
RDBMS
Portal: Human Metadata Input, Search
Metadata management and storage
Metadata,
lineage
sources
On-prem
Hadoop
AWS Public Cloud RDBMS Systems
Desired State
Automated capture
Event-driven
pushes
Event-driven push:
HOW???
Obvious Fix that didn’t work:
• Use Kafka Connect log-based CDC
for pushing metadata deltas!
Binlogs don’t capture changes in
metadata except as DDL statements
Another Obvious Fix that didn’t quite work:
• Use Kafka Connect query-based
CDC for querying system tables for
metadata deltas!
Queries on system tables give current
state only. Deletes are not captured.
Less Obvious Fix that DID work:
• Outbox Architectural Pattern
• Maintain copies of system catalog
metadata tables as outboxes
Binlogs DO capture changes in
metadata outboxes, just like regular
data tables
Creating Outboxes from System Catalogs
create table outbox.info_columns as
select
TABLE_CATALOG,
TABLE_SCHEMA,
TABLE_NAME,
COLUMN_NAME,
DATA_TYPE,
IS_NULLABLE,
CHARACTER_MAXIMUM_LENGTH
from information_schema.columns;
• MERGE updates from
information_schema.columns to
outbox.info_columns on a regular
schedule (eg 5 minutes, 30
minutes, 1 hour)
• Why not use a trigger to activate
merge?
– No triggers on system tables
Apache
Kafka
New Architecture pushes deltas as they happen
Comcast code
OS code
metadata
MySQL,
MSSQL
Debezium
Log-based
CDC Source
Connector
Kafka Connect
Atlas Sink
Connector
Portal: Human Metadata Input, Search
Metadata management and storage
Metadata,
lineage
sources
On-prem
Hadoop
AWS Public Cloud RDBMS Systems
Desired State Is New State!
Automated capture Event-driven pushes
Acknowledgements
barbara_eckman@cable.comcast.com
Backup Slides
Less Obvious Fix that kinda worked:
• Use JCBC CDC Connector
• Maintain copies of system catalog
metadata tables
• Configure MSSQL CDC on these
tables
Query-based CDC that captures
changes in metadata outboxes.
RDBMS-specific.
EXECUTE sys.sp_cdc_enable_db;

Weitere ähnliche Inhalte

Was ist angesagt?

Azure Cosmos DB Kafka Connectors | Abinav Rameesh, Microsoft
Azure Cosmos DB Kafka Connectors | Abinav Rameesh, MicrosoftAzure Cosmos DB Kafka Connectors | Abinav Rameesh, Microsoft
Azure Cosmos DB Kafka Connectors | Abinav Rameesh, MicrosoftHostedbyConfluent
 
Gwen Shapira, Confluent | Kafka Summit 2020 Keynote | Kafka’s New Architecture
Gwen Shapira, Confluent | Kafka Summit 2020 Keynote | Kafka’s New ArchitectureGwen Shapira, Confluent | Kafka Summit 2020 Keynote | Kafka’s New Architecture
Gwen Shapira, Confluent | Kafka Summit 2020 Keynote | Kafka’s New Architectureconfluent
 
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...HostedbyConfluent
 
0-330km/h: Porsche's Data Streaming Journey | Sridhar Mamella, Porsche
0-330km/h: Porsche's Data Streaming Journey | Sridhar Mamella, Porsche0-330km/h: Porsche's Data Streaming Journey | Sridhar Mamella, Porsche
0-330km/h: Porsche's Data Streaming Journey | Sridhar Mamella, PorscheHostedbyConfluent
 
Column and hadoop
Column and hadoopColumn and hadoop
Column and hadoopAlex Jiang
 
Streaming Data Analytics with ksqlDB and Superset | Robert Stolz, Preset
Streaming Data Analytics with ksqlDB and Superset | Robert Stolz, PresetStreaming Data Analytics with ksqlDB and Superset | Robert Stolz, Preset
Streaming Data Analytics with ksqlDB and Superset | Robert Stolz, PresetHostedbyConfluent
 
How Kafka and MemSQL Became the Dynamic Duo (Sarung Tripathi, MemSQL) Kafka S...
How Kafka and MemSQL Became the Dynamic Duo (Sarung Tripathi, MemSQL) Kafka S...How Kafka and MemSQL Became the Dynamic Duo (Sarung Tripathi, MemSQL) Kafka S...
How Kafka and MemSQL Became the Dynamic Duo (Sarung Tripathi, MemSQL) Kafka S...HostedbyConfluent
 
How to use Standard SQL over Kafka: From the basics to advanced use cases | F...
How to use Standard SQL over Kafka: From the basics to advanced use cases | F...How to use Standard SQL over Kafka: From the basics to advanced use cases | F...
How to use Standard SQL over Kafka: From the basics to advanced use cases | F...HostedbyConfluent
 
Building Retry Architectures in Kafka with Compacted Topics | Matthew Zhou, V...
Building Retry Architectures in Kafka with Compacted Topics | Matthew Zhou, V...Building Retry Architectures in Kafka with Compacted Topics | Matthew Zhou, V...
Building Retry Architectures in Kafka with Compacted Topics | Matthew Zhou, V...HostedbyConfluent
 
Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic...
Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic...Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic...
Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic...HostedbyConfluent
 
Devops Days, 2019 - Charlotte
Devops Days, 2019 - CharlotteDevops Days, 2019 - Charlotte
Devops Days, 2019 - Charlottebotsplash.com
 
Kafka at the core of an AIOps pipeline | Sunanda Kommula, Selector.ai and Ala...
Kafka at the core of an AIOps pipeline | Sunanda Kommula, Selector.ai and Ala...Kafka at the core of an AIOps pipeline | Sunanda Kommula, Selector.ai and Ala...
Kafka at the core of an AIOps pipeline | Sunanda Kommula, Selector.ai and Ala...HostedbyConfluent
 
The New Way of Configuring Grace Periods for Windowed Operations in Kafka Str...
The New Way of Configuring Grace Periods for Windowed Operations in Kafka Str...The New Way of Configuring Grace Periods for Windowed Operations in Kafka Str...
The New Way of Configuring Grace Periods for Windowed Operations in Kafka Str...HostedbyConfluent
 
Kafka Excellence at Scale – Cloud, Kubernetes, Infrastructure as Code (Vik Wa...
Kafka Excellence at Scale – Cloud, Kubernetes, Infrastructure as Code (Vik Wa...Kafka Excellence at Scale – Cloud, Kubernetes, Infrastructure as Code (Vik Wa...
Kafka Excellence at Scale – Cloud, Kubernetes, Infrastructure as Code (Vik Wa...HostedbyConfluent
 
Kafka error handling patterns and best practices | Hemant Desale and Aruna Ka...
Kafka error handling patterns and best practices | Hemant Desale and Aruna Ka...Kafka error handling patterns and best practices | Hemant Desale and Aruna Ka...
Kafka error handling patterns and best practices | Hemant Desale and Aruna Ka...HostedbyConfluent
 
How to Discover, Visualize, Catalog, Share and Reuse your Kafka Streams (Jona...
How to Discover, Visualize, Catalog, Share and Reuse your Kafka Streams (Jona...How to Discover, Visualize, Catalog, Share and Reuse your Kafka Streams (Jona...
How to Discover, Visualize, Catalog, Share and Reuse your Kafka Streams (Jona...HostedbyConfluent
 
Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...
Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...
Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...HostedbyConfluent
 
Apache kafka-a distributed streaming platform
Apache kafka-a distributed streaming platformApache kafka-a distributed streaming platform
Apache kafka-a distributed streaming platformconfluent
 
Tackling Kafka, with a Small Team ( Jaren Glover, Robinhood) Kafka Summit SF ...
Tackling Kafka, with a Small Team ( Jaren Glover, Robinhood) Kafka Summit SF ...Tackling Kafka, with a Small Team ( Jaren Glover, Robinhood) Kafka Summit SF ...
Tackling Kafka, with a Small Team ( Jaren Glover, Robinhood) Kafka Summit SF ...confluent
 
Introduction to Apache Kafka and Confluent... and why they matter
Introduction to Apache Kafka and Confluent... and why they matterIntroduction to Apache Kafka and Confluent... and why they matter
Introduction to Apache Kafka and Confluent... and why they matterconfluent
 

Was ist angesagt? (20)

Azure Cosmos DB Kafka Connectors | Abinav Rameesh, Microsoft
Azure Cosmos DB Kafka Connectors | Abinav Rameesh, MicrosoftAzure Cosmos DB Kafka Connectors | Abinav Rameesh, Microsoft
Azure Cosmos DB Kafka Connectors | Abinav Rameesh, Microsoft
 
Gwen Shapira, Confluent | Kafka Summit 2020 Keynote | Kafka’s New Architecture
Gwen Shapira, Confluent | Kafka Summit 2020 Keynote | Kafka’s New ArchitectureGwen Shapira, Confluent | Kafka Summit 2020 Keynote | Kafka’s New Architecture
Gwen Shapira, Confluent | Kafka Summit 2020 Keynote | Kafka’s New Architecture
 
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
 
0-330km/h: Porsche's Data Streaming Journey | Sridhar Mamella, Porsche
0-330km/h: Porsche's Data Streaming Journey | Sridhar Mamella, Porsche0-330km/h: Porsche's Data Streaming Journey | Sridhar Mamella, Porsche
0-330km/h: Porsche's Data Streaming Journey | Sridhar Mamella, Porsche
 
Column and hadoop
Column and hadoopColumn and hadoop
Column and hadoop
 
Streaming Data Analytics with ksqlDB and Superset | Robert Stolz, Preset
Streaming Data Analytics with ksqlDB and Superset | Robert Stolz, PresetStreaming Data Analytics with ksqlDB and Superset | Robert Stolz, Preset
Streaming Data Analytics with ksqlDB and Superset | Robert Stolz, Preset
 
How Kafka and MemSQL Became the Dynamic Duo (Sarung Tripathi, MemSQL) Kafka S...
How Kafka and MemSQL Became the Dynamic Duo (Sarung Tripathi, MemSQL) Kafka S...How Kafka and MemSQL Became the Dynamic Duo (Sarung Tripathi, MemSQL) Kafka S...
How Kafka and MemSQL Became the Dynamic Duo (Sarung Tripathi, MemSQL) Kafka S...
 
How to use Standard SQL over Kafka: From the basics to advanced use cases | F...
How to use Standard SQL over Kafka: From the basics to advanced use cases | F...How to use Standard SQL over Kafka: From the basics to advanced use cases | F...
How to use Standard SQL over Kafka: From the basics to advanced use cases | F...
 
Building Retry Architectures in Kafka with Compacted Topics | Matthew Zhou, V...
Building Retry Architectures in Kafka with Compacted Topics | Matthew Zhou, V...Building Retry Architectures in Kafka with Compacted Topics | Matthew Zhou, V...
Building Retry Architectures in Kafka with Compacted Topics | Matthew Zhou, V...
 
Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic...
Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic...Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic...
Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic...
 
Devops Days, 2019 - Charlotte
Devops Days, 2019 - CharlotteDevops Days, 2019 - Charlotte
Devops Days, 2019 - Charlotte
 
Kafka at the core of an AIOps pipeline | Sunanda Kommula, Selector.ai and Ala...
Kafka at the core of an AIOps pipeline | Sunanda Kommula, Selector.ai and Ala...Kafka at the core of an AIOps pipeline | Sunanda Kommula, Selector.ai and Ala...
Kafka at the core of an AIOps pipeline | Sunanda Kommula, Selector.ai and Ala...
 
The New Way of Configuring Grace Periods for Windowed Operations in Kafka Str...
The New Way of Configuring Grace Periods for Windowed Operations in Kafka Str...The New Way of Configuring Grace Periods for Windowed Operations in Kafka Str...
The New Way of Configuring Grace Periods for Windowed Operations in Kafka Str...
 
Kafka Excellence at Scale – Cloud, Kubernetes, Infrastructure as Code (Vik Wa...
Kafka Excellence at Scale – Cloud, Kubernetes, Infrastructure as Code (Vik Wa...Kafka Excellence at Scale – Cloud, Kubernetes, Infrastructure as Code (Vik Wa...
Kafka Excellence at Scale – Cloud, Kubernetes, Infrastructure as Code (Vik Wa...
 
Kafka error handling patterns and best practices | Hemant Desale and Aruna Ka...
Kafka error handling patterns and best practices | Hemant Desale and Aruna Ka...Kafka error handling patterns and best practices | Hemant Desale and Aruna Ka...
Kafka error handling patterns and best practices | Hemant Desale and Aruna Ka...
 
How to Discover, Visualize, Catalog, Share and Reuse your Kafka Streams (Jona...
How to Discover, Visualize, Catalog, Share and Reuse your Kafka Streams (Jona...How to Discover, Visualize, Catalog, Share and Reuse your Kafka Streams (Jona...
How to Discover, Visualize, Catalog, Share and Reuse your Kafka Streams (Jona...
 
Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...
Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...
Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...
 
Apache kafka-a distributed streaming platform
Apache kafka-a distributed streaming platformApache kafka-a distributed streaming platform
Apache kafka-a distributed streaming platform
 
Tackling Kafka, with a Small Team ( Jaren Glover, Robinhood) Kafka Summit SF ...
Tackling Kafka, with a Small Team ( Jaren Glover, Robinhood) Kafka Summit SF ...Tackling Kafka, with a Small Team ( Jaren Glover, Robinhood) Kafka Summit SF ...
Tackling Kafka, with a Small Team ( Jaren Glover, Robinhood) Kafka Summit SF ...
 
Introduction to Apache Kafka and Confluent... and why they matter
Introduction to Apache Kafka and Confluent... and why they matterIntroduction to Apache Kafka and Confluent... and why they matter
Introduction to Apache Kafka and Confluent... and why they matter
 

Ähnlich wie Keep your Metadata Repository Current with Event-Driven Updates using CDC and Kafka Connect

Announcing Amazon Athena - Instantly Analyze Your Data in S3 Using SQL
Announcing Amazon Athena - Instantly Analyze Your Data in S3 Using SQLAnnouncing Amazon Athena - Instantly Analyze Your Data in S3 Using SQL
Announcing Amazon Athena - Instantly Analyze Your Data in S3 Using SQLAmazon Web Services
 
Cloud Lambda Architecture Patterns
Cloud Lambda Architecture PatternsCloud Lambda Architecture Patterns
Cloud Lambda Architecture PatternsAsis Mohanty
 
An architecture for federated data discovery and lineage over on-prem datasou...
An architecture for federated data discovery and lineage over on-prem datasou...An architecture for federated data discovery and lineage over on-prem datasou...
An architecture for federated data discovery and lineage over on-prem datasou...DataWorks Summit
 
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.Amazon Web Services
 
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.Amazon Web Services
 
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...Fwdays
 
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...Michael Rys
 
Astroinformatics 2014: Scientific Computing on the Cloud with Amazon Web Serv...
Astroinformatics 2014: Scientific Computing on the Cloud with Amazon Web Serv...Astroinformatics 2014: Scientific Computing on the Cloud with Amazon Web Serv...
Astroinformatics 2014: Scientific Computing on the Cloud with Amazon Web Serv...Jamie Kinney
 
AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)
AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)
AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)Amazon Web Services Korea
 
Hyperbatch danielpeter-161117095610
Hyperbatch danielpeter-161117095610Hyperbatch danielpeter-161117095610
Hyperbatch danielpeter-161117095610Sandeep Dobariya
 
Data Science with the Help of Metadata
Data Science with the Help of MetadataData Science with the Help of Metadata
Data Science with the Help of MetadataJim Dowling
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptxAlex Ivy
 
Module 2 - Datalake
Module 2 - DatalakeModule 2 - Datalake
Module 2 - DatalakeLam Le
 
Leveraging Mainframe Data for Modern Analytics
Leveraging Mainframe Data for Modern AnalyticsLeveraging Mainframe Data for Modern Analytics
Leveraging Mainframe Data for Modern Analyticsconfluent
 
IBM Cloud Day January 2021 Data Lake Deep Dive
IBM Cloud Day January 2021 Data Lake Deep DiveIBM Cloud Day January 2021 Data Lake Deep Dive
IBM Cloud Day January 2021 Data Lake Deep DiveTorsten Steinbach
 
Improve power bi performance
Improve power bi performanceImprove power bi performance
Improve power bi performanceAnnie Xu
 

Ähnlich wie Keep your Metadata Repository Current with Event-Driven Updates using CDC and Kafka Connect (20)

Announcing Amazon Athena - Instantly Analyze Your Data in S3 Using SQL
Announcing Amazon Athena - Instantly Analyze Your Data in S3 Using SQLAnnouncing Amazon Athena - Instantly Analyze Your Data in S3 Using SQL
Announcing Amazon Athena - Instantly Analyze Your Data in S3 Using SQL
 
Introduction to Amazon Athena
Introduction to Amazon AthenaIntroduction to Amazon Athena
Introduction to Amazon Athena
 
Cloud Lambda Architecture Patterns
Cloud Lambda Architecture PatternsCloud Lambda Architecture Patterns
Cloud Lambda Architecture Patterns
 
Introduction to Amazon Athena
Introduction to Amazon AthenaIntroduction to Amazon Athena
Introduction to Amazon Athena
 
An architecture for federated data discovery and lineage over on-prem datasou...
An architecture for federated data discovery and lineage over on-prem datasou...An architecture for federated data discovery and lineage over on-prem datasou...
An architecture for federated data discovery and lineage over on-prem datasou...
 
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
 
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
 
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
 
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
 
Astroinformatics 2014: Scientific Computing on the Cloud with Amazon Web Serv...
Astroinformatics 2014: Scientific Computing on the Cloud with Amazon Web Serv...Astroinformatics 2014: Scientific Computing on the Cloud with Amazon Web Serv...
Astroinformatics 2014: Scientific Computing on the Cloud with Amazon Web Serv...
 
AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)
AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)
AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)
 
Hyperbatch danielpeter-161117095610
Hyperbatch danielpeter-161117095610Hyperbatch danielpeter-161117095610
Hyperbatch danielpeter-161117095610
 
HyperBatch
HyperBatchHyperBatch
HyperBatch
 
Data Science with the Help of Metadata
Data Science with the Help of MetadataData Science with the Help of Metadata
Data Science with the Help of Metadata
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptx
 
Module 2 - Datalake
Module 2 - DatalakeModule 2 - Datalake
Module 2 - Datalake
 
Leveraging Mainframe Data for Modern Analytics
Leveraging Mainframe Data for Modern AnalyticsLeveraging Mainframe Data for Modern Analytics
Leveraging Mainframe Data for Modern Analytics
 
Serverless SQL
Serverless SQLServerless SQL
Serverless SQL
 
IBM Cloud Day January 2021 Data Lake Deep Dive
IBM Cloud Day January 2021 Data Lake Deep DiveIBM Cloud Day January 2021 Data Lake Deep Dive
IBM Cloud Day January 2021 Data Lake Deep Dive
 
Improve power bi performance
Improve power bi performanceImprove power bi performance
Improve power bi performance
 

Mehr von confluent

Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flinkconfluent
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsconfluent
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flinkconfluent
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...confluent
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluentconfluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkconfluent
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloudconfluent
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Diveconfluent
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluentconfluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Meshconfluent
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservicesconfluent
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3confluent
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernizationconfluent
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataconfluent
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2confluent
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023confluent
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesisconfluent
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023confluent
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streamsconfluent
 
The Journey to Data Mesh with Confluent
The Journey to Data Mesh with ConfluentThe Journey to Data Mesh with Confluent
The Journey to Data Mesh with Confluentconfluent
 

Mehr von confluent (20)

Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insights
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalk
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Dive
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Mesh
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservices
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernization
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time data
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesis
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streams
 
The Journey to Data Mesh with Confluent
The Journey to Data Mesh with ConfluentThe Journey to Data Mesh with Confluent
The Journey to Data Mesh with Confluent
 

Kürzlich hochgeladen

A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxAna-Maria Mihalceanu
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFMichael Gough
 
Digital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentDigital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentMahmoud Rabie
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
Bitdefender-CSG-Report-creat7534-interactive
Bitdefender-CSG-Report-creat7534-interactiveBitdefender-CSG-Report-creat7534-interactive
Bitdefender-CSG-Report-creat7534-interactivestartupro
 
Dublin_mulesoft_meetup_API_specifications.pptx
Dublin_mulesoft_meetup_API_specifications.pptxDublin_mulesoft_meetup_API_specifications.pptx
Dublin_mulesoft_meetup_API_specifications.pptxKunal Gupta
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
HCI Lesson 1 - Introduction to Human-Computer Interaction.pdf
HCI Lesson 1 - Introduction to Human-Computer Interaction.pdfHCI Lesson 1 - Introduction to Human-Computer Interaction.pdf
HCI Lesson 1 - Introduction to Human-Computer Interaction.pdfROWELL MARQUINA
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfAarwolf Industries LLC
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Automation Ops Series: Session 3 - Solutions management
Automation Ops Series: Session 3 - Solutions managementAutomation Ops Series: Session 3 - Solutions management
Automation Ops Series: Session 3 - Solutions managementDianaGray10
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 

Kürzlich hochgeladen (20)

A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance Toolbox
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDF
 
Digital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentDigital Tools & AI in Career Development
Digital Tools & AI in Career Development
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
Bitdefender-CSG-Report-creat7534-interactive
Bitdefender-CSG-Report-creat7534-interactiveBitdefender-CSG-Report-creat7534-interactive
Bitdefender-CSG-Report-creat7534-interactive
 
Dublin_mulesoft_meetup_API_specifications.pptx
Dublin_mulesoft_meetup_API_specifications.pptxDublin_mulesoft_meetup_API_specifications.pptx
Dublin_mulesoft_meetup_API_specifications.pptx
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
HCI Lesson 1 - Introduction to Human-Computer Interaction.pdf
HCI Lesson 1 - Introduction to Human-Computer Interaction.pdfHCI Lesson 1 - Introduction to Human-Computer Interaction.pdf
HCI Lesson 1 - Introduction to Human-Computer Interaction.pdf
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdf
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Automation Ops Series: Session 3 - Solutions management
Automation Ops Series: Session 3 - Solutions managementAutomation Ops Series: Session 3 - Solutions management
Automation Ops Series: Session 3 - Solutions management
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 

Keep your Metadata Repository Current with Event-Driven Updates using CDC and Kafka Connect

  • 1. Keep your Metadata Repository Current with Event-Driven Updates using CDC and Kafka Connect Barbara Eckman, Ph.D. Senior Principal Software Architect Comcast
  • 2. What Makes Your Life Hard? Powerful business insights come from models integrating data that spans silos. • Where is the data? • What does it mean? • Who produced it? • Who touched it during its journey? • What is its quality?
  • 3. Industry-Leading Metadata Management A unified metadata repository – Heterogeneity, extensibility, scalability, accessibility – Emerging industry standard around K-V store, free-text search, graphdb • Comcast has had it since 2017! – Apache Atlas with many Comcast-contributed extensions Keynote
  • 4. Portal: Human Metadata Input, Search Metadata management and storage Metadata, lineage sources On-prem Hadoop AWS Public Cloud RDBMS Systems Current State Automated capture Event-driven pushes RDMBS Connector Nightly pulls
  • 5. Apache Kafka Legacy Architecture for Metadata Capture Pull bulk metadata on regular schedule Identify deltas Java connector Comcast code OS code metadata RDBMS
  • 6. Portal: Human Metadata Input, Search Metadata management and storage Metadata, lineage sources On-prem Hadoop AWS Public Cloud RDBMS Systems Desired State Automated capture Event-driven pushes Event-driven push: HOW???
  • 7. Obvious Fix that didn’t work: • Use Kafka Connect log-based CDC for pushing metadata deltas! Binlogs don’t capture changes in metadata except as DDL statements
  • 8. Another Obvious Fix that didn’t quite work: • Use Kafka Connect query-based CDC for querying system tables for metadata deltas! Queries on system tables give current state only. Deletes are not captured.
  • 9. Less Obvious Fix that DID work: • Outbox Architectural Pattern • Maintain copies of system catalog metadata tables as outboxes Binlogs DO capture changes in metadata outboxes, just like regular data tables
  • 10. Creating Outboxes from System Catalogs create table outbox.info_columns as select TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, DATA_TYPE, IS_NULLABLE, CHARACTER_MAXIMUM_LENGTH from information_schema.columns; • MERGE updates from information_schema.columns to outbox.info_columns on a regular schedule (eg 5 minutes, 30 minutes, 1 hour) • Why not use a trigger to activate merge? – No triggers on system tables
  • 11. Apache Kafka New Architecture pushes deltas as they happen Comcast code OS code metadata MySQL, MSSQL Debezium Log-based CDC Source Connector Kafka Connect Atlas Sink Connector
  • 12. Portal: Human Metadata Input, Search Metadata management and storage Metadata, lineage sources On-prem Hadoop AWS Public Cloud RDBMS Systems Desired State Is New State! Automated capture Event-driven pushes
  • 16. Less Obvious Fix that kinda worked: • Use JCBC CDC Connector • Maintain copies of system catalog metadata tables • Configure MSSQL CDC on these tables Query-based CDC that captures changes in metadata outboxes. RDBMS-specific. EXECUTE sys.sp_cdc_enable_db;

Hinweis der Redaktion

  1. Where is data about X? Most authoritative copy? What does this attribute mean? Fear of joining on the wrong “account number”? Who produced this data? Who messed up the data I produced? Data quality
  2. DON’T TALK ABOUT DETAILS OF THE RDBMS CONNECTOR YET! Dataset types: Hadoop Hive tables Hdfs files Kafka topics Kinesis streams AWS: S3 buckets, prefixes glue tables Dynamodb RDBMS Schemas: Avro JSON Delimited Parquet ML Models, Feature Engineering, Feature Sets Microservices
  3. Trigger on every table to send message to kafka when insert/update/delete is made?????!!!!!
  4. THEMES: Heterogeneity Extensibility Scale Dataset types currently supported: RDBMS Kafka topics Kinesis streams Schemas: Avro JSON Delimited Parquet AWS: S3 buckets, prefixes glue tables Dynamodb Hadoop Hive tables Hdfs files ML Models, Feature Engineering, Feature Sets
  5. ROBIN MOFFATT
  6. Also, has to be configured separately for each db.