SlideShare ist ein Scribd-Unternehmen logo
1 von 33
Agenda
Me and DataMountaineer
Data Integration
Kafka Connect
Demo
Andrew Stevenson
Lead Mountain Goat at DM
Solution Architect
- Fast Data
- Big Data as long as it’s Fast
Contributed to
- Kafka Connect, Sqoop, Kite SDK
Kafka Connect Connectors
20+ connectors.
DataStax Certified Cassandra Sink
Professional Services
Implementations, Architecture reviews
DevOps & Tooling
Connector Support
Partners
Data Integration
Loading and unloading data
should be easy
But it takes too long
(certainly on hadoop)
Why?
Enterprise pipelines must consider:
Delivery semantics
Offset management
Serialization / de-serialization
Partitioning / scalability
Fault tolerance / failover
Data model integration
CI/CD
Metrics / monitoring
Which results in?
Multiple technologies
- Bash wrappers on Sqoop
- Oozie Xml 
- Custom Java/Scala/C#
- Third Party
- Multiple teams hand roll similar solutions
Lack of separation of concerns
- Extract/loading ends up domain specific
What we really care
about…
DOMAIN SPECIFIC TRANSFORMATIONS
Focus on adding value
Kafka Connect?
✓Delivery semantics
✓Offset management
✓Serialization / de-serialization
✓Partitioning / scalability
✓Fault tolerance / fail-over
✓Data model integration
✓Metrics
Out of the Box – ONE FRAMEWROK
Lets you focus on domain logic
Kafka Connect
“a common framework facilitating
data streams
between kafka and other systems”
Ease of use 
deploy flows via configuration files
with no code necessary
Out of the box & Community
Configurations
are key-value mappings
name connector’s unique name
connector.class connector’s class
max.tasks maximum tasks to create
Option[topics] list of topics (for sinks)
Config Example
name = kudu-sink
connector.class = KuduSinkConnector
tasks.max = 1
topics = kudu_test
connect.kudu.master = quickstart
connect.kudu.sink.kcql = INSERT INTO KuduTable
SELECT * FROM kudu_test
KCQL
is a SQL like syntax allowing streamlined
configuration of Kafka Sink Connectors and
then some more..
Example:
Project fields, rename or ignore them and further
customise in plain text
INSERT INTO transactions SELECT field1 AS column1, field2 AS column2 FROM TransactionTopic;
INSERT INTO audits SELECT * FROM AuditsTopic;
INSERT INTO logs SELECT * FROM LogsTopic AUTOEVOLVE;
INSERT INTO invoices SELECT * FROM InvoiceTopic PK invoiceID;
KCQL |
{ "sensor_id": "01" , "temperature": 52.7943, "ts": 1484648810 }
{ “sensor_id": "02" , "temperature": 28.8597, "ts": 1484648810 }
INSERT INTO sensor_ringbuffer
SELECT sensor_id, temperature, ts
FROM coap_sensor_topic
WITHFORMAT JSON
STOREAS RING_BUFFER
INSERT INTO sensor_reliabletopic
SELECT sensor_id, temperature, ts
FROM coap_sensor_topic
WITHFORMAT AVRO
STOREAS RELIABLE_TOPIC
Deeper into Connect
Modes
Workers
Connectors
Tasks
Converters
Modes
Standalone
- single node, for testing, one off import/exports
Distributed
- 1 or more workers on 1 or more servers
form a cluster/consumer group
Interaction
Standalone
- properties file at start up
Distributed
- rest API  ./cli
- create  ./cli create conn < conn.conf
- start  ./cli start conn < conn.conf
- stop  ./cli stop conn
- restart  ./cli restart conn
- pause  ./cli pause conn
- remove  ./cli rm conn
- status  ./cli ps
- plugins  ./cli plugins
- validate  ./cli validate < conn.conf
Workers
group.id = cluster1
group.id = cluster1
group.id = cluster1
* Kafka Consumer Group
Kafka
Connectors
Define the how
- which plugin to use, must be on CLASSPATH
- breaks up work into tasks
- multiple per cluster, unique name
- fault tolerant
Happy flow
Rest
API
W1
W2
W3
C1
C1 T1
Config topic
C1 T1
C1 T2
C1 T3
Put
* Coordinator
Unhappy flow
W1
W2
W3
C1
C1 T1
Config topic
C1 T1
C1 T2
C1 T3
* ReBalanceListener
Connector API
class CoherenceSinkConnector extends SinkConnector {
override def taskClass(): Class[_ <: Task]
override def start(props: util.Map[String, String]): Unit
override def taskConfigs(maxTasks: Int): util.List[util.Map[String, String]]
override def stop(): Unit
}
Tasks
Perform the actual work
- loading / unloading
- single threaded
Used to Scale
- more task more parallelism
- kafka consumer group managed
- if (tasks > partitions) => idle tasks
Contains no state, it’s in Kafka
- started
- stopped
- restarted
- pause
Task API
class CoherenceSinkTask extends SinkTask {
override def start(props: util.Map[String, String]): Unit
override def stop(): Unit
override def flush(offsets: util.Map[TopicPartition,
OffsetAndMetadata])
override def put(records: util.Collection[SinkRecord])
}
Converters
JsonConverter
- ships with Kafka
AvroConverter
- ships with Confluent
- integrates with Schema Registry
kafka-connect-blockchain
kafka-connect-bloomberg
kafka-connect-cassandra
kafka-connect-coap
kafka-connect-druid
kafka-connect-elastic
kafka-connect-ftp
kafka-connect-hazelcast
kafka-connect-hbase
kafka-connect-influxdb
kafka-connect-jms
kafka-connect-kudu
kafka-connect-mongodb
kafka-connect-mqtt
kafka-connect-redis
kafka-connect-rethink
kafka-connect-voltdb
kafka-connect-yahoo
Source: https://github.com/datamountaineer/stream-reactor
Integration Tests: http://coyote.landoop.com/connect/
Connect UI
Schema UI
Topic UI
Dockers
Cloudera CSD
Kafka Connect
Scalable
Fault tolerant
Common framework
Does the hard for work you
DEMO

Weitere ähnliche Inhalte

Was ist angesagt?

Kafka Streams for Java enthusiasts
Kafka Streams for Java enthusiastsKafka Streams for Java enthusiasts
Kafka Streams for Java enthusiastsSlim Baltagi
 
Introduction to Apache Kafka- Part 1
Introduction to Apache Kafka- Part 1Introduction to Apache Kafka- Part 1
Introduction to Apache Kafka- Part 1Knoldus Inc.
 
Data integration with Apache Kafka
Data integration with Apache KafkaData integration with Apache Kafka
Data integration with Apache Kafkaconfluent
 
Introduction to Apache Kafka and why it matters - Madrid
Introduction to Apache Kafka and why it matters - MadridIntroduction to Apache Kafka and why it matters - Madrid
Introduction to Apache Kafka and why it matters - MadridPaolo Castagna
 
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Field
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the FieldKafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Field
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Fieldconfluent
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using KafkaKnoldus Inc.
 
Kafka connect 101
Kafka connect 101Kafka connect 101
Kafka connect 101Whiteklay
 
Confluent Developer Training
Confluent Developer TrainingConfluent Developer Training
Confluent Developer Trainingconfluent
 
Fundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache KafkaFundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache KafkaAngelo Cesaro
 
Monitoring and Resiliency Testing our Apache Kafka Clusters at Goldman Sachs ...
Monitoring and Resiliency Testing our Apache Kafka Clusters at Goldman Sachs ...Monitoring and Resiliency Testing our Apache Kafka Clusters at Goldman Sachs ...
Monitoring and Resiliency Testing our Apache Kafka Clusters at Goldman Sachs ...HostedbyConfluent
 
Confluent building a real-time streaming platform using kafka streams and k...
Confluent   building a real-time streaming platform using kafka streams and k...Confluent   building a real-time streaming platform using kafka streams and k...
Confluent building a real-time streaming platform using kafka streams and k...Thomas Alex
 
Scaling an Event-Driven Architecture with IBM and Confluent | Antony Amanse a...
Scaling an Event-Driven Architecture with IBM and Confluent | Antony Amanse a...Scaling an Event-Driven Architecture with IBM and Confluent | Antony Amanse a...
Scaling an Event-Driven Architecture with IBM and Confluent | Antony Amanse a...HostedbyConfluent
 
Apache Kafka from 0.7 to 1.0, History and Lesson Learned
Apache Kafka from 0.7 to 1.0, History and Lesson LearnedApache Kafka from 0.7 to 1.0, History and Lesson Learned
Apache Kafka from 0.7 to 1.0, History and Lesson LearnedGuozhang Wang
 
Apache Kafka & Kafka Connectを に使ったデータ連携パターン(改めETLの実装)
Apache Kafka & Kafka Connectを に使ったデータ連携パターン(改めETLの実装)Apache Kafka & Kafka Connectを に使ったデータ連携パターン(改めETLの実装)
Apache Kafka & Kafka Connectを に使ったデータ連携パターン(改めETLの実装)Keigo Suda
 
Guaranteed Event Delivery with Kafka and NodeJS | Amitesh Madhur, Nutanix
Guaranteed Event Delivery with Kafka and NodeJS | Amitesh Madhur, NutanixGuaranteed Event Delivery with Kafka and NodeJS | Amitesh Madhur, Nutanix
Guaranteed Event Delivery with Kafka and NodeJS | Amitesh Madhur, NutanixHostedbyConfluent
 
Changing landscapes in data integration - Kafka Connect for near real-time da...
Changing landscapes in data integration - Kafka Connect for near real-time da...Changing landscapes in data integration - Kafka Connect for near real-time da...
Changing landscapes in data integration - Kafka Connect for near real-time da...HostedbyConfluent
 
Kafka Connect & Streams - the ecosystem around Kafka
Kafka Connect & Streams - the ecosystem around KafkaKafka Connect & Streams - the ecosystem around Kafka
Kafka Connect & Streams - the ecosystem around KafkaGuido Schmutz
 

Was ist angesagt? (20)

Kafka Streams for Java enthusiasts
Kafka Streams for Java enthusiastsKafka Streams for Java enthusiasts
Kafka Streams for Java enthusiasts
 
Introduction to Apache Kafka- Part 1
Introduction to Apache Kafka- Part 1Introduction to Apache Kafka- Part 1
Introduction to Apache Kafka- Part 1
 
Data integration with Apache Kafka
Data integration with Apache KafkaData integration with Apache Kafka
Data integration with Apache Kafka
 
Introduction to Apache Kafka and why it matters - Madrid
Introduction to Apache Kafka and why it matters - MadridIntroduction to Apache Kafka and why it matters - Madrid
Introduction to Apache Kafka and why it matters - Madrid
 
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Field
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the FieldKafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Field
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Field
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using Kafka
 
Kafka connect 101
Kafka connect 101Kafka connect 101
Kafka connect 101
 
Confluent Developer Training
Confluent Developer TrainingConfluent Developer Training
Confluent Developer Training
 
Fundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache KafkaFundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache Kafka
 
Introduction to apache kafka
Introduction to apache kafkaIntroduction to apache kafka
Introduction to apache kafka
 
Kafka Connect
Kafka ConnectKafka Connect
Kafka Connect
 
Monitoring and Resiliency Testing our Apache Kafka Clusters at Goldman Sachs ...
Monitoring and Resiliency Testing our Apache Kafka Clusters at Goldman Sachs ...Monitoring and Resiliency Testing our Apache Kafka Clusters at Goldman Sachs ...
Monitoring and Resiliency Testing our Apache Kafka Clusters at Goldman Sachs ...
 
Confluent building a real-time streaming platform using kafka streams and k...
Confluent   building a real-time streaming platform using kafka streams and k...Confluent   building a real-time streaming platform using kafka streams and k...
Confluent building a real-time streaming platform using kafka streams and k...
 
Scaling an Event-Driven Architecture with IBM and Confluent | Antony Amanse a...
Scaling an Event-Driven Architecture with IBM and Confluent | Antony Amanse a...Scaling an Event-Driven Architecture with IBM and Confluent | Antony Amanse a...
Scaling an Event-Driven Architecture with IBM and Confluent | Antony Amanse a...
 
Apache Kafka from 0.7 to 1.0, History and Lesson Learned
Apache Kafka from 0.7 to 1.0, History and Lesson LearnedApache Kafka from 0.7 to 1.0, History and Lesson Learned
Apache Kafka from 0.7 to 1.0, History and Lesson Learned
 
Apache Kafka & Kafka Connectを に使ったデータ連携パターン(改めETLの実装)
Apache Kafka & Kafka Connectを に使ったデータ連携パターン(改めETLの実装)Apache Kafka & Kafka Connectを に使ったデータ連携パターン(改めETLの実装)
Apache Kafka & Kafka Connectを に使ったデータ連携パターン(改めETLの実装)
 
Kafka internals
Kafka internalsKafka internals
Kafka internals
 
Guaranteed Event Delivery with Kafka and NodeJS | Amitesh Madhur, Nutanix
Guaranteed Event Delivery with Kafka and NodeJS | Amitesh Madhur, NutanixGuaranteed Event Delivery with Kafka and NodeJS | Amitesh Madhur, Nutanix
Guaranteed Event Delivery with Kafka and NodeJS | Amitesh Madhur, Nutanix
 
Changing landscapes in data integration - Kafka Connect for near real-time da...
Changing landscapes in data integration - Kafka Connect for near real-time da...Changing landscapes in data integration - Kafka Connect for near real-time da...
Changing landscapes in data integration - Kafka Connect for near real-time da...
 
Kafka Connect & Streams - the ecosystem around Kafka
Kafka Connect & Streams - the ecosystem around KafkaKafka Connect & Streams - the ecosystem around Kafka
Kafka Connect & Streams - the ecosystem around Kafka
 

Andere mochten auch

Processing IoT Data with Apache Kafka
Processing IoT Data with Apache KafkaProcessing IoT Data with Apache Kafka
Processing IoT Data with Apache KafkaMatthew Howlett
 
Confluent kafka meetupseattle jan2017
Confluent kafka meetupseattle jan2017Confluent kafka meetupseattle jan2017
Confluent kafka meetupseattle jan2017Nitin Kumar
 
Getting started with Azure Event Hubs and Stream Analytics services
Getting started with Azure Event Hubs and Stream Analytics servicesGetting started with Azure Event Hubs and Stream Analytics services
Getting started with Azure Event Hubs and Stream Analytics servicesVladimir Bychkov
 
London Apache Kafka Meetup (Jan 2017)
London Apache Kafka Meetup (Jan 2017)London Apache Kafka Meetup (Jan 2017)
London Apache Kafka Meetup (Jan 2017)Landoop Ltd
 
Not Only Streams for Akademia JLabs
Not Only Streams for Akademia JLabsNot Only Streams for Akademia JLabs
Not Only Streams for Akademia JLabsKonrad Malawski
 
IoT Connected Brewery
IoT Connected BreweryIoT Connected Brewery
IoT Connected BreweryJason Hubbard
 
Strata+Hadoop 2017 San Jose - The Rise of Real Time: Apache Kafka and the Str...
Strata+Hadoop 2017 San Jose - The Rise of Real Time: Apache Kafka and the Str...Strata+Hadoop 2017 San Jose - The Rise of Real Time: Apache Kafka and the Str...
Strata+Hadoop 2017 San Jose - The Rise of Real Time: Apache Kafka and the Str...confluent
 
Extracting Insights from Data at Twitter
Extracting Insights from Data at TwitterExtracting Insights from Data at Twitter
Extracting Insights from Data at TwitterPrasad Wagle
 
Apache kafka-a distributed streaming platform
Apache kafka-a distributed streaming platformApache kafka-a distributed streaming platform
Apache kafka-a distributed streaming platformconfluent
 
Building Reactive Fast Data & the Data Lake with Akka, Kafka, Spark
Building Reactive Fast Data & the Data Lake with Akka, Kafka, SparkBuilding Reactive Fast Data & the Data Lake with Akka, Kafka, Spark
Building Reactive Fast Data & the Data Lake with Akka, Kafka, SparkTodd Fritz
 
Big Data Day LA 2015 - Always-on Ingestion for Data at Scale by Arvind Prabha...
Big Data Day LA 2015 - Always-on Ingestion for Data at Scale by Arvind Prabha...Big Data Day LA 2015 - Always-on Ingestion for Data at Scale by Arvind Prabha...
Big Data Day LA 2015 - Always-on Ingestion for Data at Scale by Arvind Prabha...Data Con LA
 
IoT Innovation Lab Berlin @relayr - Kay Lerch on Getting basics right for you...
IoT Innovation Lab Berlin @relayr - Kay Lerch on Getting basics right for you...IoT Innovation Lab Berlin @relayr - Kay Lerch on Getting basics right for you...
IoT Innovation Lab Berlin @relayr - Kay Lerch on Getting basics right for you...Kay Lerch
 
Introduction to Structured Streaming
Introduction to Structured StreamingIntroduction to Structured Streaming
Introduction to Structured StreamingKnoldus Inc.
 
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San JoseDataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San JoseAldrin Piri
 
Comparison of various streaming technologies
Comparison of various streaming technologiesComparison of various streaming technologies
Comparison of various streaming technologiesSachin Aggarwal
 
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...Spark Summit
 
Real-Time Event & Stream Processing on MS Azure
Real-Time Event & Stream Processing on MS AzureReal-Time Event & Stream Processing on MS Azure
Real-Time Event & Stream Processing on MS AzureKhalid Salama
 
Reactive integrations with Akka Streams
Reactive integrations with Akka StreamsReactive integrations with Akka Streams
Reactive integrations with Akka StreamsKonrad Malawski
 

Andere mochten auch (20)

Processing IoT Data with Apache Kafka
Processing IoT Data with Apache KafkaProcessing IoT Data with Apache Kafka
Processing IoT Data with Apache Kafka
 
Confluent kafka meetupseattle jan2017
Confluent kafka meetupseattle jan2017Confluent kafka meetupseattle jan2017
Confluent kafka meetupseattle jan2017
 
Getting started with Azure Event Hubs and Stream Analytics services
Getting started with Azure Event Hubs and Stream Analytics servicesGetting started with Azure Event Hubs and Stream Analytics services
Getting started with Azure Event Hubs and Stream Analytics services
 
Blr hadoop meetup
Blr hadoop meetupBlr hadoop meetup
Blr hadoop meetup
 
Storm over gearpump
Storm over gearpumpStorm over gearpump
Storm over gearpump
 
London Apache Kafka Meetup (Jan 2017)
London Apache Kafka Meetup (Jan 2017)London Apache Kafka Meetup (Jan 2017)
London Apache Kafka Meetup (Jan 2017)
 
Not Only Streams for Akademia JLabs
Not Only Streams for Akademia JLabsNot Only Streams for Akademia JLabs
Not Only Streams for Akademia JLabs
 
IoT Connected Brewery
IoT Connected BreweryIoT Connected Brewery
IoT Connected Brewery
 
Strata+Hadoop 2017 San Jose - The Rise of Real Time: Apache Kafka and the Str...
Strata+Hadoop 2017 San Jose - The Rise of Real Time: Apache Kafka and the Str...Strata+Hadoop 2017 San Jose - The Rise of Real Time: Apache Kafka and the Str...
Strata+Hadoop 2017 San Jose - The Rise of Real Time: Apache Kafka and the Str...
 
Extracting Insights from Data at Twitter
Extracting Insights from Data at TwitterExtracting Insights from Data at Twitter
Extracting Insights from Data at Twitter
 
Apache kafka-a distributed streaming platform
Apache kafka-a distributed streaming platformApache kafka-a distributed streaming platform
Apache kafka-a distributed streaming platform
 
Building Reactive Fast Data & the Data Lake with Akka, Kafka, Spark
Building Reactive Fast Data & the Data Lake with Akka, Kafka, SparkBuilding Reactive Fast Data & the Data Lake with Akka, Kafka, Spark
Building Reactive Fast Data & the Data Lake with Akka, Kafka, Spark
 
Big Data Day LA 2015 - Always-on Ingestion for Data at Scale by Arvind Prabha...
Big Data Day LA 2015 - Always-on Ingestion for Data at Scale by Arvind Prabha...Big Data Day LA 2015 - Always-on Ingestion for Data at Scale by Arvind Prabha...
Big Data Day LA 2015 - Always-on Ingestion for Data at Scale by Arvind Prabha...
 
IoT Innovation Lab Berlin @relayr - Kay Lerch on Getting basics right for you...
IoT Innovation Lab Berlin @relayr - Kay Lerch on Getting basics right for you...IoT Innovation Lab Berlin @relayr - Kay Lerch on Getting basics right for you...
IoT Innovation Lab Berlin @relayr - Kay Lerch on Getting basics right for you...
 
Introduction to Structured Streaming
Introduction to Structured StreamingIntroduction to Structured Streaming
Introduction to Structured Streaming
 
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San JoseDataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
 
Comparison of various streaming technologies
Comparison of various streaming technologiesComparison of various streaming technologies
Comparison of various streaming technologies
 
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
 
Real-Time Event & Stream Processing on MS Azure
Real-Time Event & Stream Processing on MS AzureReal-Time Event & Stream Processing on MS Azure
Real-Time Event & Stream Processing on MS Azure
 
Reactive integrations with Akka Streams
Reactive integrations with Akka StreamsReactive integrations with Akka Streams
Reactive integrations with Akka Streams
 

Ähnlich wie Kafka connect

AutoScaling and Drupal
AutoScaling and DrupalAutoScaling and Drupal
AutoScaling and DrupalPromet Source
 
How to Treat a Network Like a Container (Or Get Close)
How to Treat a Network Like a Container (Or Get Close)How to Treat a Network Like a Container (Or Get Close)
How to Treat a Network Like a Container (Or Get Close)All Things Open
 
All Things Open 2017: How to Treat a Network as a Container
All Things Open 2017: How to Treat a Network as a ContainerAll Things Open 2017: How to Treat a Network as a Container
All Things Open 2017: How to Treat a Network as a ContainerRosemary Wang
 
Jacopo Nardiello - Monitoring Cloud-Native applications with Prometheus - Cod...
Jacopo Nardiello - Monitoring Cloud-Native applications with Prometheus - Cod...Jacopo Nardiello - Monitoring Cloud-Native applications with Prometheus - Cod...
Jacopo Nardiello - Monitoring Cloud-Native applications with Prometheus - Cod...Codemotion
 
20171122 aws usergrp_coretech-spn-cicd-aws-v01
20171122 aws usergrp_coretech-spn-cicd-aws-v0120171122 aws usergrp_coretech-spn-cicd-aws-v01
20171122 aws usergrp_coretech-spn-cicd-aws-v01Scott Miao
 
DevOpsDaysPhoenix - CI/CD Pipelines for CDN
DevOpsDaysPhoenix -  CI/CD Pipelines for CDNDevOpsDaysPhoenix -  CI/CD Pipelines for CDN
DevOpsDaysPhoenix - CI/CD Pipelines for CDNAkshay Ranganath
 
Modern CI/CD in the microservices world with Kubernetes
Modern CI/CD in the microservices world with KubernetesModern CI/CD in the microservices world with Kubernetes
Modern CI/CD in the microservices world with KubernetesMikalai Alimenkou
 
20141021 AWS Cloud Taekwon - Startup Best Practices on AWS
20141021 AWS Cloud Taekwon - Startup Best Practices on AWS20141021 AWS Cloud Taekwon - Startup Best Practices on AWS
20141021 AWS Cloud Taekwon - Startup Best Practices on AWSAmazon Web Services Korea
 
Apache Cassandra 2.0
Apache Cassandra 2.0Apache Cassandra 2.0
Apache Cassandra 2.0Joe Stein
 
Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...
Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...
Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...Codemotion
 
Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...
Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...
Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...Codemotion
 
Apache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes
Apache Druid Auto Scale-out/in for Streaming Data Ingestion on KubernetesApache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes
Apache Druid Auto Scale-out/in for Streaming Data Ingestion on KubernetesDataWorks Summit
 
20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weiting20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weitingWei Ting Chen
 
Bootstrapping - Session 1 - Your First Week with Amazon EC2
Bootstrapping - Session 1 - Your First Week with Amazon EC2Bootstrapping - Session 1 - Your First Week with Amazon EC2
Bootstrapping - Session 1 - Your First Week with Amazon EC2Amazon Web Services
 
Discovery Day 2019 Sofia - Big data clusters
Discovery Day 2019 Sofia - Big data clustersDiscovery Day 2019 Sofia - Big data clusters
Discovery Day 2019 Sofia - Big data clustersIvan Donev
 
Distribute Key Value Store
Distribute Key Value StoreDistribute Key Value Store
Distribute Key Value StoreSantal Li
 
Distribute key value_store
Distribute key value_storeDistribute key value_store
Distribute key value_storedrewz lin
 
LinuxCon 2013 Steven Dake on Using Heat for autoscaling OpenShift on Openstack
LinuxCon 2013 Steven Dake on Using Heat for autoscaling OpenShift on OpenstackLinuxCon 2013 Steven Dake on Using Heat for autoscaling OpenShift on Openstack
LinuxCon 2013 Steven Dake on Using Heat for autoscaling OpenShift on OpenstackOpenShift Origin
 

Ähnlich wie Kafka connect (20)

AutoScaling and Drupal
AutoScaling and DrupalAutoScaling and Drupal
AutoScaling and Drupal
 
How to Treat a Network Like a Container (Or Get Close)
How to Treat a Network Like a Container (Or Get Close)How to Treat a Network Like a Container (Or Get Close)
How to Treat a Network Like a Container (Or Get Close)
 
All Things Open 2017: How to Treat a Network as a Container
All Things Open 2017: How to Treat a Network as a ContainerAll Things Open 2017: How to Treat a Network as a Container
All Things Open 2017: How to Treat a Network as a Container
 
Jacopo Nardiello - Monitoring Cloud-Native applications with Prometheus - Cod...
Jacopo Nardiello - Monitoring Cloud-Native applications with Prometheus - Cod...Jacopo Nardiello - Monitoring Cloud-Native applications with Prometheus - Cod...
Jacopo Nardiello - Monitoring Cloud-Native applications with Prometheus - Cod...
 
20171122 aws usergrp_coretech-spn-cicd-aws-v01
20171122 aws usergrp_coretech-spn-cicd-aws-v0120171122 aws usergrp_coretech-spn-cicd-aws-v01
20171122 aws usergrp_coretech-spn-cicd-aws-v01
 
DevOpsDaysPhoenix - CI/CD Pipelines for CDN
DevOpsDaysPhoenix -  CI/CD Pipelines for CDNDevOpsDaysPhoenix -  CI/CD Pipelines for CDN
DevOpsDaysPhoenix - CI/CD Pipelines for CDN
 
Architecting Applications
Architecting ApplicationsArchitecting Applications
Architecting Applications
 
Modern CI/CD in the microservices world with Kubernetes
Modern CI/CD in the microservices world with KubernetesModern CI/CD in the microservices world with Kubernetes
Modern CI/CD in the microservices world with Kubernetes
 
20141021 AWS Cloud Taekwon - Startup Best Practices on AWS
20141021 AWS Cloud Taekwon - Startup Best Practices on AWS20141021 AWS Cloud Taekwon - Startup Best Practices on AWS
20141021 AWS Cloud Taekwon - Startup Best Practices on AWS
 
MidSem
MidSemMidSem
MidSem
 
Apache Cassandra 2.0
Apache Cassandra 2.0Apache Cassandra 2.0
Apache Cassandra 2.0
 
Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...
Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...
Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...
 
Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...
Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...
Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...
 
Apache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes
Apache Druid Auto Scale-out/in for Streaming Data Ingestion on KubernetesApache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes
Apache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes
 
20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weiting20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weiting
 
Bootstrapping - Session 1 - Your First Week with Amazon EC2
Bootstrapping - Session 1 - Your First Week with Amazon EC2Bootstrapping - Session 1 - Your First Week with Amazon EC2
Bootstrapping - Session 1 - Your First Week with Amazon EC2
 
Discovery Day 2019 Sofia - Big data clusters
Discovery Day 2019 Sofia - Big data clustersDiscovery Day 2019 Sofia - Big data clusters
Discovery Day 2019 Sofia - Big data clusters
 
Distribute Key Value Store
Distribute Key Value StoreDistribute Key Value Store
Distribute Key Value Store
 
Distribute key value_store
Distribute key value_storeDistribute key value_store
Distribute key value_store
 
LinuxCon 2013 Steven Dake on Using Heat for autoscaling OpenShift on Openstack
LinuxCon 2013 Steven Dake on Using Heat for autoscaling OpenShift on OpenstackLinuxCon 2013 Steven Dake on Using Heat for autoscaling OpenShift on Openstack
LinuxCon 2013 Steven Dake on Using Heat for autoscaling OpenShift on Openstack
 

Kürzlich hochgeladen

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 

Kürzlich hochgeladen (20)

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 

Kafka connect

Hinweis der Redaktion

  1. First me, hurray, then DataMountaineer, who we are and want we do, then Connect, why, what is it, how does it work. Hopefully time for a demo
  2. Big Data background, Clearstream, HFTs, Investment banks. We were Big Data experts but saw first hand Big data’s no good if it’s too slow. Big data is no good if its slow. Anyone know Knight Capital? Lost $440 million in 2008 in 30 minutes, that's 172,000 per second. Hourly or EOD risk reporting is no good, too late your dead. Put another way would cross the road outside with information that's 10 minutes old?
  3. Everybody will have been involved in data integration, data is it new oil. It should be easy....right? Just copy the data from that database and put it over here.
  4. We need to think about these items even if we are already using Kafka consumers and producers. Or we should be.
  5. All results in wasted time and resources. If your a CTO of a large organization this will make your toes curl. No common framework. Team A puts logic in his extract component, means it's not reusable. Team B re implements.
  6. Kafka connect, scalable and fault tolerant, addresses developers needs. One stack, all part of Kafka.
  7. Ingest data, unload data, buffer and persist data, process data all from one distribution.
  8. SAT4s is an example, configured 4 second windfarm signal flow from a SQL Server database to HDFS and Cassandra, no coding.
  9. Hazelcast requires that data is serializable, JSON and Avro are supported.
  10. Highest level component as a developer you deal with. You can have multiple instances of connectors per cluster, must have a unique name
  11. Config topic is a backing topic of each connect cluster. 1 Partition, guarantees order of configuration updates. Starts up you connector, asks it for task configurations assigns tasks to members of the cluster/consumer group. How does it do this? It implements Kafka Abstract Coordinator.
  12. Kafka notifies the Herder that a rebalance is happening, the framework, rebalances the partitions or tasks across the remaining consumer group or connect cluster.
  13. Basically a Consumer group behind the scenes, single threaded, so just like a regular if you have more tasks than partitions in you topics they'll be idle. No state, so good for restful, micro service architectures. it's in kafka.
  14. SinkRecords – is a wrapper, contains topic information, key and value schemas and payloads. Connect is configured with Converters that serialized/deserialize back and forth between the data in Kafka to Connect.
  15. You could for example write your own Protobuf converter and plug that.