messaging → logs
@apachekafka
Jorge Quilcate Otoya
@jeqo89
About me
Jorge Quilcate Otoya
Back-end/Integration Developer
at Sysco Middleware
@jeqo89 | github.com/jeqo | jeqo.github.io
Context
“Technology that enables asynchronous communication…
Channels, also known as queues, are the logical paths
that connect programs and convey messages…
The sender, or producer, is the program that sends
messages by writing them to a channel.
The receiver, or consumer, is the program that
receives messages by reading (and removing) them from the channel.”
Context: Messaging
Enterprise Integration Patterns - Gregor Hohpe and Bobby Woolf
http://www.enterpriseintegrationpatterns.com/patterns/messaging/Introduction.html
Message Channels: Point-to-Point, Pub/Sub
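The channel semantics quoted above (a consumer reads *and removes* a message) can be sketched in memory. The classes below are hypothetical and are not the JMS or AMQP API. They only contrast point-to-point delivery, where each message reaches exactly one consumer, with pub/sub, where every subscriber gets its own copy.

```python
from collections import deque


class PointToPointChannel:
    """Toy queue: each message is delivered to exactly one consumer, then removed."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        # Destructive read: once taken, no other consumer can see the message.
        return self._messages.popleft() if self._messages else None


class PubSubChannel:
    """Toy topic: every current subscriber gets its own copy of each message."""

    def __init__(self):
        self._subscribers = {}

    def subscribe(self, name):
        self._subscribers[name] = deque()

    def publish(self, message):
        for queue in self._subscribers.values():
            queue.append(message)

    def receive(self, name):
        queue = self._subscribers[name]
        return queue.popleft() if queue else None


p2p = PointToPointChannel()
p2p.send("job-1")
print(p2p.receive())  # job-1
print(p2p.receive())  # None: the message was already consumed

topic = PubSubChannel()
topic.subscribe("billing")
topic.subscribe("shipping")
topic.publish("order-created")
print(topic.receive("billing"), topic.receive("shipping"))  # order-created order-created
```

In both cases the broker forgets a message once each intended receiver has taken it. That is the property the log-based model later in the talk drops.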
Messaging use-case:
Job Queues
Fire and Forget
Store and Forward (a.k.a. Push Model)
Broker in charge of reliable message delivery
Event sourcing and stream processing at scale - Martin Kleppmann
https://martin.kleppmann.com/2016/01/29/event-sourcing-stream-processing-at-ddd-europe.html
Implementations: JMS/AMQP
Messaging Challenges
Risk of out-of-order messages
when a failed message is
retried
Risk of inconsistency across
different clients
(producers and/or consumers)
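Here is a toy simulation of how a retry can reorder messages. All names are hypothetical. The producer sends m1, m2 and m3 back to back without waiting for each acknowledgment. The first attempt of m2 is lost, and its retry goes out only after m3 has already been delivered.

```python
def deliver_with_retry(messages, fails_once):
    """Simulate sending messages back to back; failed ones are re-sent at the end.

    `fails_once` is the set of messages whose first delivery attempt is lost.
    Returns the order in which the broker ends up storing the messages.
    """
    delivered = []
    retries = []
    for message in messages:
        if message in fails_once:
            retries.append(message)  # first attempt lost; it will be re-sent
        else:
            delivered.append(message)
    # The retries go out after the messages that were already in flight.
    delivered.extend(retries)
    return delivered


print(deliver_with_retry(["m1", "m2", "m3"], fails_once={"m2"}))
# ['m1', 'm3', 'm2']
```

Kafka's producer has the same effect. Its documentation warns that enabling retries while allowing more than one in-flight request per connection (`max.in.flight.requests.per.connection` > 1) can change the ordering of records.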
Context: Logs
Records are appended to the end of the Log...
Each Record has a Key…
Records are ordered…
The Order defines the notion of “time”...
The Content is not important at this point; it could be anything
… They record what happened and when.
The Log: What every software engineer should know about real-time data's unifying abstraction - Jay Kreps
https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
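Below is a minimal sketch of the log abstraction described above. It assumes an in-memory list, whereas Kafka persists segments on disk. Records are only ever appended, each one gets a monotonically increasing offset, and offset order serves as the log's notion of time. The `Record` and `Log` names are hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass(frozen=True)
class Record:
    offset: int        # position in the log, assigned on append
    timestamp: float   # wall-clock time of the append
    key: Optional[str]
    value: Any         # the content is opaque to the log


class Log:
    def __init__(self):
        self._records: List[Record] = []

    def append(self, key, value):
        """Append at the end; existing records are never modified."""
        record = Record(len(self._records), time.time(), key, value)
        self._records.append(record)
        return record.offset

    def read(self, offset, max_records=10):
        """Non-destructive read starting at `offset`."""
        return self._records[offset:offset + max_records]


log = Log()
log.append("user-1", {"event": "login"})
log.append("user-2", {"event": "login"})
log.append("user-1", {"event": "logout"})
for r in log.read(0):
    print(r.offset, r.key, r.value)
```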
Logs everywhere
How does your database store information on disk consistently?
It uses a log.
How do a database's replicas stay in sync with other replicas?
It uses a log.
How is activity data recorded in a system like Apache Kafka?
It uses a log.
How will your application's infrastructure stay robust at scale?
Guess how…
Using logs to build a solid data infrastructure (or why dual writes are a bad idea) - Martin Kleppmann
https://www.confluent.io/blog/using-logs-to-build-a-solid-data-infrastructure-or-why-dual-writes-are-a-bad-idea/
https://www.confluent.io/blog/turning-the-database-inside-out-with-apache-samza/
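The replica question above can be made concrete. If every replica starts from the same state and applies the same ordered log of changes, all replicas end in the same state. Dual writes cannot guarantee this. The toy sketch below assumes a key-value store and uses hypothetical `("set", key, value)` / `("delete", key)` entries.

```python
def apply_log(entries):
    """Rebuild key-value state by applying log entries in order."""
    state = {}
    for entry in entries:
        op, key = entry[0], entry[1]
        if op == "set":
            state[key] = entry[2]
        elif op == "delete":
            state.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return state


changes = [("set", "a", 1), ("set", "b", 2), ("delete", "a"), ("set", "b", 3)]
replica_1 = apply_log(changes)
replica_2 = apply_log(changes)
print(replica_1 == replica_2, replica_1)  # True {'b': 3}

# Dual writes: two clients write to two stores with no shared order.
store_x = apply_log([("set", "k", "client-A"), ("set", "k", "client-B")])
store_y = apply_log([("set", "k", "client-B"), ("set", "k", "client-A")])
print(store_x == store_y)  # False: the stores diverge
```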
Log-Centric Architecture (a.k.a. Kappa)
“A system that assumes an external log is
present allows the individual systems to
relinquish a lot of their own complexity and
rely on the shared log.”
The Log: What every software engineer should know about real-time data's unifying abstraction - Jay Kreps
https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
http://milinda.pathirage.org/kappa-architecture.com/
Logs use-case:
Event Log
Pull Model
Ordered stream of Events
Consumers in charge of
fetching messages (poll)
Event sourcing and stream processing at scale - Martin Kleppmann
https://martin.kleppmann.com/2016/01/29/event-sourcing-stream-processing-at-ddd-europe.html
Implementations:
Apache Kafka,
Amazon Kinesis,
Apache DistributedLog
(incubating)
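In the pull model, each consumer asks for records starting at its own position and advances that position itself. Nothing is deleted on read. The toy sketch below uses a plain list as the log and a hypothetical `PullConsumer` class; it is not the Kafka consumer API.

```python
class PullConsumer:
    """Keeps its own position and polls the shared log for new records."""

    def __init__(self, log):
        self.log = log        # shared, append-only list of events
        self.position = 0     # next offset this consumer will read

    def poll(self, max_records=2):
        batch = self.log[self.position:self.position + max_records]
        self.position += len(batch)  # advance only past what was returned
        return batch


events = ["e0", "e1", "e2"]
fast, slow = PullConsumer(events), PullConsumer(events)
print(fast.poll(), fast.poll())  # ['e0', 'e1'] ['e2']
print(slow.poll(1))              # ['e0']: independent of `fast`
events.append("e3")              # a producer appends a new event
print(fast.poll())               # ['e3']
```

Each consumer reads at its own pace. A slow consumer does not hold back a fast one, and the broker does not track who has seen what.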
Solving Messaging Challenges with Logs
Ordering and Reprocessing
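The log keeps records after they are read, so reprocessing only requires moving a consumer's position back. In the sketch below, a consumer builds a derived view (hypothetical per-account balances). After a bug fix, it rewinds to offset 0 and rebuilds that view from the same ordered records. The data and function names are illustrative assumptions.

```python
# Hypothetical ordered log of (account, amount) records for one partition.
transactions = [("acct-1", 100), ("acct-2", 50), ("acct-1", -30)]


def replay(log, apply):
    """Rewind to offset 0 and rebuild per-account state in log order."""
    balances = {}
    for account, amount in log:
        balances[account] = apply(balances.get(account, 0), amount)
    return balances


# Version 1 of the consumer had a bug: it treated withdrawals as deposits.
buggy = replay(transactions, lambda total, amount: total + abs(amount))
# After fixing the logic, reprocess the same records from the beginning.
fixed = replay(transactions, lambda total, amount: total + amount)
print(buggy)  # {'acct-1': 130, 'acct-2': 50}
print(fixed)  # {'acct-1': 70, 'acct-2': 50}
```

Both runs see the records for each account in the same order. Rebuilding the view is therefore deterministic.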
Apache Kafka
A Distributed Streaming Platform
Apache Kafka: Facts
➔ Born out of the need to solve LinkedIn's
data pipeline problem.
➔ First use-cases: collecting system
metrics and monitoring user activity.
2010: Open-sourced
2011: Apache project
2012: Graduated from incubator in October
2014: Confluent Inc. founded
Kafka: The Definitive Guide - Neha Narkhede, Gwen Shapira & Todd Palino
Apache Kafka: Use-cases
➔ Activity Tracking
➔ Messaging
➔ Metrics/Logging
➔ Commit Log
➔ Stream Processing
➔ Cloud Adoption
Apache Kafka
Tour
(v0.10.2.0)
Log Records
Kafka Cluster
Kafka Producer API
Kafka Consumer API
Kafka Streams API
Kafka Connect API
Kafka ++
Kafka Core
Log Record
from Topics to Partitions
http://kafka.apache.org/documentation
from Partitions to Segments
https://www.confluent.io/apache-kafka-talk-series/deep-dive-into-apache-kafka/
https://www.confluent.io/apache-kafka-talk-series/
from Segments to Records
https://www.confluent.io/apache-kafka-talk-series/deep-dive-into-apache-kafka/
https://www.confluent.io/apache-kafka-talk-series/
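Records with the same key land in the same partition, which is what provides per-key ordering. Kafka's default partitioner hashes the serialized key with murmur2. The sketch below substitutes `zlib.crc32` purely for illustration, so the partition numbers it produces will not match a real cluster.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition: hash(key) mod number of partitions.

    Assumption: crc32 stands in for Kafka's murmur2 hash.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(records, num_partitions):
    """Group (key, value) records into per-partition append-only lists."""
    partitions = defaultdict(list)
    for key, value in records:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


records = [("user-1", "login"), ("user-2", "login"),
           ("user-1", "click"), ("user-1", "logout")]
for p, recs in sorted(route(records, 3).items()):
    print(p, recs)
```

Changing the partition count changes `hash % N`, so existing keys can move to different partitions. This is one reason partition counts are usually chosen up front.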
Log unit: Record
https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol
Lab: Log Record
Record Structure: Key/Value
Serialization/Deserialization
Metadata: Offset/Timestamp
Schema Evolution: Why Avro?
Reader’s schema and writer’s schema don't
need to be the same
Forward/Backward compatibility
➔ Add/remove fields with default
values
➔ Explicit `null` type (no
optional/required markers)
➔ Data types can be changed (if Avro can convert them)
➔ Field names can be changed (via aliases)
Designing Data-Intensive Applications - Martin Kleppmann
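The reader's-versus-writer's-schema idea can be sketched without the Avro library. When decoding, the reader keeps fields it knows, ignores fields only the writer had, and fills fields only the reader has from their defaults. This is a simplified model of Avro schema resolution: it has no type promotion, aliases, or nested types. The `resolve` function and the dict-based schemas are hypothetical.

```python
def resolve(record: dict, reader_schema: dict) -> dict:
    """Project a record decoded with the writer's schema onto the reader's schema.

    `reader_schema` maps field name -> {"default": value}, or {} if no default.
    """
    result = {}
    for field, spec in reader_schema.items():
        if field in record:
            result[field] = record[field]      # field known to both schemas
        elif "default" in spec:
            result[field] = spec["default"]    # reader-only field: use its default
        else:
            raise ValueError(f"missing field {field!r} with no default")
    return result  # writer-only fields are ignored


# Writer (v1) has "email"; reader (v2) dropped it and added "country".
written_v1 = {"id": 7, "name": "Ana", "email": "ana@example.com"}
reader_v2 = {"id": {}, "name": {}, "country": {"default": None}}
print(resolve(written_v1, reader_v2))  # {'id': 7, 'name': 'Ana', 'country': None}
```

The `None` default plays the role of a `null` default in an Avro union type. This is why an explicit `null` type is enough and no optional/required markers are needed.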
Kafka Cluster
Centralized coordination service: consensus, group
management, presence protocols, atomic broadcast
Kafka's internal “source of truth”
Used for:
➔ Leader replica election
➔ Replica synchronization (ISR)
➔ And more
Kafka Topology: Why ZooKeeper?
Distributed Consensus Reloaded: Apache Zookeeper and Replication in Kafka - Flavio Junqueira
https://www.confluent.io/blog/distributed-consensus-reloaded-apache-zookeeper-and-replication-in-kafka/
Balance Availability and Consistency
Use case: Activity Tracking
➔ Retention: 3 days
➔ More partitions
➔ Lower replication factor
➔ Availability matters more
Use case: Inventory adjustments
➔ Retention: 6 months
➔ Fewer partitions
➔ Higher replication factor
➔ Consistency matters more
Streaming in Practice: Putting Kafka in Production - Roger Hoover
https://www.confluent.io/apache-kafka-talk-series/Streaming-in-Practice-Putting-Kafka-in-Production/
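The two profiles above could translate into topic-level settings like the following. `retention.ms`, `min.insync.replicas` and `unclean.leader.election.enable` are real Kafka topic configs. The partition counts and replication factors are illustrative guesses, not values from the talk, and "6 months" is approximated as 180 days.

```python
DAY_MS = 24 * 60 * 60 * 1000  # milliseconds per day

activity_tracking = {
    "partitions": 24,                 # illustrative: more partitions
    "replication_factor": 2,          # illustrative: lower replication
    "config": {
        "retention.ms": 3 * DAY_MS,   # 3 days
        "unclean.leader.election.enable": True,   # favor availability
    },
}

inventory_adjustments = {
    "partitions": 6,                  # illustrative: fewer partitions
    "replication_factor": 3,          # illustrative: higher replication
    "config": {
        "retention.ms": 180 * DAY_MS,  # roughly 6 months
        "min.insync.replicas": 2,      # with acks=all, favor consistency
        "unclean.leader.election.enable": False,
    },
}

print(activity_tracking["config"]["retention.ms"])      # 259200000
print(inventory_adjustments["config"]["retention.ms"])  # 15552000000
```

Allowing unclean leader election lets an out-of-sync replica become leader, which keeps the partition available at the cost of possibly losing acknowledged records. Disabling it makes the opposite trade.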
Lab: Kafka Cluster
Scalability: Cluster and
Brokers
Topics: Partitions,
Replication, ISR
Cleaning up: Compaction and
Retention
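With compaction, Kafka retains at least the latest value for each key instead of deleting records by age. The sketch below is simplified and only shows the end result. Kafka compacts segment by segment in the background, and it keeps tombstones (records with a `null` value) for a while before removing them. Here `None` acts as a tombstone and is dropped immediately.

```python
def compact(log):
    """Keep only the latest record for each key, preserving offset order.

    `log` is a list of (offset, key, value). A value of None is a tombstone
    and, in this simplified version, removes the key immediately.
    """
    latest = {}
    for offset, key, value in log:
        latest[key] = (offset, key, value)  # later offsets win
    survivors = [rec for rec in latest.values() if rec[2] is not None]
    return sorted(survivors)  # back in offset order; offsets are unchanged


log = [(0, "k1", "a"), (1, "k2", "b"), (2, "k1", "c"), (3, "k3", "d"), (4, "k2", None)]
print(compact(log))  # [(2, 'k1', 'c'), (3, 'k3', 'd')]
```

Surviving records keep their original offsets, so gaps in offsets are normal in a compacted topic.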
Be careful with putting
data in Containers
https://twitter.com/waxzce/status/829420329177083904
Kafka Clients API
Kafka Clients survey
https://www.confluent.io/blog/first-annual-state-apache-kafka-client-use-survey
Kafka Producer API
Batching and Compression
Acknowledgment: Latency vs Durability
acks=0 → no wait for an acknowledgment → possible data loss
acks=1 → 1 network round-trip (leader only) → little data loss
acks=all (-1) → 2 network round-trips (leader and followers) → no data loss
(in combination with `min.insync.replicas`)
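The toy model below shows when a write counts as acknowledged under each `acks` setting. The function and its inputs are hypothetical. It mirrors only two rules: with `acks=all` the leader waits for the whole in-sync replica set, and it rejects the write if that set is smaller than `min.insync.replicas`.

```python
def is_acknowledged(acks, isr_size, replicas_with_record, min_insync_replicas=1):
    """Decide whether a write is acknowledged (toy model).

    isr_size: number of in-sync replicas, leader included.
    replicas_with_record: how many of them have written the record so far.
    """
    if acks == "0":
        # The producer does not wait, so it treats the send as done
        # even if no replica ever wrote the record.
        return True
    if acks == "1":
        return replicas_with_record >= 1          # the leader has it
    if acks == "all":
        if isr_size < min_insync_replicas:
            raise RuntimeError("ISR smaller than min.insync.replicas: write rejected")
        return replicas_with_record >= isr_size   # every in-sync replica has it
    raise ValueError(f"unsupported acks value: {acks!r}")


print(is_acknowledged("1", isr_size=3, replicas_with_record=1))    # True: leader only
print(is_acknowledged("all", isr_size=3, replicas_with_record=1))  # False: still waiting
print(is_acknowledged("all", isr_size=3, replicas_with_record=3))  # True
```

With `acks=1`, the record is lost if the leader fails right after acknowledging and before any follower copies it. `acks=all` with `min.insync.replicas` ≥ 2 closes that window, at the cost of the extra round-trip.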
Lab: Kafka
Producer
Batching and Compression
Acknowledgements
Results
kafka_producer_ack_zero_latency_sum/kafka_producer_ack_zero_latency_count
ack=0 => 0.05494 s.
kafka_producer_ack_one_latency_sum/kafka_producer_ack_one_latency_count
ack=1 => 0.06097 s.
kafka_producer_ack_all_latency_sum/kafka_producer_ack_all_latency_count
ack=all => 0.06375 s.
Benchmarking Apache Kafka: 2 million writes per second on 3 cheap machines - Jay Kreps
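Each figure above is a mean (latency sum divided by count). The snippet below does arithmetic on those three numbers only, expressing each result relative to `ack=0`. The numbers come from a single lab run and are not a general benchmark.

```python
averages = {"0": 0.05494, "1": 0.06097, "all": 0.06375}  # seconds, from the lab run

baseline = averages["0"]
for acks, latency in averages.items():
    overhead = (latency - baseline) / baseline * 100
    print(f"acks={acks}: {latency * 1000:.2f} ms ({overhead:+.1f}% vs acks=0)")
# acks=0: 54.94 ms (+0.0% vs acks=0)
# acks=1: 60.97 ms (+11.0% vs acks=0)
# acks=all: 63.75 ms (+16.0% vs acks=0)
```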
https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
Kafka Consumer API
➔ Consumer Groups as logical
subscribers
➔ Offsets tracked per consumer group and partition
(committed by the group member assigned to that partition)
➔ Consumer Groups, together with Partitions,
as the basis of parallelism
➔ Ordering guaranteed within a partition
(using keyed records is usually enough)
Multiple Consumers
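Within a consumer group, each partition is assigned to exactly one member, so the partition count caps the group's parallelism. The sketch below is simplified and follows the spirit of Kafka's range assignor for a single topic. The real assignors live in the client and also handle multiple topics and rebalancing. The function and member names are hypothetical.

```python
def range_assign(partitions: int, members: list) -> dict:
    """Split partitions 0..N-1 into contiguous ranges, one range per member.

    Members are sorted so every member computes the same assignment;
    the first N % M members get one extra partition.
    """
    members = sorted(members)
    per_member, extra = divmod(partitions, len(members))
    assignment, start = {}, 0
    for i, member in enumerate(members):
        count = per_member + (1 if i < extra else 0)
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment


print(range_assign(5, ["c2", "c1"]))
# {'c1': [0, 1, 2], 'c2': [3, 4]}
print(range_assign(2, ["c1", "c2", "c3"]))
# {'c1': [0], 'c2': [1], 'c3': []}  (the third consumer sits idle)
```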
At-Most-Once Delivery
➔ Scenario
The consumer process crashes after saving
its position but before processing the message.
➔ Result
The process that takes over starts from the
saved position, even if some earlier messages
were never processed.
At-Least-Once Delivery
➔ Scenario
The consumer process crashes after processing
the messages but before saving its position.
➔ Result
When the new process takes over, the first
messages it receives may already have been
processed.
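The two scenarios differ only in whether the position is saved before or after processing. The toy simulation below uses hypothetical names and injects a crash at a chosen offset. It shows what each choice means for a restarted consumer.

```python
def consume(log, committed, processed, commit_first, crash_at=None):
    """Consume from `committed` onward; return the new committed offset.

    commit_first=True  -> save position, then process (at-most-once)
    commit_first=False -> process, then save position (at-least-once)
    A crash at offset `crash_at` happens between the two steps.
    """
    for offset in range(committed, len(log)):
        if commit_first:
            committed = offset + 1
            if offset == crash_at:
                return committed          # crashed before processing
            processed.append(log[offset])
        else:
            processed.append(log[offset])
            if offset == crash_at:
                return committed          # crashed before saving the position
            committed = offset + 1
    return committed


log = ["m0", "m1", "m2"]
for label, commit_first in (("at-most-once", True), ("at-least-once", False)):
    processed = []
    position = consume(log, 0, processed, commit_first, crash_at=1)  # crash
    consume(log, position, processed, commit_first)                  # restart
    print(label, processed)
# at-most-once ['m0', 'm2']             (m1 is never processed)
# at-least-once ['m0', 'm1', 'm1', 'm2'] (m1 is processed twice)
```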
Exactly-Once Delivery
“Exactly-once delivery requires co-operation
with the destination storage system …”
Coming soon (KIP-98):
● Idempotent Producer Guarantees
● Transactional Guarantees
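KIP-98's idempotent producer tags each write with a producer ID and a per-partition sequence number, which lets the broker recognize and drop a retried write. The toy broker-side sketch below illustrates only that deduplication idea. Class and method names are hypothetical, and the real protocol also deals with producer epochs, batches and out-of-sequence errors.

```python
class DedupPartition:
    """Partition log that ignores duplicates using (producer_id, sequence)."""

    def __init__(self):
        self.records = []
        self.last_seq = {}  # producer_id -> last sequence number appended

    def append(self, producer_id, sequence, value):
        last = self.last_seq.get(producer_id, -1)
        if sequence <= last:
            return False  # duplicate, e.g. a retry after a lost ack: drop it
        if sequence != last + 1:
            raise ValueError("out-of-order sequence number")
        self.records.append(value)
        self.last_seq[producer_id] = sequence
        return True


partition = DedupPartition()
partition.append("producer-A", 0, "order-1")
partition.append("producer-A", 1, "order-2")
partition.append("producer-A", 1, "order-2")  # retry of the same write
print(partition.records)  # ['order-1', 'order-2']
```

This removes duplicates caused by producer retries. End-to-end exactly-once still needs the consumer side to cooperate, as the quote above says.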
Lab: Kafka
Consumer
Consumer Groups: Parallelism
Consumer Offsets: Control and
reprocessing
(https://jeqo.github.io/post/2017-01-31-kafka-rewind-consumers-offset/)
Kafka Streams API &
Kafka Connect API
Kafka Streams API & Kafka Connect API
Unifying Stream Processing and Interactive Queries in Apache Kafka - Eno Thereska
https://www.confluent.io/blog/unifying-stream-processing-and-interactive-queries-in-apache-kafka/
Kafka Streams
https://twitter.com/lcrsilveira/status/829615803133730816
https://twitter.com/jessetanderson/status/830113106277785600
Kafka Connect
HDFS, JDBC, GoldenGate,
Elasticsearch,
Couchbase, DataStax,
Cassandra, Attunity,
Azure IoTHub, SAP Hana,
VoltDb, FTP, JMS, JMX,
MongoDB, Solr, Splunk,
RethinkDB, SQS, S3,
MQTT, Redis, InfluxDB,
HBase, Hazelcast,
Twitter, and more...
Lab: Kafka
Streams & Kafka
Connect
“Simplified Consumer”
Stream/Table Duality
Windows
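Here is the stream/table duality in miniature. A table is the fold of a changelog stream (the latest value per key wins), and the changes between two table versions form a stream again. A windowed aggregation groups events into time buckets. The sketch uses plain Python to show the ideas only; it is not the Kafka Streams DSL. The 60-second tumbling window and the event tuples are assumptions.

```python
from collections import Counter


def table_from_stream(changelog):
    """Materialize a table from a changelog: the latest value per key wins."""
    table = {}
    for key, value in changelog:
        table[key] = value
    return table


def stream_from_table_updates(before, after):
    """Turn the differences between two table versions back into change events.

    Simplified: deletions are not emitted.
    """
    return [(key, value) for key, value in after.items() if before.get(key) != value]


def tumbling_window_counts(events, window_seconds=60):
    """Count events per (key, window start) using fixed, non-overlapping windows."""
    counts = Counter()
    for key, timestamp in events:
        window_start = timestamp - timestamp % window_seconds
        counts[(key, window_start)] += 1
    return dict(counts)


changelog = [("alice", "Lima"), ("bob", "Oslo"), ("alice", "Cusco")]
print(table_from_stream(changelog))  # {'alice': 'Cusco', 'bob': 'Oslo'}

clicks = [("alice", 5), ("alice", 59), ("alice", 61), ("bob", 70)]
print(tumbling_window_counts(clicks))
# {('alice', 0): 2, ('alice', 60): 1, ('bob', 60): 1}
```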
Kafka++
Confluent Platform
Confluent Platform: Apache Kafka Enterprise Edition
Lab: Confluent
Platform
Confluent Platform:
➔ Schema Registry
➔ REST API
Integration with
Apache Kafka
Lab: Integration
with Kafka
Integration Platforms:
➔ Camel
http://camel.apache.org/kafka.html
➔ Akka Streams
http://doc.akka.io/docs/akka-stream-kafka/current/home.html
➔ Oracle Service Bus
http://www.ateam-oracle.com/osb-transport-for-apache-kafka-part-1/
What’s under
discussion and/or
coming soon?
Exactly-once Delivery / Txn Messaging
https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging
Headers support (additional metadata)
https://cwiki.apache.org/confluence/display/KAFKA/KIP-82+-+Add+Record+Headers
ZStandard Compression support
https://cwiki.apache.org/confluence/display/KAFKA/KIP-110%3A+Add+Codec+for+ZStandard+Compression
Reset Offset tool
https://cwiki.apache.org/confluence/display/KAFKA/KIP-122%3A+Add+a+tool+to+Reset+Consumer+Group+Offsets
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
How NOT to use
Kafka
Top 5:
➔ No consideration of data
on the inside vs outside
➔ Schema not externally
defined
➔ Same config for every
client/topic
➔ 128 partitions as default
➔ Running on 8 overloaded
nodes
Kafka Summit 2016: 101 ways to configure
Kafka - Badly
https://www.confluent.io/kafka-summit-2016-101-ways-to-configure-kafka-badly
https://cwiki.apache.org/confluence/display/KAFKA/Operations
Further reading
Thanks!!!
Twitter: @jeqo89
GitHub: /jeqo
Blog: jeqo.github.io
Code: github.com/jeqo/talk-kafka-messaging-logs

From Messaging to Logs with Apache Kafka

  • 1. messaging → logs @apachekafka Jorge Quilcate Otoya @jeqo89
  • 2. About me Jorge Quilcate Otoya Back-end/Integration Developer at Sysco Middleware @jeqo89 | github.com/jeqo | jeqo.github.io
  • 4. “Tecnología que permite comunicación asíncrona… Channels, también conocidos queues (colas), son la ruta lógica que conecta los programas y transmite los mensajes … El remitente o producer (productor) es el programa que envía mensajes, escribiendo el mensaje en un canal El receptor o consumer (consumidor) es el programa que recibe los mensajes, leyéndolo (y eliminandolo) del canal.” Context: Messaging Enterprise Integration Patterns - Gregor Hohpe and Bobby Woolf http://www.enterpriseintegrationpatterns.com/patterns/messaging/Introduction.html
  • 6. Messaging use-case: Job Queues Fire and Forget Store and Forward (a.k.a. Push Model) Broker a cargo de la entrega confiable de mensajes Event sourcing and stream processing at scale - Martin Kleppmann https://martin.kleppmann.com/2016/01/29/event-sourcing-stream-proce ssing-at-ddd-europe.html Implementations: JMS/AMQP
  • 7. Messaging Challenges: risk of out-of-order messages when a failed message is re-sent; risk of inconsistency across different clients (producers and/or consumers).
  • 8. Context: Logs Records are appended to the end of the log... Each record has a key… Records are ordered… The order defines the notion of “time”... The content is not important at this point; it could be anything… Records register what happened and when. The Log: What every software engineer should know about real-time data's unifying abstraction - Jay Kreps https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
  • 9. Logs everywhere How does your database store data on disk consistently? It uses a log. How do database replicas stay in sync with each other? They use a log. How does activity data get recorded in a system like Apache Kafka? It uses a log. How will your application's infrastructure stay robust at scale? Guess how… Using logs to build a solid data infrastructure (or why dual writes are a bad idea) - Martin Kleppmann https://www.confluent.io/blog/using-logs-to-build-a-solid-data-infrastructure-or-why-dual-writes-are-a-bad-idea/ https://www.confluent.io/blog/turning-the-database-inside-out-with-apache-samza/
  • 10. Log-Centric Architecture (a.k.a. Kappa) “A system that assumes an external log is present allows the individual systems to relinquish a lot of their own complexity and rely on the shared log.” The Log: What every software engineer should know about real-time data's unifying abstraction - Jay Kreps https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying http://milinda.pathirage.org/kappa-architecture.com/
  • 11. Logs use case: Event Log. Pull Model: an ordered stream of events; consumers are in charge of fetching messages (poll). Event sourcing and stream processing at scale - Martin Kleppmann https://martin.kleppmann.com/2016/01/29/event-sourcing-stream-processing-at-ddd-europe.html Implementations: Apache Kafka, Amazon Kinesis, Apache DistributedLog (incubating)
  • 12. Solving Messaging Challenges with Logs: Ordering and Reprocessing
  • 13. Apache Kafka A Distributed Streaming Platform
  • 14. Apache Kafka: Facts ➔ Born from the need to solve LinkedIn's data pipeline problem. ➔ First use cases: collecting system metrics and monitoring user activity. 2010: Open-sourced 2011: Apache project 2012: Graduated from the incubator in October 2014: Confluent Inc. founded Kafka: The Definitive Guide - Neha Narkhede, Gwen Shapira & Todd Palino
  • 15. Apache Kafka: Use-cases ➔ Activity Tracking ➔ Messaging ➔ Metrics/Logging ➔ Commit Log ➔ Stream Processing ➔ Cloud Adoption
  • 16. Apache Kafka Tour (v0.10.2.0) Log Records Kafka Cluster Kafka Producer API Kafka Consumer API Kafka Streams API Kafka Connect API Kafka ++
  • 19. from Topics to Partitions http://kafka.apache.org/documentation
  • 20. from Partitions to Segments https://www.confluent.io/apache-kafka-talk-series/deep-dive-into-apache-kafka/ https://www.confluent.io/apache-kafka-talk-series/
  • 21. from Segments to Records https://www.confluent.io/apache-kafka-talk-series/deep-dive-into-apache-kafka/ https://www.confluent.io/apache-kafka-talk-series/
  • 23. Lab: Log Record Record Structure: Key/Value Serialization/Deserialization Metadata: Offset/Timestamp
  • 24. Schema Evolution: Why Avro? The reader's schema and the writer's schema do not need to be the same. Forward/Backward compatibility ➔ Add/remove fields that have default values ➔ Explicit `null` type (no optional/required markers) ➔ Data types can be changed ➔ Fields can be renamed (via aliases) Designing Data-Intensive Applications - Martin Kleppmann
  • 26. Kafka Topology: Why ZooKeeper? Centralized coordination service: consensus, group management, presence protocols, atomic broadcast. Kafka's internal “source of truth”. Used for: ➔ Leader replica election ➔ Replica synchronization (ISR) ➔ And more Distributed Consensus Reloaded: Apache Zookeeper and Replication in Kafka - Flavio Junqueira https://www.confluent.io/blog/distributed-consensus-reloaded-apache-zookeeper-and-replication-in-kafka/
  • 27. Balance Availability and Consistency Use case: Activity Tracking ➔ Retention: 3 days ➔ More partitions ➔ Lower replication factor ➔ Availability matters more Use case: Inventory adjustments ➔ Retention: 6 months ➔ Fewer partitions ➔ Higher replication factor ➔ Consistency matters more Streaming in Practice: Putting Kafka in Production - Roger Hoover https://www.confluent.io/apache-kafka-talk-series/Streaming-in-Practice-Putting-Kafka-in-Production/
  • 28. Lab: Kafka Cluster Scalability: Cluster and Brokers Topics: Partitions, Replication, ISR Cleaning up: Compaction and Retention
  • 29. Be careful putting data in containers https://twitter.com/waxzce/status/829420329177083904
  • 34. Acknowledgment: Latency vs Durability acks=0 → No network delay → some data loss
  • 35. Acknowledgment: Latency vs Durability acks=1 → 1 network round-trip → little data loss
  • 36. Acknowledgment: Latency vs Durability acks=all (-1) → 2 network round-trips → no data loss (in combination with `min.insync.replicas`)
  • 37. Lab: Kafka Producer Batching and Compression Acknowledgements
  • 38. Results (average latency = latency_sum / latency_count) ➔ ack=0 => 0.05494 s (kafka_producer_ack_zero_latency_sum / kafka_producer_ack_zero_latency_count) ➔ ack=1 => 0.06097 s (kafka_producer_ack_one_latency_sum / kafka_producer_ack_one_latency_count) ➔ ack=all => 0.06375 s (kafka_producer_ack_all_latency_sum / kafka_producer_ack_all_latency_count) Benchmarking Apache Kafka: 2 million writes per second (on 3 cheap machines) - Jay Kreps https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
  • 40. Multiple Consumers ➔ Consumer groups as logical subscribers ➔ Each consumer instance (group member) tracks the offsets of the partitions assigned to it ➔ Consumer groups, together with partitions, are the basis of parallelism ➔ Ordering is guaranteed within a partition (keyed topics are normally enough)
  • 41. At-Most-Once Delivery ➔ Scenario: The consumer process crashes after saving its position but before processing the message. ➔ Result: The process that takes over resumes from the saved position, even if some earlier messages were never processed.
  • 42. At-Least-Once Delivery ➔ Scenario: The consumer process crashes after processing the messages but before saving its position. ➔ Result: When the new process takes over, the first messages it receives may already have been processed.
  • 43. Exactly-Once Delivery “Exactly-once delivery requires cooperation with the destination storage system…” Coming soon (KIP-98): ● Idempotent producer guarantees ● Transactional guarantees
  • 44. Lab: Kafka Consumer Consumer Groups: Parallelism Consumer Offsets: Control and reprocessing (https://jeqo.github.io/post/2017-01-31-kafka-rewind-consumers-offset/)
  • 45. Kafka Streams API & Kafka Connector API
  • 46. Kafka Streams API & Kafka Connector API Unifying Stream Processing and Interactive Queries in Apache Kafka - Eno Thereska https://www.confluent.io/blog/unifying-stream-processing-and-interactive-queries-in-apache-kafka/
  • 48. Kafka Connect HDFS, JDBC, GoldenGate, Elasticsearch, Couchbase, DataStax, Cassandra, Attunity, Azure IoTHub, SAP Hana, VoltDb, FTP, JMS, JMX, MongoDB, Solr, Splunk, RethinkDB, SQS, S3, MQTT, Redis, InfluxDB, HBase, Hazelcast, Twitter, and more...
  • 49. Lab: Kafka Streams & Kafka Connector “Simplified Consumer” Stream/Table Duality Windows
  • 52. Confluent Platform: Apache Kafka Enterprise Edition
  • 53. Lab: Confluent Platform Confluent Platform: ➔ Schema Registry ➔ REST API
  • 55. Lab: Integration with Kafka Integration Platforms: ➔ Camel http://camel.apache.org/kafka.html ➔ Akka Streams http://doc.akka.io/docs/akka-stream-kafka/current/home.html ➔ Oracle Service Bus http://www.ateam-oracle.com/osb-transport-for-apache-kafka-part-1/
  • 56. What’s in discussion and/or coming soon? Exactly-once Delivery / Txn Messaging https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging Headers support (additional metadata) https://cwiki.apache.org/confluence/display/KAFKA/KIP-82+-+Add+Record+Headers ZStandard Compression support https://cwiki.apache.org/confluence/display/KAFKA/KIP-110%3A+Add+Codec+for+ZStandard+Compression Reset Offset tool https://cwiki.apache.org/confluence/display/KAFKA/KIP-122%3A+Add+a+tool+to+Reset+Consumer+Group+Offsets https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
  • 57. How NOT to use Kafka Top 5: ➔ No consideration of data on the inside vs. outside ➔ Schema not externally defined ➔ Same config for all clients/topics ➔ 128 partitions as default ➔ Running on 8 overloaded nodes Kafka Summit 2016: 101 ways to config Kafka - Badly https://www.confluent.io/kafka-summit-2016-101-ways-to-configure-kafka-badly https://cwiki.apache.org/confluence/display/KAFKA/Operations
  • 59. Thanks!!! Twitter: @jeqo89 GitHub: /jeqo Blog: jeqo.github.io Code: github.com/jeqo/talk-kafka-messaging-logs
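Slide 4 distinguishes point-to-point channels, where a receiver reads and removes each message, from publish/subscribe channels, where every subscriber gets a copy. A minimal in-memory sketch of both follows; the class names are hypothetical and this is not a JMS or AMQP API.

```python
from collections import deque

class PointToPointChannel:
    """A queue: each message goes to exactly one receiver and is then removed."""
    def __init__(self):
        self._queue = deque()

    def send(self, message):
        self._queue.append(message)

    def receive(self):
        # Destructive read: once consumed, the message leaves the channel.
        return self._queue.popleft() if self._queue else None

class PubSubChannel:
    """Publish/subscribe: every subscriber has its own inbox with a copy of each message."""
    def __init__(self):
        self._subscribers = {}

    def subscribe(self, name):
        self._subscribers[name] = deque()

    def publish(self, message):
        for inbox in self._subscribers.values():
            inbox.append(message)

    def receive(self, name):
        inbox = self._subscribers[name]
        return inbox.popleft() if inbox else None

queue = PointToPointChannel()
queue.send("order-1")
first = queue.receive()    # "order-1"
second = queue.receive()   # None: the message was already consumed

topic = PubSubChannel()
topic.subscribe("billing")
topic.subscribe("shipping")
topic.publish("order-1")
```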
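The first risk on slide 7, out-of-order delivery after a retry, can be reproduced with a toy broker that puts a failed message back at the tail of the queue. Redelivery policies differ between brokers and settings, so tail re-enqueueing is an assumption chosen to show the effect; the function name is hypothetical.

```python
from collections import deque

def consume_with_redelivery(messages, fails_once):
    """Deliver messages in queue order; a message that fails is re-enqueued at the tail.

    Assumption: the broker retries failed messages by appending them to the end of
    the queue, so messages sent later can overtake them.
    """
    queue = deque(messages)
    already_failed = set()
    processed = []
    while queue:
        msg = queue.popleft()
        if msg in fails_once and msg not in already_failed:
            already_failed.add(msg)
            queue.append(msg)   # retried later: the messages behind it go first
            continue
        processed.append(msg)
    return processed

order = consume_with_redelivery(["m1", "m2", "m3", "m4"], fails_once={"m2"})
# order == ["m1", "m3", "m4", "m2"]
```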
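Slides 8 and 11 describe an append-only, ordered log from which consumers pull at their own pace. The sketch below keeps everything in memory and uses hypothetical names; a Kafka log is additionally partitioned, replicated and persisted to disk.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Record:
    offset: int        # position in the log: the log's notion of "time"
    key: str
    value: object      # opaque to the log itself
    timestamp_ms: int

@dataclass
class Log:
    records: list = field(default_factory=list)

    def append(self, key, value, timestamp_ms):
        """Records are only ever added at the end; the offset is assigned here."""
        record = Record(len(self.records), key, value, timestamp_ms)
        self.records.append(record)
        return record.offset

    def read(self, from_offset, max_records):
        """Reading never removes records, so any reader can (re)start anywhere."""
        return self.records[from_offset:from_offset + max_records]

class PullConsumer:
    """Pull model: each consumer polls the log and keeps track of its own position."""
    def __init__(self, log):
        self.log = log
        self.position = 0

    def poll(self, max_records):
        batch = self.log.read(self.position, max_records)
        self.position += len(batch)
        return batch

log = Log()
log.append("user-1", "signed_up", 1000)
log.append("user-2", "signed_up", 1005)
log.append("user-1", "logged_in", 1010)

fast, slow = PullConsumer(log), PullConsumer(log)
fast_offsets = [r.offset for r in fast.poll(10)]   # [0, 1, 2]
slow_offsets = [r.offset for r in slow.poll(1)]    # [0]
```

Because the position lives in the consumer rather than in the log, two consumers can read the same records independently, which is what distinguishes this model from the destructive reads of a queue.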
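Slide 9's recurring answer, "it uses a log", relies on deterministic replay: replicas that apply the same entries in the same order reach the same state, and a lagging replica is simply at an earlier offset. A minimal sketch with a hypothetical put/delete entry format:

```python
def apply(state, entry):
    """Deterministically apply one (op, key, value) entry to a key/value state."""
    op, key, value = entry
    new_state = dict(state)
    if op == "put":
        new_state[key] = value
    elif op == "delete":
        new_state.pop(key, None)
    return new_state

def replay(log, upto=None):
    """Rebuild state by applying entries in log order, optionally stopping at an offset."""
    state = {}
    for entry in log[:upto]:
        state = apply(state, entry)
    return state

shared_log = [
    ("put", "alice", 10),
    ("put", "bob", 5),
    ("put", "alice", 7),
    ("delete", "bob", None),
]

replica_a = replay(shared_log)
replica_b = replay(shared_log)                # same log, same order: same state
lagging_replica = replay(shared_log, upto=2)  # behind, but consistent as of offset 2
```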
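Slide 19 moves from topics to partitions. A keyed record is routed by hashing its key modulo the partition count, which is what keeps per-key ordering (slide 40). Kafka's default partitioner hashes the serialized key with murmur2; the hypothetical sketch below swaps in CRC32 from the standard library to stay dependency-free, so its partition numbers will not match Kafka's.

```python
import zlib

def partition_for(key: str, num_partitions: int) -> int:
    """Stable hash of the key modulo the partition count (CRC32 stands in for murmur2)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def route(records, num_partitions):
    """Append each (key, value) record to the partition its key hashes to."""
    partitions = {p: [] for p in range(num_partitions)}
    for key, value in records:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions

records = [("user-1", "a"), ("user-2", "b"), ("user-1", "c"), ("user-3", "d"), ("user-1", "e")]
partitions = route(records, num_partitions=3)
```

All records for `user-1` end up in one partition in send order, whichever partition that is.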
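Slides 20 and 21 break each partition into segments. Kafka names every segment file after its base offset (the first offset it contains), zero-padded to 20 digits, so finding the segment that holds a given offset is a binary search over base offsets. The sketch below shows only that step (the broker then uses a per-segment offset index to find the byte position); the base offsets are hypothetical.

```python
import bisect

def segment_file_name(base_offset: int) -> str:
    """Segment files are named after their base offset, zero-padded to 20 digits."""
    return f"{base_offset:020d}.log"

def find_segment(base_offsets, offset):
    """Return the base offset of the segment that contains `offset`.

    `base_offsets` must be sorted; the owning segment is the last one whose base
    offset is less than or equal to the requested offset.
    """
    i = bisect.bisect_right(base_offsets, offset) - 1
    if i < 0:
        raise ValueError(f"offset {offset} is older than the first segment")
    return base_offsets[i]

# Hypothetical partition whose log was rolled into three segments.
segments = [0, 368769, 737337]
```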
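Slide 23 separates a record's key and value, which Kafka stores as plain bytes, from the metadata the log adds (offset and timestamp). The sketch below pairs UTF-8 and JSON serializers with a record type; all names are hypothetical, although UTF-8 strings are also what Kafka's `StringSerializer` produces by default.

```python
import json
from dataclasses import dataclass

# Hypothetical serializers: producers and consumers must agree on how keys
# and values become bytes and back, because the broker only sees bytes.
def serialize_key(key: str) -> bytes:
    return key.encode("utf-8")

def deserialize_key(data: bytes) -> str:
    return data.decode("utf-8")

def serialize_value(value: dict) -> bytes:
    return json.dumps(value, sort_keys=True).encode("utf-8")

def deserialize_value(data: bytes) -> dict:
    return json.loads(data.decode("utf-8"))

@dataclass(frozen=True)
class StoredRecord:
    key: bytes
    value: bytes
    offset: int        # assigned by the broker when the record is appended
    timestamp_ms: int  # producer time (CreateTime) or broker time (LogAppendTime)

stored = StoredRecord(
    key=serialize_key("user-1"),
    value=serialize_value({"event": "signed_up", "plan": "free"}),
    offset=42,
    timestamp_ms=1488326400000,
)
```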
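Slide 24 lists what Avro's separation of reader and writer schemas allows. The function below is a simplified, dictionary-based imitation of Avro schema resolution, not the Avro library: reader fields are matched by name or alias, missing fields take their default, and writer-only fields are ignored. Schema and field names are hypothetical.

```python
def resolve(record, writer_fields, reader_schema):
    """Simplified imitation of Avro schema resolution.

    `reader_schema` maps each reader field to a spec that may hold "default"
    and "aliases". Reader fields are matched by name or alias; fields the
    writer never wrote take their default; writer-only fields are dropped.
    """
    result = {}
    for name, spec in reader_schema.items():
        candidates = [name, *spec.get("aliases", [])]
        source = next((c for c in candidates if c in writer_fields), None)
        if source is not None:
            result[name] = record[source]
        elif "default" in spec:
            result[name] = spec["default"]
        else:
            raise ValueError(f"field {name!r} is missing and has no default")
    return result

# Data written with an older (writer) schema ...
writer_fields = ["user_name", "age", "legacy_flag"]
old_record = {"user_name": "ana", "age": 31, "legacy_flag": True}

# ... read with a newer (reader) schema: one renamed field, one new field.
reader_schema = {
    "name": {"aliases": ["user_name"]},
    "age": {},
    "email": {"default": None},   # in Avro: a union with "null", default null
}
```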
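Slide 26 lists leader election and ISR tracking among ZooKeeper's duties. The rule applied when a partition leader fails can be sketched as below: a simplified model in which the `unclean_election` flag mirrors the broker setting `unclean.leader.election.enable`; the function itself is hypothetical.

```python
def elect_leader(assigned_replicas, isr, live_brokers, unclean_election=False):
    """Simplified leader election for one partition (not the controller's code).

    Prefers the first replica, in assignment order, that is both alive and in
    the ISR. Only when none qualifies and unclean election is allowed does it
    fall back to any live replica, which may be missing committed records.
    """
    for broker in assigned_replicas:
        if broker in isr and broker in live_brokers:
            return broker
    if unclean_election:
        for broker in assigned_replicas:
            if broker in live_brokers:
                return broker
    return None  # no eligible replica: the partition stays offline

# Hypothetical partition replicated on brokers 1, 2 and 3; broker 1 has failed.
leader = elect_leader([1, 2, 3], isr={1, 2}, live_brokers={2, 3})
```

The fallback trades consistency for availability, which is the same tension slide 27 describes at the topic level.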
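Retention is configured per topic in milliseconds (`retention.ms`), so the two profiles on slide 27 need a unit conversion: 3 days is 3 × 24 × 3600 × 1000 = 259,200,000 ms. The sketch below does the conversion; the dictionary layout and all numbers other than the retention periods are hypothetical.

```python
from datetime import timedelta

def retention_ms(days: int) -> int:
    """Express a retention period in days as a value for the topic config `retention.ms`."""
    return int(timedelta(days=days).total_seconds() * 1000)

# Two hypothetical topics following the profiles on slide 27. Partition counts
# and replication factors are illustrative, not recommendations; "6 months" is
# approximated as 180 days.
activity_tracking = {
    "partitions": 24,
    "replication_factor": 2,
    "configs": {"retention.ms": retention_ms(3)},
}
inventory_adjustments = {
    "partitions": 6,
    "replication_factor": 3,
    "configs": {"retention.ms": retention_ms(180), "min.insync.replicas": 2},
}
```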
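Slide 28 names compaction and retention as the two clean-up policies. Retention drops old data by age or size, while compaction keeps at least the latest record for every key. The hypothetical function below is a simplified compaction pass; real Kafka compacts closed segments in the background and keeps tombstones for `delete.retention.ms` before removing them, both of which this sketch skips.

```python
def compact(log):
    """Keep only the latest record for each key, dropping keys whose latest value is a tombstone.

    `log` is a list of (offset, key, value); a value of None is a tombstone.
    Surviving records keep their original offsets and relative order.
    """
    latest = {}
    for offset, key, _ in log:
        latest[key] = offset
    return [
        (offset, key, value)
        for offset, key, value in log
        if latest[key] == offset and value is not None
    ]

changelog = [
    (0, "sku-1", 10),
    (1, "sku-2", 4),
    (2, "sku-1", 8),
    (3, "sku-3", 1),
    (4, "sku-2", None),   # tombstone: sku-2 is deleted
]
compacted = compact(changelog)
# compacted == [(2, "sku-1", 8), (3, "sku-3", 1)]
```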
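Slides 34 to 36 trade latency for durability through the producer setting `acks`, and slide 36 ties `acks=all` to `min.insync.replicas`. The hypothetical function below is a simplified model of how a write is treated under each setting; it is not broker code.

```python
def write_outcome(acks, isr_size, min_insync_replicas):
    """Simplified view of a produce request under each acks setting.

    With acks=all (-1) the leader rejects the write if the ISR has fewer members
    than min.insync.replicas (Kafka reports NotEnoughReplicas). With acks=1 or
    acks=0 that setting is not enforced, and with acks=0 the producer does not
    wait for any response at all.
    """
    if acks in ("all", -1):
        if isr_size < min_insync_replicas:
            return "rejected: not enough in-sync replicas"
        return "acknowledged after all in-sync replicas have the record"
    if acks == 1:
        return "acknowledged after the leader wrote the record"
    if acks == 0:
        return "not acknowledged (fire and forget)"
    raise ValueError(f"unsupported acks value: {acks!r}")
```

With a replication factor of 3 and `min.insync.replicas=2`, for example, `acks=all` writes keep succeeding while one replica is down and are refused, rather than silently under-replicated, once two are down.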
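The lab on slide 37 covers batching and compression. The sketch below groups records by a byte budget, loosely modelled on the producer setting `batch.size` (the time trigger `linger.ms` is omitted), and compresses each batch as a whole with gzip, one of the codecs selectable via `compression.type`. Function names and record contents are hypothetical.

```python
import gzip
import json

def build_batches(records, batch_size_bytes):
    """Group serialized records into batches of at most `batch_size_bytes`.

    A single record larger than the limit still gets a batch of its own.
    """
    batches, current, current_size = [], [], 0
    for record in records:
        size = len(record)
        if current and current_size + size > batch_size_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(record)
        current_size += size
    if current:
        batches.append(current)
    return batches

def compress_batch(batch):
    """Compress a whole batch at once: repeated content across records compresses well."""
    return gzip.compress(b"\n".join(batch))

records = [json.dumps({"user": f"user-{i % 3}", "event": "page_view"}).encode() for i in range(100)]
batches = build_batches(records, batch_size_bytes=1024)
raw_sizes = [sum(len(r) for r in b) for b in batches]
compressed_sizes = [len(compress_batch(b)) for b in batches]
```

Larger batches give the compressor more repetition to exploit, which is one reason batching and compression are usually tuned together.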
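The results on slide 38 are averages computed as `latency_sum / latency_count`; the metric names suggest Prometheus-style summary metrics, which is an assumption. The sketch below reproduces that division and computes the relative cost of each `acks` setting from the reported averages; the function names are hypothetical.

```python
def mean_latency(latency_sum: float, latency_count: int) -> float:
    """Average latency from a summary metric's _sum and _count values."""
    if latency_count == 0:
        raise ValueError("no samples recorded")
    return latency_sum / latency_count

# Average producer latencies reported on slide 38, in seconds, keyed by acks.
reported = {"0": 0.05494, "1": 0.06097, "all": 0.06375}

def overhead_vs_acks_zero(acks: str) -> float:
    """Relative extra latency of an acks setting compared with acks=0."""
    return reported[acks] / reported["0"] - 1
```

On the reported figures, acks=1 adds about 11% and acks=all about 16% to the mean latency measured with acks=0.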
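Slide 40 makes consumer groups plus partitions the unit of parallelism. A simplified round-robin assignment shows why: each partition is read by exactly one member of a group, so members beyond the partition count sit idle. Kafka ships its own assignors (range and round-robin among them); this function is a hypothetical stand-in, not their exact algorithm.

```python
def assign_round_robin(partitions, members):
    """Hand out partitions to group members in turn (simplified, hypothetical assignor)."""
    ordered = sorted(members)
    assignment = {member: [] for member in ordered}
    for i, partition in enumerate(sorted(partitions)):
        assignment[ordered[i % len(ordered)]].append(partition)
    return assignment

six_partitions = assign_round_robin(range(6), ["c1", "c2", "c3"])
# {"c1": [0, 3], "c2": [1, 4], "c3": [2, 5]}
two_partitions = assign_round_robin(range(2), ["c1", "c2", "c3"])
# {"c1": [0], "c2": [1], "c3": []}: c3 has nothing to read
```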
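The two failure scenarios on slides 41 and 42 differ only in whether the consumer saves its position before or after processing. The hypothetical simulation below crashes a consumer at the same record in both orders and lets a new process resume from the last saved offset (by convention, the offset of the next record to read).

```python
def run_consumer(log, committed, crash_at, commit_before_processing, processed):
    """Process records from `committed` onward, crashing on record `crash_at`.

    Returns the last committed offset; `processed` collects the side effects.
    commit_before_processing=True  -> at-most-once  (save position, then process)
    commit_before_processing=False -> at-least-once (process, then save position)
    """
    for offset in range(committed, len(log)):
        if commit_before_processing:
            committed = offset + 1          # position saved first
            if offset == crash_at:
                return committed            # crash: this record is never processed
            processed.append(log[offset])
        else:
            processed.append(log[offset])   # side effect happens first
            if offset == crash_at:
                return committed            # crash: position not saved
            committed = offset + 1
    return committed

def simulate(commit_before_processing):
    log = ["r0", "r1", "r2", "r3"]
    processed = []
    committed = run_consumer(log, 0, crash_at=1,
                             commit_before_processing=commit_before_processing,
                             processed=processed)
    # A new process takes over from the last committed offset.
    run_consumer(log, committed, crash_at=None,
                 commit_before_processing=commit_before_processing,
                 processed=processed)
    return processed

at_most_once = simulate(True)    # ["r0", "r2", "r3"]: r1 is lost
at_least_once = simulate(False)  # ["r0", "r1", "r1", "r2", "r3"]: r1 is processed twice
```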
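Slide 43 notes that exactly-once delivery requires cooperation from the destination storage. One common pattern, shown here as a hypothetical sketch and distinct from the KIP-98 producer mechanisms, is to store the consumed offset in the same atomic update as the result, so a replayed record can be recognised and skipped.

```python
class TransactionalSink:
    """Hypothetical destination that stores results and the consumed offset together.

    Both change in one step (standing in for a single database transaction),
    so records replayed after a crash can be detected and ignored.
    """
    def __init__(self):
        self.totals = {}
        self.last_offset = -1

    def apply(self, offset, key, amount):
        if offset <= self.last_offset:
            return False                     # already applied: skip the duplicate
        new_totals = dict(self.totals)
        new_totals[key] = new_totals.get(key, 0) + amount
        # "Commit": results and offset are replaced together.
        self.totals, self.last_offset = new_totals, offset
        return True

    def resume_from(self):
        """Offset a restarted consumer should read next."""
        return self.last_offset + 1

sink = TransactionalSink()
log = [(0, "acct-1", 100), (1, "acct-2", 50), (2, "acct-1", -30)]
for record in log[:2]:
    sink.apply(*record)
# At-least-once redelivery replays offset 1 after a crash; the sink ignores it.
for record in log[1:]:
    sink.apply(*record)
```

Combined with at-least-once delivery, this deduplication makes the observable effect exactly-once, at the cost of the sink owning the offset.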
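Rewinding a consumer for reprocessing (slide 44) comes down to choosing an offset and seeking to it; the Java consumer offers `seek` and, since 0.10.1, `offsetsForTimes` to translate a timestamp into an offset. The in-memory lookup below is a hypothetical sketch of that translation, assuming the log starts at offset 0 and timestamps never decrease.

```python
import bisect

def offset_for_timestamp(timestamps, target_ms):
    """Earliest offset whose timestamp is >= target_ms, or None if there is none.

    `timestamps[i]` is the timestamp of the record at offset i and must be
    non-decreasing for the binary search to be valid.
    """
    i = bisect.bisect_left(timestamps, target_ms)
    return i if i < len(timestamps) else None

# Hypothetical partition: the timestamp (ms) of the record at each offset.
timestamps = [1000, 2000, 2000, 3500, 5000]
```

A consumer rewinding to "everything since 2500 ms" would look up offset 3 and seek there.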
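Slide 49 mentions the stream/table duality and windows. The hypothetical helpers below show both ideas in plain Python: aggregating a stream yields a table whose updates form a changelog stream, replaying that changelog rebuilds the table, and tumbling windows bucket events by `timestamp - timestamp % size`, which matches how Kafka Streams aligns tumbling windows to the epoch. This is not the Kafka Streams API.

```python
from collections import Counter, defaultdict

def count_by_key(stream):
    """Stream -> table: a running count per key, plus the changelog of every update."""
    table, changelog = {}, []
    for key in stream:
        table[key] = table.get(key, 0) + 1
        changelog.append((key, table[key]))   # table -> stream: each update is an event
    return table, changelog

def table_from_changelog(changelog):
    """Replaying a changelog (latest value per key wins) rebuilds the table."""
    table = {}
    for key, value in changelog:
        table[key] = value
    return table

def tumbling_window_counts(events, size_ms):
    """Count (key, timestamp_ms) events per key in fixed, non-overlapping windows."""
    counts = defaultdict(Counter)
    for key, ts in events:
        window_start = ts - ts % size_ms
        counts[window_start][key] += 1
    return {start: dict(c) for start, c in sorted(counts.items())}

table, changelog = count_by_key(["home", "cart", "home"])
page_views = [("home", 1000), ("cart", 30000), ("home", 59999), ("home", 60000)]
windows = tumbling_window_counts(page_views, size_ms=60000)
```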