Diese Präsentation wurde erfolgreich gemeldet.
Wir verwenden Ihre LinkedIn Profilangaben und Informationen zu Ihren Aktivitäten, um Anzeigen zu personalisieren und Ihnen relevantere Inhalte anzuzeigen. Sie können Ihre Anzeigeneinstellungen jederzeit ändern.

Hands-on Workshop: Apache Pulsar

1.191 Aufrufe

Veröffentlicht am

Matteo Merli and Sijie Guo from Streamlio gave a hands-on workshop on Apache Pulsar. #fast #durable #pubsub #messaging system. A low latency alternative to #kafka.

Veröffentlicht in: Internet
  • Als Erste(r) kommentieren

Hands-on Workshop: Apache Pulsar

  1. 1. Matteo Merli & Sijie Guo fast, durable, flexible pub/sub messaging
  2. 2. Overview
  3. 3. What is Apache Pulsar? 3 Ordering Guaranteed ordering Multi-tenancy A single cluster can support many tenants and use cases High throughput Can reach 1.8 M messages/s in a single partition Durability Data replicated and synced to disk Geo-replication Out of box support for geographically distributed applications Unified messaging model Support both Topic & Queue semantic in a single model Delivery Guarantees At least once, at most once and effectively once Low Latency Low publish latency of 5ms at 99pct Highly scalable Can support millions of topics
  4. 4. Usage of Pulsar • In production for 3+ years at Yahoo • Powering critical products like: • Yahoo Mail, Yahoo Finance, Gemini Ads, Flickr and Sherpa (NoSQL database) • 80+ tenants • 2.3 Million topics • 100 B messages / day • Full-mesh replication in 8 data-centers 4
  5. 5. Why build a new system? • No existing solution to satisfy requirements • Multi tenant — 1M topics — Low latency — Durability — Geo replication • Kafka doesn’t scale well with many topics: • Storage model based on individual directory per topic partition • Enabling durability kills the performance • Many other choking points: getting stats, access to metadata, flow-control • Operations are not very convenient • eg: replacing a server, manual commands to copy the data and involves clients • clients access to ZK clusters not desirable • Ability to manage large backlogs • No scalable support to keep consumer position 5
  6. 6. Architecture view Separate layers between brokers bookies • Broker and bookies can be added independently • Traffic can be shifted very quickly across brokers • New bookies will ramp up on traffic quickly 6 Pulsar Broker 1 Pulsar Broker 1 Pulsar Broker 1 Bookie 1 Bookie 2 Bookie 3 Bookie 4 Bookie 5 Apache BookKeeper Apache Pulsar Producer Consumer
  7. 7. Apache BookKeeper • A replicated log storage • Low-latency durable writes • Simple repeatable read consistency • Highly available • Store many logs per node • I/O Isolation 7
  8. 8. Kafka vs. Pulsar Segment Distribution
  9. 9. BookKeeper - Storage • A single bookie can serve and store thousands of ledgers • Write and read paths are separated: • Avoid read activity to impact write latency • Writes are added to in- memory write-cache and committed to journal • Write cache is flushed in background to separated device • Entries are sorted to allow for mostly sequential reads 9
  10. 10. Concepts
  11. 11. Pulsar Concepts Support both topic and queue semantics in a unified topic concept 11
  12. 12. Pulsar Namespace 12
  13. 13. Pulsar Subscription 13
  14. 14. Partitioned Topic 14
  15. 15. Pulsar client library • Java — C++ — Python — WebSocket APIs • Partitioned topics • Apache Kafka compatibility wrapper API • Transparent batching of messages • Compression • TLS encryption and authentication • End-to-end encryption 15
  16. 16. Demo time 16
  17. 17. Demo time 17
  18. 18. Demo time 18
  19. 19. Lunch Break
  20. 20. Use cases - Message Queue • Decouple online / background • Provide high-availability • Reliable data transport 20 Online events Pulsar topic 1 Worker 1 Worker 2 Worker 3 Pulsar topic 2 Low latency publish Long running task Notification
  21. 21. Use cases - Message Queue • Async processing of time intensive background tasks • Publishes and complete HTTP request • Examples: Image / Video transcoding, bulk operations (large folder deletions) 21
  22. 22. Use cases - Message Queue 22 • Delegate the processing to multiple compute jobs which interact with micro-services • Wait until the response is pushed back to the HTTP server • Examples: breaking monolithic applications into micro-services
  23. 23. Use cases - Notifications • Change data capture • Listeners are frequently different tenants • Quotas needs to ensure producer is not affected 23 Event Pulsar topic Component 1 Component 2 Component 3 Listeners
  24. 24. Use cases - Notifications • Replicating DB transactions across regions • Notification to multiple interested tenants (example: new user account, …) 24
  25. 25. Use cases - Feedback system • Coordinate a large number of machines • Propagate state 25 External inputs Pulsar topic 1 Serving system Serving system Serving system Pulsar topic 2 Controller Updates Feedback
  26. 26. Multi Tenancy
  27. 27. Multi-Tenancy • Authentication / Authorization / Namespaces / Admin APIs • I/O Isolations between writes and reads • Provided by BookKeeper - Ensure readers draining backlog won’t affect publishers • Soft isolation • Storage quotas — flow-control — back-pressure — rate limiting • Hardware isolation • Constrain some tenants on a subset of brokers or bookies 27
  28. 28. Demo time 28
  29. 29. Geo-Replication
  30. 30. Geo-Replication • Scalable asynchronous replication • Integrated in the broker message flow • Simple configuration to add/remove regions 30 Topic (T1) Topic (T1) Topic (T1) Subscrip@on (S1) Subscrip@on (S1) Producer (P1) Consumer (C1) Producer (P3) Producer (P2) Consumer (C2) Data Center A Data Center B Data Center C
  31. 31. Group Exercise 31
  32. 32. Pulsar — Conclusion • A fast durable, distributed pub/sub messaging • Flexible Traditional Messaging: Queuing and Pub/Sub • Focus on message dispatch and consumption • Remove data as soon as possible if they are not needed • It is backed by a scalable log store • durable message store, zero data loss • allow rewinding / reprocessing messages for backfill, bootstrap systems and stream computing • It carries all the fantastic features from the log store. 32
  33. 33. Real-Time Solution 33
  34. 34. Curious to Learn More? • Apache Pulsar : http://pulsar.incubator.apache.org • Apache DistributedLog : http://bookkeeper.apache.org/ distributedlog • Apache BookKeeper : http://bookkeeper.apache.org • Follow Us @apache_pulsar @asfbookkeeper @distributedlog 34
  35. 35. Curious to Learn More? • Messaging, Storage, or Both: https://streaml.io/blog/ messaging-storage-or-both/ • Why BookKeeper: Consistency, Durability and Availability: https://streaml.io/blog/why-apache-bookkeeper/ • Introduction to Apache Pulsar: https://streaml.io/blog/intro- to-pulsar/ • Kafka vs DistributedLog: https://bookkeeper.apache.org/ distributedlog/technical-review/2016/09/19/kafka-vs- distributedlog.html 35
  36. 36. Curious to learn more about Streamlio? • Streamlio: https://streaml.io • Sandbox Preview: https://streaml.io/docs/getting-started • Learn slack channel: https://learn-streamlio.slack.com
 36

×