Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022

Pulsar Summit San Francisco is the event dedicated to Apache Pulsar. This one-day, action-packed event will include 5 keynotes, 12 breakout sessions, and 1 amazing happy hour. Speakers are from top companies, including Google, AWS, Databricks, Onehouse, StarTree, Intel, ScyllaDB, and more! It’s the perfect opportunity to network with Pulsar thought leaders in person.

Join developers, architects, data engineers, DevOps professionals, and anyone who wants to learn about messaging and event streaming for this one-day, in-person event. Pulsar Summit San Francisco brings the Apache Pulsar Community together to share best practices and discuss the future of streaming technologies.

  1. Pulsar Summit San Francisco, Hotel Nikko, August 18, 2022. Keynote: Beam + Pulsar: Powerful Stream Processing at Scale. Byron Ellis, Senior Software Engineering Manager, Google Cloud
  2. Byron Ellis, Senior Software Engineering Manager, Google. Lead of the Portable Languages and Tools team, which, among other things, works on Beam within Google Cloud. A longtime user of large-scale data processing tools at companies both large and small and across many different job functions, he is now working to make these tools easier for everyone to use.
  3. Beam + Pulsar: Powerful Stream Processing at Scale. About Apache Beam
  4. The Beam Model
  5. Example Beam Pipeline: Sum Per Key
     Java: input.apply(Sum.integersPerKey())
     Python: input | Sum.PerKey()
     Go: stats.Sum(s, input)
     SQL: SELECT key, SUM(value) FROM input GROUP BY key
     Execution: Cloud Dataflow, Apache Spark, Apache Flink, Apache Samza, experimental runners in development
  6. What Makes Apache Beam Different?
     Truly Unified: Apache Beam is a truly unified batch and streaming data processing platform, with no compromises. Batch is batch, and streaming is streaming.
     SDKs: Beam has official Java, Python, and Go SDKs, with an independent Scala SDK available as well.
     Included I/Os: I/Os are included in Beam, so discovering new I/Os is easy: just upgrade to the latest version and see what you get. (This is why we're here today!)
     Portability: Run Beam pipelines on your existing Spark clusters, Google Cloud Dataflow, AWS KDA, Talend, self-hosted Flink, or the newest experimental runners. Use the DirectRunner for local testing and development.
  7. Beam + Pulsar: Powerful Stream Processing at Scale. Pulsar on Apache Beam
  8. Beam and Pulsar = 🤝. Beam is unique in that I/O connectors are included in the project rather than maintained externally. As a result, the Beam community and those of us at Google with Beam responsibilities are always looking for better I/O support. Seeing the traction of Pulsar in the market, Google decided to kickstart PulsarIO by engaging a third-party vendor to create a new Beam connector.
  9. Dates shown on the slide: September 2021, March 2022, January 2022, June 2022
     First PR from StreamNative on PulsarIO: https://github.com/apache/beam/pull/22026
     Initial PR for PulsarIO: https://github.com/apache/beam/pull/15572
     PR for PulsarIO merged into Apache Beam main: https://github.com/apache/beam/pull/16634
     Design doc for PulsarIO Beam connector: https://docs.google.com/document/d/11U81IEeB0rly63Ly62CTIa45fuvm05TGbTcXV2wRPnM/edit#heading=h.hxrvoaq1om85
  10. PulsarIO is now part of Beam, and you can use it today! Please give it a whirl: try PulsarIO on your own dev machine with the Beam DirectRunner. Or try running Beam pipelines on your existing Spark or Flink clusters, or on the Google Cloud Dataflow managed service.
  11. Beam + Pulsar: Powerful Stream Processing at Scale. What's next for Beam + Pulsar
  12. StreamNative will ensure that the Apache Beam PulsarIO becomes a certified StreamNative Cloud Compute Engine Connector!
  13. Thank you! Sachin Agarwal, sachinag@google.com, @sachinag, /in/sachinag
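
The Sum Per Key example on slide 5 shows the same aggregation written in four SDK languages. As a rough illustration of what that aggregation computes, here is a minimal sketch in plain Python. It deliberately does not use the Beam SDK; the function name `sum_per_key` and the sample data are hypothetical and were invented for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def sum_per_key(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Group (key, value) pairs by key and sum the values in each group.

    Plain-Python stand-in for the per-key sum on slide 5; not Beam API.
    """
    totals: Dict[str, int] = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


# Hypothetical sample input, invented for illustration only.
sample_input = [("a", 1), ("b", 2), ("a", 3), ("b", 4), ("c", 5)]
result = sum_per_key(sample_input)
print(result)
```

This mirrors the SQL form on the slide (SELECT key, SUM(value) FROM input GROUP BY key). In a real Beam pipeline the runner would distribute this grouping and summation across workers, which the sketch does not attempt.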
