Streaming Made Easy
David Kjerrumgaard
Tim Spann
● Apache Pulsar Committer | Author of Pulsar In Action
● Former Principal Software Engineer on Splunk’s messaging team that is responsible for Splunk’s internal Pulsar-as-a-Service platform.
● Former Director of Solution Architecture at Streamlio.
David Kjerrumgaard
Sales Engineer
● Author of Pulsar In Action.
● Co-Author of Practical Hive
https://streamnative.io/download/manning-ebook-apache-pulsar-in-action
David Kjerrumgaard
Developer Advocate
Tim Spann
Developer Advocate at StreamNative
● FLiP(N) Stack = Flink, Pulsar and NiFi Stack
● Streaming Systems & Data Architecture Expert
● Experience:
○ 15+ years of experience with streaming technologies including Pulsar, Flink, Spark, NiFi, Big Data, Cloud, MXNet, IoT, Python and more.
○ Today, he helps grow the Pulsar community by sharing rich technical knowledge and experience at both global conferences and through individual conversations.
FLiP Stack Weekly
This week in Apache Flink, Apache Pulsar, Apache NiFi, Apache Spark and open source friends.
https://bit.ly/32dAJft
Apache Pulsar has a vibrant community
560+ Contributors
10,000+ Commits
7,000+ Slack Members
1,000+ Organizations Using Pulsar
Evolution of Pulsar Growth
Unied
Messaging
Platform
Guaranteed
Message
Delivery
Resiliency Innite
Scalability
The Basics
Cloud native with decoupled storage and compute layers.
Built-in compatibility with your existing code and messaging infrastructure.
Geographic redundancy and high availability included.
Centralized cluster management and oversight.
Elastic horizontal and vertical scalability.
Seamless and instant partition rebalancing with no downtime.
Flexible subscription model supports a wide array of use cases.
Compatible with the tools you use to store, analyze, and process data.
Enterprise Features
“Bookies”
● Store messages and cursors
● Messages are grouped in segments/ledgers
● A group of bookies forms an “ensemble” to store a ledger

“Brokers”
● Handle message routing and connections
● Stateless, but with caches
● Automatic load-balancing
● Topics are composed of multiple segments

Metadata & Service Discovery
● Stores metadata for both Pulsar and BookKeeper
● Service discovery

Pulsar Cluster
[Diagram: a single Pulsar cluster hosting multiple tenants (Compliance, Data Services, Marketing), each containing namespaces (Microservices, ETL, Campaigns, Risk Assessment) that in turn hold topics such as Cust Auth, Location Resolution, Demographics, Budgeted Spend, Acct History, and Risk Detection.]
Tenant - Namespaces - Topics
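Every topic lives inside a namespace, which lives inside a tenant, and clients address it by its fully-qualified name. Below is a minimal sketch of how that hierarchy shows up in the Java client; the service URL, tenant, namespace, and topic names are illustrative, borrowed from the diagram above.

import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;

PulsarClient client = PulsarClient.builder()
        .serviceUrl("pulsar://localhost:6650")   // illustrative broker address
        .build();

// Topic names follow persistent://<tenant>/<namespace>/<topic>
Producer<byte[]> producer = client.newProducer()
        .topic("persistent://marketing/campaigns/demographics")
        .create();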
If you want to...                          | Do this...                                                  | And use these subscription modes...
achieve fanout messaging among consumers   | specify a unique subscription name for each consumer        | exclusive (or failover)
achieve message queuing among consumers    | share the same subscription name among multiple consumers   | shared
allow for ordered consumption (streaming)  | distribute work among any number of consumers               | exclusive, failover, or key_shared
Fanout, Queueing, or Streaming
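The next slides show the shared and failover subscription code; here is a minimal sketch of the key_shared mode from the last table row, which spreads messages across many consumers while keeping per-key ordering. It assumes an existing PulsarClient named client, and the topic and subscription names are illustrative.

import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.SubscriptionType;

Consumer<byte[]> consumer = client.newConsumer()
        .topic("my-topic")
        .subscriptionName("ordered-workers")              // illustrative subscription name
        .subscriptionType(SubscriptionType.Key_Shared)    // per-key ordering, shared work
        .subscribe();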
Component | Description
Value / data payload | The data carried by the message. All Pulsar messages contain raw bytes, although message data can also conform to data schemas.
Key | Messages can optionally be tagged with keys, which are used in partitioning and are also useful for things like topic compaction.
Properties | An optional key/value map of user-defined properties.
Producer name | The name of the producer who produced the message. If you do not specify a producer name, the default name is used.
Sequence ID | Each Pulsar message belongs to an ordered sequence on its topic. The sequence ID of the message is its order in that sequence.
Messages
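As a rough sketch of where those components are set when publishing, assuming an existing PulsarClient named client; the topic, producer name, key, property, and payload values are illustrative only.

import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.Schema;

Producer<String> producer = client.newProducer(Schema.STRING)
        .topic("my-topic")
        .producerName("billing-service")        // Producer name
        .create();

producer.newMessage()
        .key("customer-123")                    // Key
        .property("source", "billing")          // Properties
        .value("{\"amount\": 42.50}")           // Value / data payload
        .send();                                // Sequence ID is assigned per producer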
Schema Registry
[Diagram: producers and consumers each keep a local cache of schemas. A producer sends schema-1 (value = Avro/Protobuf/JSON) data serialized per schema ID, first sending (registering) the schema if it is not in its local cache. A consumer reads schema-1 data deserialized per schema ID, fetching the schema by ID from the registry if it is not in its local cache.]
Integrated Schema Registry
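A minimal sketch of that flow from the Java client, assuming an existing PulsarClient named client; SensorReading is a hypothetical POJO used only for illustration.

import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.Schema;

public class SensorReading {
    public String sensorId;
    public double temperature;
}

// Producer: the JSON schema for SensorReading is registered on first use if not cached
Producer<SensorReading> producer = client.newProducer(Schema.JSON(SensorReading.class))
        .topic("sensor-readings")
        .create();

// Consumer: incoming bytes are deserialized according to the schema ID carried by each message
Consumer<SensorReading> consumer = client.newConsumer(Schema.JSON(SensorReading.class))
        .topic("sensor-readings")
        .subscriptionName("sensor-sub")
        .subscribe();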
Consumer<byte[]> consumer = client.newConsumer()
    .topic("my-topic")
    .subscriptionName("work-q-1")
    .subscriptionType(SubscriptionType.Shared)
    .subscribe();
Flexible Pub/Sub API for Pulsar - Shared
Consumer<byte[]> consumer = client.newConsumer()
    .topic("my-topic")
    .subscriptionName("stream-1")
    .subscriptionType(SubscriptionType.Failover)
    .subscribe();
Flexible Pub/Sub API for Pulsar - Failover
byte[] msgIdBytes = // Some byte array
MessageId id = MessageId.fromByteArray(msgIdBytes);

Reader<byte[]> reader = pulsarClient.newReader()
    .topic(topic)
    .startMessageId(id)
    .create();

Create a reader that will read from some message between earliest and latest.
Reader Interface
Producer<String> producer = client.newProducer(Schema.STRING)
.topic("hellotopic")
.blockIfQueueFull(true) // enable blocking
// when queue is full
.maxPendingMessages(10) // max queue size
.create();
// During Back Pressure: the sendAsync call blocks
// with no room in queues
producer.newMessage()
.key("mykey")
.value("myvalue")
.sendAsync(); // can be a blocking call
Built-in Back Pressure
● In many use cases, applications want to consume data from a Pulsar topic as if it were a database table, where each new message is an “update” to the table.
● Up until now, applications used Pulsar consumers or readers to fetch all the updates from a topic and construct a map with the latest value of each key for received messages.
● The Table Abstraction provides a standard implementation of this message consumption pattern for any given keyed topic.
Table Abstraction
● When you create a TableView, an additional compacted topic is created.
● In a compacted topic, only the most recent value associated with each key is retained.
● A background reader consumes from the compacted topic and updates the map when new messages arrive.
How Does it Work?
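A minimal sketch of the TableView API, assuming an existing PulsarClient named client; the topic name and keys are illustrative.

import org.apache.pulsar.client.api.Schema;
import org.apache.pulsar.client.api.TableView;

// Build a TableView that continuously materializes the latest value per key
TableView<String> prices = client.newTableViewBuilder(Schema.STRING)
        .topic("persistent://public/default/prices")
        .create();

String latest = prices.get("AAPL");   // latest value for a key, or null if unseen
prices.forEach((key, value) -> System.out.println(key + " -> " + value));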
• Functions - Lightweight Stream Processing (Java, Python, Go)
• Connectors - Sources & Sinks (Cassandra, Kafka, …)
• Protocol Handlers - AoP (AMQP), KoP (Kafka), MoP (MQTT)
• Processing Engines - Flink, Spark, Presto/Trino via Pulsar SQL
• Data Offloaders - Tiered Storage (S3)
Easy Ingest, Egress, Processing and Offloading
Pulsar to Clickhouse
https://docs.altinity.com/altinitycloud/quickstartguide/connectclient/
pulsar-io-jdbc-clickhouse-2.X.nar
CREATE TABLE iotjetsonjson_local
(
uuid String, camera String, ipaddress String, networktime String, top1pct String,
top1 String, cputemp String, gputemp String, gputempf String,
cputempf String, runtime String, host String,filename String, host_name String,
macaddress String, te String, systemtime String,cpu String, diskusage String,
memory String, imageinput String
)
ENGINE = MergeTree()
PARTITION BY uuid ORDER BY (uuid);
CREATE TABLE iotjetsonjson ON CLUSTER '{cluster}' AS iotjetsonjson_local
ENGINE = Distributed('{cluster}', default, iotjetsonjson_local, rand());
Streaming Events into Clickhouse
Running a Clickhouse Cluster
Pulsar Has A Serverless Framework
● Lightweight computation similar to AWS Lambda.
● Specifically designed to use Apache Pulsar as a message bus.
● Function runtime can be located within the Pulsar Broker.
● Java Functions
A serverless event streaming framework
Pulsar Functions
● Consume messages from one or more Pulsar topics.
● Apply user-supplied processing logic to each message.
● Publish the results of the computation to another topic.
● Support multiple programming languages (Java, Python, Go).
● Can leverage 3rd-party libraries to support the execution of ML models on the edge.
Pulsar Functions
Simple Native Java Function
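The original slide shows this example as a screenshot; as a minimal sketch of the idea (class name and logic are illustrative), a “native” Java function is just a plain java.util.function.Function with no Pulsar dependency at all.

import java.util.function.Function;

// "Native" style: no Pulsar imports; the Functions runtime wires input and output topics for you
public class ExclamationFunction implements Function<String, String> {
    @Override
    public String apply(String input) {
        return input + "!";
    }
}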
Simple Function with Pulsar SDK
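Again sketched from the slide title rather than the screenshot: the Pulsar SDK style implements org.apache.pulsar.functions.api.Function, which adds a Context for logging, metrics, state, and publishing to other topics. Class name and logic are illustrative.

import org.apache.pulsar.functions.api.Context;
import org.apache.pulsar.functions.api.Function;

public class ExclamationFunction implements Function<String, String> {
    @Override
    public String process(String input, Context context) {
        context.getLogger().info("Received: {}", input);   // Context provides logging, state, metrics, etc.
        return input + "!";
    }
}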
from pulsar import Function
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
import json

class Chat(Function):
    def __init__(self):
        pass

    def process(self, input, context):
        fields = json.loads(input)
        sid = SentimentIntensityAnalyzer()
        ss = sid.polarity_scores(fields["comment"])
        row = {}
        row['id'] = str(context.get_message_id())
        if ss['compound'] < 0.00:
            row['sentiment'] = 'Negative'
        else:
            row['sentiment'] = 'Positive'
        row['comment'] = str(fields["comment"])
        json_string = json.dumps(row)
        return json_string
Entire Function
Pulsar Python NLP Function
Pulsar Functions, along with Pulsar IO/Connectors, provide a powerful API for ingesting, transforming, and outputting data.
Function Mesh, another StreamNative project, makes it easier for developers to create entire applications built from sources, functions, and sinks, all through a declarative API.
Function Mesh
Pulsar’s Other Real-Time Teammates
Building Real-Time Requires a Team
https://www.influxdata.com/integration/mqtt-monitoring/
• Guaranteed delivery
• Data buffering
  - Backpressure
  - Pressure release
• Prioritized queuing
• Flow-specific QoS
  - Latency vs. throughput
  - Loss tolerance
• Data provenance
• Supports push and pull models
• Hundreds of processors
• Visual command and control
• Over 300 components
• Flow templates
• Pluggable/multi-role security
• Designed for extension
• Clustering
• Version control
Apache NiFi Basics
Apache NiFi - Apache Pulsar Connector
● Unified computing engine
● Batch processing is a special case of stream processing
● Stateful processing
● Massive Scalability
● Flink SQL for queries, inserts against Pulsar Topics
● Streaming Analytics
● Continuous SQL
● Continuous ETL
● Complex Event Processing
● Standard SQL Powered by Apache Calcite
Apache Flink
Apache Flink SQL

select aqi, parameterName, dateObserved, hourObserved, latitude, longitude, localTimeZone, stateCode, reportingArea from airquality;

select max(aqi) as MaxAQI, parameterName, reportingArea from airquality group by parameterName, reportingArea;

select max(aqi) as MaxAQI, min(aqi) as MinAQI, avg(aqi) as AvgAQI, count(aqi) as RowCount, parameterName, reportingArea from airquality group by parameterName, reportingArea;
Pulsar Can Speak Multiple Protocols
Kafka on Pulsar (KoP)
MQTT on Pulsar (MoP)
AMQP on Pulsar (AoP)
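As a rough illustration of what the KoP handler enables, assuming it is installed on the broker and listening on the broker's Kafka port (localhost:9092 is assumed here, as is the topic name), an unmodified Kafka client can publish straight into a Pulsar topic.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");   // assumed KoP listener on the Pulsar broker
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

// Standard Kafka producer; the KoP protocol handler translates this into Pulsar messages
KafkaProducer<String, String> producer = new KafkaProducer<>(props);
producer.send(new ProducerRecord<>("my-topic", "mykey", "myvalue"));
producer.close();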
Multiple Protocol Demo
Source Code: git@github.com:david-streamlio/multi-protocol-pulsar.git
Customers on StreamNative Cloud
Founded by the original creators of Apache Pulsar, StreamNative has more experience designing, deploying, and running large-scale Apache Pulsar instances than any team in the world.
StreamNative employs more than 50% of the active core committers to Apache Pulsar.
Scan the QR code to learn more about Apache Pulsar and StreamNative.
Scan the QR code to build your own apps today.