We examine real-world architectural patterns that use Apache Pulsar to automate the creation of function and pub/sub flows, improving operational scalability and ease of management. We cover CI/CD automation patterns and show how we leverage streaming data to build a self-service platform that automates the provisioning of new users. We also demonstrate how entire function flows can be created through patterns and configuration alone, enabling non-developers to build them simply by changing configurations. Together, these patterns take the automation of managing Pulsar to a whole new level. CI/CD architectures for on-prem, GCP, and AWS users are all covered.
Pulsar Architectural Patterns for CI/CD Automation and Self-Service
1. Pulsar Architectural Patterns for CI/CD
Every pattern shown here has been developed and implemented with my team at Overstock
Email: dbost@overstock.com
Twitter: DevinBost
LinkedIn: https://www.linkedin.com/in/devinbost/
By Devin Bost, Senior Data Engineer at Overstock
Data-Driven CI/CD Automation for Pulsar Function Flows and Pub/Sub
Includes on-prem, AWS, and GCP architectures
25. You might need to manually satisfy the contract at first, until you can get to where the data originates
26.
27.
28. {
  "type": "function",
  "artifactPathOrUrl": "http://path-to-artifact/example-ignite-function-1.0.1-20200125.003935-3-jar-with-dependencies.jar",
  "tenant": "exampleTenant",
  "namespace": "exampleNamespace",
  "name": "exampleIgniteFunction-backfill",
  "className": "com.yourcompany.pulsar.functions.ExampleIgniteFunction",
  "userConfig": {
    "username": "igniteUser",
    "password": "exampleHashedPass",
    "cache_name": "example-ignite-cache-backfill",
    "hosts_with_ports": "igniteserver1.domain.com:10800,igniteserver2.domain.com:10800,igniteserver3.domain.com:10800,igniteserver4.domain.com:10800"
  },
  "inputs": [
    "persistent://feeds/exampleProject/data-to-dump-into-ignite-backfill"
  ],
  "output": "persistent://exampleTenant/exampleNamespace/data-enriched-from-ignite-backfill",
  "logTopic": "persistent://public/default/function-log-topic-backfill"
}
29. Using the Java Admin API to consume from a Pulsar topic
Pulsar REST Admin API
Consumer/Producer
{
  "type": "function",
  "artifactPathOrUrl": "http://path-to-artifact/example-ignite-function-1.0.1-20200125.003935-3-jar-with-dependencies.jar",
  "tenant": "exampleTenant",
  "namespace": "exampleNamespace",
  "name": "exampleIgniteFunction",
  "className": "com.yourcompany.pulsar.functions.ExampleIgniteFunction",
  "userConfig": {
    "username": "igniteUser",
    "password": "exampleHashedPass",
    "cache_name": "example-ignite-cache",
    "hosts_with_ports": "igniteserver1.domain.com:10800,igniteserver2.domain.com:10800,igniteserver3.domain.com:10800,igniteserver4.domain.com:10800"
  },
  "inputs": [
    "persistent://feeds/exampleProject/data-to-dump-into-ignite"
  ],
  "output": "persistent://exampleTenant/exampleNamespace/data-enriched-from-ignite",
  "logTopic": "persistent://public/default/function-log-topic"
}
Pulsar Brokers (via Java Admin API)
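To make the original approach concrete, here is a minimal Java sketch of deploying the config above through the Pulsar Java Admin API. The service URL and artifact path are placeholders, and in the real flow the values would come from the consumed JSON message rather than being hard-coded.

import java.util.List;
import java.util.Map;
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.functions.FunctionConfig;

public class FastDeployViaAdminApi {
    public static void main(String[] args) throws Exception {
        // Placeholder broker admin URL; in the real flow this comes from configuration.
        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://pulsar-broker.domain.com:8080")
                .build()) {

            // Mirror the fields of the Fast Deploy JSON contract shown above.
            FunctionConfig config = new FunctionConfig();
            config.setTenant("exampleTenant");
            config.setNamespace("exampleNamespace");
            config.setName("exampleIgniteFunction");
            config.setClassName("com.yourcompany.pulsar.functions.ExampleIgniteFunction");
            config.setInputs(List.of("persistent://feeds/exampleProject/data-to-dump-into-ignite"));
            config.setOutput("persistent://exampleTenant/exampleNamespace/data-enriched-from-ignite");
            config.setLogTopic("persistent://public/default/function-log-topic");
            config.setUserConfig(Map.of(
                    "username", "igniteUser",
                    "password", "exampleHashedPass",
                    "cache_name", "example-ignite-cache"));

            // The brokers download the artifact from this URL at deploy time,
            // so they must have network access to it.
            admin.functions().createFunctionWithUrl(config,
                    "http://path-to-artifact/example-ignite-function-1.0.1-jar-with-dependencies.jar");
        }
    }
}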
30. More direct, faster, cleaner, and half the code volume
Pulsar REST Admin API
Consumer/Producer
(Same example Fast Deploy config as on slide 29.)
Pulsar Brokers
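Our production implementation of this direct-REST approach is in Go; for consistency with the other examples here, below is a hedged Java sketch of the same idea. The endpoint and multipart field names follow the Pulsar Functions REST API as I understand it (POST /admin/v3/functions/{tenant}/{namespace}/{name} with "functionConfig" and "url" parts), so verify them against your broker version.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class FastDeployViaRest {
    public static void main(String[] args) throws Exception {
        // Assumed endpoint shape; placeholder host.
        String endpoint = "http://pulsar-broker.domain.com:8080/admin/v3/functions/"
                + "exampleTenant/exampleNamespace/exampleIgniteFunction";

        // The config JSON is passed through almost unchanged, which is what let us
        // keep the data contract flexible without adding code.
        String functionConfig = "{\"tenant\":\"exampleTenant\",\"namespace\":\"exampleNamespace\","
                + "\"name\":\"exampleIgniteFunction\","
                + "\"className\":\"com.yourcompany.pulsar.functions.ExampleIgniteFunction\","
                + "\"inputs\":[\"persistent://feeds/exampleProject/data-to-dump-into-ignite\"],"
                + "\"output\":\"persistent://exampleTenant/exampleNamespace/data-enriched-from-ignite\"}";
        String artifactUrl = "http://path-to-artifact/example-ignite-function-1.0.1-jar-with-dependencies.jar";

        // Build a multipart/form-data body by hand: one part for the config JSON,
        // one for the artifact URL the brokers will download from.
        String boundary = "fastDeployBoundary";
        String body = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"functionConfig\"\r\n"
                + "Content-Type: application/json\r\n\r\n" + functionConfig + "\r\n"
                + "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"url\"\r\n\r\n" + artifactUrl + "\r\n"
                + "--" + boundary + "--\r\n";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(endpoint))
                .header("Content-Type", "multipart/form-data; boundary=" + boundary)
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}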
31. Higher-availability option
Consumer/Producer
Consumer/Producer
Consumer/Producer
Pulsar REST Admin API
(Same example Fast Deploy config as on slide 29.)
Pulsar Brokers
via Java Admin API
via Java Admin API
via Java Admin API
32. Fast-deploy
Pulsar REST Admin API
(Same example Fast Deploy config as on slide 29.)
Pulsar Brokers
Or, as a Pulsar function
33.
34. The Router Function
Router’s Function Config specifies a key in the message, such as “environment”, along with a tenant and namespace name.
The router then gets the value of this key in the message and creates a destination topic name from the value.
Creates /ops/deployment-automation/[environment]
36. The Router Function
Router’s Function Config specifies a key in the message, such as “environment”, along with a tenant and namespace name.
The router then gets the value of this key in the message and creates a destination topic name from the value.
From the message below, the router creates:
/ops/deployment-automation/test
and routes the message there
37. The Router Function
Router’s Function Config specifies a key in the message, such as “environment”, along with a tenant and namespace name.
The router then gets the value of this key in the message and creates a destination topic name from the value.
Creates /ops/deployment-automation/[environment]
38. The Router Function
Router’s Function Config specifies a key in the message, such as “environment”, along with a tenant and namespace name.
The router then gets the value of this key in the message and creates a destination topic name from the value.
Creates /ops/deployment-automation/[generator-type]
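The deck doesn't show the router's source, but here is a minimal Java sketch of what such a Pulsar Function could look like. The userConfig key names (routingKey, destinationTenant, destinationNamespace) are invented for illustration.

import org.apache.pulsar.client.api.Schema;
import org.apache.pulsar.functions.api.Context;
import org.apache.pulsar.functions.api.Function;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;

public class RouterFunction implements Function<String, Void> {
    @Override
    public Void process(String input, Context context) throws Exception {
        // Key to match on (e.g. "environment"), plus the destination tenant and
        // namespace, all supplied through the function's userConfig at deploy time.
        String key = (String) context.getUserConfigValue("routingKey").orElse("environment");
        String tenant = (String) context.getUserConfigValue("destinationTenant").orElse("ops");
        String namespace = (String) context.getUserConfigValue("destinationNamespace")
                .orElse("deployment-automation");

        // Look up the configured key in the incoming JSON message.
        JsonObject message = JsonParser.parseString(input).getAsJsonObject();
        String value = message.get(key).getAsString(); // e.g. "test"

        // Route to e.g. persistent://ops/deployment-automation/test; Pulsar creates
        // the topic under the given tenant and namespace if it doesn't exist.
        String destination = String.format("persistent://%s/%s/%s", tenant, namespace, value);
        context.newOutputMessage(destination, Schema.STRING).value(input).sendAsync();
        return null;
    }
}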
39. {
  "environment": "test",
  "configs": [{
    "type": "function",
    "artifactPathOrUrl": "http://repo-name/project-name/example-ignite-function-1.0.1-3-jar-with-dependencies.jar",
    "tenant": "exampleTenant",
    "namespace": "exampleNamespace",
    "name": "exampleIgniteFunction",
    "className": "com.yourcompany.pulsar.functions.ExampleIgniteFunction",
    "inputs": [
      "persistent://exampleTenant/exampleNamespace/data-to-dump-into-ignite"
    ],
    "output": "persistent://exampleTenant/exampleNamespace/data-enriched-from-ignite",
    "logTopic": "persistent://exampleTenant/exampleNamespace/data-enriched-from-ignite-log"
  },
  {
    "type": "function",
    "artifactPathOrUrl": "http://repo-name/project-name/example-filter-function-1.0.0-7-jar-with-dependencies.jar",
    "tenant": "exampleTenant",
    "namespace": "exampleNamespace",
    "name": "exampleFilterFunction",
    "className": "com.yourcompany.pulsar.functions.ExampleFilterFunction",
    "inputs": [
      "persistent://feeds/exampleProject/raw-data"
    ],
    "output": "persistent://exampleTenant/exampleNamespace/data-to-dump-into-ignite",
    "logTopic": "persistent://exampleTenant/exampleNamespace/data-to-dump-into-ignite-log"
  }
  ]
}
40. Synchronous Artifact Download/Upload (1) (2)
Push for real-time updates
Pull to get all data
UI Tool
Server-Sent Events (SSEs)
Artifact URL + identifying metadata from build tool
Keep track of configs here
Note: In this option, you must use the UI to merge the artifact with the configs.
Ensure brokers can download from where you store the artifact!
41. Server-Sent Events (SSEs)
UI Tool
Synchronous Artifact Download/Upload (1) (2)
Query to get all places where the artifact has been used. Enrich the JSON with this data.
Update configs to use new artifact
(1) Update configs in CouchDB by writing as staged
Once staged configs are approved, push into test or prod environments
Synchronously stage changes in DB. (Add to stage set.) (2)
Push for real-time updates
Pull to get all data
Artifact URL + identifying metadata from build tool
Keep track of configs here
Ensure brokers can download from where you store the artifact!
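As a rough illustration of the staging step, here is a hedged Java sketch that writes a config document into CouchDB as staged via CouchDB's HTTP API (PUT /{db}/{docId}). The database name, document ID, and document fields (staged, stageSet) are all hypothetical.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class StageConfigInCouchDb {
    public static void main(String[] args) throws Exception {
        // Hypothetical document: a Fast Deploy config written as "staged" so the
        // UI can review it before it is committed to test or prod.
        String stagedDoc = "{\"type\":\"function\",\"name\":\"exampleIgniteFunction\","
                + "\"staged\":true,\"stageSet\":\"artifact-1.0.1-rollout\"}";

        // CouchDB's HTTP API: PUT /{db}/{docId} creates or updates a document.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://couchdb.domain.com:5984/deploy-configs/exampleIgniteFunction-staged"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(stagedDoc))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body()); // 201 on create
    }
}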
42. Server-Sent Events (SSEs)
UI Tool
Synchronous Artifact Download/Upload (1) (2)
Query to get all places where the artifact has been used. Enrich the JSON with this data.
Update configs to use new artifact
(1) Update configs in CouchDB by writing as staged
Synchronously stage changes in DB. (Add to stage set.) (2)
Push for real-time updates
Pass command
Synchronously execute CouchDB command
Be careful to avoid creating security risks with how you implement this
e.g. "merge-stage-sets", "commit-staged-to-test", "commit-staged-to-prod", "un-stage", "rollback", "get-all-data", etc. (in a JSON object with any additional parameters)
(1) (2) Return result
Artifact URL + identifying metadata from build tool
Keep track of configs here
Ensure brokers can download from where you store the artifact!
43. Build System Storage
Get our artifact URL (and any necessary metadata, if applicable)
WebHook Filter/Transform
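Slide 46 notes that this Filter/Transform step was best done in Scala; for consistency with the other examples here, below is a hedged Java sketch of the same idea as a Pulsar Function that trims a webhook payload down to the artifact URL plus identifying metadata. The payload field names are hypothetical and depend on your build tool's webhook.

import org.apache.pulsar.functions.api.Context;
import org.apache.pulsar.functions.api.Function;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;

public class ArtifactMetadataFilter implements Function<String, String> {
    @Override
    public String process(String webhookPayload, Context context) {
        JsonObject payload = JsonParser.parseString(webhookPayload).getAsJsonObject();

        // Keep only what the deployment flow needs: a download URL plus enough
        // metadata to uniquely identify the artifact. The input field names
        // ("assetUrl", "repositoryName", "version") are hypothetical.
        JsonObject filtered = new JsonObject();
        filtered.addProperty("artifactPathOrUrl", payload.get("assetUrl").getAsString());
        filtered.addProperty("artifactName", payload.get("repositoryName").getAsString());
        filtered.addProperty("version", payload.get("version").getAsString());
        return filtered.toString(); // emitted to the function's output topic
    }
}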
44. Build System Storage
Build/storage data
Get our artifact URL (and any necessary metadata, if applicable)
AWS CodePipeline S3
GitHub Webhook (1)
(2) Passes metadata and reference to S3 artifact
Pulsar Beam or equivalent HTTP Endpoint for Pulsar
Pulsar Brokers
Granting access to download artifacts in S3
Write JSON to Pulsar
45. GitHub Webhook (1)
(2) Passes metadata and reference to S3 artifact
Pulsar Beam or equivalent HTTP Endpoint for Pulsar
Pulsar Brokers
Granting access to download artifacts in S3
Write JSON to Pulsar
GCP Cloud Build
GCP IAM (1)
Build System Storage
Build/storage data
Get our artifact URL (and any necessary metadata, if applicable)
46. Filter/Transform
This was best done in Scala
You could do the download asynchronously at a different point in the flow, but then you will need to ensure it's fully downloaded before pushing the deployment from the UI
Synchronous Artifact Download/Upload (1) (2)
Security checking logic, such as package vulnerability checks
Option 1 - Basic function CI/CD flow
Push for real-time updates
Pull to get all data
Deploy to test / Deploy to prod
fast-deploy-go
Test Pulsar REST Admin API / Prod Pulsar REST Admin API
fast-deploy-go
Router
UI Tool
Server-Sent Events (SSEs)
WebHook
Download artifact to store in CouchDB
Keep track of configs here
47. Deploy to test / Deploy to prod
fast-deploy-go
Test Pulsar REST Admin API / Prod Pulsar REST Admin API
fast-deploy-go
Router
Server-Sent Events (SSEs)
UI Tool
You could do the download asynchronously at a different point in the flow, but then you will need to ensure it's fully downloaded before pushing the deployment from the UI
Synchronous Artifact Download/Upload (1) (2)
Query to get all places where the artifact has been used. Enrich the JSON with this data.
Update configs to use new artifact
(1) Update configs in CouchDB by writing as staged
Once staged configs are approved, push into test or prod environments
Synchronously stage changes in DB. (Add to stage set.) (2)
Push for real-time updates
Pull to get all data
Filter/Transform
This was best done in Scala
WebHook
Download artifact to store in CouchDB
Keep track of configs here
48. Deploy to test / Deploy to prod
fast-deploy-go
Test Pulsar REST Admin API / Prod Pulsar REST Admin API
fast-deploy-go
Router
Server-Sent Events (SSEs)
UI Tool
You could do the download asynchronously at a different point in the flow, but then you will need to ensure it's fully downloaded before pushing the deployment from the UI
Synchronous Artifact Download/Upload (1) (2)
Query to get all places where the artifact has been used. Enrich the JSON with this data.
Update configs to use new artifact
(1) Update configs in CouchDB by writing as staged
Synchronously stage changes in DB. (Add to stage set.) (2)
Push for real-time updates
Pass command
Synchronously execute CouchDB command
Be careful to avoid creating security risks with how you implement this
e.g. "merge-stage-sets", "commit-staged-to-test", "commit-staged-to-prod", "un-stage", "rollback", "get-all-data", etc. (in a JSON object with any additional parameters)
(1) (2) Return result
Filter/Transform
This was best done in Scala
WebHook
Download artifact to store in CouchDB
Keep track of configs here
50. User
Request new topic for SNOW Request feed
Request datasource
Approval Gate
ACL approver DataEng
Saves back to SNOW table (workflow is triggered on write)
Generate function configs
Generate role configs
Generate token configs
Generate source tap function configs
Generate validation tap function configs
Generate passthrough function configs
SNOW = ServiceNow
Fast-Deploy
Report functions deployed for topic
Role Generator
Report roles created for topic
Token Generator
Report tokens created for topic
Flink keyBy request ID, window with 60-second timeout
Save configs of what was created
Add into single JSON array of function configs
Router
SNOW Request
Could be modified to use a custom UI instead
Populates template for configs for request ID
Be sure to pass the request ID with each JSON object to allow all configs to be joined to the user request after deployment!
Note: One request ID represents all configs produced by this template
Router removes the routing envelope since it won't be needed downstream
Note: We created the token generator as a producer/consumer due to the lack of an available API to generate tokens. So we needed to use the Pulsar CLI, which meant that we needed a disk location to save the token.
Check if all required objects were created or if anything is missing. Report any problems to DataEng. Else, notify the user that their topic is ready and provide them with the tokens and connection details.
Notification function that sends Email, UI, and/or Slack notifications.
51. Request new topic for SNOW Request feed
Request datasource
Approval Gate
ACL approver DataEng
Saves back to SNOW table (workflow is triggered on write)
SNOW = ServiceNow
SNOW Request
Could be modified to use a custom UI instead
User
54. Generate function configs
Generate role configs
Generate token configs
Generate source tap function configs
Generate validation tap function configs
Generate passthrough function configs
Add into single JSON array of function configs
Populates template for configs for request ID
Be sure to pass the request ID with each JSON object to allow all configs to be joined to the user request after deployment!
Note: One request ID represents all configs produced by this template
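A minimal sketch of the template-population idea: every config the template produces is stamped with the request ID so the downstream join can tie it back to the originating request. The class, helper, and field values here are all illustrative.

import com.google.gson.JsonArray;
import com.google.gson.JsonObject;

public class ConfigTemplateGenerator {

    // Hypothetical template helper: builds one function config, stamped with the request ID.
    static JsonObject functionConfig(String requestId, String tenant, String namespace,
                                     String topic, String role) {
        JsonObject config = new JsonObject();
        config.addProperty("type", "function");
        config.addProperty("requestId", requestId); // join key used downstream
        config.addProperty("tenant", tenant);
        config.addProperty("namespace", namespace);
        config.addProperty("name", topic + "-" + role); // e.g. requested-topic-passthrough
        return config;
    }

    public static void main(String[] args) {
        String requestId = "req-12345"; // one ID covers every config this template produces
        JsonArray configs = new JsonArray();
        for (String role : new String[] {"passthrough", "source-tap", "validation-tap"}) {
            configs.add(functionConfig(requestId, "exampleTenant", "exampleNamespace",
                    "requested-topic", role));
        }
        System.out.println(configs); // single JSON array of function configs, sent downstream
    }
}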
55. Fast-Deploy
Report functions deployed for topic
Role Generator
Report roles created for topic
Token Generator
Report tokens created for topic
Flink keyBy request ID, window with 60-second timeout
Router
Router removes the routing envelope since it won't be needed downstream
Note: We created the token generator as a producer/consumer due to the lack of an available API to generate tokens. So we needed to use the Pulsar CLI, which meant that we needed a disk location to save the token.
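A hedged Flink sketch of the join step, assuming a simple report-event shape (the real schema isn't shown). The deck specifies a 60-second window keyed by request ID; a processing-time tumbling window stands in for whatever window semantics the real job uses.

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class ProvisioningJoinJob {

    // Hypothetical report event: one per object (function, role, token) created for a request.
    public static class Report {
        public String requestId;
        public String created; // e.g. "function:passthrough"
        public Report() {}
        public Report(String requestId, String created) {
            this.requestId = requestId;
            this.created = created;
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Stand-in for the union of the function/role/token report topics.
        DataStream<Report> reports = env.fromElements(
                new Report("req-12345", "function:passthrough"),
                new Report("req-12345", "role:producer-role"),
                new Report("req-12345", "token:producer-token"));

        reports
                .keyBy(report -> report.requestId)                           // join on the request ID
                .window(TumblingProcessingTimeWindows.of(Time.seconds(60))) // 60-second timeout
                .reduce((a, b) -> new Report(a.requestId, a.created + "," + b.created))
                .print(); // downstream: check completeness, save, and notify

        env.execute("provisioning-join");
    }
}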
56. Save configs of what was created
Check if all required objects were created or if anything is missing. Report any problems to DataEng. Else, notify the user that their topic is ready and provide them with the tokens and connection details.
Notification function that sends Email, UI, and/or Slack notifications.
57. (The complete provisioning flow again, combining the pieces from slides 50-56: the request UI and approval gate, config generation from templates, the router, the Fast-Deploy/role/token generators, the Flink join on request ID, and the save, validation, and notification functions.)
58. Why Streaming and Pulsar – Ammunition for the Business Case: https://www.youtube.com/watch?v=qsz-FruOGoo&feature=youtu.be
Performance Architecture Deep Dive: https://streamnative.io/whitepaper/taking-a-deep-dive-into-apache-pulsar-architecture-for-performance-tuning/
How Pulsar works: https://jack-vanlightly.com/blog/2018/10/2/understanding-how-apache-pulsar-works
2020 Apache Pulsar User Survey: https://streamnative.io/whitepaper/sn-apache-pulsar-user-survey-report-2020/
Basics of Pulsar architecture: https://www.youtube.com/watch?v=vlU9UegYab8&feature=youtu.be
Common Pulsar Architectural Patterns: https://www.youtube.com/watch?v=pmaCG1SHAW8&feature=youtu.be (my most popular video yet!)
You can learn more about Pulsar Beam here: https://kafkaesque.io/introducing-pulsar-beam-http-for-apache-pulsar/
62. Pulsar Architectural Patterns for CI/CD
Every pattern shown here has been developed and implemented with my team at Overstock
Email: dbost@overstock.com
Twitter: DevinBost
LinkedIn: https://www.linkedin.com/in/devinbost/
By Devin Bost, Senior Data Engineer at Overstock
Data-Driven CI/CD Automation for Pulsar Function Flows and Pub/Sub
Includes on-prem, AWS, and GCP architectures
Editor's Notes
Hi, I’m Devin Bost, and I’m going to be talking about Pulsar architectural patterns for Continuous Integration and Continuous Deployment for Pulsar Function flows, as well as for Pub/Sub. I also want to give a shout out to my amazing team at Overstock. They have contributed to many of these patterns.
I’ll start by talking about our journey
Then I’ll jump into the architecture
Of the architecture, we will start with pulsar functions.
And, then we will cover pub/sub.
Clunky and slow but gets the job done.
That breaks down when you have lots of spectators who are wanting functions deployed
So, we tried bash.
But, that was slow.
And, dealing with the output was a mess.
So, we wrapped it in Python.
But, that wasn’t much better, so we went back to the drawing board.
Then, the epiphany came.
The idea is to leverage Pulsar itself along with a data contract. This leads us to data-driven deployment and modular design.
Here is a data contract for our Pulsar deployment application. We call it Fast Deploy. In this contract, we’re telling Fast Deploy exactly what we want to deploy. In this case, this message is passed to Fast Deploy to deploy a function that connects to Apache Ignite for enriching incoming data from a cache. If you’ve ever deployed a Pulsar function, most of these parameters should look familiar to you.
The idea is simple. You have reusable functions, and by leveraging data to drive your deployments, you can easily build things simply by constructing the required data and pushing it through the deployment process.
That enables things like drag-and-drop building of function flows and other kinds of patterns. The idea is that as long as you can produce the data you need for your deployment parameters, you can deploy all you want.
This also enables doing things like creating a backfill path for a flow with a single click.
If you want more information on backfills, I have a link to a previous presentation at the end of this presentation that covers that.
Typically, you want to build flows by looking at the end from the beginning. You want to start from the endpoint and determine what’s required to satisfy your final data contract. Then, you can build incremental layers of automation, each of which satisfies your next data contract.
If you build the other way, from the start to the end, though it might seem more intuitive, you can spend a lot of time trying to get somewhere only to reach a dead end. That wastes time and forces you back to the drawing board to find a clearer path. Also, since you have no deliverable until you reach the end, there's no incremental value you can deliver, and if you go long enough without anything to show, you run the risk of the project getting killed before you actually reach completion.
So, you start from the endpoint.
And before you get too far, you need to look at the data source as well to ensure you don’t drift while you’re building your pipes.
At first, you might need to manually satisfy this contract by producing messages to it directly, but that’s okay. You have an immediate deliverable.
You then build pieces incrementally that get you closer and closer to your originating source of data.
Until you finally get to the source of where your data is originated.
So, at a high level, this is what we're going to cover in this part of the presentation. We build our application or function, filter the metadata on the build artifacts, put it in a local store, push it to our gatekeeping application, and then, when approved, we push it into Pulsar.
The key thing about using data to drive Pulsar deployments is you can easily vary the properties to change the behavior.
As an example of the value of using this approach of building function flows from data, we can easily append “-backfill” to the functions required, such as this one, to create our backfill path. If you want to learn more about backfills, I talk about those in another video, and I put the link to it at the end of this presentation.
Now, in the history of Fast Deploy, we started with the Pulsar Java Admin API. It got the job done, but it was very clunky and hard to maintain.
By moving to Go and hitting the Pulsar REST API directly, we cut our code volume in half, even after adding functionality to translate the config JSON directly into the HTTP payload, which allowed us to provide a more flexible Pulsar data contract without needing to add more code.
Making this highly available is not hard. You just need more containers.
And now that Go functions are a little more production ready, we could even move it to a Pulsar function to eliminate the dependency on Docker if we wanted to.
Upstream from Fast Deploy, the next piece in our pipeline is a router instance. The router is a reusable function that's useful for many different use cases. Basically, it takes incoming messages and, based on a value in each message, routes the message to a corresponding topic. In this case, we use it to route our message to different environments. I'll show you how this works.
As a side note, making this work can involve some networking complexity because you want to make sure that you can deploy to your production environment safely without opening up your entire production network to potential security threats.
Here’s what the router’s Fast Deploy config looks like. (This is the config we pass to Fast Deploy to provision an instance of the router.)
When we provision the router, we give it this userConfig that provides a key. For every message that comes through the router, it looks for a JSON property with the name of this key. In this case, it’s going to look up the environment property. Then, it gets the value of that property from the JSON object and sends the object to that topic.
The router allows us to send the message where it needs to go. Optionally, we can strip the routing envelope from the message before sending it further downstream to the specified recipient.
The router then inspects each message it’s passed, this being one example. In the incoming message, the router looks for the key that we specified in the config, and if it finds that key, it retrieves that value and uses it to route the message.
In Pulsar, if the topic doesn't exist, it creates it with the specified tenant and namespace. So, in this case, it's going to route the object to "ops/deployment-automation/[environment]", where [environment] is substituted with the value from the incoming object.
If we wanted to provision a router that matched on a different key, such as “generator-type,” we would provision it with this config instead.
In this case, our router would send both of these configs as a single JSON array downstream to fast-deploy for the test environment.
These can be part of a staging set.
Now that we understand how fast-deploy and the router work, let’s look at our options for getting data to Fast-Deploy.
The key is that we need some way of keeping track of our fast-deploy configs and providing a gating mechanism for their deployment. This ideally should be done through a UI, though with some creativity, you might be able to find another way to do so if a UI is out of the scope of what’s feasible for your team.
Aside from having a place to store your configs, if you use CouchDB, you can also store your artifact along with the config. This is nice because you get CouchDB's versioning feature, and if you don't have control over the artifact lifecycle where your build tool stores your artifacts, you can put them here to ensure they will always be available when Pulsar brokers need to download the artifact for your functions.
In the case where you have reusable functions and want to update all running instances of that function when you update your code, we can first query to get the list of those configs and then create a staging set. Then, from the UI, we can approve or modify the staging set before deploying it to our target environments.
Another benefit of having the UI is you can control where you want to deploy to. So, we can push the updates to test and then determine if anything broke before pushing them to prod.
Now, if we want to completely decouple our UI from our database, we can do that by putting Pulsar in between them. This can be nice if you want to be able to easily swap out your database technology without breaking existing integrations because you can just stop the Pulsar function, swap out the database, run a batch migration, and then turn the Pulsar functions back on, and nobody will even notice. This decoupling is a best practice in general and should be used whenever possible as long as you can afford the extra latency and dev time.
Now, we can look at ways of getting data into the staging flow that we just covered.
In this case, our build tool is Jenkins. Jenkins is storing the artifacts in Sonatype Nexus. From Nexus, we can leverage a webhook that hits Pulsar Beam, which produces to a Pulsar topic. Since we get a lot of data from the webhook, we need to filter that down to just the metadata that describes our artifact and provides a URL we can use to download it. You want to be sure that you can obtain the metadata required to uniquely identify your function so it won't be confused with others. If you can't get that directly from the build tool, you may need to find another way to get that information into the flow to satisfy this requirement. However, be sure not to hard-code implementation details into your function, since that could make reusability more difficult. If you need to provide something specific to your function, pass it through a function config.
Also, if you need to perform security analysis on your artifacts, there are usually ways to do that in this part of the pipeline.
Now, since many of you are on the cloud, here’s what this would look like in AWS.
Here’s what this part of the pipeline might look like if using a GCP-based build pipeline, which is almost identical to the AWS version.
What about Pub/Sub?
I promised that I’d talk about automation for the pub/sub case, so let’s walk through that architecture as well.
Here’s a high-level diagram, but there’s a lot going on here, so we’re going to walk through it in parts.
So, in this case, we're using ServiceNow to process requests for new topics. However, you could use any application or UI that enables users to request new topics. The key here is that there's a security gating mechanism that allows the request to be approved, ensuring that authorization is required to access new topics.
After the request is approved, it sends a message to Pulsar through Pulsar Beam. Pulsar Beam allows RESTful web requests to produce Pulsar messages, so we’re leveraging that here. There’s a link to the repo for Pulsar Beam at the end of this presentation.
Regardless, if you aren't using ServiceNow, you can use any equivalent application that provides a webhook you can use to send this web request.
If you’re operating on the cloud, you can use a Lambda or Cloud Function to hit Pulsar with the request.
The data contract we send to Pulsar is very simple, but you can easily add more metadata if you want.
The backfill parameter simply specifies whether they want a backfill path created for them; creating a backfill is our default since it's a best practice.
We then pass that message through a set of templates to construct our passthrough functions and taps. A link to my video on passthrough functions and taps is at the end of this presentation.
After the message is passed to Pulsar, we use the information in the message to construct a set of messages from templates, each of which is written to the downstream Pulsar topic. We will route these messages to separate topics for constructing the passthrough functions, tap functions, roles, and tokens.
In each of the messages we send downstream, we need to include a request ID that allows us to uniquely join each message back to the originating request. I’ll show you why this ID is important in the next slide.
As the messages we generated in the previous function come into this router, the router sends them to the appropriate places. We could have done this routing in the previous function, but we get better separation of concerns by using a router here.
After the functions, roles, and tokens are created, on success, the responsible producers write to topics that are ingested by a Flink job. The Flink job joins the messages on the request ID and passes the joined result downstream. The joined message set allows us to validate what was created within the timeout period specified by the Flink job’s window size.
After the Flink job, we save the information about what was created and then check if anything was missed. If something was missed, it gets reported to the team responsible for managing the Pulsar infrastructure so they can investigate.
If the job is successful, the resulting information is reported to the owning team. They can then look up the required information.
In terms of how to handle the token information in a secure manner, there are a number of ways that can be done if you don't have a UI you can post it to securely, so that's out of scope for this presentation.
Here’s the entire flow again. We’ve got the UI for making the request, the approval gate, the function for creating the config messages, the router, the generators, the joiner, the function that saves the progress, the validation function, and the notification function.
Here are some additional resources that you should check out.
My promised video that covers backfills, tap functions, and passthrough functions is this one.
The Pulsar Beam library that I referenced earlier can be accessed here.
I’ll be uploading this PowerPoint to SlideShare immediately after my presentation, and I’ll put a link to it on my LinkedIn profile and in the recording of this presentation.
Any questions?
If you want more information or have additional questions, please reach out to me!