Organizations need to protect Personally Identifiable Information (PII). As Event Streaming Architecture (ESA) becomes ubiquitous in the enterprise, the prevalence of PII within data streams will only increase, and data architects must be cognizant of how their pipelines can leak it. In highly distributed systems, zero-trust networking has become an industry best practice. We can apply the same principle to Kafka by introducing message-level security.
A DevSecOps engineer with some Kafka experience can leverage Kafka Streams to protect PII by enforcing role-based access control (RBAC) using Open Policy Agent. Rather than implementing a REST API to handle message-level security, Kafka Streams can filter, or even transform, outgoing messages to redact PII while leveraging the native capabilities of Kafka.
In our proposed presentation, we will provide a live demonstration in which two consumers subscribe to the same Kafka topic but receive different messages based on the rules specified in Open Policy Agent. At the conclusion of the presentation, we will share a GitHub repository so that attendees can use a sandbox environment for hands-on experimentation with message-level security.
Securing the Message Bus with Kafka Streams | Paul Otto and Ryan Salcido, Raft LLC
1. Securing the Message Bus with Kafka Streams
SBA 8(a) Certified, WOSB, and EDWOSB
https://goraft.tech
Kafka Summit, Americas
September 14 – 15, 2021
Presenters: Paul Otto & Ryan Salcido
2. Agenda.
• Introduction
• Objective
• Why is this needed?
• Caveats
• Architecture Diagram
• Open Policy Agent
• Kafka Streams
• Kafka Consumer Examples
• Demo
• Final Remarks/Questions
3. Introduction.
• Inspired by the Raft Consensus Model, Raft strives to deliver solutions that are dependable, accessible, and viable at scale within the public sector
• This presentation describes how we developed an event-streaming service using Confluent Platform, Open Policy Agent, and Kafka Streams to provide topic- and message-level security
• Researched and prototyped a solution that simplified the integration process for applications while leveraging native Kafka capabilities to provide a "single-source-of-truth" data solution
4. Objective.
• Provide message-level security with Kafka using Open Policy Agent and Kafka Streams
• Use native Kafka capabilities without the need for a REST API
• Protect sensitive data (e.g., PII) without the need for multiple sub-topics
• Allow different consumers to subscribe to the same topic but receive the appropriate messages according to their access level
5. Why is this needed?
• With Event Streaming Architecture becoming more prevalent within enterprises, securing data streams containing PII (or classified) data is increasingly important
• Within the public sector, protecting classified data is a must and becomes more difficult when working with ESA
• A common solution for adding security controls at the topic and message level within Kafka is to create a REST API to enforce RBAC, but this sacrifices the ability to deliver data to the consumer the moment it is needed
• Another solution is to create sub-topics that consumers can subscribe to, but this can quickly run into scalability issues
6. Caveats.
• The use case shown here is a way to help prevent PII leakage when using Kafka
• Additional steps would need to be taken to prevent a consumer from accessing the Kafka broker directly rather than going through Kafka Streams
• This approach works in an environment where the consumers/producers and the Kafka platform can have a trusted, mutual agreement, which could include periodic audits of Kafka usage
• In zero-trust environments, a Kafka proxy would be needed between the Kafka Streams interface and the consumers
8. What is Open Policy Agent?
• A policy engine typically used in cloud-native environments
• Fits our use case of integrating with Kafka to provide topic-level security
• Uses its own declarative policy language, Rego, to define policies (".rego" file extension)
• Obtained CNCF graduated status in early 2021
9. Example of OPA's Rego Query Language.
• The screenshot on the left shows a data structure for controlling access to topics
• The screenshot on the right processes the input and ultimately determines whether the user has access to the requested topic
• A boolean value is returned to Kafka based on whether the user has access or not
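The screenshots did not survive this text version; below is a minimal Rego sketch of what such an access data structure and query could look like. The topic map and the `input.session.principal` path are illustrative assumptions, not the exact policy from the talk:

```rego
package kafka.authz

# Illustrative data structure: which principals may access which topics
topic_access = {
    "pii":    ["bobjones", "alicesmith"],
    "public": ["bobjones", "alicesmith", "johnhernandez"],
}

default allow = false

# Evaluates to the boolean that Kafka receives: true only when the
# requesting principal appears in the list for the requested topic
allow {
    topic_access[input.resource.name][_] == input.session.principal.name
}
```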
10. Rego Policy: Defining levels of access for users.
• Additionally, we can restrict users from performing certain operations within Kafka
• In this example, "bobjones" is allowed to read, write, describe, and create the "pii" topic
• However, "alicesmith" is only granted permission to read and describe the "pii" topic
• Any other operation not explicitly granted will result in an unauthorized error
11. How do we write the allow policies in OPA?
• To allow certain operations, we create an "allow" block with the necessary logic
• The first "allow" block checks the list of clients defined earlier against the requested operation
• Example:
• principal.name == "bobjones"
• input.resource.name == "pii" (the topic name)
• input.operation.name == "read" (can also be "write", "create", "describe", or "delete")
• The "[_]" is Rego's iteration syntax: it checks whether any entry in the user's list of allowed operations matches the requested operation
• If one matches, "true" is returned to Kafka; otherwise "false"
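Reconstructed from the description above, such an "allow" block could look roughly like this. The client map and the `input.session.principal` path are assumptions made for illustration; the talk's actual policy lives in the linked repository:

```rego
package kafka.authz

# Illustrative per-user grants: topic name -> allowed operations
clients = {
    "bobjones":   {"pii": ["read", "write", "describe", "create"]},
    "alicesmith": {"pii": ["read", "describe"]},
}

default allow = false

# "[_]" iterates over the user's allowed operations for the topic;
# the rule succeeds (true is returned to Kafka) only if one of them
# matches the requested operation
allow {
    clients[input.session.principal.name][input.resource.name][_] == input.operation.name
}
```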
12. Leveraging GitOps with OPA.
• Rather than storing RBAC policies directly (as in the previous example), we can leverage GitOps to reduce the burden of change management
• Policy-as-code can be integrated into CI/CD pipelines to help automate the path to deployment
• Changes to the git repository can automatically be picked up, tested, validated, and deployed
13. Identity and Access Management with OPA.
• In addition to leveraging GitOps, an IAM framework such as Keycloak can be used to store the RBAC policies for users, which helps declutter the Rego files
• As a result, once a user authenticates via IAM, the JWT response can contain the RBAC policies granted to the user
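A sketch of that idea in Rego, assuming the broker forwards the consumer's token to OPA as `input.token` and that the IAM server puts the grants in a `topic_roles` claim (both names invented here; `io.jwt.decode` is a real OPA built-in, and signature verification is omitted for brevity):

```rego
package kafka.authz

default allow = false

# io.jwt.decode returns [header, payload, signature]; the payload
# carries the RBAC grants issued by the IAM server
claims = payload {
    [_, payload, _] := io.jwt.decode(input.token)
}

# Allow when the requested operation appears among the roles the
# IAM server granted for the requested topic
allow {
    claims.topic_roles[input.resource.name][_] == input.operation.name
}
```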
14. How does Kafka communicate with OPA?
• For Kafka to communicate with OPA to provide topic-level security, we need to create a derivative Docker image that injects the OPA jar into the base Kafka image
• Then, we need to provide the Kafka broker with additional configuration properties
15. What does the derivative Docker image look like?
Dockerfile:
# Base image: Confluent Kafka v5.5.2
FROM confluentinc/cp-server:5.5.2
WORKDIR /opt
# Copy the OPA jar that handles the role-based access control
COPY ./target/kafka-opa-1.0.0.jar /usr/share/java/kafka
# Change to non-root user
USER 1001
16. Additional Kafka Broker Properties.
• As mentioned earlier, we need to add properties to the Kafka broker so that it knows how to communicate with OPA
• If environment variables are needed instead (e.g., with Docker Compose), replace each "." with "_", capitalize the property name, and prepend "KAFKA_"
• Example: authorizer.class.name becomes KAFKA_AUTHORIZER_CLASS_NAME
# Properties
# Specify the full class name
authorizer.class.name=tech.goraft.kafka.opa.OpaAuthorizer
# The URL that handles the logic on whether to allow the user to access the topic
opa.authorizer.url=http://opa:8181/v1/data/kafka/authz/allow
# Fail secure: deny access if OPA cannot be reached
opa.authorizer.allow.on.error=false
opa.authorizer.cache.initial.capacity=100
opa.authorizer.cache.maximum.size=100
opa.authorizer.cache.expire.after.ms=10000
17. Kafka Streams.
• A library for building real-time stream-processing applications
• In this case, we leveraged Kafka Streams to provide message-level security based on the authenticated consumer
• Once a user is granted access to the requested topic by OPA, the Kafka Streams microservice checks each outgoing message, and messages are filtered out if the end user does not have access
• In this scenario, we can still leverage the native Kafka capabilities for processing streams in real time
18. Kafka Streams (cont.).
• If needed, this can be taken a step further by redacting certain fields of an outgoing message
• Kafka Streams can transform messages so that certain sensitive data is never consumed
• For example, if one of the fields is a person's SSN, there may be a situation where we want to return only the last 4 digits or remove the field altogether
• Can use a combination of the "filter" and "map" methods provided by the KStream Java class
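As a sketch of that redaction step, here is a standalone masking helper with the hypothetical KStream wiring shown in comments (the topic names, the `userMayRead` check, and the plain-string SSN format are assumptions for illustration, not the talk's actual code):

```java
import java.util.regex.Pattern;

// Standalone sketch of the SSN-redaction step described above.
// In a real topology this would run inside the stream, e.g.:
//   builder.stream("pii")
//          .filter((key, value) -> userMayRead(key))   // hypothetical access check
//          .mapValues(PiiRedactor::maskSsn)
//          .to("pii-redacted");
public class PiiRedactor {

    // Matches a full SSN like 123-45-6789 and captures the last 4 digits
    private static final Pattern SSN = Pattern.compile("\\b\\d{3}-\\d{2}-(\\d{4})\\b");

    // Keep only the last 4 digits, e.g. "123-45-6789" -> "***-**-6789"
    public static String maskSsn(String value) {
        return SSN.matcher(value).replaceAll("***-**-$1");
    }

    public static void main(String[] args) {
        System.out.println(maskSsn("name=bobjones ssn=123-45-6789"));
    }
}
```

Because the masking rule is a pure function, it can be unit-tested without a broker and then dropped into `mapValues`.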
19. Example: Consumer subscribing to Kafka topic.
• This example shows the messages "bobjones" receives when subscribing to the "pii" Kafka topic
• Even though there are many other messages in the Kafka topic for other users, "bobjones" can only see his own
20. Example: TopicAuthorizationException Error.
• This example shows the result of a consumer attempting to subscribe to a topic they do not have access to
• The user was able to authenticate properly via username/password, but OPA prohibited the user, "johnhernandez", from reading the "pii" topic
21. Demo.
• Encompasses the concepts we discussed earlier: Open Policy Agent for topic-level security and Kafka Streams for message-level security
• The repository contains source code for bootstrapping a Confluent Kafka cluster with Open Policy Agent and a Kafka Streams instance running for each of the 3 users: "bobjones", "alicesmith", "johnhernandez"
• Uses Docker Compose to start up all the necessary services
• GitHub repository: https://github.com/raft-tech/kafka-summit-2021
22. GitHub Repository.
We have set up a sandbox environment using Docker Compose to allow for hands-on experimentation with Confluent, Open Policy Agent, and Kafka Streams.
Please feel free to check it out after this presentation!
GitHub repository: https://github.com/raft-tech/kafka-summit-2021
23. Thank you.
Paul Otto
Email: potto@goraft.tech
Twitter: @potto007
LinkedIn: https://www.linkedin.com/in/paulhotto
Ryan Salcido
Email: rsalcido@goraft.tech
Twitter: @ryan__salcido
LinkedIn: https://www.linkedin.com/in/ryan-salcido
GitHub repository: https://github.com/raft-tech/kafka-summit-2021