Building a Streaming Microservice Architecture: With Spark Structured Streaming and Friends
Scott Haines
Senior Principal Software Engineer
Introductions
▪ I work at Twilio
▪ Over 10 years working on streaming architectures
▪ Helped bring a streaming-first Spark architecture to Voice & Voice Insights
▪ Leads Spark Office Hours @ Twilio
▪ Loves distributed systems
About Me
Scott Haines: Senior Principal Software Engineer @newfront
Agenda
The Big Picture
What the Architecture looks like
Protocol Buffers
What they are. Why they rule!
gRPC / Protocol Streams
Versioned Data Lineage as a Service
How this fits into Spark
Structured Streaming with Protobuf support
The Big Picture
Streaming Microservice Architecture
[Diagram: a GRPC Client calls several GRPC Servers over HTTP/2; the servers publish to Kafka Brokers; a Spark Application consumes from Kafka and writes to HDFS / S3; steps numbered 1 to 8]
Streaming Microservice Architecture
[Diagram: a GRPC Server, Kafka Topics, several Spark Applications and Data Tables connected as a streaming pipeline]
Protocol Buffers aka protobuf
Protocol Buffers
▪ Strict Types
  ▪ Enforce structure at compile time
  ▪ Similar to StructType in Apache Spark
  ▪ Interoperable with Spark via ExpressionEncoders
▪ Versioning API / Data Pipeline
  ▪ Compiled protobuf (*.proto) can be released like normal code
▪ Interoperable
  ▪ Pick your favorite programming language, then compile and release.
  ▪ Supports Java, Scala, C++, Go, Objective-C, Node.js, Python and more
Why use them?
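Because compiled .proto files are released like normal code, a release pipeline can check that a new version of a message stays compatible with the previous one. The stdlib sketch below assumes each version is described as a mapping from field number to (name, type), and enforces two rules from the protobuf guidance on updating messages: an existing field number keeps its type, and a removed field number is reserved rather than reused. It conservatively treats every type change as breaking (protobuf itself allows a few), and it is not protoc's own checker; the message fields are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# field number -> (field name, protobuf scalar type)
MessageVersion = Dict[int, Tuple[str, str]]


def compatibility_errors(
    old: MessageVersion, new: MessageVersion, reserved: Set[int]
) -> List[str]:
    """Return a list of problems; an empty list means the change looks safe.

    Conservative rules (a simplification of protobuf's evolution guidance):
    * a field number present in both versions must keep the same type;
    * a field number dropped from the new version must be in `reserved`,
      so it cannot later be reused with a different meaning.
    """
    errors: List[str] = []
    for number, (name, old_type) in sorted(old.items()):
        if number in new:
            _, new_type = new[number]
            if new_type != old_type:
                errors.append(
                    f"field {number} ({name}) changed type {old_type} -> {new_type}"
                )
        elif number not in reserved:
            errors.append(f"field {number} ({name}) removed but not reserved")
    # A reserved number must never come back in a later version.
    for number in sorted(set(new) & reserved):
        errors.append(f"field {number} reuses a reserved number")
    return errors


# Hypothetical v1 and v2 of an ad-impression message.
v1: MessageVersion = {1: ("ad_id", "string"), 2: ("user_id", "string"), 3: ("ts", "int64")}
v2: MessageVersion = {1: ("ad_id", "string"), 3: ("ts", "int64"), 4: ("campaign", "string")}

print(compatibility_errors(v1, v2, reserved=set()))
# ['field 2 (user_id) removed but not reserved']
print(compatibility_errors(v1, v2, reserved={2}))
# []
```

Run as a CI step, a non-empty list would block the release of the new message version.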
Protocol Buffers
▪ Code Gen
  ▪ Automatically generate builder classes
  ▪ Being lazy is okay!
▪ Optimized
  ▪ Messages are optimized and ship with their own serialization/deserialization mechanics (SerDe)
Why use them?
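To show why the built-in SerDe is compact, here is a hand-rolled sketch of how protobuf lays out two field kinds on the wire: varint integers (wire type 0) and length-delimited strings (wire type 2). Generated code does all of this for you; the sketch only illustrates the format, and it skips negative numbers, zigzag (sint) encoding and the other wire types. The two expected byte strings are the worked examples from the protobuf encoding documentation (field 1 = 150, field 2 = "testing").

```python
def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 bits per byte, high bit set on every byte but the last."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)  # last byte: high bit clear
            return bytes(out)


def encode_tag(field_number: int, wire_type: int) -> bytes:
    # The tag is (field_number << 3) | wire_type, itself written as a varint.
    return encode_varint((field_number << 3) | wire_type)


def encode_int_field(field_number: int, value: int) -> bytes:
    return encode_tag(field_number, 0) + encode_varint(value)  # wire type 0 = VARINT


def encode_string_field(field_number: int, value: str) -> bytes:
    data = value.encode("utf-8")
    # wire type 2 = LEN: tag, then byte length as a varint, then the raw bytes
    return encode_tag(field_number, 2) + encode_varint(len(data)) + data


print(encode_int_field(1, 150).hex())           # 089601
print(encode_string_field(2, "testing").hex())  # 120774657374696e67
```

Field names never travel on the wire, only field numbers, which is a large part of why protobuf payloads stay small.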
GRPC and Protocol Streams
gRPC
▪ High Performance
  ▪ Compact binary exchange format
  ▪ Make API calls to the server as if they were local to the client
▪ Cross Language/Cross Platform
  ▪ Autogenerate API definitions for idiomatic clients and servers: just implement the interfaces
▪ Bi-Directional Streaming
  ▪ Pluggable support for streaming over the HTTP/2 transport
What is it?
[Diagram: a GRPC Client talking to three GRPC Servers over HTTP/2]
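In gRPC's Python binding, a bi-directional (stream-stream) handler receives an iterator of requests and yields responses, so each side can keep sending while it receives. The sketch below imitates only that programming model with plain generators and no network; the handler name `track_stream`, the dict-shaped messages and the in-process "channel" are hypothetical stand-ins, not the grpcio API.

```python
from typing import Dict, Iterator


def track_stream(requests: Iterator[Dict[str, str]]) -> Iterator[Dict[str, object]]:
    """Hypothetical bi-directional handler: one acknowledgement per request.

    Mirrors the shape of a gRPC stream-stream method: consume an iterator of
    requests and yield a response as soon as each one is handled.
    """
    seen = 0
    for request in requests:
        seen += 1
        yield {"ad_id": request["ad_id"], "accepted": True, "seen_so_far": seen}


def client_requests() -> Iterator[Dict[str, str]]:
    # The client streams requests lazily; nothing is buffered up front.
    for ad_id in ("ad-1", "ad-2", "ad-3"):
        yield {"ad_id": ad_id}


# In-process stand-in for a channel: each response is pulled as its request flows in.
for response in track_stream(client_requests()):
    print(response)
```

Because both sides are lazy iterators, the server answers ad-1 before the client has produced ad-2, which is the behavior HTTP/2 streaming gives a real gRPC call.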
gRPC Example: AdTracking
gRPC
▪ Define Messages
  ▪ What kind of data are you sending?
    ▪ Example: Click Tracking / Impression Tracking
  ▪ What is necessary for the public interface?
    ▪ Example: AdImpression and Response
How it works?
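As a sketch of the "define messages" step, the two dataclasses below stand in for the AdImpression and Response messages named on the slide. The field names (ad_id, user_id, timestamp_ms, accepted) are hypothetical, and in a real project these types would be generated from a .proto file rather than written by hand; the type check in `__post_init__` roughly imitates the way generated protobuf classes refuse mistyped field values.

```python
from dataclasses import dataclass, fields


class _StrictFields:
    """Mixin: reject wrongly typed values when the message is constructed."""

    def __post_init__(self) -> None:
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, f.type):
                raise TypeError(
                    f"{f.name} must be {f.type.__name__}, got {type(value).__name__}"
                )


@dataclass(frozen=True)
class AdImpression(_StrictFields):
    """Hypothetical public request message for click/impression tracking."""
    ad_id: str
    user_id: str
    timestamp_ms: int


@dataclass(frozen=True)
class AdResponse(_StrictFields):
    """Hypothetical response: only what the caller needs to act on."""
    ad_id: str
    accepted: bool


impression = AdImpression(ad_id="ad-42", user_id="u-7", timestamp_ms=1_700_000_000_000)
print(impression)
```

Constructing `AdImpression(ad_id="ad-42", user_id="u-7", timestamp_ms="yesterday")` raises a TypeError, so bad data is stopped at the edge instead of deep inside a pipeline.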
gRPC
▪ Service Definition
  ▪ Compile your rpc definition to generate service interfaces
  ▪ Uses the same protobuf definition (service.proto) as your client/server request and response objects
  ▪ Can be used to create a binding service contract within your organization or publicly
How it works?
gRPC
▪ Implement the Service
  ▪ Compiling the service auto-generates your interfaces.
  ▪ Just implement the service contracts.
How it works?
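Compiling a service definition typically yields an abstract server-side base (a "servicer") and a client-side stub; the application only fills in the method bodies. The sketch below imitates that split with an abstract base class. ClickTrackService and adTrack come from the system-architecture slide; the tuple-shaped messages, InMemoryClickTrackService and the in-process ClickTrackStub are hypothetical and mirror the idea only, not grpcio's generated code.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple

TrackedAd = Tuple[str, str]       # hypothetical (ad_id, user_id) request
TrackResponse = Tuple[str, bool]  # hypothetical (ad_id, accepted) response


class ClickTrackService(ABC):
    """Stand-in for the generated service contract: one method per rpc."""

    @abstractmethod
    def adTrack(self, tracked_ad: TrackedAd) -> TrackResponse:
        ...


class InMemoryClickTrackService(ClickTrackService):
    """'Just implement the service contract': the only hand-written part."""

    def __init__(self) -> None:
        self.received: List[TrackedAd] = []

    def adTrack(self, tracked_ad: TrackedAd) -> TrackResponse:
        ad_id, user_id = tracked_ad
        accepted = bool(ad_id) and bool(user_id)
        if accepted:
            self.received.append(tracked_ad)
        return (ad_id, accepted)


class ClickTrackStub:
    """Client side: the call looks local; with real gRPC it would cross HTTP/2."""

    def __init__(self, server: ClickTrackService) -> None:
        self._server = server

    def adTrack(self, tracked_ad: TrackedAd) -> TrackResponse:
        return self._server.adTrack(tracked_ad)


service = ClickTrackStub(InMemoryClickTrackService())
print(service.adTrack(("ad-42", "u-7")))  # ('ad-42', True)
```

Instantiating ClickTrackService directly raises a TypeError because adTrack is abstract, which is the "binding contract" in miniature: the server cannot ship until every rpc is implemented.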
gRPC
▪ Protocol Streams
  ▪ Messages (protobuf) are emitted to Kafka topic(s) from the server layer
  ▪ Protocol streams are then available from the Kafka topics bound to a given service / collection of messages
  ▪ Sets up Spark for the hand-off
How it works?
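On the server side, each accepted message can be serialized once and appended to a Kafka topic, keyed so that related records land in the same partition. The sketch below replaces the Kafka producer with an in-memory topic so it runs anywhere. The topic name ads.click.stream comes from the architecture slide; the class and function names, the 3-partition count and the use of zlib.crc32 for key hashing are assumptions (Kafka's default Java partitioner hashes keys with murmur2).

```python
import zlib
from collections import defaultdict
from typing import DefaultDict, List, Tuple


class InMemoryTopic:
    """Hypothetical stand-in for a Kafka topic: partitions of (key, value) records."""

    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        self.num_partitions = num_partitions
        self.partitions: DefaultDict[int, List[Tuple[bytes, bytes]]] = defaultdict(list)

    def partition_for(self, key: bytes) -> int:
        # Same key -> same partition, which preserves per-key ordering.
        return zlib.crc32(key) % self.num_partitions

    def send(self, key: bytes, value: bytes) -> Tuple[int, int]:
        """Append a record; return (partition, offset) like a producer ack would."""
        partition = self.partition_for(key)
        log = self.partitions[partition]
        log.append((key, value))
        return partition, len(log) - 1


def emit_click(topic: InMemoryTopic, ad_id: str, serialized: bytes) -> Tuple[int, int]:
    # The server layer hands already-serialized protobuf bytes to the topic;
    # downstream Spark jobs decode them with the same compiled message definition.
    return topic.send(ad_id.encode("utf-8"), serialized)


clicks = InMemoryTopic("ads.click.stream", num_partitions=3)
payload = b"\x0a\x05ad-42"  # protobuf bytes for field 1 = "ad-42"
first = emit_click(clicks, "ad-42", payload)
second = emit_click(clicks, "ad-42", payload)
print(first[0] == second[0], second[1])  # True 1
```

Keying by ad_id is one reasonable choice; keying by user or campaign would change which records are guaranteed to stay in order relative to each other.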
gRPC
System Architecture
[Diagram: a GRPC Client calls three GRPC Servers over HTTP/2; the servers publish to Kafka Brokers]
Topic: ads.click.stream
Client: service.adTrack(trackedAd)
Server: ClickTrackService.adTrack(trackedAd)
Structuring Protocol Streams: with Structured Streaming and protobuf
Structured Streaming with Protobuf
▪ Expression Encoding
  ▪ Natively interoperate with Protobuf in Apache Spark.
  ▪ Protobuf to case class conversion from ScalaPB.
  ▪ Product encoding comes for free via import sparkSession.implicits._
From Protocol Buffer to StructType through ExpressionEncoders
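Spark derives a StructType from a Scala case class through its Product encoder, walking the fields in order and mapping each type to a Spark SQL type; the slide relies on this for ScalaPB-generated case classes. The Python sketch below imitates that derivation on a dataclass and emits a Spark SQL DDL schema string. The type table covers only a few scalar types (mapping int to BIGINT, like a Scala Long, is an assumption), and the ClickEvent record is hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Dict, Type

# Rough Python -> Spark SQL type table (assumption: int maps to BIGINT).
SPARK_SQL_TYPES: Dict[type, str] = {
    str: "STRING",
    int: "BIGINT",
    float: "DOUBLE",
    bool: "BOOLEAN",
    bytes: "BINARY",
}


def ddl_schema(record_type: Type) -> str:
    """Derive a Spark SQL DDL schema string from a dataclass, field by field."""
    columns = []
    for f in fields(record_type):
        if f.type not in SPARK_SQL_TYPES:
            raise TypeError(f"no Spark SQL mapping for {f.name}: {f.type!r}")
        columns.append(f"{f.name} {SPARK_SQL_TYPES[f.type]}")
    return ", ".join(columns)


@dataclass
class ClickEvent:
    """Hypothetical record playing the role of a ScalaPB-generated case class."""
    ad_id: str
    user_id: str
    timestamp_ms: int
    cost: float
    converted: bool


print(ddl_schema(ClickEvent))
# ad_id STRING, user_id STRING, timestamp_ms BIGINT, cost DOUBLE, converted BOOLEAN
```

Strings of this form are accepted by StructType.fromDDL in Scala and by DataStreamReader.schema in both Scala and PySpark, so one definition can drive both the typed records and the stream schema.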
Structured Streaming with Protobuf
▪ Native is Better
  ▪ Strict, native Kafka-to-DataFrame conversion with no need to transform to intermediary types
  ▪ Transformations and joins work across both the DataFrame and Dataset APIs.
  ▪ Create real-time data pipelines, machine learning pipelines and more.
  ▪ Rest at night knowing the pipelines are safe!
From Protocol Buffer to StructType through ExpressionEncoders
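Spark's Kafka source exposes each record as a row with key, value, topic, partition, offset, timestamp and timestampType columns, and the protobuf payload sits in the binary value column. The stdlib sketch below shows the "no intermediary types" idea: value bytes are decoded straight into a typed row by field number, with no JSON or string detour. The message layout (field 1 = ad_id string, field 2 = clicks varint) and the record dicts are hypothetical; in Spark the decode is done by an encoder (ScalaPB in this talk) or, from Spark 3.4, by the from_protobuf function.

```python
from typing import Dict, List, NamedTuple, Tuple, Union

FieldValue = Union[int, bytes]


class ClickRow(NamedTuple):
    """Hypothetical typed row: decoded payload fields plus Kafka metadata."""
    ad_id: str
    clicks: int
    partition: int
    offset: int


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_fields(value: bytes) -> Dict[int, FieldValue]:
    """Collect varint (wire type 0) and length-delimited (wire type 2) fields by number."""
    out: Dict[int, FieldValue] = {}
    pos = 0
    while pos < len(value):
        tag, pos = read_varint(value, pos)
        number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:
            number_value, pos = read_varint(value, pos)
            out[number] = number_value
        elif wire_type == 2:
            length, pos = read_varint(value, pos)
            out[number] = value[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} is not handled in this sketch")
    return out


def to_rows(records: List[Dict[str, object]]) -> List[ClickRow]:
    """Kafka-style records (value = protobuf bytes) -> typed rows.

    Field numbers the row does not know about are ignored, so newer
    producers that add fields do not break this reader.
    """
    rows: List[ClickRow] = []
    for record in records:
        payload = decode_fields(record["value"])
        ad_id = payload.get(1, b"")
        clicks = payload.get(2, 0)
        rows.append(ClickRow(
            ad_id=ad_id.decode("utf-8") if isinstance(ad_id, bytes) else "",
            clicks=clicks if isinstance(clicks, int) else 0,
            partition=record["partition"],
            offset=record["offset"],
        ))
    return rows


records = [
    {"topic": "ads.click.stream", "partition": 0, "offset": 7,
     "key": b"ad-42", "value": b"\x0a\x05ad-42\x10\x03"},
    # The second payload carries an extra field 3 that this reader ignores.
    {"topic": "ads.click.stream", "partition": 0, "offset": 8,
     "key": b"ad-7", "value": b"\x0a\x04ad-7\x10\x01\x18\x01"},
]
rows = to_rows(records)
print([r for r in rows if r.clicks > 1])
# [ClickRow(ad_id='ad-42', clicks=3, partition=0, offset=7)]
```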
Structured Streaming with Protobuf
▪ Strict Data Writer
  ▪ Compiled, versioned Protobuf can even be used to strictly enforce the format of your writers
  ▪ Use Protobuf to define the StructType used in your conversions to Parquet* (*must abide by Parquet nesting rules)
  ▪ Declarative input/output means that streaming applications don't go down due to incompatible data streams
  ▪ Can also be used with Delta so that the version of the schema lines up with the compiled Protobuf.
From Protocol Buffer to StructType through ExpressionEncoders
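The strict-writer idea is schema enforcement at write time: a batch whose columns or types differ from the declared (protobuf-derived) schema is rejected instead of being silently written, similar in spirit to Delta Lake's schema enforcement on appends. The stdlib sketch below imitates that check; the StrictWriter class, its in-memory sink and the two-column schema are hypothetical, and it is stricter than Delta in that missing columns are also rejected.

```python
from typing import Dict, List, Mapping

Schema = Dict[str, type]


class SchemaMismatch(ValueError):
    pass


class StrictWriter:
    """Hypothetical writer that only appends rows matching a declared schema."""

    def __init__(self, schema: Schema) -> None:
        self.schema = schema
        self.written: List[Mapping[str, object]] = []

    def check(self, row: Mapping[str, object]) -> None:
        if set(row) != set(self.schema):
            raise SchemaMismatch(
                f"columns {sorted(row)} do not match schema {sorted(self.schema)}"
            )
        for name, expected in self.schema.items():
            if not isinstance(row[name], expected):
                raise SchemaMismatch(f"{name}: expected {expected.__name__}")

    def write_batch(self, rows: List[Mapping[str, object]]) -> int:
        # Validate the whole batch first so a bad row never causes a partial write.
        for row in rows:
            self.check(row)
        self.written.extend(rows)
        return len(rows)


writer = StrictWriter({"ad_id": str, "clicks": int})
writer.write_batch([{"ad_id": "ad-42", "clicks": 3}])
try:
    writer.write_batch([{"ad_id": "ad-7", "clicks": 1}, {"ad_id": "ad-9", "clicks": "many"}])
except SchemaMismatch as err:
    print("rejected:", err)  # rejected: clicks: expected int
print(len(writer.written))  # 1
```

In a streaming job the rejected batch would typically be routed to a dead-letter location rather than crashing the query, which keeps the application up when an incompatible stream shows up.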
Structured Streaming with Protobuf
▪ Real World Use Case
  ▪ Close of Books Data Lineage Job
  ▪ Uses end-to-end Protobuf
  ▪ Enables teams to move quickly with guarantees about the data being published and at what frequency
  ▪ Can be emitted at different speeds to different locations based on configuration
Example: Streaming Transformation Pipeline
Streaming Microservice Architecture
[Diagram: the same Streaming Microservice Architecture as above: GRPC Client → GRPC Servers (HTTP/2) → Kafka Brokers → Spark Application → HDFS / S3; steps numbered 1 to 8]
Recap
What We Learned
Protobuf
▪ Language-Agnostic Structured Data
▪ Compile-Time Guarantees
▪ Lightning-Fast Serialization/Deserialization
gRPC
▪ Language-Agnostic Binary Services
▪ Low Latency
▪ Compile-Time Guarantees
▪ Smart Framework
Kafka
▪ Highly Available
▪ Native Connector for Spark
▪ Topic-Based Binary Protobuf Store
▪ Use to Pass Records to One or More Downstream Services
Structured Streaming
▪ Handle Data Reliably
▪ Protobuf to Dataset / DataFrames is awesome
▪ Parquet / Delta plays nice as a columnar data exchange format
Thanks @newfrontcreative
@newfront
Feedback
Your feedback is important to us.
Don't forget to rate and review the sessions.
