Connect S3 with Kafka
leveraging Akka Streams
Who am I?
Seiya Mizuno @Saint1991
Developing a data processing platform like the one below
Agenda
• Introduction to Akka Streams (HERE!)
• Components of Akka Streams
• Glance at GraphStage
• Connect S3 with Kafka using Alpakka
Introduction to Akka Streams
What is Akka Streams?
• A toolkit for processing data streams on Akka actors
• Describes a processing pipeline as a graph
• Makes complex pipelines easy to define
(Diagram: Source → Broadcast → two Flows → Merge → Sink)
Input (Source)
• Generates stream elements
• Fetches stream elements from outside
Processing (Flow)
• Processes stream elements sent from upstream one by one
Output (Sink)
• To a file
• To external resources
Sample code!
    import akka.Done
    import akka.actor.ActorSystem
    import akka.stream.{ActorMaterializer, ClosedShape}
    import akka.stream.scaladsl._
    import scala.concurrent.{Await, Future}
    import scala.concurrent.duration._

    implicit val system = ActorSystem()
    implicit val dispatcher = system.dispatcher
    implicit val mat = ActorMaterializer()

    // stream elements
    val s3Keys = List("key1", "key2")
    // a sink that prints received stream elements
    val sinkForeach = Sink.foreach[String](println)

    val blueprint: RunnableGraph[Future[Done]] = RunnableGraph.fromGraph(GraphDSL.create(sinkForeach) {
      implicit builder => sink =>
        import GraphDSL.Implicits._
        // a source that emits the elements defined above
        val src = Source(s3Keys)
        // a flow that maps a received element to its URL in bucket A
        val flowA = Flow[String].map(key => s"s3://bucketA/$key")
        // a flow that maps a received element to its URL in bucket B
        val flowB = Flow[String].map(key => s"s3://bucketB/$key")
        // a junction that broadcasts received elements to 2 outlets
        val broadcast = builder.add(Broadcast[String](2))
        // a junction that merges received elements from 2 inlets
        val merge = builder.add(Merge[String](2))

        // THIS IS THE GREAT FEATURE OF GraphDSL: the graph is easy to describe
        src ~> broadcast ~> flowA ~> merge ~> sink
               broadcast ~> flowB ~> merge
        ClosedShape
    })

    // run the graph!!! and terminate the actor system when the graph has completed
    blueprint.run() onComplete { _ =>
      Await.ready(system.terminate(), 10.seconds)
    }
Akka Streams does everything implicitly
Easy to use without knowing the details of Akka actors. GOOD!
    implicit val system = ActorSystem()
    // dispatches threads to actors
    implicit val dispatcher = system.dispatcher
    // creates actors
    implicit val mat = ActorMaterializer()

    // ... the blueprint is built exactly as in the sample code ...

    blueprint.run() onComplete { _ =>
      Await.ready(system.terminate(), 10.seconds)
    }
When RunnableGraph#run is called, the materializer creates Akka actors based on the blueprint, and processing starts!!!
Conclusion
• Build a graph from Source, Flow, Sink, etc.
• Declare a materializer as an implicit value
• RunnableGraph → ActorMaterializer → Actors
Almost automatically, you are working with actors!!!
Tips
• To return a materialized value from GraphDSL, the graph component that creates the materialized value has to be passed to GraphDSL#create, so it must be defined outside the GraphDSL builder… orz
    val sinkForeach = Sink.foreach[String](println)
    val blueprint: RunnableGraph[Future[Done]] = RunnableGraph.fromGraph(GraphDSL.create(sinkForeach) {
      implicit builder => sink =>
        // ... same graph as in the sample code ...
        ClosedShape
    })
• If no materialized value is defined, the blueprint does not return a completion future…
• The process does not exit until the ActorSystem is terminated. Don't forget to terminate it!!!
    blueprint.run() onComplete { _ =>
      Await.ready(system.terminate(), 10.seconds)
    }
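The difference between passing and not passing a component to GraphDSL.create can be seen side by side in a minimal sketch. It assumes Akka 2.5.x-style APIs (ActorMaterializer); the names MaterializedValueSketch, withoutMat, withMat and runBoth are made up for illustration and are not from the slides.

```scala
import akka.{Done, NotUsed}
import akka.actor.ActorSystem
import akka.stream.{ActorMaterializer, ClosedShape}
import akka.stream.scaladsl.{GraphDSL, RunnableGraph, Sink, Source}

import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

// Hypothetical names; a sketch under the assumptions stated above.
object MaterializedValueSketch {
  private val keys = List("key1", "key2")

  // Nothing is passed to GraphDSL.create(), so the blueprint materializes NotUsed.
  val withoutMat: RunnableGraph[NotUsed] =
    RunnableGraph.fromGraph(GraphDSL.create() { implicit builder =>
      import GraphDSL.Implicits._
      Source(keys) ~> Sink.ignore
      ClosedShape
    })

  // The sink is defined outside the builder and passed to create(),
  // so its Future[Done] becomes the materialized value of the blueprint.
  private val printSink: Sink[String, Future[Done]] = Sink.foreach[String](println)

  val withMat: RunnableGraph[Future[Done]] =
    RunnableGraph.fromGraph(GraphDSL.create(printSink) { implicit builder => sink =>
      import GraphDSL.Implicits._
      Source(keys) ~> sink
      ClosedShape
    })

  // Runs both blueprints and always terminates the actor system afterwards.
  def runBoth(): (NotUsed, Done) = {
    implicit val system: ActorSystem = ActorSystem("mat-value-sketch")
    implicit val mat: ActorMaterializer = ActorMaterializer()
    try {
      val nothing: NotUsed = withoutMat.run()                  // no handle on completion
      val done: Done = Await.result(withMat.run(), 10.seconds) // completes after the last element
      (nothing, done)
    } finally Await.ready(system.terminate(), 10.seconds)
  }

  def main(args: Array[String]): Unit = println(runBoth())
}
```

Only withMat gives the caller something to wait on, which is why the sample code passes sinkForeach to GraphDSL.create.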
Glance at GraphStage
Remarkable features of Akka Streams
• Asynchronous message passing
• Efficient use of CPU
• Back pressure
(Diagram: Source and Sink)
① The Sink requests the next element
② The Source sends an element
Upstream stages send elements only when they have received a request from downstream, so downstream buffers never overflow.
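The request-driven behaviour can be observed with an endless source: because elements are only emitted on demand, asking for five elements ends the stream after five. This is a minimal sketch assuming Akka 2.5.x-style APIs; DemandSketch and firstFive are illustrative names, not from the slides.

```scala
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}

import scala.concurrent.Await
import scala.concurrent.duration._

// Hypothetical names; a sketch under the assumptions stated above.
object DemandSketch {
  def firstFive(): Seq[Int] = {
    implicit val system: ActorSystem = ActorSystem("demand-sketch")
    implicit val mat: ActorMaterializer = ActorMaterializer()
    // An endless upstream: 1, 2, 3, ... It never pushes on its own initiative.
    val endless = Source.fromIterator(() => Iterator.from(1))
    try {
      // take(5) stops requesting after five elements, so the stream completes.
      Await.result(endless.take(5).runWith(Sink.seq), 10.seconds)
    } finally Await.ready(system.terminate(), 10.seconds)
  }

  def main(args: Array[String]): Unit =
    println(firstFive()) // Vector(1, 2, 3, 4, 5)
}
```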
What is GraphStage?
(Diagram: Source and Sink)
① The Sink requests the next element
② The Source sends an element
Every graph component is a GraphStage!!
Can't find what you need in the Akka Streams standard library, but still want backpressure??? Implement a custom GraphStage!!!
SourceStage that emits Fibonacci numbers
    import akka.stream.{Attributes, Outlet, SourceShape}
    import akka.stream.stage.{GraphStage, GraphStageLogic, OutHandler}

    class FibonacciSource(to: Int) extends GraphStage[SourceShape[Int]] {
      // define the shape of the graph:
      // a SourceShape that has one outlet that emits Int elements
      val out: Outlet[Int] = Outlet("Fibonacci.out")
      override val shape = SourceShape(out)

      // a new instance is created every time RunnableGraph#run is called,
      // so mutable state must be initialized within the GraphStageLogic
      override def createLogic(inheritedAttributes: Attributes): GraphStageLogic =
        new GraphStageLogic(shape) {
          var fn_2 = 0
          var fn_1 = 0
          var n = 0

          setHandler(out, new OutHandler {
            // called every time a request is received from downstream (backpressure)
            override def onPull(): Unit = {
              val fn =
                if (n == 0) 0
                else if (n == 1) 1
                else fn_2 + fn_1
              // terminate this stage with completion
              if (fn >= to) completeStage()
              // send an element to the downstream
              else push(out, fn)
              fn_2 = fn_1
              fn_1 = fn
              n += 1
            }
          })
        }
    }
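To see what the stage emits, it can be wrapped with Source.fromGraph and run into Sink.seq. The sketch below repeats the stage in compact form so that it compiles on its own; it assumes Akka 2.5.x-style APIs, and FibonacciSketch and fibonacciBelow are illustrative names, not from the slides.

```scala
import akka.actor.ActorSystem
import akka.stream.{ActorMaterializer, Attributes, Outlet, SourceShape}
import akka.stream.scaladsl.{Sink, Source}
import akka.stream.stage.{GraphStage, GraphStageLogic, OutHandler}

import scala.concurrent.Await
import scala.concurrent.duration._

// Hypothetical wrapper object; a sketch under the assumptions stated above.
object FibonacciSketch {
  // Same logic as the stage above, repeated so this file compiles by itself.
  class FibonacciSource(to: Int) extends GraphStage[SourceShape[Int]] {
    val out: Outlet[Int] = Outlet("Fibonacci.out")
    override val shape: SourceShape[Int] = SourceShape(out)

    override def createLogic(inheritedAttributes: Attributes): GraphStageLogic =
      new GraphStageLogic(shape) {
        private var fn_2 = 0
        private var fn_1 = 0
        private var n = 0
        setHandler(out, new OutHandler {
          override def onPull(): Unit = {
            val fn = if (n == 0) 0 else if (n == 1) 1 else fn_2 + fn_1
            if (fn >= to) completeStage() else push(out, fn)
            fn_2 = fn_1
            fn_1 = fn
            n += 1
          }
        })
      }
  }

  // Collects every Fibonacci number below `limit` by running the custom stage.
  def fibonacciBelow(limit: Int): Seq[Int] = {
    implicit val system: ActorSystem = ActorSystem("fibonacci-sketch")
    implicit val mat: ActorMaterializer = ActorMaterializer()
    try Await.result(Source.fromGraph(new FibonacciSource(limit)).runWith(Sink.seq), 10.seconds)
    finally Await.ready(system.terminate(), 10.seconds)
  }

  def main(args: Array[String]): Unit =
    println(fibonacciBelow(100)) // Vector(0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89)
}
```

Each element is produced only inside onPull, so a slow downstream simply means onPull is called less often.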
Connect S3 with Kafka
Connect S3 with Kafka
(Diagram: Docker container, Direct Connect)
Put 2.5 TB/day!!! Must be scalable
Our architecture
(Diagram, via Direct Connect)
① Notify created events
② Receive object keys to ingest
③ Download
④ Produce
Distributes object keys to containers (works as a load balancer)
Amazon SQS
• At least once (= sometimes duplicated)
• Once an event is read, it becomes invisible, and basically no consumer receives the same event until the visibility timeout has passed
• Load balancing
• Elements are not deleted until an Ack is sent
• Processing is retriable: when a failure occurs, just do not send the Ack
Alpakka (implementations of GraphStages)
Various connector libraries!!
SQS Connector
• Reads events from SQS
• Ack
S3 Connector
• Downloads the content of an S3 object
Reactive Kafka
• Produces content to Kafka
https://github.com/akka/alpakka/tree/master/sqs
https://github.com/akka/alpakka/tree/master/s3
https://github.com/akka/reactive-kafka
S3 → Kafka
    // Alpakka S3 connector
    val src: Source[ByteString, NotUsed] =
      S3Client().download(bucket, key)

    // a built-in flow to decompress gzipped content
    val decompress: Flow[ByteString, ByteString, NotUsed] =
      Compression.gunzip()

    // a built-in flow to divide file content into lines
    val lineFraming: Flow[ByteString, ByteString, NotUsed] =
      Framing.delimiter(delimiter = ByteString("\n"),
        maximumFrameLength = 65536, allowTruncation = false)

    // Reactive Kafka producer sink
    val sink: Sink[ProducerRecord[Array[Byte], Array[Byte]], Future[Done]] =
      Producer.plainSink(producerSettings)

    val blueprint: RunnableGraph[Future[String]] = src
      .via(decompress)
      .via(lineFraming)
      // convert binary to a Kafka ProducerRecord
      .via(Flow[ByteString]
        .map(_.toArray)
        .map(record => new ProducerRecord[Array[Byte], Array[Byte]](conf.topic, record)))
      .toMat(sink)(Keep.right)
      // return a future of the completed object key when blueprint.run() is called
      .mapMaterializedValue { done =>
        done.map(_ => objectLocation)
      }
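The decompress and line-framing steps can be tried without S3 or Kafka. In this minimal sketch, bytes gzipped in memory stand in for the downloaded object; it assumes Akka 2.5.x-style APIs, and GunzipLinesSketch, gzip and lines are illustrative names, not from the slides.

```scala
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Compression, Framing, Sink, Source}
import akka.util.ByteString

import java.io.ByteArrayOutputStream
import java.nio.charset.StandardCharsets
import java.util.zip.GZIPOutputStream
import scala.concurrent.Await
import scala.concurrent.duration._

// Hypothetical names; a sketch under the assumptions stated above.
object GunzipLinesSketch {
  // Gzips a string in memory; stands in for the bytes downloaded from S3.
  def gzip(text: String): ByteString = {
    val buffer = new ByteArrayOutputStream()
    val gz = new GZIPOutputStream(buffer)
    gz.write(text.getBytes(StandardCharsets.UTF_8))
    gz.close()
    ByteString(buffer.toByteArray)
  }

  // Decompresses the bytes and splits them into newline-delimited records.
  def lines(gzipped: ByteString): Seq[String] = {
    implicit val system: ActorSystem = ActorSystem("gunzip-lines-sketch")
    implicit val mat: ActorMaterializer = ActorMaterializer()
    try Await.result(
      Source.single(gzipped)
        .via(Compression.gunzip())
        .via(Framing.delimiter(ByteString("\n"), maximumFrameLength = 65536, allowTruncation = false))
        .map(_.utf8String)
        .runWith(Sink.seq),
      10.seconds)
    finally Await.ready(system.terminate(), 10.seconds)
  }

  def main(args: Array[String]): Unit =
    println(lines(gzip("record1\nrecord2\n"))) // Vector(record1, record2)
}
```

With allowTruncation = false the content has to end with the delimiter; a final line without a trailing newline fails the stream instead of being emitted.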
Overall
    implicit val mat: Materializer = ActorMaterializer(
      ActorMaterializerSettings(system).withSupervisionStrategy( ex => ex match {
        case ex: Throwable =>
          system.log.error(ex, "an error occurred - skip and resume")
          Supervision.Resume
      })
    )

    // Alpakka SqsSource
    val src = SqsSource(queueUrl)
    // Alpakka SqsAckSink
    val sink = SqsAckSink(queueUrl)

    val blueprint: RunnableGraph[Future[Done]] =
      src
        .via(Flow[Message]
          // parse an SQS message into the keys of the S3 objects to consume
          .map(parse)
          .mapAsyncUnordered(concurrency) { case (msg, events) =>
            Future.sequence(
              events.collect {
                case event: S3Created =>
                  // run the S3 -> Kafka graph
                  S3KafkaGraph(event.location).run() map { completedLocation =>
                    // delete the successfully produced file
                    s3.deleteObject(completedLocation.bucket, completedLocation.key)
                  }
              }
            // Ack the successfully handled message
            ) map (_ => msg -> Ack())
          })
        .toMat(sink)(Keep.right)
Workaround for duplication in SQS: with supervision Resume, the app keeps going and ignores failed messages.
(Such messages become visible again after the visibility timeout, but are deleted after the retention period.)
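The effect of Supervision.Resume can be seen on a tiny stream in which one element fails. This minimal sketch sets the strategy through stream attributes rather than materializer settings, and assumes Akka 2.5.x-style APIs; ResumeSketch and divide are illustrative names, not from the slides.

```scala
import akka.actor.ActorSystem
import akka.stream.{ActorAttributes, ActorMaterializer, Supervision}
import akka.stream.scaladsl.{Flow, Sink, Source}

import scala.concurrent.Await
import scala.concurrent.duration._

// Hypothetical names; a sketch under the assumptions stated above.
object ResumeSketch {
  // Drop elements that fail with ArithmeticException, stop on anything else.
  private val decider: Supervision.Decider = {
    case _: ArithmeticException => Supervision.Resume
    case _                      => Supervision.Stop
  }

  // Divides 100 by each input; a failing element is skipped instead of failing the stream.
  def divide(inputs: List[Int]): Seq[Int] = {
    implicit val system: ActorSystem = ActorSystem("resume-sketch")
    implicit val mat: ActorMaterializer = ActorMaterializer()
    val flow = Flow[Int]
      .map(100 / _)
      .withAttributes(ActorAttributes.supervisionStrategy(decider))
    try Await.result(Source(inputs).via(flow).runWith(Sink.seq), 10.seconds)
    finally Await.ready(system.terminate(), 10.seconds)
  }

  def main(args: Array[String]): Unit =
    println(divide(List(1, 2, 0, 4))) // Vector(100, 50, 25)
}
```

In the pipeline above, the skipped element corresponds to an SQS message that is never acked.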
Efficiency
Handles 3 TB/day of data with 24 cores!!
(Diagram, via Direct Connect: ① notify created events, ② receive object locations to ingest, ③ download, ④ produce)
Conclusion
• Stream processing with high resource efficiency and back pressure is easy to implement, even if you are not familiar with Akka actors!
• External resources are easy to connect thanks to Alpakka connectors!!!
gists
• Sample code of GraphDSL (first example)
• FibonacciSource
• FlowStage with Buffer (not in these slides)
https://gist.github.com/Saint1991/d2737721551bc908f48b08e15f0b12d4
https://gist.github.com/Saint1991/2aa5841eea5669e8b86a5eb2df8ecb15
https://gist.github.com/Saint1991/29d097f83942d52b598cda20372ad671

Connect S3 with Kafka using Akka Streams

  • 1. Connect S3 with Kafka leveraging Akka Streams
  • 2. Who am I? Seiya Mizuno (@Saint1991). Developing a data processing platform like the one pictured on the slide.
  • 3. Agenda: Introduction to Akka Streams (HERE!), Components of Akka Streams, Glance at GraphStage, Connect S3 with Kafka using Alpakka
  • 5. What is Akka Streams? A toolkit for processing data streams on Akka actors. A processing pipeline is described as a graph, which makes complex pipelines easy to define. Diagram: Source, Broadcast, Flow, Flow, Merge, Sink.
      Input: generating stream elements, or fetching stream elements from outside.
      Processing: processing stream elements sent from upstream one by one.
      Output: to a file, or to external resources.
  • 6. Sample code!
      import akka.Done
      import akka.actor.ActorSystem
      import akka.stream.{ActorMaterializer, ClosedShape}
      import akka.stream.scaladsl.{Broadcast, Flow, GraphDSL, Merge, RunnableGraph, Sink, Source}
      import scala.concurrent.{Await, Future}
      import scala.concurrent.duration._

      implicit val system = ActorSystem()
      implicit val dispatcher = system.dispatcher
      implicit val mat = ActorMaterializer()

      val s3Keys = List("key1", "key2")        // stream elements
      val sinkForeach = Sink.foreach(println)  // a sink that prints received stream elements

      val blueprint: RunnableGraph[Future[Done]] = RunnableGraph.fromGraph(GraphDSL.create(sinkForeach) {
        implicit builder: GraphDSL.Builder[Future[Done]] =>
        sink: Sink[String, Future[Done]]#Shape =>
          import GraphDSL.Implicits._
          val src = Source(s3Keys)                                   // a source that sends the elements defined above
          val flowA = Flow[String].map(key => s"s3://bucketA/$key")  // a flow that maps a received element to the URL in bucket A
          val flowB = Flow[String].map(key => s"s3://bucketB/$key")  // a flow that maps a received element to the URL in bucket B
          val broadcast = builder.add(Broadcast[String](2))          // a junction that broadcasts received elements to 2 outlets
          val merge = builder.add(Merge[String](2))                  // a junction that merges received elements from 2 inlets

          // THIS IS GREAT FUNCTIONALITY OF GraphDSL: easy to describe the graph
          src ~> broadcast ~> flowA ~> merge ~> sink
                 broadcast ~> flowB ~> merge
          ClosedShape
      })

      // Run the graph!!!
      blueprint.run() onComplete { _ =>
        // terminate the actor system when the graph is completed
        Await.ready(system.terminate(), 10.seconds)
      }
  • 7. GOOD! Easy to use without knowing the details of Akka actors.
  • 8. Akka Streams implicitly does everything. In the sample code of slide 6, `implicit val dispatcher = system.dispatcher` dispatches threads to actors, and `implicit val mat = ActorMaterializer()` creates actors. The materializer creates Akka actors based on the blueprint when RunnableGraph#run is called, and processing starts!!!
  • 9. Conclusion: Build a graph with Source, Flow, Sink, etc. Declare a materializer as an implicit. RunnableGraph → ActorMaterializer → Actors: it works with actors almost automatically!!!
  • 10. Tips (for the sample code of slide 6)
      To return a materialized value from GraphDSL, the graph component that creates the materialized value has to be passed to GraphDSL#create, so it must be defined outside the GraphDSL builder… orz
      If no materialized value is defined, the blueprint does not return a completion future…
      The process will not exit until the ActorSystem is terminated. Don't forget to terminate it!!!
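The tip about materialized values also applies outside GraphDSL. A minimal sketch, assuming the same Akka Streams version as the slides (with `ActorMaterializer`): `to` keeps the source's value (`NotUsed`), while `toMat(...)(Keep.right)` keeps the sink's value, which is what provides something to wait on. The object name `MaterializedValueDemo` and the use of `Sink.seq` are illustrative choices, not part of the original deck.

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Keep, RunnableGraph, Sink, Source}
import scala.collection.immutable
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

// Hypothetical demo object, not from the slides.
object MaterializedValueDemo {
  def run(): immutable.Seq[String] = {
    implicit val system: ActorSystem = ActorSystem("mat-value-demo")
    implicit val mat: ActorMaterializer = ActorMaterializer()
    try {
      val urls = Source(List("key1", "key2")).map(key => s"s3://bucketA/$key")

      // `to` keeps the left (source) value: running it yields NotUsed, so there is
      // nothing that signals completion.
      val fireAndForget: RunnableGraph[NotUsed] = urls.to(Sink.ignore)
      val nothing: NotUsed = fireAndForget.run()

      // `toMat(...)(Keep.right)` keeps the sink's value: a Future that completes
      // with the collected elements once the stream has finished.
      val collecting: RunnableGraph[Future[immutable.Seq[String]]] =
        urls.toMat(Sink.seq[String])(Keep.right)
      Await.result(collecting.run(), 5.seconds)
    } finally {
      // Without this the JVM keeps running, as the tip above warns.
      Await.ready(system.terminate(), 10.seconds)
    }
  }
}
```

Calling `MaterializedValueDemo.run()` returns the two bucket A URLs in source order.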
  • 12. Remarkable features of Akka Streams are… asynchronous message passing, efficient use of CPU, and back pressure. Between Source and Sink: ① the downstream requests the next element, ② the upstream sends an element. Upstreams send elements only when they have received requests from downstream, so the downstream's buffer will not overflow.
  • 13. What is GraphStage? Every graph component is a GraphStage!! (Between Source and Sink: ① request the next element, ② send an element.) Can't find what you need in the Akka Streams standard library, but still want backpressure??? Implement custom GraphStages!!!
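The request-then-send protocol on slide 12 can be illustrated with a toy model in plain Scala. This is only a sketch of the idea and not how Akka Streams is implemented; `PullSource`, `BoundedSink` and `BackpressureDemo` are hypothetical names introduced for illustration.

```scala
import scala.collection.mutable.ListBuffer

// Toy upstream: hands out elements only in response to an explicit request.
final class PullSource[A](elements: Iterator[A]) {
  def request(demand: Int): List[A] = {
    val batch = ListBuffer.empty[A]
    while (batch.size < demand && elements.hasNext) batch += elements.next()
    batch.toList
  }
}

// Toy downstream with a bounded buffer: it only requests what it has room for.
final class BoundedSink[A](capacity: Int) {
  private val buffer = ListBuffer.empty[A]
  private val processed = ListBuffer.empty[A]
  private var maxBuffered = 0

  // Pulls until the source is exhausted.
  // Returns the processed elements and the peak buffer size that was observed.
  def drain(source: PullSource[A]): (List[A], Int) = {
    var batch = source.request(capacity - buffer.size) // (1) request
    while (batch.nonEmpty) {
      buffer ++= batch                                 // (2) elements arrive
      maxBuffered = math.max(maxBuffered, buffer.size)
      processed ++= buffer                             // "process" what is buffered
      buffer.clear()
      batch = source.request(capacity - buffer.size)
    }
    (processed.toList, maxBuffered)
  }
}

object BackpressureDemo {
  def run(): (List[Int], Int) =
    new BoundedSink[Int](capacity = 3).drain(new PullSource((1 to 10).iterator))
}
```

Because the sink never asks for more than its free capacity, the buffer in this model never holds more than three elements, however fast the source could produce.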
  • 14. SourceStage that emits Fibonacci
      import akka.stream.{Attributes, Outlet, SourceShape}
      import akka.stream.stage.{GraphStage, GraphStageLogic, OutHandler}

      class FibonacciSource(to: Int) extends GraphStage[SourceShape[Int]] {
        val out: Outlet[Int] = Outlet("Fibonacci.out")
        // define the shape of the graph: a SourceShape that has one outlet emitting Int elements
        override val shape = SourceShape(out)

        // a new instance is created every time RunnableGraph#run is called,
        // so mutable state must be initialized within the GraphStageLogic
        override def createLogic(inheritedAttributes: Attributes): GraphStageLogic =
          new GraphStageLogic(shape) {
            var fn_2 = 0
            var fn_1 = 0
            var n = 0
            setHandler(out, new OutHandler {
              // called every time a request is received from downstream (backpressure)
              override def onPull(): Unit = {
                val fn =
                  if (n == 0) 0
                  else if (n == 1) 1
                  else fn_2 + fn_1
                if (fn >= to) completeStage()  // terminate this stage with completion
                else push(out, fn)             // send an element to the downstream
                fn_2 = fn_1
                fn_1 = fn
                n += 1
              }
            })
          }
      }
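A custom stage like the one on slide 14 is plugged into a pipeline with `Source.fromGraph`. The sketch below repeats the slide's class so that it is self-contained and collects the output with `Sink.seq`; the object name `FibonacciDemo` and the bound of 20 are illustrative assumptions.

```scala
import akka.actor.ActorSystem
import akka.stream.{ActorMaterializer, Attributes, Outlet, SourceShape}
import akka.stream.scaladsl.{Sink, Source}
import akka.stream.stage.{GraphStage, GraphStageLogic, OutHandler}
import scala.collection.immutable
import scala.concurrent.Await
import scala.concurrent.duration._

// Same stage as on slide 14: emits Fibonacci numbers strictly below `to`.
class FibonacciSource(to: Int) extends GraphStage[SourceShape[Int]] {
  val out: Outlet[Int] = Outlet("Fibonacci.out")
  override val shape: SourceShape[Int] = SourceShape(out)

  override def createLogic(inheritedAttributes: Attributes): GraphStageLogic =
    new GraphStageLogic(shape) {
      var fn_2 = 0
      var fn_1 = 0
      var n = 0
      setHandler(out, new OutHandler {
        override def onPull(): Unit = {
          val fn = if (n == 0) 0 else if (n == 1) 1 else fn_2 + fn_1
          if (fn >= to) completeStage() else push(out, fn)
          fn_2 = fn_1
          fn_1 = fn
          n += 1
        }
      })
    }
}

// Hypothetical demo object, not from the slides.
object FibonacciDemo {
  def run(): immutable.Seq[Int] = {
    implicit val system: ActorSystem = ActorSystem("fibonacci-demo")
    implicit val mat: ActorMaterializer = ActorMaterializer()
    try {
      // Sink.seq pulls one element at a time; each pull triggers onPull above.
      val result = Source.fromGraph(new FibonacciSource(20)).runWith(Sink.seq)
      Await.result(result, 5.seconds)
    } finally {
      Await.ready(system.terminate(), 10.seconds)
    }
  }
}
```

With a bound of 20 the stage emits 0, 1, 1, 2, 3, 5, 8, 13 and completes when the next value (21) reaches the bound.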
  • 16. Connect S3 with Kafka. Diagram labels: Docker Container, Direct Connect. 2.5 TB/day is put!!! Must be scalable.
  • 17. Our architecture (over Direct Connect): ① Notify created events ② Receive object keys to ingest ③ Download ④ Produce. Object keys are distributed to the containers (this works as a load balancer).
  • 18. Amazon SQS
      At least once = sometimes duplicated.
      Load balancing: once an event is read, it becomes invisible, and basically no consumer receives the same event until the visibility timeout has passed.
      Retriable: elements are not deleted until an Ack is sent, so a failed event can be retried by not sending the Ack when a failure occurs.
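The SQS behaviour described on slide 18 can be sketched with an in-memory model. This is a simplification for illustration and not the SQS API; `ToyQueue`, `Event` and `AtLeastOnceDemo` are hypothetical names, and time is passed in explicitly as a plain number.

```scala
import scala.collection.mutable

final case class Event(id: Int, body: String)

// Toy queue: a received event stays stored but is hidden until its visibility
// timeout passes; only an explicit ack deletes it.
final class ToyQueue(visibilityTimeout: Long) {
  // id -> (event, time at which the event becomes visible again)
  private val entries = mutable.LinkedHashMap.empty[Int, (Event, Long)]

  def send(event: Event): Unit = entries.put(event.id, (event, 0L))

  def receive(now: Long): Option[Event] =
    entries.values
      .find { case (_, visibleAt) => visibleAt <= now }
      .map { case (event, _) =>
        entries.put(event.id, (event, now + visibilityTimeout)) // hide it for a while
        event
      }

  def ack(event: Event): Unit = entries.remove(event.id)
}

object AtLeastOnceDemo {
  def run(): List[Option[Event]] = {
    val queue = new ToyQueue(visibilityTimeout = 30L)
    queue.send(Event(1, "s3://bucket/key1"))
    val first = queue.receive(now = 0L)      // delivered, then hidden until t = 30
    val hidden = queue.receive(now = 5L)     // None: other consumers do not see it
    val retried = queue.receive(now = 30L)   // no ack was sent, so it is redelivered
    retried.foreach(queue.ack)               // the ack finally deletes it
    val afterAck = queue.receive(now = 100L) // None: nothing left
    List(first, hidden, retried, afterAck)
  }
}
```

The second delivery of the same event is the "sometimes duplicated" side of at-least-once: a consumer that succeeded but failed to ack would see the event again.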
  • 19. Alpakka (implementation of GraphStages): various connector libraries!!
      SQS connector: reads events from SQS, Ack. https://github.com/akka/alpakka/tree/master/sqs
      S3 connector: downloads the content of an S3 object. https://github.com/akka/alpakka/tree/master/s3
      Reactive Kafka: produces content to Kafka. https://github.com/akka/reactive-kafka
  • 20. S3 → Kafka
      // Alpakka S3 connector
      val src: Source[ByteString, NotUsed] = S3Client().download(bucket, key)
      // a built-in flow to decompress gzipped content
      val decompress: Flow[ByteString, ByteString, NotUsed] = Compression.gunzip()
      // a built-in flow to divide file content into lines
      val lineFraming: Flow[ByteString, ByteString, NotUsed] =
        Framing.delimiter(delimiter = ByteString("\n"), maximumFrameLength = 65536, allowTruncation = false)
      // Reactive Kafka producer sink
      val sink: Sink[ProducerMessage.Message[Array[Byte], Array[Byte], Any], Future[Done]] =
        Producer.plainSink(producerSettings)

      val blueprint: RunnableGraph[Future[String]] =
        src
          .via(decompress)
          .via(lineFraming)
          // convert binary to a ProducerRecord of Kafka
          .via(Flow[ByteString]
            .map(_.toArray)
            .map { record =>
              ProducerMessage.Message[Array[Byte], Array[Byte], Null](
                new ProducerRecord[Array[Byte], Array[Byte]](conf.topic, record),
                null
              )
            })
          .toMat(sink)(Keep.right)
          // to return a future of the completed object key when blueprint.run() is called
          .mapMaterializedValue { done => done.map(_ => objectLocation) }
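The decompress-and-frame part of slide 20 can be tried without S3 or Kafka. A minimal sketch, assuming an in-memory gzipped payload in place of the S3 download and `Sink.seq` in place of the Kafka producer; `GunzipLinesDemo` and the sample records are illustrative.

```scala
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Compression, Framing, Sink, Source}
import akka.util.ByteString
import java.io.ByteArrayOutputStream
import java.util.zip.GZIPOutputStream
import scala.collection.immutable
import scala.concurrent.Await
import scala.concurrent.duration._

// Hypothetical demo object, not from the slides.
object GunzipLinesDemo {
  // Stand-in for the content of a gzipped S3 object.
  private def gzip(text: String): ByteString = {
    val bytes = new ByteArrayOutputStream()
    val out = new GZIPOutputStream(bytes)
    out.write(text.getBytes("UTF-8"))
    out.close()
    ByteString(bytes.toByteArray)
  }

  def run(): immutable.Seq[String] = {
    implicit val system: ActorSystem = ActorSystem("gunzip-demo")
    implicit val mat: ActorMaterializer = ActorMaterializer()
    try {
      val lines = Source
        .single(gzip("record1\nrecord2\n"))
        .via(Compression.gunzip())
        // With allowTruncation = false the final record must end with the delimiter,
        // otherwise the stream fails instead of emitting a partial line.
        .via(Framing.delimiter(ByteString("\n"), maximumFrameLength = 65536, allowTruncation = false))
        .map(_.utf8String)
        .runWith(Sink.seq)
      Await.result(lines, 5.seconds)
    } finally {
      Await.ready(system.terminate(), 10.seconds)
    }
  }
}
```

Each framed `ByteString` corresponds to one line of the decompressed file, which is the unit that slide 20 turns into a Kafka record.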
  • 21. Overall
      implicit val mat: Materializer = ActorMaterializer(
        ActorMaterializerSettings(system).withSupervisionStrategy(
          ex => ex match {
            case ex: Throwable =>
              system.log.error(ex, "an error occurs - skip and resume")
              Supervision.Resume
          })
      )

      val src = SqsSource(queueUrl)    // Alpakka SqsSource
      val sink = SqsAckSink(queueUrl)  // Alpakka SqsAckSink

      val blueprint: RunnableGraph[Future[Done]] =
        src
          // parse an SQS message into the keys of the S3 objects to consume
          .via(Flow[Message].map(parse))
          .mapAsyncUnordered(concurrency) { case (msg, events) =>
            Future.sequence(
              events.collect { case event: S3Created =>
                // run the S3 -> Kafka graph
                S3KafkaGraph(event.location).run() map { completedLocation =>
                  // delete the successfully produced file
                  s3.deleteObject(completedLocation.bucket, completedLocation.key)
                }
              }
            ) map (_ => msg -> Ack())  // Ack the successfully handled message
          }
          .toMat(sink)(Keep.right)

      Workaround for duplication in SQS. With supervision Resume, the app keeps going and ignores the failed message (such messages become visible again after the visibility timeout, but are deleted after the retention period).
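The skip-and-resume behaviour of slide 21 can be reproduced with an in-memory source. A minimal sketch, assuming the supervision strategy is attached as a stream attribute rather than through the materializer settings used on the slide; `ResumeOnFailureDemo`, `ingest` and the key names are hypothetical.

```scala
import akka.actor.ActorSystem
import akka.stream.{ActorAttributes, ActorMaterializer, Supervision}
import akka.stream.scaladsl.{Sink, Source}
import scala.collection.immutable
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

// Hypothetical demo object, not from the slides.
object ResumeOnFailureDemo {
  def run(): immutable.Seq[String] = {
    implicit val system: ActorSystem = ActorSystem("resume-demo")
    implicit val mat: ActorMaterializer = ActorMaterializer()
    import system.dispatcher // ExecutionContext for the Futures below

    // Drop the failing element and keep the stream running.
    val decider: Supervision.Decider = { case _: Throwable => Supervision.Resume }

    // Stand-in for "run the S3 -> Kafka graph for one object".
    def ingest(key: String): Future[String] = Future {
      if (key == "broken") throw new IllegalStateException(s"cannot ingest $key")
      s"done:$key"
    }

    try {
      val result = Source(List("key1", "broken", "key2"))
        .mapAsyncUnordered(parallelism = 2)(ingest) // results may arrive out of order
        .withAttributes(ActorAttributes.supervisionStrategy(decider))
        .runWith(Sink.seq)
      Await.result(result, 5.seconds)
    } finally {
      Await.ready(system.terminate(), 10.seconds)
    }
  }
}
```

The failed element never reaches the sink, which in the slide's pipeline means that no Ack is sent for that message and SQS makes it visible again later.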
  • 22. Efficiency: handles 3 TB/day of data with 24 cores!! (Over Direct Connect: ① Notify created events ② Receive object locations to ingest ③ Download ④ Produce)
  • 23. Conclusion: Stream processing with high resource efficiency and back pressure is easy to implement, even if you are not familiar with Akka actors!
  • 24. Conclusion: Easy to connect to external resources thanks to the Alpakka connectors!!!
  • 25. Gists: a sample code of GraphDSL (first example), FibonacciSource, FlowStage with buffer (not in these slides)
      https://gist.github.com/Saint1991/d2737721551bc908f48b08e15f0b12d4
      https://gist.github.com/Saint1991/2aa5841eea5669e8b86a5eb2df8ecb15
      https://gist.github.com/Saint1991/29d097f83942d52b598cda20372ad671

Editor's notes

  1. Incidentally, the execution result is as follows: s3://bucketA/key1 s3://bucketB/key1 s3://bucketA/key2 s3://bucketB/key2