The document discusses Akka Streams, which is a library for stream processing and reactive programming in Scala and Java. It provides linear flows for processing streams of data as well as more complex graph-based flows that allow fan-out and fan-in of streams. Akka Streams integrates with the Akka actor model and allows streams to be processed asynchronously using backpressure to prevent buffer overflows.
10. Streams
“You cannot enter the same river twice”
~ Heraclitus
http://en.wikiquote.org/wiki/Heraclitus
11. Streams
Real Time Stream Processing
When you attach “late” to a Publisher,
you may miss initial elements – it’s a river of data.
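The "river" behaviour can be sketched with a toy, single-threaded publisher in plain Scala. This is only an illustration under assumptions: `HotPublisher`, `subscribe` and `emit` are hypothetical names, not the Reactive Streams or Akka API, and back-pressure is ignored here.

```scala
import scala.collection.mutable.ListBuffer

object LateSubscriberDemo {
  // Toy "hot" publisher: an element is delivered only to the subscribers
  // that are attached at the moment it is emitted.
  // Hypothetical illustration, not the Reactive Streams Publisher interface.
  final class HotPublisher[A] {
    private val subscribers = ListBuffer.empty[A => Unit]
    def subscribe(onNext: A => Unit): Unit = subscribers += onNext
    def emit(element: A): Unit = subscribers.foreach(s => s(element))
  }

  val early = ListBuffer.empty[Int]
  val late  = ListBuffer.empty[Int]

  val publisher = new HotPublisher[Int]
  publisher.subscribe(x => early += x)
  publisher.emit(1)
  publisher.emit(2)
  publisher.subscribe(x => late += x) // attaches "late"
  publisher.emit(3)

  def main(args: Array[String]): Unit = {
    println(s"early saw: ${early.toList}") // early saw: List(1, 2, 3)
    println(s"late saw: ${late.toList}")   // late saw: List(3)
  }
}
```

The late subscriber never sees elements 1 and 2, because nothing replays them.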
20. Reactive Streams - Inter-op
We want to make different implementations
co-operate with each other.
http://reactive-streams.org
21. Reactive Streams - Inter-op
The different implementations “talk to each other”
using the Reactive Streams protocol.
22. Reactive Streams - Inter-op
The Reactive Streams SPI is NOT meant to be a user-facing API.
You should use one of the implementing libraries.
44. Back-pressure? RS: Dynamic Push/Pull
Push only: not safe when the Subscriber is slow.
Pull only: too slow when the Subscriber is fast.
Solution:
Dynamic adjustment (Reactive Streams)
45. Back-pressure? RS: Dynamic Push/Pull
The slow Subscriber sees that its buffer can take 3 elements.
The Publisher will never overflow the Subscriber’s buffer.
46. Back-pressure? RS: Dynamic Push/Pull
The fast Publisher will send at most 3 elements.
This is pull-based back-pressure.
47. Back-pressure? RS: Dynamic Push/Pull
A fast Subscriber can issue more Request(n) signals
before more data arrives!
50. Back-pressure? RS: Accumulate demand
The Publisher can safely publish up to the total accumulated demand.
The Subscriber’s buffer will not overflow.
51. Back-pressure? RS: Requesting “a lot”
A fast Subscriber can request “a lot” from the Publisher.
This is effectively “publisher push”, and it is really fast.
The buffer size is known, so this is safe.
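The demand accounting described on these slides can be sketched in plain Scala. This is a toy, synchronous model under assumptions: `ToyPublisher`, `request` and `pump` are hypothetical names chosen for illustration (the real interfaces live in `org.reactivestreams`), and `pump` stands in for "the publisher has data ready to send".

```scala
import scala.collection.mutable.ListBuffer

object DemandDemo {
  // Toy model of Reactive Streams demand signalling (hypothetical names).
  final class ToyPublisher(elements: Iterator[Int], deliver: Int => Unit) {
    private var demand = 0L // accumulated demand that is not yet fulfilled

    // The Subscriber signals that it has room for n more elements.
    // Several requests simply add up.
    def request(n: Long): Unit = {
      require(n > 0, "requested demand must be positive")
      demand += n
    }

    // The Publisher sends as much as it can, but never more than was requested.
    // Returns the number of elements sent.
    def pump(): Int = {
      var sent = 0
      while (demand > 0 && elements.hasNext) {
        deliver(elements.next())
        demand -= 1
        sent += 1
      }
      sent
    }
  }

  val received = ListBuffer.empty[Int]
  // A "fast" publisher: it always has another element available.
  val publisher = new ToyPublisher(Iterator.from(1), x => received += x)

  publisher.request(3)              // buffer can take 3 elements
  publisher.request(2)              // more room, signalled before any data arrived
  val sentFirst  = publisher.pump() // sends 5 elements: the accumulated demand
  val sentSecond = publisher.pump() // sends 0: no outstanding demand

  def main(args: Array[String]): Unit =
    println(s"received=${received.toList}, sentFirst=$sentFirst, sentSecond=$sentSecond")
}
```

Even though the publisher could produce elements forever, it stops after five, because that is all the subscriber asked for.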
56. Akka
Akka is a high-performance concurrency
library for Scala and Java.
At its core it focuses on the Actor Model:
An Actor can only:
• Send / receive messages
• Create Actors
• Change its behaviour
63. Akka Streams – Linear Flow
FlowFrom[Double].map(_.toInt). [...]
No Source attached yet.
“Pipe ready to work with Doubles”.
64. Akka Streams – Linear Flow
implicit val sys = ActorSystem("tokyo-sys")
This is the world in which Actors live.
Akka Streams uses Actors, so it needs an ActorSystem.
65. Akka Streams – Linear Flow
implicit val sys = ActorSystem("tokyo-sys")
implicit val mat = FlowMaterializer()
Contains the logic for HOW to materialise the stream.
It can be pure Actors or, in the future, Apache Spark.
66. Akka Streams – Linear Flow
implicit val sys = ActorSystem("tokyo-sys")
implicit val mat = FlowMaterializer()
You can configure its buffer sizes etc.
(Or implement your own materialiser, e.g. “run on Spark”.)
67. Akka Streams – Linear Flow
implicit val sys = ActorSystem("tokyo-sys")
implicit val mat = FlowMaterializer()
val foreachSink = ForeachSink[Int](println)
val mf = FlowFrom(1 to 3).withSink(foreachSink).run()
run() uses the implicit FlowMaterializer.
68. Akka Streams – Linear Flow
implicit val sys = ActorSystem("tokyo-sys")
implicit val mat = FlowMaterializer()
val foreachSink = ForeachSink[Int](println)
val mf = FlowFrom(1 to 3).withSink(foreachSink).run()(mat)
69. Akka Streams – Linear Flow
val mf = FlowFrom[Int].
  map(_ * 2).
  withSink(ForeachSink(println)) // needs a Source, can NOT run yet
74. Akka Streams – Linear Flow
val f = FlowFrom[Int].
  map(_ * 2).
  withSink(ForeachSink(i => println(s"i = $i")))
// needs a Source to run

f.withSource(IterableSource(1 to 10)).run()
79. Akka Streams – Flows are reusable
f.withSource(IterableSource(1 to 10)).run()
f.withSource(IterableSource(1 to 100)).run()
f.withSource(IterableSource(1 to 1000)).run()
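Why a flow can be reused may be easier to see in a toy model written in plain Scala. This is a sketch under assumptions: `ToyFlow` and `ToyRunnable` are hypothetical stand-ins that loosely mirror the slide's `FlowFrom` / `withSink` / `withSource`, not Akka's implementation. The point is only that a flow is a description, and that work happens when `run()` is called.

```scala
import scala.collection.mutable.ListBuffer

object BlueprintDemo {
  // A source-plus-flow pair that is ready to execute, but has not executed yet.
  final class ToyRunnable(body: () => Unit) {
    def run(): Unit = body()
  }

  // An immutable description: a transformation and a sink, with no source yet.
  // Hypothetical names; not the Akka Streams API.
  final case class ToyFlow[In, Out](transform: In => Out, sink: Out => Unit) {
    def withSource(source: Iterable[In]): ToyRunnable =
      new ToyRunnable(() => source.foreach(in => sink(transform(in))))
  }

  val printed = ListBuffer.empty[String]
  val f = ToyFlow[Int, Int](_ * 2, i => printed += s"i = $i")

  val pending = f.withSource(1 to 3)
  val sizeBeforeRun = printed.size // still 0: attaching a source runs nothing

  pending.run()                    // first run of the description
  f.withSource(10 to 11).run()     // the same f, reused with another source

  def main(args: Array[String]): Unit = printed.foreach(println)
}
```

Because `f` holds no per-run state, attaching it to a second source does not interfere with the first run.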
94. Akka Streams – GraphFlow
Linear Flows
or
non-Akka pipelines
Could be another RS implementation!
96. Akka Streams – GraphFlow
Fan-out elements
and
Fan-in elements
Now you need a FlowGraph
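What fan-out and fan-in mean can be sketched on plain Scala collections. This is a toy model under assumptions: `broadcast` and `merge` are hypothetical helper functions that work on whole sequences, not the `Broadcast` and `Merge` graph elements themselves, and a real merge makes no promise about the ordering across branches.

```scala
object FanDemo {
  // Fan-out: every element is delivered to each downstream branch.
  def broadcast[A](in: Seq[A], branches: Int): Seq[Seq[A]] =
    Seq.fill(branches)(in)

  // Fan-in: the elements of all upstream branches end up in one output.
  def merge[A](branches: Seq[Seq[A]]): Seq[A] =
    branches.flatten

  val input    = Seq(3, 5, 7)
  val branches = broadcast(input, 2)

  // Each branch applies its own linear processing before the merge.
  val merged = merge(Seq(
    branches(0).map(_ * 10),  // 30, 50, 70
    branches(1).filter(_ > 3) // 5, 7
  ))

  def main(args: Array[String]): Unit =
    println(merged.sorted) // List(5, 7, 30, 50, 70)
}
```

The result is sorted before printing because the interleaving of merged branches should not be relied upon.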
97. Akka Streams – GraphFlow
// first define some pipeline pieces
val f1 = FlowFrom[Input].map(_.toIntermediate)
val f2 = FlowFrom[Intermediate].map(_.enrich)
val f3 = FlowFrom[Enriched].filter(_.isImportant)
val f4 = FlowFrom[Intermediate].mapFuture(_.enrichAsync)

// then add input and output placeholders
val in = SubscriberSource[Input]
val out = PublisherSink[Enriched]
99. Akka Streams – GraphFlow
val b3 = Broadcast[Int]("b3")
val b7 = Broadcast[Int]("b7")
val b11 = Broadcast[Int]("b11")
val m8 = Merge[Int]("m8")
val m9 = Merge[Int]("m9")
val m10 = Merge[Int]("m10")
val m11 = Merge[Int]("m11")
val in3 = IterableSource(List(3))
val in5 = IterableSource(List(5))
val in7 = IterableSource(List(7))
106. Akka Streams – GraphFlow
Sinks and Sources are “keys”
which can be addressed within the graph
val resultFuture2 = FutureSink[Seq[Int]]
val resultFuture9 = FutureSink[Seq[Int]]
val resultFuture10 = FutureSink[Seq[Int]]

val g = FlowGraph { implicit b =>
  // ...
  m10 ~> FlowFrom[Int].grouped(1000) ~> resultFuture10
  // ...
}.run()

Await.result(g.getSinkFor(resultFuture2), 3.seconds).sorted should be(List(5, 7))
108. Akka Streams – GraphFlow
val g = FlowGraph {}
FlowGraph is immutable and safe to share and re-use.
Think of it as “the description” which then gets “run”.