- The document discusses data migration from an old platform to a new Scala/Play implementation using Akka actors, futures, and pipes.
- Futures are used to handle asynchronous operations, actors are used to organize complex asynchronous flows and deal with failures, and pipes are used to deal with asynchronous computation results inside actors.
- Several lessons are discussed, including designing the system around the data structure, knowing the limits of the source system to avoid overloading it, and being aware of rate limits for cloud APIs.
3. @ELMANU
who is speaking?
• freelance software consultant based in Vienna
• Vienna Scala User Group
• web, web, web
• writing a book on reactive web-applications
5. @ELMANU
talenthouse
• www.talenthouse.com
• based in Los Angeles
• connecting brands and artists
• 3+ million users
10. @ELMANU
BACKGROUND STORY
• old, slow (very slow) platform
• re-implementation from scratch with Scala & Play
• tight schedule, a lot of data to migrate
12. @ELMANU
SOURCE SYSTEM
DISCLAIMER:
What follows is not intended as a
bashing of the source system, but as a
necessary explanation of its complexity in
relation to data migration.
25. @ELMANU
FUTURES: HAPPY PATH
import scala.concurrent._
import scala.concurrent.ExecutionContext.Implicits.global
val futureSum: Future[Int] = Future { 1 + 1 }
futureSum.map { sum =>
  println("The sum is " + sum)
}
27. @ELMANU
FUTURES: SAD PATH
import scala.concurrent._
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
val futureDiv: Future[Int] = Future { 1 / 0 }
val futurePrint: Future[Unit] = futureDiv.map { div =>
  println("The division result is " + div)
}
Await.result(futurePrint, 1.second)
Avoid blocking if possible
28. @ELMANU
FUTURES: SAD PATH
import scala.concurrent._
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
val futureDiv: Future[Int] = Future { 1 / 0 }
futureDiv.map { div =>
  println("The division result is " + div)
}
scala> Await.result(futureDiv, 1.second)
java.lang.ArithmeticException: / by zero
  at $anonfun$1.apply$mcI$sp(<console>:11)
  at $anonfun$1.apply(<console>:11)
  at $anonfun$1.apply(<console>:11)
  at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
  at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
  at scala.concurrent.impl.ExecutionContextImpl$AdaptedForkJoinTask.exec(ExecutionContextImpl.scala:121)
  at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
  at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
  at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
  at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
29. @ELMANU
FUTURES: SAD PATH
import scala.concurrent._
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
val futureDiv: Future[Int] = Future { 1 / 0 }
val futurePrint: Future[Unit] = futureDiv.map { div =>
  println("The division result is " + div)
}.recover {
  case a: java.lang.ArithmeticException =>
    println("What on earth are you trying to do?")
}
Await.result(futurePrint, 1.second)
Be mindful of failure
30. @ELMANU
FUTURES: SAD PATH
•Exceptions are propagated up the chain
•Without recover there is no guarantee that
failure will ever get noticed!
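A minimal sketch of both bullets, using only the Scala standard library. The object and method names (`FailurePropagation`, `chained`, `recovered`, `outcome`) are illustrative assumptions and not part of the talk; `Await` is used here only to inspect the outcome.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.Try

object FailurePropagation {
  // The division fails; the two map steps after it are skipped and
  // the resulting Future carries the original exception.
  def chained(): Future[String] =
    Future { 1 / 0 }
      .map(_ + 1)
      .map(n => "Result: " + n)

  // recover turns the failure back into a value, so it cannot go unnoticed.
  def recovered(): Future[String] =
    chained().recover {
      case e: ArithmeticException => "Failed: " + e.getMessage
    }

  // Await.ready waits for completion without rethrowing the failure,
  // which lets us look at the Try inside the Future.
  def outcome(f: Future[String]): Try[String] =
    Await.ready(f, 1.second).value.get
}
```

Calling `FailurePropagation.outcome(FailurePropagation.chained())` yields a `Failure` wrapping the `ArithmeticException`, while the recovered variant yields a `Success`.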
31. @ELMANU
COMPOSING FUTURES
val futureA: Future[Int] = Future { 1 + 1 }
val futureB: Future[Int] = Future { 2 + 2 }
val futureC: Future[Int] = for {
  a <- futureA
  b <- futureB
} yield {
  a + b
}
33. @ELMANU
COMPOSING FUTURES
val futureC: Future[Int] = for {
  a <- Future { 1 + 1 }
  b <- Future { 2 + 2 }
} yield {
  a + b
}
This runs in sequence
Don’t do this
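The difference between the two styles can be sketched as follows. The `slow` helper and the `timed` utility are hypothetical stand-ins added for illustration; the 200 ms sleep only simulates a slow call.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FutureComposition {
  // Stand-in for a slow call (hypothetical helper, not from the talk).
  private def slow(value: Int): Future[Int] = Future {
    Thread.sleep(200)
    value
  }

  // Sequential: the second Future is only created once the first has completed.
  def sequential(): Future[Int] =
    for {
      a <- slow(2)
      b <- slow(4)
    } yield a + b

  // Concurrent: both Futures are started before the for-comprehension,
  // so they can run at the same time if the pool has free threads.
  def concurrent(): Future[Int] = {
    val futureA = slow(2)
    val futureB = slow(4)
    for {
      a <- futureA
      b <- futureB
    } yield a + b
  }

  // Returns the result and the elapsed time in milliseconds.
  // Blocks the caller, so it is meant for experiments only.
  def timed(f: => Future[Int]): (Int, Long) = {
    val start = System.nanoTime()
    val result = Await.result(f, 5.seconds)
    (result, (System.nanoTime() - start) / 1000000)
  }
}
```

Comparing `FutureComposition.timed(FutureComposition.sequential())` with `FutureComposition.timed(FutureComposition.concurrent())` should show the sequential variant taking roughly twice as long on a machine with more than one available thread.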
34. @ELMANU
FUTURES: CALLBACKS
import scala.concurrent._
import scala.concurrent.ExecutionContext.Implicits.global
val futureDiv: Future[Int] = Future { 1 / 0 }
futureDiv.onSuccess { case result =>
  println("Result: " + result)
}
futureDiv.onFailure { case t: Throwable =>
  println("Oh no!")
}
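Note that `onSuccess` and `onFailure` were deprecated in Scala 2.12 and removed in 2.13. On newer versions the same idea can be sketched with `onComplete`, which handles both outcomes in one callback. The `Callbacks` object and the `Promise` used to capture the message are assumptions made for demonstration; the final `Await` is only there to make the result observable.

```scala
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.{Failure, Success}

object Callbacks {
  // Registers a single callback covering both outcomes and returns
  // the message it produced.
  def describe(future: Future[Int]): String = {
    val message = Promise[String]()
    future.onComplete {
      case Success(result) => message.success("Result: " + result)
      case Failure(t)      => message.success("Oh no! " + t.getClass.getSimpleName)
    }
    Await.result(message.future, 1.second)
  }
}
```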
35. @ELMANU
using FUTURES
•a Future { … } block that doesn’t do any I/O is a code smell
•use them in combination with the “right” ExecutionContext set-up
•when you have blocking operations, wrap them into a blocking block
36. @ELMANU
using FUTURES
import scala.concurrent.blocking
Future {
  blocking {
    DB.withConnection { implicit connection =>
      val query = SQL("select * from bar")
      query()
    }
  }
}
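One possible reading of the “right” ExecutionContext set-up is a separate, bounded thread pool reserved for blocking calls, so that they cannot starve the default pool. The following is a sketch under that assumption: the pool size of 4, the `BlockingPool` object and the `slowQuery` stand-in are all hypothetical.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, ExecutionContextExecutorService, Future}
import scala.concurrent.duration._

object BlockingPool {
  // Creates a dedicated pool, runs the work on it and shuts it down again.
  // The pool size of 4 is an arbitrary assumption.
  def withBlockingPool[A](work: ExecutionContext => Future[A]): A = {
    val pool: ExecutionContextExecutorService =
      ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(4))
    try Await.result(work(pool), 5.seconds)
    finally pool.shutdown()
  }

  // Stand-in for a blocking database query (hypothetical).
  def slowQuery(id: Int): Int = {
    Thread.sleep(100)
    id * 2
  }

  // Runs three blocking queries on the dedicated pool.
  def run(): List[Int] =
    withBlockingPool { implicit ec =>
      Future.traverse(List(1, 2, 3))(id => Future(slowQuery(id)))
    }
}
```

In a long-running application the pool would normally be created once and shared rather than created per call as in this self-contained example.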
43. @ELMANU
ACTORS
[Diagram: two actors, each with its own mailbox, exchanging messages]
akka://application/user/georgePeppard: "Holly, I'm in love with you."
akka://application/user/audreyHepburn: "So what?"
akka://application/user/audreyHepburn/cat
44. @ELMANU
GETTING AN ACTOR
import akka.actor._
class AudreyHepburn extends Actor {
  def receive = { ... }
}
val system: ActorSystem = ActorSystem()
val audrey: ActorRef = system.actorOf(Props[AudreyHepburn])
47. @ELMANU
SENDING AND
RECEIVING MESSAGES
case class Script(text: String)
class AudreyHepburn extends Actor {
  def receive = {
    case Script(text) =>
      read(text)
  }
}
audrey ! Script(breakfastAtTiffany)
“tell”: fire-and-forget
49. @ELMANU
ASK PATTERN
import akka.pattern.ask
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
implicit val timeout = akka.util.Timeout(1.second)
// ask returns a Future[Any], so narrow it with mapTo
val maybeAnswer: Future[String] =
  (audrey ? "Where should we have breakfast?").mapTo[String]
“ask”
51. @ELMANU
SUPERVISION
class UserMigrator extends Actor {
  // context is the actor context, used to create child actors;
  // the router type (round-robin) creates many children
  lazy val workers: ActorRef = context.actorOf(
    Props[UserMigrationWorker]
      .withRouter(RoundRobinRouter(nrOfInstances = 100))
  )
}
55. @ELMANU
CECI EST UNE PIPE
•Akka pattern to combine Futures and Actors
•Sends the result of a Future to an Actor
•Be careful with error handling
56. @ELMANU
CECI EST UNE PIPE
class FileFetcher extends Actor {
  def receive = {
    case FetchFile(url) =>
      // Keep reference to original sender - what follows is a Future!
      val originalSender = sender()
      val download: Future[DownloadedFile] =
        WS.url(url).get().map { response =>
          // Wrap your result into something you can easily match against
          DownloadedFile(
            url,
            response.ahcResponse.getResponseBodyAsBytes
          )
        }
      import akka.pattern.pipe
      // This is how you pipe
      download pipeTo originalSender
  }
}
60. @ELMANU
CECI EST UNE PIPE
class FileFetcher extends Actor {
  def receive = {
    case FetchFile(url) =>
      val originalSender = sender
      val download: Future[Array[Byte]] =
        WS.url(url).get().map { response =>
          DownloadedFile(
            url,
            response.ahcResponse.getResponseBodyAsBytes
          )
        }
      import akka.pattern.pipe
      download pipeTo originalSender
  }
}
Will this work?
61. @ELMANU
PIPES AND error handling
class FileFetcher extends Actor {
  def receive = {
    case FetchFile(url) =>
      val originalSender = sender()
      val download =
        WS.url(url).get().map { response =>
          DownloadedFile(...)
        } recover { case t: Throwable =>
          // Don’t forget to recover!
          DownloadFileFailure(url, t)
        }
      import akka.pattern.pipe
      download pipeTo originalSender
  }
}
62. @ELMANU
SUMMARY
•Futures: manipulate and combine asynchronous operation results
•Actors: organise complex asynchronous flows, deal with failure via supervision
•Pipes: deal with results of asynchronous computation inside of actors
66. @ELMANU
design according to YOUR DATA
[Diagram, design A: an Item migrator at the top, with three User item migrators below it, each with two Item migration workers of its own]
Not all users have the same number of items
68. @ELMANU
design according to YOUR DATA
[Diagram, design B: an Item migrator and User item migrators, with Item migration workers, File fetchers, File uploaders and a Soundcloud worker]
Pools of actors
71. @ELMANU
DATA MIGRATION SHOULD not BE A RACE
•Your goal is to get the data, not to be as fast as possible
•Be gentle to the legacy system(s)
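One way to stay gentle with a legacy system is to cap the number of requests in flight. The sketch below processes ids in fixed-size batches using only standard-library Futures; `GentleMigration`, `migrateInBatches` and the `fetch` parameter are hypothetical names, and the talk itself relies on actor pools rather than this helper.

```scala
import scala.concurrent.{ExecutionContext, Future}

object GentleMigration {
  // Processes ids in batches: at most batchSize requests are in flight,
  // and the next batch starts only after the previous one has finished.
  // Results are returned in the same order as the input ids.
  def migrateInBatches[A](ids: List[Int], batchSize: Int)(fetch: Int => Future[A])
                         (implicit ec: ExecutionContext): Future[List[A]] =
    ids.grouped(batchSize).foldLeft(Future.successful(List.empty[A])) {
      (done, batch) =>
        for {
          previous <- done                          // wait for earlier batches
          current  <- Future.traverse(batch)(fetch) // then run this batch
        } yield previous ++ current
    }
}
```

A smaller `batchSize` lowers the load on the source system at the cost of a longer migration run.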
73. @ELMANU
CLOUD API STANDARDS
•ISO-28601 Data formats in REST APIs
•ISO-28700 Response times and failure communication of REST APIs
•ISO-28701 Rate limits in REST APIs and HTTP error codes
DREAM ON
74. @ELMANU
NO STANDARDS!
•The cloud is heterogeneous
•Response times, rate limits, error codes are all different
•Don’t even try to treat all systems the same
76. @ELMANU
RATE limits
•Read the docs - most cloud API docs will warn you about them
•Design your actor system so that you can queue if necessary
•Keep track of migration status
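Outside of an actor system, the idea of queueing a request again once a rate limit is hit can be approximated with a non-blocking retry. This is a sketch under stated assumptions: `RateLimited` is a hypothetical failure type (real APIs signal limits in their own ways), and the doubling delay is an arbitrary choice.

```scala
import java.util.concurrent.{Executors, ScheduledExecutorService, ThreadFactory, TimeUnit}
import scala.concurrent.{ExecutionContext, Future, Promise}
import scala.concurrent.duration._

object RateLimitRetry {
  // Hypothetical failure type signalling that the API asked us to slow down.
  final case class RateLimited(message: String) extends RuntimeException(message)

  // Single daemon thread used only to schedule delayed retries.
  private val scheduler: ScheduledExecutorService =
    Executors.newSingleThreadScheduledExecutor(new ThreadFactory {
      def newThread(r: Runnable): Thread = {
        val t = new Thread(r)
        t.setDaemon(true) // do not keep the JVM alive
        t
      }
    })

  // Starts body after the given delay without blocking a thread.
  private def after[A](delay: FiniteDuration)(body: => Future[A]): Future[A] = {
    val promise = Promise[A]()
    scheduler.schedule(new Runnable {
      def run(): Unit = promise.completeWith(body)
    }, delay.toMillis, TimeUnit.MILLISECONDS)
    promise.future
  }

  // Retries only on RateLimited, doubling the delay each time.
  // Any other failure, or running out of retries, fails the Future.
  def withRetry[A](retriesLeft: Int, delay: FiniteDuration)(call: () => Future[A])
                  (implicit ec: ExecutionContext): Future[A] =
    call().recoverWith {
      case _: RateLimited if retriesLeft > 0 =>
        after(delay)(withRetry(retriesLeft - 1, delay * 2L)(call))
    }
}
```

Counting attempts and recording which items ultimately failed is one simple way to keep track of migration status alongside such a retry.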
78. @ELMANU
RATE limits
•Example: Soundcloud API
•500 Internal Server Error after a seemingly random number of requests
Magic User-Agent
WS
  .url("http://api.soundcloud.com/resolve.json")
  .withHeaders("User-Agent" -> "FOOBAR") // the magic ingredient that
                                         // opens the door to Soundcloud
80. @ELMANU
seriously, do not BLOCK
•Seems innocent at first to block from time to time
•OutOfMemory after 8 hours of migration run is not very funny
•You will end up rewriting your whole code to be async anyway
81. @ELMANU
MISC
•Unstable primary IDs in source system
•Build a lot of small tools, be pragmatic
•sbt-tasks (http://yobriefca.se/sbt-tasks/)