The document discusses building massive scale, fault tolerant job processing systems using the Scala Akka framework. It describes implementing a master-slave architecture with actors where an agent runs on each storage node to process jobs locally, achieving high throughput. It also covers controlling system load by dynamically adjusting parallelism, and implementing fine-grained fault tolerance through actor supervision strategies.
Massive scale job processing with Scala Akka framework
1. Building massive scale, fault tolerant job processing systems with the Scala Akka framework
Vignesh Sukumar
SVCC 2012
2. About me
• Storage group, Backend Engineering at Box
• Love enterprise software!
• Interested in Big Data and building distributed
systems in the cloud
3. About Box
• Leader in enterprise cloud collaboration and
storage
• Cutting-edge work in backend, frontend,
platform and engineering services
• A really fun place to work – we have a long
slide!
4. Talk outline
• Job processing requirements
• Traditional & new models for job processing
• Akka actors framework
• Achieving and controlling high IO throughput
• Fine-grained fault tolerance
6. Practical realities
• Storage nodes are usually of varying
configurations (OS, processing power, storage
capacity, etc.) mainly because of rapid evolution
in provisioning operations
• Some nodes are more over-worked than
others (e.g., accepting live uploads)
• Billions of files; petabytes of data
7. Job processing requirements
• Iterate over all files (billions, petabyte scale):
e.g., check consistency of all files
• High throughput
• Fault tolerant
• Secure
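The consistency check is the running example for the rest of the talk. A minimal sketch of what such a per-file check could look like, assuming the metadata store records an MD5 checksum per file (the algorithm and the function name are illustrative, not from the talk):

```scala
import java.security.MessageDigest

// Hash a file's bytes and compare against the checksum recorded in the
// metadata store. Each file's result is independent of every other file's,
// which is what makes the job embarrassingly parallel.
def isConsistent(bytes: Array[Byte], expectedHex: String): Boolean = {
  val digest = MessageDigest.getInstance("MD5").digest(bytes)
  digest.map("%02x".format(_)).mkString == expectedHex
}
```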
9. Why traditional models fail in cloud
storage environments
• Not scalable: petabyte scale, billions of files
• Insecure: cannot move files out of storage
nodes
• No performance control: easy to overwhelm
any storage node
• No fine grained fault tolerance
10. Compute on Storage
• Move job computation directly to storage
nodes
• Utilize abundant CPU on storage nodes
• Metadata store still stays in a highly available
system like an RDBMS
• Results from operations on a file are
completely independent
12. Benefits
• High IO throughput: Direct access; no transfer
of files over a network
• Secure: files do not leave storage nodes
• Better performance control: compute can
easily monitor system load and back off
• Better fault tolerance handling: finer grained
handling of errors
13. Master node
• Responsible for accepting job submissions and
splitting them into tasks for slave nodes
• Stateful: keeps durable copy of jobs and tasks
in Zookeeper
• Horizontally scalable: service can be run on
multiple nodes
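A rough sketch of the job-splitting step, under stated assumptions: the `Job`, `Task`, and `splitJob` names are illustrative, file placement is faked with a hash, and the durable ZooKeeper write the slide mentions is omitted:

```scala
// Hypothetical master-side types; the talk does not give the actual schema.
case class Job(id: String, fileIds: Seq[Long])
case class Task(jobId: String, node: String, fileIds: Seq[Long])

// Assign each file to a storage node (here by a simple modulo hash; a real
// master would consult the metadata store to locate files), producing one
// task per node. Tasks would then be persisted durably before dispatch.
def splitJob(job: Job, nodes: Seq[String]): Seq[Task] =
  job.fileIds
    .groupBy(id => nodes((id % nodes.size).toInt))
    .map { case (node, files) => Task(job.id, node, files) }
    .toSeq
```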
14. Agent
• Runs directly on the storage nodes on a
machine-independent JVM container
• Stateless: no task state is maintained
• Monitors system load with back-off
• Reports results directly to master without
synchronizing with other agents
16. Actors
• Concurrent threads abstraction with no
shared state
• Exchange messages
• Asynchronous, non-blocking
• Multiple actors can map to a single OS thread
• Parent-children hierarchical relationship
17. Actors and messages
• case object MsgType1

class MyActor extends Actor {
  def receive = {
    case MsgType1 => // do something
  }
}

// instantiation and sending messages
val actorRef = system.actorOf(Props(new MyActor))
actorRef ! MsgType1
19. Achieving high IO throughput
• Parallel, asynchronous IO through “Futures”
val fileIOResult = Future {
  // issue high-latency tasks like file IO
}
val networkIOResult = Future { /* read from network */ }
Futures.awaitAll(<wait time>, fileIOResult, networkIOResult)
fileIOResult onSuccess { case result => /* do something */ }
networkIOResult onFailure { case e => /* retry */ }
20. Controlling system throughput
• The problem: agents need to throttle
themselves as storage nodes serve live traffic
• Adjust number of parallel workers dynamically
through a monitoring service
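One way to picture the adjustment step (a sketch with illustrative names; the talk does not specify the formula): the monitoring service samples node load and derives a worker count from the remaining headroom, backing off toward a single worker when live traffic is heavy.

```scala
// Hypothetical load sample reported by the node's monitoring service.
case class LoadSample(cpuUtil: Double, ioWaitUtil: Double)

// Scale the agent's parallel workers by the node's idle headroom: a busy
// node (high CPU or IO wait) throttles down to as little as one worker,
// an idle node runs at the configured maximum.
def targetParallelism(load: LoadSample, maxWorkers: Int): Int = {
  val headroom = 1.0 - math.max(load.cpuUtil, load.ioWaitUtil)
  math.max(1, (maxWorkers * headroom).toInt)
}
```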
21. Controlling throughput: Examples
• Parallelism parameters can be fetched from a
separate configuration service on a per-node
basis
• Some machines can be sped up and others
slowed down this way
• The configuration can be updated on a cron
schedule to speed up during weekends
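The weekend speed-up could be as simple as the cron job selecting between two parallelism caps (a minimal sketch; the function name and parameters are assumptions, and the actual per-node values would come from the configuration service):

```scala
import java.time.DayOfWeek

// Pick a higher worker cap on weekends, when storage nodes see less live
// upload traffic; weekdays get the conservative cap.
def parallelismCap(day: DayOfWeek, weekday: Int, weekend: Int): Int =
  day match {
    case DayOfWeek.SATURDAY | DayOfWeek.SUNDAY => weekend
    case _                                     => weekday
  }
```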
22. Fine grained fault tolerance with
Supervisors
• Parents of child actors can define specific
fault-handling strategies for each failure
scenario in their children
• Components can fail gracefully without
affecting the entire system
23. Supervision strategy: Examples
class TaskActor extends Actor {
  // create child workers
  override val supervisorStrategy = OneForOneStrategy(maxNrOfRetries = 3) {
    case _: SqlException            => Resume  // retry the same file
    case _: FileCorruptionException => Stop    // don’t clobber it!
    case _: IOException             => Restart // report and move on
  }
}
24. Unit testing
• Scalatra test framework: very easy to read!
TaskActorTest.receive(BadFileMsg) must throw
FileNotFoundException
• Mocks for network and database calls
val mockHttp = mock[HttpExecutor]
TaskActorTest ! doHttpPost
there was atLeastOne(mockHttp).POST
• Extensive testing of failure injection scenarios
25. Takeaways
• Keep your architecture simple by modeling
actor message flow along the same paths as
parent-child actor hierarchy (i.e., no message
exchange between peer child actors)
• Design and implement for component failures
• Write unit tests extensively: we did not hit
any fundamental functionality breakage
• Box Engineering is awesome!
Editor’s notes
1. Example of a job is to check consistency of all the files: this will involve iterating over every file on all storage nodes, reading file and verifying content integrity.
Scalability: non-performant because of the IO bottleneck in getting files to the application cluster. Insecure: application clusters can store the files locally. It’s easy to melt a single storage node by reading or writing a lot to it. Cannot perform fine grained fault tolerance.