Programming the cloud with Skywriting. Derek Murray, with Malte Schwarzkopf, Chris Smowton, Anil Madhavapeddy and Steve Hand
Outline: State of the art; Skywriting by example; Iterative algorithms; Heterogeneous clusters; Speculative execution; Performance case studies; Future directions
Task farming [diagram: a bag of independent tasks]
Task farming [diagram: a master handing tasks to three workers]
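A minimal Python sketch of the master-worker task farm on this slide, assuming in-process threads stand in for cluster workers. The name `run_task_farm` and its parameters are illustrative and not taken from the talk, and the failure handling a real master would need is left out.

```python
import queue
import threading


def run_task_farm(tasks, num_workers=3):
    """Run independent callables on worker threads; return results in task order."""
    bag = queue.Queue()
    for index, task in enumerate(tasks):
        bag.put((index, task))

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                # Each worker pulls a new task only when it has finished the last one.
                index, task = bag.get_nowait()
            except queue.Empty:
                return  # the bag is empty, so this worker stops
            value = task()
            with lock:
                results[index] = value

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return [results[i] for i in range(len(tasks))]


squares = run_task_farm([lambda n=n: n * n for n in range(8)])
print(squares)
```

Because workers pull tasks on demand, a slow worker simply ends up taking fewer of them, which is why this pattern tolerates a mixed pool.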
Task farming [diagram: task A runs before task B]
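One way to picture "A runs before B" is as a dependency graph resolved in waves, much like makefile rules. The sketch below uses Python's standard `graphlib` module; the four-task graph and the helper name `parallel_waves` are assumptions made for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical graph: each key maps to the set of tasks it depends on.
dependencies = {
    "B": {"A"},
    "C": {"A"},
    "D": {"B", "C"},
}


def parallel_waves(deps):
    """Group tasks into waves; tasks within one wave do not depend on each other."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every task whose inputs are available
        waves.append(ready)
        sorter.done(*ready)  # finishing a wave unlocks its successors
    return waves


print(parallel_waves(dependencies))
```

Each inner list could be handed to a task farm as an independent bag of tasks.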
MapReduce: Input -> Map -> Shuffle -> Reduce -> Output
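The phases named on this slide can be sketched in a few lines of single-process Python. This is a toy word count, assuming everything fits in memory; the function names are illustrative and do not come from Hadoop or Google's implementation.

```python
from collections import defaultdict
from functools import reduce


def map_task(chunk):
    """Emit (word, 1) pairs for one chunk of input."""
    return [(word, 1) for word in chunk.split()]


def shuffle(pairs):
    """Group values by key, as the shuffle phase does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_task(values):
    """Fold the values collected for a single key."""
    return reduce(lambda a, b: a + b, values)


def map_reduce(chunks):
    # All map tasks finish before any reduce task starts.
    mapped = [pair for chunk in chunks for pair in map_task(chunk)]
    grouped = shuffle(mapped)
    return {key: reduce_task(values) for key, values in sorted(grouped.items())}


print(map_reduce(["a b a", "b c"]))
```

In a cluster setting each `map_task` and each `reduce_task` call would be a separate task handed out by the master.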
Dryad
Problem: iterative algorithms [diagram: a task repeats while "Not converged" and stops when "Converged"]
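The difficulty is that the number of iterations depends on the data, so it cannot be written down as a fixed graph in advance. A minimal Python sketch of such a loop follows, assuming a tiny dense matrix in place of a large sparse one. It is plain power iteration without a damping factor, so it is not full PageRank, and the names and the 2x2 matrix are illustrative.

```python
def multiply(matrix, vector):
    """Premultiply a vector by a matrix given as a list of rows."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def iterate_to_fixpoint(matrix, vector, tolerance=1e-9, max_iterations=1000):
    """Repeat the multiplication until the vector stops changing."""
    for iteration in range(1, max_iterations + 1):
        next_vector = multiply(matrix, vector)
        change = max(abs(a - b) for a, b in zip(next_vector, vector))
        vector = next_vector
        if change < tolerance:  # the convergence test is data-dependent
            return vector, iteration
    raise RuntimeError("did not converge")


# Hypothetical column-stochastic matrix for a two-page web.
transition = [
    [0.5, 0.25],
    [0.5, 0.75],
]

rank, iterations = iterate_to_fixpoint(transition, [0.5, 0.5])
print(rank, iterations)
```

If each call to `multiply` were a separate job submitted from outside the cluster, this loop is the client-side driver that would have to stay alive for the whole computation.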
Problem: cluster heterogeneity [diagram: a master and three workers]
Problem: cluster heterogeneity [diagram: master]
Problem: cluster heterogeneity [diagram: master]
Problem: speculative execution
Solution: Skywriting
  Turing-complete coordination language
    Support for spawning tasks
    Interface to external code
  Distributed execution engine
    Executes tasks in parallel on a cluster
    Handles failure, locality, data motion, etc.
Spawning a Skywriting task
    function f(arg1, arg2) { … }
    result = spawn(f, [arg1, arg2]);
Building a task graph
    function f(x, y) { … }
    function g(x, y) { … }
    function h(x, y) { … }
    a = spawn(f, [7, 8]);
    b = spawn(g, [a, 0]);
    c = spawn(g, [a, 1]);
    d = spawn(h, [b, c]);
    return d;
[diagram: f produces a; two g tasks consume a and produce b and c; h consumes b and c and produces d]
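For readers without a Skywriting runtime, the same graph can be approximated in Python with futures. This is only an analogy: the `spawn` below is a hypothetical stand-in built on `concurrent.futures`, the bodies of `f`, `g` and `h` are invented because the slide elides them, and a blocked worker thread is a much cruder mechanism than a distributed scheduler.

```python
from concurrent.futures import Future, ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=4)


def spawn(function, args):
    """Hypothetical stand-in for spawn: returns a future straight away."""
    def run():
        # An argument that is itself a future is a dependency; wait for its value.
        resolved = [a.result() if isinstance(a, Future) else a for a in args]
        return function(*resolved)
    return pool.submit(run)


# Invented bodies, purely for illustration.
def f(x, y):
    return x + y


def g(x, y):
    return x * 10 + y


def h(x, y):
    return x + y


a = spawn(f, [7, 8])   # a = 15
b = spawn(g, [a, 0])   # b = 150, runs once a is available
c = spawn(g, [a, 1])   # c = 151, can run in parallel with b
d = spawn(h, [b, c])   # d = 301, needs both b and c
print(d.result())
```

Passing futures as arguments is what builds the graph: `b` and `c` both name `a` as an input, and `d` names both of them.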
Iterative algorithm
    current = …;
    do {
        prev = current;
        a = spawn(f, [prev, 0]);
        b = spawn(f, [prev, 1]);
        c = spawn(f, [prev, 2]);
        current = spawn(g, [a, b, c]);
        done = spawn(h, [current]);
    } while (!*done);
Iterative algorithm [diagram: three f tasks feed g, which feeds h; the pattern then repeats with three more f tasks]
Aside: recursive algorithm
    function f(x) {
        if (/* x is small enough */) {
            return /* do something with x */;
        } else {
            x_lo = /* bottom half of x */;
            x_hi = /* top half of x */;
            return [spawn(f, [x_lo]), spawn(f, [x_hi])];
        }
    }
Executing external code
    y = exec(executor_name, { "inputs" : [x1, x2, x3], … }, num_outputs);
Heterogeneous cluster support
Workers advertise “execution facilities”
Tasks migrate to necessary facilities
Speculative execution
    x = …;
    a = spawn(f, [x]);
    b = spawn(f, [x]);
    c = spawn(f, [x]);
    result = waituntil(any, [a, b, c]);
    return result["available"];
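The "first result wins" pattern on this slide has a rough Python analogue in `concurrent.futures.wait` with `FIRST_COMPLETED`. The sketch below is an assumption-laden illustration: `randomised_task` and `first_result` are invented names, different seeds stand in for the randomness of the real task, and Python cannot kill a running thread, so the losing copies are only cancelled if they have not started yet.

```python
import concurrent.futures
import random
import time


def randomised_task(seed):
    """Stand-in for a task whose running time is not known in advance."""
    rng = random.Random(seed)
    time.sleep(rng.uniform(0.01, 0.05))
    return seed


def first_result(function, argument_list):
    """Start one copy per argument and return whichever finishes first."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=len(argument_list))
    futures = [pool.submit(function, argument) for argument in argument_list]
    done, pending = concurrent.futures.wait(
        futures, return_when=concurrent.futures.FIRST_COMPLETED
    )
    for future in pending:
        future.cancel()  # no effect on copies that are already running
    pool.shutdown(wait=False)
    return next(iter(done)).result()


winner = first_result(randomised_task, [1, 2, 3])
print(winner)
```

Which copy wins depends on timing, so the printed value can differ between runs.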
Performance case studies
  All experiments used Amazon EC2 m1.small instances, running Ubuntu 8.10
  Microbenchmark
  Smith-Waterman
Job creation overhead
Smith-Waterman data flow
Parallel Smith-Waterman
Parallel Smith-Waterman
Future work
  Distributed data structures
    Coping when the lists etc. get big
  Better language integration
    Compile to JVM, CLR, LLVM etc.
  Decentralised master-worker
    Run on multiple clouds
  Self-scaling clusters
    Add and remove workers as needed


Editor's notes

  1. Thanks for the introduction, Eva. Well, as Eva said, my name’s Derek Murray, I’m a third year PhD student at Cambridge, and today I’m going to talk about Skywriting, which is a little bit of work I’ve been doing with these guys: Malte, Chris, Anil and my supervisor Steve Hand. Skywriting is a system for large-scale distributed computation – in this respect it’s similar to things like Google MapReduce and Microsoft’s Dryad – so that’s systems where your data or compute need is so big that you have to use a cluster in parallel to get the job done. It was the success of these systems – in particular Hadoop, the open-source MapReduce – that motivated us to start this work. What I found interesting was that people were using these things in entirely unexpected ways… taking MapReduce, which is excellent for log-processing, and running some big iterative machine learning algorithm on it. We reckoned that people were using MapReduce not because of its programming model, but despite it. So we set out to build something that combines all the advantages of previous systems, with a very flexible programming model. The result was Skywriting, so let’s see what you think…
  2. All the systems we’ll discuss today use the simple notion of task parallelism. Many algorithms can be divided into tasks, which are just chunks of sequential code. The key observation is that two independent tasks can run in parallel. And when your whole job divides into a fully independent bag of tasks, it’s said to be “embarrassingly parallel”.
  3. And how do you run these embarrassingly parallel jobs? Well, you give your bag of tasks to a master, which doles them out on demand to a set of workers. This is a very simple architecture to program. And it has a lot of benefits. If one of the workers crashes, fine! The master will notice and give that worker’s current task to someone else. And if a worker is a bit slower than the others, that’s also fine! Each worker pulls a new task when it has completed the last one, so even a heterogeneous pool can do useful work.
  4. Embarrassing parallelism is not very interesting: it only lets you do boring things like search for aliens and brute-force people’s passwords.
  5. It gets much more interesting – i.e. commercially useful – when the tasks have dependencies between them. So here, we have two tasks A and B, and a relation that says A must run before B. The usual reason for this is because A writes some output, and B wants to read it. Think of this like makefile rules. You can build up graphs out of these dependencies, and resolve them in parallel. In fact, the original name for this project was “Cloud Make”. Fortunately it changed…
  6. Are you all familiar with MapReduce? Introduced by Google in 2004, MapReduce used the observation that the map() function from functional programming can run in parallel over large lists. So they broke down their huge data into chunks, and ran each through a “map task”, generating some key-value pairs that are then sorted by key in this shuffle phase, and then the values for each key are folded in parallel using a “reduce task”. This basically uses the same master-worker task farm that I showed on a previous slide, with the single constraint that all the map tasks must finish before the reduce tasks begin. Therefore it had the benefit of working at huge scale, and being very reliable.
  7. A couple of years later, Microsoft, which also has a search engine, released “Dryad”, which generalises MapReduce by allowing the user to specify a job as any directed acyclic graph. The graph has vertices – which are arbitrary sequential code in your favourite language – and channels, which could be files, in-memory FIFOs, TCP connections or whatever. Clearly you can implement MapReduce in Dryad, since it’s just a DAG. But Dryad makes things like joins much easier, because a task can have multiple inputs.
  8. So far, we can run any finite directed acyclic graph using Dryad. As the name suggests, however, Dryad is not terribly good at cyclic data flows. These turn up all the time in fields like machine learning, scientific computing and information retrieval. Take PageRank, for example, which involves repeatedly premultiplying a vector by a large sparse matrix representing the web. You keep doing this until you reach a fixpoint, and the PageRank vector has converged. At present, all you can do is submit one job after another. This is bad for a number of reasons. First of all, it’s very slow: MapReduce and Dryad are designed for batch submission, and so starting an individual job takes on the order of 30 seconds. If your iteration is shorter than that, you’re losing out on parallel speedup. It also introduces a co-dependency between the client and the cluster. Now the client, which is just some simple program that submits jobs to the cluster, has to stay running for the duration of the job, but since it’s outside the cluster, it gets none of the advantages of fault-tolerance, of data locality, of fair scheduling. Since the client now contains critical job state, it’s necessary to add all these features manually.
  9. Remember our master-worker architecture? Well, if you’ve ever tried to set up Hadoop or Dryad, you’ll know that you need to make sure all of the workers are the same, running the same operating system, on the same local network.
  10. But what if all you have is a little ad-hoc cluster, with a Windows desktop, a Linux server and a Mac laptop?
  11. Or, perhaps less contrived, what if your data are spread between different cloud providers? So you might have some data in Amazon S3, some in Google’s App Engine, and some in Windows Azure. Our mantra is “put the computation near the data”, and it’s not practical to shift all the data to one place.
  12. And what about this? Say you have a really important task to complete, but you don’t know how long it’ll take – maybe you’re using some kind of randomised algorithm. So you fire off three copies of the same task… and eventually one finishes. At this point, you can just kill the other two. Although MapReduce and Dryad have limited support for this, it’s not first-class: you can’t do it on demand, only in response to “straggler” nodes that take much longer to complete than others.
  13. I’ve spent quite a lot of slides being rather coy about what’s to come, but if you’ve read the abstract, you’ll know that Skywriting is
  14. …two things. First, instead of using DAGs to describe a job, we use the most powerful thing available to us: a Turing-complete coordination language. This sounds ominous and theoretical, but actually it’s just a programming language that looks a lot like JavaScript, with all the usual control flow structures, loops, ifs, functions and so on. Since we want to run things efficiently in parallel, it has support for spawning tasks, and a way to call external code. The other main component is the distributed execution engine, which actually executes Skywriting programs in the cluster. The interesting thing about this is that a “task” is just a Skywriting function – a continuation to be more precise – which means that tasks can spawn other tasks, and thereby grow the job dynamically.
  15. 1.0 – 1.2 GHz Xeon or Opteron. 1.7GB RAM, 150GB disk.
  16. 50 x 50 on 50 workers. Input size is
  17. Best score is 15x15 = 225 tasks, at 83 s (2.6x speedup).